Tech Story

Vibe Coding

Vibe coding is nice because it delivers results fast. But as the project grows, that quickly stops being enough. The problems: no reviews, no validation, no testing, no PR handling, and instructions that have to be repeated in every session. It's also hard to get the expected behavior without a tight set of instructions. The result is poor code, lots of bugs, and headaches.

Claude Instructions

So I started working on my instructions. Eventually, I ended up with the following setup:

Main CLAUDE.md in the root of the repo.
Both modules in my repo have their own CLAUDE.md files maintained by Claude itself. They contain the basic source layout, key patterns, startup commands, etc. They help the agent understand the codebase without reading the code itself, saving tokens and speeding up the process.
3 skills: coder (general coding instructions), backend-go (Go-specific instructions), and frontend-react (React-specific instructions).
3 rules: validation (how to perform code validation), git-rules (how to use Git), and github-operations (how to handle PRs, issues, etc. on GitHub).
2 agents: code-reviewer (reviews code, gives feedback, approves PRs) and ui-tester (uses Chrome DevTools MCP for UI testing).

A few tips for writing instructions

The internet is full of advice on how to write instructions. Here are the tips that worked best for me:

The fewer instructions, the better. The more you have, the more likely they are to be ignored. Try to keep only the most important ones and make them as concise as possible.
Do not write "default" instructions. There is no point in stating things like "write good code" or "follow best practices". Instead, write specific instructions for things you want done in a particular way. For example, I have instructions for Git commit formatting because I want commits to follow a specific convention.
Ask the agent why it did something in a way you did not expect. The answers can be very helpful. Once, when I asked, the agent explained that my instructions conflicted and it had simply followed one of them. A perfectly legitimate answer, so I fixed my instructions.
Use capital letters, bold text, emojis, and other formatting to highlight important instructions. For example, I have a big red warning not to commit code without my approval, while other Git commands do not require approval.
Don't use (sub)agents unless you need a fresh pair of eyes. Subagents can be useful because they run in a separate context (which is both good and bad), but for short-running tasks, it's overkill to start another context. In my case, I use separate agents for code review and UI testing. I actually find it better if they do not get the entire context - seeing just the feature spec and the plan is usually enough for their job.
Favor rules over skills. I noticed that rules are more likely to be followed than skills.
Avoid MEMORY.md files under the user's home directory because they are not part of the codebase. If multiple developers are working on the same project, this can lead to conflicts and inconsistencies.
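
A specific instruction such as the commit-format rule is easier to trust when it can also be checked mechanically. Below is a minimal sketch of such a check, assuming a Conventional Commits style header (`type(scope): description`). The list of allowed types, the 72-character limit, and the function name `check_commit_header` are illustrative assumptions, not the convention used in this project.

```python
import re

# Hypothetical allow-list of commit types; the post does not list the real ones.
ALLOWED_TYPES = ("feat", "fix", "docs", "refactor", "test", "chore")

# type, optional "(scope)", optional "!", then ": " and a non-empty description.
HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^()\s]+)\))?(?P<breaking>!)?: (?P<desc>\S.*)$"
)


def check_commit_header(header: str, max_len: int = 72) -> list[str]:
    """Return a list of problems; an empty list means the header passes."""
    match = HEADER_RE.match(header)
    if match is None:
        return ["header does not match 'type(scope): description'"]
    problems = []
    if match.group("type") not in ALLOWED_TYPES:
        problems.append(f"unknown type '{match.group('type')}'")
    if len(header) > max_len:
        problems.append(f"header longer than {max_len} characters")
    return problems


if __name__ == "__main__":
    for header in ("feat(api): add host status endpoint", "Fixed stuff"):
        print(header, "->", check_commit_header(header) or "ok")
```

A check like this could run in a commit hook, so the written instruction and the enforced rule cannot drift apart.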

Writing Feature Specs

The most important part of the development process is writing a good feature spec up front. Like instructions, a spec should be clear, concise, and contain everything the agent needs to implement the feature. Spend some time on it: cover edge cases and provide a test plan. It's especially important to clearly define functional requirements, non-functional requirements, and acceptance criteria. If the spec is unclear, the agent will make too many assumptions, and the final result will be unpredictable.

I found it useful to have a template for feature specs, which includes:

# Feature Title

## Problem statement

## Goal

## Functional requirements

### User/UX flow

### API behavior

## Non-functional requirements

## Acceptance criteria

## Test/Validation plan

## Out of scope

## Relevant files

The bare minimum is "Goal", "Functional requirements", and "Test/Validation plan". The rest are optional. But in general, the more detailed the spec is, the better. Keep in mind that "detailed" != "verbose".
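
The "bare minimum" rule above is simple enough to check before handing a spec to the agent. The following is a rough sketch, assuming the spec is a Markdown file that uses the `##` headings from the template. The function name `missing_sections` and the rule that a section needs at least one non-heading line are my own assumptions.

```python
import re

# The three sections the post calls the bare minimum.
REQUIRED_SECTIONS = ("Goal", "Functional requirements", "Test/Validation plan")


def missing_sections(spec_text: str) -> list[str]:
    """Return the required sections that are absent or have no content."""
    bodies: dict[str, list[str]] = {}
    current = None
    for line in spec_text.splitlines():
        # Only level-2 headings start a new section; "###" lines stay in the body.
        heading = re.match(r"^##\s+(.+?)\s*$", line)
        if heading:
            current = heading.group(1).lower()
            bodies[current] = []
        elif current is not None:
            bodies[current].append(line)

    missing = []
    for name in REQUIRED_SECTIONS:
        lines = bodies.get(name.lower(), [])
        # A section counts as filled if it has a non-blank, non-heading line.
        has_content = any(
            ln.strip() and not ln.lstrip().startswith("#") for ln in lines
        )
        if not has_content:
            missing.append(name)
    return missing


if __name__ == "__main__":
    draft = "# Feature\n\n## Goal\nShow uptime per host.\n\n## Test/Validation plan\n"
    print(missing_sections(draft))
```

An empty result only means the sections exist and are not blank. Whether they are clear is still a human judgment.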

Working with Different Agent Models

While it's possible to change models and effort levels during implementation, I mainly work with Sonnet 4.6 using medium effort. For bigger features, I use Opus 4.7 for planning. My two agents (code-reviewer and ui-tester) are configured to use Sonnet with different effort levels - high for code-reviewer and low for ui-tester.

I found Sonnet with medium effort to be the best balance between speed and quality. Opus is better for complex tasks, but it burns more tokens and can sometimes overengineer solutions. Haiku is fine for "donkey" work, but it does not follow instructions very well.

One positive trend I'm seeing lately is that Opus 4.7 has become much better at asking clarifying questions when the spec is unclear. This is especially important for planning. If the spec is not clear, I want the agent to ask questions instead of making assumptions. Still, I do my best to write clear specs from the beginning to minimize the need for clarifications.

Implementation Workflow

After the instructions and feature spec are ready, the implementation workflow looks like this:

## Planning
- [M] Create a plan (user calls `/plan`)
- [M] User reviews and accepts the plan

## Pre-Implementation
- [A] Review relevant documentation (overview, functional/non-functional requirements, architecture)
- [A] Create feature branch

## Implementation Phase
- [A] Implement the plan
- [A] Run relevant code validation commands (e.g. build, lint, test)
- [A] Run code-reviewer agent after changes (`@code-reviewer`)
- [A] If issues are found: fix -> code-reviewer re-review until clean

## 2nd Review
- [M] Run built-in review skill (user runs `/review local changes only`)
- [A] If issues are found -> Implementation Phase
- [M] Wait for user approval before continuing

## UI Testing
- [A] Run ui-tester agent (`@ui-tester`) that uses chrome-devtools-mcp
- [A] Test the cases defined in the plan
- [A] If issues are found -> Implementation Phase

## Pre-Commit Validation
- [A] Run validation script (local CI-like script)

## Git Commit
- [A] Stage all changes
- [A] Commit using conventional commit format
- [A] Push to remote

## PR Creation
- [A] Generate PR summary in predefined format
- [A] Create PR
- [A] Add @copilot as reviewer
- [A] @copilot reviews the PR and leaves comments
- [M] Wait for GitHub Copilot to finish the review

## Addressing PR Review Comments
- [A] Retrieve all comments
- [A] Analyze every comment
- [A] Implement changes if justified -> Implementation Phase
- [A] Git Commit Phase
- [A] Reply individually to every @copilot comment
- [A] Post final summary comment tagging @copilot for re-review
- [A] @copilot re-reviews the PR
- [M] If issues are found -> Addressing PR Review Comments Phase

## Final Steps
- [M] Wait for @copilot final approval
- [M] Merge PR
- [M] Verify main branch deployment

[M] means a manual step, [A] means an agent step.

For small tasks, I skip some steps. For bigger features, I follow the full workflow.
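
The "fix -> code-reviewer re-review until clean" step in the Implementation Phase is plain control flow, and can be sketched as follows. The `review` and `fix` callbacks are hypothetical stand-ins for the reviewer agent and the coding agent, and the `max_rounds` cap is my own assumption (the workflow above does not state one). The cap is there so that a loop that never converges ends up back with a human.

```python
from typing import Callable


def review_until_clean(
    review: Callable[[], list[str]],
    fix: Callable[[list[str]], None],
    max_rounds: int = 3,
) -> int:
    """Alternate review and fix until the reviewer reports no issues.

    Returns the number of review rounds that were run. Raises RuntimeError
    if issues remain after max_rounds, so a human can step in.
    """
    for round_no in range(1, max_rounds + 1):
        issues = review()
        if not issues:
            return round_no
        fix(issues)
    raise RuntimeError(f"still not clean after {max_rounds} review rounds")


if __name__ == "__main__":
    # Simulated reviewer: reports two issues, then one, then none.
    pending = [["unchecked error", "missing test"], ["missing test"], []]
    rounds = review_until_clean(
        review=lambda: pending.pop(0),
        fix=lambda issues: print("fixing:", issues),
    )
    print("review rounds:", rounds)
```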

Code Reviews

There's no need to explain why code reviews are important. Having multiple agents is like having multiple developers reviewing the code. The formula is simple: the more reviewers, the better. To be fair, not all reviewers are equal, and too many reviewers can lead to noise and conflicting feedback. But in this case, I have a single coder that accepts or rejects feedback based on its judgment. So even if some feedback is not great, it won't be accepted unless it makes sense.

So I currently have 3 separate review phases:

My custom code-reviewer agent, which reviews code after every change. As mentioned earlier, it uses a high-effort model. I usually perform most actions with a medium-effort model, but for code review, I want the agent to be as thorough and careful as possible.
Claude's built-in `/review` skill. It is mainly designed for reviewing PRs, but it can also review local changes. I invoke it using `/review local changes only`. It's not as powerful as my custom reviewer, but it's fast and still catches issues.
GitHub Copilot reviews PRs directly on GitHub. I find it very useful. Its comments make sense in about 95% of cases and are rarely nitpicky. My Claude agent accepts them whenever they are justified.

Code Validation

Local Code Validation

Since the code is being generated by an AI agent, it's extremely important to have a robust validation process in place. During the implementation phase, the agent only runs the necessary validation commands (e.g. build, lint, test).

On one hand, that's not enough to catch every issue. On the other hand, running the full validation suite after every single change is too time-consuming and counterproductive. So I decided to run the full validation suite only before committing the code. That way, the agent can iterate quickly during implementation while still catching issues before the code is merged.

I created CI-like validation scripts for both frontend and backend, which run several checks:

build
test
lint
vulnerable dependency scan
security audits for bad practices, etc.

The more validations I add, the better, because more potential issues can be caught before merging.
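
To make the idea of a CI-like script concrete, here is a minimal sketch of a runner for the backend checks. The commands are assumptions for a Go module (`go vet` stands in for a linter, and `govulncheck` for the dependency scan); the actual scripts in this project are not shown in the post and may look quite different. The runner keeps going after a failure so that one run reports every failing check.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical check list for a Go backend; not the project's real commands.
BACKEND_CHECKS: list[tuple[str, list[str]]] = [
    ("build", ["go", "build", "./..."]),
    ("test", ["go", "test", "./..."]),
    ("lint", ["go", "vet", "./..."]),
    ("vulnerable dependencies", ["govulncheck", "./..."]),
]


def run_command(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code (127 if the tool is missing)."""
    try:
        return subprocess.run(cmd, check=False).returncode
    except FileNotFoundError:
        return 127


def run_checks(
    checks: Sequence[tuple[str, Sequence[str]]],
    runner: Callable[[Sequence[str]], int] = run_command,
) -> list[str]:
    """Run every check, even after a failure, and return the names that failed."""
    failed = []
    for name, cmd in checks:
        print(f"==> {name}: {' '.join(cmd)}")
        if runner(cmd) != 0:
            failed.append(name)
    return failed


def main() -> int:
    """Return 0 if all checks pass, 1 otherwise (usable as a process exit code)."""
    failed = run_checks(BACKEND_CHECKS)
    if failed:
        print("FAILED:", ", ".join(failed))
        return 1
    print("all checks passed")
    return 0
```

Passing the result of `main()` to `sys.exit` lets the agent, or a Git hook, treat a non-zero exit code as a blocked commit. The `runner` parameter exists only so the logic can be tested without the real tools installed.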

GitHub Code Validation

I run GitHub validation workflows on every PR and every push to main. That gives me an additional validation layer. The workflows run the same checks as the local setup, plus additional tools:

CodeQL
Trivy
Snyk
Gitleaks
Hadolint
Semgrep

Just like local validation, the more validations I add, the better. The only downside is that they can take time, but since they run on GitHub, they do not block my work.

For a more personal story about the project, check out My Story.

If you are curious about AI-assisted development, explore NetWatchly, send feedback, or suggest ideas. I am still experimenting and improving both the product and development workflows.
