Claude Code Best Practices for Vibe Coders: Ship More, Burn Fewer Tokens

Vibe coding with Claude Code is fast until the token bill isn't. After a year of daily use, I found the habits that fix both sides: the explore-plan-code workflow, path-scoped rules that trim CLAUDE.md by 60%, deterministic hooks that enforce guardrails at zero token cost, and the context tricks that keep long sessions from derailing.

Vibe coding with Claude Code is fast until the token bill lands. After a year of daily use running terminal-native agents, I keep hitting the same two failures: the agent writes confident code that misses the architectural target, and the context window bloats until the model forgets its own instructions and starts spiraling.

TL;DR, the habits that stuck:

  • Lean on .gitignore (Claude respects it) plus permissions.deny Read rules to keep noise out of context.
  • Run explore, plan, review, then code. Don't let the agent jump straight to a solution.
  • Keep CLAUDE.md under 200 lines and push topic rules into path-scoped .claude/rules/ files.
  • Enforce the rules that must never break with hooks, not prose: PreToolUse to block, PostToolUse to verify.
  • Trust the environment, not the agent's prose. Verify every "done" programmatically.

The Foundation: How AI Remembers and Forgets

Before optimizing a workflow with Claude Code, you have to understand its hard memory boundaries. In her excellent two-part series on DeveloperWay, Nadia Makarevich breaks down the raw mechanics of how LLMs manage conversation history.

Here is the quick architectural reality check:

  • The Illusion of Memory: Claude doesn't "remember" previous turns the way humans do. Every time you run a command or write a prompt, the entire chat history is bundled up and sent back to the API from scratch.
  • The Token Tax: Every file read, command output, and error stack trace converts into tokens. As your session grows, the prompt size snowballs. You pay for the entire history on every single turn.
  • The Context Window Wall: Once you hit Claude's max context limit, it cannot just keep chatting. The system must either reject your prompt or silently discard the oldest parts of your conversation to make room for new input (a sliding window).
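
To make the token tax concrete, here is a rough back-of-the-envelope sketch. It assumes every turn adds the same number of new tokens and ignores prompt caching and compaction, both of which are simplifications; the function name and the numbers are illustrative only.

```shell
# cumulative_input_tokens TURNS TOKENS_PER_TURN
# Turn k resends all k chunks of history, so the total input is
# TOKENS_PER_TURN * (1 + 2 + ... + TURNS) = TOKENS_PER_TURN * TURNS * (TURNS + 1) / 2.
cumulative_input_tokens() {
  turns=$1
  per_turn=$2
  echo $(( per_turn * turns * (turns + 1) / 2 ))
}

# 20 turns at an assumed 2,000 new tokens each
cumulative_input_tokens 20 2000   # prints 420000
```

Only 40,000 of those tokens were ever new. The rest is history being resent, which is why trimming what enters the context early pays off on every later turn.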

For the visual diagrams and the mechanics under the hood, read Nadia's full write-up on DeveloperWay.

Cost and quality are directly driven by context engineering. A single messy grep command can flood the prompt, derail the agent, and burn through your budget. Here is how to regain control.


Baseline Token Hygiene: Lean on .gitignore, Not .claudeignore

The quickest way to burn tokens is letting Claude explore your workspace blindly. Your instinct is to reach for a .claudeignore. There isn't one. Claude Code never reads that file, so creating it does nothing, and any secret paths you parked in there to "block" are still wide open.

It doesn't help that the model hallucinates the file. Ask the Claude web app how to stop Claude Code from reading certain files and it will confidently tell you to "just create a .claudeignore in your root." It's wrong, and it's been wrong often enough to spawn a trail of confused GitHub issues. Don't trust the model on its own tooling here.

Two mechanisms actually work.

First, Claude Code respects your .gitignore by default (respectGitignore is on). Anything you already ignore stays out of what Glob and Grep surface, so node_modules/, dist/, and *.log are handled the moment your .gitignore is sane. For most projects, that one file is already doing the heavy lifting on token hygiene.

Second, the native way to block specific files, no third-party npm hooks required, is Anthropic's permissions system. Create a .claude/settings.json in your project root and use the deny array:

{
  "permissions": {
    "deny": [
      "Read(.env*)",
      "Read(node_modules/**)",
      "Read(dist/**)"
    ]
  }
}

A deny rule beats any allow, can't be overridden, and also stops cat, head, and tail from sneaking a file in through a Bash call. Use it for secrets and anything checked in that you never want read, and let .gitignore cover the rest.

The Catch with Prompt Caching: Anthropic's prompt caching makes the repeated leading part of the prompt cheaper to resend, but it does not fix the model's attention. A massive context window loaded with background noise still degrades Claude's reasoning. It makes the model lazy and prone to hallucinations, even if those cached tokens are heavily discounted. Keep the context clean anyway.


Explore, Then Plan, Then Code

Letting Claude jump straight to a solution is how you get an elegant implementation for the wrong problem.

Instead, break your workflow into distinct phases:

  1. Explore: Let Claude run targeted searches to understand the existing state.
  2. Plan: Force the agent into a planning phase. Ask for a written technical approach first.
  3. Review: Edit the plan, fix structural choices, and sign off on the design.
  4. Code: Switch to implementation mode and let it execute against that signed-off plan.

Skip the plan only when the diff fits in a single sentence. For larger features, say migrating a monolithic commerce component to an edge-optimized delivery script, have Claude interview you about edge cases and performance trade-offs first.

I use the /grill-with-docs pattern from Matt Pocock before every major session. It forces the agent to cross-reference my goals against actual project documentation before touching production files.


Architectural Rules: Stop Bloating CLAUDE.md

Claude Code loads CLAUDE.md at the start of every session, and from then on it rides along in the context window on every turn. If your root rulebook bloats, your token burn spikes.

I trimmed my configuration files by 60% using two techniques: path-scoped rules that load lazily, and deterministic hooks that replace prose.

1. Path-Scoped Rules

The root CLAUDE.md should only hold truly global details: high-level directory layout, primary build commands, and global constraints. Keep it under 200 lines.
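
A line budget is easier to keep when something checks it. The sketch below is one way to do that from a pre-commit hook or CI step; the function name check_rulebook_size and the default of 200 are my own illustration of the limit above, not a Claude Code feature.

```shell
# check_rulebook_size FILE [LIMIT]
# Prints the line count and returns non-zero when FILE exceeds LIMIT (default 200).
check_rulebook_size() {
  file=$1
  limit=${2:-200}
  lines=$(( $(wc -l < "$file") ))   # arithmetic expansion strips wc's padding
  if [ "$lines" -gt "$limit" ]; then
    echo "$file: $lines lines (over the $limit-line budget)"
    return 1
  fi
  echo "$file: $lines lines (ok)"
}

# Example:
#   check_rulebook_size CLAUDE.md || exit 1
```

Failing the commit is deliberately blunt: it forces the decision about which rule moves out of the root file at the moment the file grows.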

Move topic-specific engineering standards into .claude/rules/. Use frontmatter globs to tell Claude exactly when to load them:

---
paths:
  - "src/edge/scripts/**/*.js"
  - "blocks/**/*.js"
---
 
# Edge Delivery Performance Rules
- Never use heavy external HTTP clients (e.g., Axios). Use native `fetch`.
- Keep the bundle footprint minimal; prioritize Core Web Vitals and DOM hydration speed.
- All network requests must resolve within edge execution limits.
 

Now, the rules governing edge performance stay out of the context window until Claude opens a matching script or block file.

2. Deterministic Hooks Over Prose

Prose instructions like "Do not import external libraries in edge routes" are purely advisory. As the context fills, adherence drops.

If a rule must hold true deterministically, pull it out of CLAUDE.md and enforce it via code. A shell hook costs zero tokens per turn and guarantees compliance.


Hooks: Enforcing Guardrails Without Token Cost

Claude Code hooks are lifecycle scripts that execute automatically around tool calls. Unlike prose instructions, they cannot be skipped or ignored by the agent.

There are two primary flavors:

| Hook Type | Execution Timing | Blocking Ability | Best Use Case |
| --- | --- | --- | --- |
| PreToolUse | Before the operation occurs | Yes (exit code 2 blocks the tool) | Security, secrets protection, input filtering |
| PostToolUse | After the operation completes | No (the change is already live) | Linters, type-checkers, automated formatters |

1. The Secrets Guardrail (PreToolUse)

Agents search the repository with Glob and Grep. A broad search can easily ingest a .env file or local cloud credentials directly into the prompt history.

This PreToolUse script intercepts reads and blocks them before the text ever reaches the model:

#!/bin/bash
# .claude/hooks/protect-secrets.sh
# PreToolUse hook: block reads and searches that target secret files.
INPUT=$(cat)
# Read passes its target as file_path; Grep passes it as path.
FILE_PATH=$(echo "$INPUT" | jq -r '.tool_input.file_path // .tool_input.path // empty')

if echo "$FILE_PATH" | grep -qE '(^|/)\.env|/\.aws/credentials|credentials\.json|\.npmrc$|\.pem$'; then
  echo "Security Violation: Access to credentials or environment variables is blocked." >&2
  exit 2  # exit code 2 blocks the tool call
fi
exit 0

Note: Pair this with client-side permissions.deny rules in your settings file as your primary layer of defense. The hook acts as your catch-all safety net.
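
Before relying on the pattern, it helps to check which paths it actually catches. The sketch below lifts the same regular expression into a standalone function so it can be exercised without jq or a running session. The name is_secret_path and the sample paths are illustrative.

```shell
# is_secret_path PATH
# Returns 0 when PATH matches the same pattern the hook blocks on.
is_secret_path() {
  printf '%s\n' "$1" |
    grep -qE '(^|/)\.env|/\.aws/credentials|credentials\.json|\.npmrc$|\.pem$'
}

for p in .env src/.env.local config/credentials.json keys/server.pem src/environment.js README.md; do
  if is_secret_path "$p"; then
    echo "blocked  $p"
  else
    echo "allowed  $p"
  fi
done
```

Running it shows the first four paths blocked and the last two allowed. It also surfaces two things worth knowing: any file name that starts with .env is blocked, including a harmless .env.example, and a relative .aws/credentials slips through because that alternative requires a leading slash.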

2. The Architecture Feedback Loop (PostToolUse)

When an agent edits a highly optimized layout or structural file, it often misses downstream breakage or introduces regressions.

Instead of manually reminding the model, let a PostToolUse hook feed the compiler or performance linter errors back into the agent loop automatically:

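#!/bin/bash
# .claude/hooks/verify-edge.sh
# PostToolUse hook: lint edited edge files and feed failures back to Claude.
INPUT=$(cat)
FILE_PATH=$(echo "$INPUT" | jq -r '.tool_input.file_path // empty')

# Only check files under src/edge/; everything else passes through.
[[ ! "$FILE_PATH" =~ src/edge/ ]] && exit 0

# Run a fast, incremental optimization check
if ! OUTPUT=$(npm run lint:edge -- "$FILE_PATH" 2>&1); then
  echo "Performance/Linter errors found after editing $FILE_PATH:" >&2
  echo "$OUTPUT" >&2
  # Exit code 2 is what routes stderr back to Claude; with exit 0 the model never sees it.
  exit 2
fi
exit 0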

If Claude breaks a configuration or pulls in an unapproved library, the hook returns the error stack trace as tool feedback. Claude reads the failure, corrects its own code, and repeats until the codebase passes verification.
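
Neither script runs until it is registered in the settings file. A minimal sketch of that registration follows, written as a shell helper so it can live in a setup script. It assumes the hook schema of current Claude Code releases (an event name mapping to matcher/hooks entries, with CLAUDE_PROJECT_DIR pointing at the project root). The Edit|Write matcher and the helper name write_hook_settings are my own choices. The helper overwrites its target, so merge by hand if the file already holds a permissions block.

```shell
# write_hook_settings TARGET
# Writes a settings file that wires both hook scripts to their tool events.
write_hook_settings() {
  cat > "$1" <<'JSON'
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Read|Grep",
        "hooks": [
          { "type": "command", "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/protect-secrets.sh" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/verify-edge.sh" }
        ]
      }
    ]
  }
}
JSON
}

# Example (overwrites the file):
#   write_hook_settings .claude/settings.json
#   chmod +x .claude/hooks/*.sh
```

The quoted heredoc keeps $CLAUDE_PROJECT_DIR literal, so it is expanded when the hook runs rather than when the file is written.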


Evaluating the Agentic Loop

Evaluating text outputs is simple; evaluating an interactive agent loop is notoriously difficult. The model is naturally optimistic about its own work and will routinely claim success when a task is half-finished.

Never trust the agent's prose. Only trust what the environment confirms. If the task was to fix a data mapping issue in a commerce API route, programmatically verify the schema output or query the test endpoint directly.
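
One way to put this into practice is a small wrapper that ignores what the agent says and reports only what a check command returns. The helper name verify_done is hypothetical, and the true/false calls stand in for whatever test run, schema check, or endpoint probe fits the task.

```shell
# verify_done COMMAND [ARGS...]
# Runs the check and trusts only its exit status, never the agent's summary.
verify_done() {
  if "$@" >/dev/null 2>&1; then
    echo "verified"
  else
    echo "NOT verified: '$*' failed"
    return 1
  fi
}

verify_done true
verify_done false || echo "do not accept the agent's 'done' yet"

# In a real session the check would be something like:
#   verify_done npm test
```

Because the wrapper returns the check's verdict as its own exit status, it composes with the hooks above or with a CI step without any parsing of transcripts.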

Automated Benchmarking with skillspector-quality

To take the subjective guesswork out of this process, I built an open-source library called skillspector-quality.

I needed a programmatic way to measure exactly how well different agent instructions performed and how much they cost. Instead of reading through hundreds of terminal transcripts by hand, it provides automated evaluation metrics that check prompt adherence, token efficiency, and final environmental states. It turns agent optimization into an engineering discipline rather than a guessing game.

I put the scores to the test with a benchmark across Haiku, Sonnet, and Opus, where a strict SKILL.md moved structured-output correctness from 0% to 100%. The full numbers are in A Good SKILL.md Is the Cheapest Reliability Upgrade You'll Make.


The Reality Check

Agentic development moves fast enough that today's hard limits are often next week's defaults. I traced that shift from hype to production at the AWS Summit Hamburg 2026. So if a task fails completely under the current model, treat it as a "not yet", not a hard "no."

The responsibility doesn't move, though: the model writes the code, but you own the commit. A confident, well-optimized hallucination is still your production incident at 2 AM. Keep your path rules surgical and let hooks automate the verification.

Which guardrails are actually saving you tokens? Tell me on LinkedIn.

Common questions

What is vibe coding with Claude Code?
Vibe coding is an AI-first development style where you describe intent in natural language and let the agent write, run, and iterate on the code. Claude Code is the terminal-native agent from Anthropic designed for exactly this workflow: it explores the repo, writes the code, runs checks, reads the output, and loops until the task is done, with no manual file switching needed.
How do I reduce token usage in Claude Code?
The biggest wins are: keep CLAUDE.md under 200 lines and move topic-specific rules to path-scoped files in .claude/rules/ so they only load when relevant; use /compact when the window fills instead of letting it bloat; scope every grep and search so raw matches don't flood the context; switch to a smaller model like Haiku for repetitive or navigation tasks; and send research-heavy subtasks to a subagent so the digging stays out of your main session.
What is a Claude Code hook?
A Claude Code hook is a shell command that runs automatically at a point in the agent's lifecycle, such as before a file read or after an edit. Unlike a CLAUDE.md instruction, which the model can choose to ignore, a hook fires every time, which makes it the right tool for things that must never be skipped like secrets protection, formatting, and type checks.
What is the difference between PreToolUse and PostToolUse hooks?
A PreToolUse hook runs before the tool call and can block it by exiting with code 2, so it suits security and validation. A PostToolUse hook runs after the operation has already happened, so it cannot block; it sends feedback back to Claude and suits formatting and type checks.
How do I stop Claude Code from reading my .env file?
Register a PreToolUse hook with the matcher Read|Grep that reads the file path from the JSON event on stdin and exits with code 2 when the path matches a secret, and widen the pattern past .env to credentials files, keys, and secrets directories. Treat the hook as a second layer: the hard control is settings-level permissions.deny, which the client enforces before the tool runs, while the hook is the catch-all for paths the deny rules miss.
Do Claude Code hooks work outside TypeScript?
Yes. The type-check feedback loop pattern works in any typed language: swap tsc --noEmit for mypy in Python, go vet in Go, or cargo check in Rust. For untyped languages, run a linter or the test suite in the hook instead.
Should I use hooks or CLAUDE.md rules?
Use CLAUDE.md for guidance the model should weigh, like conventions and architecture notes, and use a hook for any rule that must hold deterministically. CLAUDE.md instructions are advisory and drift as the file grows; a hook costs zero tokens per turn and either passes or fails.
What should go in CLAUDE.md?
Keep the root CLAUDE.md under 200 lines and limit it to rules that are truly global: project conventions, directory layout, and anything that applies in every file. Topic-specific rules (API patterns, component conventions, test standards) belong in path-scoped files under .claude/rules/ so they only load when Claude reads a matching file. Rules that must hold deterministically belong in hooks, not CLAUDE.md, because hooks cannot be ignored the way prose instructions can.
Lars Roettig

Senior Technical Architect writing about AI, engineering, and building things that last.
