Claude Code Token Tips

⚙ Settings 9 one-time configuration changes

⚙ Setting 01

Configure .claudeignore

Token Savings

Every file Claude reads costs tokens. Without a .claudeignore, it can pull in node_modules, compiled output, lock files, and generated assets before you've typed a word. It uses the same syntax as .gitignore and lives at the project root. Set it up once and it silently guards every session.

# .claudeignore
node_modules/
dist/
.next/
build/
coverage/
*.lock
*.min.js
.git/

↑ back to top

⚙ Setting 02

Write a tight CLAUDE.md

Token Savings

CLAUDE.md loads into every session. A bloated one wastes tokens on boilerplate before you've asked anything. Keep it to facts that are non-obvious from the code: stack choices, banned patterns, naming conventions, key commands. Strip everything Claude could derive by reading the files themselves.

# Good — tight, non-obvious facts only
- Stack: Next.js 14 / TypeScript / Prisma / Postgres
- Run `pnpm test` before any commit
- Never use `any` type — use `unknown` + narrowing
- DB migrations: /prisma/migrations only

# Bad — Claude can read this itself
- This is a web app built with React...
- The folder structure is as follows...

↑ back to top

⚙ Setting 03

Pick the right model per task

Token Savings

Haiku is roughly 20× cheaper than Opus per token — and for lookups, renaming, simple edits, and explaining a function it's equally capable. Use --model haiku as your default and only escalate when the task genuinely needs deeper reasoning. The cost cliff between tiers is steep; matching model to task is the highest-leverage change you can make.

# Quick lookup or small edit — use Haiku
claude --model haiku "what does getUserById return?"

# Non-trivial implementation — Sonnet
claude --model sonnet "implement the checkout flow"

# Complex reasoning only — Opus
claude --model opus "design the multi-tenant auth model"

↑ back to top

⚙ Setting 04

Disable unused MCP servers

Token Savings

Every active MCP server injects its full tool schema into every single request — whether you use it or not. A project with five connected MCPs can add thousands of tokens of tool definitions before Claude has read a line of your code. Audit ~/.claude/settings.json regularly. Remove servers you're not actively using; enable them only for sessions that need them.

// ~/.claude/settings.json
{
  "mcpServers": {
    // Keep only what this session needs
    "filesystem": { "command": "npx", ... }

    // Disable idle servers — they still cost tokens
    // "slack": { ... },
    // "github": { ... }
  }
}

↑ back to top

⚙ Setting 05

CLI over IDE connector

Token Savings

The VSCode and JetBrains extensions for Claude Code automatically inject context you didn't ask for: open file tabs, workspace diagnostics, editor state. Running claude directly in a terminal gives you complete control over what enters the context window. For focused coding sessions, the CLI is leaner and more predictable — you provide exactly what's needed, nothing more.

↑ back to top

⚙ Setting 06

Scope --add-dir tightly

Token Savings

--add-dir makes additional directories available for Claude to explore. Adding a whole monorepo when you're fixing a bug in one service is wasteful — Claude will read broadly when it feels it needs context. Pass only the directories relevant to the current task. If you're working on services/auth, add that, not the project root.

# Focused — only what matters
claude --add-dir ./services/auth

# Wasteful — Claude can roam the whole repo
claude --add-dir .

↑ back to top

⚙ Setting 07

Use the memory system

Token Savings

Without persistent memory, Claude re-derives context from scratch every session — re-reading files, re-asking questions, re-learning preferences. Use /memory to persist decisions, patterns, and architectural context across sessions. Well-maintained memory means Claude arrives already oriented, spending tokens on actual work instead of orientation.

# Save a fact during a session
/memory

# Memories persist at:
# ~/.claude/projects/[project]/memory/
# Relevant ones load automatically at session start

↑ back to top

⚙ Setting 08

Allowlist bash commands

Token Savings

Each unapproved bash command triggers a permission prompt round-trip that interrupts flow and adds overhead. Safe, read-only commands like ls, git status, and npm test don't need to ask. Add them to the allowlist in .claude/settings.json once — your sessions run faster and the permission noise disappears.

// .claude/settings.json
{
  "permissions": {
    "allow": [
      "Bash(git status:*)",
      "Bash(git log:*)",
      "Bash(npm test:*)",
      "Bash(ls:*)"
    ]
  }
}

↑ back to top

⚙ Setting 09

--print for quick Q&A

Token Savings

When you just want an answer — "what does this regex do?", "explain this algorithm", "how should I structure this?" — use --print for a non-interactive single-shot response. No session state is maintained, no conversation history accumulates, and you skip the interactive session overhead entirely. Perfect for quick lookups and one-off explanations.

# Non-interactive — answer and exit
claude --print "explain the difference between \
optimistic and pessimistic locking"

# Pipe it straight to a file or stdout
claude --print "summarise this function" | pbcopy

↑ back to top

◆ Practices 11 habits that compound over time

◆ Practice 10

/compact or /clear at every task boundary

Token Savings

Context accumulates relentlessly — every tool call, file read, and exchange adds to the running total. When you finish a task and move to something else, that history is pure overhead. /compact summarises the session before continuing; /clear resets entirely. The discipline of doing this at every task boundary is the single highest-impact habit to build.

# Summarise session and continue lean
/compact

# Full reset — start from zero
/clear

↑ back to top

◆ Practice 11

Write specific, scoped prompts

Token Savings

Vague prompts trigger exploratory behaviour. "Auth is broken" causes Claude to scan broadly, read multiple files, and form hypotheses — all at your expense. "Fix the null check at src/auth/verify.ts:42 — it throws when user.id is undefined" goes straight to the answer. The more precisely you describe location and nature, the fewer tokens get burned on exploration.

# Vague — triggers broad, expensive exploration
"the login is broken"

# Specific — straight to the fix
"fix the null check at src/auth/verify.ts:42
— throws when user.id is undefined"

↑ back to top

◆ Practice 12

Reference file:line — never paste raw code

Token Savings

Claude Code reads files natively. Pasting code you could reference just duplicates that content in the context window — you pay for it twice. Instead of pasting 60 lines, write src/utils/parse.ts:45-105 and let Claude read it directly. For larger functions this is a meaningful saving, and it keeps prompts clean.

# Wasteful — pasting content Claude can read
"here's the function: [60 lines pasted]
can you refactor it?"

# Efficient — reference by location
"refactor the parser at src/utils/parse.ts:45-105"

↑ back to top

◆ Practice 13

Batch related changes into one turn

Token Savings

Every prompt carries the full conversation history. Five separate prompts for related changes = five times the context overhead. If you know you want to fix a bug, update its test, and update the docs, say all three at once. Claude handles multi-part instructions well, and you pay the context cost once instead of five times.

# Expensive — 3 turns, 3× context overhead
"fix the null check in auth.ts"
"now update the test for it"
"now update the docs"

# Efficient — one turn
"fix null check in auth.ts, update its test,
and update the relevant docs section"

↑ back to top

◆ Practice 14

Plan Mode before complex implementations

Token Savings

Jumping to implementation on a complex task risks Claude going in the wrong direction — which you then spend tokens correcting. Plan Mode forces a read-only planning phase first. You align on approach before a single file is changed. The cost of a short planning session is almost always less than the cost of a wrong-direction implementation and its correction.

# Enter plan mode — read-only, no edits made
/plan

# Review the plan → approve → implement
# Catches misaligned assumptions before they cost you

↑ back to top

◆ Practice 15

One task per session

Token Savings

Mixing "fix a bug + add a feature + refactor something" in one session creates a sprawling context where earlier work crowds out the current task. Each new task in the same session inherits the full history of everything before it. Separate sessions mean separate, clean contexts — each task starts lean, without carrying irrelevant history as overhead.

↑ back to top

◆ Practice 16

Ask targeted questions, not "explore and summarise"

Token Savings

"How does authentication work?" is open-ended with no natural stopping point — Claude scans broadly and reads extensively before answering. "What does verifyToken() in src/auth/verify.ts return when the token is expired?" is bounded to a single read. Frame questions around specific functions, files, and outcomes rather than broad topics.

# Open-ended — triggers broad, unbounded scan
"how does authentication work in this codebase?"

# Targeted — one read, direct answer
"what does verifyToken() return when the JWT
is expired? (src/auth/verify.ts)"

↑ back to top

◆ Practice 17

/clear and restart after going off-track

Token Savings

When Claude misunderstands and heads in the wrong direction, the instinct is to correct in-session. But corrective messages compound the context — you carry the wrong attempt, plus the correction, plus the context of how you got there. It's almost always cheaper to /clear and start fresh with a better prompt than to course-correct mid-session.

# Expensive — correcting in-session
"no that's wrong, I meant X not Y"
"still not right, the issue is actually..."

# Cheaper — clear and rephrase
/clear
"[better, more specific prompt]"

↑ back to top

◆ Practice 18

Use sub-agents for parallel independent tasks

Token Savings

When you have two or more independent research tasks — say, investigating an error log while also checking the schema — running them sequentially in one conversation doubles the context you carry. Dispatching sub-agents keeps that exploratory work out of the main context entirely. The main thread stays lean; sub-agents do the heavy reading in parallel.

↑ back to top

◆ Practice 19

Run targeted tests, not the full suite

Token Savings

Test output lands in the context window. Running a full suite when you've changed one file means Claude reads (and you pay for) hundreds of unrelated test results. Target the pattern relevant to your change — feedback is faster and the context impact is a fraction of the full suite.

# Full suite — lots of unrelated output in context
npm test

# Targeted — only the output that matters
npm test -- --testPathPattern=auth

# Or with Vitest
vitest run src/auth

↑ back to top

◆ Practice 20

Monitor spend with /cost

Token Savings

You can't optimise what you don't measure. /cost shows token spend for the current session. Seeing "that exploration cost $0.40" reshapes how you prompt next time. Build a habit of checking it after expensive-feeling sessions — patterns become visible quickly: which types of prompt are cheap, which burn through budget.

# Check spend for the current session
/cost

# Shows input tokens, output tokens, and estimated cost
# Run it after any session that felt expensive

↑ back to top