Less Impressed, More Involved: A Guide to Using Claude Code Effectively

Something unexpected happened to me when I started using Claude Code to build Derive — I didn't think about my code lying in bed at night anymore.

At Churnkey, I'd lie in bed mentally refactoring, obsessing over architectural decisions, seeing data flows in my head. I knew every line. Every function. Every quirky workaround and why it existed. The codebases lived in my head rent-free.

With Derive, built mostly with Claude Code, I didn't lie in bed thinking about the codebase. This should be a good thing, right? It's not.

For the first time in my career, I was building a product where I didn't know every line of code, and I think this matters more than the productivity gains.

Behind the founding of every great product, I think there needs to be at least one person who is unconditionally obsessed. It's the sacrifice that's needed to birth a product into the world. It's what drives us to spend nights and weekends putting care and craftsmanship into every detail.

Shifting Software Paradigms

Let's take a step back for context. Over the past three months, I've averaged about $1,000 per month in Claude Code usage, according to ccusage. I'm not an AI evangelist, but I'm not a skeptic either. I'm someone trying to figure out how to use a new tool in the most effective way possible.

In the past 24 months, I went from Microsoft Co‑Pilot to Cursor Tab++, to Cursor's sidebar chat (one file at a time), then to Cursor multi‑file edits, and now to Claude Code.

Date	Models	Input	Output	Cache Create	Cache Read	Total Tokens	Cost (USD)
2025-08-01	opus-4, sonnet-4	19.5k	70.7k	5.2M	90.2M	95.5M	$71.43
2025-08-02	opus-4, sonnet-4	834	21.1k	916.6k	24.8M	25.7M	$16.12
2025-08-03	opus-4, sonnet-4	4.2k	46.6k	4.6M	52.7M	57.4M	$75.93
2025-08-04	opus-4, sonnet-4	14.1k	68.9k	5.0M	107.6M	112.7M	$174.90
2025-08-05	opus-4, sonnet-4	1.7k	22.6k	1.7M	35.0M	36.7M	$74.77
2025-08-07	opus-4, sonnet-4	3.8k	99.7k	2.9M	61.5M	64.5M	$152.23
Total (30 days)	—	91.2k	792.5k	68.8M	961.6M	1.03B	$1,194.43

Heavy Claude Code usage across four projects gave me a useful spread of data points:

Churnkey: I co-founded Churnkey and was CTO for 4+ years. It's a large codebase that I know like the back of my hand. I now do part-time technical IC (individual contributor) work.
Derive: A new AI education platform I'm building from scratch. I'm using a Next.js stack, which I haven't used before in a professional setting, and the Vercel AI SDK, which I also haven't used before.
PerThirtySix: A hobby sports analytics project that I've been intermittently working on with Shri Khalpada for the past 4 years. It's a Nuxt 3 application.
DraftAnything: A new hobby weekend project that I built this month with Shri. Using the same Next.js + Tailwind + Shadcn stack as Derive.

I'll break this work into some categories that will be helpful for the discussion that follows:

Project	Codebase Size	Codebase Familiarity	Framework/Tech Familiarity	Setting	Claude Code Effectiveness
Churnkey	Large	Deep	Deep	Professional	Fantastic
Derive	New	-	Outdated	Professional	Good then bad
DraftAnything	New	-	Outdated	Hobby	Good
PerThirtySix	Medium	Medium	Deep	Hobby	Bad

The Robustness–Specificity Trade-Off

The fundamental frustration of software engineering has always been this: someone says "we should add auto‑save" and everyone roughly knows what that means. Five words. Seemingly clear intent. But within that "roughly" lie all the edge cases, and that's where the explosion happens. Data structures, debouncing logic, conflict resolution, error states, retry mechanisms, user preferences, performance considerations.

We've spent our careers learning to bridge this gap—taking nonspecific human intent and transforming it into zero-tolerance, fully explicit, machine instructions. This transformation has always been the bottleneck in building software. And it's difficult. It's why software engineers get paid what they do. It's why the last 10% of the project takes 50% of the time, and it's why technical staff and non-technical PMs tend to butt heads.

LLMs invert the equation: they trade specificity for robustness—they'll understand "add auto‑save" however you phrase it, but there's a huge range of potential implementations. In many ways, this is exactly the trade-off we've been craving. It's why LLMs writing code have taken the world by storm. We've been living in this world of zero robustness for so long—it's ripe for injections of robustness.

The Task–Text Compression Ratio

Some tasks can be compressed incredibly well into short text descriptions, with little loss in intent. They have high semantic compressibility.

My favorite one-shot Claude Code to date was a highly compressible task—adding a real-time theme picker to DraftAnything. It's something I've always wanted to build out, but never really felt like the time cost justified the benefit. It's very clear what a color theme picker means—there is little wiggle room with potential UX. When you add in a "...following best practices of Tailwind and Shadcn", the implementation path is narrowed. So, our short free-text description is also highly specific.

A theme picker built for DraftAnything, a task with high semantic compressibility.

Comparatively, I recently built an artifact system, similar to Claude, for Derive. Artifacts are React components rendered in a secure iframe next to a chat with our AI model. This can be built in many ways. A prompt like "Add artifacts similar to Claude Code that render React components" sits at the other end of the specificity spectrum.

A new artifact system built for Derive, a task with low semantic compressibility.

To get some sense of semantic compressibility, and if you've reached adequate prompting, we can borrow a favorite phrase from textbook authors—how well can the implementation be left as an exercise for the ~~reader~~ AI?

Sometimes Specifics Don’t Matter—But Mostly They Do

Lovable and Bolt.new work because sometimes the implementation details don't matter. Like if it's a proof of concept or a standalone component where architecture decisions that are made in a vacuum serve us perfectly well.

This also holds for new projects, where we want boilerplate that roughly follows best practices and can be updated later. I stood up the initial DraftAnything app (Next.js, Supabase Auth, Drizzle, Tailwind, Shadcn) in about 90 minutes. Previously, this would probably have taken me two days of work.

The more a project or feature grows, the more we should care about the specifics. This doesn't mean we shouldn't use agentic coding agents. It just means we need better patterns.

The Tumbleweed Effect

The tumbleweed effect is the failure mode where iterative AI suggestions create increasingly tangled code. It starts innocently: you ask for a webhook handler with a half-baked prompt. Claude writes one; you notice it doesn't verify signatures, so you ask for that, which breaks the parsing. You ask for a fix, which creates a problem in a shared function.

We hit the tipping point. Instead of jumping into the codebase and doing the critical thinking yourself, you start to rely on the AI agent to fix the problem. You do things like "No, Claude—really take a step back, think about why this is happening, and address the root cause." It might work, but normally it just lands you further down the rabbit hole. It's frustrating, feels crusty, and is a waste of time. It's at this move that AI has transformed from a tool into a crutch.

The tumbleweed effect is, in my view, the worst outcome of using a coding agent. The solution isn’t to fix it after—it’s to prevent it through upfront planning.

How I Work With Claude Code

Generally speaking, Claude Code has made it much faster for me to go from knowing how I want to build something to getting it built. But if I don't know what, or at a high level how, to build something, Claude cannot fill this gap in an effective way.

Claude Code, and LLMs in general, are fantastic at writing code in small, isolated environments. This has been the case with AI since day one—the more controlled and constrained, the better we can expect a model to perform.

When the model must account for a larger context—including context outside the code (brand, business constraints)—performance degrades.

It's odd because we interact with a model that can appear perfectly logical—but it isn’t. Our proxies for how to work with it are nonexistent, so people assume it's either a god or a moron. In reality, it’s superhuman in some contexts and lacking in others.

What follows are the patterns that have emerged and worked well for me. I am optimizing for shipping speed while:

Writing code of the same or better quality than I would without a coding agent. This means fewer bugs and tidier code that isn't over‑engineered.
Maintaining deep codebase familiarity. I define this as a feeling that I could, at any point, jump into the codebase and write code by hand.

These are best practices for wrestling robustness into specificity. How can we harness the robust power of LLMs to turn squishy project descriptions into less squishy RFCs and then into perfectly specific and verifiable code.

Directory Structure

Before I let Claude write a single line of code, I create a clean directory containing only what's relevant to the current task. I actively rm -rf repos that aren't related to the current project, and git clone back in repos that are. It's not about disk space or performance—it's about attention and context. When Claude searches for patterns or imports utilities, I want it finding the right ones.

So a directory that previously had five company repos becomes:

Directory Structure

/derive
  ├── docs
  │   └── [project]
  │       ├── 1.overview.md
  │       ├── 2.codebase-knowledge.md
  │       └── 3.working-plan.md
  ├── inference-api   # Just the API I'm working on
  └── derive.to       # Frontend application

# remove repos that are unnecessary for current project
/derive
  ├── model-training   # rm -rf
  ├── model-evaluation # rm -rf
  └── datasync         # rm -rf

When I type @ to reference files, I get signal, not noise. When Claude looks for examples of how we handle errors, it finds the right one—not three different approaches across different repos. This simple change focuses context and eliminates an entire category of problems before they start.

Document 1: Project Overview (I Write This)

Before I let Claude write code, I write a document. Not just for Claude, but for myself. I spend thirty minutes to upwards of two hours writing what I would write for a human engineer: what we're building, why we're building it, how it fits into the existing system, what patterns to follow, what pitfalls to avoid. This document becomes the North Star that prevents architectural drift.

The structure I've converged on after dozens of projects:

Overview: A 2–5 sentence description of what we're building and why we're building it. Instead of something like "Add subscription pausing", we want something more like "Users are churning because they can't temporarily pause subscriptions during vacations. We need a pause feature that maintains their subscription state but stops billing for 1–30 days." The "why" matters for the same reason it matters when you're working with other people. Claude will make hundreds of little decisions as it implements—how to name variables, when to extract a function, whether to optimize for readability or performance. The more aligned it is with the big picture, the more those micro-decisions will be correct.

How It Will Work: High‑level architecture. Make the first cut of decisions to prune the biggest branches. It's fine if some decisions remain—call them out. Discuss alternatives and why you recommend one.

Technical Context: This is where you should call out all relevant things to be aware of. Codebase patterns (and established anti-patterns), bottlenecks, what additional business context means for this project.

Relevant Codebase Files: Important file paths and why they matter.

This document can and should take time. It's always been best practice to write a document like this before starting project work, but often we just jump in and form this document in our minds while coding. With AI doing most of the typing, we need to explicitly make space to do this thinking. Every hour here saves three hours of untangling later. Claude can't save you from unclear requirements—it will just implement your confusion faster.

Below is an example of a project overview document for Derive.

1.overview.md

# Generate Flashcards from Chat Highlights

## Overview

Users learn in chat, but turning those moments into durable memory still means copy/pasting into another application or reviewing chats. We’re adding highlight‑to‑flashcard in the chat: select text, get a clean question/answer, tweak if needed, save. This reduces friction now and sets us up for spaced repetition later—without over‑designing the schema up front.

## How It Will Work

In the chat, selecting text reveals a small action, similar to Kindle. Clicking "Generate Flashcard" sends the selection and a bit of surrounding context to the server, which generates one Q/A pair. The preview lets the user edit wording or ask for a re‑roll; saving writes a minimal record that links back to the conversation.

Key decisions

- Default model: Claude 3.5 Haiku for cost/quality. Alternative: GPT‑4o mini; re‑evaluate with production samples.
- Scope: one card per selection in v1. Batch generation is a follow‑on.
- Re‑rolls: vary temperature and prompt emphasis to improve specificity; cap attempts to avoid loops.
- Storage: keep a minimal per‑card record; skip decks/sharing in v1.
- Selection: allow across adjacent messages; cap selection length for quality and latency.

Open questions

- Adjacent vs arbitrary cross‑message ranges vs full context (recommend adjacent only for v1).
- Basic moderation/validation for low‑quality or unsafe inputs (ignore for now)

Alternatives considered (deferred)

- Bulk conversation → card set: great for long sessions, but needs deduplication, importance ranking, and a batch‑edit UX. Defer to phase 2.
- Event‑sourced storage: perfect history/undo, too heavy for v1.

## Technical Context

- Frontend: use shadcn primitives when possible.
- AI: reuse our streaming/tool patterns; centralize prompts.
- Auth: Supabase SSR helpers; no secrets on the client; derive user on the server.
- Database: Drizzle migrations; per‑user ownership with RLS; minimal schema for v1.
- Performance: cap selection length and set sane timeouts; show clear, recoverable errors; debounce rapid retries.
- Observability: counters for generate/re‑roll/save; record failure reasons without content.
- Privacy: scrub PII in telemetry; do not log selected text or full answers.

## Relevant Codebase Files

- `components/chat.tsx` — chat container and state wiring.
- `components/chat-messages.tsx` — message rendering; selection integration point.
- `components/message.tsx` — per‑message selection boundaries and affordance placement.
- `components/chat-panel.tsx` — input area; minor integration for feedback.
- `components/user-text-section.tsx` — user content rendering; selection hooks.
- `app/api/chat/route.ts` — reference for streaming patterns.
- `lib/actions/chat-db.ts` — DB operations pattern to mirror for flashcards.
- `lib/db/index.ts` — Drizzle/Postgres setup.
- `lib/agents/master-teacher.ts` — model invocation style to align with.
- `lib/streaming/` — AI SDK v5 streaming utilities.
- `lib/tools/` — existing tool calling patterns.
- `lib/supabase/client.ts`, `lib/supabase/server.ts`, `lib/auth/get-current-user.ts` — auth context.

Document 2: Codebase Knowledge (Claude Writes This)

Neurosemantic coherence, a term coined by Cal Newport, describes the mental state which emerges when you've been black‑hole‑focused on something. It's when all of your neural wiring is primed for work on a given task, and all of the brain noise is quieted.

We can achieve LLM neurosemantic coherence by optimizing our available context window. We want it to have everything it needs in that window (recall), and for everything in there to matter (precision).

After writing the RFC document, I have the coding agent bootstrap its own context.

Prompt to create 2.codebase-knowledge.md

We're going to begin work on the flashcard project. An overview of this project is in @docs/flashcards/1.overview.md. Please read through the project overview and all linked files. Think deeply about the codebase and the files and coding patterns that will be relevant for effectively completing this project. Then, read all relevant code into context, documenting helpful code knowledge into a markdown document /docs/flashcards/2.codebase-knowledge.md, linking to the relevant files.

Document 3: Working Plan (Claude Writes, I Review and Iterate)

Now that we've bootstrapped our context window, I task the coding agent with creating a detailed implementation plan.

Prompt to create 3.working-plan.md

Acting as a senior engineer, create an implementation plan in @docs/flashcards/3.working-plan.md. This document will serve as our working document, which you will update as you go along and always keep up to date. Your implementation should follow a tracer pattern—we should get a basic implementation working end-to-end first, and then we can add additional features. At every step along the way, you should be able to test if the implementation so far is working as expected. Before beginning implementation, I will review your implementation plan. Please ask clarifying questions when needed.

First, I read the implementation plan as written. I review it as I would an implementation plan from any engineer. I point out things that don't make sense and either get Claude Code to update the plan or convince me its way of doing something is correct. After I'm happy with the plan, I tell it to begin implementation.

Tracer Bullet Implementation

The tracer bullet pattern has become my most reliable technique for maintaining quality. Instead of letting Claude build complete features, I insist on the simplest possible end-to-end implementation first.

At this point, the LLM should have everything it needs to begin feature development. We've been opinionated about architectural decisions that could misguide it, and we have loaded the right context, attaining neurosemantic coherence for the LLM.

Before running, make sure you commit any previous changes that may be lingering. This way, you can clearly see (and easily revert) any bad coding decisions that might be made.

Tracer Bullet Implementation Prompt

Please begin work on [project]. An overview is in @docs/project/1.overview.md.

Please be careful to follow best practices and patterns used throughout the codebase. Be very thoughtful before writing any code, using tidy practices while not overengineering. Follow a tracer bullet implementation pattern, getting a basic implementation working and tested end-to-end, and then we will add features. Do brilliant, well thought-out work.

I've found the tracer bullet pattern achieves three crucial things:

Guards against Claude's strong tendency to over-engineer.
Makes it available as soon as possible for end-to-end testing.
Makes it easier for us to keep up with code changes.

When it is done with this first one-shot attempt, you must read the code. It's tempting to see if it works and then start giving feedback on things to change and update, but this is where things start to go south and turn into a tumbleweed. We want to course-correct as early as possible.

If there are errors, first wrap your head around the missteps and poor patterns. Then, instruct it specifically why what they were doing was a bad idea and what it should do instead.

Iteration

This is the fun part. Once we have an initial tracer implementation that works and the new code has a stamp of approval, the iterative feature development and edge‑case cleanup tends to be blissful. Both you and the coding agent have reached optimal neurosemantic coherence, and productivity is peaking.

This is when you can hit a series of one-shots and the coding agent tends to just "do the right thing".

Iteration Prompt

I think we'll also need to keep track of retryStrategy on the campaign level, not just the scheduled action, right?

Iteration Prompt

Retry strategy also needs to be NONE if we don't do retries, for instance, if it's a stolen‑card decline code

The Best Agent Is a Self-Testing Agent

Whenever possible, get yourself out of the testing loop. This will sometimes require architecture changes—standing up a local Postgres instance, creating test users, exposing new endpoints, etc. These changes will pay for themselves again and again. When Claude can test its own code while it's working, it saves you from being the slow page‑refresh monkey that copies and pastes errors from the console into the terminal or from manually testing and explaining what's happening. It's reinforcement learning at its best.

The Frontend Challenge

Frontend work presents unique challenges. Backend code has clearer success criteria—the endpoint returns the right data or it doesn't. Frontend code is squishier. It needs to look right, feel right, and work across different states and screen sizes.

Claude's weakness here is consistency. Left to its own devices, it will create five different button styles, none matching your brand, none using your existing component library. It will inline styles in one component and use CSS modules in another. It will recreate utility functions that already exist with slightly different names.

The solution I've found to work well is to create two reference pages which serve as Claude's coding and style guides.

The Components Page

Every reusable component, properly styled, with usage examples:

ComponentsPage.tsx

export default function ComponentsPage() {
  return (
    <div>
      <Section title="Buttons">
        <Button variant="primary">Primary Action</Button>
        <Button variant="secondary">Secondary</Button>
        <Button variant="danger">Delete</Button>
        <Button disabled>Disabled</Button>
      </Section>

      <Section title="Forms">
        <Input label="Email" type="email" placeholder="[email protected]" />
        <TextArea label="Description" rows={4} />
        <Select label="Country" options={countries} />
      </Section>

      {/* ... more components ... */}
    </div>
  );
}

The Showcase Page

Your best UI patterns, the purest standard for how things should look and work. Take your favorite parts of the UI, and put them here. I've made the Derive showcase public as an example.

When Claude needs to build a new feature, I reference these pages:

Example Component Reference Prompt

Use components from @pages/components and follow patterns from @pages/showcase.

Conclusion

Coder agents are remarkable tools that should not be ignored. The productivity gains are real—upwards of 5x for familiar work, and probably closer to 30x for coding with unfamiliar languages and frameworks.

That said, we must become painfully aware of when we start to use them as a crutch instead of a tool. You cannot outsource understanding. You can delegate implementation, accelerate development, eliminate boilerplate, but you must own the architecture and maintain a connection to your code.

At Churnkey, I literally debugged a race condition at 3 AM because I could walk through the code line-by-line in my head. When I was on calls with enterprise customers discussing new features, I would know, in real-time, what an implementation would look like, and give accurate development estimates on the spot.

With Derive, when something broke, I had to read the code. I understood it quickly—Claude does good work—but I didn't know it. Understanding is intellectual. Knowing is intimate.

This extends beyond debugging. Product building requires obsession. The best insights come when you're not actively working, when your subconscious is processing problems. But when you don't know your code intimately, your subconscious has nothing to work with. You can't mentally refactor and build on what you haven't internalized.

Less Impressed, More Involved

In the first month of using Claude Code, I was impressed—especially when bootstrapping projects. Look what it can do! Look how fast! In a way, I became a spectator to my own development process.

Now, I'm less impressed and more involved. I use my project overview documents to make sure I'm thinking critically about everything I build. I read every line Claude generates, not just for correctness but for understanding. I refactor not because the code needs it but because I need to make it mine. I often write sections by hand just to maintain that connection.

This approach is slower at first than pure AI-driven development. But I'm maintaining something important: the connection to my codebase that makes great products possible.

Building great software has always required obsession. You need to think about your product constantly. In the shower. On walks. Before sleep. If you don't know your code well enough to think about it away from your computer, you lose this superpower.

The engineers who will thrive with these tools are those who find ways to maintain that intimacy while leveraging the acceleration. Who can ship faster without caring less. Who understand that building great products requires not just productivity but obsession.

AI Code Agents are a powerful catalyst. They accelerate our ability to build, but also our tendency to become disconnected from our work. The challenge is using them while staying engaged, thoughtful, and obsessed with the details that make software great.

Because users don't care how fast you shipped. They care whether you cared enough to get the details right. And you can't get the details right if you don't know what the details are.

Less Impressed, More Involved

Discussion

Newsletter

Get updates on AI, philosophy, and tech

Thank you for subscribing!