Wednesday, February 11, 2026

Reflections on using Claude Code

This is the follow-up to my post from two weeks ago on my observations using Claude Code (Opus 4.5 to start, 4.6 after it was released) to rebuild the kfchess.com website, with the constraint that I would not write any of the code myself. 60k lines of code and 100 commits later, I'm happy to share that I've successfully deployed the new site to production! It took me an estimated 60-80 hours of total work over the course of 3.5 weeks, roughly a third of the time it would have taken me to do it myself. All of the code can be found in this repository.

My goal for this post is to document what I found to be easy vs hard for Claude Code (CC) in order to refine my mental model of what AI coding tools are currently capable of. In the same way that a good software engineer understands when to use Postgres vs Redis (I use both in kfchess), the engineers who have the best mental models of CC will get the most out of it. And increasingly, your ability to leverage tools like CC is strongly correlated with how quickly you can build software.

So, let's start with the good stuff -- things where I was genuinely impressed and found a ton of value in what CC did for me. There's a lot.

  1. Project bootstrap. I mentioned this in my previous post, but I was able to get started developing the new version of kfchess much faster than I would have otherwise. CC does a great job of choosing solid technologies and putting together a development environment that is both modern and easy to work with.
  2. Anything CRUD+. For your typical website, you have lots of pretty straightforward CRUD operations that just need to be built. It's no surprise that CC is good at doing this pattern matching. But it was also really good at the slightly less common, but still standard parts of the site: Google OAuth, email verification, WebSocket setup, database migrations, etc.
  3. Architecture design. I asked CC to design a multi-server architecture that allows games to persist across restarts and move between processes. This is a nontrivial capability, and it came up with a solid design that I was able to work off of. I had to steer the implementation and help it resolve a bunch of edge cases, but it was directionally good.
  4. Bonus UX. After I describe a feature to CC, it does its own small extrapolation of what would be useful from a UX perspective and automatically implements those features. Some examples of features I never specified but was happy to get: the entire lobby feature, replay playback controls, the game over modal, pagination of games/replays.
  5. Writing CSS + responsiveness. My own CSS skill is limited at best, and CC is great at covering for that. CSS as a domain is easily verifiable -- after I ask CC to make a change, it is easy for me to see if it worked and provide feedback. It is substantially more pleasant to write CSS through CC than it is to write it manually.
  6. Extensive unit testing. To double down on the importance of verifiability, unit tests are really important for CC as a way for it to understand whether the changes it makes are good. The best thing here is that CC itself can write all the tests, and it can write way more tests than a human typically would because it's essentially "free." The test coverage on this repo is, without a doubt, the highest of any code I've ever written.
  7. Easy debugging. Most of my debugging workflows look like: describe the observed issue to CC, ask it to investigate and fix, then have it write a test to catch the issue in the future. Probably 80-90% of bugs are fixed in this way without me having to do a deeper dive into the code to understand why the bug was happening at all.
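To make the architecture-design point (item 3 above) concrete, here is a minimal sketch of the kind of design CC proposed: a game registry that snapshots each live game to a shared store so that a game survives a server restart and can be picked up by a different process. The names (`GameState`, `GameRegistry`) and the schema are hypothetical, not the actual kfchess code, and a plain dict stands in for what would be a Redis client in production.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical, simplified game state -- not the actual kfchess schema.
@dataclass
class GameState:
    game_id: str
    board: dict       # piece positions keyed by square
    cooldowns: dict   # per-piece cooldown timers
    status: str

class GameRegistry:
    """Keeps live games in memory, snapshotting each one to a shared
    store so another process can pick it up after a restart."""

    def __init__(self, store):
        self.store = store  # would be a Redis client; a dict here
        self.live = {}

    def save(self, game: GameState):
        self.live[game.game_id] = game
        # In production this would be store.set(key, payload);
        # a plain dict keeps the sketch runnable.
        self.store[f"game:{game.game_id}"] = json.dumps(asdict(game))

    def load(self, game_id: str) -> GameState:
        # Called after a restart, or when a game moves processes.
        if game_id in self.live:
            return self.live[game_id]
        raw = self.store.get(f"game:{game_id}")
        if raw is None:
            raise KeyError(game_id)
        game = GameState(**json.loads(raw))
        self.live[game_id] = game
        return game

# One process saves a game...
store = {}
registry = GameRegistry(store)
registry.save(GameState("g1", {"e1": "wK"}, {"e1": 0.0}, "active"))

# ...and after a "restart", a fresh registry backed by the same
# store can restore it.
registry2 = GameRegistry(store)
restored = registry2.load("g1")
```

The edge cases I had to steer CC through live mostly in the gaps this sketch glosses over: when to snapshot, what happens to in-flight moves during a handoff, and how stale entries get cleaned out of the registry.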
It's easy to see how you get a boost to productivity with all of the above. There's also a thread running through most of these: CC is at its best when the problem is well-defined and the output is easy to verify. CSS looks right or it doesn't. Tests pass or they don't. CRUD endpoints either work or they throw errors. But when you stretch it beyond these domains and things get fuzzier, there are some clear limitations.
  1. Game engine and AI player. Using CC to build the game engine and AI player took about the same amount of time as it would have taken me to build them myself. In the case of the game engine, there were many edge cases in how a real-time chess game behaves that CC didn't cover, so I had to discover them myself and then teach CC how to fix them. On the AI player front, this is an inherently open-ended problem of coming up with a set of cheap heuristics that "feel" strong to play against, so I had to keep going back and forth on the ideas that CC would implement. Both of these domains lack the verifiability that allows CC to thrive.
  2. Complex debugging. This is the hardest aspect of software engineering, so it's not surprising that CC struggles here. There were three bugs in particular that CC was never able to solve even with many iterations: board resizing causing an infinite loop, the AI player incorrectly counting dangerous positions as safe, and stale games getting stuck in the registry. Interestingly, in each of these cases, gpt-5.3-codex (xhigh) made significantly more progress in identifying root causes.
  3. Campaign levels. Designing campaign levels takes creativity and taste to make them interesting and fun to play. CC's hit rate on levels that felt good to me was about 10%, and it mostly just created variants of levels I had already designed. This is actually the only place where I did write "code" myself, i.e. the descriptions of the campaign levels.
  4. Multi-system interactions. As the codebase grew over the weeks, I noticed that it became harder and harder for CC to keep track of exactly how different parts of the system should interact. For example, the games played as part of the campaign feature didn't integrate well with the replay feature. Increasingly, I felt that my role was to probe these multi-system interactions, much like how a senior engineer would inspect a junior engineer's work.
The mental model I came away with is that CC replaces most of the time-consuming, boilerplate parts of engineering, which lets me focus on the more open-ended and deep problems. That's a fantastic improvement to the workflow, and it's the first time that an AI coding tool has enabled me to build much better (not just faster) than before.
