Climbchess

How Climbchess Works

TL;DR. Most chess training plateaus you because it conflates four distinct cognitive skills — pattern recognition, calculation, memory, and candidate-move selection — into one undifferentiated puzzle stream with one rating. Climbchess separates the four skills into four dedicated training modes with four ratings, sitting on top of 4,505 interpretable concepts extracted from Leela Chess Zero via sparse autoencoders. When you miss a puzzle you learn which skill failed and which specific pattern wasn't recognised.

4 cognitive skills, separate ratings
4,505 SAE concepts
9,950 tagged puzzles
80 curated endgames

Why most chess training plateaus you

When a coach watches you play they don't say "you're a 1500." They say things like "your pattern recognition is solid but your calculation breaks down past three moves," or "you see the right ideas, you just don't prune candidates well — you burn five minutes calculating an irrelevant move." Those are different skills. They fail in different ways. They need different drills. A single rating can't tell them apart.

Every popular puzzle trainer — chess.com, Chessable, Lichess, Aimchess — works the same way underneath: a single difficulty rating per puzzle, a single rating per user, match them up, repeat. The unit of feedback is "you got it right" or "you got it wrong." If you got it wrong, you don't know whether the issue was that you didn't see the idea, didn't read the line accurately, didn't remember the position long enough to plan, or wasted your time on the wrong candidate. So you grind through more of the same and your weak skill stays weak.

Climbchess is built around the assumption that your rating is bottlenecked by one of those four skills, and the diagnostic loop has to surface which one. The rest of this page explains how it does that.

Skill 1 — Pattern recognition

Pattern recognition is the ability to look at a position and see the right idea — the back-rank weakness, the loose piece, the breakthrough sacrifice. It's the skill conventional puzzle trainers think they're training. They sort of are; they just can't tell you, when you miss a puzzle, which pattern you didn't see.

This is what the SAE substrate is for. Every one of the 9,950 puzzles in Climbchess is decomposed into the specific Leela features that fire on the position. When you miss a puzzle, the trainer looks at the puzzle's hardest-firing concept, checks your per-concept rating against it, and queues puzzles tagged with that concept until your rating moves. Pattern recognition is trained in the open Puzzles mode and (more aggressively) inside Today and the Curriculum, both of which sample weighted toward your weakest concepts.
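A minimal sketch of that diagnostic step, in Python. The class names and the shape of the activation dictionary are illustrative assumptions, not Climbchess's actual data model:

```python
from dataclasses import dataclass, field

@dataclass
class Puzzle:
    id: str
    concept_activations: dict[str, float]  # concept label -> SAE firing strength

@dataclass
class UserProfile:
    base_elo: float
    concept_elo: dict[str, float] = field(default_factory=dict)

def concept_to_drill(puzzle: Puzzle, user: UserProfile) -> str:
    """After a miss, pick the concept to queue: the puzzle's hardest-firing
    concept, checked against the user's per-concept rating."""
    hardest = max(puzzle.concept_activations, key=puzzle.concept_activations.get)
    if user.concept_elo.get(hardest, user.base_elo) <= user.base_elo:
        return hardest  # lagging or never seen: drill it
    # Fallback (an assumption): drill the user's weakest concept that fired.
    return min(puzzle.concept_activations,
               key=lambda c: user.concept_elo.get(c, user.base_elo))
```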

Skill 2 — Calculation

Calculation is reading a line accurately N moves ahead. It's a different skill from pattern recognition: you can know the right idea is "rook lift then sacrifice" and still miscount the resulting attack. Conventional trainers don't isolate this skill — they bundle calculation in with everything else, so weak calculators learn to lean on pattern recognition and never improve the actual calculation.

Climbchess has a dedicated Calculate mode. The trick: it strips the position to the minimum number of pieces required to read the line, asks you to visualise two to three ply ahead, and scales the piece count up as you improve. Calibrating to "minimum required pieces" first lets you train calculation in isolation, without it being bottlenecked by your pattern recognition or short-term visual memory simultaneously. As your accuracy holds steady, the trainer adds pieces back in until you're calculating in full positions again.

The piece-stripping heuristic isn't perfect — it occasionally removes a piece that mattered for the line. We log those cases and tune the heuristic over time. But the principle holds: a 1600 player with weak calculation can train two-piece sequences accurately before adding the cognitive load of a full board.
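For illustration, here is one way such a heuristic could look using the python-chess library: keep the kings plus every square the solution line touches. This is a hedged sketch, not the app's real heuristic, and it exhibits exactly the over-stripping failure described above:

```python
import chess

def strip_to_line(fen: str, line_uci: list[str]) -> chess.Board:
    """Remove every piece that neither moves in the solution line nor gets
    captured, keeping both kings. A defender that merely guards a square
    gets stripped too, which is the logged failure mode."""
    board = chess.Board(fen)
    keep = {board.king(chess.WHITE), board.king(chess.BLACK)}
    for uci in line_uci:
        move = chess.Move.from_uci(uci)
        keep.add(move.from_square)
        keep.add(move.to_square)   # also preserves pieces captured en route
    for square in chess.SQUARES:
        if square not in keep:
            board.remove_piece_at(square)
    return board
```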

Skill 3 — Memory

Memory is holding a sequence in working memory long enough to plan around it. It overlaps with pattern recognition and with calculation, but it's separable: chess players have measurable differences in span — the longest sequence they can hold and reproduce — and span correlates with rating, but not perfectly.

Replay mode shows a position and a sequence of moves, hides them, and asks you to reconstruct the sequence. Your span — the longest sequence repeated correctly — is logged separately from your tactical rating. The drill scales the same way Calculate does: short sequences in simplified positions first, then longer sequences in richer positions. Players who feel "fuzzy" past three moves of calculation often discover their issue is span, not calculation per se. Targeting it as a separate skill lets them train it directly.
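One plausible way to run the span measurement is a simple staircase, sketched below. The function name and the one-step sizes are assumptions, not the app's actual tuning:

```python
def next_trial(seq_len: int, best_span: int, reproduced: bool) -> tuple[int, int]:
    """One staircase step: lengthen the hidden sequence after a correct
    reproduction, shorten it after a miss, and log the best span so far."""
    if reproduced:
        best_span = max(best_span, seq_len)
        seq_len += 1                   # probe one move deeper next time
    else:
        seq_len = max(1, seq_len - 1)  # back off, never below one move
    return seq_len, best_span
```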

Skill 4 — Move selection (candidate pruning)

This is the skill coaches care most about and the skill puzzle trainers train least. In a real game you don't calculate every legal move. You look at a position, identify two or three candidates worth thinking about, and go deep on those. The skill being measured is the pruning step — what makes the cut before you start calculating. A player who burns five minutes calculating an irrelevant move and a player who spends five seconds pruning then goes deep on the right two have very different in-game ratings even with identical pattern-recognition and calculation skill.

Climbchess has a Move Selection mode where you pick your two or three candidates without calculation. The trainer reveals whether the engine's strongest move was in your chosen set. You're not graded on accuracy of the line — you're graded on whether the right move made it past your filter. Over time the trainer builds a profile of where your filter fails: positions where you exclude the strongest move, kinds of pieces you under-consider, structural cues you ignore.
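The grading step is simple enough to sketch in full. The function below assumes the engine's best move is precomputed per position; all names are hypothetical:

```python
import chess

def grade_filter(fen: str, candidates_uci: set[str], best_uci: str) -> dict:
    """Score the pruning step only: did the strongest move make the cut?
    On a miss, record which piece the player under-considered."""
    board = chess.Board(fen)
    best = chess.Move.from_uci(best_uci)
    passed = best_uci in candidates_uci
    missed_piece = None if passed else board.piece_at(best.from_square)
    return {"passed_filter": passed,
            "missed_piece": missed_piece.symbol() if missed_piece else None}
```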

The four ratings, side by side

Each of the four modes maintains its own rating. After about fifty sessions you have a profile that looks something like: pattern recognition 1620, calculation 1380, memory 1570, move selection 1590.

That profile tells you, by name, which skill is costing you rating. In that example, calculation is the gap: drill it directly in Calculate mode rather than grinding more open puzzles. Coaches teach this; no other app cleanly splits the rating signal.

The SAE substrate — 4,505 interpretable concepts

Underneath the four-skill split is the concept layer. Conventional trainers tag puzzles with twenty to sixty manually-curated themes: fork, pin, skewer, mate-in-2, deflection, decoy. Useful but coarse. A 1600 player who mishandles back-rank weaknesses but reads minority attacks well needs different drills than a 1600 player with the opposite blind spots. "Both fail at tactics" doesn't capture that.

A sparse autoencoder (SAE) is a small neural network that takes the dense, entangled activations from inside a larger model and decomposes them into a much wider dictionary of features, only a handful of which fire on any given input. Those firing features tend to correspond to one specific thing the model is thinking about. SAEs are the workhorse of modern mechanistic interpretability: Anthropic used them to find features inside Claude, and DeepMind used them on Gemma. The Lc0 interpretability community has been applying the same recipe to chess for the last two years.
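The shape of an SAE is small enough to show whole. Below is a generic toy in PyTorch with illustrative dimensions; the actual Lc0 SAE architecture and hyperparameters are in the preprint:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 768, d_dict: int = 8192):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)  # dense activation -> wide dictionary
        self.decoder = nn.Linear(d_dict, d_model)  # dictionary -> reconstruction

    def forward(self, x: torch.Tensor):
        features = torch.relu(self.encoder(x))     # ReLU zeroes most features
        return self.decoder(features), features

def sae_loss(recon, x, features, l1_coeff: float = 1e-3):
    # Faithful reconstruction plus an L1 penalty: the penalty is what forces
    # only a handful of dictionary features to fire on any given input.
    return ((recon - x) ** 2).mean() + l1_coeff * features.abs().mean()
```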

Why Leela Chess Zero? Lc0 is the strongest open-source neural chess engine. The weights are public, the architecture is published, the playing style is shaped by self-play rather than human pedagogy. The patterns it cares about are the patterns that win games, not the ones a 1950s textbook said mattered. We tap its residual stream — the running working memory the network reads and writes to as it processes a position — and train an SAE on it.

How we extracted 4,505 concepts

The extraction run's numbers — the corpus, hyperparameters, sparsity schedule, dead-feature pruning, labelling pipeline, and matchmaking math — are in the open-access preprint at the bottom of this page.

What the concepts look like

The labelled concepts are not vague themes. They are specific, structural, and check against your chess intuition; the paper catalogues concrete examples.

Roughly 17% of the 562 firing concepts cluster around back-rank and king-safety motifs, ~14% around pawn structure, ~11% around piece coordination, with the rest spanning tactics, prophylaxis, and endgame technique. Detail is in the paper.

Per-concept ELO and the effort audit

Every user starts at a single base ELO. Every time you solve or fail a puzzle, the concepts that fired on that puzzle each get their own Glicko-style update. After about fifty puzzles you have ratings on thirty or more concepts; after a few hundred, on most of the 562 that fire. Most ratings cluster near your base. A few sit 200+ points lower. Those are your blind spots.
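As a sketch, a plain Elo step applied per firing concept looks like the following. The text says Glicko-style, which also tracks a rating deviation per concept; this simplified version only shows the shape of the loop:

```python
def expected_score(rating: float, puzzle_rating: float) -> float:
    return 1.0 / (1.0 + 10 ** ((puzzle_rating - rating) / 400))

def update_fired_concepts(concept_elo: dict[str, float], base_elo: float,
                          fired: list[str], puzzle_rating: float,
                          solved: bool, k: float = 24.0) -> None:
    """Every concept that fired on the puzzle gets its own rating update;
    concepts not seen before start at the user's base ELO."""
    score = 1.0 if solved else 0.0
    for concept in fired:
        r = concept_elo.get(concept, base_elo)
        concept_elo[concept] = r + k * (score - expected_score(r, puzzle_rating))
```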

The diagnostic loop sits on top of those ratings. When you miss a puzzle, the trainer surfaces which of your concept-ELOs is below the puzzle's hardest-firing concept, then queues puzzles tagged with that concept until the rating moves. That's pattern recognition training, mechanised.

The companion to per-concept ELO is the effort audit. After a few weeks of training you have logged time-on-task per concept. The audit panel sorts concepts by (time spent) × (gap from your overall rating) and surfaces over-trained strengths versus under-trained gaps. Most players over-train what they're already good at — it's the comfortable thing to do — and the rating sits flat as a result. Surfacing that explicitly, by name, with a number, is the diagnostic the app exists to provide.
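A sketch of that ranking under the stated formula; the field names and the sign convention for the gap are assumptions:

```python
def effort_audit(concept_elo: dict[str, float],
                 seconds_spent: dict[str, float],
                 overall_elo: float) -> list[tuple[str, float, float]]:
    """Rank concepts by (time spent) x (gap from overall rating), largest
    first. A negative gap is a strength (rating above overall), so a big
    score there flags an over-trained strength; a positive gap with a big
    score flags an under-trained weakness."""
    rows = [(c, seconds_spent.get(c, 0.0) * abs(overall_elo - r), overall_elo - r)
            for c, r in concept_elo.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```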

Real-game import + post-mortem

Abstract puzzles are useful but they're not what costs you rating in your actual games. What costs you rating is the same three or four pattern-blindnesses recurring in real positions you reach over and over. The real-game import closes that loop.

You paste a PGN. The post-mortem walks the game move by move, identifies positions where you played a clearly inferior move (engine eval drop above a threshold), and runs each of those positions through the SAE pipeline. The concepts that fired on your mistake positions get pushed into your curriculum. So next session, the trainer is drilling the patterns that are losing you actual rating in actual games — not random tactics.
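A sketch of that walk using python-chess. Here `evaluate` and `sae_concepts` are placeholders for the engine and SAE stages, and the eval is assumed to be centipawns from White's point of view:

```python
import io
import chess
import chess.pgn

def evaluate(board: chess.Board) -> float:
    raise NotImplementedError  # placeholder: wire up an engine here

def sae_concepts(board: chess.Board) -> list[str]:
    raise NotImplementedError  # placeholder: run the SAE tagging pipeline

def mistake_concepts(pgn_text: str, threshold_cp: float = 150.0):
    """Yield (fen, concepts) for each position where the mover's eval
    dropped by more than the threshold, i.e. a clearly inferior move."""
    game = chess.pgn.read_game(io.StringIO(pgn_text))
    board = game.board()
    prev = evaluate(board)
    for move in game.mainline_moves():
        pre_fen, white_moved = board.fen(), board.turn == chess.WHITE
        board.push(move)
        cur = evaluate(board)
        drop = (prev - cur) if white_moved else (cur - prev)
        if drop > threshold_cp:
            yield pre_fen, sae_concepts(chess.Board(pre_fen))
        prev = cur
```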

It also surfaces game-level patterns: if every PGN you import shows the same concept firing on your mistakes, that concept is the bottleneck to fix this month. Coaches do this manually for $80–150 an hour. The mechanised version isn't as good as a strong coach, but it's the same loop and it runs every time you paste a game.

The seven training modes, end to end

  1. Today — daily 12-rep session tailored to your current concept profile and skill ratings. Mixes weak-concept drills with whichever of the four modes you've been underusing.
  2. Curriculum — adaptive 30-day journey across three layers: principles, named tactics, SAE patterns. Limited to the ~20 concepts that matter at your current band.
  3. Puzzles — open SAE-tagged tactic stream. Pure pattern recognition.
  4. Calculate — minimum-piece visualisation trainer described above.
  5. Replay — sequence memory trainer described above.
  6. Move Selection — candidate-pruning trainer described above.
  7. Rating Test — bisects your true ELO in roughly ten minutes via an adaptive 15-position test. No signup.
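The bisection idea behind the Rating Test, as a minimal sketch. A real adaptive test has to cope with noisy answers; `serve_and_check` is a stand-in for serving a position of roughly a given difficulty and checking the result:

```python
def rating_test(serve_and_check, lo: float = 600.0, hi: float = 2600.0,
                positions: int = 15) -> float:
    """Binary-search the rating bracket: a solve raises the floor,
    a miss lowers the ceiling."""
    for _ in range(positions):
        mid = (lo + hi) / 2
        if serve_and_check(mid):   # True if the test position was solved
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```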

Plus an Endgame set of 80 curated studies with concept tags and solving guides, and the real-game import + post-mortem described above. The 562 concept-tagged opening pattern drills are surfaced inside the Curriculum and the Openings view, so opening study trains pattern recognition on positions you'll actually reach, rather than rote move memorisation.

What makes this different from other trainers

chess.com, Chessable, Lichess, and Aimchess all serve the same puzzles to every user at a given rating. Their unit of personalisation is "your overall puzzle ELO." Climbchess's unit is "your rating on each of four cognitive skills, plus your rating on each of the SAE concepts that fire in the puzzle pool."

That changes what training feels like. You stop drilling random tactics. You start hitting your specific weaknesses, in order, with diagnostics that get sharper the more you train. Because the four skills are decomposed, you can see — by name — which is costing you rating. Because the concept layer comes from a top neural engine, you're training the patterns that move the needle in real games, not the ones tradition says you should care about.

It is also free, runs in the browser, requires no email, and stores nothing about you on a server unless you opt in.

Honest tradeoffs and known limitations

We don't know if this works for everyone yet. Things we already know don't work, or work poorly:

  - The per-concept curriculum gets noisy below 1000 and above 2200 ELO, because puzzle density per concept drops.
  - Some SAE concept labels are narrower than the underlying feature.
  - A few concepts fire so rarely that you can't get a stable rating on them.
  - Calculate mode's piece-stripping heuristic occasionally removes a piece that mattered for the line.
  - iOS Safari is still rough on mobile.
  - Climbchess is a trainer, not a chess server; it does not replace playing games.

Frequently asked questions

Is Climbchess free?

Yes. The full trainer — every mode, every puzzle, every concept rating — is free with no email signup. Future paid tiers may add cloud-sync upgrades or curated curricula; existing free features stay free.

Do I need an account?

No. Climbchess runs in your browser. Your progress saves to local storage by default. If you want cross-device sync, claim an anonymous UUID — no email, no PII.

What is a sparse autoencoder, in one paragraph?

A small neural network that decomposes a bigger model's dense internal activations into a much wider dictionary of features — usually with only a handful firing at once. Each feature tends to correspond to one specific human-interpretable concept. SAEs are the standard tool of modern mechanistic interpretability.

Why split training into four skills?

Because they fail in different ways and need different drills. Pattern recognition, calculation, memory, and candidate-move selection are separable cognitive skills with different rating curves. A single rating can't tell them apart, so you can't tell which one is bottlenecking your play.

How does Calculate mode train calculation in isolation?

It strips the position to the minimum number of pieces required to read the line, asks you to visualise 2–3 ply, and scales the piece count up as you improve. Calibrating to "minimum required pieces" first lets you train calculation without it being bottlenecked by your pattern recognition or short-term visual memory at the same time.

How does Replay mode train memory?

It shows a position and a sequence, hides them, asks you to reconstruct. Your span — the longest sequence repeated correctly — is logged separately from your tactical rating. Memory and pattern recognition correlate but are not the same.

How does Move Selection mode train candidate pruning?

You pick two or three candidate moves without calculation. The trainer reveals whether the engine's strongest move was in your chosen set. The skill is pruning, not calculation. Distinguishes "spent five minutes on the wrong move" from "spent five seconds pruning, then went deep on the right two."

What is the effort audit?

A panel that sorts your concepts by (time spent) × (distance from your overall rating). Surfaces over-trained strengths versus under-trained gaps. Most players over-train what they're already good at; surfacing it explicitly is the diagnostic the app exists to provide.

How does the real-game import work?

You paste a PGN. The post-mortem extracts the SAE concepts that fired on positions where you made mistakes, then feeds them into the curriculum. The trainer learns what's costing you rating in actual games, not just abstract puzzles.

Why is this better than chess.com puzzles?

Chess.com serves you puzzles based on a single overall rating. Climbchess separates four training modes with four ratings, and underneath uses 4,505 interpretable concepts so when you miss a puzzle you learn which specific skill failed and which specific pattern wasn't recognised.

How long until I improve?

Most users feel sharper recognition of recurring patterns inside two weeks of fifteen minutes per day. Measurable rating gains depend on starting ELO, time invested, and whether you also play and review games. Climbchess accelerates pattern recognition; it does not replace play.

What ELO range does Climbchess work for?

Roughly 1000 to 2200 FIDE-equivalent works cleanly. Below 1000 and above 2200 the per-concept curriculum gets noisy because puzzle density per concept drops.

Do you store my data?

By default, no. Progress lives in your browser. If you opt into cloud sync, we store an anonymous UUID and your concept-ELO vector — no email, no name, no game history.

How accessible is Climbchess?

Four colour-blind palettes (deuteranopia, protanopia, tritanopia, monochrome) with full UI flip — not just the chessboard. TTS pattern narration speaks the concept name and outcome, useful while walking. Keyboard-first navigation. Twenty-language UI including Hindi, Bengali, Tamil, Vietnamese, Persian, Arabic, and Indonesian.

What are the known limitations?

The per-concept curriculum is noisy below 1000 and above 2200 ELO. Some SAE concept labels are narrower than the underlying feature. A few concepts fire so rarely you can't get a stable rating on them. Calculate mode's piece-stripping heuristic occasionally removes a piece that mattered. iOS Safari is rough on mobile. And Climbchess is a trainer, not a chess server.

Why Leela Chess Zero specifically?

Lc0 is the strongest open-source neural chess engine. Its weights and architecture are public, and its style comes from self-play, not human bias. The patterns it cares about are the patterns that win games.

How many puzzles does Climbchess have?

9,950 SAE-tagged tactical puzzles, 80 curated endgame studies, 562 concept-tagged opening pattern drills.

Read the paper

The full methodology — SAE training corpus, sparsity schedule, dead-feature pruning, labelling pipeline, and the matchmaking math — is written up in an open-access preprint with DOI. The SAE pipeline and labelling code are open. The trainer code is currently closed.

Start training → Read the paper