Books and papers I keep coming back to.

Books 07

01
Infinite Jest David Foster Wallace
02
Memories, Dreams, Reflections Carl Gustav Jung
03
The Verificationist Donald Antrim
04
The Poetics of Space Gaston Bachelard
05
London Fields Martin Amis
06
CivilWarLand in Bad Decline George Saunders
07
At the Mountains of Madness H.P. Lovecraft

Whitepapers 27

01
The Bitter Lesson Rich Sutton · 2019
General methods leveraging computation beat hand-crafted structure. Foundational to how I think about AI engineering.
02
Self-Discover: Large Language Models Self-Compose Reasoning Structures Zhou et al. (DeepMind) · 2024
LLMs compose reasoning modules at query time. Built a prototype implementing this.
03
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs Gandhi et al. (Stanford) · 2025
Identifies four cognitive behaviors a model needs for self-improving reasoning: verification, backtracking, subgoal setting, and backward chaining.
04
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning DeepSeek · 2025
Landmark paper demonstrating RL-based reasoning emergence in open-weight models.
05
s1: Simple Test-Time Scaling 2025
Minimalist approach to test-time compute scaling that matches or exceeds far more complex methods.
06
Chain of Draft: Thinking Faster by Writing Less 2025
LLMs reason effectively with much sparser intermediate steps, an efficiency counterpoint to chain-of-thought.
07
Learning to Reason in 13 Parameters (TinyLoRA) 2026
Trains an 8B model to 91% GSM8K accuracy with only 13 trainable parameters via RL. arXiv:2602.04118.
08
Building Effective AI Agents Anthropic · 2025
The canonical practical guide to agent construction. Shaped how the field thinks about agent architectures and the workflow-vs-autonomy spectrum.
09
Why Do Multi-Agent LLM Systems Fail? (MAST) Cemri et al. · 2025
First failure taxonomy for multi-agent systems: 14 failure modes across 3 categories. A sobering reality check on benchmark gains.
10
Multi-agentic Software Development is a Distributed Systems Problem 2025
Reframes multi-agent coordination through distributed systems theory: consensus, partial failure, message passing.
11
Expensively Quadratic: the LLM Agent Cost Curve 2025
Rigorous analysis of why agent costs grow quadratically with complexity. Essential for production agent systems.
12
Agentic Software Engineering: Foundational Pillars and a Research Roadmap 2025
Comprehensive research agenda at the intersection of agents and software engineering.
13
AsymFlow: Asymmetric Flow Models Hansheng Chen et al. (Stanford/Princeton) · 2026
1.57 FID on ImageNet 256x256. Introduces a latent-to-pixel finetuning route.
14
Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity Zhang et al. · 2025
Identifies typicality bias as the root cause of mode collapse. Training-free, 1.6-2.1x diversity improvement.
15
Specifications: The missing link to making the development of LLM systems an engineering discipline 2025
Argues that formal specifications are the key to transforming LLM development from alchemy into engineering.
16
Spec-Driven Development with AI Personal synthesis · 2025
Synthesis of 45+ papers on how specifications, formal methods, and AI intersect to create reliable software engineering workflows.
17
Augmented Coding: Beyond the Vibes Kent Beck · 2025
A sober, thoughtful assessment of AI-assisted coding from the creator of Extreme Programming and TDD.
18
QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks Hu et al. · 2025
SOTA on MBPP and HumanEval. Dynamic quality-check gating for code generation workflows.
19
Cognitive Load is What Matters 2024
The fundamental constraint in software is human working memory, not tool speed or language features.
20
Malleable Software in the Age of LLMs 2025
How LLMs change the nature of software itself, from fixed artifacts to fluid, recomposable systems.
21
TextGrad: Automatic Differentiation via Text 2024
Applies backpropagation-style optimization to text, using LLM-written natural-language feedback in place of numeric gradients.
22
MegaTrain: Full Precision Training of 100B+ Parameter LLMs on a Single GPU 2026
120B params on a single H200. 1.84x throughput over DeepSpeed ZeRO-3. arXiv:2604.05091.
23
From Local to Global: A GraphRAG Approach to Query-Focused Summarization Microsoft · 2024
The paper that launched the GraphRAG paradigm. Directly relevant to the graphrag-claude-code project.
24
The Illustrated Transformer Jay Alammar · 2018
The most widely referenced educational resource on transformer architecture. Visual clarity on the mechanism that started it all.
25
Magic Ink: Information Software and the Graphical Interface Bret Victor · 2006
A vision for software as a medium for understanding rather than a toolbox for tasks.
26
The Homogenizing Effect of Large Language Models on Human Expression and Thought Sourati et al. · 2025
Evidence across linguistics, psychology, and CS that LLMs risk standardizing language and reasoning.
27
The Space of Minds Andrej Karpathy · 2025
A philosophical exploration of what kinds of intelligence are possible, grounding AI capabilities in a broader framework.

Last updated May 22, 2026