Books and papers I keep coming back to.
- 01 Infinite Jest David Foster Wallace
- 02 Memories, Dreams, Reflections Carl Gustav Jung
- 03 The Verificationist Donald Antrim
- 04 The Poetics of Space Gaston Bachelard
- 05 London Fields Martin Amis
- 06 CivilWarLand in Bad Decline George Saunders
- 07 At the Mountains of Madness H.P. Lovecraft
- 01 The Bitter Lesson Rich Sutton · 2019
General methods leveraging computation beat hand-crafted structure. Foundational to how I think about AI engineering.
- 02 Self-Discover: Large Language Models Self-Compose Reasoning Structures Zhou et al. (DeepMind) · 2024
LLMs compose reasoning modules at query time. Built a prototype implementing this.
- 03 Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs Gandhi et al. (Stanford) · 2025
Four cognitive traits a model needs for self-improving reasoning: Verification, Backtracking, Subgoal Setting, Backward Chaining.
- 04 DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning DeepSeek · 2025
Landmark paper demonstrating RL-based reasoning emergence in open-weight models.
- 05 s1: Simple Test-Time Scaling 2025
Minimalist approach to test-time compute scaling that matches or exceeds far more complex methods.
- 06 Chain of Draft: Thinking Faster by Writing Less 2025
LLMs reason effectively with much sparser intermediate steps, an efficiency counterpoint to chain-of-thought.
- 07 Learning to Reason in 13 Parameters (TinyLoRA) 2026
Brings an 8B model to 91% GSM8K accuracy with only 13 trainable parameters via RL. arXiv:2602.04118.
- 08 Building Effective AI Agents Anthropic · 2025
The canonical practical guide to agent construction. Shaped how the field thinks about agent architectures and the workflow-vs-autonomy spectrum.
- 09 Why Do Multi-Agent LLM Systems Fail? (MAST) Cemri et al. · 2025
First failure taxonomy for multi-agent systems: 14 failure modes across 3 categories. A sobering reality check on benchmark gains.
- 10 Multi-agentic Software Development is a Distributed Systems Problem 2025
Reframes multi-agent coordination through distributed systems theory: consensus, partial failure, message passing.
- 11 Expensively Quadratic: the LLM Agent Cost Curve 2025
Rigorous analysis of why agent costs grow quadratically with conversation length. Essential for production agent systems.
- 12 Agentic Software Engineering: Foundational Pillars and a Research Roadmap 2025
Comprehensive research agenda at the intersection of agents and software engineering.
- 13 AsymFlow: Asymmetric Flow Models Hansheng Chen et al. (Stanford/Princeton) · 2026
1.57 FID on ImageNet 256x256. Introduces a latent-to-pixel finetuning route.
- 14 Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity Zhang et al. · 2025
Identifies typicality bias as the root cause of mode collapse. Training-free, 1.6-2.1x diversity improvement.
- 15 Specifications: The missing link to making the development of LLM systems an engineering discipline 2025
Argues that formal specifications are the key to transforming LLM development from alchemy into engineering.
- 16 Spec-Driven Development with AI Personal synthesis · 2025
Synthesis of 45+ papers on how specifications, formal methods, and AI intersect to create reliable software engineering workflows.
- 17 Augmented Coding: Beyond the Vibes Kent Beck · 2025
A sober, thoughtful assessment of AI-assisted coding from the creator of Extreme Programming and TDD.
- 18 QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks Hu et al. · 2025
SOTA on MBPP and HumanEval. Dynamic quality-check gating for code generation workflows.
- 19 Cognitive Load is What Matters 2024
The fundamental constraint in software is human working memory, not tool speed or language features.
- 20 Malleable Software in the Age of LLMs 2023
How LLMs change the nature of software itself, from fixed artifacts to fluid, recomposable systems.
- 21 TextGrad: Automatic Differentiation via Text 2024
Applies backpropagation-style optimization to text, using LLM-written feedback as textual gradients.
- 22 MegaTrain: Full Precision Training of 100B+ Parameter LLMs on a Single GPU 2026
120B params on a single H200. 1.84x throughput over DeepSpeed ZeRO-3. arXiv:2604.05091.
- 23 From Local to Global: A GraphRAG Approach to Query-Focused Summarization Microsoft · 2024
The paper that launched the GraphRAG paradigm. Directly relevant to the graphrag-claude-code project.
- 24 The Illustrated Transformer Jay Alammar · 2018
The most widely referenced educational resource on transformer architecture. Visual clarity on the mechanism that started it all.
- 25 Magic Ink: Information Software and the Graphical Interface Bret Victor · 2006
A vision for software as a medium for understanding rather than a toolbox for tasks.
- 26 The Homogenizing Effect of Large Language Models on Human Expression and Thought Sourati et al. · 2025
Evidence across linguistics, psychology, and CS that LLMs risk standardizing language and reasoning.
- 27 The Space of Minds Andrej Karpathy · 2025
A philosophical exploration of what kinds of intelligence are possible, grounding AI capabilities in a broader framework.
Last updated May 22, 2026