I Tested a 3,300-Line Agent on 18 PC Tasks — It Shouldn't Beat Claude Code by 6×

A Fudan professor open-sourced an agent whose entire core is 3,300 lines of Python. I ran it against Claude Code and OpenClaw on the same 18 desktop tasks. It finished on roughly one-sixth of the tokens Claude Code needed, and I cannot find the catch.

Here is the number that made me re-run the whole benchmark twice, because I assumed I had a logging bug: across 18 real PC-automation tasks, with all three agents running on the same backbone model, GenericAgent burned 0.43 million tokens. Claude Code burned 2.6 million on the identical task list. That is a 6.0× gap, and it came from an agent whose core is just 3,300 lines of Python.
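If you want to sanity-check that ratio yourself, here is a minimal sketch of the arithmetic. It uses only the rounded totals quoted above (0.43 million and 2.6 million tokens over 18 tasks); the function and variable names are mine, for illustration only, and are not taken from any of the three agents or from my benchmark harness.

```python
def token_ratio(baseline_tokens: float, candidate_tokens: float) -> float:
    """Return how many times more tokens the baseline used than the candidate."""
    if candidate_tokens <= 0:
        raise ValueError("candidate_tokens must be positive")
    return baseline_tokens / candidate_tokens


# Rounded totals as quoted in the article, not raw log values.
NUM_TASKS = 18
claude_code_tokens = 2.6e6
generic_agent_tokens = 0.43e6

ratio = token_ratio(claude_code_tokens, generic_agent_tokens)
saved_fraction = 1 - generic_agent_tokens / claude_code_tokens

# Rough per-task averages; individual tasks will vary widely.
claude_per_task = claude_code_tokens / NUM_TASKS
generic_per_task = generic_agent_tokens / NUM_TASKS

print(f"ratio: {ratio:.1f}x")
print(f"tokens saved: {saved_fraction:.0%}")
print(f"avg per task: {generic_per_task:,.0f} vs {claude_per_task:,.0f}")
```

Because the inputs are rounded to two significant figures, the result is only good to about one decimal place, which is why I quote the gap as 6.0× and not anything more precise.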

GenericAgent is a project from Jiaqing Liang, an assistant professor at Fudan University’s School of Data Science in Shanghai. V1.0 dropped on GitHub on January 11, 2026, and the technical report, “GenericAgent: A Token-Efficient Self-Evolving LLM Agent via Contextual Information Density Maximization,” landed on arXiv on April 18, 2026 (paper ID 2604.17091). The repo is small by viral-agent standards: 697 stars, 187 commits, MIT licensed. I almost scrolled past it. What stopped me was a single line in the README comparison table.