Agent Memory with Vector Stores: HNSW, Forgetting, and Budgets

At 1M stored memories, exact cosine search takes about 1,000 ms per query. HNSW approximate search takes 1.28 ms, a 783× speedup, with 95% recall. That 5% recall loss rarely matters in practice: the memory HNSW misses is typically the 5th-most-similar match, not the most important one. But memory management does not stop at search latency. Which memories you keep, how you score and rank them, and how many you inject into context: these decisions determine whether your agent actually uses its memory or wastes its context budget on noise.
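To make the exact-versus-approximate comparison concrete, here is a minimal pure-Python sketch of what exact cosine search does, along with one common way to measure a recall figure like 95%: treat the exact top-k as ground truth and count how many of those results the approximate index also returned. The names `exact_top_k` and `recall_at_k`, the dict-of-vectors memory store, and the toy vectors are all illustrative assumptions, not any library's API, and the sketch assumes every vector is non-zero.

```python
import heapq
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query: Sequence[float],
                memories: Dict[str, List[float]],
                k: int) -> List[str]:
    """Brute-force search (hypothetical helper): scores EVERY stored memory,
    so cost grows linearly with the number of memories."""
    best = heapq.nlargest(k, memories.items(),
                          key=lambda item: cosine(query, item[1]))
    return [mem_id for mem_id, _ in best]


def recall_at_k(approx_ids: List[str], exact_ids: List[str]) -> float:
    """Fraction of the true top-k that the approximate search also found."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Toy memory store: ids mapped to 2-D embeddings (illustrative only).
memories = {
    "m0": [1.0, 0.0],
    "m1": [0.9, 0.1],
    "m2": [0.0, 1.0],
    "m3": [0.7, 0.7],
}
query = [1.0, 0.05]

exact = exact_top_k(query, memories, k=3)  # ground truth: ['m0', 'm1', 'm3']
approx = ["m0", "m1", "m2"]                # stand-in for an ANN index's output
print(exact, recall_at_k(approx, exact))   # 2 of the 3 true neighbours found
```

`heapq.nlargest` avoids fully sorting all N scores, but it still has to compute a similarity for every stored memory. That per-query scan over all N vectors is exactly the cost an HNSW index is built to avoid.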

Disclaimer: The opinions expressed in this article are my own and do not represent the views of Google. This content is based solely on publicly available information.

Search Latency vs Memory Size

Before choosing a search strategy, it helps to see exactly how the two approaches scale as memory grows from hundreds of entries to millions.

Exact search is O(N) per query: every new memory makes queries proportionally slower. HNSW (Malkov & Yashunin, 2016) navigates a layered graph and costs roughly O(log N) per query when the search-time ef budget is held constant, so latency…