Cloudflare can remember it for you wholesale

Not only is hardware memory scarce these days, but context memory – the conversational data exchanged with AI models – can be an issue too.

Cloudflare's answer to this particular problem is Agent Memory, a managed service for siphoning AI conversations when space is scarce, then injecting the data back on demand.

"It gives AI agents persistent memory, allowing them to recall what matters, forget what doesn't, and get smarter over time," said Tyson Trautmann, senior director of engineering, and Rob Sutter, engineering manager, in a blog post.

AI models can accept a limited amount of input, referred to as context. Measured in tokens, the amount varies by model.

Anthropic's Claude Sonnet models, for example, support a 1M token context window, which works out to roughly 750,000 words of English text – though the exact figure depends on the tokenizer and the text being encoded.
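The token-to-word arithmetic is just a rule of thumb, but it's easy to sketch. The 0.75 words-per-token ratio below is a common estimate, not an exact conversion:

```javascript
// Rule-of-thumb conversion from tokens to English words. The 0.75
// words-per-token ratio is a widely used estimate, not an exact figure;
// real counts vary by tokenizer and by the text being encoded.
const wordsFromTokens = (tokens, wordsPerToken = 0.75) =>
  Math.round(tokens * wordsPerToken);

console.log(wordsFromTokens(1_000_000)); // 750000 words in a 1M token window
```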

Google's Gemma 4 family of models has context windows of 128,000 tokens for the smaller models and 256,000 for the larger ones.

That may seem like an ample amount of space for model prompts, but a lot of extra text accompanies every prompt – the system prompt, system tools, custom agents, memory files, skills, messages, and the auto-compact buffer – so the context actually available may be 10 to 20 percent less than the headline figure.
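The overhead math is simple enough to sketch. The token figures here are illustrative, not measured:

```javascript
// Rough illustration of how fixed overhead (system prompt, tools, memory
// files, auto-compact buffer, and so on) eats into a nominal context
// window. The 10-20 percent overhead range is the article's estimate.
function usableContext(windowTokens, overheadFraction) {
  return Math.floor(windowTokens * (1 - overheadFraction));
}

console.log(usableContext(128_000, 0.2)); // 102400 tokens left for the conversation
```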

Storing prompts and responses as "memories" makes the most of available space by providing a place to offload useful chat details that may not be needed for every conversational turn (prompt).

At the same time, more context isn't always better – there may be times when AI models provide better results when given less context. So memory is potentially useful for pulling data out of a conversation as a quality enhancement as well as a storage management option.

There are already various software projects and integrated memory tools available to help remember AI conversations. Cloudflare is proposing that AI memory should be a managed service.

"Agents running for weeks or months against real codebases and production systems need memory that stays useful as it grows — not just memory that performs well on a clean benchmark dataset that may fit entirely into a newer model's context window," wrote Trautmann and Sutter, arguing that this can be done quickly at a reasonable per-query cost, in a way that doesn't block the conversation.

Basically, they're talking about asynchronous CRUD operations. For example, after storing a memory about the user's preferred package manager (e.g. pnpm), that memory could be recalled with the following calls:

const results = await profile.recall("What package manager does the user prefer?");

console.log(results.result); // "The user prefers pnpm over npm."
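The store side isn't shown in that snippet. Here's a minimal, self-contained sketch of the store-then-recall pattern – an in-memory mock for illustration only, not the Agent Memory API. The `remember` method name and the naive keyword matching are assumptions standing in for the service's asynchronous writes and semantic retrieval:

```javascript
// In-memory mock of the store-then-recall pattern (NOT the real Agent
// Memory API). Naive keyword overlap stands in for semantic retrieval.
class MemoryProfile {
  constructor() {
    this.memories = [];
  }

  // Store a memory; the real service does this asynchronously so it
  // doesn't block the conversation.
  async remember(fact) {
    this.memories.push(fact);
  }

  // Return the first stored memory sharing a significant word with the query.
  async recall(query) {
    const words = query.toLowerCase().split(/\W+/).filter(w => w.length > 3);
    const hit = this.memories.find(m =>
      words.some(w => m.toLowerCase().includes(w))
    );
    return { result: hit ?? null };
  }
}

(async () => {
  const profile = new MemoryProfile();
  await profile.remember("The user prefers pnpm over npm.");
  const results = await profile.recall("What package manager does the user prefer?");
  console.log(results.result); // "The user prefers pnpm over npm."
})();
```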

Agent Memory can be accessed via a binding to a Cloudflare Worker, but also via REST API for those outside the Cloudflare Worker ecosystem. It's currently in private beta.

And in case anyone is possessive about their AI chat logs, Trautmann and Sutter offer reassurance that the memory data belongs to the customer.

"Agent Memory is a managed service, but your data is yours," they wrote. "Every memory is exportable, and we're committed to making sure the knowledge your agents accumulate on Cloudflare can leave with you if your needs change."

That's a touching thought, though some work might be required to take your text dump of conversations and make memories functional on another platform. ®

Source: The Register
