Building Better Memory Systems for AI Agents
Why most multi-agent systems forget everything between conversations, and five architectural patterns that fix it.
I spent six months building a multi-agent platform where specialized AI agents collaborate on tasks for users. The agents were good at their jobs. The orchestration worked. The tool integrations were solid. But users kept hitting the same wall: "Why does my AI keep forgetting what I told it?"
The answer was embarrassingly simple. We had built a sophisticated agent system on top of a memory architecture that treated every conversation like a blank slate. The agents could reason, search the web, check calendars, and coordinate with each other. But they couldn't remember that the user hates cilantro, prefers window seats, or told a different agent about their sister's birthday last week.
Memory is the foundation that everything else sits on. Get it wrong and your agents are brilliant strangers. Get it right and they become trusted collaborators who compound knowledge over time.
The Problem: Stateless by Default
Most agent frameworks handle memory as an afterthought. LangChain gives you a conversation buffer. LangGraph lets you persist state between nodes. But "memory" in these frameworks usually means "the last N messages in this conversation." That's not memory. That's a chat log.
Real memory needs to solve three problems that chat logs don't:
| Problem | Chat Log Answer | Real Memory Answer |
|---|---|---|
| Cross-conversation recall | Starts fresh every session | Facts persist across conversations and channels |
| Multi-agent knowledge sharing | Each agent has its own buffer | Agents build individual + shared knowledge |
| Entity relationships | "User said X" | "Devon said X, Sarah disagreed, Marcus suggested Y" |
If you're building agents that interact with the same users over time, you need architecture that goes beyond conversation buffers. Here are five patterns that moved us from "brilliant strangers" to "trusted collaborators."
Pattern 1: Tiered Memory Architecture
Human memory isn't a single system. You have working memory (what you're thinking about right now), episodic memory (specific experiences), and semantic memory (general knowledge and facts). Agent memory should mirror this.
Near-Term: Conversation Context
The current conversation's message history, trimmed to fit the model's context window. This is what most frameworks give you out of the box. It handles the immediate back-and-forth but evaporates when the conversation ends.
Mid-Term: Session and Channel Memory
Facts and context scoped to a specific channel or interaction thread. Persists across messages within a session but doesn't bleed into unrelated conversations. Think of it as a meeting's shared notes: useful during the meeting, archived after.
Long-Term: Knowledge Graph
Extracted facts, relationships, and preferences stored in a graph database or dedicated memory service like Zep. Persists indefinitely. Queryable across conversations. This is where 'my user prefers window seats' lives.
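The three tiers can be sketched as a single data structure. This is a minimal in-memory illustration, not a production store; the class and method names are assumptions, and in a real system the long-term tier would be backed by a graph database or a memory service like Zep.

```python
from dataclasses import dataclass, field


@dataclass
class TieredMemory:
    """Illustrative three-tier store; names and shapes are assumptions."""
    near_term: list = field(default_factory=list)   # current conversation messages
    mid_term: dict = field(default_factory=dict)    # facts keyed by channel id
    long_term: dict = field(default_factory=dict)   # persistent facts keyed by entity

    def remember_turn(self, message: str) -> None:
        # Working memory: just the running transcript, trimmed elsewhere
        # to fit the model's context window.
        self.near_term.append(message)

    def note_channel_fact(self, channel_id: str, fact: str) -> None:
        # Session-scoped: persists within the channel, never bleeds across it.
        self.mid_term.setdefault(channel_id, []).append(fact)

    def store_fact(self, entity: str, fact: str) -> None:
        # Long-term: survives across conversations. In production this is
        # a knowledge graph, not a dict.
        self.long_term.setdefault(entity, []).append(fact)
```

The point of the separation is that each tier has a different lifetime and a different eviction story, so conflating them in one buffer is what produces "brilliant strangers."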
Practical Tip
You rarely need to build all three tiers yourself. The near-term tier is what frameworks already give you, and a dedicated memory service like Zep can own the long-term tier. The mid-term tier, scoped to your channel model, is usually the one you end up building by hand.
Pattern 2: Entity Disambiguation
This was our most impactful fix. When syncing conversation messages to a knowledge graph, we were labeling every human message as "User" and every AI message as "AI." In a one-on-one chat, that's fine. In a group conversation with three humans and two agents, it's useless.
The knowledge graph couldn't distinguish between "User prefers Thai food" (said by Devon) and "User prefers Italian food" (said by Sarah). Both got stored as facts about "User." Contradictory knowledge with no way to resolve the conflict.
The Fix
Tag every message with the participant's actual identity before syncing to the knowledge graph. We use a format like `Display Name (@handle)`, which gives the graph enough information to build distinct entity nodes.
This seems obvious in hindsight. But most knowledge graph services expect a "name" field per message, and most agent frameworks pass a generic role like "human" or "assistant." You have to deliberately bridge that gap in your sync pipeline.
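Bridging that gap can be as simple as a resolution step in the sync pipeline. A sketch, where the shape of the `participants` lookup is an assumption:

```python
def tag_message(msg: dict, participants: dict) -> dict:
    """Replace generic roles with real identities before graph sync.

    `participants` maps a sender id to profile info; its shape here
    is an assumption for illustration.
    """
    profile = participants.get(msg.get("sender_id"))
    if profile:
        # "Display Name (@handle)" gives the graph a distinct entity node.
        name = f'{profile["display_name"]} (@{profile["handle"]})'
    else:
        # Fall back to the generic role when the sender can't be resolved.
        name = msg.get("role", "User")
    return {**msg, "name": name}
```

Now "prefers Thai food" is attributed to `Devon (@devon)` and "prefers Italian food" to `Sarah (@sarah)`, and the graph can hold both without contradiction.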
Pattern 3: Trust Boundaries for Knowledge Acquisition
Not all conversations should contribute to long-term memory equally. If your agent has an open API or public-facing chat widget, anyone can talk to it. Should a random interaction reshape what the agent "knows" about the user's preferences?
We implemented trust boundaries: a classification of whether a conversation's participants are trusted enough to contribute to permanent knowledge. The agent can still participate in any conversation (using near-term context), but only trusted conversations flow into the long-term knowledge graph.
- Trusted participants: Friends, team members, explicitly connected users. Their conversations build long-term knowledge.
- Untrusted participants: Anonymous users, public interactions, one-off conversations. The agent responds using near-term context only.
- The decision point: Check trust status at the memory sync boundary, not the conversation boundary. The agent still chats normally; it just doesn't persist anything long-term.
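The check itself is a boolean gate at the sync boundary. The policy below, where every human participant must be trusted before anything persists, is one reasonable choice, not the only one; the function and field names are illustrative:

```python
def should_sync_to_long_term(participants: list[dict], trusted_ids: set[str]) -> bool:
    """Gate long-term memory on participant trust.

    Runs at the memory-sync boundary, not the conversation boundary:
    the agent replies either way. `trusted_ids` stands in for a lookup
    of friends, team members, and explicitly connected users.
    """
    humans = [p for p in participants if p.get("kind") == "human"]
    # Policy choice: only conversations where every human is trusted
    # contribute to permanent knowledge.
    return all(p["id"] in trusted_ids for p in humans)
```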
The Analogy
You'll talk shop with anyone at a conference, but you don't rewrite your mental model of a close colleague based on something a stranger said in the hallway. The agent works the same way: it converses freely with everyone, but only trusted voices shape its lasting beliefs.
Pattern 4: Multi-Participant Memory Sync
In a multi-agent system, group conversations are common. Three agents and two humans in a channel, working on a task together. The naive approach is to sync memory only for the agent that generated the response. After all, that's the agent whose turn it was.
The problem: agents that were "listening" but not responding don't build any memory of the conversation. Next time you ask Agent B about something that was discussed while Agent A was responding, Agent B has no knowledge of it.
Sync to All Participants, Not Just the Responder
After each conversation turn, dispatch memory sync to every agent that is a participant in the channel. Each agent builds its own knowledge graph entry from the same messages. The strategic agent extracts strategic insights. The content agent extracts editorial preferences. Same conversation, different knowledge.
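The fan-out itself is small. This sketch takes an `enqueue` callable standing in for whatever background task queue you use; the names are assumptions:

```python
def dispatch_memory_sync(agent_ids: list[str], messages: list[dict], enqueue) -> None:
    """Fan one turn's messages out to every agent participant.

    Each agent gets its own sync job over the same messages, so agents
    that were listening but not responding still build memory.
    `enqueue(agent_id, messages)` stands in for a task-queue API.
    """
    for agent_id in agent_ids:
        # One job per participant, not just the responder.
        enqueue(agent_id, messages)
```

Each agent's sync job then extracts knowledge through its own lens, which is how the strategic agent and the content agent end up remembering different things from the same conversation.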
Watch the Cost
Syncing to every participant multiplies your memory writes: a channel with five agents generates five sync jobs per turn. Budget for that fan-out, and run it asynchronously so it never blocks a response.
Pattern 5: Context Filtering at the Sync Boundary
Modern agents use tools: calendar lookups, web searches, API calls, file uploads. Each tool invocation generates messages in the conversation history. A single user question might produce five tool-call messages and five tool-response messages before the agent gives its final answer.
If you sync the entire conversation to the knowledge graph, half your stored knowledge is "Calling calendar API with parameters..." and "Search results for 'best Thai restaurants'..." That's not knowledge. That's plumbing.
Filter by Message Role
This is a single line of filtering that dramatically improves the signal-to-noise ratio of your knowledge graph. Human messages carry intent and preferences. AI messages carry synthesized responses. Tool messages carry intermediate artifacts that don't need to persist.
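In Python that single line is a list comprehension over message roles. A sketch, assuming messages carry a `role` field as in most frameworks:

```python
def filter_for_sync(messages: list[dict]) -> list[dict]:
    # Keep human intent and AI synthesis; drop tool-call plumbing.
    return [m for m in messages if m["role"] in ("human", "ai")]
```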
The result is measurable: with tool messages filtered, more of the knowledge graph's capacity goes to actual user preferences, decisions, and conversational context. The agent's recall becomes more relevant because the stored memories are more relevant.
Putting It Together: The Memory Pipeline
These five patterns form a pipeline that runs after every conversation turn. The order matters:
1. Filter tool messages from the conversation.
2. Resolve participant identities.
3. Check trust boundaries.
4. Dispatch to all agent participants.
5. Sync to the knowledge graph.
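The whole pipeline can be sketched in one function. Everything here is illustrative: the field names, the `participants` lookup shape, and the `enqueue` callable standing in for a task queue and graph sync:

```python
def run_memory_pipeline(messages, participants, trusted_ids, agent_ids, enqueue):
    """One turn of the memory pipeline; all names are illustrative.

    Order matters: filter, resolve identities, gate on trust, then fan out.
    """
    # 1. Filter tool messages from the conversation.
    kept = [m for m in messages if m["role"] in ("human", "ai")]

    # 2. Resolve participant identities ("Display Name (@handle)").
    for m in kept:
        profile = participants.get(m.get("sender_id"))
        if profile:
            m["name"] = f'{profile["display_name"]} (@{profile["handle"]})'

    # 3. Check trust boundaries at the sync boundary.
    humans = [p for p in participants.values() if p.get("kind") == "human"]
    if not all(p["id"] in trusted_ids for p in humans):
        return  # the agent still replied; we just persist nothing

    # 4-5. Dispatch a knowledge-graph sync job per agent participant.
    for agent_id in agent_ids:
        enqueue(agent_id, kept)
```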
Make It Async
This pipeline should never block the response. Dispatch it to a background task queue after the reply is sent; synchronous memory writes show up as latency on every single message, and users feel it.
Anti-Patterns to Avoid
| Anti-Pattern | Why It Hurts | Better Approach |
|---|---|---|
| Syncing entire conversation history | Knowledge graph fills with tool artifacts and internal plumbing | Filter to human + AI messages only |
| Generic "User" / "AI" labels | Contradictory facts from different people stored as one entity | Resolve actual participant names before sync |
| Treating all conversations equally | Random interactions pollute the agent's permanent knowledge | Trust boundaries: only trusted conversations build long-term memory |
| Single-agent sync in group conversations | Non-responding agents have no memory of what was discussed | Dispatch to all agent participants |
| Synchronous memory writes | Adds latency to every response; users feel the delay | Async dispatch via background task queue |
The Compound Effect
Good memory architecture creates a flywheel. Each conversation makes the next one better. After a week, the agent knows your basic preferences. After a month, it understands your patterns. After three months, it anticipates your needs because it has built a rich, attributed, trust-filtered knowledge base from hundreds of interactions.
Bad memory architecture creates the opposite: a flat experience where every conversation starts from the same baseline, or worse, a noisy knowledge graph full of contradictions and tool artifacts that actively degrades the agent's ability to be helpful.
The five patterns in this post aren't individually complex. Entity disambiguation is a name lookup. Trust boundaries are a boolean check. Context filtering is one line of list comprehension. But together, they transform the quality of what your agents remember and how they apply that knowledge.
Memory is the difference between an AI that answers questions and an AI that knows you. If you're building multi-agent systems and your users are re-explaining themselves across sessions, the problem isn't your agents. It's your memory architecture.
Build the pipeline. Tag the entities. Filter the noise. Trust the compound effect.
Also by Devon: A product version of this work, written for ASI:One, explains how your Personal AI now remembers who said what, learns only from people you trust, and keeps every conversation in context.
Building memory into your agent stack?
I'm working on memory architecture for multi-agent systems and would love to compare notes. Reach out if you're solving similar problems.