Building Better Memory Systems for AI Agents
Why most multi-agent systems forget everything between conversations, and five architectural patterns that fix it.
I spent six months building a multi-agent platform where specialized AI agents collaborate on tasks for users. The agents were good at their jobs. The orchestration worked. The tool integrations were solid. But users kept hitting the same wall: "Why does my AI keep forgetting what I told it?"
The answer was embarrassingly simple. We had built a sophisticated agent system on top of a memory architecture that treated every conversation like a blank slate. The agents could reason, search the web, check calendars, and coordinate with each other. But they couldn't remember that the user hates cilantro, prefers window seats, or told a different agent about their sister's birthday last week.
Memory is the foundation that everything else sits on. Get it wrong and your agents are brilliant strangers. Get it right and they become trusted collaborators who compound knowledge over time.
The Problem: Stateless by Default
Most agent frameworks handle memory as an afterthought. LangChain gives you a conversation buffer. LangGraph lets you persist state between nodes. But "memory" in these frameworks usually means "the last N messages in this conversation." That's not memory. That's a chat log.
Real memory needs to solve three problems that chat logs don't:
| Problem | Chat Log Answer | Real Memory Answer |
|---|---|---|
| Cross-conversation recall | Starts fresh every session | Facts persist across conversations and channels |
| Multi-agent knowledge sharing | Each agent has its own buffer | Agents build individual + shared knowledge |
| Entity relationships | "User said X" | "Devon said X, Sarah disagreed, Marcus suggested Y" |
If you're building agents that interact with the same users over time, you need architecture that goes beyond conversation buffers. Here are five patterns that moved us from "brilliant strangers" to "trusted collaborators."
Pattern 1: Tiered Memory Architecture
Human memory isn't a single system. You have working memory (what you're thinking about right now), episodic memory (specific experiences), and semantic memory (general knowledge and facts). Agent memory should mirror this.
Near-Term: Conversation Context
The current conversation's message history, trimmed to fit the model's context window. This is what most frameworks give you out of the box. It handles the immediate back-and-forth but evaporates when the conversation ends.
Mid-Term: Session and Channel Memory
Facts and context scoped to a specific channel or interaction thread. Persists across messages within a session but doesn't bleed into unrelated conversations. Think of it as a meeting's shared notes: useful during the meeting, archived after.
Long-Term: Knowledge Graph
Extracted facts, relationships, and preferences stored in a graph database or dedicated memory service like Zep. Persists indefinitely. Queryable across conversations. This is where 'my user prefers window seats' lives.
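The three tiers can be sketched as a single data structure. This is a minimal in-memory illustration, not a production store; the class and method names are assumptions, and in a real system the long-term tier would be backed by a graph database or a memory service like Zep.

```python
from dataclasses import dataclass, field


@dataclass
class TieredMemory:
    """Illustrative three-tier store; names and shapes are assumptions."""
    near_term: list = field(default_factory=list)   # current conversation messages
    mid_term: dict = field(default_factory=dict)    # facts keyed by channel id
    long_term: dict = field(default_factory=dict)   # persistent facts keyed by entity

    def remember_turn(self, message: str) -> None:
        # Working memory: just the running transcript, trimmed elsewhere
        # to fit the model's context window.
        self.near_term.append(message)

    def note_channel_fact(self, channel_id: str, fact: str) -> None:
        # Session-scoped: persists within the channel, never bleeds across it.
        self.mid_term.setdefault(channel_id, []).append(fact)

    def store_fact(self, entity: str, fact: str) -> None:
        # Long-term: survives across conversations. In production this is
        # a knowledge graph, not a dict.
        self.long_term.setdefault(entity, []).append(fact)
```

The point of the separation is that each tier has a different lifetime and a different eviction story, so conflating them in one buffer is what produces "brilliant strangers."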
Practical Tip
You rarely need to build all three tiers yourself. The near-term tier is what frameworks already give you, and a dedicated memory service like Zep can own the long-term tier. The mid-term tier, scoped to your channel model, is usually the one you end up building by hand.
Pattern 2: Entity Disambiguation
This was our most impactful fix. When syncing conversation messages to a knowledge graph, we were labeling every human message as "User" and every AI message as "AI." In a one-on-one chat, that's fine. In a group conversation with three humans and two agents, it's useless.
The knowledge graph couldn't distinguish between "User prefers Thai food" (said by Devon) and "User prefers Italian food" (said by Sarah). Both got stored as facts about "User." Contradictory knowledge with no way to resolve the conflict.
The Fix
Tag every message with the participant's actual identity before syncing to the knowledge graph. We use a format like `Display Name (@handle)`, which gives the graph enough information to build distinct entity nodes.
This seems obvious in hindsight. But most knowledge graph services expect a "name" field per message, and most agent frameworks pass a generic role like "human" or "assistant." You have to deliberately bridge that gap in your sync pipeline.
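Bridging that gap can be as simple as a resolution step in the sync pipeline. A sketch, where the shape of the `participants` lookup is an assumption:

```python
def tag_message(msg: dict, participants: dict) -> dict:
    """Replace generic roles with real identities before graph sync.

    `participants` maps a sender id to profile info; its shape here
    is an assumption for illustration.
    """
    profile = participants.get(msg.get("sender_id"))
    if profile:
        # "Display Name (@handle)" gives the graph a distinct entity node.
        name = f'{profile["display_name"]} (@{profile["handle"]})'
    else:
        # Fall back to the generic role when the sender can't be resolved.
        name = msg.get("role", "User")
    return {**msg, "name": name}
```

Now "prefers Thai food" is attributed to `Devon (@devon)` and "prefers Italian food" to `Sarah (@sarah)`, and the graph can hold both without contradiction.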
Pattern 3: Trust Boundaries for Knowledge Acquisition
Not all conversations should contribute to long-term memory equally. If your agent has an open API or public-facing chat widget, anyone can talk to it. Should a random interaction reshape what the agent "knows" about the user's preferences?
We implemented trust boundaries: a classification of whether a conversation's participants are trusted enough to contribute to permanent knowledge. The agent can still participate in any conversation (using near-term context), but only trusted conversations flow into the long-term knowledge graph.
- Trusted participants: Friends, team members, explicitly connected users. Their conversations build long-term knowledge.
- Untrusted participants: Anonymous users, public interactions, one-off conversations. The agent responds using near-term context only.
- The decision point: Check trust status at the memory sync boundary, not the conversation boundary. The agent still chats normally; it just doesn't persist anything long-term.
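The check itself is a boolean gate at the sync boundary. The policy below, where every human participant must be trusted before anything persists, is one reasonable choice, not the only one; the function and field names are illustrative:

```python
def should_sync_to_long_term(participants: list[dict], trusted_ids: set[str]) -> bool:
    """Gate long-term memory on participant trust.

    Runs at the memory-sync boundary, not the conversation boundary:
    the agent replies either way. `trusted_ids` stands in for a lookup
    of friends, team members, and explicitly connected users.
    """
    humans = [p for p in participants if p.get("kind") == "human"]
    # Policy choice: only conversations where every human is trusted
    # contribute to permanent knowledge.
    return all(p["id"] in trusted_ids for p in humans)
```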
The Analogy
You'll talk shop with anyone at a conference, but you don't rewrite your mental model of a close colleague based on something a stranger said in the hallway. The agent works the same way: it converses freely with everyone, but only trusted voices shape its lasting beliefs.
Pattern 4: Multi-Participant Memory Sync
In a multi-agent system, group conversations are common. Three agents and two humans in a channel, working on a task together. The naive approach is to sync memory only for the agent that generated the response. After all, that's the agent whose turn it was.
The problem: agents that were "listening" but not responding don't build any memory of the conversation. Next time you ask Agent B about something that was discussed while Agent A was responding, Agent B has no knowledge of it.
Sync to All Participants, Not Just the Responder
After each conversation turn, dispatch memory sync to every agent that is a participant in the channel. Each agent builds its own knowledge graph entry from the same messages. The strategic agent extracts strategic insights. The content agent extracts editorial preferences. Same conversation, different knowledge.
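The fan-out itself is small. This sketch takes an `enqueue` callable standing in for whatever background task queue you use; the names are assumptions:

```python
def dispatch_memory_sync(agent_ids: list[str], messages: list[dict], enqueue) -> None:
    """Fan one turn's messages out to every agent participant.

    Each agent gets its own sync job over the same messages, so agents
    that were listening but not responding still build memory.
    `enqueue(agent_id, messages)` stands in for a task-queue API.
    """
    for agent_id in agent_ids:
        # One job per participant, not just the responder.
        enqueue(agent_id, messages)
```

Each agent's sync job then extracts knowledge through its own lens, which is how the strategic agent and the content agent end up remembering different things from the same conversation.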
Watch the Cost
Syncing to every participant multiplies your memory writes: a channel with five agents generates five sync jobs per turn. Budget for that fan-out, and run it asynchronously so it never blocks a response.
Pattern 5: Context Filtering at the Sync Boundary
Modern agents use tools: calendar lookups, web searches, API calls, file uploads. Each tool invocation generates messages in the conversation history. A single user question might produce five tool-call messages and five tool-response messages before the agent gives its final answer.
If you sync the entire conversation to the knowledge graph, half your stored knowledge is "Calling calendar API with parameters..." and "Search results for 'best Thai restaurants'..." That's not knowledge. That's plumbing.
Filter by Message Role
This is a single line of filtering that dramatically improves the signal-to-noise ratio of your knowledge graph. Human messages carry intent and preferences. AI messages carry synthesized responses. Tool messages carry intermediate artifacts that don't need to persist.
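In Python that single line is a list comprehension over message roles. A sketch, assuming messages carry a `role` field as in most frameworks:

```python
def filter_for_sync(messages: list[dict]) -> list[dict]:
    # Keep human intent and AI synthesis; drop tool-call plumbing.
    return [m for m in messages if m["role"] in ("human", "ai")]
```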
The result is measurable: with tool messages filtered, more of the knowledge graph's capacity goes to actual user preferences, decisions, and conversational context. The agent's recall becomes more relevant because the stored memories are more relevant.
Putting It Together: The Memory Pipeline
These five patterns form a pipeline that runs after every conversation turn. The order matters:
1. Filter tool messages from the conversation.
2. Resolve participant identities.
3. Check trust boundaries.
4. Dispatch to all agent participants.
5. Sync to the knowledge graph.
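The whole pipeline can be sketched in one function. Everything here is illustrative: the field names, the `participants` lookup shape, and the `enqueue` callable standing in for a task queue and graph sync:

```python
def run_memory_pipeline(messages, participants, trusted_ids, agent_ids, enqueue):
    """One turn of the memory pipeline; all names are illustrative.

    Order matters: filter, resolve identities, gate on trust, then fan out.
    """
    # 1. Filter tool messages from the conversation.
    kept = [m for m in messages if m["role"] in ("human", "ai")]

    # 2. Resolve participant identities ("Display Name (@handle)").
    for m in kept:
        profile = participants.get(m.get("sender_id"))
        if profile:
            m["name"] = f'{profile["display_name"]} (@{profile["handle"]})'

    # 3. Check trust boundaries at the sync boundary.
    humans = [p for p in participants.values() if p.get("kind") == "human"]
    if not all(p["id"] in trusted_ids for p in humans):
        return  # the agent still replied; we just persist nothing

    # 4-5. Dispatch a knowledge-graph sync job per agent participant.
    for agent_id in agent_ids:
        enqueue(agent_id, kept)
```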
Make It Async
This pipeline should never block the response. Dispatch it to a background task queue after the reply is sent; synchronous memory writes show up as latency on every single message, and users feel it.
Anti-Patterns to Avoid
| Anti-Pattern | Why It Hurts | Better Approach |
|---|---|---|
| Syncing entire conversation history | Knowledge graph fills with tool artifacts and internal plumbing | Filter to human + AI messages only |
| Generic "User" / "AI" labels | Contradictory facts from different people stored as one entity | Resolve actual participant names before sync |
| Treating all conversations equally | Random interactions pollute the agent's permanent knowledge | Trust boundaries: only trusted conversations build long-term memory |
| Single-agent sync in group conversations | Non-responding agents have no memory of what was discussed | Dispatch to all agent participants |
| Synchronous memory writes | Adds latency to every response; users feel the delay | Async dispatch via background task queue |
The Compound Effect
Good memory architecture creates a flywheel. Each conversation makes the next one better. After a week, the agent knows your basic preferences. After a month, it understands your patterns. After three months, it anticipates your needs because it has built a rich, attributed, trust-filtered knowledge base from hundreds of interactions.
Bad memory architecture creates the opposite: a flat experience where every conversation starts from the same baseline, or worse, a noisy knowledge graph full of contradictions and tool artifacts that actively degrades the agent's ability to be helpful.
The five patterns in this post aren't individually complex. Entity disambiguation is a name lookup. Trust boundaries are a boolean check. Context filtering is one line of list comprehension. But together, they transform the quality of what your agents remember and how they apply that knowledge.
Memory is the difference between an AI that answers questions and an AI that knows you. If you're building multi-agent systems and your users are re-explaining themselves across sessions, the problem isn't your agents. It's your memory architecture.
Build the pipeline. Tag the entities. Filter the noise. Trust the compound effect.
Also by Devon: A product version of this work, written for ASI:One, explains how your Personal AI now remembers who said what, learns only from people you trust, and keeps every conversation in context.
Building memory into your agent stack?
I'm working on memory architecture for multi-agent systems and would love to compare notes. Reach out if you're solving similar problems.