Add Long-Term Memory to a Chat App

LLMs are stateless — every API call starts from scratch with no memory of past conversations. Engram adds persistent memory so your chatbot can remember user preferences, past discussions, and context across sessions.

In this tutorial, you'll build a chat app that:

  • Stores conversations as memories after each exchange
  • Retrieves relevant memories before generating a response
  • Personalizes responses based on what it remembers about the user

Prerequisites

  • An Engram project with an API key (Quickstart)
  • An Anthropic or OpenAI API key
  • Python packages: pip install weaviate-engram anthropic openai

Set your environment variables:

export ENGRAM_API_KEY="eng_..."
export ANTHROPIC_API_KEY="sk-ant-..." # or OPENAI_API_KEY

Step 1: Set up the Engram client

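Initialize the Engram client with your API key. The user_id value is passed to each memories call and scopes all memories to a specific user, so different users get isolated memory stores. This snippet generates a new random ID on every run; for memories to persist across sessions, reuse a stable ID per user, as the full script in Step 5 does with "user-123".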

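import os
import uuid
from engram import EngramClient  # import path assumed; confirm against the Engram Quickstart

client = EngramClient(
    api_key=os.environ["ENGRAM_API_KEY"]
)
user_id = f"tutorial-chat-{uuid.uuid4().hex[:8]}"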
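Because user_id determines which memory store is read and written, it helps to derive it from something stable in your application. The following is a minimal sketch, assuming your app has an account identifier such as an email address; `stable_user_id` is a hypothetical helper, not part of the Engram SDK.

```python
import hashlib


def stable_user_id(account_identifier: str, prefix: str = "chat") -> str:
    """Derive a deterministic user_id from an application-level identifier.

    Hypothetical helper for illustration; not part of the Engram SDK.
    """
    # Normalize so that trivial differences (case, whitespace) map to one ID.
    normalized = account_identifier.strip().lower()
    # Hash the identifier so the raw email is not used as the ID itself.
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    return f"{prefix}-{digest}"


user_id = stable_user_id("Alice@example.com")
```

The same account always maps to the same user_id, so memories written in one session are found in the next.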

Step 2: Create a chat completion function

Create a helper function that sends messages to your LLM provider and returns the response. This function accepts a system_prompt parameter — you'll use it later to inject memory context.

def chat_anthropic(
    user_message,
    conversation_history,
    anthropic_client,
    system_prompt="You are a helpful assistant.",
):
    conversation_history.append({"role": "user", "content": user_message})
    response = anthropic_client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        system=system_prompt,
        messages=conversation_history,
    )
    assistant_message = response.content[0].text
    conversation_history.append({"role": "assistant", "content": assistant_message})
    return assistant_message
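If you use OpenAI instead of Anthropic, the helper might look like the sketch below. It assumes the OpenAI Python SDK's Chat Completions interface, where the system prompt is sent as the first message rather than as a separate parameter. The function name `chat_openai` and the default model name are illustrative choices, not taken from the tutorial.

```python
def chat_openai(
    user_message,
    conversation_history,
    openai_client,
    system_prompt="You are a helpful assistant.",
    model="gpt-4o",  # assumed model name; substitute one your account can use
):
    """Send the conversation to OpenAI and return the assistant's reply."""
    conversation_history.append({"role": "user", "content": user_message})
    response = openai_client.chat.completions.create(
        model=model,
        # The system prompt is prepended per request and is not stored in the
        # history, so it can change on every turn as new memories are found.
        messages=[{"role": "system", "content": system_prompt}] + conversation_history,
    )
    assistant_message = response.choices[0].message.content
    conversation_history.append({"role": "assistant", "content": assistant_message})
    return assistant_message


# Usage (requires OPENAI_API_KEY to be set):
#   from openai import OpenAI
#   reply = chat_openai("Hello!", [], OpenAI())
```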

Step 3: Store conversations as memories

After each conversation, send the messages to Engram. The pipeline extracts discrete facts (e.g. "lives in Berlin", "prefers specialty coffee") and stores them as individual memories.

conversation = [
    {
        "role": "user",
        "content": "I just moved to Berlin and I'm looking for a good coffee shop.",
    },
    {
        "role": "assistant",
        "content": "Welcome to Berlin! Here are some popular coffee shops in the city...",
    },
    {"role": "user", "content": "I prefer specialty coffee, not chains."},
]

run = client.memories.add(
    conversation,
    user_id=user_id,
)

print(f"Run ID: {run.run_id}")
print(f"Status: {run.status}")

memories.add() accepts a list of message dicts with role and content keys — the same format used by Anthropic and OpenAI. The call returns immediately with a run_id — there's no need to wait for the pipeline to complete before continuing. Memories are eventually consistent and will be available for search once processing finishes.

This delay is acceptable in practice because memories are most useful across sessions or for details from much earlier in a conversation. The most recent messages are still in the LLM's context window, so you don't need to wait for them to be stored as memories before generating the next response.
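Since the call returns before the pipeline runs, a malformed message list may not surface as an immediate error. The sketch below checks the role/content shape described above before sending. `validate_messages` is a hypothetical helper, and the set of allowed roles is an assumption rather than something the Engram documentation specifies.

```python
def validate_messages(messages):
    """Return a cleaned copy of messages, or raise if the shape is wrong.

    Hypothetical helper for illustration; not part of the Engram SDK.
    """
    allowed_roles = {"user", "assistant"}  # assumption; adjust if other roles are accepted
    cleaned = []
    for index, message in enumerate(messages):
        if not isinstance(message, dict):
            raise TypeError(f"message {index} is not a dict")
        role = message.get("role")
        content = message.get("content")
        if role not in allowed_roles:
            raise ValueError(f"message {index} has unexpected role {role!r}")
        if not isinstance(content, str) or not content.strip():
            raise ValueError(f"message {index} has empty or non-string content")
        # Keep only the two documented keys.
        cleaned.append({"role": role, "content": content})
    return cleaned


# Usage: run = client.memories.add(validate_messages(conversation), user_id=user_id)
```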

Step 4: Retrieve relevant memories

Before generating a response, search Engram for memories relevant to the user's current message. Format the results into a system prompt that gives the LLM context about the user.

from engram import HybridRetrieval  # import path assumed; confirm against the Engram docs

results = client.memories.search(
    query="What kind of coffee does the user like?",
    user_id=user_id,
    group="default",
    retrieval_config=HybridRetrieval(limit=5),
)

memory_context = "\n".join(f"- {m.content}" for m in results)

system_prompt = f"""You are a helpful assistant with memory of past conversations.

Here is what you remember about this user:
{memory_context}

Use these memories to personalize your responses."""

print(system_prompt)

Hybrid search combines semantic similarity with keyword matching, so it finds relevant memories even when the wording doesn't match exactly.
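For a new user the search returns nothing, and the prompt from Step 4 would then contain an empty "what you remember" section. One way to handle this is sketched below; `build_system_prompt` is a hypothetical helper that only assumes each result exposes a `content` attribute, as the snippet in this step does. The `SimpleNamespace` objects stand in for real search results.

```python
from types import SimpleNamespace

BASE_PROMPT = "You are a helpful assistant with memory of past conversations."


def build_system_prompt(memories):
    """Format search results into a system prompt, omitting the memory
    section entirely when there is nothing to show."""
    lines = [f"- {m.content}" for m in memories if getattr(m, "content", "")]
    if not lines:
        return BASE_PROMPT
    memory_context = "\n".join(lines)
    return (
        f"{BASE_PROMPT}\n\n"
        "Here is what you remember about this user:\n"
        f"{memory_context}\n\n"
        "Use these memories to personalize your responses."
    )


# Stand-ins for Engram search results, for illustration only.
fake_results = [
    SimpleNamespace(content="Lives in Berlin"),
    SimpleNamespace(content="Prefers specialty coffee"),
]
print(build_system_prompt(fake_results))
```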

Step 5: Build the full chat loop

Here's the complete script that ties everything together. Each iteration of the loop:

  1. Searches Engram for memories relevant to the user's input
  2. Builds a system prompt with the retrieved context
  3. Sends the message to the LLM with memory-augmented context
  4. Stores the new exchange as a memory for future retrieval

import os

import anthropic
from engram import EngramClient, HybridRetrieval  # import path assumed; confirm against the Engram docs


def memory_chat_loop_anthropic():
    """Complete chat loop with Engram memory and Anthropic."""
    engram = EngramClient(
        api_key=os.environ["ENGRAM_API_KEY"],
    )
    anthropic_client = anthropic.Anthropic()
    user_id = "user-123"
    conversation_history = []

    print("Chat with memory (type 'quit' to exit)\n")

    while True:
        user_input = input("You: ")
        if user_input.lower() == "quit":
            break

        # Retrieve relevant memories
        results = engram.memories.search(
            query=user_input,
            user_id=user_id,
            group="default",
            retrieval_config=HybridRetrieval(limit=5),
        )
        memory_context = "\n".join(f"- {m.content}" for m in results)
        system_prompt = f"""You are a helpful assistant with memory of past conversations.

Here is what you remember about this user:
{memory_context}

Use these memories to personalize your responses."""

        # Get response from Claude
        conversation_history.append({"role": "user", "content": user_input})
        response = anthropic_client.messages.create(
            model="claude-sonnet-4-5-20250929",
            max_tokens=1024,
            system=system_prompt,
            messages=conversation_history,
        )
        assistant_message = response.content[0].text
        conversation_history.append({"role": "assistant", "content": assistant_message})
        print(f"Assistant: {assistant_message}\n")

        # Store the conversation turn as a memory (fire-and-forget)
        engram.memories.add(
            [conversation_history[-2], conversation_history[-1]],
            user_id=user_id,
        )

    engram.close()


if __name__ == "__main__":
    memory_chat_loop_anthropic()
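The loop above sends the full conversation_history on every request, so the prompt grows with each turn. Because older context is available through memory search, one option is to send only a recent window. The sketch below is an assumption about how you might do that, not part of the tutorial's script. `trim_history` is a hypothetical helper, and the default window size is arbitrary.

```python
def trim_history(conversation_history, max_messages=20):
    """Return the most recent messages, starting on a user turn.

    Hypothetical helper for illustration. It does not modify the input list.
    """
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    recent = conversation_history[-max_messages:]
    # Chat APIs generally expect the message list to open with a user message,
    # so drop any leading assistant messages left over from the cut.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


# Usage inside the loop: messages=trim_history(conversation_history)
```

The full conversation_history list is left intact, so the step that stores the last two messages as a memory still works unchanged.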

Step 6: Test it

Run the script and have a multi-turn conversation. Then stop and restart it — the assistant remembers context from the previous session.

Session 1:

You: I just moved to Berlin and I'm looking for a good coffee shop.
Assistant: Welcome to Berlin! Here are some popular spots...

You: I prefer specialty coffee, not chains.
Assistant: Great taste! For specialty coffee in Berlin, check out...

You: quit

Session 2 (after restart):

You: Any new coffee recommendations?
Assistant: Since you mentioned preferring specialty coffee in Berlin,
I'd suggest checking out...

The assistant remembers the user's location and coffee preferences from the previous session because those facts were extracted and stored as memories in Engram.

Next steps

  • Context Window Management — Learn how memory search replaces full conversation history to reduce token costs.
  • Personalized RAG — Combine a Weaviate knowledge base with per-user memory for personalized responses.
  • Search memories — Explore vector, BM25, and hybrid retrieval options.

Questions and feedback

If you have any questions or feedback, let us know in the user forum.