Add Long-Term Memory to a Chat App

LLMs are stateless — every API call starts from scratch with no memory of past conversations. Engram adds persistent memory so your chatbot can remember user preferences, past discussions, and context across sessions.

In this tutorial, you'll build a chat app that:

Stores conversations as memories after each exchange
Retrieves relevant memories before generating a response
Personalizes responses based on what it remembers about the user

Prerequisites

An Engram project with an API key (Quickstart)
An Anthropic or OpenAI API key
Python packages: pip install weaviate-engram anthropic openai

Set your environment variables:

export ENGRAM_API_KEY="eng_..."
export ANTHROPIC_API_KEY="sk-ant-..."  # or OPENAI_API_KEY
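You can check these variables at startup so a missing key produces a readable error rather than a KeyError deep in the script. This is a minimal sketch. The helper name is hypothetical, and it treats either LLM key as sufficient, which matches the "or OPENAI_API_KEY" note above.

```python
import os
from typing import Mapping


def missing_env_vars(environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required environment variables that are unset or empty."""
    missing = []
    if not environ.get("ENGRAM_API_KEY"):
        missing.append("ENGRAM_API_KEY")
    # One LLM provider key is enough: Anthropic or OpenAI.
    if not (environ.get("ANTHROPIC_API_KEY") or environ.get("OPENAI_API_KEY")):
        missing.append("ANTHROPIC_API_KEY or OPENAI_API_KEY")
    return missing
```

At startup, call missing_env_vars() and exit with the returned names if the list is not empty.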

Step 1: Set up the Engram client

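Initialize the Engram client with your API key. Memories are scoped by the user_id you pass to memories.add() and memories.search(), so each user gets an isolated memory store. In this example the ID is random, so every tutorial run starts with an empty store.

import os
import uuid
# EngramClient comes from the weaviate-engram package (see its docs for the import path).
client = EngramClient(
    api_key=os.environ["ENGRAM_API_KEY"]
)
# A fresh ID per run isolates this tutorial's memories from earlier runs.
user_id = f"tutorial-chat-{uuid.uuid4().hex[:8]}"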
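A random ID like this one changes on every run. That is fine for a throwaway tutorial, but a restarted app would not find the user's earlier memories. A real app would derive user_id from an identifier it already has. The sketch below assumes your app has a stable account ID. The helper name and prefix are hypothetical, and hashing is a privacy choice, not an Engram requirement.

```python
import hashlib


def stable_user_id(account_id: str, prefix: str = "chat") -> str:
    """Map an application account ID to the same Engram user_id on every run."""
    # Hashing keeps raw identifiers (such as email addresses) out of the ID
    # while staying deterministic: the same input always gives the same output.
    digest = hashlib.sha256(account_id.encode("utf-8")).hexdigest()
    return f"{prefix}-{digest[:16]}"
```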

Step 2: Create a chat completion function

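Create a helper function that sends messages to your LLM provider and returns the response. It accepts a system_prompt parameter, which you'll use later to inject memory context. It also appends both the user message and the reply to conversation_history, so the list always holds the running conversation.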

def chat_anthropic(
    user_message,
    conversation_history,
    anthropic_client,
    system_prompt="You are a helpful assistant.",
):
    conversation_history.append({"role": "user", "content": user_message})
    response = anthropic_client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        system=system_prompt,
        messages=conversation_history,
    )
    assistant_message = response.content[0].text
    conversation_history.append({"role": "assistant", "content": assistant_message})
    return assistant_message
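The prerequisites list OpenAI as an alternative, but only the Anthropic helper appears here. Below is a sketch of an equivalent helper. It assumes the official openai Python SDK and its Chat Completions API. The model name gpt-4o is a placeholder, not something this tutorial specifies.

```python
def chat_openai(
    user_message,
    conversation_history,
    openai_client,
    system_prompt="You are a helpful assistant.",
    model="gpt-4o",  # placeholder: use any chat model your account can access
):
    """OpenAI counterpart of chat_anthropic (sketch)."""
    conversation_history.append({"role": "user", "content": user_message})
    # OpenAI takes the system prompt as the first message rather than a separate
    # `system` argument, so it is prepended per request, not stored in the history.
    response = openai_client.chat.completions.create(
        model=model,
        messages=[{"role": "system", "content": system_prompt}, *conversation_history],
    )
    assistant_message = response.choices[0].message.content
    conversation_history.append({"role": "assistant", "content": assistant_message})
    return assistant_message
```

Create the client with openai.OpenAI(), which reads OPENAI_API_KEY from the environment.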

Step 3: Store conversations as memories

After each conversation, send the messages to Engram. The pipeline extracts discrete facts (e.g. "lives in Berlin", "prefers specialty coffee") and stores them as individual memories.

conversation = [
    {
        "role": "user",
        "content": "I just moved to Berlin and I'm looking for a good coffee shop.",
    },
    {
        "role": "assistant",
        "content": "Welcome to Berlin! Here are some popular coffee shops in the city...",
    },
    {"role": "user", "content": "I prefer specialty coffee, not chains."},
]

run = client.memories.add(
    conversation,
    user_id=user_id,
)

print(f"Run ID: {run.run_id}")
print(f"Status: {run.status}")

memories.add() accepts a list of message dicts with role and content keys, the same format the Anthropic and OpenAI chat APIs use. The call returns immediately with a run_id, so you don't have to wait for the pipeline to finish before continuing. Memories are eventually consistent: they become available to search once processing completes.

This delay is rarely a problem in practice, because memories are most useful for context from earlier sessions or from much earlier in a long conversation. The most recent messages are still in the LLM's context window, so the next response doesn't depend on them having been stored yet.
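Waiting can still be useful in automated tests, where you want to assert that a fact became searchable. This sketch polls with the memories.search() call shown in Step 4 instead of polling the run status, because this tutorial doesn't show an SDK call for re-fetching a run. Both helper names are hypothetical. The phrase check is only a heuristic, since the pipeline decides how each extracted fact is worded.

```python
import time


def wait_for(predicate, timeout=30.0, interval=1.0):
    """Call predicate() until it returns True or `timeout` seconds pass.

    Returns True on success and False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def memory_is_searchable(client, user_id, query, phrase, retrieval_config):
    """True if any memory returned for `query` contains `phrase`."""
    results = client.memories.search(
        query=query,
        user_id=user_id,
        group="default",
        retrieval_config=retrieval_config,
    )
    return any(phrase in m.content for m in results)
```

In a test, you might call wait_for(lambda: memory_is_searchable(client, user_id, "Where does the user live?", "Berlin", HybridRetrieval(limit=5))).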

Step 4: Retrieve relevant memories

Before generating a response, search Engram for memories relevant to the user's current message. Format the results into a system prompt that gives the LLM context about the user.

# HybridRetrieval comes from the weaviate-engram package, like EngramClient.
results = client.memories.search(
    query="What kind of coffee does the user like?",
    user_id=user_id,
    group="default",
    retrieval_config=HybridRetrieval(limit=5),
)

memory_context = "\n".join(f"- {m.content}" for m in results)

system_prompt = f"""You are a helpful assistant with memory of past conversations.

Here is what you remember about this user:
{memory_context}

Use these memories to personalize your responses."""

print(system_prompt)
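On a user's first session the search returns nothing, and the prompt above would contain an empty memory list. This sketch falls back to a plain prompt in that case and otherwise builds the same text as the example above. The function name is hypothetical. Like the original code, it assumes each result exposes a content attribute.

```python
def build_system_prompt(memories):
    """Turn memory search results into a system prompt, handling the empty case."""
    lines = [f"- {m.content}" for m in memories]
    if not lines:
        # Nothing remembered yet, so don't claim to have memories.
        return "You are a helpful assistant."
    memory_context = "\n".join(lines)
    return (
        "You are a helpful assistant with memory of past conversations.\n\n"
        "Here is what you remember about this user:\n"
        f"{memory_context}\n\n"
        "Use these memories to personalize your responses."
    )
```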

Hybrid search combines semantic (vector) similarity with keyword (BM25) matching. The vector side finds memories that match in meaning even when the wording differs. The keyword side boosts exact terms such as names or places.
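Engram's exact ranking isn't described here, so the following only illustrates the general technique and is not Engram's implementation. It sketches relative score fusion, one of the fusion methods Weaviate's own hybrid search offers. Each retriever's scores are min-max normalized to [0, 1] and then blended with a weight alpha, where 1.0 means pure vector search and 0.0 means pure keyword search. The function names, memory IDs, and scores are made up.

```python
def min_max(scores):
    """Scale a {memory_id: score} dict to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def fuse(vector_scores, keyword_scores, alpha=0.5, limit=5):
    """Blend normalized vector and keyword scores; return the top memory IDs."""
    v, k = min_max(vector_scores), min_max(keyword_scores)
    # A memory missing from one result set contributes 0 for that side.
    combined = {
        mid: alpha * v.get(mid, 0.0) + (1 - alpha) * k.get(mid, 0.0)
        for mid in v.keys() | k.keys()
    }
    return sorted(combined, key=combined.get, reverse=True)[:limit]
```

For example, a memory with a middling vector score but the top keyword score can outrank the best pure-vector match at alpha=0.5.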

Step 5: Build the full chat loop

Here's the complete script that ties everything together. Each iteration of the loop:

Searches Engram for memories relevant to the user's input
Builds a system prompt with the retrieved context
Sends the message to the LLM with memory-augmented context
Stores the new exchange as a memory for future retrieval

import os

import anthropic

# EngramClient and HybridRetrieval come from the weaviate-engram package
# (see its reference docs for the exact import path).


def memory_chat_loop_anthropic():
    """Complete chat loop with Engram memory and Anthropic."""
    engram = EngramClient(
        api_key=os.environ["ENGRAM_API_KEY"],
    )
    anthropic_client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
    # A fixed ID, so memories from earlier sessions are found after a restart
    user_id = "user-123"
    conversation_history = []

    print("Chat with memory (type 'quit' to exit)\n")

    try:
        while True:
            user_input = input("You: ").strip()
            if user_input.lower() == "quit":
                break
            if not user_input:
                continue  # the Messages API rejects empty text content

            # Retrieve relevant memories
            results = engram.memories.search(
                query=user_input,
                user_id=user_id,
                group="default",
                retrieval_config=HybridRetrieval(limit=5),
            )
            memory_context = "\n".join(f"- {m.content}" for m in results)
            system_prompt = f"""You are a helpful assistant with memory of past conversations.

Here is what you remember about this user:
{memory_context}

Use these memories to personalize your responses."""

            # Get response from Claude
            conversation_history.append({"role": "user", "content": user_input})
            response = anthropic_client.messages.create(
                model="claude-sonnet-4-5-20250929",
                max_tokens=1024,
                system=system_prompt,
                messages=conversation_history,
            )
            assistant_message = response.content[0].text
            conversation_history.append({"role": "assistant", "content": assistant_message})
            print(f"Assistant: {assistant_message}\n")

            # Store the latest user/assistant pair as a memory (fire-and-forget:
            # add() returns without waiting for extraction to finish)
            engram.memories.add(
                conversation_history[-2:],
                user_id=user_id,
            )
    finally:
        # Close the Engram client even if the loop exits with an error
        engram.close()


if __name__ == "__main__":
    memory_chat_loop_anthropic()
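In the loop above, conversation_history grows with every turn and is sent in full on each request. Engram recall covers older context, so one option is to send only the most recent messages. The sketch below keeps the last max_messages entries and drops any leading assistant message, so the window still starts with a user turn, the way the loop builds the history. The helper name and default size are hypothetical.

```python
def trim_history(history, max_messages=20):
    """Return the most recent messages, starting on a user turn."""
    if max_messages <= 0:
        return []  # history[-0:] would return the whole list
    recent = history[-max_messages:]
    # Drop leading non-user messages so the window begins with the user.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent
```

To use it, pass messages=trim_history(conversation_history) to messages.create() and leave the full list in place for storing memories.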

Step 6: Test it

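Run the script and have a multi-turn conversation, then stop and restart it. The script uses the same user_id on every run, so the assistant can recall context from the previous session. Give the pipeline a moment before restarting, because memories only become searchable once extraction finishes.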

Session 1:

You: I just moved to Berlin and I'm looking for a good coffee shop.
Assistant: Welcome to Berlin! Here are some popular spots...

You: I prefer specialty coffee, not chains.
Assistant: Great taste! For specialty coffee in Berlin, check out...

You: quit

Session 2 (after restart):

You: Any new coffee recommendations?
Assistant: Since you mentioned preferring specialty coffee in Berlin,
I'd suggest checking out...

The assistant remembers the user's location and coffee preferences from the previous session because those facts were extracted and stored as memories in Engram.

Next steps

Context Window Management — Learn how memory search replaces full conversation history to reduce token costs.
Personalized RAG — Combine a Weaviate knowledge base with per-user memory for personalized responses.
Search memories — Explore vector, BM25, and hybrid retrieval options.

Questions and feedback

Have a question or feedback? Here's how to reach us.

Community Forum

Ask questions and connect with other developers on our Community forum.

Support

Weaviate Cloud user or customer? Find the right channel on the Support page.
