Personalized RAG with Per-User Memory

Standard RAG retrieves the same documents for every user. But a Python developer and a JavaScript developer asking "How do I use the API?" should get different answers — tailored to their language, experience level, and preferences.

This tutorial combines a Weaviate knowledge base (shared product documentation) with Engram per-user memory (individual preferences and context) to build a personalized, multi-tenant RAG assistant.

You'll learn how to:

Search both a knowledge base and user memory in parallel
Build prompts that merge shared knowledge with personal context
Isolate memory between users with user_id scoping
Handle concurrent users with AsyncEngramClient
Manage user data for privacy compliance

Prerequisites

An Engram project with an API key (Quickstart)
A Weaviate Cloud instance
An Anthropic or OpenAI API key
Python packages: pip install weaviate-engram weaviate-client anthropic openai

Step 1: Set up both clients

Initialize the Engram client for user memory and the Weaviate client for your knowledge base.

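import os

import weaviate

# EngramClient (and, later, AsyncEngramClient and HybridRetrieval) come from the
# weaviate-engram package; see the Quickstart for the exact import path.

# Engram: per-user memory
engram = EngramClient(
    api_key=os.environ["ENGRAM_API_KEY"]
)

# Weaviate: shared knowledge base
weaviate_client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.environ["WEAVIATE_URL"],
    auth_credentials=weaviate.auth.AuthApiKey(os.environ["WEAVIATE_API_KEY"]),
)
# Call weaviate_client.close() when your application shuts down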
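Both clients read credentials from environment variables, and a missing one otherwise surfaces as a bare `KeyError` partway through setup. A small hypothetical helper, `require_env` (not part of Engram or Weaviate), can check them all up front and report every missing name at once:

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the requested environment variables, failing fast if any are missing or empty.

    Hypothetical helper for this tutorial; not part of either SDK.
    """
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


# Usage, assuming the variable names from Step 1:
# env = require_env("ENGRAM_API_KEY", "WEAVIATE_URL", "WEAVIATE_API_KEY")
```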

Step 2: Populate the knowledge base

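Create a Weaviate collection with sample product documentation. This represents your shared knowledge base: the same docs are available to all users. Step 4 queries this collection with hybrid search, which combines keyword and vector search, so you will likely want a vectorizer configured on the collection (or to supply your own vectors); check your cluster's defaults.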

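from weaviate.classes.config import DataType, Property


def populate_knowledge_base(weaviate_client):
    """Create (or reuse) a Weaviate collection and insert sample product documentation."""
    # collections.create() fails if the collection already exists, so reuse it on reruns
    if weaviate_client.collections.exists("ProductDocs"):
        return weaviate_client.collections.get("ProductDocs")

    collection = weaviate_client.collections.create(
        name="ProductDocs",
        description="Product documentation for the Acme platform",
        properties=[Property(name="content", data_type=DataType.TEXT)],
    )

    docs = [
        "Acme API supports REST and GraphQL endpoints for data access.",
        "The Acme dashboard provides real-time analytics and custom reports.",
        "Acme's Python SDK supports async operations with asyncio.",
        "Acme pricing: Free tier up to 1K requests/day, Pro at $49/month for 100K requests/day.",
        "Acme supports SSO with SAML and OIDC for enterprise customers.",
    ]

    with collection.batch.dynamic() as batch:
        for doc in docs:
            batch.add_object(properties={"content": doc})

    # Report any objects that failed to import
    if collection.batch.failed_objects:
        print(f"{len(collection.batch.failed_objects)} objects failed to import")

    return collection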
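Inserting the same documents twice, for example when rerunning an import into an existing collection, creates duplicates. One common way to make inserts idempotent, sketched here with the standard library only, is to derive each object's UUID deterministically from its content and pass it as the `uuid` argument of `batch.add_object`, so a repeated insert overwrites the existing object instead of adding a copy. The namespace value and the `doc_uuid` name are assumptions for illustration; the Weaviate client also offers a `generate_uuid5` helper in `weaviate.util` for the same purpose.

```python
import uuid

# Hypothetical namespace for this tutorial's documents; any fixed UUID works
DOCS_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "acme-product-docs")


def doc_uuid(content: str) -> uuid.UUID:
    """Derive a stable, name-based (version 5) UUID: same text, same UUID."""
    return uuid.uuid5(DOCS_NAMESPACE, content)


# Inside populate_knowledge_base, the batch loop would become:
# with collection.batch.dynamic() as batch:
#     for doc in docs:
#         batch.add_object(properties={"content": doc}, uuid=doc_uuid(doc))
```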

Step 3: Store user context in Engram

Store different preferences and facts for different users. Engram extracts discrete facts and scopes them to each user_id.

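import uuid

# Unique suffix so user IDs don't collide across tutorial runs
_test_suffix = uuid.uuid4().hex[:8]

# Store preferences for Alice (Python developer)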
run_a = engram.memories.add(
    "I'm a Python developer. I prefer concise code examples. I'm building a FastAPI microservice.",
    user_id=f"tutorial-rag-alice-{_test_suffix}",
    group="default",
)
status_a = engram.runs.wait(run_a.run_id)
print(f"Alice: {status_a.status}, {len(status_a.memories_created)} memories")

# Store preferences for Bob (JavaScript developer)
run_b = engram.memories.add(
    "I'm a JavaScript developer. I prefer detailed explanations with context. I'm building a React dashboard.",
    user_id=f"tutorial-rag-bob-{_test_suffix}",
    group="default",
)
status_b = engram.runs.wait(run_b.run_id)
print(f"Bob: {status_b.status}, {len(status_b.memories_created)} memories")

Alice and Bob now have separate memory stores. Because Engram extracts the facts itself, the exact wording can vary, but Alice's memories should include facts such as "Python developer" and "prefers concise examples", while Bob's should include "JavaScript developer" and "prefers detailed explanations".

Step 4: Build the dual-search function

Create a function that searches both the Weaviate knowledge base and Engram user memory. The knowledge base provides the factual content; Engram provides the personalization context.

def dual_search(query, user_id, kb_results=None):
    """Search Engram user memory and bundle it with pre-fetched knowledge-base results."""
    # Search Engram for user-specific memories
    user_memories = engram.memories.search(
        query=query,
        user_id=user_id,
        group="default",
        retrieval_config=HybridRetrieval(limit=5),
    )

    return {
        "knowledge_base": kb_results or [],
        "user_memories": user_memories,
    }

In production, query Weaviate inside the same function so that one call searches both sources:

def dual_search(query, user_id, weaviate_collection):
    # Search knowledge base
    kb_results = weaviate_collection.query.hybrid(query=query, limit=5)
    kb_docs = [obj.properties["content"] for obj in kb_results.objects]

    # Search user memory
    user_memories = engram.memories.search(
        query=query, user_id=user_id, group="default",
        retrieval_config=HybridRetrieval(limit=5),
    )

    return {"knowledge_base": kb_docs, "user_memories": user_memories}
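The learning goals mention searching both sources in parallel, but `dual_search` above runs the two calls one after the other. A minimal sketch of a concurrent version, assuming both searches are wrapped as blocking callables that are safe to call from a worker thread, runs each with `asyncio.to_thread` (Python 3.9+) and awaits them together. `dual_search_parallel` is a hypothetical name, and the stand-in functions simulate network latency with `time.sleep`.

```python
import asyncio
import time


async def dual_search_parallel(query, user_id, search_kb, search_memory):
    """Run the knowledge-base search and the user-memory search concurrently.

    search_kb(query) and search_memory(query, user_id) are blocking callables;
    asyncio.to_thread runs each in a worker thread so neither waits on the other.
    """
    kb_docs, user_memories = await asyncio.gather(
        asyncio.to_thread(search_kb, query),
        asyncio.to_thread(search_memory, query, user_id),
    )
    return {"knowledge_base": kb_docs, "user_memories": user_memories}


# Stand-ins that simulate ~200 ms of network latency each.
# With the real clients these could be, for example:
#   search_kb = lambda q: [o.properties["content"]
#                          for o in collection.query.hybrid(query=q, limit=5).objects]
#   search_memory = lambda q, uid: engram.memories.search(query=q, user_id=uid, group="default")
def fake_search_kb(query):
    time.sleep(0.2)
    return [f"doc about {query}"]


def fake_search_memory(query, user_id):
    time.sleep(0.2)
    return [f"{user_id} prefers concise examples"]


start = time.perf_counter()
result = asyncio.run(
    dual_search_parallel("API access", "alice", fake_search_kb, fake_search_memory)
)
elapsed = time.perf_counter() - start
print(result)
print(f"took {elapsed:.2f}s")
```

Run this way, the total time is close to the slower of the two calls (about 0.2 s with these stand-ins) rather than their sum.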

Step 5: Construct a personalized prompt

Merge the knowledge base results and user memories into a single prompt. The LLM uses the shared docs for accuracy and the user context for personalization.

def build_prompt_anthropic(query, kb_docs, user_memories, anthropic_client):
    """Combine KB docs and user memory into a system prompt and return Claude's answer."""
    kb_context = "\n".join(f"- {doc}" for doc in kb_docs)
    # Fall back to a placeholder so the section is never empty for new users
    memory_context = "\n".join(f"- {m.content}" for m in user_memories) or "- (no stored preferences yet)"

    system_prompt = f"""You are a helpful product assistant.

Product documentation:
{kb_context}

What you know about this user:
{memory_context}

Tailor your response to the user's background and preferences."""

    response = anthropic_client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        system=system_prompt,
        messages=[{"role": "user", "content": query}],
    )
    return response.content[0].text

With this prompt structure, the same question should produce different responses:

Alice gets a concise Python example using the SDK's async features
Bob gets a detailed explanation with JavaScript/REST examples
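The prompt-building logic can also be pulled out into a pure function, which makes it easy to test and to reuse with the OpenAI SDK listed in the prerequisites. The sketch below assumes `memory_texts` is a list of strings (for example `[m.content for m in user_memories]`) and uses `gpt-4o` only as a placeholder model name; `build_system_prompt` and `answer_with_openai` are hypothetical helpers, not part of either SDK.

```python
def build_system_prompt(kb_docs, memory_texts):
    """Merge shared docs and per-user facts into one system prompt (pure function)."""
    kb_context = "\n".join(f"- {doc}" for doc in kb_docs) or "- (no matching documentation)"
    memory_context = "\n".join(f"- {text}" for text in memory_texts) or "- (no stored preferences)"
    return (
        "You are a helpful product assistant.\n\n"
        f"Product documentation:\n{kb_context}\n\n"
        f"What you know about this user:\n{memory_context}\n\n"
        "Tailor your response to the user's background and preferences."
    )


def answer_with_openai(openai_client, query, kb_docs, memory_texts, model="gpt-4o"):
    """Send the same personalized prompt through the OpenAI Chat Completions API.

    The model name is a placeholder; substitute whichever model you use.
    """
    response = openai_client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": build_system_prompt(kb_docs, memory_texts)},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content
```

To call it, create a client with `from openai import OpenAI` and `client = OpenAI()`, which reads `OPENAI_API_KEY` from the environment.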

Step 6: Demo with two users

Both users ask the same question, but each gets a personalized response based on their stored context.

query = "How do I access the API?"

# Alice's personalized search
alice_memories = engram.memories.search(
    query=query,
    user_id=f"tutorial-rag-alice-{_test_suffix}",
    group="default",
    retrieval_config=HybridRetrieval(limit=5),
)
print("Alice's context:")
for m in alice_memories:
    print(f"  - {m.content}")

# Bob's personalized search
bob_memories = engram.memories.search(
    query=query,
    user_id=f"tutorial-rag-bob-{_test_suffix}",
    group="default",
    retrieval_config=HybridRetrieval(limit=5),
)
print("\nBob's context:")
for m in bob_memories:
    print(f"  - {m.content}")

Alice's search returns memories about Python and FastAPI. Bob's returns memories about JavaScript and React. The LLM uses this context to tailor its answer.
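The demo above retrieves each user's context but stops before the LLM call. A minimal sketch of the full request path, assuming the search and generation steps are passed in as plain callables (hypothetical names `search_kb`, `search_memory`, `generate`), shows how the same question yields a different prompt per user. The in-memory `DOCS` and `MEMORIES` stand in for Weaviate and Engram, and `echo_prompt` is a fake LLM that returns the prompt it received.

```python
from typing import Callable, Sequence


def personalized_answer(
    query: str,
    user_id: str,
    search_kb: Callable[[str], Sequence[str]],
    search_memory: Callable[[str, str], Sequence[str]],
    generate: Callable[[str, str], str],
) -> str:
    """Shared docs + this user's memories -> one system prompt -> one LLM call."""
    kb_docs = search_kb(query)
    memories = search_memory(query, user_id)
    system_prompt = (
        "You are a helpful product assistant.\n\nProduct documentation:\n"
        + "\n".join(f"- {doc}" for doc in kb_docs)
        + "\n\nWhat you know about this user:\n"
        + "\n".join(f"- {fact}" for fact in memories)
        + "\n\nTailor your response to the user's background and preferences."
    )
    return generate(system_prompt, query)


# In-memory stand-ins for Weaviate, Engram, and the LLM
DOCS = ["Acme API supports REST and GraphQL endpoints for data access."]
MEMORIES = {
    "alice": ["Python developer", "Prefers concise code examples"],
    "bob": ["JavaScript developer", "Prefers detailed explanations"],
}


def echo_prompt(system_prompt: str, query: str) -> str:
    """Fake LLM that returns the prompt it received, so it can be inspected."""
    return system_prompt


for user in ("alice", "bob"):
    prompt = personalized_answer(
        "How do I access the API?",
        user,
        search_kb=lambda q: DOCS,
        search_memory=lambda q, uid: MEMORIES[uid],
        generate=echo_prompt,
    )
    print(f"--- {user} ---\n{prompt}\n")
```

With the real clients, `generate` would wrap the `anthropic_client.messages.create(...)` call from Step 5.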

Step 7: Add user isolation

Engram's user_id scoping ensures strict memory isolation. User A's memories are never returned when searching as User B, even if the query matches.

# Alice searches for Bob's topics: any results come from Alice's own memories, never Bob's
alice_cross_search = engram.memories.search(
    query="React dashboard JavaScript",
    user_id=f"tutorial-rag-alice-{_test_suffix}",
    group="default",
)
print(f"Alice searching for Bob's topics: {len(alice_cross_search)} results")
for m in alice_cross_search:
    print(f"  - {m.content}")

# Bob searches for Alice's topics: any results come from Bob's own memories, never Alice's
bob_cross_search = engram.memories.search(
    query="FastAPI Python microservice",
    user_id=f"tutorial-rag-bob-{_test_suffix}",
    group="default",
)
print(f"Bob searching for Alice's topics: {len(bob_cross_search)} results")
for m in bob_cross_search:
    print(f"  - {m.content}")

Alice's search for "React dashboard JavaScript" never returns Bob's memories, and vice versa. It may still return some of Alice's own facts, because a ranked search surfaces the closest matches within her scope, but nothing from outside it. This isolation is enforced at the storage layer, not just filtered in application code.

Info: User isolation is automatic when you use user_id with user-scoped topics. You don't need additional access-control logic; Engram handles it.
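To make the scoping rule concrete, here is a toy, in-memory model of user-scoped search. It is a conceptual illustration only, not how Engram stores or ranks memories: every read and write is keyed by `(user_id, group)`, so candidates are restricted to the caller's scope before any ranking happens, and naive word overlap stands in for hybrid retrieval. `ScopedMemoryStore` is a hypothetical class.

```python
from collections import defaultdict


class ScopedMemoryStore:
    """Toy user-scoped memory store (conceptual illustration, not Engram's internals)."""

    def __init__(self):
        # One bucket of memories per (user_id, group) scope
        self._memories = defaultdict(list)

    def add(self, text, user_id, group="default"):
        self._memories[(user_id, group)].append(text)

    def search(self, query, user_id, group="default", limit=5):
        # Scope first: only the caller's own memories are ever candidates
        candidates = self._memories.get((user_id, group), [])
        # Then rank by naive word overlap (a stand-in for hybrid retrieval)
        words = set(query.lower().split())
        ranked = sorted(
            candidates,
            key=lambda text: len(words & set(text.lower().split())),
            reverse=True,
        )
        return ranked[:limit]


store = ScopedMemoryStore()
store.add("Python developer building a FastAPI microservice", "alice")
store.add("JavaScript developer building a React dashboard", "bob")
print(store.search("React dashboard JavaScript", "alice"))
# ['Python developer building a FastAPI microservice']
```

Alice's query matches Bob's memory almost word for word, yet only her own memory comes back, mirroring the behavior described in this step.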

Step 8: Scale with async

For production applications handling multiple concurrent users, switch to AsyncEngramClient. It uses the same API but supports async/await for non-blocking operations.

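import asyncio

# AsyncEngramClient comes from the same package as EngramClient (see the Quickstart for the import path)
async_client = AsyncEngramClient(
    api_key=os.environ["ENGRAM_API_KEY"]
)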

Use asyncio.gather() to handle multiple user requests concurrently:

async def search_for_user(async_engram, user_id, query):
    """Search memories for a single user."""
    results = await async_engram.memories.search(
        query=query,
        user_id=user_id,
        group="default",
        retrieval_config=HybridRetrieval(limit=5),
    )
    return user_id, results


async def handle_concurrent_users():
    """Handle multiple users concurrently."""
    tasks = [
        search_for_user(async_client, f"tutorial-rag-alice-{_test_suffix}", "programming preferences"),
        search_for_user(async_client, f"tutorial-rag-bob-{_test_suffix}", "programming preferences"),
    ]
    results = await asyncio.gather(*tasks)

    for uid, memories in results:
        print(f"\n{uid}:")
        for m in memories:
            print(f"  - {m.content}")

    return results


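# In a script. Inside Jupyter, which already runs an event loop, use `await handle_concurrent_users()` instead
concurrent_results = asyncio.run(handle_concurrent_users())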

Both searches run concurrently, so the total latency is roughly that of the slower call rather than the sum of both.
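`asyncio.gather()` starts every request at once, which is fine for two users but may run into rate limits or connection-pool limits with hundreds. A common pattern, sketched here under that assumption, caps the number of in-flight requests with an `asyncio.Semaphore`. `search_many_users` is a hypothetical helper, and `search` stands for any coroutine function taking `(user_id, query)`; the demo uses a fake search that records how many calls overlap.

```python
import asyncio


async def search_many_users(user_ids, query, search, max_concurrency=10):
    """Search memory for many users concurrently, with at most max_concurrency in flight.

    search(user_id, query) is any coroutine function returning that user's results.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(user_id):
        async with semaphore:  # waits here once max_concurrency searches are running
            return user_id, await search(user_id, query)

    pairs = await asyncio.gather(*(bounded(uid) for uid in user_ids))
    return dict(pairs)


# Stand-in search that records how many calls overlap
stats = {"in_flight": 0, "peak": 0}


async def fake_search(user_id, query):
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)  # simulated network round trip
    stats["in_flight"] -= 1
    return [f"{user_id}: {query}"]


user_ids = [f"user-{i}" for i in range(25)]
results = asyncio.run(search_many_users(user_ids, "preferences", fake_search, max_concurrency=5))
print(len(results), "users searched; peak concurrency:", stats["peak"])
```

With the real client, `search` could be `lambda uid, q: async_client.memories.search(query=q, user_id=uid, group="default")`; choose `max_concurrency` to suit your own rate limits.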

Step 9: User data management

For privacy compliance (e.g. the GDPR right to erasure), you can retrieve and delete individual memories or all memories for a user.

# Find Alice's memories, then retrieve one by ID
alice_memories = engram.memories.search(
    query="Python developer",
    user_id=f"tutorial-rag-alice-{_test_suffix}",
    group="default",
)

if alice_memories:
    memory = engram.memories.get(
        alice_memories[0].id,
        user_id=f"tutorial-rag-alice-{_test_suffix}",
        group="default",
    )
    print(f"Retrieved: {memory.content}")

# Delete each memory the search returned (e.g. for a GDPR erasure request)
for m in alice_memories:
    engram.memories.delete(m.id, user_id=f"tutorial-rag-alice-{_test_suffix}", group="default")
    print(f"Deleted: {m.id}")

# Verify deletion
remaining = engram.memories.search(
    query="Python developer",
    user_id=f"tutorial-rag-alice-{_test_suffix}",
    group="default",
)
print(f"Remaining memories for Alice: {len(remaining)}")

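Use memories.search() to find a user's memories, then memories.delete() each one. A search returns a ranked, limited set of results, so for a user with many memories a single pass may not catch everything; repeat the search-and-delete cycle until a search comes back empty.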
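The search-and-delete cycle can be written as a loop: search, delete what came back, and repeat until a search returns nothing. The sketch below assumes `search()` keeps returning results while any memories remain, and that `search` and `delete` are hypothetical wrappers bound to one `user_id` and `group`; a `max_rounds` guard stops it from looping forever if deletions silently fail.

```python
from dataclasses import dataclass


def delete_all_memories(search, delete, max_rounds=100):
    """Search, delete what came back, and repeat until a search returns nothing.

    search() returns a ranked, size-limited list of memories (each with an .id);
    delete(memory_id) removes one. Returns the total number deleted.
    """
    deleted = 0
    for _ in range(max_rounds):
        found = search()
        if not found:
            return deleted
        for memory in found:
            delete(memory.id)
            deleted += 1
    raise RuntimeError(f"Memories still present after {max_rounds} rounds")


# With the real client, the hypothetical wrappers might look like:
#   alice = f"tutorial-rag-alice-{_test_suffix}"
#   search = lambda: engram.memories.search(query="Python developer", user_id=alice, group="default")
#   delete = lambda memory_id: engram.memories.delete(memory_id, user_id=alice, group="default")


# Demo against a fake store whose search is capped at 5 results
@dataclass
class FakeMemory:
    id: str


fake_store = {f"m{i}": FakeMemory(f"m{i}") for i in range(12)}
count = delete_all_memories(
    search=lambda: list(fake_store.values())[:5],
    delete=lambda memory_id: fake_store.pop(memory_id),
)
print(count, len(fake_store))  # 12 0
```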

Next steps

Memory Chat App — The foundational tutorial for integrating Engram with a chat app.
Context Window Management — Reduce token costs by replacing conversation history with memory.
Manage memories — API reference for get and delete operations.
Core concepts — Learn about topics, groups, scoping, and pipelines.

Questions and feedback

Have a question or feedback? Here's how to reach us.

Community Forum

Ask questions and connect with other developers on our Community forum.

Support

Weaviate Cloud user or customer? Find the right channel on the Support page.
