Personalized RAG with Per-User Memory

Standard RAG retrieves the same documents for every user. But a Python developer and a JavaScript developer asking "How do I use the API?" should get different answers — tailored to their language, experience level, and preferences.

This tutorial combines a Weaviate knowledge base (shared product documentation) with Engram per-user memory (individual preferences and context) to build a personalized, multi-tenant RAG assistant.

You'll learn how to:

  • Search both a knowledge base and user memory in parallel
  • Build prompts that merge shared knowledge with personal context
  • Isolate memory between users with user_id scoping
  • Handle concurrent users with AsyncEngramClient
  • Manage user data for privacy compliance

Prerequisites

Step 1: Set up both clients

Initialize the Engram client for user memory and the Weaviate client for your knowledge base.

import os

import weaviate
from engram import EngramClient  # assumed import path; confirm against the Engram SDK docs

engram = EngramClient(
    api_key=os.environ["ENGRAM_API_KEY"]
)

weaviate_client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.environ["WEAVIATE_URL"],
    auth_credentials=weaviate.auth.AuthApiKey(os.environ["WEAVIATE_API_KEY"]),
)
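
Both clients read their credentials from environment variables, and a missing one surfaces as a bare KeyError. As a minimal sketch, the helper below checks all three variables up front and reports every missing name at once. The helper names `missing_env_vars` and `require_env` are hypothetical and not part of either SDK.

```python
import os

# The three variables read by the client setup in this step
REQUIRED_VARS = ("ENGRAM_API_KEY", "WEAVIATE_URL", "WEAVIATE_API_KEY")


def missing_env_vars(env=os.environ, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


def require_env(env=os.environ, required=REQUIRED_VARS):
    """Raise one readable error that lists every missing variable."""
    missing = missing_env_vars(env, required)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_env()` before constructing the clients turns a partial configuration into a single actionable message.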

Step 2: Populate the knowledge base

Create a Weaviate collection with sample product documentation. This represents your shared knowledge base — the same docs are available to all users.

def populate_knowledge_base(weaviate_client):
    """Create a Weaviate collection and insert sample product documentation."""
    collection = weaviate_client.collections.create(
        name="ProductDocs",
        description="Product documentation for the Acme platform",
    )

    docs = [
        "Acme API supports REST and GraphQL endpoints for data access.",
        "The Acme dashboard provides real-time analytics and custom reports.",
        "Acme's Python SDK supports async operations with asyncio.",
        "Acme pricing: Free tier up to 1K requests/day, Pro at $49/month for 100K requests/day.",
        "Acme supports SSO with SAML and OIDC for enterprise customers.",
    ]

    with collection.batch.dynamic() as batch:
        for doc in docs:
            batch.add_object(properties={"content": doc})

    return collection
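
Running this step twice is expected to fail on the second run, because the ProductDocs collection already exists. One way to make the step repeatable, sketched under the assumption that the client exposes `collections.exists()` and `collections.get()` as the v4 Python client does, is a get-or-create wrapper. The function name `get_or_create_collection` is hypothetical.

```python
def get_or_create_collection(client, name, description=""):
    """Return (collection, created) so callers know whether to insert docs.

    client is assumed to be a connected Weaviate v4 client.
    """
    if client.collections.exists(name):
        # Reuse the existing collection instead of failing on create()
        return client.collections.get(name), False
    created = client.collections.create(name=name, description=description)
    return created, True
```

Inserting the sample documents only when `created` is true avoids duplicating them on later runs.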

Step 3: Store user context in Engram

Store different preferences and facts for different users. Engram extracts discrete facts and scopes them to each user_id.

# Note: _test_suffix is not defined in this tutorial; it is assumed to be
# a unique string set earlier in the script.

# Store preferences for Alice (Python developer)
run_a = engram.memories.add(
    "I'm a Python developer. I prefer concise code examples. I'm building a FastAPI microservice.",
    user_id=f"tutorial-rag-alice-{_test_suffix}",
    group="default",
)
status_a = engram.runs.wait(run_a.run_id)
print(f"Alice: {status_a.status}, {len(status_a.memories_created)} memories")

# Store preferences for Bob (JavaScript developer)
run_b = engram.memories.add(
    "I'm a JavaScript developer. I prefer detailed explanations with context. I'm building a React dashboard.",
    user_id=f"tutorial-rag-bob-{_test_suffix}",
    group="default",
)
status_b = engram.runs.wait(run_b.run_id)
print(f"Bob: {status_b.status}, {len(status_b.memories_created)} memories")

Alice and Bob now have separate memory stores. Alice's memories include facts like "Python developer" and "prefers concise examples", while Bob's include "JavaScript developer" and "prefers detailed explanations".

Step 4: Build the dual-search function

Create a function that searches both the Weaviate knowledge base and Engram user memory. The knowledge base provides the factual content; Engram provides the personalization context.

def dual_search(query, user_id, kb_results=None):
    """Search both a knowledge base and Engram user memory."""
    # Search Engram for user-specific memories
    user_memories = engram.memories.search(
        query=query,
        user_id=user_id,
        group="default",
        retrieval_config=HybridRetrieval(limit=5),
    )

    return {
        "knowledge_base": kb_results or [],
        "user_memories": user_memories,
    }

In production, you'd also search Weaviate here:

def dual_search(query, user_id, weaviate_collection):
    # Search knowledge base
    kb_results = weaviate_collection.query.hybrid(query=query, limit=5)
    kb_docs = [obj.properties["content"] for obj in kb_results.objects]

    # Search user memory
    user_memories = engram.memories.search(
        query=query, user_id=user_id, group="default",
        retrieval_config=HybridRetrieval(limit=5),
    )

    return {"knowledge_base": kb_docs, "user_memories": user_memories}
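
The function above issues its two searches one after the other. If both calls are blocking, one way to overlap them is a small thread pool, sketched here with the standard library. The callables `search_kb` and `search_memory` are hypothetical wrappers around the Weaviate and Engram calls, and the `fake_*` functions are stand-ins so the sketch runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def dual_search_parallel(query, user_id, search_kb, search_memory):
    """Run two blocking search callables at the same time and merge the results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        kb_future = pool.submit(search_kb, query)
        memory_future = pool.submit(search_memory, query, user_id)
        # result() blocks until each search finishes and re-raises its exception
        return {
            "knowledge_base": kb_future.result(),
            "user_memories": memory_future.result(),
        }


# Stand-in searches (assumptions, not SDK calls)
def fake_kb(query):
    return ["Acme API supports REST and GraphQL endpoints for data access."]


def fake_memory(query, user_id):
    return [f"memory for {user_id}"]


results = dual_search_parallel("How do I use the API?", "alice", fake_kb, fake_memory)
```

With real clients, the total wait is then roughly the slower of the two searches rather than their sum.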

Step 5: Construct a personalized prompt

Merge the knowledge base results and user memories into a single prompt. The LLM uses the shared docs for accuracy and the user context for personalization.

def build_prompt_anthropic(query, kb_docs, user_memories, anthropic_client):
    """Build a personalized prompt combining KB docs and user memory."""
    kb_context = "\n".join(f"- {doc}" for doc in kb_docs)
    memory_context = "\n".join(f"- {m.content}" for m in user_memories)

    system_prompt = f"""You are a helpful product assistant.

Product documentation:
{kb_context}

What you know about this user:
{memory_context}

Tailor your response to the user's background and preferences."""

    response = anthropic_client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=1024,
        system=system_prompt,
        messages=[{"role": "user", "content": query}],
    )
    return response.content[0].text

With this prompt structure, the same question produces different responses:

  • Alice gets a concise Python example using the SDK's async features
  • Bob gets a detailed explanation with JavaScript/REST examples
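
A new user has no stored memories yet, which would leave the "What you know about this user" section of the prompt empty. As a sketch, the prompt assembly can be separated from the LLM call and given explicit placeholders for empty sections. The function name `build_system_prompt` and the placeholder wording are assumptions, not part of the tutorial's code.

```python
def build_system_prompt(kb_docs, memory_texts):
    """Merge KB docs and user memory strings into one system prompt."""
    # Fall back to a placeholder line when a section has no entries
    kb_context = "\n".join(f"- {doc}" for doc in kb_docs) or "- (no matching documentation)"
    memory_context = "\n".join(f"- {text}" for text in memory_texts) or "- (nothing known yet)"
    return (
        "You are a helpful product assistant.\n\n"
        f"Product documentation:\n{kb_context}\n\n"
        f"What you know about this user:\n{memory_context}\n\n"
        "Tailor your response to the user's background and preferences."
    )


new_user_prompt = build_system_prompt(
    ["Acme API supports REST and GraphQL endpoints for data access."],
    [],  # a user with no stored memories
)
```

Keeping this step free of network calls also makes the prompt easy to inspect and unit test.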

Step 6: Demo with two users

Both users ask the same question, but each gets a personalized response based on their stored context.

query = "How do I access the API?"

# Alice's personalized search
alice_memories = engram.memories.search(
    query=query,
    user_id=f"tutorial-rag-alice-{_test_suffix}",
    group="default",
    retrieval_config=HybridRetrieval(limit=5),
)
print("Alice's context:")
for m in alice_memories:
    print(f" - {m.content}")

# Bob's personalized search
bob_memories = engram.memories.search(
    query=query,
    user_id=f"tutorial-rag-bob-{_test_suffix}",
    group="default",
    retrieval_config=HybridRetrieval(limit=5),
)
print("\nBob's context:")
for m in bob_memories:
    print(f" - {m.content}")

Alice's search returns memories about Python and FastAPI. Bob's returns memories about JavaScript and React. The LLM uses this context to tailor its answer.

Step 7: Add user isolation

Engram's user_id scoping ensures strict memory isolation. User A's memories are never returned when searching as User B, even if the query matches.

# Alice searches for Bob's topics: none of Bob's memories should come back
alice_cross_search = engram.memories.search(
    query="React dashboard JavaScript",
    user_id=f"tutorial-rag-alice-{_test_suffix}",
    group="default",
)
print(f"Alice searching for Bob's topics: {len(alice_cross_search)} results")

# Bob searches for Alice's topics: none of Alice's memories should come back
bob_cross_search = engram.memories.search(
    query="FastAPI Python microservice",
    user_id=f"tutorial-rag-bob-{_test_suffix}",
    group="default",
)
print(f"Bob searching for Alice's topics: {len(bob_cross_search)} results")

Alice's search for "React dashboard JavaScript" returns none of Bob's memories, because those memories belong to Bob. The same holds in reverse for Bob's search. This isolation is enforced at the storage layer, not just filtered in application code.

Info: User isolation is automatic when you use user_id with user-scoped topics. You don't need additional access control logic; Engram handles it.
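
To make the idea of storage-level scoping concrete, here is a toy in-memory model. It is an illustration only and not how Engram is implemented: the class `ScopedMemoryStore` and its keyword matching are assumptions chosen to keep the sketch short.

```python
from collections import defaultdict


class ScopedMemoryStore:
    """Toy model of per-user scoping: one partition per (user_id, group)."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def add(self, text, user_id, group="default"):
        self._partitions[(user_id, group)].append(text)

    def search(self, query, user_id, group="default"):
        # Only the caller's own partition is ever scanned
        terms = query.lower().split()
        own = self._partitions.get((user_id, group), [])
        return [text for text in own if any(term in text.lower() for term in terms)]


store = ScopedMemoryStore()
store.add("Building a FastAPI microservice", user_id="alice")
store.add("Building a React dashboard", user_id="bob")

alice_hits = store.search("React dashboard", user_id="alice")
bob_hits = store.search("React dashboard", user_id="bob")
```

Because the lookup key includes the user_id, a query can never reach another user's partition, regardless of how well it matches.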

Step 8: Scale with async

For production applications handling multiple concurrent users, switch to AsyncEngramClient. It uses the same API but supports async/await for non-blocking operations.

async_client = AsyncEngramClient(
    api_key=os.environ["ENGRAM_API_KEY"]
)

Use asyncio.gather() to handle multiple user requests concurrently:

import asyncio


async def search_for_user(async_engram, user_id, query):
    """Search memories for a single user."""
    results = await async_engram.memories.search(
        query=query,
        user_id=user_id,
        group="default",
        retrieval_config=HybridRetrieval(limit=5),
    )
    return user_id, results


async def handle_concurrent_users():
    """Handle multiple users concurrently."""
    tasks = [
        search_for_user(async_client, f"tutorial-rag-alice-{_test_suffix}", "programming preferences"),
        search_for_user(async_client, f"tutorial-rag-bob-{_test_suffix}", "programming preferences"),
    ]
    results = await asyncio.gather(*tasks)

    for uid, memories in results:
        print(f"\n{uid}:")
        for m in memories:
            print(f" - {m.content}")

    return results


concurrent_results = asyncio.run(handle_concurrent_users())

Both searches run in parallel, reducing total latency compared to sequential calls.
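
With many users, two practical concerns appear: an unbounded gather can open too many requests at once, and one failed search raises out of the whole batch. The sketch below bounds concurrency with a semaphore and collects failures per user. The `search` callable is a hypothetical wrapper around the async Engram search, and `fake_search` is a stand-in so the example runs offline.

```python
import asyncio


async def search_many(search, user_ids, query, max_concurrency=10):
    """Run search(user_id, query) for every user, a bounded number at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(user_id):
        async with semaphore:
            return await search(user_id, query)

    outcomes = await asyncio.gather(
        *(guarded(uid) for uid in user_ids),
        return_exceptions=True,  # a failing user yields an exception object instead
    )
    # gather preserves input order, so zip pairs each user with its outcome
    return dict(zip(user_ids, outcomes))


# Stand-in for an async memory search (assumption, not an SDK call)
async def fake_search(user_id, query):
    await asyncio.sleep(0)
    if user_id == "broken":
        raise RuntimeError("search failed")
    return [f"{user_id}: {query}"]


outcomes = asyncio.run(search_many(fake_search, ["alice", "bob", "broken"], "preferences"))
```

Callers can then check each value with `isinstance(value, Exception)` and fall back to an unpersonalized answer for users whose search failed.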

Step 9: User data management

For privacy compliance (e.g. GDPR right to deletion), you can retrieve and delete individual memories or all memories for a user.

# Find Alice's memories that match a query
alice_memories = engram.memories.search(
    query="Python developer",
    user_id=f"tutorial-rag-alice-{_test_suffix}",
    group="default",
)

# Retrieve a specific memory by ID
if alice_memories:
    memory = engram.memories.get(
        alice_memories[0].id,
        user_id=f"tutorial-rag-alice-{_test_suffix}",
        group="default",
    )
    print(f"Retrieved: {memory.content}")

# Delete the memories returned by the search (e.g. for a GDPR right-to-deletion request)
for m in alice_memories:
    engram.memories.delete(m.id, user_id=f"tutorial-rag-alice-{_test_suffix}", group="default")
    print(f"Deleted: {m.id}")

# Verify deletion
remaining = engram.memories.search(
    query="Python developer",
    user_id=f"tutorial-rag-alice-{_test_suffix}",
    group="default",
)
print(f"Remaining memories for Alice: {len(remaining)}")

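Use memories.search() to find a user's memories, then call memories.delete() on each one. A single search may not return every memory the user has, so a full erasure can require more than one query. After deletion, the same search returns no results for that user.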
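
A single search-then-delete pass removes only what that one search returned. As a sketch, the loop below repeats the pass until the query comes back empty, with a round limit as a safety stop. It assumes `client` exposes `memories.search()` and `memories.delete()` with the arguments used in this tutorial; the function name `delete_matching_memories` is hypothetical.

```python
def delete_matching_memories(client, user_id, query, group="default", max_rounds=20):
    """Search and delete repeatedly until the query returns nothing.

    Returns the IDs that were deleted, in deletion order.
    """
    deleted = []
    for _ in range(max_rounds):
        batch = client.memories.search(query=query, user_id=user_id, group=group)
        if not batch:
            break  # nothing left that matches this query
        for memory in batch:
            client.memories.delete(memory.id, user_id=user_id, group=group)
            deleted.append(memory.id)
    return deleted
```

Logging the returned IDs gives you a record that the deletion request was carried out.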
