Switching vectorizers in Weaviate
This tutorial demonstrates two methods for migrating a Weaviate collection to a new vectorizer (embedding model) with minimal disruption to ongoing services. These techniques are helpful for scenarios such as model upgrades, provider changes, or performance optimization.
Prerequisites
Before starting this tutorial, ensure you have:
- A Weaviate Cloud instance (version v1.32 or newer)
- Python 3.8+ installed
- Required Python packages installed:
pip install weaviate-client datasets
- Environment variables set for your Weaviate Cloud credentials:
export WEAVIATE_URL="your-weaviate-cloud-url"
export WEAVIATE_API_KEY="your-api-key"
- Basic familiarity with Weaviate collections and vector search
Sign up for a free Weaviate Cloud sandbox at console.weaviate.cloud
Introduction
In a production environment, you might need to change your embedding model for several reasons. You may want to adopt a newer model for performance improvements like better search accuracy, or switch models due to the deprecation of your current model.
There are three basic steps when it comes to switching embedding models:
- Baseline performance analysis: Select a representative subset of the data. Using the existing embedding model, calculate a baseline for query/search accuracy. This metric will serve as the benchmark for comparison.
- New model evaluation: Generate new vector embeddings for the identical data sample using the updated model. Re-calculate the query/search accuracy using these new embeddings.
- Decision & deployment: Compare the accuracy results from the new model against the established baseline. If the new embeddings demonstrate a clear improvement in performance, proceed with deploying the updated model system-wide.
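The evaluation loop above can be sketched in plain Python. This is a minimal sketch, not a Weaviate API: it assumes you have already run each test query against both models and collected the ranked result IDs, plus a hand-labelled set of relevant IDs per query. Recall@k and the `min_gain` threshold are illustrative choices, and `recall_at_k`, `mean_recall_at_k`, and `should_switch` are hypothetical helper names.

```python
# Sketch only: the inputs are hypothetical stand-ins for your own evaluation set
# (ranked result IDs per query and hand-labelled relevant IDs per query).


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant IDs that appear in the top-k ranked results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def mean_recall_at_k(results_per_query, relevance, k):
    """Average recall@k over every query that has relevance labels."""
    scores = [recall_at_k(results_per_query[q], relevance[q], k) for q in relevance]
    return sum(scores) / len(scores)


def should_switch(baseline_score, candidate_score, min_gain=0.02):
    """Deploy the new model only if it beats the baseline by at least min_gain."""
    return candidate_score - baseline_score >= min_gain


# Toy evaluation set: relevant product IDs per query, plus each model's ranking
relevance = {"q1": {"a", "b"}, "q2": {"c"}}
old_model_results = {"q1": ["a", "x", "y"], "q2": ["z", "c", "y"]}
new_model_results = {"q1": ["a", "b", "y"], "q2": ["c", "x", "z"]}

baseline = mean_recall_at_k(old_model_results, relevance, k=3)   # 0.75
candidate = mean_recall_at_k(new_model_results, relevance, k=3)  # 1.0
print(should_switch(baseline, candidate))  # True
```

Other ranking metrics such as MRR or nDCG fit the same structure; what matters is computing the same metric on the same query set for both models.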
This tutorial demonstrates two approaches for switching vectorizers in your application:
- Method A (collection aliases): This approach uses aliases to instantly switch between separate collections. It's perfect for a complete model replacement, minimizing risk and providing an immediate rollback option.
- Method B (adding new vectors): This method allows multiple vectors per data object within a single collection. It's ideal for testing new models alongside existing ones.
For most production use cases, we recommend using collection aliases for a seamless and reversible migration.
Step 0: Create a demo collection (optional)
If you want to follow along and run the code snippets, the steps below create a collection and import a demo dataset with precomputed vector embeddings.
Step 0.1: Connect to Weaviate Cloud
First, establish a connection to your Weaviate Cloud instance:
# Connect to Weaviate Cloud
import os
import weaviate
from weaviate.auth import Auth
# Best practice: store your credentials in environment variables
weaviate_url = os.environ["WEAVIATE_URL"]
weaviate_api_key = os.environ["WEAVIATE_API_KEY"]
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=weaviate_url,
    auth_credentials=Auth.api_key(weaviate_api_key),
)
print(client.is_ready()) # Should print: `True`
Step 0.2: Create a collection
Create a collection that accepts self-provided vectors. We will use the existing embeddings from the Hugging Face e-commerce dataset:
from weaviate.classes.config import Configure, DataType, Property
# Create collection with original vector configuration
# self_provided: Weaviate stores the vectors you supply and does not vectorize them itself
products = client.collections.create(
    name="ECommerceProducts",
    vector_config=[
        Configure.Vectors.self_provided(
            name="original_vector",  # Name for existing vectors
        )
    ],
    properties=[
        Property(name="category", data_type=DataType.TEXT),
        Property(name="name", data_type=DataType.TEXT),
        Property(name="description", data_type=DataType.TEXT),
        Property(name="price", data_type=DataType.NUMBER),
    ],
)
Step 0.3: Import data
We'll use the Weaviate ECommerce dataset from Hugging Face, which contains clothing items with pre-computed embeddings. This dataset represents a real-world scenario where you have existing vector embeddings that need migration.
The dataset includes:
- Product information (name, description, category, price, brand, etc.)
- Pre-computed 768-dimensional vectors
from datasets import load_dataset
def load_ecommerce_data():
    """
    Load the Weaviate ECommerce dataset from Hugging Face.
    This dataset contains clothing items with pre-computed vectors.
    """
    # Load the dataset from Hugging Face (streaming reads items lazily)
    dataset = load_dataset(
        "weaviate/agents", "query-agent-ecommerce", split="train", streaming=True
    )
    # Convert to list for easier handling
    # Note: Limited to 10 items for demo purposes
    # In production, process the full dataset or use batching
    ecommerce_data = []
    for i, item in enumerate(dataset):
        if i >= 10:
            break
        ecommerce_data.append(
            {"properties": item["properties"], "vector": item["vector"]}
        )
    return ecommerce_data
# Load the data once for use in examples
ecommerce_data = load_ecommerce_data()
print(f"Loaded {len(ecommerce_data)} items from ECommerce dataset")
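Because all vectors in a single named vector index are expected to share one dimensionality, it can be worth checking the loaded items before importing them. The sketch below assumes the item shape produced by `load_ecommerce_data()` (a dict with `"properties"` and `"vector"` keys) and the 768 dimensions stated above; `find_bad_vectors` is a hypothetical helper, not part of the Weaviate client.

```python
# Hypothetical pre-import check; expected_dim=768 mirrors the dataset description above.


def find_bad_vectors(items, expected_dim=768):
    """Return indices of items whose "vector" is missing or has the wrong length."""
    bad = []
    for i, item in enumerate(items):
        vector = item.get("vector")
        if vector is None or len(vector) != expected_dim:
            bad.append(i)
    return bad


# Toy items: one valid, one with the wrong dimension, one without a vector
sample = [
    {"properties": {"name": "Tee"}, "vector": [0.1] * 768},
    {"properties": {"name": "Cap"}, "vector": [0.2] * 512},
    {"properties": {"name": "Sock"}},
]
print(find_bad_vectors(sample))  # [1, 2]
```

Against the real data, `find_bad_vectors(ecommerce_data)` should return an empty list if every item carries a 768-dimensional vector.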
Import your e-commerce data along with the pre-computed vectors:
# Import data with existing vectors
products = client.collections.use("ECommerceProducts")
with products.batch.fixed_size(batch_size=200) as batch:
    for item in ecommerce_data:
        batch.add_object(
            properties=item["properties"],
            vector={"original_vector": item["vector"]},
        )
failed_objects = products.batch.failed_objects
if failed_objects:
    print(f"Number of failed imports: {len(failed_objects)}")
else:
    print(f"Successfully imported {len(ecommerce_data)} products with original vectors")
Step 0.4: Query with original vectors
Verify that searches work with your original vectors:
# Query using original vectors
products = client.collections.use("ECommerceProducts")
# For the demo, use the first item's vector as the query to find similar products
query_vector = ecommerce_data[0]["vector"]
results = products.query.near_vector(
    near_vector=query_vector,
    target_vector="original_vector",  # Specify which vector to search
    limit=3,
    return_properties=["name", "description", "price", "category"],
)
print("Search results with original vectors:")
for obj in results.objects:
    print(
        f"- {obj.properties['name']} ({obj.properties['category']}): ${obj.properties['price']}"
    )
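Conceptually, `near_vector` ranks objects by their distance to the query vector under the collection's distance metric, which defaults to cosine distance (1 minus cosine similarity). The brute-force sketch below illustrates that ranking on toy 3-dimensional vectors. It is an illustration only: Weaviate uses an approximate HNSW index by default rather than scoring every object, and `cosine_distance`, `brute_force_near_vector`, and the catalog data are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, the form used by Weaviate's default "cosine" metric."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_near_vector(query, objects, limit=3):
    """Exact top-k search: score every object and keep the `limit` closest."""
    scored = [(cosine_distance(query, vec), name) for name, vec in objects.items()]
    scored.sort()  # smallest distance first
    return [name for _, name in scored[:limit]]


# Toy catalog of product names mapped to made-up embeddings
catalog = {
    "running shoes": [0.9, 0.1, 0.0],
    "track jacket": [0.8, 0.3, 0.1],
    "silk scarf": [0.0, 0.2, 0.9],
}
print(brute_force_near_vector([1.0, 0.2, 0.0], catalog, limit=2))
# ['running shoes', 'track jacket']
```

An exact scan like this grows linearly with the number of objects, which is why a vector database relies on an approximate index instead; the ranking principle is the same.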
Method A: Collection aliases migration
Collection aliases allow instant switching between collections with different vectorizers. Your application queries an alias rather than a concrete collection name. Once you create a new collection with a different embedding model, you update the alias to point to the new collection, and from then on all queries that use the alias are served with the new vector embeddings.
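The indirection that makes aliases useful can be modelled with a small in-memory registry: application code only ever resolves the alias, so repointing it changes what every query hits without touching the query code. `AliasRegistry` below is a hypothetical toy, not Weaviate's implementation. In particular, the rollback history exists only for illustration; with Weaviate you would roll back by calling `client.alias.update` again with the previous collection name.

```python
class AliasRegistry:
    """Toy model of alias indirection (hypothetical, not Weaviate's implementation)."""

    def __init__(self):
        self._targets = {}  # alias name -> collection it currently points to
        self._history = {}  # alias name -> previous targets, newest last

    def create(self, alias, collection):
        if alias in self._targets:
            raise ValueError(f"alias {alias!r} already exists")
        self._targets[alias] = collection
        self._history[alias] = []

    def update(self, alias, new_collection):
        # Remember the old target so the switch can be undone
        self._history[alias].append(self._targets[alias])
        self._targets[alias] = new_collection

    def rollback(self, alias):
        # Point the alias back at the previous collection (IndexError if none)
        self._targets[alias] = self._history[alias].pop()

    def resolve(self, alias):
        # What the application calls instead of naming a collection directly
        return self._targets[alias]


registry = AliasRegistry()
registry.create("ECommerceProduction", "ECommerceProducts")
registry.update("ECommerceProduction", "ECommerce_v2")
print(registry.resolve("ECommerceProduction"))  # ECommerce_v2
registry.rollback("ECommerceProduction")
print(registry.resolve("ECommerceProduction"))  # ECommerceProducts
```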
Step 1: Create collection alias and query using alias
Create an alias that your application will use:
# Create an alias pointing to the current production collection
client.alias.create(
    alias_name="ECommerceProduction", target_collection="ECommerceProducts"
)
Your application should always use the alias:
# Your application always uses the alias name
products = client.collections.use("ECommerceProduction")
# Query using the alias (currently points to ECommerceProducts)
# Use a vector from our dataset as query
query_vector = ecommerce_data[0]["vector"]
results = products.query.near_vector(
    near_vector=query_vector,
    target_vector="original_vector",
    limit=3,
    return_properties=["name", "price"],
)
print("Query results via alias:")
for obj in results.objects:
    print(f"{obj.properties['name']}: ${obj.properties['price']}")
Step 2: Create new collection with Weaviate Embeddings
Create a new collection using the Weaviate Embeddings vectorizer:
from weaviate.classes.config import Configure, DataType, Property
# Create new collection with Weaviate Embeddings vectorizer
client.collections.delete("ECommerce_v2") # Clean up if exists
products_v2 = client.collections.create(
    name="ECommerce_v2",
    vector_config=Configure.Vectors.text2vec_weaviate(
        name="new_vector",
        source_properties=["name", "description"],  # Properties to vectorize
    ),
    properties=[
        Property(name="category", data_type=DataType.TEXT),
        Property(name="name", data_type=DataType.TEXT),
        Property(name="description", data_type=DataType.TEXT),
        Property(name="price", data_type=DataType.NUMBER),
    ],
)
print("Created ECommerce_v2 with Weaviate Embeddings")
Choose a different embedding model
If you prefer a different model provider integration, or prefer to import your own vectors, see one of the following guides:
- The embedding model providers page, for information on other available vectorizers, such as AWS, Cohere, Google, and many more.
- The Bring Your Own Vectors starter guide, if you prefer to add custom vectors yourself along with the object data.
Step 3: Migrate data to new collection
Copy data to the new collection (vectors will be auto-generated):
# Migrate data to new collection (vectors will be auto-generated)
products_v1 = client.collections.use("ECommerceProducts")
products_v2 = client.collections.use("ECommerce_v2")
# Read all objects from v1; the iterator pages through the whole collection
all_products = list(products_v1.iterator())
# Import to v2 without vectors; Weaviate Embeddings generates "new_vector".
# Reusing each object's UUID keeps IDs stable across the two collections.
with products_v2.batch.fixed_size(batch_size=200) as batch:
    for obj in all_products:
        batch.add_object(
            properties=obj.properties,
            uuid=obj.uuid,
        )
failed_objects = products_v2.batch.failed_objects
if failed_objects:
    print(f"Number of failed migrations: {len(failed_objects)}")
else:
    print(f"Migrated {len(all_products)} products to ECommerce_v2")
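Before switching the alias, it can help to confirm that the new collection contains exactly the objects of the old one. The sketch below compares two lists of keys as multisets, so both missing objects and duplicates show up. It assumes you have already collected one key per object from each collection, for example UUIDs read via each collection's `iterator()` if the migration reused them, or another property that is unique in your data; `compare_collections` and the sample keys are hypothetical.

```python
from collections import Counter


def compare_collections(source_keys, target_keys):
    """Report keys missing from, or unexpectedly present in, the target.

    Counter keeps duplicates visible, e.g. an object that was imported twice.
    """
    source, target = Counter(source_keys), Counter(target_keys)
    return {
        "missing": sorted((source - target).elements()),
        "extra": sorted((target - source).elements()),
    }


v1_keys = ["p1", "p2", "p3"]
v2_keys = ["p1", "p3", "p3"]  # p2 was lost and p3 was imported twice
report = compare_collections(v1_keys, v2_keys)
print(report)  # {'missing': ['p2'], 'extra': ['p3']}
```

An empty report for both keys is a reasonable precondition for running the alias switch.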
Step 4: Switch the alias and query using alias
Update the alias to point to the new collection:
# Switch the alias to the new collection (instant switch!)
client.alias.update(
    alias_name="ECommerceProduction", new_target_collection="ECommerce_v2"
)
print("Switched alias 'ECommerceProduction' -> 'ECommerce_v2'")
# Now queries using the alias automatically use the new collection
products = client.collections.use("ECommerceProduction")
results = products.query.near_text(
    query="comfortable athletic wear",
    target_vector="new_vector",
    limit=3,
    return_properties=["name", "price", "category"],
)
print("\nQuery results after switch (now using Weaviate Embeddings):")
for obj in results.objects:
    print(
        f"- {obj.properties['name']} ({obj.properties['category']}): ${obj.properties['price']}"
    )
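After verifying the results, you can remove the old collection. Skip this step if you also want to follow Method B below, which continues from the ECommerceProducts collection; keeping it also preserves the option to switch the alias back.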
# Optional: delete old collection if no longer needed
client.collections.delete("ECommerceProducts")
Method B: Add new vector
Weaviate allows you to have multiple vector representations in the same collection. Once a new vector is created and all the objects in the database are vectorized, you can use the new vector instead of the original one.
Step 1: Add new vector with Weaviate Embeddings
Add a new vector using Weaviate's built-in embedding service, Weaviate Embeddings:
# Add a new vector with Weaviate Embeddings vectorizer
from weaviate.classes.config import Configure
products = client.collections.use("ECommerceProducts")
# Add new vector configuration to existing collection
products.config.add_vector(
    vector_config=Configure.Vectors.text2vec_weaviate(  # Add new Weaviate Embeddings vector
        name="new_vector",
        source_properties=["name", "description"],  # Properties to vectorize
    ),
)
print("Added new Weaviate Embeddings vector to collection")
Step 2: Trigger vectorization
Adding a new vector does not vectorize the objects that are already in the collection. To trigger vectorization with the new embedding model, you will need to delete and re-insert the objects in question. In this example, we delete and re-insert the whole dataset.
# Re-insert all objects to trigger vectorization with the new vectorizer
from weaviate.classes.query import Filter
products = client.collections.use("ECommerceProducts")
# Collect every object UUID. The iterator pages through the whole collection,
# while fetch_objects() without a limit may return only a default-sized page.
all_uuids = [obj.uuid for obj in products.iterator()]
products.data.delete_many(
    where=Filter.by_id().contains_any(all_uuids)
)
# Insert all objects again: "original_vector" is supplied from the dataset,
# and Weaviate Embeddings generates "new_vector" during import
with products.batch.fixed_size(batch_size=200) as batch:
    for item in ecommerce_data:
        batch.add_object(
            properties=item["properties"],
            vector={"original_vector": item["vector"]},
        )
failed_objects = products.batch.failed_objects
if failed_objects:
    print(f"Number of failed imports: {len(failed_objects)}")
else:
    print(f"Triggered vectorization for {len(ecommerce_data)} objects")
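Objects that existed before `add_vector` and were not re-inserted have no `new_vector` embedding, so queries that target `new_vector` will not find them. A simple coverage check is sketched below. It assumes you have read back each object's named vectors into a dict keyed by object ID (with the v4 Python client, iterating with `include_vector=True` should expose them as a dict on `obj.vector`); `missing_named_vector` and the sample snapshot are hypothetical.

```python
def missing_named_vector(objects, vector_name):
    """Return IDs of objects that have no embedding under `vector_name`.

    `objects` maps an object ID to its dict of named vectors (hypothetical input shape).
    """
    return sorted(
        obj_id
        for obj_id, vectors in objects.items()
        if not vectors.get(vector_name)  # absent or empty counts as missing
    )


snapshot = {
    "a1": {"original_vector": [0.1, 0.2], "new_vector": [0.3, 0.4, 0.5]},
    "b2": {"original_vector": [0.2, 0.1]},  # e.g. never re-inserted after add_vector
}
print(missing_named_vector(snapshot, "new_vector"))  # ['b2']
```

An empty result means every object can be reached through the new vector.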
Step 3: Query with the new vector
Now you can query using the new vector:
# Query using the new Weaviate Embeddings vector
products = client.collections.use("ECommerceProducts")
# Now we can use text search with the new vector
results = products.query.near_text(
    query="comfortable athletic wear",
    target_vector="new_vector",  # Use the new vector
    limit=3,
    return_properties=["name", "description", "price", "category"],
)
print("\nSearch results with Weaviate Embeddings:")
for obj in results.objects:
    print(
        f"- {obj.properties['name']} ({obj.properties['category']}): ${obj.properties['price']}"
    )
Step 4: Cleanup and important considerations
A critical point to remember is that you cannot delete a named vector from a collection once it has been added, so both the original and new vectors will coexist for each object.
This means your storage usage for this collection will permanently increase. Because of this limitation, adding new vectors is best suited for temporary testing, or for cases where a permanent increase in storage is acceptable, rather than for production migrations where a clean state is needed.
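To get a feel for that permanent increase, a rough estimate of the added raw vector data is objects × dimensions × 4 bytes, since each component is a 32-bit float. The sketch below computes only this raw figure. It assumes uncompressed vectors and ignores index structures such as the HNSW graph, so real usage will typically be higher, while vector compression such as PQ or BQ would reduce it. The object count and the 1,024-dimension figure are hypothetical examples, not properties of Weaviate Embeddings.

```python
def raw_vector_bytes(num_objects, dimensions, bytes_per_component=4):
    """Uncompressed size of one named vector across a collection (float32 = 4 bytes).

    Ignores index structures such as the HNSW graph, so actual usage is higher.
    """
    return num_objects * dimensions * bytes_per_component


# Hypothetical collection: 100,000 objects, new vector with 1,024 dimensions
extra = raw_vector_bytes(100_000, 1024)
print(extra)                      # 409600000
print(round(extra / 1024**2, 1))  # 390.6 (MiB)
```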
Choosing the right method
While both methods achieve the goal of switching vectorizers, we strongly recommend using collection aliases for production migrations. This approach provides a reversible and efficient way to upgrade your embedding models without long-term side effects like increased storage.
Method A: Collection aliases (Recommended)
- Zero-downtime switch: Instantly redirect application traffic to the new collection.
- Instant rollback: If issues arise, you can immediately switch the alias back to the old collection.
- Clean architecture: Old and new models are completely isolated in separate collections.
- No storage overhead: After migration, the old collection can be deleted, reclaiming all its storage space.
Method B: Adding a new vector
- Good for experimentation: Useful for quickly comparing model performance within a single dataset.
- Permanent storage increase: The old vector cannot be deleted, leading to a larger collection size and higher costs.
- No simple rollback: Reverting requires application-level changes to specify the old vector name in every query.
Best practices
- Test thoroughly: Always test the new vectorizer with a subset of data first.
- Monitor quality: Compare search results between old and new models.
- Plan for rollback: Keep the old setup available until you're confident. The alias method makes it easy to roll back.
- Consider costs: Different vectorizers have different pricing models. Remember that adding new vectors will increase your storage costs.
- Document changes: Track which model versions and configurations you're using.
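For the "Monitor quality" practice, one lightweight signal is how much the top-k results of the old and new models overlap for the same query. The sketch assumes you already have the ranked result IDs from each model per query; `overlap_at_k` and the sample IDs are hypothetical. Overlap is not a quality score: a low value only flags queries worth inspecting by hand or against labelled relevance data.

```python
def overlap_at_k(old_ids, new_ids, k):
    """Share of the old model's top-k results that also appear in the new top-k."""
    old_top, new_top = set(old_ids[:k]), set(new_ids[:k])
    if not old_top:
        return 0.0
    return len(old_top & new_top) / len(old_top)


old = ["p1", "p2", "p3"]  # top results from the original model
new = ["p2", "p9", "p1"]  # top results from the new model
print(overlap_at_k(old, new, k=3))  # two of three shared, about 0.67
```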
Summary
This tutorial demonstrated two powerful approaches for migrating vectorizers in Weaviate:
- Collection aliases: Best for instant, zero-downtime switching between collections with different vectorizers.
- Adding new vectors: Creating new vector embeddings in an existing collection, best for experimentation.
For production environments, the collection alias method is the preferred approach due to its safety, flexibility, and clean rollback capabilities.
Questions and feedback
If you have any questions or feedback, let us know in the user forum.