Basic collection operations
Every object in Weaviate belongs to exactly one collection. Use the examples on this page to manage your collections.
Newer Weaviate documentation discusses "collections." Older Weaviate documentation refers to "classes" instead. Expect to see both terms throughout the documentation.
Starting with Weaviate Python client v4.16.0, the vectorizer configuration API has been updated.
Starting with Weaviate JS/TS client v3.8.0, the vectorizer configuration API has been updated.
Action required: Update to the latest client version and migrate your code to use the new vectorizer configuration API.
Create a collection
To create a collection, specify at least the collection name. If you don't specify any properties, auto-schema creates them.
Weaviate follows GraphQL naming conventions.
- Start collection names with an upper case letter.
- Start property names with a lower case letter.
If you use an initial upper case letter to define a property name, Weaviate changes it to a lower case letter internally.
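The naming conventions above can be sketched as two small helpers. This is only an illustration of the first-letter rules described here, not Weaviate's own validation logic; the function names `to_collection_name` and `to_property_name` are hypothetical.

```python
def to_collection_name(name: str) -> str:
    """Return `name` with its first character upper-cased (collection style)."""
    if not name:
        raise ValueError("name must not be empty")
    return name[0].upper() + name[1:]


def to_property_name(name: str) -> str:
    """Return `name` with its first character lower-cased (property style)."""
    if not name:
        raise ValueError("name must not be empty")
    return name[0].lower() + name[1:]


print(to_collection_name("article"))   # Article
print(to_property_name("OnHomepage"))  # onHomepage
```

Normalizing names on the client side before you create a collection keeps the names in your code identical to the names stored in the schema.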
If a snippet doesn't work or you have feedback, please open a GitHub issue.
client.collections.create("Article")
- Manually define your data schema: Avoid using the auto-schema feature. Instead, manually define the properties for your collection.
- Avoid creating too many collections: Using too many collections can lead to scalability issues like high memory usage and degraded query performance. Instead, consider using multi-tenancy, where a single collection is subdivided into multiple tenants. For more details, see Starter Guides: Scaling limits with collections.
Create a collection and define properties
Properties are the data fields in your collection. Each property has a name and a data type.
Additional information
Use properties to configure additional parameters such as data type, index characteristics, or tokenization.
from weaviate.classes.config import Property, DataType
# Note that you can use `client.collections.create_from_dict()` to create a collection from a v3-client-style JSON object
client.collections.create(
    "Article",
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="body", data_type=DataType.TEXT),
    ],
)
Create a collection with a vectorizer
Specify a vectorizer for a collection that will generate vector embeddings when creating objects and executing vector search queries.
from weaviate.classes.config import Configure, Property, DataType
client.collections.create(
    "Article",
    vector_config=Configure.Vectors.text2vec_openai(),
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="body", data_type=DataType.TEXT),
    ],
)
Find out more about the vectorizer and vector index configuration in Manage collections: Vectorizer and vector index.
Disable auto-schema
By default, Weaviate creates missing collections and missing properties. When you configure collections manually, you have more precise control of the collection settings.
To disable auto-schema, set AUTOSCHEMA_ENABLED: 'false' in your system configuration file.
Check if a collection exists
Get a boolean indicating whether a given collection exists.
exists = client.collections.exists("Article") # Returns a boolean
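The existence check combines naturally with creation, for example when auto-schema is disabled and collections must be defined explicitly. The following is a minimal sketch, assuming `client` is a connected Weaviate Python client (v4.16.0 or later) and that `client.collections.create()` returns the new collection object; the helper name `get_or_create_collection` is hypothetical.

```python
from typing import Any


def get_or_create_collection(client: Any, name: str, **create_kwargs: Any) -> Any:
    """Return the collection called `name`, creating it first if it is missing.

    `client` is assumed to be a connected Weaviate Python client.
    """
    if client.collections.exists(name):
        # The collection is already defined, so reuse it as-is.
        return client.collections.use(name)
    # Extra keyword arguments (e.g. properties=...) are forwarded unchanged.
    return client.collections.create(name, **create_kwargs)
```

Note that the check and the creation are two separate requests, so this sketch does not protect against another process creating the same collection in between.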
Read a single collection definition
Retrieve a collection definition from the schema.
articles = client.collections.use("Article")
articles_config = articles.config.get()
print(articles_config)
Sample configuration: Text objects
This configuration for text objects defines the following:
- The collection name (Article)
- The vectorizer module (text2vec-cohere) and model (embed-multilingual-v2.0)
- A set of properties (title, body) with text data types.
{
  "class": "Article",
  "vectorizer": "text2vec-cohere",
  "moduleConfig": {
    "text2vec-cohere": {
      "model": "embed-multilingual-v2.0"
    }
  },
  "properties": [
    {
      "name": "title",
      "dataType": ["text"]
    },
    {
      "name": "body",
      "dataType": ["text"]
    }
  ]
}
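A definition of this shape can also be assembled in code. The sketch below builds the same dictionary as the sample above using only the standard library; the helper name `build_text_collection_definition` and its parameters are hypothetical and not part of any Weaviate client.

```python
import json
from typing import Any, Dict, List


def build_text_collection_definition(
    name: str,
    vectorizer: str,
    model: str,
    text_properties: List[str],
) -> Dict[str, Any]:
    """Build a collection definition whose properties all use the text data type."""
    return {
        "class": name,
        "vectorizer": vectorizer,
        # Module settings are keyed by the vectorizer module name.
        "moduleConfig": {vectorizer: {"model": model}},
        "properties": [
            {"name": prop, "dataType": ["text"]} for prop in text_properties
        ],
    }


definition = build_text_collection_definition(
    "Article", "text2vec-cohere", "embed-multilingual-v2.0", ["title", "body"]
)
print(json.dumps(definition, indent=2))
```

As the comment in the properties example on this page notes, the Python client offers `client.collections.create_from_dict()` for creating a collection from a JSON-style definition such as this one.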
Sample configuration: Nested objects
Available from Weaviate v1.22.
This configuration for nested objects defines the following:
- The collection name (Person)
- The vectorizer module (text2vec-huggingface)
- A set of properties (last_name, address)
  - last_name has text data type
  - address has object data type
- The address property has two nested properties (street and city)
{
  "class": "Person",
  "vectorizer": "text2vec-huggingface",
  "properties": [
    {
      "dataType": ["text"],
      "name": "last_name"
    },
    {
      "dataType": ["object"],
      "name": "address",
      "nestedProperties": [
        { "dataType": ["text"], "name": "street" },
        { "dataType": ["text"], "name": "city" }
      ]
    }
  ]
}
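To make the nesting concrete, the sketch below walks the properties of the sample definition above and lists every property together with its data type. The dotted path notation (for example `address.street`) and the helper name `iter_property_paths` are illustrative assumptions, not Weaviate syntax.

```python
from typing import Any, Dict, Iterator, List, Tuple

# The "properties" section of the sample Person definition.
person_properties: List[Dict[str, Any]] = [
    {"dataType": ["text"], "name": "last_name"},
    {
        "dataType": ["object"],
        "name": "address",
        "nestedProperties": [
            {"dataType": ["text"], "name": "street"},
            {"dataType": ["text"], "name": "city"},
        ],
    },
]


def iter_property_paths(
    properties: List[Dict[str, Any]], prefix: str = ""
) -> Iterator[Tuple[str, str]]:
    """Yield (path, data type) for each property, descending into nested ones."""
    for prop in properties:
        path = prefix + prop["name"]
        yield path, prop["dataType"][0]
        # Properties without "nestedProperties" simply end the recursion.
        yield from iter_property_paths(prop.get("nestedProperties", []), path + ".")


for path, data_type in iter_property_paths(person_properties):
    print(f"{path}: {data_type}")
```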
Sample configuration: Generative search
This configuration for retrieval augmented generation defines the following:
- The collection name (Article)
- The default vectorizer module (text2vec-openai)
- The generative module (generative-openai)
- A set of properties (title, chunk, chunk_no and url)
- The tokenization option for the url property
- The vectorization option (skip vectorization) for the url property
{
  "class": "Article",
  "vectorizer": "text2vec-openai",
  "vectorIndexConfig": {
    "distance": "cosine"
  },
  "moduleConfig": {
    "generative-openai": {}
  },
  "properties": [
    {
      "name": "title",
      "dataType": ["text"]
    },
    {
      "name": "chunk",
      "dataType": ["text"]
    },
    {
      "name": "chunk_no",
      "dataType": ["int"]
    },
    {
      "name": "url",
      "dataType": ["text"],
      "tokenization": "field",
      "moduleConfig": {
        "text2vec-openai": {
          "skip": true
        }
      }
    }
  ]
}
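When reviewing a definition like this one, it can help to list which properties are excluded from vectorization. The following sketch reads the per-property `moduleConfig` shown in the sample above; the helper name `skipped_properties` is hypothetical, and the definition is abbreviated to the fields the helper reads.

```python
from typing import Any, Dict, List

# Abbreviated copy of the sample definition above.
article_definition: Dict[str, Any] = {
    "class": "Article",
    "vectorizer": "text2vec-openai",
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "chunk", "dataType": ["text"]},
        {"name": "chunk_no", "dataType": ["int"]},
        {
            "name": "url",
            "dataType": ["text"],
            "tokenization": "field",
            "moduleConfig": {"text2vec-openai": {"skip": True}},
        },
    ],
}


def skipped_properties(definition: Dict[str, Any]) -> List[str]:
    """Return the names of properties marked "skip" for the collection's vectorizer."""
    vectorizer = definition["vectorizer"]
    skipped = []
    for prop in definition.get("properties", []):
        # Properties without module settings are treated as not skipped.
        module_settings = prop.get("moduleConfig", {}).get(vectorizer, {})
        if module_settings.get("skip", False):
            skipped.append(prop["name"])
    return skipped


print(skipped_properties(article_definition))
```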
Sample configuration: Images
This configuration for image search defines the following:
- The collection name (Image)
- The vectorizer module (img2vec-neural)
  - The image property configures the collection to store image data.
- The vector index distance metric (cosine)
- A set of properties (image), with the image property set as blob.
For image searches, see Image search.
{
  "class": "Image",
  "vectorizer": "img2vec-neural",
  "vectorIndexConfig": {
    "distance": "cosine"
  },
  "moduleConfig": {
    "img2vec-neural": {
      "imageFields": ["image"]
    }
  },
  "properties": [
    {
      "name": "image",
      "dataType": ["blob"]
    }
  ]
}
Read all collection definitions
Fetch the database schema to retrieve all of the collection definitions.
response = client.collections.list_all(simple=False)
print(response)
Update a collection definition
You cannot change the replication factor of a collection by updating the collection's definition.
From v1.32, you can change the replication factor of a shard by using replica movement.
You can update a collection definition to change the mutable collection settings.
from weaviate.classes.config import (
    Reconfigure,
    VectorFilterStrategy,
    ReplicationDeletionStrategy,
)
articles = client.collections.use("Article")
# Update the collection definition
articles.config.update(
    description="An updated collection description.",
    property_descriptions={
        "title": "The updated title description for article",
    },  # Available from Weaviate v1.31.0
    inverted_index_config=Reconfigure.inverted_index(bm25_k1=1.5),
    vector_config=Reconfigure.Vectors.update(
        name="default",
        vector_index_config=Reconfigure.VectorIndex.hnsw(
            filter_strategy=VectorFilterStrategy.ACORN  # Available from Weaviate v1.27.0
        ),
    ),
    replication_config=Reconfigure.replication(
        deletion_strategy=ReplicationDeletionStrategy.TIME_BASED_RESOLUTION  # Available from Weaviate v1.28.0
    ),
)
# Update the status of one or more shards.
# `shard_names` must be defined beforehand: a single shard name (str)
# or a list of shard names (List[str]).
articles = client.collections.use("Article")
article_shards = articles.config.update_shards(
    status="READY",
    shard_names=shard_names,
)
print(article_shards)
Delete a collection
You can delete any unwanted collection(s), along with the data that they contain.
When you delete a collection, you delete all associated objects!
Be very careful with deletes on a production database and anywhere else that you have important data.
This code deletes a collection and its objects.
# collection_name can be a string ("Article") or a list of strings (["Article", "Category"])
client.collections.delete(
    collection_name
)  # THIS WILL DELETE THE SPECIFIED COLLECTION(S) AND THEIR OBJECTS
# Note: you can also delete all collections in the Weaviate instance with:
# client.collections.delete_all()
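Because a delete removes all associated objects, one cautious pattern is to require an explicit confirmation flag and to default to a dry run. The following is a sketch of that idea, assuming `client` is a connected Weaviate Python client; the helper `delete_collections_checked` and its `confirm` parameter are hypothetical and not part of the client library.

```python
from typing import Any, Iterable, List


def delete_collections_checked(
    client: Any, names: Iterable[str], confirm: bool = False
) -> List[str]:
    """Delete the named collections that exist, but only when `confirm` is True.

    Returns the names that exist. With confirm=False nothing is deleted,
    so the return value shows what a real run would remove.
    """
    existing = [name for name in names if client.collections.exists(name)]
    if not confirm:
        # Dry run: report the affected collections without touching them.
        return existing
    if existing:
        # THIS DELETES THE COLLECTIONS AND ALL OF THEIR OBJECTS.
        client.collections.delete(existing)
    return existing
```

Calling the helper once without `confirm=True` gives you a chance to review the list before any data is removed.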
Add a property
Indexing limitations after data import
There are no index limitations when you add collection properties before you import data.
If you add a new property after you import data, there is an impact on indexing.
Property indexes are built at import time. If you add a new property after importing some data, the indexes for pre-existing objects aren't automatically updated to include the new property. This means pre-existing objects aren't added to the new property index. Queries may return unexpected results because the index only includes new objects.
To create an index that includes all of the objects in a collection, do one of the following:
- New collections: Add all of the collection's properties before importing objects.
- Existing collections: Export the existing data from the collection. Re-create it with the new property. Import the data into the updated collection.
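The effect described above can be illustrated with a toy model in plain Python. This is not how Weaviate implements its indexes; `ToyCollection` is a hypothetical class that only mimics the one behavior that matters here: a property index is updated when an object is imported, and never retroactively.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set


class ToyCollection:
    """Toy model: each property index is only updated when an object is imported."""

    def __init__(self, properties: List[str]) -> None:
        self.properties: List[str] = list(properties)
        self.objects: List[Dict[str, Any]] = []
        # property name -> value -> ids of the objects holding that value
        self.indexes: Dict[str, Dict[Any, Set[int]]] = {
            prop: defaultdict(set) for prop in self.properties
        }

    def add_property(self, name: str) -> None:
        # Starts an empty index; objects imported earlier are not revisited.
        self.properties.append(name)
        self.indexes[name] = defaultdict(set)

    def insert(self, obj: Dict[str, Any]) -> int:
        object_id = len(self.objects)
        self.objects.append(obj)
        for prop in self.properties:
            self.indexes[prop][obj.get(prop)].add(object_id)
        return object_id

    def filter_equal(self, prop: str, value: Any) -> Set[int]:
        return set(self.indexes[prop].get(value, set()))


articles = ToyCollection(["title"])
articles.insert({"title": "Old article"})  # id 0, imported before the change
articles.add_property("onHomepage")
articles.insert({"title": "New article", "onHomepage": False})  # id 1

# Object 0 has no onHomepage value, yet the new index knows nothing about it.
print(articles.filter_equal("onHomepage", None))   # set()
print(articles.filter_equal("onHomepage", False))  # {1}

# Remedy from the list above: re-create with all properties, then re-import.
rebuilt = ToyCollection(["title", "onHomepage"])
for obj in articles.objects:
    rebuilt.insert(obj)
print(rebuilt.filter_equal("onHomepage", None))    # {0}
```

In the toy model, the rebuilt collection answers the same filter correctly because every object passed through the import path after the property existed.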
We are working on a re-indexing API to allow you to re-index the data after adding a property. This will be available in a future release.
from weaviate.classes.config import Property, DataType
articles = client.collections.use("Article")
articles.config.add_property(Property(name="onHomepage", data_type=DataType.BOOL))
Further resources
- Manage collections: Vectorizer and vector index
- References: Collection definition
- Concepts: Data structure
- API References: REST: Schema
Questions and feedback
If you have any questions or feedback, let us know in the user forum.
