Collection definition

A collection definition specifies how to store and index a set of data objects in Weaviate. This page discusses the available parameters for configuring a collection.

Collection definition parameters

These are the top-level parameters you can set when creating a collection.

Parameter	Type	Description	Default	Mutable
`class`	String	The name of the collection.	(Required)	No
`description`	String	A description of the collection.	`""`	Yes
`properties`	Array	An array of property objects defining the data schema.	`[]`	Partially*
`invertedIndexConfig`	Object	Configuration for the inverted index, affecting filtering and keyword search.	See Inverted Index reference	Yes
`vectorConfig`	Object	Configure multiple named vectors each with their own `vectorizer`, `vectorIndexType`, and `vectorIndexConfig` fields.	`null`	Partially**
`vectorizer`	String	The vectorizer module to use.	Default vectorizer defined by environment variable. See Model provider for module-specific config defaults	No
`vectorIndexType`	String	The type of vector index to use (`hnsw`, `flat`, `dynamic`).	`hnsw`	No
`moduleConfig`	Object	Module-specific configuration settings.	See Module configuration	Partially
`vectorIndexConfig`	Object	Configuration settings specific to the chosen `vectorIndexType`.	See Vector index reference	Partially
`shardingConfig`	Object	Controls sharding behavior in a multi-node cluster.	See Sharding section	No
`replicationConfig`	Object	Controls data replication settings for fault tolerance.	See Replication section	Partially
`multiTenancyConfig`	Object	Configuration to enable multi-tenancy for the collection.	See Multi-tenancy section	Partially

* New properties can be added; existing properties cannot be modified
** New named vectors can be added; some vector index settings are mutable

Example collection configuration - JSON object

An example of a complete collection object including properties:

{
  "class": "Article",                       // The name of the collection in string format
  "description": "An article",              // A description for your reference
  "vectorIndexType": "hnsw",                // Defaults to hnsw
  "vectorIndexConfig": {
    ...                                     // Vector index type specific settings, including distance metric
  },
  "vectorizer": "text2vec-contextionary",   // Vectorizer to use for data objects added to this collection
  "moduleConfig": {
    "text2vec-contextionary": {
      "vectorizeClassName": true            // Include the collection name in vector calculation (default true)
    }
  },
  "properties": [                           // An array of the properties you are adding, same as a Property Object
    {
      "name": "title",                     // The name of the property
      "description": "title of the article",              // A description for your reference
      "dataType": [                         // The data type of the object as described above. When
                                            //    creating cross-references, a property can have
                                            //    multiple data types, hence the array syntax.
        "text"
      ],
      "moduleConfig": {                     // Module-specific settings
        "text2vec-contextionary": {
          "skip": true,                     // If true, the whole property will NOT be included in
                                            //    vectorization. Default is false, meaning that the
                                            //    property will NOT be skipped.
          "vectorizePropertyName": true     // Whether the name of the property is used in the
                                            //    calculation for the vector position of data
                                            //    objects. Default false.
        }
      },
      "indexFilterable": true,              // Optional, default is true. By default each property
                                            //    is indexed with a roaring bitmap index where
                                            //     available for efficient filtering.
      "indexSearchable": true               // Optional, default is true. By default each property
                                            //    is indexed with a searchable Map index suitable
                                            //    for BM25 or hybrid searching.
    }
  ],
  "invertedIndexConfig": {                  // Optional, index configuration
    "stopwords": {
      ...                                   // Optional, controls which words should be ignored in the inverted index, see section below
    },
    "indexTimestamps": false,               // Optional, maintains inverted indexes for each object by its internal timestamps
    "indexNullState": false,                // Optional, maintains inverted indexes for each property regarding its null state
    "indexPropertyLength": false            // Optional, maintains inverted indexes for each property by its length
  },
  "shardingConfig": {
    ...                                     // Optional, controls behavior of the collection in a
                                            //    multi-node setting, see section below
  },
  "multiTenancyConfig": {"enabled": true}   // Optional, for enabling multi-tenancy for this
                                            //    collection (default: false)
}
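
The annotated JSON above contains comments and `...` placeholders, so it cannot be parsed as-is. As a minimal sketch, the same structure can be written as a plain Python dictionary and serialized to strict JSON. Only keys that appear in the example above are used, the elided sections are left out, and the variable names are illustrative.

```python
import json

# A trimmed-down version of the collection definition shown above.
# Only keys from that example are used; elided ("...") sections are omitted.
article_definition = {
    "class": "Article",
    "description": "An article",
    "vectorIndexType": "hnsw",
    "vectorizer": "text2vec-contextionary",
    "moduleConfig": {
        "text2vec-contextionary": {"vectorizeClassName": True},
    },
    "properties": [
        {
            "name": "title",
            "description": "title of the article",
            "dataType": ["text"],
            "indexFilterable": True,
            "indexSearchable": True,
        }
    ],
    "invertedIndexConfig": {
        "indexTimestamps": False,
        "indexNullState": False,
        "indexPropertyLength": False,
    },
    "multiTenancyConfig": {"enabled": True},
}

# json.dumps emits strict JSON: no comments and no trailing commas.
definition_json = json.dumps(article_definition, indent=2)
print(definition_json)
```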

Code example - How to create a collection

This code example shows how to configure the collection parameters through a client library:

from weaviate.classes.config import (
    Configure,
    DataType,
    Property,
    ReplicationDeletionStrategy,
    VectorDistances,
    VectorFilterStrategy,
)

client.collections.create(
    "Article",
    description="A collection of articles",
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="body", data_type=DataType.TEXT),
    ],
    vector_config=Configure.Vectors.text2vec_openai(
        name="default",
        source_properties=["title", "body"],
        vector_index_config=Configure.VectorIndex.hnsw(
            ef_construction=300,
            distance_metric=VectorDistances.COSINE,
            filter_strategy=VectorFilterStrategy.SWEEPING,
        ),
    ),
    multi_tenancy_config=Configure.multi_tenancy(False),
    sharding_config=Configure.sharding(
        virtual_per_physical=128,
        desired_count=1,
        desired_virtual_count=128,
    ),
    replication_config=Configure.replication(
        factor=1,
        async_enabled=False,
        deletion_strategy=ReplicationDeletionStrategy.TIME_BASED_RESOLUTION,
    ),
)

Further resources

For more code examples and configuration guides visit the How-to: Manage collections section.

`class`

The class is the name of the collection.

The collection name starts with an upper case letter. The upper case letter distinguishes collection names from primitive data types when the name is used as a property value.

Consider these examples that use the dataType property:

dataType: ["text"] is a text data type.
dataType: ["Text"] is a cross-reference type to a collection named Text.

After the first letter, collection names may use any GraphQL-compatible characters.

The collection name validation regex is /^[A-Z][_0-9A-Za-z]*$/.
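
As a small sketch, the pattern quoted above can be checked client-side before sending a definition to Weaviate. The helper name `is_valid_collection_name` is hypothetical and only applies the regex from the text; the server performs its own validation.

```python
import re

# Pattern quoted in the text above: /^[A-Z][_0-9A-Za-z]*$/
# fullmatch() anchors the pattern at both ends, so ^ and $ are not needed.
COLLECTION_NAME_PATTERN = re.compile(r"[A-Z][_0-9A-Za-z]*")


def is_valid_collection_name(name: str) -> bool:
    """Return True if `name` matches the collection name pattern."""
    return COLLECTION_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["Article", "Article_2", "article", "2Articles", "My-Article"]:
    print(candidate, is_valid_collection_name(candidate))
```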

Capitalization

Weaviate follows GraphQL naming conventions.

Start collection names with an upper case letter.
Start property names with a lower case letter.

If you use an initial upper case letter to define a property name, Weaviate changes it to a lower case letter internally.

`description`

A description of the collection. This is for your reference and can also provide additional information to Weaviate Agents.

Properties

Parameter	Type	Description	Default	Mutable
`name`	String	The name of the property.	(Required)	No
`dataType`	Array	An array containing one or more data types. For cross-references, use the capitalized collection name (e.g., `["Article"]`).	(Required)	No
`description`	String	A description of the property for your reference.	`null`	Yes
`tokenization`	String	For `text` properties, specifies how the text is split into tokens for inverted indexing.	`word`	No
`indexInverted`	Boolean	If `true`, inverted index is enabled for this property.	`true`	No
`indexFilterable`	Boolean	If `true`, builds a roaring bitmap index for this property to allow for efficient filtering.	`true`	No
`indexSearchable`	Boolean	If `true`, builds a searchable map index for this property, suitable for BM25 or hybrid search.	`true`	No
`indexRangeFilters`	Boolean	If `true`, builds a roaring bitmap index for numerical range-based filtering.	`false`	No
`invertedIndexConfig`	Object	Property-level overrides for inverted index settings, such as `bm25` parameters.	`{}`	No
`moduleConfig`	Object	Module-specific settings, such as skipping vectorization for this property.	`{}`	No

Example property configuration - JSON object

An example of a complete property object:

{
  "name": "title", // The name of the property
  "description": "title of the article", // A description for your reference
  "dataType": [
    // The data type of the object as described above. When creating cross-references, a property can have multiple dataTypes.
    "text"
  ],
  "tokenization": "word", // Split field contents into word-tokens when indexing into the inverted index. See "Property Tokenization" below for more detail.
  "moduleConfig": {
    // Module-specific settings
    "text2vec-contextionary": {
      "skip": true, // If true, the whole property is NOT included in vectorization. Default is false, meaning that the property will NOT be skipped.
      "vectorizePropertyName": true // Whether the name of the property is used in the calculation for the vector position of data objects. Default false.
    }
  },
  "indexFilterable": true, // Optional, default is true. By default each property is indexed with a roaring bitmap index where available for efficient filtering.
  "indexSearchable": true // Optional, default is true. By default each property is indexed with a searchable Map index suitable for BM25 or hybrid searching.
}

Code example - How to configure collection properties

This code example shows how to configure the property parameters through a client library:

from weaviate.classes.config import Configure, DataType, Property

# Note that you can use `client.collections.create_from_dict()` to create a collection from a v3-client-style JSON object
client.collections.create(
    "Article",
    vector_config=Configure.Vectors.text2vec_openai(),
    properties=[  # properties configuration is optional
        Property(name="title", data_type=DataType.TEXT),
        Property(name="description", data_type=DataType.TEXT, skip_vectorization=True),
        Property(name="rating", data_type=DataType.NUMBER),
    ],
)

Further resources

For more code examples and configuration guides, visit the How-to: Manage collections section.

`name`

Property names must match the regular expression /[_A-Za-z][_0-9A-Za-z]*/: they start with a letter or an underscore, followed by any number of letters, digits, or underscores.

Reserved words

The following words are reserved and cannot be used as property names:

_additional
id
_id

Additionally, we strongly recommend that you do not use the following words as property names, due to potential conflicts with future reserved words:

vector
_vector
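
The naming rules above can be combined into one client-side check, sketched below. The helper `check_property_name` and its return labels are hypothetical, and comparing reserved words by exact, case-sensitive match is an assumption; Weaviate performs its own validation on the server.

```python
import re

# Pattern and word lists are taken from the text above.
PROPERTY_NAME_PATTERN = re.compile(r"[_A-Za-z][_0-9A-Za-z]*")
RESERVED_NAMES = {"_additional", "id", "_id"}
DISCOURAGED_NAMES = {"vector", "_vector"}


def check_property_name(name: str) -> str:
    """Classify a name as 'invalid', 'reserved', 'discouraged', or 'ok'.

    Assumption: reserved and discouraged words are compared exactly
    (case-sensitive); the source text does not specify this.
    """
    if PROPERTY_NAME_PATTERN.fullmatch(name) is None:
        return "invalid"
    if name in RESERVED_NAMES:
        return "reserved"
    if name in DISCOURAGED_NAMES:
        return "discouraged"
    return "ok"


for candidate in ["title", "_id", "vector", "2title", "my-prop"]:
    print(candidate, check_property_name(candidate))
```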

`tokenization`

You can customize how text data is tokenized and indexed in the inverted index. Tokenization influences the results returned by the bm25 and hybrid operators, and where filters.

Tokenization is a property-level configuration for text properties. See how to set the tokenization option using a client library.

Example property configuration - JSON object

{
  "classes": [
    {
      "class": "Question",
      "properties": [
        {
          "dataType": ["text"],
          "name": "question",
          "tokenization": "word"
        }
      ],
      ...
      "vectorizer": "text2vec-openai"
    }
  ]
}

Each token will be indexed separately in the inverted index. For example, if you have a text property with the value Hello, (beautiful) world, the following table shows how the tokens would be indexed for each tokenization method:

Tokenization Method	Explanation	Indexed Tokens
`word` (default)	Keep only alpha-numeric characters, lowercase them, and split by whitespace.	`hello`, `beautiful`, `world`
`lowercase`	Lowercase the entire text and split on whitespace.	`hello,`, `(beautiful)`, `world`
`whitespace`	Split the text on whitespace. Searches/filters become case-sensitive.	`Hello,`, `(beautiful)`, `world`
`field`	Index the whole field after trimming whitespace characters.	`Hello, (beautiful) world`
`trigram`	Split the property as rolling trigrams.	`Hel`, `ell`, `llo`, `lo,`, ...
`gse`	Use the `gse` tokenizer to split the property.	See `gse` docs
`kagome_ja`	Use the `Kagome` tokenizer with a Japanese (IPA) dictionary to split the property.	See `kagome` docs and the dictionary.
`kagome_kr`	Use the `Kagome` tokenizer with a Korean dictionary to split the property.	See `kagome` docs and the Korean dictionary.
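
To make the first four rows of the table concrete, here is a rough Python approximation of the `word`, `lowercase`, `whitespace`, and `field` methods. It is a sketch of the descriptions in the table, not Weaviate's implementation: the function name `tokenize` is hypothetical, and treating every non-alphanumeric character as a separator for `word` is an assumption.

```python
def tokenize(text: str, method: str) -> list[str]:
    """Approximate four tokenization methods described in the table above."""
    if method == "word":
        # Assumption: non-alphanumeric characters act as separators.
        cleaned = "".join(ch if ch.isalnum() else " " for ch in text)
        return cleaned.lower().split()
    if method == "lowercase":
        return text.lower().split()
    if method == "whitespace":
        return text.split()
    if method == "field":
        return [text.strip()]
    raise ValueError(f"unsupported tokenization method: {method}")


sample = "Hello, (beautiful) world"
for method in ("word", "lowercase", "whitespace", "field"):
    print(method, tokenize(sample, method))
```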

Tokenization and search / filtering

Tokenization affects how filters and keyword searches behave. The filter or keyword search query is also tokenized before being matched against the inverted index.

The following table shows whether a filter or keyword search would match a text property with the value Hello, (beautiful) world.

Row: Various tokenization methods.
Column: Various search strings.

	`Beautiful`	`(Beautiful)`	`(beautiful)`	`Hello, (beautiful) world`
`word` (default)	✅	✅	✅	✅
`lowercase`	❌	✅	✅	✅
`whitespace`	❌	❌	✅	✅
`field`	❌	❌	❌	✅
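
The hit matrix above can be reproduced with a simplified model: tokenize both the stored value and the query with the same method, and count a hit when every query token is present among the stored tokens. The "every token must match" rule and the compact tokenizers below are assumptions made for this sketch; actual filter and BM25 semantics in Weaviate are richer.

```python
# Simplified stand-ins for the tokenization methods (assumptions, see above).
TOKENIZERS = {
    "word": lambda s: "".join(c if c.isalnum() else " " for c in s).lower().split(),
    "lowercase": lambda s: s.lower().split(),
    "whitespace": lambda s: s.split(),
    "field": lambda s: [s.strip()],
}


def is_hit(stored: str, query: str, method: str) -> bool:
    """Return True if all tokens of `query` occur among the tokens of `stored`."""
    tokenizer = TOKENIZERS[method]
    indexed = set(tokenizer(stored))
    return all(token in indexed for token in tokenizer(query))


stored_value = "Hello, (beautiful) world"
queries = ["Beautiful", "(Beautiful)", "(beautiful)", "Hello, (beautiful) world"]
for method in TOKENIZERS:
    print(method, [is_hit(stored_value, q, method) for q in queries])
```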

gse and trigram tokenization methods

Added in 1.24

For Japanese and Chinese text, we recommend the gse or trigram tokenization methods. They work better for these languages than the other methods because the text cannot easily be tokenized on whitespace.

The gse tokenizer is not loaded by default to save resources. To use it, set the environment variable ENABLE_TOKENIZER_GSE to true on the Weaviate instance.

gse tokenization examples:

"素早い茶色の狐が怠けた犬を飛び越えた": ["素早", "素早い", "早い", "茶色", "の", "狐", "が", "怠け", "けた", "犬", "を", "飛び", "飛び越え", "越え", "た", "素早い茶色の狐が怠けた犬を飛び越えた"]
"すばやいちゃいろのきつねがなまけたいぬをとびこえた": ["すばや", "すばやい", "やい", "いち", "ちゃ", "ちゃい", "ちゃいろ", "いろ", "のき", "きつ", "きつね", "つね", "ねが", "がな", "なま", "なまけ", "まけ", "けた", "けたい", "たい", "いぬ", "を", "とび", "とびこえ", "こえ", "た", "すばやいちゃいろのきつねがなまけたいぬをとびこえた"]

trigram for fuzzy matching

While originally designed for Asian languages, trigram tokenization is also highly effective for fuzzy matching and typo tolerance in other languages.

kagome_ja tokenization method

Experimental feature

Available starting in v1.28.0. This is an experimental feature. Use with caution.

For Japanese text, the kagome_ja tokenization method is also available. This uses the Kagome tokenizer with a Japanese MeCab IPA dictionary to split the property text.

The kagome_ja tokenizer is not loaded by default to save resources. To use it, set the environment variable ENABLE_TOKENIZER_KAGOME_JA to true on the Weaviate instance.

kagome_ja tokenization examples:

"春の夜の夢はうつつよりもかなしき夏の夜の夢はうつつに似たり秋の夜の夢はうつつを超え冬の夜の夢は心に響く山のあなたに小さな村が見える川の音が静かに耳に届く風が木々を通り抜ける音星空の下、すべてが平和である":
- ["春", "の", "夜", "の", "夢", "は", "うつつ", "より", "も", "かなしき", "\n\t", "夏", "の", "夜", "の", "夢", "は", "うつつ", "に", "似", "たり", "\n\t", "秋", "の", "夜", "の", "夢", "は", "うつつ", "を", "超え", "\n\t", "冬", "の", "夜", "の", "夢", "は", "心", "に", "響く", "\n\n\t", "山", "の", "あなた", "に", "小さな", "村", "が", "見える", "\n\t", "川", "の", "音", "が", "静か", "に", "耳", "に", "届く", "\n\t", "風", "が", "木々", "を", "通り抜ける", "音", "\n\t", "星空", "の", "下", "、", "すべて", "が", "平和", "で", "ある"]
"素早い茶色の狐が怠けた犬を飛び越えた":
- ["素早い", "茶色", "の", "狐", "が", "怠け", "た", "犬", "を", "飛び越え", "た"]
"すばやいちゃいろのきつねがなまけたいぬをとびこえた":
- ["すばやい", "ちゃ", "いろ", "の", "きつね", "が", "なまけ", "た", "いぬ", "を", "とびこえ", "た"]

kagome_kr tokenization method

Experimental feature

Available starting in v1.25.7. This is an experimental feature. Use with caution.

For Korean text, we recommend the kagome_kr tokenization method. This uses the Kagome tokenizer with a Korean MeCab (mecab-ko-dic) dictionary to split the property text.

The kagome_kr tokenizer is not loaded by default to save resources. To use it, set the environment variable ENABLE_TOKENIZER_KAGOME_KR to true on the Weaviate instance.

kagome_kr tokenization examples:

"아버지가방에들어가신다":
- ["아버지", "가", "방", "에", "들어가", "신다"]
"아버지가 방에 들어가신다":
- ["아버지", "가", "방", "에", "들어가", "신다"]
"결정하겠다":
- ["결정", "하", "겠", "다"]

Limit the number of gse and Kagome tokenizers

The gse and Kagome tokenizers can be resource intensive and affect Weaviate's performance. You can limit the combined number of gse and Kagome tokenizers running at the same time using the TOKENIZER_CONCURRENCY_COUNT environment variable.

Fuzzy matching with trigram tokenization

The trigram tokenization method provides fuzzy matching capabilities by breaking text into overlapping 3-character sequences. This enables BM25 searches to find matches even with spelling errors or variations.

Use cases for trigram fuzzy matching:

Typo tolerance: Find matches despite spelling errors (e.g., "Reliace" matches "Reliance")
Name reconciliation: Match entity names with variations across datasets
Search-as-you-type: Build autocomplete functionality
Partial matching: Find objects with partial string matches

How it works:

When text is tokenized with trigram, it's broken into all possible 3-character sequences:

"hello" → ["hel", "ell", "llo"]
"world" → ["wor", "orl", "rld"]

Similar strings share many trigrams, enabling fuzzy matching:

"Morgan Stanley" and "Stanley Morgn" share trigrams like "sta", "tan", "anl", "nle", "ley"

Performance considerations:

Filtering behavior will change significantly, as text filtering will be done based on trigram-tokenized text, instead of whole words
Creates larger inverted indexes due to more tokens
May impact query performance for large datasets
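
The splitting described under "How it works" can be sketched in a few lines of Python. This is an illustration of overlapping 3-character windows, not Weaviate's tokenizer: the function names are hypothetical, and lowercasing the input while keeping spaces is an assumption made so that the comparison is case-insensitive.

```python
def trigrams(text: str) -> list[str]:
    """Return all overlapping 3-character windows of the lowercased text."""
    lowered = text.lower()  # assumption: case-insensitive comparison
    return [lowered[i:i + 3] for i in range(len(lowered) - 2)]


def shared_trigrams(a: str, b: str) -> set[str]:
    """Return the trigrams that two strings have in common."""
    return set(trigrams(a)) & set(trigrams(b))


print(trigrams("hello"))
print(trigrams("world"))
print(sorted(shared_trigrams("Morgan Stanley", "Stanley Morgn")))
```

The more trigrams two strings share, the more inverted index entries they have in common, which is what lets a misspelled or reordered query still match.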

Tip

Use trigram tokenization selectively on fields where fuzzy matching is preferred. Keep exact-match fields with word or field tokenization for precision.

Inverted index

Weaviate uses inverted indexes to enable fast and efficient filtering and searching. The inverted index maps values (like words or numbers) to the objects that contain them in order to speed up all attribute-based filtering (where filters) and keyword searching (bm25, hybrid). Disabling indexing for properties you will never query can speed up data imports and reduce disk usage.

More details about the indexFilterable, indexSearchable, indexRangeFilters and invertedIndexConfig parameters can be found in Reference: Inverted index.

Vector configuration

Weaviate supports two approaches for vector configuration:

Single vector collections: One vector space per object using top-level parameters (vectorizer, vectorIndexType, vectorIndexConfig)
Multiple named vectors: Multiple vector spaces per object using the vectorConfig parameter (recommended)

You cannot combine both approaches in the same collection.
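
As an illustration of the two styles, the sketch below classifies a definition dictionary and rejects one that mixes them. The helper `vector_config_style` and the vector name `title_vector` are hypothetical; this is a client-side check only and makes no claim about how Weaviate itself validates a definition.

```python
# Top-level keys used by the single vector approach (from the list above).
SINGLE_VECTOR_KEYS = {"vectorizer", "vectorIndexType", "vectorIndexConfig"}


def vector_config_style(definition: dict) -> str:
    """Return 'single' or 'named'; raise ValueError if both styles are mixed."""
    uses_single = any(key in definition for key in SINGLE_VECTOR_KEYS)
    uses_named = definition.get("vectorConfig") is not None
    if uses_single and uses_named:
        raise ValueError(
            "top-level vector parameters and vectorConfig cannot be combined"
        )
    return "named" if uses_named else "single"


single_vector_definition = {
    "class": "Article",
    "vectorizer": "text2vec-openai",
    "vectorIndexType": "hnsw",
}

named_vector_definition = {
    "class": "Article",
    "vectorConfig": {
        "title_vector": {  # hypothetical vector name
            "vectorizer": {"text2vec-openai": {"properties": ["title"]}},
            "vectorIndexType": "hnsw",
        }
    },
}

print(vector_config_style(single_vector_definition))
print(vector_config_style(named_vector_definition))
```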

We recommend using vectorConfig

Using the vectorConfig parameter allows you to start with one vector per collection and add new named vectors afterward.

Vector configuration parameters

Parameter	Type	Description	Default	Mutable
`vectorizer`	String	The vectorizer module to use (e.g., `text2vec-cohere`). Set to `none` to disable auto-vectorization. Available model providers	Module-specific default	No
`vectorIndexType`	String	Vector index type: `hnsw` (default), `flat`, or `dynamic`	`hnsw`	No
`vectorIndexConfig`	Object	Configuration settings for your chosen `vectorIndexType`	Index-specific defaults	Partially*
`vectorConfig`	Object	Alternative to above: Define multiple named vector spaces	`null`	Partially**
↪ `vectorConfig.<name>.vectorizer`	Object	Vectorizer config for this named vector (e.g., `{"text2vec-openai": {"properties": ["title"]}}`)	(Required)	No
↪ `vectorConfig.<name>.vectorIndexType`	String	Index type for this named vector	`hnsw`	No
↪ `vectorConfig.<name>.vectorIndexConfig`	Object	Index configuration for this named vector	Index-specific defaults	Partially*

* See vector index mutable parameters
** New named vectors can be added after collection creation

Single vector collections

If you don't explicitly define a named vector in your collection definition, Weaviate automatically creates what's known as a single vector collection. These vectors are stored internally under the named vector default (which is a reserved vector name).

To learn which properties of your data are vectorized, refer to the Configure semantic indexing section.

Code example - How to create single vector collection

This code example shows how to configure the vectorizer parameters for a single vector collection through a client library:

from weaviate.classes.config import (
    Configure,
    DataType,
    Property,
    VectorDistances,
    VectorFilterStrategy,
)

client.collections.create(
    "Article",
    vector_config=Configure.Vectors.text2vec_openai(
        name="default",  # (Optional) Set the name of the vector, default name is "default"
        source_properties=["title", "body"],  # (Optional) Set the source property(ies)
        vector_index_config=Configure.VectorIndex.hnsw(
            ef_construction=300,
            distance_metric=VectorDistances.COSINE,
            filter_strategy=VectorFilterStrategy.SWEEPING,
        ),  # (Optional) Set vector index options
        vectorize_collection_name=True,  # (Optional) Set to True to vectorize the collection name
    ),
    properties=[  # properties configuration is optional
        Property(name="title", data_type=DataType.TEXT, vectorize_property_name=True),
        Property(name="body", data_type=DataType.TEXT),
    ],
)

Further resources

For more code examples and configuration guides, visit the How-to: Vectorizer and vector index config guide.

Multiple vector embeddings (named vectors)

Added in v1.24.0

Weaviate collections support multiple named vectors.

The vectors in a collection can have their own configurations. Each vector space can set its own index, its own compression algorithm, and its own vectorizer. This means you can use different vectorization models, and apply different distance metrics, to the same object.

To work with named vectors, adjust your queries to specify a target vector for vector search or hybrid search queries.

Code example - How to create multiple named vectors

This code example shows how to configure multiple named vectors through a client library:

from weaviate.classes.config import (
    Configure,
    DataType,
    Property,
    VectorDistances,
    VectorFilterStrategy,
)

client.collections.create(
    "Article",
    vector_config=[
        Configure.Vectors.text2vec_openai(
            name="default",  # (Optional) Set the name of the vector, default name is "default"
            source_properties=[
                "title",
                "body",
            ],  # (Optional) Set the source property(ies)
            vector_index_config=Configure.VectorIndex.hnsw(
                ef_construction=300,
                distance_metric=VectorDistances.COSINE,
                filter_strategy=VectorFilterStrategy.SWEEPING,
            ),  # (Optional) Set vector index options
            vectorize_collection_name=True,  # (Optional) Set to True to vectorize the collection name
        ),
        Configure.Vectors.text2vec_openai(
            name="body_vectors",
            source_properties=["body"],
            vector_index_config=Configure.VectorIndex.flat(),
        ),
    ],
    properties=[  # properties configuration is optional
        Property(name="title", data_type=DataType.TEXT, vectorize_property_name=True),
        Property(name="body", data_type=DataType.TEXT),
    ],
)

Further resources

For more code examples and configuration guides, visit the How-to: Vectorizer and vector index config guide.

Module configuration

The moduleConfig parameter lets you specify whether vectorizers include the collection name in vector calculations (default true). It is also used to specify reranker and generative model providers at the collection level.

Example module configuration - JSON object

An example of a complete moduleConfig object:

  "moduleConfig": {
    "text2vec-contextionary": {
      "vectorizeClassName": true  // Include the collection name in vector calculation (default true)
    }
  },

Vector index

Vector indexing organizes vector data to make similarity searches fast and efficient. Instead of comparing a query to every vector, an index builds a structure that rapidly narrows the search to the most relevant candidates.

More details about the vectorIndexType and vectorIndexConfig parameters can be found in Reference: Vector index.

Replication

Replication factor change

The replication factor of a collection cannot be updated by updating the collection's definition.

From v1.32, you can change the replication factor of a shard by using replica movement.

Replication configurations can be set using the definition, through the replicationConfig parameter.

Parameter	Type	Description	Default	Mutable
`factor`	Integer	The number of copies (replicas) to maintain for each shard. A factor of `3` means one primary and two replicas.	`1`	No in `v1.25+`, Yes in earlier versions
`asyncEnabled`	Boolean	Enable asynchronous replication. Added in `v1.26`	`false`	Yes
`deletionStrategy`	String	Strategy for handling deletions in replication. Can be `NoAutomatedResolution`, `DeleteOnConflict` or `TimeBasedResolution`. Added in `v1.27`	`"NoAutomatedResolution"`	Yes

Example replication configuration - JSON object

An example of a complete replicationConfig object:

{
  "class": "Article",
  "vectorizer": "text2vec-openai",
  "replicationConfig": {
    "factor": 3,
    "asyncEnabled": false,
    "deletionStrategy": "NoAutomatedResolution"
  }
}

Code example - How to configure replication

This code example shows how to configure the replication parameters through a client library:

from weaviate.classes.config import Configure, ReplicationDeletionStrategy

client.collections.create(
    "Article",
    replication_config=Configure.replication(
        factor=3,
        async_enabled=True,
        deletion_strategy=ReplicationDeletionStrategy.TIME_BASED_RESOLUTION,
    ),
)

Further resources

For more code examples and configuration guides, visit the How-to: Manage collections section.

Sharding

Sharding is configured via the shardingConfig object in the collection definition. These parameters are immutable and cannot be changed after the collection is created.

Parameter	Type	Description	Default	Mutable
`desiredCount`	Integer	The desired number of physical shards for the collection. If this value is larger than the number of cluster nodes, some nodes will host multiple shards.	Number of nodes	No
`virtualPerPhysical`	Integer	The number of virtual shards per physical shard. Virtual shards aid in reducing data movement during rebalancing.	`128`	No
`strategy`	String	The strategy for determining which shard an object belongs to. Only `"hash"` is currently supported. The hash is based on the `key` property.	`"hash"`	No
`key`	String	The property used for hashing to determine the target shard. Currently, only the object's internal UUID (`_id`) can be used.	`"_id"`	No
`function`	String	The hashing function used on the `key`. Only `"murmur3"` is supported, which creates a 64-bit hash, making collisions highly unlikely.	`"murmur3"`	No
`actualCount`	Integer	(Read-only) The actual number of physical shards created. This typically matches `desiredCount` unless an issue occurred during creation.	`1`	No
`desiredVirtualCount`	Integer	(Read-only) A calculated value representing `desiredCount * virtualPerPhysical`.	`128`	No
`actualVirtualCount`	Integer	(Read-only) The actual number of virtual shards that were created.	`128`	No
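
The relationship between the counts in the table can be worked through with a short sketch. `desired_virtual_count` applies the formula given above; `max_shards_per_node` is a hypothetical helper that assumes shards are spread as evenly as possible across nodes, which the text does not state.

```python
import math


def desired_virtual_count(desired_count: int, virtual_per_physical: int = 128) -> int:
    """Compute desiredVirtualCount = desiredCount * virtualPerPhysical."""
    return desired_count * virtual_per_physical


def max_shards_per_node(desired_count: int, node_count: int) -> int:
    """Upper bound on shards per node, ASSUMING an even spread across nodes."""
    return math.ceil(desired_count / node_count)


print(desired_virtual_count(1))   # one physical shard, default virtualPerPhysical
print(desired_virtual_count(3))   # three physical shards
print(max_shards_per_node(5, 3))  # five shards on a three-node cluster
```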

Example sharding configuration - JSON object

An example of a complete shardingConfig object:

  "shardingConfig": {
    "virtualPerPhysical": 128,
    "desiredCount": 1,           // defaults to the amount of Weaviate nodes in the cluster
    "actualCount": 1,
    "desiredVirtualCount": 128,
    "actualVirtualCount": 128,
    "key": "_id",
    "strategy": "hash",
    "function": "murmur3"
  }

Code example - How to configure sharding

This code example shows how to configure the sharding parameters through a client library:

from weaviate.classes.config import Configure

client.collections.create(
    "Article",
    sharding_config=Configure.sharding(
        virtual_per_physical=128,
        desired_count=1,
        desired_virtual_count=128,
    ),
)

Further resources

For more code examples and configuration guides, visit the How-to: Manage collections section.

Multi-tenancy

Multi-tenancy allows you to isolate data within a single collection, where objects are associated with specific tenants. This is a useful feature for building SaaS applications or any system requiring strict data partitioning.

Why use multi-tenancy?

It provides data isolation at a lower overhead than creating a separate collection for each tenant, making it more scalable when you have a large number of tenants.

To enable multi-tenancy, set the enabled key to true in the multiTenancyConfig object. This parameter is immutable and must be set at creation time.

Parameter	Type	Description	Default	Mutable
`enabled`	Boolean	If `true`, enables multi-tenancy for the collection.	`false`	No
`autoTenantCreation`	Boolean	If `true`, a new tenant is created if you try to insert an object into a non-existent tenant. Added in `v1.25`	`false`	Yes
`autoTenantActivation`	Boolean	If `true`, automatically activate `INACTIVE` or `OFFLOADED` tenants if a search, read, update, or delete operation is performed on them. Added in `v1.25.2`	`false`	Yes
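
For reference, a multiTenancyConfig object that sets all three parameters from the table might look like the sketch below, written as a Python dictionary and serialized to JSON. The collection name and the chosen values are illustrative.

```python
import json

# Keys and mutability notes are taken from the table above.
multi_tenancy_config = {
    "enabled": True,               # immutable: must be set at creation time
    "autoTenantCreation": True,    # mutable, added in v1.25
    "autoTenantActivation": True,  # mutable, added in v1.25.2
}

collection_definition = {
    "class": "MultiTenancyCollection",  # illustrative collection name
    "multiTenancyConfig": multi_tenancy_config,
}

print(json.dumps(collection_definition, indent=2))
```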

Code example - How to configure multi-tenancy

This code example shows how to configure the multi-tenancy parameters through a client library:

from weaviate.classes.config import Configure

multi_collection = client.collections.create(
    name="MultiTenancyCollection",
    # Enable multi-tenancy on the new collection
    multi_tenancy_config=Configure.multi_tenancy(enabled=True)
)

Further resources

For more code examples and configuration guides, visit the How-to: Manage collections section.

Mutability

Some, but not all, parameters are mutable after you create your collection. To modify immutable parameters, export your data, create a new collection, and import your data into it.

Mutable parameters

Replication factor change

The replication factor of a collection cannot be updated by updating the collection's definition.

From v1.32, you can change the replication factor of a shard by using replica movement.

description
properties description
invertedIndexConfig
- bm25
  - b
  - k1
- cleanupIntervalSeconds
- stopwords
  - additions
  - preset
  - removals
moduleConfig (generative & reranker modules only, from 1.26.8 and v1.27.1)
multiTenancyConfig
- autoTenantCreation (introduced in v1.25.0)
- autoTenantActivation (introduced in v1.25.2)
replicationConfig
- asyncEnabled (introduced in v1.26.0)
- factor (not mutable in v1.25 or higher)
- deletionStrategy (introduced in v1.27.0)
vectorIndexConfig
- dynamicEfFactor
- dynamicEfMin
- dynamicEfMax
- filterStrategy (introduced in v1.27.0, applicable for HNSW)
- flatSearchCutoff
- bq
  - enabled
  - rescoreLimit
- pq
  - centroids
  - enabled
  - segments
  - trainingLimit
  - encoder
    - type
    - distribution
- sq
  - enabled
  - rescoreLimit
  - trainingLimit
- skip
- vectorCacheMaxObjects

After you create a collection, you can add new properties. You cannot modify existing properties after you create the collection. You can also add new named vectors.
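
Before sending an update, it can help to check whether it touches only mutable parameters. The following is a minimal client-side sketch, not part of Weaviate or its client libraries. The names `MUTABLE_PATHS`, `flatten`, and `immutable_changes` are hypothetical. The dotted paths are transcribed from the list above. The sketch leaves out `replicationConfig.factor` (not mutable in v1.25 or higher), `moduleConfig` (mutable only for some modules and versions), and property descriptions.

```python
# Hypothetical pre-check: dotted paths transcribed from the mutable parameters list.
MUTABLE_PATHS = {
    "description",
    "invertedIndexConfig.bm25.b",
    "invertedIndexConfig.bm25.k1",
    "invertedIndexConfig.cleanupIntervalSeconds",
    "invertedIndexConfig.stopwords.additions",
    "invertedIndexConfig.stopwords.preset",
    "invertedIndexConfig.stopwords.removals",
    "multiTenancyConfig.autoTenantCreation",
    "multiTenancyConfig.autoTenantActivation",
    "replicationConfig.asyncEnabled",
    "replicationConfig.deletionStrategy",
    "vectorIndexConfig.dynamicEfFactor",
    "vectorIndexConfig.dynamicEfMin",
    "vectorIndexConfig.dynamicEfMax",
    "vectorIndexConfig.filterStrategy",
    "vectorIndexConfig.flatSearchCutoff",
    "vectorIndexConfig.bq.enabled",
    "vectorIndexConfig.bq.rescoreLimit",
    "vectorIndexConfig.pq.centroids",
    "vectorIndexConfig.pq.enabled",
    "vectorIndexConfig.pq.segments",
    "vectorIndexConfig.pq.trainingLimit",
    "vectorIndexConfig.pq.encoder.type",
    "vectorIndexConfig.pq.encoder.distribution",
    "vectorIndexConfig.sq.enabled",
    "vectorIndexConfig.sq.rescoreLimit",
    "vectorIndexConfig.sq.trainingLimit",
    "vectorIndexConfig.skip",
    "vectorIndexConfig.vectorCacheMaxObjects",
}


def flatten(config, prefix=""):
    """Return the dotted path of every leaf value in a nested dict."""
    paths = []
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            paths.extend(flatten(value, path))
        else:
            # Lists (e.g. stopword additions) and scalars count as leaves
            paths.append(path)
    return paths


def immutable_changes(update):
    """List the paths in `update` that are not in MUTABLE_PATHS."""
    return sorted(p for p in flatten(update) if p not in MUTABLE_PATHS)


update = {
    "description": "News articles",
    "invertedIndexConfig": {"bm25": {"k1": 1.5}},
    "vectorIndexType": "flat",
}
print(immutable_changes(update))
```

Any path the function reports would require the export, re-create, and re-import route described under Mutability.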

Auto-schema

The "Auto-schema" feature generates a collection definition automatically by inferring parameters from the data being added. It is enabled by default. To disable it, set the environment variable `AUTOSCHEMA_ENABLED` to `'false'` (for example, in `docker-compose.yml`).

It will:

Create a collection if an object is added to a non-existent collection.
Add any missing property from an object being added.
Infer array data types, such as int[], text[], number[], boolean[], date[] and object[].
Infer nested properties for object and object[] data types.
Throw an error if an object being added contains a property that conflicts with an existing schema type (e.g. trying to import text into a field that exists in the schema as int).
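
The behaviors listed above can be illustrated with a simplified sketch. This is not Weaviate's actual inference logic. The function names `infer_data_type` and `apply_auto_schema`, the ISO 8601 check used to detect date-like strings, and the decision to infer an array's type from its first element are all assumptions made for illustration. Nested properties of `object` values are not expanded here.

```python
from datetime import datetime


def infer_data_type(value, default_number="number", default_date="date"):
    """Guess a data type name for a single value (simplified illustration)."""
    # bool must be checked before int because bool is a subclass of int
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return default_number
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value)  # assumption: ISO 8601 means date-like
        except ValueError:
            return "text"
        return default_date
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list) and value:
        # Assumption: the first element decides the array type
        return infer_data_type(value[0], default_number, default_date) + "[]"
    raise ValueError(f"cannot infer a data type for {value!r}")


def apply_auto_schema(schema, obj, **defaults):
    """Add missing properties to `schema`; raise on a type conflict."""
    for name, value in obj.items():
        inferred = infer_data_type(value, **defaults)
        if name not in schema:
            schema[name] = inferred
        elif schema[name] != inferred:
            raise TypeError(
                f"property {name!r} is {schema[name]!r} in the schema, "
                f"but the new value looks like {inferred!r}"
            )
    return schema


schema = {"wordCount": "int"}
apply_auto_schema(
    schema,
    {"title": "Hello", "tags": ["a", "b"], "wordCount": 5},
    default_number="int",
)
print(schema)

try:
    # Importing text into a property that exists in the schema as int
    apply_auto_schema(schema, {"wordCount": "many"}, default_number="int")
except TypeError as err:
    print(err)
```

The second call mirrors the conflict case from the list: the existing `int` property rejects a text value instead of being silently changed.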

Define the collection manually for production use

Generally speaking, we recommend that you disable auto-schema for production use.

A manual collection definition will provide more precise control.
There is a performance penalty associated with inferring the data structure at import time. This may be a costly operation in some cases, such as complex nested properties.

Auto-schema data types

Additional configurations are available to help the auto-schema infer properties to suit your needs.

AUTOSCHEMA_DEFAULT_NUMBER=number - create number columns for any numerical values (as opposed to int, etc).
AUTOSCHEMA_DEFAULT_DATE=date - create date columns for any date-like values.

The following are not allowed:

Any map type is forbidden, unless it clearly matches one of the two supported types phoneNumber or geoCoordinates.
Any array type is forbidden, unless it is clearly a reference-type. In this case, Weaviate needs to resolve the beacon and see what collection the resolved beacon is from, since it needs the collection name to be able to alter the schema.

Collections count limit

Added in v1.30

To ensure optimal performance, Weaviate limits the number of collections per node. Each collection adds overhead in terms of indexing, definition management, and storage.

Default limit: 1000 collections.
Modify the limit: Use the MAXIMUM_ALLOWED_COLLECTIONS_COUNT environment variable to adjust the collection count limit.

note

If your instance already exceeds the limit, Weaviate will not allow the creation of any new collections. Existing collections will not be deleted.

tip

Instead of raising the collections count limit, consider rethinking your architecture. For more details, see Starter Guides: Scaling limits with collections.

Collection aliases

Added in v1.32

Collection aliases are alternative names that you can use to reference a Weaviate collection.

Alias names must be unique (can't match existing collections or other aliases) and multiple aliases can point to the same collection. You can set up collection aliases programmatically through client libraries or by using the REST endpoints.

To manage collection aliases, you need to possess the right Collection aliases permissions. To manage the underlying collection the alias references, you also need the Collections permissions for that specific collection.

Collection aliases cannot be used to update collection definitions, including:

Updating and adding properties
Updating vector and inverted indexes
Configuring sharding and multi-tenancy
Modifying vectorizer, generative and reranker configurations

Collection alias usage

Weaviate automatically routes alias requests to the target collection for object-related operations. You can use aliases wherever collection names are required for:

Managing objects: Create, batch import, read, update and delete objects through collection aliases.
Querying objects: Fetch objects and perform searches (vector, keyword, hybrid, image, generative/RAG) and aggregations through aliases.
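
The naming and routing rules for aliases can be modeled with a small in-memory sketch. This is a toy illustration of the rules described above, not Weaviate's implementation or a client library API. The class `AliasRegistry` and its methods are hypothetical.

```python
class AliasRegistry:
    """Toy model of alias rules: unique names, many aliases per collection."""

    def __init__(self, collections):
        self.collections = set(collections)
        self.aliases = {}  # alias name -> target collection name

    def create_alias(self, alias, target):
        # An alias name can't match an existing collection or another alias
        if alias in self.collections or alias in self.aliases:
            raise ValueError(f"name {alias!r} is already in use")
        self.aliases[alias] = target

    def resolve(self, name):
        """Route a collection name or alias to the target collection."""
        if name in self.collections:
            return name
        if name in self.aliases:
            return self.aliases[name]
        raise KeyError(f"unknown collection or alias {name!r}")


registry = AliasRegistry({"Articles_v1"})
registry.create_alias("Articles", "Articles_v1")
registry.create_alias("News", "Articles_v1")  # multiple aliases, same target

print(registry.resolve("Articles"))
print(registry.resolve("News"))
```

In this model, object operations and queries would call `resolve` first, which is why an alias can stand in for a collection name in those requests.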

Questions and feedback

If you have any questions or feedback, let us know in the user forum.

Technical questions

If you have questions feel free to post on our Community forum.

Documentation feedback

Leave feedback by opening a GitHub issue.
