Skip to main content
Go to documentation:
⌘U
Weaviate Database

Develop AI applications using Weaviate's APIs and tools

Deploy

Deploy, configure, and maintain Weaviate Database

Weaviate Agents

Build and deploy intelligent agents with Weaviate

Weaviate Cloud

Manage and scale Weaviate in the cloud

Additional resources

Academy
Integrations
Contributor guide
Events & Workshops

Need help?

Weaviate LogoAsk AI Assistant⌘K
Community Forum

Inverted index

The inverted index maps values (like words or numbers) to the objects that contain them. It is the backbone for all attribute-based filtering (where filters) and keyword searching (bm25, hybrid).

Inverted index types

Multiple inverted index types are available in Weaviate. Not all inverted index types are available for all data types. The available inverted index types are:

Inverted index typeDescriptionApplicable data typesDefaultAvailability
indexSearchableA searchable index for BM25-suitable Map index for BM25 or hybrid searching.text, text[],truev1.19
indexFilterableA Roaring Bitmap index for match-based filtering.Everything except blob, geoCoordinates, object and phoneNumber data types including arrays thereoftruev1.19
indexRangeFiltersA Roaring Bitmap index for numerical range-based filtering.int, number and date onlyfalsev1.26
  • Enable one or both of indexFilterable and indexRangeFilters to index a property for faster filtering.
    • If only one is enabled, the respective index is used for filtering.
    • If both are enabled, indexRangeFilters is used for operations involving comparison operators, and indexFilterable is used for equality and inequality operations.

Inverted index parameters

These parameters are set within the invertedIndexConfig object in your collection definition.

ParameterTypeDefaultDetails
bm25Object{ "k1": 1.2, "b": 0.75 }Sets the k1 and b parameters for the BM25 ranking algorithm. Can be overridden at the property level. See BM25 Configuration below.
stopwordsObject(Varies)Defines the stopword list to exclude common words from search queries. See Stopwords Configuration below.
indexTimestampsBooleanfalseIf true, indexes object creation and update timestamps, enabling filtering by creationTimeUnix and lastUpdateTimeUnix.
indexNullStateBooleanfalseIf true, indexes the null/non-null state of each property, enabling filtering for null values.
indexPropertyLengthBooleanfalseIf true, indexes the length of each property, enabling filtering by property length.
Performance Impact

Enabling indexTimestamps, indexNullState, or indexPropertyLength adds overhead as these additional indexes must be created and maintained. Only enable them if you require these specific filtering capabilities.

Code example

This code example shows how to configure inverted index parameters through a client library:

from weaviate.classes.config import Configure, Property, DataType

client.collections.create(
"Article",
# Additional settings not shown
properties=[ # properties configuration is optional
Property(
name="title",
data_type=DataType.TEXT,
index_filterable=True,
index_searchable=True,
),
Property(
name="chunk",
data_type=DataType.TEXT,
index_filterable=True,
index_searchable=True,
),
Property(
name="chunk_number",
data_type=DataType.INT,
index_range_filters=True,
),
],
inverted_index_config=Configure.inverted_index( # Optional
bm25_b=0.7,
bm25_k1=1.25,
index_null_state=True,
index_property_length=True,
index_timestamps=True,
stopwords_preset="en",
stopwords_additions=["example", "stopword"],
stopwords_removals=["the", "and"],
),
)

bm25

Part of invertedIndexConfig. The settings for BM25 are the free parameters k1 and b, and they are optional. The defaults (k1 = 1.2 and b = 0.75) work well for most cases.

They can be configured per collection, and can optionally be overridden per property.

Example bm25 configuration - JSON object

An example of a complete collection object with bm25 configuration:

{
"class": "Article",
// Configuration of the sparse index
"invertedIndexConfig": {
"bm25": {
"b": 0.75,
"k1": 1.2
}
},
"properties": [
{
"name": "title",
"description": "title of the article",
"dataType": ["text"],
// Property-level settings override the collection-level settings
"invertedIndexConfig": {
"bm25": {
"b": 0.75,
"k1": 1.2
}
},
"indexFilterable": true,
"indexSearchable": true
}
]
}

stopwords

Part of invertedIndexConfig. text properties may contain words that are very common and don't contribute to search results. Ignoring them speeds up queries that contain stopwords, as they can be automatically removed from queries as well. This speedup is very notable on scored searches, such as BM25.

The stopword configuration uses a preset system. You can select a preset to use the most common stopwords for a particular language (e.g. "en" preset). If you need more fine-grained control, you can add additional stopwords or remove stopwords that you believe should not be part of the list. Alternatively, you can create your custom stopword list by starting with an empty ("none") preset and adding all your desired stopwords as additions.

Example stopwords configuration - JSON object

An example of a complete collection object with stopwords configuration:

  "invertedIndexConfig": {
"stopwords": {
"preset": "en",
"additions": ["star", "nebula"],
"removals": ["a", "the"]
}
}

This configuration allows stopwords to be configured by collection. If not set, these values are set to the following defaults:

ParameterDefault valueAcceptable values
"preset""en""en", "none"
"additions"[]any list of custom words
"removals"[]any list of custom words
note
  • If preset is none, then the collection only uses stopwords from the additions list.
  • If the same item is included in both additions and removals, Weaviate returns an error.

As of v1.18, stopwords are indexed. Thus stopwords are included in the inverted index, but not in the tokenized query. As a result, when the BM25 algorithm is applied, stopwords are ignored in the input for relevance ranking but will affect the score.

Stopwords can now be configured at runtime. You can use the RESTful API to update the list of stopwords after your data has been indexed.

Note that stopwords are only removed when tokenization is set to word.

indexTimestamps

Part of invertedIndexConfig. To perform queries that are filtered by timestamps, configure the target collection to maintain an inverted index based on the objects' internal timestamps. Currently the timestamps include creationTimeUnix and lastUpdateTimeUnix.

To configure timestamp based indexing, set indexTimestamps to true in the invertedIndexConfig object.

indexNullState

Part of invertedIndexConfig. To perform queries that filter on null, configure the target collection to maintain an inverted index that tracks null values for each property in a collection .

To configure null based indexing, setting indexNullState to true in the invertedIndexConfig object.

indexPropertyLength

Part of invertedIndexConfig. To perform queries that filter by the length of a property, configure the target collection to maintain an inverted index based on the length of the properties.

To configure indexing based on property length, set indexPropertyLength to true in the invertedIndexConfig object.

note

Using these features requires more resources. The additional inverted indexes must be created and maintained for the lifetime of the collection.

How Weaviate creates inverted indexes

Weaviate creates separate inverted indexes for each property and each index type. For example, if you have a title property that is both searchable and filterable, Weaviate will create two separate inverted indexes for that property - one optimized for search operations and another for filtering operations. Find out more in Concepts: Inverted index.

Adding a property after collection creation

Adding a property after importing objects can lead to limitations in inverted-index related behavior, such as filtering by the new property's length or null status.

This is caused by the inverted index being built at import time. If you add a property after importing objects, the inverted index for metadata such as the length or the null status will not be updated to include the new properties. This means that the new property will not be indexed for existing objects. This can lead to unexpected behavior when querying.

To avoid this, you can either:

  • Add the property before importing objects.
  • Delete the collection, re-create it with the new property and then re-import the data.

We are working on a re-indexing API to allow you to re-index the data after adding a property. This will be available in a future release.

Further resources

Questions and feedback

If you have any questions or feedback, let us know in the user forum.