Inverted index

The inverted index maps values (like words or numbers) to the objects that contain them. It is the backbone for all attribute-based filtering (where filters) and keyword searching (bm25, hybrid).
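As a rough illustration of the idea (a toy sketch, not Weaviate's actual storage format), an inverted index can be pictured as a dictionary from each value to the set of object IDs that contain it. The `build_inverted_index` helper and the sample data below are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(objects):
    """Map each lowercased word to the set of object IDs whose text contains it."""
    index = defaultdict(set)
    for object_id, text in objects.items():
        for token in text.lower().split():
            index[token].add(object_id)
    return index


# Hypothetical sample data: object ID -> text value
objects = {
    1: "red apple",
    2: "green apple",
    3: "red pepper",
}
index = build_inverted_index(objects)

# A lookup goes straight to the matching object IDs instead of scanning every object.
print(sorted(index["red"]))    # [1, 3]
print(sorted(index["apple"]))  # [1, 2]
```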

Inverted index types

Multiple inverted index types are available in Weaviate. Not all inverted index types are available for all data types. The available inverted index types are:

Inverted index type	Description	Applicable data types	Default	Availability
`indexSearchable`	A BM25-suitable Map index for BM25 or hybrid searching.	`text`, `text[]`	`true`	`v1.19`
`indexFilterable`	A Roaring Bitmap index for match-based filtering.	Everything except `blob`, `geoCoordinates`, `object` and `phoneNumber` data types including arrays thereof	`true`	`v1.19`
`indexRangeFilters`	A Roaring Bitmap index for numerical range-based filtering.	`int`, `number` and `date` only	`false`	`v1.26`

Enable one or both of indexFilterable and indexRangeFilters to index a property for faster filtering.
- If only one is enabled, the respective index is used for filtering.
- If both are enabled, indexRangeFilters is used for operations involving comparison operators, and indexFilterable is used for equality and inequality operations.
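The selection rule above can be sketched as a small function. This is only a model of the documented behavior, not Weaviate code: the `choose_index` helper is hypothetical, and returning `None` when neither index is enabled is an assumption made for the sketch.

```python
# Filter operator names as used in Weaviate where filters
COMPARISON_OPERATORS = {"GreaterThan", "GreaterThanEqual", "LessThan", "LessThanEqual"}
EQUALITY_OPERATORS = {"Equal", "NotEqual"}


def choose_index(operator, index_filterable, index_range_filters):
    """Return the name of the index that would serve a filter, per the rule above."""
    if operator not in COMPARISON_OPERATORS | EQUALITY_OPERATORS:
        raise ValueError(f"operator not covered by this sketch: {operator}")
    if index_filterable and index_range_filters:
        # Both enabled: comparisons use the range index, (in)equality uses the filterable index.
        if operator in COMPARISON_OPERATORS:
            return "indexRangeFilters"
        return "indexFilterable"
    if index_range_filters:
        return "indexRangeFilters"
    if index_filterable:
        return "indexFilterable"
    return None  # Assumption: neither index is available for this property


print(choose_index("GreaterThan", True, True))  # indexRangeFilters
print(choose_index("Equal", True, True))        # indexFilterable
print(choose_index("LessThan", True, False))    # indexFilterable
```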

Inverted index parameters

These parameters are set within the invertedIndexConfig object in your collection definition.

Parameter	Type	Default	Details
`bm25`	Object	`{ "k1": 1.2, "b": 0.75 }`	Sets the `k1` and `b` parameters for the BM25 ranking algorithm. Can be overridden at the property level. See BM25 Configuration below.
`stopwords`	Object	(Varies)	Defines the stopword list to exclude common words from search queries. See Stopwords Configuration below.
`indexTimestamps`	Boolean	`false`	If `true`, indexes object creation and update timestamps, enabling filtering by `creationTimeUnix` and `lastUpdateTimeUnix`.
`indexNullState`	Boolean	`false`	If `true`, indexes the null/non-null state of each property, enabling filtering for `null` values.
`indexPropertyLength`	Boolean	`false`	If `true`, indexes the length of each property, enabling filtering by property length.

Performance Impact

Enabling indexTimestamps, indexNullState, or indexPropertyLength adds overhead as these additional indexes must be created and maintained. Only enable them if you require these specific filtering capabilities.

Code example

This code example shows how to configure inverted index parameters through a client library:

from weaviate.classes.config import (
    Configure,
    DataType,
    Property,
    StopwordsPreset,
    Tokenization,
)

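# Assumes `client` is an already-connected Weaviate client,
# e.g. client = weaviate.connect_to_local()
client.collections.create(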
    "Article",
    # Additional settings not shown
    properties=[  # properties configuration is optional
        Property(
            name="title",
            data_type=DataType.TEXT,
            index_filterable=True,
            index_searchable=True,
            tokenization=Tokenization.WORD,
        ),
        Property(
            name="chunk",
            data_type=DataType.TEXT,
            index_filterable=True,
            index_searchable=True,
            tokenization=Tokenization.FIELD,
        ),
        Property(
            name="chunk_number",
            data_type=DataType.INT,
            index_range_filters=True,
        ),
    ],
    inverted_index_config=Configure.inverted_index(  # Optional
        bm25_b=0.7,
        bm25_k1=1.25,
        index_null_state=True,
        index_property_length=True,
        index_timestamps=True,
        stopwords_preset=StopwordsPreset.EN,
        stopwords_additions=["example", "stopword"],
        stopwords_removals=["the", "and"],
    ),
)

`bm25`

Part of invertedIndexConfig. The settings for BM25 are the free parameters k1 and b, and they are optional. The defaults (k1 = 1.2 and b = 0.75) work well for most cases.

They can be configured per collection, and can optionally be overridden per property.
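To give a feel for what `k1` and `b` control, here is a minimal sketch of the commonly cited BM25 per-term score. It is the textbook formula, not necessarily the exact computation Weaviate performs; the function name and argument names are chosen for illustration.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 score of one query term for one document.

    tf: occurrences of the term in the document
    doc_len / avg_doc_len: document length and average document length
    n_docs / doc_freq: corpus size and number of documents containing the term
    """
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # b scales how strongly document length normalizes the term frequency.
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    # k1 controls how quickly repeated occurrences of the term saturate.
    return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)


short_doc = bm25_term_score(tf=2, doc_len=50, avg_doc_len=100, n_docs=1000, doc_freq=10)
long_doc = bm25_term_score(tf=2, doc_len=500, avg_doc_len=100, n_docs=1000, doc_freq=10)
print(short_doc > long_doc)  # True: with b > 0, longer documents are penalized
```

In this formula, setting `b` to 0 turns length normalization off, and setting `k1` to 0 makes the score independent of how often the term occurs.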

Example bm25 configuration - JSON object

An example of a complete collection object with bm25 configuration:

{
  "class": "Article",
  // Configuration of the sparse index
  "invertedIndexConfig": {
    "bm25": {
      "b": 0.75,
      "k1": 1.2
    }
  },
  "properties": [
    {
      "name": "title",
      "description": "title of the article",
      "dataType": ["text"],
      // Property-level settings override the collection-level settings
      "invertedIndexConfig": {
        "bm25": {
          "b": 0.75,
          "k1": 1.2
        }
      },
      "indexFilterable": true,
      "indexSearchable": true
    }
  ]
}

`stopwords`

Part of invertedIndexConfig. Properties of type text may contain very common words that contribute little to search results. Ignoring these stopwords speeds up queries that contain them, because they can be removed from the query automatically. The speedup is most notable on scored searches, such as BM25.

The stopword configuration uses a preset system. You can select a preset to use the most common stopwords for a particular language (e.g. "en" preset). If you need more fine-grained control, you can add additional stopwords or remove stopwords that you believe should not be part of the list. Alternatively, you can create your custom stopword list by starting with an empty ("none") preset and adding all your desired stopwords as additions.

Example stopwords configuration - JSON object

An example invertedIndexConfig fragment of a collection definition with stopwords configuration:

  "invertedIndexConfig": {
    "stopwords": {
      "preset": "en",
      "additions": ["star", "nebula"],
      "removals": ["a", "the"]
    }
  }

This configuration allows stopwords to be configured by collection. If not set, these values are set to the following defaults:

Parameter	Default value	Acceptable values
`"preset"`	`"en"`	`"en"`, `"none"`
`"additions"`	`[]`	any list of custom words
`"removals"`	`[]`	any list of custom words

note

If preset is none, then the collection only uses stopwords from the additions list.
If the same item is included in both additions and removals, Weaviate returns an error.
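The way preset, additions and removals combine can be sketched as follows. This is a model of the rules described above, not Weaviate's implementation: the `resolve_stopwords` helper is hypothetical, and the `"en"` word list here is a tiny placeholder rather than the real preset.

```python
# Placeholder presets: the real "en" preset contains many more words.
PRESETS = {
    "en": {"a", "an", "and", "of", "the"},
    "none": set(),
}


def resolve_stopwords(preset="en", additions=(), removals=()):
    """Return the effective stopword set: (preset + additions) - removals."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    additions, removals = set(additions), set(removals)
    overlap = additions & removals
    if overlap:
        # Mirrors the documented rule: the same word in both lists is an error.
        raise ValueError(f"words in both additions and removals: {sorted(overlap)}")
    return (PRESETS[preset] | additions) - removals


print(sorted(resolve_stopwords("en", ["star", "nebula"], ["a", "the"])))
# ['an', 'and', 'nebula', 'of', 'star']
print(sorted(resolve_stopwords("none", ["star", "nebula"])))
# ['nebula', 'star']
```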

As of v1.18, stopwords are indexed. Thus stopwords are included in the inverted index, but not in the tokenized query. As a result, when the BM25 algorithm is applied, stopwords are ignored in the input for relevance ranking but will affect the score.

Stopwords can now be configured at runtime. You can use the RESTful API to update the list of stopwords after your data has been indexed.

info

Stopwords are only removed when tokenization is set to word.

`indexTimestamps`

Part of invertedIndexConfig. To perform queries that are filtered by timestamps, configure the target collection to maintain an inverted index based on the objects' internal timestamps. Currently the timestamps include creationTimeUnix and lastUpdateTimeUnix.

To configure timestamp based indexing, set indexTimestamps to true in the invertedIndexConfig object.

`indexNullState`

Part of invertedIndexConfig. To perform queries that filter on null, configure the target collection to maintain an inverted index that tracks null values for each property in the collection.

To configure null based indexing, set indexNullState to true in the invertedIndexConfig object.

`indexPropertyLength`

Part of invertedIndexConfig. To perform queries that filter by the length of a property, configure the target collection to maintain an inverted index based on the length of the properties.

To configure indexing based on property length, set indexPropertyLength to true in the invertedIndexConfig object.

note

Using these features requires more resources. The additional inverted indexes must be created and maintained for the lifetime of the collection.
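As a minimal sketch, the three flags described above sit side by side in the invertedIndexConfig object of a collection definition. The snippet below only builds and prints that JSON in plain Python; the collection name `Article` is reused from the earlier examples, and nothing is sent to a Weaviate instance.

```python
import json

# Collection definition fragment with all three optional indexes enabled.
# Enable only the ones whose filters you actually need.
collection_definition = {
    "class": "Article",
    "invertedIndexConfig": {
        "indexTimestamps": True,      # filter by creationTimeUnix / lastUpdateTimeUnix
        "indexNullState": True,       # filter for null values
        "indexPropertyLength": True,  # filter by property length
    },
}

payload = json.dumps(collection_definition, indent=2)
print(payload)
```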

How Weaviate creates inverted indexes

Weaviate creates separate inverted indexes for each property and each index type. For example, if you have a title property that is both searchable and filterable, Weaviate will create two separate inverted indexes for that property - one optimized for search operations and another for filtering operations. Find out more in Concepts: Inverted index.
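The "one index per property and per index type" rule can be pictured by listing the pairs that would be created. The `planned_indexes` helper is hypothetical, and for simplicity it only counts flags that are set explicitly rather than applying the defaults from the table of index types.

```python
INDEX_FLAGS = ("indexSearchable", "indexFilterable", "indexRangeFilters")


def planned_indexes(properties):
    """Return one (property name, index type) pair per enabled index flag."""
    return [
        (prop["name"], flag)
        for prop in properties
        for flag in INDEX_FLAGS
        if prop.get(flag)
    ]


properties = [
    {"name": "title", "indexSearchable": True, "indexFilterable": True},
    {"name": "chunk_number", "indexRangeFilters": True},
]

for name, index_type in planned_indexes(properties):
    print(name, index_type)
# title indexSearchable
# title indexFilterable
# chunk_number indexRangeFilters
```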

Adding a property after collection creation

Adding a property after importing objects can lead to limitations in inverted-index related behavior, such as filtering by the new property's length or null status.

This is caused by the inverted index being built at import time. If you add a property after importing objects, the inverted index for metadata such as the length or the null status will not be updated to include the new properties. This means that the new property will not be indexed for existing objects. This can lead to unexpected behavior when querying.

To avoid this, you can either:

- Add the property before importing objects.
- Delete the collection, re-create it with the new property and then re-import the data.

We are working on a re-indexing API to allow you to re-index the data after adding a property. This will be available in a future release.
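The limitation described in this section can be pictured with a toy model. It is a deliberately simplified sketch, not Weaviate's internals: the `ToyCollection` class and its methods are hypothetical, and the only behavior it models is that the null-state index is written at import time for the properties known at that moment.

```python
class ToyCollection:
    """Toy model: null-state index entries are written only when an object is imported."""

    def __init__(self, properties):
        self.properties = list(properties)
        self.null_index = {}  # property name -> IDs of objects indexed as null

    def add_property(self, name):
        # Existing objects are NOT re-indexed for the new property.
        self.properties.append(name)

    def insert(self, object_id, data):
        for prop in self.properties:
            if data.get(prop) is None:
                self.null_index.setdefault(prop, set()).add(object_id)

    def filter_is_null(self, prop):
        return self.null_index.get(prop, set())


articles = ToyCollection(["title"])
articles.insert(1, {"title": "First"})   # imported before "summary" exists
articles.add_property("summary")
articles.insert(2, {"title": "Second"})  # imported after "summary" was added

# Neither object has a summary, but only object 2 was indexed for it.
print(articles.filter_is_null("summary"))  # {2}
```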

How tokenization affects inverted indexing

For text properties, Weaviate first tokenizes the text before creating inverted index entries. Tokenization is the process of breaking text into individual tokens (words, phrases, or characters) that can be indexed and searched.

See the related concepts page for more details.
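As a rough sketch of how the tokenization choice changes what ends up in the index, the two functions below approximate the `word` and `field` options used in the code example earlier on this page. They are simplified stand-ins written for illustration, not Weaviate's tokenizers.

```python
import re


def tokenize_word(text):
    """Approximate `word`: lowercase, keep only runs of alphanumeric characters."""
    return re.findall(r"[^\W_]+", text.lower())


def tokenize_field(text):
    """Approximate `field`: the whole value, trimmed, becomes a single token."""
    return [text.strip()]


text = " Hello, (beautiful) world "
print(tokenize_word(text))   # ['hello', 'beautiful', 'world']
print(tokenize_field(text))  # ['Hello, (beautiful) world']
```

With the `word` style, a query for `beautiful` can match the object through its own index entry, while with the `field` style only the complete value forms an entry.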

Further resources

- Concepts: Inverted index
- How-to: Set inverted index parameters
- Reference: Tokenization options - Learn about different tokenization methods and how they affect text indexing

Questions and feedback

If you have any questions or feedback, let us know in the user forum.

