Inverted index

An inverted index is a data structure in Weaviate that enables efficient text search and filtering operations.

Additional information

In Weaviate, the inverted index supports search capabilities such as keyword search, filtering, and range queries. An inverted index maps from terms (tokens) back to the objects that contain them. This mapping allows Weaviate to quickly identify which objects contain specific terms or match certain criteria during search queries.
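The term-to-object mapping described above can be illustrated with a small, self-contained sketch. This is a conceptual toy in plain Python, not Weaviate's internal implementation; the function names and sample data are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(objects):
    """Map each lowercase token to the set of object IDs that contain it."""
    index = defaultdict(set)
    for object_id, text in objects.items():
        for token in text.lower().split():
            index[token].add(object_id)
    return index


def lookup(index, *terms):
    """Return the IDs of objects that contain every given term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


# Hypothetical sample data for illustration
objects = {
    1: "Vector search with Weaviate",
    2: "Keyword search and filtering",
    3: "Filtering by date range",
}
index = build_inverted_index(objects)
print(sorted(lookup(index, "search")))               # [1, 2]
print(sorted(lookup(index, "search", "filtering")))  # [2]
```

Because each term points directly at the objects that contain it, a query only touches the entries for its own terms instead of scanning every object.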

You can enable inverted indexes on properties and adjust various parameters that control indexing behavior and tokenization strategies. Proper configuration of these parameters is crucial for optimizing both search performance and storage efficiency.

Enable inverted index for keyword searches and filtering

Property-level index parameters control how individual properties are indexed for search and filtering operations. They determine whether a given property can be searched, filtered, or used in range queries.

Enabling inverted index

The inverted index in Weaviate can be enabled through parameters at the property level:

index_filterable - Controls whether a property can be used in where filters. When set to true, the property values are indexed for efficient filtering operations. Disable this for properties that don't need filtering to save storage space.

index_searchable - Determines whether a text property participates in keyword search queries. When true, the property's text content is tokenized and indexed for search. Set it to false for properties that don't need keyword search to save storage space and indexing work.

index_range_filters - Enables range filtering capabilities (greater than, less than, etc.) for numerical and date properties. When enabled, additional indexing structures are created to support efficient range queries.

from weaviate.classes.config import Configure, Property, DataType

client.collections.create(
    "Article",
    # Additional settings not shown
    properties=[
        Property(
            name="title",
            data_type=DataType.TEXT,
            index_filterable=True,
            index_searchable=True,
        ),
        Property(
            name="chunk",
            data_type=DataType.TEXT,
            index_filterable=True,
            index_searchable=True,
        ),
        Property(
            name="chunk_number",
            data_type=DataType.INT,
            index_range_filters=True,
        ),
    ],
)
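To show why a dedicated structure helps range queries such as those enabled by index_range_filters, here is a conceptual sketch that keeps values sorted and answers a range lookup with binary search. It is not how Weaviate implements range filters internally; the class name and sample values are hypothetical.

```python
from bisect import bisect_left, bisect_right


class SortedRangeIndex:
    """Toy range index: (value, object_id) pairs kept sorted by value."""

    def __init__(self, values_by_id):
        self._entries = sorted(
            (value, object_id) for object_id, value in values_by_id.items()
        )
        self._values = [value for value, _ in self._entries]

    def between(self, low, high):
        """Return IDs whose value v satisfies low <= v <= high."""
        start = bisect_left(self._values, low)
        end = bisect_right(self._values, high)
        return [object_id for _, object_id in self._entries[start:end]]


# Hypothetical chunk_number values keyed by object ID
chunk_numbers = {"a": 1, "b": 7, "c": 3, "d": 12, "e": 7}
range_index = SortedRangeIndex(chunk_numbers)
print(range_index.between(3, 7))  # ['c', 'b', 'e']
```

The lookup finds both ends of the range with binary search rather than comparing every stored value, which is the general trade-off behind range indexing: extra storage in exchange for faster range queries.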

Set inverted index parameters

Inverted index parameters control the overall behavior of the inverted index for an entire collection. These parameters affect ranking algorithms, null value handling, and timestamp indexing across all properties in the collection.

Inverted index parameters

The inverted index in Weaviate can be configured through various parameters at the collection level:

bm25_b - Controls the degree of normalization by document length in the BM25 ranking algorithm. Values range from 0 to 1, where 0 means no length normalization and 1 means full normalization. Higher values favor shorter documents.

bm25_k1 - Controls term frequency saturation in BM25. Higher values make term frequency more important, while lower values reduce the impact of term frequency on scoring.

index_null_state - Determines whether null values are indexed. When enabled, you can filter for objects that have null values in specific properties.

index_property_length - Controls whether the length of properties is indexed. When enabled, you can filter objects based on the length of a property value.

index_timestamps - Enables indexing of creation and update timestamps for objects, allowing filtering and sorting operations.

from weaviate.classes.config import Configure, Property, DataType

client.collections.create(
    "Article",
    # Additional settings not shown
    inverted_index_config=Configure.inverted_index(
        bm25_b=0.7,
        bm25_k1=1.25,
        index_null_state=True,
        index_property_length=True,
        index_timestamps=True,
    ),
)
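To build intuition for bm25_b and bm25_k1, the sketch below scores a single term with a common textbook formulation of BM25. The exact formula Weaviate uses may differ; the function names and the corpus statistics are hypothetical and chosen only for illustration.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.25, b=0.7):
    """Score one term for one document using a common BM25 formulation."""
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)


# Hypothetical corpus statistics shared by the comparisons below
common = dict(tf=3, avg_doc_len=100, n_docs=1000, doc_freq=50)

# b=1.0 applies full length normalization, so the shorter document scores higher.
short_doc = bm25_term_score(doc_len=50, b=1.0, **common)
long_doc = bm25_term_score(doc_len=400, b=1.0, **common)
print(short_doc > long_doc)  # True


def tf_gain(k1):
    """Ratio of the score at tf=10 to the score at tf=1 for an average-length document."""
    kwargs = dict(doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=50, k1=k1)
    return bm25_term_score(tf=10, **kwargs) / bm25_term_score(tf=1, **kwargs)


# A higher k1 lets repeated occurrences of a term raise the score further.
print(tf_gain(2.0) > tf_gain(0.5))  # True
```

With b set to 0 the length term drops out and both documents receive the same score, which matches the description of bm25_b above.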

Set tokenization type for property

Configure a tokenization method for each property individually.

Tokenization methods

Tokenization determines how text content is broken down into individual terms that can be indexed and searched. Weaviate supports several tokenization strategies:

word - The default tokenization that splits text on whitespace and punctuation, converting to lowercase. Best for general text search where you want to match individual words.

lowercase - Converts the entire property value to lowercase and splits it on whitespace, preserving punctuation. Useful for case-insensitive matching where symbols and punctuation are meaningful.

whitespace - Splits text only on whitespace characters, preserving punctuation and case. Good when punctuation is meaningful for search.

field - Treats the entire property value as a single token after trimming leading and trailing whitespace. Use for exact matching of complete field values like IDs, email addresses, or URLs.

trigram - Breaks text into overlapping 3-character sequences. Enables fuzzy matching and is useful for handling typos or partial matches.

gse - Tokenization based on the gse text segmentation library, intended for Chinese and Japanese text. Provides language-aware segmentation for languages that do not separate words with whitespace.

from weaviate.classes.config import Configure, Property, DataType, Tokenization

client.collections.create(
    "Article",
    vector_config=Configure.Vectors.text2vec_cohere(),
    properties=[
        Property(
            name="title",
            data_type=DataType.TEXT,
            tokenization=Tokenization.LOWERCASE,  # Use "lowercase" tokenization
            description="The title of the article.",  # Optional description
        ),
        Property(
            name="body",
            data_type=DataType.TEXT,
            tokenization=Tokenization.WHITESPACE,  # Use "whitespace" tokenization
        ),
    ],
)
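To compare the tokenization methods side by side, the following sketch approximates several of them in plain Python. It is a simplified illustration, not Weaviate's tokenizer: the tokenize function is hypothetical, the word branch only handles ASCII letters and digits, and the trigram branch does not model how case, whitespace, or punctuation are treated.

```python
import re


def tokenize(text, method):
    """Approximate a few tokenization strategies for illustration only."""
    if method == "word":
        # Lowercase, then keep runs of ASCII letters and digits
        return re.findall(r"[a-z0-9]+", text.lower())
    if method == "lowercase":
        # Lowercase, then split on whitespace (punctuation is kept)
        return text.lower().split()
    if method == "whitespace":
        # Split on whitespace only (case and punctuation are kept)
        return text.split()
    if method == "field":
        # Whole value as one token, with surrounding whitespace trimmed
        return [text.strip()]
    if method == "trigram":
        # Overlapping 3-character windows over the text as given
        return [text[i:i + 3] for i in range(len(text) - 2)]
    raise ValueError(f"unsupported method: {method}")


sample = "Hello, (beautiful) world"
print(tokenize(sample, "word"))        # ['hello', 'beautiful', 'world']
print(tokenize(sample, "lowercase"))   # ['hello,', '(beautiful)', 'world']
print(tokenize(sample, "whitespace"))  # ['Hello,', '(beautiful)', 'world']
print(tokenize(sample, "field"))       # ['Hello, (beautiful) world']
print(tokenize("weaviate", "trigram")) # ['wea', 'eav', 'avi', 'via', 'iat', 'ate']
```

A query term only matches a property if it produces the same tokens as the indexed text, so the choice of method decides whether a search for "hello" matches "Hello," or not.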

Further resources

Questions and feedback

If you have any questions or feedback, let us know in the user forum.

Technical questions

If you have questions, feel free to post on our Community forum.

Documentation feedback

Leave feedback by opening a GitHub issue.
