Rotational Quantization (RQ)

Compression by Default

Starting with v1.33, you can set a default quantization for new collections using the DEFAULT_QUANTIZATION environment variable. This variable is not set by default, so no quantization is applied unless you explicitly configure it. When it is set (e.g., to 8-bit RQ), all newly created collections use that quantization setting. Note that once quantization is set on a collection, it can't be disabled. Default quantization applies only to the HNSW vector index type.

Rotational quantization (RQ) is a fast vector compression technique that offers significant performance benefits. Two RQ variants are available in Weaviate:

  • 8-bit RQ: Up to 4x compression while retaining almost perfect recall (98-99% on most datasets). Recommended for most use cases.
  • 1-bit RQ: Close to 32x compression as dimensionality increases, with moderate recall across various datasets.
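
To make the compression figures concrete, the following back-of-the-envelope sketch compares the raw size of a float32 vector with the size of its 8-bit and 1-bit codes. The per-vector `overhead_bytes` value is a hypothetical placeholder for any metadata stored alongside the codes. It is not a figure from the Weaviate documentation and is included only to show why the ratio can approach, without reaching, the ideal value as dimensionality grows.

```python
def compression_ratio(dimensions: int, bits: int, overhead_bytes: int = 16) -> float:
    """Ratio of uncompressed float32 size to quantized size for one vector.

    overhead_bytes is a hypothetical per-vector metadata cost (an assumption
    made for illustration, not a documented Weaviate value).
    """
    raw_bytes = dimensions * 4                  # float32 = 4 bytes per dimension
    code_bytes = (dimensions * bits + 7) // 8   # packed codes, rounded up to whole bytes
    return raw_bytes / (code_bytes + overhead_bytes)


for dims in (128, 768, 1536, 3072):
    print(
        f"{dims:>5} dims: "
        f"8-bit {compression_ratio(dims, 8):.2f}x, "
        f"1-bit {compression_ratio(dims, 1):.2f}x"
    )
```

With the overhead set to zero the function returns exactly 4x and 32x; any fixed per-vector cost pulls the ratio below those values, most noticeably for short vectors.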

8-bit RQ

Added in v1.32

8-bit Rotational quantization (RQ) for the HNSW vector index was added in v1.32.

Preview

8-bit Rotational quantization (RQ) for the flat vector index was added in v1.34 as a preview.

This means that the feature is still under development and may change in future releases, including potential breaking changes. We do not recommend using this feature in production environments at this time.

8-bit RQ provides up to 4x compression while maintaining 98-99% recall in internal testing. It is generally recommended as the default quantization technique for most use cases.

Enable compression for new collection

RQ can be enabled at collection creation time through the collection definition:

Code snippets in the documentation reflect the latest client library and Weaviate Database version. Check the Release notes for specific versions. If a snippet doesn't work or you have feedback, please open a GitHub issue.

from weaviate.classes.config import Configure, Property, DataType

client.collections.create(
    name="MyCollection",
    vector_config=Configure.Vectors.text2vec_openai(
        quantizer=Configure.VectorIndex.Quantizer.rq()
    ),
    properties=[
        Property(name="title", data_type=DataType.TEXT),
    ],
)

Enable compression for existing collection

RQ can also be enabled for an existing collection by updating the collection definition:

from weaviate.classes.config import Reconfigure

collection = client.collections.use("MyCollection")
collection.config.update(
    vector_config=Reconfigure.Vectors.update(
        name="default",
        vector_index_config=Reconfigure.VectorIndex.hnsw(
            quantizer=Reconfigure.VectorIndex.Quantizer.rq(),
        ),
    )
)

1-bit RQ

Added in v1.33

1-bit Rotational quantization (RQ) for the HNSW vector index was added in v1.33.

Preview

1-bit Rotational quantization (RQ) for the flat vector index was added in v1.34 as a preview.

This means that the feature is still under development and may change in future releases, including potential breaking changes. We do not recommend using this feature in production environments at this time.

1-bit RQ is a quantization technique that provides close to 32x compression as dimensionality increases. It serves as a more robust and accurate alternative to BQ, with only a slight performance trade-off. Although it is faster than PQ in both encoding time and distance calculations, 1-bit RQ typically offers slightly lower recall than well-tuned PQ.

Enable compression for new collection

RQ can be enabled at collection creation time through the collection definition:

from weaviate.classes.config import Configure, Property, DataType

client.collections.create(
    name="MyCollection",
    vector_config=Configure.Vectors.text2vec_openai(
        quantizer=Configure.VectorIndex.Quantizer.rq(bits=1)
    ),
    properties=[
        Property(name="title", data_type=DataType.TEXT),
    ],
)

Enable compression for existing collection

RQ can also be enabled for an existing collection by updating the collection definition:

from weaviate.classes.config import Reconfigure

collection = client.collections.use("MyCollection")
collection.config.update(
    vector_config=Reconfigure.Vectors.update(
        name="default",
        vector_index_config=Reconfigure.VectorIndex.hnsw(
            quantizer=Reconfigure.VectorIndex.Quantizer.rq(bits=1),
        ),
    )
)

RQ parameters

To tune RQ, use these quantization and vector index parameters:

| Parameter | Type | Default | Details |
| --- | --- | --- | --- |
| rq: bits | integer | 8 | The number of bits used to quantize each data point. Value can be 8 or 1. Learn more about 8-bit and 1-bit RQ. |
| rq: rescoreLimit | integer | -1 | The minimum number of candidates to fetch before rescoring. |
| rq: cache | boolean | false | Whether to cache the vectors in memory (only when using the flat vector index type). |
| vectorCacheMaxObjects | integer | 1e12 | Maximum number of objects in the memory cache. By default, this limit is set to one trillion (1e12) objects when a new collection is created. For sizing recommendations, see Vector cache considerations. |

from weaviate.classes.config import Configure, Property, DataType

client.collections.create(
    name="MyCollection",
    vector_config=Configure.Vectors.text2vec_openai(
        quantizer=Configure.VectorIndex.Quantizer.rq(
            bits=8,  # Optional: Number of bits
            rescore_limit=20,  # Optional: Number of candidates to fetch before rescoring
            cache=True,  # Optional: Enable caching for flat index (enabled by default for HNSW)
        ),
        vector_index_config=Configure.VectorIndex.flat(
            vector_cache_max_objects=100000,  # Optional: Maximum number of objects in the memory cache
        ),
    ),
    properties=[
        Property(name="title", data_type=DataType.TEXT),
    ],
)
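
The `rescoreLimit` parameter controls how many candidates are gathered from compressed vectors before they are re-ranked. The sketch below illustrates that two-stage idea in plain Python. It is not Weaviate's implementation: a simple per-vector 8-bit scalar quantizer stands in for RQ (no rotation is applied), the search is brute force rather than index-based, and all function names are hypothetical.

```python
import math
import random


def quantize(vector, bits=8):
    """Scalar-quantize one vector to integer codes (a stand-in for RQ, no rotation)."""
    lo, hi = min(vector), max(vector)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for constant vectors
    codes = [round((x - lo) / scale) for x in vector]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Approximate reconstruction of the original vector from its codes."""
    return [lo + c * scale for c in codes]


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(query, vectors, compressed, k, rescore_limit):
    # Stage 1: rank every vector by distance to its compressed approximation.
    approx = sorted(
        range(len(vectors)),
        key=lambda i: l2(query, dequantize(*compressed[i])),
    )
    # Fetch at least rescore_limit candidates (never fewer than k).
    candidates = approx[: max(k, rescore_limit)]
    # Stage 2: re-rank the candidates using the uncompressed vectors.
    return sorted(candidates, key=lambda i: l2(query, vectors[i]))[:k]


random.seed(0)
vectors = [[random.gauss(0, 1) for _ in range(32)] for _ in range(200)]
compressed = [quantize(v) for v in vectors]
query = [random.gauss(0, 1) for _ in range(32)]

print(search(query, vectors, compressed, k=5, rescore_limit=20))
```

A larger candidate pool gives the exact re-ranking step more chances to recover neighbors that the compressed distances misordered, at the cost of more full-precision distance computations.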

Additional considerations

Multiple vector embeddings (named vectors)

Collections can have multiple named vectors. The vectors in a collection can have their own configurations, and compression must be enabled independently for each vector. Every vector is independent and can use PQ, BQ, RQ, SQ, or no compression.

Multi-vector embeddings (ColBERT, ColPali, etc.)

Added in v1.30

Multi-vector embeddings (implemented through models like ColBERT, ColPali, or ColQwen) represent each object or query using multiple vectors instead of a single vector. Just like with single vectors, multi-vectors support PQ, BQ, RQ, SQ, or no compression.

During the initial search phase, compressed vectors are used for efficiency. However, when computing the MaxSim operation, uncompressed vectors are utilized to ensure more precise similarity calculations. This approach balances the benefits of compression for search efficiency with the accuracy of uncompressed vectors during final scoring.
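
As a rough illustration of the scoring step, MaxSim is commonly defined as follows: for each query token vector, take the highest similarity against any document token vector, then sum those maxima. The sketch below uses dot-product similarity and a hypothetical `maxsim` helper. It shows only the final scoring on uncompressed vectors, not how Weaviate selects candidates.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens, doc_tokens):
    """Sum over query tokens of the best dot product against any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Two query token vectors and two candidate documents (toy 2-dimensional data).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
doc_b = [[0.6, 0.4], [0.4, 0.6]]

scores = {"doc_a": maxsim(query, doc_a), "doc_b": maxsim(query, doc_b)}
print(max(scores, key=scores.get))
```

Because each query token picks its single best match, small errors in individual token similarities can change which document token wins, which is one reason to compute this step on uncompressed vectors.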

Multi-vector performance

RQ supports multi-vector embeddings. Each token vector is rounded up to a multiple of 64 dimensions, which may result in less than 4x compression for very short vectors. This is a technical limitation that may be addressed in future versions.
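
The effect of rounding to a multiple of 64 dimensions can be estimated with simple arithmetic. The sketch below assumes float32 input and 8-bit codes and ignores any per-vector metadata; the helper names are hypothetical.

```python
def padded_dimensions(dimensions: int, multiple: int = 64) -> int:
    """Round a dimension count up to the next multiple (64, per the text above)."""
    return -(-dimensions // multiple) * multiple


def effective_compression(dimensions: int, bits: int = 8) -> float:
    """float32 size divided by the size of the padded, quantized token vector."""
    raw_bytes = dimensions * 4
    code_bytes = padded_dimensions(dimensions) * bits // 8
    return raw_bytes / code_bytes


for dims in (32, 96, 128):
    print(dims, padded_dimensions(dims), round(effective_compression(dims), 2))
```

Under these assumptions, a token vector whose length is already a multiple of 64 keeps the full 4x ratio, while shorter or unaligned vectors pay for the padding.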

Questions and feedback

If you have any questions or feedback, let us know in the user forum.