Query profiling

Added in v1.36.9

Query profiling provides per-shard timing breakdowns for search queries. Enable it on any search request to see how long each phase takes — vector search, keyword scoring, filter evaluation, object retrieval — broken down by shard and cluster node.

Profiling uses the same instrumentation as slow query logging. It adds minimal overhead when enabled and zero overhead when disabled.

Enable profiling

Add query_profile=True to MetadataQuery, or include "query_profile" in the metadata list:

import weaviate
from weaviate.classes.query import MetadataQuery

client = weaviate.connect_to_local()

collection = client.collections.get("Article")

response = collection.query.near_vector(
    near_vector=[0.1, 0.2, 0.3],
    limit=5,
    return_metadata=MetadataQuery(query_profile=True, distance=True),
)

if response.query_profile:
    for shard in response.query_profile.shards:
        print(f"Shard: {shard.name} (node: {shard.node})")
        for search_type, profile in shard.searches.items():
            print(f"  [{search_type}]")
            for key, value in profile.details.items():
                print(f"    {key}: {value}")

Profile data is returned on the response object at response.query_profile, not on individual result objects. It represents the entire query across all shards.

Supported search types

| Search type | Profile sections | Query methods |
| --- | --- | --- |
| Vector search | vector | near_vector, near_object, near_text, near_image, etc. |
| Keyword search (BM25) | keyword | bm25 |
| Hybrid search | vector + keyword | hybrid |
| Object fetch | object | fetch_objects |
| Any search + filters | Includes filter metrics | Add filters to any search |
| Any search + groupBy | Profile at query level | Add group_by to any search |

BM25 example

from weaviate.classes.query import MetadataQuery

collection = client.collections.get("Article")

response = collection.query.bm25(
    query="machine learning",
    return_metadata=MetadataQuery(query_profile=True, score=True),
)

if response.query_profile:
    for shard in response.query_profile.shards:
        print(f"Shard: {shard.name} (node: {shard.node})")
        for search_type, profile in shard.searches.items():
            print(f"  [{search_type}]")
            for key, value in profile.details.items():
                print(f"    {key}: {value}")

Hybrid example

Hybrid search produces both vector and keyword profile sections per shard:

from weaviate.classes.query import MetadataQuery

collection = client.collections.get("Article")

response = collection.query.hybrid(
    query="machine learning",
    return_metadata=MetadataQuery(query_profile=True),
    limit=5,
)

if response.query_profile:
    for shard in response.query_profile.shards:
        print(f"Shard: {shard.name} (node: {shard.node})")
        for search_type, profile in shard.searches.items():
            print(f"  [{search_type}]")
            for key, value in profile.details.items():
                print(f"    {key}: {value}")

Response structure

The profile is structured as:

response.query_profile
└── shards[]
    ├── name      # Shard identifier (e.g. "shard_0")
    ├── node      # Cluster node (e.g. "weaviate-0")
    └── searches  # Dict of search type → profile
        ├── "vector"  → details: { key: value, ... }
        ├── "keyword" → details: { key: value, ... }
        └── "object"  → details: { key: value, ... }

Each search type contains a details dict with string key-value pairs. The available metrics depend on the query type, index configuration, and filter usage.
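Because the profile is a nested structure, it can be convenient to flatten it into one row per metric before logging or tabulating it. The following is a minimal sketch rather than part of the Weaviate client: flatten_profile is a hypothetical helper, and it relies only on the attributes described above (shards, name, node, searches, details). A stand-in object replaces a live response so the sketch runs without a Weaviate instance.

```python
from types import SimpleNamespace


def flatten_profile(query_profile):
    """Flatten a query profile into one row per (shard, search type, metric).

    Hypothetical helper: it only assumes the documented shape
    query_profile.shards[].{name, node, searches{type -> .details}}.
    """
    rows = []
    if not query_profile:
        return rows
    for shard in query_profile.shards:
        for search_type, profile in shard.searches.items():
            for metric, value in profile.details.items():
                rows.append(
                    {
                        "shard": shard.name,
                        "node": shard.node,
                        "search_type": search_type,
                        "metric": metric,
                        "value": value,
                    }
                )
    return rows


# Stand-in that mimics the documented shape (not a real client object).
# With a live query, pass response.query_profile instead.
fake_profile = SimpleNamespace(
    shards=[
        SimpleNamespace(
            name="shard_0",
            node="weaviate-0",
            searches={
                "vector": SimpleNamespace(
                    details={"total_took": "198.666µs", "filters_ids_matched": "847"}
                )
            },
        )
    ]
)

for row in flatten_profile(fake_profile):
    print(row)
```

Flat rows like these are easy to hand to a CSV writer or a logging pipeline. The values stay as strings, as in the details dict.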

Available metrics

General metrics

| Metric | Description | Present when |
| --- | --- | --- |
| total_took | Total time for this shard's search | Always |
| objects_took | Time retrieving objects from storage | Always |
| sort_took | Time sorting results | When sorting is applied |

Vector search metrics

| Metric | Description |
| --- | --- |
| vector_search_took | Time spent in vector index search |
| knn_search_layer_N_took | Per-layer HNSW graph traversal time (N = layer number) |
| knn_search_rescore_took | Time rescoring compressed vectors (PQ/BQ/SQ) |
| hnsw_flat_search | Whether flat (brute-force) search was used instead of HNSW ("true" or "false") |

Filter metrics

| Metric | Description |
| --- | --- |
| filters_build_allow_list_took | Time building the filter allow-list |
| filters_ids_matched | Number of object IDs matching the filter |

BM25 keyword metrics

| Metric | Description |
| --- | --- |
| kwd_method | BM25 scoring method used (e.g., blockmaxwand) |
| kwd_time | Total BM25 scoring time |
| kwd_1_tok_time | Query tokenization time |
| kwd_3_term_time | Term dictionary lookup time |
| kwd_4_bmw_time | BlockMaxWAND scoring time |
| kwd_6_res_count | Number of results from keyword scoring |

Example output

A hybrid search on a 3-node cluster with filters produces profiles for both vector and keyword phases on each shard:

Shard: shard_abc (node: weaviate-0)
  [keyword]
    kwd_method: blockmaxwand
    kwd_time: 242.75µs
    kwd_1_tok_time: 18.291µs
    kwd_3_term_time: 52.083µs
    kwd_4_bmw_time: 156.417µs
    total_took: 248.833µs
  [vector]
    filters_build_allow_list_took: 31.125µs
    filters_ids_matched: 847
    knn_search_layer_0_took: 14µs
    objects_took: 153.542µs
    total_took: 198.666µs
    vector_search_took: 40.959µs

Shard: shard_def (node: weaviate-1)
  [keyword]
    kwd_method: blockmaxwand
    kwd_time: 189.333µs
    total_took: 195.25µs
  [vector]
    filters_build_allow_list_took: 27.458µs
    filters_ids_matched: 912
    total_took: 172.417µs
    vector_search_took: 35.75µs
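
The timing values in this output are strings such as 242.75µs, so they need converting to numbers before they can be compared or aggregated. Here is a minimal sketch, assuming the values follow Go-style duration formatting. That is an inference from the example output, not something this page states. parse_duration is a hypothetical helper, not part of the Weaviate client.

```python
import re

# Seconds per unit for Go-style duration strings. This format is an
# assumption based on the example output. Both the micro sign "µ" and the
# Greek letter mu "μ" are accepted, since either may appear in text.
_UNIT_SECONDS = {
    "ns": 1e-9,
    "us": 1e-6,
    "µs": 1e-6,
    "μs": 1e-6,
    "ms": 1e-3,
    "s": 1.0,
    "m": 60.0,
    "h": 3600.0,
}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|μs|ms|s|m|h)")


def parse_duration(text):
    """Return the duration in seconds, or None if text is not a duration.

    Non-timing values such as "blockmaxwand", "true", or "847" return None,
    so the function can be applied to every entry in a details dict.
    """
    parts = _PART.findall(text)
    # Reject strings that are not made up entirely of number+unit parts.
    if not parts or "".join(number + unit for number, unit in parts) != text:
        return None
    return sum(float(number) * _UNIT_SECONDS[unit] for number, unit in parts)


for sample in ["242.75µs", "14µs", "1m2.5s", "blockmaxwand", "847"]:
    print(sample, "->", parse_duration(sample))
```

Returning None for non-duration values, rather than raising, means counts and flags such as filters_ids_matched or hnsw_flat_search can be skipped without special-casing metric names.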

Multi-node behavior

In multi-node clusters, the coordinator node aggregates profile data from all shards across all nodes. Each shard profile includes the node field identifying which cluster node executed that shard's search. This makes it straightforward to identify performance imbalances across nodes.
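
One way to look for such imbalances is to find the slowest search section reported by each node. The sketch below is illustrative only: slowest_per_node is a hypothetical helper, and its input tuples are assumed to have been built from each section's total_took after conversion to seconds. The sample values are the total_took figures from the example output above.

```python
def slowest_per_node(samples):
    """Map each node to its slowest (shard, search_type, seconds) sample.

    samples: iterable of (node, shard, search_type, seconds) tuples, for
    example one per search section, using that section's total_took.
    """
    slowest = {}
    for node, shard, search_type, seconds in samples:
        current = slowest.get(node)
        if current is None or seconds > current[2]:
            slowest[node] = (shard, search_type, seconds)
    return slowest


# total_took values from the example output, already converted to seconds.
samples = [
    ("weaviate-0", "shard_abc", "keyword", 248.833e-6),
    ("weaviate-0", "shard_abc", "vector", 198.666e-6),
    ("weaviate-1", "shard_def", "keyword", 195.25e-6),
    ("weaviate-1", "shard_def", "vector", 172.417e-6),
]

# List nodes from slowest to fastest.
ranked = sorted(slowest_per_node(samples).items(), key=lambda item: -item[1][2])
for node, (shard, search_type, seconds) in ranked:
    print(f"{node}: {shard} [{search_type}] {seconds * 1e6:.3f} µs")
```

The sketch takes the maximum instead of summing sections, because this page does not say whether the vector and keyword phases of a hybrid query run sequentially or in parallel.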

Performance impact

  • When disabled (default): Zero overhead. A single boolean check skips all profiling code paths.
  • When enabled: Adds timing instrumentation to each shard search. The overhead is small (microsecond-level timer reads) but measurable under high-throughput workloads. Use for debugging and optimization, not in production hot paths.

Limitations

  • Response-level only: Profile data is on response.query_profile and describes the query as a whole, not individual result objects.
  • Search phases only: Profiling covers vector search, keyword scoring, filter evaluation, and object retrieval. It does not include time spent in generative modules, rerankers, or post-processing.
  • No per-object breakdown: You get per-shard timing, not per-object timing.
  • Metrics vary by query: Not all metrics appear in every response. Available metrics depend on the search type, index type (HNSW vs. flat), compression settings, and whether filters are used.

Questions and feedback

If you have any questions or feedback, let us know in the user forum.