Vector similarity search
Vector search returns the objects whose vectors are most similar to the query vector.
The Query Agent translates plain English questions into optimized Weaviate queries automatically - no manual query construction needed.
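To make "most similar vectors" concrete, here is a minimal pure-Python sketch of a brute-force search by cosine distance. It is an illustration only: the helper names (`cosine_distance`, `brute_force_search`) and the two-dimensional toy vectors are assumptions made for this example, and Weaviate normally answers these queries with a vector index rather than by scanning every object.

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def brute_force_search(query_vector, objects, limit):
    """Rank every object by distance to the query and keep the closest `limit`."""
    ranked = sorted(objects, key=lambda o: cosine_distance(query_vector, o["vector"]))
    return ranked[:limit]

# Hypothetical toy data: 2-dimensional vectors instead of real embeddings
toy_objects = [
    {"answer": "meerkats", "vector": [0.9, 0.1]},
    {"answer": "dogs", "vector": [0.8, 0.3]},
    {"answer": "volcano", "vector": [0.1, 0.9]},
]

nearest = brute_force_search([1.0, 0.0], toy_objects, limit=2)
print([o["answer"] for o in nearest])
```

The two objects whose vectors point in nearly the same direction as the query come back first, which is the behavior the operators on this page expose.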
Search with text
Use the Near Text operator to find objects with the nearest vector to an input text.
If a snippet doesn't work or you have feedback, please open a GitHub issue.
from weaviate.classes.query import MetadataQuery

jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.near_text(
    query="animals in movies",
    limit=2,
    return_metadata=MetadataQuery(distance=True)
)

for o in response.objects:
    print(o.properties)
    print(o.metadata.distance)
Example response
The output is like this:
{
  "data": {
    "Get": {
      "JeopardyQuestion": [
        {
          "answer": "meerkats",
          "question": "Group of mammals seen <a href=\"http://www.j-archive.com/media/1998-06-01_J_28.jpg\" target=\"_blank\">here</a>: [like Timon in <i>The Lion King</i>]",
          "_additional": { "distance": 0.17602634 }
        },
        {
          "answer": "dogs",
          "question": "Scooby-Doo, Goofy & Pluto are cartoon versions",
          "_additional": { "distance": 0.17842108 }
        }
      ]
    }
  }
}
Search with image
Use the Near Image operator to find objects with the nearest vector to an image.
This example uses a base64 representation of an image.
base64_string = "SOME_BASE_64_REPRESENTATION"

# Get the collection containing images
dogs = client.collections.use("Dog")

# Perform query
response = dogs.query.near_image(
    near_image=base64_string,
    return_properties=["breed"],
    limit=1,
    # target_vector="vector_name"  # required when using multiple named vectors
)
print(response.objects[0])

client.close()
See Image search for more information.
Search with an existing object
If you have an object ID, use the Near Object operator to find similar objects to that object.
from weaviate.classes.query import MetadataQuery

jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.near_object(
    near_object=uuid,  # A UUID of an object (e.g. "56b9449e-65db-5df4-887b-0a4773f52aa7")
    limit=2,
    return_metadata=MetadataQuery(distance=True)
)

for o in response.objects:
    print(o.properties)
    print(o.metadata.distance)
Additional information
To get the object ID, see Retrieve the object ID.
Search with a vector
If you have an input vector, use the Near Vector operator to find objects with similar vectors.
from weaviate.classes.query import MetadataQuery

jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.near_vector(
    near_vector=query_vector,  # your query vector goes here
    limit=2,
    return_metadata=MetadataQuery(distance=True)
)

for o in response.objects:
    print(o.properties)
    print(o.metadata.distance)
Named vectors
To search a collection that has named vectors, use the target vector field to specify which named vector to search.
from weaviate.classes.query import MetadataQuery

reviews = client.collections.use("WineReviewNV")
response = reviews.query.near_text(
    query="a sweet German white wine",
    limit=2,
    target_vector="title_country",  # Specify the target vector for named vector collections
    return_metadata=MetadataQuery(distance=True)
)

for o in response.objects:
    print(o.properties)
    print(o.metadata.distance)
Example response
The output is like this:
{
  "WineReviewNV": [
    {
      "country": "Austria",
      "review_body": "With notions of cherry and cinnamon on the nose and just slight fizz, this is a refreshing, fruit-driven sparkling ros\u00e9 that's full of strawberry and cherry notes\u2014it might just be the very definition of easy summer wine. It ends dry, yet refreshing.",
      "title": "Gebeshuber 2013 Frizzante Ros\u00e9 Pinot Noir (\u00d6sterreichischer Perlwein)"
    },
    {
      "country": "Austria",
      "review_body": "Beautifully perfumed, with acidity, white fruits and a mineral context. The wine is layered with citrus and lime, hints of fresh pineapple acidity. Screw cap.",
      "title": "Stadt Krems 2009 Steinterrassen Riesling (Kremstal)"
    }
  ]
}
Set a similarity threshold
To set a similarity threshold between the search and target vectors, define a maximum distance (or certainty).
from weaviate.classes.query import MetadataQuery

jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.near_text(
    query="animals in movies",
    distance=0.25,  # max accepted distance
    return_metadata=MetadataQuery(distance=True)
)

for o in response.objects:
    print(o.properties)
    print(o.metadata.distance)
Additional information
- The distance value depends on many factors, including the vectorization model you use. Experiment with your data to find a value that works for you.
- certainty is only available with cosine distance.
- To find the least similar objects, use the negative cosine distance with nearVector search.
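As a rough illustration of what a maximum distance does, the sketch below filters a list of scored results and converts a cosine distance to a certainty. It assumes the commonly documented mapping `certainty = 1 - distance / 2` for cosine distance; the helper names and the "unrelated" entry are hypothetical, while the first two distances are taken from the example responses on this page.

```python
def certainty_from_cosine_distance(distance):
    """Map a cosine distance in [0, 2] to a certainty in [0, 1] (assumed mapping)."""
    return 1.0 - distance / 2.0

def within_threshold(results, max_distance):
    """Keep only results whose distance does not exceed max_distance."""
    return [r for r in results if r["distance"] <= max_distance]

scored = [
    {"answer": "meerkats", "distance": 0.17602634},
    {"answer": "dogs", "distance": 0.17842108},
    {"answer": "unrelated", "distance": 0.40},  # hypothetical far-away result
]

kept = within_threshold(scored, max_distance=0.25)
print([r["answer"] for r in kept])
print(certainty_from_cosine_distance(0.25))
```

Under that assumed mapping, a maximum distance of 0.25 corresponds to a minimum certainty of 0.875.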
limit & offset
Use limit to set a fixed maximum number of objects to return.
Optionally, use offset to paginate the results.
from weaviate.classes.query import MetadataQuery

jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.near_text(
    query="animals in movies",
    limit=2,  # return 2 objects
    offset=1,  # with an offset of 1
    return_metadata=MetadataQuery(distance=True)
)

for o in response.objects:
    print(o.properties)
    print(o.metadata.distance)
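The effect of limit and offset on a ranked result list can be pictured as a slice. This is a conceptual sketch only; `paginate` and the sample list are hypothetical and do not reflect how the server implements pagination.

```python
def paginate(ranked_results, limit, offset=0):
    """Skip `offset` results, then return at most `limit` results."""
    return ranked_results[offset:offset + limit]

# Hypothetical ranked answers, closest first
ranked = ["meerkats", "dogs", "fox", "swan", "lion"]

page = paginate(ranked, limit=2, offset=1)
print(page)  # ['dogs', 'fox']
```

With limit=2 and offset=1, the closest result is skipped and the next two are returned.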
Limit result groups
To limit results to groups of similar distances to the query, use the autocut filter to set the number of groups to return.
from weaviate.classes.query import MetadataQuery

jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.near_text(
    query="animals in movies",
    auto_limit=1,  # number of close groups
    return_metadata=MetadataQuery(distance=True)
)

for o in response.objects:
    print(o.properties)
    print(o.metadata.distance)
Example response
The output is like this:
{
  "data": {
    "Get": {
      "JeopardyQuestion": [
        {
          "answer": "meerkats",
          "question": "Group of mammals seen <a href=\"http://www.j-archive.com/media/1998-06-01_J_28.jpg\" target=\"_blank\">here</a>: [like Timon in <i>The Lion King</i>]",
          "_additional": { "distance": 0.17602634 }
        },
        {
          "answer": "dogs",
          "question": "Scooby-Doo, Goofy & Pluto are cartoon versions",
          "_additional": { "distance": 0.17842108 }
        }
      ]
    }
  }
}
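The idea of grouping results by jumps in distance can be sketched as follows. This is a simplified illustration, not Weaviate's actual autocut algorithm: the fixed `gap_threshold`, the function names, and the sample distances are all assumptions made for this example.

```python
def cut_into_groups(distances, gap_threshold):
    """Split sorted distances into groups wherever the jump exceeds gap_threshold."""
    groups = []
    for d in sorted(distances):
        if groups and d - groups[-1][-1] <= gap_threshold:
            groups[-1].append(d)  # small jump: same group
        else:
            groups.append([d])  # large jump: start a new group
    return groups

def keep_first_groups(distances, number_of_groups, gap_threshold=0.05):
    """Return the distances in the first `number_of_groups` groups."""
    groups = cut_into_groups(distances, gap_threshold)
    return [d for group in groups[:number_of_groups] for d in group]

# Hypothetical distances with visible jumps after the 3rd and 5th values
distances = [0.176, 0.178, 0.188, 0.31, 0.32, 0.55]

print(keep_first_groups(distances, number_of_groups=1))
print(keep_first_groups(distances, number_of_groups=2))
```

Asking for one group keeps only the tightly clustered closest results, and asking for two also admits the next cluster.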
Group results
Use a property or a cross-reference to group results. To group returned objects, the query must include a Near search operator, such as Near Text or Near Object.
from weaviate.classes.query import MetadataQuery, GroupBy

jeopardy = client.collections.use("JeopardyQuestion")

group_by = GroupBy(
    prop="round",  # group by this property
    objects_per_group=2,  # maximum objects per group
    number_of_groups=2,  # maximum number of groups
)

response = jeopardy.query.near_text(
    query="animals in movies",  # find objects based on this query
    limit=10,  # maximum total objects
    return_metadata=MetadataQuery(distance=True),
    group_by=group_by
)

for o in response.objects:
    print(o.uuid)
    print(o.belongs_to_group)
    print(o.metadata.distance)

for grp, grp_items in response.groups.items():
    print("=" * 10 + grp_items.name + "=" * 10)
    print(grp_items.number_of_objects)
    for o in grp_items.objects:
        print(o.properties)
        print(o.metadata)
Example response
The output is like this:
{
  "data": {
    "Get": {
      "JeopardyQuestion": [
        {
          "_additional": {
            "group": {
              "count": 2,
              "groupedBy": {
                "path": [
                  "round"
                ],
                "value": "Jeopardy!"
              },
              "hits": [
                {
                  "answer": "meerkats",
                  "question": "Group of mammals seen <a href=\"http://www.j-archive.com/media/1998-06-01_J_28.jpg\" target=\"_blank\">here</a>: [like Timon in <i>The Lion King</i>]"
                },
                {
                  "answer": "dogs",
                  "question": "Scooby-Doo, Goofy & Pluto are cartoon versions"
                }
              ],
              "id": 0,
              "maxDistance": 0.17842054,
              "minDistance": 0.17602539
            }
          }
        },
        {
          "_additional": {
            "group": {
              "count": 1,
              "groupedBy": {
                "path": [
                  "round"
                ],
                "value": "Double Jeopardy!"
              },
              "hits": [
                {
                  "answer": "fox",
                  "question": "In titles, animal associated with both Volpone and Reynard"
                }
              ],
              "id": 1,
              "maxDistance": 0.18770188,
              "minDistance": 0.18770188
            }
          }
        }
      ]
    }
  }
}
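The interaction between the two group caps can be sketched in plain Python. This is a conceptual illustration under assumptions: `group_results` is a hypothetical helper, and the "lions" and "owls" entries are invented to show what each cap drops.

```python
def group_results(ranked_results, prop, objects_per_group, number_of_groups):
    """Group ranked results by `prop`, capping group count and group size."""
    groups = {}  # dicts keep insertion order, so groups stay in rank order
    for obj in ranked_results:
        key = obj[prop]
        if key not in groups:
            if len(groups) == number_of_groups:
                continue  # group cap reached: ignore new group values
            groups[key] = []
        if len(groups[key]) < objects_per_group:
            groups[key].append(obj)
    return groups

# Ranked closest first; the last two entries are hypothetical
ranked = [
    {"answer": "meerkats", "round": "Jeopardy!"},
    {"answer": "dogs", "round": "Jeopardy!"},
    {"answer": "fox", "round": "Double Jeopardy!"},
    {"answer": "lions", "round": "Jeopardy!"},  # dropped by objects_per_group
    {"answer": "owls", "round": "Final Jeopardy!"},  # dropped by number_of_groups
]

grouped = group_results(ranked, "round", objects_per_group=2, number_of_groups=2)
summary = {name: [o["answer"] for o in objs] for name, objs in grouped.items()}
print(summary)
```

The result mirrors the shape of the example response: two hits in the "Jeopardy!" group and one in "Double Jeopardy!".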
Filter results
For more specific results, use a filter to narrow your search.
from weaviate.classes.query import MetadataQuery, Filter

jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.near_text(
    query="animals in movies",
    filters=Filter.by_property("round").equal("Double Jeopardy!"),
    limit=2,
    return_metadata=MetadataQuery(distance=True),
)

for o in response.objects:
    print(o.properties)
    print(o.metadata.distance)
Example response
The output is like this:
{
  "data": {
    "Get": {
      "JeopardyQuestion": [
        {
          "_additional": {
            "distance": 0.18759078
          },
          "answer": "fox",
          "question": "In titles, animal associated with both Volpone and Reynard",
          "round": "Double Jeopardy!"
        },
        {
          "_additional": {
            "distance": 0.19532347
          },
          "answer": "Swan",
          "question": "In a Tchaikovsky ballet, Prince Siegfried goes hunting for these animals & falls in love with 1 of them",
          "round": "Double Jeopardy!"
        }
      ]
    }
  }
}
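One way to picture a filtered vector search is "restrict first, then rank". The sketch below assumes that ordering for illustration; `filtered_search` is a hypothetical helper, and the distances are copied from the example responses on this page.

```python
def filtered_search(scored_objects, predicate, limit):
    """Keep objects that pass the filter, then return the closest `limit`."""
    allowed = [o for o in scored_objects if predicate(o)]
    return sorted(allowed, key=lambda o: o["distance"])[:limit]

scored_objects = [
    {"answer": "meerkats", "round": "Jeopardy!", "distance": 0.17602634},
    {"answer": "dogs", "round": "Jeopardy!", "distance": 0.17842108},
    {"answer": "fox", "round": "Double Jeopardy!", "distance": 0.18759078},
    {"answer": "Swan", "round": "Double Jeopardy!", "distance": 0.19532347},
]

hits = filtered_search(
    scored_objects,
    predicate=lambda o: o["round"] == "Double Jeopardy!",
    limit=2,
)
print([h["answer"] for h in hits])
```

Because the filter is applied before the limit, the two closest unfiltered matches ("meerkats" and "dogs") do not crowd out the matching objects.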
Diversity selection (MMR)
v1.37: This is a preview feature. The API may change in future releases.
- Python client: Support is not yet in a released weaviate-client. Coming in the next release (tracked in PR #1997).
- Multi-node clusters: MMR reranking may produce suboptimal results for collections whose shards are distributed across multiple nodes, since each shard returns its own candidate set before the coordinator reranks them. We are actively working on improving this.
Standard vector search returns the closest matches to the query, which often means a cluster of near-duplicate results. Maximum Marginal Relevance (MMR) reranks results to balance relevance with diversity — each selected result must add something new to the result set.
Add the selection parameter to any vector search query:
from weaviate.classes.query import Diversity

collection = client.collections.get("MMRDemo")

# Retrieve 20 candidates, then rerank to select 5 diverse results
response = collection.query.near_vector(
    near_vector=base_vec,
    limit=20,
    selection=Diversity.MMR(
        limit=5,
        balance=0.5,
    ),
)

for o in response.objects:
    print(o.properties["question"])
How it works
1. Weaviate runs a regular vector search to retrieve a candidate set (controlled by the query's limit).
2. The most relevant candidate is selected first.
3. For each remaining candidate, MMR computes a score that balances query similarity against maximum similarity to already-selected results, weighted by balance.
4. The candidate with the highest MMR score is selected next.
5. Steps 3–4 repeat until the Diversity.MMR(limit) is reached.
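The selection loop described above can be sketched in plain Python. This is a minimal illustration that assumes the classic MMR scoring form, `balance * relevance - (1 - balance) * redundancy`, with cosine similarity; Weaviate's exact scoring may differ. The function names and toy vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mmr_select(query, candidates, limit, balance):
    """Greedy MMR: return candidate indices in the order they are selected."""
    remaining = list(range(len(candidates)))
    selected = []
    while remaining and len(selected) < limit:
        def mmr_score(i):
            relevance = cosine_similarity(query, candidates[i])
            if not selected:
                return relevance  # first pick: most relevant candidate
            # Redundancy: highest similarity to anything already selected
            redundancy = max(
                cosine_similarity(candidates[i], candidates[j]) for j in selected
            )
            return balance * relevance - (1.0 - balance) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical candidate set: index 1 is a near-duplicate of index 0
query = [1.0, 0.0]
candidates = [[0.9, 0.1], [0.9, 0.12], [0.8, -0.5], [0.1, 0.9]]

print(mmr_select(query, candidates, limit=2, balance=1.0))  # [0, 1]
print(mmr_select(query, candidates, limit=2, balance=0.5))  # [0, 2]
print(mmr_select(query, candidates, limit=2, balance=0.0))  # [0, 3]
```

With balance=1.0 the near-duplicate is returned second, as in a standard search. Lower values skip it in favor of candidates that differ more from the first pick.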
Parameters
| Parameter | Type | Description |
|---|---|---|
| limit | int | Number of results to return after MMR reranking. Must be less than or equal to the query's top-level limit (the candidate set size). |
| balance | float | Controls the relevance-diversity trade-off (0.0–1.0). 0.0 = pure diversity, 0.5 = balanced, 1.0 = pure relevance (equivalent to standard search). |
from weaviate.classes.query import Diversity

collection = client.collections.get("MMRDemo")

# Pure diversity: maximize difference between results
response_diverse = collection.query.near_vector(
    near_vector=base_vec,
    limit=20,
    selection=Diversity.MMR(limit=5, balance=0.0),
)

# Balanced: equal weight on relevance and diversity
response_balanced = collection.query.near_vector(
    near_vector=base_vec,
    limit=20,
    selection=Diversity.MMR(limit=5, balance=0.5),
)

# Pure relevance: equivalent to standard vector search
response_relevant = collection.query.near_vector(
    near_vector=base_vec,
    limit=20,
    selection=Diversity.MMR(limit=5, balance=1.0),
)
Important notes:
- Result ordering: Results are ordered by MMR score, not query similarity. The first result is the most relevant, but subsequent results may have lower query similarity because they were chosen for diversity.
- No reindexing needed: MMR is applied at query time. You can use it on any existing collection without schema changes.
- Supported queries: near_text, near_vector, near_object, near_image, and near_media.
- Not supported: hybrid search and multi-vector collections.
A larger candidate set (higher top-level limit) gives MMR more results to choose from, improving diversity at the cost of slightly more computation. A good starting point is setting the candidate limit to 2–4x the MMR limit.
Related pages
- Connect to Weaviate
- For image search, see Image search.
- For tutorials, see Queries.
- For search using the GraphQL API, see GraphQL API.
Questions and feedback
If you have any questions or feedback, let us know in the user forum.
