Additional operators
Syntax
Functions such as limit, autocut, and sort modify queries at the class level.
Limit argument
The limit argument restricts the number of results. These functions support limit:
- Get
- Explore
- Aggregate
If a snippet doesn't work or you have feedback, please open a GitHub issue.
import weaviate

client = weaviate.connect_to_local()

try:
    articles = client.collections.use("Article")
    # Return at most five objects
    response = articles.query.fetch_objects(
        limit=5
    )

    for o in response.objects:
        print(f"Title: {o.properties['title']}")
finally:
    client.close()
Expected response
{
"data": {
"Get": {
"Article": [
{
"title": "Backs on the rack - Vast sums are wasted on treatments for back pain that make it worse"
},
{
"title": "Graham calls for swift end to impeachment trial, warns Dems against calling witnesses"
},
{
"title": "Through a cloud, brightly - Obituary: Paul Volcker died on December 8th"
},
{
"title": "Google Stadia Reviewed \u2013 Against The Stream"
},
{
"title": "Managing Supply Chain Risk"
}
]
}
}
}
Pagination with offset
To return results in sets ("pages"), use offset and limit together to select a subset of the query response.
For example, to list the first ten results, set limit: 10 and offset: 0. To display the next ten results, set offset: 10. To continue iterating over the results, increase the offset again. For more details, see Performance considerations.
The Get and Explore functions support offset.
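As a rough illustration of the paging pattern described above, the following sketch steps the offset in units of the page size until a short or empty page comes back. The `fetch_page` callable is a hypothetical stand-in for a query such as `fetch_objects(limit=..., offset=...)`. Here it is backed by an in-memory list so the example runs without a Weaviate instance.

```python
from typing import Callable, Iterator, List


def iter_pages(
    fetch_page: Callable[[int, int], List[str]],
    page_size: int,
) -> Iterator[List[str]]:
    """Yield successive pages by stepping the offset in units of page_size."""
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        if not page:
            break  # nothing left to return
        yield page
        if len(page) < page_size:
            break  # a short page is the last page
        offset += page_size


# Hypothetical in-memory stand-in for a collection of 23 article titles.
TITLES = [f"Article {i}" for i in range(1, 24)]


def fake_fetch(limit: int, offset: int) -> List[str]:
    """Mimic limit/offset semantics with list slicing (assumption for the demo)."""
    return TITLES[offset:offset + limit]


pages = list(iter_pages(fake_fetch, page_size=10))
print([len(p) for p in pages])  # → [10, 10, 3]
```

Because every call repeats the work for all earlier pages, this pattern is best suited to shallow paging.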
import weaviate

client = weaviate.connect_to_local()

try:
    articles = client.collections.use("Article")
    # Skip the first two objects, then return the next five
    response = articles.query.fetch_objects(
        limit=5,
        offset=2
    )

    for o in response.objects:
        print(f"Title: {o.properties['title']}")
finally:
    client.close()
Expected response
{
"data": {
"Get": {
"Article": [
{
"title": "Through a cloud, brightly - Obituary: Paul Volcker died on December 8th"
},
{
"title": "Google Stadia Reviewed \u2013 Against The Stream"
},
{
"title": "Managing Supply Chain Risk"
},
{
"title": "Playing College Football In Madden"
},
{
"title": "The 50 best albums of 2019, No 3: Billie Eilish \u2013 When We All Fall Asleep, Where Do We Go?"
}
]
}
}
}
Performance considerations
Pagination is not a cursor-based implementation. This has the following implications:
- Response time and system load increase as the number of pages grows. As the offset grows, each additional page request requires a new, larger call against your collection. For example, if your offset and limit specify results from 21-30, Weaviate retrieves 30 objects and drops the first 20. On the next call, Weaviate retrieves 40 objects and drops the first 30.
- Resource requirements are amplified in multi-shard configurations. Each shard retrieves a full list of objects. Each shard also drops the objects before the offset. If you have 10 shards configured and ask for results 91-100, Weaviate retrieves 1000 objects (100 per shard) and drops 990 of them.
- The number of objects you can retrieve is limited. A single query returns up to QUERY_MAXIMUM_RESULTS. If the sum of offset and limit exceeds QUERY_MAXIMUM_RESULTS, Weaviate returns an error. To change the limit, edit the QUERY_MAXIMUM_RESULTS environment variable. If you increase QUERY_MAXIMUM_RESULTS, use the lowest value possible to avoid performance problems.
- Pagination is not stateful. If the database state changes between calls, your pages might miss results. An insertion or a deletion will change the object count. An update could change object order. However, if there are no writes, the overall result set is the same whether you retrieve a single large page or many smaller ones.
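The arithmetic behind these examples can be sketched in a few lines. The helper names below are hypothetical and only model the counts described in the list above (each shard reads `offset + limit` objects, and a full page of `limit` objects is returned). They do not call Weaviate. The value 10000 passed to `exceeds_maximum` is an illustrative assumption, not a value read from any configuration.

```python
def objects_retrieved(offset: int, limit: int, shards: int = 1) -> int:
    """Objects read across all shards: each shard reads offset + limit objects."""
    return (offset + limit) * shards


def objects_dropped(offset: int, limit: int, shards: int = 1) -> int:
    """Objects read but not returned, assuming a full page of `limit` results."""
    return objects_retrieved(offset, limit, shards) - limit


def exceeds_maximum(offset: int, limit: int, query_maximum_results: int) -> bool:
    """True if offset + limit is larger than the configured maximum."""
    return offset + limit > query_maximum_results


# Results 21-30 on a single shard: offset 20, limit 10.
print(objects_retrieved(20, 10), objects_dropped(20, 10))  # → 30 20

# Results 91-100 across 10 shards: offset 90, limit 10.
print(objects_retrieved(90, 10, shards=10), objects_dropped(90, 10, shards=10))  # → 1000 990

# Illustrative maximum of 10000 (assumption): 9995 + 10 is over the cap.
print(exceeds_maximum(9995, 10, 10000))  # → True
```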
Autocut
The autocut function limits results based on discontinuities in the result set. Specifically, autocut looks for discontinuities, or jumps, in result metrics such as vector distance or search score.
To use autocut, specify how many jumps there should be in your query. The query stops returning results after the specified number of jumps.
For example, consider a nearText search that returns objects with these distance values:
[0.1899, 0.1901, 0.191, 0.21, 0.215, 0.23].
Autocut returns the following:
- autocut: 1: [0.1899, 0.1901, 0.191]
- autocut: 2: [0.1899, 0.1901, 0.191, 0.21, 0.215]
- autocut: 3: [0.1899, 0.1901, 0.191, 0.21, 0.215, 0.23]
Autocut works with these functions:
- nearXXX
- bm25
- hybrid
To use autocut with the hybrid search, specify the relativeScoreFusion ranking method.
Autocut is disabled by default. To explicitly disable autocut, set the number of jumps to 0 or a negative value.
If autocut is combined with the limit filter, autocut only considers the first objects returned up to the value of limit.
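To make the idea of "jumps" concrete, here is a simplified sketch that reproduces the distance example above. It is not Weaviate's implementation: it treats any gap larger than a fixed `min_gap` as a jump, and both `min_gap` and the function name are hypothetical choices made for this illustration. The optional `limit` is applied first, mirroring the note that autocut only considers the first objects up to limit.

```python
from typing import List, Optional


def autocut_sketch(
    distances: List[float],
    jumps: int,
    min_gap: float = 0.01,  # hypothetical threshold, not a Weaviate parameter
    limit: Optional[int] = None,
) -> List[float]:
    """Return the results that come before the `jumps`-th discontinuity."""
    candidates = distances if limit is None else distances[:limit]
    if jumps <= 0:
        return candidates  # zero or negative disables the cut
    seen = 0
    for i in range(1, len(candidates)):
        if candidates[i] - candidates[i - 1] > min_gap:
            seen += 1
            if seen == jumps:
                return candidates[:i]
    return candidates  # fewer jumps than requested: keep everything


distances = [0.1899, 0.1901, 0.191, 0.21, 0.215, 0.23]
print(autocut_sketch(distances, 1))  # → [0.1899, 0.1901, 0.191]
print(autocut_sketch(distances, 2))  # → [0.1899, 0.1901, 0.191, 0.21, 0.215]
print(autocut_sketch(distances, 3))  # → all six values
```

A fixed threshold is only a convenient stand-in here. Real result sets vary in scale, so a constant cutoff would not transfer between queries.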
Sample client code:
from weaviate.classes.query import MetadataQuery

# Assumes `client` is an already connected Weaviate client
jeopardy = client.collections.use("JeopardyQuestion")
response = jeopardy.query.near_text(
    query="animals in movies",
    auto_limit=1,  # number of close groups
    return_metadata=MetadataQuery(distance=True)
)

for o in response.objects:
    print(o.properties)
    print(o.metadata.distance)
Example response
The output is like this:
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"_additional": {
"distance": 0.17591828
},
"answer": "meerkats",
"question": "Group of mammals seen <a href=\"http://www.j-archive.com/media/1998-06-01_J_28.jpg\" target=\"_blank\">here</a>: [like Timon in <i>The Lion King</i>]"
},
{
"_additional": {
"distance": 0.17837524
},
"answer": "dogs",
"question": "Scooby-Doo, Goofy & Pluto are cartoon versions"
},
{
"_additional": {
"distance": 0.18658042
},
"answer": "The Call of the Wild Thornberrys",
"question": "Jack London story about the dog Buck who joins a Nick cartoon about Eliza, who can talk to animals"
},
{
"_additional": {
"distance": 0.18755406
},
"answer": "fox",
"question": "In titles, animal associated with both Volpone and Reynard"
},
{
"_additional": {
"distance": 0.18817466
},
"answer": "Lion Tamers/Wild Animal Trainers",
"question": "Mabel Stark, Clyde Beatty & Gunther Gebel-Williams"
},
{
"_additional": {
"distance": 0.19061792
},
"answer": "a fox",
"question": "\"Sly\" creature sought by sportsmen riding to hounds"
},
{
"_additional": {
"distance": 0.191764
},
"answer": "a lion",
"question": "The animal featured both in Rousseau's \"The Sleeping Gypsy\" & \"The Dream\""
}
]
}
}
}
Cursor with after
Starting with version v1.18, you can use after to retrieve objects sequentially. For example, you can use after to retrieve a complete set of objects from a collection.
after creates a cursor that is compatible with single shard and multi-shard configurations.
The after function relies on object ids, and thus it only works with list queries. after is not compatible with where, near<Media>, bm25, hybrid, or similar searches, or in combination with filters. For those use cases, use pagination with offset and limit.
import weaviate

client = weaviate.connect_to_local()

try:
    articles = client.collections.use("Article")
    # Return the five objects that follow the object with this id
    response = articles.query.fetch_objects(
        limit=5,
        after="002d5cb3-298b-380d-addb-2e026b76c8ed"
    )

    for o in response.objects:
        print(f"Title: {o.properties['title']}")
finally:
    client.close()
Expected response
{
"data": {
"Get": {
"Article": [
{
"_additional": {
"id": "00313a4c-4308-30b0-af4a-01773ad1752b"
},
"title": "Managing Supply Chain Risk"
},
{
"_additional": {
"id": "0042b9d0-20e4-334e-8f42-f297c150e8df"
},
"title": "Playing College Football In Madden"
},
{
"_additional": {
"id": "0047c049-cdd6-3f6e-bb89-84ae20b74f49"
},
"title": "The 50 best albums of 2019, No 3: Billie Eilish \u2013 When We All Fall Asleep, Where Do We Go?"
},
{
"_additional": {
"id": "00582185-cbf4-3cd6-8c59-c2d6ec979282"
},
"title": "How artificial intelligence is transforming the global battle against human trafficking"
},
{
"_additional": {
"id": "0061592e-b776-33f9-8109-88a5bd41df78"
},
"title": "Masculine, feminist or neutral? The language battle that has split Spain"
}
]
}
}
}
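To retrieve a complete set of objects, the cursor is typically advanced in a loop: each call passes the id of the last object from the previous batch as `after`. The sketch below shows that loop under stated assumptions. `fetch_after` is a hypothetical callable standing in for a query such as `fetch_objects(limit=..., after=...)`, and the in-memory store assumes objects come back ordered by id.

```python
from typing import Callable, Dict, Iterator, List, Optional

Obj = Dict[str, str]


def iter_all(
    fetch_after: Callable[[int, Optional[str]], List[Obj]],
    batch_size: int,
) -> Iterator[Obj]:
    """Yield every object by repeatedly asking for the batch after the cursor."""
    cursor: Optional[str] = None  # None means "start from the beginning"
    while True:
        batch = fetch_after(batch_size, cursor)
        if not batch:
            break  # an empty batch means the collection is exhausted
        yield from batch
        cursor = batch[-1]["id"]  # the last id becomes the next cursor


# Hypothetical in-memory store, ordered by id (assumption for the demo).
STORE: List[Obj] = [{"id": f"{i:04d}", "title": f"Article {i}"} for i in range(1, 8)]


def fake_fetch_after(limit: int, after: Optional[str]) -> List[Obj]:
    """Return up to `limit` objects whose id sorts after the cursor."""
    remaining = [o for o in STORE if after is None or o["id"] > after]
    return remaining[:limit]


titles = [o["title"] for o in iter_all(fake_fetch_after, batch_size=3)]
print(len(titles))  # → 7
```

Unlike offset paging, each call here asks only for the next batch, so the cost per call does not grow with the position in the collection.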
Sorting
You can sort results by any primitive property, such as text, number, or int.
Sorting considerations
Sorting can be applied when fetching objects, but it's unavailable when using search operators. Search operators don’t support sorting because they automatically rank results according to factors such as certainty or distance, which reflect the relevance of each result.
Weaviate's sorting implementation does not lead to massive memory spikes. Weaviate does not load all object properties into memory; only the property values being sorted are kept in memory.
Weaviate does not use any sorting-specific data structures on disk. When objects are sorted, Weaviate identifies each object and extracts the relevant properties. This works reasonably well at small scales (hundreds of thousands or millions of objects). It is expensive if you sort large lists of objects (hundreds of millions or billions). In the future, Weaviate may add a column-oriented storage mechanism to overcome this performance limitation.
Sort order
boolean values
false is considered smaller than true. false comes before true in ascending order and after true in descending order.
null values
null values are considered smaller than any non-null values. null values come first in ascending order and last in descending order.
arrays
Arrays are compared element by element. Elements at the same position are compared to each other, starting from the beginning of the arrays. When Weaviate finds an element in one array that is smaller than its counterpart in the second array, Weaviate considers the whole first array to be smaller than the second one.
Arrays are equal if they have the same length and all elements are equal. If one array is a prefix of another array, the shorter array is considered smaller.
Examples:
- [1, 2, 3] = [1, 2, 3]
- [1, 2, 4] < [1, 3, 4]
- [2, 2] > [1, 2, 3, 4]
- [1, 2, 3] < [1, 2, 3, 4]
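These ordering rules happen to match Python's built-in comparisons for booleans and lists, so they can be checked with a short sketch. This is an illustration in plain Python, not Weaviate code. The `sort_key` helper is a hypothetical name, introduced only to place null values first.

```python
from typing import Any, Tuple


def sort_key(value: Any) -> Tuple:
    """Place None before every non-null value; otherwise use natural order."""
    if value is None:
        return (0,)
    return (1, value)


flags = [True, None, False]
print(sorted(flags, key=sort_key))                # → [None, False, True]
print(sorted(flags, key=sort_key, reverse=True))  # → [True, False, None]

# Array examples from the list above, using Python list comparison.
print([1, 2, 3] == [1, 2, 3])     # → True
print([1, 2, 4] < [1, 3, 4])      # → True
print([2, 2] > [1, 2, 3, 4])      # → True
print([1, 2, 3] < [1, 2, 3, 4])   # → True
```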
Sorting API
Sorting can be performed by one or more properties. If the values for the first property are identical, Weaviate uses the second property to determine the order, and so on.
The sort function takes either an object, or an array of objects, that describe a property and a sort order.
| Parameter | Required | Type | Description |
|---|---|---|---|
| path | yes | text | The path to the sort field: a single-element array that contains the field name. GraphQL supports specifying the field name directly. |
| order | varies by client | asc or desc | The sort order, ascending (default) or descending. |
import weaviate
from weaviate.classes.query import Sort

client = weaviate.connect_to_local()

try:
    questions = client.collections.use("JeopardyQuestion")
    # Sort by the "answer" property in ascending order
    response = questions.query.fetch_objects(
        sort=Sort.by_property(name="answer", ascending=True),
        limit=3
    )

    for o in response.objects:
        print(f"Answer: {o.properties['answer']}")
        print(f"Points: {o.properties['points']}")
        print(f"Question: {o.properties['question']}")
finally:
    client.close()
Expected response
{
"data": {
"Get": {
"JeopardyQuestion": [
{
"answer": "$5 (Lincoln Memorial in the background)",
"points": 600,
"question": "A sculpture by Daniel Chester French can be seen if you look carefully on the back of this current U.S. bill"
},
{
"answer": "(1 of 2) Juneau, Alaska or Augusta, Maine",
"points": 0,
"question": "1 of the 2 U.S. state capitals that begin with the names of months"
},
{
"answer": "(1 of 2) Juneau, Alaska or Honolulu, Hawaii",
"points": 0,
"question": "One of the 2 state capitals whose names end with the letter \"U\""
}
]
}
}
}
Sorting by multiple properties
To sort by more than one property, pass an array of order objects to the sort function:
import weaviate
from weaviate.classes.query import Sort

client = weaviate.connect_to_local()

try:
    questions = client.collections.use("JeopardyQuestion")
    response = questions.query.fetch_objects(
        # Note: To sort by multiple properties, chain the relevant `by_xxx` methods.
        sort=Sort.by_property(name="points", ascending=False).by_property(name="answer", ascending=True),
        limit=3
    )

    for o in response.objects:
        print(f"Answer: {o.properties['answer']}")
        print(f"Points: {o.properties['points']}")
        print(f"Question: {o.properties['question']}")
finally:
    client.close()
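The tie-breaking behavior (the second property decides the order only when the first property's values are identical) can be illustrated with plain Python on a small in-memory list. The rows and the `sort_by` helper below are hypothetical and exist only for this illustration. They are not part of the Weaviate client.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def sort_by(rows: List[Row], specs: List[Tuple[str, bool]]) -> List[Row]:
    """Sort by several (property, ascending) pairs; earlier pairs take priority."""
    ordered = list(rows)
    # Python's sort is stable, so apply the lowest-priority key first.
    for name, ascending in reversed(specs):
        ordered = sorted(ordered, key=lambda r: r[name], reverse=not ascending)
    return ordered


rows = [
    {"answer": "b", "points": 100},
    {"answer": "a", "points": 200},
    {"answer": "a", "points": 100},
    {"answer": "c", "points": 200},
]

# Points descending, then answer ascending, as in the example above.
result = sort_by(rows, [("points", False), ("answer", True)])
print([(r["points"], r["answer"]) for r in result])
# → [(200, 'a'), (200, 'c'), (100, 'a'), (100, 'b')]
```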
Metadata properties
To sort by a metadata property, prefix the property name with an underscore.
| Property Name | Sort Property Name |
|---|---|
| id | _id |
| creationTimeUnix | _creationTimeUnix |
| lastUpdateTimeUnix | _lastUpdateTimeUnix |
import weaviate
import weaviate.classes as wvc
from weaviate.classes.query import Sort

client = weaviate.connect_to_local()

try:
    questions = client.collections.use("JeopardyQuestion")
    # Request the creation time and sort by it, oldest first
    response = questions.query.fetch_objects(
        return_metadata=wvc.query.MetadataQuery(creation_time=True),
        sort=Sort.by_property(name="_creationTimeUnix", ascending=True),
        limit=3
    )

    for o in response.objects:
        print(f"Answer: {o.properties['answer']}")
        print(f"Points: {o.properties['points']}")
        print(f"Question: {o.properties['question']}")
        print(f"Creation time: {o.metadata.creation_time}")
finally:
    client.close()
Python client v4 property names
| Property Name | Sort Property Name |
|---|---|
| uuid | _id |
| creation_time | _creationTimeUnix |
| last_update_time | _lastUpdateTimeUnix |
Grouping
You can use a group to combine similar concepts (also known as entity merging). There are two ways of grouping semantically similar objects together: closest and merge. To return the closest concept, set type: closest. To combine similar entities into a single string, set type: merge.
Variables
| Variable | Required | Type | Description |
|---|---|---|---|
| type | yes | string | Either closest or merge |
| force | yes | float | The force to apply for a particular movement. Must be between 0 and 1. 0 is no movement. 1 is maximum movement. |
Example
import weaviate

client = weaviate.connect_to_local()

try:
    # Send the grouping query as raw GraphQL
    response = client.graphql_raw_query(
        """
        {
          Get {
            Publication(
              group: {
                type: merge,
                force: 0.05
              }
            ) {
              name
            }
          }
        }
        """
    )

    for publication in response.get["Publication"]:
        print(publication)
finally:
    client.close()
The query merges the results for International New York Times, The New York Times Company and New York Times.
The central concept in the group, The New York Times Company, leads the group. Related values follow in parentheses.
Expected response
{
"data": {
"Get": {
"Publication": [
{
"name": "Fox News"
},
{
"name": "Wired"
},
{
"name": "The New York Times Company (New York Times, International New York Times)"
},
{
"name": "Game Informer"
},
{
"name": "New Yorker"
},
{
"name": "Wall Street Journal"
},
{
"name": "Vogue"
},
{
"name": "The Economist"
},
{
"name": "Financial Times"
},
{
"name": "The Guardian"
},
{
"name": "CNN"
}
]
}
}
}
Questions and feedback
If you have any questions or feedback, let us know in the user forum.
