Overview
This page covers aggregation queries, referred to collectively as Aggregate queries.
An Aggregate query can aggregate over an entire collection or over the results of a search.
Parameters
An Aggregate query requires the target collection to be specified. Each query can include any of the following types of arguments:
| Argument | Description | Required |
|---|---|---|
| Collection | Also called "class". The object collection to be retrieved from. | Yes |
| Properties | Properties to be retrieved | Yes |
| Conditional filters | Filter the objects to be retrieved | No |
| Search operators | Specify the search strategy (e.g. near text, hybrid, bm25) | No |
| Additional operators | Specify additional operators (e.g. limit, offset, sort) | No |
| Tenant name | Specify the tenant name | Yes, if multi-tenancy is enabled |
| Consistency level | Specify the consistency level | No |
Available properties
Each data type has its own set of available aggregated properties. The following table shows the available properties for each data type.
| Data type | Available properties |
|---|---|
| Text | count, type, topOccurrences (value, occurs) |
| Number | count, type, minimum, maximum, mean, median, mode, sum |
| Integer | count, type, minimum, maximum, mean, median, mode, sum |
| Boolean | count, type, totalTrue, totalFalse, percentageTrue, percentageFalse |
| Date | count, type, minimum, maximum, mean, median, mode |
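To make the table more concrete, here is a minimal plain-Python sketch of what the numeric and boolean aggregation properties mean. It uses only the standard library and made-up values. The helper names are hypothetical, and reporting the percentages as fractions between 0 and 1 is an assumption. It illustrates the definitions only, not how Weaviate computes them internally.

```python
import statistics


def aggregate_numbers(values):
    """Compute the numeric aggregation properties for a list of numbers."""
    return {
        "count": len(values),
        "minimum": min(values),
        "maximum": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "sum": sum(values),
    }


def aggregate_booleans(values):
    """Compute the boolean aggregation properties for a list of booleans."""
    total_true = sum(1 for v in values if v)
    total_false = len(values) - total_true
    return {
        "count": len(values),
        "totalTrue": total_true,
        "totalFalse": total_false,
        # Assumption: percentages are expressed as fractions of the count.
        "percentageTrue": total_true / len(values),
        "percentageFalse": total_false / len(values),
    }


# Hypothetical property values, for illustration only.
word_counts = [109, 575, 575, 680, 1200]
print(aggregate_numbers(word_counts))
print(aggregate_booleans([True, True, False, True]))
```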
See a GraphQL Aggregate format
{
  Aggregate {
    <Class> (groupBy:[<property>]) {
      groupedBy { # requires `groupBy` filter
        path
        value
      }
      meta {
        count
      }
      <propertyOfDatatypeText> {
        count
        type
        topOccurrences (limit: <n_minimum_count>) {
          value
          occurs
        }
      }
      <propertyOfDatatypeNumberOrInteger> {
        count
        type
        minimum
        maximum
        mean
        median
        mode
        sum
      }
      <propertyOfDatatypeBoolean> {
        count
        type
        totalTrue
        totalFalse
        percentageTrue
        percentageFalse
      }
      <propertyWithReference> {
        pointingTo
        type
      }
    }
  }
}
Below is an example query to obtain meta information about the Article collection. Note that the data is not grouped here, and results relate to all data objects in the Article collection.
If a snippet doesn't work or you have feedback, please open a GitHub issue.
import weaviate
import weaviate.classes as wvc
import os
client = weaviate.connect_to_local()
try:
    collection = client.collections.use("Article")
    response = collection.aggregate.over_all(
        total_count=True,
        return_metrics=wvc.query.Metrics("wordCount").integer(
            count=True,
            maximum=True,
            mean=True,
            median=True,
            minimum=True,
            mode=True,
            sum_=True,
        ),
    )
    print(response.total_count)
    print(response.properties)
finally:
    client.close()
The above query will result in something like the following:
{
  "data": {
    "Aggregate": {
      "Article": [
        {
          "inPublication": {
            "pointingTo": [
              "Publication"
            ],
            "type": "cref"
          },
          "meta": {
            "count": 4403
          },
          "wordCount": {
            "count": 4403,
            "maximum": 16852,
            "mean": 966.0113558937088,
            "median": 680,
            "minimum": 109,
            "mode": 575,
            "sum": 4253348,
            "type": "int"
          }
        }
      ]
    }
  }
}
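When the result arrives as nested JSON like the sample above, the values of interest sit a few levels down. The sketch below shows one possible way to pull out the object count and the wordCount statistics. The helper name `summarize_aggregate` is hypothetical, and the embedded response is an abbreviated copy of the sample.

```python
import json

# Abbreviated copy of the sample response shown above.
raw = """
{
  "data": {
    "Aggregate": {
      "Article": [
        {
          "meta": {"count": 4403},
          "wordCount": {
            "count": 4403, "maximum": 16852, "mean": 966.0113558937088,
            "median": 680, "minimum": 109, "mode": 575,
            "sum": 4253348, "type": "int"
          }
        }
      ]
    }
  }
}
"""


def summarize_aggregate(response, collection, prop):
    """Flatten the first result entry into one dict (hypothetical helper)."""
    # In the ungrouped sample, the list under the collection name has one entry.
    entry = response["data"]["Aggregate"][collection][0]
    return {"total": entry["meta"]["count"], **entry[prop]}


summary = summarize_aggregate(json.loads(raw), "Article", "wordCount")
print(summary["total"], summary["mean"], summary["median"])
```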
Get object count in collection
Use meta { count } to retrieve the total number of objects in a collection.
import weaviate
import weaviate.classes as wvc
import os
client = weaviate.connect_to_local()
try:
    collection = client.collections.use("Article")
    response = collection.aggregate.over_all(total_count=True)
    print(response.total_count)
finally:
    client.close()
groupBy argument
You can use a groupBy argument to get meta information about groups of data objects, from those matching a query. The groups can be based on a property of the data objects.
groupBy limitations
- groupBy only works with near<Media> operators.
- The groupBy path is limited to one property or cross-reference. Nested paths are not supported.
The groupBy argument is structured as follows for the Aggregate function:
{
  Aggregate {
    <Class> ( groupBy: ["<propertyName>"] ) {
      groupedBy {
        path
        value
      }
      meta {
        count
      }
      <propertyName> {
        count
      }
    }
  }
}
In the following example, the articles are grouped by the property inPublication, referring to the article's publisher.
import weaviate
import weaviate.classes as wvc
import os
from weaviate.classes.aggregate import GroupByAggregate
client = weaviate.connect_to_local()
try:
    pass  # Coming soon
finally:
    client.close()
Expected response
{
  "data": {
    "Aggregate": {
      "Article": [
        {
          "groupedBy": {
            "path": [
              "inPublication"
            ],
            "value": "weaviate://localhost/Publication/16476dca-59ce-395e-b896-050080120cd4"
          },
          "meta": {
            "count": 829
          },
          "wordCount": {
            "mean": 604.6537997587454
          }
        },
        {
          "groupedBy": {
            "path": [
              "inPublication"
            ],
            "value": "weaviate://localhost/Publication/c9a0e53b-93fe-38df-a6ea-4c8ff4501783"
          },
          "meta": {
            "count": 618
          },
          "wordCount": {
            "mean": 917.1860841423949
          }
        },
        ...
      ]
    }
  }
}
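Conceptually, groupBy partitions the matching objects by the value of one property and then aggregates each partition separately. The following plain-Python sketch mimics that behaviour on a few made-up articles. The data, the `group_and_aggregate` helper, and the publication identifiers are all hypothetical, and the output shape only loosely mirrors the response above.

```python
from collections import defaultdict
from statistics import mean


def group_and_aggregate(objects, group_prop, metric_prop):
    """Group objects by one property, then aggregate each group (illustrative)."""
    groups = defaultdict(list)
    for obj in objects:
        groups[obj[group_prop]].append(obj[metric_prop])
    return [
        {
            "groupedBy": {"path": [group_prop], "value": value},
            "meta": {"count": len(vals)},
            metric_prop: {"mean": mean(vals)},
        }
        for value, vals in groups.items()
    ]


# Hypothetical articles; "pub-a" and "pub-b" stand in for publication references.
articles = [
    {"inPublication": "pub-a", "wordCount": 500},
    {"inPublication": "pub-a", "wordCount": 700},
    {"inPublication": "pub-b", "wordCount": 900},
]

for group in group_and_aggregate(articles, "inPublication", "wordCount"):
    print(group["groupedBy"]["value"], group["meta"]["count"], group["wordCount"]["mean"])
```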
Additional filters
Aggregate functions can be extended with conditional filters.
topOccurrences property
Aggregating data makes the topOccurrences property available. Note that the counts are not dependent on tokenization. The topOccurrences count is based on occurrences of the entire property, or one of the values if the property is an array.
You can optionally specify a limit parameter to cap the number of returned values. For example, limit: 5 will return the 5 most frequent occurrences.
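One rough way to picture topOccurrences is as a frequency count over whole property values. The plain-Python sketch below counts each complete value once (no splitting into tokens), counts every element of an array property, and applies a limit. The data and the `top_occurrences` helper are hypothetical, and this is not Weaviate's implementation.

```python
from collections import Counter


def top_occurrences(values, limit=None):
    """Count whole values (or array elements) and return the most frequent."""
    counter = Counter()
    for value in values:
        if isinstance(value, list):
            counter.update(value)  # array property: each element is counted
        else:
            counter[value] += 1    # scalar property: the entire value is counted
    return [{"value": v, "occurs": n} for v, n in counter.most_common(limit)]


# Scalar text property: "Financial Times" is one value, not two tokens.
publications = ["Financial Times", "Financial Times", "Wired",
                "Financial Times", "Wired", "Vogue"]
print(top_occurrences(publications, limit=2))

# Array text property: each element of each array contributes to the count.
tags = [["tech", "ai"], ["ai"], ["ai", "news"]]
print(top_occurrences(tags, limit=1))
```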
Consistency levels
Aggregate queries are currently not available with different consistency levels.
Multi-tenancy
Where multi-tenancy is configured, the Aggregate function can be configured to aggregate results from a specific tenant.
You can do so by specifying the tenant parameter in the query as shown below, or in the client.
{
  Aggregate {
    Article (
      tenant: "tenantA"
    ) {
      meta {
        count
      }
    }
  }
}
For more information on using multi-tenancy, see the Multi-tenancy operations guide.
Aggregating a Vector Search / Faceted Vector Search
You can combine a vector search (e.g. nearObject, nearVector, nearText, nearImage, etc.) with an aggregation. Internally, this is a two-step process where the vector search first finds the desired objects, then the results are aggregated.
Limiting the search space
Vector searches rank objects by similarity but do not exclude any objects. Thus, for a search operator to impact aggregation, you must limit the search space by setting either objectLimit or certainty for the query:
- objectLimit, e.g. objectLimit: 100 tells Weaviate to aggregate the first 100 objects retrieved by the vector search query. This is useful when you know upfront how many results you want to serve, for example, in a recommendation scenario where you want to produce 100 recommendations.
- certainty, e.g. certainty: 0.7 tells Weaviate to aggregate all vector search results with a certainty score of 0.7 or higher. This list has no fixed length; it depends on how many objects are good matches. This is useful in user-facing search scenarios, such as e-commerce, where you might retrieve all search results semantically similar to "apple iphone" and then generate facets from them.
The aggregation query will fail if neither objectLimit nor certainty is set.
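The two-step process and the effect of the two limits can be sketched in plain Python. The snippet below uses made-up certainty scores and a hypothetical `aggregate_search_results` helper. It only illustrates the semantics described above, not how Weaviate executes the query.

```python
def aggregate_search_results(ranked, object_limit=None, certainty=None):
    """Limit a ranked result list, then aggregate it (illustrative only).

    ranked: list of (certainty_score, word_count) tuples, best match first.
    """
    if object_limit is None and certainty is None:
        # Mirrors the rule that one of the two limits must be set.
        raise ValueError("set object_limit or certainty to limit the search space")

    # Step 1: restrict the search results.
    selected = ranked
    if certainty is not None:
        selected = [r for r in selected if r[0] >= certainty]
    if object_limit is not None:
        selected = selected[:object_limit]

    # Step 2: aggregate only the objects that remain.
    word_counts = [wc for _, wc in selected]
    return {"count": len(word_counts), "sum": sum(word_counts)}


# Hypothetical search results, already ranked by similarity.
ranked = [(0.95, 300), (0.82, 500), (0.71, 400), (0.55, 900), (0.40, 100)]

print(aggregate_search_results(ranked, object_limit=2))
print(aggregate_search_results(ranked, certainty=0.7))
```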
Examples
Below are examples for nearObject, nearVector, and nearText.
Any near<Media> operator will work.
nearObject
import weaviate
import weaviate.classes as wvc
import os
client = weaviate.connect_to_local()
try:
    collection = client.collections.use("Article")
    response = collection.aggregate.near_object(
        near_object="00037775-1432-35e5-bc59-443baaef7d80",
        distance=0.6,
        object_limit=200,
        total_count=True,
        return_metrics=[
            wvc.query.Metrics("wordCount").integer(
                count=True,
                maximum=True,
                mean=True,
                median=True,
                minimum=True,
                mode=True,
                sum_=True,
            ),
        ]
    )
    print(response.total_count)
    print(response.properties)
finally:
    client.close()
nearVector
To run this query, replace the placeholder vector with a real vector from the same vectorizer that was used to generate the object vectors.
import weaviate
import weaviate.classes as wvc
import os
client = weaviate.connect_to_local()
try:
    collection = client.collections.use("Article")
    # `some_vector` is a placeholder: define it as a list of floats produced
    # by the same vectorizer that generated the object vectors.
    response = collection.aggregate.near_vector(
        near_vector=some_vector,
        distance=0.7,
        object_limit=100,
        total_count=True,
        return_metrics=[
            wvc.query.Metrics("wordCount").integer(
                count=True,
                maximum=True,
                mean=True,
                median=True,
                minimum=True,
                mode=True,
                sum_=True,
            ),
        ]
    )
    print(response.total_count)
    print(response.properties)
finally:
    client.close()
nearText
For nearText to be available, a text2vec-* module must be installed with Weaviate.
import weaviate
import weaviate.classes as wvc
import os
client = weaviate.connect_to_local()
try:
    collection = client.collections.use("Article")
    response = collection.aggregate.near_text(
        query="apple iphone",
        object_limit=200,
        total_count=True,
        return_metrics=[
            wvc.query.Metrics("wordCount").integer(
                count=True,
                maximum=True,
                mean=True,
                median=True,
                minimum=True,
                mode=True,
                sum_=True,
            ),
        ]
    )
    print(response.total_count)
    print(response.properties)
finally:
    client.close()
Questions and feedback
If you have any questions or feedback, let us know in the user forum.
