Ask Mode

Weaviate Cloud only

Ask Mode transforms your query into actionable searches or aggregations, and then provides a final answer to the question.

For example, you could ask:

"How many orders related to books were placed last week?"

The agent would then filter for orders, run a semantic search for books, and filter or sort by timestamp to restrict the results to last week. It then answers the question based on the retrieved data.

For more details, see the page for the Python client or the TypeScript client.

Usage

Like all Query Agent features, Ask Mode requires an instance of the QueryAgent class connected to your Weaviate client. See the class instantiation page for more detail.

Note: the Query Agent is not supported on locally running Weaviate instances.

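import os

import weaviate
from weaviate.agents.query import QueryAgent
from weaviate.classes.init import Auth

# Connect to Weaviate Cloud using credentials from the environment
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.getenv("WEAVIATE_URL"),
    auth_credentials=Auth.api_key(os.getenv("WEAVIATE_API_KEY")),
)

# Create a Query Agent that searches the "Weather" collection
qa = QueryAgent(
    client=client,
    collections=["Weather"],
)

# Ask Mode: the agent runs searches/aggregations, then answers the question
res = qa.ask(
    query="What was the average temperature in the first week of May 2025?",
    result_evaluation="none",
)

print(res.final_answer)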

Make sure your Weaviate URL and API key are set as environment variables (WEAVIATE_URL and WEAVIATE_API_KEY in this example), and specify the collections you want to search over.
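To catch a missing variable before attempting to connect, you might check the environment up front. This is a minimal sketch; the helper name read_weaviate_credentials is hypothetical and not part of the Weaviate client.

```python
import os


def read_weaviate_credentials() -> tuple[str, str]:
    """Return (cluster_url, api_key) from the environment (hypothetical helper).

    Raises RuntimeError naming every variable that is unset or empty.
    """
    names = ("WEAVIATE_URL", "WEAVIATE_API_KEY")
    # Collect any variables that are missing or set to an empty string.
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variable(s): {', '.join(missing)}")
    return os.environ["WEAVIATE_URL"], os.environ["WEAVIATE_API_KEY"]
```

You could then pass the returned values to weaviate.connect_to_weaviate_cloud(cluster_url=..., auth_credentials=Auth.api_key(...)) as in the example above.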

Async

In Python, the Query Agent supports both synchronous and asynchronous usage. The Python examples on this page use the synchronous client, but they can easily be swapped for their async equivalents; see the async section for details. In JavaScript/TypeScript, all calls are asynchronous by default and use await.

Parameters

The .ask() method accepts several arguments:

query (str | list[ChatMessage]): The user query you want the agent to answer. This can be a simple string ("What is the highest-grossing product?") or a list of chat messages that provides conversational context. See the page on multi-turn conversations for more detail.
collections (list[str | QueryAgentCollectionConfig] | None): The name(s) of the collections to search. Pass one or more collection names as a list of strings (e.g., ["ECommerce", "BookSales"]), or provide collection configuration objects for more control. Collections passed to .ask() override those set when instantiating QueryAgent. See the page on collection configuration for more detail.
result_evaluation (Literal["llm", "none"]): Controls whether the agent asks an LLM to "evaluate" (i.e., rewrite or rephrase) the result based on all retrieved context. Accepts either:
  "none" (default): faster and cheaper. The final answer comes from the last LLM call, and no further analysis is performed.
  "llm": higher cost and latency. Adds a final step in which an LLM narrows the retrieved sources to only those used in the answer, and enables the optional fields is_partial_answer and missing_information. See the response class for more details.
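To illustrate how these arguments combine, here is a minimal sketch of a wrapper around .ask(). The helper name ask_with_options and its evaluate flag are hypothetical; it only forwards the documented query, collections, and result_evaluation arguments, and it omits collections when none are given so that the collections set at instantiation still apply.

```python
from typing import Any, Optional


def ask_with_options(qa: Any, query: str, collections: Optional[list] = None,
                     evaluate: bool = False) -> Any:
    """Call qa.ask(), mapping a boolean flag onto result_evaluation (hypothetical helper).

    `qa` is any object with an .ask() method, such as a QueryAgent instance.
    """
    kwargs = {
        "query": query,
        # "llm" adds the evaluation step; "none" (the default) skips it.
        "result_evaluation": "llm" if evaluate else "none",
    }
    # Only pass collections when given, so those set at instantiation apply otherwise.
    if collections is not None:
        kwargs["collections"] = collections
    return qa.ask(**kwargs)
```

For example, ask_with_options(qa, "How many orders related to books were placed last week?", collections=["ECommerce"], evaluate=True) would search only ECommerce and request the LLM evaluation step.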

For more advanced searches, you can also specify additional filters within the collection configuration. See the page on additional filters for more detail.

These arguments allow you to customize agent behavior, data access, and the type of answer you receive.

Response

The AskModeResponse class has the following properties:

searches (list[QueryResultWithCollectionNormalized]): Full details of each search carried out during the run, including the search query, filters, UUID values, and sorts used, as well as the collection searched.
aggregations (list[AggregationResultWithCollectionNormalized]): Full details of each aggregation carried out during the run, including the group-by property, filters, and aggregation metrics used, as well as the collection aggregated.
usage (ModelUnitUsage): Details of the model units used during the run. Model units are effectively token usage measurements normalized by cost.
total_time (float): Total time taken, in seconds.
is_partial_answer (bool | None): Whether the answer is incomplete. Only available if result_evaluation is "llm".
missing_information (list[str] | None): What information is missing from the answer that makes it incomplete. Only available if result_evaluation is "llm".
final_answer (str): The LLM's final answer to the user query.
sources (list[Source] | None): Source objects whose object_id property is the UUID of a Weaviate object retrieved during the run. If result_evaluation is "llm", these are narrowed to only those relevant to final_answer.
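As an illustration of reading these fields, the sketch below condenses a response into a plain dictionary, for example for logging. The function summarize_ask_response is hypothetical; it uses only the attribute names listed above and handles the fields that may be None.

```python
from typing import Any


def summarize_ask_response(res: Any) -> dict:
    """Collect commonly used AskModeResponse fields into a plain dict (hypothetical helper).

    `res` can be any object exposing the documented attributes.
    """
    # sources may be None, so fall back to an empty list.
    source_ids = [str(src.object_id) for src in (res.sources or [])]
    return {
        "answer": res.final_answer,
        "seconds": res.total_time,
        "num_searches": len(res.searches or []),
        "num_aggregations": len(res.aggregations or []),
        "source_ids": source_ids,
        # Only populated when result_evaluation="llm"; otherwise None.
        "is_partial": res.is_partial_answer,
        "missing": res.missing_information or [],
    }
```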

See the client documentation for more detail.

Streaming

Regular Ask Mode returns a single response object. Alternatively, you can stream progress updates and answer tokens while Ask Mode runs.

for output in qa.ask_stream("What was the average temperature in the first week of May 2025?"):
    pass  # Do something with the output

Because the Query Agent is a multi-layered agentic system, the stream emits several types of payload. Each payload has an output_type field identifying its type, as described below.

Request

In addition to the standard Ask Mode arguments (above), the streaming method accepts two extra flags that control which payload types are emitted:

include_progress (bool): Optional. If True (default), the agent streams ProgressMessage updates as it processes the query.
include_final_state (bool): Optional. If True (default), the agent emits a final AskModeResponse payload at the end of the stream.

If both include_progress and include_final_state are set to False, the stream emits only StreamedTokens payloads as the final answer is generated.
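When both flags are False, assembling the answer reduces to concatenating each delta in order. The sketch below, using the hypothetical helper collect_answer, also skips any other payload types by checking output_type, so it works regardless of the flag settings.

```python
from typing import Any, Iterable


def collect_answer(stream: Iterable[Any]) -> str:
    """Join the deltas of "streamed_tokens" payloads into the full answer (hypothetical helper).

    Progress messages and the final state, if emitted, are skipped.
    """
    parts = []
    for output in stream:
        if getattr(output, "output_type", None) == "streamed_tokens":
            parts.append(output.delta)
    return "".join(parts)
```

Assuming ask_stream() accepts the flags as keyword arguments, as the list above suggests, you might call collect_answer(qa.ask_stream("...", include_progress=False, include_final_state=False)).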

Responses

ProgressMessage: an update on which part of the system most recently completed. It has four fields:

output_type (Literal["progress_message"]): Always "progress_message".
stage (str): One of query_analysis, search, aggregation, or final_answer. Identifies the stage the agentic service is currently running.
message (str): A human-readable description of what the agent is doing. For example, during query_analysis this is "Analyzing query...".
details (ProgressDetails): A dictionary providing additional context about each stage.

During the search and aggregation stages, details typically includes a "queries" key: a list of dictionaries, each with:
  query: the specific search term used.
  collection: the collection the search was run against.

This lets you see exactly which queries were issued, and against which collections, at each stage.
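The sketch below gathers those queries from a stream of payloads. It assumes, as described above, that details behaves like a dictionary whose optional "queries" entry is a list of dictionaries with "query" and "collection" keys; the helper name issued_queries is hypothetical.

```python
from typing import Any, Iterable


def issued_queries(stream: Iterable[Any]) -> list[tuple[str, str, str]]:
    """Return (stage, collection, query) for each query reported in progress messages.

    Hypothetical helper; messages whose details lack a "queries" key are ignored.
    """
    issued = []
    for output in stream:
        if getattr(output, "output_type", None) != "progress_message":
            continue
        details = output.details or {}
        for q in details.get("queries", []):
            issued.append((output.stage, q["collection"], q["query"]))
    return issued
```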

See the client documentation for more detail.

StreamedTokens: incremental chunks of the final answer as it is generated, so you can render the response token by token instead of waiting for it to complete. Each instance has two fields: output_type (always "streamed_tokens") and delta (the newly generated tokens, to append to what you have received so far). See the client documentation for more detail.

AskModeResponse: the full response model described above, with output_type always "final_state". When emitted (include_final_state is True), it is always the last payload in the stream and signals that the run has completed. See the client documentation for more detail.

Example: Handling different streamed responses

You can handle each streamed payload differently depending on its class or its output_type property. For example, you may want to display progress messages differently from the tokens that build up the final answer.

from weaviate.agents.classes import ProgressMessage, StreamedTokens, AskModeResponse


def print_stream_output(output):
    if isinstance(output, ProgressMessage):
        # Show which stage the agent is working on
        print(output.message)
    elif isinstance(output, StreamedTokens):
        # Print answer tokens as they arrive, on the same line
        print(output.delta, end="", flush=True)
    elif isinstance(output, AskModeResponse):
        # Final state: display the full response
        output.display()


for output in qa.ask_stream("What was the average temperature in the first week of May 2025?"):
    print_stream_output(output)

Async

In Python, the above examples use the synchronous client, but Ask Mode can also be called asynchronously. This requires the AsyncQueryAgent class (instantiated the same way as its sync counterpart) together with an async Weaviate client.

import os

import weaviate
from weaviate.agents.query import AsyncQueryAgent
from weaviate.classes.init import Auth

# Top-level await works in notebooks; in a script, run these calls inside an async function
async_client = weaviate.use_async_with_weaviate_cloud(
    cluster_url=os.environ.get("WEAVIATE_URL"),
    auth_credentials=Auth.api_key(os.environ.get("WEAVIATE_API_KEY")),
)
await async_client.connect()

async_qa = AsyncQueryAgent(
    client=async_client,
    collections=["Weather"],
)

The .ask() method must be awaited:

res = await async_qa.ask(
    query="What was the average temperature in the first week of May 2025?",
)

And the .ask_stream() method must be used in an async for loop:

async for output in async_qa.ask_stream("What was the average temperature in the first week of May 2025?"):
    pass  # Do something with the output
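Top-level await only works in environments that already run an event loop, such as Jupyter. In a script, you would typically wrap the async calls in a coroutine and run it with asyncio.run(). The sketch below shows that pattern; FakeAsyncAgent is a hypothetical stand-in that yields payloads so the example runs without a Weaviate Cloud cluster. With a real AsyncQueryAgent you would pass async_qa instead, and close the async client when finished.

```python
import asyncio
from types import SimpleNamespace
from typing import Any


async def stream_answer(agent: Any, query: str) -> str:
    """Consume agent.ask_stream() with `async for` and return the joined answer tokens."""
    parts = []
    async for output in agent.ask_stream(query):
        # Keep only answer tokens; skip progress messages and the final state.
        if output.output_type == "streamed_tokens":
            parts.append(output.delta)
    return "".join(parts)


class FakeAsyncAgent:
    """Hypothetical stand-in for AsyncQueryAgent, used only to make this sketch runnable."""

    async def ask_stream(self, query: str):
        # An async generator, so it can be consumed with `async for`.
        yield SimpleNamespace(output_type="progress_message", message="Analyzing query...")
        for delta in ("Hello", " world"):
            yield SimpleNamespace(output_type="streamed_tokens", delta=delta)


async def main() -> None:
    answer = await stream_answer(FakeAsyncAgent(), "example question")
    print(answer)


if __name__ == "__main__":
    # asyncio.run() creates the event loop, runs main(), and closes the loop.
    asyncio.run(main())
```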

Questions and feedback

If you have any questions or feedback, let us know in the user forum.