Ask Mode

Weaviate Cloud only

Ask Mode transforms your query into actionable searches or aggregations, and then provides a final answer to the question.

For example, you could ask:

"How many orders related to books were placed last week?"

The agent then filters for orders, runs a semantic search for books, and filters or sorts on timestamps from the last week. Finally, it answers the question based on the data it retrieved.

For more details, see the page for the Python client or the TypeScript client.

Usage

Like all Query Agent features, Ask Mode requires an instance of the QueryAgent class that is connected to your Weaviate client. See the class instantiation page for more detail.

Note that locally running Weaviate instances do not support the Query Agent.

import os

import weaviate
from weaviate.agents.query import QueryAgent
from weaviate.classes.init import Auth

# Connect to a Weaviate Cloud cluster using credentials from the environment
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.getenv("WEAVIATE_URL"),
    auth_credentials=Auth.api_key(os.getenv("WEAVIATE_API_KEY")),
)

# The agent searches only the collections listed here
qa = QueryAgent(
    client=client,
    collections=["Weather"],
)

res = qa.ask(
    query="What was the average temperature in the first week of May 2025?",
    result_evaluation="none",
)

Make sure WEAVIATE_URL and WEAVIATE_API_KEY are set in your environment, and specify whichever collection(s) you want to search over.

Async

In Python, the Query Agent supports both synchronous and asynchronous usage. The Python examples on this page use the synchronous client, and each has an async equivalent; see the Async section at the end of this page for details. In JavaScript/TypeScript, all calls are asynchronous by default and use await.

Parameters

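The .ask() method accepts several arguments. The first table below lists the Python parameters, and the second lists the JavaScript/TypeScript equivalents: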

Parameter	Type	Description
`query`	`str \| list[ChatMessage]`	The user query you want the agent to answer. This can be a simple string (`"What is the highest-grossing product?"`) or a list of chat messages (for conversational context). See the page on multi-turn conversations for more detail.
`collections`	`list[str \| QueryAgentCollectionConfig] \| None`	The name(s) of the collections to search. You can pass one or many collection names as a list of strings (e.g., `["ECommerce", "BookSales"]`), or provide collection configuration objects for more control. If specified in the `ask` method, it will overwrite those defined in the instantiation of `QueryAgent`. See the page on collection configuration for more detail.
`result_evaluation`	`Literal["llm", "none"]`	Controls whether the agent will ask an LLM to "evaluate" (i.e., rewrite or rephrase) the result based on all retrieved context. Accepts either: • `"none"` (default): faster and cheaper; where the final answer is the last LLM call and no further analysis is completed. • `"llm"`: higher cost/latency - enables a final step where an LLM subsets the sources retrieved to only those used in the answer, as well as enabling the optional fields `is_partial_answer` and `missing_information`. See the response class for more details.

Parameter	Type	Description
`query`	`string \| ChatMessage[]`	The user query you want the agent to answer. This can be a simple string (`"What is the highest-grossing product?"`) or a list of chat messages (for conversational context). See the page on multi-turn conversations for more detail.
`collections`	`(string \| QueryAgentCollectionConfig)[]`	The name(s) of the collections to search. You can pass one or many collection names as a list of strings (e.g., `["ECommerce", "BookSales"]`), or provide collection configuration objects for more control. See the page on collection configuration for more detail. If specified in the `ask` method, it will overwrite those defined in the instantiation of `QueryAgent`.
`resultEvaluation`	`"llm" \| "none"`	Controls whether the agent will ask an LLM to "evaluate" (i.e., rewrite or rephrase) the result based on all retrieved context. Accepts either: • `"none"`: faster and cheaper; default setting where the final answer is the last LLM call. • `"llm"`: higher cost/latency - enables a final step where an LLM subsets the sources retrieved to only those used in the answer, as well as enabling the optional fields `isPartialAnswer` and `missingInformation`. See the response class for more details.

For more advanced searches, you can also specify additional filters within the collection configuration. See the page on additional filters for more detail.

These arguments allow you to customize agent behavior, data access, and the type of answer you receive.
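
As a rough illustration of how these arguments combine, the sketch below wraps a single call that enables the LLM evaluation step and optionally overrides the collections. The helper name `ask_with_evaluation` is hypothetical, and `agent` is assumed to be a QueryAgent instance created as in the Usage section; only the `query`, `collections`, and `result_evaluation` keywords come from the table above.

```python
from typing import Any, Optional, Sequence


def ask_with_evaluation(
    agent: Any,
    question: str,
    collections: Optional[Sequence[str]] = None,
) -> Any:
    """Ask one question with result_evaluation="llm" (hypothetical helper).

    `agent` is assumed to be a QueryAgent instance; it is typed as Any so
    that this sketch does not depend on the client library being installed.
    """
    kwargs = {"query": question, "result_evaluation": "llm"}
    if collections is not None:
        # Collections passed to ask() overwrite those set on the agent itself.
        kwargs["collections"] = list(collections)
    return agent.ask(**kwargs)
```

For example, `ask_with_evaluation(qa, "What is the highest-grossing product?", ["ECommerce", "BookSales"])` would search those two collections for this call only, regardless of what `qa` was instantiated with.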

Response

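The AskModeResponse class has the following properties. The first table lists the Python fields, and the second lists the JavaScript/TypeScript equivalents: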

Field	Type	Description
`searches`	`list[QueryResultWithCollectionNormalized]`	A list of `QueryResultWithCollectionNormalized`. Each contains full details on the searches carried out during the run. This gives explicit information on the search query, filters, UUID values and sorts that were used, as well as the collection searched on.
`aggregations`	`list[AggregationResultWithCollectionNormalized]`	A list of `AggregationResultWithCollectionNormalized`. Each contains full details on the aggregations carried out during the run. This gives explicit information on the group-by property, filters, and aggregation metrics that were used, as well as the collection aggregated on.
`usage`	`ModelUnitUsage`	A `ModelUnitUsage` instance providing detail on the model units that were used during the run. The `model_units` are effectively token usage measurements normalized by cost.
`total_time`	`float`	Total time taken (seconds).
`is_partial_answer`	`bool \| None`	A boolean or null value indicating whether the answer is incomplete or not. Only available if `result_evaluation` is `"llm"`.
`missing_information`	`list[str] \| None`	A list of strings detailing what information is missing from the answer that makes it incomplete. Only available if `result_evaluation` is `"llm"`.
`final_answer`	`str`	A string comprising the LLM's final answer to the user query.
`sources`	`list[Source] \| None`	A list of `Source` objects, which have an `object_id` property correlating to the UUID of the Weaviate object that was retrieved during the run. If `result_evaluation` is `"llm"`, these are subset to only those that are relevant to the `final_answer`.

See the client documentation for more detail.
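
To make the Python fields above more concrete, here is a minimal sketch that gathers the most commonly used ones into a plain dictionary. The helper name `summarize_response` and the keys of the returned dictionary are hypothetical; the attribute names it reads (`final_answer`, `sources`, `object_id`, `searches`, `aggregations`, `total_time`, `is_partial_answer`, `missing_information`) are taken from the table.

```python
from typing import Any, Dict


def summarize_response(res: Any) -> Dict[str, Any]:
    """Collect commonly used response fields into a dict (hypothetical helper)."""
    sources = res.sources or []  # `sources` may be None
    summary: Dict[str, Any] = {
        "answer": res.final_answer,
        "source_ids": [str(s.object_id) for s in sources],
        "num_searches": len(res.searches),
        "num_aggregations": len(res.aggregations),
        "seconds": res.total_time,
    }
    # These two fields are only populated when result_evaluation="llm".
    if getattr(res, "is_partial_answer", None):
        summary["missing"] = list(res.missing_information or [])
    return summary
```

With the `res` object from the Usage example, `summarize_response(res)["answer"]` would hold the final answer text, and `"source_ids"` the UUIDs of the objects it drew on.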

Field	Type	Description
`searches`	`Search[]`	A list of `Search` objects. Each contains full details on the searches carried out during the run. This gives explicit information on the search query, filters, UUID values and sorts that were used, as well as the collection searched on.
`aggregations`	`Aggregation[]`	A list of `Aggregation` objects. Each contains full details on the aggregations carried out during the run. This gives explicit information on the group-by property, filters, and aggregation metrics that were used, as well as the collection aggregated on.
`usage`	`ModelUnitUsage`	A `ModelUnitUsage` object providing detail on the model units that were used during the run. The `modelUnits` are effectively token usage measurements normalized by cost.
`totalTime`	`number`	Total time taken (seconds).
`isPartialAnswer`	`boolean`	A boolean indicating whether the answer is incomplete. Only available if `resultEvaluation` is `"llm"`.
`missingInformation`	`string[]`	A list of strings detailing what information is missing from the answer that makes it incomplete. Only available if `resultEvaluation` is `"llm"`.
`finalAnswer`	`string`	A string comprising the LLM's final answer to the user query.
`sources`	`Source[]`	A list of `Source` objects, which have an `objectId` property correlating to the UUID of the Weaviate object that was retrieved during the run. If `resultEvaluation` is `"llm"`, these are subset to only those that are relevant to the `finalAnswer`.

Streaming

While regular Ask Mode returns a single object, you can choose to stream updates and tokens from the workflow of Ask Mode instead.

for output in qa.ask_stream("What was the average temperature in the first week of May 2025?"):
    pass # Do something with the output

Since the Query Agent is a multi-layered agentic system, the stream contains several types of payload. Each payload has a field that identifies its type, as described below.

Request

In addition to the standard Ask Mode arguments (above), the streaming method accepts two extra flags that control which payload types are emitted:

Parameter	Type	Description
`include_progress`	`bool`	Optional. If `True` (default), the agent will stream `ProgressMessage` updates as it processes the query.
`include_final_state`	`bool`	Optional. If `True` (default), the agent will emit a final `AskModeResponse` payload at the end of the stream.

If both include_progress and include_final_state are disabled, the stream will only emit StreamedTokens payloads as the final answer is generated. The same applies to the equivalent JavaScript/TypeScript flags:

Parameter	Type	Description
`includeProgress`	`boolean`	Optional. If `true` (default), the agent will stream `ProgressMessage` updates as it processes the query.
`includeFinalState`	`boolean`	Optional. If `true` (default), the agent will emit a final `AskModeResponse` payload at the end of the stream.
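
As a sketch of the tokens-only case, the helper below disables both flags and joins the streamed deltas into the full answer text. The name `stream_answer_text` is hypothetical, `agent` is assumed to be a QueryAgent instance, and payloads are identified by their `output_type` value (`"streamed_tokens"`, as described under Responses) so that the example does not need to import the client library.

```python
from typing import Any, List


def stream_answer_text(agent: Any, question: str) -> str:
    """Print tokens as they arrive and return the full answer (hypothetical helper)."""
    parts: List[str] = []
    for output in agent.ask_stream(
        question,
        include_progress=False,     # skip ProgressMessage payloads
        include_final_state=False,  # skip the final AskModeResponse payload
    ):
        # Defensive check: only token payloads carry a `delta`.
        if getattr(output, "output_type", None) == "streamed_tokens":
            print(output.delta, end="", flush=True)
            parts.append(output.delta)
    return "".join(parts)
```

Note that with `include_final_state=False` the sources and usage details are not delivered, so this pattern suits cases where only the answer text is needed.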

Responses

ProgressMessage — an update on what part of the system has most recently been completed. A class with four fields:

Field	Type	Description
`output_type`	`Literal["progress_message"]`	Always `progress_message`.
`stage`	`str`	One of `query_analysis`, `search`, `aggregation`, or `final_answer`. Identifies the stage at which the agentic service is running.
`message`	`str`	A human-readable message describing what the agent is doing. For example, during `query_analysis` this is `"Analyzing query..."`.
`details`	`ProgressDetails`	A dictionary providing additional context about each stage. During the `search` and `aggregation` stages, this typically includes a `"queries"` key — a list of dictionaries, each with: • `query` — the specific search term used. • `collection` — the collection the search was run against. This lets you see exactly which queries were issued, and against which collections, at each stage.

See the client documentation for more detail.
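
The following sketch shows one way to turn a ProgressMessage into readable log lines, including the per-collection queries listed under `details`. The helper name `describe_progress` is hypothetical, and it assumes that `details` behaves like a plain dictionary whose optional `"queries"` entry is a list of dictionaries with `query` and `collection` keys, as the table describes; if the client returns a different structure, the lookups would need adjusting.

```python
from typing import Any, List


def describe_progress(output: Any) -> List[str]:
    """Format one progress payload as log lines (hypothetical helper)."""
    lines = [f"[{output.stage}] {output.message}"]
    details = output.details or {}
    # Assumption: `details` is dict-like; anything else is ignored.
    queries = details.get("queries", []) if isinstance(details, dict) else []
    for q in queries:
        lines.append(f"  {q.get('collection')}: {q.get('query')}")
    return lines
```

Inside a streaming loop, `print("\n".join(describe_progress(output)))` could be called for each payload that is a ProgressMessage.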

StreamedTokens — incremental chunks of the final answer as it is generated, letting you render the response token-by-token rather than waiting for it to complete. Each instance has two fields: output_type (always "streamed_tokens") and delta (the newly generated tokens to append to what you have received so far). See the client documentation for more detail.

AskModeResponse — the full response model, as defined above, with output_type always final_state. This is always the final result in the stream and indicates the system has completed. See the client documentation for more detail.

In JavaScript/TypeScript, ProgressMessage has the equivalent fields:

Field	Type	Description
`outputType`	`"progressMessage"`	Always `progressMessage`.
`stage`	`string`	One of `query_analysis`, `search`, `aggregation`, or `final_answer`. Identifies the stage at which the agentic service is running.
`message`	`string`	A human-readable message describing what the agent is doing. For example, during `query_analysis` this is `"Analyzing query..."`.
`details`	`ProgressDetails`	An object providing additional context about each stage. During the `search` and `aggregation` stages, this typically includes a `"queries"` key — a list of objects, each with: • `query` — the specific search term used. • `collection` — the collection the search was run against. This lets you see exactly which queries were issued, and against which collections, at each stage.

Example: Handling different streamed responses

You can handle each streamed payload differently depending on its class or its output_type property. For example, you may want to display progress messages differently from the tokens that build up the final answer.

from weaviate.agents.classes import ProgressMessage, StreamedTokens, AskModeResponse

def print_stream_output(output):
    if isinstance(output, ProgressMessage):
        print(output.message)
    elif isinstance(output, StreamedTokens):
        print(output.delta, end='', flush=True)
    elif isinstance(output, AskModeResponse):
        output.display()

for output in qa.ask_stream("What was the average temperature in the first week of May 2025?"):
    print_stream_output(output)

Async

In Python, the above examples use the synchronous client, but Ask Mode can also be called asynchronously. This requires the AsyncQueryAgent class (instantiated the same way as its sync counterpart) together with an async Weaviate client.

import os

import weaviate
from weaviate.agents.query import AsyncQueryAgent
from weaviate.classes.init import Auth

async_client = weaviate.use_async_with_weaviate_cloud(
    cluster_url=os.environ.get("WEAVIATE_URL"),
    auth_credentials=Auth.api_key(os.environ.get("WEAVIATE_API_KEY")),
)
# Must run inside an async function (or a notebook with top-level await)
await async_client.connect()

async_qa = AsyncQueryAgent(
    client=async_client, collections=["Weather"]
)

The .ask() method must be awaited:

res = await async_qa.ask(
    query="What was the average temperature in the first week of May 2025?",
)

The .ask_stream() method must be consumed in an async for loop:

async for output in async_qa.ask_stream("What was the average temperature in the first week of May 2025?"):
    pass # Do something with the output
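
One reason to use the async agent is to issue several questions without waiting for each in turn. The sketch below does this with `asyncio.gather`. The helper name `ask_many` is hypothetical, `agent` is assumed to be an AsyncQueryAgent instance, and it is an assumption (not confirmed by this page) that the service handles several in-flight requests from one agent well; treat it as a starting point rather than a recommended pattern.

```python
import asyncio
from typing import Any, List


async def ask_many(agent: Any, questions: List[str]) -> List[str]:
    """Await several ask() calls concurrently (hypothetical helper).

    Returns the final answers in the same order as `questions`.
    """
    responses = await asyncio.gather(
        *(agent.ask(query=q) for q in questions)
    )
    return [r.final_answer for r in responses]
```

Inside an async function, this could be called as `answers = await ask_many(async_qa, ["...", "..."])`, using the `async_qa` instance created above.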

Questions and feedback

Have a question or feedback? Here's how to reach us.

Community Forum

Ask questions and connect with other developers on our Community forum.

Support

Weaviate Cloud user or customer? Find the right channel on the Support page.
