Ask Mode
Ask Mode transforms your query into actionable searches or aggregations, and then provides a final answer to the question.
For example, you could ask:
"How many orders related to books were placed last week?"
The agent will then filter for orders, run a semantic search for books, and sort or filter on timestamps from the last week. Finally, it answers the question based on the data it retrieved.
For more details, see the page for the Python client or the TypeScript client.
Usage
Like all features of the Query Agent, Ask Mode requires an instance of the QueryAgent class connected to your Weaviate client. See the class instantiation page for more detail.
Note, locally running Weaviate instances do not support the Query Agent.
import os

import weaviate
from weaviate.agents.query import QueryAgent
from weaviate.classes.init import Auth

# Connect to a Weaviate Cloud cluster using credentials from the environment
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.getenv("WEAVIATE_URL"),
    auth_credentials=Auth.api_key(os.getenv("WEAVIATE_API_KEY")),
)

# Create the agent and name the collection(s) it may search
qa = QueryAgent(
    client=client,
    collections=["Weather"],
)

# Ask a question; "none" skips the extra LLM evaluation step
res = qa.ask(
    query="What was the average temperature in the first week of May 2025?",
    result_evaluation="none",
)
Make sure WEAVIATE_URL and WEAVIATE_API_KEY are set in your environment, and specify whichever collection(s) you want to search over.
In Python, the Query Agent supports both synchronous and asynchronous usage. The Python examples on this page use the synchronous client, but you can swap in the async equivalents; see the async section for details. In JavaScript/TypeScript, all calls are asynchronous by default and use await.
Parameters
The .ask() method accepts several arguments:
| Parameter | Type | Description |
|---|---|---|
| query | str \| list[ChatMessage] | The user query you want the agent to answer. This can be a simple string ("What is the highest-grossing product?") or a list of chat messages (for conversational context). See the page on multi-turn conversations for more detail. |
| collections | list[str \| QueryAgentCollectionConfig] \| None | The name(s) of the collections to search. You can pass one or many collection names as a list of strings (e.g., ["ECommerce", "BookSales"]), or provide collection configuration objects for more control. If specified in the ask method, they override those defined when the QueryAgent was instantiated. See the page on collection configuration for more detail. |
| result_evaluation | Literal["llm", "none"] | Controls whether the agent will ask an LLM to "evaluate" (i.e., rewrite or rephrase) the result based on all retrieved context. Accepts either: • "none" (default): faster and cheaper; the final answer is the last LLM call and no further analysis is performed. • "llm": higher cost and latency; enables a final step where an LLM narrows the retrieved sources to only those used in the answer, and enables the optional fields is_partial_answer and missing_information. See the response class for more details. |
For more advanced searches, you can also specify additional filters within the collection configuration. See the page on additional filters for more detail.
These arguments allow you to customize agent behavior, data access, and the type of answer you receive.
Response
The AskModeResponse class has the following properties:
| Field | Type | Description |
|---|---|---|
| searches | list[QueryResultWithCollectionNormalized] | A list of QueryResultWithCollectionNormalized. Each contains full details on the searches carried out during the run. This gives explicit information on the search query, filters, UUID values and sorts that were used, as well as the collection searched on. |
| aggregations | list[AggregationResultWithCollectionNormalized] | A list of AggregationResultWithCollectionNormalized. Each contains full details on the aggregations carried out during the run. This gives explicit information on the group-by property, filters, and aggregation metrics that were used, as well as the collection aggregated on. |
| usage | ModelUnitUsage | A ModelUnitUsage instance providing detail on the model units that were used during the run. The model_units are effectively token usage measurements normalized by cost. |
| total_time | float | Total time taken (seconds). |
| is_partial_answer | bool \| None | A boolean or null value indicating whether the answer is incomplete or not. Only available if result_evaluation is "llm". |
| missing_information | list[str] \| None | A list of strings detailing what information is missing from the answer that makes it incomplete. Only available if result_evaluation is "llm". |
| final_answer | str | A string comprising the LLM's final answer to the user query. |
| sources | list[Source] \| None | A list of Source objects, which have an object_id property correlating to the UUID of the Weaviate object that was retrieved during the run. If result_evaluation is "llm", these are subset to only those that are relevant to the final_answer. |
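As a rough illustration of reading these fields, the sketch below builds a short text summary from a response. The helper name summarize_response and the stand-in object are hypothetical; the stand-in only mimics the field names from the table above so the example runs without a cluster, and a real AskModeResponse would be passed in its place.

```python
from types import SimpleNamespace


def summarize_response(res) -> str:
    """Build a short plain-text summary from an Ask Mode response."""
    lines = [f"Answer: {res.final_answer}"]

    # is_partial_answer and missing_information are None unless
    # result_evaluation="llm" was requested.
    if res.is_partial_answer:
        lines.append("Missing: " + "; ".join(res.missing_information or []))

    # sources may be None; each Source exposes an object_id.
    ids = [str(s.object_id) for s in (res.sources or [])]
    lines.append(f"Sources: {', '.join(ids) if ids else 'none'}")
    lines.append(f"Time: {res.total_time:.2f}s")
    return "\n".join(lines)


# Hypothetical stand-in for an AskModeResponse (made-up values).
fake_res = SimpleNamespace(
    final_answer="About 14 degrees.",
    is_partial_answer=True,
    missing_information=["No readings for 3 May"],
    sources=[SimpleNamespace(object_id="abc-123")],
    total_time=2.5,
)
summary = summarize_response(fake_res)
print(summary)
```

Guarding the optional fields with `or []` keeps the helper usable whether or not result_evaluation was set to "llm".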
Streaming
While regular Ask Mode returns a single object, you can choose to stream updates and tokens from the workflow of Ask Mode instead.
for output in qa.ask_stream("What was the average temperature in the first week of May 2025?"):
    pass  # Do something with the output
Since the Query Agent is a multi-layered agentic system, you will receive several different types of streaming payloads. Each one has a field that identifies its payload type, as described below.
Request
In addition to the standard Ask Mode arguments (above), the streaming method accepts two extra flags that control which payload types are emitted:
| Parameter | Type | Description |
|---|---|---|
| include_progress | bool | Optional. If True (default), the agent will stream ProgressMessage updates as it processes the query. |
| include_final_state | bool | Optional. If True (default), the agent will emit a final AskModeResponse payload at the end of the stream. |
If both include_progress and include_final_state are set to False, the stream will only emit StreamedTokens payloads as the final answer is generated.
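A minimal sketch of the token-only case follows, assuming both flags are passed as keyword arguments to ask_stream. The helper stream_answer_text and the FakeAgent class are hypothetical; FakeAgent only imitates the payload behaviour described above so the example runs offline, and a real QueryAgent would be passed in its place.

```python
from types import SimpleNamespace


def stream_answer_text(agent, query: str) -> str:
    """Collect only the streamed answer tokens into one string."""
    chunks = []
    for output in agent.ask_stream(
        query,
        include_progress=False,     # suppress ProgressMessage payloads
        include_final_state=False,  # suppress the final AskModeResponse
    ):
        # With both flags off, every payload is expected to carry a delta.
        chunks.append(output.delta)
    return "".join(chunks)


class FakeAgent:
    """Hypothetical stand-in for QueryAgent (not part of the client)."""

    def ask_stream(self, query, include_progress=True, include_final_state=True):
        if include_progress:
            yield SimpleNamespace(output_type="progress_message", message="Analyzing query...")
        for piece in ("The average ", "was 14 degrees."):
            yield SimpleNamespace(output_type="streamed_tokens", delta=piece)
        if include_final_state:
            yield SimpleNamespace(output_type="final_state", final_answer="The average was 14 degrees.")


answer = stream_answer_text(FakeAgent(), "What was the average temperature?")
print(answer)
```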
Responses
ProgressMessage — an update on what part of the system has most recently been completed. A class with four fields:
| Field | Type | Description |
|---|---|---|
| output_type | Literal["progress_message"] | Always progress_message. |
| stage | str | One of query_analysis, search, aggregation, or final_answer. Identifies the stage at which the agentic service is running. |
| message | str | A human-readable message describing what the agent is doing. For example, during query_analysis this is "Analyzing query...". |
| details | ProgressDetails | A dictionary providing additional context about each stage. During the search and aggregation stages, this typically includes a "queries" key: a list of dictionaries, each with: • query: the specific search term used. • collection: the collection the search was run against. This lets you see exactly which queries were issued, and against which collections, at each stage. |
See the client documentation for more detail.
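To make the details field more concrete, here is a small sketch that lists the queries recorded in a progress message. It assumes details behaves like a plain dictionary with an optional "queries" key, as described in the table above. The helper describe_progress, the stand-in object, and the "Searching..." message text are hypothetical.

```python
from types import SimpleNamespace


def describe_progress(output) -> list[str]:
    """Return a header line plus one line per query found in details."""
    lines = [f"[{output.stage}] {output.message}"]
    details = output.details or {}
    # "queries" is typically present only for search and aggregation stages.
    for q in details.get("queries", []):
        lines.append(f"  {q['collection']}: {q['query']}")
    return lines


# Hypothetical stand-in for a ProgressMessage (made-up values).
msg = SimpleNamespace(
    output_type="progress_message",
    stage="search",
    message="Searching...",
    details={"queries": [{"query": "temperature May 2025", "collection": "Weather"}]},
)
progress_lines = describe_progress(msg)
print("\n".join(progress_lines))
```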
StreamedTokens — incremental chunks of the final answer as it is generated, letting you render the response token-by-token rather than waiting for it to complete. Each instance has two fields: output_type (always "streamed_tokens") and delta (the newly generated tokens to append to what you have received so far). See the client documentation for more detail.
AskModeResponse — the full response model, as defined above, with output_type always final_state. This is always the final result in the stream and indicates the system has completed. See the client documentation for more detail.
Example: Handling different streamed responses
You can handle each streamed payload differently depending on its class or its output_type property. For example, you may want to display progress messages as they arrive while appending streamed tokens to build up the final answer.
from weaviate.agents.classes import ProgressMessage, StreamedTokens, AskModeResponse

def print_stream_output(output):
    if isinstance(output, ProgressMessage):
        print(output.message)
    elif isinstance(output, StreamedTokens):
        print(output.delta, end='', flush=True)
    elif isinstance(output, AskModeResponse):
        output.display()

for output in qa.ask_stream("What was the average temperature in the first week of May 2025?"):
    print_stream_output(output)
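The same dispatch can be keyed on the output_type string instead of the class, which can be handy when payloads have been serialized or when you prefer not to import the classes. The sketch below is one possible shape for that, using the three output_type values listed above. The helper consume_stream and the stand-in payloads are hypothetical, and the list stands in for the iterator returned by ask_stream.

```python
from types import SimpleNamespace


def consume_stream(stream) -> dict:
    """Fold a stream of Ask Mode payloads into one summary dictionary."""
    state = {"stages": [], "text": "", "final": None}
    for output in stream:
        kind = output.output_type
        if kind == "progress_message":
            state["stages"].append(output.stage)
        elif kind == "streamed_tokens":
            # Each delta is appended to what has been received so far.
            state["text"] += output.delta
        elif kind == "final_state":
            state["final"] = output
    return state


# Hypothetical stand-in payloads (made-up values).
fake_stream = [
    SimpleNamespace(output_type="progress_message", stage="query_analysis"),
    SimpleNamespace(output_type="progress_message", stage="search"),
    SimpleNamespace(output_type="streamed_tokens", delta="14 "),
    SimpleNamespace(output_type="streamed_tokens", delta="degrees"),
    SimpleNamespace(output_type="final_state", final_answer="14 degrees"),
]
state = consume_stream(fake_stream)
print(state["stages"], state["text"])
```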
Async
In Python, the above examples use the synchronous client, but Ask Mode can also be called asynchronously. This requires the AsyncQueryAgent class (instantiated the same way as its sync counterpart) together with an async Weaviate client.
import os

import weaviate
from weaviate.agents.query import AsyncQueryAgent
from weaviate.classes.init import Auth

async_client = weaviate.use_async_with_weaviate_cloud(
    cluster_url=os.environ.get("WEAVIATE_URL"),
    auth_credentials=Auth.api_key(os.environ.get("WEAVIATE_API_KEY")),
)
# The async client must be connected explicitly before use
await async_client.connect()

async_qa = AsyncQueryAgent(
    client=async_client,
    collections=["Weather"],
)
The .ask() method must be awaited:
await async_qa.ask(
    query="What was the average temperature in the first week of May 2025?",
)
And the .ask_stream() method must be used in an async for loop:
async for output in async_qa.ask_stream("What was the average temperature in the first week of May 2025?"):
    pass  # Do something with the output
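The snippets above use top-level await, which works in a notebook or an async REPL. In a regular script, the calls need to live inside a coroutine that is started with asyncio.run. The sketch below shows that pattern for the streaming loop. The helper collect_answer and the FakeAsyncAgent class are hypothetical; the fake agent only imitates ask_stream as an async generator so the example runs offline, and an AsyncQueryAgent would be passed in its place.

```python
import asyncio
from types import SimpleNamespace


async def collect_answer(agent, query: str) -> str:
    """Join the streamed token deltas from an async Ask Mode stream."""
    chunks = []
    async for output in agent.ask_stream(query):
        if output.output_type == "streamed_tokens":
            chunks.append(output.delta)
    return "".join(chunks)


class FakeAsyncAgent:
    """Hypothetical stand-in for AsyncQueryAgent (not part of the client)."""

    async def ask_stream(self, query):
        yield SimpleNamespace(output_type="progress_message", stage="search")
        for piece in ("About ", "14 degrees."):
            await asyncio.sleep(0)  # hand control back to the event loop
            yield SimpleNamespace(output_type="streamed_tokens", delta=piece)


# asyncio.run starts an event loop and runs the coroutine to completion.
result = asyncio.run(collect_answer(FakeAsyncAgent(), "What was the average temperature?"))
print(result)
```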
Questions and feedback
If you have any questions or feedback, let us know in the user forum.
