# Collection export

Added in `v1.37`. This is a preview feature; the API may change in future releases.
Export collections from Weaviate to cloud storage in Apache Parquet format. Exports are point-in-time snapshots: writes that occur during an export do not affect the exported data. Only one export can run at a time per node.
The export feature is disabled by default. To use it:
- Enable the export API and configure a storage bucket.
- Configure cloud storage credentials for your backend (S3, GCS, or Azure).
- Create an export via the client or REST API.
## Environment variables

Set these environment variables to enable and configure exports:

| Environment variable | Default | Description |
|---|---|---|
| `EXPORT_ENABLED` | `false` | Enable the export API. |
| `EXPORT_DEFAULT_BUCKET` | (empty) | Storage bucket name. Required for the S3, GCS, and Azure backends. |
| `EXPORT_DEFAULT_PATH` | `""` | Optional base path prefix for exported files within the bucket. Defaults to an empty string (no prefix). Changed in `v1.37.1`: previously required to be explicitly set. |
| `EXPORT_PARALLELISM` | `0` (`GOMAXPROCS`) | Number of concurrent scan workers. |

All four variables are runtime-configurable and can be changed without restarting Weaviate.
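For example, in a Docker Compose deployment the variables above can be set alongside the backend credentials. A minimal sketch, assuming an S3 backend; the bucket name, prefix, parallelism, and credential values are placeholders:

```yaml
# docker-compose.yml (excerpt) — example values, adjust for your deployment
services:
  weaviate:
    environment:
      EXPORT_ENABLED: "true"
      EXPORT_DEFAULT_BUCKET: "my-export-bucket"  # use a dedicated bucket, not a backup bucket
      EXPORT_DEFAULT_PATH: "exports"             # optional path prefix within the bucket
      EXPORT_PARALLELISM: "4"                    # 0 = GOMAXPROCS
      # S3 credentials (same variables as backups)
      AWS_REGION: "us-east-1"
      AWS_ACCESS_KEY_ID: "..."
      AWS_SECRET_ACCESS_KEY: "..."
```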
Collection export is not enabled by default in Weaviate Cloud. If you want to enable it, contact us via email.
## Backend configuration
Exports support three cloud storage backends and the local filesystem. Each cloud storage backend uses the same credential environment variables as backups:
| Backend | Value | Credential env vars |
|---|---|---|
| Amazon S3 | `s3` | `AWS_REGION`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` |
| Google Cloud Storage | `gcs` | `GOOGLE_APPLICATION_CREDENTIALS` |
| Azure Blob Storage | `azure` | `AZURE_STORAGE_ACCOUNT` and `AZURE_STORAGE_KEY`, or `AZURE_STORAGE_CONNECTION_STRING` |
Do not export to backup buckets. Backup buckets may have immutability policies that cause export operations to fail. Use a dedicated bucket for exports.
## Create a collection export
Specify an export ID, backend, file format, and optionally which collections to include or exclude. If neither include nor exclude is specified, all collections are exported.
If a snippet doesn't work or you have feedback, please open a GitHub issue.
```python
import uuid

# ExportStorage, ExportFileFormat, and ExportStatus are provided by the
# Weaviate Python client; `client` is a connected Weaviate client instance.

# Export specific collections
result = client.export.create(
    export_id="my-export-include",
    backend=ExportStorage.FILESYSTEM,
    file_format=ExportFileFormat.PARQUET,
    include_collections=["Articles", "Products"],
    wait_for_completion=True,
)
print(result.status)       # ExportStatus.SUCCESS
print(result.collections)  # ['Articles', 'Products']

# Or exclude specific collections (exports everything else)
result = client.export.create(
    export_id="my-export-exclude",
    backend=ExportStorage.FILESYSTEM,
    file_format=ExportFileFormat.PARQUET,
    exclude_collections=["TempData"],
    wait_for_completion=True,
)

# Start an asynchronous export: omit wait_for_completion to return immediately
result = client.export.create(
    export_id="my-async-export-" + uuid.uuid4().hex[:8],
    backend=ExportStorage.FILESYSTEM,
    file_format=ExportFileFormat.PARQUET,
    include_collections=["Articles"],
)
print(result.status)  # ExportStatus.STARTED or ExportStatus.TRANSFERRING
```
## Request parameters

| Field | Required | Description |
|---|---|---|
| `id` | Yes | Unique export ID. Must match `^[a-z0-9_-]+$`, max 128 characters. |
| `file_format` | Yes | Output format. Currently only `parquet` is supported. |
| `include` | No | Collections to export. Cannot be used together with `exclude`. |
| `exclude` | No | Collections to exclude from the export. Cannot be used together with `include`. |
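The ID constraint above can be checked client-side before submitting a request. A minimal sketch; the `is_valid_export_id` helper is illustrative and not part of the client API:

```python
import re

# Export IDs must match ^[a-z0-9_-]+$ and be at most 128 characters long
_EXPORT_ID_RE = re.compile(r"^[a-z0-9_-]+$")

def is_valid_export_id(export_id: str) -> bool:
    """Return True if export_id satisfies the documented constraints."""
    return len(export_id) <= 128 and bool(_EXPORT_ID_RE.fullmatch(export_id))

print(is_valid_export_id("my-export_01"))  # True
print(is_valid_export_id("My Export!"))    # False: uppercase and spaces not allowed
print(is_valid_export_id("a" * 129))       # False: too long
```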
## Check collection export status
Exports run asynchronously. Poll the status endpoint to track progress.
```python
status = client.export.get_status(
    export_id=async_export_id,
    backend=ExportStorage.FILESYSTEM,
)
print(status.status)        # e.g. ExportStatus.TRANSFERRING
print(status.collections)   # ['Articles']
print(status.shard_status)  # Per-shard progress details
```
### Export states

| State | Description |
|---|---|
| `STARTED` | Export has been created and is initializing. |
| `TRANSFERRING` | Data is being written to cloud storage. |
| `SUCCESS` | Export completed successfully. |
| `FAILED` | Export failed. Check the shard status for details. |
| `CANCELED` | Export was canceled by the user. |
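Since exports run asynchronously, a caller typically polls until one of the terminal states (`SUCCESS`, `FAILED`, or `CANCELED`) is reached. A minimal generic polling sketch; `poll_until_done` and its `get_status` callable are illustrative helpers, not client API:

```python
import time

# Terminal export states from the table above
TERMINAL_STATES = {"SUCCESS", "FAILED", "CANCELED"}

def poll_until_done(get_status, interval_s=2.0, timeout_s=600.0):
    """Call get_status() until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = get_status()
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval_s)
    raise TimeoutError("export did not reach a terminal state in time")

# Usage with a stubbed status source that steps through the documented states
states = iter(["STARTED", "TRANSFERRING", "TRANSFERRING", "SUCCESS"])
print(poll_until_done(lambda: next(states), interval_s=0.01))  # SUCCESS
```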
### Shard states

Each shard within an export has its own status:

| State | Description |
|---|---|
| `TRANSFERRING` | Shard data is being written. |
| `SUCCESS` | Shard export completed. |
| `FAILED` | Shard export failed. |
| `SKIPPED` | Shard was skipped (e.g., an offloaded tenant). |
## Cancel a collection export
```python
client.export.cancel(
    export_id=cancel_id,
    backend=ExportStorage.FILESYSTEM,
)
```
## Output format

Exports produce Apache Parquet files with Zstd compression. Each file contains:

| Column | Type | Description |
|---|---|---|
| `id` | string | Object UUID |
| `creation_time` | int64 | Creation timestamp (nanoseconds) |
| `update_time` | int64 | Last update timestamp (nanoseconds) |
| `vector` | bytes | Primary vector (little-endian float32) |
| `named_vectors` | bytes | JSON-encoded named vectors |
| `multi_vectors` | bytes | JSON-encoded multi-vectors |
| `properties` | bytes | Raw JSON of object properties |
Files are named `{collection}_{shard}_{rangeIndex}.parquet`. Collection and tenant names are stored as Parquet file-level metadata.
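As a downstream example, the `vector` column's little-endian float32 bytes can be decoded with the standard library. A minimal sketch; the `decode_vector` helper is illustrative, not part of the export API:

```python
import struct

def decode_vector(raw: bytes) -> list[float]:
    """Decode a little-endian float32 byte string into a list of floats."""
    if len(raw) % 4 != 0:
        raise ValueError("vector byte length must be a multiple of 4")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Round-trip check with values exactly representable in float32
raw = struct.pack("<3f", 0.5, -1.0, 2.0)
print(decode_vector(raw))  # [0.5, -1.0, 2.0]
```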
## Multi-tenancy
| Tenant state | Behavior |
|---|---|
| HOT | Exported from live data. |
| COLD | Exported directly from disk without loading into memory (remains COLD). |
| OFFLOADED | Skipped. The skip reason is recorded in the shard status. |
The tenant list is snapshotted when the export is created — tenants created during the export are not included.
## Permissions

Exports use the backup permission `manage_backups` for RBAC authorization.
## Questions and feedback

If you have any questions or feedback, let us know in the user forum.
