# Collection export

Added in `v1.37`. This is a preview feature; the API may change in future releases.
Export collections from Weaviate to cloud storage in Apache Parquet format. Exports are point-in-time snapshots: writes that occur during an export do not affect the exported data. Only one export can run at a time per node.
The export feature is disabled by default. To use it:
- Enable the export API and configure a storage bucket.
- Configure cloud storage credentials for your backend (S3, GCS, or Azure).
- Create an export via the client or REST API.
## Environment variables

Set these environment variables to enable and configure exports:

| Environment variable | Default | Description |
|---|---|---|
| `EXPORT_ENABLED` | `false` | Enable the export API. |
| `EXPORT_DEFAULT_BUCKET` | (empty) | Storage bucket name. Required for the S3, GCS, and Azure backends. |
| `EXPORT_DEFAULT_PATH` | `""` | Optional base path prefix for exported files within the bucket. Defaults to an empty string (no prefix). Changed in `v1.37.1`: previously required to be explicitly set. |
| `EXPORT_PARALLELISM` | `0` (`GOMAXPROCS`) | Number of concurrent scan workers. |

All four variables are runtime-configurable and can be changed without restarting Weaviate.
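For example, in a Docker Compose deployment the variables above can be set alongside the backend credentials. A minimal sketch, assuming an S3 backend; the bucket name, prefix, parallelism, and credential values are placeholders:

```yaml
# docker-compose.yml (excerpt) — example values, adjust for your deployment
services:
  weaviate:
    environment:
      EXPORT_ENABLED: "true"
      EXPORT_DEFAULT_BUCKET: "my-export-bucket"  # use a dedicated bucket, not a backup bucket
      EXPORT_DEFAULT_PATH: "exports"             # optional path prefix within the bucket
      EXPORT_PARALLELISM: "4"                    # 0 = GOMAXPROCS
      # S3 credentials (same variables as backups)
      AWS_REGION: "us-east-1"
      AWS_ACCESS_KEY_ID: "..."
      AWS_SECRET_ACCESS_KEY: "..."
```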
Collection export is not enabled by default in Weaviate Cloud. If you want to enable it, contact us via email.
## Backend configuration
Exports support three cloud storage backends and the local filesystem. Each cloud storage backend uses the same credential environment variables as backups:
| Backend | Value | Credential env vars |
|---|---|---|
| Amazon S3 | `s3` | `AWS_REGION`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` |
| Google Cloud Storage | `gcs` | `GOOGLE_APPLICATION_CREDENTIALS` |
| Azure Blob Storage | `azure` | `AZURE_STORAGE_ACCOUNT` and `AZURE_STORAGE_KEY`, or `AZURE_STORAGE_CONNECTION_STRING` |
Do not export to backup buckets. Backup buckets may have immutability policies that cause export operations to fail. Use a dedicated bucket for exports.
## Create a collection export
Specify an export ID, backend, file format, and optionally which collections to include or exclude. If neither include nor exclude is specified, all collections are exported.
If a snippet doesn't work or you have feedback, please open a GitHub issue.
```python
import uuid

# ExportStorage, ExportFileFormat, and ExportStatus are provided by the
# Weaviate Python client; `client` is a connected Weaviate client instance.

# Export specific collections
result = client.export.create(
    export_id="my-export-include",
    backend=ExportStorage.FILESYSTEM,
    file_format=ExportFileFormat.PARQUET,
    include_collections=["Articles", "Products"],
    wait_for_completion=True,
)
print(result.status)       # ExportStatus.SUCCESS
print(result.collections)  # ['Articles', 'Products']

# Or exclude specific collections (exports everything else)
result = client.export.create(
    export_id="my-export-exclude",
    backend=ExportStorage.FILESYSTEM,
    file_format=ExportFileFormat.PARQUET,
    exclude_collections=["TempData"],
    wait_for_completion=True,
)

# Start an asynchronous export: omit wait_for_completion to return immediately
result = client.export.create(
    export_id="my-async-export-" + uuid.uuid4().hex[:8],
    backend=ExportStorage.FILESYSTEM,
    file_format=ExportFileFormat.PARQUET,
    include_collections=["Articles"],
)
print(result.status)  # ExportStatus.STARTED or ExportStatus.TRANSFERRING
```
## Request parameters

| Field | Required | Description |
|---|---|---|
| `id` | Yes | Unique export ID. Must match `^[a-z0-9_-]+$`, max 128 characters. |
| `file_format` | Yes | Output format. Currently only `parquet` is supported. |
| `include` | No | Collections to export. Cannot be used together with `exclude`. |
| `exclude` | No | Collections to exclude from the export. Cannot be used together with `include`. |
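The ID constraint above can be checked client-side before submitting a request. A minimal sketch; the `is_valid_export_id` helper is illustrative and not part of the client API:

```python
import re

# Export IDs must match ^[a-z0-9_-]+$ and be at most 128 characters long
_EXPORT_ID_RE = re.compile(r"^[a-z0-9_-]+$")

def is_valid_export_id(export_id: str) -> bool:
    """Return True if export_id satisfies the documented constraints."""
    return len(export_id) <= 128 and bool(_EXPORT_ID_RE.fullmatch(export_id))

print(is_valid_export_id("my-export_01"))  # True
print(is_valid_export_id("My Export!"))    # False: uppercase and spaces not allowed
print(is_valid_export_id("a" * 129))       # False: too long
```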
## Check collection export status
Exports run asynchronously. Poll the status endpoint to track progress.
```python
status = client.export.get_status(
    export_id=async_export_id,
    backend=ExportStorage.FILESYSTEM,
)
print(status.status)        # e.g. ExportStatus.TRANSFERRING
print(status.collections)   # ['Articles']
print(status.shard_status)  # Per-shard progress details
```
### Export states

| State | Description |
|---|---|
| `STARTED` | Export has been created and is initializing. |
| `TRANSFERRING` | Data is being written to cloud storage. |
| `SUCCESS` | Export completed successfully. |
| `FAILED` | Export failed. Check the shard status for details. |
| `CANCELED` | Export was canceled by the user. |
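Since exports run asynchronously, a caller typically polls until one of the terminal states (`SUCCESS`, `FAILED`, or `CANCELED`) is reached. A minimal generic polling sketch; `poll_until_done` and its `get_status` callable are illustrative helpers, not client API:

```python
import time

# Terminal export states from the table above
TERMINAL_STATES = {"SUCCESS", "FAILED", "CANCELED"}

def poll_until_done(get_status, interval_s=2.0, timeout_s=600.0):
    """Call get_status() until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = get_status()
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval_s)
    raise TimeoutError("export did not reach a terminal state in time")

# Usage with a stubbed status source that steps through the documented states
states = iter(["STARTED", "TRANSFERRING", "TRANSFERRING", "SUCCESS"])
print(poll_until_done(lambda: next(states), interval_s=0.01))  # SUCCESS
```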
### Shard states

Each shard within an export has its own status:

| State | Description |
|---|---|
| `TRANSFERRING` | Shard data is being written. |
| `SUCCESS` | Shard export completed. |
| `FAILED` | Shard export failed. |
| `SKIPPED` | Shard was skipped (e.g., an offloaded tenant). |
## Cancel a collection export
```python
client.export.cancel(
    export_id=cancel_id,
    backend=ExportStorage.FILESYSTEM,
)
```
## Output format

Exports produce Apache Parquet files with Zstd compression. Each file contains:

| Column | Type | Description |
|---|---|---|
| `id` | string | Object UUID |
| `creation_time` | int64 | Creation timestamp (nanoseconds) |
| `update_time` | int64 | Last update timestamp (nanoseconds) |
| `vector` | bytes | Primary vector (little-endian float32) |
| `named_vectors` | bytes | JSON-encoded named vectors |
| `multi_vectors` | bytes | JSON-encoded multi-vectors |
| `properties` | bytes | Raw JSON of object properties |
Files are named `{collection}_{shard}_{rangeIndex}.parquet`. Collection and tenant names are stored as Parquet file-level metadata.
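As a downstream example, the `vector` column's little-endian float32 bytes can be decoded with the standard library. A minimal sketch; the `decode_vector` helper is illustrative, not part of the export API:

```python
import struct

def decode_vector(raw: bytes) -> list[float]:
    """Decode a little-endian float32 byte string into a list of floats."""
    if len(raw) % 4 != 0:
        raise ValueError("vector byte length must be a multiple of 4")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Round-trip check with values exactly representable in float32
raw = struct.pack("<3f", 0.5, -1.0, 2.0)
print(decode_vector(raw))  # [0.5, -1.0, 2.0]
```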
## Multi-tenancy
| Tenant state | Behavior |
|---|---|
| HOT | Exported from live data. |
| COLD | Exported directly from disk without loading into memory (remains COLD). |
| OFFLOADED | Skipped. The skip reason is recorded in the shard status. |
The tenant list is snapshotted when the export is created — tenants created during the export are not included.
## Permissions

Exports use the backup permission `manage_backups` for RBAC authorization.
## Questions and feedback

If you have any questions or feedback, let us know in the user forum.
