Collection export

Preview — added in v1.37

This is a preview feature. The API may change in future releases.

Export collections from Weaviate to cloud storage in Apache Parquet format. Exports are point-in-time snapshots: writes that occur during an export do not affect the exported data. Each node can run only one export at a time.

The export feature is disabled by default. To use it:

  1. Enable the export API and configure a storage bucket.
  2. Configure cloud storage credentials for your backend (S3, GCS, or Azure).
  3. Create an export via the client or REST API.

Environment variables

Set these environment variables to enable and configure exports:

| Environment variable | Default | Description |
| --- | --- | --- |
| EXPORT_ENABLED | false | Enable the export API. |
| EXPORT_DEFAULT_BUCKET | (empty) | Storage bucket name. Required for S3, GCS, and Azure backends. |
| EXPORT_DEFAULT_PATH | "" | Optional base path prefix for exported files within the bucket. Defaults to an empty string (no prefix). Changed in v1.37.1: previously required to be explicitly set. |
| EXPORT_PARALLELISM | 0 (GOMAXPROCS) | Number of concurrent scan workers. |

All four variables are runtime-configurable and can be changed without restarting Weaviate.
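
As a quick preflight check, the settings in the table above can be validated before Weaviate starts. The following is a minimal sketch and not part of Weaviate: the helper name `check_export_env` is hypothetical, and the set of strings treated as "true" for EXPORT_ENABLED is an assumption about how the flag is parsed.

```python
import os

# Assumption: these spellings count as "enabled"; Weaviate's own parsing may differ.
TRUTHY = {"true", "1", "on", "enabled"}


def check_export_env(env):
    """Return a list of human-readable problems with the export settings in `env`."""
    problems = []
    if env.get("EXPORT_ENABLED", "false").strip().lower() not in TRUTHY:
        problems.append("EXPORT_ENABLED is not set to true; the export API stays disabled.")
    if not env.get("EXPORT_DEFAULT_BUCKET", "").strip():
        problems.append("EXPORT_DEFAULT_BUCKET is empty; S3, GCS, and Azure backends need it.")
    try:
        parallelism_ok = int(env.get("EXPORT_PARALLELISM", "0")) >= 0
    except ValueError:
        parallelism_ok = False
    if not parallelism_ok:
        problems.append("EXPORT_PARALLELISM must be a non-negative integer (0 means GOMAXPROCS).")
    return problems


if __name__ == "__main__":
    # Check the environment of the current process and print any findings.
    for problem in check_export_env(os.environ):
        print("warning:", problem)
```

The function takes a plain mapping rather than reading `os.environ` directly, so the same check can be run against a parsed configuration file.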

Weaviate Cloud

Collection export is not enabled by default in Weaviate Cloud. If you want to enable it, contact us via email.

Backend configuration

Exports support three cloud storage backends and the local filesystem. Each cloud storage backend uses the same credential environment variables as backups:

| Backend | Value | Credential env vars |
| --- | --- | --- |
| Amazon S3 | s3 | AWS_REGION, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY |
| Google Cloud Storage | gcs | GOOGLE_APPLICATION_CREDENTIALS |
| Azure Blob Storage | azure | AZURE_STORAGE_ACCOUNT, AZURE_STORAGE_KEY or AZURE_STORAGE_CONNECTION_STRING |

Use a separate bucket for exports

Do not export to backup buckets. Backup buckets may have immutability policies that cause export operations to fail. Use a dedicated bucket for exports.
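
To see which credential variables are still missing for a given backend, a small lookup based on the backend table can help. This is a sketch under stated assumptions: `missing_credentials` is a hypothetical helper, and the Azure entry assumes the "or" in the table means either an account plus key, or a connection string. Deployments that authenticate another way, such as an attached cloud identity, may legitimately have none of these set.

```python
# Credential variables per backend, taken from the backend table.
# Each inner tuple is one complete set; any one complete set is enough.
CREDENTIAL_SETS = {
    "s3": [("AWS_REGION", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")],
    "gcs": [("GOOGLE_APPLICATION_CREDENTIALS",)],
    # Assumption: account + key, or a connection string on its own.
    "azure": [
        ("AZURE_STORAGE_ACCOUNT", "AZURE_STORAGE_KEY"),
        ("AZURE_STORAGE_CONNECTION_STRING",),
    ],
}


def missing_credentials(backend, env):
    """Return the variables still missing for `backend`, or [] if one set is complete."""
    if backend not in CREDENTIAL_SETS:
        raise ValueError(f"unknown backend: {backend!r}")
    best = None
    for names in CREDENTIAL_SETS[backend]:
        missing = [name for name in names if not env.get(name)]
        if not missing:
            return []
        # Remember the set that is closest to being complete.
        if best is None or len(missing) < len(best):
            best = missing
    return best
```

When several credential sets are possible, the function reports the one with the fewest gaps, which is usually the one the operator intended to use.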

Create a collection export

Specify an export ID, backend, file format, and optionally which collections to include or exclude. If neither include nor exclude is specified, all collections are exported.

Code snippets in the documentation reflect the latest client library and Weaviate Database version. Check the Release notes for specific versions. If a snippet doesn't work or you have feedback, please open a GitHub issue.

import uuid

# Assumes `client` is a connected Weaviate client and that ExportStorage and
# ExportFileFormat have been imported from the Weaviate Python client.

# Export specific collections
result = client.export.create(
    export_id="my-export-include",
    backend=ExportStorage.FILESYSTEM,
    file_format=ExportFileFormat.PARQUET,
    include_collections=["Articles", "Products"],
    wait_for_completion=True,
)

print(result.status)  # ExportStatus.SUCCESS
print(result.collections)  # ['Articles', 'Products']

# Or exclude specific collections (exports everything else)
result = client.export.create(
    export_id="my-export-exclude",
    backend=ExportStorage.FILESYSTEM,
    file_format=ExportFileFormat.PARQUET,
    exclude_collections=["TempData"],
    wait_for_completion=True,
)

# Start an export without waiting for it to complete
result = client.export.create(
    export_id="my-async-export-" + uuid.uuid4().hex[:8],
    backend=ExportStorage.FILESYSTEM,
    file_format=ExportFileFormat.PARQUET,
    include_collections=["Articles"],
)

print(result.status)  # ExportStatus.STARTED or ExportStatus.TRANSFERRING

Request parameters

| Field | Required | Description |
| --- | --- | --- |
| id | Yes | Unique export ID. Must match ^[a-z0-9_-]+$, max 128 characters. |
| file_format | Yes | Output format. Currently only parquet is supported. |
| include | No | Collections to export. Cannot be used together with exclude. |
| exclude | No | Collections to exclude from export. Cannot be used together with include. |
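
The constraints in the request parameters table can be checked on the client side before a request is sent. The sketch below is illustrative only: `validate_export_request` is a hypothetical helper, not a function of the Weaviate client, and the server remains the authority on what it accepts.

```python
import re

# Pattern and length limit come from the request parameters table.
EXPORT_ID_PATTERN = re.compile(r"[a-z0-9_-]+")
MAX_EXPORT_ID_LENGTH = 128


def validate_export_request(export_id, include=None, exclude=None):
    """Raise ValueError if the request would break the documented constraints."""
    if len(export_id) > MAX_EXPORT_ID_LENGTH:
        raise ValueError("export ID is longer than 128 characters")
    # fullmatch requires the whole string to match, so a trailing newline,
    # an uppercase letter, or an empty ID is rejected.
    if not EXPORT_ID_PATTERN.fullmatch(export_id):
        raise ValueError("export ID may only contain a-z, 0-9, '_' and '-'")
    if include and exclude:
        raise ValueError("include and exclude cannot be used together")
```

For example, `validate_export_request("my-export_01", include=["Articles"])` returns without error, while an ID such as `"My-Export"` raises a `ValueError` because of the uppercase letters.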

Check collection export status

Exports run asynchronously. Poll the status endpoint to track progress.

# async_export_id holds the ID of a previously started export
status = client.export.get_status(
    export_id=async_export_id,
    backend=ExportStorage.FILESYSTEM,
)

print(status.status)  # e.g. ExportStatus.TRANSFERRING
print(status.collections)  # ['Articles']
print(status.shard_status)  # Per-shard progress details

Export states

| State | Description |
| --- | --- |
| STARTED | Export has been created and is initializing. |
| TRANSFERRING | Data is being written to cloud storage. |
| SUCCESS | Export completed successfully. |
| FAILED | Export failed. Check shard status for details. |
| CANCELED | Export was canceled by the user. |
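
Because STARTED and TRANSFERRING are the only non-terminal states, polling can stop as soon as any other state is seen. The loop below is a minimal sketch of that idea. `wait_for_export` and its parameters are hypothetical, and it takes a callable that returns the state name as a string, so it does not depend on the exact type the client returns.

```python
import time

# Terminal states from the export states table; the others mean "still running".
TERMINAL_STATES = {"SUCCESS", "FAILED", "CANCELED"}


def wait_for_export(get_state, poll_interval=2.0, timeout=600.0, sleep=time.sleep):
    """Call `get_state()` until it returns a terminal state name, then return it.

    `get_state` is any zero-argument callable returning the state as a string,
    for example a small wrapper around the status call shown above.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"export still in state {state!r} after {timeout} seconds")
        sleep(poll_interval)
```

How the value of `status.status` maps to these names depends on the client's `ExportStatus` type, so the wrapper that converts it to a string is left to the caller. The `sleep` parameter exists only so that the loop can be exercised in tests without real delays.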

Shard states

Each shard within an export has its own status:

| State | Description |
| --- | --- |
| TRANSFERRING | Shard data is being written. |
| SUCCESS | Shard export completed. |
| FAILED | Shard export failed. |
| SKIPPED | Shard was skipped (e.g., offloaded tenant). |

Cancel a collection export

client.export.cancel(
    export_id=cancel_id,
    backend=ExportStorage.FILESYSTEM,
)

Output format

Exports produce Apache Parquet files with Zstd compression. Each file contains:

| Column | Type | Description |
| --- | --- | --- |
| id | string | Object UUID |
| creation_time | int64 | Creation timestamp (nanoseconds) |
| update_time | int64 | Last update timestamp (nanoseconds) |
| vector | bytes | Primary vector (little-endian float32) |
| named_vectors | bytes | JSON-encoded named vectors |
| multi_vectors | bytes | JSON-encoded multi-vectors |
| properties | bytes | Raw JSON of object properties |

Files are named {collection}_{shard}_{rangeIndex}.parquet. Collection and tenant names are stored as Parquet file-level metadata.
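
The `vector` and `properties` columns hold raw bytes, so they need decoding after the Parquet file is read. The helpers below sketch that step using only the standard library. The function names are hypothetical, and treating an empty `properties` cell as an empty object is an assumption. Reading the rows themselves requires a Parquet library such as pyarrow, which is not shown here.

```python
import json
import struct


def decode_vector(raw):
    """Decode a `vector` cell: packed little-endian float32 values."""
    if len(raw) % 4 != 0:
        raise ValueError("vector byte length must be a multiple of 4")
    count = len(raw) // 4
    # "<" selects little-endian byte order, "f" a 4-byte float.
    return list(struct.unpack(f"<{count}f", raw))


def decode_properties(raw):
    """Decode a `properties` cell: raw JSON bytes of the object's properties."""
    # Assumption: an object without properties may be stored as empty bytes.
    return json.loads(raw) if raw else {}
```

The length of the decoded list gives the vector dimension, since each float32 value occupies exactly four bytes.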

Multi-tenancy

| Tenant state | Behavior |
| --- | --- |
| HOT | Exported from live data. |
| COLD | Exported directly from disk without loading into memory (remains COLD). |
| OFFLOADED | Skipped. The skip reason is recorded in the shard status. |

The tenant list is snapshotted when the export is created, so tenants created during the export are not included.

Permissions

Export uses the backups permission manage_backups for RBAC authorization.
