Known issues
This page documents significant known issues in Weaviate, their symptoms, and recommended resolutions. Use the table below to find issues that may be affecting you and to check their status.
Quick reference
Issue | Affected Versions | Resolution | Fixed In |
---|---|---|---|
Empty collections panic | 1.28 - 1.31 | In Progress | - |
Database restoration blocked | 1.27.23-26, 1.28.14-15, 1.29.5-7, 1.30.3 | Fixed | 1.27.27, 1.28.16, 1.29.8, 1.30.4 |
RAFT snapshot compatibility on downgrade | 1.28.13+, 1.29.5+, 1.30.2+ (when downgrading to 1.27.25 or earlier) | Fixed | 1.27.26 |
RAFT bootstrap timeout | 1.25 - 1.28 | Workaround | - |
RAFT timeouts under heavy load | 1.25 and later | Configuration | - |
Invalid port 99999999 error | 1.25 - 1.28 | By Design | - |
Context deadline on tenant deletion | 1.26 - 1.28 | Fixed | 1.26.14, 1.27.11, 1.28.5 |
Memory pressure: Shard init failure | All | Configuration | - |
RAFT snapshot cannot be created | 1.25 - 1.28 | Workaround | - |
Failed to decode incoming command | 1.25 - 1.29 | Configuration | - |
If you can't find the issue you are experiencing, you can always open a new one.
Known issues in detail
Database restoration blocked
- Affected versions: 1.27.23-26, 1.28.14-15, 1.29.5-7, 1.30.3
- Resolution: Fixed in 1.27.27, 1.28.16, 1.29.8, 1.30.4
Symptoms
- Nodes fail to start and remain stuck in initialization indefinitely
- Repeated log messages:
waiting for database to be restored
- May show schema update errors during RAFT command replay:
cmd_class":"Content_par_test","cmd_type":2,"cmd_type_name":"TYPE_UPDATE_CLASS","error":"updating schema: TYPE_UPDATE_CLASS: bad request :parse class update: property \"content\": property fields other than description cannot be updated through updating the class. Use the add property feature (e.g. \"POST /v1/schema/{className}/properties\")
Root cause
A regression introduced in schema catch-up handling caused invalid RAFT commands to block database initialization. The system continuously retried these invalid commands, preventing the database from being marked as ready.
Resolution
Upgrade to the fixed version:
- If on 1.27.x → upgrade to 1.27.27 or higher
- If on 1.28.x → upgrade to 1.28.16 or higher
- If on 1.29.x → upgrade to 1.29.8 or higher
- If on 1.30.x → upgrade to 1.30.4 or higher
RAFT snapshot compatibility on downgrade
- Affected versions: 1.28.13+, 1.29.5+, 1.30.2+ (when downgrading to 1.27.25 or earlier)
- Resolution: Fixed in 1.27.26
Symptoms
After downgrading from 1.28+ to older 1.27.x versions:
- RAFT snapshots fail to load
- Cluster cannot reach Ready state
- Node initialization failures
Root cause
RAFT snapshot format changes introduced in 1.28.13, 1.29.5, and 1.30.2 are not backward compatible with 1.27 releases prior to 1.27.26.
Resolution
When downgrading from 1.28.13+, 1.29.5+, or 1.30.2+:
- Ensure you downgrade to version 1.27.26 or higher
- Do not downgrade to 1.27.25 or earlier
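The downgrade floor above can be enforced with a small version check before rolling back. This is an illustrative sketch only: the check_downgrade helper and its output format are hypothetical, not part of any Weaviate tooling.

```shell
# Illustrative sketch: guard a rollback target against the 1.27.26 floor.
# check_downgrade is a hypothetical helper, not part of Weaviate tooling.
min="1.27.26"
check_downgrade() {
  target="$1"
  # sort -V orders version strings numerically; the first line is the older one
  lowest=$(printf '%s\n%s\n' "$min" "$target" | sort -V | head -n1)
  if [ "$lowest" = "$min" ]; then
    echo "OK: $target is a safe downgrade target (>= $min)"
  else
    echo "BLOCKED: $target is older than $min"
  fi
}
check_downgrade 1.27.27   # -> OK
check_downgrade 1.27.25   # -> BLOCKED
```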
RAFT bootstrap timeout
- Affected versions: 1.25, 1.26, 1.27, 1.28
- Resolution: Workaround available
Symptoms
- Nodes stuck in a crash loop with consistent restart intervals
- Logs show schema catch-up in progress but never completing:
Schema catching up: applying log entry: [X/Y]
- Startup probe failures in Kubernetes
Root cause
During cluster initialization or node recovery, applying accumulated RAFT log entries (especially with large schemas or many collections) may exceed the default 600-second bootstrap timeout.
Resolution
Increase bootstrap timeout and startup probe:
env:
  - name: RAFT_BOOTSTRAP_TIMEOUT
    value: "900" # 15 minutes
startupProbe:
  failureThreshold: 90 # 90 * 10 = 900 seconds
  periodSeconds: 10
  httpGet:
    path: /v1/.well-known/ready
    port: 8080
Calculate appropriate timeout:
- Small clusters (< 10 collections): 600s default usually sufficient
- Medium clusters (10-100 collections): 900-1800s
- Large clusters (100+ collections): 1800-3600s
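The sizing guidance above can be sketched as a small helper. The thresholds and the choice of upper bounds come from this table, not from a formula Weaviate itself uses:

```shell
# Rough helper encoding the sizing guidance above (upper bounds chosen);
# the thresholds are documentation guidance, not a Weaviate formula.
suggest_bootstrap_timeout() {
  collections="$1"
  if [ "$collections" -lt 10 ]; then
    echo 600    # small: default usually sufficient
  elif [ "$collections" -le 100 ]; then
    echo 1800   # medium: upper end of 900-1800s
  else
    echo 3600   # large: upper end of 1800-3600s
  fi
}
suggest_bootstrap_timeout 50   # -> 1800
```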
Note: Versions 1.27.4+, 1.26.11+, and 1.25.26+ include optimizations that reduce schema rebuild time during catch-up.
Prevention
- Monitor cluster scale and adjust timeouts proactively
- Use WCS tool:
wcs startup-count <failure-threshold> --apply; wcs sync
RAFT timeouts under heavy load
- Affected versions: 1.25 and later
- Resolution: Configuration available (default improved in 1.31+)
Symptoms
- Frequent leader elections and cluster instability
- Logs showing timeout errors:
heartbeat timeout reached, starting election
Election timeout reached, restarting election
memberlist: Failed fallback TCP ping: timeout 1s: read tcp [...]: i/o timeout
- High CPU usage, memory pressure, or goroutine counts in monitoring
- Performance degradation during normal operations
Root cause
Under heavy load or network latency, nodes cannot respond to RAFT heartbeats and memberlist pings within default timeout windows. This causes false failure detection, unnecessary leader elections, and cascading performance issues.
The issue is typically a symptom of underlying resource pressure rather than a RAFT problem itself.
Resolution
1. Adjust RAFT timeout multiplier:
# Production (default in 1.31+)
RAFT_TIMEOUTS_MULTIPLIER=5
# High-latency networks
RAFT_TIMEOUTS_MULTIPLIER=10
# Heavily loaded or unstable environments
RAFT_TIMEOUTS_MULTIPLIER=15
This multiplies all timeout values:
- Heartbeat timeout: 1s → 5s (with multiplier of 5)
- Election timeout: 1s → 5s
- Leader lease timeout: 0.5s → 2.5s
- Memberlist TCP timeout: 10s → 50s
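The arithmetic behind the list above can be checked for any multiplier value; this sketch just multiplies the base timeouts listed here:

```shell
# Sketch: compute the effective timeouts for a given multiplier value,
# using the base values listed above.
multiplier=5
awk -v m="$multiplier" 'BEGIN {
  printf "heartbeat timeout:    %gs\n", 1   * m
  printf "election timeout:     %gs\n", 1   * m
  printf "leader lease timeout: %gs\n", 0.5 * m
  printf "memberlist TCP:       %gs\n", 10  * m
}'
```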
2. Investigate root cause:
Check for underlying issues:
- Memory pressure or OOM events
- CPU saturation
- Network latency or packet loss
- Too many collections causing Go scheduler pressure
3. If too many collections:
Reduce Go scheduler load:
GOMAXPROCS=<value less than available CPUs>
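One way to derive a value below the available CPU count is sketched here. Reserving two cores is an assumed heuristic for illustration, not an official recommendation:

```shell
# Sketch: pick a GOMAXPROCS value below the CPU count. Reserving two
# cores is one possible heuristic, not an official recommendation.
cpus=$(nproc)
if [ "$cpus" -gt 2 ]; then
  gomaxprocs=$((cpus - 2))
else
  gomaxprocs=1
fi
echo "GOMAXPROCS=$gomaxprocs (of $cpus CPUs)"
```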
Best practices
- Start with default multiplier (5) for most environments
- Increase gradually if seeing frequent elections
- Monitor cluster stability after changes
- Address underlying resource issues rather than only masking with higher timeouts
Invalid port 99999999
- Affected versions: 1.25, 1.26, 1.27, 1.28
- Resolution: By design (not a bug)
Symptoms
Error message in logs:
dial tcp: address 99999999: invalid port
Often accompanied by memberlist instability messages:
memberlist: Suspect weaviate-0 has failed, no acks received
memberlist: Marking weaviate-0 as failed, suspect timeout reached
Root cause
This is not a RAFT problem but a symptom of memberlist instability. The invalid port 99999999 is intentionally returned to prevent RAFT from communicating with nodes that are not part of the memberlist, which prevents cross-talk issues where RAFT might contact old IP addresses from previous cluster configurations.
The underlying cause is typically:
- Too many collections causing Go scheduler slowdown and network I/O delays
- Network connectivity issues preventing memberlist health checks
Resolution
Address underlying causes:
- If too many collections:
GOMAXPROCS=<value less than available CPUs>
- If network issues:
- Check connectivity between all cluster nodes
- Review network policies and firewall rules
- Verify DNS resolution
Context deadline on tenant deletion
- Affected versions: 1.26, 1.27, 1.28
- Resolution: Fixed in 1.26.14, 1.27.11, 1.28.5
Symptoms
Tenant deletion fails with timeout errors:
context deadline exceeded
session: fetching region failed: RequestCanceled: request context canceled
caused by: context deadline exceeded
Occurs only when the tenant offloading module (offload-s3) is enabled.
Root cause
When tenant offloading is enabled and AWS credentials are misconfigured, the deletion process attempts to delete cloud resources but times out waiting for AWS responses.
Resolution
Temporary workaround (if upgrade not immediately possible):
Option 1: Disable tenant offloading
# Remove or disable tenant offloading module configuration
Option 2: Correct AWS credentials
# Provide valid AWS credentials for tenant offloading
AWS_ACCESS_KEY_ID=<valid_key>
AWS_SECRET_ACCESS_KEY=<valid_secret>
Permanent fix: Upgrade to fixed version:
- 1.26.x → 1.26.14 or higher
- 1.27.x → 1.27.11 or higher
- 1.28.x → 1.28.5 or higher
Memory pressure: Shard init failure
- Affected versions: All versions
- Resolution: Configuration required
Symptoms
- Shard initialization failures during tenant activation
- Errors in logs:
memory pressure: cannot init shard: not enough memory mappings
broadcast: cannot reach enough replicas
- Tenant activation failures
Root cause
The system has reached the operating system limit for memory-mapped files (vm.max_map_count). Each shard requires multiple memory mappings, and the default OS limit may be insufficient for large multi-tenant deployments.
Resolution
Increase the system memory mapping limit:
# Check current value
sysctl vm.max_map_count
# Increase to 3-4x current value
# Example: 2097152 → 8388608
sysctl -w vm.max_map_count=8388608
Make the change persistent:
# Add to /etc/sysctl.conf
echo "vm.max_map_count=8388608" >> /etc/sysctl.conf
Restart affected pods to apply the new configuration.
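To confirm the new limit is in effect on the host, the value can be read back directly from procfs. This is a sketch; the 8388608 target mirrors the example above and should be replaced with your own value:

```shell
# Sketch: confirm the limit is active on the host. Reads procfs directly,
# so it works even where the sysctl binary is not installed.
target=8388608   # same example value as above; substitute your own
current=$(cat /proc/sys/vm/max_map_count)
if [ "$current" -ge "$target" ]; then
  echo "OK: vm.max_map_count=$current (>= $target)"
else
  echo "LOW: vm.max_map_count=$current (< $target)"
fi
```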
Empty collections panic
- Affected versions: 1.28, 1.29, 1.30, 1.31
- Resolution: In progress
Symptoms
Single-node clusters panic on startup with:
Recovered from panic: assignment to entry in nil map
[...stack trace...]
github.com/weaviate/weaviate/cluster/schema.(*schema).addClass
Root cause
RAFT snapshots with no collections (previously called classes) cause a nil map assignment during restoration due to JSON unmarshaler omitempty behavior. This edge case occurs when snapshots are created before any collections are added.
Resolution
Option 1: Upgrade (when available)
Upgrade to a patched version containing the fix once it is released.
Option 2: Remove empty snapshot
Identify and remove the problematic snapshot:
# Navigate to RAFT directory
cd raft/snapshots/
# Find snapshot with empty classes
# Look for state.bin containing: {"node_id":"...","snapshot_id":"...","classes":{}}
# Remove the empty snapshot directory
rm -rf <snapshot-directory>
Example structure:
raft/
├── db_users/
├── raft.db
└── snapshots/
├── 4-4-1727456146194/ # Valid snapshot
└── 5-6-1728681332462/ # Empty snapshot - remove this
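The manual search above can be automated with a grep over each snapshot's state.bin. This sketch builds a throwaway fixture under a temp directory so the pattern can be demonstrated end to end; point SNAP_DIR at your real raft/snapshots/ directory instead:

```shell
# Sketch: find snapshot directories whose state.bin holds an empty
# "classes" map. The fixture below is demo data only; in practice set
# SNAP_DIR to your real raft/snapshots/ directory.
SNAP_DIR=$(mktemp -d)
mkdir -p "$SNAP_DIR/4-4-1727456146194" "$SNAP_DIR/5-6-1728681332462"
echo '{"node_id":"n1","snapshot_id":"4-4","classes":{"Article":{}}}' \
  > "$SNAP_DIR/4-4-1727456146194/state.bin"
echo '{"node_id":"n1","snapshot_id":"5-6","classes":{}}' \
  > "$SNAP_DIR/5-6-1728681332462/state.bin"
# -l prints only the names of matching files
grep -l '"classes":{}' "$SNAP_DIR"/*/state.bin
```

Only the empty snapshot's state.bin is listed; its parent directory is the one to remove.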
Prevention
This issue should not occur in normal operations. It typically happens only if a snapshot is created before any schema is defined.
RAFT snapshot cannot be created
- Affected versions: 1.25, 1.26, 1.27, 1.28
- Resolution: Workaround available
Symptoms
Node stuck during bootstrap with error messages indicating that the snapshot threshold has been reached but a snapshot cannot be created. This typically occurs only during initial cluster setup with rapid configuration changes.
Root cause
During bootstrap, many configuration changes in short succession increase RAFT log size and trigger snapshot threshold before the node has fully initialized. The cluster becomes stuck because:
- It cannot create a snapshot (requires bootstrap completion)
- It cannot apply new configurations (requires snapshot first)
This should be rare in normal operations.
Resolution
Temporarily increase snapshot thresholds:
RAFT_SNAPSHOT_INTERVAL=600 # seconds (default: 120)
RAFT_SNAPSHOT_THRESHOLD=24576 # entries (default: 8192)
This allows the node to apply all RAFT log entries before triggering snapshot creation.
After node reports healthy:
- Remove the custom configuration
- Restart the node to return to defaults
Prevention
- Avoid making many rapid configuration changes during initial cluster bootstrap
- Stage large schema deployments rather than applying all at once
Failed to decode incoming command
- Affected versions: 1.25, 1.26, 1.27, 1.28, 1.29
- Resolution: Configuration
Symptoms
Log entries showing:
failed to decode incoming command
error: unknown rpc type 71
remote-address: 10.0.104.114:42128
Note: 71 is the ASCII code for 'G' (the first byte of GET), and 80 is 'P' (the first byte of POST)
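The byte-to-character mapping can be checked directly, which makes it easy to identify other stray rpc type values as HTTP verbs:

```shell
# Sketch: map the reported rpc type bytes back to their ASCII characters.
awk 'BEGIN { printf "71 -> %c, 80 -> %c\n", 71, 80 }'   # -> 71 -> G, 80 -> P
```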
Root cause
HTTP requests being sent to RAFT's internal TCP endpoint (port 8300). This commonly occurs when Prometheus or other monitoring tools auto-discover and attempt to scrape all open ports, including internal RAFT ports.
Resolution
Configure monitoring to exclude RAFT ports:
Update Prometheus scrape configuration to skip internal cluster ports:
- Port 7000: Memberlist
- Port 7100-7103: Memberlist gossip
- Port 8300: RAFT
For Prometheus Operator:
additionalScrapeConfigs:
  - job_name: "weaviate"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        regex: "(7000|7100|7101|7102|7103|8300)"
        action: drop
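The relabel pattern can be sanity-checked locally before deploying. Prometheus anchors relabel regexes implicitly, so the explicit ^...$ in this sketch reproduces that behavior:

```shell
# Sketch: check which container ports the relabel regex would drop.
# Prometheus anchors relabel regexes implicitly; ^...$ reproduces that.
pattern='^(7000|7100|7101|7102|7103|8300)$'
for port in 8080 8300 7000 50051; do
  if echo "$port" | grep -Eq "$pattern"; then
    echo "$port: drop"
  else
    echo "$port: keep"
  fi
done
```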
The log message itself is informational only; the rejected requests do not impact cluster functionality.
Getting help
If you encounter an issue not listed here:
- Search GitHub Issues
- Ask in the Weaviate Community Forum
If you can't find an existing issue, please open a new one. Try to include the following information:
- Weaviate version
- Deployment environment (cloud, on-prem, Kubernetes, etc.)
- Relevant log excerpts
- Steps to reproduce
- Impact on your workload