aboutsummaryrefslogtreecommitdiff
path: root/k8s
diff options
context:
space:
mode:
authorChris Lu <chrislusf@users.noreply.github.com>2025-12-04 16:05:06 -0800
committerGitHub <noreply@github.com>2025-12-04 16:05:06 -0800
commitf9b4a4c396d42b749f29c07d3c1dec0d2a18aaed (patch)
treeff0ea88160a4adabda9616fc7d72472851c9e21b /k8s
parentfdb888729b66c8deeed28cbe92767afa4f5a0207 (diff)
downloadseaweedfs-f9b4a4c396d42b749f29c07d3c1dec0d2a18aaed.tar.xz
seaweedfs-f9b4a4c396d42b749f29c07d3c1dec0d2a18aaed.zip
fix: check freeEcSlot before evacuating EC shards to prevent data loss (#7621)
* fix: check freeEcSlot before evacuating EC shards to prevent data loss Related to #7619 The moveAwayOneEcVolume function was missing the freeEcSlot check that exists in other EC shard placement functions. This could cause EC shards to be moved to volume servers that have no capacity, resulting in: 1. 0-byte shard files when disk is full 2. Data loss when source shards are deleted after 'successful' copy Changes: - Add freeEcSlot check before attempting to move EC shards - Sort destinations by both shard count and free slots - Refresh topology during evacuation to get updated slot counts - Log when nodes are skipped due to no free slots - Update freeEcSlot count after successful moves * fix: clarify comment wording per CodeRabbit review The comment stated 'after each move' but the code executes before calling moveAwayOneEcVolume. Updated to 'before moving each EC volume' for accuracy. * fix: collect topology once and track capacity changes locally Remove the topology refresh within the loop as it gives a false sense of correctness - the refreshed topology could still be stale (minutes old). Instead, we: 1. Collect topology once at the start 2. Track capacity changes ourselves via freeEcSlot decrement after each move This is more accurate because we know exactly what moves we've made, rather than relying on potentially stale topology refreshes. * fix: ensure partial EC volume moves are reported as failures Set hasMoved=false when a shard fails to move, even if previous shards succeeded. This prevents the caller from incorrectly assuming the entire volume was evacuated, which could lead to data loss if the source server is decommissioned based on this incorrect status. * fix: also reset hasMoved on moveMountedShardToEcNode error Same issue as the previous fix: if moveMountedShardToEcNode fails after some shards succeeded, hasMoved would incorrectly be true. Ensure partial moves are always reported as failures.
Diffstat (limited to 'k8s')
0 files changed, 0 insertions, 0 deletions