path: root/weed/shell/command_ec_common.go
Age | Commit message | Author | Files | Lines (-/+)
2025-12-02 | Add disk-aware EC rebalancing (#7597) | Chris Lu | 1 file | -12/+178
* Add placement package for EC shard placement logic
  - Consolidate EC shard placement algorithm for reuse across shell and worker tasks
  - Support multi-pass selection: racks, then servers, then disks
  - Include proper spread verification and scoring functions
  - Comprehensive test coverage for various cluster topologies
* Make ec.balance disk-aware for multi-disk servers
  - Add EcDisk struct to track individual disks on volume servers
  - Update EcNode to maintain per-disk shard distribution
  - Parse disk_id from EC shard information during topology collection
  - Implement pickBestDiskOnNode() for selecting the best disk per shard
  - Add diskDistributionScore() for tie-breaking node selection
  - Update all move operations to specify the target disk in RPC calls
  - Improves shard balance within multi-disk servers, not just across servers
* Use placement package in EC detection for consistent disk-level placement
  - Replace custom EC disk selection logic with the shared placement package
  - Convert topology DiskInfo to placement.DiskCandidate format
  - Use SelectDestinations() for multi-rack/server/disk spreading
  - Convert placement results back to topology DiskInfo for task creation
  - Ensures EC detection uses the same placement logic as shell commands
* Make volume server evacuation disk-aware
  - Use pickBestDiskOnNode() when selecting the evacuation target disk
  - Specify the target disk in evacuation RPC requests
  - Maintains balanced disk distribution during server evacuations
* Rename PlacementConfig to PlacementRequest for clarity: PlacementRequest better reflects that this is a request for placement rather than a configuration object, improving API semantics.
* Rename DefaultConfig to DefaultPlacementRequest, aligning with the PlacementRequest type naming for consistency.
* Address review comments from Gemini and CodeRabbit
  - Fix HIGH issue, empty disk discovery: now discovers all disks from VolumeInfos, not just from EC shards, so disks without EC shards are still considered for placement.
  - Fix HIGH issue, EC shard count calculation in detection.go: now correctly filters by DiskId and sums actual shard counts using ShardBits.ShardIdCount() instead of just counting EcShardInfo entries.
  - Fix MEDIUM issues: add disk ID to evacuation log messages for consistency with other logging; remove the unused serverToDisks variable in placement.go; fix a comment that incorrectly said 'ascending' when the sorting is 'descending'.
* Add EC tests
* Update ec-integration-tests.yml
* Update ec_integration_test.go
* Fix EC integration tests CI: build weed binary and update actions
  - Add a 'Build weed binary' step before running tests
  - Update actions/setup-go from v4 to v6 (Node20 compatibility)
  - Update actions/checkout from v2 to v4 (Node20 compatibility)
  - Move working-directory to the test step only
* Add disk-aware EC rebalancing integration tests
  - Add TestDiskAwareECRebalancing test with a multi-disk cluster setup
  - Test EC encode with disk awareness (shows disk ID in output)
  - Test EC balance with disk-level shard distribution
  - Add helper functions for disk-level verification: startMultiDiskCluster (3 servers x 4 disks each), countShardsPerDisk (track shards per disk per server), calculateDiskShardVariance (measure distribution balance)
  - Verify no single disk is overloaded with shards
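To make the disk-level selection concrete, here is a minimal Go sketch of the pickBestDiskOnNode() idea from the entry above. The EcDisk fields and the pick-the-least-loaded-disk-with-free-slots rule are simplifying assumptions for illustration, not the actual SeaweedFS implementation.

```go
package main

import "fmt"

// EcDisk is a simplified stand-in for a single disk on a volume server.
type EcDisk struct {
	DiskID     uint32
	ShardCount int // EC shards already placed on this disk
	FreeSlots  int // remaining capacity for EC shards
}

// pickBestDiskOnNode returns the disk with free capacity that currently
// holds the fewest shards, so shards spread evenly within one server.
func pickBestDiskOnNode(disks []EcDisk) (EcDisk, bool) {
	best, found := EcDisk{}, false
	for _, d := range disks {
		if d.FreeSlots <= 0 {
			continue
		}
		if !found || d.ShardCount < best.ShardCount {
			best, found = d, true
		}
	}
	return best, found
}

func main() {
	disks := []EcDisk{
		{DiskID: 0, ShardCount: 3, FreeSlots: 5},
		{DiskID: 1, ShardCount: 1, FreeSlots: 5},
		{DiskID: 2, ShardCount: 2, FreeSlots: 0}, // full, skipped
	}
	if d, ok := pickBestDiskOnNode(disks); ok {
		fmt.Printf("place next shard on disk %d\n", d.DiskID) // disk 1
	}
}
```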
2025-10-27 | Erasure Coding: EC refactoring (#7396) | Chris Lu | 1 file | -4/+10
* refactor: add ECContext structure to encapsulate EC parameters
  - Create ec_context.go with the ECContext struct
  - NewDefaultECContext() creates a context with the default 10+4 configuration
  - Helper methods: CreateEncoder(), ToExt(), String()
  - Foundation for cleaner function signatures; no behavior change, still uses hardcoded 10+4
* refactor: update ec_encoder.go to use ECContext
  - Add WriteEcFilesWithContext() and RebuildEcFilesWithContext() functions
  - Keep old functions for backward compatibility (they call the new versions)
  - Update all internal functions to accept an ECContext parameter
  - Use ctx.DataShards, ctx.ParityShards, ctx.TotalShards consistently
  - Use ctx.CreateEncoder() instead of a hardcoded reedsolomon.New()
  - Use ctx.ToExt() for shard file extensions
  - No behavior change, still uses the default 10+4 configuration
* refactor: update ec_volume.go to use ECContext
  - Add an ECContext field to the EcVolume struct
  - Initialize ECContext with the default configuration in NewEcVolume()
  - Update LocateEcShardNeedleInterval() to use ECContext.DataShards
  - Phase 1: always uses the default 10+4 configuration; no behavior change
* refactor: add EC shard count fields to the VolumeInfo protobuf
  - Add data_shards_count (field 8) and parity_shards_count (field 9) to the VolumeInfo message
  - Fields are optional; 0 means use the default (10+4)
  - Backward compatible: fields added at the end
  - Phase 1: foundation for future customization
* refactor: regenerate protobuf Go files with the EC shard count fields
  - Regenerated volume_server_pb/*.go with the new EC fields
  - DataShardsCount and ParityShardsCount accessors added to VolumeInfo
  - No behavior change; fields not yet used
* refactor: update VolumeEcShardsGenerate to use ECContext
  - Create an ECContext with the default configuration in VolumeEcShardsGenerate
  - Use ecCtx.TotalShards and ecCtx.ToExt() in cleanup
  - Call WriteEcFilesWithContext() instead of WriteEcFiles()
  - Save the EC configuration (DataShardsCount, ParityShardsCount) to VolumeInfo
  - Log the EC context being used
  - Phase 1: always uses the default 10+4 configuration; no behavior change
* fmt
* refactor: update ec_test.go to use ECContext
  - Update TestEncodingDecoding to create and use an ECContext
  - Update validateFiles() to accept an ECContext parameter
  - Update removeGeneratedFiles() to use ctx.TotalShards and ctx.ToExt()
  - Test passes with the default 10+4 configuration
* refactor: use an EcShardConfig message instead of separate fields
* optimize: pre-calculate row sizes in the EC encoding loop
* refactor: replace the TotalShards field with a Total() method
  - Remove the TotalShards field from ECContext to avoid field drift
  - Add a Total() method that computes DataShards + ParityShards
  - Update all references to use ctx.Total() instead of ctx.TotalShards
  - Read the EC config from VolumeInfo when loading EC volumes
  - Read the data shard count from .vif in VolumeEcShardsToVolume
  - Use >= instead of > for exact boundary handling in encoding loops
* optimize: simplify VolumeEcShardsToVolume to use the existing EC context
  - Remove the redundant CollectEcShards call and redundant .vif file loading
  - Use v.ECContext.DataShards directly (already loaded by NewEcVolume)
  - Slice tempShards instead of collecting again
* refactor: rename MaxShardId to MaxShardCount for clarity
  - Change from MaxShardId=31 to MaxShardCount=32
  - Eliminates the confusing +1 arithmetic (MaxShardId+1); MaxShardCount directly represents the limit
* fix: support custom EC ratios beyond 14 shards in VolumeEcShardsToVolume
  - Add a MaxShardId constant (31, since ShardBits is uint32)
  - Use MaxShardId+1 (32) instead of TotalShardsCount (14) for the tempShards buffer
  - Prevents a panic when slicing for volumes with more than 14 total shards; critical fix for custom EC configurations like 20+10
* fix: add validation for EC shard counts from VolumeInfo
  - Validate that DataShards/ParityShards are positive and within MaxShardCount
  - Prevent zero or invalid values that could cause a divide-by-zero
  - Fall back to defaults if validation fails, with a warning log
  - VolumeEcShardsGenerate now preserves the existing EC config when regenerating
  - Critical safety fix for corrupted or legacy .vif files
* fix: RebuildEcFiles now loads the EC config from the .vif file
  - Critical: RebuildEcFiles was always using the default 10+4 config
  - Now loads the actual EC config from the .vif file when rebuilding shards, validating it before use (positive shards, within MaxShardCount)
  - Falls back to the default if the .vif is missing or invalid
  - Prevents data corruption when rebuilding custom EC volumes
* add: defensive validation for dataShards in VolumeEcShardsToVolume
  - Validate dataShards > 0 and <= MaxShardCount before use
  - Prevents a panic from a corrupted or uninitialized ECContext; returns a clear error message instead
  - Defense in depth: validates even though upstream should catch issues
* fix: replace TotalShardsCount with MaxShardCount for custom EC ratio support
  - disk_location_ec.go: validateEcVolume checks shards 0-31 instead of 0-13 during validation; removeEcVolumeFiles removes shards 0-31 instead of 0-13 during cleanup
  - ec_volume_info.go ShardBits methods: ShardIds(), ToUint32Slice(), IndexToShardId(), the Minus() shard size copy, and resizeShardSizes() now iterate up to MaxShardCount (32) instead of TotalShardsCount (14); MinusParityShards() removes shards 10-31 instead of 10-13 (with a note about Phase 2)
  - Without these changes, custom EC ratios with more than 14 total shards would fail validation on startup, shards 14-31 would never be discovered or cleaned up, and ShardBits operations would miss shards >= 14
  - Backward compatible: MaxShardCount (32) includes the default TotalShardsCount (14), so existing 10+4 volumes work as before
* fix: replace TotalShardsCount with MaxShardCount in critical data structures
  - Buffer allocations and loops that must support custom EC ratios up to 32 shards: store_ec.go:354 (bufs array for shard recovery), topology_ec.go:14 (EcShardLocations.Locations fixed array size), command_ec_rebuild.go:268 (EC shard map allocation), command_ec_common.go:626 (shard-to-locations map allocation)
  - Shard discovery loops: ec_task.go:378 (loop to find generated shard files) and all 8 loops in ec_shard_management.go that check/count EC shards
  - These changes are critical: buffers sized to 14 would cause index-out-of-bounds panics when accessing shards 14-31, fixed arrays sized to 14 would truncate shard location data, and loops limited to 0-13 would never discover or manage shards 14-31
  - Note: command_ec_encode.go:208 is intentionally NOT changed; it creates shard IDs to mount after encoding, and since Phase 1 always generates 14 shards it stays at TotalShardsCount, to be made dynamic in Phase 2 based on the actual EC context
* refactor: move the MaxShardCount constant to ec_encoder.go
  - Moved from ec_volume_info.go to group it with the other shard count constants (DataShardsCount, ParityShardsCount, TotalShardsCount), clarifying the relationship between them
  - Location: ec_encoder.go line 22, between TotalShardsCount and MinTotalDisks
* improve: add defensive programming and better error messages for EC (code review improvements from CodeRabbit)
  1. ShardBits guardrails (ec_volume_info.go): AddShardId and RemoveShardId reject shard IDs >= MaxShardCount; HasShardId returns false for out-of-range shard IDs; prevents silent no-ops from bit shifts with invalid IDs
  2. Future-proof regex (disk_location_ec.go): updated from \.ec[0-9][0-9] to \.ec\d{2,3}, matching .ec00 through .ec999 (currently .ec00-.ec31 are used) and supporting future increases to MaxShardCount beyond 99
  3. Better error messages (volume_grpc_erasure_coding.go): include the valid range (1..32) in the dataShards validation error to help operators quickly identify the problem
  4. Validation before save (volume_grpc_erasure_coding.go): validate the ECContext (DataShards > 0, ParityShards > 0, Total <= MaxShardCount) and log the EC config being saved to .vif for debugging; prevents writing invalid configs to disk
* fmt
* fix: critical bugs from code review, plus comment cleanup
  1. command_ec_rebuild.go: fixed indentation causing a compilation error (properly nested if/for blocks in registerEcNode)
  2. ec_shard_management.go: fixed isComplete logic that incorrectly used MaxShardCount (32) instead of TotalShardsCount (14); default 10+4 volumes were being reported as incomplete and shards 14-31 as missing; fixed in 4 locations (volume completeness checks and getMissingShards)
  3. ec_volume_info.go: fixed MinusParityShards removing too many shards; changed from MaxShardCount (32) back to TotalShardsCount (14), as it was incorrectly removing shard IDs 10-31 instead of just 10-13
  - Comment cleanup: removed Phase 1/Phase 2 references (development-plan context) and replaced them with clear statements about the default 10+4 configuration; the seaweedfs repo uses a fixed 10+4 EC ratio
  - Root cause: over-aggressive replacement of TotalShardsCount with MaxShardCount; MaxShardCount (32) is the limit for buffer allocations and shard ID loops, while TotalShardsCount (14) must be used for default EC configuration logic
* fix: add defensive bounds checks and compute actual shard counts
  1. topology_ec.go: add defensive bounds checks to AddShard/DeleteShard, returning false instead of panicking when shardId >= MaxShardCount (32)
  2. command_ec_common.go: fix doBalanceEcShardsAcrossRacks, which used the hardcoded TotalShardsCount (14) for all volumes; it now computes the actual totalShardsForVolume from rackToShardCount, fixing incorrect rebalancing for volumes with custom EC ratios (for example, a 5+2=7 shard volume would incorrectly use 14 as the average)
  - These fixes improve robustness and prepare for future custom EC ratios without changing current behavior for default 10+4 volumes
  - Note: MinusParityShards and ec_task.go are intentionally unchanged in the seaweedfs repo; they will be enhanced in the seaweed-enterprise repo, where custom EC ratio configuration is added
* fmt
* style: make MaxShardCount type casting explicit in loops
  - ShardId comparisons cast to ShardId(MaxShardCount); uint32 comparisons cast to uint32(MaxShardCount)
  - Changed in 5 locations: the Minus() loop (line 90), ShardIds() loop (line 143), ToUint32Slice() loop (line 152), IndexToShardId() loop (line 219), and resizeShardSizes() loop (line 248)
  - Makes the intent explicit and improves type-safety readability; no functional changes, purely a style improvement
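For readers following the refactor, here is a minimal, self-contained Go sketch of the ECContext idea described above. The names (ECContext, Total(), MaxShardCount, the 10+4 default) come from the commit message; the struct layout and validation body are illustrative assumptions, not the actual SeaweedFS code.

```go
package main

import "fmt"

// Default EC layout and the hard shard limit; names follow the commits
// above, values per the 10+4 default and the uint32 ShardBits bitmap.
const (
	DataShardsCount   = 10
	ParityShardsCount = 4
	TotalShardsCount  = DataShardsCount + ParityShardsCount // 14
	MaxShardCount     = 32                                  // ShardBits is a uint32 bitmap
)

// ECContext carries per-volume EC parameters instead of hardcoded 10+4.
type ECContext struct {
	DataShards   int
	ParityShards int
}

// Total is computed, not stored, so the fields cannot drift apart.
func (c ECContext) Total() int { return c.DataShards + c.ParityShards }

// Validate mirrors the .vif sanity checks described above: positive
// counts, total within the 32-shard bitmap limit. Falling back to the
// 10+4 default on failure is left to the caller.
func (c ECContext) Validate() error {
	if c.DataShards <= 0 || c.ParityShards <= 0 || c.Total() > MaxShardCount {
		return fmt.Errorf("invalid EC config %d+%d: shards must be positive and total <= %d",
			c.DataShards, c.ParityShards, MaxShardCount)
	}
	return nil
}

func main() {
	fmt.Println(ECContext{20, 10}.Validate()) // nil: 30 <= 32
	fmt.Println(ECContext{30, 10}.Validate()) // error: 40 > 32
}
```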
2025-08-23 | Shell: support regular expression for collection selection (#7158) | Chris Lu | 1 file | -0/+11
* Support regular expression for collection selection
* Refactor
* Ordering
* Fix exact match
* Update command_volume_balance_test.go
* Simplify
* Update command_volume_balance.go
* Comment
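A hedged sketch of how exact-match-first collection selection with a regex fallback might look; matchCollections and the whole-name anchoring are assumptions for illustration, not the shell's actual implementation.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchCollections selects collections by pattern: an exact name match
// wins outright; otherwise the pattern is treated as a regular
// expression anchored to the whole collection name.
func matchCollections(all []string, pattern string) ([]string, error) {
	for _, c := range all {
		if c == pattern { // exact match takes precedence
			return []string{c}, nil
		}
	}
	re, err := regexp.Compile("^" + pattern + "$")
	if err != nil {
		return nil, err
	}
	var out []string
	for _, c := range all {
		if re.MatchString(c) {
			out = append(out, c)
		}
	}
	return out, nil
}

func main() {
	cols := []string{"logs", "logs2024", "pics"}
	m, _ := matchCollections(cols, "logs.*")
	fmt.Println(m) // [logs logs2024]
}
```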
2025-08-07 | EC candidate selection needs to adjust the same-rack count comparison (#7106) | Chris Lu | 1 file | -2/+2
2025-07-16 | Convert error formatting to %w everywhere (#6995) | Chris Lu | 1 file | -1/+1
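The %v-to-%w change matters because wrapped errors stay inspectable with errors.Is/errors.As. A minimal, runnable example of the pattern (the loadConfig helper is hypothetical):

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

func loadConfig(path string) error {
	if _, err := os.Open(path); err != nil {
		// %w (not %v) preserves the wrapped error for errors.Is/errors.As.
		return fmt.Errorf("load config %s: %w", path, err)
	}
	return nil
}

func main() {
	err := loadConfig("/no/such/file")
	fmt.Println(errors.Is(err, fs.ErrNotExist)) // true, thanks to %w
}
```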
2025-05-12 | Move `shell.ErrorWaitGroup` into a common file, to cleanly reuse across `weed shell` commands (#6780) | Lisandro Pin | 1 file | -56/+0
Move `shell.ErrorWaitGroup` into a dedicated common file, to cleanly reuse across `weed shell` commands.
2025-05-09 | Improve safety for weed shell's `ec.encode` (#6773) | Lisandro Pin | 1 file | -0/+8
The current process for `ec.encode` is:
1. EC shards for a volume are generated and added to a single server
2. The original volume is deleted
3. EC shards get re-balanced across the entire topology
It is then possible to lose data between #2 and #3 if the underlying volume storage/server/rack/DC happens to fail, for whatever reason. As a fix, this MR reworks `ec.encode` so that:
* Newly created EC shards are spread across all locations for the source volume.
* Source volumes are deleted only after EC shards are converted and balanced.
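A sketch of the reworked order of operations in Go, with hypothetical helper names (generateEcShards, balanceEcShards, deleteSourceVolume); the only point taken from the entry above is that source-volume deletion happens last.

```go
package main

import "fmt"

// ecEncode sketches the safer flow: shards are spread across all
// source-volume locations first, and the source volume is deleted only
// after balancing succeeds.
func ecEncode(volumeID uint32, locations []string) error {
	if err := generateEcShards(volumeID, locations); err != nil { // spread, not a single server
		return fmt.Errorf("generate shards for %d: %w", volumeID, err)
	}
	if err := balanceEcShards(volumeID); err != nil {
		return fmt.Errorf("balance shards for %d: %w", volumeID, err)
	}
	// Only now is it safe to drop the source volume: every shard already
	// has a balanced home, so a single failure cannot lose data.
	return deleteSourceVolume(volumeID, locations)
}

func generateEcShards(id uint32, locs []string) error {
	fmt.Println("encode", id, "across", locs)
	return nil
}

func balanceEcShards(id uint32) error {
	fmt.Println("balance", id)
	return nil
}

func deleteSourceVolume(id uint32, locs []string) error {
	fmt.Println("delete source volume", id)
	return nil
}

func main() { _ = ecEncode(42, []string{"srvA", "srvB"}) }
```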
2025-02-28 | `ec.encode`: Fix resolution of target collections (#6585) | Lisandro Pin | 1 file | -2/+2
* Don't ignore empty (`""`) collection names when computing collections for a given volume ID.
* When no `volumeId` parameter is provided, compute volumes based on the provided collection name, even if it's empty (`""`). This restores behavior from before the recent EC rebalancing rework. See also https://github.com/seaweedfs/seaweedfs/blob/ec30a504bae6cad75f859964e14c60d39cc43709/weed/shell/command_ec_encode.go#L99 .
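The underlying hazard here looks like a sentinel ambiguity: a plain string cannot distinguish "no collection given" from the valid empty ("") collection name. A small illustrative sketch under that assumption (collectionsForBalance is hypothetical):

```go
package main

import "fmt"

// Using a *string (or a separate 'set' flag) distinguishes "no collection
// provided" from the legitimate empty ("") collection name, which a plain
// string sentinel cannot express.
func collectionsForBalance(collection *string, allCollections []string) []string {
	if collection == nil {
		return allCollections // no filter provided: consider everything
	}
	return []string{*collection} // may legitimately be ""
}

func main() {
	empty := ""
	fmt.Println(len(collectionsForBalance(&empty, []string{"", "pics"}))) // 1: just the "" collection
	fmt.Println(len(collectionsForBalance(nil, []string{"", "pics"})))    // 2: no filter, all collections
}
```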
2025-02-28 | Fix calculation of a node's free EC shard slots (#6584) | Lisandro Pin | 1 file | -1/+7
2025-02-07 | Nit: fix missing newlines in `weed shell` commands' output (#6524) | Lisandro Pin | 1 file | -1/+1
2025-02-06 | Remove warning on EC balancing if no replica placement settings are found (#6516) | Lisandro Pin | 1 file | -3/+0
Effectively undoes c9399a68; with ff8bd862, a replica placement type of `000` will no longer break shard re-balancing.
Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
2025-02-04 | Nit: fix missing newline on EC balancing warnings regarding replica settings (#6509) | Lisandro Pin | 1 file | -1/+1
See 79136812.
2025-01-30 | Improve EC shards balancing logic regarding replica placement settings (#6491) | Lisandro Pin | 1 file | -4/+4
The replica placement type specifies the number of _replicas_ on the same/different rack; that means we can have one EC shard copy on each, even if the replica setting is zero. This PR reworks replica placement parsing for EC rebalancing, so we allow (replica placement + 1) copies when selecting racks and nodes to balance EC shards into.
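The arithmetic is small but easy to get backwards, so here it is as a one-function sketch; the helper name is an assumption, only the (setting + 1) interpretation comes from the entry above.

```go
package main

import "fmt"

// A replica placement like "000" counts additional replicas, not copies:
// 0 replicas on a different rack still allows 1 copy per rack. So the cap
// used when picking racks/nodes for EC shards is (setting + 1).
func ecCopiesAllowedPerRack(diffRackReplicas int) int {
	return diffRackReplicas + 1
}

func main() {
	fmt.Println(ecCopiesAllowedPerRack(0)) // 1: placement 000 no longer blocks balancing
	fmt.Println(ecCopiesAllowedPerRack(1)) // 2
}
```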
2025-01-29 | `ec.balance`: Allow EC balancing without collections (#6488) | Lisandro Pin | 1 file | -4/+4
2025-01-29 | `ec.encode`: Display a warning on EC balancing if no replica placement settings are found (#6487) | Lisandro Pin | 1 file | -10/+18
2024-12-19 | "golang.org/x/exp/slices" => "slices" and go fmt | chrislu | 1 file | -1/+1
2024-12-18 | Allow configuring the maximum number of concurrent tasks for EC parallelization (#6376) | Lisandro Pin | 1 file | -45/+48
Follow-up to b0210df0.
2024-12-17 | Rework `shell.EcBalance()`'s waitgroup code into a standalone type (#6373) | Lisandro Pin | 1 file | -54/+61
Rework `shell.EcBalance()`'s waitgroup-with-errors code into a standalone type. We'll re-use this for other EC jobs, for example volume creation. Also fixes potential concurrency issues when collecting error results.
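A rough sketch of what such a standalone error-collecting waitgroup can look like. The ErrorWaitGroup name appears in the 2025-05-12 entry above, but this layout (bounded concurrency via a semaphore channel, a mutex-guarded error slice) is an assumption, not the shell's actual type.

```go
package main

import (
	"fmt"
	"sync"
)

// ErrorWaitGroup runs tasks with bounded concurrency and collects their
// errors safely; a sketch, not the real shell.ErrorWaitGroup.
type ErrorWaitGroup struct {
	wg   sync.WaitGroup
	mu   sync.Mutex
	errs []error
	sem  chan struct{}
}

func NewErrorWaitGroup(maxConcurrency int) *ErrorWaitGroup {
	return &ErrorWaitGroup{sem: make(chan struct{}, maxConcurrency)}
}

func (g *ErrorWaitGroup) Add(f func() error) {
	g.wg.Add(1)
	go func() {
		defer g.wg.Done()
		g.sem <- struct{}{}        // acquire a concurrency slot
		defer func() { <-g.sem }() // release it
		if err := f(); err != nil {
			g.mu.Lock() // guard the shared error slice
			g.errs = append(g.errs, err)
			g.mu.Unlock()
		}
	}()
}

func (g *ErrorWaitGroup) Wait() []error {
	g.wg.Wait()
	return g.errs
}

func main() {
	g := NewErrorWaitGroup(2)
	for i := 0; i < 4; i++ {
		i := i
		g.Add(func() error { return fmt.Errorf("task %d failed", i) })
	}
	fmt.Println(len(g.Wait()), "errors collected") // 4 errors collected
}
```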
2024-12-15 | Parallelize EC shards balancing within racks (#6354) | Lisandro Pin | 1 file | -5/+5
2024-12-13 | Parallelize EC shards balancing across racks (#6352) | Lisandro Pin | 1 file | -5/+6
2024-12-13 | Parallelize EC balancing for racks (#6351) | Lisandro Pin | 1 file | -15/+15
2024-12-12 | Begin implementing EC balancing parallelization support (#6342) | Lisandro Pin | 1 file | -7/+54
* Begin implementing EC balancing parallelization support; impacts both `ec.encode` and `ec.balance`.
* Nit: improve type naming.
* Make the goroutine workgroup handler for `EcBalance()` a bit smarter/error-proof.
* Nit: unify naming for `ecBalancer` wait group methods with the rest of the module.
* Fix a concurrency bug.
* Fix whitespace after GitLab automerge.
* Delete a stray TODO.
2024-12-12 | Limit EC re-balancing for `ec.encode` to relevant collections when a volume ID argument is provided (#6347) | Lisandro Pin | 1 file | -0/+41
2024-12-10 | Unify the re-balancing logic for `ec.encode` with `ec.balance` (#6339) | Lisandro Pin | 1 file | -18/+108
Among other things, this enables recent changes related to topology-aware re-balancing at EC encoding time.
2024-12-06 | Remove average constraints when selecting nodes/racks to balance EC shards into (#6325) | Lisandro Pin | 1 file | -37/+8
2024-12-05 | Share common parameters for EC re-balancing functions under a single struct (#6319) | Lisandro Pin | 1 file | -55/+63
TODO cleanup for https://github.com/seaweedfs/seaweedfs/discussions/6179.
2024-12-04 | Account for replication placement settings when balancing EC shards within the same rack (#6317) | Lisandro Pin | 1 file | -10/+8
* Account for replication placement settings when balancing EC shards within racks.
* Update help contents for `ec.balance`.
* Add a few more representative test cases for `pickEcNodeToBalanceShardsInto()`.
2024-12-04 | Account for replication placement settings when balancing EC shards across racks (#6316) | Lisandro Pin | 1 file | -7/+9
2024-12-02 | Resolve replica placement for EC volumes from master server defaults (#6303) | Lisandro Pin | 1 file | -21/+46
2024-11-28 | Display details upon failures to re-balance EC shards across racks (#6299) | Lisandro Pin | 1 file | -11/+12
2024-11-27 | Improve EC shards rebalancing logic across nodes (#6297) | Lisandro Pin | 1 file | -21/+59
* Improve EC shards rebalancing logic across nodes:
  - Favor target nodes with fewer preexisting shards, to ensure a fair distribution.
  - Randomize selection when multiple possible target nodes are available.
  - Add logic to account for replication settings when selecting target nodes (currently disabled).
* Fix a minor test typo.
* Clarify internal error messages for `pickEcNodeToBalanceShardsInto()`.
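A reduction of the favor-fewest-shards-then-randomize selection described above; the function name mirrors pickEcNodeToBalanceShardsInto, but the body is an illustrative assumption (math/rand/v2 matches the "use math rand v2" entry just below).

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

type ecNode struct {
	id         string
	shardCount int // shards of this volume already on the node
}

// pickNodeToBalanceInto favors nodes with the fewest preexisting shards
// and breaks ties randomly, so repeated balancing spreads load fairly.
func pickNodeToBalanceInto(nodes []ecNode) (ecNode, error) {
	if len(nodes) == 0 {
		return ecNode{}, fmt.Errorf("no candidate nodes")
	}
	var best []ecNode
	for _, n := range nodes {
		switch {
		case len(best) == 0 || n.shardCount < best[0].shardCount:
			best = []ecNode{n} // new minimum: reset the tie set
		case n.shardCount == best[0].shardCount:
			best = append(best, n) // tie: keep as a candidate
		}
	}
	return best[rand.IntN(len(best))], nil
}

func main() {
	nodes := []ecNode{{"a", 2}, {"b", 1}, {"c", 1}}
	n, _ := pickNodeToBalanceInto(nodes)
	fmt.Println("chose", n.id) // b or c, at random
}
```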
2024-11-21 | use math/rand/v2 | chrislu | 1 file | -2/+2
2024-11-21 | Improve EC shards rebalancing logic across racks (#6270) | Lisandro Pin | 1 file | -20/+63
- Favor target racks with fewer preexisting shards, to ensure a fair distribution.
- Randomize selection when multiple possible target racks are available.
- Add logic to account for replication settings when selecting target racks (currently disabled).
2024-11-19 | Unify usage of shell.EcNode.dc as DataCenterId (#6258) | Lisandro Pin | 1 file | -6/+5
2024-11-18 | Introduce logic to resolve volume replica placement within EC rebalancing (#6254) | Lisandro Pin | 1 file | -26/+44
* Rename `command_ec_encode_test.go` to `command_ec_common_test.go`; all tests defined in this file are now for `command_ec_common.go`.
* Minor code cleanups:
  - Fix broken `ec.balance` test.
  - Rework integer ceiling division to not use floats, which can introduce precision errors.
* Introduce logic to resolve volume replica placement within EC rebalancing; this will be used to make rebalancing logic topology-aware.
* Give shell.EcNode.dc a dedicated DataCenterId type.
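The ceiling-division cleanup mentioned above is a classic: a float round-trip can misround for large values that float64 cannot represent exactly, while integer math cannot. A minimal sketch, assuming positive operands:

```go
package main

import "fmt"

// ceilDivide computes ceil(a/b) in pure integer math; going through
// math.Ceil(float64(a)/float64(b)) can round incorrectly once the values
// exceed float64's exact-integer range. Assumes a >= 0 and b > 0.
func ceilDivide(a, b int) int {
	return (a + b - 1) / b
}

func main() {
	fmt.Println(ceilDivide(14, 4)) // 4: fourteen shards over four racks
	fmt.Println(ceilDivide(12, 4)) // 3: exact division, no remainder bump
}
```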
2024-11-04 | Refactor `ec.balance` logic into a `weed/shell/command_ec_common.go` standalone function (#6195) | Lisandro Pin | 1 file | -1/+423
* Refactor `ec.balance` logic into a standalone function in `weed/shell/command_ec_common.go`. This is a prerequisite to unifying the balance logic for `ec.balance` and `ec.encode`.
* s/Balance()/EcBalance()/g
2023-09-25 | Revert "Revert "Merge branch 'master' of https://github.com/seaweedfs/seaweedfs"" | chrislu | 1 file | -4/+4
This reverts commit 8cb42c39.
2023-09-18 | Revert "Merge branch 'master' of https://github.com/seaweedfs/seaweedfs" | chrislu | 1 file | -4/+4
This reverts commit 2e5aa06026750c99ea283181974d2ccfe5eb0468, reversing changes made to 4d414f54a224142f3f4d934f4af3b5dceb6fec6b.
2023-09-18 | Bump github.com/rclone/rclone from 1.63.1 to 1.64.0 (#4850) | dependabot[bot] | 1 file | -4/+4
* Bump github.com/rclone/rclone from 1.63.1 to 1.64.0
  - Release notes: https://github.com/rclone/rclone/releases
  - Changelog: https://github.com/rclone/rclone/blob/master/RELEASE.md
  - Commits: https://github.com/rclone/rclone/compare/v1.63.1...v1.64.0
  - updated-dependencies: dependency-name: github.com/rclone/rclone; dependency-type: direct:production; update-type: version-update:semver-minor
* API changes
* go mod
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Chris Lu <chrislusf@users.noreply.github.com>
Co-authored-by: chrislu <chris.lu@gmail.com>
2022-10-09 | refactor to change capacity data type | chrislu | 1 file | -4/+0
2022-09-14 | refactor(shell): `Decending` -> `Descending` (#3675) | Ryan Russell | 1 file | -2/+2
Signed-off-by: Ryan Russell <git@ryanrussell.org>
2022-08-22 | shell: stop long running jobs if lock is lost | chrislu | 1 file | -0/+4
2022-07-29 | move to https://github.com/seaweedfs/seaweedfs | chrislu | 1 file | -8/+8
2022-04-18 | enhancement: replace sort.Slice with slices.SortFunc to reduce reflection | justin | 1 file | -9/+8
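For context, the difference in a runnable example. The original change targeted golang.org/x/exp/slices; this sketch uses the identical stdlib `slices` API that the 2024-12-19 entry above later migrates to. The node type and the descending-by-free-slots ordering are assumptions for illustration.

```go
package main

import (
	"fmt"
	"slices"
)

type node struct {
	id        string
	freeSlots int
}

func main() {
	nodes := []node{{"a", 2}, {"b", 7}, {"c", 5}}
	// slices.SortFunc is generic, so it avoids the interface{}-based
	// reflection cost of sort.Slice; the comparator returns a negative,
	// zero, or positive int, like a classic three-way compare.
	slices.SortFunc(nodes, func(x, y node) int {
		return y.freeSlots - x.freeSlots // descending by free slots
	})
	fmt.Println(nodes) // [{b 7} {c 5} {a 2}]
}
```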
2022-02-08 | ec.encode: calculate free EC slots based on (maxVolumeCount - volumeCount) | chrislu | 1 file | -1/+1
Fixes https://github.com/chrislusf/seaweedfs/issues/2642.
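A sketch of the idea behind this entry and the later 2025-02-28 slot-calculation fix: free EC slots derive from unused volume capacity rather than raw disk numbers. Only the (maxVolumeCount - volumeCount) basis comes from the commit; the shardsPerVolumeSlot multiplier and the helper name are assumptions for illustration.

```go
package main

import "fmt"

// freeEcSlots estimates how many EC shards a node can still accept,
// derived from unused volume capacity. Each free volume slot is assumed
// to host shardsPerVolumeSlot EC shards; clamped at zero so an
// over-provisioned node never reports negative capacity.
func freeEcSlots(maxVolumeCount, volumeCount, shardsPerVolumeSlot int) int {
	free := (maxVolumeCount - volumeCount) * shardsPerVolumeSlot
	if free < 0 {
		return 0
	}
	return free
}

func main() {
	fmt.Println(freeEcSlots(100, 97, 10)) // 30 shard slots left
	fmt.Println(freeEcSlots(100, 105, 10)) // 0: over capacity, clamped
}
```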
2022-02-08 | volume.balance: add delay during tight loop | chrislu | 1 file | -1/+1
Fixes https://github.com/chrislusf/seaweedfs/issues/2637.
2021-12-26 | use streaming mode for long poll grpc calls | chrislu | 1 file | -4/+4
Streaming mode creates separate grpc connections for each call; this ensures the long poll connections are properly closed.
2021-09-12 | change server address from string to a type | Chris Lu | 1 file | -11/+15
2021-02-22 | refactoring | Chris Lu | 1 file | -7/+4
2021-02-16 | avoid nil | Chris Lu | 1 file | -1/+7