| author | Chris Lu <chrislusf@users.noreply.github.com> | 2025-07-30 12:38:03 -0700 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2025-07-30 12:38:03 -0700 |
| commit | 891a2fb6ebc324329f5330a140b8cacff3899db4 (patch) | |
| tree | d02aaa80a909e958aea831f206b3240b0237d7b7 | /weed/command |
| parent | 64198dad8346fe284cbef944fe01ff0d062c147d (diff) | |
| download | seaweedfs-891a2fb6ebc324329f5330a140b8cacff3899db4.tar.xz | seaweedfs-891a2fb6ebc324329f5330a140b8cacff3899db4.zip |
Admin: misc improvements on admin server and workers. EC now works. (#7055)
* initial design
* added simulation as tests
* reorganized the codebase to move the simulation framework and tests into their own dedicated package
* integration test. ec worker task
* remove "enhanced" reference
* start master, volume servers, filer
Current Status
✅ Master: Healthy and running (port 9333)
✅ Filer: Healthy and running (port 8888)
✅ Volume Servers: All 6 servers running (ports 8080-8085)
🔄 Admin/Workers: Will start when dependencies are ready
* generate write load
* tasks are assigned
* admin starts with grpc port. worker has its own working directory
* Update .gitignore
* working worker and admin. Task detection is not working yet.
* compiles, detection uses volumeSizeLimitMB from master
* compiles
* worker retries connecting to admin
* build and restart
* rendering pending tasks
* skip task ID column
* sticky worker id
* test canScheduleTaskNow
* worker reconnect to admin
* clean up logs
* worker register itself first
* worker can run ec work and report status
but:
1. one volume should not be repeatedly worked on.
2. ec shards need to be distributed and the source data should be deleted.
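
A minimal sketch of the guard implied by point 1: track volumes that already have a queued or running task so detection never schedules the same volume twice. The names below (`taskGuard`, `tryReserve`) are illustrative only, not the actual SeaweedFS worker/admin types:

```go
package main

import (
	"fmt"
	"sync"
)

// taskGuard tracks volume IDs that already have a queued or running task,
// so a detection pass does not schedule the same volume twice.
type taskGuard struct {
	mu      sync.Mutex
	pending map[uint32]bool
}

func newTaskGuard() *taskGuard {
	return &taskGuard{pending: make(map[uint32]bool)}
}

// tryReserve returns true only the first time a volume is reserved;
// later calls return false until release is called.
func (g *taskGuard) tryReserve(volumeID uint32) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.pending[volumeID] {
		return false
	}
	g.pending[volumeID] = true
	return true
}

// release frees the volume after the task completes or fails.
func (g *taskGuard) release(volumeID uint32) {
	g.mu.Lock()
	defer g.mu.Unlock()
	delete(g.pending, volumeID)
}

func main() {
	g := newTaskGuard()
	fmt.Println(g.tryReserve(42)) // true: first reservation wins
	fmt.Println(g.tryReserve(42)) // false: already being worked on
	g.release(42)
	fmt.Println(g.tryReserve(42)) // true again after release
}
```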
* move ec task logic
* listing ec shards
* local copy, ec. Need to distribute.
* ec is mostly working now
* distribution of ec shards needs improvement
* need configuration to enable ec
* show ec volumes
* interval field UI component
* rename
* integration test with vacuuming
* garbage percentage threshold
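
The garbage-percentage threshold itself is a simple ratio check: deleted bytes over total volume size, compared against a configured cutoff. A hedged sketch; `volumeStat` and its fields are stand-ins for whatever metrics the vacuum detector actually reads:

```go
package main

import "fmt"

// volumeStat is an illustrative subset of the volume metrics a vacuum
// detector would look at.
type volumeStat struct {
	ID               uint32
	Size             uint64 // total bytes in the volume file
	DeletedByteCount uint64 // bytes occupied by deleted entries
}

// needsVacuum reports whether the deleted portion exceeds the configured
// garbage percentage threshold (e.g. 0.3 for 30%).
func needsVacuum(v volumeStat, garbageThreshold float64) bool {
	if v.Size == 0 {
		return false
	}
	garbageRatio := float64(v.DeletedByteCount) / float64(v.Size)
	return garbageRatio >= garbageThreshold
}

func main() {
	v := volumeStat{ID: 7, Size: 1 << 30, DeletedByteCount: 400 << 20}
	fmt.Println(needsVacuum(v, 0.3)) // true: ~39% of the volume is garbage
}
```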
* fix warning
* display ec shard sizes
* fix ec volumes list
* Update ui.go
* show default values
* ensure correct default value
* MaintenanceConfig use ConfigField
* use schema defined defaults
* config
* reduce duplication
* refactor to use BaseUIProvider
* each task register its schema
* checkECEncodingCandidate use ecDetector
* use vacuumDetector
* use volumeSizeLimitMB
* remove
* remove unused
* refactor
* use new framework
* remove v2 reference
* refactor
* left menu can scroll now
* The maintenance manager was not being initialized when no data directory was configured for persistent storage.
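
The pattern behind this fix is to construct the maintenance manager unconditionally and treat persistence as optional. A sketch under that assumption; `MaintenanceManager` and `NewMaintenanceManager` are hypothetical names, not the admin server's real API:

```go
package main

import "fmt"

// MaintenanceManager is a hypothetical stand-in for the admin server's
// maintenance manager; only the initialization pattern matters here.
type MaintenanceManager struct {
	dataDir string // empty means: run without persistent storage
}

// NewMaintenanceManager is called unconditionally; persistence is optional,
// initialization is not. The bug described above was skipping construction
// entirely when no data directory was configured.
func NewMaintenanceManager(dataDir string) *MaintenanceManager {
	return &MaintenanceManager{dataDir: dataDir}
}

// Persistent reports whether config and task state will be saved to disk.
func (m *MaintenanceManager) Persistent() bool {
	return m.dataDir != ""
}

func main() {
	inMemory := NewMaintenanceManager("")
	onDisk := NewMaintenanceManager("/var/lib/seaweedfs-admin")
	fmt.Println(inMemory.Persistent(), onDisk.Persistent()) // false true
}
```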
* saving config
* Update task_config_schema_templ.go
* enable/disable tasks
* protobuf encoded task configurations
* fix system settings
* use ui component
* remove logs
* interface{} Reduction
* reduce interface{}
* reduce interface{}
* avoid from/to map
* reduce interface{}
* refactor
* keep it DRY
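
The interface{}-reduction and "avoid from/to map" items above share one idea: pass typed config structs instead of round-tripping values through `map[string]interface{}`. A before/after sketch with an illustrative `VacuumTaskConfig` type (not the project's actual schema):

```go
package main

import "fmt"

// VacuumTaskConfig is an illustrative typed task configuration.
type VacuumTaskConfig struct {
	Enabled             bool
	GarbageThreshold    float64
	MinVolumeAgeMinutes int
}

// Before: values travel as map[string]interface{} and every read needs a
// type assertion that can silently fail.
func thresholdFromMap(m map[string]interface{}) float64 {
	if v, ok := m["garbage_threshold"].(float64); ok {
		return v
	}
	return 0
}

// After: the schema is the struct itself; no assertions, no stringly-typed keys.
func thresholdFromConfig(c VacuumTaskConfig) float64 {
	return c.GarbageThreshold
}

func main() {
	c := VacuumTaskConfig{Enabled: true, GarbageThreshold: 0.3, MinVolumeAgeMinutes: 60}
	fmt.Println(thresholdFromConfig(c))                                             // 0.3
	fmt.Println(thresholdFromMap(map[string]interface{}{"garbage_threshold": 0.3})) // 0.3
}
```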
* added logging
* debug messages
* debug level
* debug
* show the log caller line
* use configured task policy
* log level
* handle admin heartbeat response
* Update worker.go
* fix EC rack and dc count
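
Behind the rack and DC count fix is a plain distinct-count over candidate destinations, which bounds how many failure domains EC shards can span. A sketch with illustrative types; note the rack is qualified by its data center so same-named racks in different DCs are not merged:

```go
package main

import "fmt"

// node is an illustrative destination candidate for EC shard placement.
type node struct {
	DataCenter string
	Rack       string
	Address    string
}

// countRacksAndDCs returns the number of distinct racks and data centers,
// which determines how widely EC shards can be spread.
func countRacksAndDCs(nodes []node) (racks, dcs int) {
	rackSet := make(map[string]bool)
	dcSet := make(map[string]bool)
	for _, n := range nodes {
		// Qualify the rack by its data center so "rack1" in dc1 and dc2
		// count as two different racks.
		rackSet[n.DataCenter+"/"+n.Rack] = true
		dcSet[n.DataCenter] = true
	}
	return len(rackSet), len(dcSet)
}

func main() {
	nodes := []node{
		{"dc1", "rack1", "10.0.0.1:8080"},
		{"dc1", "rack1", "10.0.0.2:8080"},
		{"dc1", "rack2", "10.0.0.3:8080"},
		{"dc2", "rack1", "10.0.1.1:8080"},
	}
	racks, dcs := countRacksAndDCs(nodes)
	fmt.Printf("racks=%d dcs=%d\n", racks, dcs) // racks=3 dcs=2
}
```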
* Report task status to admin server
* fix task logging, simplify interface checking, use erasure_coding constants
* factor in empty volume server during task planning
* volume.list adds disk id
* track disk id also
* fix locking scheduled and manual scanning
* add active topology
* simplify task detector
* ec task completed, but shards are not showing up
* implement ec in ec_typed.go
* adjust log level
* dedup
* implementing ec copying shards and only ecx files
* use disk id when distributing ec shards
🎯 Planning: ActiveTopology creates DestinationPlan with specific TargetDisk
📦 Task Creation: maintenance_integration.go creates ECDestination with DiskId
🚀 Task Execution: EC task passes DiskId in VolumeEcShardsCopyRequest
💾 Volume Server: Receives disk_id and stores shards on specific disk (vs.store.Locations[req.DiskId])
📂 File System: EC shards and metadata land in the exact disk directory planned
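
A condensed sketch of that disk-id flow, from the planned destination down to the volume server picking a disk directory. `destinationPlan`, `shardCopyRequest`, and `diskLocation` are simplified stand-ins for the real ActiveTopology and protobuf types; the one behavior mirrored from the description above is indexing the server's disk locations by the request's disk id (as in `vs.store.Locations[req.DiskId]`):

```go
package main

import "fmt"

// destinationPlan is what planning produces: a target server plus the
// specific disk on that server that should receive the shards.
type destinationPlan struct {
	TargetServer string
	TargetDisk   uint32
}

// shardCopyRequest mimics the relevant fields of VolumeEcShardsCopyRequest:
// the EC task forwards the planned disk id to the volume server.
type shardCopyRequest struct {
	VolumeID uint32
	ShardIDs []uint32
	DiskID   uint32
}

// diskLocation is a stand-in for one entry in the volume server's
// store locations (one per configured data directory).
type diskLocation struct {
	Directory string
}

// placeShards picks the disk by index, the same way the server resolves the
// planned disk, and reports where the shards would land.
func placeShards(locations []diskLocation, req shardCopyRequest) (string, error) {
	if int(req.DiskID) >= len(locations) {
		return "", fmt.Errorf("disk id %d out of range (%d disks)", req.DiskID, len(locations))
	}
	return locations[req.DiskID].Directory, nil
}

func main() {
	plan := destinationPlan{TargetServer: "volume3:8080", TargetDisk: 1}
	req := shardCopyRequest{VolumeID: 123, ShardIDs: []uint32{0, 1, 2}, DiskID: plan.TargetDisk}

	disks := []diskLocation{{"/data/disk0"}, {"/data/disk1"}}
	dir, err := placeShards(disks, req)
	if err != nil {
		panic(err)
	}
	fmt.Printf("volume %d shards %v -> %s on %s\n", req.VolumeID, req.ShardIDs, dir, plan.TargetServer)
}
```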
* Delete original volume from all locations
* clean up existing shard locations
* local encoding and distributing
* Update docker/admin_integration/EC-TESTING-README.md
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* check volume id range
* simplify
* fix tests
* fix types
* clean up logs and tests
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Diffstat (limited to 'weed/command')
| -rw-r--r-- | weed/command/admin.go | 17 |
| -rw-r--r-- | weed/command/worker.go | 66 |
2 files changed, 74 insertions, 9 deletions
```diff
diff --git a/weed/command/admin.go b/weed/command/admin.go
index 6ac42330c..c1b55f105 100644
--- a/weed/command/admin.go
+++ b/weed/command/admin.go
@@ -33,6 +33,7 @@ var (
 
 type AdminOptions struct {
 	port          *int
+	grpcPort      *int
 	masters       *string
 	adminUser     *string
 	adminPassword *string
@@ -42,6 +43,7 @@ type AdminOptions struct {
 
 func init() {
 	cmdAdmin.Run = runAdmin // break init cycle
 	a.port = cmdAdmin.Flag.Int("port", 23646, "admin server port")
+	a.grpcPort = cmdAdmin.Flag.Int("port.grpc", 0, "gRPC server port for worker connections (default: http port + 10000)")
 	a.masters = cmdAdmin.Flag.String("masters", "localhost:9333", "comma-separated master servers")
 	a.dataDir = cmdAdmin.Flag.String("dataDir", "", "directory to store admin configuration and data files")
@@ -50,7 +52,7 @@ func init() {
 }
 
 var cmdAdmin = &Command{
-	UsageLine: "admin -port=23646 -masters=localhost:9333 [-dataDir=/path/to/data]",
+	UsageLine: "admin -port=23646 -masters=localhost:9333 [-port.grpc=33646] [-dataDir=/path/to/data]",
 	Short:     "start SeaweedFS web admin interface",
 	Long: `Start a web admin interface for SeaweedFS cluster management.
 
@@ -63,12 +65,13 @@ var cmdAdmin = &Command{
   - Maintenance operations
 
   The admin interface automatically discovers filers from the master servers.
-  A gRPC server for worker connections runs on HTTP port + 10000.
+  A gRPC server for worker connections runs on the configured gRPC port (default: HTTP port + 10000).
 
   Example Usage:
     weed admin -port=23646 -masters="master1:9333,master2:9333"
     weed admin -port=23646 -masters="localhost:9333" -dataDir="/var/lib/seaweedfs-admin"
-    weed admin -port=23646 -masters="localhost:9333" -dataDir="~/seaweedfs-admin"
+    weed admin -port=23646 -port.grpc=33646 -masters="localhost:9333" -dataDir="~/seaweedfs-admin"
+    weed admin -port=9900 -port.grpc=19900 -masters="localhost:9333"
 
   Data Directory:
   - If dataDir is specified, admin configuration and maintenance data is persisted
@@ -128,6 +131,11 @@ func runAdmin(cmd *Command, args []string) bool {
 		return false
 	}
 
+	// Set default gRPC port if not specified
+	if *a.grpcPort == 0 {
+		*a.grpcPort = *a.port + 10000
+	}
+
 	// Security warnings
 	if *a.adminPassword == "" {
 		fmt.Println("WARNING: Admin interface is running without authentication!")
@@ -135,6 +143,7 @@ func runAdmin(cmd *Command, args []string) bool {
 	}
 
 	fmt.Printf("Starting SeaweedFS Admin Interface on port %d\n", *a.port)
+	fmt.Printf("Worker gRPC server will run on port %d\n", *a.grpcPort)
 	fmt.Printf("Masters: %s\n", *a.masters)
 	fmt.Printf("Filers will be discovered automatically from masters\n")
 	if *a.dataDir != "" {
@@ -232,7 +241,7 @@ func startAdminServer(ctx context.Context, options AdminOptions) error {
 	}
 
 	// Start worker gRPC server for worker connections
-	err = adminServer.StartWorkerGrpcServer(*options.port)
+	err = adminServer.StartWorkerGrpcServer(*options.grpcPort)
 	if err != nil {
 		return fmt.Errorf("failed to start worker gRPC server: %w", err)
 	}
diff --git a/weed/command/worker.go b/weed/command/worker.go
index f217e57f7..6e592f73f 100644
--- a/weed/command/worker.go
+++ b/weed/command/worker.go
@@ -3,6 +3,7 @@ package command
 import (
 	"os"
 	"os/signal"
+	"path/filepath"
 	"strings"
 	"syscall"
 	"time"
@@ -21,7 +22,7 @@
 )
 
 var cmdWorker = &Command{
-	UsageLine: "worker -admin=<admin_server> [-capabilities=<task_types>] [-maxConcurrent=<num>]",
+	UsageLine: "worker -admin=<admin_server> [-capabilities=<task_types>] [-maxConcurrent=<num>] [-workingDir=<path>]",
 	Short:     "start a maintenance worker to process cluster maintenance tasks",
 	Long: `Start a maintenance worker that connects to an admin server to process
 maintenance tasks like vacuum, erasure coding, remote upload, and replication fixes.
@@ -34,6 +35,7 @@ Examples:
   weed worker -admin=admin.example.com:23646
   weed worker -admin=localhost:23646 -capabilities=vacuum,replication
   weed worker -admin=localhost:23646 -maxConcurrent=4
+  weed worker -admin=localhost:23646 -workingDir=/tmp/worker
 `,
 }
 
@@ -43,6 +45,7 @@ var (
 	workerMaxConcurrent       = cmdWorker.Flag.Int("maxConcurrent", 2, "maximum number of concurrent tasks")
 	workerHeartbeatInterval   = cmdWorker.Flag.Duration("heartbeat", 30*time.Second, "heartbeat interval")
 	workerTaskRequestInterval = cmdWorker.Flag.Duration("taskInterval", 5*time.Second, "task request interval")
+	workerWorkingDir          = cmdWorker.Flag.String("workingDir", "", "working directory for the worker")
 )
 
 func init() {
@@ -67,6 +70,45 @@ func runWorker(cmd *Command, args []string) bool {
 		return false
 	}
 
+	// Set working directory and create task-specific subdirectories
+	var baseWorkingDir string
+	if *workerWorkingDir != "" {
+		glog.Infof("Setting working directory to: %s", *workerWorkingDir)
+		if err := os.Chdir(*workerWorkingDir); err != nil {
+			glog.Fatalf("Failed to change working directory: %v", err)
+			return false
+		}
+		wd, err := os.Getwd()
+		if err != nil {
+			glog.Fatalf("Failed to get working directory: %v", err)
+			return false
+		}
+		baseWorkingDir = wd
+		glog.Infof("Current working directory: %s", baseWorkingDir)
+	} else {
+		// Use default working directory when not specified
+		wd, err := os.Getwd()
+		if err != nil {
+			glog.Fatalf("Failed to get current working directory: %v", err)
+			return false
+		}
+		baseWorkingDir = wd
+		glog.Infof("Using current working directory: %s", baseWorkingDir)
+	}
+
+	// Create task-specific subdirectories
+	for _, capability := range capabilities {
+		taskDir := filepath.Join(baseWorkingDir, string(capability))
+		if err := os.MkdirAll(taskDir, 0755); err != nil {
+			glog.Fatalf("Failed to create task directory %s: %v", taskDir, err)
+			return false
+		}
+		glog.Infof("Created task directory: %s", taskDir)
+	}
+
+	// Create gRPC dial option using TLS configuration
+	grpcDialOption := security.LoadClientTLS(util.GetViper(), "grpc.worker")
+
 	// Create worker configuration
 	config := &types.WorkerConfig{
 		AdminServer:         *workerAdminServer,
@@ -74,6 +116,8 @@ func runWorker(cmd *Command, args []string) bool {
 		MaxConcurrent:       *workerMaxConcurrent,
 		HeartbeatInterval:   *workerHeartbeatInterval,
 		TaskRequestInterval: *workerTaskRequestInterval,
+		BaseWorkingDir:      baseWorkingDir,
+		GrpcDialOption:      grpcDialOption,
 	}
 
 	// Create worker instance
@@ -82,9 +126,6 @@ func runWorker(cmd *Command, args []string) bool {
 		glog.Fatalf("Failed to create worker: %v", err)
 		return false
 	}
-
-	// Create admin client with LoadClientTLS
-	grpcDialOption := security.LoadClientTLS(util.GetViper(), "grpc.worker")
 	adminClient, err := worker.CreateAdminClient(*workerAdminServer, workerInstance.ID(), grpcDialOption)
 	if err != nil {
 		glog.Fatalf("Failed to create admin client: %v", err)
@@ -94,10 +135,25 @@ func runWorker(cmd *Command, args []string) bool {
 	// Set admin client
 	workerInstance.SetAdminClient(adminClient)
 
+	// Set working directory
+	if *workerWorkingDir != "" {
+		glog.Infof("Setting working directory to: %s", *workerWorkingDir)
+		if err := os.Chdir(*workerWorkingDir); err != nil {
+			glog.Fatalf("Failed to change working directory: %v", err)
+			return false
+		}
+		wd, err := os.Getwd()
+		if err != nil {
+			glog.Fatalf("Failed to get working directory: %v", err)
+			return false
+		}
+		glog.Infof("Current working directory: %s", wd)
+	}
+
 	// Start the worker
 	err = workerInstance.Start()
 	if err != nil {
-		glog.Fatalf("Failed to start worker: %v", err)
+		glog.Errorf("Failed to start worker: %v", err)
 		return false
 	}
```
