Age | Commit message | Author | Files | Lines
14 days | fix: simplify SeaweedFS readiness check in integration test (tag: v1.3.6) | chrislusf | 1 | -19/+7
    - Reduce volumeSizeLimitMB from 64 to 16 for faster volume allocation
    - Trust the readiness probe instead of redundant manual wget check (pod 1/1 Ready means filer port 8888 is responding)
    - Use 'kubectl wait --for=condition=ready pod' which is more reliable
    - Add brief 5s stabilization delay after readiness
14 days | fix: simplify SeaweedFS health probe for single-node mode | chrislusf | 1 | -9/+2
    Remove livenessProbe on /cluster/healthz which may not work well with -master.peers=none. Keep only the filer readinessProbe which is what we actually need to verify before running CSI tests.
14 days | fix: add health probes to SeaweedFS deployment in integration test | chrislusf | 1 | -8/+28
    - Add readinessProbe for filer (httpGet on port 8888)
    - Add livenessProbe for master (httpGet on /cluster/healthz port 9333)
    - Increase wait timeout from 60s to 180s for deployment
    - Increase filer wait loop from 30 to 60 iterations (3s each)
    - Add pod status and logs output on failure for debugging

    Learned from SeaweedFS repo's e2e-mount.yml compose file which uses proper healthchecks for each service.
14 days | fix: log process.stop() error and document 100ms delay | chrislusf | 1 | -1/+4
    - Log warning if stopping mount process fails after mount wait timeout to help diagnose potential zombie processes
    - Add comment explaining the 100ms delay before unmounting is for FUSE cleanup and pending I/O to complete
14 days | fix: use :dev tag instead of :latest for static manifests | chrislusf | 1 | -3/+3
    Using :latest in static manifests can lead to unpredictable behavior. The :dev tag signals this is a development version and is more appropriate for version-controlled manifests.
14 days | fix: use :latest tag and replace deprecated IsMountPoint | chrislusf | 2 | -9/+10
    - Change image tags from :dev to :latest in seaweedfs-csi.yaml for predictable production deployments
    - Replace deprecated IsMountPoint with IsLikelyNotMountPoint for consistency with k8s.io/mount-utils recommendations
14 days | fix: remove redundant unmount call in Unmount function | chrislusf | 1 | -6/+3
    The weedMountProcess.wait() function already handles unmounting when the process terminates. Removing the explicit unmount call in Unmount() centralizes the unmount logic and avoids potential race conditions.
14 days | fix: improve Unmount transactionality and add contextual logging | chrislusf | 1 | -6/+35
    - Unmount now uses getMount first, only removes from state after all cleanup operations succeed (transactional behavior)
    - Add volume ID prefix to weed mount stdout/stderr logs for better debugging when multiple mounts are active
14 days | chore: remove accidentally committed binary | chrislusf | 1 | -0/+0
14 days | refactor: use generic helper for mount/unmount handlers | chrislusf | 2 | -41/+27
    Reduce code duplication by using a generic makePostHandler function that abstracts the common logic of handling POST requests, decoding JSON, calling a manager function, and encoding the JSON response.
14 days | fix: address gemini review - permissions and process stop | chrislusf | 1 | -4/+4
    - Change cacheDir permissions from 0750 to 0755 for non-root access
    - Change targetPath (mount point) permissions from 0750 to 0755
    - Remove ineffective os.ErrProcessDone checks (not exported in os package)
14 days | fix: address gemini review - OnDelete strategy and log invalid endpoint | chrislusf | 2 | -4/+6
    - Change seaweedfs-mount DaemonSet updateStrategy from RollingUpdate to OnDelete in seaweedfs-csi.yaml for consistency with values.yaml (safer for active mounts)
    - Add warning log when invalid mountEndpoint is provided to aid debugging
14 days | fix: improve integration test reliability and make mount service opt-in | chrislusf | 2 | -9/+12
    - Add hard failure if SeaweedFS filer never becomes ready (exit 1 after loop)
    - Remove || true from CSI pod readiness checks for earlier failure detection
    - Change mountService.enabled default to false (opt-in) for safer upgrades
      Existing installations won't unexpectedly get a new privileged DaemonSet
14 days | fix: skip DockerHub login for PRs where secrets are unavailable | chrislusf | 1 | -1/+2
14 days | fix: address gemini-code-assist review comments | chrislusf | 2 | -6/+9
    - Change mountService updateStrategy from RollingUpdate to OnDelete (mount service not yet resilient to its own restarts)
    - Change mountService image from :latest to :dev for consistency
    - Fix defer os.RemoveAll: explicitly remove cache dir after process stops to avoid removing while process might still be running
14 days | fix: use StatusInternalServerError for mount/unmount errors | chrislusf | 1 | -2/+2
    Errors from manager.Mount and manager.Unmount can be due to internal server issues (filesystem errors, process start failures) not just bad client requests.
14 days | fix: address PR review comments | chrislusf | 6 | -25/+39
    - Set localSocket in rebuildVolumeFromStaging to fix invalid gRPC target
    - Use SHA256 hash (16 hex chars) in LocalSocketPath to minimize collision risk
    - Update GitHub Actions to latest versions (checkout@v4, metadata-action@v5, etc.)
    - Fix volumeMounts/volumes conditional mismatch in helm templates
    - Add documentation for mountService defaults in values.yaml
14 days | fix: use correct namespace (default) and app labels in integration test | chrislusf | 1 | -12/+10
    - CSI driver deploys to 'default' namespace, not 'seaweedfs-csi'
    - Fix app labels: seaweedfs-controller, seaweedfs-node, seaweedfs-mount
    - Update log collection to use correct labels
14 days | ci: add -master.peers=none to speed up SeaweedFS startup in tests | chrislusf | 1 | -0/+1
14 days | fix: correct filer address placeholder in integration test | chrislusf | 1 | -3/+16
    - Fix sed pattern: replace SEAWEEDFS_FILER:8888 instead of localhost:8888
    - Add readiness check for SeaweedFS filer before deploying CSI driver
    - Wait for all CSI components (controller, node, mount service)
    - Increase wait time for pods to start
14 days | fix: improve ensureTargetClean to be non-recursive and always ensure directory exists | chrislusf | 1 | -10/+12
    Address review feedback from gemini-code-assist:
    - Replace recursive approach with non-recursive to avoid potential stack overflow
    - Always call os.MkdirAll at the end to ensure directory exists after unmount
    - Add better error messages with context
    - Add logging for unmount operations
14 days | ci: add CSI integration test workflow | chrislusf | 1 | -0/+228
    - Sets up kind cluster
    - Deploys SeaweedFS server
    - Deploys CSI driver with mount service
    - Creates StorageClass and PVC
    - Runs functional tests (write/read)
    - Collects logs on failure for debugging
14 days | fix: address code review feedback | chrislusf | 9 | -30/+62
    - CRITICAL: Make socket path configurable based on mountEndpoint
      - Added volumeSocketDir field to SeaweedFsDriver
      - LocalSocketPath now accepts baseDir parameter
      - Derived from mountEndpoint for user-configurable socket paths
    - HIGH: Pin seaweedfs version in Dockerfiles for reproducible builds
      - Added SEAWEEDFS_VERSION build arg (default: 3.80)
      - Clone specific tag instead of master
    - HIGH: Fix Dockerfile.dev to use local context instead of personal fork
      - Removed hardcoded zemul/seaweedfs-csi-driver clone
      - Now uses COPY . . for local development
    - HIGH: Change :latest to :dev in kubernetes manifests
      - Mutable :latest tag replaced with :dev for predictability
    - MEDIUM: Remove Aliyun mirror from Dockerfile.dev
      - Region-specific mirrors shouldn't be in general-purpose files
    - MEDIUM: Improve error handling in client.go
      - Now reports read errors when failing to read error response body
    - MEDIUM: Fix inconsistent error return in manager.go
      - Return nil instead of empty struct on error (Go idiom)
14 days | fix: skip DockerHub login for PRs, update action versions to fix deprecation warnings | chrislusf | 1 | -12/+11
14 days | fix: build from context instead of cloning master | chrislusf | 2 | -2/+2
14 days | fix: use Go 1.23 and build from context instead of cloning master | chrislusf | 2 | -12/+17
14 days | fix: use kubeMounter instead of mountutil in manager.go | chrislusf | 1 | -2/+2
14 days | feat: add log | 泽淼 周 | 1 | -15/+4
14 days | fix: Zombie process | 泽淼 周 | 1 | -4/+6
14 days | add: merge prepare params. | 泽淼 周 | 9 | -40/+158
14 days | add helm file | 泽淼 周 | 1 | -0/+78
14 days | Optimization: Reduce unnecessary logic of seaweedfs-mount | 泽淼 周 | 7 | -271/+168
14 days | fix: seaweedfs-csi.yaml volumes config | 泽淼 周 | 2 | -16/+34
14 days | fix: build error | 泽淼 周 | 1 | -1/+0
14 days | fix: build error | 泽淼 周 | 1 | -1/+1
14 days | adjust: Dockerfile | 泽淼 周 | 1 | -2/+2
14 days | feat: helm config | 泽淼 周 | 3 | -1/+45
14 days | fix: seaweedfs-mount image | 泽淼 周 | 1 | -1/+1
14 days | feat: seaweedfs-mount daemonset | 泽淼 周 | 1 | -0/+57
14 days | feat: Separated weed mount lifecycle into a dedicated service and rewired the CSI components to call it. | 泽淼 周 | 14 | -319/+951
2025-12-03 | fix: use RemoveAll for more robust staging path cleanup (tag: v1.3.5) | chrislusf | 1 | -1/+2
    Address gemini-code-assist review - use os.RemoveAll instead of os.Remove to handle cases where the directory is not empty after an imperfect unmount. This ensures complete cleanup of stale staging paths.
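The difference between the two standard-library calls named above can be seen in a small sketch. The temporary directory and the `leftover` file are stand-ins for a stale staging path; `demoCleanup` is a hypothetical helper, not driver code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// demoCleanup creates a non-empty temporary directory and returns the
// errors from os.Remove and then os.RemoveAll applied to it.
func demoCleanup() (removeErr, removeAllErr error) {
	dir, err := os.MkdirTemp("", "staging-")
	if err != nil {
		return err, err
	}
	// Simulate a file left behind after an imperfect unmount.
	if err := os.WriteFile(filepath.Join(dir, "leftover"), []byte("x"), 0o600); err != nil {
		os.RemoveAll(dir)
		return err, err
	}
	removeErr = os.Remove(dir)       // fails because the directory is not empty
	removeAllErr = os.RemoveAll(dir) // removes the directory and everything in it
	return removeErr, removeAllErr
}

func main() {
	removeErr, removeAllErr := demoCleanup()
	fmt.Println("os.Remove failed:", removeErr != nil)
	fmt.Println("os.RemoveAll failed:", removeAllErr != nil)
}
```

Because os.RemoveAll deletes contents recursively, it is only safe on a path that is no longer an active mount; otherwise it would delete the mounted data itself.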
2025-12-03 | fix: cleanup volume mutex on self-healing failure in NodePublishVolume | chrislusf | 1 | -0/+2
    Address gemini-code-assist review - when cleanup or re-staging fails during self-healing in NodePublishVolume, remove the volume mutex to avoid leaving stale entries. This maintains consistency with NodeStageVolume's error handling behavior.
2025-12-03 | fix: cleanup staged mount on quota application failure | chrislusf | 1 | -0/+4
    Address CodeRabbit review - if volume.Stage() succeeds but volume.Quota() fails, clean up the staged mount before returning the error to avoid leaving an orphaned FUSE process.
2025-12-03 | fix: propagate errors instead of just logging warnings | chrislusf | 2 | -3/+6
    Address gemini-code-assist review feedback:
    1. Return error from volume.Quota() failure in stageNewVolume - quota failures should fail the staging operation
    2. Return error from cleanupStaleStagingPath() in NodeStageVolume - fail fast if cleanup fails rather than attempting to stage anyway
    3. Return error from cleanupStaleStagingPath() in NodePublishVolume - same fail-fast behavior for consistency
    4. Return error from mount.CleanupMountPoint() in Volume.Unstage() - propagate cleanup errors to caller as expected
2025-12-03 | fix: preserve healthy mounts in NodeStageVolume instead of re-staging | chrislusf | 1 | -13/+13
    Address CodeRabbit review - when a healthy staging path exists after driver restart, rebuild the cache using rebuildVolumeFromStaging() instead of cleaning up and re-staging. This:
    - Maintains consistency with NodePublishVolume behavior
    - Avoids disrupting existing published volumes that are bind-mounted
    - Makes NodeStageVolume idempotent as per CSI spec
2025-12-03 | fix: add nil checks for AccessMode to prevent panic | chrislusf | 1 | -2/+10
    Address CodeRabbit review feedback - add defensive nil checks for GetVolumeCapability() and GetAccessMode() in both isPublishVolumeReadOnly and isVolumeReadOnly to prevent potential nil pointer dereference.
2025-12-03 | refactor: address code review feedback | chrislusf | 3 | -59/+43
    - Handle unexpected stat errors in cleanupStaleStagingPath (high priority)
    - Extract staging logic into stageNewVolume helper method for reuse
    - Extract isReadOnlyAccessMode helper to avoid duplicated read-only checks
    - Remove redundant mountutil.Unmount call (CleanupMountPoint already handles it)
2025-12-03 | fix: add self-healing for volume mount failures after driver restart | chrislusf | 3 | -7/+205
    This addresses issue #203 - CSI Driver Self-Healing for Volume Mount Failures.

    Problem: When the CSI node driver restarts, the in-memory volume cache is lost. Kubelet then directly calls NodePublishVolume (skipping NodeStageVolume), which fails with 'volume hasn't been staged yet' error.

    Solution:
    1. Added isStagingPathHealthy() to detect healthy vs stale/corrupted mounts
    2. Added cleanupStaleStagingPath() to clean up stale mount points
    3. Enhanced NodeStageVolume to clean up stale mounts before staging
    4. Implemented self-healing in NodePublishVolume:
       - If staging path is healthy: rebuild volume cache from existing mount
       - If staging path is stale: clean up and re-stage automatically
    5. Updated Volume.Unstage to handle rebuilt volumes without unmounter

    Benefits:
    - Automatic recovery after CSI driver restarts
    - No manual intervention required (no kubelet/pod restarts needed)
    - Handles both live and dead FUSE mount scenarios
    - Backward compatible with normal operations

    Fixes #203
2025-11-28 | Resolve merge conflicts in go.mod and go.sum (tag: v1.3.4) | Chris Lu | 3 | -7/+7
2025-11-28 | disk full when building | Chris Lu | 3 | -68/+76