diff options
| author | Chris Lu <chrislusf@users.noreply.github.com> | 2025-12-03 21:12:19 -0800 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2025-12-03 21:12:19 -0800 |
| commit | 39ba19eea6d47a5d35c67064d560fb569c6c5baf (patch) | |
| tree | 6213a3d8acd5698964eb200555cd276c3c3285fe /weed/filer/empty_folder_cleanup/cleanup_queue_test.go | |
| parent | 268cc84e8c8629c4824d4cc30c79cc8dac0a5142 (diff) | |
| download | seaweedfs-39ba19eea6d47a5d35c67064d560fb569c6c5baf.tar.xz seaweedfs-39ba19eea6d47a5d35c67064d560fb569c6c5baf.zip | |
filer: async empty folder cleanup via metadata events (#7614)
* filer: async empty folder cleanup via metadata events
Implements asynchronous empty folder cleanup when files are deleted in S3.
Key changes:
1. EmptyFolderCleaner - New component that handles folder cleanup:
- Uses consistent hashing (LockRing) to determine folder ownership
- Each filer owns specific folders, avoiding duplicate cleanup work
- Debounces delete events (10s delay) to batch multiple deletes
- Caches rough folder counts to skip unnecessary checks
- Cancels pending cleanup when new files are created
- Handles both file and subdirectory deletions
2. Integration with metadata events:
- Listens to both local and remote filer metadata events
- Processes create/delete/rename events to track folder state
- Only processes folders under /buckets/<bucket>/...
3. Removed synchronous empty folder cleanup from S3 handlers:
- DeleteObjectHandler no longer calls DoDeleteEmptyParentDirectories
- DeleteMultipleObjectsHandler no longer tracks/cleans directories
- Cleanup now happens asynchronously via metadata events
Benefits:
- Non-blocking: S3 delete requests return immediately
- Coordinated: Only one filer (the owner) cleans each folder
- Efficient: Batching and caching reduce unnecessary checks
- Event-driven: Folder deletion triggers parent folder check automatically
* filer: add CleanupQueue data structure for deduplicated folder cleanup
CleanupQueue uses a linked list for FIFO ordering and a hashmap for O(1)
deduplication. Processing is triggered when:
- Queue size reaches maxSize (default 1000), OR
- Oldest item exceeds maxAge (default 10 minutes)
Key features:
- O(1) Add, Remove, Pop, Contains operations
- Duplicate folders are ignored (keeps original position/time)
- Testable with injectable time function
- Thread-safe with mutex protection
* filer: use CleanupQueue for empty folder cleanup
Replace timer-per-folder approach with queue-based processing:
- Use CleanupQueue for deduplication and ordered processing
- Process queue when full (1000 items) or oldest item exceeds 10 minutes
- Background processor checks queue every 10 seconds
- Remove from queue on create events to cancel pending cleanup
Benefits:
- Bounded memory: queue has max size, not unlimited timers
- Efficient: O(1) add/remove/contains operations
- Batch processing: handle many folders efficiently
- Better for high-volume delete scenarios
* filer: CleanupQueue.Add moves duplicate to back with updated time
When adding a folder that already exists in the queue:
- Remove it from its current position
- Add it to the back of the queue
- Update the queue time to current time
This ensures that folders with recent delete activity are processed
later, giving more time for additional deletes to occur.
* filer: CleanupQueue uses event time and inserts in sorted order
Changes:
- Add() now takes eventTime parameter instead of using current time
- Insert items in time-sorted order (oldest at front) to handle out-of-order events
- When updating duplicate with newer time, reposition to maintain sort order
- Ignore updates with older time (keep existing later time)
This ensures proper ordering when processing events from distributed filers
where event arrival order may not match event occurrence order.
* filer: remove unused CleanupQueue functions (SetNowFunc, GetAll)
Removed test-only functions:
- SetNowFunc: tests now use real time with past event times
- GetAll: tests now use Pop() to verify order
Kept functions used in production:
- Peek: used in filer_notify_read.go
- OldestAge: used in empty_folder_cleaner.go logging
* filer: initialize cache entry on first delete/create event
Previously, roughCount was only updated if the cache entry already
existed, but entries were only created during executeCleanup. This
meant delete/create events before the first cleanup didn't track
the count.
Now create the cache entry on first event, so roughCount properly
tracks all changes from the start.
* filer: skip adding to cleanup queue if roughCount > 0
If the cached roughCount indicates there are still items in the
folder, don't bother adding it to the cleanup queue. This avoids
unnecessary queue entries and reduces wasted cleanup checks.
* filer: don't create cache entry on create event
Only update roughCount if the folder is already being tracked.
New folders don't need tracking until we see a delete event.
* filer: move empty folder cleanup to its own package
- Created weed/filer/empty_folder_cleanup package
- Defined FilerOperations interface to break circular dependency
- Added CountDirectoryEntries method to Filer
- Exported IsUnderPath and IsUnderBucketPath helper functions
* filer: make isUnderPath and isUnderBucketPath private
These helpers are only used within the empty_folder_cleanup package.
Diffstat (limited to 'weed/filer/empty_folder_cleanup/cleanup_queue_test.go')
| -rw-r--r-- | weed/filer/empty_folder_cleanup/cleanup_queue_test.go | 370 |
1 files changed, 370 insertions, 0 deletions
diff --git a/weed/filer/empty_folder_cleanup/cleanup_queue_test.go b/weed/filer/empty_folder_cleanup/cleanup_queue_test.go new file mode 100644 index 000000000..eda1c3633 --- /dev/null +++ b/weed/filer/empty_folder_cleanup/cleanup_queue_test.go @@ -0,0 +1,370 @@ +package empty_folder_cleanup + +import ( + "testing" + "time" +) + +func TestCleanupQueue_Add(t *testing.T) { + q := NewCleanupQueue(100, 10*time.Minute) + now := time.Now() + + // Add first item + if !q.Add("/buckets/b1/folder1", now) { + t.Error("expected Add to return true for new item") + } + if q.Len() != 1 { + t.Errorf("expected len 1, got %d", q.Len()) + } + + // Add second item with later time + if !q.Add("/buckets/b1/folder2", now.Add(1*time.Second)) { + t.Error("expected Add to return true for new item") + } + if q.Len() != 2 { + t.Errorf("expected len 2, got %d", q.Len()) + } + + // Add duplicate with newer time - should update and reposition + if q.Add("/buckets/b1/folder1", now.Add(2*time.Second)) { + t.Error("expected Add to return false for existing item") + } + if q.Len() != 2 { + t.Errorf("expected len 2 after duplicate, got %d", q.Len()) + } + + // folder1 should now be at the back (newer time) - verify by popping + folder1, _ := q.Pop() + folder2, _ := q.Pop() + if folder1 != "/buckets/b1/folder2" || folder2 != "/buckets/b1/folder1" { + t.Errorf("expected folder1 to be moved to back, got %s, %s", folder1, folder2) + } +} + +func TestCleanupQueue_Add_OutOfOrder(t *testing.T) { + q := NewCleanupQueue(100, 10*time.Minute) + baseTime := time.Now() + + // Add items out of order + q.Add("/buckets/b1/folder3", baseTime.Add(3*time.Second)) + q.Add("/buckets/b1/folder1", baseTime.Add(1*time.Second)) + q.Add("/buckets/b1/folder2", baseTime.Add(2*time.Second)) + + // Items should be in time order (oldest first) - verify by popping + expected := []string{"/buckets/b1/folder1", "/buckets/b1/folder2", "/buckets/b1/folder3"} + for i, exp := range expected { + folder, ok := q.Pop() + if !ok || folder != exp { + t.Errorf("at index %d: expected %s, got %s", i, exp, folder) + } + } +} + +func TestCleanupQueue_Add_DuplicateWithOlderTime(t *testing.T) { + q := NewCleanupQueue(100, 10*time.Minute) + baseTime := time.Now() + + // Add folder at t=5 + q.Add("/buckets/b1/folder1", baseTime.Add(5*time.Second)) + + // Try to add same folder with older time - should NOT update + q.Add("/buckets/b1/folder1", baseTime.Add(2*time.Second)) + + // Time should remain at t=5 + _, queueTime, _ := q.Peek() + if queueTime != baseTime.Add(5*time.Second) { + t.Errorf("expected time to remain unchanged, got %v", queueTime) + } +} + +func TestCleanupQueue_Remove(t *testing.T) { + q := NewCleanupQueue(100, 10*time.Minute) + now := time.Now() + + q.Add("/buckets/b1/folder1", now) + q.Add("/buckets/b1/folder2", now.Add(1*time.Second)) + q.Add("/buckets/b1/folder3", now.Add(2*time.Second)) + + // Remove middle item + if !q.Remove("/buckets/b1/folder2") { + t.Error("expected Remove to return true for existing item") + } + if q.Len() != 2 { + t.Errorf("expected len 2, got %d", q.Len()) + } + if q.Contains("/buckets/b1/folder2") { + t.Error("removed item should not be in queue") + } + + // Remove non-existent item + if q.Remove("/buckets/b1/nonexistent") { + t.Error("expected Remove to return false for non-existent item") + } + + // Verify order is preserved by popping + folder1, _ := q.Pop() + folder3, _ := q.Pop() + if folder1 != "/buckets/b1/folder1" || folder3 != "/buckets/b1/folder3" { + t.Errorf("unexpected order: %s, %s", folder1, folder3) + } +} + +func TestCleanupQueue_Pop(t *testing.T) { + q := NewCleanupQueue(100, 10*time.Minute) + now := time.Now() + + // Pop from empty queue + folder, ok := q.Pop() + if ok { + t.Error("expected Pop to return false for empty queue") + } + if folder != "" { + t.Errorf("expected empty folder, got %s", folder) + } + + // Add items and pop in order + q.Add("/buckets/b1/folder1", now) + q.Add("/buckets/b1/folder2", now.Add(1*time.Second)) + q.Add("/buckets/b1/folder3", now.Add(2*time.Second)) + + folder, ok = q.Pop() + if !ok || folder != "/buckets/b1/folder1" { + t.Errorf("expected folder1, got %s (ok=%v)", folder, ok) + } + + folder, ok = q.Pop() + if !ok || folder != "/buckets/b1/folder2" { + t.Errorf("expected folder2, got %s (ok=%v)", folder, ok) + } + + folder, ok = q.Pop() + if !ok || folder != "/buckets/b1/folder3" { + t.Errorf("expected folder3, got %s (ok=%v)", folder, ok) + } + + // Queue should be empty now + if q.Len() != 0 { + t.Errorf("expected empty queue, got len %d", q.Len()) + } +} + +func TestCleanupQueue_Peek(t *testing.T) { + q := NewCleanupQueue(100, 10*time.Minute) + now := time.Now() + + // Peek empty queue + folder, _, ok := q.Peek() + if ok { + t.Error("expected Peek to return false for empty queue") + } + + // Add item and peek + q.Add("/buckets/b1/folder1", now) + folder, queueTime, ok := q.Peek() + if !ok || folder != "/buckets/b1/folder1" { + t.Errorf("expected folder1, got %s (ok=%v)", folder, ok) + } + if queueTime != now { + t.Errorf("expected queue time %v, got %v", now, queueTime) + } + + // Peek should not remove item + if q.Len() != 1 { + t.Errorf("Peek should not remove item, len=%d", q.Len()) + } +} + +func TestCleanupQueue_Contains(t *testing.T) { + q := NewCleanupQueue(100, 10*time.Minute) + now := time.Now() + + q.Add("/buckets/b1/folder1", now) + + if !q.Contains("/buckets/b1/folder1") { + t.Error("expected Contains to return true") + } + if q.Contains("/buckets/b1/folder2") { + t.Error("expected Contains to return false for non-existent") + } +} + +func TestCleanupQueue_ShouldProcess_MaxSize(t *testing.T) { + q := NewCleanupQueue(3, 10*time.Minute) + now := time.Now() + + // Empty queue + if q.ShouldProcess() { + t.Error("empty queue should not need processing") + } + + // Add items below max + q.Add("/buckets/b1/folder1", now) + q.Add("/buckets/b1/folder2", now.Add(1*time.Second)) + if q.ShouldProcess() { + t.Error("queue below max should not need processing") + } + + // Add item to reach max + q.Add("/buckets/b1/folder3", now.Add(2*time.Second)) + if !q.ShouldProcess() { + t.Error("queue at max should need processing") + } +} + +func TestCleanupQueue_ShouldProcess_MaxAge(t *testing.T) { + q := NewCleanupQueue(100, 100*time.Millisecond) // Short max age for testing + + // Add item with old event time + oldTime := time.Now().Add(-1 * time.Second) // 1 second ago + q.Add("/buckets/b1/folder1", oldTime) + + // Item is older than maxAge, should need processing + if !q.ShouldProcess() { + t.Error("old item should trigger processing") + } + + // Clear and add fresh item + q.Clear() + q.Add("/buckets/b1/folder2", time.Now()) + + // Fresh item should not trigger processing + if q.ShouldProcess() { + t.Error("fresh item should not trigger processing") + } +} + +func TestCleanupQueue_Clear(t *testing.T) { + q := NewCleanupQueue(100, 10*time.Minute) + now := time.Now() + + q.Add("/buckets/b1/folder1", now) + q.Add("/buckets/b1/folder2", now.Add(1*time.Second)) + q.Add("/buckets/b1/folder3", now.Add(2*time.Second)) + + q.Clear() + + if q.Len() != 0 { + t.Errorf("expected empty queue after Clear, got len %d", q.Len()) + } + if q.Contains("/buckets/b1/folder1") { + t.Error("queue should not contain items after Clear") + } +} + +func TestCleanupQueue_OldestAge(t *testing.T) { + q := NewCleanupQueue(100, 10*time.Minute) + + // Empty queue + if q.OldestAge() != 0 { + t.Error("empty queue should have zero oldest age") + } + + // Add item with time in the past + oldTime := time.Now().Add(-5 * time.Minute) + q.Add("/buckets/b1/folder1", oldTime) + + // Age should be approximately 5 minutes + age := q.OldestAge() + if age < 4*time.Minute || age > 6*time.Minute { + t.Errorf("expected ~5m age, got %v", age) + } +} + +func TestCleanupQueue_TimeOrder(t *testing.T) { + q := NewCleanupQueue(100, 10*time.Minute) + baseTime := time.Now() + + // Add items in order + items := []string{ + "/buckets/b1/a", + "/buckets/b1/b", + "/buckets/b1/c", + "/buckets/b1/d", + "/buckets/b1/e", + } + for i, item := range items { + q.Add(item, baseTime.Add(time.Duration(i)*time.Second)) + } + + // Pop should return in time order + for i, expected := range items { + got, ok := q.Pop() + if !ok { + t.Errorf("Pop %d: expected item, got empty", i) + } + if got != expected { + t.Errorf("Pop %d: expected %s, got %s", i, expected, got) + } + } +} + +func TestCleanupQueue_DuplicateWithNewerTime(t *testing.T) { + q := NewCleanupQueue(100, 10*time.Minute) + baseTime := time.Now() + + // Add items + q.Add("/buckets/b1/folder1", baseTime) + q.Add("/buckets/b1/folder2", baseTime.Add(1*time.Second)) + q.Add("/buckets/b1/folder3", baseTime.Add(2*time.Second)) + + // Add duplicate with newer time - should update and reposition + q.Add("/buckets/b1/folder1", baseTime.Add(3*time.Second)) + + // folder1 should now be at the back (newest time) - verify by popping + expected := []string{"/buckets/b1/folder2", "/buckets/b1/folder3", "/buckets/b1/folder1"} + for i, exp := range expected { + folder, ok := q.Pop() + if !ok || folder != exp { + t.Errorf("at index %d: expected %s, got %s", i, exp, folder) + } + } +} + +func TestCleanupQueue_Concurrent(t *testing.T) { + q := NewCleanupQueue(1000, 10*time.Minute) + done := make(chan bool) + now := time.Now() + + // Concurrent adds + go func() { + for i := 0; i < 100; i++ { + q.Add("/buckets/b1/folder"+string(rune('A'+i%26)), now.Add(time.Duration(i)*time.Millisecond)) + } + done <- true + }() + + // Concurrent removes + go func() { + for i := 0; i < 50; i++ { + q.Remove("/buckets/b1/folder" + string(rune('A'+i%26))) + } + done <- true + }() + + // Concurrent pops + go func() { + for i := 0; i < 30; i++ { + q.Pop() + } + done <- true + }() + + // Concurrent reads + go func() { + for i := 0; i < 100; i++ { + q.Len() + q.Contains("/buckets/b1/folderA") + q.ShouldProcess() + } + done <- true + }() + + // Wait for all goroutines + for i := 0; i < 4; i++ { + <-done + } + + // Just verify no panic occurred and queue is in consistent state + _ = q.Len() +} + |
