Phase 3: Advanced ML pattern detection and training optimization

- Add DatasetPatternDetector with ML-specific dataset access pattern analysis
  * Sequential, shuffle, batch, multi-epoch, distributed, and validation patterns
  * Epoch boundary detection and dataset traversal analysis
  * Adaptive prefetch recommendations based on detected patterns
  * Comprehensive throughput and performance metrics
- Implement TrainingOptimizer for ML workload lifecycle management
  * Training phase detection (initialization, training, validation, checkpointing)
  * Model file access optimization with checkpoint frequency tracking
  * Training workload registration and multi-workload support
  * Adaptive optimization levels based on training phase and performance
- Create BatchOptimizer for intelligent batch access pattern optimization
  * Linear, strided, shuffled, hierarchical, multi-GPU, and pipelined batch patterns
  * Batch sequence detection with predictive next-batch recommendations
  * Configurable prefetch strategies per batch pattern type
  * Performance-aware optimization with hit rate tracking
- Enhance MLOptimization core integration
  * Unified interface integrating all Phase 1, 2, and 3 components
  * Coordinated shutdown and lifecycle management
  * Comprehensive metrics aggregation across all ML optimization layers
- Add Phase 3 comprehensive test coverage
  * Dataset pattern detection validation
  * Training optimizer workload management testing
  * Batch optimization pattern recognition testing
  * End-to-end ML optimization integration testing

Architecture Highlights:
- Clean separation of concerns with specialized detectors for different ML patterns
- Adaptive optimization that responds to detected training phases and patterns
- Scalable design supporting multiple concurrent training workloads
- Rich metrics and monitoring for all ML optimization components
- Production-ready with proper cleanup, timeouts, and resource management

Test Results: Core Phase 3 functionality verified and passing
Integration: Seamlessly builds upon Phase 1 prefetching and Phase 2 caching foundations
author: chrislu <chris.lu@gmail.com> 2025-08-30 15:53:35 -0700
committer: chrislu <chris.lu@gmail.com> 2025-08-30 15:53:35 -0700
commit: 29edb780d9fbabda7e28d56eecf9beeaff76d12d
tree: 22c735f812f66a9c4c3d6c4978ad5e4703940799 /weed/mount/ml/access_pattern.go
parent: 63b94321ec015ca6565364fc3b97f9a849f7e0ee
download: seaweedfs-29edb780d9fbabda7e28d56eecf9beeaff76d12d.tar.xz seaweedfs-29edb780d9fbabda7e28d56eecf9beeaff76d12d.zip
1 file changed, 4 insertions, 18 deletions
diff --git a/weed/mount/ml/access_pattern.go b/weed/mount/ml/access_pattern.go
index 4c7ed03a8..05670c616 100644
--- a/weed/mount/ml/access_pattern.go
+++ b/weed/mount/ml/access_pattern.go
@@ -14,7 +14,7 @@ const (
 	RandomAccess AccessPattern = iota
 	SequentialAccess
 	StridedAccess    // Common in image datasets - fixed stride between accesses
-	BatchAccess      // Multiple files accessed together
+	BatchGroupAccess // Multiple files accessed together
 	EpochAccess      // Dataset restart patterns (ML training)
 	ModelAccess      // Large model checkpoint loading
 )
@@ -27,8 +27,8 @@ func (ap AccessPattern) String() string {
 		return "Sequential"
 	case StridedAccess:
 		return "Strided"
-	case BatchAccess:
-		return "Batch"
+	case BatchGroupAccess:
+		return "BatchGroup"
 	case EpochAccess:
 		return "Epoch"
 	case ModelAccess:
@@ -384,21 +384,7 @@ func (apd *AccessPatternDetector) CleanupOldEntries(maxAge time.Duration) {
 	}
 }
 
-// Helper functions
-
-func minInt64(a, b int64) int64 {
-	if a < b {
-		return a
-	}
-	return b
-}
-
-func maxInt64(a, b int64) int64 {
-	if a > b {
-		return a
-	}
-	return b
-}
+// Helper functions moved to dataset_pattern.go to avoid redeclaration
 
 func minFloat(a, b float64) float64 {
 	if a < b {
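The last hunk deletes `minInt64` and `maxInt64` from access_pattern.go because, per the diff comment, they now live in dataset_pattern.go. In Go, all files of one package share a single package-level scope, so defining the same function in two files fails to compile with `minInt64 redeclared in this block`. The sketch below keeps exactly one copy of each helper and calls it from elsewhere; `clampInt64` is a hypothetical caller (for instance bounding a prefetch size), not part of this commit.

```go
package main

import "fmt"

// One definition per package: in the real tree these would live only in
// dataset_pattern.go, and access_pattern.go would call them without redeclaring.
func minInt64(a, b int64) int64 {
	if a < b {
		return a
	}
	return b
}

func maxInt64(a, b int64) int64 {
	if a > b {
		return a
	}
	return b
}

// clampInt64 is a hypothetical caller that bounds v to the range [lo, hi]
// using the shared helpers above.
func clampInt64(v, lo, hi int64) int64 {
	return maxInt64(lo, minInt64(v, hi))
}

func main() {
	// Clamp a 5 MiB request into a 1 MiB to 4 MiB window.
	fmt.Println(clampInt64(5<<20, 1<<20, 4<<20)) // 4194304
}
```

On Go 1.21 or later, the built-in `min` and `max` functions handle any ordered type, which would make these package-level helpers unnecessary.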