diff options
Diffstat (limited to 'test/kafka/kafka-client-loadtest/README.md')
| -rw-r--r-- | test/kafka/kafka-client-loadtest/README.md | 397 |
1 files changed, 397 insertions, 0 deletions
diff --git a/test/kafka/kafka-client-loadtest/README.md b/test/kafka/kafka-client-loadtest/README.md new file mode 100644 index 000000000..4f465a21b --- /dev/null +++ b/test/kafka/kafka-client-loadtest/README.md @@ -0,0 +1,397 @@ +# Kafka Client Load Test for SeaweedFS + +This comprehensive load testing suite validates the SeaweedFS MQ stack using real Kafka client libraries. Unlike the existing SMQ tests, this uses actual Kafka clients (`sarama` and `confluent-kafka-go`) to test the complete integration through: + +- **Kafka Clients** → **SeaweedFS Kafka Gateway** → **SeaweedFS MQ Broker** → **SeaweedFS Storage** + +## Architecture + +``` +┌─────────────────┐ ┌──────────────────┐ ┌─────────────────────┐ +│ Kafka Client │ │ Kafka Gateway │ │ SeaweedFS MQ │ +│ Load Test │───▶│ (Port 9093) │───▶│ Broker │ +│ - Producers │ │ │ │ │ +│ - Consumers │ │ Protocol │ │ Topic Management │ +│ │ │ Translation │ │ Message Storage │ +└─────────────────┘ └──────────────────┘ └─────────────────────┘ + │ + ▼ + ┌─────────────────────┐ + │ SeaweedFS Storage │ + │ - Master │ + │ - Volume Server │ + │ - Filer │ + └─────────────────────┘ +``` + +## Features + +### 🚀 **Multiple Test Modes** +- **Producer-only**: Pure message production testing +- **Consumer-only**: Consumption from existing topics +- **Comprehensive**: Full producer + consumer load testing + +### 📊 **Rich Metrics & Monitoring** +- Prometheus metrics collection +- Grafana dashboards +- Real-time throughput and latency tracking +- Consumer lag monitoring +- Error rate analysis + +### 🔧 **Configurable Test Scenarios** +- **Quick Test**: 1-minute smoke test +- **Standard Test**: 5-minute medium load +- **Stress Test**: 10-minute high load +- **Endurance Test**: 30-minute sustained load +- **Custom**: Fully configurable parameters + +### 📈 **Message Types** +- **JSON**: Structured test messages +- **Avro**: Schema Registry integration +- **Binary**: Raw binary payloads + +### 🛠 **Kafka Client Support** +- **Sarama**: Native Go Kafka client +- **Confluent**: Official Confluent Go client +- Schema Registry integration +- Consumer group management + +## Quick Start + +### Prerequisites +- Docker & Docker Compose +- Make (optional, but recommended) + +### 1. Run Default Test +```bash +make test +``` +This runs a 5-minute comprehensive test with 10 producers and 5 consumers. + +### 2. Quick Smoke Test +```bash +make quick-test +``` +1-minute test with minimal load for validation. + +### 3. Stress Test +```bash +make stress-test +``` +10-minute high-throughput test with 20 producers and 10 consumers. + +### 4. Test with Monitoring +```bash +make test-with-monitoring +``` +Includes Prometheus + Grafana dashboards for real-time monitoring. + +## Detailed Usage + +### Manual Control +```bash +# Start infrastructure only +make start + +# Run load test against running infrastructure +make test TEST_MODE=comprehensive TEST_DURATION=10m + +# Stop everything +make stop + +# Clean up all resources +make clean +``` + +### Using Scripts Directly +```bash +# Full control with the main script +./scripts/run-loadtest.sh start -m comprehensive -d 10m --monitoring + +# Check service health +./scripts/wait-for-services.sh check + +# Setup monitoring configurations +./scripts/setup-monitoring.sh +``` + +### Environment Variables +```bash +export TEST_MODE=comprehensive # producer, consumer, comprehensive +export TEST_DURATION=300s # Test duration +export PRODUCER_COUNT=10 # Number of producer instances +export CONSUMER_COUNT=5 # Number of consumer instances +export MESSAGE_RATE=1000 # Messages/second per producer +export MESSAGE_SIZE=1024 # Message size in bytes +export TOPIC_COUNT=5 # Number of topics to create +export PARTITIONS_PER_TOPIC=3 # Partitions per topic + +make test +``` + +## Configuration + +### Main Configuration File +Edit `config/loadtest.yaml` to customize: + +- **Kafka Settings**: Bootstrap servers, security, timeouts +- **Producer Config**: Batching, compression, acknowledgments +- **Consumer Config**: Group settings, fetch parameters +- **Message Settings**: Size, format (JSON/Avro/Binary) +- **Schema Registry**: Avro/Protobuf schema validation +- **Metrics**: Prometheus collection intervals +- **Test Scenarios**: Predefined load patterns + +### Example Custom Configuration +```yaml +test_mode: "comprehensive" +duration: "600s" # 10 minutes + +producers: + count: 15 + message_rate: 2000 + message_size: 2048 + compression_type: "snappy" + acks: "all" + +consumers: + count: 8 + group_prefix: "high-load-group" + max_poll_records: 1000 + +topics: + count: 10 + partitions: 6 + replication_factor: 1 +``` + +## Test Scenarios + +### 1. Producer Performance Test +```bash +make producer-test TEST_DURATION=10m PRODUCER_COUNT=20 MESSAGE_RATE=3000 +``` +Tests maximum message production throughput. + +### 2. Consumer Performance Test +```bash +# First produce messages +make producer-test TEST_DURATION=5m + +# Then test consumption +make consumer-test TEST_DURATION=10m CONSUMER_COUNT=15 +``` + +### 3. Schema Registry Integration +```bash +# Enable schemas in config/loadtest.yaml +schemas: + enabled: true + +make test +``` +Tests Avro message serialization through Schema Registry. + +### 4. High Availability Test +```bash +# Test with container restarts during load +make test TEST_DURATION=20m & +sleep 300 +docker restart kafka-gateway +``` + +## Monitoring & Metrics + +### Real-Time Dashboards +When monitoring is enabled: +- **Prometheus**: http://localhost:9090 +- **Grafana**: http://localhost:3000 (admin/admin) + +### Key Metrics Tracked +- **Throughput**: Messages/second, MB/second +- **Latency**: End-to-end message latency percentiles +- **Errors**: Producer/consumer error rates +- **Consumer Lag**: Per-partition lag monitoring +- **Resource Usage**: CPU, memory, disk I/O + +### Grafana Dashboards +- **Kafka Load Test**: Comprehensive test metrics +- **SeaweedFS Cluster**: Storage system health +- **Custom Dashboards**: Extensible monitoring + +## Advanced Features + +### Schema Registry Testing +```bash +# Test Avro message serialization +export KAFKA_VALUE_TYPE=avro +make test +``` + +The load test includes: +- Schema registration +- Avro message encoding/decoding +- Schema evolution testing +- Compatibility validation + +### Multi-Client Testing +The test supports both Sarama and Confluent clients: +```go +// Configure in producer/consumer code +useConfluent := true // Switch client implementation +``` + +### Consumer Group Rebalancing +- Automatic consumer group management +- Partition rebalancing simulation +- Consumer failure recovery testing + +### Chaos Testing +```yaml +chaos: + enabled: true + producer_failure_rate: 0.01 + consumer_failure_rate: 0.01 + network_partition_probability: 0.001 +``` + +## Troubleshooting + +### Common Issues + +#### Services Not Starting +```bash +# Check service health +make health-check + +# View detailed logs +make logs + +# Debug mode +make debug +``` + +#### Low Throughput +- Increase `MESSAGE_RATE` and `PRODUCER_COUNT` +- Adjust `batch_size` and `linger_ms` in config +- Check consumer `max_poll_records` setting + +#### High Latency +- Reduce `linger_ms` for lower latency +- Adjust `acks` setting (0, 1, or "all") +- Monitor consumer lag + +#### Memory Issues +```bash +# Reduce concurrent clients +make test PRODUCER_COUNT=5 CONSUMER_COUNT=3 + +# Adjust message size +make test MESSAGE_SIZE=512 +``` + +### Debug Commands +```bash +# Execute shell in containers +make exec-master +make exec-filer +make exec-gateway + +# Attach to load test +make attach-loadtest + +# View real-time stats +curl http://localhost:8080/stats +``` + +## Development + +### Building from Source +```bash +# Set up development environment +make dev-env + +# Build load test binary +make build + +# Run tests locally (requires Go 1.21+) +cd cmd/loadtest && go run main.go -config ../../config/loadtest.yaml +``` + +### Extending the Tests +1. **Add new message formats** in `internal/producer/` +2. **Add custom metrics** in `internal/metrics/` +3. **Create new test scenarios** in `config/loadtest.yaml` +4. **Add monitoring panels** in `monitoring/grafana/dashboards/` + +### Contributing +1. Fork the repository +2. Create a feature branch +3. Add tests for new functionality +4. Ensure all tests pass: `make test` +5. Submit a pull request + +## Performance Benchmarks + +### Expected Performance (on typical hardware) + +| Scenario | Producers | Consumers | Rate (msg/s) | Latency (p95) | +|----------|-----------|-----------|--------------|---------------| +| Quick | 2 | 2 | 200 | <10ms | +| Standard | 5 | 3 | 2,500 | <20ms | +| Stress | 20 | 10 | 40,000 | <50ms | +| Endurance| 10 | 5 | 10,000 | <30ms | + +*Results vary based on hardware, network, and SeaweedFS configuration* + +### Tuning for Maximum Performance +```yaml +producers: + batch_size: 1000 + linger_ms: 10 + compression_type: "lz4" + acks: "1" # Balance between speed and durability + +consumers: + max_poll_records: 5000 + fetch_min_bytes: 1048576 # 1MB + fetch_max_wait_ms: 100 +``` + +## Comparison with Existing Tests + +| Feature | SMQ Tests | **Kafka Client Load Test** | +|---------|-----------|----------------------------| +| Protocol | SMQ (SeaweedFS native) | **Kafka (industry standard)** | +| Clients | SMQ clients | **Real Kafka clients (Sarama, Confluent)** | +| Schema Registry | ❌ | **✅ Full Avro/Protobuf support** | +| Consumer Groups | Basic | **✅ Full Kafka consumer group features** | +| Monitoring | Basic | **✅ Prometheus + Grafana dashboards** | +| Test Scenarios | Limited | **✅ Multiple predefined scenarios** | +| Real-world | Synthetic | **✅ Production-like workloads** | + +This load test provides comprehensive validation of the SeaweedFS Kafka Gateway using real-world Kafka clients and protocols. + +--- + +## Quick Reference + +```bash +# Essential Commands +make help # Show all available commands +make test # Run default comprehensive test +make quick-test # 1-minute smoke test +make stress-test # High-load stress test +make test-with-monitoring # Include Grafana dashboards +make clean # Clean up all resources + +# Monitoring +make monitor # Start Prometheus + Grafana +# → http://localhost:9090 (Prometheus) +# → http://localhost:3000 (Grafana, admin/admin) + +# Advanced +make benchmark # Run full benchmark suite +make health-check # Validate service health +make validate-setup # Check configuration +``` |
