# Kafka Client Load Test for SeaweedFS
This load-testing suite validates the SeaweedFS MQ stack with real Kafka client libraries. Unlike the existing SMQ tests, it drives the complete integration path using actual Kafka clients (`sarama` and `confluent-kafka-go`):
- **Kafka Clients** → **SeaweedFS Kafka Gateway** → **SeaweedFS MQ Broker** → **SeaweedFS Storage**
## Architecture
```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────────┐
│  Kafka Client   │    │  Kafka Gateway   │    │   SeaweedFS MQ      │
│  Load Test      │───▶│  (Port 9093)     │───▶│   Broker            │
│  - Producers    │    │                  │    │                     │
│  - Consumers    │    │  Protocol        │    │  Topic Management   │
│                 │    │  Translation     │    │  Message Storage    │
└─────────────────┘    └──────────────────┘    └─────────────────────┘
                                                          │
                                                          ▼
                                               ┌─────────────────────┐
                                               │  SeaweedFS Storage  │
                                               │  - Master           │
                                               │  - Volume Server    │
                                               │  - Filer            │
                                               └─────────────────────┘
```
## Features
### 🚀 **Multiple Test Modes**
- **Producer-only**: Pure message production testing
- **Consumer-only**: Consumption from existing topics
- **Comprehensive**: Full producer + consumer load testing
### 📊 **Rich Metrics & Monitoring**
- Prometheus metrics collection
- Grafana dashboards
- Real-time throughput and latency tracking
- Consumer lag monitoring
- Error rate analysis
### 🔧 **Configurable Test Scenarios**
- **Quick Test**: 1-minute smoke test
- **Standard Test**: 5-minute medium load
- **Stress Test**: 10-minute high load
- **Endurance Test**: 30-minute sustained load
- **Custom**: Fully configurable parameters
### 📈 **Message Types**
- **JSON**: Structured test messages
- **Avro**: Schema Registry integration
- **Binary**: Raw binary payloads
### 🛠 **Kafka Client Support**
- **Sarama**: Native Go Kafka client
- **Confluent**: Official Confluent Go client
- Schema Registry integration
- Consumer group management
## Quick Start
### Prerequisites
- Docker & Docker Compose
- Make (optional, but recommended)
### 1. Run Default Test
```bash
make test
```
This runs a 5-minute comprehensive test with 10 producers and 5 consumers.
### 2. Quick Smoke Test
```bash
make quick-test
```
1-minute test with minimal load for validation.
### 3. Stress Test
```bash
make stress-test
```
10-minute high-throughput test with 20 producers and 10 consumers.
### 4. Test with Monitoring
```bash
make test-with-monitoring
```
Includes Prometheus + Grafana dashboards for real-time monitoring.
## Detailed Usage
### Manual Control
```bash
# Start infrastructure only
make start
# Run load test against running infrastructure
make test TEST_MODE=comprehensive TEST_DURATION=10m
# Stop everything
make stop
# Clean up all resources
make clean
```
### Using Scripts Directly
```bash
# Full control with the main script
./scripts/run-loadtest.sh start -m comprehensive -d 10m --monitoring
# Check service health
./scripts/wait-for-services.sh check
# Setup monitoring configurations
./scripts/setup-monitoring.sh
```
### Environment Variables
```bash
export TEST_MODE=comprehensive # producer, consumer, comprehensive
export TEST_DURATION=300s # Test duration
export PRODUCER_COUNT=10 # Number of producer instances
export CONSUMER_COUNT=5 # Number of consumer instances
export MESSAGE_RATE=1000 # Messages/second per producer
export MESSAGE_SIZE=1024 # Message size in bytes
export TOPIC_COUNT=5 # Number of topics to create
export PARTITIONS_PER_TOPIC=3 # Partitions per topic
make test
```
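As a sanity check before launching a run, the aggregate load implied by these settings is simply producers × rate (messages/s), and that times message size (bytes/s). A minimal sketch of the arithmetic (the function name is illustrative, not taken from the load-test source):

```go
package main

import "fmt"

// aggregateLoad estimates total cluster load from per-producer settings,
// mirroring PRODUCER_COUNT, MESSAGE_RATE, and MESSAGE_SIZE above.
func aggregateLoad(producers, msgRate, msgSize int) (msgsPerSec, bytesPerSec int) {
	msgsPerSec = producers * msgRate
	bytesPerSec = msgsPerSec * msgSize
	return
}

func main() {
	// Defaults from the environment variables above: 10 producers,
	// 1000 msg/s each, 1024-byte messages.
	msgs, bytes := aggregateLoad(10, 1000, 1024)
	fmt.Printf("aggregate: %d msg/s, %.1f MiB/s\n", msgs, float64(bytes)/(1<<20))
	// → aggregate: 10000 msg/s, 9.8 MiB/s
}
```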
## Configuration
### Main Configuration File
Edit `config/loadtest.yaml` to customize:
- **Kafka Settings**: Bootstrap servers, security, timeouts
- **Producer Config**: Batching, compression, acknowledgments
- **Consumer Config**: Group settings, fetch parameters
- **Message Settings**: Size, format (JSON/Avro/Binary)
- **Schema Registry**: Avro/Protobuf schema validation
- **Metrics**: Prometheus collection intervals
- **Test Scenarios**: Predefined load patterns
### Example Custom Configuration
```yaml
test_mode: "comprehensive"
duration: "600s"  # 10 minutes

producers:
  count: 15
  message_rate: 2000
  message_size: 2048
  compression_type: "snappy"
  acks: "all"

consumers:
  count: 8
  group_prefix: "high-load-group"
  max_poll_records: 1000

topics:
  count: 10
  partitions: 6
  replication_factor: 1
```
## Test Scenarios
### 1. Producer Performance Test
```bash
make producer-test TEST_DURATION=10m PRODUCER_COUNT=20 MESSAGE_RATE=3000
```
Tests maximum message production throughput.
### 2. Consumer Performance Test
```bash
# First produce messages
make producer-test TEST_DURATION=5m
# Then test consumption
make consumer-test TEST_DURATION=10m CONSUMER_COUNT=15
```
### 3. Schema Registry Integration
```bash
# Enable schemas in config/loadtest.yaml first:
#   schemas:
#     enabled: true
make test
```
Tests Avro message serialization through Schema Registry.
### 4. High Availability Test
```bash
# Test with container restarts during load
make test TEST_DURATION=20m &
sleep 300
docker restart kafka-gateway
```
## Monitoring & Metrics
### Real-Time Dashboards
When monitoring is enabled:
- **Prometheus**: http://localhost:9090
- **Grafana**: http://localhost:3000 (admin/admin)
### Key Metrics Tracked
- **Throughput**: Messages/second, MB/second
- **Latency**: End-to-end message latency percentiles
- **Errors**: Producer/consumer error rates
- **Consumer Lag**: Per-partition lag monitoring
- **Resource Usage**: CPU, memory, disk I/O
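Consumer lag for a partition is the distance between the broker's high watermark (the next offset it will assign) and the group's committed offset. A minimal sketch of the per-partition computation, with an illustrative function name:

```go
package main

import "fmt"

// partitionLag returns how many messages have been appended to a partition
// but not yet committed by the consumer group. The result is clamped at 0
// because a committed offset can briefly exceed a stale high watermark.
func partitionLag(highWatermark, committed int64) int64 {
	lag := highWatermark - committed
	if lag < 0 {
		return 0
	}
	return lag
}

func main() {
	fmt.Println(partitionLag(1500, 1200)) // → 300 messages behind
}
```

The load test sums this value across all partitions of a topic to get the group-level lag shown in the dashboards.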
### Grafana Dashboards
- **Kafka Load Test**: Comprehensive test metrics
- **SeaweedFS Cluster**: Storage system health
- **Custom Dashboards**: Extensible monitoring
## Advanced Features
### Schema Registry Testing
```bash
# Test Avro message serialization
export KAFKA_VALUE_TYPE=avro
make test
```
The load test includes:
- Schema registration
- Avro message encoding/decoding
- Schema evolution testing
- Compatibility validation
### Multi-Client Testing
The test supports both Sarama and Confluent clients:
```go
// Configure in producer/consumer code
useConfluent := true // Switch client implementation
```
### Consumer Group Rebalancing
- Automatic consumer group management
- Partition rebalancing simulation
- Consumer failure recovery testing
### Chaos Testing
```yaml
chaos:
  enabled: true
  producer_failure_rate: 0.01
  consumer_failure_rate: 0.01
  network_partition_probability: 0.001
```
## Troubleshooting
### Common Issues
#### Services Not Starting
```bash
# Check service health
make health-check
# View detailed logs
make logs
# Debug mode
make debug
```
#### Low Throughput
- Increase `MESSAGE_RATE` and `PRODUCER_COUNT`
- Adjust `batch_size` and `linger_ms` in config
- Check consumer `max_poll_records` setting
#### High Latency
- Reduce `linger_ms` for lower latency
- Adjust `acks` setting (0, 1, or "all")
- Monitor consumer lag
#### Memory Issues
```bash
# Reduce concurrent clients
make test PRODUCER_COUNT=5 CONSUMER_COUNT=3
# Adjust message size
make test MESSAGE_SIZE=512
```
### Debug Commands
```bash
# Execute shell in containers
make exec-master
make exec-filer
make exec-gateway
# Attach to load test
make attach-loadtest
# View real-time stats
curl http://localhost:8080/stats
```
## Development
### Building from Source
```bash
# Set up development environment
make dev-env
# Build load test binary
make build
# Run tests locally (requires Go 1.21+)
cd cmd/loadtest && go run main.go -config ../../config/loadtest.yaml
```
### Extending the Tests
1. **Add new message formats** in `internal/producer/`
2. **Add custom metrics** in `internal/metrics/`
3. **Create new test scenarios** in `config/loadtest.yaml`
4. **Add monitoring panels** in `monitoring/grafana/dashboards/`
### Contributing
1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass: `make test`
5. Submit a pull request
## Performance Benchmarks
### Expected Performance (on typical hardware)
| Scenario | Producers | Consumers | Rate (msg/s) | Latency (p95) |
|----------|-----------|-----------|--------------|---------------|
| Quick | 2 | 2 | 200 | <10ms |
| Standard | 5 | 3 | 2,500 | <20ms |
| Stress | 20 | 10 | 40,000 | <50ms |
| Endurance| 10 | 5 | 10,000 | <30ms |
*Results vary based on hardware, network, and SeaweedFS configuration*
### Tuning for Maximum Performance
```yaml
producers:
  batch_size: 1000
  linger_ms: 10
  compression_type: "lz4"
  acks: "1"                # balance between speed and durability

consumers:
  max_poll_records: 5000
  fetch_min_bytes: 1048576 # 1 MB
  fetch_max_wait_ms: 100
```
## Comparison with Existing Tests
| Feature | SMQ Tests | **Kafka Client Load Test** |
|---------|-----------|----------------------------|
| Protocol | SMQ (SeaweedFS native) | **Kafka (industry standard)** |
| Clients | SMQ clients | **Real Kafka clients (Sarama, Confluent)** |
| Schema Registry | ❌ | **✅ Full Avro/Protobuf support** |
| Consumer Groups | Basic | **✅ Full Kafka consumer group features** |
| Monitoring | Basic | **✅ Prometheus + Grafana dashboards** |
| Test Scenarios | Limited | **✅ Multiple predefined scenarios** |
| Real-world | Synthetic | **✅ Production-like workloads** |
This load test provides comprehensive validation of the SeaweedFS Kafka Gateway using real-world Kafka clients and protocols.
---
## Quick Reference
```bash
# Essential Commands
make help # Show all available commands
make test # Run default comprehensive test
make quick-test # 1-minute smoke test
make stress-test # High-load stress test
make test-with-monitoring # Include Grafana dashboards
make clean # Clean up all resources
# Monitoring
make monitor # Start Prometheus + Grafana
# → http://localhost:9090 (Prometheus)
# → http://localhost:3000 (Grafana, admin/admin)
# Advanced
make benchmark # Run full benchmark suite
make health-check # Validate service health
make validate-setup # Check configuration
```