# SeaweedFS Hadoop3 Client

Hadoop FileSystem implementation for SeaweedFS, compatible with Hadoop 3.x.

## Building

```bash
mvn clean install
```

## Testing

This project includes two types of tests:

### 1. Configuration Tests (No SeaweedFS Required)

These tests verify configuration handling and initialization logic without requiring a running SeaweedFS instance:

```bash
mvn test -Dtest=SeaweedFileSystemConfigTest
```

### 2. Integration Tests (Requires SeaweedFS)

These tests verify actual FileSystem operations against a running SeaweedFS instance.

#### Prerequisites

1. Start SeaweedFS with default ports:
   ```bash
   # Terminal 1: Start master
   weed master
   
   # Terminal 2: Start volume server
   weed volume -master=localhost:9333
   
   # Terminal 3: Start filer
   weed filer -master=localhost:9333
   ```

2. Verify services are running:
   - Master: http://localhost:9333
   - Filer HTTP: http://localhost:8888
   - Filer gRPC: localhost:18888
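To check the three endpoints from Java instead (for example, at the start of a test run), the minimal sketch below attempts a TCP connection to each default port. The class name `SeaweedPortCheck` is hypothetical and not part of this project. A successful connect only shows that something is listening on the port, not that it is SeaweedFS.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of seaweedfs-hadoop3-client.
final class SeaweedPortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or host could not be resolved
            return false;
        }
    }

    public static void main(String[] args) {
        // Default ports from the prerequisites above
        String[] names = {"Master", "Filer HTTP", "Filer gRPC"};
        int[] ports = {9333, 8888, 18888};
        for (int i = 0; i < ports.length; i++) {
            boolean up = isListening("localhost", ports[i], 1000);
            System.out.printf("%-10s localhost:%d %s%n",
                    names[i], ports[i], up ? "reachable" : "NOT reachable");
        }
    }
}
```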

#### Running Integration Tests

```bash
# Enable integration tests
export SEAWEEDFS_TEST_ENABLED=true

# Run all tests
mvn test

# Run specific test
mvn test -Dtest=SeaweedFileSystemTest
```

### Test Configuration

Integration tests can be configured via environment variables or system properties:

- `SEAWEEDFS_TEST_ENABLED`: Set to `true` to enable integration tests (default: `false`)
- Tests use these default connection settings:
  - Filer Host: localhost
  - Filer HTTP Port: 8888
  - Filer gRPC Port: 18888
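The enable flag acts as a guard at the top of each integration test. The sketch below is a hypothetical stand-in (`IntegrationTestSettings` is not a class from this project) that treats the suite as enabled only when `SEAWEEDFS_TEST_ENABLED` parses as `true`, and otherwise prints the skip message quoted under Troubleshooting. It also accepts `TRUE` or `True`, because `Boolean.parseBoolean` ignores case; whether the real tests do the same is an assumption. In a JUnit test the same boolean would normally feed an assumption, so the test is reported as skipped rather than passed.

```java
import java.util.Map;

// Hypothetical sketch of the SEAWEEDFS_TEST_ENABLED guard and default settings.
final class IntegrationTestSettings {

    // Defaults listed under "Test Configuration"
    static final String DEFAULT_FILER_HOST = "localhost";
    static final int DEFAULT_FILER_HTTP_PORT = 8888;
    static final int DEFAULT_FILER_GRPC_PORT = 18888;

    static final String ENABLE_VAR = "SEAWEEDFS_TEST_ENABLED";

    /** True only when the variable is "true" (case-insensitive); unset yields false. */
    static boolean integrationEnabled(Map<String, String> env) {
        return Boolean.parseBoolean(env.get(ENABLE_VAR)); // parseBoolean(null) == false
    }

    public static void main(String[] args) {
        if (!integrationEnabled(System.getenv())) {
            System.out.println("Skipping test - SEAWEEDFS_TEST_ENABLED not set");
            return;
        }
        System.out.printf("Running against filer %s:%d (gRPC %d)%n",
                DEFAULT_FILER_HOST, DEFAULT_FILER_HTTP_PORT, DEFAULT_FILER_GRPC_PORT);
    }
}
```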

### Running Tests with Custom Configuration

To test against a different SeaweedFS instance, modify the test code or use Hadoop configuration:

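```java
import org.apache.hadoop.conf.Configuration;

// Override the default filer connection settings
Configuration conf = new Configuration();
conf.set("fs.seaweed.filer.host", "your-host");
conf.setInt("fs.seaweed.filer.port", 8888);
conf.setInt("fs.seaweed.filer.port.grpc", 18888);
```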

## Test Coverage

The test suite covers:

- **Configuration & Initialization**
  - URI parsing and configuration
  - Default values
  - Configuration overrides
  - Working directory management

- **File Operations**
  - Create files
  - Read files
  - Write files
  - Append to files
  - Delete files

- **Directory Operations**
  - Create directories
  - List directory contents
  - Delete directories (recursive and non-recursive)

- **Metadata Operations**
  - Get file status
  - Set permissions
  - Set owner/group
  - Rename files and directories
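For the URI parsing and default-value cases, a URI such as `seaweedfs://localhost:8888/test` (the form used with `hadoop fs` under "Usage in Hadoop") carries the filer host and HTTP port in its authority. The minimal sketch below uses only `java.net.URI` and assumes that a URI without an authority falls back to the documented defaults (`localhost`, `8888`). The class `SeaweedUriParsing` is hypothetical, and how the real client ranks URI values against `fs.seaweed.filer.host` and `fs.seaweed.filer.port` is not shown here.

```java
import java.net.URI;

// Hypothetical sketch: extract filer host/port from a seaweedfs:// URI with defaults.
final class SeaweedUriParsing {

    static final String DEFAULT_FILER_HOST = "localhost";
    static final int DEFAULT_FILER_PORT = 8888;

    /** Host from the URI authority, or the default when the URI omits it. */
    static String filerHost(URI uri) {
        String host = uri.getHost();
        return host != null ? host : DEFAULT_FILER_HOST;
    }

    /** Port from the URI authority, or the default when absent (getPort() returns -1). */
    static int filerPort(URI uri) {
        int port = uri.getPort();
        return port != -1 ? port : DEFAULT_FILER_PORT;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"seaweedfs://filer1:9999/data", "seaweedfs:///data"}) {
            URI uri = URI.create(s);
            System.out.println(s + " -> " + filerHost(uri) + ":" + filerPort(uri)
                    + " path=" + uri.getPath());
        }
    }
}
```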

## Usage in Hadoop

1. Copy the built JAR to your Hadoop classpath:
   ```bash
   cp target/seaweedfs-hadoop3-client-*.jar $HADOOP_HOME/share/hadoop/common/lib/
   ```

2. Configure `core-site.xml`:
   ```xml
   <configuration>
     <property>
       <name>fs.seaweedfs.impl</name>
       <value>seaweed.hdfs.SeaweedFileSystem</value>
     </property>
     <property>
       <name>fs.seaweed.filer.host</name>
       <value>localhost</value>
     </property>
     <property>
       <name>fs.seaweed.filer.port</name>
       <value>8888</value>
     </property>
     <property>
       <name>fs.seaweed.filer.port.grpc</name>
       <value>18888</value>
     </property>
     <!-- Optional: Replication configuration with three priority levels:
          1) If set to non-empty value (e.g. "001") - uses that value
          2) If set to empty string "" - uses SeaweedFS filer's default replication
          3) If not configured (property not present) - uses HDFS replication parameter
     -->
     <!-- <property>
       <name>fs.seaweed.replication</name>
       <value>001</value>
     </property> -->
   </configuration>
   ```

3. Use SeaweedFS with Hadoop commands:
   ```bash
   hadoop fs -ls seaweedfs://localhost:8888/
   hadoop fs -mkdir seaweedfs://localhost:8888/test
   hadoop fs -put local.txt seaweedfs://localhost:8888/test/
   ```
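The three-level replication rule in the `core-site.xml` comment comes down to distinguishing an absent property from an empty one. The sketch below uses a plain `Map` in place of Hadoop's `Configuration` to stay self-contained, and the class `ReplicationResolver` is hypothetical. The level-3 mapping (HDFS replication factor `N` becomes SeaweedFS placement `00(N-1)`, meaning extra copies on other servers in the same rack) is an assumption for illustration, not necessarily what the client does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the fs.seaweed.replication priority rules.
final class ReplicationResolver {

    static final String REPLICATION_KEY = "fs.seaweed.replication";

    /** Returns the replica placement to request; "" means "use the filer's default". */
    static String resolve(Map<String, String> conf, short hdfsReplication) {
        String configured = conf.get(REPLICATION_KEY);
        if (configured != null) {
            // Level 1: a non-empty value such as "001" is used as-is.
            // Level 2: "" is passed through so the filer applies its default.
            return configured;
        }
        // Level 3: property absent, derive from the HDFS replication factor.
        // ASSUMPTION: N copies -> "00(N-1)", clamped to a single digit (0..9).
        int extraCopies = Math.max(0, Math.min(9, hdfsReplication - 1));
        return "00" + extraCopies;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println("absent   -> \"" + resolve(conf, (short) 3) + "\""); // "002"
        conf.put(REPLICATION_KEY, "");
        System.out.println("empty    -> \"" + resolve(conf, (short) 3) + "\""); // ""
        conf.put(REPLICATION_KEY, "001");
        System.out.println("explicit -> \"" + resolve(conf, (short) 3) + "\""); // "001"
    }
}
```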

## Continuous Integration

For CI environments, tests can be run in two modes:

1. **Configuration Tests Only** (default, no SeaweedFS required):
   ```bash
   mvn test -Dtest=SeaweedFileSystemConfigTest
   ```

2. **Full Integration Tests** (requires SeaweedFS):
   ```bash
   # Start SeaweedFS in CI environment
   # Then run:
   export SEAWEEDFS_TEST_ENABLED=true
   mvn test
   ```
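Services started in the background in CI may need a few seconds before the filer accepts connections, so running `mvn test` immediately can produce the connection errors described under Troubleshooting. Below is a minimal polling sketch with a deadline, assuming the filer's default HTTP and gRPC ports. `WaitForFiler` is a hypothetical helper, and the 60-second limit and 1-second interval are arbitrary choices.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical CI helper: block until the filer ports accept TCP connections.
final class WaitForFiler {

    /** Retries a TCP connect every intervalMillis until it succeeds or timeoutMillis elapses. */
    static boolean waitForPort(String host, int port, long timeoutMillis, long intervalMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 1000);
                return true;
            } catch (IOException e) {
                // Not accepting connections yet; retry below unless time is up
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(intervalMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        for (int port : new int[] {8888, 18888}) { // filer HTTP and gRPC
            if (!waitForPort("localhost", port, 60_000, 1_000)) {
                System.err.println("Filer port " + port + " not reachable after 60s");
                System.exit(1);
            }
        }
        System.out.println("Filer is accepting connections");
    }
}
```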

## Troubleshooting

### Tests are skipped

If you see "Skipping test - SEAWEEDFS_TEST_ENABLED not set", enable the integration tests:
```bash
export SEAWEEDFS_TEST_ENABLED=true
```

### Connection refused errors

Ensure the filer is running and reachable on its HTTP port (8888 by default):
```bash
curl http://localhost:8888/
```
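Inside Java test code, the same check as the `curl` command above can be done with the JDK's `HttpURLConnection`. This is a minimal sketch, and `FilerHttpProbe` is a hypothetical name. Any HTTP status means something answered on the port, while `-1` corresponds to the connection-refused case.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper: Java equivalent of `curl http://localhost:8888/`.
final class FilerHttpProbe {

    /** Returns the HTTP status code from GET url, or -1 if the request fails. */
    static int probe(String url, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            return conn.getResponseCode();
        } catch (IOException e) {
            // e.g. java.net.ConnectException: Connection refused
            return -1;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        int status = probe("http://localhost:8888/", 2000);
        System.out.println(status == -1
                ? "Filer HTTP not reachable"
                : "Filer responded with HTTP " + status);
    }
}
```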

### gRPC errors

Verify the gRPC port is accessible:
```bash
# Should show the port is listening
netstat -an | grep 18888
```

## Contributing

When adding new features, please include:
1. Configuration tests (no SeaweedFS required)
2. Integration tests (with a `SEAWEEDFS_TEST_ENABLED` guard)
3. Documentation updates