aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorchrislu <chris.lu@gmail.com>2025-06-07 15:52:51 -0700
committerchrislu <chris.lu@gmail.com>2025-06-07 15:52:51 -0700
commit60f11f651054a22f2bb16520be231f4ba9ba1640 (patch)
treef7d0858ca376f4a0422afca7665e1c89fa27ee95
parent61c4f01e05b88253b3759eb18765f479f7c7c0e8 (diff)
downloadseaweedfs-60f11f651054a22f2bb16520be231f4ba9ba1640.tar.xz
seaweedfs-60f11f651054a22f2bb16520be231f4ba9ba1640.zip
add a readme file for volume needle data layout
-rw-r--r--weed/storage/needle/README.md160
1 files changed, 160 insertions, 0 deletions
diff --git a/weed/storage/needle/README.md b/weed/storage/needle/README.md
new file mode 100644
index 000000000..23abf1ce7
--- /dev/null
+++ b/weed/storage/needle/README.md
@@ -0,0 +1,160 @@
+# Needle Layout Format
+
+This document describes the binary layout of the Needle structure as used in SeaweedFS storage, for all supported versions (v1, v2, v3).
+
+A Needle represents a file or data blob stored in a volume file. The layout determines how the Needle is serialized to disk for efficient storage and retrieval.
+
+---
+
+## Common Field Sizes
+
+| Field | Size (bytes) |
+|---------------|--------------|
+| Cookie | 4 |
+| NeedleId | 8 |
+| Size | 4 |
+| DataSize | 4 |
+| Flags | 1 |
+| NameSize | 1 |
+| MimeSize | 1 |
+| LastModified | 5 |
+| Ttl | 2 |
+| PairsSize | 2 |
+| Checksum | 4 |
+| Timestamp | 8 |
+
+---
+
+## Needle Layouts by Version
+
+### Version 1
+
+| Offset | Field | Size (bytes) | Description |
+|----------|----------|--------------|-----------------------------------------------|
+| 0 | Cookie | 4 | Random number to mitigate brute force lookups |
+| 4 | Id | 8 | Needle ID |
+| 12 | Size | 4 | Length of Data |
+| 16 | Data | N | File data (N = Size) |
+| 16+N | Checksum | 4 | CRC32 of Data |
+| 20+N | Padding | 0-7 | To align to 8 bytes |
+
+### Version 2
+
+| Offset | Field | Size (bytes) | Description |
+|----------|-------------|--------------|-----------------------------------------------|
+| 0 | Cookie | 4 | Random number |
+| 4 | Id | 8 | Needle ID |
+| 12 | Size | 4 | Total size of the following fields |
+| 16 | DataSize | 4 | Length of Data (N) |
+| 20 | Data | N | File data |
+| 20+N | Flags | 1 | Bit flags |
+| 21+N | NameSize | 1 (opt) | Optional, if present |
+| 22+N | Name | M (opt) | Optional, if present (M = NameSize) |
+| ... | MimeSize | 1 (opt) | Optional, if present |
+| ... | Mime | K (opt) | Optional, if present (K = MimeSize) |
+| ... | LastModified| 5 (opt) | Optional, if present |
+| ... | Ttl | 2 (opt) | Optional, if present |
+| ... | PairsSize | 2 (opt) | Optional, if present |
+| ... | Pairs | P (opt) | Optional, if present (P = PairsSize) |
+| ... | Checksum | 4 | CRC32 |
+| ... | Padding | 0-7 | To align to 8 bytes |
+
+### Version 3
+
+| Offset | Field | Size (bytes) | Description |
+|----------|-------------|--------------|-----------------------------------------------|
+| 0 | Cookie | 4 | Random number |
+| 4 | Id | 8 | Needle ID |
+| 12 | Size | 4 | Total size of the following fields |
+| 16 | DataSize | 4 | Length of Data (N) |
+| 20 | Data | N | File data |
+| 20+N | Flags | 1 | Bit flags |
+| 21+N | NameSize | 1 (opt) | Optional, if present |
+| 22+N | Name | M (opt) | Optional, if present (M = NameSize) |
+| ... | MimeSize | 1 (opt) | Optional, if present |
+| ... | Mime | K (opt) | Optional, if present (K = MimeSize) |
+| ... | LastModified| 5 (opt) | Optional, if present |
+| ... | Ttl | 2 (opt) | Optional, if present |
+| ... | PairsSize | 2 (opt) | Optional, if present |
+| ... | Pairs | P (opt) | Optional, if present (P = PairsSize) |
+| ... | Checksum | 4 | CRC32 |
+| ... | Timestamp | 8 | Append time in nanoseconds |
+| ... | Padding | 0-7 | To align to 8 bytes |
+
+- Offsets marked with `...` depend on the presence and size of previous optional fields.
+- Fields marked `(opt)` are optional and only present if the corresponding size or flag is non-zero.
+- N = DataSize, M = NameSize, K = MimeSize, P = PairsSize.
+
+---
+
+## Field Explanations
+
+- **Cookie**: 4 bytes, random value for security.
+- **Id**: 8 bytes, unique identifier for the Needle.
+- **Size**: 4 bytes, total size of the Needle data section (not including header, checksum, timestamp, or padding).
+- **DataSize**: 4 bytes, length of the Data field.
+- **Data**: File data (variable length).
+- **Flags**: 1 byte, bit flags for Needle properties.
+- **NameSize/Name**: 1 byte + variable, optional file name.
+- **MimeSize/Mime**: 1 byte + variable, optional MIME type.
+- **LastModified**: 5 bytes, optional last modified timestamp.
+- **Ttl**: 2 bytes, optional time-to-live.
+- **PairsSize/Pairs**: 2 bytes + variable, optional key-value pairs.
+- **Checksum**: 4 bytes, CRC32 checksum of the Needle data.
+- **Timestamp**: 8 bytes, append time (only in v3).
+- **Padding**: 0-7 bytes, to align the total Needle size to 8 bytes.
+
+---
+
+## Version Comparison Table
+
+| Field | v1 | v2 | v3 |
+|---------------|----|----|----|
+| Cookie | ✔ | ✔ | ✔ |
+| Id | ✔ | ✔ | ✔ |
+| Size | ✔ | ✔ | ✔ |
+| DataSize | | ✔ | ✔ |
+| Data | ✔ | ✔ | ✔ |
+| Flags | | ✔ | ✔ |
+| NameSize/Name | | ✔ | ✔ |
+| MimeSize/Mime | | ✔ | ✔ |
+| LastModified | | ✔ | ✔ |
+| Ttl | | ✔ | ✔ |
+| PairsSize/Pairs| | ✔ | ✔ |
+| Checksum | ✔ | ✔ | ✔ |
+| Timestamp | | | ✔ |
+| Padding | ✔ | ✔ | ✔ |
+
+---
+
+## Flags Field Details
+
+The `Flags` field (present in v2 and v3) is a bitmask that encodes several boolean properties of the Needle. Each bit has a specific meaning:
+
+| Bit Value | Name | Meaning |
+|-----------|-----------------------|--------------------------------------------------------------|
+| 0x01 | FlagIsCompressed | Data is compressed (isCompressed) |
+| 0x02 | FlagHasName | Name field is present (NameSize/Name) |
+| 0x04 | FlagHasMime | Mime field is present (MimeSize/Mime) |
+| 0x08 | FlagHasLastModifiedDate| LastModified field is present |
+| 0x10 | FlagHasTtl | Ttl field is present |
+| 0x20 | FlagHasPairs | Pairs field is present (PairsSize/Pairs) |
+| 0x80 | FlagIsChunkManifest | Data is a chunk manifest (for large files) |
+
+- If a flag is set, the corresponding field(s) will appear in the Needle layout at the appropriate position.
+- The `Flags` field is always present in v2 and v3, immediately after the Data field.
+
+## Optional Fields
+
+- Fields marked as optional in the layout tables are only present if the corresponding flag in the `Flags` field is set (except for Name/Mime/Pairs, which also depend on their size fields being non-zero).
+- The order of optional fields is fixed and matches the order of their flags.
+
+## Special Notes
+
+- **isCompressed**: If set, the Data field is compressed (typically using gzip). This is indicated by the lowest bit (0x01) in the Flags byte.
+- **isChunkManifest**: If set, the Data field contains a manifest describing chunks of a large file, not raw file data.
+- **All multi-byte fields** are stored in big-endian order.
+- **Padding** is always added at the end to align the total Needle size to 8 bytes.
+- **N = DataSize, M = NameSize, K = MimeSize, P = PairsSize** in the layout tables above.
+
+For more details, see the implementation in the corresponding Go files in this directory. \ No newline at end of file