Diffstat (limited to 'note/replication.txt')
-rw-r--r--  note/replication.txt  86
1 files changed, 86 insertions, 0 deletions
diff --git a/note/replication.txt b/note/replication.txt
new file mode 100644
index 000000000..c4bf46044
--- /dev/null
+++ b/note/replication.txt
@@ -0,0 +1,86 @@
1. each file can choose its replication factor
2. replication granularity is at the volume level
3. if there is not enough space, we can automatically decrease some volumes' replication factor, especially for cold data
4. plan to support migrating data to cheaper storage
5. plan to support manual volume placement, access-based volume placement, and auction-based volume placement

When a new volume server is started, it reports:
  1. how many volumes it can hold
  2. the current list of existing volumes and each volume's replication type
Each volume server remembers:
  1. its current volume ids
  2. replica locations, which are read from the master

The master assigns volume ids based on:
  1. replication factor + data center and rack
  2. concurrent write support
The master stores the replication configuration:
{
  replication:[
    {type:"00", min_volume_count:3, weight:10},
    {type:"01", min_volume_count:2, weight:20},
    {type:"10", min_volume_count:2, weight:20},
    {type:"11", min_volume_count:3, weight:30},
    {type:"20", min_volume_count:2, weight:20}
  ],
  port:9333
}
Volumes can also be added manually via the command line:
  1. add a volume with a specified replication factor
  2. add a volume with a specified volume id


If duplicate volume ids are reported from different volume servers,
the master checks the volume's replication factor:
  if the replica count is less than the replication factor, the volume is put in read-only mode
  if it is more than the replication factor, the smallest/oldest replica is purged
  if it is equal, the volume functions as usual
(a Go sketch of this check appears below)


Use cases:
  on a volume server:
    1. weed volume -mserver="xx.xx.xx.xx:9333" -publicUrl="good.com:8080" -dir="/tmp" -volumes=50
  on the weed master:
    1. weed master -port=9333
       generates a default json configuration file if one doesn't exist

Bootstrap:
  1. at the very beginning, the system has no volumes at all.
When a data node starts:
  1. each data node sends the master its existing volumes and max volume count
  2. the master remembers the topology/data_center/rack/data_node/volumes
     for each replication level, it stores:
       volume id ~ data node
       writable volume ids
If an "assign" request comes in (sketched below):
  1. find a writable volume with the right replication level
  2. if none is found, grow the volumes with the right replication level
  3. return a writable volume to the user
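The note never spells out what the two-digit replication types in the configuration above mean. Below is a minimal Go sketch of how a per-type copy count could drive the duplicate-volume-id check described earlier; the mapping ("00" = 1 copy, "01"/"10" = 2 copies, "11"/"20" = 3 copies) is an assumption, not something this note confirms:

    package main

    import "fmt"

    // ReplicationType is the two-digit code from the configuration above.
    type ReplicationType string

    // GetCopyCount returns how many replicas a volume of this type should
    // have. The digit semantics are an assumption for this sketch:
    // "00" = single copy, "01"/"10" = two copies, "11"/"20" = three copies.
    func (t ReplicationType) GetCopyCount() int {
        switch t {
        case "00":
            return 1
        case "01", "10":
            return 2
        case "11", "20":
            return 3
        }
        return 0 // unknown type
    }

    // Reconcile applies the duplicate-volume-id rules above to the number
    // of replicas actually reported by the volume servers.
    func Reconcile(t ReplicationType, replicaCount int) string {
        want := t.GetCopyCount()
        switch {
        case replicaCount < want:
            return "readonly"
        case replicaCount > want:
            return "purge smallest/oldest replica"
        }
        return "normal"
    }

    func main() {
        fmt.Println(Reconcile("11", 2)) // readonly
    }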
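For the bootstrap and "assign" flow above, here is a sketch of what the master's in-memory state and the assign handler could look like. All type, field, and method names are illustrative, not taken from the actual code:

    package main

    import (
        "errors"
        "fmt"
    )

    // DataNode is what the master remembers per reporting volume server.
    type DataNode struct {
        Url            string
        MaxVolumeCount int
        VolumeIds      []uint32
    }

    // Topology holds, per replication type, the replica locations and
    // the set of volume ids that can still accept writes.
    type Topology struct {
        Locations map[string]map[uint32][]*DataNode
        Writables map[string][]uint32
    }

    // Assign implements the flow above: find a writable volume with the
    // right replication type, growing new volumes if none exists.
    func (t *Topology) Assign(repType string) (uint32, error) {
        if ids := t.Writables[repType]; len(ids) > 0 {
            return ids[0], nil
        }
        // no writable volume: grow, then retry once
        if err := t.Grow(repType); err != nil {
            return 0, err
        }
        if ids := t.Writables[repType]; len(ids) > 0 {
            return ids[0], nil
        }
        return 0, errors.New("no writable volume for replication type " + repType)
    }

    // Grow stands in for "grow the volumes with the right replication
    // level": pick data nodes that satisfy the placement, allocate a new
    // volume id, and tell each chosen node via /admin/assign_volume.
    func (t *Topology) Grow(repType string) error {
        return errors.New("volume growth elided in this sketch")
    }

    func main() {
        t := &Topology{
            Locations: map[string]map[uint32][]*DataNode{},
            Writables: map[string][]uint32{"01": {3}},
        }
        vid, err := t.Assign("01")
        fmt.Println(vid, err) // 3 <nil>
    }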
Plan:
  Step 1. implement one copy (no replication), automatically assign volume ids
  Step 2. add replication

For the above operations, here is the todo list:
  for the data node:
    0. detect existing volumes DONE
    1. on startup, and periodically, send existing volumes and maxVolumeCount via store.Join() DONE
    2. accept a command to grow a volume (id + replication level) DONE
       /admin/assign_volume?volume=some_id&replicationType=01
    3. accept setting the volume location list DONE
       /admin/set_volume_locations_list?volumeLocationsList=[{Vid:xxx,Locations:[loc1,loc2,loc3]}]
    4. for each write, pass the write on to the next location (Step 2); the POST method
       should accept an index, like a ttl, that gets decremented on every hop
       (see the sketch after this list)
  for the master:
    1. accept the data node's report of existing volumes and maxVolumeCount ALREADY EXISTS /dir/join
    2. periodically check for active data nodes, and adjust the writable volumes
    3. send a command to grow a volume (id + replication level) DONE
    4. NOT_IMPLEMENTING: if dead/stale data nodes are found, send stale info about the affected
       volumes to the other data nodes, BECAUSE the master will stop sending writes to these data nodes
    5. accept lookups for volume locations ALREADY EXISTS /dir/lookup
    6. read the topology/datacenter/rack layout

TODO:
  1. replicate content to the other servers if the replication type needs replicas
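Data node item 4 above only says the POST should carry a ttl-like index that is decremented on every hop. A sketch of that forwarding chain follows; the parameter name "hops" and the /submit path are made up for illustration:

    package main

    import (
        "net/http"
        "net/url"
        "strconv"
    )

    // handleWrite stores a write locally, then passes it to the next
    // replica location with the hop count decremented, so the chain
    // terminates even if the locations happen to form a loop.
    func handleWrite(w http.ResponseWriter, r *http.Request, nextLocation string) {
        hops, _ := strconv.Atoi(r.URL.Query().Get("hops"))

        // ... write the blob to the local volume first ...

        if hops > 0 && nextLocation != "" {
            u := url.URL{Scheme: "http", Host: nextLocation, Path: r.URL.Path}
            q := u.Query()
            q.Set("hops", strconv.Itoa(hops-1)) // decrement before forwarding
            u.RawQuery = q.Encode()
            if resp, err := http.Post(u.String(), r.Header.Get("Content-Type"), r.Body); err == nil {
                resp.Body.Close()
            }
        }
        w.WriteHeader(http.StatusCreated)
    }

    func main() {
        http.HandleFunc("/submit", func(w http.ResponseWriter, r *http.Request) {
            handleWrite(w, r, "next.volume.server:8080") // hypothetical next replica
        })
        http.ListenAndServe(":8080", nil)
    }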
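Putting the endpoints named above together, one possible sequence, in the same command-line style as the use cases; the host names are placeholders and the volumeId parameter spelling is an assumption:

    # the master asks a data node to create volume 5 with replication type "01"
    curl "http://datanode1:8080/admin/assign_volume?volume=5&replicationType=01"

    # a client asks the master where volume 5 lives
    curl "http://master:9333/dir/lookup?volumeId=5"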
