Diffstat (limited to 'note/replication.txt')
-rw-r--r--  note/replication.txt | 86
1 file changed, 86 insertions, 0 deletions
diff --git a/note/replication.txt b/note/replication.txt
new file mode 100644
index 000000000..c4bf46044
--- /dev/null
+++ b/note/replication.txt
@@ -0,0 +1,86 @@
+1. each file can choose its replication factor
+2. replication granularity is at the volume level
+3. if there is not enough space, we can automatically decrease some volumes' replication factor, especially for cold data
+4. plan to support migrating data to cheaper storage
+5. plan to support manual volume placement, access-based volume placement, and auction-based volume placement
+
+When a new volume server starts, it reports:
+ 1. how many volumes it can hold
+ 2. its current list of existing volumes and each volume's replication type
+Each volume server remembers:
+ 1. its current volume ids
+ 2. replica locations, which are read from the master
+
+The master assigns volume ids based on:
+ 1. the replication factor
+    (data center, rack)
+ 2. concurrent write support
+On the master, the replication configuration is stored as:
+{
+  "replication": [
+    {"type":"00", "min_volume_count":3, "weight":10},
+    {"type":"01", "min_volume_count":2, "weight":20},
+    {"type":"10", "min_volume_count":2, "weight":20},
+    {"type":"11", "min_volume_count":3, "weight":30},
+    {"type":"20", "min_volume_count":2, "weight":20}
+  ],
+  "port": 9333
+}
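A configuration of that shape could be decoded with Go's standard encoding/json. The struct shapes and JSON field tags here are assumptions based on the example, not the master's real config code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ReplicationSetting mirrors one entry of the "replication" list.
type ReplicationSetting struct {
	Type           string `json:"type"`
	MinVolumeCount int    `json:"min_volume_count"`
	Weight         int    `json:"weight"`
}

// MasterConfig mirrors the whole configuration object.
type MasterConfig struct {
	Replication []ReplicationSetting `json:"replication"`
	Port        int                  `json:"port"`
}

// parseConfig unmarshals the master's JSON configuration.
func parseConfig(data []byte) (MasterConfig, error) {
	var c MasterConfig
	err := json.Unmarshal(data, &c)
	return c, err
}

func main() {
	raw := []byte(`{
	  "replication": [
	    {"type":"00", "min_volume_count":3, "weight":10},
	    {"type":"01", "min_volume_count":2, "weight":20}
	  ],
	  "port": 9333
	}`)
	c, err := parseConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(c.Port, len(c.Replication), c.Replication[0].Type)
}
```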
+Or, manually via the command line:
+ 1. add a volume with a specified replication factor
+ 2. add a volume with a specified volume id
+
+
+If duplicate volume ids are reported from different volume servers,
+the master compares the replica count to the volume's replication factor:
+if fewer replicas than the replication factor, the volume is put in read-only mode;
+if more than the replication factor, the smallest/oldest replica is purged;
+if equal, the volume functions as usual.
+
+
+Use cases:
+ on the volume server
+ 1. weed volume -mserver="xx.xx.xx.xx:9333" -publicUrl="good.com:8080" -dir="/tmp" -volumes=50
+ on the weed master
+ 1. weed master -port=9333
+    generates a default JSON configuration file if one doesn't exist
+
+Bootstrap
+ 1. At the very beginning, the system has no volumes at all.
+When a data node starts:
+ 1. each data node sends the master its existing volumes and max volume blocks
+ 2. the master remembers the topology/data_center/rack/data_node/volumes;
+    for each replication level, it stores
+      volume id ~ data node
+      writable volume ids
+If an "assign" request comes in:
+ 1. find a writable volume with the right replication level
+ 2. if none is found, grow new volumes with the right replication level
+ 3. return a writable volume to the user
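A minimal sketch of that assign flow, assuming a hypothetical Master type that tracks writable volume ids per replication type (the growth policy and names are stand-ins, not the real volume-growth logic):

```go
package main

import (
	"errors"
	"fmt"
)

// Master tracks, per replication type, which volume ids are writable.
type Master struct {
	nextVolumeId int
	writable     map[string][]int // replication type -> writable volume ids
}

// growVolumes creates new writable volumes for the given replication type.
func (m *Master) growVolumes(replicationType string, count int) {
	for i := 0; i < count; i++ {
		m.nextVolumeId++
		m.writable[replicationType] = append(m.writable[replicationType], m.nextVolumeId)
	}
}

// Assign returns a writable volume id for the requested replication type,
// growing new volumes first when none exist.
func (m *Master) Assign(replicationType string) (int, error) {
	if len(m.writable[replicationType]) == 0 {
		m.growVolumes(replicationType, 1)
	}
	ids := m.writable[replicationType]
	if len(ids) == 0 {
		return 0, errors.New("no writable volume")
	}
	return ids[0], nil
}

func main() {
	m := &Master{writable: map[string][]int{}}
	id, err := m.Assign("01")
	fmt.Println(id, err)
}
```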
+
+
+Plan:
+ Step 1. implement one copy (no replication), automatically assign volume ids
+ Step 2. add replication
+
+For the above operations, here is the todo list:
+ for the data node:
+ 0. detect existing volumes DONE
+ 1. on startup, and periodically, send existing volumes and maxVolumeCount via store.Join() DONE
+ 2. accept a command to grow a volume (id + replication level) DONE
+    /admin/assign_volume?volume=some_id&replicationType=01
+ 3. accept setting volumeLocationList DONE
+    /admin/set_volume_locations_list?volumeLocationsList=[{Vid:xxx,Locations:[loc1,loc2,loc3]}]
+ 4. for each write, pass the write to the next location (Step 2)
+    the POST method should accept an index, like a TTL, decremented at every hop
+ for the master:
+ 1. accept a data node's report of existing volumes and maxVolumeCount ALREADY EXISTS /dir/join
+ 2. periodically check for active data nodes, and adjust writable volumes
+ 3. send a command to grow a volume (id + replication level) DONE
+ 4. NOT_IMPLEMENTING: if dead/stale data nodes are found, for the affected volumes, send stale info
+    to other data nodes, BECAUSE the master will stop sending writes to those data nodes
+ 5. accept lookups for volume locations ALREADY EXISTS /dir/lookup
+ 6. read the topology/datacenter/rack layout
+
+TODO:
+ 1. replicate content to other servers if the replication type requires replicas