For a 5,000-node cluster, it takes about 9 update cycles for a change to reach every other node. Since each update is on a 60-second timer, that's 9 minutes for a change to push out.
I didn't build a very sophisticated timing model, and there is the random start and all that, so in practice it may be a little different. But 9 minutes seems like a long time to propagate a host change out to the rest of the cluster. Maybe I misinterpreted what they're doing?
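For anyone who wants to check the cycle count, here is a rough sketch of that kind of model in Python. It is not Dynamo's actual protocol: the function name `rounds_to_spread`, the fixed seed, and the synchronous rounds are all my own assumptions. It assumes every node contacts one random peer per cycle and that the two end up sharing whatever either one knew at the start of the cycle. It ignores the random start offsets, so treat the numbers as a ballpark.

```python
import random


def rounds_to_spread(n_nodes, seed=0):
    """Count synchronous gossip rounds until every node knows about a change.

    Hypothetical model: each round, every node contacts one random peer and
    the pair reconcile, so both know the change if either knew it at the
    start of the round.
    """
    rng = random.Random(seed)
    informed = [False] * n_nodes
    informed[0] = True  # node 0 originates the change
    rounds = 0
    while not all(informed):
        snapshot = informed[:]  # what each node knew when the round began
        for node in range(n_nodes):
            # Pick a peer uniformly at random, excluding the node itself.
            peer = rng.randrange(n_nodes - 1)
            if peer >= node:
                peer += 1
            if snapshot[node] or snapshot[peer]:
                informed[node] = True
                informed[peer] = True
        rounds += 1
    return rounds


if __name__ == "__main__":
    r = rounds_to_spread(5000)
    print(f"{r} rounds: {r} seconds at a 1 s interval, "
          f"{r} minutes at a 60 s interval")
```

Whatever round count comes out, the wall-clock time is just that count multiplied by the gossip interval, so the interval matters far more than the details of the model.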
I recall some confusion about whether Dynamo was actually providing SimpleDB, or if they were two separate software systems. Does anyone know if this was resolved?
Comments (2)
Where did you see the 60 second interval? The only quote I was able to find in the paper about the frequency of the gossip exchanges was:
> Each node contacts a peer chosen at random every second and the
> two nodes efficiently reconcile their persisted membership change histories.
They do also mention that the nodes will actively exchange full routing tables, which could be the 60 second interval you refer to.
The use of both methods could explain the infrequency of the full table exchange.
Posted by Stu Hood | April 14, 2008 2:50 PM
Section 4.8.1 of the paper says that peers exchange the information every second, not every 60 seconds.
I believe Dynamo and SimpleDB are different systems -- Dynamo is written in Java and IIRC SimpleDB is Erlang.
Posted by Manuel Simoni | April 14, 2008 3:11 PM