Fault Tolerance
siiky
2023/03/13
2023/05/15
2023/05/15
course,distributed,programming
Linearizability and Serializability
A program/system is linearizable if the results it produces could have been produced by a sequential program.
Serializability is a similar concept but applied to transactions (i.e. ordered sets of operations), taking each transaction as a single atomic operation.
Data replication
ROWA (Read One Write All)
A client wishing to read can read from a single replica. When wishing to write, it has to write to all replicas.
This is not fault-tolerant: a single replica goes down and clients can no longer write.
Quorums
Subgroups of the all replicas are used to read and write.
The simplest approach is read/write to/from a majority of replicas. Depending on problem requirements, however, it may be more beneficial to unbalance the two subsets: in a read-heavy system, clients may read from fewer replicas, as long as they write to more replicas (there's a rule).
For this to work, the number of elements of the read/write quorums must meet a couple of requirements (N is number of replicas, Qr/Qw is the read/write quorum):
- #Qr + #Qw > N -- read/write quorums always intersect
- 2 * #Qw > N -- any two write quorums always intersect
But... there's no definite order in which writes are applied: two given writes X=A and X=B may be applied in different orders in different replicas, so a client may read different values from different read quorums.
Or... because of network delays, even if operations are delivered and applied in different replicas in the same order, they may be delivered and applied in different times, so a client may read different values from different read quorums.
Distributed consensus
Algorithms of distributed consensus:
(Paxos) Leslie Lamport, "Paxos Made Simple" (https://lamport.azurewebsites.net)
raft.gmi
(HotStuff) "HotStuff: BFT Consensus in the Lens of Blockchain" (https://arxiv.org)
Research
What good are models and what models are good? (https://www.cs.cornell.edu)
Weighed voting for replicated data (https://www.cs.cornell.edu)
Read-Write Quorum Systems Made Practical (https://mwhittaker.github.io)
Response: 20 (Success), text/gemini
| Original URL | gemini://siiky.srht.site/wiki/class.ft.gmi |
|---|---|
| Status Code | 20 (Success) |
| Content-Type | text/gemini; charset=utf-8 |