Terminating Reliable Broadcast


Terminating Reliable Broadcast (TRB) is a problem in distributed computing that encapsulates the task of broadcasting a message to a set of receiving processes in the presence of faults.[1] In particular, the sender and any other process might fail ("crash") at any time.

Problem description

A TRB protocol typically organizes the system into a sending process and a set of receiving processes, which may include the sender itself. A process is called "correct" if it does not fail at any point during its execution. The goal of the protocol is to transfer data (the "message") from the sender to the set of receiving processes. A process may perform many I/O operations while the protocol runs, but it eventually "delivers" a message by handing it to the local application that invoked the TRB protocol.

The protocol must provide several guarantees to the receiving processes. For example, if the sender is correct, all correct receiving processes must deliver the sender's message. If the sender failed, a receiving process may instead deliver a special message, [math]\displaystyle{ \mathrm{SF} }[/math] ("sender faulty"), but either all correct processes deliver [math]\displaystyle{ \mathrm{SF} }[/math] or none do. A correct process is therefore guaranteed that whatever was delivered to it was also delivered to every other correct process.

More precisely, a TRB protocol must satisfy the four formal properties below.

  • Termination: every correct process delivers some value.
  • Validity: if the sender is correct and broadcasts a message [math]\displaystyle{ m }[/math], then every correct process delivers [math]\displaystyle{ m }[/math].
  • Integrity: a process delivers a message at most once, and if it delivers some message [math]\displaystyle{ m \neq \mathrm{SF} }[/math], then [math]\displaystyle{ m }[/math] was broadcast by the sender.
  • Agreement: if a correct process delivers a message [math]\displaystyle{ m }[/math], then all correct processes deliver [math]\displaystyle{ m }[/math].
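The four properties above can be read as checks on the record of a finished execution. The following Python sketch is illustrative only: the `Run` record, its field names, and `check_trb` are hypothetical names introduced here, messages are assumed to be strings, and the "at most once" clause of integrity is read as "no process delivers the same message twice". Termination is a liveness property, so the check is meaningful only for an execution that has already ended.

```python
from dataclasses import dataclass, field

SF = "SF"  # stand-in for the special "sender faulty" message


@dataclass
class Run:
    """Hypothetical record of one finished execution."""
    sender: str      # name of the sending process
    broadcast: str   # message the sender broadcast
    correct: set     # names of processes that never crashed
    delivered: dict = field(default_factory=dict)  # name -> list of delivered messages


def check_trb(run):
    """Return the names of the TRB properties that this run violates."""
    violated = []
    # Messages delivered by each correct process (empty list if none).
    got = {p: run.delivered.get(p, []) for p in run.correct}

    # Termination: every correct process delivers some value.
    if any(not msgs for msgs in got.values()):
        violated.append("termination")

    # Validity: a correct sender's message reaches every correct process.
    if run.sender in run.correct and any(
        run.broadcast not in msgs for msgs in got.values()
    ):
        violated.append("validity")

    # Integrity: no duplicate deliveries, and any non-SF message
    # must be the one the sender broadcast.
    for msgs in run.delivered.values():
        duplicated = len(msgs) != len(set(msgs))
        forged = any(m != SF and m != run.broadcast for m in msgs)
        if duplicated or forged:
            violated.append("integrity")
            break

    # Agreement: all correct processes deliver the same set of messages.
    if len({frozenset(msgs) for msgs in got.values()}) > 1:
        violated.append("agreement")

    return violated


good = Run("p0", "m", {"p0", "p1", "p2"},
           {"p0": ["m"], "p1": ["m"], "p2": ["m"]})
split = Run("p0", "m", {"p1", "p2"}, {"p1": ["m"], "p2": [SF]})
print(check_trb(good))   # []
print(check_trb(split))  # ['agreement']
```

In the second run the sender `p0` has crashed, so validity imposes no requirement, but the two correct processes disagree and the agreement check fails.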

The presence of faults in the system makes these properties more difficult to satisfy. A simple but incorrect TRB protocol might have the sender send the message to each process in turn, and have each receiving process deliver the message as soon as it is received. This protocol does not satisfy agreement if faults can occur: if the sender crashes after sending the message to some processes but before sending it to others, then the first set of processes may deliver the message while the second set delivers [math]\displaystyle{ \mathrm{SF} }[/math].
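The failure scenario can be reproduced with a small simulation. This is a minimal sketch under stated assumptions, not a real network protocol: `naive_broadcast` and its `crash_after` parameter are hypothetical, the sender is assumed to contact receivers one at a time in list order, and a receiver that gets nothing is assumed to eventually give up and deliver SF.

```python
SF = "SF"  # stand-in for the special "sender faulty" message


def naive_broadcast(receivers, message, crash_after=None):
    """Simulate the naive protocol and return what each receiver delivers.

    The sender sends to `receivers` in order and crashes after
    `crash_after` sends (None means it never crashes). A receiver
    delivers the message on receipt; a receiver that was never reached
    is assumed to deliver SF.
    """
    reached = receivers if crash_after is None else receivers[:crash_after]
    return {r: (message if r in reached else SF) for r in receivers}


procs = ["p1", "p2", "p3"]

# No crash: every receiver delivers the same message.
print(naive_broadcast(procs, "m"))
# {'p1': 'm', 'p2': 'm', 'p3': 'm'}

# Sender crashes after one send: receivers disagree, violating agreement.
print(naive_broadcast(procs, "m", crash_after=1))
# {'p1': 'm', 'p2': 'SF', 'p3': 'SF'}
```

In the second call, `p1` has already delivered the message and cannot retract it, so no later action by `p2` or `p3` can restore agreement unless they also learn the message from another process.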

TRB is closely related, but not identical, to the fundamental distributed computing problem of consensus.

References