Dependability state model

A dependability state diagram is a method for modelling a system as a Markov chain. It is used in reliability engineering for availability and reliability analysis.[1]

A simple state model with two states

The method consists of creating a finite-state machine that represents the different states a system may be in. Transitions between states occur as a result of events from underlying Poisson processes with different intensities.
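The mechanism described above can be sketched as a small simulation. The following Python sketch is illustrative only: the function name `simulate`, the state labels and the numeric intensities are assumptions made for the example, not part of the model description. In each state, the time until the next event is drawn from an exponential distribution whose rate is the sum of the outgoing intensities, and the next state is chosen with probability proportional to its intensity.

```python
import random


def simulate(rates, start, horizon, rng):
    """Simulate a continuous-time Markov chain and return the time spent per state.

    rates maps (source, destination) pairs to transition intensities.
    """
    time_in_state = {}
    state, clock = start, 0.0
    while True:
        # Intensities of the transitions that leave the current state.
        outgoing = {dst: r for (src, dst), r in rates.items() if src == state}
        total = sum(outgoing.values())
        # Sojourn time: exponential with the summed intensity (infinite if absorbing).
        stay = rng.expovariate(total) if total > 0 else float("inf")
        if clock + stay >= horizon:
            time_in_state[state] = time_in_state.get(state, 0.0) + (horizon - clock)
            return time_in_state
        time_in_state[state] = time_in_state.get(state, 0.0) + stay
        clock += stay
        # The next state is picked with probability proportional to its intensity.
        state = rng.choices(list(outgoing), weights=list(outgoing.values()))[0]


# Hypothetical two-state model: one unit with failure intensity 1.0
# and repair intensity 9.0 (both values are assumptions).
LAM, MU = 1.0, 9.0
two_state = {("up", "down"): LAM, ("down", "up"): MU}
HORIZON = 10_000.0
result = simulate(two_state, "up", HORIZON, random.Random(42))
fraction_up = result["up"] / HORIZON
# Over a long horizon this should be close to MU / (LAM + MU).
print(f"fraction of time up: {fraction_up:.3f}")
```

Passing the random generator as an argument keeps a run reproducible for a given seed; the dictionary of intensities can describe a model with any number of states.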

Example

Example FSM with two working states and one failed

A redundant computer system consists of two identical compute nodes, each of which fails with an intensity of [math]\displaystyle{ \lambda }[/math]. When failed, they are repaired one at a time by a single repairman, with negative exponentially distributed repair times with expectation [math]\displaystyle{ \mu^{-1} }[/math].

  • state 0: 0 failed units, normal state of the system.
  • state 1: 1 failed unit, system operational.
  • state 2: 2 failed units, system not operational.

The intensity from state 0 to state 1 is [math]\displaystyle{ 2\lambda }[/math], since each of the two working compute nodes has a failure intensity of [math]\displaystyle{ \lambda }[/math]. The intensity from state 1 to state 2 is [math]\displaystyle{ \lambda }[/math], since only one working node is left to fail. Transitions from state 2 to state 1 and from state 1 to state 0 represent the repairs of the compute nodes and have the intensity [math]\displaystyle{ \mu }[/math], since only a single unit is repaired at a time.
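The transitions of the example can be written down as a table of intensities. The following Python sketch is illustrative; the function names and the numeric values chosen for [math]\displaystyle{ \lambda }[/math] and [math]\displaystyle{ \mu }[/math] are assumptions. It also derives the mean time spent in each state per visit, which for exponentially distributed events is the reciprocal of the summed outgoing intensities.

```python
def transition_rates(lam, mu):
    """Intensities of the two-node example, keyed by (from_state, to_state)."""
    return {
        (0, 1): 2 * lam,  # either of the two working nodes fails
        (1, 2): lam,      # the remaining working node fails
        (1, 0): mu,       # the failed node is repaired
        (2, 1): mu,       # one of the two failed nodes is repaired
    }


def mean_sojourn_times(rates):
    """Mean time per visit in each state: 1 / (sum of outgoing intensities)."""
    outgoing = {}
    for (src, _dst), rate in rates.items():
        outgoing[src] = outgoing.get(src, 0.0) + rate
    return {state: 1.0 / total for state, total in outgoing.items()}


# Hypothetical values: 0.5 failures and 2 repairs per unit of time.
rates = transition_rates(lam=0.5, mu=2.0)
for state, mean in sorted(mean_sojourn_times(rates).items()):
    print(f"state {state}: mean sojourn time {mean:.2f}")
```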

Availability

The asymptotic availability of the system, i.e. its availability over a long period, is equal to the probability that the model is in state 0 or state 1, the two states in which the system is operational.

This is calculated by setting up a set of linear balance equations from the state transitions and solving the resulting linear system.

The matrix is constructed with a row for each state. In the row of a state, each transition into that state is entered in the column of the state it comes from, as the negative of its intensity.

[math]\displaystyle{ \mathbf{A_0} = \begin{bmatrix} 0 & -\mu & 0 \\ -2\lambda & 0 & -\mu \\ 0 & -\lambda & 0 \end{bmatrix}. }[/math]

The diagonal cells are then filled in so that each column sums to 0, which gives the balance equations [math]\displaystyle{ \mathbf{A_1} \mathbf{P} = \mathbf{0} }[/math] for the vector [math]\displaystyle{ \mathbf{P} = (P_0, P_1, P_2)^T }[/math] of state probabilities:

[math]\displaystyle{ \mathbf{A_1} = \begin{bmatrix} (2\lambda) & -\mu & 0 \\ -2\lambda & (\lambda+\mu) & -\mu \\ 0 & -\lambda & (\mu) \end{bmatrix}. }[/math]

In addition, the normalization condition must be taken into account, since the state probabilities sum to one:

[math]\displaystyle{ \sum_n P_n = 1. }[/math]

By solving this system of equations, the probability of being in state 0 or state 1 can be found, which is equal to the long-term availability of the service.
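The calculation can be sketched in Python with exact rational arithmetic. The sketch assumes the intensities of the example ([math]\displaystyle{ 2\lambda }[/math] from state 0 to state 1, [math]\displaystyle{ \lambda }[/math] from state 1 to state 2, [math]\displaystyle{ \mu }[/math] for each repair) and takes states 0 and 1 as the operational states. The helper names `solve`, `steady_state` and `availability` and the numeric values are assumptions made for illustration. Because the balance equations are linearly dependent, one of them is replaced by the normalization condition.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a non-zero pivot and move it into place.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]


def steady_state(lam, mu):
    """Long-run probabilities [P0, P1, P2] of the two-node example."""
    # Balance matrix: column = state left, row = state entered, columns sum to 0.
    a = [
        [2 * lam, -mu, 0],
        [-2 * lam, lam + mu, -mu],
        [0, -lam, mu],
    ]
    # Replace the last (redundant) balance equation by P0 + P1 + P2 = 1.
    a[2] = [1, 1, 1]
    return solve(a, [0, 0, 1])


def availability(lam, mu):
    """Probability of being in an operational state (state 0 or state 1)."""
    p0, p1, _p2 = steady_state(lam, mu)
    return p0 + p1


# Hypothetical intensities: one failure per 1000 h, one repair per 10 h.
lam, mu = Fraction(1, 1000), Fraction(1, 10)
print(steady_state(lam, mu))
print(float(availability(lam, mu)))
```

For this small chain the result agrees with the closed form [math]\displaystyle{ A = \frac{\mu^2 + 2\lambda\mu}{\mu^2 + 2\lambda\mu + 2\lambda^2} }[/math], which follows from [math]\displaystyle{ P_1 = \tfrac{2\lambda}{\mu} P_0 }[/math] and [math]\displaystyle{ P_2 = \tfrac{\lambda}{\mu} P_1 }[/math].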

Reliability

The reliability of the system is found by making the failure states absorbing, i.e. removing all outgoing state transitions.

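For a single unit with failure intensity [math]\displaystyle{ \lambda }[/math], as in a simple two-state model, the function is:

[math]\displaystyle{ R(t) = e^{-\lambda t} \, }[/math]

For the redundant example, state 2 is made absorbing by removing the repair transition from state 2 to state 1. Starting in state 0, the reliability is then a sum of two exponentials:

[math]\displaystyle{ R(t) = \frac{s_1 e^{s_2 t} - s_2 e^{s_1 t}}{s_1 - s_2}, \qquad s_{1,2} = \frac{-(3\lambda+\mu) \pm \sqrt{\lambda^2 + 6\lambda\mu + \mu^2}}{2} }[/math]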
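For the redundant example, the reliability can also be evaluated numerically. The sketch below is one possible approach and is not taken from the cited source: it makes state 2 absorbing by dropping the repair transition out of it, integrates the resulting state equations for states 0 and 1 with a fixed-step Runge-Kutta scheme, and returns the probability that the system has not yet reached state 2. The function names, the step count and the numeric intensities are assumptions. The single-unit function is included for comparison.

```python
import math


def single_unit_reliability(lam, t):
    """R(t) = exp(-lam * t) for one unit with failure intensity lam."""
    return math.exp(-lam * t)


def redundant_reliability(lam, mu, t, steps=10_000):
    """R(t) for the two-node example with state 2 absorbing, starting in state 0."""

    def derivative(p0, p1):
        # State equations restricted to the non-failed states 0 and 1.
        dp0 = -2 * lam * p0 + mu * p1
        dp1 = 2 * lam * p0 - (lam + mu) * p1
        return dp0, dp1

    h = t / steps
    p0, p1 = 1.0, 0.0  # the system starts with both nodes working
    for _ in range(steps):
        # Classical fourth-order Runge-Kutta step.
        k1 = derivative(p0, p1)
        k2 = derivative(p0 + h / 2 * k1[0], p1 + h / 2 * k1[1])
        k3 = derivative(p0 + h / 2 * k2[0], p1 + h / 2 * k2[1])
        k4 = derivative(p0 + h * k3[0], p1 + h * k3[1])
        p0 += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p1 += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    # Probability of not yet having entered the failed state.
    return p0 + p1


# Hypothetical intensities per hour: one failure per 1000 h, one repair per 10 h.
LAM, MU, T = 0.001, 0.1, 1000.0
print(f"single unit:      R({T:g}) = {single_unit_reliability(LAM, T):.4f}")
print(f"redundant system: R({T:g}) = {redundant_reliability(LAM, MU, T):.4f}")
```

A fixed step size is sufficient for a model this small; for larger models an established numerical library would normally be preferred over a hand-written integrator.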

Criticism

Finite-state models of systems are subject to state explosion: a realistic model of a system can require so many states that it becomes infeasible to solve or draw.
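The growth can be illustrated with a short Python sketch. It assumes, for illustration, a system of n distinguishable components that are each either up or down, so that the full state space has 2^n states; the function names are hypothetical. The redundant example in this article avoids part of this growth by only counting the number of failed units, which is possible because its nodes are identical.

```python
from itertools import product


def enumerate_states(n_components):
    """All combinations of up/down for n distinguishable two-state components."""
    return list(product(("up", "down"), repeat=n_components))


def state_count(n_components):
    """Size of the full state space, computed without enumerating it."""
    return 2 ** n_components


print(enumerate_states(2))
for n in (2, 10, 20, 40):
    print(f"{n:>2} components: {state_count(n):,} states")
```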

References

  1. Bjarne E. Helvik (2007). Dependable Computing Systems and Communication Networks. Gnist Tapir.