Active redundancy

From HandWiki

Active redundancy is a design concept that increases operational availability and reduces operating cost by automating most critical maintenance actions.

This concept is related to condition-based maintenance and fault reporting.[1]

History

The initial requirement began with military combat systems during World War I. The approach used for survivability was to install thick armor plate to resist gun fire and install multiple guns.

This became unaffordable and impractical during the Cold War when aircraft and missile systems became common.

The new approach was to build distributed systems that continue to work when components are damaged. This depends upon very crude forms of artificial intelligence that perform reconfiguration by obeying specific rules. An example of this approach is the AN/UYK-43 computer.

Formal design philosophies involving active redundancy are required for critical systems where corrective maintenance labor is undesirable or impractical during normal operation.

Commercial aircraft are required to have multiple redundant computing systems, hydraulic systems, and propulsion systems so that a single in-flight equipment failure will not cause loss of life.

A more recent outcome of this work is the Internet, which relies on a backbone of routers that provide the ability to automatically re-route communication without human intervention when failures occur.

Satellites placed into orbit around the Earth must include massive active redundancy to ensure operation will continue for a decade or longer despite normal component failures, radiation-induced failures, and thermal shock.

This strategy now dominates space systems, aircraft, and missile systems.

Principle

Maintenance requires three actions, which usually involve down time and high-priority labor costs when performed manually:

  • Fault detection
  • Fault isolation
  • Reconfiguration

Active redundancy eliminates down time and reduces manpower requirements by automating all three actions. This requires some amount of artificial intelligence, typically in the form of rule-based decision logic.
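The three actions above can be illustrated with a minimal Python sketch. It is not taken from any real system: the `Unit` and `RedundantSystem` classes, the self-reported `healthy` flag, and the rule "replace each failed unit with the first healthy spare" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """Hypothetical replaceable unit that reports its own health."""
    name: str
    healthy: bool = True


@dataclass
class RedundantSystem:
    """Hypothetical system that needs `needed` units online out of a pool."""
    needed: int
    units: list
    online: list = field(default_factory=list)

    def __post_init__(self):
        # Start with the first `needed` units carrying the load.
        self.online = self.units[: self.needed]

    def detect(self):
        """Fault detection: is any online unit reporting a failure?"""
        return any(not u.healthy for u in self.online)

    def isolate(self):
        """Fault isolation: identify which online units have failed."""
        return [u for u in self.online if not u.healthy]

    def reconfigure(self):
        """Reconfiguration: swap each failed unit for a healthy spare.

        Returns True if enough units remain online afterwards.
        """
        failed = self.isolate()
        spares = [u for u in self.units if u.healthy and u not in self.online]
        for bad in failed:
            self.online.remove(bad)
            if spares:
                self.online.append(spares.pop(0))
        return len(self.online) >= self.needed


units = [Unit("A"), Unit("B"), Unit("C")]
system = RedundantSystem(needed=2, units=units)
units[0].healthy = False          # simulate a failure of unit A
if system.detect():
    print("failed:", [u.name for u in system.isolate()])
    print("recovered:", system.reconfigure())
    print("online:", [u.name for u in system.online])
```

In this sketch the system keeps running after the first failure because one spare is available. A second failure would leave it short of the needed two units, and `reconfigure()` would return `False`.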

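Redundancy is commonly described in "N+M" notation, where N stands for the equipment needed to carry the load and M for the number of spare units. The amount of excess capacity affects overall system reliability by limiting the effects of failure.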

For example, if it takes two generators to power a city, then "N+1" would be three generators to allow a single failure. Similarly, "N+2" would be four generators, which would allow the city to stay powered if a second generator fails while the first is still out of service.
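The generator example can be checked with a short Python sketch. The function names `units_installed` and `survives` are hypothetical and chosen only for this illustration.

```python
def units_installed(needed, spares):
    """Total units in an N+M arrangement (N needed, M spare)."""
    return needed + spares


def survives(needed, spares, failures):
    """True if enough units remain to carry the load after `failures`."""
    return units_installed(needed, spares) - failures >= needed


# A city that needs two generators:
print(units_installed(2, 1))   # N+1 arrangement
print(survives(2, 1, 1))       # one failure with N+1
print(survives(2, 1, 2))       # two failures with N+1
print(survives(2, 2, 2))       # two failures with N+2
```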

Active redundancy improves operational availability as follows, assuming that failures are independent and that each redundant path has the same availability.

[math]\displaystyle{ A_{o}^{N} = 0.99 \text{ up time} }[/math]
[math]\displaystyle{ \approx \text{failed } 90 \text{ hours/year} }[/math]
[math]\displaystyle{ A_{o}^{N+1} = 1 - \left( (1 - A_{o}^{N} ) \times (1 - A_{o}^{N} ) \right) = 0.9999 \text{ up time} }[/math]
[math]\displaystyle{ \approx \text{failed } 50 \text{ minutes/year} }[/math]
[math]\displaystyle{ A_{o}^{N+2} = 1 - \left( (1 - A_{o}^{N} ) \times (1 - A_{o}^{N} ) \times (1 - A_{o}^{N} ) \right) = 0.999999 \text{ up time} }[/math]
[math]\displaystyle{ \approx \text{failed } 30 \text{ seconds/year} }[/math]
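The figures above can be reproduced with a short Python sketch. It assumes a 365-day year and independent failures. The function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def parallel_availability(unit_availability, paths):
    """Availability when any one of `paths` independent paths suffices."""
    return 1 - (1 - unit_availability) ** paths


def downtime_hours_per_year(availability):
    """Expected hours per year during which the system is unavailable."""
    return (1 - availability) * HOURS_PER_YEAR


for label, paths in (("N", 1), ("N+1", 2), ("N+2", 3)):
    a = parallel_availability(0.99, paths)
    hours = downtime_hours_per_year(a)
    print(f"{label}: availability {a:.6f}, "
          f"down {hours:.4f} h = {hours * 60:.2f} min = {hours * 3600:.1f} s")
```

With these assumptions the expected down time comes to roughly 87.6 hours, 52.6 minutes, and 31.5 seconds per year, which matches the rounded values of 90 hours, 50 minutes, and 30 seconds given above.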

Passive components

Active redundancy in passive components requires redundant components that share the burden when failure occurs, as in cabling and piping.

In a bridge, this allows forces to be redistributed across the remaining cables to prevent failure if a vehicle ruptures a cable.[2]

In a water system, this allows flow to be redistributed through the remaining pipes when a limited number of valves are shut or pumps are shut down.[3]
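Load sharing in passive components can be illustrated with a simplified Python sketch. It assumes that the total load is split equally among the surviving members (cables or pipes) and that each member has the same capacity. Real structures and pipe networks distribute load unevenly, so this is only a first approximation, and the function names are hypothetical.

```python
def load_per_member(total_load, working_members):
    """Load carried by each member under the equal-sharing assumption."""
    if working_members <= 0:
        raise ValueError("no working members left to carry the load")
    return total_load / working_members


def tolerates_failures(total_load, members, capacity_each, failures):
    """True if the surviving members can carry the load within capacity."""
    remaining = members - failures
    if remaining <= 0:
        return False
    return load_per_member(total_load, remaining) <= capacity_each


# Four members, each rated for 300 units, sharing a load of 900 units.
print(tolerates_failures(900.0, 4, 300.0, failures=1))
print(tolerates_failures(900.0, 4, 300.0, failures=2))
```

In this example one failure leaves each surviving member at exactly its rated capacity, while a second failure would overload the remaining two.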

Active components

Active redundancy in active components requires reconfiguration when failure occurs. Software must recognize the failure and automatically reconfigure to restore operation.

All modern computers provide the following capabilities when the built-in fault reporting feature is enabled:

  • Automatic fault detection
  • Automatic fault isolation

Mechanical devices must also reconfigure, such as transmission settings on hybrid vehicles that have redundant propulsion systems. The petroleum engine will start up when battery power fails.

Electrical power systems must perform two actions to prevent total system failure when smaller failures occur, such as when a tree falls across a power line. Power systems incorporate communication, switching, and automatic scheduling that allow these actions to be automated:

  • Shut down the damaged power line to isolate the failure
  • Adjust generator settings to prevent voltage and frequency excursions

Benefits

Active redundancy is the only known strategy that can achieve high availability.

Detriments

This maintenance philosophy requires custom development with extra components.

References