2Sum

From HandWiki
Revision as of 23:04, 6 February 2024 by Corlink (talk | contribs) (linkage)
Short description: Algorithm to compute rounding error

2Sum[1] is a floating-point algorithm for computing the exact round-off error in a floating-point addition operation.

2Sum and its variant Fast2Sum were first published by Ole Møller in 1965.[2] Fast2Sum is often used implicitly in other algorithms, such as compensated summation algorithms.[1] Kahan's summation algorithm was the first of these to be published, in 1965,[3] and Dekker later factored Fast2Sum out of it in 1971 for double-double arithmetic algorithms.[4] The names 2Sum and Fast2Sum appear to have been applied retroactively by Shewchuk in 1997.[5]

Algorithm

Given two floating-point numbers [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math], 2Sum computes the floating-point sum [math]\displaystyle{ s := a \oplus b }[/math] rounded to nearest and the floating-point error [math]\displaystyle{ t := a + b - (a \oplus b) }[/math], so that [math]\displaystyle{ s + t = a + b }[/math]. Here [math]\displaystyle{ \oplus }[/math] and [math]\displaystyle{ \ominus }[/math] denote addition and subtraction rounded to nearest, respectively. The error [math]\displaystyle{ t }[/math] is itself a floating-point number.

Inputs floating-point numbers [math]\displaystyle{ a, b }[/math]
Outputs rounded sum [math]\displaystyle{ s = a \oplus b }[/math] and exact error [math]\displaystyle{ t = a + b - (a \oplus b) }[/math]
  1. [math]\displaystyle{ s := a \oplus b }[/math]
  2. [math]\displaystyle{ a' := s \ominus b }[/math]
  3. [math]\displaystyle{ b' := s \ominus a' }[/math]
  4. [math]\displaystyle{ \delta_a := a \ominus a' }[/math]
  5. [math]\displaystyle{ \delta_b := b \ominus b' }[/math]
  6. [math]\displaystyle{ t := \delta_a \oplus \delta_b }[/math]
  7. return [math]\displaystyle{ (s, t) }[/math]

Provided the floating-point arithmetic is correctly rounded to nearest (with ties resolved any way), as is the default in IEEE 754, and provided the sum does not overflow and, if it underflows, underflows gradually, it can be proven that [math]\displaystyle{ s + t = a + b }[/math].[1][6][2]
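
The six steps above translate directly into any language whose floating-point type is correctly rounded to nearest. The following is a minimal Python sketch, assuming Python's `float` is an IEEE 754 binary64 number (true on common platforms); the function name `two_sum` and the variable names are illustrative choices, not taken from the cited sources. The identity [math]\displaystyle{ s + t = a + b }[/math] is checked exactly with rational arithmetic.

```python
from fractions import Fraction


def two_sum(a, b):
    """Return (s, t): s is the rounded sum a + b, t is its rounding error."""
    s = a + b                    # step 1: rounded sum
    a_prime = s - b              # step 2: approximation of a recovered from s
    b_prime = s - a_prime        # step 3: approximation of b recovered from s
    delta_a = a - a_prime        # step 4: error attributable to a
    delta_b = b - b_prime        # step 5: error attributable to b
    t = delta_a + delta_b        # step 6: total rounding error
    return s, t


a, b = 1.0, 1e-16
s, t = two_sum(a, b)

# Every finite float converts to a Fraction without error, so this
# comparison is an exact check of s + t = a + b, not an approximate one.
identity_holds = Fraction(s) + Fraction(t) == Fraction(a) + Fraction(b)
```

In this example `b` is smaller than half the spacing of floats near 1.0, so it is rounded away entirely in `s` and recovered in full in `t`.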

A variant of 2Sum called Fast2Sum uses only three floating-point operations, for floating-point arithmetic in radix 2 or radix 3, under the assumption that the exponent of [math]\displaystyle{ a }[/math] is at least as large as the exponent of [math]\displaystyle{ b }[/math], such as when [math]\displaystyle{ \left|a\right| \geq \left|b\right| }[/math]:[1][6][7][4]

Inputs radix-2 or radix-3 floating-point numbers [math]\displaystyle{ a }[/math] and [math]\displaystyle{ b }[/math] such that either at least one of them is zero, or their normalized exponents satisfy [math]\displaystyle{ e_a \geq e_b }[/math]
Outputs rounded sum [math]\displaystyle{ s = a \oplus b }[/math] and exact error [math]\displaystyle{ t = a + b - (a \oplus b) }[/math]
  1. [math]\displaystyle{ s := a \oplus b }[/math]
  2. [math]\displaystyle{ z := s \ominus a }[/math]
  3. [math]\displaystyle{ t := b \ominus z }[/math]
  4. return [math]\displaystyle{ (s, t) }[/math]
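
The effect of the ordering precondition can be seen in a short experiment. The following Python sketch again assumes binary64 `float` arithmetic rounded to nearest; `fast_two_sum` and `exact` are illustrative names introduced here. It runs Fast2Sum once with the inputs in the required order and once with them swapped, and checks each result exactly.

```python
from fractions import Fraction


def fast_two_sum(a, b):
    """Fast2Sum: assumes the exponent of a is >= that of b (e.g. |a| >= |b|)."""
    s = a + b      # step 1: rounded sum
    z = s - a      # step 2: the part of b that made it into s
    t = b - z      # step 3: the part of b that was rounded away
    return s, t


def exact(x, y):
    """Exact rational value of x + y; every finite float is a rational."""
    return Fraction(x) + Fraction(y)


# Precondition holds: |a| >= |b|.
s, t = fast_two_sum(1e16, 1.0)
ok_when_ordered = exact(s, t) == exact(1e16, 1.0)

# Precondition violated: the same inputs in the wrong order.
s_bad, t_bad = fast_two_sum(1.0, 1e16)
ok_when_swapped = exact(s_bad, t_bad) == exact(1.0, 1e16)
```

For these particular inputs `ok_when_ordered` is true and `ok_when_swapped` is false: in the swapped call the intermediate `z` absorbs the small operand and the computed error is zero. A caller that cannot guarantee the ordering can either compare magnitudes and swap first, or use the six-operation 2Sum, which has no such precondition.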

Even if the conditions are not satisfied, 2Sum and Fast2Sum often provide reasonable approximations to the error, i.e. [math]\displaystyle{ s + t \approx a + b }[/math], which enables algorithms for compensated summation, dot-product, etc., to have low error even if the inputs are not sorted or the rounding mode is unusual.[1][2] More complicated variants of 2Sum and Fast2Sum also exist for rounding modes other than round-to-nearest.[1]
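
As an illustration of how such an error term is used in compensated summation, the following Python sketch accumulates the 2Sum errors in a separate variable and adds them back at the end. This cascaded scheme is one common arrangement chosen here for illustration, not necessarily the form given in the cited references; the names `compensated_sum` and `naive_sum` are hypothetical, and binary64 `float` arithmetic is assumed.

```python
def two_sum(a, b):
    """Return the rounded sum of a and b together with its rounding error."""
    s = a + b
    a_prime = s - b
    b_prime = s - a_prime
    return s, (a - a_prime) + (b - b_prime)


def naive_sum(values):
    """Plain left-to-right floating-point summation, with no compensation."""
    total = 0.0
    for x in values:
        total = total + x
    return total


def compensated_sum(values):
    """Left-to-right summation that also accumulates each step's error."""
    total = 0.0
    correction = 0.0              # running sum of the per-step errors
    for x in values:
        total, err = two_sum(total, x)
        correction += err
    return total + correction


data = [1e16, 1.0, -1e16, 1.0]    # exact sum is 2
plain = naive_sum(data)           # loses the first 1.0 to rounding
compensated = compensated_sum(data)
```

Here the plain loop returns 1.0, because the first 1.0 is rounded away when added to 1e16, while the compensated loop returns 2.0. The inputs are deliberately unsorted, which is why 2Sum rather than Fast2Sum is used inside the loop.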

References

  1. Muller, Jean-Michel; Brunie, Nicolas; de Dinechin, Florent; Jeannerod, Claude-Pierre; Joldes, Mioara; Lefèvre, Vincent; Melquiond, Guillaume; Revol, Nathalie et al. (2018). Handbook of Floating-Point Arithmetic (2nd ed.). Cham, Switzerland: Birkhäuser. pp. 104–111. doi:10.1007/978-3-319-76526-6. ISBN 978-3-319-76525-9. https://doi.org/10.1007/978-3-319-76526-6. Retrieved 2020-09-20.
  2. Møller, Ole (March 1965). "Quasi double-precision in floating point addition". BIT Numerical Mathematics 5: 37–50. doi:10.1007/BF01975722.
  3. Kahan, W. (January 1965). "Further remarks on reducing truncation errors". Communications of the ACM (Association for Computing Machinery) 8 (1): 40. doi:10.1145/363707.363723. ISSN 0001-0782.
  4. Dekker, T.J. (June 1971). "A floating-point technique for extending the available precision". Numerische Mathematik 18 (3): 224–242. doi:10.1007/BF01397083. https://ir.cwi.nl/pub/9159. Retrieved 2020-09-24.
  5. Shewchuk, Jonathan Richard (October 1997). "Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates". Discrete & Computational Geometry 18 (3): 305–363. doi:10.1007/PL00009321.
  6. Knuth, Donald E. (1998). The Art of Computer Programming, Volume II: Seminumerical Algorithms (3rd ed.). Addison–Wesley. p. 236. ISBN 978-0-201-89684-8. https://www-cs-faculty.stanford.edu/~knuth/taocp.html. Retrieved 2020-09-20.
  7. Sterbenz, Pat H. (1974). Floating-Point Computation. Englewood Cliffs, NJ, United States: Prentice-Hall. pp. 138–143. ISBN 0-13-322495-3.