TensorFloat-32

TensorFloat-32 (TF32) is a floating-point number format designed for the Tensor Cores of certain Nvidia GPUs. It was first implemented in the Ampere architecture.^[1] TF32 combines the 8-bit exponent of IEEE 754 single precision with the 10-bit mantissa of half precision; together with the sign bit, this gives 19 bits per number. It is comparable to the bfloat16 format, which keeps the same 8-bit exponent but uses a 7-bit mantissa.

Format

The binary format is:

1 sign bit
8 exponent bits
10 significand bits (also called mantissa, or precision bits)
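As a rough illustration of the layout above, the following Python sketch splits a value into the three TF32 fields by starting from its FP32 encoding and keeping only the top 10 fraction bits. The function names `fp32_fields` and `tf32_fields` are hypothetical, and the simple truncation of the low 13 bits is an assumption made for readability, not a statement about how hardware rounds.

```python
import struct

def fp32_fields(x: float) -> tuple[int, int, int]:
    """Split x (rounded to FP32) into sign, biased exponent, and 23-bit fraction."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits, bias 127
    fraction = bits & 0x7FFFFF       # 23 explicitly stored fraction bits
    return sign, exponent, fraction

def tf32_fields(x: float) -> tuple[int, int, int]:
    """Return sign, exponent, and the top 10 fraction bits (truncating: an assumption)."""
    sign, exponent, fraction = fp32_fields(x)
    return sign, exponent, fraction >> 13  # drop the 13 low fraction bits

print(tf32_fields(1.5))       # (0, 127, 512)
print(tf32_fields(-0.15625))  # (1, 124, 256)
```

The sign and exponent fields are identical to those of FP32; only the fraction is shortened.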

[Figure: General floating point.svg]

The 19-bit format fits within a double word (32 bits). Although it has less precision than a standard 32-bit IEEE 754 floating-point number, it allows much faster computation: up to 8 times faster on an A100 than FP32 on a V100.^[2]
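To put the precision loss in numbers, the sketch below compares the spacing of representable values near 1.0 and the approximate count of significant decimal digits for each format mentioned in this article. The table `FRACTION_BITS` and the helper names are illustrative, not part of any library.

```python
import math

# Explicitly stored fraction (mantissa) bits per format; the leading 1 is implicit.
FRACTION_BITS = {"FP32": 23, "TF32": 10, "FP16": 10, "bfloat16": 7}

def epsilon(fraction_bits: int) -> float:
    """Gap between 1.0 and the next larger representable value."""
    return 2.0 ** -fraction_bits

def decimal_digits(fraction_bits: int) -> float:
    """Approximate significant decimal digits: (fraction_bits + 1) * log10(2)."""
    return (fraction_bits + 1) * math.log10(2)

for name, bits in FRACTION_BITS.items():
    print(f"{name:9s} eps={epsilon(bits):.3e}  digits~{decimal_digits(bits):.1f}")
```

By this estimate TF32 carries roughly 3.3 significant decimal digits, against roughly 7.2 for FP32.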

Because TF32 values are stored in the same space as FP32, TF32 is not a distinct storage format but a specification for reduced-precision FP32 multiply–accumulate operations. FP32 inputs are rounded to TF32 and multiplied to produce a 21-bit product (counting the implicit leading bit of each operand, this is an 11×11→22-bit multiply), and the products are summed into a standard FP32 accumulator.^[3]
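The multiply–accumulate behaviour described above can be emulated in software. The following is a minimal sketch, not a model of the actual hardware: the helper names are hypothetical, round-to-nearest-even is an assumed rounding mode, and the accumulator is approximated by rounding each partial sum back to FP32. It is intended for finite inputs within the FP32 range.

```python
import struct

def to_fp32(x: float) -> float:
    """Round a Python float (binary64) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def round_to_tf32(x: float) -> float:
    """Round x to 10 fraction bits (round-to-nearest-even is an assumption)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits >> 23) & 0xFF == 0xFF:
        return x  # leave infinities and NaNs unchanged
    # Add just under half of the dropped range, plus the lowest kept bit,
    # so that exact ties round to the even neighbour.
    bits += 0xFFF + ((bits >> 13) & 1)
    bits &= 0xFFFFE000  # clear the 13 dropped fraction bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def tf32_dot(a, b) -> float:
    """Dot product with TF32-rounded inputs and an emulated FP32 accumulator."""
    acc = 0.0
    for x, y in zip(a, b):
        # Two 11-bit significands give a product of at most 22 bits,
        # which binary64 represents exactly.
        product = round_to_tf32(x) * round_to_tf32(y)
        acc = to_fp32(acc + product)  # approximate FP32 accumulation
    return acc

print(round_to_tf32(1.0 + 2**-11))                  # 1.0 (tie rounds to even)
print(tf32_dot([1.0 + 2**-11, 2.0], [1.0, 3.0]))    # 7.0
```

In the second call the input `1.0 + 2**-11` is representable in FP32 but not in TF32, so it is rounded to 1.0 before the multiply; the accumulated result is therefore 7.0 rather than a value slightly above 7.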

References

[1] Kharya, Paresh (14 May 2020). "TensorFloat-32 in the A100 GPU Accelerates AI Training, HPC up to 20x". https://blogs.nvidia.com/blog/tensorfloat-32-precision-format/.
[2] "NVIDIA TF32". 8 February 2023. https://deeprec.readthedocs.io/en/latest/NVIDIA-TF32.html.
[3] Stosic, Dusan; Micikevicius, Paulius (27 January 2021). "Accelerating AI Training with NVIDIA TF32 Tensor Cores". https://developer.nvidia.com/blog/accelerating-ai-training-with-tf32-tensor-cores/.


Original source: https://en.wikipedia.org/wiki/TensorFloat-32.

