`APyFloat`

class apytypes.APyFloat

Floating-point scalar with configurable format.

The implementation is a generalization of the IEEE 754 standard, so features such as subnormals, infinities, and NaN are supported. The format is defined by the number of exponent bits, the number of mantissa bits, and a non-negative bias, named exp_bits, man_bits, and bias respectively. Similar to the hardware representation of a floating-point number, the value is stored in three fields: a sign bit sign, a biased exponent exp, and an integral mantissa man, stored without the hidden one. The value of a normal number is thus

\[(-1)^{\texttt{sign}} \times 2^{\texttt{exp} - \texttt{bias}} \times (1 + \texttt{man} \times 2^{\texttt{-man_bits}}).\]
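As a rough illustration of the formula above, the following plain-Python sketch evaluates a normal number from its stored fields. The helper `normal_value` is hypothetical and not part of APyTypes, and it deliberately ignores subnormals, infinities, and NaN.

```python
from fractions import Fraction


def normal_value(sign: int, exp: int, man: int, man_bits: int, bias: int) -> Fraction:
    """Evaluate (-1)**sign * 2**(exp - bias) * (1 + man * 2**-man_bits) exactly.

    Hypothetical helper for illustration; only meaningful for normal numbers.
    """
    significand = 1 + Fraction(man, 2**man_bits)  # hidden one plus stored fraction
    scale = Fraction(2) ** (exp - bias)           # also handles negative exponents
    return (-1) ** sign * significand * scale


# Fields with exp_bits=5 and the IEEE-like bias 15.
print(normal_value(sign=0, exp=16, man=2, man_bits=2, bias=15))  # 3
print(normal_value(sign=1, exp=15, man=2, man_bits=2, bias=15))  # -3/2
```

Exact rational arithmetic is used here only to keep the sketch free of rounding effects.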

If the bias is not explicitly given for a format, APyFloat defaults to an IEEE-like bias using the formula

\[\texttt{bias} = 2^{\texttt{exp_bits - 1}} - 1.\]
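The default-bias formula can be checked with a short sketch. The function name `ieee_like_bias` is an assumption made for this example and is not an APyTypes function.

```python
def ieee_like_bias(exp_bits: int) -> int:
    """Default bias 2**(exp_bits - 1) - 1 (hypothetical helper name)."""
    if exp_bits < 1:
        raise ValueError("exp_bits must be at least 1")
    return (1 << (exp_bits - 1)) - 1


# Exponent widths used by half, single/bfloat16, and double precision.
for exp_bits in (5, 8, 11):
    print(exp_bits, ieee_like_bias(exp_bits))  # prints 5 15, then 8 127, then 11 1023
```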

Arithmetic can be performed much like with Python's built-in float type. When both operands share a format, the result has that same format, obtained by quantizing to the nearest representable number with ties to even (QuantizationMode.TIES_EVEN). If the operands do not share the same format, the exponent and mantissa bit widths of the result are each the maximum of the corresponding input widths, as shown under Examples.

Attributes:

bias: Exponent bias.
bits: Total number of bits.
exp: Exponent bits with bias.
exp_bits: Number of exponent bits.
is_finite: True if and only if value is zero, subnormal, or normal.
is_inf: True if and only if value is infinite.
is_nan: True if and only if value is NaN.
is_normal: True if and only if value is normal (not zero, subnormal, infinite, or NaN).
is_subnormal: True if and only if value is subnormal.
is_zero: True if and only if value is zero.
man: Mantissa bits.
man_bits: Number of mantissa bits.
sign: Sign bit.
true_exp: Exponent value.
true_man: Mantissa value.
true_sign: Sign value.

Methods

`cast`	Change format of the floating-point number.
`cast_to_bfloat16`	Cast to bfloat16 format.
`cast_to_double`	Cast to IEEE 754 binary64 (double-precision) format.
`cast_to_half`	Cast to IEEE 754 binary16 (half-precision) format.
`cast_to_single`	Cast to IEEE 754 binary32 (single-precision) format.
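`from_bits`	Create an `APyFloat` object from a bit-representation.
`from_float`	Create an `APyFloat` object from an int, float, `APyFixed`, or `APyFloat`.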
`is_identical`	Test if two `APyFloat` objects are identical.
`next_down`	Get the largest floating-point number in the same format that compares less.
`next_up`	Get the smallest floating-point number in the same format that compares greater.
`to_bits`	Get the bit-representation of an `APyFloat`.

Examples

>>> from apytypes import APyFloat
>>> a = APyFloat.from_float(1.25, exp_bits=5, man_bits=2)
>>> b = APyFloat.from_float(1.75, exp_bits=5, man_bits=2)

Operands with the same format: the result will have exp_bits=5, man_bits=2.

>>> a + b
APyFloat(sign=0, exp=16, man=2, exp_bits=5, man_bits=2)

>>> d = APyFloat.from_float(1.75, exp_bits=4, man_bits=4)

Operands with different formats: the result will have exp_bits=5, man_bits=4.

>>> a + d
APyFloat(sign=0, exp=16, man=8, exp_bits=5, man_bits=4)
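The width rule used in these examples can be sketched in a few lines of plain Python. The helper `result_format` is hypothetical and models only the bit widths, not the arithmetic or the bias.

```python
def result_format(fmt_a: tuple[int, int], fmt_b: tuple[int, int]) -> tuple[int, int]:
    """Return (exp_bits, man_bits) of a result as the field-wise maximum.

    Hypothetical helper; each format is given as (exp_bits, man_bits).
    """
    return (max(fmt_a[0], fmt_b[0]), max(fmt_a[1], fmt_b[1]))


print(result_format((5, 2), (5, 2)))  # (5, 2)
print(result_format((5, 2), (4, 4)))  # (5, 4)
```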

If the operands of an arithmetic operation have IEEE-like biases, then the result will also have an IEEE-like bias – based on the resulting number of exponent bits. To support operations with biases deviating from the standard, the bias of the resulting format is calculated as the “average” of the inputs’ biases:

\[\texttt{bias}_3 = \frac{\left ( \left (\texttt{bias}_1 + 1 \right ) / 2^{\texttt{exp_bits}_1} + \left (\texttt{bias}_2 + 1 \right ) / 2^{\texttt{exp_bits}_2} \right ) \times 2^{\texttt{exp_bits}_3}}{2} - 1,\]

where \(\texttt{exp_bits}_1\) and \(\texttt{exp_bits}_2\) are the exponent bit widths of the operands, \(\texttt{bias}_1\) and \(\texttt{bias}_2\) are the input biases, and \(\texttt{exp_bits}_3\) is the exponent bit width of the result. Note that this formula still results in an IEEE-like bias when the inputs use IEEE-like biases.
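To make the bias formula concrete, the sketch below evaluates it with exact fractions. The name `combined_bias` is hypothetical, and the result is left unrounded because this section does not state how a non-integer value would be rounded.

```python
from fractions import Fraction


def combined_bias(bias_1: int, exp_bits_1: int,
                  bias_2: int, exp_bits_2: int,
                  exp_bits_3: int) -> Fraction:
    """Evaluate the result-bias formula exactly (hypothetical helper)."""
    rel_1 = Fraction(bias_1 + 1, 2**exp_bits_1)  # bias relative to the exponent range
    rel_2 = Fraction(bias_2 + 1, 2**exp_bits_2)
    return (rel_1 + rel_2) * 2**exp_bits_3 / 2 - 1


# IEEE-like inputs (exp_bits 5 and 4 give biases 15 and 7) stay IEEE-like.
print(combined_bias(15, 5, 7, 4, exp_bits_3=5))   # 15
# A non-standard bias on one operand moves the result between the two inputs.
print(combined_bias(11, 5, 15, 5, exp_bits_3=5))  # 13
```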

Constructor

__init__(self, sign: int, exp: int, man: int, exp_bits: int, man_bits: int, bias: int | None = None) → None

Create an APyFloat object.

Parameters:

sign (bool or int): The sign of the float. False/0 means positive. True/non-zero means negative.
exp (int): Exponent of the float as stored, i.e., actual value + bias.
man (int): Mantissa of the float as stored, i.e., without a hidden one.
exp_bits (int): Number of exponent bits.
man_bits (int): Number of mantissa bits.
bias (int, optional): Exponent bias. If not provided, bias is 2**(exp_bits - 1) - 1.

Returns:

APyFloat

Creation from other types

from_float(value: object, exp_bits: int, man_bits: int, bias: int | None = None) → APyFloat

Create an APyFloat object from an int, float, APyFixed, or APyFloat.

Note

To create an APyFloat from another APyFloat, it is always better to use cast().

The quantization mode used is QuantizationMode.TIES_EVEN.

Parameters:

value (int, float, APyFixed, or APyFloat): Value to initialize from.
exp_bits (int): Number of exponent bits.
man_bits (int): Number of mantissa bits.
bias (int, optional): Exponent bias. If not provided, bias is 2**(exp_bits - 1) - 1.

Returns:

APyFloat

See also

from_bits

Examples

>>> from apytypes import APyFloat

a, initialized from a floating-point value.

>>> a = APyFloat.from_float(1.35, exp_bits=10, man_bits=15)
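The ties-to-even rounding applied during conversion can be illustrated without the library. The sketch below is a simplified model under stated assumptions: `quantize_normal` is a hypothetical helper that handles only finite, non-zero inputs and ignores the sign, the exponent range, and subnormals.

```python
import math


def quantize_normal(value: float, man_bits: int) -> tuple[int, int]:
    """Round a finite non-zero float to `man_bits` mantissa bits, ties to even.

    Returns (unbiased exponent, stored mantissa). Hypothetical helper.
    """
    frac, exp = math.frexp(abs(value))    # abs(value) == frac * 2**exp, 0.5 <= frac < 1
    significand, exp = frac * 2, exp - 1  # rescale so that 1 <= significand < 2
    man = round((significand - 1) * 2**man_bits)  # built-in round() breaks ties to even
    if man == 2**man_bits:                # mantissa overflowed: carry into the exponent
        man, exp = 0, exp + 1
    return exp, man


print(quantize_normal(1.25, 2))   # (0, 1): exactly representable
print(quantize_normal(1.125, 2))  # (0, 0): tie, rounds down to the even mantissa
print(quantize_normal(1.375, 2))  # (0, 2): tie, rounds up to the even mantissa
print(quantize_normal(1.875, 2))  # (1, 0): tie rounds up and carries into the exponent
```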

from_bits(bits: int, exp_bits: int, man_bits: int, bias: int | None = None) → APyFloat

Create an APyFloat object from a bit-representation.

Parameters:

bits (int): The bit-representation for the float.
exp_bits (int): Number of exponent bits.
man_bits (int): Number of mantissa bits.
bias (int, optional): Exponent bias. If not provided, bias is 2**(exp_bits - 1) - 1.

Returns:

APyFloat

See also

to_bits
from_float

Examples

>>> from apytypes import APyFloat

a, initialized to -1.5 from a bit pattern.

>>> a = APyFloat.from_bits(0b1_01111_10, exp_bits=5, man_bits=2)
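How such a bit pattern maps to the three stored fields can be sketched in plain Python. The helper `split_bits` is hypothetical and assumes the layout sign, exponent, mantissa from most to least significant bit, as suggested by the literal above.

```python
def split_bits(bits: int, exp_bits: int, man_bits: int) -> tuple[int, int, int]:
    """Split a bit pattern into (sign, exp, man); hypothetical helper."""
    man = bits & ((1 << man_bits) - 1)                # lowest man_bits bits
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)  # next exp_bits bits
    sign = (bits >> (man_bits + exp_bits)) & 1        # single top bit
    return sign, exp, man


print(split_bits(0b1_01111_10, exp_bits=5, man_bits=2))  # (1, 15, 2)
```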

Change word length

cast(self, exp_bits: int | None = None, man_bits: int | None = None, bias: int | None = None, quantization: QuantizationMode | None = None) → APyFloat

Change format of the floating-point number.

This is the primary method for performing quantization when dealing with APyTypes floating-point numbers.

Parameters:

exp_bits (int, optional): Number of exponent bits in the result.
man_bits (int, optional): Number of mantissa bits in the result.
bias (int, optional): Exponent bias. If not provided, bias is 2**(exp_bits - 1) - 1.
quantization (QuantizationMode, optional): Quantization mode to use in this cast. If None, use the global quantization mode.

Returns:

APyFloat

Get bit representation

to_bits(self) → int

Get the bit-representation of an APyFloat.

Returns:

int

See also

from_bits

Examples

>>> from apytypes import APyFloat

a, initialized to -1.5 from a bit pattern.

>>> a = APyFloat.from_bits(0b1_01111_10, exp_bits=5, man_bits=2)
>>> a
APyFloat(sign=1, exp=15, man=2, exp_bits=5, man_bits=2)
>>> a.to_bits() == 0b1_01111_10
True
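The packing direction can be modeled in the same spirit. The helper `join_bits` below is a hypothetical sketch, not the library's implementation, and assumes the fields are concatenated as sign, exponent, mantissa.

```python
def join_bits(sign: int, exp: int, man: int, exp_bits: int, man_bits: int) -> int:
    """Concatenate sign | exp | man into one integer; hypothetical helper."""
    if not (0 <= exp < (1 << exp_bits) and 0 <= man < (1 << man_bits)):
        raise ValueError("field does not fit in its bit width")
    return ((sign & 1) << (exp_bits + man_bits)) | (exp << man_bits) | man


print(bin(join_bits(1, 15, 2, exp_bits=5, man_bits=2)))  # 0b10111110
```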

Comparison

is_identical(self, other: APyFloat) → bool

Test if two APyFloat objects are identical.

Two APyFloat objects are considered identical if, and only if, they have the same sign, exponent, mantissa, and format.

Returns:

bool
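The difference between equal values and identical objects can be illustrated with a small stand-in model. The `Fields` tuple and `value` function are hypothetical, and the sketch assumes that "format" covers exp_bits, man_bits, and bias.

```python
from fractions import Fraction
from typing import NamedTuple


class Fields(NamedTuple):
    """Stand-in for the stored fields of a floating-point number (hypothetical)."""
    sign: int
    exp: int
    man: int
    exp_bits: int
    man_bits: int
    bias: int


def value(f: Fields) -> Fraction:
    """Numeric value of a normal number described by `f`."""
    significand = 1 + Fraction(f.man, 2**f.man_bits)
    return (-1) ** f.sign * significand * Fraction(2) ** (f.exp - f.bias)


a = Fields(sign=0, exp=15, man=2, exp_bits=5, man_bits=2, bias=15)  # 1.5
b = Fields(sign=0, exp=7, man=8, exp_bits=4, man_bits=4, bias=7)    # 1.5

print(value(a) == value(b))  # True: same numeric value
print(a == b)                # False: fields and format differ, so not identical
```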

Convenience methods

Casting

cast_to_bfloat16(self, quantization: QuantizationMode | None = None) → APyFloat

Cast to bfloat16 format.

Convenience method corresponding to

f.cast(exp_bits=8, man_bits=7)

Parameters:

quantization (QuantizationMode, optional): Quantization mode to use. If not provided, the global mode, see get_float_quantization_mode(), is used.

cast_to_double(self, quantization: QuantizationMode | None = None) → APyFloat

Cast to IEEE 754 binary64 (double-precision) format.

Convenience method corresponding to

f.cast(exp_bits=11, man_bits=52)

Parameters:

quantization (QuantizationMode, optional): Quantization mode to use. If not provided, the global mode, see get_float_quantization_mode(), is used.

cast_to_half(self, quantization: QuantizationMode | None = None) → APyFloat

Cast to IEEE 754 binary16 (half-precision) format.

Convenience method corresponding to

f.cast(exp_bits=5, man_bits=10)

Parameters:

quantization (QuantizationMode, optional): Quantization mode to use. If not provided, the global mode, see get_float_quantization_mode(), is used.

cast_to_single(self, quantization: QuantizationMode | None = None) → APyFloat

Cast to IEEE 754 binary32 (single-precision) format.

Convenience method corresponding to

f.cast(exp_bits=8, man_bits=23)

Parameters:

quantization (QuantizationMode, optional): Quantization mode to use. If not provided, the global mode, see get_float_quantization_mode(), is used.
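The four convenience casts differ only in the (exp_bits, man_bits) pair they pass to cast(). The sketch below collects those pairs in a plain dictionary and derives the total width, assuming one sign bit plus the two fields; the names `FORMATS` and `total_bits` are hypothetical and not part of APyTypes.

```python
# (exp_bits, man_bits) pairs taken from the convenience methods above.
FORMATS = {
    "bfloat16": (8, 7),
    "half": (5, 10),
    "single": (8, 23),
    "double": (11, 52),
}


def total_bits(exp_bits: int, man_bits: int) -> int:
    """Total width: one sign bit plus exponent and mantissa fields."""
    return 1 + exp_bits + man_bits


for name, (exp_bits, man_bits) in FORMATS.items():
    print(name, total_bits(exp_bits, man_bits))
# prints: bfloat16 16, half 16, single 32, double 64 (one per line)
```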