APyFloat

class apytypes.APyFloat

Floating-point scalar with configurable format.

The implementation is a generalization of the IEEE 754 standard, so features such as subnormals, infinities, and NaN are still supported. The format is defined by the number of exponent bits, the number of mantissa bits, and a non-negative bias. These fields are named exp_bits, man_bits, and bias, respectively. Similar to the hardware representation of a floating-point number, the value is stored using three fields: a sign bit sign, a biased exponent exp, and an integral mantissa man, stored without the hidden one. The value of a normal number is thus

\[(-1)^{\texttt{sign}} \times 2^{\texttt{exp} - \texttt{bias}} \times (1 + \texttt{man} \times 2^{\texttt{-man_bits}}).\]
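As an illustration of the formula above, the following minimal sketch evaluates a normal number from its stored fields in plain Python. The helper name normal_value is hypothetical and not part of APyTypes; it only mirrors the expression and does not handle zeros, subnormals, infinities, or NaN.

```python
def normal_value(sign: int, exp: int, man: int, man_bits: int, bias: int) -> float:
    """Evaluate (-1)^sign * 2^(exp - bias) * (1 + man * 2^-man_bits)."""
    # Hidden one plus the stored fraction (hypothetical helper, normal numbers only).
    significand = 1 + man * 2.0 ** -man_bits
    return (-1) ** sign * 2.0 ** (exp - bias) * significand


# Fields taken from the examples on this page (exp_bits=5, so bias=15).
print(normal_value(sign=1, exp=15, man=2, man_bits=2, bias=15))  # -1.5
print(normal_value(sign=0, exp=16, man=2, man_bits=2, bias=15))  # 3.0
```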

In general, if the bias is not explicitly given for a format, APyFloat defaults to an IEEE-like bias given by

\[\texttt{bias} = 2^{\texttt{exp_bits - 1}} - 1.\]
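The default bias can be checked against the familiar IEEE 754 formats with a few lines of plain Python. The function name ieee_like_bias is an assumption made for this sketch, not an APyTypes function.

```python
def ieee_like_bias(exp_bits: int) -> int:
    """Return 2^(exp_bits - 1) - 1, the IEEE-like bias for a given exponent width."""
    return (1 << (exp_bits - 1)) - 1


# Exponent widths of binary16, binary32, and binary64.
for exp_bits in (5, 8, 11):
    print(exp_bits, ieee_like_bias(exp_bits))
```

For exponent widths 5, 8, and 11 this gives 15, 127, and 1023, which match the biases of IEEE 754 half, single, and double precision.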

Arithmetic can be performed similarly to the operations of the built-in type float in Python. When both operands share a format, the result has that same format, obtained by quantizing to the nearest representable number with ties to even (QuantizationMode.TIES_EVEN). If the operands do not share the same format, the exponent and mantissa bit widths of the result are the maxima of the corresponding input widths.

Attributes:
bias

Exponent bias.

bits

Total number of bits.

exp

Exponent bits with bias.

exp_bits

Number of exponent bits.

is_finite

True if and only if value is zero, subnormal, or normal.

is_inf

True if and only if value is infinite.

is_nan

True if and only if value is NaN.

is_normal

True if and only if value is normal (not zero, subnormal, infinite, or NaN).

is_subnormal

True if and only if value is subnormal.

is_zero

True if and only if value is zero.

man

Mantissa bits.

man_bits

Number of mantissa bits.

sign

Sign bit.

true_exp

Exponent value.

true_man

Mantissa value.

true_sign

Sign value.

Methods

cast

Change format of the floating-point number.

cast_to_bfloat16

Cast to bfloat16 format.

cast_to_double

Cast to IEEE 754 binary64 (double-precision) format.

cast_to_half

Cast to IEEE 754 binary16 (half-precision) format.

cast_to_single

Cast to IEEE 754 binary32 (single-precision) format.

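from_bits

Create an APyFloat object from a bit-representation.

from_float

Create an APyFloat object from an int, float, APyFixed, or APyFloat.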

is_identical

Test if two APyFloat objects are identical.

next_down

Get the largest floating-point number in the same format that compares less.

next_up

Get the smallest floating-point number in the same format that compares greater.

to_bits

Get the bit-representation of an APyFloat.

Examples

>>> from apytypes import APyFloat
>>> a = APyFloat.from_float(1.25, exp_bits=5, man_bits=2)
>>> b = APyFloat.from_float(1.75, exp_bits=5, man_bits=2)

Operands with same format, result will have exp_bits=5, man_bits=2

>>> a + b
APyFloat(sign=0, exp=16, man=2, exp_bits=5, man_bits=2)
>>> d = APyFloat.from_float(1.75, exp_bits=4, man_bits=4)

Operands with different formats, result will have exp_bits=5, man_bits=4

>>> a + d
APyFloat(sign=0, exp=16, man=8, exp_bits=5, man_bits=4)

If the operands of an arithmetic operation have IEEE-like biases, then the result also has an IEEE-like bias, based on the resulting number of exponent bits. To support operations with biases deviating from the standard, the bias of the resulting format is calculated as the “average” of the inputs’ biases:

\[\texttt{bias}_3 = \frac{\left ( \left (\texttt{bias}_1 + 1 \right ) / 2^{\texttt{exp_bits}_1} + \left (\texttt{bias}_2 + 1 \right ) / 2^{\texttt{exp_bits}_2} \right ) \times 2^{\texttt{exp_bits}_3}}{2} - 1,\]

where \(\texttt{exp_bits}_1\) and \(\texttt{exp_bits}_2\) are the exponent bit widths of the operands, \(\texttt{bias}_1\) and \(\texttt{bias}_2\) are the input biases, and \(\texttt{exp_bits}_3\) is the exponent bit width of the result. Note that this formula still results in an IEEE-like bias when the inputs use IEEE-like biases.
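The format rules described above (maximum bit widths, averaged bias) can be sketched in plain Python as follows. The function result_format is hypothetical and only evaluates the formulas on this page; it returns the bias as an exact Fraction because the rounding applied when the formula yields a non-integer is not stated here.

```python
from fractions import Fraction


def result_format(exp_bits1, man_bits1, bias1, exp_bits2, man_bits2, bias2):
    """Return (exp_bits3, man_bits3, bias3) for two operand formats (sketch)."""
    # Field widths of the result are the maxima of the operand widths.
    exp_bits3 = max(exp_bits1, exp_bits2)
    man_bits3 = max(man_bits1, man_bits2)
    # "Average" of the biases, each normalized by its own exponent range.
    scaled = Fraction(bias1 + 1, 2**exp_bits1) + Fraction(bias2 + 1, 2**exp_bits2)
    bias3 = scaled * 2**exp_bits3 / 2 - 1
    return exp_bits3, man_bits3, bias3


# IEEE-like inputs (bias 15 for 5 exponent bits, bias 7 for 4 exponent bits).
print(result_format(5, 2, 15, 4, 4, 7) == (5, 4, 15))  # True
# A non-standard bias of 9 on the first operand.
print(result_format(5, 2, 9, 4, 4, 7) == (5, 4, 12))  # True
```

The first call reproduces the format of the mixed-format addition in the examples (exp_bits=5, man_bits=4) and confirms that IEEE-like inputs give an IEEE-like result bias.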

Constructor

__init__(self, sign: int, exp: int, man: int, exp_bits: int, man_bits: int, bias: int | None = None) -> None

Create an APyFloat object.

Parameters:
sign : bool or int

The sign of the float. False/0 means positive. True/non-zero means negative.

exp : int

Exponent of the float as stored, i.e., actual value + bias.

man : int

Mantissa of the float as stored, i.e., without a hidden one.

exp_bits : int

Number of exponent bits.

man_bits : int

Number of mantissa bits.

bias : int, optional

Exponent bias. If not provided, bias is 2**(exp_bits - 1) - 1.

Returns:
APyFloat

Creation from other types

from_float(value: object, exp_bits: int, man_bits: int, bias: int | None = None) -> APyFloat

Create an APyFloat object from an int, float, APyFixed, or APyFloat.

Note

It is in all cases better to use cast() to create an APyFloat from an APyFloat.

The quantization mode used is QuantizationMode.TIES_EVEN.

Parameters:
value : int, float

Floating-point value to initialize from.

exp_bits : int

Number of exponent bits.

man_bits : int

Number of mantissa bits.

bias : int, optional

Exponent bias. If not provided, bias is 2**(exp_bits - 1) - 1.

Returns:
APyFloat

See also

from_bits

Examples

>>> from apytypes import APyFloat

a, initialized from a floating-point value.

>>> a = APyFloat.from_float(1.35, exp_bits=10, man_bits=15)
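To illustrate what quantization with ties to even means for the mantissa, the sketch below rounds a significand in [1, 2) to a given number of fractional bits using Python's built-in round(), which also rounds halfway cases to the even neighbor. The helper quantize_significand is hypothetical; it ignores the exponent range and is not how APyTypes is implemented.

```python
def quantize_significand(x: float, man_bits: int) -> float:
    """Round x (assumed to lie in [1, 2)) to man_bits fractional bits, ties to even."""
    scale = 1 << man_bits
    # round() on a float returns the nearest int, choosing the even one on a tie.
    return round(x * scale) / scale


print(quantize_significand(1.125, 2))  # 1.0  (tie between 1.0 and 1.25, even mantissa wins)
print(quantize_significand(1.375, 2))  # 1.5  (tie between 1.25 and 1.5, even mantissa wins)
print(quantize_significand(1.35, 2))   # 1.25 (no tie, nearest value)
```

A result of 2.0 would mean the rounding carried into the exponent, a case this sketch leaves to the caller.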

from_bits(bits: int, exp_bits: int, man_bits: int, bias: int | None = None) -> APyFloat

Create an APyFloat object from a bit-representation.

Parameters:
bits : int

The bit-representation for the float.

exp_bits : int

Number of exponent bits.

man_bits : int

Number of mantissa bits.

bias : int, optional

Exponent bias. If not provided, bias is 2**(exp_bits - 1) - 1.

Returns:
APyFloat

See also

to_bits
from_float

Examples

>>> from apytypes import APyFloat

a, initialized to -1.5 from a bit pattern.

>>> a = APyFloat.from_bits(0b1_01111_10, exp_bits=5, man_bits=2)

Change word length

cast(self, exp_bits: int | None = None, man_bits: int | None = None, bias: int | None = None, quantization: QuantizationMode | None = None) -> APyFloat

Change format of the floating-point number.

This is the primary method for performing quantization when dealing with APyTypes floating-point numbers.

Parameters:
exp_bits : int, optional

Number of exponent bits in the result.

man_bits : int, optional

Number of mantissa bits in the result.

bias : int, optional

Exponent bias. If not provided, bias is 2**(exp_bits - 1) - 1.

quantization : QuantizationMode, optional

Quantization mode to use in this cast. If None, use the global quantization mode.

Returns:
APyFloat

Get bit representation

to_bits(self) -> int

Get the bit-representation of an APyFloat.

Returns:
int

See also

from_bits

Examples

>>> from apytypes import APyFloat

a, initialized to -1.5 from a bit pattern.

>>> a = APyFloat.from_bits(0b1_01111_10, exp_bits=5, man_bits=2)
>>> a
APyFloat(sign=1, exp=15, man=2, exp_bits=5, man_bits=2)
>>> a.to_bits() == 0b1_01111_10
True
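The field layout implied by the example above (sign in the most significant bit, followed by the exponent and then the mantissa) can be sketched with plain integer operations. The helpers unpack_bits and pack_bits are hypothetical names used for illustration only.

```python
def unpack_bits(bits: int, exp_bits: int, man_bits: int) -> tuple[int, int, int]:
    """Split a bit pattern into (sign, exp, man), assuming a sign|exp|man layout."""
    man = bits & ((1 << man_bits) - 1)
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    sign = (bits >> (exp_bits + man_bits)) & 1
    return sign, exp, man


def pack_bits(sign: int, exp: int, man: int, exp_bits: int, man_bits: int) -> int:
    """Inverse of unpack_bits: assemble the fields into one integer."""
    return (sign << (exp_bits + man_bits)) | (exp << man_bits) | man


print(unpack_bits(0b1_01111_10, exp_bits=5, man_bits=2))  # (1, 15, 2)
print(pack_bits(1, 15, 2, exp_bits=5, man_bits=2) == 0b1_01111_10)  # True
```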

Comparison

is_identical(self, other: APyFloat) -> bool

Test if two APyFloat objects are identical.

Two APyFloat objects are considered identical if, and only if, they have the same sign, exponent, mantissa, and format.

Returns:
bool
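The distinction between having the same numeric value and being identical can be illustrated without the library. In the sketch below, Fields is a hypothetical stand-in for the stored fields of a normal number; treating the bias as part of the format is an assumption of this sketch.

```python
from typing import NamedTuple


class Fields(NamedTuple):
    """Hypothetical container for the stored fields and format of a normal number."""

    sign: int
    exp: int
    man: int
    exp_bits: int
    man_bits: int
    bias: int

    def value(self) -> float:
        # Decode using the normal-number formula from the class description.
        return (-1) ** self.sign * 2.0 ** (self.exp - self.bias) * (1 + self.man / 2**self.man_bits)


x = Fields(sign=0, exp=16, man=2, exp_bits=5, man_bits=2, bias=15)
y = Fields(sign=0, exp=16, man=8, exp_bits=5, man_bits=4, bias=15)
print(x.value() == y.value())  # True: both encode 3.0
print(x == y)                  # False: mantissa and format differ
```

The two field sets are those printed for a + b and a + d in the class examples: equal in value, but not identical in the field-by-field sense described above.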

Convenience methods

Casting

cast_to_bfloat16(self, quantization: QuantizationMode | None = None) -> APyFloat

Cast to bfloat16 format.

Convenience method corresponding to

f.cast(exp_bits=8, man_bits=7)

Parameters:
quantization : QuantizationMode, optional

Quantization mode to use. If not provided, the global mode, see get_float_quantization_mode(), is used.

cast_to_double(self, quantization: QuantizationMode | None = None) -> APyFloat

Cast to IEEE 754 binary64 (double-precision) format.

Convenience method corresponding to

f.cast(exp_bits=11, man_bits=52)

Parameters:
quantization : QuantizationMode, optional

Quantization mode to use. If not provided, the global mode, see get_float_quantization_mode(), is used.

cast_to_half(self, quantization: QuantizationMode | None = None) -> APyFloat

Cast to IEEE 754 binary16 (half-precision) format.

Convenience method corresponding to

f.cast(exp_bits=5, man_bits=10)

Parameters:
quantization : QuantizationMode, optional

Quantization mode to use. If not provided, the global mode, see get_float_quantization_mode(), is used.

cast_to_single(self, quantization: QuantizationMode | None = None) -> APyFloat

Cast to IEEE 754 binary32 (single-precision) format.

Convenience method corresponding to

f.cast(exp_bits=8, man_bits=23)

Parameters:
quantization : QuantizationMode, optional

Quantization mode to use. If not provided, the global mode, see get_float_quantization_mode(), is used.
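For reference, the four target formats used by the convenience casts above can be collected in a small table. The dictionary FORMATS and the helper total_bits are hypothetical names for this sketch; the total assumes one sign bit plus the exponent and mantissa fields.

```python
# (exp_bits, man_bits) of each convenience cast, as listed above.
FORMATS = {
    "bfloat16": (8, 7),
    "half": (5, 10),
    "single": (8, 23),
    "double": (11, 52),
}


def total_bits(exp_bits: int, man_bits: int) -> int:
    """One sign bit plus the exponent and mantissa fields."""
    return 1 + exp_bits + man_bits


for name, (exp_bits, man_bits) in FORMATS.items():
    print(name, total_bits(exp_bits, man_bits))
```

The totals come out as 16, 16, 32, and 64 bits, so bfloat16 and half occupy the same width but split it differently between exponent and mantissa.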

Calculations

next_up(self) -> APyFloat

Get the smallest floating-point number in the same format that compares greater.

Returns:
APyFloat

See also

next_down

next_down(self) -> APyFloat

Get the largest floating-point number in the same format that compares less.

Returns:
APyFloat

See also

next_up
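One way to picture next_up is on the raw bit pattern, where neighboring values of the same sign differ by one. The sketch below assumes IEEE 754 nextUp conventions (an all-ones exponent encodes infinity or NaN, and both zeros step to the smallest positive subnormal); the function next_up_bits is hypothetical, and APyFloat's handling of these special cases is not specified on this page.

```python
def next_up_bits(bits: int, exp_bits: int, man_bits: int) -> int:
    """Bit pattern of the next larger value, assuming IEEE 754 nextUp semantics."""
    max_exp = (1 << exp_bits) - 1
    exp = (bits >> man_bits) & max_exp
    man = bits & ((1 << man_bits) - 1)
    negative = bool(bits >> (exp_bits + man_bits))
    if exp == max_exp and (man != 0 or not negative):
        return bits  # NaN and +inf are returned unchanged
    if exp == 0 and man == 0:
        return 1  # +0 and -0 both step to the smallest positive subnormal
    # Positive values grow with the bit pattern, negative values shrink toward zero.
    return bits - 1 if negative else bits + 1


print(bin(next_up_bits(0b0_01111_00, 5, 2)))  # 0b111101 (1.0 -> 1.25)
print(bin(next_up_bits(0b1_01111_10, 5, 2)))  # 0b10111101 (-1.5 -> -1.25)
```

A next_down sketch would follow by symmetry: flip the sign, step up, and flip the sign back.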

Properties

Word length

property bits

Total number of bits.

Returns:
int

property exp_bits

Number of exponent bits.

Returns:
int

property man_bits

Number of mantissa bits.

Returns:
int

property bias

Exponent bias.

Returns:
int

Values

property sign

Sign bit.

Returns:
bool

See also

true_sign

property exp

Exponent bits with bias.

Returns:
int

See also

true_exp

property man

Mantissa bits.

These are the mantissa bits as stored, without a possible hidden one.

Returns:
int

See also

true_man

property true_sign

Sign value.

Returns:
int

See also

sign

property true_exp

Exponent value.

The bias is subtracted from the stored exponent, and the exponent is adjusted in case of a subnormal number.

Returns:
int

See also

exp

property true_man

Mantissa value.

These are the mantissa bits including a possible hidden one.

Returns:
int

See also

man
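The relation between the stored fields and the "true" fields can be sketched as below, assuming the usual IEEE 754 convention that a stored exponent of zero means no hidden one and an effective exponent of 1 - bias. The helpers true_fields and value_from_true are hypothetical, and the values APyFloat reports for infinities and NaN are not covered.

```python
def true_fields(exp: int, man: int, man_bits: int, bias: int) -> tuple[int, int]:
    """Return (true_exp, true_man) for a finite number (sketch, IEEE convention assumed)."""
    if exp == 0:
        # Zero or subnormal: no hidden one, exponent pinned to 1 - bias.
        return 1 - bias, man
    # Normal: remove the bias and prepend the hidden one to the mantissa.
    return exp - bias, man + (1 << man_bits)


def value_from_true(sign: int, true_exp: int, true_man: int, man_bits: int) -> float:
    """Decode a value from the integer mantissa and unbiased exponent."""
    return (-1) ** sign * true_man * 2.0 ** (true_exp - man_bits)


print(true_fields(exp=15, man=2, man_bits=2, bias=15))  # (0, 6)
print(value_from_true(1, 0, 6, man_bits=2))             # -1.5
print(true_fields(exp=0, man=1, man_bits=2, bias=15))   # (-14, 1)
```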

Bit pattern information

property is_finite

True if and only if value is zero, subnormal, or normal.

Returns:
bool

property is_inf

True if and only if value is infinite.

Returns:
bool

property is_nan

True if and only if value is NaN.

Returns:
bool

property is_normal

True if and only if value is normal (not zero, subnormal, infinite, or NaN).

Returns:
bool

property is_subnormal

True if and only if value is subnormal.

Returns:
bool

property is_zero

True if and only if value is zero.

Returns:
bool
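The five mutually exclusive categories behind these properties can be sketched from the stored fields alone. The function classify is hypothetical and assumes the IEEE 754 encoding rules (all-ones exponent for infinity and NaN, all-zeros exponent for zero and subnormals), which this page implies but does not spell out.

```python
def classify(exp: int, man: int, exp_bits: int) -> str:
    """Classify stored fields as zero, subnormal, normal, inf, or nan (sketch)."""
    max_exp = (1 << exp_bits) - 1
    if exp == max_exp:
        return "inf" if man == 0 else "nan"
    if exp == 0:
        return "zero" if man == 0 else "subnormal"
    return "normal"


# Stored (exp, man) pairs for a format with exp_bits=5.
for exp, man in [(0, 0), (0, 1), (15, 2), (31, 0), (31, 1)]:
    print(exp, man, classify(exp, man, exp_bits=5))
```

In these terms, is_finite corresponds to the first three categories (zero, subnormal, or normal).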