Quantization and overflow handling
Quantization modes
The following figure illustrates the effect of the different quantization modes
when quantizing a fixed-point number with three fractional bits to none.
Each dot corresponds to a value: red dots mark numbers that were already
integers, and yellow dots mark numbers that are exactly halfway between two
integers (ties). Below each plot is a histogram of the error distribution,
where the red line indicates the bias (average error) of the quantization. Note that
the bias converges towards zero as more bits are quantized away (except for
QuantizationMode.TRN, which converges towards a half).
Figure: Illustration of the different quantization modes.
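As a rough check of the bias claim, the sketch below applies three of the modes to raw fixed-point integers using only the Python standard library, not apytypes itself. The helper names (`trn`, `rnd`, `rnd_conv`, `bias`) and the error convention (quantized value minus original value) are assumptions made for illustration. With this convention the truncation bias is negative and approaches one half in magnitude.

```python
from fractions import Fraction


def trn(n: int, k: int) -> int:
    """Truncation: drop the k least significant bits (floors in Python)."""
    return n >> k


def rnd(n: int, k: int) -> int:
    """Round to nearest, ties toward positive infinity: add half, then drop bits."""
    return (n + (1 << (k - 1))) >> k


def rnd_conv(n: int, k: int) -> int:
    """Round to nearest, ties to even."""
    q = (n + (1 << (k - 1))) >> k
    # A tie occurs when the removed bits equal exactly one half.
    if n & ((1 << k) - 1) == 1 << (k - 1):
        q &= ~1  # clear the least significant bit so the result is even
    return q


def bias(quantizer, k: int) -> Fraction:
    """Average of (quantized - original) over all values in [-4, 4)."""
    scale = 1 << k
    errors = [quantizer(n, k) - Fraction(n, scale) for n in range(-4 * scale, 4 * scale)]
    return sum(errors, Fraction(0)) / len(errors)


for k in (3, 8):
    print(k, bias(trn, k), bias(rnd, k), bias(rnd_conv, k))
```

With three removed bits this gives a bias of -7/16 for truncation, 1/16 for ties toward positive infinity, and 0 for ties to even. With eight removed bits the truncation bias moves to -255/512, while the round-to-nearest bias shrinks to 1/512.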
- class apytypes.QuantizationMode(value)
- RND = 5
Round to nearest, ties toward positive infinity.
- RND_CONV = 9
Round to nearest, ties to even.
- RND_CONV_ODD = 10
Round to nearest, ties to odd.
- RND_INF = 7
Round to nearest, ties away from zero.
- RND_MIN_INF = 8
Round to nearest, ties toward negative infinity.
- RND_ZERO = 6
Round to nearest, ties toward zero.
- TRN = 0
Round toward negative infinity (truncation).
Implementation: remove additional bits.
- TRN_INF = 1
Round toward positive infinity.
- TRN_ZERO = 2
Round toward zero (unbiased magnitude truncation).
- TRN_AWAY = 3
Round away from zero.
- TRN_MAG = 4
Fixed-point magnitude truncation (add sign-bit).
- JAM = 11
Jamming/von Neumann rounding.
- JAM_UNBIASED = 12
Unbiased jamming/von Neumann rounding.
Aliases
- TIES_ODD = 10
Alternate unbiased fixed-point rounding. Round to nearest, ties to odd. Alias for RND_CONV_ODD.
- TIES_NEG = 8
Round to nearest, ties toward negative infinity. Alias for RND_MIN_INF.
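To make the differences between the directed and round-to-nearest modes concrete, here is a minimal standard-library sketch that rounds an exact rational to an integer. It is not the apytypes implementation: `round_to_integer` is a hypothetical helper, the modes are passed as strings that merely mirror the member names above, and `TRN_MAG`, `JAM` and `JAM_UNBIASED` are left out.

```python
from fractions import Fraction
import math

DIRECTED = {"TRN", "TRN_INF", "TRN_ZERO", "TRN_AWAY"}
NEAREST = {"RND", "RND_MIN_INF", "RND_ZERO", "RND_INF", "RND_CONV", "RND_CONV_ODD"}


def round_to_integer(x: Fraction, mode: str) -> int:
    """Round x to an integer following the textual description of each mode."""
    if mode not in DIRECTED | NEAREST:
        raise ValueError(f"unsupported mode: {mode}")
    lower, upper = math.floor(x), math.ceil(x)
    if lower == upper:  # already an integer, nothing to decide
        return lower
    toward_zero = lower if x > 0 else upper
    away_from_zero = upper if x > 0 else lower
    if mode in DIRECTED:
        return {
            "TRN": lower,
            "TRN_INF": upper,
            "TRN_ZERO": toward_zero,
            "TRN_AWAY": away_from_zero,
        }[mode]
    # Round-to-nearest modes only differ when x is exactly halfway.
    if x - lower != upper - x:
        return lower if x - lower < upper - x else upper
    even = lower if lower % 2 == 0 else upper
    odd = upper if lower % 2 == 0 else lower
    return {
        "RND": upper,
        "RND_MIN_INF": lower,
        "RND_ZERO": toward_zero,
        "RND_INF": away_from_zero,
        "RND_CONV": even,
        "RND_CONV_ODD": odd,
    }[mode]


for mode in sorted(NEAREST):
    print(mode, round_to_integer(Fraction(5, 2), mode), round_to_integer(Fraction(-5, 2), mode))
```

For the tie 5/2, `RND`, `RND_INF` and `RND_CONV_ODD` yield 3, while `RND_MIN_INF`, `RND_ZERO` and `RND_CONV` yield 2. For -5/2, the modes that yield -2 are `RND`, `RND_ZERO` and `RND_CONV`.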
Utility functions
- apytypes.get_float_quantization_mode() -> QuantizationMode
Get the quantization mode currently used for floating-point operations.
- Returns:
QuantizationMode
See also
apytypes.set_float_quantization_mode
- apytypes.set_float_quantization_mode(mode: QuantizationMode) -> None
Set the quantization mode used for floating-point operations.
- Parameters:
- mode
QuantizationMode
The quantization mode to use.
See also
apytypes.get_float_quantization_mode
- apytypes.get_float_quantization_seed() -> int
Get current quantization seed.
The quantization seed is used for stochastic quantization.
- Returns:
int
See also
apytypes.set_float_quantization_seed
- apytypes.set_float_quantization_seed(seed: int) -> None
Set current quantization seed.
The quantization seed is used for stochastic quantization.
- Parameters:
- seed
int
The quantization seed to use.
See also
apytypes.get_float_quantization_seed
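Because these getters and setters appear to act on a shared, current setting, it can be useful to restore the previous value after a temporary change. The sketch below shows one way to do that with a generic context manager. The name `temporarily` and the stand-in `get_mode`/`set_mode` functions are hypothetical and exist only so the block runs without apytypes. With the library installed, one would presumably pass `apytypes.get_float_quantization_mode` and `apytypes.set_float_quantization_mode` (or the seed pair) instead.

```python
from contextlib import contextmanager


@contextmanager
def temporarily(getter, setter, value):
    """Set a value through `setter`, then restore whatever `getter` returned before."""
    previous = getter()
    setter(value)
    try:
        yield
    finally:
        setter(previous)  # restored even if the body raises


# Stand-in state, used here in place of the real library setting.
_state = {"mode": "RND_CONV"}


def get_mode():
    return _state["mode"]


def set_mode(mode):
    _state["mode"] = mode


with temporarily(get_mode, set_mode, "TRN"):
    print(get_mode())  # the temporary value is active here
print(get_mode())  # the previous value is back
```

The `try`/`finally` ensures the earlier setting is restored even when the code inside the `with` block raises an exception.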
Sign of zero for floating-point
For multiplication and division, the sign of a zero result is always the XOR of the operands’ signs. For addition and subtraction, the sign depends on the quantization mode: TO_NEG (rounding toward negative infinity) behaves differently from all other modes. The table below shows the sign of zero in each case. Since \(x - y = x + (-y)\), the sign for subtraction can be derived from the same table.
| \(x + y\) | TO_NEG | Other modes |
|---|---|---|
| \((+0) + (+0)\) | \(+0\) | \(+0\) |
| \((+0) + (-0)\) | \(-0\) | \(+0\) |
| \((-0) + (+0)\) | \(-0\) | \(+0\) |
| \((-0) + (-0)\) | \(-0\) | \(-0\) |
| \(x + y, x = -y\) | \(-0\) | \(+0\) |
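The table can be encoded in a few lines of standard-library Python, which also shows how the subtraction case follows from it. This is only an illustrative sketch: `zero_sum` and `zero_difference` are hypothetical helpers, and the boolean `to_neg` stands in for "the quantization mode is TO_NEG". Nothing here calls apytypes.

```python
import math


def zero_sum(x: float, y: float, to_neg: bool) -> float:
    """Return the signed zero for x + y according to the table (exact sum must be zero)."""
    if x + y != 0:
        raise ValueError("the table only covers sums that are exactly zero")
    x_negative = math.copysign(1.0, x) < 0
    y_negative = math.copysign(1.0, y) < 0
    if x == 0 and y == 0 and x_negative == y_negative:
        return x  # two zeros of the same sign keep that sign in every mode
    # Zeros of opposite sign, or exact cancellation of nonzero values.
    return -0.0 if to_neg else 0.0


def zero_difference(x: float, y: float, to_neg: bool) -> float:
    """Subtraction handled as x + (-y), so the same table applies."""
    return zero_sum(x, -y, to_neg)


def sign(v: float) -> str:
    return "-0" if math.copysign(1.0, v) < 0 else "+0"


for x, y in [(0.0, 0.0), (0.0, -0.0), (-0.0, 0.0), (-0.0, -0.0), (1.5, -1.5)]:
    print(x, y, sign(zero_sum(x, y, True)), sign(zero_sum(x, y, False)))
```

For example, \((+0) - (+0)\) is treated as \((+0) + (-0)\), which gives \(-0\) under TO_NEG and \(+0\) otherwise.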