POWER Vector Library Manual
1.0.4
|
Header package containing a collection of 128-bit SIMD operations over 64-bit double-precision floating point elements.
Functions
Return type | Function | Description |
---|---|---|
static vf64_t | vec_absf64 (vf64_t vf64x) | Vector double absolute value. |
static int | vec_all_isfinitef64 (vf64_t vf64) | Return true if all 2x64-bit vector double values are Finite (Not NaN nor Inf). |
static int | vec_all_isinff64 (vf64_t vf64) | Return true if all 2x64-bit vector double values are infinity. |
static int | vec_all_isnanf64 (vf64_t vf64) | Return true if all 2x64-bit vector double values are NaN. |
static int | vec_all_isnormalf64 (vf64_t vf64) | Return true if all 2x64-bit vector double values are normal (Not NaN, Inf, denormal, or zero). |
static int | vec_all_issubnormalf64 (vf64_t vf64) | Return true if all 2x64-bit vector double values are subnormal (denormal). |
static int | vec_all_iszerof64 (vf64_t vf64) | Return true if all 2x64-bit vector double values are +-0.0. |
static int | vec_any_isfinitef64 (vf64_t vf64) | Return true if any of 2x64-bit vector double values are Finite (Not NaN nor Inf). |
static int | vec_any_isinff64 (vf64_t vf64) | Return true if any of 2x64-bit vector double values are infinity. |
static int | vec_any_isnanf64 (vf64_t vf64) | Return true if any of 2x64-bit vector double values are NaN. |
static int | vec_any_isnormalf64 (vf64_t vf64) | Return true if any of 2x64-bit vector double values are normal (Not NaN, Inf, denormal, or zero). |
static int | vec_any_issubnormalf64 (vf64_t vf64) | Return true if any of 2x64-bit vector double values is subnormal (denormal). |
static int | vec_any_iszerof64 (vf64_t vf64) | Return true if any of 2x64-bit vector double values are +-0.0. |
static vf64_t | vec_copysignf64 (vf64_t vf64x, vf64_t vf64y) | Copy the sign bit from vf64x merged with magnitude from vf64y and return the resulting vector double values. |
static vb64_t | vec_isfinitef64 (vf64_t vf64) | Return 2x64-bit vector boolean true values for each double element that is Finite (Not NaN nor Inf). |
static vb64_t | vec_isinff64 (vf64_t vf64) | Return 2x64-bit vector boolean true values for each double, if infinity. |
static vb64_t | vec_isnanf64 (vf64_t vf64) | Return 2x64-bit vector boolean true values, for each double NaN value. |
static vb64_t | vec_isnormalf64 (vf64_t vf64) | Return 2x64-bit vector boolean true values, for each double value, if normal (Not NaN, Inf, denormal, or zero). |
static vb64_t | vec_issubnormalf64 (vf64_t vf64) | Return 2x64-bit vector boolean true values, for each double value that is subnormal (denormal). |
static vb64_t | vec_iszerof64 (vf64_t vf64) | Return 2x64-bit vector boolean true values, for each double value that is +-0.0. |
static long double | vec_pack_longdouble (vf64_t lval) | Copy the pair of doubles from a vector to an IBM long double. |
static vb64_t | vec_setb_dp (vf64_t vra) | Vector Set Bool from Sign, Double Precision. |
static vf64_t | vec_unpack_longdouble (long double lval) | Copy the pair of doubles from an IBM long double to a vector double. |
static vf64_t | vec_vglfdso (double *array, const long long offset0, const long long offset1) | Vector Gather-Load Float Double from scalar Offsets. |
static vf64_t | vec_vglfddo (double *array, vi64_t vra) | Vector Gather-Load Float Double from Doubleword Offsets. |
static vf64_t | vec_vglfddsx (double *array, vi64_t vra, const unsigned char scale) | Vector Gather-Load Float Double from Doubleword Scaled Indexes. |
static vf64_t | vec_vglfddx (double *array, vi64_t vra) | Vector Gather-Load Float Double from Doubleword indexes. |
static void | vec_vsstfdso (vf64_t xs, double *array, const long long offset0, const long long offset1) | Vector Scatter-Store Float Double to Scalar Offsets. |
static void | vec_vsstfddo (vf64_t xs, double *array, vi64_t vra) | Vector Scatter-Store Float Double to Doubleword Offsets. |
static void | vec_vsstfddsx (vf64_t xs, double *array, vi64_t vra, const unsigned char scale) | Vector Scatter-Store Float Double to Doubleword Scaled Index. |
static void | vec_vsstfddx (vf64_t xs, double *array, vi64_t vra) | Vector Scatter-Store Float Double to Doubleword Indexes. |
static vf64_t | vec_vlxsfdx (const signed long long ra, const double *rb) | Vector Load Scalar Float Double Indexed. |
static void | vec_vstxsfdx (vf64_t xs, const signed long long ra, double *rb) | Vector Store Scalar Float Double Indexed. |
static vf64_t | vec_xviexpdp (vui64_t sig, vui64_t exp) | Vector Insert Exponent Double-Precision. |
static vui64_t | vec_xvxexpdp (vf64_t vrb) | Vector Extract Exponent Double-Precision. |
static vui64_t | vec_xvxsigdp (vf64_t vrb) | Vector Extract Significand Double-Precision. |
Many vector double-precision (64-bit float) operations are implemented with PowerISA-2.06 Vector-Scalar Extension (VSX) (POWER7 and later) instructions. Most VSX instructions provide access to 64 combined scalar/vector registers. PowerISA-3.0 (POWER9) provides additional vector double operations: convert with round, convert to/from integer, insert/extract exponent and significand, and test data class. Most of these operations (compiler built-ins, or intrinsics) are defined in <altivec.h> and described in the compiler documentation.
It is therefore reasonable for this header to provide vector forms of the double-precision floating point classification functions (isnormal/subnormal/finite/inf/nan/zero, etc.). These functions can be implemented directly using one or more POWER9 instructions, or a few vector logical and integer compare instructions for POWER7/8. Each is small enough to be inlined and is faster than applying the equivalent POSIX or compiler built-in runtime scalar function to each element.
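To illustrate the logical/integer-compare approach, the following portable C sketch classifies a single Binary64 value from its bit pattern. It is a scalar reference model, not pveclib code: the `ref_*` functions and mask macros are hypothetical names, and each function models one lane of the corresponding vector classification operation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference model: one lane of each classification test.
   Masks for the IEEE Binary64 sign, 11-bit exponent, and 52-bit significand. */
#define F64_SIGN_MASK 0x8000000000000000ULL
#define F64_EXP_MASK  0x7ff0000000000000ULL
#define F64_SIG_MASK  0x000fffffffffffffULL

/* Reinterpret the double as its 64-bit pattern (memcpy avoids aliasing UB). */
static uint64_t f64_bits (double d)
{
  uint64_t u;
  memcpy (&u, &d, sizeof u);
  return u;
}

/* Finite: exponent is not all ones (0x000..0x7fe). */
static int ref_isfinite (double d)
{
  return (f64_bits (d) & F64_EXP_MASK) != F64_EXP_MASK;
}

/* Infinity: with the sign cleared, exponent 0x7ff and significand 0. */
static int ref_isinf (double d)
{
  return (f64_bits (d) & ~F64_SIGN_MASK) == F64_EXP_MASK;
}

/* NaN: with the sign cleared, exponent 0x7ff and a nonzero significand,
   i.e. any magnitude pattern greater than the infinity pattern. */
static int ref_isnan (double d)
{
  return (f64_bits (d) & ~F64_SIGN_MASK) > F64_EXP_MASK;
}

/* Normal: exponent in 0x001..0x7fe. */
static int ref_isnormal (double d)
{
  uint64_t e = f64_bits (d) & F64_EXP_MASK;
  return e != 0 && e != F64_EXP_MASK;
}

/* Subnormal: exponent 0x000 and a nonzero significand. */
static int ref_issubnormal (double d)
{
  uint64_t u = f64_bits (d);
  return (u & F64_EXP_MASK) == 0 && (u & F64_SIG_MASK) != 0;
}

/* Zero: every bit other than the sign is zero. */
static int ref_iszero (double d)
{
  return (f64_bits (d) & ~F64_SIGN_MASK) == 0;
}
```

The unsigned greater-than test in `ref_isnan` shows how, once the sign bit is masked off, a single integer compare separates NaN from infinity.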
Most of these operations are implemented in a few instructions on newer (POWER9) processors. This header fills in functional gaps for older (POWER7, POWER8) processors and provides an inline assembler implementation for older compilers that do not provide the built-ins.
This header covers operations that are any of the following:
One example is using the classification functions to implement the math library functions sine and cosine. The POSIX specification requires that special input values are processed without raising extraneous floating point exceptions and that specific floating point values are returned in response. Consider first the sin() function.
The following code example uses functions from this header to address the POSIX requirements for special input values to a vectorized sinf():
The code generated for this fragment is between 24 (-mcpu=power9) and 40 (-mcpu=power8) instructions. The normal execution path is 14 and 25 instructions, respectively.
Another example is the cos() function.
The following code example uses functions from this header to address the POSIX requirements for special input values to a vectorized cosf():
Neither example raises floating point exceptions or sets errno, as appropriate for a vector math library.
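The following scalar C sketch is not the vectorized fragment described above; it only illustrates the same kind of special-value screening for one lane. Per POSIX, sin(NaN) and cos(NaN) return NaN, sin(+-0) returns +-0, cos(+-0) returns 1.0, sin() should return a subnormal x unchanged, and +-Inf is a domain error for which a NaN is returned. The sketch uses only integer bit tests, so it raises no floating-point exception and never touches errno. `sin_special`, `cos_special`, and the macros are hypothetical names; a vector version could apply the same tests per lane and merge the results with a select instead of branching.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-lane special-value screens for sin() and cos().
   Each returns 1 and stores the POSIX result in *out when x is special,
   or returns 0 when x needs the full evaluation. */
#define ABS_MASK  0x7fffffffffffffffULL
#define INF_BITS  0x7ff0000000000000ULL /* also the exponent mask */
#define QNAN_BITS 0x7ff8000000000000ULL

static double from_bits (uint64_t u)
{
  double d;
  memcpy (&d, &u, sizeof d);
  return d;
}

static int sin_special (double x, double *out)
{
  uint64_t a;
  memcpy (&a, &x, sizeof a);
  a &= ABS_MASK;                    /* ignore the sign */
  if (a > INF_BITS)                 /* NaN: return it */
    *out = x;
  else if (a == INF_BITS)           /* +-Inf: domain error, return NaN */
    *out = from_bits (QNAN_BITS);
  else if ((a & INF_BITS) == 0)     /* +-0 or subnormal: return x */
    *out = x;
  else
    return 0;                       /* normal finite: evaluate sin */
  return 1;
}

static int cos_special (double x, double *out)
{
  uint64_t a;
  memcpy (&a, &x, sizeof a);
  a &= ABS_MASK;
  if (a > INF_BITS)                 /* NaN: return it */
    *out = x;
  else if (a == INF_BITS)           /* +-Inf: domain error, return NaN */
    *out = from_bits (QNAN_BITS);
  else if (a == 0)                  /* +-0: return 1.0 */
    *out = 1.0;
  else
    return 0;                       /* evaluate cos (subnormals included) */
  return 1;
}
```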
High-level performance estimates are provided as an aid to function selection when evaluating algorithms. For background on how latency and throughput are derived, see Performance data.
Vector double absolute value.
processor | Latency | Throughput |
---|---|---|
power8 | 6-7 | 2/cycle |
power9 | 2 | 2/cycle |
vf64x | vector double values containing the magnitudes. |
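Per lane, the absolute value of a Binary64 value is the same bit pattern with the sign bit cleared; the exponent and significand are untouched. A minimal portable sketch of that behavior, with a two-element array standing in for vf64_t. The names `ref_absf64` and `ref_vec_absf64` are hypothetical, and this is a model rather than the header's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one lane: clear the sign bit
   (PowerISA bit 0, the most significant bit). */
static double ref_absf64 (double x)
{
  uint64_t u;
  memcpy (&u, &x, sizeof u);
  u &= 0x7fffffffffffffffULL;
  memcpy (&x, &u, sizeof x);
  return x;
}

/* Apply the lane operation to both elements of a vf64_t stand-in. */
static void ref_vec_absf64 (double r[2], const double x[2])
{
  for (int i = 0; i < 2; i++)
    r[i] = ref_absf64 (x[i]);
}
```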
Return true if all 2x64-bit vector double values are Finite (Not NaN nor Inf).
An IEEE Binary64 finite value has an exponent between 0x000 and 0x7fe (an exponent of 0x7ff indicates NaN or Inf). The significand can be any value. The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 4-20 | 2/cycle |
power9 | 6 | 1/cycle |
vf64 | a vector of __binary64 values. |
Return true if all 2x64-bit vector double values are infinity.
An IEEE Binary64 infinity has an exponent of 0x7ff and a significand of all zeros. The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 6-20 | 2/cycle |
power9 | 6 | 1/cycle |
vf64 | a vector of __binary64 values. |
Return true if all 2x64-bit vector double values are NaN.
An IEEE Binary64 NaN value has an exponent of 0x7ff and a nonzero significand. The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 6-20 | 2/cycle |
power9 | 6 | 1/cycle |
vf64 | a vector of __binary64 values. |
Return true if all 2x64-bit vector double values are normal (Not NaN, Inf, denormal, or zero).
An IEEE Binary64 normal value has an exponent between 0x001 and 0x7fe (an exponent of 0x7ff indicates NaN or Inf). The significand can be any value (an exponent of 0x000 indicates zero or a subnormal, which are excluded). The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 10-28 | 1/cycle |
power9 | 6 | 1/cycle |
vf64 | a vector of __binary64 values. |
Return true if all 2x64-bit vector double values are subnormal (denormal).
An IEEE Binary64 subnormal has an exponent of 0x000 and a nonzero significand. The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 10-30 | 1/cycle |
power9 | 6 | 1/cycle |
vf64 | a vector of __binary64 values. |
Return true if all 2x64-bit vector double values are +-0.0.
An IEEE Binary64 zero has an exponent of 0x000 and a zero significand. The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 6-20 | 2/cycle |
power9 | 6 | 1/cycle |
vf64 | a vector of __binary64 values. |
Return true if any of 2x64-bit vector double values are Finite (Not NaN nor Inf).
An IEEE Binary64 finite value has an exponent between 0x000 and 0x7fe (an exponent of 0x7ff indicates NaN or Inf). The significand can be any value. The sign bit is ignored.
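The all and any forms differ only in how the two per-lane results are combined: all requires both lanes to pass (logical AND), any requires at least one (logical OR). A hypothetical scalar sketch for the finite test, with a two-element array standing in for vf64_t; the helper names are illustrative and not pveclib API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-lane finite test: exponent field is not 0x7ff. */
static int lane_isfinite (double d)
{
  uint64_t u;
  memcpy (&u, &d, sizeof u);
  return (u & 0x7ff0000000000000ULL) != 0x7ff0000000000000ULL;
}

/* "all" form: every lane must satisfy the predicate. */
static int ref_all_isfinite (const double v[2])
{
  return lane_isfinite (v[0]) && lane_isfinite (v[1]);
}

/* "any" form: at least one lane must satisfy the predicate. */
static int ref_any_isfinite (const double v[2])
{
  return lane_isfinite (v[0]) || lane_isfinite (v[1]);
}
```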
processor | Latency | Throughput |
---|---|---|
power8 | 4-20 | 2/cycle |
power9 | 6 | 1/cycle |
vf64 | a vector of __binary64 values. |
Return true if any of 2x64-bit vector double values are infinity.
An IEEE Binary64 infinity has an exponent of 0x7ff and a significand of all zeros. The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 6-20 | 2/cycle |
power9 | 6 | 1/cycle |
vf64 | a vector of __binary64 values. |
Return true if any of 2x64-bit vector double values are NaN.
An IEEE Binary64 NaN value has an exponent of 0x7ff and a nonzero significand. The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 6-20 | 2/cycle |
power9 | 6 | 1/cycle |
vf64 | a vector of __binary64 values. |
Return true if any of 2x64-bit vector double values are normal (Not NaN, Inf, denormal, or zero).
An IEEE Binary64 normal value has an exponent between 0x001 and 0x7fe (an exponent of 0x7ff indicates NaN or Inf). The significand can be any value (an exponent of 0x000 indicates zero or a subnormal, which are excluded). The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 6-20 | 1/cycle |
power9 | 6 | 1/cycle |
vf64 | a vector of __binary64 values. |
Return true if any of 2x64-bit vector double values is subnormal (denormal).
An IEEE Binary64 subnormal has an exponent of 0x000 and a nonzero significand. The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 10-18 | 1/cycle |
power9 | 6 | 1/cycle |
vf64 | a vector of __binary64 values. |
Return true if any of 2x64-bit vector double values are +-0.0.
An IEEE Binary64 zero has an exponent of 0x000 and a zero significand. The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 6-20 | 2/cycle |
power9 | 6 | 1/cycle |
vf64 | a vector of __binary64 values. |
Copy the sign bit from vf64x merged with magnitude from vf64y and return the resulting vector double values.
processor | Latency | Throughput |
---|---|---|
power8 | 6-7 | 2/cycle |
power9 | 2 | 2/cycle |
vf64x | vector double values containing the sign bits. |
vf64y | vector double values containing the magnitudes. |
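Note that the argument roles are the reverse of C99 copysign(x, y), which returns the magnitude of x with the sign of y. A minimal per-lane sketch of the behavior documented here (sign from vf64x, magnitude from vf64y); `ref_copysignf64` is a hypothetical name, and the code is a portable model, not the header's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one lane: sign bit from x, exponent and
   significand (the magnitude) from y. */
static double ref_copysignf64 (double x, double y)
{
  uint64_t ux, uy, r;
  double d;
  memcpy (&ux, &x, sizeof ux);
  memcpy (&uy, &y, sizeof uy);
  r = (ux & 0x8000000000000000ULL) | (uy & 0x7fffffffffffffffULL);
  memcpy (&d, &r, sizeof d);
  return d;
}
```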
Return 2x64-bit vector boolean true values for each double element that is Finite (Not NaN nor Inf).
An IEEE Binary64 finite value has an exponent between 0x000 and 0x7fe (an exponent of 0x7ff indicates NaN or Inf). The significand can be any value.
The implementation uses vec_cmpeq to generate the predicate mask for NaN/Inf and then inverts this mask for the finite condition. The sign bit is ignored.
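The mask-then-invert approach can be modeled per lane in portable C. In this sketch, assumed for illustration only, each uint64_t plays the role of one vb64_t element (all ones for true, all zeros for false), and an equality test on the masked exponent stands in for vec_cmpeq. `ref_vec_isfinitef64` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: per lane, compare the masked exponent for equality
   with 0x7ff (NaN or Inf) to form an all-ones/all-zeros mask, then
   invert that mask to obtain the finite predicate. */
static void ref_vec_isfinitef64 (uint64_t mask[2], const double v[2])
{
  const uint64_t exp_mask = 0x7ff0000000000000ULL;
  for (int i = 0; i < 2; i++)
    {
      uint64_t u;
      memcpy (&u, &v[i], sizeof u);
      uint64_t naninf = ((u & exp_mask) == exp_mask) ? ~0ULL : 0ULL; /* compare equal */
      mask[i] = ~naninf;                                            /* invert */
    }
}
```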
processor | Latency | Throughput |
---|---|---|
power8 | 6-15 | 2/cycle |
power9 | 5 | 2/cycle |
vf64 | a vector of __binary64 values. |
Return 2x64-bit vector boolean true values for each double, if infinity.
An IEEE Binary64 infinity has an exponent of 0x7ff and a significand of all zeros. The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 4-13 | 2/cycle |
power9 | 3 | 2/cycle |
vf64 | a vector of __binary64 values. |
Return 2x64-bit vector boolean true values, for each double NaN value.
An IEEE Binary64 NaN value has an exponent of 0x7ff and a nonzero significand. The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 4-13 | 2/cycle |
power9 | 3 | 2/cycle |
vf64 | a vector of __binary64 values. |
Return 2x64-bit vector boolean true values, for each double value, if normal (Not NaN, Inf, denormal, or zero).
An IEEE Binary64 normal value has an exponent between 0x001 and 0x7fe (an exponent of 0x7ff indicates NaN or Inf). The significand can be any value (an exponent of 0x000 indicates zero or a subnormal, which are excluded). The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 6-15 | 1/cycle |
power9 | 5 | 1/cycle |
vf64 | a vector of __binary64 values. |
Return 2x64-bit vector boolean true values, for each double value that is subnormal (denormal).
An IEEE Binary64 subnormal has an exponent of 0x000 and a nonzero significand. The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 6-16 | 1/cycle |
power9 | 3 | 1/cycle |
vf64 | a vector of __binary64 values. |
Return 2x64-bit vector boolean true values, for each double value that is +-0.0.
An IEEE Binary64 zero has an exponent of 0x000 and a zero significand. The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 4-13 | 2/cycle |
power9 | 3 | 2/cycle |
vf64 | a vector of __binary64 values. |
Copy the pair of doubles from a vector to an IBM long double.
lval | vector double values containing the IBM long double. |
Vector Set Bool from Sign, Double Precision.
For each double, propagate the sign bit to all 64 bits of that doubleword. The result is a vector bool long long reflecting the sign bit of each 64-bit double.
The resulting mask can be used in vector masking and select operations.
processor | Latency | Throughput |
---|---|---|
power8 | 2-4 | 2/cycle |
power9 | 2-5 | 2/cycle |
vra | Vector double. |
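One lane of the sign propagation can be modeled with unsigned arithmetic: negating the extracted sign bit yields all ones or all zeros (a signed right shift would be implementation-defined in C). The sketch also shows a bitwise select driven by the mask, in the style of vec_sel(a, b, mask), which takes bits from b where the mask is set. Both function names are hypothetical and the code is a portable model only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one lane: replicate the sign bit across 64 bits. */
static uint64_t ref_setb_dp (double d)
{
  uint64_t u;
  memcpy (&u, &d, sizeof u);
  return 0ULL - (u >> 63);   /* sign 1 -> all ones, sign 0 -> zero */
}

/* Hypothetical use of the mask: choose b where a is negative, else a,
   using the bitwise select pattern (a & ~m) | (b & m). */
static double select_on_sign (double a, double b)
{
  uint64_t m = ref_setb_dp (a), ua, ub, r;
  double d;
  memcpy (&ua, &a, sizeof ua);
  memcpy (&ub, &b, sizeof ub);
  r = (ua & ~m) | (ub & m);
  memcpy (&d, &r, sizeof d);
  return d;
}
```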
Copy the pair of doubles from an IBM long double to a vector double.
lval | IBM long double as FPR pair. |
Vector Gather-Load Float Double from Doubleword Offsets.
For each doubleword element [i] of vra, load the float double element at address (char*)array + vra[i]. Merge those float double elements and return the resulting vector.
processor | Latency | Throughput |
---|---|---|
power8 | 12 | 1/cycle |
power9 | 11 | 1/cycle |
array | Pointer to array of doubles. |
vra | Vector of doubleword (64-bit) byte offsets from array. |
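The distinction between byte offsets and element indexes is easy to get wrong. This hypothetical scalar model of vec_vglfddo treats each vra element as a byte offset from array, with int64_t standing in for the vi64_t lanes and a two-element array for vf64_t. `ref_vglfddo` is an illustrative name, not part of the header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: element i is read from (const char *) array + vra[i].
   memcpy keeps the model well defined for any byte offset. */
static void ref_vglfddo (double r[2], const double *array, const int64_t vra[2])
{
  for (int i = 0; i < 2; i++)
    memcpy (&r[i], (const char *) array + vra[i], sizeof (double));
}
```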
Vector Gather-Load Float Double from Doubleword Scaled Indexes.
For each doubleword element [i] of vra, load the float double element array[vra[i] * (1 << scale)]. Merge those float double elements and return the resulting vector. Indexes are converted to byte offsets from array by shifting each doubleword left (3+scale) bits.
processor | Latency | Throughput |
---|---|---|
power8 | 14-23 | 1/cycle |
power9 | 13-22 | 1/cycle |
array | Pointer to array of doubles. |
vra | Vector of doubleword indexes. |
scale | 8-bit integer. Indexes are multiplied by 2^scale. |
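A hypothetical scalar model of vec_vglfddsx, with int64_t standing in for the vi64_t lanes and a two-element array for vf64_t. Each index is scaled by 2^scale elements, which is the same as shifting it left (3+scale) bits to form a byte offset; scale == 0 gives the plain-index behavior of vec_vglfddx. `ref_vglfddsx` and its range guard are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: element i is loaded from array[vra[i] * (1 << scale)],
   i.e. byte offset vra[i] << (3 + scale). Multiplication is used instead of
   a left shift so that negative indexes stay well defined in C. */
static void ref_vglfddsx (double r[2], const double *array,
                          const int64_t vra[2], unsigned char scale)
{
  assert (scale < 60);  /* model-only guard against shift overflow */
  for (int i = 0; i < 2; i++)
    {
      int64_t byte_off = vra[i] * ((int64_t) 1 << (3 + scale));
      memcpy (&r[i], (const char *) array + byte_off, sizeof (double));
    }
}
```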
Vector Gather-Load Float Double from Doubleword indexes.
For each doubleword element [i] of vra, load the double element array[vra[i]]. Merge those float double elements and return the resulting vector. The indexes are converted to byte offsets from array by shifting each doubleword index left 3 bits (multiplying by 8).
processor | Latency | Throughput |
---|---|---|
power8 | 14-23 | 1/cycle |
power9 | 13-22 | 1/cycle |
array | Pointer to array of doubles. |
vra | Vector of doubleword indexes. |
Vector Gather-Load Float Double from scalar Offsets.
For each scalar offset (offset0 for element 0, offset1 for element 1), load the float double element at address (char*)array + offset. Merge those float double elements and return the resulting vector.
processor | Latency | Throughput |
---|---|---|
power8 | 12 | 1/cycle |
power9 | 11 | 1/cycle |
array | Pointer to array of doubles. |
offset0 | Scalar (64-bit) byte offset from array. |
offset1 | Scalar (64-bit) byte offset from array. |
Vector Load Scalar Float Double Indexed.
Load a scalar double into the leftmost doubleword of vector xt from the effective address formed by rb+ra. The operand rb is a pointer to an array of doubles. The operand ra is a doubleword integer byte offset from rb. The result xt is returned as a vf64_t vector. For best performance, rb and ra should be doubleword aligned (integer multiples of 8).
This operation is an alternate form of Vector Load Element (vec_lde), with the added simplification that data is always left justified in the vector. This simplifies merging elements for gather operations.
processor | Latency | Throughput |
---|---|---|
power8 | 5 | 2/cycle |
power9 | 5 | 2/cycle |
ra | const doubleword index (offset/displacement). |
rb | const doubleword pointer to an array of doubles. |
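Left justification is what makes gather composition simple: each scalar load leaves its value in element 0, so two loads plus a doubleword merge give a two-element gather. The sketch below models that in portable C. `ref_vlxsfdx` and `ref_gather2` are hypothetical names, and zeroing element 1 is an assumption of the model, since its contents after the load are not specified here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: load the double at byte address (const char *) rb + ra
   into element 0. Element 1 is zeroed only to keep the model deterministic. */
static void ref_vlxsfdx (double xt[2], int64_t ra, const double *rb)
{
  memcpy (&xt[0], (const char *) rb + ra, sizeof (double));
  xt[1] = 0.0;
}

/* Because each load is left justified, a two-element gather reduces to
   two loads plus a merge of the two element-0 values. */
static void ref_gather2 (double r[2], const double *array,
                         int64_t off0, int64_t off1)
{
  double t0[2], t1[2];
  ref_vlxsfdx (t0, off0, array);
  ref_vlxsfdx (t1, off1, array);
  r[0] = t0[0];
  r[1] = t1[0];
}
```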
Vector Scatter-Store Float Double to Doubleword Offsets.
For each doubleword element [i] of vra, store the double element xs[i] at address (char*)array + vra[i].
processor | Latency | Throughput |
---|---|---|
power8 | 12 | 1/cycle |
power9 | 8 | 1/cycle |
xs | Vector double elements to scatter store. |
array | Pointer to array of doubles. |
vra | Vector of doubleword (64-bit) byte offsets from array. |
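A hypothetical scalar model of vec_vsstfddo, using the same byte-offset convention as the gather loads: int64_t stands in for the vi64_t lanes and a two-element array for vf64_t. The result when both offsets name the same address is not specified here; this model simply stores element 0 first and element 1 second. `ref_vsstfddo` is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: store xs[i] at byte address (char *) array + vra[i]. */
static void ref_vsstfddo (const double xs[2], double *array, const int64_t vra[2])
{
  for (int i = 0; i < 2; i++)
    memcpy ((char *) array + vra[i], &xs[i], sizeof (double));
}
```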
Vector Scatter-Store Float Double to Doubleword Scaled Index.
For each doubleword element [i] of vra, store the double element xs[i] at array[vra[i] * (1 << scale)]. Indexes are converted to byte offsets from array by shifting each doubleword of vra left (3+scale) bits.
processor | Latency | Throughput |
---|---|---|
power8 | 14-23 | 1/cycle |
power9 | 10-19 | 1/cycle |
xs | Vector double elements to store. |
array | Pointer to array of doubles. |
vra | Vector of doubleword indexes. |
scale | Factor effectively multiplying the indexes by 2^scale. |
Vector Scatter-Store Float Double to Doubleword Indexes.
For each doubleword element [i] of vra, store the double element xs[i] at array[vra[i]]. Indexes are converted to byte offsets from array by shifting each doubleword of vra left 3 bits.
processor | Latency | Throughput |
---|---|---|
power8 | 14-23 | 1/cycle |
power9 | 10-19 | 1/cycle |
xs | Vector double elements to store. |
array | Pointer to array of doubles. |
vra | Vector of doubleword indexes. |
Vector Scatter-Store Float Double to Scalar Offsets.
Store the double element xs[0] at address (char*)array + offset0 and the double element xs[1] at address (char*)array + offset1.
processor | Latency | Throughput |
---|---|---|
power8 | 12 | 1/cycle |
power9 | 8 | 1/cycle |
xs | Vector double elements to scatter store. |
array | Pointer to array of doubles. |
offset0 | Scalar (64-bit) byte offset from array. |
offset1 | Scalar (64-bit) byte offset from array. |
Vector Store Scalar Float Double Indexed.
Stores the leftmost doubleword of vector xs as a scalar double at the effective address formed by rb+ra. The operand rb is a pointer to an array of doubles. The operand ra is a doubleword integer byte offset from rb. For best performance, rb and ra should be doubleword aligned (integer multiples of 8).
This operation is an alternate form of vector store element, with the added simplification that data is always left justified in the vector. This simplifies scatter operations.
processor | Latency | Throughput |
---|---|---|
power8 | 0 - 2 | 2/cycle |
power9 | 0 - 2 | 4/cycle |
xs | vector doubleword element 0 to be stored. |
ra | const doubleword index (offset/displacement). |
rb | const doubleword pointer to an array of doubles. |
Vector Insert Exponent Double-Precision.
For each doubleword of sig and exp, merge the sign (bit 0) and significand (bits 12:63) from sig with the 11-bit exponent from exp (bits 53:63). The exponent is merged into bits 1:11 of the final result. The result is returned as a Vector Double-Precision floating point value.
processor | Latency | Throughput |
---|---|---|
power8 | 6-15 | 2/cycle |
power9 | 2 | 4/cycle |
sig | Vector unsigned long long containing the Sign Bit and 52-bit significand. |
exp | Vector unsigned long long containing the 11-bit exponent. |
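PowerISA numbers bits from the most significant end (bit 0), so in a doubleword, bits 53:63 are the low 11 bits and bits 1:11 are the Binary64 exponent field. Under that mapping, one lane of the insert can be modeled as the mask-and-shift below. `ref_xviexpdp` is a hypothetical name, uint64_t stands in for a vui64_t lane, and the code is a portable model, not the header's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one lane: keep the sign (bit 0) and significand
   (bits 12:63) of sig, and place the low 11 bits of exp (bits 53:63)
   into the exponent field (bits 1:11). */
static double ref_xviexpdp (uint64_t sig, uint64_t exp)
{
  uint64_t r = (sig & 0x800fffffffffffffULL)   /* sign + 52-bit significand */
             | ((exp & 0x7ffULL) << 52);         /* exponent into bits 1:11 */
  double d;
  memcpy (&d, &r, sizeof d);
  return d;
}
```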
Vector Extract Exponent Double-Precision.
For each doubleword of vrb, extract the double-precision exponent (bits 1:11) and right-justify it into bits 53:63 of the result doubleword. The result is returned as a vector unsigned long long integer value.
processor | Latency | Throughput |
---|---|---|
power8 | 6-15 | 2/cycle |
power9 | 2 | 4/cycle |
vrb | vector double value. |
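One lane of the exponent extraction is a shift and a mask. The value returned is the biased exponent field, so 1.0 yields 1023 (0x3ff), and infinities and NaNs yield 0x7ff. `ref_xvxexpdp` is a hypothetical name for this portable model, with uint64_t standing in for a vui64_t lane.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one lane: move exponent bits 1:11 down to
   bits 53:63 (the low 11 bits) and clear everything else. */
static uint64_t ref_xvxexpdp (double d)
{
  uint64_t u;
  memcpy (&u, &d, sizeof u);
  return (u >> 52) & 0x7ffULL;
}
```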
Vector Extract Significand Double-Precision.
For each doubleword of vrb, extract the double-precision significand (bits 12:63) and restore the implied (hidden) bit (bit 11) if the double-precision value is normal (not zero, subnormal, Infinity or NaN). The result is returned as a vector unsigned long long integer value with up to 53 bits of significance.
processor | Latency | Throughput |
---|---|---|
power8 | 8-17 | 1/cycle |
power9 | 3 | 2/cycle |
vrb | vector double value. |
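A per-lane model of the significand extraction: keep bits 12:63, drop the sign, and set the hidden bit (value 2^52, PowerISA bit 11) only when the exponent is neither 0x000 nor 0x7ff. The name `ref_xvxsigdp` is hypothetical, and this is a portable sketch rather than the header's code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one lane of the significand extraction. */
static uint64_t ref_xvxsigdp (double d)
{
  uint64_t u, e, sig;
  memcpy (&u, &d, sizeof u);
  e = (u >> 52) & 0x7ffULL;
  sig = u & 0x000fffffffffffffULL;   /* bits 12:63; sign dropped */
  if (e != 0 && e != 0x7ff)          /* normal value: restore hidden bit 11 */
    sig |= 0x0010000000000000ULL;
  return sig;
}
```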