POWER Vector Library Manual
1.0.4
Header package containing a collection of 128-bit SIMD operations over 4x32-bit floating point elements.
Typedefs
Type | Name | Description |
---|---|---|
typedef vf32_t | __vbinary32 | typedef __vbinary32 to vector of 4 x float elements. |
Functions
Return type | Function | Description |
---|---|---|
static vf32_t | vec_absf32 (vf32_t vf32x) | Vector float absolute value. |
static int | vec_all_isfinitef32 (vf32_t vf32) | Return true if all 4x32-bit vector float values are Finite (Not NaN nor Inf). |
static int | vec_all_isinff32 (vf32_t vf32) | Return true if all 4x32-bit vector float values are infinity. |
static int | vec_all_isnanf32 (vf32_t vf32) | Return true if all of 4x32-bit vector float values are NaN. |
static int | vec_all_isnormalf32 (vf32_t vf32) | Return true if all of 4x32-bit vector float values are normal (Not NaN, Inf, denormal, or zero). |
static int | vec_all_issubnormalf32 (vf32_t vf32) | Return true if all of 4x32-bit vector float values are subnormal (denormal). |
static int | vec_all_iszerof32 (vf32_t vf32) | Return true if all of 4x32-bit vector float values are +-0.0. |
static int | vec_any_isfinitef32 (vf32_t vf32) | Return true if any 4x32-bit vector float values are Finite (Not NaN nor Inf). |
static int | vec_any_isinff32 (vf32_t vf32) | Return true if any 4x32-bit vector float values are infinity. |
static int | vec_any_isnanf32 (vf32_t vf32) | Return true if any of 4x32-bit vector float values are NaN. |
static int | vec_any_isnormalf32 (vf32_t vf32) | Return true if any of 4x32-bit vector float values are normal (Not NaN, Inf, denormal, or zero). |
static int | vec_any_issubnormalf32 (vf32_t vf32) | Return true if any of 4x32-bit vector float values are subnormal (denormal). |
static int | vec_any_iszerof32 (vf32_t vf32) | Return true if any of 4x32-bit vector float values are +-0.0. |
static vf32_t | vec_copysignf32 (vf32_t vf32x, vf32_t vf32y) | Copy the sign bit from vf32x merged with magnitude from vf32y and return the resulting vector float values. |
static vb32_t | vec_isfinitef32 (vf32_t vf32) | Return 4x32-bit vector boolean true values for each float element that is Finite (Not NaN nor Inf). |
static vb32_t | vec_isinff32 (vf32_t vf32) | Return 4x32-bit vector boolean true values for each float element that is infinity. |
static vb32_t | vec_isnanf32 (vf32_t vf32) | Return 4x32-bit vector boolean true values for each float element that is NaN. |
static vb32_t | vec_isnormalf32 (vf32_t vf32) | Return 4x32-bit vector boolean true values for each float element that is normal (Not NaN, Inf, denormal, or zero). |
static vb32_t | vec_issubnormalf32 (vf32_t vf32) | Return 4x32-bit vector boolean true values for each float element that is subnormal (denormal). |
static vb32_t | vec_iszerof32 (vf32_t vf32) | Return 4x32-bit vector boolean true values for each float element that is +-0.0. |
static vb32_t | vec_setb_sp (vf32_t vra) | Vector Set Bool from Sign, Single Precision. |
static vf32_t | vec_vgl4fsso (float *array, const long long offset0, const long long offset1, const long long offset2, const long long offset3) | Vector Gather-Load 4 Words from scalar Offsets. |
static vf32_t | vec_vgl4fswo (float *array, vi32_t vra) | Vector Gather-Load 4 Words from Vector Word Offsets. |
static vf32_t | vec_vgl4fswsx (float *array, vi32_t vra, const unsigned char scale) | Vector Gather-Load 4 Words from Vector Word Scaled Indexes. |
static vf32_t | vec_vgl4fswx (float *array, vi32_t vra) | Vector Gather-Load 4 Words from Vector Word Indexes. |
static vf64_t | vec_vglfsdo (float *array, vi64_t vra) | Vector Gather-Load Single Floats from Vector Doubleword Offsets. |
static vf64_t | vec_vglfsdsx (float *array, vi64_t vra, const unsigned char scale) | Vector Gather-Load Single Floats from Vector Doubleword Scaled Indexes. |
static vf64_t | vec_vglfsdx (float *array, vi64_t vra) | Vector Gather-Load Single Floats from Vector Doubleword Indexes. |
static vf64_t | vec_vglfsso (float *array, const long long offset0, const long long offset1) | Vector Gather-Load Float Single from scalar Offsets. |
static vf64_t | vec_vlxsspx (const signed long long ra, const float *rb) | Vector Load Scalar Single Float Indexed. |
static void | vec_vsst4fsso (vf32_t xs, float *array, const long long offset0, const long long offset1, const long long offset2, const long long offset3) | Vector Scatter-Store 4 Float Singles to Scalar Offsets. |
static void | vec_vsst4fswo (vf32_t xs, float *array, vi32_t vra) | Vector Scatter-Store 4 Float Singles to Vector Word Offsets. |
static void | vec_vsst4fswsx (vf32_t xs, float *array, vi32_t vra, const unsigned char scale) | Vector Scatter-Store 4 Float Singles to Vector Word Scaled Indexes. |
static void | vec_vsst4fswx (vf32_t xs, float *array, vi32_t vra) | Vector Scatter-Store 4 Float Singles to Vector Word Indexes. |
static void | vec_vsstfsdo (vf64_t xs, float *array, vi64_t vra) | Vector Scatter-Store Float Singles to Vector Doubleword Offsets. |
static void | vec_vsstfsdsx (vf64_t xs, float *array, vi64_t vra, const unsigned char scale) | Vector Scatter-Store Float Singles to Vector Doubleword Scaled Indexes. |
static void | vec_vsstfsdx (vf64_t xs, float *array, vi64_t vra) | Vector Scatter-Store Float Singles to Vector Doubleword Indexes. |
static void | vec_vsstfsso (vf64_t xs, float *array, const long long offset0, const long long offset1) | Vector Scatter-Store Float Singles to Scalar Offsets. |
static void | vec_vstxsspx (vf64_t xs, const signed long long ra, float *rb) | Vector Store Scalar Single Float Indexed. |
static vf32_t | vec_xviexpsp (vui32_t sig, vui32_t exp) | Vector Insert Exponent Single-Precision. |
static vui32_t | vec_xvxexpsp (vf32_t vrb) | Vector Extract Exponent Single-Precision. |
static vui32_t | vec_xvxsigsp (vf32_t vrb) | Vector Extract Significand Single-Precision. |
Most vector float (32-bit float) operations are implemented with PowerISA VMX instructions, either defined by the original VMX (a.k.a. Altivec) or added to later versions of the PowerISA. POWER7 added the Vector Scalar Extended (VSX) facility with access to additional vector registers (64 total) and operations. Most of these operations (compiler built-ins, or intrinsics) are defined in <altivec.h> and described in the compiler documentation.
Most of these operations are implemented in a single instruction on newer (POWER8/POWER9) processors. This header serves to fill in functional gaps for older (POWER7, POWER8) processors and provides an inline assembler implementation for older compilers that do not provide the built-ins.
POWER9 adds useful vector float operations, including: test data class, extract exponent, extract significand, and insert exponent. These operations are common in math library implementations.
So it is reasonable for this header to provide vector forms of the floating point classification functions (isnormal/subnormal/finite/inf/nan/zero, etc.). These functions can be implemented directly using (one or more) POWER9 instructions, or a few vector logical and integer compare instructions for POWER7/8. Each is comfortably small enough to be in-lined and inherently faster than the equivalent POSIX or compiler built-in runtime scalar functions.
This header covers operations that are any of the following:
For example, the classification functions are useful when implementing the math library functions sine and cosine. The POSIX specification requires that special input values are processed without raising extraneous floating point exceptions and return specific floating point values in response. Consider, for example, the sin() function.
The following code example uses functions from this header to address the POSIX requirements for special values input to a vectorized sinf():
The code generated for this fragment runs between 24 (-mcpu=power9) and 40 (-mcpu=power8) instructions. The normal execution path is 14 to 25 instructions respectively.
Another example is the cos() function.
The following code example uses functions from this header to address the POSIX requirements for special values input to vectorized cosf():
Neither example raises floating point exceptions or sets errno, as appropriate for a vector math library.
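The special-value handling described above can be modelled in portable scalar C. The following is a minimal sketch, not the library's vector code: all helper names (`f32_bits`, `normal_mask`, `select_bits`, `model_sinf_special_x4`, and so on) are hypothetical, the four lanes are a plain array, and the sine approximation itself is replaced by a placeholder so that only the selection logic is shown. It follows the POSIX rules for sin(): NaN, +-0.0 and subnormal inputs are returned unchanged, and +-Inf returns a quiet NaN.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: move between a float and its binary32 bit image. */
static uint32_t
f32_bits (float f)
{
  uint32_t u;
  assert (sizeof (float) == sizeof (uint32_t));
  memcpy (&u, &f, sizeof u);
  return u;
}

static float
f32_from_bits (uint32_t u)
{
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* All ones if the element is normal (exponent field neither 0 nor 0xff). */
static uint32_t
normal_mask (uint32_t u)
{
  uint32_t e = u & 0x7f800000u;
  return (e != 0 && e != 0x7f800000u) ? 0xffffffffu : 0;
}

/* All ones if the element is +-Inf. */
static uint32_t
inf_mask (uint32_t u)
{
  return ((u & 0x7fffffffu) == 0x7f800000u) ? 0xffffffffu : 0;
}

/* Bitwise select: take b where mask bits are 1, otherwise a. */
static uint32_t
select_bits (uint32_t a, uint32_t b, uint32_t mask)
{
  return (a & ~mask) | (b & mask);
}

/* Placeholder for the real sine approximation, which is out of scope. */
static float
core_placeholder (float x)
{
  return x * x;
}

static void
model_sinf_special_x4 (const float in[4], float out[4])
{
  const uint32_t qnan = 0x7fc00000u;	/* a quiet NaN */
  for (int i = 0; i < 4; i++)
    {
      uint32_t u = f32_bits (in[i]);
      uint32_t nm = normal_mask (u);
      /* Feed the core a safe 0.0 for elements that are not normal. */
      float safe = f32_from_bits (select_bits (0, u, nm));
      uint32_t r = f32_bits (core_placeholder (safe));
      /* NaN, +-0.0 and subnormal inputs pass through unchanged. */
      r = select_bits (u, r, nm);
      /* +-Inf inputs return a quiet NaN. */
      r = select_bits (r, qnan, inf_mask (u));
      out[i] = f32_from_bits (r);
    }
}
```

Because every lane is computed and then merged with selects, no lane takes a data-dependent branch; this mirrors the mask-and-select style that the per-element classification functions in this header are designed for.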
High level performance estimates are provided as an aid to function selection when evaluating algorithms. For background on how Latency and Throughput are derived see: Performance data.
Vector float absolute value.
processor | Latency | Throughput |
---|---|---|
power8 | 6-7 | 2/cycle |
power9 | 2 | 2/cycle |
vf32x | vector float values containing the magnitudes. |
Return true if all 4x32-bit vector float values are Finite (Not NaN nor Inf).
An IEEE Binary32 finite value has an exponent between 0x000 and 0x7f0 (a 0x7f8 indicates NaN or Inf). The significand can be any value. The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 4-20 | 2/cycle |
power9 | 6 | 1/cycle |
vf32 | a vector of __binary32 values. |
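The difference between the "all" and "any" forms of a predicate can be illustrated with a small scalar model. This is a sketch only: it works on the 32-bit images of four floats, the names `bits_isfinite`, `model_all_isfinite` and `model_any_isfinite` are hypothetical, and it says nothing about how the header actually implements the test.

```c
#include <assert.h>
#include <stdint.h>

/* Finite: the 8-bit exponent field (mask 0x7f800000) is not all ones. */
static int
bits_isfinite (uint32_t u)
{
  return (u & 0x7f800000u) != 0x7f800000u;
}

/* "all" form: true only when every one of the 4 elements is finite. */
static int
model_all_isfinite (const uint32_t v[4])
{
  for (int i = 0; i < 4; i++)
    if (!bits_isfinite (v[i]))
      return 0;
  return 1;
}

/* "any" form: true when at least one element is finite. */
static int
model_any_isfinite (const uint32_t v[4])
{
  for (int i = 0; i < 4; i++)
    if (bits_isfinite (v[i]))
      return 1;
  return 0;
}

/* Usage: bit images of { 1.0, -0.0, FLT_MAX, +Inf }
   and { +Inf, -Inf, NaN, NaN }. */
static void
example_all_any (void)
{
  const uint32_t mixed[4] =
    { 0x3f800000u, 0x80000000u, 0x7f7fffffu, 0x7f800000u };
  const uint32_t none[4] =
    { 0x7f800000u, 0xff800000u, 0x7fc00000u, 0xffc00001u };
  assert (!model_all_isfinite (mixed));	/* the +Inf element fails */
  assert (model_any_isfinite (mixed));
  assert (!model_any_isfinite (none));
}
```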
Return true if all 4x32-bit vector float values are infinity.
An IEEE Binary32 infinity has an exponent of 0x7f8 and a significand of all zeros. The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 6-20 | 2/cycle |
power9 | 6 | 1/cycle |
vf32 | a vector of __binary32 values. |
Return true if all of 4x32-bit vector float values are NaN.
An IEEE Binary32 NaN value has an exponent of 0x7f8 and a nonzero significand. The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 6-20 | 2/cycle |
power9 | 6 | 1/cycle |
vf32 | a vector of __binary32 values. |
Return true if all of 4x32-bit vector float values are normal (Not NaN, Inf, denormal, or zero).
An IEEE Binary32 normal value has an exponent between 0x008 and 0x7f0 (a 0x7f8 indicates NaN or Inf). The significand can be any value (except 0 if the exponent is zero). The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 6-20 | 1/cycle |
power9 | 6 | 1/cycle |
vf32 | a vector of __binary32 values. |
Return true if all of 4x32-bit vector float values are subnormal (denormal).
An IEEE Binary32 subnormal has an exponent of 0x000 and a nonzero significand. The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 10-30 | 1/cycle |
power9 | 6 | 1/cycle |
vf32 | a vector of __binary32 values. |
Return true if all of 4x32-bit vector float values are +-0.0.
An IEEE Binary32 zero has an exponent of 0x000 and a zero significand. The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 6-20 | 2/cycle |
power9 | 6 | 1/cycle |
vf32 | a vector of __binary32 values. |
Return true if any 4x32-bit vector float values are Finite (Not NaN nor Inf).
An IEEE Binary32 finite value has an exponent between 0x000 and 0x7f0 (a 0x7f8 indicates NaN or Inf). The significand can be any value. The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 4-20 | 2/cycle |
power9 | 6 | 1/cycle |
vf32 | a vector of __binary32 values. |
Return true if any 4x32-bit vector float values are infinity.
An IEEE Binary32 infinity has an exponent of 0x7f8 and a significand of all zeros.
processor | Latency | Throughput |
---|---|---|
power8 | 6-20 | 2/cycle |
power9 | 6 | 2/cycle |
vf32 | a vector of __binary32 values. |
Return true if any of 4x32-bit vector float values are NaN.
An IEEE Binary32 NaN value has an exponent of 0x7f8 and a nonzero significand. The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 6-20 | 2/cycle |
power9 | 6 | 2/cycle |
vf32 | a vector of __binary32 values. |
Return true if any of 4x32-bit vector float values are normal (Not NaN, Inf, denormal, or zero).
An IEEE Binary32 normal value has an exponent between 0x008 and 0x7f0 (a 0x7f8 indicates NaN or Inf). The significand can be any value (except 0 if the exponent is zero). The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 10-24 | 1/cycle |
power9 | 6 | 1/cycle |
vf32 | a vector of __binary32 values. |
Return true if any of 4x32-bit vector float values are subnormal (denormal).
An IEEE Binary32 subnormal has an exponent of 0x000 and a nonzero significand. The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 10-18 | 1/cycle |
power9 | 6 | 1/cycle |
vf32 | a vector of __binary32 values. |
Return true if any of 4x32-bit vector float values are +-0.0.
An IEEE Binary32 zero has an exponent of 0x000 and a zero significand. The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 6-20 | 2/cycle |
power9 | 6 | 1/cycle |
vf32 | a vector of __binary32 values. |
Copy the sign bit from vf32x merged with magnitude from vf32y and return the resulting vector float values.
processor | Latency | Throughput |
---|---|---|
power8 | 6-7 | 2/cycle |
power9 | 2 | 2/cycle |
vf32x | vector float values containing the sign bits. |
vf32y | vector float values containing the magnitudes. |
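As a rough illustration of the merge, here is a scalar model operating on the bit image of one element. It is a sketch under the operand roles stated in the parameter descriptions above (sign from the first operand, magnitude from the second); the name `model_copysign_bits` is hypothetical and is not part of this header.

```c
#include <assert.h>
#include <stdint.h>

/* Sign bit from x, exponent and significand (the magnitude) from y. */
static uint32_t
model_copysign_bits (uint32_t x, uint32_t y)
{
  const uint32_t signmask = 0x80000000u;
  return (x & signmask) | (y & ~signmask);
}

static void
example_copysign (void)
{
  /* Sign of -1.0 (0xbf800000) with magnitude of 2.0 (0x40000000) is -2.0. */
  assert (model_copysign_bits (0xbf800000u, 0x40000000u) == 0xc0000000u);
  /* Sign of +0.0 with magnitude of -Inf is +Inf. */
  assert (model_copysign_bits (0x00000000u, 0xff800000u) == 0x7f800000u);
}
```

The operation is purely bitwise, so NaN payloads and subnormal magnitudes are carried through untouched in this model.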
Return 4x32-bit vector boolean true values for each float element that is Finite (Not NaN nor Inf).
An IEEE Binary32 finite value has an exponent between 0x000 and 0x7f0 (a 0x7f8 indicates NaN or Inf). The significand can be any value. The sign bit is ignored. The vec_cmpeq conditional is used to generate the predicate mask for NaN / Inf, which is then inverted for the finite condition.
processor | Latency | Throughput |
---|---|---|
power8 | 6-15 | 2/cycle |
power9 | 5 | 2/cycle |
vf32 | a vector of __binary32 values. |
Return 4x32-bit vector boolean true values for each float, if infinity.
An IEEE Binary32 infinity has an exponent of 0x7f8 and a significand of all zeros.
processor | Latency | Throughput |
---|---|---|
power8 | 4-13 | 2/cycle |
power9 | 3 | 2/cycle |
vf32 | a vector of __binary32 values. |
Return 4x32-bit vector boolean true values, for each float NaN value.
An IEEE Binary32 NaN value has an exponent of 0x7f8 and a nonzero significand. The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 4-13 | 2/cycle |
power9 | 3 | 2/cycle |
vf32 | a vector of __binary32 values. |
Return 4x32-bit vector boolean true values, for each float value, if normal (Not NaN, Inf, denormal, or zero).
An IEEE Binary32 normal value has an exponent between 0x008 and 0x7f0 (a 0x7f8 indicates NaN or Inf). The significand can be any value (except 0 if the exponent is zero). The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 6-15 | 1/cycle |
power9 | 5 | 1/cycle |
vf32 | a vector of __binary32 values. |
Return 4x32-bit vector boolean true values, for each float value that is subnormal (denormal).
An IEEE Binary32 subnormal has an exponent of 0x000 and a nonzero significand. The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 6-16 | 1/cycle |
power9 | 3 | 1/cycle |
vf32 | a vector of __binary32 values. |
Return 4x32-bit vector boolean true values, for each float value that is +-0.0.
An IEEE Binary32 zero has an exponent of 0x000 and a zero significand. The sign bit is ignored.
processor | Latency | Throughput |
---|---|---|
power8 | 4-13 | 2/cycle |
power9 | 5 | 2/cycle |
vf32 | a vector of __binary32 values. |
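The per-element predicates described above each produce a word that is all ones (true) or all zeros (false). The following scalar sketch models those masks for a single 32-bit element, using the exponent and significand fields directly. The `mask_*` names and the macros are hypothetical; the finite mask is built the way the description of the finite test suggests, by comparing the exponent field for equality with all ones and inverting the result.

```c
#include <assert.h>
#include <stdint.h>

#define F32_EXP_MASK 0x7f800000u	/* 8-bit exponent field */
#define F32_SIG_MASK 0x007fffffu	/* 23-bit significand field */
#define ALL_ONES     0xffffffffu

static uint32_t
mask_isinf (uint32_t u)
{
  return ((u & F32_EXP_MASK) == F32_EXP_MASK
	  && (u & F32_SIG_MASK) == 0) ? ALL_ONES : 0;
}

static uint32_t
mask_isnan (uint32_t u)
{
  return ((u & F32_EXP_MASK) == F32_EXP_MASK
	  && (u & F32_SIG_MASK) != 0) ? ALL_ONES : 0;
}

static uint32_t
mask_isnormal (uint32_t u)
{
  uint32_t e = u & F32_EXP_MASK;
  return (e != 0 && e != F32_EXP_MASK) ? ALL_ONES : 0;
}

static uint32_t
mask_issubnormal (uint32_t u)
{
  return ((u & F32_EXP_MASK) == 0
	  && (u & F32_SIG_MASK) != 0) ? ALL_ONES : 0;
}

static uint32_t
mask_iszero (uint32_t u)
{
  return ((u & ~0x80000000u) == 0) ? ALL_ONES : 0;
}

/* Compare-equal on the exponent gives the NaN/Inf mask; invert it. */
static uint32_t
mask_isfinite (uint32_t u)
{
  uint32_t naninf = ((u & F32_EXP_MASK) == F32_EXP_MASK) ? ALL_ONES : 0;
  return ~naninf;
}

/* Every value falls in exactly one of: finite, infinity, NaN; and
   finite is the union of normal, subnormal and zero. */
static void
example_class_masks (void)
{
  const uint32_t samples[6] = {
    0x3f800000u,		/* 1.0 */
    0x80000000u,		/* -0.0 */
    0x00000001u,		/* smallest subnormal */
    0xff800000u,		/* -Inf */
    0x7fc00000u,		/* quiet NaN */
    0x7f7fffffu			/* largest finite value */
  };
  for (int i = 0; i < 6; i++)
    {
      uint32_t u = samples[i];
      uint32_t fin = mask_isnormal (u) | mask_issubnormal (u) | mask_iszero (u);
      assert (mask_isfinite (u) == fin);
      assert ((fin ^ mask_isinf (u) ^ mask_isnan (u)) == ALL_ONES);
    }
}
```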
Vector Set Bool from Sign, Single Precision.
For each float, propagate the sign bit to all 32-bits of that word. The result is vector bool int reflecting the sign bit of each 32-bit float.
The resulting mask can be used in masking and select operations.
processor | Latency | Throughput |
---|---|---|
power8 | 2-9 | 2/cycle |
power9 | 2-8 | 2/cycle |
vra | Vector float. |
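A scalar model of the sign propagation for one element may help make the mask usage concrete. This is a sketch only; `model_setb_sp` and `select_bits` are hypothetical names, and the select helper simply stands in for a vector select on the resulting mask.

```c
#include <assert.h>
#include <stdint.h>

/* Replicate the sign bit (the most significant bit) across all 32 bits. */
static uint32_t
model_setb_sp (uint32_t u)
{
  return (uint32_t) (0u - (u >> 31));
}

/* Bitwise select: take b where mask bits are 1, otherwise a. */
static uint32_t
select_bits (uint32_t a, uint32_t b, uint32_t mask)
{
  return (a & ~mask) | (b & mask);
}

static void
example_setb (void)
{
  assert (model_setb_sp (0xbf800000u) == 0xffffffffu);	/* -1.0 */
  assert (model_setb_sp (0x3f800000u) == 0x00000000u);	/* +1.0 */
  assert (model_setb_sp (0x80000000u) == 0xffffffffu);	/* -0.0 */

  /* Replace a negative element (-2.0) with +0.0 using the mask. */
  uint32_t m = model_setb_sp (0xc0000000u);
  assert (select_bits (0xc0000000u, 0x00000000u, m) == 0x00000000u);
}
```

Note that the test looks only at the sign bit, so -0.0 and NaNs with the sign bit set also produce a true mask.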
Vector Gather-Load 4 Words from scalar Offsets.
For each scalar offset[0,1,2,3], load the word from the effective address formed by *(char*)array+offset[0-3]. Merge resulting float single word elements [0,1,2,3] and return the resulting vector.
processor | Latency | Throughput |
---|---|---|
power8 | 10 | 1/cycle |
power9 | 11 | 1/cycle |
array | Pointer to array of float words. |
offset0 | Scalar (64-bit) byte offset from &array. |
offset1 | Scalar (64-bit) byte offset from &array. |
offset2 | Scalar (64-bit) byte offset from &array. |
offset3 | Scalar (64-bit) byte offset from &array. |
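The addressing rule above uses byte offsets, not element indexes. The following scalar sketch models that rule; `load_f32_at` and `model_gather4_offsets` are hypothetical names, and the result vector is represented as a plain array of four floats.

```c
#include <assert.h>
#include <string.h>

/* Load one float from a byte offset relative to array. */
static float
load_f32_at (const float *array, long long offset)
{
  float f;
  memcpy (&f, (const char *) array + offset, sizeof f);
  return f;
}

/* Gather 4 floats from 4 independent byte offsets into result[0..3]. */
static void
model_gather4_offsets (const float *array, long long offset0,
		       long long offset1, long long offset2,
		       long long offset3, float result[4])
{
  result[0] = load_f32_at (array, offset0);
  result[1] = load_f32_at (array, offset1);
  result[2] = load_f32_at (array, offset2);
  result[3] = load_f32_at (array, offset3);
}

static void
example_gather_offsets (void)
{
  const float table[6] = { 10.0f, 11.0f, 12.0f, 13.0f, 14.0f, 15.0f };
  float r[4];
  assert (sizeof (float) == 4);
  /* Byte offsets 0, 8, 20 and 4 address elements 0, 2, 5 and 1. */
  model_gather4_offsets (table, 0, 8, 20, 4, r);
  assert (r[0] == 10.0f && r[1] == 12.0f && r[2] == 15.0f && r[3] == 11.0f);
}
```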
Vector Gather-Load 4 Words from Vector Word Offsets.
For each signed word element [i] of vra, load the float single word element at *(char*)array+vra[i]. Merge those word elements [0-3] and return the resulting vector.
processor | Latency | Throughput |
---|---|---|
power8 | 14 | 1/cycle |
power9 | 15 | 1/cycle |
array | Pointer to array of float words. |
vra | Vector of signed word (32-bit) byte offsets from &array. |
Vector Gather-Load 4 Words from Vector Word Scaled Indexes.
For each signed word element [i] of vra, load the float single word element at array[vra[i] << scale]. Merge those word elements [0-3] and return the resulting vector.
processor | Latency | Throughput |
---|---|---|
power8 | 16-25 | 1/cycle |
power9 | 18-27 | 1/cycle |
array | Pointer to array of float words. |
vra | Vector of signed word (32-bit) indexes. |
scale | 8-bit integer. Indexes are multiplied by 2^scale. |
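Scaled indexing can be modelled in scalar C as follows. This is a sketch, assuming the addressing rule array[vra[i] << scale] given above; `model_gather4_scaled` is a hypothetical name, and the multiply by 2^scale replaces the shift only because left-shifting a negative value is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model: result[i] = array[idx[i] * 2^scale]. */
static void
model_gather4_scaled (const float *array, const int32_t idx[4],
		      unsigned char scale, float result[4])
{
  assert (scale < 31);		/* keep 1LL << scale well inside long long */
  for (int i = 0; i < 4; i++)
    {
      /* Multiply instead of shifting, since idx[i] may be negative. */
      long long index = (long long) idx[i] * (1LL << scale);
      result[i] = array[index];
    }
}

static void
example_gather_scaled (void)
{
  float table[16];
  float r[4];
  for (int i = 0; i < 16; i++)
    table[i] = (float) i;

  const int32_t fwd[4] = { 0, 1, 2, 3 };
  model_gather4_scaled (table, fwd, 2, r);	/* elements 0, 4, 8, 12 */
  assert (r[0] == 0.0f && r[1] == 4.0f && r[2] == 8.0f && r[3] == 12.0f);

  /* Signed indexes relative to a pointer into the middle of the table. */
  const int32_t mixed[4] = { -1, 0, 1, -2 };
  model_gather4_scaled (table + 8, mixed, 1, r);	/* elements 6, 8, 10, 4 */
  assert (r[0] == 6.0f && r[1] == 8.0f && r[2] == 10.0f && r[3] == 4.0f);
}
```

With scale equal to 0 the model reduces to plain indexing, array[vra[i]].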
Vector Gather-Load 4 Words from Vector Word Indexes.
For each signed word element [i] of vra, load the float single word element at array[vra[i]]. Merge those word elements [0-3] and return the resulting vector.
processor | Latency | Throughput |
---|---|---|
power8 | 16-25 | 1/cycle |
power9 | 18-27 | 1/cycle |
array | Pointer to array of float words. |
vra | Vector of signed word (32-bit) indexes. |
Vector Gather-Load Single Floats from Vector Doubleword Offsets.
For each doubleword element [0-1] of vra, load the float single word element at *(char*)array+vra[i] expanding them to float double format. Merge doubleword elements [0,1] and return the resulting vector.
processor | Latency | Throughput |
---|---|---|
power8 | 12 | 1/cycle |
power9 | 11 | 1/cycle |
array | Pointer to array of float singles. |
vra | Vector of doubleword (64-bit) byte offsets from &array. |
Vector Gather-Load Single Floats from Vector Doubleword Scaled Indexes.
For each doubleword element [0-1] of vra, load the float single word element at array[vra[i] << scale], expanding them to float double format. Merge doubleword elements [0,1] and return the resulting vector.
processor | Latency | Throughput |
---|---|---|
power8 | 14-23 | 1/cycle |
power9 | 13-22 | 1/cycle |
array | Pointer to array of float. |
vra | Vector of doubleword indexes from &array. |
scale | 8-bit integer. Indexes are multiplied by 2^scale. |
Vector Gather-Load Single Floats from Vector Doubleword Indexes.
For each doubleword element [0-1] of vra, load the float single word element at array[vra[i]], expanding them to float double format. Merge doubleword elements [0,1] and return the resulting vector.
processor | Latency | Throughput |
---|---|---|
power8 | 14-23 | 1/cycle |
power9 | 13-22 | 1/cycle |
array | Pointer to array of float. |
vra | Vector of doubleword indexes from &array. |
Vector Gather-Load Float Single from scalar Offsets.
For each scalar offset[0|1], load the float single element at *(char*)array+offset[0|1] expanding them to float double format. Merge doubleword elements [0,1] and return the resulting vector.
processor | Latency | Throughput |
---|---|---|
power8 | 7 | 2/cycle |
power9 | 11 | 2/cycle |
array | Pointer to array of floats. |
offset0 | Scalar (64-bit) byte offset from &array. |
offset1 | Scalar (64-bit) byte offset from &array. |
Vector Load Scalar Single Float Indexed.
Load doubleword[0] of vector xt as a scalar (double float formatted) single float word from the effective address formed by rb+ra. The operand rb is a pointer to an array of float words. The operand ra is a doubleword integer byte offset from rb. The result xt is returned as a vf64_t vector. For best performance rb and ra should be word aligned (integer multiple of 4).
This operation is an alternate form of Vector Load Element (vec_lde), with the added simplification that data is always left justified in the vector. Another advantage on POWER8 and later is that the lxsspx instruction can load directly into any of the 64 VSRs while expanding the single float word value into float double format, in a single operation. Both simplify merging elements for gather operations.
processor | Latency | Throughput |
---|---|---|
power8 | 5 | 2/cycle |
power9 | 8 | 2/cycle |
ra | const doubleword index (offset/displacement). |
rb | const pointer to an array of floats. |
Vector Scatter-Store 4 Float Singles to Scalar Offsets.
For each float word element [0-3] of xs, store the float element xs[i] at *(char*)array+offset[i].
processor | Latency | Throughput |
---|---|---|
power8 | 6 | 1/cycle |
power9 | 4 | 2/cycle |
xs | Vector float elements to scatter store. |
array | Pointer to array of float words. |
offset0 | Scalar (64-bit) byte offset from &array. |
offset1 | Scalar (64-bit) byte offset from &array. |
offset2 | Scalar (64-bit) byte offset from &array. |
offset3 | Scalar (64-bit) byte offset from &array. |
Vector Scatter-Store 4 Float Singles to Vector Word Offsets.
For each float word element [0-3] of xs, store the float element xs[i] at *(char*)array+vra[i].
processor | Latency | Throughput |
---|---|---|
power8 | 10 | 1/cycle |
power9 | 12 | 2/cycle |
xs | Vector float elements to scatter store. |
array | Pointer to array of float words. |
vra | Vector of signed word (32-bit) byte offsets from &array. |
Vector Scatter-Store 4 Float Singles to Vector Word Scaled Indexes.
For each float word element [0-3] of xs, store the float element xs[i] at array[vra[i] << scale].
processor | Latency | Throughput |
---|---|---|
power8 | 12-21 | 1/cycle |
power9 | 15-24 | 2/cycle |
xs | Vector float elements to scatter store. |
array | Pointer to array of float words. |
vra | Vector of signed word (32-bit) indexes from array. |
scale | 8-bit integer. Indexes are multiplied by 2^scale. |
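The scatter-store with scaled indexes is the mirror image of the scaled gather-load. A minimal scalar sketch, assuming the rule that element i of xs is stored at array[vra[i] * 2^scale]; the name `model_scatter4_scaled` is hypothetical, and the model stores the elements in order 0 to 3.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model: array[idx[i] * 2^scale] = xs[i]. */
static void
model_scatter4_scaled (const float xs[4], float *array,
		       const int32_t idx[4], unsigned char scale)
{
  assert (scale < 31);		/* keep 1LL << scale well inside long long */
  for (int i = 0; i < 4; i++)
    {
      /* Multiply instead of shifting, since idx[i] may be negative. */
      long long index = (long long) idx[i] * (1LL << scale);
      array[index] = xs[i];
    }
}

static void
example_scatter_scaled (void)
{
  float table[16] = { 0.0f };
  const float xs[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
  const int32_t idx[4] = { 0, 1, 2, 3 };

  model_scatter4_scaled (xs, table, idx, 2);	/* elements 0, 4, 8, 12 */
  assert (table[0] == 1.0f && table[4] == 2.0f);
  assert (table[8] == 3.0f && table[12] == 4.0f);
  assert (table[1] == 0.0f && table[15] == 0.0f);	/* others untouched */
}
```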
Vector Scatter-Store 4 Float Singles to Vector Word Indexes.
For each float word element [0-3] of xs, store the float element xs[i] at array[vra[i]].
processor | Latency | Throughput |
---|---|---|
power8 | 12-21 | 1/cycle |
power9 | 15-24 | 2/cycle |
xs | Vector float elements to scatter store. |
array | Pointer to array of float words. |
vra | Vector of signed word (32-bit) indexes from array. |
Vector Scatter-Store Float Singles to Vector Doubleword Offsets.
For each doubleword element [0-1] of vra, store the doubleword float element xs[i], converted to float single word format, at *(char*)array+vra[i].
processor | Latency | Throughput |
---|---|---|
power8 | 8 | 1/cycle |
power9 | 9 | 2/cycle |
xs | Vector doubleword elements to scatter store as float single words. |
array | Pointer to array of float words. |
vra | Vector of doubleword (64-bit) byte offsets from &array. |
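The doubleword forms hold two values in float double format and store them as float single words. A scalar sketch of the offset form follows. The name `model_scatter_f64_as_f32` is hypothetical, and the C cast used for the conversion is an assumption that is only meant to be representative for values exactly representable as float; the sketch makes no claim about how the hardware treats other values.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical scalar model: convert each double to float and store it
   at the given byte offset from array. */
static void
model_scatter_f64_as_f32 (const double xs[2], float *array,
			  const long long offset[2])
{
  for (int i = 0; i < 2; i++)
    {
      float f = (float) xs[i];
      memcpy ((char *) array + offset[i], &f, sizeof f);
    }
}

static void
example_scatter_f64 (void)
{
  float table[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
  const double xs[2] = { 1.5, -2.25 };
  const long long offset[2] = { 12, 4 };	/* elements 3 and 1 */

  assert (sizeof (float) == 4);
  model_scatter_f64_as_f32 (xs, table, offset);
  assert (table[3] == 1.5f && table[1] == -2.25f);
  assert (table[0] == 0.0f && table[2] == 0.0f);	/* others untouched */
}
```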
Vector Scatter-Store Float Singles to Vector Doubleword Scaled Indexes.
For each doubleword element [0-1] of vra, store the doubleword float element xs[i], converted to float single word format, at array[vra[i]<<scale].
processor | Latency | Throughput |
---|---|---|
power8 | 10-19 | 1/cycle |
power9 | 10-19 | 1/cycle |
xs | Vector doubleword elements to scatter store as float single words. |
array | Pointer to array of float words. |
vra | Vector of doubleword (64-bit) indexes from &array. |
scale | 8-bit integer. Indexes are multiplied by 2^scale. |
Vector Scatter-Store Float Singles to Vector Doubleword Indexes.
For each doubleword element [0-1] of vra, store the doubleword float element xs[i], converted to float single word format, at array[vra[i]].
processor | Latency | Throughput |
---|---|---|
power8 | 10-19 | 1/cycle |
power9 | 10-19 | 1/cycle |
xs | Vector doubleword elements to scatter store as float single words. |
array | Pointer to array of float words. |
vra | Vector of doubleword (64-bit) indexes from &array. |
Vector Scatter-Store Float Singles to Scalar Offsets.
For each scalar offset[0|1], store the doubleword element xs[i], converted to float single word format, at *(char*)array+offset[0|1].
processor | Latency | Throughput |
---|---|---|
power8 | 3 | 1/cycle |
power9 | 3 | 2/cycle |
xs | Vector doubleword elements to scatter store as float single words. |
array | Pointer to array of float words. |
offset0 | Scalar (64-bit) byte offset from &array. |
offset1 | Scalar (64-bit) byte offset from &array. |
Vector Store Scalar Single Float Indexed.
Stores doubleword float element 0 of vector xs as a scalar float word at the effective address formed by rb+ra. The operand rb is a pointer to an array of float. The operand ra is a doubleword integer byte offset from rb. For best performance rb and ra should be word aligned (integer multiple of 4).
This operation is an alternate form of vector store element (vec_ste), with the added simplification that data is always left justified in the vector. Another advantage on POWER8 and later is that the stxsspx instruction can store directly from any of the 64 VSRs. Both simplify scatter operations.
processor | Latency | Throughput |
---|---|---|
power8 | 0 - 2 | 2/cycle |
power9 | 0 - 2 | 4/cycle |
xs | vector doubleword element 0 to be stored as single float. |
ra | const doubleword index (offset/displacement). |
rb | const pointer to an array of floats. |
Vector Insert Exponent Single-Precision.
For each word of sig and exp, merge the sign (bit 0) and significand (bits 9:31) from sig with the 8-bit exponent from exp (bits 24:31). The exponent is merged into bits 1:8 of the final result. The result is returned as a Vector Single-Precision floating point value.
processor | Latency | Throughput |
---|---|---|
power8 | 6-15 | 2/cycle |
power9 | 2 | 4/cycle |
sig | Vector unsigned int containing the Sign Bit and 23-bit significand. |
exp | Vector unsigned int containing the 8-bit exponent. |
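The bit positions above use PowerISA numbering, where bit 0 is the most significant bit of the word. In conventional C terms, the sign is mask 0x80000000, the exponent field is mask 0x7f800000, and the significand field is mask 0x007fffff. The following scalar sketch models the merge for one word; `model_xviexpsp` is a hypothetical name, not the header's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Keep the sign and significand of sig; insert the low 8 bits of exp
   into the exponent field. */
static uint32_t
model_xviexpsp (uint32_t sig, uint32_t exp)
{
  return (sig & 0x807fffffu) | ((exp & 0xffu) << 23);
}

static void
example_insert_exp (void)
{
  /* Sign 1, fraction 0x400000, biased exponent 127: -1.5 (0xbfc00000). */
  assert (model_xviexpsp (0x80400000u, 127u) == 0xbfc00000u);
  /* Exponent bits already present in sig are replaced: 1.0 becomes 2.0. */
  assert (model_xviexpsp (0x3f800000u, 128u) == 0x40000000u);
  /* Only the low 8 bits of exp take part. */
  assert (model_xviexpsp (0x00000000u, 0x17fu) == 0x3f800000u);
}
```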
Vector Extract Exponent Single-Precision.
For each word of vrb, extract the single-precision exponent (bits 1:8) and right justify it to bits 24:31 of the result vector word. The result is returned as a vector unsigned integer value.
processor | Latency | Throughput |
---|---|---|
power8 | 6-15 | 2/cycle |
power9 | 2 | 4/cycle |
vrb | vector float value. |
Vector Extract Significand Single-Precision.
For each word of vrb, extract the single-precision significand (bits 9:31) and restore the implied (hidden) bit (bit 8) if the single-precision value is normal (not zero, subnormal, Infinity or NaN). The result is returned as a vector unsigned int value with up to 24 bits of significance.
processor | Latency | Throughput |
---|---|---|
power8 | 8-17 | 1/cycle |
power9 | 3 | 2/cycle |
vrb | vector float value. |
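The two extract operations described above can be modelled for a single word as follows. This is a scalar sketch with hypothetical names (`model_xvxexpsp`, `model_xvxsigsp`); it uses conventional C masks, where the exponent field is bits 23 to 30 counted from the least significant bit and the hidden bit lands at mask 0x00800000.

```c
#include <assert.h>
#include <stdint.h>

/* Biased exponent, right justified. */
static uint32_t
model_xvxexpsp (uint32_t u)
{
  return (u >> 23) & 0xffu;
}

/* 23-bit fraction, plus the hidden bit when the value is normal. */
static uint32_t
model_xvxsigsp (uint32_t u)
{
  uint32_t exp = (u >> 23) & 0xffu;
  uint32_t sig = u & 0x007fffffu;
  if (exp != 0 && exp != 0xffu)
    sig |= 0x00800000u;
  return sig;
}

static void
example_extract (void)
{
  /* 1.5 is 0x3fc00000: biased exponent 127, significand 1.1 binary. */
  assert (model_xvxexpsp (0x3fc00000u) == 127u);
  assert (model_xvxsigsp (0x3fc00000u) == 0x00c00000u);
  /* Smallest subnormal: exponent 0, no hidden bit restored. */
  assert (model_xvxexpsp (0x00000001u) == 0u);
  assert (model_xvxsigsp (0x00000001u) == 0x00000001u);
  /* -Inf: exponent 255, zero significand, no hidden bit. */
  assert (model_xvxexpsp (0xff800000u) == 255u);
  assert (model_xvxsigsp (0xff800000u) == 0u);
}
```

Math library code typically uses this pair to split a value into exponent and significand, operate on each as integers, and then reassemble the result with an insert-exponent step.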