Page POWER Vector Library (pveclib): Is there a way for automake to compile vec_int512_runtime.c with -mcpu=power9 and -o vec_runtime_PWR9.o, and similarly for PWR7/PWR8?
File vec_bcd_ppc.h: The BCD add/subtract extend/carry story is not complete. Carry-extend operations based only on the OV condition codes work as expected only for bcdadd with operands of the same sign and bcdsub with operands of different signs. See vec_bcdaddcsq() and vec_bcdaddecsq(). Extended BCD difference (subtracting operands of the same sign, or adding operands of different signs) is more complicated. See vec_bcdsubcsq() and vec_bcdsubecsq(). Generating a true borrow seems to require looking one (31-digit) column ahead or behind. A first attempt at generating correct borrows is implemented in vec_cbcdaddcsq() and vec_cbcdaddecsq(). There are still cases where these operations generate a borrow and invert (10s complement) incorrectly. The net result seems to be that, for BCD multiple-precision difference to work correctly, the larger magnitude must be the first operand.
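The ordering constraint on extended difference can be illustrated with a small scalar model. The sketch below is not the pveclib implementation: it assumes limbs of 4 decimal digits (base 10000) stored in uint32_t as a stand-in for 31-digit BCD quadwords, and the names LIMB_BASE, mag_cmp(), mag_sub() and mag_diff() are hypothetical. It shows that a limb-by-limb subtract with the smaller magnitude first leaves a 10s complement result and a final borrow, while ordering the operands by magnitude first yields the true difference plus a separate sign.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 decimal digits per limb, standing in for a 31-digit BCD quadword. */
#define LIMB_BASE 10000u

/* Compare two n-limb magnitudes (limb 0 is least significant).
   Returns -1, 0 or 1 for a < b, a == b, a > b. */
int mag_cmp(const uint32_t *a, const uint32_t *b, size_t n)
{
    for (size_t i = n; i-- > 0;) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    return 0;
}

/* r = a - b, limb by limb, propagating a borrow upward.
   Returns the final borrow: 1 means |a| < |b| and r holds the
   10s complement of the true difference. */
unsigned mag_sub(uint32_t *r, const uint32_t *a, const uint32_t *b, size_t n)
{
    unsigned borrow = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t sub = b[i] + borrow;      /* at most LIMB_BASE */
        if (a[i] >= sub) {
            r[i] = a[i] - sub;
            borrow = 0;
        } else {
            r[i] = a[i] + LIMB_BASE - sub; /* borrow from the next limb */
            borrow = 1;
        }
    }
    return borrow;
}

/* Difference of magnitudes with the larger magnitude always first.
   *negative is set to 1 when |a| < |b|, else 0. */
void mag_diff(uint32_t *r, int *negative,
              const uint32_t *a, const uint32_t *b, size_t n)
{
    unsigned borrow;
    if (mag_cmp(a, b, n) < 0) {
        borrow = mag_sub(r, b, a, n);
        *negative = 1;
    } else {
        borrow = mag_sub(r, a, b, n);
        *negative = 0;
    }
    assert(borrow == 0); /* cannot borrow once operands are ordered */
    (void)borrow;
}
```

For example, with a = 10005 (limbs {5, 1}) and b = 20007 (limbs {7, 2}), mag_sub(r, a, b, 2) returns a borrow of 1 and leaves the 10s complement 99989998 in r, whereas mag_diff() produces 10002 with the negative flag set.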
File vec_int128_ppc.h: The implementation above gives correct results for all the cases tested for divide by the constants 10³¹ and 10³². This is not a mathematical proof of correctness, just an observation. Anyone who finds a counterexample or offers a mathematical proof should submit a bug report.
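The kind of check described here can be sketched at a much smaller scale. The example below is a 32-bit analogue, not the 128-bit pveclib code: it divides by the constant 10 by multiplying with the scaled reciprocal 0xCCCCCCCD, which is ceil(2^35 / 10), and compares the result against ordinary division over a sampled range. The function names div10_by_multiply() and count_div10_mismatches() are hypothetical. A zero mismatch count over the tested values is the same sort of evidence as the observation above, not a proof.

```c
#include <assert.h>
#include <stdint.h>

/* n / 10 for 32-bit n via multiply and shift.
   0xCCCCCCCD == ceil(2^35 / 10); the 64-bit product cannot overflow. */
uint32_t div10_by_multiply(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}

/* Count values n in [first, last], visited with the given step, for
   which the multiply-based quotient differs from n / 10. */
uint64_t count_div10_mismatches(uint32_t first, uint32_t last, uint32_t step)
{
    assert(step != 0 && first <= last);
    uint64_t mismatches = 0;
    uint32_t n = first;
    for (;;) {
        if (div10_by_multiply(n) != n / 10)
            mismatches++;
        if (last - n < step)   /* stop before n + step would pass last */
            break;
        n += step;
    }
    return mismatches;
}
```

Calling count_div10_mismatches(0, UINT32_MAX, 1) checks every 32-bit input; a larger step trades coverage for run time. At 128 bits an exhaustive sweep is not feasible, which is why targeted test cases and a bug-report invitation stand in for a proof.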
File vec_int512_ppc.h: Currently the dynamic resolvers and IFUNC symbols for vec_int512_runtime.c are contained within vec_runtime_DYN.c. As the list of runtime operations expands to other element sizes/types, vec_runtime_DYN.c should be refactored into multiple files.