Optimize popcount functions with ARM Neon intrinsics.
author Nathan Bossart <nathan@postgresql.org>
Fri, 28 Mar 2025 19:49:35 +0000 (14:49 -0500)
committer Nathan Bossart <nathan@postgresql.org>
Fri, 28 Mar 2025 19:49:35 +0000 (14:49 -0500)
commit 6be53c27673a5fca64a00a684c36c29db6ca33a5
tree 6631906bc69ffa8ec6404be8851d37b664e225f8
parent 51a0382e8d8793b5cc89b69285e5ecdffe03c2bf

This commit introduces Neon implementations of pg_popcount{32,64},
pg_popcount(), and pg_popcount_masked().  As in simd.h, we assume
that all available AArch64 hardware supports Neon, so we don't need
any new configure-time or runtime checks.  Some compilers already
emit Neon instructions for these functions, but our hand-rolled
implementations for pg_popcount() and pg_popcount_masked()
performed better in testing, likely due to better instruction-level
parallelism.
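
For readers unfamiliar with the technique, the following is a minimal sketch of a Neon-based buffer popcount, not the code added by this commit. The function name sketch_popcount_neon, the 32-byte stride, and the use of two accumulators are all assumptions chosen for illustration. The non-AArch64 path exists only so the sketch compiles everywhere; there the scalar loop handles the whole buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/*
 * Hypothetical helper (not part of the commit): count the set bits in
 * buf[0 .. len-1].
 */
static uint64_t
sketch_popcount_neon(const unsigned char *buf, size_t len)
{
	uint64_t	total = 0;

	assert(buf != NULL || len == 0);

#if defined(__aarch64__)
	{
		/*
		 * Two independent accumulators (an assumption of this sketch) let
		 * the CPU overlap the two dependency chains.
		 */
		uint64x2_t	acc0 = vdupq_n_u64(0);
		uint64x2_t	acc1 = vdupq_n_u64(0);

		while (len >= 32)
		{
			/* vcntq_u8 yields the per-byte bit count of each lane */
			uint8x16_t	cnt0 = vcntq_u8(vld1q_u8(buf));
			uint8x16_t	cnt1 = vcntq_u8(vld1q_u8(buf + 16));

			/* widen pairwise 8 -> 16 -> 32 bits, then add into 64-bit lanes */
			acc0 = vpadalq_u32(acc0, vpaddlq_u16(vpaddlq_u8(cnt0)));
			acc1 = vpadalq_u32(acc1, vpaddlq_u16(vpaddlq_u8(cnt1)));

			buf += 32;
			len -= 32;
		}

		/* horizontal sum of both accumulators */
		total = vaddvq_u64(vaddq_u64(acc0, acc1));
	}
#endif

	/* scalar tail (or the whole buffer on non-AArch64 builds) */
	while (len > 0)
	{
		unsigned char b = *buf++;

		/* clear the lowest set bit until none remain */
		while (b != 0)
		{
			b &= (unsigned char) (b - 1);
			total++;
		}
		len--;
	}

	return total;
}
```

A masked variant along the lines of pg_popcount_masked() could presumably AND each loaded vector with a broadcast of the mask byte (for example via vdupq_n_u8) before counting, and apply the same mask in the scalar tail.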

Author: "Chiranmoy.Bhattacharya@fujitsu.com" <Chiranmoy.Bhattacharya@fujitsu.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazonses.com
src/include/port/pg_bitutils.h
src/port/Makefile
src/port/meson.build
src/port/pg_bitutils.c
src/port/pg_popcount_aarch64.c [new file with mode: 0644]