Optimize popcount functions with ARM SVE intrinsics.
author     Nathan Bossart <nathan@postgresql.org>
           Fri, 28 Mar 2025 21:20:20 +0000 (16:20 -0500)
committer  Nathan Bossart <nathan@postgresql.org>
           Fri, 28 Mar 2025 21:20:20 +0000 (16:20 -0500)
commit     519338ace410d9b1ffb13176b8802b0307ff0531
tree       cef689c0b92e9678b1b5cf0110b0ba3a37c8ebe0
parent     3c8e463b0d885e0d976f6a13a1fb78187b25c86f

This commit introduces SVE implementations of pg_popcount{32,64}.
Unlike the Neon versions, we need an additional configure-time
check to determine if the compiler supports SVE intrinsics, and we
need a runtime check to determine if the current CPU supports SVE
instructions.  Our testing showed that the SVE implementations are
much faster for larger inputs and are comparable to the status
quo for smaller inputs.
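The two checks described above (compiler support for SVE intrinsics, then CPU support at run time) can be illustrated with a minimal sketch of a buffer popcount that picks its implementation on first call. This is not the code from this commit. All function names here (popcount_buf, popcount_sve, popcount_portable, cpu_has_sve) are hypothetical. The sketch also makes several simplifying assumptions. The `__ARM_FEATURE_SVE` preprocessor guard stands in for the configure-time check, so the SVE path only exists when the file is built with something like `-march=armv8-a+sve`. The runtime check assumes Linux on AArch64 with glibc's `getauxval(AT_HWCAP)` and the `HWCAP_SVE` bit. Dispatch goes through a function pointer that is resolved on the first call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>               /* getauxval(), AT_HWCAP */
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22)       /* Linux arm64 hwcap bit for SVE */
#endif
#endif

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#define HAVE_SVE_INTRINSICS 1       /* stand-in for a configure-time result */
#endif

/* Runtime check: does the CPU we are running on report SVE support? */
bool
cpu_has_sve(void)
{
#if defined(__aarch64__) && defined(__linux__)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return false;
#endif
}

/* Portable fallback: count set bits one byte at a time. */
uint64_t
popcount_portable(const char *buf, size_t bytes)
{
    uint64_t total = 0;

    for (size_t i = 0; i < bytes; i++)
    {
        unsigned int b = (unsigned char) buf[i];

        while (b)
        {
            b &= b - 1;             /* clear the lowest set bit */
            total++;
        }
    }
    return total;
}

#ifdef HAVE_SVE_INTRINSICS
/* SVE path: a predicate covers the tail, so no scalar cleanup loop. */
uint64_t
popcount_sve(const char *buf, size_t bytes)
{
    const uint8_t *p = (const uint8_t *) buf;
    uint64_t total = 0;

    for (size_t i = 0; i < bytes; i += svcntb())
    {
        svbool_t pg = svwhilelt_b8_u64((uint64_t) i, (uint64_t) bytes);
        svuint8_t v = svld1_u8(pg, p + i);      /* inactive lanes load as 0 */
        svuint8_t c = svcnt_u8_z(pg, v);        /* per-byte popcount */

        total += svaddv_u8(pg, c);              /* horizontal sum to uint64_t */
    }
    return total;
}
#endif

typedef uint64_t (*popcount_fn) (const char *buf, size_t bytes);

static uint64_t popcount_choose(const char *buf, size_t bytes);
static popcount_fn popcount_impl = popcount_choose;

/* First call: pick an implementation, remember it, then run it. */
static uint64_t
popcount_choose(const char *buf, size_t bytes)
{
    popcount_impl = popcount_portable;
#ifdef HAVE_SVE_INTRINSICS
    if (cpu_has_sve())
        popcount_impl = popcount_sve;
#endif
    return popcount_impl(buf, bytes);
}

uint64_t
popcount_buf(const char *buf, size_t bytes)
{
    return popcount_impl(buf, bytes);
}
```

After the first call, every later call goes straight through the resolved pointer, so the runtime check is paid only once. For brevity, the SVE loop reduces to a scalar on every iteration. A tuned version would more likely keep running totals in vector registers and reduce once at the end.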

Author: "Devanga.Susmitha@fujitsu.com" <Devanga.Susmitha@fujitsu.com>
Co-authored-by: "Chiranmoy.Bhattacharya@fujitsu.com" <Chiranmoy.Bhattacharya@fujitsu.com>
Co-authored-by: "Malladi, Rama" <ramamalladi@hotmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Discussion: https://postgr.es/m/010101936e4aaa70-b474ab9e-b9ce-474d-a3ba-a3dc223d295c-000000%40us-west-2.amazonses.com
Discussion: https://postgr.es/m/OSZPR01MB84990A9A02A3515C6E85A65B8B2A2%40OSZPR01MB8499.jpnprd01.prod.outlook.com
config/c-compiler.m4
configure
configure.ac
meson.build
src/include/pg_config.h.in
src/include/port/pg_bitutils.h
src/port/pg_popcount_aarch64.c