Describe the bug
I am experiencing a numerical stability issue when running ABACUS compiled with the AMD AOCC/AOCL toolchain. For a specific input file, the calculation fails during the SCF cycle with NaN (Not a Number) values appearing in the charge density, leading to a Charge_Mixing factorization error.
Crucially, the exact same input file and MPI configuration run successfully when using an ABACUS executable compiled with the standard GCC and OpenBLAS toolchain. This suggests the issue is related to subtle differences in floating-point behavior or optimization between the two compiler/library ecosystems.
The error message from the AOCC/AOCL build is:
...
charge before normalized = -nan
charge after normalized = -nan
...
Charge_Mixing warning : Error when factorizing beta.
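To help narrow down where the NaN first appears in the output, here is a minimal sketch of a log scan. It assumes the log contains `label = value` lines like the ones quoted above; the helper name `find_first_nan` is hypothetical and is not part of ABACUS.

```python
import math
import re

# Matches "<label> = <value>" lines such as "charge before normalized = -nan".
_ASSIGNMENT = re.compile(r"^\s*(?P<label>.+?)\s*=\s*(?P<value>\S+)\s*$")


def find_first_nan(lines):
    """Return (line_number, label) for the first 'label = value' line whose
    value parses as NaN, or None if no such line exists.

    Hypothetical helper for illustration only; not an ABACUS API.
    """
    for number, line in enumerate(lines, start=1):
        match = _ASSIGNMENT.match(line)
        if match is None:
            continue  # not a "label = value" line
        try:
            value = float(match.group("value"))  # float() accepts "nan" and "-nan"
        except ValueError:
            continue  # right-hand side is not numeric
        if math.isnan(value):
            return number, match.group("label")
    return None


# Sample lines copied from the error message in this report.
sample = [
    "charge before normalized = -nan",
    "charge after normalized = -nan",
    "Charge_Mixing warning : Error when factorizing beta.",
]
result = find_first_nan(sample)
```

Running the same scan over the full log of the failing build (read with `open(path).readlines()`) would show which reported quantity becomes NaN first, which may help tell a bad diagonalization result apart from a later mixing problem.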
=========Environment Details===========
ABACUS Version: 3.10.0-LTS
CPU Architecture: AMD EPYC (Zen architecture)
Toolchain 1 (Fails):
C/C++ Compiler: AOCC 5.0.0 (clang)
Fortran Compiler: gfortran 11.5.0
MPI: OpenMPI 5.0.3 (compiled with AOCC)
Math Libraries: AOCL 5.0.0 (libblis, libflame, libscalapack)
ELPA: 2025.01.001 (compiled with AOCC/gfortran + AOCL)
Toolchain 2 (Succeeds):
Compiler: GCC 13.2.0
MPI: OpenMPI (compiled with GCC)
Math Libraries: OpenBLAS, ScaLAPACK, Libxc, FFTW3, ELPA (all built from source with GCC)
Expected behavior
No response
To Reproduce
No response
Environment
No response
Additional Context
No response
Task list for Issue attackers (only for developers)