Describe the bug
During testing of GPU parallel efficiency, I observed that for large-scale tasks, enabling out_chg significantly increases the SCF runtime compared with disabling it. The specific timing data is shown in the figure below.
The screen output shows that the extra time is spent in the final SCF step (174.86 s for CG6, versus about 10 s for each earlier step). With out_chg enabled, an SCF run with 12,800 random orbital samples on 64 GPUs gave the following output:
================================================================
SELF-CONSISTENT:
DONE(17.2895 SEC) : INIT SCF
ITER ETOT/eV EDIFF/eV DRHO TIME/s
CG1 -2.15838992e+05 0.00000000e+00 1.6106e+02 10.73
CG2 -2.15787655e+05 5.13377187e+01 1.2265e+01 10.25
CG3 -2.15794070e+05 -6.41501186e+00 6.2074e-03 10.28
CG4 -2.15794082e+05 -1.26070361e-02 3.4884e-05 10.31
CG5 -2.15794082e+05 -5.26812500e-05 1.6591e-06 10.25
CG6 -2.15794082e+05 2.61236538e-06 2.1107e-09 174.86
With out_chg disabled, the SCF run with 12,800 random orbital samples on 64 GPUs gave the following output. Here the final step (CG6) takes 10.07 s, in line with the other steps:
================================================================
SELF-CONSISTENT:
DONE(17.3907 SEC) : INIT SCF
ITER ETOT/eV EDIFF/eV DRHO TIME/s
CG1 -2.15837842e+05 0.00000000e+00 1.6108e+02 10.68
CG2 -2.15786452e+05 5.13891162e+01 1.2267e+01 10.16
CG3 -2.15792861e+05 -6.40888612e+00 6.2055e-03 10.17
CG4 -2.15792874e+05 -1.26498832e-02 3.4873e-05 10.22
CG5 -2.15792874e+05 -4.03126567e-05 1.6631e-06 10.22
CG6 -2.15792874e+05 -7.28789952e-06 2.1194e-09 10.07
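To put a number on the overhead in the two logs above, here is a minimal Python sketch that parses the TIME/s column and flags steps that are much slower than the median step. It is not part of ABACUS: the function names `parse_scf_times` and `slow_iterations` and the 3x-median threshold are illustrative assumptions, and the parser only assumes the five-column row layout shown in the screen output.

```python
import statistics

# Screen output copied from the two runs above.
LOG_OUT_CHG_ON = """\
ITER ETOT/eV EDIFF/eV DRHO TIME/s
CG1 -2.15838992e+05 0.00000000e+00 1.6106e+02 10.73
CG2 -2.15787655e+05 5.13377187e+01 1.2265e+01 10.25
CG3 -2.15794070e+05 -6.41501186e+00 6.2074e-03 10.28
CG4 -2.15794082e+05 -1.26070361e-02 3.4884e-05 10.31
CG5 -2.15794082e+05 -5.26812500e-05 1.6591e-06 10.25
CG6 -2.15794082e+05 2.61236538e-06 2.1107e-09 174.86
"""

LOG_OUT_CHG_OFF = """\
ITER ETOT/eV EDIFF/eV DRHO TIME/s
CG1 -2.15837842e+05 0.00000000e+00 1.6108e+02 10.68
CG2 -2.15786452e+05 5.13891162e+01 1.2267e+01 10.16
CG3 -2.15792861e+05 -6.40888612e+00 6.2055e-03 10.17
CG4 -2.15792874e+05 -1.26498832e-02 3.4873e-05 10.22
CG5 -2.15792874e+05 -4.03126567e-05 1.6631e-06 10.22
CG6 -2.15792874e+05 -7.28789952e-06 2.1194e-09 10.07
"""


def parse_scf_times(log_text):
    """Return [(iteration_label, seconds), ...] from SCF table rows."""
    rows = []
    for line in log_text.splitlines():
        fields = line.split()
        # Data rows look like: CG<n> <ETOT> <EDIFF> <DRHO> <TIME>
        if len(fields) == 5 and fields[0].startswith("CG"):
            rows.append((fields[0], float(fields[4])))
    return rows


def slow_iterations(rows, factor=3.0):
    """Return (label, seconds, excess over median) for steps slower than factor * median."""
    median = statistics.median(t for _, t in rows)
    return [(label, t, t - median) for label, t in rows if t > factor * median]


for name, log in (("out_chg on", LOG_OUT_CHG_ON), ("out_chg off", LOG_OUT_CHG_OFF)):
    flagged = slow_iterations(parse_scf_times(log))
    if not flagged:
        print(f"{name}: no outlier steps")
    for label, seconds, excess in flagged:
        print(f"{name}: {label} took {seconds:.2f} s ({excess:.2f} s above the median step)")
```

For the logs above, this flags only CG6 in the run with out_chg enabled, about 165 s above the median step time of roughly 10.3 s, and flags nothing in the run with out_chg disabled.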
Expected behavior
To Reproduce
No response
Environment
No response
Additional Context
No response
Task list for Issue attackers (only for developers)