2022-07-09 Performance comparison of CPU SIMD/AVX parallel instruction sets

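Contents

Abstract:

Test environment:

Operating system:

CPU:

GCC version:

GCC optimization level:

Test code:

Test approach:

1. Code to generate the test data file:

2. Character matching search over the data file:

2.1 SIMD only

2.2 With AVX

makefile:

Test results:

1. SIMD comparison

2. AVX comparison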


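Abstract:

This post benchmarks the CPU's SIMD instruction sets (128-bit SSE and 256-bit AVX2) against a plain scalar loop, to provide data for later optimization work.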

Intel® Intrinsics Guide

Test environment:

Operating system:

Linux fedora 5.17.5-300.fc36.x86_64 #1 SMP PREEMPT Thu Apr 28 15:51:30 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

CPU:

[root@fedora simd]# cat /proc/cpuinfo 
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 25
model		: 80
model name	: AMD Ryzen 7 5800H with Radeon Graphics
stepping	: 0
microcode	: 0xffffffff
cpu MHz		: 3193.890
cache size	: 512 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero arat umip vaes vpclmulqdq rdpid overflow_recov succor fsrm
bugs		: fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
bogomips	: 6387.78
TLB size	: 2560 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 45 bits physical, 48 bits virtual
power management:

GCC version:

[root@fedora simd]# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --enable-libstdcxx-backtrace --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl=/builddir/build/BUILD/gcc-12.1.1-20220507/obj-x86_64-redhat-linux/isl-install --enable-offload-targets=nvptx-none --without-cuda-driver --enable-offload-defaulted --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux --with-build-config=bootstrap-lto --enable-link-serialization=1
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.1.1 20220507 (Red Hat 12.1.1-1) (GCC) 

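GCC optimization level:

  1.  All measured programs are compiled with -O2

x86 test code:

Code backup: 

simd.tar.gz (C++ documentation resource, CSDN download)

Test approach:

Search a data file for a matching character and compare the elapsed time, in milliseconds, of three methods:

  1.  Plain byte-by-byte scan for the matching character, simulating the traversal query of an existing database
  2. 128-bit SIMD (SSE2) intrinsics
  3. 256-bit AVX2 intrinsics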

1. Code to generate the test data file:

The goal is to generate a text data file.

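#include <cstdint>
#include <iostream>
#include <random>
#include <fstream>

using namespace std;

// Writes file_len pseudo-random values to ./test_file.
// Note: operator<< formats e() % 128 as decimal text, so each iteration
// appends 1 to 3 digit characters ('0'..'9'), not one raw byte.
void RandCharFile(const uint32_t file_len) {
    default_random_engine e;  // default-seeded, so repeated runs write the same file
    ofstream ofstr("./test_file");
    for (uint32_t i = 0; i < file_len; ++i) {
       ofstr << e() % 128;
    }
    ofstr.close();
}

int main() {
    RandCharFile(1024 * 1024 * 1024);

    return 0;
}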

2. Character matching search over the data file:

2.1 SIMD only

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <x86intrin.h>
#include <fstream>
#include <chrono>

// Per-function target attributes: enable the listed instruction sets for one function only
#define AVX512_FUNCTION_SPECIFIC_ATTRIBUTE __attribute__((target("sse,sse2,sse3,ssse3,sse4,popcnt,avx,avx2,avx512f")))
#define AVX2_FUNCTION_SPECIFIC_ATTRIBUTE __attribute__((target("sse,sse2,sse3,ssse3,sse4,popcnt,avx,avx2")))
#define AVX_FUNCTION_SPECIFIC_ATTRIBUTE __attribute__((target("sse,sse2,sse3,ssse3,sse4,popcnt,avx")))
#define SSE42_FUNCTION_SPECIFIC_ATTRIBUTE __attribute__((target("sse,sse2,sse3,ssse3,sse4,popcnt")))

using namespace std;

struct StringView {
    const char* p;
    const size_t len;
};

StringView FileSize(const char* fileName) {
    ifstream ifstr(fileName);
    const auto b = ifstr.tellg();
    ifstr.seekg(0, ios::end);
    const auto e = ifstr.tellg();
    const size_t fileSize = e - b;
    ifstr.seekg(0, ios::beg);
    char *p = new char[fileSize];
    ifstr.read(p, fileSize);
    return {p, fileSize};
}

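// Scalar baseline: compare one byte per iteration
size_t count_c_normal(const StringView& str, const uint8_t c) {
    size_t num = 0;  // size_t so the counter and index cannot wrap on inputs of 4 GiB or more
    for (size_t i = 0; i < str.len; ++i) {
        if (c == static_cast<uint8_t>(str.p[i])) {
            ++num;
        }
    }
    return num;
}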

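// 128-bit SIMD (SSE2) function: compare 16 bytes per iteration.
// Only SSE-level intrinsics are used, so the SSE4.2 target attribute is sufficient.
SSE42_FUNCTION_SPECIFIC_ATTRIBUTE size_t count_c_simd(const StringView& str, const uint8_t c) {
    __m128i ch = _mm_set1_epi8(c); // char ch[16] = { c, c, ..., c }
    size_t cnt = 0;
    size_t i = 0;
    // Process full 16-byte blocks only, so the load never reads past the buffer
    for (; i + 16 <= str.len; i += 16) {
        // char t[16] = { (str+i)[0], (str+i)[1], ... }
        __m128i t = _mm_loadu_si128((const __m128i *)(str.p + i));
        __m128i res = _mm_cmpeq_epi8(t, ch);

        // res[16] = { 0xFF, 0x00, 0xFF ... }
        unsigned mask = _mm_movemask_epi8(res);

        // mask holds one bit per byte lane, e.g. 0...1101
        cnt += __builtin_popcount(mask);
    }

    // Count the remaining tail bytes (fewer than 16)
    for (; i < str.len; ++i) {
        if (c == static_cast<uint8_t>(str.p[i]))
        {
            ++cnt;
        }
    }
    return cnt;
}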

int main() {
    const auto ret = FileSize("./test_file");
    size_t cnt1 = 0, cnt2 = 0;
    
    const auto t1 = std::chrono::steady_clock::now();
    cnt1 = count_c_normal(ret, uint8_t('1'));

    const auto t2 = std::chrono::steady_clock::now();
    cnt2 = count_c_simd(ret, uint8_t('1'));
    
    const auto t3 = std::chrono::steady_clock::now();

    std::cout << "cnt1: " << cnt1 << std::endl;
    std::cout << "cnt2: " << cnt2 << std::endl;

    std::cout << "-----------" << std::endl;

    const auto d1 = std::chrono::duration_cast<std::chrono::milliseconds>(t2-t1).count();
    const auto d2 = std::chrono::duration_cast<std::chrono::milliseconds>(t3-t2).count();
    
    std::cout << "NORMAL: " << d1 << std::endl;
    std::cout << "SIMD: " << d2 << std::endl;
    
    return 0;
}

2.2 With AVX

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <x86intrin.h>
#include <fstream>
#include <chrono>

// Per-function target attributes: enable the listed instruction sets for one function only
#define AVX512_FUNCTION_SPECIFIC_ATTRIBUTE __attribute__((target("sse,sse2,sse3,ssse3,sse4,popcnt,avx,avx2,avx512f")))
#define AVX2_FUNCTION_SPECIFIC_ATTRIBUTE __attribute__((target("sse,sse2,sse3,ssse3,sse4,popcnt,avx,avx2")))
#define AVX_FUNCTION_SPECIFIC_ATTRIBUTE __attribute__((target("sse,sse2,sse3,ssse3,sse4,popcnt,avx")))
#define SSE42_FUNCTION_SPECIFIC_ATTRIBUTE __attribute__((target("sse,sse2,sse3,ssse3,sse4,popcnt")))

using namespace std;

struct StringView {
    const char* p;
    const size_t len;
};

StringView FileSize(const char* fileName) {
    ifstream ifstr(fileName);
    const auto b = ifstr.tellg();
    ifstr.seekg(0, ios::end);
    const auto e = ifstr.tellg();
    const size_t fileSize = e - b;
    ifstr.seekg(0, ios::beg);
    char *p = new char[fileSize];
    ifstr.read(p, fileSize);
    return {p, fileSize};
}

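// Scalar baseline: compare one byte per iteration
size_t count_c_normal(const StringView& str, const uint8_t c) {
    size_t num = 0;  // size_t so the counter and index cannot wrap on inputs of 4 GiB or more
    for (size_t i = 0; i < str.len; ++i) {
        if (c == static_cast<uint8_t>(str.p[i])) {
            ++num;
        }
    }
    return num;
}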

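// 128-bit SIMD (SSE2) function: compare 16 bytes per iteration.
// Only SSE-level intrinsics are used, so the SSE4.2 target attribute is sufficient.
SSE42_FUNCTION_SPECIFIC_ATTRIBUTE size_t count_c_simd(const StringView& str, const uint8_t c) {
    __m128i ch = _mm_set1_epi8(c); // char ch[16] = { c, c, ..., c }
    size_t cnt = 0;
    size_t i = 0;
    // Process full 16-byte blocks only, so the load never reads past the buffer
    for (; i + 16 <= str.len; i += 16) {
        // char t[16] = { (str+i)[0], (str+i)[1], ... }
        __m128i t = _mm_loadu_si128((const __m128i *)(str.p + i));
        __m128i res = _mm_cmpeq_epi8(t, ch);

        // res[16] = { 0xFF, 0x00, 0xFF ... }
        unsigned mask = _mm_movemask_epi8(res);

        // mask holds one bit per byte lane, e.g. 0...1101
        cnt += __builtin_popcount(mask);
    }

    // Count the remaining tail bytes (fewer than 16)
    for (; i < str.len; ++i) {
        if (c == static_cast<uint8_t>(str.p[i]))
        {
            ++cnt;
        }
    }
    return cnt;
}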

// 256-bit AVX2 function: compare 32 bytes per iteration
AVX2_FUNCTION_SPECIFIC_ATTRIBUTE size_t count_c_avx256(const StringView& str, const uint8_t c) {
    __m256i ch = _mm256_set1_epi8(c); // char ch[32] = { c, c, ..., c }
    size_t cnt = 0;
    size_t i = 0;
    // Process full 32-byte blocks only, so the load never reads past the buffer
    for (; i + 32 <= str.len; i += 32) {
        // char t[32] = { (str+i)[0], (str+i)[1], ... }
        __m256i t = _mm256_loadu_si256((const __m256i *)(str.p + i));
        __m256i res = _mm256_cmpeq_epi8(t, ch);

        // res[32] = { 0xFF, 0x00, 0xFF ... }
        unsigned mask = _mm256_movemask_epi8(res);

        // mask holds one bit per byte lane (32 bits), e.g. 0...1101
        cnt += __builtin_popcount(mask);
    }

    // Count the remaining tail bytes (fewer than 32)
    for (; i < str.len; ++i) {
        if (c == static_cast<uint8_t>(str.p[i]))
        {
            ++cnt;
        }
    }
    return cnt;
}

int main() {
    const auto ret = FileSize("./test_file");
    size_t cnt1 = 0, cnt2 = 0, cnt3 = 0;
    const auto t1 = std::chrono::steady_clock::now();
    cnt1 = count_c_normal(ret, uint8_t('1'));
    const auto t2 = std::chrono::steady_clock::now();
    cnt2 = count_c_simd(ret, uint8_t('1'));
    const auto t3 = std::chrono::steady_clock::now();
    cnt3 = count_c_avx256(ret, uint8_t('1'));
    const auto t4 = std::chrono::steady_clock::now();

    std::cout << "cnt1: " << cnt1 << std::endl;
    std::cout << "cnt2: " << cnt2 << std::endl;
    std::cout << "cnt3: " << cnt3 << std::endl;

    std::cout << "-----------" << std::endl;

    const auto d1 = std::chrono::duration_cast<std::chrono::milliseconds>(t2-t1).count();
    const auto d2 = std::chrono::duration_cast<std::chrono::milliseconds>(t3-t2).count();
    const auto d3 = std::chrono::duration_cast<std::chrono::milliseconds>(t4-t3).count();

    std::cout << "NORMAL: " << d1 << std::endl;
    std::cout << "SIMD: " << d2 << std::endl;
    std::cout << "AVX: " << d3 << std::endl;
        
    return 0;
}

makefile:


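# Note: -mavx512f lets the compiler emit AVX-512 instructions anywhere in the file,
# while the CPU flags listed above include avx2 but no avx512f.
.PHONY: all clean

all: t01 t02 t03 t04

t01: t01.cc
	g++ -ggdb3 -o t01 -mavx512f -O2 t01.cc

t02: t02.cc
	g++ -ggdb3 -o t02 -mavx512f -O2 t02.cc

t03: t03.cc
	g++ -ggdb3 -o t03 -mavx512f -O0 t03.cc

t04: t04.cc
	g++ -ggdb3 -o t04 -mavx -mavx2 -O2 t04.cc


clean:
	rm -rf t01 t02 t03 t04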

Test results:

1. SIMD comparison

[root@fedora simd]# ./t02
cnt1: 511699574
cnt2: 511699574
-----------
NORMAL: 4569
SIMD: 113

2. AVX comparison

[root@fedora simd]# ./t04
cnt1: 511699574
cnt2: 511699574
cnt3: 511699574
-----------
NORMAL: 4536
SIMD: 114
AVX: 97
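
From these timings, the speedup of each vectorized version over the scalar loop can be worked out directly. The arithmetic below uses only the millisecond figures printed by t04 and reflects a single run on this machine:

```latex
\begin{aligned}
\text{SIMD (128-bit) vs. NORMAL} &= \frac{4536}{114} \approx 39.8 \\
\text{AVX (256-bit) vs. NORMAL}  &= \frac{4536}{97}  \approx 46.8 \\
\text{AVX vs. SIMD}              &= \frac{114}{97}   \approx 1.18
\end{aligned}
```

Doubling the vector width from 128 to 256 bits gives well under a 2x gain here. One possible explanation is that the loop is limited by memory bandwidth rather than by compare throughput, but this test does not measure that directly.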

ARM test code:

/*
* Copyright (C) Arm Limited, 2019 All rights reserved.
*
* The example code is provided to you as an aid to learning when working
* with Arm-based technology, including but not limited to programming tutorials.
* Arm hereby grants to you, subject to the terms and conditions of this Licence,
* a non-exclusive, non-transferable, non-sub-licensable, free-of-charge licence,
* to use and copy the Software solely for the purpose of demonstration and
* evaluation.
*
* You accept that the Software has not been tested by Arm therefore the Software
* is provided "as is", without warranty of any kind, express or implied. In no
* event shall the authors or copyright holders be liable for any claim, damages
* or other liability, whether in action or contract, tort or otherwise, arising
* from, out of or in connection with the Software or the use of Software.
*/
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdbool.h>
#include <math.h>
#include <sys/time.h>

#include <arm_neon.h>

#define BLOCK_SIZE 4
void matrix_multiply_c(float32_t *A, float32_t *B, float32_t *C, uint32_t n, uint32_t m, uint32_t k) {
  for (int i_idx=0; i_idx<n; i_idx++) {
    for (int j_idx=0; j_idx<m; j_idx++) {
      C[n*j_idx + i_idx] = 0;
      for (int k_idx=0; k_idx<k; k_idx++) {
        C[n*j_idx + i_idx] += A[n*k_idx + i_idx]*B[k*j_idx + k_idx];
      }
    }
  }
}
void matrix_multiply_neon(float32_t  *A, float32_t  *B, float32_t *C, uint32_t n, uint32_t m, uint32_t k) {
  /*
  * Multiply matrices A and B, store the result in C.
  * It is the user's responsibility to make sure the matrices are compatible.
  */
  int A_idx;
  int B_idx;
  int C_idx;
  // these are the columns of a 4x4 sub matrix of A
  float32x4_t A0;
  float32x4_t A1;
  float32x4_t A2;
  float32x4_t A3;
  // these are the columns of a 4x4 sub matrix of B
  float32x4_t B0;
  float32x4_t B1;
  float32x4_t B2;
  float32x4_t B3;
  // these are the columns of a 4x4 sub matrix of C
  float32x4_t C0;
  float32x4_t C1;
  float32x4_t C2;
  float32x4_t C3;
  for (int i_idx=0; i_idx<n; i_idx+=4) {
    for (int j_idx=0; j_idx<m; j_idx+=4) {
      // Zero accumulators before matrix op
      C0 = vmovq_n_f32(0);
      C1 = vmovq_n_f32(0);
      C2 = vmovq_n_f32(0);
      C3 = vmovq_n_f32(0);
      for (int k_idx=0; k_idx<k; k_idx+=4) {
        // Compute base index to 4x4 block
        A_idx = i_idx + n*k_idx;
        B_idx = k*j_idx + k_idx;
        // Load most current A values in row
        A0 = vld1q_f32(A+A_idx);
        A1 = vld1q_f32(A+A_idx+n);
        A2 = vld1q_f32(A+A_idx+2*n);
        A3 = vld1q_f32(A+A_idx+3*n);
        // multiply-accumulate in 4x1 blocks, i.e. each column in C
        B0 = vld1q_f32(B+B_idx);
        C0 = vfmaq_laneq_f32(C0, A0, B0, 0);
        C0 = vfmaq_laneq_f32(C0, A1, B0, 1);
        C0 = vfmaq_laneq_f32(C0, A2, B0, 2);
        C0 = vfmaq_laneq_f32(C0, A3, B0, 3);
        B1 = vld1q_f32(B+B_idx+k);
        C1 = vfmaq_laneq_f32(C1, A0, B1, 0);
        C1 = vfmaq_laneq_f32(C1, A1, B1, 1);
        C1 = vfmaq_laneq_f32(C1, A2, B1, 2);
        C1 = vfmaq_laneq_f32(C1, A3, B1, 3);
        B2 = vld1q_f32(B+B_idx+2*k);
        C2 = vfmaq_laneq_f32(C2, A0, B2, 0);
        C2 = vfmaq_laneq_f32(C2, A1, B2, 1);
        C2 = vfmaq_laneq_f32(C2, A2, B2, 2);
        C2 = vfmaq_laneq_f32(C2, A3, B2, 3);
        B3 = vld1q_f32(B+B_idx+3*k);
        C3 = vfmaq_laneq_f32(C3, A0, B3, 0);
        C3 = vfmaq_laneq_f32(C3, A1, B3, 1);
        C3 = vfmaq_laneq_f32(C3, A2, B3, 2);
        C3 = vfmaq_laneq_f32(C3, A3, B3, 3);
      }
      // Compute base index for stores
      C_idx = n*j_idx + i_idx;
      vst1q_f32(C+C_idx, C0);
      vst1q_f32(C+C_idx+n, C1);
      vst1q_f32(C+C_idx+2*n, C2);
      vst1q_f32(C+C_idx+3*n, C3);
    }
  }
}
void matrix_multiply_4x4_neon(float32_t *A, float32_t *B, float32_t *C) {
  // these are the columns A
  float32x4_t A0;
  float32x4_t A1;
  float32x4_t A2;
  float32x4_t A3;
  // these are the columns B
  float32x4_t B0;
  float32x4_t B1;
  float32x4_t B2;
  float32x4_t B3;
  // these are the columns C
  float32x4_t C0;
  float32x4_t C1;
  float32x4_t C2;
  float32x4_t C3;
  A0 = vld1q_f32(A);
  A1 = vld1q_f32(A+4);
  A2 = vld1q_f32(A+8);
  A3 = vld1q_f32(A+12);
  // Zero accumulators for C values
  C0 = vmovq_n_f32(0);
  C1 = vmovq_n_f32(0);
  C2 = vmovq_n_f32(0);
  C3 = vmovq_n_f32(0);
  // multiply-accumulate in 4x1 blocks, i.e. each column in C
  B0 = vld1q_f32(B);
  C0 = vfmaq_laneq_f32(C0, A0, B0, 0);
  C0 = vfmaq_laneq_f32(C0, A1, B0, 1);
  C0 = vfmaq_laneq_f32(C0, A2, B0, 2);
  C0 = vfmaq_laneq_f32(C0, A3, B0, 3);
  vst1q_f32(C, C0);
  B1 = vld1q_f32(B+4);
  C1 = vfmaq_laneq_f32(C1, A0, B1, 0);
  C1 = vfmaq_laneq_f32(C1, A1, B1, 1);
  C1 = vfmaq_laneq_f32(C1, A2, B1, 2);
  C1 = vfmaq_laneq_f32(C1, A3, B1, 3);
  vst1q_f32(C+4, C1);
  B2 = vld1q_f32(B+8);
  C2 = vfmaq_laneq_f32(C2, A0, B2, 0);
  C2 = vfmaq_laneq_f32(C2, A1, B2, 1);
  C2 = vfmaq_laneq_f32(C2, A2, B2, 2);
  C2 = vfmaq_laneq_f32(C2, A3, B2, 3);
  vst1q_f32(C+8, C2);
  B3 = vld1q_f32(B+12);
  C3 = vfmaq_laneq_f32(C3, A0, B3, 0);
  C3 = vfmaq_laneq_f32(C3, A1, B3, 1);
  C3 = vfmaq_laneq_f32(C3, A2, B3, 2);
  C3 = vfmaq_laneq_f32(C3, A3, B3, 3);
  vst1q_f32(C+12, C3);
}
void print_matrix(float32_t *M, uint32_t cols, uint32_t rows) {
  for (int i=0; i<rows; i++) {
    for (int j=0; j<cols; j++) {
      printf("%f ", M[j*rows + i]);
    }
    printf("\n");
  }
  printf("\n");
}
void matrix_init_rand(float32_t *M, uint32_t numvals) {
  for (int i=0; i<numvals; i++) {
    M[i] = (float)rand()/(float)(RAND_MAX);
  }
}
void matrix_init(float32_t *M, uint32_t cols, uint32_t rows, float32_t val) {
  for (int i=0; i<rows; i++) {
    for (int j=0; j<cols; j++) {
      M[j*rows + i] = val;
    }
  }
}
bool f32comp_noteq(float32_t a, float32_t b) {
  if (fabs(a-b) < 0.000001) {
    return false;
  }
  return true;
}
bool matrix_comp(float32_t *A, float32_t *B, uint32_t rows, uint32_t cols) {
  float32_t a;
  float32_t b;
  for (int i=0; i<rows; i++) {
    for (int j=0; j<cols; j++) {
      a = A[rows*j + i];
      b = B[rows*j + i];
      if (f32comp_noteq(a, b)) {
        printf("i=%d, j=%d, A=%f, B=%f\n", i, j, a, b);
        return false;
      }
    }
  }
  return true;
}
int main() {

  struct timeval tv;
  long long start_us = 0, end_us = 0;

  uint32_t n = 2*BLOCK_SIZE; // rows in A
  uint32_t m = 2*BLOCK_SIZE; // cols in B
  uint32_t k = 2*BLOCK_SIZE; // cols in a and rows in b
  float32_t A[n*k];
  float32_t B[k*m];
  float32_t C[n*m];
  float32_t D[n*m];
  float32_t E[n*m];
  bool c_eq_asm;
  bool c_eq_neon;
  matrix_init_rand(A, n*k);
  matrix_init_rand(B, k*m);
  matrix_init(C, n, m, 0);
  print_matrix(A, k, n);
  print_matrix(B, m, k);
  //print_matrix(C, n, m);

  gettimeofday(&tv, NULL);
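  // Convert to microseconds: tv_sec is in seconds, tv_usec in microseconds
  start_us = tv.tv_sec * 1000000LL + tv.tv_usec;

  for (int i=0; i<10000;++i) {

    matrix_multiply_c(A, B, E, n, m, k);
  }

  // Take a fresh timestamp; without this call end_us equals start_us and 0 is printed
  gettimeofday(&tv, NULL);
  end_us = tv.tv_sec * 1000000LL + tv.tv_usec;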
  printf("c time %lld us\n", end_us - start_us);

  printf("C\n");
  print_matrix(E, n, m);



  printf("===============================\n");

  gettimeofday(&tv, NULL);
  // Convert to microseconds: tv_sec is in seconds, tv_usec in microseconds
  start_us = tv.tv_sec * 1000000LL + tv.tv_usec;

  for (int i=0; i<10000;++i) {

    matrix_multiply_neon(A, B, D, n, m, k);

  }

  gettimeofday(&tv, NULL);
  end_us = tv.tv_sec * 1000000LL + tv.tv_usec;
  printf("neon time %lld us\n", end_us - start_us);

  printf("Neon\n");
  print_matrix(D, n, m);
  c_eq_neon = matrix_comp(E, D, n, m);
  printf("Neon equal to C? %d\n", c_eq_neon);
  printf("===============================\n");
}

[root@ecs-65bd /root/work/simd]$scl enable devtoolset-7 bash
[root@ecs-65bd simd]# 
[root@ecs-65bd simd]# make t06
gcc  -std=c99  -ggdb3 -o t06  -O3 t06.c

[root@ecs-65bd simd]# ./t06
0.840188 0.277775 0.635712 0.156679 0.612640 0.526745 0.069755 0.064171 
0.394383 0.553970 0.717297 0.400944 0.296032 0.769914 0.949327 0.020023 
0.783099 0.477397 0.141603 0.129790 0.637552 0.400229 0.525995 0.457702 
0.798440 0.628871 0.606969 0.108809 0.524287 0.891529 0.086056 0.063096 
0.911647 0.364784 0.016301 0.998924 0.493583 0.283315 0.192214 0.238280 
0.197551 0.513401 0.242887 0.218257 0.972775 0.352458 0.663227 0.970634 
0.335223 0.952230 0.137232 0.512932 0.292517 0.807725 0.890233 0.902208 
0.768230 0.916195 0.804177 0.839112 0.771358 0.919026 0.348893 0.850920 

0.266666 0.437638 0.687861 0.350360 0.398437 0.147660 0.447034 0.103171 
0.539760 0.931835 0.165974 0.686670 0.814767 0.881062 0.226107 0.126075 
0.375207 0.930810 0.440105 0.956468 0.684219 0.641081 0.187533 0.495444 
0.760249 0.720952 0.880075 0.588640 0.910972 0.431953 0.276235 0.760475 
0.512535 0.284293 0.829201 0.657304 0.482491 0.619596 0.556444 0.984752 
0.667724 0.738534 0.330337 0.858676 0.215825 0.281059 0.416501 0.935004 
0.531606 0.639979 0.228968 0.439560 0.950252 0.786002 0.169607 0.684445 
0.039280 0.354049 0.893372 0.923970 0.920128 0.307458 0.906804 0.383188 

c time 0 us
C
1.436942 1.961773 1.797011 2.130322 1.673388 1.446213 1.231207 1.723954 
2.149403 2.912940 1.766827 2.732131 2.694053 2.331492 1.211395 2.439645 
1.509920 1.988456 1.984648 2.230861 2.231070 1.744762 1.546138 1.847287 
1.775054 2.463739 1.821799 2.562441 1.930141 1.770547 1.377876 1.978081 
1.759245 2.031175 2.333687 2.045852 2.282789 1.507743 1.410338 1.881603 
1.711485 2.253254 2.462129 2.912960 2.930792 2.253012 1.990992 2.484794 
2.242769 3.100412 2.419667 3.315276 3.462130 2.583389 2.000890 2.611014 
2.866955 3.966020 3.556177 4.397132 4.052230 3.070745 2.575969 3.415067 

===============================
neon time 495 us
Neon
1.436942 1.961773 1.797011 2.130322 1.673388 1.446213 1.231207 1.723954 
2.149403 2.912940 1.766827 2.732131 2.694053 2.331492 1.211395 2.439645 
1.509921 1.988456 1.984648 2.230861 2.231070 1.744762 1.546138 1.847287 
1.775054 2.463739 1.821799 2.562441 1.930141 1.770547 1.377876 1.978081 
1.759245 2.031175 2.333687 2.045852 2.282789 1.507743 1.410338 1.881602 
1.711485 2.253254 2.462129 2.912960 2.930792 2.253012 1.990992 2.484794 
2.242769 3.100412 2.419667 3.315276 3.462130 2.583389 2.000890 2.611014 
2.866955 3.966020 3.556177 4.397132 4.052230 3.070745 2.575969 3.415067 

Neon equal to C? 1
===============================
