SSE/AVX/AVX2/AVX512

June 28, 2020 · 1 min · Gray King

tags: Computer Systems, C/C++, Optimization, High Performance

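Some Intel CPUs support vector (SIMD) instruction sets that perform several integer or floating-point operations at once, which can be used to speed up suitable algorithms. Related links are collected here:

Compilers provide intrinsics that wrap these instructions, so there is no need to write assembly by hand. Official guide: Intrinsics Guide
avx_mathfun, based on sse_mathfun, wraps the relevant macros and functions
SSE-accelerated implementation in the MP3 library lame: libmp3lame/vector/xmm_quantize_sub.c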
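
As a rough illustration of what "intrinsics instead of assembly" looks like, here is a minimal sketch that adds two float arrays four elements at a time with SSE. The function name `add_f32` is hypothetical and not taken from any of the linked projects; the intrinsics themselves (`_mm_loadu_ps`, `_mm_add_ps`, `_mm_storeu_ps`) are standard SSE ones from `<xmmintrin.h>`. The code falls back to a plain loop when the compiler does not define `__SSE__`.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* Element-wise out[i] = a[i] + b[i]. The name add_f32 is hypothetical. */
void add_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* Handle four floats per iteration in one 128-bit register. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail, and the full fallback when SSE is not available. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

The unaligned load/store variants are used so the sketch works on any pointer; aligned variants may be faster on some hardware but require 16-byte aligned buffers.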

AVX512 VNNI

https://en.wikichip.org/wiki/x86/avx512_vnni
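
For orientation, the core VNNI instruction `VPDPBUSD` computes, in each 32-bit lane, a dot product of four unsigned 8-bit values with four signed 8-bit values and adds it to a 32-bit accumulator. The following is only a scalar reference sketch of one lane, not hardware code; the function name `dpbusd_lane` is hypothetical. It models the non-saturating form, where the sum wraps on overflow.

```c
#include <stdint.h>

/* Scalar model of one 32-bit lane of VPDPBUSD (non-saturating form).
 * a holds four unsigned bytes, b holds four signed bytes.
 * The name dpbusd_lane is hypothetical. */
int32_t dpbusd_lane(int32_t acc, const uint8_t a[4], const int8_t b[4])
{
    /* Accumulate in unsigned arithmetic so overflow wraps instead of
     * being undefined behavior in C. */
    uint32_t sum = (uint32_t)acc;
    for (int k = 0; k < 4; ++k)
        sum += (uint32_t)((int32_t)a[k] * (int32_t)b[k]);
    return (int32_t)sum;
}
```

On hardware with AVX512 VNNI, a 512-bit register holds 16 such lanes, so one instruction performs 64 byte multiplications.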

Links to this note

Speeding up Cython with SIMD

tags: Python, SIMD, High Performance source: Turner-Trauring, Itamar. “Speeding up Cython with SIMD.” Python⇒Speed, October 18, 2023. https://pythonspeed.com/articles/faster-cython-simd/.

Fast bitset decoding using Intel AVX-512

tags: High Performance, SIMD source: Lemire, Daniel. “Fast Bitset Decoding Using Intel AVX-512.” Daniel Lemire’s Blog (blog). Accessed May 12, 2022. https://lemire.me/blog/2022/05/06/fast-bitset-decoding-using-intel-avx-512/.

Luhn algorithm using SWAR and SIMD

tags: SIMD, High Performance source: “Luhn Algorithm Using SWAR and SIMD.” Accessed May 5, 2022. https://nullprogram.com/blog/2022/04/30/. A 3x speedup after switching to SIMD.

Removing characters from strings faster with AVX-512

tags: SSE/AVX/AVX2/AVX512, High Performance source: Lemire, Daniel. “Removing Characters from Strings Faster with AVX-512.” Daniel Lemire’s Blog (blog). Accessed May 5, 2022. https://lemire.me/blog/2022/04/28/removing-characters-from-strings-faster-with-avx-512/. It is 21.25 times faster with AVX-512: from 0.4 GB/s to 8.5 GB/s.

Optimization
