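Some Intel CPUs support vector instruction sets that perform several integer or floating-point operations at once, which can be used to optimize suitable algorithms. Related links are collected here:
- Compilers provide intrinsics that wrap these instructions, so there is no need to write assembly by hand. Official guide: Intrinsics Guide
- avx_mathfun, based on sse_mathfun, wraps the related macros and functions
- SSE-accelerated implementation in the MP3 library lame: libmp3lame/vector/xmm_quantize_sub.c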
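
As a rough illustration of what intrinsics look like in practice, here is a minimal sketch that adds two float arrays four lanes at a time with SSE. The function name `add_f32` and the `HAVE_SSE` macro are hypothetical and are not taken from any of the linked projects. The intrinsics themselves (`_mm_loadu_ps`, `_mm_add_ps`, `_mm_storeu_ps`) come from `<xmmintrin.h>`. The sketch assumes an x86 target for the vector path and falls back to a plain scalar loop elsewhere.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, ... */
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n). */
void add_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
#if HAVE_SSE
    /* Process four floats per iteration using unaligned loads and stores. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail; this is also the whole loop when SSE is unavailable. */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

The unaligned load and store variants are used so that callers do not have to guarantee 16-byte alignment. With aligned buffers, `_mm_load_ps` and `_mm_store_ps` could be substituted.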
tags: Python, SIMD, High Performance source: Turner-Trauring, Itamar. “Speeding up Cython with SIMD.” Python⇒Speed, October 18, 2023. https://pythonspeed.com/articles/faster-cython-simd/.
tags: High Performance, SIMD source: Lemire, Daniel. “Fast Bitset Decoding Using Intel AVX-512.” Daniel Lemire’s Blog (blog). Accessed May 12, 2022. https://lemire.me/blog/2022/05/06/fast-bitset-decoding-using-intel-avx-512/.
tags: SIMD, High Performance source: “Luhn Algorithm Using SWAR and SIMD.” Accessed May 5, 2022. https://nullprogram.com/blog/2022/04/30/. A 3x speedup after using SIMD.
tags: SSE/AVX/AVX2/AVX512, High Performance source: Lemire, Daniel. “Removing Characters from Strings Faster with AVX-512.” Daniel Lemire’s Blog (blog). Accessed May 5, 2022. https://lemire.me/blog/2022/04/28/removing-characters-from-strings-faster-with-avx-512/. It is 21.25 times faster with AVX-512: 0.4 GB/s to 8.5 GB/s.
AVX512