- tags: Computer Systems
High Performance
Links to this note
BPF
tags: High Performance, Linux
Speeding up Cython with SIMD
tags: Python, SIMD, High Performance source: Turner-Trauring, Itamar. “Speeding up Cython with SIMD.” Python⇒Speed, October 18, 2023. https://pythonspeed.com/articles/faster-cython-simd/.
Overhead of Python asyncio Tasks: 260K/s
tags: Python, High Performance source: Textual Documentation. “Textual - Overhead of Python Asyncio Tasks,” March 8, 2023. https://textual.textualize.io/blog/2023/03/08/overhead-of-python-asyncio-tasks/. The full asyncio task lifecycle (create, run, then shut down) runs at about 260K tasks per second.
Fast bitset decoding using Intel AVX-512
tags: High Performance,SIMD source: Lemire, Daniel. “Fast Bitset Decoding Using Intel AVX-512.” Daniel Lemire’s Blog (blog). Accessed May 12, 2022. https://lemire.me/blog/2022/05/06/fast-bitset-decoding-using-intel-avx-512/.
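For context, here is the usual scalar way to decode a bitset into the positions of its set bits: find the lowest set bit with a count-trailing-zeros instruction, record it, clear it, repeat. This is a minimal sketch, not the AVX-512 code from the post; `decode_bitset` is a hypothetical name and `__builtin_ctzll` assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar bitset decoding: write the position of every set bit in `words`
   (nwords 64-bit words; bit 0 of words[0] is position 0) to `out` and
   return how many positions were written. `out` needs room for up to
   64 * nwords entries. __builtin_ctzll is a GCC/Clang builtin. */
size_t decode_bitset(const uint64_t *words, size_t nwords, uint32_t *out) {
    assert(words != NULL || nwords == 0);
    size_t count = 0;
    for (size_t i = 0; i < nwords; i++) {
        uint64_t w = words[i];
        while (w != 0) {
            /* Count of trailing zeros = index of the lowest set bit. */
            uint32_t bit = (uint32_t)__builtin_ctzll(w);
            out[count++] = (uint32_t)(i * 64) + bit;
            /* Clear the lowest set bit and continue with the rest. */
            w &= w - 1;
        }
    }
    return count;
}
```

The inner `while` loop runs once per set bit, so its trip count depends on the data; that data-dependent loop is the part a vectorized decoder tries to replace.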
Luhn algorithm using SWAR and SIMD
tags: SIMD,High Performance source: “Luhn Algorithm Using SWAR and SIMD.” Accessed May 5, 2022. https://nullprogram.com/blog/2022/04/30/. Switching to SIMD gave about a 3x speedup.
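As a reference point, the plain scalar Luhn check that SWAR and SIMD versions speed up looks like the following. It is a minimal sketch assuming the input is an ASCII digit string; `luhn_valid` is a hypothetical name, not code from the post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Scalar Luhn check. Starting from the rightmost digit, every second digit
   is doubled and 9 is subtracted when the result exceeds 9; the number is
   valid when the total is a multiple of 10. Empty input or any non-digit
   character makes it invalid. */
bool luhn_valid(const char *digits) {
    assert(digits != NULL);
    size_t len = strlen(digits);
    if (len == 0)
        return false;
    int sum = 0;
    for (size_t i = 0; i < len; i++) {
        char c = digits[len - 1 - i]; /* walk right to left */
        if (c < '0' || c > '9')
            return false;
        int d = c - '0';
        if (i % 2 == 1) {
            d *= 2;
            if (d > 9)
                d -= 9;
        }
        sum += d;
    }
    return sum % 10 == 0;
}
```

For example, `luhn_valid("79927398713")` is true, while changing the last digit to 0 makes it false.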
Removing characters from strings faster with AVX-512
tags: SSE/AVX/AVX2/AVX512,High Performance source: Lemire, Daniel. “Removing Characters from Strings Faster with AVX-512.” Daniel Lemire’s Blog (blog). Accessed May 5, 2022. https://lemire.me/blog/2022/04/28/removing-characters-from-strings-faster-with-avx-512/. With AVX-512 it is 21.25 times faster: throughput rises from 0.4 GB/s to 8.5 GB/s.
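A scalar version of the same task helps show what AVX-512 replaces. This sketch assumes, for illustration only, that the characters to drop are space, tab, newline and carriage return; `remove_chars` and `is_removed` are hypothetical names, not the post's code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: drop space, tab, newline and carriage
   return. Bitwise | avoids short-circuit branches in the test. */
static int is_removed(unsigned char c) {
    return (c == ' ') | (c == '\t') | (c == '\n') | (c == '\r');
}

/* In-place scalar removal written so the loop body has no data-dependent
   branch: every byte is stored at buf[pos], but pos only advances for
   bytes that are kept, so a removed byte is overwritten by the next kept
   one. Returns the new length. */
size_t remove_chars(char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    size_t pos = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];
        buf[pos] = (char)c;
        pos += (size_t)!is_removed(c);
    }
    return pos;
}
```

This still handles one byte per iteration; the AVX-512 approach in the post works on a whole vector of bytes at once.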
NUMA
tags: High Performance source: https://en.wikipedia.org/wiki/Non-uniform_memory_access
Multi-queue NICs
tags: Linux,High Performance,Network,ethtool source: The Cloudflare Blog. “How to Receive a Million Packets per Second,” June 16, 2015. http://blog.cloudflare.com/how-to-receive-a-million-packets/. What are multi-queue NICs? Originally a single RX queue was used to pass packets between the hardware and the kernel. Nowadays NICs support multiple RX queues, each pinned to a separate CPU. Multi-queue hashing algorithms: a hash of each packet decides which RX queue it goes to. The hash is usually computed from the tuple (src IP, dst IP, src port, dst port)....
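The note only says the queue is chosen from a hash of the 4-tuple. Many NICs implement this as RSS with a Toeplitz hash over a secret key, so here is a minimal sketch under that assumption. The 40-byte key is the example key from Microsoft's RSS verification documentation; `rss_hash_ipv4` and `pick_rx_queue` are hypothetical helpers, and `hash % nqueues` stands in for the indirection table a real NIC uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Example key from Microsoft's RSS verification documentation
   (assumed here for illustration). */
static const uint8_t rss_example_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2, 0x41, 0x67,
    0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0, 0xd0, 0xca, 0x2b, 0xcb,
    0xae, 0x7b, 0x30, 0xb4, 0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30,
    0xf2, 0x0c, 0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/* Toeplitz hash: for each input bit (most significant bit of each byte
   first) that is set, XOR in the 32-bit key window aligned with that bit,
   then slide the window left by one key bit. Needs key_len >= len + 4. */
uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                       const uint8_t *data, size_t len) {
    assert(key_len >= len + 4);
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];
    uint32_t result = 0;
    size_t next_bit = 32; /* index of the next key bit to shift in */
    for (size_t i = 0; i < len; i++) {
        for (int b = 7; b >= 0; b--) {
            if (data[i] & (1u << b))
                result ^= window;
            uint32_t kbit = (key[next_bit / 8] >> (7 - next_bit % 8)) & 1u;
            window = (window << 1) | kbit;
            next_bit++;
        }
    }
    return result;
}

/* Hypothetical helper: hash the IPv4 4-tuple laid out as src IP, dst IP,
   src port, dst port in network byte order (arguments in host order). */
uint32_t rss_hash_ipv4(uint32_t src_ip, uint32_t dst_ip,
                       uint16_t src_port, uint16_t dst_port) {
    uint8_t in[12] = {
        (uint8_t)(src_ip >> 24), (uint8_t)(src_ip >> 16),
        (uint8_t)(src_ip >> 8), (uint8_t)src_ip,
        (uint8_t)(dst_ip >> 24), (uint8_t)(dst_ip >> 16),
        (uint8_t)(dst_ip >> 8), (uint8_t)dst_ip,
        (uint8_t)(src_port >> 8), (uint8_t)src_port,
        (uint8_t)(dst_port >> 8), (uint8_t)dst_port,
    };
    return toeplitz_hash(rss_example_key, sizeof rss_example_key, in, sizeof in);
}

/* Simplification: real NICs look the hash's low bits up in an indirection
   table; modulo spreads flows over the queues in the same spirit. */
unsigned pick_rx_queue(uint32_t hash, unsigned nqueues) {
    assert(nqueues > 0);
    return hash % nqueues;
}
```

Because every packet of a flow has the same 4-tuple, it always lands on the same queue and CPU. Microsoft's verification table lists 0x51ccc178 for source 66.9.149.187:2794 and destination 161.142.100.80:1766, which is a handy check for an implementation like this.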
How to receive a million packets per second
tags: Network,UDP,High Performance,iptables,ethtool,netstat,NUMA source: The Cloudflare Blog. “How to Receive a Million Packets per Second,” June 16, 2015. http://blog.cloudflare.com/how-to-receive-a-million-packets/. Key points: 1. Make sure iptables doesn't interfere with the traffic: iptables -I INPUT 1 -p udp --dport 4321 -j ACCEPT and iptables -t raw -I PREROUTING 1 -p udp --dport 4321 -j NOTRACK. See also: Multi-queue NICs. 2. The first bottleneck: all packets are received by a single RX queue, which can be checked with ethtool -S....
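To make the receiving side concrete, here is a minimal Linux receiver sketch. It assumes two Linux socket features not shown in this excerpt: SO_REUSEPORT, so several receivers can share one port and spread the load over CPUs, and recvmmsg(2), which drains a batch of datagrams per system call. `open_udp_receiver` and `receive_batch` are hypothetical helpers, and the socket is bound to loopback only to keep the sketch self-contained.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define BATCH 32          /* datagrams drained per recvmmsg() call */
#define MAX_PAYLOAD 2048  /* bytes per receive buffer */

/* Open a UDP socket bound to 127.0.0.1:port (e.g. 4321, or 0 for any free
   port); a real receiver would bind the NIC's address instead. SO_REUSEPORT
   lets several receiver threads or processes bind the same port so the
   kernel spreads packets among them. Returns the fd, or -1 on error. */
int open_udp_receiver(unsigned short port) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    int one = 1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof one) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Drain up to BATCH queued datagrams with a single recvmmsg(2) call
   instead of one recvfrom(2) per packet. Non-blocking: returns the number
   received, 0 if nothing is queued, or -1 on error. */
int receive_batch(int fd) {
    assert(fd >= 0);
    char bufs[BATCH][MAX_PAYLOAD];
    struct iovec iovs[BATCH];
    struct mmsghdr msgs[BATCH];
    memset(msgs, 0, sizeof msgs);
    for (int i = 0; i < BATCH; i++) {
        iovs[i].iov_base = bufs[i];
        iovs[i].iov_len = sizeof bufs[i];
        msgs[i].msg_hdr.msg_iov = &iovs[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }
    int n = recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT, NULL);
    if (n < 0)
        return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
    return n;
}
```

Batching cuts per-packet system call overhead, but it does not fix the single RX queue bottleneck; that still depends on how the NIC hashes flows across queues.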
Deserializing JSON really fast
tags: Rust,Optimization,High Performance source: https://blog.datalust.co/deserializing-json-really-fast/
High Performance Browser Networking
tags: Books to Read,HTTP,High Performance,Network Online: https://hpbn.co/ source: Grigorik, Ilya. High-Performance Browser Networking. Beijing; Sebastopol, CA: O’Reilly, 2013. “Good developers know how things work. Great developers know why things work.”
SSE/AVX/AVX2/AVX512
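tags: Computer Systems,C/C++,Optimization,High Performance Some Intel CPUs support vector instruction sets that perform several integer or floating-point operations at once, which can be used to optimize suitable algorithms. Related links: compilers provide intrinsics that wrap these instructions so you don't have to write assembly (official guide: Intrinsics Guide); avx_mathfun, based on sse_mathfun, which wraps the relevant macros and functions; the SSE-accelerated implementation in the mp3 library LAME, libmp3lame/vector/xmm_quantize_sub.c; AVX512 VNNI https://en.wikichip.org/wiki/x86/avx512_vnni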
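As a small illustration of the intrinsics approach (compiler-provided wrappers instead of hand-written assembly), here is a sketch that sums a float array four lanes at a time with SSE. It assumes an x86-64 target, where SSE is always available without extra compiler flags; `sum_sse` is a hypothetical helper, not taken from the linked resources.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Sum n floats with SSE: add four lanes per iteration with _mm_add_ps,
   then combine the lanes and add the leftover n % 4 elements with scalar
   code. _mm_loadu_ps does not require 16-byte alignment. */
float sum_sse(const float *x, size_t n) {
    assert(x != NULL || n == 0);
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(x + i));
    float lanes[4];
    _mm_storeu_ps(lanes, acc); /* spill the four partial sums */
    float total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; i++)          /* scalar tail */
        total += x[i];
    return total;
}
```

Because the additions happen in a different order than a plain loop, results for general floating-point data can differ slightly from a scalar sum; for small integer-valued inputs such as 1 through 10 both give exactly 55.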