Taking Smart Notes With Org-mode

High Performance

January 10, 2022 · 1 min · Gray King
  • tags: Computer Systems

Links to this note


    BPF

    tags: High Performance, Linux

    October 31, 2023 · 1 min · Gray King

    Speeding up Cython with SIMD

    tags: Python, SIMD, High Performance source: Turner-Trauring, Itamar. “Speeding up Cython with SIMD.” Python⇒Speed, October 18, 2023. https://pythonspeed.com/articles/faster-cython-simd/.

    October 31, 2023 · 1 min · Gray King

    Overhead of Python asyncio Tasks: 260K/s

    tags: Python, High Performance source: Textual Documentation. “Textual - Overhead of Python Asyncio Tasks,” March 8, 2023. https://textual.textualize.io/blog/2023/03/08/overhead-of-python-asyncio-tasks/. The full lifecycle of an asyncio task (create, run, then shut down) runs at about 260K tasks per second.
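A rough way to reproduce this kind of measurement locally is sketched below. This is not the benchmark from the linked post: `noop`, `measure`, and `run_benchmark` are hypothetical names chosen for illustration, and the number you get depends on the machine and Python version.

```python
import asyncio
import time


async def noop() -> None:
    """A task body that does nothing, so mostly task overhead is measured."""
    return None


async def measure(n: int) -> float:
    """Create n tasks, wait for all of them, and return tasks per second."""
    start = time.perf_counter()
    tasks = [asyncio.create_task(noop()) for _ in range(n)]
    await asyncio.gather(*tasks)
    elapsed = time.perf_counter() - start
    return n / elapsed


def run_benchmark(n: int = 100_000) -> float:
    """Run the measurement in a fresh event loop."""
    return asyncio.run(measure(n))


if __name__ == "__main__":
    print(f"{run_benchmark():,.0f} tasks per second")
```

The empty coroutine keeps the work per task close to zero, so the result approximates the cost of creating, scheduling, and finishing a task rather than the cost of any real work.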

    March 9, 2023 · 1 min · Gray King

    Fast bitset decoding using Intel AVX-512

    tags: High Performance, SIMD source: Lemire, Daniel. “Fast Bitset Decoding Using Intel AVX-512.” Daniel Lemire’s Blog (blog). Accessed May 12, 2022. https://lemire.me/blog/2022/05/06/fast-bitset-decoding-using-intel-avx-512/.

    May 12, 2022 · 1 min · Gray King

    Luhn algorithm using SWAR and SIMD

    tags: SIMD, High Performance source: “Luhn Algorithm Using SWAR and SIMD.” Accessed May 5, 2022. https://nullprogram.com/blog/2022/04/30/. A 3x speedup after using SIMD.
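For reference, here is a plain scalar version of the Luhn check, the kind of baseline that a SWAR or SIMD variant would be compared against. It is a minimal sketch and not the code from the linked post; the function name `luhn_valid` is an assumption made for illustration.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the ASCII digits in `number` pass the Luhn checksum.

    Non-digit characters (spaces, dashes) are ignored.
    """
    digits = [int(ch) for ch in number if "0" <= ch <= "9"]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0
```

For example, `luhn_valid("79927398713")` is true, while changing the last digit to `0` makes the check fail.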

    May 5, 2022 · 1 min · Gray King

    Removing characters from strings faster with AVX-512

    tags: SSE/AVX/AVX2/AVX512, High Performance source: Lemire, Daniel. “Removing Characters from Strings Faster with AVX-512.” Daniel Lemire’s Blog (blog). Accessed May 5, 2022. https://lemire.me/blog/2022/04/28/removing-characters-from-strings-faster-with-avx-512/. It’s 21.25 times faster with AVX-512: from 0.4 GB/s to 8.5 GB/s.

    May 5, 2022 · 1 min · Gray King

    NUMA

    tags: High Performance source: https://en.wikipedia.org/wiki/Non-uniform_memory_access

    April 16, 2022 · 1 min · Gray King

    Multi-queue NICs

    tags: Linux, High Performance, Network, ethtool
    source: The Cloudflare Blog. “How to Receive a Million Packets per Second,” June 16, 2015. http://blog.cloudflare.com/how-to-receive-a-million-packets/.

    What are multi-queue NICs: An RX queue is used to pass packets between the hardware and the kernel. Nowadays NICs support multiple RX queues, and each RX queue is pinned to a separate CPU.

    Multi-queue hashing algorithms: A hash of the packet decides the RX queue number. The hash is usually computed from the tuple (src IP, dst IP, src port, dst port). This guarantees that packets of a single flow always end up on exactly the same RX queue, so packets within a single flow cannot be reordered. ...
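The flow-to-queue mapping described above can be sketched as follows. This is only an illustration of the idea: real NICs compute the hash in hardware with their own algorithm, and CRC32 is used here purely as a stand-in. The function name `rx_queue_for_flow` and the IPv4-only assumption are hypothetical.

```python
import socket
import struct
import zlib


def rx_queue_for_flow(src_ip: str, dst_ip: str,
                      src_port: int, dst_port: int,
                      num_queues: int) -> int:
    """Map an IPv4 flow tuple to an RX queue index.

    CRC32 is a stand-in hash (an assumption for illustration),
    not the algorithm a NIC actually uses.
    """
    key = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!HH", src_port, dst_port)
    )
    return zlib.crc32(key) % num_queues


if __name__ == "__main__":
    flow = ("192.168.0.10", "192.168.0.20", 50000, 4321)
    # The same tuple always maps to the same queue.
    print(rx_queue_for_flow(*flow, num_queues=4))
    print(rx_queue_for_flow(*flow, num_queues=4))
```

Because the queue index depends only on the tuple, every packet of a flow lands on the same queue. It also shows why traffic from a single source to a single destination and port cannot spread across queues: the tuple never changes.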

    April 16, 2022 · 1 min · Gray King

    How to receive a million packets per second

    tags: Network, UDP, High Performance, iptables, ethtool, netstat, NUMA
    source: The Cloudflare Blog. “How to Receive a Million Packets per Second,” June 16, 2015. http://blog.cloudflare.com/how-to-receive-a-million-packets/.

    Keys:

    1. Make sure the traffic is not interfered with by iptables:

       #+begin_src bash
       iptables -I INPUT 1 -p udp --dport 4321 -j ACCEPT
       iptables -t raw -I PREROUTING 1 -p udp --dport 4321 -j NOTRACK
       #+end_src

    2. The first bottleneck
       + All packets are received by a single RX queue, which can be checked with =ethtool -S=.
       + How to solve: according to Multi-queue NICs, change the hash algorithm with =ethtool=:

         #+begin_src bash
         ethtool -N eth2 rx-flow-hash udp4 sdfn
         #+end_src

    3. Use multiple threads with NUMA in mind, and multiple receiver IPs to fit the multi-queue hash algorithm.

    Also note that there is lock contention on the UDP receive buffer side, see Rivera, Diego, Eduardo Acha, Jose Piquer, and Javier Bustos-Jimenez. “Analysis of Linux UDP Sockets Concurrent Performance.” In 2014 33rd International Conference of the Chilean Computer Science Society (SCCC), 65–69. Talca: IEEE, 2014. https://doi.org/10.1109/SCCC.2014.8. ...

    April 14, 2022 · 2 min · Gray King

    Deserializing JSON really fast

    tags: Rust, Optimization, High Performance source: https://blog.datalust.co/deserializing-json-really-fast/

    January 4, 2022 · 1 min · Gray King

    High Performance Browser Networking

    tags: Books to Read, HTTP, High Performance, Network Online: https://hpbn.co/ source: Grigorik, Ilya. High-Performance Browser Networking. Beijing ; Sebastopol, CA: O’Reilly, 2013. “Good developers know how things work. Great developers know why things work.”

    August 13, 2021 · 1 min · Gray King

    SSE/AVX/AVX2/AVX512

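    tags: Computer Systems, C/C++, Optimization, High Performance

    Some Intel CPUs support vector instruction sets that perform integer and floating-point computations on multiple lanes at once, which can be used to optimize suitable algorithms. Related links are collected here:
      • Compilers provide intrinsics wrappers, so there is no need to write assembly. Official guide: Intrinsics Guide
      • avx_mathfun, based on sse_mathfun, wraps the related macros and functions
      • SSE-accelerated implementation in the MP3 library LAME: libmp3lame/vector/xmm_quantize_sub.c
      • AVX512 VNNI: https://en.wikichip.org/wiki/x86/avx512_vnni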

    June 28, 2020 · 1 min · Gray King