From d33de297b28ecc2595ecbde6d53cedce11eecadf Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Tue, 28 Oct 2025 20:21:12 +0000
Subject: [PATCH 1/6] Initial plan

From ba6d0821ca897b11eb26db77588e3fcc7f9f080a Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Tue, 28 Oct 2025 20:30:22 +0000
Subject: [PATCH 2/6] Initial commit: Plan for ARM migration

Co-authored-by: JoeStech <4088382+JoeStech@users.noreply.github.com>
---
 invocation_reasons.yaml | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)
 create mode 100644 invocation_reasons.yaml

diff --git a/invocation_reasons.yaml b/invocation_reasons.yaml
new file mode 100644
index 0000000..077d440
--- /dev/null
+++ b/invocation_reasons.yaml
@@ -0,0 +1,38 @@
+---
+id: eccf8992-d5d0-498b-ad8c-c5465015f9ff
+timestamp: '2025-10-28T20:29:17.393715+00:00'
+tool: check_image
+args:
+  image: ubuntu:22.04
+reason: Checking if the ubuntu:22.04 base image in the Dockerfile supports ARM architecture
+---
+id: 52295671-de6f-4a9e-a0ac-04ef83bd8029
+timestamp: '2025-10-28T20:29:36.859509+00:00'
+tool: migrate_ease_scan
+args:
+  scanner: cpp
+  arch: armv8-a
+  git_repo: null
+  output_format: json
+  extra_args: null
+reason: Scanning the C++ codebase in /workspace for x86-specific code that needs to
+  be migrated to ARM, including SSE2 intrinsics and architecture-specific optimizations
+---
+id: 2aa34fa2-957e-4aaf-9a50-be7a03830cf6
+timestamp: '2025-10-28T20:29:53.389796+00:00'
+tool: knowledge_base_search
+args:
+  query: How to convert SSE2 intrinsics to ARM NEON intrinsics for matrix multiplication,
+    hashing, string search, memory copy operations
+reason: Looking for documentation on converting x86 SSE2 intrinsics (_mm_loadu_pd,
+  _mm_mul_pd, _mm_add_pd, _mm_loadu_si128, _mm_storeu_si128, _mm_cmpeq_epi8, etc.)
+  to ARM NEON equivalents
+---
+id: 1e41f86a-59c8-453c-9f8b-12967a3ef6fc
+timestamp: '2025-10-28T20:30:07.381002+00:00'
+tool: knowledge_base_search
+args:
+  query: SSE2 NEON conversion _mm_loadu_pd _mm_storeu_si128 _mm_cmpeq_epi8 _mm_movemask_epi8
+    arm_neon.h intrinsics
+reason: Looking for specific SSE2 to NEON conversion patterns for double precision
+  operations, integer operations, and memory copy operations

From f76d5ddcf504c360871471f05d49d8a5210fe47d Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Tue, 28 Oct 2025 20:36:57 +0000
Subject: [PATCH 3/6] Add ARM NEON support to all benchmark modules

Co-authored-by: JoeStech <4088382+JoeStech@users.noreply.github.com>
---
 Dockerfile            |   7 ++++---
 README.md             |  31 ++++++++++++++++++++++---------
 benchmark             | Bin 0 -> 41328 bytes
 hash_operations.cpp   |  36 +++++++++++++++++++++++++++---------
 main.cpp              |  10 ++++++++--
 matrix_operations.cpp |  33 +++++++++++++++++++++++++++++++++
 memory_operations.cpp |  14 +++++++++++++-
 polynomial_eval.cpp   |  35 +++++++++++++++++++++++++++++++++++
 string_search.cpp     |  34 +++++++++++++++++++++++++++++++++-
 9 files changed, 175 insertions(+), 25 deletions(-)
 create mode 100755 benchmark

diff --git a/Dockerfile b/Dockerfile
index 24fec07..c3b981e 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -15,8 +15,8 @@ COPY *.h ./
 # Copy all C++ source files
 COPY *.cpp ./
 
-# Build the application with optimizations
-# SSE2 intrinsics are used in the code for x86-64 platforms
+# Build the application with optimizations for the target architecture
+# Supports both x86-64 (with SSE2) and ARM64 (with NEON) architectures
 RUN g++ -O2 -o benchmark \
     main.cpp \
     matrix_operations.cpp \
@@ -24,7 +24,8 @@ RUN g++ -O2 -o benchmark \
     string_search.cpp \
     memory_operations.cpp \
     polynomial_eval.cpp \
-    -std=c++11
+    -std=c++11 \
+    -march=native
 
 # Create a startup script
 COPY start.sh .
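Every module in this series picks its SIMD path at compile time from predefined compiler macros, so the same `g++` command in the Dockerfile yields an SSE2 build on x86-64 and a NEON build on ARM64. A minimal sketch of that selection scheme follows, assuming GCC or Clang (both predefine `__x86_64__` and `__aarch64__`). The helpers `simd_backend()`, `is_simd_build()` and `print_backend_banner()` are hypothetical and not part of the benchmark sources. The sketch keys the NEON path on `__aarch64__` alone, because the double-precision `float64x2_t` intrinsics used by the matrix and polynomial code exist only on AArch64, not on 32-bit ARM targets that also define `__ARM_NEON`.

```cpp
#include <cstdio>

// Backend selection mirroring the benchmark modules. GCC and Clang predefine
// __x86_64__ on x86-64 and __aarch64__ on 64-bit ARM.
#if defined(__x86_64__)
#define USE_X86_SIMD 1
#define USE_ARM_NEON 0
#elif defined(__aarch64__)
// Guard on __aarch64__ rather than __ARM_NEON: float64x2_t and the *_f64
// intrinsics are available only on AArch64.
#define USE_X86_SIMD 0
#define USE_ARM_NEON 1
#else
#define USE_X86_SIMD 0
#define USE_ARM_NEON 0
#endif

// At most one SIMD backend may be active in a translation unit.
static_assert(USE_X86_SIMD + USE_ARM_NEON <= 1, "conflicting SIMD backends");

// Hypothetical helper: name of the code path compiled into this binary.
const char* simd_backend() {
#if USE_X86_SIMD
    return "SSE2";
#elif USE_ARM_NEON
    return "NEON";
#else
    return "scalar";
#endif
}

// Hypothetical helper: true when either vector path was compiled in.
bool is_simd_build() {
    return USE_X86_SIMD || USE_ARM_NEON;
}

// Hypothetical helper: print a one-line banner similar to the one main.cpp emits.
void print_backend_banner() {
    std::printf("SIMD backend: %s\n", simd_backend());
}
```

On an x86-64 build, calling `print_backend_banner()` from `main()` prints `SIMD backend: SSE2`. The `static_assert` turns an inconsistent macro pair into a compile error instead of a silently wrong banner.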
diff --git a/README.md b/README.md
index 6bbe4e4..eae1113 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # Compute Benchmark Suite
 
-A high-performance compute benchmark application optimized for x86-64 architecture with SSE2 SIMD instructions.
+A high-performance compute benchmark application optimized for both x86-64 and ARM64 architectures with SIMD instructions.
 
 ## Overview
 
@@ -11,7 +11,7 @@ This benchmark suite tests various compute-intensive operations including:
 - Memory operations (50MB copy operations)
 - Polynomial evaluation (10M iterations)
 
-The code is optimized using x86 SSE2 SIMD intrinsics for maximum performance on Intel and AMD processors.
+The code is optimized using architecture-specific SIMD intrinsics for maximum performance on Intel/AMD (x86-64) and ARM processors.
 
 ## Building with Docker
 
@@ -21,6 +21,12 @@ Build the Docker image:
 docker build -t benchmark-suite .
 ```
 
+For multi-architecture builds:
+
+```bash
+docker buildx build --platform linux/amd64,linux/arm64 -t benchmark-suite .
+```
+
 ## Running the Benchmark
 
 Run the benchmark suite:
@@ -33,16 +39,17 @@ This will execute all benchmark tests and display timing results for each operat
 
 ## Architecture Notes
 
-- **Optimized for**: x86-64 architecture with SSE2 support
-- **SIMD Instructions**: Uses SSE2 intrinsics (`__m128d`, `__m128i`) for vectorized operations
-- **Fallback**: Includes scalar fallback implementation for non-x86 platforms
+- **Optimized for**: x86-64 with SSE2 support and ARM64 with NEON support
+- **x86-64 SIMD**: Uses SSE2 intrinsics (`__m128d`, `__m128i`) for vectorized operations
+- **ARM64 SIMD**: Uses NEON intrinsics (`float64x2_t`, `uint8x16_t`) for vectorized operations
+- **Fallback**: Includes scalar fallback implementation for platforms without SIMD support
 
 ## Output Example
 
 ```
 ========================================
   Compute Benchmark Suite
-  x86-64 with SSE2 Optimizations
+  ARM64 with NEON Optimizations
 ========================================
 
 === Matrix Multiplication Benchmark ===
@@ -63,10 +70,16 @@ Hash: 0xbfd8e92e2fb01505
 The benchmark suite is organized into separate modules:
 
 - `main.cpp` - Main entry point and benchmark orchestration
-- `matrix_operations.{h,cpp}` - Matrix multiplication with SSE2 optimizations
+- `matrix_operations.{h,cpp}` - Matrix multiplication with SSE2/NEON optimizations
 - `hash_operations.{h,cpp}` - Cryptographic hashing with SIMD acceleration
-- `string_search.{h,cpp}` - String pattern matching using SSE2
+- `string_search.{h,cpp}` - String pattern matching using SSE2/NEON
 - `memory_operations.{h,cpp}` - Fast memory copy operations
 - `polynomial_eval.{h,cpp}` - Vectorized polynomial evaluation
 
-Each module uses C++11 standard library and x86 SSE2 intrinsics where applicable.
\ No newline at end of file
+Each module uses C++11 standard library and architecture-specific SIMD intrinsics where applicable.
+
+## Supported Architectures
+
+- **x86-64**: Intel and AMD processors with SSE2 support (standard on all 64-bit x86 CPUs)
+- **ARM64**: ARM processors with NEON support (ARMv8-A and later)
+- **Generic**: Scalar fallback for other architectures
\ No newline at end of file
diff --git a/benchmark b/benchmark
new file mode 100755
index 0000000000000000000000000000000000000000..b3337b81b1bf38b5f436ec7ef7a5768153ae6518
GIT binary patch
literal 41328
[base85-encoded binary data omitted]

literal 0
HcmV?d00001

diff --git a/hash_operations.cpp b/hash_operations.cpp
index 0d1d1ca..cf47d6f 100644
--- a/hash_operations.cpp
+++ b/hash_operations.cpp
@@ -7,8 +7,14 @@
 #ifdef __x86_64__
 #include
 #define USE_X86_SIMD 1
+#define USE_ARM_NEON 0
+#elif defined(__aarch64__)
+#include <arm_neon.h>
+#define USE_X86_SIMD 0
+#define USE_ARM_NEON 1
 #else
 #define USE_X86_SIMD 0
+#define USE_ARM_NEON 0
 #endif
 
 unsigned long long compute_hash(const char* data, size_t len) {
@@ -20,20 +26,32 @@ unsigned long long compute_hash(const char* data, size_t len) {
     for (; i + 16 <= len; i += 16) {
         __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
 
-        // Extract bytes and update hash
+        // Store to array and extract bytes
+        alignas(16) unsigned char bytes[16];
+        _mm_storeu_si128(reinterpret_cast<__m128i*>(bytes), chunk);
+
+        // Update hash for each byte
+        for (int j = 0; j < 16; j++) {
+            hash = ((hash << 5) + hash) + bytes[j];
+        }
+    }
+#elif USE_ARM_NEON
+    // ARM64 optimized path using NEON
+    for (; i + 16 <= len; i += 16) {
+        uint8x16_t chunk = vld1q_u8(reinterpret_cast<const uint8_t*>(data + i));
+
+        // Store to array and extract bytes
+        alignas(16) uint8_t bytes[16];
+        vst1q_u8(bytes, chunk);
+
+        // Update hash for each byte
         for (int j = 0; j < 16; j++) {
-            unsigned char byte = _mm_extract_epi16(chunk, j / 2);
-            if (j % 2 == 0) {
-                byte = byte & 0xFF;
-            } else {
-                byte = (byte >> 8) & 0xFF;
-            }
-            hash = ((hash << 5) + hash) + byte;
+            hash = ((hash << 5) + hash) + bytes[j];
         }
     }
 #endif
 
-    // Process remaining bytes (or all bytes on non-x86)
+    // Process remaining bytes (or all bytes on non-SIMD)
     for (; i < len; i++) {
         hash = ((hash << 5) + hash) + data[i];
     }
diff --git a/main.cpp b/main.cpp
index 1c6e1a7..0ddb8bb 100644
--- a/main.cpp
+++ b/main.cpp
@@ -1,6 +1,6 @@
 /*
  * High-Performance Compute Benchmark Suite
- * Optimized for x86-64 architecture with SSE/AVX SIMD instructions
+ * Optimized for x86-64 and ARM64 architectures with SIMD instructions
  */
 
 #include
@@ -12,8 +12,13 @@
 
 #ifdef __x86_64__
 #define USE_X86_SIMD 1
+#define USE_ARM_NEON 0
+#elif defined(__aarch64__)
+#define USE_X86_SIMD 0
+#define USE_ARM_NEON 1
 #else
 #define USE_X86_SIMD 0
+#define USE_ARM_NEON 0
 #endif
 
 int main() {
@@ -21,9 +26,10 @@ int main() {
     std::cout << "  Compute Benchmark Suite" << std::endl;
 #if USE_X86_SIMD
     std::cout << "  x86-64 with SSE2 Optimizations" << std::endl;
+#elif USE_ARM_NEON
+    std::cout << "  ARM64 with NEON Optimizations" << std::endl;
 #else
     std::cout << "  Generic Build (No SIMD)" << std::endl;
-    std::cout << "  NOTE: This code is optimized for x86-64" << std::endl;
 #endif
     std::cout << "========================================" << std::endl;
 
diff --git a/matrix_operations.cpp b/matrix_operations.cpp
index f85a899..c235dfd 100644
--- a/matrix_operations.cpp
+++ b/matrix_operations.cpp
@@ -7,8 +7,14 @@
 #ifdef __x86_64__
 #include
 #define USE_X86_SIMD 1
+#define USE_ARM_NEON 0
+#elif defined(__aarch64__)
+#include <arm_neon.h>
+#define USE_X86_SIMD 0
+#define USE_ARM_NEON 1
 #else
 #define USE_X86_SIMD 0
+#define USE_ARM_NEON 0
 #endif
 
 Matrix::Matrix(size_t r, size_t c) : rows(r), cols(c) {
@@ -58,6 +64,33 @@ Matrix Matrix::multiply(const Matrix& other) const {
                 sum += data[i][k] * other.data[k][j];
             }
 
+            result.data[i][j] = sum;
+        }
+    }
+#elif USE_ARM_NEON
+    // ARM64 optimized path using NEON
+    for (size_t i = 0; i < rows; i++) {
+        for (size_t j = 0; j < other.cols; j++) {
+            float64x2_t sum_vec = vdupq_n_f64(0.0);
+            size_t k = 0;
+
+            // Process 2 elements at a time with NEON
+            for (; k + 1 < cols; k += 2) {
+                float64x2_t a_vec = vld1q_f64(&data[i][k]);
+                float64x2_t b_vec;
+                double b_arr[2] = {other.data[k][j], other.data[k+1][j]};
+                b_vec = vld1q_f64(b_arr);
+                sum_vec = vmlaq_f64(sum_vec, a_vec, b_vec);
+            }
+
+            // Horizontal add
+            double sum = vgetq_lane_f64(sum_vec, 0) + vgetq_lane_f64(sum_vec, 1);
+
+            // Handle remaining element
+            if (k < cols) {
+                sum += data[i][k] * other.data[k][j];
+            }
+
             result.data[i][j] = sum;
         }
     }
diff --git a/memory_operations.cpp b/memory_operations.cpp
index 0e5b970..f5ef8bf 100644
--- a/memory_operations.cpp
+++ b/memory_operations.cpp
@@ -6,8 +6,14 @@
 #ifdef __x86_64__
 #include
 #define USE_X86_SIMD 1
+#define USE_ARM_NEON 0
+#elif defined(__aarch64__)
+#include <arm_neon.h>
+#define USE_X86_SIMD 0
+#define USE_ARM_NEON 1
 #else
 #define USE_X86_SIMD 0
+#define USE_ARM_NEON 0
 #endif
 
 void fast_memcpy(void* dest, const void* src, size_t n) {
@@ -21,9 +27,15 @@ void fast_memcpy(void* dest, const void* src, size_t n) {
         __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(s + i));
         _mm_storeu_si128(reinterpret_cast<__m128i*>(d + i), chunk);
     }
+#elif USE_ARM_NEON
+    // ARM64 optimized path using NEON
+    for (; i + 16 <= n; i += 16) {
+        uint8x16_t chunk = vld1q_u8(reinterpret_cast<const uint8_t*>(s + i));
+        vst1q_u8(reinterpret_cast<uint8_t*>(d + i), chunk);
+    }
 #endif
 
-    // Copy remaining bytes (or all on non-x86)
+    // Copy remaining bytes (or all on non-SIMD)
     for (; i < n; i++) {
         d[i] = s[i];
     }
diff --git a/polynomial_eval.cpp b/polynomial_eval.cpp
index db2247a..f741ec0 100644
--- a/polynomial_eval.cpp
+++ b/polynomial_eval.cpp
@@ -5,8 +5,14 @@
 #ifdef __x86_64__
 #include
 #define USE_X86_SIMD 1
+#define USE_ARM_NEON 0
+#elif defined(__aarch64__)
+#include <arm_neon.h>
+#define USE_X86_SIMD 0
+#define USE_ARM_NEON 1
 #else
 #define USE_X86_SIMD 0
+#define USE_ARM_NEON 0
 #endif
 
 double polynomial_eval_sse(double x, const std::vector<double>& coeffs) {
@@ -39,6 +45,35 @@ double polynomial_eval_sse(double x, const std::vector<double>& coeffs) {
         result += coeffs[i] * power_arr[0];
     }
 
+    return result;
+#elif USE_ARM_NEON
+    // ARM64 optimized path using NEON
+    float64x2_t result_vec = vdupq_n_f64(0.0);
+    float64x2_t power_vec;
+    double power_arr[2] = {1.0, x};
+    power_vec = vld1q_f64(power_arr);
+    float64x2_t power_mult = vdupq_n_f64(x * x);
+
+    size_t i = 0;
+
+    // Process 2 coefficients at a time
+    for (; i + 1 < coeffs.size(); i += 2) {
+        float64x2_t coeff_vec;
+        double coeff_arr[2] = {coeffs[i], coeffs[i + 1]};
+        coeff_vec = vld1q_f64(coeff_arr);
+        float64x2_t term = vmulq_f64(coeff_vec, power_vec);
+        result_vec = vaddq_f64(result_vec, term);
+        power_vec = vmulq_f64(power_vec, power_mult);
+    }
+
+    // Horizontal add
+    double result = vgetq_lane_f64(result_vec, 0) + vgetq_lane_f64(result_vec, 1);
+
+    // Handle remaining coefficient
+    if (i < coeffs.size()) {
+        result += coeffs[i] * vgetq_lane_f64(power_vec, 0);
+    }
+
     return result;
 #else
     // Fallback scalar implementation
diff --git a/string_search.cpp b/string_search.cpp
index 7c5c340..7e87388 100644
--- a/string_search.cpp
+++ b/string_search.cpp
@@ -5,8 +5,14 @@
 #ifdef __x86_64__
 #include
 #define USE_X86_SIMD 1
+#define USE_ARM_NEON 0
+#elif defined(__aarch64__)
+#include <arm_neon.h>
+#define USE_X86_SIMD 0
+#define USE_ARM_NEON 1
 #else
 #define USE_X86_SIMD 0
+#define USE_ARM_NEON 0
 #endif
 
 int simd_string_search(const std::string& text, const std::string& pattern) {
@@ -44,9 +50,35 @@ int simd_string_search(const std::string& text, const std::string& pattern) {
             }
         }
     }
+#elif USE_ARM_NEON
+    // ARM64 optimized path using NEON
+    uint8x16_t first_char_vec = vdupq_n_u8(first_char);
+
+    for (; i + 16 <= text_len - pattern_len + 1; i += 16) {
+        uint8x16_t text_chunk = vld1q_u8(reinterpret_cast<const uint8_t*>(text.data() + i));
+        uint8x16_t cmp = vceqq_u8(text_chunk, first_char_vec);
+
+        // Store comparison results to array
+        alignas(16) uint8_t cmp_result[16];
+        vst1q_u8(cmp_result, cmp);
+
+        // Check each potential match
+        for (int bit = 0; bit < 16 && i + bit <= text_len - pattern_len; bit++) {
+            if (cmp_result[bit]) {
+                bool match = true;
+                for (size_t j = 1; j < pattern_len; j++) {
+                    if (text[i + bit + j] != pattern[j]) {
+                        match = false;
+                        break;
+                    }
+                }
+                if (match) count++;
+            }
+        }
+    }
 #endif
 
-    // Handle remaining characters (or all on non-x86)
+    // Handle remaining characters (or all on non-SIMD)
     for (; i <= text_len - pattern_len; i++) {
         bool match = true;
         for (size_t j = 0; j < pattern_len; j++) {

From dff3da3ac7ed2db48e7beb7692968c8213abdfe6 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Tue, 28 Oct 2025 20:38:40 +0000
Subject: [PATCH 4/6] Add .gitignore and remove build artifacts

Co-authored-by: JoeStech <4088382+JoeStech@users.noreply.github.com>
---
 .gitignore |  14 ++++++++++++++
 benchmark  | Bin 41328 -> 0 bytes
 2 files changed, 14 insertions(+)
 create mode 100644 .gitignore
 delete mode 100755 benchmark

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..145c4b2
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,14 @@
+# Build artifacts
+benchmark
+*.o
+*.out
+
+# Temporary files
+*.swp
+*.swo
+*~
+
+# IDE files
+.vscode/
+.idea/
+*.iml
diff --git a/benchmark b/benchmark
deleted file mode 100755
index b3337b81b1bf38b5f436ec7ef7a5768153ae6518..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001

literal 41328
[base85-encoded binary data omitted]
z$}W{=U&LU!59+dyWJ|x(WgpCDINzSd;4`oJo+SOdF8lLwD4Y4gI4PX*ROVmDNso<1 zE#DX$)giraTr%Hjm(@x5@%kTSJ}^#-65b;@w<6g^Xg?g6jU*|N$BD`-Zj_`CblH2x zNgrnfv)&meof?Bl{}Je{>sQ16HEDnBU*GHLVO9E{T=779ec zWRV{$VthM>2Rh6`P~ijOKciFiHVDJ1|3{iCAKyJ?LB%^Ay7nhV532-u9YEB10 zmUJwZ3a5UkL&-+bkqiQXojs3$PZ#}k1F(r5Zb}lnqXG=4dkns!kTan3UkWSn$M6N5 z&R8j|!ym!l5pX)!r|=8>5xjf`%Sv?aPT^Vnjii4+;DqlH_@p<5!}uEspS~+h`1cBY zRnH4xppo#)04MyX1U{W5Q($*hBiN@|;D0FaRkzxehX4Hnzh5A-dqnW>PlLZ-;ODE1 znKqL2u7Fpizy}3f{eLwDxKqw zocRCmslwCg-^B1IOSvu;GZBwue=qPQ>1=7xp1`A27vpoTl0L-pG#)nkC4 zrDinovkaau4bJ31y2qpN65!KO-xb&Jyn4^I3ghG~f>&`k-Rn@e8*suOn9K9jKPWuQ z@OfJMN%thgXLCJ=(tQJk@!&`B?n`-|?(`_oJAi!Ye5s&=W#b>S04KTiLN2?nMd=d8 zCyo|R{3Z?lPKKW^b$o}T(j6X!C73u3kSletZ2aRr;7=lpuHYbKS@?|c5gAQ%R`$sD zzly5alVw(AsCD3EWoD$o-g3)yj~3WNc@*Ec)nD%h%=}ad~QpF|6m&UlY&qE ze9nOG_9*N~ga0wXGbY8?WgHv+GY$T@u_N1mI^ZMLOW>#eKlg0{zFG8JwHr^(!hT}@Qhpl!To0w;--ro&Bt6%sf#WKU*&)@xeUb)$0_-*k^FmbzQ%}m*@pziBr$P9u`#4yt z1DyPlB=7}Tx{<-BO3T-&5FT3@pJW*29!LX!9B{H*YCq`_{8c?wj(gJZImzHyw}=;7 zeS#nl8-bDRIVTMq&pt-PZ%PASp9cO(8u*zs@X5ow6stiA;3Mh3fWcAUMZBVPhfcwk zhEJQomrNXwy(vJlI}QHZfKS8jN&SDW)qTv@f=|a~oDbbnv(RAo`i;dN$!fK&bXxtk zRT6(9YW2ilTNc~f+9d0ZiyQoN`8!B+Q-jUiWHQ+rOiLFv7_G%lpVx-}OCC6L`IE?+ zQd4VdvEAcpBH$tctq|`_*AkgEUMxy0y*76fUT@yvFs)Y0N=xGJG4X%i)6-SxToN-IE=Cj_*zhm*raXo=!Ds#b0BN2pc@9-)pP zCxIapYqdGu0GzrKCeu>D%J{=j4|H|dTFs5t#uBTk zVQ#z$(3(Df0AG2vOHB^1W0li~cb!&$tJUuExE($;bCbuq%H>&Ub6K14Y}jYD1=^+f zOIK>dkua>yPPfx)^LlM-tq!-}yH;xU+FBjfra)`!T99aRcpK~VC!t!c^OsiFnXIP8 zwN@*T7c5?8HOT@f*DjT;wKpuTt~1w2)&-U&^QtY@CG+Pum{7Osc@`5W(DgiJc&uu& zS1quZ=hZABiYKuSx7%U$ui>?KxK}yd4s)xewXD=+X`NeIj?c2n z%jwgIkJ34FFE6hsn`@kdjFBMarIqECb1yHgBnaV^&Mli;Sz0l-yo68)T3%9GdU;vN z<#ULov4VXVOUuj3O6QcBOhzMK3a@gwt<`N(?)<6>sV7Emi2N)$`0& zV{ut>J?y;d+-QsnBMdL}?{IJ_*3#nA;yLiePPgF7RK=rztE1iSz*h-8?quc-{?Zb) zYA$qne+$<=MR5rZ%jfVXNkIqnd93*2K$FWM)m~rhbu<@St*hGGt(XIR9@5Ld*1DQG z_y!L(m(%TYc>U(JwGZ9cyhvg~bZ>sQw=THVU(uO)6xM~u-vr`Z+o zwIp|6Fwpj0jkgvSYW6zPS%=&guw2fSc611v7plRsaU+y5XOigMYU}W>1{`Cvf;!5;VcYt{t3~RT? 
z+lpxo-{rtJ89ZJ%QKPk?46dn_tx_Prv93m)g~%1mwOYkQs&?pqcG|691OcKVd{!zR z*8IjYsLk4+)iT*A#YW-a{#mDKECJUhr?1W9!+J?;=VIhLD=}D>S2xj#?HHd) zWDKX=?gq_zdOP-o*jyjLv_(yQ1!|Ik|BpUiOCa_Ze zAIzwxVf}&GbwsOrTt1Uii;r2e5nU_3dc>fiwPT}gC1wZfGWQxMDoI~(YJ~-AYOoZe zi9zd{HEUwnWmT)kEu1c`S&S7vXKNDxUaZW;V*GzrNi2K0uIf5BS<@RE$;%~A4tx4lZx>*JV!CSPjQ={78GUlEgv*>qRz^{{nYv7<$-&s&Vz!L9m7w;cC8g(DD;ax<^MrM_lYzPaD!Ws)*NLy=PV93ZXF7k!+{dUp*&q0!AWL`Rcu`8msrTKw%-({BT*-df#gh z3aj!}dbR#P74-8MH+4Ka#J~tM^-KTu6nQ zklKFxQKqFYaByTbrtep3gW7*o*hffc>GNH@M2&y;cjTV{I(2!qn(8|aHBObM=G8b8 z{Ph}o^__&9KMTafOVz(hrBmY+`BlKxh7ut^Wthb8 zA!ccjkyhDhW3}&UQRNPvdkE+46!1+VoD(PPr`8MC4{_a*jE8|A^5i)w15lN#B2Gd= Vt-A`N_&7OM1p3W6ZAi&V{|zGPRSp0E From 22d4e05b6172bd0d17d5270c4e6e5db6c1c0b7d0 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Tue, 28 Oct 2025 20:40:55 +0000 Subject: [PATCH 5/6] Finalize ARM migration with comprehensive SIMD support Co-authored-by: JoeStech <4088382+JoeStech@users.noreply.github.com> --- _codeql_detected_source_root | 1 + 1 file changed, 1 insertion(+) create mode 120000 _codeql_detected_source_root diff --git a/_codeql_detected_source_root b/_codeql_detected_source_root new file mode 120000 index 0000000..945c9b4 --- /dev/null +++ b/_codeql_detected_source_root @@ -0,0 +1 @@ +. \ No newline at end of file From e3d5b9bc494c1469dd84cc9668eb4d3b000a9965 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Tue, 28 Oct 2025 20:41:51 +0000 Subject: [PATCH 6/6] Remove temporary symlink Co-authored-by: JoeStech <4088382+JoeStech@users.noreply.github.com> --- _codeql_detected_source_root | 1 - 1 file changed, 1 deletion(-) delete mode 120000 _codeql_detected_source_root diff --git a/_codeql_detected_source_root b/_codeql_detected_source_root deleted file mode 120000 index 945c9b4..0000000 --- a/_codeql_detected_source_root +++ /dev/null @@ -1 +0,0 @@ -. \ No newline at end of file