perf: copy optimization by jodavies · Pull Request #799 · form-dev/form

jodavies · 2026-02-14T13:20:03Z

Here is an optimisation experiment that replaces all NCOPY/WCOPY macros with memmove (memcpy is not an option, since we can't be sure that the memory regions passed to the macros never overlap). The replacement alone gives a negligible performance improvement (tentatively 1%?), which is hard to detect within the usual run-to-run variation.
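To illustrate the overlap concern, here is a minimal sketch in C. The type `word_t` and the functions `copy_forward` and `copy_memmove` are hypothetical stand-ins written for this illustration; they are not FORM's actual NCOPY/WCOPY definitions. The sketch only shows that a forward word-by-word loop and memmove can give different results when the destination overlaps the source from above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical word type for this sketch; not FORM's WORD. */
typedef int word_t;

/* Forward word-by-word copy, in the style of a hand-written copy loop.
 * If dst lies inside [src, src + n), words are overwritten before they
 * have been read. */
static void copy_forward(word_t *dst, const word_t *src, size_t n)
{
    while (n-- > 0) {
        *dst++ = *src++;
    }
}

/* Same interface using memmove, which is defined for overlapping regions.
 * memcpy would have undefined behaviour in the overlapping case. */
static void copy_memmove(word_t *dst, const word_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memmove(dst, src, n * sizeof(word_t));
}
```

For example, shifting the first five words of `{1, 2, 3, 4, 5, 0}` up by one position with `copy_forward(a + 1, a, 5)` smears the first word over the whole buffer, while `copy_memmove(b + 1, b, 5)` yields `{1, 1, 2, 3, 4, 5}`.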

The follow-up commits improve some existing copies in the code by using the macros instead and by moving some conditionals outside the copy loops. I identified the expensive copies with a profiler while running the Forcer benchmark.

On my system (Ryzen 7900X, Ubuntu 24.04, GCC 13.3.0, tform -w12), the results for the usual benchmarks are as follows:

Benchmark	Speedup w.r.t. v5.0.0
chromatic	1.05 ± 0.01
color	1.01 ± 0.01
fmft	1.03 ± 0.01
forcer	1.07 ± 0.00
forcer-exp	1.08 ± 0.01
mass-fact	1.00 ± 0.05
mbox1l	1.01 ± 0.02
minceex	1.07 ± 0.02
mincer	1.00 ± 0.05
sort-disk	0.98 ± 0.02
sort-large	0.99 ± 0.01
sort-small	1.01 ± 0.01
trace	1.02 ± 0.01

vermaseren · 2026-02-15T07:27:05Z

I think that most of the gain comes from copies with very many words, while the smaller copies give a loss. memcpy has to perform some tests, and hence has an overhead. I looked into this with Jan Kuipers because he liked memcpy, but we had to track down a number of bad bugs caused by overwriting, and because of that we had to study those routines.

…

On 14 Feb 2026, at 14:20, jodavies wrote:

Commit Summary
d2e8ae4 perf: use memmove for all NCOPY/WCOPY macros
2442e58 perf: tform: use NCOPY for an expensive copy in PutToMaster
d93c202 perf: improve copies in InFunction
25d1630 perf: improve copies in PrepPoly
8941fac perf: improve copies in PutBracket, DoIfStatement, Generator, PrepPoly

File Changes (6 files <https://github.com/form-dev/form/pull/799/files>)
M sources/declare.h (12)
M sources/execute.c (4)
M sources/if.c (2)
M sources/proces.c (62)
M sources/sort.c (6)
M sources/threads.c (6)

Patch Links:
https://github.com/form-dev/form/pull/799.patch
https://github.com/form-dev/form/pull/799.diff

coveralls · 2026-02-16T06:17:49Z

coverage: 58.029% (-0.01%) from 58.043%
when pulling c5931f9 on jodavies:memmove
into c134010 on form-dev:master.

jodavies · 2026-02-26T09:11:18Z

I ran benchmarks with more samples, and I think that the use of memmove indeed doesn't lead to a measurable performance difference. I cleaned up the commits to include only the obvious wins and to use the original macros.

This results in a 6-7% improvement for Forcer, 3% for mincer-exact, and 0-1% for everything else.

tueda · 2026-02-26T12:11:30Z

Coveralls is completely down... See: https://status.coveralls.io/

jodavies · 2026-03-05T12:59:24Z

Here are benchmark numbers for a much older Intel system with 2x Xeon E5-2667 v4, running tform -w16, Ubuntu 24.04, GCC 13.3.

Benchmark	Speedup w.r.t. v5.0.0
chromatic	1.03 ± 0.01
color	1.08 ± 0.01
fmft	1.04 ± 0.02
forcer-exp	1.13 ± 0.00
forcer	1.11 ± 0.01
mass-fact	1.03 ± 0.01
mbox1l	1.01 ± 0.02
minceex	1.08 ± 0.02
mincer	1.02 ± 0.01
sort-disk	1.04 ± 0.02
sort-large	0.98 ± 0.03
sort-small	1.05 ± 0.03
trace	1.00 ± 0.05

tueda · 2026-03-11T13:14:49Z

Benchmark results for my system (Intel Core i9-12900, Ubuntu 20.04, x86_64) with tform -w8. Used /tmp instead of /dev/shm for technical reasons. Clear improvements, especially for forcer, forcer-exp and minceex.

Benchmark	Speedup	95% bootstrap CI
chromatic	1.05	[1.05, 1.05]
color	1.08	[1.07, 1.08]
fmft	1.05	[1.04, 1.05]
forcer	1.20	[1.20, 1.21]
forcer-exp	1.26	[1.26, 1.27]
mbox1l	1.03	[1.03, 1.04]
minceex	1.12	[1.11, 1.12]
mincer	1.03	[1.03, 1.03]
sort-disk	1.00	[1.00, 1.01]
sort-large	1.00	[0.99, 1.01]
sort-small	1.01	[1.01, 1.02]
trace	0.99	[0.98, 0.99]

Details

Speedup of B over A (mean) = (mean time of A) / (mean time of B)

A:

TFORM 5.0.0 (Jan 27 2026, v5.0.0)
-backtrace  +flint=3.4.0  +gmp=6.3.0   -mpi    +pthreads  +zlib=1.2.11
-debugging  +float        +mpfr=4.2.2  +posix  -windows   +zstd=1.4.4
Compiler: GCC 10.5.0
Architecture: x86_64

B:

TFORM 5.0.0 (Feb 24 2026, v5.0.0-1-gc5931f9)
-backtrace  +flint=3.4.0  +gmp=6.3.0   -mpi    +pthreads  +zlib=1.2.11
-debugging  +float        +mpfr=4.2.2  +posix  -windows   +zstd=1.4.4
Compiler: GCC 10.5.0
Architecture: x86_64

Paired runs with n = 30 per benchmark with /tmp instead of /dev/shm. Used the scripts from this snapshot. The binaries were built for the x86-64-v1 baseline.
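The speedup definition and a confidence interval of this kind can be sketched in C as follows. This is only an illustration under stated assumptions: it uses a percentile bootstrap that resamples paired run indices with replacement, which may differ from what the actual benchmark scripts do. All names here (`speedup`, `bootstrap_ci`, `next_random`) are hypothetical and written for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static double mean(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += x[i];
    return s / (double)n;
}

/* Speedup of B over A (mean) = (mean time of A) / (mean time of B). */
static double speedup(const double *a, const double *b, size_t n)
{
    return mean(a, n) / mean(b, n);
}

/* Small xorshift64 generator so that the resampling is reproducible.
 * The state must be non-zero. */
static uint64_t next_random(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

static int compare_double(const void *p, const void *q)
{
    double x = *(const double *)p, y = *(const double *)q;
    return (x > y) - (x < y);
}

/* Percentile bootstrap for the ratio of means (an assumed method).
 * Each replicate draws n paired runs with replacement and records
 * sum(A) / sum(B), which equals the ratio of the resampled means.
 * Returns 0 on success, -1 if memory allocation fails. */
static int bootstrap_ci(const double *a, const double *b, size_t n,
                        size_t replicates, uint64_t seed,
                        double *lo, double *hi)
{
    assert(n > 0 && replicates > 1 && seed != 0);
    double *r = malloc(replicates * sizeof *r);
    if (r == NULL) return -1;

    uint64_t state = seed;
    for (size_t k = 0; k < replicates; k++) {
        double sa = 0.0, sb = 0.0;
        for (size_t i = 0; i < n; i++) {
            /* The modulo bias is negligible for small n. */
            size_t j = (size_t)(next_random(&state) % n);
            sa += a[j];  /* keep runs paired: same index for A and B */
            sb += b[j];
        }
        r[k] = sa / sb;
    }

    qsort(r, replicates, sizeof *r, compare_double);
    *lo = r[(size_t)(0.025 * (double)(replicates - 1))];  /* 2.5th percentile */
    *hi = r[(size_t)(0.975 * (double)(replicates - 1))];  /* 97.5th percentile */
    free(r);
    return 0;
}
```

With `a` and `b` holding the 30 wall-clock times of the two binaries for one benchmark, `speedup(a, b, 30)` gives the point estimate and `bootstrap_ci(a, b, 30, 10000, 1, &lo, &hi)` gives an approximate 95% interval; the replicate count and seed here are arbitrary choices.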

Environment:


OS	Ubuntu 20.04.6 LTS
Kernel	Linux 5.15.0-84-generic
Architecture	x86_64
CPU	Intel Core i9-12900
CPU configuration	16 cores / 24 threads (8 P-cores + 8 E-cores)
Memory	62.6 GiB
Storage	WD_BLACK SN770 1TB NVMe SSD

perf: improve some expensive copies

c5931f9

Hoist conditionals out of some data copying loops or simplify while loop termination conditions. Use of memmove does not measurably affect performance, leave a comment about this.
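As a rough illustration of what hoisting a conditional out of a copy loop can look like, here is a minimal sketch in C. It is not taken from the FORM sources: `word_t`, `copy_checked_in_loop` and `copy_checked_hoisted` are hypothetical names, and the bounds check stands in for whatever condition the real loops tested.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical word type for this sketch; not FORM's WORD. */
typedef int word_t;

/* Before: the buffer limit is tested for every word that is copied. */
static int copy_checked_in_loop(word_t *dst, const word_t *dst_end,
                                const word_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (dst + i >= dst_end) return -1;  /* evaluated n times */
        dst[i] = src[i];
    }
    return 0;
}

/* After: the limit is tested once, and the loop body is a plain copy
 * with a simple termination condition that the compiler can optimise.
 * Note one behavioural difference: on failure nothing is copied at all,
 * whereas the version above copies the words that still fit. */
static int copy_checked_hoisted(word_t *dst, const word_t *dst_end,
                                const word_t *src, size_t n)
{
    assert(dst_end >= dst);
    if ((size_t)(dst_end - dst) < n) return -1;  /* evaluated once */
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
    return 0;
}
```

Both functions return 0 and produce the same destination contents whenever the data fits, so the hoisted form is a drop-in replacement only if callers do not rely on the partial copy made before a failure.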

jodavies force-pushed the memmove branch from 8941fac to c5931f9 on February 26, 2026 09:07

jodavies changed the title from "WIP copy optimisation" to "perf: copy optimization" on Feb 26, 2026

jodavies merged commit 2f63692 into form-dev:master Mar 10, 2026
230 of 255 checks passed

jodavies merged 1 commit into form-dev:master from jodavies:memmove
