Skip to content

perf: make max and min dollar variables thread-local#807

Open
jodavies wants to merge 1 commit intoform-dev:masterfrom
jodavies:dollars-min-max
Open

perf: make max and min dollar variables thread-local#807
jodavies wants to merge 1 commit intoform-dev:masterfrom
jodavies:dollars-min-max

Conversation

@jodavies
Copy link
Collaborator

Use thread-local dollar variables for ModuleOption minimum and maximum dollars. With this change they work exactly like ModuleOption local, except at the end of the module the final value is computed and put into the global dollar value.


For some "real-world" tests which use ModuleOption minimum/maximum there is reasonable speedup. It is more obvious at higher worker counts, but not if there are too many workers for any real scaling. The "mincer-exact" test seems to have a small regression; for now I am not sure why. "mincer" really likes this change...

Benchmark Speedup w.r.t. master, -w24
fmft 1.00 ± 0.01
forcer 1.01 ± 0.00
forcer-exp 1.02 ± 0.01
minceex 0.99 ± 0.00
mincer 1.06 ± 0.00
(Ryzen 7900X)
Benchmark Speedup w.r.t. master, -w32 Speedup w.r.t. master, -w64
fmft 1.04 ± 0.01 1.06 ± 0.01
forcer 1.00 ± 0.01 1.00 ± 0.01
forcer-exp 1.06 ± 0.01 1.02 ± 0.01
minceex 0.98 ± 0.01 0.97 ± 0.02
mincer 1.17 ± 0.01 1.24 ± 0.01
(EPYC 9845)

In the following benchmark which is targeted specifically at the dollar-variable locks, the speedup is much larger:

Workers tform-master (s) tform-test (s) Speedup
-w8 34.473 ± 0.656 16.622 ± 0.168 2.07 ± 0.04
-w16 27.463 ± 0.618 13.096 ± 0.140 2.10 ± 0.05
-w32 23.740 ± 0.206 10.623 ± 0.115 2.23 ± 0.03
-w64 21.563 ± 0.186 9.673 ± 0.150 2.23 ± 0.04
(EPYC 9845)
#-

#: TermsInSmall 2M

Symbol x,i;
CFunction sum;

#define N "{32*8}"
#define BLOCK "200000"

Local test =
	#do i = 0,{`N'-1}
		+ sum(i,`i'*`BLOCK',{`i'+1}*`BLOCK'-1,x^i)
	#enddo
	;
.sort
Identify sum(?a) = sum_(?a);
.sort

#$min = `N'*`BLOCK';
#$max = -1;

If (Count(x,1) > $max) $max = count_(x,1);
If (Count(x,1) < $min) $min = count_(x,1);

ModuleOption maximum $max;
ModuleOption minimum $min;
.sort

#message max: `$max'
#message min: `$min'
.end

Use thread-local dollar variables for ModuleOption minimum and maximum dollars.
With this change they work exactly like ModuleOption local, except at the end
of the module the final value is computed and put into the global dollar value.
@coveralls
Copy link

Coverage Status

coverage: 58.067% (+0.02%) from 58.048%
when pulling 33335ca on jodavies:dollars-min-max
into bf24cec on form-dev:master.

@jodavies
Copy link
Collaborator Author

This also fixes/works around a regression due to #797 , which I found when trying to add HyperFORM tests to the CI.

The problem is, that now we lock the min/max doillars over the larger region, in case they themselves appear on a RHS. This can cause deadlocks, in case of $dol = max_($dol,5); and similar constructions. AssignDollar then calls Generator, then InFunction, which tries to acquire its own lock on the dollar, which is already held.

With this PR each thread has its own dollar, and we don't lock in that region, avoiding the issue. Possible a problem is still present for MODSUM.

The "proper fix" probably would be to use a recursive pthreads mutex, so that a thread can call that lock for a second time.

@jodavies
Copy link
Collaborator Author

Using recursive locks for dollar variables (on the master branch) is about a 1% performance regression.

This PR of course means we don't need them for MODMAX and MODMIN, but I think MODSUM can still deadlock (and always could, but I think people don't really use them...?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants