[herd] Global count of useful hardware update#1733
Open
Conversation
c9f7855 to
bbcb779
Compare
bbcb779 to
f760a93
Compare
f760a93 to
172203a
Compare
relokin
reviewed
Mar 31, 2026
relokin
reviewed
Mar 31, 2026
The function [delay_kont] can be used to extract the value returned
by a monad (second argument below, type `'a`).
```
val delay_kont : string -> 'a t -> ('a -> 'a t -> 'b t) -> 'b t
```
The continuation function `(fun v mv -> ... )` can then examine
the returned value `v` and combine the monad `mv` independently,
which proves very convenient in many occasion.
The change performed by this commit permits affine (_i.e._ one or zero
effective occurrence), while linear usage (exactly one occurrence) was
mandatory before.
+ Efficient group function: sort, then group.
+ Suffix based generators:
- generate all suffixes,
- cross product of suffixes.
When TTHM=HA or TTHM=HD are active, HW update of the AF flag is performed. This include the so-called "spurious" updates that are performed independently of test code. For efficiency reason we limit the number of such spurious updates to what is necessary. We do so by a global scan of the execution candidates counting the writes that may unset the AF flag in the final set of effects. Notice that we also consider the initial writes in this scan. We perform one optimisation: by exception, when a write effect value is the same as the value read by the same instruction from the same location, there is no need to add a supplementary spurious update as the (potential) update associated to the write that stored the value has already been counted and is sufficient.
172203a to
5247305
Compare
maranget
added a commit
that referenced
this pull request
Apr 1, 2026
As introduced by PR #1733 each (potential) spurious update of the AF flag is "motivated" by a write that can unset this AF flag. This PR reinforces the link by adding an equation that equates the value written and the value read by the update, resulting in better performance in some cases.
maranget
added a commit
that referenced
this pull request
Apr 1, 2026
As introduced by PR #1733 each (potential) spurious update of the AF flag is "motivated" by a write that can unset this AF flag. This PR reinforces the link by adding an equation that equates the value written and the value read by the update, resulting in better performance in some cases.
Member
Author
|
Hi @relokin and @HadrienRenaud, do I have your approval for merging? |
relokin
approved these changes
Apr 2, 2026
Member
relokin
left a comment
There was a problem hiding this comment.
This PR looks good to me, thanks Luc!
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR is an evolution of PR #1730. The number of useful hardware updates is evaluated by a global scan of events at the very end of execution candidate construction. The scan counts the number of explicit write effects to locations that can be page table entries of values whose AF flag can be zero.
The technique sounds more general and should yield a better upper bound than the local technique of PR #1730.
Additionally , we perform the simple identification of the case of a failing CAS that nevertheless writes the value read from memory into memory again. In such a case, there is no need to count another hardware update.