Skip to content

use -p threads flag#97

Draft
jurraca wants to merge 4 commits intoasmap:masterfrom
jurraca:threads-flag
Draft

use -p threads flag#97
jurraca wants to merge 4 commits intoasmap:masterfrom
jurraca:threads-flag

Conversation

@jurraca
Copy link
Collaborator

@jurraca jurraca commented Oct 21, 2025

Use the -p threads flag from rpki-client 9.6.

This is a draft of a PR that is a breaking change, since previous versions of rpki-client do not have this flag, and this should not be merged until we have a strategy for that.

The changes simplify the code, but we lose information about valid_until and valid_since which was previously used to select the most likely useful announcement. Thus our "smallest ASN wins" heuristic is doing work than previously.

On average I get a runtime for parsing of about 45 seconds compared to 3-4 minutes previously.

@fjahr

@jurraca jurraca force-pushed the threads-flag branch 2 times, most recently from 2e82c92 to 9d8ca64 Compare October 21, 2025 16:35
"-n",
"-d",
context.data_dir_rpki_cache,
"-p 16",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should derive thread count from OS

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could look at how core selects the number of script check worker threads: https://github.com/bitcoin/bitcoin/blob/master/src/node/chainstatemanager_args.cpp#L55

continue

# We are only interested in ROAs
if roa['type'] != "roa":
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they're all ROAs, iiuc

continue

# We are only interested in valid ROAs
if roa['validation'] != "OK":
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only valid ROAs are output

tal_options = [item for path in data_tals(context) for item in ('-t', path)]

cpu_count = os.cpu_count()
threads = cpu_count if cpu_count else 4
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since #98 is merged now, please put this threads count stuff into some util function that is shared between the two different places.

@jurraca jurraca force-pushed the threads-flag branch 2 times, most recently from 4d6548e to baa764b Compare November 23, 2025 14:39
@jurraca jurraca marked this pull request as ready for review December 2, 2025 13:25
Removes test clauses which are no longer relevant,
and test cases which no longer apply.
@jurraca
Copy link
Collaborator Author

jurraca commented Dec 2, 2025

Looking at the CI reproduction failure, this is what I was somewhat concerned about using expiry instead of valid_since and valid_until checks.
The diff between running the CI input data with this branch and master is 22185, of which 6 are definitely not due to the "lower ASN wins" heuristic. But the rest were originally (master) assigned differently due to the valid_until or valid_since heuristic, or the "lower ASN wins" heuristic.

(the six outliers, here the "older" file is the master branch file)

27.124.87.0/24 AS41239 # was AS9341
27.124.88.0/24 AS41239 # was AS9341
103.162.141.0/24 AS141668 # was AS141083
103.166.234.0/24 AS141963 # was AS140407
119.2.64.0/19 AS131704 # was AS17450
202.47.66.0/24 AS150228 # was AS138106

All the ROAs are valid and so it's not that we're issuing bad ROAs, but this big of a change between two runs is difficult to justify, especially since our current heuristics are more precise than the ones in this branch.

@jurraca jurraca marked this pull request as draft December 2, 2025 16:19
@fjahr
Copy link
Collaborator

fjahr commented Dec 3, 2025

Yeah, good point @jurraca . I've been thinking about this again and when I see the run times on my machine currently, it's already very fast, like just over 10min on my machine usually. Squeezing out another 1-2min would still be great but I agree with you that I am not sure it might not be worth the hassle with the versioning and the (perceived) lower quality of the data due to the missing information. I would suggest we put this on pause for the moment and get back to this when either we have another, even better reason to introduce a breaking change or we find a solution that gives us a more consistent result (however that might happen). I am interested in going deeper into the internals of rpki-client and understand what is happening there but I won't be getting to it before January.

Does that sound good to you? I am also not so keen on going through the breaking change at the same time as the chances of getting the embedding done are increasing. Any additional confusion about the data and our process might be detrimental if we continue to make progress in the same way we did these last few weeks. But if we have this progress we might get the embedding soon and then we can revisit this when the process is more established and more people can weigh in on the trade-offs here...

@jurraca
Copy link
Collaborator Author

jurraca commented Dec 3, 2025

Agreed on all points. Would rather focus on stability now, and keep this branch parked until we're ready to revisit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants