Conversation
2e82c92 to
9d8ca64
Compare
kartograf/rpki/fetch.py
Outdated
| "-n", | ||
| "-d", | ||
| context.data_dir_rpki_cache, | ||
| "-p 16", |
There was a problem hiding this comment.
should derive thread count from OS
There was a problem hiding this comment.
You could look at how core selects the number of script check worker threads: https://github.com/bitcoin/bitcoin/blob/master/src/node/chainstatemanager_args.cpp#L55
| continue | ||
|
|
||
| # We are only interested in ROAs | ||
| if roa['type'] != "roa": |
There was a problem hiding this comment.
they're all ROAs, iiuc
| continue | ||
|
|
||
| # We are only interested in valid ROAs | ||
| if roa['validation'] != "OK": |
There was a problem hiding this comment.
only valid ROAs are output
kartograf/rpki/fetch.py
Outdated
| tal_options = [item for path in data_tals(context) for item in ('-t', path)] | ||
|
|
||
| cpu_count = os.cpu_count() | ||
| threads = cpu_count if cpu_count else 4 |
There was a problem hiding this comment.
Since #98 is merged now, please put this threads count stuff into some util function that is shared between the two different places.
4d6548e to
baa764b
Compare
|
Looking at the CI reproduction failure, this is what I was somewhat concerned about using (the six outliers, here the "older" file is the master branch file) All the ROAs are valid and so it's not that we're issuing bad ROAs, but this big of a change between two runs is difficult to justify, especially since our current heuristics are more precise than the ones in this branch. |
|
Yeah, good point @jurraca . I've been thinking about this again and when I see the run times on my machine currently, it's already very fast, like just over 10min on my machine usually. Squeezing out another 1-2min would still be great but I agree with you that I am not sure it might not be worth the hassle with the versioning and the (perceived) lower quality of the data due to the missing information. I would suggest we put this on pause for the moment and get back to this when either we have another, even better reason to introduce a breaking change or we find a solution that gives us a more consistent result (however that might happen). I am interested in going deeper into the internals of rpki-client and understand what is happening there but I won't be getting to it before January. Does that sound good to you? I am also not so keen on going through the breaking change at the same time as the chances of getting the embedding done are increasing. Any additional confusion about the data and our process might be detrimental if we continue to make progress in the same way we did these last few weeks. But if we have this progress we might get the embedding soon and then we can revisit this when the process is more established and more people can weigh in on the trade-offs here... |
|
Agreed on all points. Would rather focus on stability now, and keep this branch parked until we're ready to revisit. |
Use the
-pthreads flag fromrpki-client9.6.This is a draft of a PR that is a breaking change, since previous versions of
rpki-clientdo not have this flag, and this should not be merged until we have a strategy for that.The changes simplify the code, but we lose information about
valid_untilandvalid_sincewhich was previously used to select the most likely useful announcement. Thus our "smallest ASN wins" heuristic is doing work than previously.On average I get a runtime for parsing of about 45 seconds compared to 3-4 minutes previously.
@fjahr