make ANTLR3 produce Reproducible output #209
Signed-off-by: Hervé Boutemy <hboutemy@apache.org>
Signed-off-by: Hervé Boutemy <hboutemy@apache.org>
e8866a1 to 75e5e87
Is removing the timestamp going to break any code that relies on it? I agree it was a dumb idea, but I'm afraid to change it now. haha
Perhaps adding an option that lets users choose whether they want this timestamp is a better choice (like JAXB that provides
I'm generally opposed to options, I'm afraid. In this case it's a fairly heavy change just to get rid of this date output, which in retrospect was definitely a mistake on my part. I agree that the output should be reproducible, but I'm not sure risking backward compatibility is worth it. I do know that some companies simply remove that line using their build tools. Is this possible with Maven?
Yes, we do it with maven-replacer-plugin.
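For projects that post-process instead, the substitution such a replacer step performs can be sketched in plain Java. This assumes the generated header line has the form `// $ANTLR <version> <grammar file> yyyy-MM-dd HH:mm:ss`; check that against your own generated files. The class `AntlrHeaderNormalizer` is hypothetical and is not part of ANTLR or of maven-replacer-plugin.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical post-processing helper (not part of ANTLR or maven-replacer-plugin).
 * Drops the trailing timestamp from an assumed header line of the form
 * "// $ANTLR <version> <grammar file> yyyy-MM-dd HH:mm:ss".
 */
public final class AntlrHeaderNormalizer {

    // Group 1 keeps everything before the timestamp; [ \t]+ avoids matching across lines.
    private static final Pattern HEADER_TIMESTAMP = Pattern.compile(
            "^(// \\$ANTLR .*?)[ \\t]+\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}$",
            Pattern.MULTILINE);

    private AntlrHeaderNormalizer() {}

    /** Returns the source with the header timestamp removed; all other text is unchanged. */
    public static String stripTimestamp(String source) {
        return HEADER_TIMESTAMP.matcher(source).replaceAll("$1");
    }
}
```

For example, `stripTimestamp("// $ANTLR 3.5.2 T.g 2019-01-01 12:34:56\npackage demo;\n")` yields the same text with the header reduced to `// $ANTLR 3.5.2 T.g`. The drawback is the one raised in this thread: every consumer has to carry this extra build step.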
One question: there are 2 sources of non-reproducible bits:
fixed by the sorting in the commit on file tool/src/main/antlr3/org/antlr/grammar/v3/CodeGenTreeWalker.g. Is it possible to fix the reproducibility issue for the elements, please? This would reduce the places where we need to post-process.
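The kind of ordering fix discussed here can be illustrated generically: either sort a copy right before the elements are emitted, or give the collection a sorted static type (the "distinguishing static type" idea raised later in this thread). This is a minimal sketch of the technique, not the change made to CodeGenTreeWalker.g; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/**
 * Hypothetical illustration (not the code from the commit): HashSet and HashMap make no
 * iteration-order guarantee, so emitting their contents directly can reorder the
 * generated code, e.g. when elements use identity hash codes or the JDK changes.
 */
public final class DeterministicEmit {

    private DeterministicEmit() {}

    /** Option 1: copy and sort just before the elements are handed to a template. */
    public static List<String> sortedForEmission(Collection<String> names) {
        List<String> copy = new ArrayList<>(names);
        Collections.sort(copy); // natural String order: identical on every run
        return copy;
    }

    /**
     * Option 2: put the ordering in the declared type, so code expecting a
     * SortedSet cannot silently receive a HashSet later.
     */
    public static SortedSet<String> collectSorted(Iterable<String> names) {
        SortedSet<String> result = new TreeSet<>();
        for (String name : names) {
            result.add(name); // TreeSet keeps elements sorted and drops duplicates
        }
        return result;
    }
}
```

For instance, `sortedForEmission(new HashSet<>(List.of("ruleB", "ruleA")))` returns `[ruleA, ruleB]` no matter how the set happens to iterate. Option 1 is a local fix at each emission point; option 2 moves the guarantee into the type system.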
Instead of dropping the timestamp entirely, ANTLR could be updated to support SOURCE_DATE_EPOCH for the build timestamp. This should leave any existing use cases for the build stamp undisturbed, while making it easy for users to opt in to a deterministic mode using this standard. I have also observed non-determinism in the order of methods in the "Delegated rules" section of the generated parser. Other hazards I found while looking through the templates and the code:
My audit is far from complete, though. Things to ponder: one could replace all HashSets and HashMaps with their Linked versions, but that seems like a rather heavy hammer. It would also be nice if the collections that need deterministic iteration had a distinguishing static type, to avoid reintroducing problems.
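The SOURCE_DATE_EPOCH suggestion could look roughly like this: when the variable is set, interpret it as seconds since the Unix epoch and format it in UTC; when it is unset, keep the current local-time behaviour. `BuildTimestamp` is a hypothetical class, not ANTLR's API, and the `yyyy-MM-dd HH:mm:ss` pattern is an assumption about the existing header format. Rejecting a malformed value follows the recommendation of the reproducible-builds.org specification.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/**
 * Hypothetical helper (not ANTLR API) choosing the timestamp for the generated-file
 * header according to the SOURCE_DATE_EPOCH convention.
 */
public final class BuildTimestamp {

    // Assumed to match the existing header format.
    private static final DateTimeFormatter PATTERN =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    private BuildTimestamp() {}

    /**
     * If sourceDateEpoch is unset, formats {@code now} in {@code localZone} (today's behaviour).
     * Otherwise parses it as seconds since 1970-01-01T00:00:00Z and formats it in UTC.
     */
    public static String headerTimestamp(String sourceDateEpoch, Instant now, ZoneId localZone) {
        if (sourceDateEpoch == null || sourceDateEpoch.isEmpty()) {
            return PATTERN.withZone(localZone).format(now);
        }
        long seconds;
        try {
            seconds = Long.parseLong(sourceDateEpoch);
        } catch (NumberFormatException e) {
            // The reproducible-builds spec recommends failing on a malformed value.
            throw new IllegalArgumentException("malformed SOURCE_DATE_EPOCH: " + sourceDateEpoch, e);
        }
        return PATTERN.withZone(ZoneOffset.UTC).format(Instant.ofEpochSecond(seconds));
    }

    /** Convenience overload reading the real environment, clock and time zone. */
    public static String headerTimestamp() {
        return headerTimestamp(System.getenv("SOURCE_DATE_EPOCH"), Instant.now(), ZoneId.systemDefault());
    }
}
```

With `SOURCE_DATE_EPOCH=0` exported, `headerTimestamp()` returns `1970-01-01 00:00:00` regardless of the build machine's time zone, while builds that do not set the variable keep producing a wall-clock stamp.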
It seems I just duplicated your work in #228 |
As found while rebuilding projects that use ANTLR3 (including ANTLR3 itself), there are non-reproducible outputs at 2 levels:
This PR fixes the 2 issues in 2 separate commits: