Hi, I read your paper, and I’m interested in your dataset.
However, I have run into two issues.
First, I cannot find some commit hashes in the repositories.
Specifically, I cannot find #009022987 (line 23 in ./ACCUMULO.csv) in the accumulo repository (https://github.com/apache/accumulo) or #187838022 (line 24 in ./LUCENE.csv) in the lucene-solr repository (https://github.com/apache/lucene-solr).
I think #9022987e0 and #0187838022 are the correct commit hashes for accumulo and lucene-solr, respectively.
In addition, I cannot find the commit hashes listed in ./JCR.csv in the jackrabbit repository (https://github.com/apache/jackrabbit).
Second, I found that #70a5ffe4 is duplicated in OOZIE.csv (line 17).
Could you check these issues?