
Open Access Version Guesser #1969

Open
Alex Kiessling (ajkiessl) wants to merge 27 commits into main from 1950-version-guesser

Conversation

Alex Kiessling (ajkiessl), Collaborator, commented May 12, 2026

closes #1950

Implements the open access version guessing feature from RMD. The implementation differs somewhat, but the functionality is essentially the same. The most notable (though still minor) difference involves journal matching: RMD treats a journal match in the document metadata as a published-version signal, but ScholarSphere doesn't store journal info. We do store publisher info, so I'm matching on publisher instead.
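A minimal sketch of what a publisher-based signal could look like, assuming the PDF text or metadata has already been extracted to a string. `PublisherSignal` and `publisher_match?` are hypothetical names for illustration, not the classes in this PR:

```ruby
# Hypothetical helper: ScholarSphere has no journal field, so this
# published-version signal is derived from the stored publisher instead.
module PublisherSignal
  # Returns true when the publisher name appears in the extracted text
  # (case-insensitive, with runs of whitespace collapsed).
  def self.publisher_match?(publisher, document_text)
    return false if publisher.nil? || publisher.strip.empty?
    return false if document_text.nil?

    normalize = ->(s) { s.downcase.gsub(/\s+/, ' ').strip }
    normalize.call(document_text).include?(normalize.call(publisher))
  end
end

PublisherSignal.publisher_match?('Elsevier', 'Copyright 2024 Elsevier B.V. All rights reserved.') # => true
```

Simple substring matching keeps the sketch short; a real implementation might need fuzzier matching for publisher name variants.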

I also added `unknown` as an open access version value. It is stored when the version cannot be determined, and it doesn't need to be an option users see.
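Roughly, the idea is a stored value set that includes `unknown` plus a user-facing subset that excludes it. The module name and the `acceptedVersion`/`publishedVersion` strings below are assumptions for illustration, not necessarily ScholarSphere's actual values:

```ruby
# Hypothetical value set; the real model constants may differ.
module OpenAccessVersionValues
  # Every value the version guesser is allowed to store.
  ALL = %w[acceptedVersion publishedVersion unknown].freeze

  # `unknown` is stored when no version can be determined, but it is
  # not offered to users as a selectable option.
  USER_SELECTABLE = (ALL - %w[unknown]).freeze

  def self.valid?(value)
    ALL.include?(value)
  end
end
```

Keeping `unknown` valid for storage while filtering it out of form options means a failed guess can be persisted without leaking into the UI.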

…classes and pulling the score calculator rules from `config/open_access_version_guessing_rules.csv`. Adds tests for the version guesser class. Tests for the score calculator are still needed.
…ssVersionScoreCalculator. Adds fixtures for real score calculation and tests as many permutations of the score calculator as possible.
… with arXiv watermark.
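For reviewers unfamiliar with the CSV-driven approach, here is a hypothetical sketch of loading rules and summing the scores of matched phrases. The `pattern`/`score` columns and the `RuleScoreCalculator` class are assumptions; the real rules file and score calculator in this PR may be structured differently.

```ruby
require 'csv'

# Hypothetical CSV-driven score calculator. The `pattern` and `score`
# column names are assumptions, not the real rules file layout.
class RuleScoreCalculator
  Rule = Struct.new(:pattern, :score)

  # Builds rules from CSV text with a header row. Each pattern is treated
  # as a literal, case-insensitive phrase.
  def self.from_csv(csv_text)
    rules = CSV.parse(csv_text, headers: true).map do |row|
      pattern = Regexp.new(Regexp.escape(row['pattern']), Regexp::IGNORECASE)
      Rule.new(pattern, Integer(row['score']))
    end
    new(rules)
  end

  def initialize(rules)
    @rules = rules
  end

  # Sums the scores of every rule whose phrase appears in the text.
  # Returns nil when no rule matches, i.e. no score could be calculated.
  def score(text)
    matched = @rules.select { |rule| rule.pattern.match?(text) }
    return nil if matched.empty?

    matched.sum(&:score)
  end
end

rules_csv = <<~RULES
  pattern,score
  accepted manuscript,-2
  all rights reserved,2
RULES

calculator = RuleScoreCalculator.from_csv(rules_csv)
calculator.score('This is the Accepted Manuscript.') # => -2
```

Keeping the rules in a CSV means the weights can be tuned without code changes; returning `nil` rather than `0` lets callers distinguish "no evidence" from "evidence that cancels out".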

- Removes the docx fixture file. There is no need to test a real fixture for non-PDF blocking.
- Modularized into an `OpenAccessVersion` module.
- Started adding ExifTool version-checking code. Includes the `exiftool_vendored` gem.
… so this module is cleaner. Updates tests. Adds `pdf?` and `docx?` methods to `FileResource`.
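A minimal sketch of the `pdf?`/`docx?` predicates, assuming the file's MIME type is available as a string. `FileResourceSketch` and its `mime_type` attribute are hypothetical stand-ins for `FileResource`, whose real attributes may differ:

```ruby
# Hypothetical stand-in for FileResource; the `mime_type` attribute is an assumption.
class FileResourceSketch
  PDF_MIME = 'application/pdf'
  DOCX_MIME = 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'

  attr_reader :mime_type

  def initialize(mime_type:)
    @mime_type = mime_type
  end

  # True when the file is a PDF.
  def pdf?
    mime_type == PDF_MIME
  end

  # True when the file is a Word .docx document.
  def docx?
    mime_type == DOCX_MIME
  end
end
```

A guesser could then skip non-PDF uploads early (e.g. `return unless file.pdf?`), which is the kind of non-PDF blocking mentioned above.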
Alex Kiessling (ajkiessl) changed the title from "WIP: Open Access Version Guesser" to "Open Access Version Guesser" on May 18, 2026
Alex Kiessling (ajkiessl) marked this pull request as ready for review on May 18, 2026 at 19:00
…nknown if EXIF checking finds LaTeX indicators or if a version score cannot be calculated. Fixes the version guesser tests and any other tests that now need to check for the `unknown` value.
… works better with some of the gems being used. Version guessing now works end-to-end; what remains is cleanup and test fixes.
… version is an open access upload. Fixes tests so they no longer pass an IO file object during version guessing; they now use `Tempfile`.
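The fallback behavior described in these commits can be sketched as a small decision function. Everything here is an assumption for illustration: the module name, the `Creator`/`Producer` keys (as ExifTool might report them for a PDF), the LaTeX pattern, and the sign convention (positive score means published, negative means accepted, and a zero tie is treated as `unknown`).

```ruby
# Hypothetical decision step; names, keys, and sign convention are assumptions.
module VersionDecisionSketch
  # Tool names that suggest the PDF was produced by a TeX toolchain.
  LATEX_INDICATORS = /latex|pdftex|xetex|luatex/i

  # metadata: Hash of EXIF-style tags, e.g. { 'Producer' => 'pdfTeX-1.40.21' }
  # score:    Integer, or nil when no score could be calculated
  def self.decide(metadata, score)
    tool_fields = metadata.values_at('Creator', 'Producer').compact
    # LaTeX output looks the same whether accepted or published, so give up.
    return 'unknown' if tool_fields.any? { |value| value.to_s.match?(LATEX_INDICATORS) }
    # No score, or a tie, is not enough evidence either way.
    return 'unknown' if score.nil? || score.zero?

    score.positive? ? 'publishedVersion' : 'acceptedVersion'
  end
end

VersionDecisionSketch.decide({ 'Producer' => 'pdfTeX-1.40.21' }, 5) # => "unknown"
```

Checking the LaTeX indicators before the score means a strong score cannot override the metadata check, which matches the "unknown if EXIF checking finds LaTeX indicators" wording.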

Development

Successfully merging this pull request may close these issues.

Implement article version guesser
