fixed alphanumeric character mismatches in parse_observation_longcode with latest fragalysis uploads #18

Open
BenEmeryCMD wants to merge 1 commit into xchem:main from BenEmeryCMD:fix/parse_observation_longcode
@BenEmeryCMD

This PR fixes parsing of newer Fragalysis observation longcodes in Fragalysis.py.

Some newer fragalysis uploads can contain alphanumeric characters where the existing parser expects only numeric characters.

Example of the new upload format in target lb36049-14 (NXT1-NXF1):

observation = x5064b
longcode = NXT1A-x5064_B_204_B_v1

vs. the old format:

observation = x5064c
longcode = NXT1A-x5064_B_205_0_v1
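
As a minimal sketch of the idea (not the actual code in Fragalysis.py), the two longcodes above can both be matched by a pattern that allows letters as well as digits in the fourth underscore-separated field. The pattern, the function name `parse_longcode_sketch`, and the field names (`crystal`, `chain`, `residue`, `fourth`, `version`) are assumptions made for illustration; the real meaning of each field is not stated in this PR.

```python
import re

# Hypothetical pattern for illustration; the field names are guesses and
# this is not the parser from Fragalysis.py.
LONGCODE_PATTERN = re.compile(
    r"^(?P<crystal>[A-Za-z0-9]+-x\d+)"  # e.g. NXT1A-x5064
    r"_(?P<chain>[A-Za-z])"             # e.g. B
    r"_(?P<residue>\d+)"                # e.g. 204
    r"_(?P<fourth>[A-Za-z0-9]+)"        # "0" (old) or "B" (new); \d+ alone rejects "B"
    r"_v(?P<version>\d+)$"              # e.g. v1
)


def parse_longcode_sketch(longcode: str) -> dict[str, str]:
    """Split a longcode into named string fields, or raise ValueError."""
    match = LONGCODE_PATTERN.match(longcode)
    if match is None:
        raise ValueError(f"unsupported longcode: {longcode}")
    return match.groupdict()


new = parse_longcode_sketch("NXT1A-x5064_B_204_B_v1")
old = parse_longcode_sketch("NXT1A-x5064_B_205_0_v1")
```

Here `new["fourth"]` is `"B"` and `old["fourth"]` is `"0"`, so one pattern covers both formats. Fields are kept as strings rather than converted to integers, since a numeric conversion of the fourth field would fail on the new format.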

cvallee self-requested a review June 29, 2026 11:45
@cvallee

cvallee commented Jul 3, 2026

Collaborator

Hi @BenEmeryCMD, I've tested your changes locally. When running add_hits with xchem-hippo installed from pip, I get the following error:

UnsupportedFragalysisLongcodeError: NXT1A-x5155_B_303_0_1_NXT1A-x5194+B+303+C+1__LIG

Then, when running add_hits from your branch, it gets past that error, which means your fix worked. However, I then get a SanitisationError:

│   155 │   if mol:                                                                                │
│   156 │   │   smiles = MolToSmiles(mol, True)                                                    │
│   157 │   elif sanitisation_failed == "error":                                                   │
│ ❱ 158 │   │   raise SanitisationError                                                            │
│   159 │   elif sanitisation_failed == "warning":                                                 │
│   160 │   │   mrich.warning(f"sanitisation failed for {smiles=}")                                │
│   161                                                                                            │
│                                                                                                  │
│ ╭──────────────────────────── locals ─────────────────────────────╮                              │
│ │                 key = '[STB]'                                   │                              │
│ │                 mol = None                                      │                              │
│ │         orig_smiles = 'O=C([O-])C1CCN([S@TB2](=O)(=O)C2CC2)CC1' │                              │
│ │             radical = 'error'                                   │                              │
│ │                   s = 'O=C([O-])C1CCN([S@TB2](=O)(=O)C2CC2)CC1' │                              │
│ │ sanitisation_failed = 'error'                                   │                              │
│ │              smiles = 'O=C([O-])C1CCN([STB2](=O)(=O)C2CC2)CC1'  │                              │
│ │       stereo_smiles = 'O=C([O-])C1CCN([S@TB2](=O)(=O)C2CC2)CC1' │                              │
│ │           verbosity = False                                     │                              │
│ ╰─────────────────────────────────────────────────────────────────╯                              │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
SanitisationError

The issue seems to come from the TB2 in the SMILES of a fragment with an SO2 group.
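
One possible reading of the locals in the traceback, not confirmed anywhere in this thread: `orig_smiles` contains `[S@TB2]`, while `smiles` contains `[STB2]`, which is what you get if only the `@` character is removed. In OpenSMILES, `@TB2` is a single chirality tag (trigonal bipyramidal class 2), so the whole tag would need to go. The sketch below uses only the standard library; `CHIRALITY_TAG` and `strip_chirality_sketch` are hypothetical names and are not part of xchem-hippo or RDKit.

```python
import re

# Chirality tags inside SMILES bracket atoms: "@", "@@", or "@" followed by a
# class such as TH1, AL2, SP3, TB2 or OH30 (per the OpenSMILES specification).
CHIRALITY_TAG = re.compile(r"@(?:@|(?:TH|AL|SP|TB|OH)\d{1,2})?")


def strip_chirality_sketch(smiles: str) -> str:
    """Remove whole chirality tags, not just the '@' character."""
    return CHIRALITY_TAG.sub("", smiles)


orig_smiles = "O=C([O-])C1CCN([S@TB2](=O)(=O)C2CC2)CC1"

# Removing only "@" reproduces the `smiles` value shown in the locals.
naive = orig_smiles.replace("@", "")

# Removing the whole tag leaves a plain bracket sulfur.
stripped = strip_chirality_sketch(orig_smiles)
```

With this input, `naive` is `O=C([O-])C1CCN([STB2](=O)(=O)C2CC2)CC1` and `stripped` is `O=C([O-])C1CCN([S](=O)(=O)C2CC2)CC1`. Whether the stripped form then sanitises in add_hits would still need to be checked against the real code path.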


2 participants