Merged
75 commits
cf26f9f
update readme 📝
David-Araripe Oct 2, 2024
26a13ad
First commit to uniprotKB api implementation
David-Araripe Nov 7, 2024
aad954b
start implementing base classes for the general queries
David-Araripe Nov 7, 2024
c79c359
remove unused import
David-Araripe Nov 8, 2024
e1a3fb0
add docstrings to python-wrapped UniProtKB fields
David-Araripe Nov 8, 2024
9f835ac
allow for int inputs for some of the fileds & add more info to module…
David-Araripe Nov 8, 2024
7eb46c8
add spacing to fix docstring display formatting
David-Araripe Nov 8, 2024
15b6223
move methods from interface to idmapping api - new api requires a dif…
David-Araripe Nov 8, 2024
4c9165f
refactor uniprot fields to lowercase to avoid confusion
David-Araripe Nov 8, 2024
996fb64
set docstrings examples to lowercase
David-Araripe Nov 8, 2024
6f97660
bugfix - xrefcount doesn't work; change to xref_count instead
David-Araripe Nov 8, 2024
dc9e33d
Add working version of the new uniprot kb api wrapper
David-Araripe Nov 8, 2024
25e55d6
add docstrings to uniprot_kb_fields __init__ methods
David-Araripe Nov 12, 2024
546ffa0
get method with progress bar & add tqdm as dependency
David-Araripe Nov 12, 2024
e331258
Merge pull request #3 from David-Araripe/feature/general_queries
David-Araripe Nov 12, 2024
a1bd5c2
- rm utf8 comment from modules
David-Araripe Nov 12, 2024
fbfc556
rename testing module for the id mapping api
David-Araripe Nov 12, 2024
9af303d
add tests for the uniprotKB fields & and the QueryBuilder class
David-Araripe Nov 12, 2024
134d007
fix QueryBuilder class to pass unit tests
David-Araripe Nov 12, 2024
aed5261
rename test class for consistency
David-Araripe Nov 12, 2024
909b062
add tests to the ProtKB class
David-Araripe Nov 12, 2024
d6e78ee
Change field "Organism ID" to (ID) for consistency with Uniprot retri…
David-Araripe Nov 12, 2024
bc7c7b8
add docstrings to the utils.read_fields_table method
David-Araripe Nov 12, 2024
61b43ef
deprecate file formats in favor of only using tsv
David-Araripe Nov 15, 2024
32e6406
remove test on parsing results from different formats
David-Araripe Nov 15, 2024
701d882
Merge pull request #4 from David-Araripe/refactor/tests
David-Araripe Jan 8, 2025
1ff0c74
remove __call__ method from idmapping_api for consistency with the ot…
David-Araripe Jan 8, 2025
092ad50
update readme
David-Araripe Jan 8, 2025
d940bc4
docs: initialize sphinx documentation structure
David-Araripe Jan 8, 2025
001891f
docs: add make files and requirements for documentation
David-Araripe Jan 8, 2025
41a6a9e
ci: add documentation build and deploy workflow
David-Araripe Jan 8, 2025
9026334
fix: correct Makefile indentation and reorganize docs files
David-Araripe Jan 8, 2025
d862602
refactor: move GitHub workflow to docs directory
David-Araripe Jan 8, 2025
acfdc6b
docs: remove base classes from API documentation
David-Araripe Jan 8, 2025
da70f1d
docs: add return fields reference documentation
David-Araripe Jan 8, 2025
678af92
add backticks on "|" to correct docs formatting
David-Araripe Jan 8, 2025
eeed00b
docs: add fields reference documentation
David-Araripe Jan 8, 2025
4a8d35d
docs: update index to include field reference
David-Araripe Jan 8, 2025
3d9ee97
docs: update conf.py to handle CSV file
David-Araripe Jan 8, 2025
e7c9ff5
docs: fix warnings coming from Sphinx
David-Araripe Jan 8, 2025
321a7f8
fix sphinx warnings and add table with the supported return fields
David-Araripe Jan 8, 2025
e4304b7
change "Handling Failed Mappings"
David-Araripe Jan 8, 2025
5263d9e
docs: update field querying example
David-Araripe Jan 8, 2025
b75accb
Merge pull request #6 from David-Araripe/create/docs
David-Araripe Jan 8, 2025
5dd3d39
fix: correct document deployment to avoid recursive copying
David-Araripe Jan 8, 2025
61191cc
fix: update documentation workflow to prevent recursive copying
David-Araripe Jan 8, 2025
78b8e83
feat: add documentation build script
David-Araripe Jan 8, 2025
bcb06c3
refactor: simplify documentation deployment workflow
David-Araripe Jan 8, 2025
d7d03c5
chore: standardize documentation workflow location
David-Araripe Jan 8, 2025
f5ccb7b
remove duplicated workflow
David-Araripe Jan 8, 2025
4f4a870
remove build docs.sh
David-Araripe Jan 8, 2025
c92cf60
update readme title
David-Araripe Jan 8, 2025
0bc8a87
docs: enhance sphinx theme configuration
David-Araripe Jan 8, 2025
11d7da0
ci: ensure sphinx theme is installed during build
David-Araripe Jan 8, 2025
7239648
docs: simplify and improve sphinx configuration
David-Araripe Jan 8, 2025
21d0107
ci: update documentation workflow with correct base URLs
David-Araripe Jan 8, 2025
dcefd65
test new gh action for docs build
David-Araripe Jan 8, 2025
4455639
docs: update make file & add debugging option to build action
David-Araripe Jan 8, 2025
30912fa
fix .nojekyll placement on the development branch
David-Araripe Jan 9, 2025
94e069c
Update CI - test python versions 3.7 - 3.13 on Ubuntu and 3.11 on Win…
David-Araripe Jan 9, 2025
9332662
remove statements using ProtMapper.__call__ on the tests (deprecated)
David-Araripe Jan 9, 2025
c822dd5
fix test running & change python version 3.7 to 3.7.4
David-Araripe Jan 9, 2025
fb3eb5e
revert back to testing python 3.7 but using Ubuntu22.04
David-Araripe Jan 9, 2025
79942a3
rever to typing.List[type] for python < 3.9 compatibility
David-Araripe Jan 9, 2025
08de293
revert to typing.List[type] for python < 3.9 compatibility
David-Araripe Jan 9, 2025
85cb330
flexible use of pkg_resources or importlib depending regardless of py…
David-Araripe Jan 9, 2025
79de340
revert to typing.Tuple for python < 3.9 compatibility
David-Araripe Jan 9, 2025
76781c5
type hint some methods & return values
David-Araripe Jan 10, 2025
762021a
add python 3.13 to supported list
David-Araripe Jan 10, 2025
b6c363e
Merge pull request #7 from David-Araripe/update/ci
David-Araripe Jan 10, 2025
adbe177
update field_reference page on the docs
David-Araripe Feb 10, 2025
297c5a9
Update the docs 📝
David-Araripe Feb 10, 2025
3211532
Update README 📝 to reflect latest changes
David-Araripe Feb 10, 2025
c47cf86
Merge pull request #9 from David-Araripe/update/readme
David-Araripe Feb 10, 2025
cbedbbf
update index.rst with better Quick Start examples
David-Araripe Feb 10, 2025
51 changes: 19 additions & 32 deletions .github/workflows/ci.yml
@@ -4,66 +4,53 @@ on:
push:
branches: [ master ]
pull_request:
branches: [ master ]
branches: [ master, dev ]

jobs:
ubuntu_3_11:
runs-on: ubuntu-latest
ubuntu_matrix:
runs-on: ubuntu-22.04 # latest fails testing python 3.7 - https://github.com/actions/setup-python/issues/962
strategy:
matrix:
python-version: ['3.7', '3.8', '3.9', '3.10', '3.11', '3.12', '3.13']
steps:
- uses: actions/checkout@v2
- name: Set up Python
uses: actions/setup-python@v2
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: '3.11'
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
pip install -e .
pip install -e ".[dev]"
- name: Run tests
run: python -m unittest discover

ubuntu_3_7:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Set up Python
uses: actions/setup-python@v2
with:
python-version: '3.7'
- name: Install dependencies
run: |
pip install -e .
pip install -e ".[dev]"
- name: Run tests
run: python -m unittest discover
run: python -m unittest discover tests/

macos:
runs-on: macos-latest
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v2
uses: actions/setup-python@v5
with:
python-version: '3.9'
python-version: '3.11'
- name: Install dependencies
run: |
pip install -e .
pip install -e ".[dev]"
- name: Run tests
run: python -m unittest discover
run: python -m unittest discover tests/

windows:
runs-on: windows-latest
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v2
uses: actions/setup-python@v5
with:
python-version: '3.9'
python-version: '3.11'
- name: Install dependencies
run: |
pip install -e .
pip install -e ".[dev]"
- name: Run tests
run: python -m unittest discover

run: python -m unittest discover tests/
55 changes: 55 additions & 0 deletions .github/workflows/docs.yml
@@ -0,0 +1,55 @@
name: Documentation

on:
push:
branches: [ master, dev ]
pull_request:
branches: [ master, dev ]

jobs:
docs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.10'

- name: Install dependencies
run: |
python -m pip install --upgrade pip
cd docs
pip install -r requirements.txt
pip install sphinx-rtd-theme --upgrade
cd ..
pip install .

- name: Build Documentation
run: |
cd docs
make html

- name: Create root .nojekyll # Should be at the root, so we create sub-dir & push it
if: github.event_name == 'push' && (github.ref == 'refs/heads/master' || github.ref == 'refs/heads/dev')
run: |
mkdir -p gh-pages-root
touch gh-pages-root/.nojekyll

- name: Deploy .nojekyll
if: github.event_name == 'push' && (github.ref == 'refs/heads/master' || github.ref == 'refs/heads/dev')
uses: JamesIves/github-pages-deploy-action@v4
with:
branch: gh-pages
folder: gh-pages-root
clean: false

- name: Deploy Documentation
if: github.event_name == 'push' && (github.ref == 'refs/heads/master' || github.ref == 'refs/heads/dev')
uses: JamesIves/github-pages-deploy-action@v4
with:
branch: gh-pages
folder: docs/build/html
target-folder: ${{ github.ref == 'refs/heads/dev' && 'dev' || 'stable' }}
clean: false
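
The `target-folder` value above uses the `cond && a || b` idiom, which stands in for a ternary operator in GitHub Actions expressions. As a rough illustration only, the same selection can be written in Python, whose `and`/`or` short-circuit in the same way. The helper name `target_folder` is hypothetical and is not part of the workflow or the package:

```python
def target_folder(ref: str) -> str:
    """Mirror the workflow expression: the dev branch maps to 'dev', anything else to 'stable'."""
    # `and` / `or` short-circuit like `&&` / `||` in an Actions expression.
    return (ref == "refs/heads/dev" and "dev") or "stable"


for ref in ("refs/heads/dev", "refs/heads/master"):
    print(ref, "->", target_folder(ref))
```

Because the deploy steps only run on pushes to `master` or `dev`, the `stable` folder effectively receives the `master` build.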
16 changes: 16 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,16 @@
# .readthedocs.yaml
version: 2

build:
os: ubuntu-22.04
tools:
python: "3.10"

python:
install:
- requirements: requirements-docs.txt
- method: pip
path: .

sphinx:
configuration: docs/source/conf.py
108 changes: 77 additions & 31 deletions README.md
@@ -7,34 +7,54 @@

# UniProtMapper <img align="left" width="40" height="40" src="https://raw.githubusercontent.com/whitead/protein-emoji/main/src/protein-72-color.svg">

A Python wrapper for UniProt's [Retrieve/ID Mapping](https://www.uniprot.org/id-mapping) RESTful API. This package supports the following functionalities:
Easily retrieve UniProt data and map protein identifiers using this Python package for UniProt's Retrieve & ID Mapping RESTful APIs. [Read the full documentation](https://david-araripe.github.io/UniProtMapper/stable/index.html).

1. Map (almost) any UniProt [cross-referenced IDs](https://github.com/David-Araripe/UniProtMapper/blob/master/src/UniProtMapper/resources/uniprot_mapping_dbs.json) to other identifiers & vice-versa;
2. Programmatically retrieve any of the supported [return](https://www.uniprot.org/help/return_fields) and [cross-reference fields](https://www.uniprot.org/help/return_fields_databases) from both UniProt-SwissProt and UniProt-TrEMBL (unreviewed) databases;
## 📚 Table of Contents

For these, check [Example 1](#example-1-mapping-ids) and [Example 2](#example-2-retrieving-information) below. Both functionalities can also be accessed through the CLI. For more information, check [CLI](#-cli).
- [⛏️ Features](#️-features)
- [📦 Installation](#-installation)
- [🛠️ Usage](#️-usage)
- [Mapping IDs](#mapping-ids)
- [Retrieving Information](#retrieving-information)
- [Field-based Querying](#field-based-querying)
- [📖 Documentation](#-documentation)
- [💻 Command Line Interface (CLI)](#-command-line-interface-cli)
- [👏🏼 Credits](#-credits)

## ⛏️ Features
UniProtMapper is a tool for bioinformatics and proteomics research that supports:

1. Mapping any UniProt [cross-referenced IDs](https://github.com/David-Araripe/UniProtMapper/blob/master/src/UniProtMapper/resources/uniprot_mapping_dbs.json) to other identifiers & vice-versa;
2. Programmatically retrieving any of the supported [return](https://www.uniprot.org/help/return_fields) and [cross-reference fields](https://www.uniprot.org/help/return_fields_databases) from both UniProt-SwissProt and UniProt-TrEMBL (unreviewed) databases. For a full table containing all the supported resources, refer to the [supported fields](https://david-araripe.github.io/UniProtMapper/stable/field_reference.html#supported-fields) in the docs;
3. Querying UniProtKB entries using complex field-based queries with boolean operators `~` (NOT), `|` (OR), `&` (AND).

For the first two functionalities, check the examples [Mapping IDs](#mapping-ids) and [Retrieving Information](#retrieving-information) below. For the third, see [Field-based Querying](#field-based-querying).

The ID mapping API can also be accessed through the CLI. For more information, check [CLI](#-command-line-interface-cli).

## 📦 Installation

From PyPI:
``` Shell
### From PyPI (recommended):
```shell
python -m pip install uniprot-id-mapper
```

Directly from GitHub:
``` Shell
### Directly from GitHub:
```shell
python -m pip install git+https://github.com/David-Araripe/UniProtMapper.git
```

From source:
``` Shell
### From source:
```shell
git clone https://github.com/David-Araripe/UniProtMapper
cd UniProtMapper
python -m pip install .
```

# 🛠️ Usage
## Example 1: Mapping IDs
To map IDs, the user can either call the object directly or use the `get` method to obtain the response. The different identifiers that are used by the API are designated by the `from_db` and `to_db` parameters. For example:

## Mapping IDs
Use UniProtMapper to easily map between different protein identifiers:

``` python
from UniProtMapper import ProtMapper
@@ -44,22 +64,18 @@ mapper = ProtMapper()
result, failed = mapper.get(
ids=["P30542", "Q16678", "Q02880"], from_db="UniProtKB_AC-ID", to_db="Ensembl"
)

result, failed = mapper(
ids=["P30542", "Q16678", "Q02880"], from_db="UniProtKB_AC-ID", to_db="Ensembl"
)
```
Where failed corresponds to a list of the identifiers that failed to be mapped and result is the following pandas DataFrame:
The `result` is a pandas DataFrame containing the mapped IDs (see below), while `failed` is a list of identifiers that couldn't be mapped.

| | UniProtKB_AC-ID | Ensembl |
|---:|:------------------|:-------------------|
| 0 | P30542 | ENSG00000163485.17 |
| 1 | Q16678 | ENSG00000138061.12 |
| 2 | Q02880 | ENSG00000077097.17 |
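
The Ensembl identifiers in the table above carry a version suffix (for example `.17`). If unversioned IDs are needed downstream, a small post-processing step can strip it. The sketch below works on plain rows copied from the table rather than on the DataFrame itself, and `strip_version` is a hypothetical helper, not part of UniProtMapper:

```python
# Rows copied from the example table above.
rows = [
    {"UniProtKB_AC-ID": "P30542", "Ensembl": "ENSG00000163485.17"},
    {"UniProtKB_AC-ID": "Q16678", "Ensembl": "ENSG00000138061.12"},
    {"UniProtKB_AC-ID": "Q02880", "Ensembl": "ENSG00000077097.17"},
]


def strip_version(ensembl_id: str) -> str:
    """Drop everything after the first dot (hypothetical helper)."""
    return ensembl_id.split(".", 1)[0]


# Map each UniProt accession to its unversioned Ensembl gene ID.
unversioned = {row["UniProtKB_AC-ID"]: strip_version(row["Ensembl"]) for row in rows}
print(unversioned)
```

On the DataFrame itself, something like `result["Ensembl"].str.split(".").str[0]` should give the same result, assuming the column holds plain strings.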

## Example 2: Retrieving information
## Retrieving Information

The supported [return](https://www.uniprot.org/help/return_fields) and [cross-reference fields](https://www.uniprot.org/help/return_fields_databases) are both accessible through UniProt's website or by the attribute `ProtMapper.fields_table`. For example:
All [supported return fields](https://david-araripe.github.io/UniProtMapper/stable/field_reference.html#supported-fields) are accessible through the attribute `ProtMapper.fields_table`:

```Python
from UniProtMapper import ProtMapper
@@ -76,28 +92,54 @@ df.head()
| 3 | Gene Names (primary) | gene_primary | Names & Taxonomy | yes | uniprot_field |
| 4 | Gene Names (synonym) | gene_synonym | Names & Taxonomy | yes | uniprot_field |

To retrieve information, the user can either call the object directly or use the `get` method to obtain the response. For example:
From the DataFrame, all `return_field` entries can be used to access UniProt data programmatically:

```Python
# To retrieve the default fields:
result, failed = mapper.get(["Q02880"])
>>> Fetched: 1 / 1

result, failed = mapper(["Q02880"])
# Retrieve custom fields:
fields = ["accession", "organism_name", "structure_3d"]
result, failed = mapper.get(["Q02880"], fields=fields)
>>> Fetched: 1 / 1
```

Custom returned fields can be retrieved by passing a list of fields to the `fields` parameter. These fields need to be within `UniProtRetriever.fields_table["returned_field"]` and will be returned with columns named as their respective `Label`.
## Field-based Querying

The object already has a list of default fields under `self.default_fields`, but these are ignored if the parameter `fields` is passed.
UniProtMapper supports complex field-based protein queries using boolean operators (AND, OR, NOT) through the `uniprotkb_fields` module. This allows you to create sophisticated searches combining multiple criteria. For example:

```Python
fields = ["accession", "organism_name", "structure_3d"]
result, failed = mapper.get(["Q02880"], fields=fields)
```python
from UniProtMapper import ProtKB
from UniProtMapper.uniprotkb_fields import (
organism_name,
length,
reviewed,
date_modified
)

# Find reviewed human proteins with length between 100-200 amino acids
# that were modified after January 1st, 2024
query = (
organism_name("human") &
reviewed(True) &
length(100, 200) &
date_modified("2024-01-01", "*")
)

protkb = ProtKB()
result = protkb.get(query)
```
For a list of all fields and their descriptions, check the API reference for the [uniprotkb_fields](https://david-araripe.github.io/UniProtMapper/stable/api/UniProtMapper.html#module-UniProtMapper.uniprotkb_fields) module.
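
Composing queries with `&`, `|`, and `~` relies on Python operator overloading. The following is a minimal sketch of how such operators could be mapped onto `AND`/`OR`/`NOT` query strings. The `Term` class and the `field` helper are hypothetical names used for illustration and do not reflect UniProtMapper's actual implementation:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """Hypothetical query node holding an already-rendered query string."""

    text: str

    def __and__(self, other: "Term") -> "Term":
        return Term(f"({self.text} AND {other.text})")

    def __or__(self, other: "Term") -> "Term":
        return Term(f"({self.text} OR {other.text})")

    def __invert__(self) -> "Term":
        return Term(f"(NOT {self.text})")


def field(name: str, value: str) -> Term:
    """Build a leaf term such as `reviewed:true` (illustrative helper)."""
    return Term(f"{name}:{value}")


# `~` binds tighter than `&`, which binds tighter than `|`.
query = field("organism_name", "human") & ~field("reviewed", "false")
print(query.text)
```

Each operator wraps its result in parentheses, so the rendered string stays unambiguous however the terms are nested.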

# 💻 CLI
## 📖 Documentation

The package also comes with a CLI that can be used to map IDs and retrieve information. To map IDs, the user can use the `protmap` command, accessible after installation. Here is a list of the available arguments, shown by `protmap -h`:
- [Stable Branch Documentation](https://david-araripe.github.io/UniProtMapper/stable/index.html) (master branch)
- [Development Documentation](https://david-araripe.github.io/UniProtMapper/dev/index.html) (dev branch)

# 💻 Command Line Interface (CLI)

UniProtMapper provides a CLI for the ID Mapping class, `ProtMapper`, for easy access to lookups and data retrieval. Here is a list of the available arguments, shown by `protmap -h`:

```text
usage: UniProtMapper [-h] -i [IDS ...] [-r [RETURN_FIELDS ...]] [--default-fields] [-o OUTPUT]
@@ -130,14 +172,18 @@ optional arguments:
references, see: <pkg_path>/resources/uniprot_mapping_dbs.json
-over, --overwrite If desired to overwrite an existing file when using -o/--output
-pf, --print-fields Prints the available return fields and exits the program.
```
```

Usage example, retrieving default fields from `<pkg_path>/resources/cli_return_fields.txt`:
<p align="center">
<img src="https://github.com/David-Araripe/UniProtMapper/blob/master/figures/cli_example_fig.png?raw=true" alt="Image displaying the output of UniProtMapper's CLI, protmap"/>
</p>

# 👏🏼 Credits:
## 👏🏼 Credits

- [UniProt](https://www.uniprot.org/) for providing the API and the amazing database;
- [Andrew White and the University of Rochester](https://github.com/whitead/protein-emoji) for the protein emoji;
- [Andrew White and the University of Rochester](https://github.com/whitead/protein-emoji) for the protein emoji;

---

For issues, feature requests, or questions, please open an issue on the GitHub repository.
19 changes: 19 additions & 0 deletions docs/Makefile
@@ -0,0 +1,19 @@
# Minimal makefile for Sphinx documentation

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
27 changes: 27 additions & 0 deletions docs/README.md
@@ -0,0 +1,27 @@
# Documentation

This directory contains the documentation for UniProtMapper.

## Building Documentation Locally

1. Install documentation dependencies:
```bash
pip install -r requirements.txt
```

2. Build the documentation:
```bash
make html
```

3. View the documentation by opening `build/html/index.html` in your web browser.

## Documentation Structure

- `source/`: Documentation source files
- `conf.py`: Sphinx configuration
- `index.rst`: Homepage
- `api/`: API reference
- `tutorials/`: Usage tutorials
- `requirements.txt`: Documentation dependencies
- `Makefile` & `make.bat`: Build scripts