This repository contains code for a machine learning (ML)-powered search engine specifically tailored to analyzing tera-scale high-resolution mass spectrometry (HRMS) data.
- First, clone the MEDUSA Python package and install its dependencies. It is recommended to use the same Conda environment (Python 3.8) for both MEDUSA and medusa-search. Then switch to the `medusa_search` branch (run the `git` and `pip` commands inside the cloned MEDUSA folder).

```bash
conda create -n medusa python=3.8
conda activate medusa
git checkout medusa_search
pip install -r requirements.txt
```

- After that, clone the `medusa-search` repository and install its additional requirements.
```bash
pip install -r requirements.txt
```

- Change the file permissions of the bash scripts inside the `search` folder so that they can be executed.
```bash
chmod 744 search/*.sh
```

- Now run `setup.py` and follow the instructions. You will need to specify the path to the MEDUSA package folder and to the HRMS database.
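```bash
python setup.py
```

- Every time you run a search, medusa-search imports functions from the MEDUSA repository folder (`medusa_repository_path`). To make this folder importable, you can create a `.pth` file in your user site directory that adds `medusa_repository_path` to Python's module search path.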
```bash
# find the user site directory
SITEDIR=$(python -m site --user-site)
# create it if it doesn't exist
mkdir -p "$SITEDIR"
# create a new .pth file containing medusa_repository_path
echo "<your medusa repository path>" > "$SITEDIR/medusa.pth"
```

- To use the command-line interface, always go to the `search` folder and run the `main.py` script.
```bash
cd search
python main.py
```

- Batch creation and indexing should be performed before searching. They are run with the `create_batches` and `index` commands, respectively.
- After indexing, `search` can be performed. Results are saved in the `medusa-search/search/reports` folder.
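The `.pth` step in the installation list relies on Python's `site` module: for each site directory, it reads every `.pth` file and appends each listed directory (if it exists) to `sys.path`. The sketch below reproduces this in throwaway temporary directories via `site.addsitedir()`, so it does not touch your real environment. The function name `pth_demo` is hypothetical, and the sketch assumes `python3` is on your `PATH`.

```shell
# Hypothetical demo: show that a .pth file in a site directory puts the
# directory it lists on sys.path. Temporary directories stand in for the
# real user site directory and for <your medusa repository path>.
pth_demo() {
  local site_dir fake_repo
  site_dir=$(mktemp -d)    # stands in for the user site directory
  fake_repo=$(mktemp -d)   # stands in for the MEDUSA repository folder
  echo "$fake_repo" > "$site_dir/medusa.pth"
  # addsitedir() processes every *.pth file in site_dir; print whether
  # the listed directory ended up on sys.path
  python3 -c 'import os, site, sys; site.addsitedir(sys.argv[1]); print(os.path.abspath(sys.argv[2]) in sys.path)' \
    "$site_dir" "$fake_repo"
  rm -rf "$site_dir" "$fake_repo"
}

pth_demo   # prints True
```

To check the real setup, run `python -c 'import sys; print(sys.path)'` inside the `medusa` environment; the repository path should be listed once `medusa.pth` is in place. Note that the user site directory depends only on the user and the Python version (for example `~/.local/lib/python3.8/site-packages` on Linux), so the same `.pth` file may also be picked up by other Python 3.8 environments of that user.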
Most important commands:
| Command | Description |
|---|---|
| `create_batches` | Create batches (or shards) of spectra filenames |
| `create_unique_batch` | Create a batch containing only the spectra filenames that include a specific indicator word |
| `index` | Index filenames from the batches located in the `medusa-search/search/batches` directory and save the results in the `medusa-search/search/index_pickles` directory |
| `search` | Search for a formula in the spectra indexed in a specific directory. Results are saved in `medusa-search/search/reports` |
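To illustrate what "batches (or shards) of spectra filenames" and a keyword-filtered batch mean, here is a minimal shell sketch. It is not the implementation behind `create_batches` or `create_unique_batch`: the function names `make_batches` and `make_unique_batch`, the batch size, the `batch_` file prefix and matching the keyword against the filename are all assumptions made for illustration.

```shell
# Hypothetical sketch, NOT medusa-search's own code.

# Recursively list all regular files under db_dir and split the sorted list
# into text files of at most batch_size filenames each
# (out_dir/batch_aa, out_dir/batch_ab, ...).
make_batches() {
  local db_dir=$1 out_dir=$2 batch_size=$3
  mkdir -p "$out_dir"
  find "$db_dir" -type f | sort | split -l "$batch_size" - "$out_dir/batch_"
}

# Write a single list containing only the files whose names include word.
make_unique_batch() {
  local db_dir=$1 word=$2 out_file=$3
  find "$db_dir" -type f -name "*${word}*" | sort > "$out_file"
}
```

For example, `make_batches /path/to/hrms_db ./my_batches 1000` would write plain-text lists of at most 1000 filenames each. Splitting by line count is one simple way to obtain shards that can be processed separately.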
P.S. More detailed explanations can be found in the Supporting Information of the article.