Commits
21 commits
2bc8661
Switch from GitHub API to local git cloning for tool extraction
paulzierep Jun 25, 2026
26a0c74
Fix mypy type errors (variable shadowing, missing return annotations)
paulzierep Jun 25, 2026
41fb91c
Remove generated resource files from branch
paulzierep Jun 25, 2026
9c9daa4
Revert "Remove generated resource files from branch"
paulzierep Jun 25, 2026
b2094c2
Remove test tool files from PR
paulzierep Jun 25, 2026
bd0205e
Revert "Remove test tool files from PR"
paulzierep Jun 25, 2026
951d0f9
Restore test_tools files to original state
paulzierep Jun 25, 2026
cd77b52
Commit fresh server list alongside tools on merge
paulzierep Jun 25, 2026
dfb0616
Switch to raw git commands for merge commit
paulzierep Jun 25, 2026
ca03a66
Fix lint: unused Mock import and Black formatting
paulzierep Jun 26, 2026
15b2ef5
Fix isort import ordering
paulzierep Jun 26, 2026
be5f0e4
Enable tool and workflow tests on all PRs (no secrets needed)
paulzierep Jun 26, 2026
f7f07f8
Fix three issues in clone-based tool extraction
paulzierep Jun 26, 2026
b261043
Fix chained macro token resolution (e.g. @VERSION@ -> @TOOL_VERSION@)
paulzierep Jun 26, 2026
122c87c
Use --clone-depth 0 in extract_all_tools.sh for full git history
paulzierep Jun 26, 2026
b526143
Update commit step to use git-auto-commit-action with PAT
paulzierep Jun 26, 2026
bc022a6
Merge main into clone-tools-locally
paulzierep Jun 26, 2026
92c5622
Fix unresolved merge conflict markers in fetch_filter_resources.yaml
paulzierep Jun 26, 2026
7fb8c20
Add unit tests for clone logic helpers
paulzierep Jun 26, 2026
289a9e0
Fix pre-existing test collection errors
paulzierep Jun 26, 2026
d34deca
Add comprehensive unit tests using real Galaxy tool wrappers
paulzierep Jun 26, 2026
17 changes: 14 additions & 3 deletions .github/workflows/fetch_filter_resources.yaml
@@ -63,8 +63,6 @@ jobs:
- name: Fetch all tool stepwise
run: |
bash sources/bin/extract_all_tools.sh "${{ matrix.subset }}"
- env:
- GITHUB_API_KEY: ${{ secrets.GH_API_TOKEN }}
- name: Archive tool sublists production artifacts
uses: actions/upload-artifact@v7
with:
@@ -139,7 +137,9 @@ jobs:
# path: communities/all/resources/citations.json
merge-fetch:
runs-on: ubuntu-latest
+ if: ${{ !cancelled() && needs.fetch-tools-stepwise.result == 'success' }}
needs:
+ - fetch-servers
- fetch-tools-stepwise
- fetch-tutorials
- fetch-workflows
@@ -165,19 +165,30 @@ jobs:
merge-multiple: true
path: communities/all/resources/
- name: Download tutorials
+ if: needs.fetch-tutorials.result == 'success'
uses: actions/download-artifact@v8
with:
pattern: tutorials
merge-multiple: true
path: communities/all/resources/
- name: Download workflows
+ if: needs.fetch-workflows.result == 'success'
uses: actions/download-artifact@v8
with:
pattern: workflows
merge-multiple: true
path: communities/all/resources/
+ - name: Download available servers
+ uses: actions/download-artifact@v8
+ with:
+ pattern: available-servers
+ merge-multiple: true
+ path: sources/data/
- name: Display structure of downloaded files
- run: ls -R communities/all/resources/
+ run: |
+ ls -R communities/all/resources/
+ echo "---"
+ ls -R sources/data/
- name: Merge all tools
run: | #merge files with only one header -> https://stackoverflow.com/questions/16890582/unixmerge-multiple-csv-files-with-same-header-by-keeping-the-header-of-the-firs; map(.[]) -> https://stackoverflow.com/questions/42011086/merge-arrays-of-json (get flat array, one tool per entry)
awk 'FNR==1 && NR!=1{next;}{print}' communities/all/resources/repositories*.list_tools.tsv > communities/all/resources/tools.tsv
11 changes: 1 addition & 10 deletions .github/workflows/run_tests.yaml
@@ -32,20 +32,16 @@ jobs:
test-tools:
runs-on: ubuntu-latest
# This job runs tests for tools.
- # It checks for internal pull requests targeting the main branch,
- # as well as pushes to the dev branch.
# The workflow performs the following steps:
# 1. Checkout the repository code.
# 2. Set up Python environment using the specified version.
# 3. Install required Python packages.
# 4. Extract tools using a provided script.
# 5. Filter community tools using a provided script.
# 6. Format tools into an interactive table and generate a word cloud.
- if: github.event.pull_request.head.repo.full_name == github.repository || github.ref == 'refs/heads/dev'
strategy:
matrix:
python-version: ['3.11']
- environment: fetch-tools
steps:
- name: Checkout
uses: actions/checkout@v7
@@ -56,9 +52,7 @@ jobs:
run: python -m pip install -r requirements.txt
- name: Tool extraction
run: |
- bash sources/bin/extract_all_tools.sh test
- env:
- GITHUB_API_KEY: ${{ secrets.GH_API_TOKEN }}
+ bash sources/bin/extract_all_tools.sh test
- name: Tool filter
run: |
bash sources/bin/get_community_tools.sh test
@@ -101,9 +95,6 @@ jobs:
test-workflows:
runs-on: ubuntu-latest
# This job runs tests for workflows.
- # It performs the same checks as the other jobs, ensuring
- # quality for workflow scripts.
- if: github.event.pull_request.head.repo.full_name == github.repository || github.ref == 'refs/heads/dev'
strategy:
matrix:
python-version: ['3.11']
1 change: 1 addition & 0 deletions .gitignore
@@ -9,3 +9,4 @@ galaxycodex/bin/*
galaxycodex/lib/*
_site/
Gemfile.lock
+ sys
50 changes: 50 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,50 @@
# Changelog

## [Unreleased] — Local git cloning for tool extraction

### Changed

- **`sources/bin/extract_galaxy_tools.py`** — Replaced GitHub API (PyGithub) with local git cloning:
  - `get_tool_repositories()` now clones `planemo-monitor` locally instead of reading files via GitHub API
  - `clone_repositories()` clones/pulls each tool repo into a local cache directory
  - `parse_tools_from_local()` parses tools from the local filesystem instead of via `ContentFile` objects
  - `get_tool_metadata_from_local()` uses `galaxy.util.xml_macros` for proper XML macro expansion
  - Removed `--api` flag (no GitHub token required), added `--repo-dir`, `--repo-url`, `--workers`
  - Parallel parsing support via `ThreadPoolExecutor`
  - Handles non-GitHub URLs (GitLab, self-hosted, etc.)
- **`sources/bin/extract_all_tools.sh`** — Removed `--api $GITHUB_API_KEY` from all invocations
- **`.github/workflows/fetch_filter_resources.yaml`** — Removed unused `GITHUB_API_KEY` env var
- **`.github/workflows/run_tests.yaml`** — Removed unused `GITHUB_API_KEY` env var
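
As a rough illustration of the clone step described above, the sketch below builds the `git clone` argument list for one repository. It is not the code from `extract_galaxy_tools.py`: the function name `build_clone_command` and its parameters are hypothetical, and the depth handling assumes the convention stated in this changelog (`--clone-depth 1` for a shallow clone, `0` for full history).

```python
from pathlib import Path
from typing import List


def build_clone_command(repo_url: str, dest: Path, clone_depth: int = 1) -> List[str]:
    """Return the argv for cloning repo_url into dest (hypothetical helper).

    Assumption: a clone_depth of 0 means "full history", so no --depth
    option is passed to git in that case.
    """
    cmd = ["git", "clone", "--quiet"]
    if clone_depth > 0:
        # Shallow clone: only fetch the most recent clone_depth commits.
        cmd += ["--depth", str(clone_depth)]
    cmd += [repo_url, str(dest)]
    return cmd


if __name__ == "__main__":
    # Works for any git host, not only github.com.
    print(build_clone_command("https://example.org/group/tools", Path("repo_cache/tools")))
    print(build_clone_command("https://example.org/group/tools", Path("repo_cache/tools"), 0))
```

The returned list could be handed to `subprocess.run(cmd, check=True)`; keeping command construction separate from execution makes the depth logic easy to unit test without network access.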

### Added

- **74 new tools** discovered that the old GitHub API approach missed:

| Repository | Tools gained | Reason previously missed |
|---|---|---|
| `gregvonkuster/galaxy_tools` | 41 | Rate limiting / API failures during `get_contents()` |
| `galaxy-team/galaxy-tools` (gitlab.pasteur.fr) | 18 | Non-`https://github.com/` URL rejected by old `get_github_repo()` |
| `galaxyproject/tools-iuc` | 10 | New tools added since last production run |
| `bgruening/galaxytools` | 3 | Nested `.shed.yml` not found by old parser |
| `galaxyecology/tools-ecology` | 1 | Tool added since last production run |
| `galaxyproteomics/tools-galaxyp` | 1 | Tool added since last production run |

- **36 conda packages** now correctly resolved via XML macro expansion (old code relied on `etree.fromstring()` which cannot resolve macros)
- `--workers N` flag for parallel tool parsing
- `--repo-url` flag for specifying individual repos to process
- `--repo-dir` flag for customizing the local clone cache location
- `--clone-depth` flag for controlling git clone depth (default: 1 for CI-efficient shallow clones; pass 0 for full history)
- Repository URL deduplication in `clone_repositories()` prevents cloning the same repo twice
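
The URL deduplication mentioned in the last item could look roughly like the following. This is a minimal sketch, not the logic inside `clone_repositories()`: the helper name `dedupe_repo_urls` is hypothetical, and treating a trailing slash or a `.git` suffix as equivalent spellings of the same repository is an assumption made for illustration.

```python
from typing import Iterable, List


def dedupe_repo_urls(urls: Iterable[str]) -> List[str]:
    """Return urls with duplicates removed, keeping first-seen order.

    Assumption: URLs that differ only by surrounding whitespace, a trailing
    slash, or a trailing ".git" point at the same repository.
    """
    seen = set()
    unique = []
    for url in urls:
        cleaned = url.strip()
        key = cleaned.rstrip("/")
        if key.endswith(".git"):
            key = key[: -len(".git")]
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique


if __name__ == "__main__":
    repos = [
        "https://github.com/galaxyproject/tools-iuc",
        "https://github.com/galaxyproject/tools-iuc/",
        "https://github.com/galaxyproject/tools-iuc.git",
        "https://github.com/bgruening/galaxytools",
    ]
    print(dedupe_repo_urls(repos))
```

Order is preserved so that the first occurrence of each repository decides which URL spelling is cloned.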

### Removed

- `--api` / `GITHUB_API_KEY` requirement — tool extraction no longer needs a GitHub token
- `get_github_repo()`, `get_string_content()`, `get_suite_ID_fallback()`, `get_tools()` — replaced by local-cloning equivalents

### Fixed

- Non-GitHub repository URLs (e.g., GitLab, self-hosted) are now supported
- XML macro expansion via `galaxy.util.xml_macros` finds requirements and cross-references that simple XML parsing missed
- Nested tool directory structures handled more reliably
- No more GitHub API rate limiting issues during extraction
- Duplicate repository URLs are now skipped (cloning them twice previously wasted ~12 GB and extra clone time)
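
One of the commits in this PR fixes chained macro token resolution (e.g. `@VERSION@ -> @TOOL_VERSION@`), which relates to the macro expansion fix listed above. The sketch below shows one way such chained tokens can be resolved by repeated substitution. It is an illustration only and does not use or reproduce `galaxy.util.xml_macros`; the function `resolve_tokens`, the token pattern, and the `max_passes` limit are all assumptions.

```python
import re
from typing import Dict

# Assumed token shape: upper-case names wrapped in "@", e.g. @TOOL_VERSION@.
TOKEN_RE = re.compile(r"@[A-Z0-9_]+@")


def resolve_tokens(text: str, tokens: Dict[str, str], max_passes: int = 10) -> str:
    """Substitute tokens repeatedly until the text stops changing.

    Unknown tokens are left as-is. max_passes bounds the loop so that a
    cyclic definition (A -> B -> A) cannot run forever.
    """
    for _ in range(max_passes):
        resolved = TOKEN_RE.sub(lambda m: tokens.get(m.group(0), m.group(0)), text)
        if resolved == text:
            break
        text = resolved
    return text


if __name__ == "__main__":
    tokens = {"@TOOL_VERSION@": "1.2.3", "@VERSION@": "@TOOL_VERSION@"}
    print(resolve_tokens("@VERSION@+galaxy0", tokens))
```

A single substitution pass would stop at `@TOOL_VERSION@+galaxy0`; looping until a fixed point is what resolves the chain.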