Factorize common workflow generation functionality. #147

Merged
eladrion merged 10 commits into main from 137_use_stdlib_commons_functionality on Feb 9, 2026

@eladrion
Contributor

@eladrion eladrion commented Dec 12, 2025

Pull Request Overview

The PR extracts functionality shared by the generation of CWL and Snakemake files (e.g. indentation, name generation) into a separate class to reduce code duplication. Also, padding of workflow numbers is now done using String format flags.
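
As a rough illustration of the kind of shared helper described above, the following sketch puts indentation and zero-padded step-name generation in one place. The class name `WorkflowNamingSketch` and its method names are hypothetical and not taken from the APE code base; only the idea of padding via a String format flag comes from this PR.

```java
import java.util.Locale;

// Hypothetical helper: names and signatures are illustrative assumptions,
// not APE's actual API.
final class WorkflowNamingSketch {

    private WorkflowNamingSketch() {
        // static helpers only
    }

    /** Returns `level` indentation levels of `width` spaces each. */
    static String indent(int level, int width) {
        return " ".repeat(level * width);
    }

    /**
     * Builds a step name such as "XTandem_01". The %02d flag zero-pads the
     * number to two digits, so no `if (stepNumber < 10)` branch is needed.
     * Locale.ROOT keeps the digits ASCII regardless of the default locale.
     */
    static String stepName(String toolName, int stepNumber) {
        return String.format(Locale.ROOT, "%s_%02d", toolName, stepNumber);
    }
}
```

With this sketch, `WorkflowNamingSketch.stepName("XTandem", 1)` yields `XTandem_01`, the naming pattern used for the rules and steps in the generated output quoted in this conversation.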

Related Issue

Resolves #137

Changes Introduced

Merely refactoring.

How Has This Been Tested?

Local tests were run on the test suite.

Checklist

  • I have referenced a related issue.
  • I have followed the project’s style guidelines.
  • My changes include tests, if applicable.
  • All tests pass locally.
  • I have added myself to the CITATION.cff file, if not already present.

@eladrion eladrion self-assigned this Dec 12, 2025
@eladrion
Contributor Author

Tested with the following config:
config_refac_137.json
This produced the following as Workflow 3:

# WorkflowNo_2
# This workflow is generated by APE (https://github.com/workflomics/ape).

rule all:
    input:
        'add-path/gProfiler_out_1'

rule XTandem_01:
    input:
        'add-path/input_1',
        'add-path/input_2'
    output:
        'add-path/XTandem_out_1'
    shell: 'add-path-to-implementation/XTandem {input} {output}'

rule PeptideProphet_02:
    input:
        'add-path/XTandem_out_1',
        'add-path/input_1',
        'add-path/input_2'
    output:
        'add-path/PeptideProphet_out_1',
        'add-path/PeptideProphet_out_2'
    shell: 'add-path-to-implementation/PeptideProphet {input} {output}'

rule ProteinProphet_03:
    input:
        'add-path/PeptideProphet_out_1',
        'add-path/input_2'
    output:
        'add-path/ProteinProphet_out_1',
        'add-path/ProteinProphet_out_2'
    shell: 'add-path-to-implementation/ProteinProphet {input} {output}'

rule protXml2IdList_04:
    input:
        'add-path/ProteinProphet_out_1'
    output:
        'add-path/protXml2IdList_out_1'
    shell: 'add-path-to-implementation/protXml2IdList {input} {output}'

rule gProfiler_05:
    input:
        'add-path/protXml2IdList_out_1'
    output:
        'add-path/gProfiler_out_1'
    shell: 'add-path-to-implementation/gProfiler {input} {output}'

Does this look good, @CGru21?

@eladrion
Contributor Author

This is the respective PNG: candidate_workflow_3

@eladrion
Contributor Author

eladrion commented Jan 12, 2026

It seems to match the corresponding CWL representation, apart from the placeholder paths:

# WorkflowNo_2
# This workflow is generated by APE (https://github.com/workflomics/ape).
cwlVersion: v1.2
class: Workflow

label: WorkflowNo_2
doc: A workflow including the tool(s) XTandem, PeptideProphet, ProteinProphet, protXml2IdList, gProfiler.

inputs:
  input_1:
    type: File
    format: "http://edamontology.org/format_3244" # mzML
  input_2:
    type: File
    format: "http://edamontology.org/format_1929" # FASTA
steps:
  XTandem_01:
    run: https://raw.githubusercontent.com/Workflomics/tools-and-domains/main/cwl-tools/xtandem/xtandem.cwl
    in:
      XTandem_in_1: input_1
      XTandem_in_2: input_2
    out: [XTandem_out_1]
  PeptideProphet_02:
    run: https://raw.githubusercontent.com/Workflomics/tools-and-domains/main/cwl-tools/peptideprophet/peptideprophet.cwl
    in:
      PeptideProphet_in_1: XTandem_01/XTandem_out_1
      PeptideProphet_in_2: input_1
      PeptideProphet_in_3: input_2
    out: [PeptideProphet_out_1, PeptideProphet_out_2]
  ProteinProphet_03:
    run: https://raw.githubusercontent.com/Workflomics/tools-and-domains/main/cwl-tools/proteinprophet/proteinprophet.cwl
    in:
      ProteinProphet_in_1: PeptideProphet_02/PeptideProphet_out_1
      ProteinProphet_in_2: input_2
    out: [ProteinProphet_out_1, ProteinProphet_out_2]
  protXml2IdList_04:
    run: https://raw.githubusercontent.com/Workflomics/tools-and-domains/main/cwl-tools/protXml2IdList/protXml2IdList.cwl
    in:
      protXml2IdList_in_1: ProteinProphet_03/ProteinProphet_out_1
    out: [protXml2IdList_out_1]
  gProfiler_05:
    run: https://raw.githubusercontent.com/Workflomics/tools-and-domains/main/cwl-tools/gprofiler/gprofiler.cwl
    in:
      gProfiler_in_1: protXml2IdList_04/protXml2IdList_out_1
    out: [gProfiler_out_1]
outputs:
  output_1:
    type: File
    format: "http://edamontology.org/format_3464" # JSON_p
    outputSource: gProfiler_05/gProfiler_out_1

@eladrion
Contributor Author

Hi @vedran-kasalica and @CGru21, I think I am done with this PR. It includes some refactorings that make input handling consistent, and it extracts common functionality into separate methods to make reuse and extension easier. I also added some missing methods for the configuration of the Snakemake export. Please take a look. I will work on something else in the meantime.

@eladrion eladrion mentioned this pull request Jan 22, 2026
11 tasks
@eladrion
Contributor Author

@CGru21: I am holding this PR until #149 has been reviewed and merged. After that I will rebase this branch and ask for reviews.

@CGru21
Collaborator

CGru21 commented Feb 9, 2026

I tested this with my snakemake examples and everything looks good.

…olutionCreationUtils` and also replace dependent padding (`if (stepNumber < 10)`) by String format flags
…th `getInputTypes()` and refactor the functions `generateRuleInput()` and `generateRuleOutput()` such that String join is used making removal of superfluous trailing newline and colon obsolete.
…thod and additionally make configuration checking for outputs consistent in all methods.
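
The join-based approach mentioned in the commit messages above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the PR's actual code: the class `SnakemakeRuleSketch` and the method `ruleInput` are hypothetical names, and the indentation widths are copied from the generated Snakemake output quoted earlier in this conversation.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: names are illustrative assumptions, not APE's API.
final class SnakemakeRuleSketch {

    private SnakemakeRuleSketch() {
        // static helpers only
    }

    /**
     * Renders the `input:` section of a Snakemake rule. The separator is
     * placed only between entries by Collectors.joining, so there is no
     * trailing separator to strip afterwards.
     */
    static String ruleInput(List<String> paths) {
        String entries = paths.stream()
                .map(path -> "        '" + path + "'")
                .collect(Collectors.joining(",\n"));
        return "    input:\n" + entries + "\n";
    }
}
```

Compared with appending a separator after every entry in a loop and trimming the last one, the joining collector handles single-entry and multi-entry lists the same way.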
@eladrion eladrion force-pushed the 137_use_stdlib_commons_functionality branch from dd25c9b to 15fa60f Compare February 9, 2026 09:43
@eladrion
Contributor Author

eladrion commented Feb 9, 2026

Additional local tests also ran successfully.

@eladrion
Contributor Author

eladrion commented Feb 9, 2026

Merging now

@eladrion eladrion merged commit 1fd14cc into main Feb 9, 2026
1 check passed
@eladrion eladrion deleted the 137_use_stdlib_commons_functionality branch February 9, 2026 10:09

Development

Successfully merging this pull request may close these issues.

Use standard library and commons functionality for some implementations
