Add explain flag and merged config dump #184

Merged
nuwang merged 17 commits into main from add_explainability
Mar 11, 2026

Conversation

@nuwang
Member

@nuwang nuwang commented Feb 19, 2026

This PR adds support for:

  1. a tpv dump command to view the merged config
  2. an --explain flag for tpv dry-run, so that admins can trace how a particular scheduling decision was made

closes: #153

@nuwang nuwang requested a review from cat-bro February 19, 2026 17:21
@coveralls
Collaborator

coveralls commented Feb 20, 2026

Pull Request Test Coverage Report for Build 22773699455

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 355 of 360 (98.61%) changed or added relevant lines in 8 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.1%) to 95.681%

Changes Missing Coverage | Covered Lines | Changed/Added Lines | %
--- | --- | --- | ---
tpv/core/entities.py | 15 | 16 | 93.75%
tpv/core/explain.py | 117 | 119 | 98.32%
tpv/core/mapper.py | 61 | 63 | 96.83%

Totals | Coverage Status
--- | ---
Change from base Build 22251655819: | 0.1%
Covered Lines: | 1726
Relevant Lines: | 1757

💛 - Coveralls

@nuwang nuwang added the enhancement New feature or request label Feb 21, 2026
@nuwang nuwang force-pushed the add_explainability branch from d5bdeaf to f2e27f5 February 21, 2026 06:44
@mvdbeek
Member

mvdbeek commented Feb 26, 2026

@pauldg is also interested in this

@cat-bro
Collaborator

cat-bro commented Mar 2, 2026

Hi @nuwang, this is going to be very useful!

tpv dump is perfect.

For tpv dry-run --explain, I have some notes. I’m going to paste the output into the next comment with line numbers so that I can comment on parts of it.

@cat-bro
Collaborator

cat-bro commented Mar 2, 2026

     1	tpv dry-run --tool=toolshed.g2.bx.psu.edu/repos/bgruening/antismash/antismash/6.1.1+galaxy1
        --input-size=6 --job-conf tpv_check/explain_job_conf.yml --explain
     2
     3	========================================================================
     4	TPV SCHEDULING DECISION TRACE
     5	========================================================================
     6
     7	--- Configuration Loading ---
     8	  [1] Loaded config: https://gxy.io/tpv/db-v2.yml
     9
    10	  [2] Loaded config:
        /Users/cat/dev/infrastructure/tpv_check/total_perspective_vortex/default_tool.yml
    11
    12	  [3] Loaded config:
        /Users/cat/dev/infrastructure/tpv_check/total_perspective_vortex/tools.yml
    13
    14	  [4] Loaded config:
        /Users/cat/dev/infrastructure/tpv_check/total_perspective_vortex/tool_pulsar_scores.yml
    15
    16	  [5] Loaded config:
        /Users/cat/dev/infrastructure/tpv_check/total_perspective_vortex/users.yml
    17
    18	  [6] Loaded config:
        /Users/cat/dev/infrastructure/tpv_check/total_perspective_vortex/destinations.yml
    19
    20	--- Entity Matching ---
    21	  [7] Tool 'toolshed.g2.bx.psu.edu/repos/bgruening/antismash/antismash/6.1.1+galaxy1':
        matched entity 'toolshed.g2.bx.psu.edu/repos/bgruening/antismash/antismash/*'
    22
    23	  [8] No user specified
    24
    25	--- Entity Combining ---
    26	  [9] Combining entities: Tool(toolshed.g2.bx.psu.edu/repos/bgruening/antismash/antismash/*)
    27	        cores=10, mem=24, gpus=0
    28	        scheduling: require=[], prefer=[], reject=['offline']
    29
    30	--- Rule Evaluation ---
    31	  [10] Rule 'login_required_rule' (if: require_login and user is None) -> not matched
    32
    33	  [11] Rule 'minimum_singularity_version_positive_rule' (if: minimum_singularity_version is
        not None and helpers.tool_version_gte(tool, minim) -> not matched
    34
    35	  [12] Rule 'minimum_singularity_version_negative_rule' (if: minimum_singularity_version is
        not None and helpers.tool_version_lt(tool, minimu) -> not matched
    36
    37	  [13] Rule 'max_concurrent_job_count_for_tool_rule' (if: total_limit_exceeded = False
    38	user_limit_exceeded = False
    39	if max_concurrent_job_c) -> not matched
    40
    41	  [14] Rule 'pulsar_score_prefer_pulsar_rule' (if: result = pulsar_score is not None and (
    42	  helpers.tag_values_match(entity, ['pul) -> MATCHED
    43
    44	  [15] Rule 'pulsar_score_prefer_slurm_rule' (if: from numbers import Number
    45	result = pulsar_score is not None and isinstance(enti) -> not matched
    46
    47	--- Resource Evaluation ---
    48	  [16] Evaluated resource expressions
    49	        cores=10, mem=24, gpus=0
    50
    51	--- Destination Matching ---
    52	  [17] tpvdb_local: REJECTED
    53	        destination is abstract
    54
    55	  [18] tpvdb_slurm: REJECTED
    56	        destination is abstract
    57
    58	  [19] default: REJECTED
    59	        destination is abstract
    60
    61	  [20] _slurm_destination: REJECTED
    62	        destination is abstract
    63
    64	  [21] _pulsar_destination: REJECTED
    65	        destination is abstract
    66
    67	  [22] slurm: MATCHED
    68	        capacity: max_cores=32, max_mem=125
    69
    70	  [23] slurm-training: REJECTED
    71	        tag mismatch - entity requires [], rejects ['offline'] dest tags are ['training',
        'docker', 'singularity', 'slurm', 'gtdbtk_database', 'bakta_database', 'funannotate',
        'eggnog', 'verkko_venv', 'phastest', 'medaka_venv_211', 'tool_type_user_defined']
    72
    73	  [24] interactive_pulsar: REJECTED
    74	        tag mismatch - entity requires [], rejects ['offline'] dest tags are
        ['interactive_pulsar', 'docker', 'singularity', 'tool_type_user_defined']
    75
    76	  [25] pulsar-mel2: REJECTED
    77	        cores 10 exceeds max_accepted_cores 8
    78
    79	  [26] pulsar-mel3: MATCHED
    80	        capacity: max_cores=32, max_mem=62.5
    81
    82	  [27] pulsar-high-mem1: REJECTED
    83	        mem 24 below min_accepted_mem 62.51
    84
    85	  [28] pulsar-high-mem2: REJECTED
    86	        tag mismatch - entity requires [], rejects ['offline'] dest tags are ['pulsar',
        'pulsar-high-mem2', 'docker', 'singularity', 'cvmfs_cache_100plus', 'cvmfs_cache_800plus',
        'phastest', 'tool_type_user_defined']
    87
    88	  [29] pulsar-mel-blast: REJECTED
    89	        tag mismatch - entity requires [], rejects ['offline'] dest tags are ['pulsar',
        'pulsar-blast', 'offline', 'docker', 'singularity', 'pulsar-mel-blast',
        'cvmfs_cache_100plus', 'cvmfs_cache_800plus', 'tool_type_user_defined']
    90
    91	  [30] pulsar-qld-high-mem0: REJECTED
    92	        mem 24 below min_accepted_mem 400
    93
    94	  [31] pulsar-qld-high-mem1: REJECTED
    95	        mem 24 below min_accepted_mem 62.51
    96
    97	  [32] pulsar-qld-high-mem2: REJECTED
    98	        mem 24 below min_accepted_mem 58
    99
   100	  [33] pulsar-nci-training: REJECTED
   101	        tag mismatch - entity requires [], rejects ['offline'] dest tags are ['pulsar',
        'training', 'docker', 'singularity', 'pulsar-nci-training', 'cvmfs_cache_100plus',
        'pulsar-blast', 'bakta_database', 'funannotate', 'eggnog', 'medaka_venv_211',
        'tool_type_user_defined']
   102
   103	  [34] pulsar-qld-blast: REJECTED
   104	        tag mismatch - entity requires [], rejects ['offline'] dest tags are ['pulsar',
        'pulsar-blast', 'docker', 'singularity', 'pulsar-qld-blast', 'cvmfs_cache_100plus',
        'cvmfs_cache_800plus', 'tool_type_user_defined']
   105
   106	  [35] pulsar-QLD: MATCHED
   107	        capacity: max_cores=16, max_mem=62.5
   108
   109	  [36] pulsar-azure: REJECTED
   110	        tag mismatch - entity requires [], rejects ['offline'] dest tags are ['pulsar',
        'pulsar-azure', 'offline', 'docker', 'singularity', 'tool_type_user_defined']
   111
   112	  [37] pulsar-azure-gpu: REJECTED
   113	        tag mismatch - entity requires [], rejects ['offline'] dest tags are ['pulsar',
        'pulsar-azure-gpu', 'offline', 'docker', 'singularity', 'tool_type_user_defined']
   114
   115	  [38] pulsar-azure-1-gpu: REJECTED
   116	        tag mismatch - entity requires [], rejects ['offline'] dest tags are ['pulsar',
        'pulsar-azure-1-gpu', 'offline', 'docker', 'singularity', 'tool_type_user_defined']
   117
   118	  [39] _pulsar_qld_gpu: REJECTED
   119	        destination is abstract
   120
   121	  [40] pulsar-qld-gpu1: REJECTED
   122	        tag mismatch - entity requires [], rejects ['offline'] dest tags are ['pulsar',
        'pulsar-qld-gpu', 'docker', 'singularity', 'pulsar-qld-gpu1', 'pulsar-qld-gpu-alphafold',
        'tool_type_user_defined']
   123
   124	  [41] pulsar-qld-gpu2: REJECTED
   125	        tag mismatch - entity requires [], rejects ['offline'] dest tags are ['pulsar',
        'pulsar-qld-gpu', 'docker', 'singularity', 'pulsar-qld-gpu2', 'pulsar-qld-gpu-alphafold',
        'tool_type_user_defined']
   126
   127	  [42] pulsar-qld-gpu3: REJECTED
   128	        tag mismatch - entity requires [], rejects ['offline'] dest tags are ['pulsar',
        'pulsar-qld-gpu', 'docker', 'singularity', 'pulsar-qld-gpu3', 'pulsar-qld-gpu-alphafold',
        'tool_type_user_defined']
   129
   130	  [43] pulsar-qld-gpu4: REJECTED
   131	        tag mismatch - entity requires [], rejects ['offline'] dest tags are ['pulsar',
        'pulsar-qld-gpu', 'docker', 'singularity', 'pulsar-qld-gpu4', 'pulsar-qld-gpu-other',
        'tool_type_user_defined']
   132
   133	  [44] pulsar-qld-gpu5: REJECTED
   134	        tag mismatch - entity requires [], rejects ['offline'] dest tags are ['pulsar',
        'pulsar-qld-gpu', 'docker', 'singularity', 'pulsar-qld-gpu5', 'pulsar-qld-gpu-other',
        'tool_type_user_defined']
   135
   136	--- Destination Ranking ---
   137	  [45] #1 slurm (score: -9)
   138
   139	  [46] #2 pulsar-mel3 (score: -1)
   140
   141	  [47] #3 pulsar-mel3 (score: -1)
   142
   143	--- Destination Evaluation ---
   144	  [48] Evaluating destination 'slurm'
   145
   146	--- Rule Evaluation ---
   147	  [49] Rule 'slurm_destination_singularity_rule' (if:
        entity.params.get('singularity_enabled')) -> MATCHED
   148
   149	  [50] Rule 'slurm_destination_docker_rule' (if: entity.params.get('docker_enabled')) -> not
        matched
   150
   151	--- Final Result ---
   152	  [51] Destination: slurm
   153	        runner: slurm
   154	        cores: 10, mem: 24, gpus: 0
   155	        params: {'singularity_enabled': True, 'tpv_cores': '10', 'tpv_gpus': '0', 'tpv_mem':
        '24', 'nativeSpecification': '--nodes=1 --ntasks=10 --ntasks-per-node=10 --mem=24576
        --partition=main', 'metadata_strategy': 'extended', 'singularity_volumes':
        '$job_directory:rw,$galaxy_root:ro,$tool_directory:ro,/mnt/user-data-volA:ro,/mnt/user-data-volB:ro,/mnt/user-data-volD:ro,/mnt/user-data-qld:ro,/mnt/custom-indices:ro,/cvmfs/data.galaxyproject.org:ro,/tmp:rw',
        'singularity_default_container_id':
        '/cvmfs/singularity.galaxyproject.org/all/python:3.8.3'}
   156	        env: [{'name': 'HDF5_USE_FILE_LOCKING', 'value': 'FALSE'}, {'name':
        'SINGULARITYENV_HDF5_USE_FILE_LOCKING', 'value': 'FALSE'}, {'name': '_JAVA_OPTIONS',
        'value': '-Xmx24G -Xms1G'}, {'name': 'SINGULARITYENV__JAVA_OPTIONS', 'value': '-Xmx24G
        -Xms1G'}]
   157
   158	========================================================================
   159	!!python/object:galaxy.jobs.JobDestination
   160	converted: false
   161	env:
   162	- {name: HDF5_USE_FILE_LOCKING, value: 'FALSE'}
   163	- {name: SINGULARITYENV_HDF5_USE_FILE_LOCKING, value: 'FALSE'}
   164	- {name: _JAVA_OPTIONS, value: -Xmx24G -Xms1G}
   165	- {name: SINGULARITYENV__JAVA_OPTIONS, value: -Xmx24G -Xms1G}
   166	id: slurm
   167	legacy: false
   168	params: {metadata_strategy: extended, nativeSpecification: --nodes=1 --ntasks=10
   169	    --ntasks-per-node=10 --mem=24576 --partition=main,
   170	    singularity_default_container_id:
   171	    /cvmfs/singularity.galaxyproject.org/all/python:3.8.3, singularity_enabled:
   172	    true, singularity_volumes:
        '$job_directory:rw,$galaxy_root:ro,$tool_directory:ro,/mnt/user-data-volA:ro,/mnt/user-data-volB:ro,/mnt/user-data-volD:ro,/mnt/user-data-qld:ro,/mnt/custom-indices:ro,/cvmfs/data.galaxyproject.org:ro,/tmp:rw',
   173	  tpv_cores: '10', tpv_gpus: '0', tpv_mem: '24'}
   174	resubmit: []
   175	runner: slurm
   176	shell: null
   177	tags: [registered_user_concurrent_jobs_12]
   178	url: null
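The rejection reasons in the Destination Matching section of the trace (abstract destinations, "cores 10 exceeds max_accepted_cores 8", "mem 24 below min_accepted_mem 62.51") can be modelled with a small Python sketch. This is a hypothetical simplification, not TPV's code: the Destination record and rejection_reason function are illustrative, the field names are taken from the trace messages, and strict inequalities at the limits are an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Destination:
    """Hypothetical, simplified destination record (field names mirror the trace text)."""
    id: str
    abstract: bool = False
    min_accepted_cores: Optional[float] = None
    max_accepted_cores: Optional[float] = None
    min_accepted_mem: Optional[float] = None
    max_accepted_mem: Optional[float] = None


def rejection_reason(dest: Destination, cores: float, mem: float) -> Optional[str]:
    """Return a trace-style rejection message, or None if these checks pass.

    Assumption: a value equal to a limit is accepted; only values strictly
    below a minimum or strictly above a maximum are rejected.
    """
    if dest.abstract:
        # Abstract destinations only serve as templates and are never matched.
        return "destination is abstract"
    checks = [
        ("cores", cores, dest.min_accepted_cores, dest.max_accepted_cores),
        ("mem", mem, dest.min_accepted_mem, dest.max_accepted_mem),
    ]
    for name, value, low, high in checks:
        if low is not None and value < low:
            return f"{name} {value} below min_accepted_{name} {low}"
        if high is not None and value > high:
            return f"{name} {value} exceeds max_accepted_{name} {high}"
    return None


if __name__ == "__main__":
    print(rejection_reason(Destination("pulsar-mel2", max_accepted_cores=8), cores=10, mem=24))
    print(rejection_reason(Destination("pulsar-high-mem1", min_accepted_mem=62.51), cores=10, mem=24))
```

With the antismash job's evaluated resources (cores=10, mem=24), the two calls print the same reasons that the trace reports for pulsar-mel2 and pulsar-high-mem1.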

@cat-bro
Collaborator

cat-bro commented Mar 2, 2026

Line 25: entity combining. The combined entity has accept: ['pulsar'], which is left out here but is important for later matchmaking.

Lines 52-68: Maybe the abstract destinations could be left out. There is a lot of other good info in here and the fact that entities do not match with abstract destinations is not interesting.

Line 70: slurm-training is rejected for a tag mismatch, but it’s not clear why. It would be better if all tag categories (accept/prefer/require/reject) were listed for the entity, and if the destination’s tags were also separated into categories. The mismatch occurs because slurm-training requires the 'training' tag and the entity does not have it, but that is not obvious from this explanation.
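The kind of per-category explanation suggested for Line 70 can be sketched with a simplified two-sided tag model in Python. This is a hypothetical illustration, not TPV's implementation: the Tags class and tag_mismatch_reasons function are invented names, and the rules (a require tag must be present among the other side's require/prefer/accept tags, a reject tag must not be) are an assumed simplification of TPV's matching. Only the 'training' requirement of slurm-training is known from the comment; its other tags are placed under accept as an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Tags:
    """Hypothetical container for one side's scheduling tags (not TPV's real class)."""
    require: list[str] = field(default_factory=list)
    prefer: list[str] = field(default_factory=list)
    accept: list[str] = field(default_factory=list)
    reject: list[str] = field(default_factory=list)

    def present(self) -> set[str]:
        # Tags this side "has": anything it requires, prefers or accepts.
        return set(self.require) | set(self.prefer) | set(self.accept)


def tag_mismatch_reasons(entity: Tags, dest: Tags) -> list[str]:
    """List every reason the two sides are incompatible (empty list if compatible)."""
    reasons = []
    for tag in entity.require:
        if tag not in dest.present():
            reasons.append(f"entity requires '{tag}', destination does not have it")
    for tag in dest.require:
        if tag not in entity.present():
            reasons.append(f"destination requires '{tag}', entity does not have it")
    for tag in entity.reject:
        if tag in dest.present():
            reasons.append(f"entity rejects '{tag}', destination has it")
    for tag in dest.reject:
        if tag in entity.present():
            reasons.append(f"destination rejects '{tag}', entity has it")
    return reasons


if __name__ == "__main__":
    # Combined entity from the trace: accept ['pulsar'], reject ['offline'].
    entity = Tags(accept=["pulsar"], reject=["offline"])
    slurm_training = Tags(require=["training"], accept=["docker", "singularity", "slurm"])
    print(tag_mismatch_reasons(entity, slurm_training))
```

Run as a script, this prints ["destination requires 'training', entity does not have it"], which names the failing category and tag directly instead of listing every destination tag.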

Lines 137-141: There is something odd here: pulsar-mel3 is listed twice, while pulsar-QLD, which also matched, is not listed at all. The ranking function used by the job conf in this case is weighted_random_sampling.

Everything else looks fantastic!

@nuwang
Member Author

nuwang commented Mar 6, 2026

Thanks @cat-bro, that was super useful feedback. I think all the issues you highlighted have been addressed now. The last one was particularly interesting, because it's a consequence of using weighted random sampling when no weights are defined. As a result, a destination that has already been picked can be picked again on the next random draw. I've changed it so that, if weights are not defined, it falls back to standard random sampling (without replacement).
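The difference between sampling with and without replacement can be sketched with Python's standard random module: random.choices draws with replacement, so a destination can repeat (as pulsar-mel3 did), while random.sample draws each item at most once. The helper names below are hypothetical and this is not TPV's ranking code; the weighted branch (successive weighted draws, removing each pick) is an assumption about how a weighted ranking without replacement could be built.

```python
import random


def rank_with_replacement(destinations, weights=None, rng=random):
    """Reproduces the reported symptom: random.choices draws WITH replacement,
    so the same destination can appear more than once in the ranking."""
    return rng.choices(destinations, weights=weights, k=len(destinations))


def rank_without_replacement(destinations, weights=None, rng=random):
    """Hypothetical helper that orders every destination exactly once."""
    if weights is None:
        # Fallback described in the PR: plain random sampling without replacement.
        return rng.sample(destinations, k=len(destinations))
    # Assumed weighted variant: one weighted draw at a time, removing each pick
    # so it cannot be chosen again. (All-zero weights would raise ValueError.)
    remaining = list(destinations)
    remaining_weights = list(weights)
    ranked = []
    while remaining:
        idx = rng.choices(range(len(remaining)), weights=remaining_weights, k=1)[0]
        ranked.append(remaining.pop(idx))
        remaining_weights.pop(idx)
    return ranked


if __name__ == "__main__":
    matched = ["slurm", "pulsar-mel3", "pulsar-QLD"]
    print(rank_with_replacement(matched, rng=random.Random(0)))     # may contain repeats
    print(rank_without_replacement(matched, rng=random.Random(0)))  # each destination once
```

Passing a seeded random.Random instance as rng keeps the ordering reproducible, which is convenient when comparing dry-run output.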

Collaborator

@cat-bro cat-bro left a comment


Thank you @nuwang !

@nuwang nuwang merged commit 06d65a1 into main Mar 11, 2026
3 checks passed
@nuwang nuwang deleted the add_explainability branch March 11, 2026 08:07

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Command to dump merged config

4 participants