# SPDX-FileCopyrightText: euronion
#
# SPDX-License-Identifier: MIT
from pathlib import Path
# This Snakefile demonstrates how to show and investigate the DAG of a Snakemake workflow.
# It contains several rules that depend on each other and are connected via their input and output files.

rule create_aux_file:
    message:
        "Creating aux file that might already exist."
    output:
        fn="data/dag/aux.file",
    run:
        Path(output["fn"]).touch()

# rule create_file:
#     message:
#         "Creating a file with wildcard extension ext={wildcards.ext}"
#     output:
#         fn="data/dag/raw.{ext}",
#     run:
#         Path(output["fn"]).touch()

rule process_data:
    input:
        csv="data/dag/raw.csv",
        json="data/dag/raw.json",
        txt="data/dag/raw.txt",
        external_data="data/dag/external.data",
    output:
        data="data/dag/processed_data.txt",
    run:
        Path(output["data"]).touch()

rule get_external_data:
    input:
        "somewhere/over/the/rainbow/external.data",
    output:
        "data/dag/external.data",
    run:
        # The output is unnamed, so it is accessed by position rather than by key.
        Path(output[0]).touch()

rule derive_insights:
    input:
        aux="data/dag/aux.file",
        data="data/dag/processed_data.txt",
    output:
        insights="data/dag/insights.pdf",
    default_target: True
    run:
        Path(output["insights"]).touch()

rule delete_external_data:
    message:
        "Deleting external data file to demonstrate how it can mess up the DAG."
    input:
        "data/dag/external.data",
    run:
        Path(input[0]).unlink(missing_ok=True)
# Note: Before running the workflow, ensure that the following file exists:
# > touch data/dag/external.data
#
# Regular dry-run to see what would be done
# > snakemake -c1 -s Snakefile_ItIsNotDoingWhatIWantItToDo --dry-run
#
# List all available rules in the Snakefile (helpful e.g. if your rule cannot be found and you want to check if Snakemake actually sees the rule)
# > snakemake -c1 -s Snakefile_ItIsNotDoingWhatIWantItToDo --list
#
# Nicer summary
# > snakemake -c1 -s Snakefile_ItIsNotDoingWhatIWantItToDo --summary
#
# Looking at the DAG to confirm the correct execution order
# > snakemake -c1 -F -s Snakefile_ItIsNotDoingWhatIWantItToDo --dag | dot -Tpng > dag.png
#
# Another option is the rulegraph, which shows the rules and their connections without wildcards.
# Particularly useful for large DAGs
# > snakemake -c1 -s Snakefile_ItIsNotDoingWhatIWantItToDo --rulegraph | dot -Tpng > rulegraph.png
#
# Or the filegraph, which shows the files and their connections without wildcards.
# > snakemake -c1 -s Snakefile_ItIsNotDoingWhatIWantItToDo --filegraph | dot -Tpng > filegraph.png
#
# Note that the DAG, rulegraph and filegraph only show the parts of the workflow that still need to be executed.
# To see all rule and file dependencies, use the --forceall/-F flag:
# > snakemake -F -c1 -s Snakefile_ItIsNotDoingWhatIWantItToDo --dag | dot -Tpng > dag_F.png
# > snakemake -F -c1 -s Snakefile_ItIsNotDoingWhatIWantItToDo --filegraph | dot -Tpng > filegraph_F.png
#
# Special case (some call it a bug, some call it a feature):
# Since around snakemake=~7.21, rules that cannot be executed (e.g. because their input files do not exist)
# AND that do not need to be executed (e.g. because their output files already exist) are not considered in the DAG
# EVEN IF the --forceall/-F flag is used.
# Let's see this in action with the rule `get_external_data`, for which the input file does not exist.
# Because of the missing input file, the rule execution fails:
# > snakemake -F -c1 -s Snakefile_ItIsNotDoingWhatIWantItToDo get_external_data --dry-run
#
# But since the output file exists, dependent rules can still be executed, e.g.:
# > snakemake -F -c1 -s Snakefile_ItIsNotDoingWhatIWantItToDo derive_insights --dry-run
#
# If we now mess up the external data file, e.g. by deleting it:
# > snakemake -F -c1 -s Snakefile_ItIsNotDoingWhatIWantItToDo delete_external_data
#
# Now other rules that depend on the external data file can no longer be executed.
# One would expect Snakemake, given the -F flag, to complain about the missing input file.
# Instead, Snakemake quietly runs only the parts of the workflow for which all input files exist:
# > snakemake -F -c1 -s Snakefile_ItIsNotDoingWhatIWantItToDo --dry-run
#
# The DAG will also not show the missing parts:
# > snakemake -F -c1 -s Snakefile_ItIsNotDoingWhatIWantItToDo --dag | dot -Tpng > dag_F_after_delete.png
#
# Although all rules are still shown and recognised by Snakemake:
# > snakemake -F -c1 -s Snakefile_ItIsNotDoingWhatIWantItToDo --list
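#
# A possible safeguard against the silent pruning described above. This is a minimal sketch
# and not part of the original workflow: it checks up front whether files the workflow relies
# on exist, and warns if they do not. `EXTERNAL_INPUTS` and `find_missing_inputs` are
# hypothetical names introduced only for this illustration. The warning goes to stderr so
# that stdout stays clean for commands such as `--dag | dot -Tpng`.
import sys
from pathlib import Path

# Assumption: the only file that must exist beforehand is the one created by `touch` above.
EXTERNAL_INPUTS = ["data/dag/external.data"]


def find_missing_inputs(paths):
    """Return the entries of `paths` that do not exist on disk, in the given order."""
    return [p for p in paths if not Path(p).exists()]


for missing in find_missing_inputs(EXTERNAL_INPUTS):
    print(f"WARNING: expected input file is missing: {missing}", file=sys.stderr)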