
Insights on the DAG JSON Format and How RCA Can Work

Executive summary

The current JSON format has improved from a simple DAG export into a usable artifact-level provenance map.

That is a meaningful shift.

Earlier, the graph mainly described node-to-node dependency structure. Now it describes which concrete artifacts flow into which nodes. That makes the format much more useful for investigation, debugging, and root cause analysis.

The current format is now strong enough for:
 • tracing likely upstream sources of a false claim in the final report
 • narrowing the search to a small set of relevant artifacts
 • identifying which node and source files are likely responsible
 • distinguishing a claim that was inherited from a claim that was introduced later

However, it is still not enough for perfect claim-level attribution. It can identify likely culprit artifacts and nodes, but it cannot yet prove exactly which transformation step introduced a specific false sentence.

What the JSON format currently does well

1. It models the pipeline as dataflow, not just control flow

The biggest improvement is replacing broad dependency links with explicit inputs.

Instead of saying:
 • node A depends on node B

it now says:
 • node A consumes artifact X from node B

That is much more useful for debugging.

For RCA, this matters because a false claim is usually carried by data, not just by execution order.

2. It makes artifact lineage inspectable

Each node now exposes:
 • its produced artifacts
 • its consumed artifacts
 • the source files associated with its implementation

This allows backward tracing from the final report to upstream intermediate artifacts.

3. It provides a practical bridge from output to code

The source_files section helps connect:
 • the artifact chain
 • the node
 • the relevant Python files

That means the JSON is useful not only for graph inspection, but also for code-level investigation.

4. It is now usable for structured debugging

A false claim in the final report can now be investigated as a graph traversal problem:
 1. find the report node
 2. inspect its input artifacts
 3. search those artifacts for the false claim
 4. when the claim is found upstream, recurse into that node's inputs
 5. continue until reaching the earliest artifact that contains the false claim
 6. inspect that node's source files

That is already a workable investigation process.
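
The six-step traversal above can be sketched as a small script. This is a minimal sketch against a hypothetical schema (a top-level "nodes" list whose entries carry `produces`, `inputs`, and `source_files` fields); the field names are illustrative and would need adapting to the actual JSON layout.

```python
import json


def load_graph(path):
    """Load the DAG JSON and index nodes by the artifact paths they produce.

    Assumes a hypothetical schema:
    {"nodes": [{"name": ..., "produces": [...],
                "inputs": [{"artifact_path": ...}], "source_files": [...]}]}
    """
    with open(path) as f:
        dag = json.load(f)
    producer_of = {}
    for node in dag["nodes"]:
        for artifact in node.get("produces", []):
            producer_of[artifact] = node
    return producer_of


def artifact_contains(path, claim):
    """Check whether an artifact file on disk mentions the claim text."""
    try:
        with open(path, encoding="utf-8") as f:
            return claim in f.read()
    except OSError:
        return False


def trace_claim(producer_of, start_artifact, claim):
    """Walk upstream from start_artifact; return the earliest artifact found
    that still contains the claim, plus the node that produced it."""
    earliest, node = start_artifact, producer_of.get(start_artifact)
    frontier = [start_artifact]
    seen = set()
    while frontier:
        artifact = frontier.pop()
        if artifact in seen:
            continue
        seen.add(artifact)
        producer = producer_of.get(artifact)
        if producer is None:
            continue  # a root input with no producing node
        for inp in producer.get("inputs", []):
            upstream = inp["artifact_path"]
            if artifact_contains(upstream, claim):
                earliest, node = upstream, producer_of.get(upstream)
                frontier.append(upstream)
    return earliest, node
```

With a graph like this loaded, `trace_claim(producer_of, "030-report.html", "some false sentence")` returns the deepest upstream artifact still carrying the claim and its producing node, whose source_files are then the first code to inspect.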

What the format still lacks

1. Artifact semantics are too thin

Artifacts currently have only a path.

That is useful, but weak.

The format would be stronger if artifacts also had explicit metadata such as:
 • id
 • format
 • role
 • is_intermediate
 • audience

Example:

{
  "id": "executive_summary_markdown",
  "path": "025-2-executive_summary.md",
  "format": "md",
  "role": "summary_markdown"
}

This would make it easier to reason about whether a false claim likely originated in:
 • raw generation
 • normalized machine-readable output
 • markdown rendering
 • final report assembly
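
A sketch of what the richer artifact record could look like in code. This is a proposal, not the current schema: every field other than `path` is a suggested addition, and the parser falls back to the bare path-only form the format emits today.

```python
from dataclasses import dataclass


@dataclass
class Artifact:
    """Proposed richer artifact record.

    Only `path` exists in the current format; the remaining fields are
    the suggested metadata additions.
    """
    id: str
    path: str
    format: str = "md"
    role: str = ""
    is_intermediate: bool = True
    audience: str = "internal"


def parse_artifact(record: dict) -> Artifact:
    """Parse an artifact entry, tolerating the current path-only form."""
    path = record["path"]
    return Artifact(
        id=record.get("id", path),                       # fall back to the path as id
        path=path,
        format=record.get("format", path.rsplit(".", 1)[-1]),
        role=record.get("role", ""),
        is_intermediate=record.get("is_intermediate", True),
        audience=record.get("audience", "internal"),
    )
```

The fallback defaults mean old and new graph files could be read by the same tooling during a migration.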

2. Claim-level provenance is missing

The current graph can tell us:
 • which artifacts fed into a node
 • which nodes likely influenced the report

But it cannot tell us:
 • which sentence in the report came from which artifact section
 • whether a sentence was synthesized from several artifacts
 • whether the final renderer introduced a fresh falsehood

This is the main gap between good artifact-level RCA and true claim-level RCA.

3. Runtime behavior is not captured

The JSON describes intended structure, not actual execution.

It does not capture:
 • prompt inputs actually loaded at runtime
 • truncation behavior
 • model configuration
 • retries
 • prompt templates
 • hashes of input and output artifacts
 • whether source files were actually included in prompt context

For LLM-heavy systems, this missing runtime provenance is important.

4. The graph does not encode why an input was used

Right now, an input edge says:
 • this artifact was used

But not:
 • what it was used for

A stronger format could allow fields like:

{
  "from_node": "executive_summary",
  "artifact_path": "025-2-executive_summary.md",
  "used_for": "decision-maker summary section"
}

That would improve interpretability during investigation.

How RCA can work with the current format

Goal

The goal of RCA is to answer questions like:
 • Why is a false claim shown in 030-report.html?
 • Which upstream artifact first contained it?
 • Which node likely introduced it?
 • Which source file should be inspected first?

Investigation strategy

Step 1: Start from the final artifact

Begin with the final output artifact, such as:
 • 030-report.html

Find the node that produces it.

Step 2: Inspect direct inputs to the final node

Look at the report node's inputs.

These are the first suspects.

Check whether the false claim exists in any of those artifacts.

Typical possibilities include:
 • executive_summary.md
 • review_plan.md
 • questions_and_answers.md
 • premortem.md
 • project_plan.md
 • team.md
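
Step 2 is a one-level scan rather than a full traversal, and can be sketched as a single helper. The function below is illustrative: it assumes the direct inputs have already been collected as a list of file paths from the report node's entry in the graph.

```python
def inputs_containing_claim(input_paths, claim):
    """Return the direct input artifacts whose text contains the claim.

    These are the first suspects. An empty result suggests the claim was
    introduced by the final node itself rather than inherited upstream.
    """
    suspects = []
    for path in input_paths:
        try:
            with open(path, encoding="utf-8") as f:
                text = f.read()
        except OSError:
            continue  # artifact missing on disk; skip rather than fail
        if claim in text:
            suspects.append(path)
    return suspects
```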

Step 3: Find the earliest artifact containing the claim

Once a matching upstream artifact is found, move to the node that produced it.

Then inspect that node's own inputs.

Repeat the process until reaching the earliest artifact where the false claim appears.

That artifact is the best candidate for the first introduction point.

Step 4: Inspect the producing node's source files

Once the likely introduction node has been found, inspect its source_files.

In practice, the first files to inspect are often:
 • the workflow_node file for orchestration and wiring
 • the business_logic file for the actual transformation logic

Step 5: Classify the failure mode

Once the suspect node is identified, classify the false claim into one of these rough categories:
 • input falsehood: the claim was already present upstream
 • transformation error: the node misread or distorted upstream content
 • summarization drift: the claim changed during markdown or summary generation
 • aggregation error: several true inputs were combined into a false conclusion
 • renderer error: the final report step introduced or misformatted the claim
 • prompt-induced hallucination: the LLM invented unsupported content

This classification matters because the fix depends on the failure mode.
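
The six categories can be pinned down as an enum so that investigations record a consistent label per incident. The names below are illustrative, not an existing codebase convention.

```python
from enum import Enum


class FailureMode(Enum):
    """Rough failure categories for a false claim; the fix differs per mode."""
    INPUT_FALSEHOOD = "input_falsehood"              # already present upstream
    TRANSFORMATION_ERROR = "transformation_error"    # node distorted upstream content
    SUMMARIZATION_DRIFT = "summarization_drift"      # drifted during summary/markdown
    AGGREGATION_ERROR = "aggregation_error"          # true inputs, false conclusion
    RENDERER_ERROR = "renderer_error"                # final report step introduced it
    PROMPT_HALLUCINATION = "prompt_hallucination"    # LLM invented unsupported content
```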

Example RCA flow

Suppose the final report contains the false claim:

The project requires 12 full-time engineers.

A practical investigation would look like this:
 1. search 030-report.html for the claim
 2. inspect the report node inputs
 3. search 025-2-executive_summary.md
 4. search 024-2-review_plan.md
 5. search 013-team.md
 6. if the claim appears in 013-team.md, inspect the team_markdown node
 7. inspect that node's inputs:
    • 011-2-enrich_team_members_environment_info.json
    • 012-review_team_raw.json
 8. search those artifacts for the same claim or the numeric value
 9. continue upstream until the earliest occurrence is found
 10. inspect the producing node's source_files

This gives a clear investigation trail from report output back to likely code.

What the current format is sufficient for

The current format is sufficient for:
 • artifact-chain investigation
 • identifying likely upstream culprit nodes
 • narrowing debugging scope
 • inspecting transformation paths
 • connecting output problems to relevant code files

That is already very useful.

What the current format is not sufficient for

The current format is not sufficient for:
 • proving which exact sentence transformation introduced a false claim
 • attributing a sentence to a specific prompt span
 • reconstructing exact runtime prompt context
 • distinguishing between listed inputs and actually attended inputs
 • auditing LLM behavior at a fine-grained level

So the format is good for investigation, but not perfect for forensic proof.

Recommended improvements

1. Give artifacts stable IDs and metadata

Example:

{
  "id": "review_plan_markdown",
  "path": "024-2-review_plan.md",
  "format": "md",
  "role": "review_output"
}

2. Add optional purpose information to inputs

Example:

{
  "from_node": "review_plan",
  "artifact_path": "024-2-review_plan.md",
  "used_for": "quality review section"
}

3. Add node kind metadata

Examples:
 • generator
 • validator
 • formatter
 • consolidator
 • report_assembler
 • diagnostic

This helps distinguish nodes that are likely to introduce content from those that mostly reformat it.
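
Kind metadata would also let tooling rank suspects automatically. A small sketch: the risk weights below are assumptions about how likely each kind is to introduce content rather than merely reformat it, not measured values.

```python
# Illustrative prior: how likely each node kind is to *introduce* content,
# as opposed to merely reformatting it. The numbers are assumptions.
CONTENT_RISK = {
    "generator": 5,
    "consolidator": 4,
    "report_assembler": 3,
    "formatter": 2,
    "validator": 1,
    "diagnostic": 1,
}


def rank_suspects(nodes):
    """Sort candidate nodes so likely content-introducers come first.

    Each node is a dict carrying the proposed (hypothetical) `kind` field;
    unknown kinds sort last.
    """
    return sorted(nodes, key=lambda n: CONTENT_RISK.get(n.get("kind"), 0),
                  reverse=True)
```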

4. Add runtime provenance logs outside the DAG schema

For example:
 • run id
 • input artifact hashes
 • output artifact hashes
 • prompt inputs used
 • source files loaded into the prompt
 • model name
 • prompt template version
 • temperature

This is likely more important than making the static DAG infinitely rich.
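
One such per-execution record could be built as follows. This is a sketch of a suggested shape, not an existing schema: every field name is a proposal, and the content hashes pin each run to exact artifact versions.

```python
import hashlib
import json
import time


def sha256_file(path):
    """Content hash so a run record pins the exact artifact version used."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


def run_record(run_id, node, inputs, outputs, model, template_version, temperature):
    """Build one runtime provenance record for a single node execution.

    All field names are suggestions; the record is meant to be appended
    to a log file alongside, not inside, the static DAG JSON.
    """
    return {
        "run_id": run_id,
        "node": node,
        "timestamp": time.time(),
        "input_hashes": {p: sha256_file(p) for p in inputs},
        "output_hashes": {p: sha256_file(p) for p in outputs},
        "model": model,
        "prompt_template_version": template_version,
        "temperature": temperature,
    }
```

Because each record is plain JSON-serializable data, one append-only log line per node execution is enough to later answer "what exactly did this node see and emit?"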

5. Add claim-level citations in generated outputs

The strongest future improvement would be to make generated markdown and report outputs carry explicit source references.

For example, each section or bullet could include:
 • source artifact ids
 • source node ids
 • source spans or source field names

That would make false-claim RCA much easier.
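
One possible shape for such a cited claim, purely illustrative (the field names and the span label are invented for this example, reusing the false claim from the RCA walkthrough above):

```json
{
  "text": "The project requires 12 full-time engineers.",
  "sources": [
    {
      "artifact_id": "team_markdown",
      "node": "team_markdown",
      "span": "headcount section"
    }
  ]
}
```

With records like this embedded in generated outputs, tracing a false sentence would become a direct lookup instead of a graph search.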

Final assessment

The JSON format has evolved into a strong artifact-level provenance graph.

That is a major improvement over a plain DAG export.

It is now good enough for practical root cause analysis in many cases, especially when the goal is to trace a false claim back to the earliest upstream artifact and the likely responsible node.

However, it is still not a full forensic provenance system.

The current format can:
 • identify suspects
 • trace evidence flow
 • narrow the search space
 • connect artifacts to code

But it still cannot fully:
 • prove which exact transformation introduced a false sentence
 • reconstruct the exact model context
 • show claim-level attribution end to end

So the right conclusion is:
 • the format is already useful and worth keeping
 • moving to artifact-level inputs was the right step
 • the next frontier is runtime provenance and claim-level traceability