# Insights on the DAG JSON Format and How RCA Can Work

## Executive summary

The current JSON format has improved from a simple DAG export into a usable artifact-level provenance map.

That is a meaningful shift.

Earlier, the graph mainly described node-to-node dependency structure. Now it describes which concrete artifacts flow into which nodes. That makes the format much more useful for investigation, debugging, and root cause analysis.

The current format is now strong enough for:

- tracing likely upstream sources of a false claim in the final report
- narrowing the search to a small set of relevant artifacts
- identifying which node and source files are likely responsible
- distinguishing a claim that was inherited upstream from one that was introduced later

However, it is still not enough for perfect claim-level attribution. It can identify likely culprit artifacts and nodes, but it cannot yet prove exactly which transformation step introduced a specific false sentence.

## What the JSON format currently does well

### 1. It models the pipeline as dataflow, not just control flow

The biggest improvement is replacing broad dependency links with explicit inputs.

Instead of saying:

- node A depends on node B

it now says:

- node A consumes artifact X from node B

That is much more useful for debugging.

For RCA, this matters because a false claim is usually carried by data, not just by execution order.

### 2. It makes artifact lineage inspectable

Each node now exposes:

- its produced artifacts
- its consumed artifacts
- the source files associated with its implementation

This allows backward tracing from the final report to upstream intermediate artifacts.

### 3. It provides a practical bridge from output to code

The `source_files` section helps connect:

- the artifact chain
- the node
- the relevant Python files

That means the JSON is useful not only for graph inspection, but also for code-level investigation.
52+
53+
4. It is now usable for structured debugging
54+
55+
A false claim in the final report can now be investigated as a graph traversal problem:
56+
1. find the report node
57+
2. inspect its input artifacts
58+
3. search those artifacts for the false claim
59+
4. when the claim is found upstream, recurse into that node’s inputs
60+
5. continue until reaching the earliest artifact that contains the false claim
61+
6. inspect that node’s source files
62+
63+
That is already a workable investigation process.
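
As a concrete illustration, here is a minimal Python sketch of that traversal. It assumes a hypothetical `dag.json` layout in which each node carries a `produces` list of artifact paths and an `inputs` list of `{"artifact_path": ...}` entries; the field names mirror the examples in this document but are assumptions, not the actual schema.

```python
from pathlib import Path

def find_producer(dag, artifact_path):
    """Return the node whose produced artifacts include artifact_path, else None."""
    for node in dag["nodes"]:
        if artifact_path in node.get("produces", []):
            return node
    return None

def trace_claim(dag, run_dir, artifact_path, claim):
    """Walk upstream from artifact_path to the earliest artifact containing claim.

    Returns (earliest_artifact, source_files_of_its_producer).
    """
    earliest = artifact_path
    node = find_producer(dag, artifact_path)
    while node is not None:
        # Which of this node's input artifacts also contain the claim?
        hits = [
            inp["artifact_path"]
            for inp in node.get("inputs", [])
            if claim in (Path(run_dir) / inp["artifact_path"]).read_text(errors="ignore")
        ]
        if not hits:
            # No upstream input carries the claim: this node likely introduced it.
            break
        earliest = hits[0]
        node = find_producer(dag, earliest)
    producer = find_producer(dag, earliest)
    return earliest, (producer or {}).get("source_files", [])
```

The point is not the exact field names but that the current format already supports this style of mechanical backtracking.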

## What the format still lacks

### 1. Artifact semantics are too thin

Artifacts currently have only a path.

That is useful, but weak.

The format would be stronger if artifacts also had explicit metadata such as:

- `id`
- `format`
- `role`
- `is_intermediate`
- `audience`

Example:

```json
{
  "id": "executive_summary_markdown",
  "path": "025-2-executive_summary.md",
  "format": "md",
  "role": "summary_markdown"
}
```

This would make it easier to reason about whether a false claim likely originated in:

- raw generation
- normalized machine-readable output
- markdown rendering
- final report assembly
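
For illustration, a minimal sketch of such a record as a Python dataclass, using the field names proposed above (the defaults and example values are assumptions, not an existing schema):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ArtifactMeta:
    """Illustrative artifact metadata record; a proposal, not the current schema."""
    id: str
    path: str
    format: str                      # e.g. "md", "json", "html"
    role: str                        # e.g. "summary_markdown", "review_output"
    is_intermediate: bool = True
    audience: Optional[str] = None   # e.g. "machine", "decision_maker"
```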

### 2. Claim-level provenance is missing

The current graph can tell us:

- which artifacts fed into a node
- which nodes likely influenced the report

But it cannot tell us:

- which sentence in the report came from which artifact section
- whether a sentence was synthesized from several artifacts
- whether the final renderer introduced a fresh falsehood

This is the main gap between good artifact-level RCA and true claim-level RCA.

### 3. Runtime behavior is not captured

The JSON describes intended structure, not actual execution.

It does not capture:

- prompt inputs actually loaded at runtime
- truncation behavior
- model configuration
- retries
- prompt templates
- hashes of input and output artifacts
- whether source files were actually included in prompt context

For LLM-heavy systems, this missing runtime provenance is important.

### 4. The graph does not encode why an input was used

Right now, an input edge says:

- this artifact was used

But not:

- what it was used for

A stronger format could allow fields like:

```json
{
  "from_node": "executive_summary",
  "artifact_path": "025-2-executive_summary.md",
  "used_for": "decision-maker summary section"
}
```

That would improve interpretability during investigation.

## How RCA can work with the current format

### Goal

The goal of RCA is to answer questions like:

- Why is a false claim shown in `030-report.html`?
- Which upstream artifact first contained it?
- Which node likely introduced it?
- Which source file should be inspected first?

### Investigation strategy

#### Step 1: Start from the final artifact

Begin with the final output artifact, such as:

- `030-report.html`

Find the node that produces it.

#### Step 2: Inspect direct inputs to the final node

Look at the report node's inputs.

These are the first suspects.

Check whether the false claim exists in any of those artifacts.

Typical possibilities include:

- `executive_summary.md`
- `review_plan.md`
- `questions_and_answers.md`
- `premortem.md`
- `project_plan.md`
- `team.md`

#### Step 3: Find the earliest artifact containing the claim

Once a matching upstream artifact is found, move to the node that produced it.

Then inspect that node's own inputs.

Repeat the process until reaching the earliest artifact where the false claim appears.

That artifact is the best candidate for the first introduction point.

#### Step 4: Inspect the producing node's source files

Once the likely introduction node has been found, inspect its `source_files`.

In practice, the first files to inspect are often:

- the `workflow_node` file for orchestration and wiring
- the `business_logic` file for actual transformation logic

#### Step 5: Classify the failure mode

Once the suspect node is identified, classify the false claim into one of these rough categories:

- input falsehood: the claim was already present upstream
- transformation error: the node misread or distorted upstream content
- summarization drift: the claim changed during markdown or summary generation
- aggregation error: several true inputs were combined into a false conclusion
- renderer error: the final report step introduced or misformatted the claim
- prompt-induced hallucination: the LLM invented unsupported content

This classification matters because the fix depends on the failure mode.
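
If tooling wanted to record the outcome of this classification alongside each RCA finding, a minimal sketch could be an enum whose names simply mirror the categories above:

```python
from enum import Enum, auto

class FailureMode(Enum):
    """Rough RCA categories; names mirror the list above and are illustrative."""
    INPUT_FALSEHOOD = auto()               # already present upstream
    TRANSFORMATION_ERROR = auto()          # node misread or distorted upstream content
    SUMMARIZATION_DRIFT = auto()           # changed during markdown/summary generation
    AGGREGATION_ERROR = auto()             # true inputs combined into a false conclusion
    RENDERER_ERROR = auto()                # final report step introduced or misformatted it
    PROMPT_INDUCED_HALLUCINATION = auto()  # LLM invented unsupported content
```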

## Example RCA flow

Suppose the final report contains the false claim:

> The project requires 12 full-time engineers.

A practical investigation would look like this:

1. search `030-report.html` for the claim
2. inspect the report node inputs
3. search `025-2-executive_summary.md`
4. search `024-2-review_plan.md`
5. search `013-team.md`
6. if the claim appears in `013-team.md`, inspect the `team_markdown` node
7. inspect that node's inputs:
   - `011-2-enrich_team_members_environment_info.json`
   - `012-review_team_raw.json`
8. search those artifacts for the same claim or the numeric value
9. continue upstream until the earliest occurrence is found
10. inspect the producing node's `source_files`

This gives a clear investigation trail from report output back to likely code.
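
Using the hypothetical `trace_claim` sketch from earlier, this manual flow could be mechanized roughly as follows; the `run/` directory layout and the `dag.json` filename are assumptions for illustration.

```python
import json
from pathlib import Path

# Load the DAG export and backtrack from the final report to the
# earliest artifact that carries the suspect numeric claim.
dag = json.loads(Path("run/dag.json").read_text())
earliest, source_files = trace_claim(
    dag,
    run_dir="run",
    artifact_path="030-report.html",
    claim="12 full-time engineers",
)
print("Earliest occurrence:", earliest)
print("Inspect first:", source_files)
```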

## What the current format is sufficient for

The current format is sufficient for:

- artifact-chain investigation
- identifying likely upstream culprit nodes
- narrowing debugging scope
- inspecting transformation paths
- connecting output problems to relevant code files

That is already very useful.

## What the current format is not sufficient for

The current format is not sufficient for:

- proving which exact sentence transformation introduced a false claim
- attributing a sentence to a specific prompt span
- reconstructing exact runtime prompt context
- distinguishing between listed inputs and actually attended inputs
- auditing LLM behavior at a fine-grained level

So the format is good for investigation, but not perfect for forensic proof.

## Recommended improvements

### 1. Give artifacts stable IDs and metadata

Example:

```json
{
  "id": "review_plan_markdown",
  "path": "024-2-review_plan.md",
  "format": "md",
  "role": "review_output"
}
```

### 2. Add optional purpose information to inputs

Example:

```json
{
  "from_node": "review_plan",
  "artifact_path": "024-2-review_plan.md",
  "used_for": "quality review section"
}
```

### 3. Add node kind metadata

Examples:

- generator
- validator
- formatter
- consolidator
- report_assembler
- diagnostic

This helps distinguish between nodes that are likely to introduce content and those that mostly reformat it.

### 4. Add runtime provenance logs outside the DAG schema

For example:

- run id
- input artifact hashes
- output artifact hashes
- prompt inputs used
- source files loaded into prompt
- model name
- prompt template version
- temperature

This is likely more important than making the static DAG infinitely rich.
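
A minimal sketch of what emitting such a record could look like, assuming one JSON line per node execution; the field names and the `log_node_run` helper are illustrative, not an existing API:

```python
import hashlib
import json
import time
from pathlib import Path

def sha256_of(path):
    """Content hash of an artifact file, making records tamper-evident."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def log_node_run(log_path, run_id, node_id, input_paths, output_paths, model_cfg):
    """Append one runtime provenance record per node execution as a JSON line."""
    record = {
        "run_id": run_id,
        "node_id": node_id,
        "timestamp": time.time(),
        "input_hashes": {p: sha256_of(p) for p in input_paths},
        "output_hashes": {p: sha256_of(p) for p in output_paths},
        "model": model_cfg,  # e.g. name, temperature, prompt template version
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

Keeping this as an append-only log per run, rather than folding it into the static DAG, keeps the schema small while still answering "what actually happened" questions.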

### 5. Add claim-level citations in generated outputs

The strongest future improvement would be to make generated markdown and report outputs carry explicit source references.

For example, each section or bullet could include:

- source artifact ids
- source node ids
- source spans or source field names

That would make false-claim RCA much easier.
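
One lightweight way to carry such references without changing the rendered output is an HTML-comment trailer per bullet; this is a sketch of a possible convention, not something the pipeline does today:

```python
def bullet_with_sources(text: str, artifact_ids: list[str]) -> str:
    """Render a markdown bullet with a machine-readable citation trailer.

    The HTML comment is invisible in rendered markdown, but RCA tooling
    can parse it to map the bullet back to its source artifacts.
    """
    return f"- {text} <!-- sources: {','.join(artifact_ids)} -->"

# bullet_with_sources("The project requires 12 engineers.", ["team_markdown"])
# -> "- The project requires 12 engineers. <!-- sources: team_markdown -->"
```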

## Final assessment

The JSON format has evolved into a strong artifact-level provenance graph.

That is a major improvement over a plain DAG export.

It is now good enough for practical root cause analysis in many cases, especially when the goal is to trace a false claim back to the earliest upstream artifact and the likely responsible node.

However, it is still not a full forensic provenance system.

The current format can:

- identify suspects
- trace evidence flow
- narrow the search space
- connect artifacts to code

But it still cannot fully:

- prove which exact transformation introduced a false sentence
- reconstruct the exact model context
- show claim-level attribution end to end

So the right conclusion is:

- the format is already useful and worth keeping
- moving to artifact-level inputs was the right move
- the next frontier is runtime provenance and claim-level traceability
