Add log mechanism to proritise GC#26
Open
bharathappali wants to merge 3 commits into
Open
Conversation
Signed-off-by: bharathappali <abharath@redhat.com>
Reviewer's GuideAdds a GC-focused log summarization mechanism by parsing GC log lines into structured events, computing per-type statistics, and injecting high-weight GC summaries and representative lines into the existing LogOptimizer weighting pipeline. Sequence diagram for GC-aware log summarization in LogOptimizersequenceDiagram
participant LO as LogOptimizer
participant GP as GcParser
participant GE as GcEvent
participant GS as GcStats
LO->>GP: parseEvents(rawLines)
GP-->>LO: List<GcEvent> events
alt events not empty
LO->>GS: create young GcStats
LO->>GS: create full GcStats
loop for each GcEvent e in events
LO->>GS: young.add(e) when e.type == Young
LO->>GS: full.add(e) when e.type == Full
end
LO->>GP: buildStatsReport(YOUNG_GC, young, false)
GP-->>LO: String youngGcReport
LO->>GP: buildStatsReport(FULL_GC, full, false)
GP-->>LO: String fullGcReport
LO->>GP: selectDistributedLines(events, MAX_REPRESENTATIVE_LINES)
GP-->>LO: List<String> GC_REP_LINES
LO->>LO: lineWeights.merge(youngGcReport, 9)
LO->>LO: lineWeights.merge(fullGcReport, 9)
loop for each line in GC_REP_LINES
LO->>LO: lineWeights.merge(line, 1)
end
end
Class diagram for new GC log summarization componentsclassDiagram
class LogOptimizer {
- int MAX_REPRESENTATIVE_LINES
- LogSummary buildLogSummary(List~String~ rawLines, List~String~ deduped)
}
class GcParser {
- Pattern GC_PATTERN
+ List~GcEvent~ parseEvents(List~String~ lines)
- double percentile(List~Double~ sorted, int p)
+ String buildStatsReport(String label, GcStats gcStats, boolean printOutput)
+ List~String~ selectDistributedLines(List~GcEvent~ events, int maxLines)
}
class GcEvent {
+ double time
+ String type
+ String reason
+ double pause
+ double heapReduction
+ double reclaimRatio
+ String rawLine
+ GcEvent(double time, String type, String reason, double pause, double heapReduction, double reclaimRatio, String rawLine)
}
class GcStats {
+ List~Double~ pauses
+ List~Double~ heapReductions
+ List~Double~ reclaimRatios
+ List~GcEvent~ events
+ Map~String, Integer~ reasonCounts
+ void add(GcEvent e)
}
LogOptimizer ..> GcParser : uses
GcParser ..> GcEvent : creates
GcParser ..> GcStats : builds reports from
GcStats o--> GcEvent : aggregates
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- In
LogOptimizer.buildLogSummary, you’re adding the whole GC stats report strings as single map keys inlineWeights; consider splitting them into individual lines before weighting so they integrate consistently with the existing per-line weighting logic. - The
GcEventandGcStatsclasses expose most fields as package-private and mutable; consider making fields private (and preferably final where possible) with accessors to avoid accidental external mutation and to clarify the intended API surface. - The
buildStatsReportmethod mixes pure computation with optionalSystem.out.printlnside effects via theprintOutputflag; consider separating formatting from output (e.g., return-only here and let callers decide where to print) to keep the parser utility side-effect free.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `LogOptimizer.buildLogSummary`, you’re adding the whole GC stats report strings as single map keys in `lineWeights`; consider splitting them into individual lines before weighting so they integrate consistently with the existing per-line weighting logic.
- The `GcEvent` and `GcStats` classes expose most fields as package-private and mutable; consider making fields private (and preferably final where possible) with accessors to avoid accidental external mutation and to clarify the intended API surface.
- The `buildStatsReport` method mixes pure computation with optional `System.out.println` side effects via the `printOutput` flag; consider separating formatting from output (e.g., return-only here and let callers decide where to print) to keep the parser utility side-effect free.
## Individual Comments
### Comment 1
<location path="src/main/java/com/causa/rca/utils/GcParser.java" line_range="28-33" />
<code_context>
+ String type = m.group(2);
+ String reason = m.group(3);
+
+ double heapBefore = Double.parseDouble(m.group(4));
+ double heapAfter = Double.parseDouble(m.group(5));
+ double pause = Double.parseDouble(m.group(6));
+
+ double reduction = heapBefore - heapAfter;
+ double ratio = (reduction / heapBefore) * 100;
+
+ events.add(
</code_context>
<issue_to_address>
**issue (bug_risk):** Guard against division by zero when computing reclaim ratio if heapBefore is 0.
If `heapBefore` can be 0 in these logs, `(reduction / heapBefore)` will result in division-by-zero (Infinity/NaN or an exception). Please handle the `heapBefore == 0` case explicitly (e.g., skip the event, clamp the ratio, or set it to 0 with a brief comment).
</issue_to_address>
### Comment 2
<location path="src/main/java/com/causa/rca/utils/GcParser.java" line_range="175-181" />
<code_context>
+ return result;
+ }
+
+ public static List<String> selectDistributedLines(List<GcEvent> events,
+ int maxLines) {
+ if (events.isEmpty())
+ return List.of();
+
+ events.sort(Comparator.comparingDouble(e -> e.time));
+ int step = Math.max(1, events.size()/maxLines);
+ List<String> result = new ArrayList<>();
+ for (int i=0;i<events.size();i+=step) {
</code_context>
<issue_to_address>
**issue (bug_risk):** Handle maxLines == 0 to avoid division-by-zero in selectDistributedLines.
`events.size() / maxLines` will throw an `ArithmeticException` when `maxLines` is 0. Since this is a public utility, please guard against `maxLines <= 0` (e.g., return an empty list early) instead of assuming callers always pass a positive value.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Comment on lines
+175
to
+181
| public static List<String> selectDistributedLines(List<GcEvent> events, | ||
| int maxLines) { | ||
| if (events.isEmpty()) | ||
| return List.of(); | ||
|
|
||
| events.sort(Comparator.comparingDouble(e -> e.time)); | ||
| int step = Math.max(1, events.size()/maxLines); |
There was a problem hiding this comment.
issue (bug_risk): Handle maxLines == 0 to avoid division-by-zero in selectDistributedLines.
events.size() / maxLines will throw an ArithmeticException when maxLines is 0. Since this is a public utility, please guard against maxLines <= 0 (e.g., return an empty list early) instead of assuming callers always pass a positive value.
Signed-off-by: bharathappali <abharath@redhat.com>
Signed-off-by: bharathappali <abharath@redhat.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR adds the mechanism to build the LOG summary for GC related logs for LLM's better understanding
Summary by Sourcery
Prioritize GC-related information in log summaries by parsing JVM GC logs into structured events and statistics, then boosting their representation in the optimized log output.
New Features:
Enhancements: