Add log mechanism to prioritise GC #26

Open: bharathappali wants to merge 3 commits into causaai:poc from bharathappali:add-gc-pause

@bharathappali bharathappali commented Mar 16, 2026

This PR adds a mechanism that builds a log summary of GC-related log lines so that the LLM can interpret them more easily.

Summary by Sourcery

Prioritize GC-related information in log summaries by parsing JVM GC logs into structured events and statistics, then boosting their representation in the optimized log output.

New Features:

  • Introduce GC log parsing to extract structured Young and Full GC events from raw JVM logs.
  • Generate aggregated GC statistics reports, including counts, reasons, pause-time, and heap-reclamation metrics for different GC types.

Enhancements:

  • Adjust log weighting to give higher priority to GC summaries and representative GC log lines in the log optimizer.

Signed-off-by: bharathappali <abharath@redhat.com>

sourcery-ai Bot commented Mar 16, 2026

Reviewer's Guide

Adds a GC-focused log summarization mechanism by parsing GC log lines into structured events, computing per-type statistics, and injecting high-weight GC summaries and representative lines into the existing LogOptimizer weighting pipeline.

Sequence diagram for GC-aware log summarization in LogOptimizer

sequenceDiagram
    participant LO as LogOptimizer
    participant GP as GcParser
    participant GE as GcEvent
    participant GS as GcStats

    LO->>GP: parseEvents(rawLines)
    GP-->>LO: List<GcEvent> events

    alt events not empty
        LO->>GS: create young GcStats
        LO->>GS: create full GcStats

        loop for each GcEvent e in events
            LO->>GS: young.add(e) when e.type == Young
            LO->>GS: full.add(e) when e.type == Full
        end

        LO->>GP: buildStatsReport(YOUNG_GC, young, false)
        GP-->>LO: String youngGcReport

        LO->>GP: buildStatsReport(FULL_GC, full, false)
        GP-->>LO: String fullGcReport

        LO->>GP: selectDistributedLines(events, MAX_REPRESENTATIVE_LINES)
        GP-->>LO: List<String> GC_REP_LINES

        LO->>LO: lineWeights.merge(youngGcReport, 9)
        LO->>LO: lineWeights.merge(fullGcReport, 9)
        loop for each line in GC_REP_LINES
            LO->>LO: lineWeights.merge(line, 1)
        end
    end

Class diagram for new GC log summarization components

classDiagram
    class LogOptimizer {
        - int MAX_REPRESENTATIVE_LINES
        - LogSummary buildLogSummary(List~String~ rawLines, List~String~ deduped)
    }

    class GcParser {
        - Pattern GC_PATTERN
        + List~GcEvent~ parseEvents(List~String~ lines)
        - double percentile(List~Double~ sorted, int p)
        + String buildStatsReport(String label, GcStats gcStats, boolean printOutput)
        + List~String~ selectDistributedLines(List~GcEvent~ events, int maxLines)
    }

    class GcEvent {
        + double time
        + String type
        + String reason
        + double pause
        + double heapReduction
        + double reclaimRatio
        + String rawLine
        + GcEvent(double time, String type, String reason, double pause, double heapReduction, double reclaimRatio, String rawLine)
    }

    class GcStats {
        + List~Double~ pauses
        + List~Double~ heapReductions
        + List~Double~ reclaimRatios
        + List~GcEvent~ events
        + Map~String, Integer~ reasonCounts
        + void add(GcEvent e)
    }

    LogOptimizer ..> GcParser : uses
    GcParser ..> GcEvent : creates
    GcParser ..> GcStats : builds reports from
    GcStats o--> GcEvent : aggregates
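
The aggregation step in the two diagrams can be sketched roughly as follows. This is a minimal illustration rather than the PR's code: the field names come from the class diagram, the type strings "Young" and "Full" come from the sequence diagram, and the class names (with a `Sketch` suffix), the `splitByType` helper, and the way `reasonCounts` is updated are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Field names follow the class diagram; immutability here is an assumption.
final class GcEventSketch {
    final double time;
    final String type;
    final String reason;
    final double pause;
    final double heapReduction;
    final double reclaimRatio;
    final String rawLine;

    GcEventSketch(double time, String type, String reason, double pause,
                  double heapReduction, double reclaimRatio, String rawLine) {
        this.time = time;
        this.type = type;
        this.reason = reason;
        this.pause = pause;
        this.heapReduction = heapReduction;
        this.reclaimRatio = reclaimRatio;
        this.rawLine = rawLine;
    }
}

// Accumulates per-event metrics and counts how often each trigger reason occurs.
final class GcStatsSketch {
    final List<Double> pauses = new ArrayList<>();
    final List<Double> heapReductions = new ArrayList<>();
    final List<Double> reclaimRatios = new ArrayList<>();
    final List<GcEventSketch> events = new ArrayList<>();
    final Map<String, Integer> reasonCounts = new HashMap<>();

    void add(GcEventSketch e) {
        pauses.add(e.pause);
        heapReductions.add(e.heapReduction);
        reclaimRatios.add(e.reclaimRatio);
        events.add(e);
        reasonCounts.merge(e.reason, 1, Integer::sum);
    }
}

final class GcAggregationSketch {
    // Hypothetical helper: returns {young, full}, mirroring the loop in the sequence diagram.
    static GcStatsSketch[] splitByType(List<GcEventSketch> events) {
        GcStatsSketch young = new GcStatsSketch();
        GcStatsSketch full = new GcStatsSketch();
        for (GcEventSketch e : events) {
            if ("Young".equals(e.type)) {
                young.add(e);
            } else if ("Full".equals(e.type)) {
                full.add(e);
            }
        }
        return new GcStatsSketch[] { young, full };
    }
}
```

Events whose type is neither "Young" nor "Full" are simply ignored in this sketch; whether the PR does the same is not visible from the diagrams.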

File-Level Changes

Change: Integrate GC parsing and prioritization into existing log summary weighting.
Details:
  • Update base weighting comment to indicate GC lines are treated with high priority alongside ERROR
  • Parse raw log lines into GC events and split them into Young and Full categories
  • Generate human-readable GC statistics summaries for Young and Full GC and assign them elevated weights in the summary
  • Select a time-distributed subset of raw GC log lines and add them as low-weight representative lines
Files:
  • src/main/java/com/causa/rca/utils/LogOptimizer.java
  • src/main/java/com/causa/rca/utils/GcParser.java
  • src/main/java/com/causa/rca/model/artifact/jvm/gc/GcEvent.java
  • src/main/java/com/causa/rca/model/artifact/jvm/gc/GcStats.java

Change: Introduce a GC parsing utility and data model to extract and aggregate GC metrics from JVM logs.
Details:
  • Add GcParser with regex-based extraction of timestamp, type, reason, heap sizes, and pause time from GC log lines
  • Compute derived GC metrics such as heap reduction, reclaim ratio, and percentile statistics for pauses and reclaimed heap
  • Provide a textual GC statistics report builder with optional console printing and support for selecting evenly distributed GC raw lines
  • Define GcEvent to hold parsed GC data and GcStats to accumulate per-event metrics and trigger reason counts
Files:
  • src/main/java/com/causa/rca/utils/GcParser.java
  • src/main/java/com/causa/rca/model/artifact/jvm/gc/GcEvent.java
  • src/main/java/com/causa/rca/model/artifact/jvm/gc/GcStats.java
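
To make the regex-based extraction and the percentile statistics listed above more concrete, here is a hedged sketch. The PR's actual `GC_PATTERN` is not shown on this page, so the pattern below is an assumption written for JDK unified-logging pause lines such as `[12.345s][info][gc] GC(7) Pause Young (Normal) (G1 Evacuation Pause) 120M->40M(256M) 8.500ms`. The six capture groups are ordered to match the `m.group(2)` to `m.group(6)` calls quoted in the review. `ParsedGc`, `GcLineSketch`, and the nearest-rank percentile method are illustrative choices, not the PR's implementation.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder for the six captured values.
final class ParsedGc {
    final double time;
    final String type;
    final String reason;
    final double heapBefore;
    final double heapAfter;
    final double pause;

    ParsedGc(double time, String type, String reason,
             double heapBefore, double heapAfter, double pause) {
        this.time = time;
        this.type = type;
        this.reason = reason;
        this.heapBefore = heapBefore;
        this.heapAfter = heapAfter;
        this.pause = pause;
    }
}

final class GcLineSketch {
    // Assumed pattern, not the PR's GC_PATTERN.
    // Groups: 1 = uptime (s), 2 = type, 3 = reason, 4 = heap before (M),
    //         5 = heap after (M), 6 = pause (ms).
    static final Pattern GC_PATTERN = Pattern.compile(
        "\\[(\\d+(?:\\.\\d+)?)s\\].*?Pause (Young|Full)"
        + "(?: \\((?:Normal|Concurrent Start|Prepare Mixed|Mixed)\\))?"
        + " \\((.+?)\\) (\\d+)M->(\\d+)M\\(\\d+M\\) (\\d+(?:\\.\\d+)?)ms");

    // Returns the parsed values, or empty when the line is not a GC pause line.
    static Optional<ParsedGc> parse(String line) {
        Matcher m = GC_PATTERN.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new ParsedGc(
            Double.parseDouble(m.group(1)), m.group(2), m.group(3),
            Double.parseDouble(m.group(4)), Double.parseDouble(m.group(5)),
            Double.parseDouble(m.group(6))));
    }

    // Nearest-rank percentile over an ascending list; p is expected in [0, 100].
    static double percentile(List<Double> sorted, int p) {
        if (sorted.isEmpty()) {
            return 0.0;
        }
        int rank = (p * sorted.size() + 99) / 100; // integer ceil(p * n / 100)
        int index = Math.min(sorted.size() - 1, Math.max(0, rank - 1));
        return sorted.get(index);
    }
}
```

Lines that do not match (other log levels, concurrent-phase messages, heap sizes in units other than M) yield an empty `Optional` here, so a real parser would likely need a broader pattern.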



@sourcery-ai sourcery-ai Bot left a comment


Hey - I've found 2 issues, and left some high level feedback:

  • In LogOptimizer.buildLogSummary, you’re adding the whole GC stats report strings as single map keys in lineWeights; consider splitting them into individual lines before weighting so they integrate consistently with the existing per-line weighting logic.
  • The GcEvent and GcStats classes expose most fields as package-private and mutable; consider making fields private (and preferably final where possible) with accessors to avoid accidental external mutation and to clarify the intended API surface.
  • The buildStatsReport method mixes pure computation with optional System.out.println side effects via the printOutput flag; consider separating formatting from output (e.g., return-only here and let callers decide where to print) to keep the parser utility side-effect free.
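
As an illustration of the first suggestion, a multi-line report could be split into lines before it is merged into the weight map. This sketch assumes `lineWeights` is a `Map<String, Integer>` whose weights are combined by summing; the page only shows `lineWeights.merge(report, 9)` in shorthand, so the remapping function, the class name, and the `addReport` helper are hypothetical.

```java
import java.util.Map;

final class ReportWeightingSketch {
    // Adds each non-blank line of the report as its own key with the given weight.
    static void addReport(Map<String, Integer> lineWeights, String report, int weight) {
        for (String line : report.split("\\R")) { // \R matches any line break
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Assumption: repeated lines accumulate weight by summing.
            lineWeights.merge(trimmed, weight, Integer::sum);
        }
    }
}
```

With this shape, a call such as `addReport(lineWeights, youngGcReport, 9)` would give every report line the same elevated weight that the sequence diagram assigns to the whole report string.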
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="src/main/java/com/causa/rca/utils/GcParser.java" line_range="28-33" />
<code_context>
+            String type = m.group(2);
+            String reason = m.group(3);
+
+            double heapBefore = Double.parseDouble(m.group(4));
+            double heapAfter = Double.parseDouble(m.group(5));
+            double pause = Double.parseDouble(m.group(6));
+
+            double reduction = heapBefore - heapAfter;
+            double ratio = (reduction / heapBefore) * 100;
+
+            events.add(
</code_context>
<issue_to_address>
**issue (bug_risk):** Guard against division by zero when computing reclaim ratio if heapBefore is 0.

If `heapBefore` can be 0 in these logs, `(reduction / heapBefore)` will result in division-by-zero (Infinity/NaN or an exception). Please handle the `heapBefore == 0` case explicitly (e.g., skip the event, clamp the ratio, or set it to 0 with a brief comment).
</issue_to_address>
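
One possible shape for the guard requested in Comment 1 is sketched below. Because the operands in the quoted snippet are `double`, Java would produce `Infinity` or `NaN` here rather than throw, which is easy to miss in later statistics. The class and method names are hypothetical, and returning 0 for an empty heap is only one of the options the comment lists.

```java
final class ReclaimRatioSketch {
    // Percentage of the pre-GC heap that was reclaimed.
    // Returns 0 when heapBefore is not positive, to avoid Infinity or NaN.
    static double reclaimRatio(double heapBefore, double heapAfter) {
        if (heapBefore <= 0) {
            return 0.0; // nothing to reclaim from an empty heap
        }
        double reduction = heapBefore - heapAfter;
        return (reduction / heapBefore) * 100;
    }
}
```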

### Comment 2
<location path="src/main/java/com/causa/rca/utils/GcParser.java" line_range="175-181" />
<code_context>
+        return result;
+    }
+
+    public static List<String> selectDistributedLines(List<GcEvent> events,
+                                                       int maxLines) {
+        if (events.isEmpty())
+            return List.of();
+
+        events.sort(Comparator.comparingDouble(e -> e.time));
+        int step = Math.max(1, events.size()/maxLines);
+        List<String> result = new ArrayList<>();
+        for (int i=0;i<events.size();i+=step) {
</code_context>
<issue_to_address>
**issue (bug_risk):** Handle maxLines == 0 to avoid division-by-zero in selectDistributedLines.

`events.size() / maxLines` will throw an `ArithmeticException` when `maxLines` is 0. Since this is a public utility, please guard against `maxLines <= 0` (e.g., return an empty list early) instead of assuming callers always pass a positive value.
</issue_to_address>


Comment thread src/main/java/com/causa/rca/utils/GcParser.java
Comment on lines +175 to +181
    public static List<String> selectDistributedLines(List<GcEvent> events,
                                                       int maxLines) {
        if (events.isEmpty())
            return List.of();

        events.sort(Comparator.comparingDouble(e -> e.time));
        int step = Math.max(1, events.size()/maxLines);

issue (bug_risk): Handle maxLines == 0 to avoid division-by-zero in selectDistributedLines.

events.size() / maxLines will throw an ArithmeticException when maxLines is 0. Since this is a public utility, please guard against maxLines <= 0 (e.g., return an empty list early) instead of assuming callers always pass a positive value.
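
A guarded variant along the lines of this comment might look like the sketch below. It is not the PR's fix: `TimedLine` stands in for `GcEvent` (only `time` and `rawLine` are needed), and two further choices go beyond what the comment asks for. The input list is copied before sorting so the caller's list is not reordered, and the result is capped at `maxLines`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for GcEvent with just the fields this method reads.
final class TimedLine {
    final double time;
    final String rawLine;

    TimedLine(double time, String rawLine) {
        this.time = time;
        this.rawLine = rawLine;
    }
}

final class DistributedLinesSketch {
    static List<String> selectDistributedLines(List<TimedLine> events, int maxLines) {
        // Guard: an empty input or a non-positive limit yields no lines.
        if (events == null || events.isEmpty() || maxLines <= 0) {
            return new ArrayList<>();
        }
        // Sort a copy so the caller's list keeps its order.
        List<TimedLine> sorted = new ArrayList<>(events);
        sorted.sort(Comparator.comparingDouble(e -> e.time));

        int step = Math.max(1, sorted.size() / maxLines);
        List<String> result = new ArrayList<>();
        // Cap at maxLines: integer division can otherwise yield extra picks.
        for (int i = 0; i < sorted.size() && result.size() < maxLines; i += step) {
            result.add(sorted.get(i).rawLine);
        }
        return result;
    }
}
```

For example, with 10 events and `maxLines = 3` the step is 3, so indices 0, 3, and 6 of the time-sorted list are selected; without the cap, index 9 would be added as a fourth line.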

Signed-off-by: bharathappali <abharath@redhat.com>
Signed-off-by: bharathappali <abharath@redhat.com>