🧹 Janitor: Validate output format before processing URL #140

Open
lucasew wants to merge 1 commit into master from janitor/validate-format-early-15649003222395193027

@lucasew lucasew commented Feb 12, 2026

What Changed
Moved the output format validation check to the beginning of the handler function in api/index.go. Added a regression test TestHandler_InvalidFormat_NoFetch in api/index_test.go.

Why This Helps
Previously, the application would parse the URL, resolve it, and fetch the content (potentially downloading up to 2MB) even if the requested output format (e.g., ?format=invalid) was not supported. This wasted resources (bandwidth, CPU) and increased the attack surface for DoS. By validating the format early, we "fail fast" and reject invalid requests immediately.

Before

  1. Request ?url=...&format=bad
  2. Server fetches URL (slow, expensive)
  3. Server renders content
  4. Server checks format -> "invalid format" (400)

After

  1. Request ?url=...&format=bad
  2. Server checks format -> "invalid format" (400)
  3. (No fetch occurs)
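
The reordering above can be sketched as follows. This is a minimal illustration rather than the code in api/index.go: the shape of the `formatters` map, the `fetch` stand-in, the `probe` helper, and the 502 status on fetch failure are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// formatters is a hypothetical registry of supported output formats.
// The real map in api/index.go may have a different shape.
var formatters = map[string]func(content string) string{
	"text": func(content string) string { return content },
	"html": func(content string) string { return "<p>" + content + "</p>" },
}

// fetchCount records how many times the expensive step ran.
var fetchCount int

// fetch stands in for the real network download (assumption).
func fetch(url string) (string, error) {
	fetchCount++
	return "content of " + url, nil
}

func handler(w http.ResponseWriter, r *http.Request) {
	// Cheap check first: reject unknown formats before any I/O happens.
	render, ok := formatters[r.URL.Query().Get("format")]
	if !ok {
		http.Error(w, "invalid format", http.StatusBadRequest)
		return
	}
	// Only requests with a supported format reach the expensive fetch.
	content, err := fetch(r.URL.Query().Get("url"))
	if err != nil {
		http.Error(w, "fetch failed", http.StatusBadGateway)
		return
	}
	fmt.Fprint(w, render(content))
}

// probe sends one request through handler and returns the status code
// together with the total number of fetches performed so far.
func probe(target string) (int, int) {
	rec := httptest.NewRecorder()
	handler(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code, fetchCount
}

func main() {
	fmt.Println(probe("/?url=https://example.com&format=bad"))  // 400 0
	fmt.Println(probe("/?url=https://example.com&format=text")) // 200 1
}
```

The invalid request returns 400 with the fetch counter still at zero, while the valid request proceeds to the fetch as before.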

Verification

  • Run ./mise run test
  • TestHandler_InvalidFormat_NoFetch passes (confirms no network request is made).
  • All existing tests pass.
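
One way a test can confirm that no network request is made is to swap in a transport that counts and rejects every request. The sketch below shows the idea outside of `go test`; `recordingTransport`, `supported`, `serve`, and `attempt` are hypothetical names, and the package-level `httpClient` only mirrors the variable named in this PR, not its real definition.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// recordingTransport fails every request and counts how many were attempted.
type recordingTransport struct{ calls int }

func (rt *recordingTransport) RoundTrip(*http.Request) (*http.Response, error) {
	rt.calls++
	return nil, errors.New("network access is not allowed here")
}

// httpClient mirrors the package-level client from the PR (assumed shape).
var httpClient = http.DefaultClient

// supported is a hypothetical set of valid output formats.
var supported = map[string]bool{"text": true, "html": true}

// serve is a stand-in for the handler: format check first, then fetch.
func serve(format, url string) (int, error) {
	if !supported[format] {
		return http.StatusBadRequest, nil
	}
	resp, err := httpClient.Get(url)
	if err != nil {
		return http.StatusBadGateway, err
	}
	resp.Body.Close()
	return http.StatusOK, nil
}

// attempt installs the recording transport, runs serve once, restores the
// original client, and reports the status plus the number of network attempts.
func attempt(format string) (status, calls int) {
	rt := &recordingTransport{}
	old := httpClient
	httpClient = &http.Client{Transport: rt}
	defer func() { httpClient = old }()
	status, _ = serve(format, "http://example.invalid/article")
	return status, rt.calls
}

func main() {
	fmt.Println(attempt("bad"))  // 400 0
	fmt.Println(attempt("text")) // 502 1
}
```

A zero call count for the invalid format is the property the regression test asserts; the second call shows that the stub really is reached once the format is valid.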

PR created automatically by Jules for task 15649003222395193027 started by @lucasew

- Moves `formatters` lookup in `handler` to occur before URL normalization and fetching.
- Adds `TestHandler_InvalidFormat_NoFetch` to verify that invalid formats do not trigger network requests.
- Updates `.jules/janitor.md` with the new pattern.

Co-authored-by: lucasew <15693688+lucasew@users.noreply.github.com>

vercel bot commented Feb 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| articleparser | Ready | Preview, Comment | Feb 12, 2026 8:44am |

Summary of Changes

Hello @lucasew, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the efficiency and resilience of the API by implementing an "early exit" strategy for requests with invalid output formats. By validating the requested format at the very beginning of the request handling process, the system avoids unnecessary and potentially costly operations such as fetching external content, thereby conserving resources and mitigating potential denial-of-service attack vectors.

Highlights

  • Early Format Validation: The output format validation check has been moved to the beginning of the handler function in api/index.go.
  • Resource Optimization: This change prevents the application from performing resource-intensive operations (like URL parsing, resolution, and content fetching) for requests with invalid output formats, reducing bandwidth and CPU usage.
  • Regression Test: A new regression test, TestHandler_InvalidFormat_NoFetch, was added to api/index_test.go to ensure that invalid format requests fail immediately without attempting to fetch the URL.

Changelog

  • .jules/janitor.md
    • Added a new janitor task entry for validating resource-intensive inputs before starting operations.
  • api/index.go
    • Moved the output format validation check from later in the handler function to an earlier position.
    • Removed the redundant format validation check from its original location.
  • api/index_test.go
    • Imported errors and net packages.
    • Added TestHandler_InvalidFormat_NoFetch to verify that an invalid format request does not trigger an external URL fetch.

Activity

  • The pull request was automatically created by Jules for a task started by @lucasew.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively addresses a resource-wasting issue by moving the output format validation to the beginning of the request handler. The 'fail fast' approach is a great improvement. The implementation is clean, and the new regression test TestHandler_InvalidFormat_NoFetch correctly verifies that no network request is made for invalid formats. I have one suggestion regarding the new test to improve its robustness for parallel execution.

func TestHandler_InvalidFormat_NoFetch(t *testing.T) {
	// This test ensures that an invalid format fails immediately without attempting to fetch the URL.
	oldClient := httpClient
	defer func() { httpClient = oldClient }()

medium

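For improved test robustness, consider using t.Cleanup instead of defer. A function registered with t.Cleanup runs after the test and all of its subtests have completed, including parallel subtests that are still running when the test function returns, whereas a deferred call runs as soon as the function returns. Both run when the test exits through t.Fatal, so the difference matters mainly once subtests are involved.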

Additionally, be aware that modifying this global httpClient makes tests stateful and not safe to run in parallel with t.Parallel(). While this PR follows an existing pattern, for future work, protecting access to this global with a sync.Mutex would be a good improvement to allow for parallel test execution.

Suggested change
- defer func() { httpClient = oldClient }()
+ t.Cleanup(func() { httpClient = oldClient })
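
The sync.Mutex idea from the comment above could look roughly like this. It is a sketch under assumptions, not the repository's code: `clientMu`, `currentHTTPClient`, `swapHTTPClient`, and `demo` are hypothetical names, and only `httpClient` comes from the PR.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

var (
	// clientMu guards httpClient so reads and swaps do not race.
	clientMu   sync.RWMutex
	httpClient = http.DefaultClient
)

// currentHTTPClient returns the client that production code should use.
func currentHTTPClient() *http.Client {
	clientMu.RLock()
	defer clientMu.RUnlock()
	return httpClient
}

// swapHTTPClient installs c and returns a function that puts the previous
// client back. The returned function is suitable for t.Cleanup.
func swapHTTPClient(c *http.Client) (restore func()) {
	clientMu.Lock()
	old := httpClient
	httpClient = c
	clientMu.Unlock()
	return func() {
		clientMu.Lock()
		httpClient = old
		clientMu.Unlock()
	}
}

// demo swaps in a stub, checks that it is visible, then restores the default.
func demo() (swapped, restored bool) {
	stub := &http.Client{}
	restore := swapHTTPClient(stub)
	swapped = currentHTTPClient() == stub
	restore()
	restored = currentHTTPClient() == http.DefaultClient
	return swapped, restored
}

func main() {
	fmt.Println(demo()) // true true
}
```

In a test this would be used as `t.Cleanup(swapHTTPClient(stub))`. Note that the mutex only removes the data race on the variable: two parallel tests that each install their own stub would still see each other's client, so full isolation would likely need the client passed in as a dependency rather than read from a global.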
