🛠️ Refactor: Split API handler into focused files#143

Open
lucasew wants to merge 1 commit into master from
refactor/split-api-handler-2307800879716948708
Conversation

@lucasew
Owner

@lucasew lucasew commented Feb 13, 2026

Modularize api/index.go into separate files (client.go, fetch.go, render.go, request.go) to improve cohesion and maintainability.

  • Extract HTTP client and SSRF protection to api/client.go.
  • Extract article fetching logic to api/fetch.go.
  • Extract rendering and formatting logic to api/render.go.
  • Extract request validation and bot detection to api/request.go.
  • Keep api/index.go as the entry point and middleware orchestrator.
  • Update AGENTS.md to reflect the new location of userAgentPool.

This refactoring adheres to the Single Responsibility Principle and makes the codebase easier to navigate and test.


PR created automatically by Jules for task 2307800879716948708 started by @lucasew

Co-authored-by: lucasew <15693688+lucasew@users.noreply.github.com>
@google-labs-jules
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@vercel
vercel bot commented Feb 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
articleparser Error Error Feb 13, 2026 7:36am

@gemini-code-assist
Summary of Changes

Hello @lucasew, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the api handler's architecture by modularizing its core functionalities. The monolithic api/index.go file has been broken down into several smaller, more focused files, each responsible for a specific aspect like HTTP client management, content fetching, rendering, or request validation. This refactoring enhances code organization, maintainability, and adherence to the Single Responsibility Principle, making the codebase easier to understand, test, and extend.

Highlights

  • Code Modularization: The monolithic api/index.go file has been refactored and split into four new, specialized files: client.go, fetch.go, render.go, and request.go.
  • HTTP Client and SSRF Protection: Dedicated api/client.go file introduced to manage the HTTP client, including a newSafeDialer for Server-Side Request Forgery (SSRF) protection.
  • Article Fetching Logic: All logic related to fetching remote articles, User-Agent rotation, and readability parsing has been moved to api/fetch.go.
  • Content Rendering and Formatting: A new api/render.go file now handles all aspects of rendering and formatting article content into HTML, Markdown, JSON, and plain text.
  • Request Validation and Bot Detection: Request-specific functionalities such as URL normalization, validation, LLM bot detection, and output format determination are now centralized in api/request.go.
  • Documentation Update: The AGENTS.md file has been updated to correctly reference the new location of the userAgentPool.
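
The output format determination mentioned above could look roughly like the following. This is a minimal sketch, not the code in api/request.go: the function names `chooseFormat` and `looksLikeLLM`, the format keys (`html`, `md`, `json`, `text`), the User-Agent hints, and the precedence order (query parameter, then Accept header, then bot detection) are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// llmAgentHints is a hypothetical list of User-Agent substrings; the real
// llmUserAgents list is not shown on this page.
var llmAgentHints = []string{"gptbot", "claudebot"}

// looksLikeLLM reports whether the User-Agent contains one of the hints.
func looksLikeLLM(userAgent string) bool {
	ua := strings.ToLower(userAgent)
	for _, hint := range llmAgentHints {
		if strings.Contains(ua, hint) {
			return true
		}
	}
	return false
}

// chooseFormat picks an output format from already extracted request values.
// Assumed precedence: explicit query parameter, Accept header, bot detection.
func chooseFormat(queryFormat, accept, userAgent string) string {
	switch f := strings.ToLower(queryFormat); f {
	case "html", "md", "json", "text":
		return f
	}
	switch {
	case strings.Contains(accept, "application/json"):
		return "json"
	case strings.Contains(accept, "text/markdown"):
		return "md"
	case strings.Contains(accept, "text/plain"):
		return "text"
	}
	if looksLikeLLM(userAgent) {
		return "md" // assumption: LLM crawlers get Markdown by default
	}
	return "html"
}

func main() {
	fmt.Println(chooseFormat("json", "text/html", "curl/8.0"))
	fmt.Println(chooseFormat("", "text/plain", "curl/8.0"))
	fmt.Println(chooseFormat("", "*/*", "Mozilla/5.0 (compatible; GPTBot/1.0)"))
	fmt.Println(chooseFormat("", "text/html", "Mozilla/5.0"))
}
```

Taking plain strings instead of an *http.Request keeps the decision logic easy to unit test, which fits the testability goal stated in the PR description.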

Changelog

  • AGENTS.md
    • Updated the documentation to correctly reference the new location of the userAgentPool in api/fetch.go.
  • api/client.go
    • Added a new file to centralize HTTP client configuration, including SSRF protection with newSafeDialer.
    • Defined constants for HTTP client timeouts and redirect limits.
  • api/fetch.go
    • Added a new file containing the ReadabilityParser instance and the userAgentPool.
    • Implemented getRandomUserAgent for User-Agent rotation.
    • Introduced fetchAndParse function for retrieving and parsing remote article content, including header spoofing and body size limiting.
  • api/index.go
    • Removed all HTTP client, fetching, rendering, request validation, and bot detection logic.
    • Reduced imports to only bytes, context, log, net/http, and time.
    • The file now primarily serves as the entry point and orchestrator, applying security headers and calling the newly separated functions.
  • api/render.go
    • Added a new file to manage all rendering aspects.
    • Contains the HTML Template string and DefaultTemplate instance.
    • Defined formatHandler type and implemented formatHTML, formatMarkdown, formatJSON, and formatText functions.
    • Includes formatters map and writeError function for consistent error responses.
  • api/request.go
    • Added a new file dedicated to request-related logic.
    • Contains llmUserAgents and isLLM for bot detection.
    • Implemented getFormat for determining output format based on query params, Accept header, or bot detection.
    • Provided reconstructTargetURL to handle URL parsing quirks.
    • Included normalizeAndValidateURL for cleaning and validating input URLs.

Activity

  • No specific activity (comments, reviews, progress updates) was provided in the context for this pull request.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature | Command | Description
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request does a great job of refactoring the main API handler into smaller, more focused files, which significantly improves modularity and maintainability. The separation of concerns into client, fetch, render, and request files is logical and follows the Single Responsibility Principle.

I've added a couple of suggestions for improvement: one to make the SSRF protection logic in the HTTP client more direct and another to ensure the user-agent rotation works reliably across different Go versions. Overall, this is a solid improvement to the codebase.

Comment on lines +43 to +51
ips, err := net.LookupIP(host)
if err != nil {
	return err
}
for _, ip := range ips {
	if ip.IsPrivate() || ip.IsLoopback() || ip.IsLinkLocalUnicast() || ip.IsLinkLocalMulticast() || ip.IsUnspecified() {
		return errors.New("refusing to connect to private network address")
	}
}

medium

The address parameter in the Control function is already a resolved IP address string. Using net.ParseIP is more direct and efficient for this task than net.LookupIP, which is designed for DNS name resolution. This change avoids an unnecessary lookup and a loop.

// Requires "fmt" in the import list alongside "errors" and "net".
ip := net.ParseIP(host)
if ip == nil {
	return fmt.Errorf("invalid IP address: %s", host)
}
if ip.IsPrivate() || ip.IsLoopback() || ip.IsLinkLocalUnicast() || ip.IsLinkLocalMulticast() || ip.IsUnspecified() {
	return errors.New("refusing to connect to private network address")
}
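
For context, the check can be sketched as a complete dialer. This is a minimal sketch, not the contents of api/client.go: the names `checkDialAddress` and `newSafeDialerSketch` and the 10-second timeout are hypothetical, and it assumes `host` is obtained by splitting the `ip:port` string that net.Dialer passes to its Control hook.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
	"time"
)

// checkDialAddress (hypothetical helper) rejects "ip:port" addresses that
// point at private, loopback, link-local, or unspecified IPs. Anything that
// is not a literal IP is rejected as well.
func checkDialAddress(address string) error {
	host, _, err := net.SplitHostPort(address)
	if err != nil {
		return err
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return fmt.Errorf("invalid IP address: %s", host)
	}
	if ip.IsPrivate() || ip.IsLoopback() || ip.IsLinkLocalUnicast() ||
		ip.IsLinkLocalMulticast() || ip.IsUnspecified() {
		return errors.New("refusing to connect to private network address")
	}
	return nil
}

// newSafeDialerSketch (hypothetical) wires the check into net.Dialer.Control,
// which runs after name resolution and before the connection is made.
func newSafeDialerSketch() *net.Dialer {
	return &net.Dialer{
		Timeout: 10 * time.Second, // assumed value, for illustration only
		Control: func(network, address string, _ syscall.RawConn) error {
			return checkDialAddress(address)
		},
	}
}

func main() {
	addrs := []string{"127.0.0.1:80", "10.0.0.5:443", "93.184.216.34:443", "[::1]:8080"}
	for _, addr := range addrs {
		fmt.Printf("%-20s -> %v\n", addr, checkDialAddress(addr))
	}
}
```

Because the check runs on the address actually being dialed, it also covers hostnames that resolve to internal IPs, which a check on the request URL alone would miss.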

Comment on lines +47 to +49
func getRandomUserAgent() string {
	return userAgentPool[rand.Intn(len(userAgentPool))]
}

medium

On Go versions prior to 1.20, the global math/rand source is seeded with a fixed value unless rand.Seed is called. rand.Intn would then return the same sequence of values in every process, so each fresh start (for example, a cold start of a serverless function) would select user agents in the same order, weakening the purpose of the user agent pool.

To ensure user agents are properly rotated across all Go versions and to follow best practices by avoiding stateful package-level functions, it's better to use a dedicated rand.Rand instance.

Here's how you can implement this:

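// Add this at the package level in fetch.go.
// This will require importing the "sync" and "time" packages.
// A *rand.Rand created with rand.New is not safe for concurrent use,
// so a mutex guards it when requests are served concurrently.
var (
	userAgentMu   sync.Mutex
	userAgentRand = rand.New(rand.NewSource(time.Now().UnixNano()))
)

// Update getRandomUserAgent to use it.
func getRandomUserAgent() string {
	userAgentMu.Lock()
	defer userAgentMu.Unlock()
	return userAgentPool[userAgentRand.Intn(len(userAgentPool))]
}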

If you are certain the project is built with Go 1.20 or newer, the top-level rand.Intn is automatically seeded, and you can disregard this suggestion.
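
The determinism concern can be reproduced with a small program. This is a minimal sketch: `samplePool` is a hypothetical stand-in for userAgentPool, and `pickSequence` is an illustrative helper, not part of the PR. It shows that two sources created with the same seed yield the same sequence of picks, which is the situation an unseeded global source creates on Go versions before 1.20.

```go
package main

import (
	"fmt"
	"math/rand"
)

// samplePool is a hypothetical stand-in for userAgentPool.
var samplePool = []string{"agent-a", "agent-b", "agent-c", "agent-d"}

// pickSequence returns n picks from samplePool using a fresh source
// created from the given seed.
func pickSequence(seed int64, n int) []string {
	r := rand.New(rand.NewSource(seed))
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, samplePool[r.Intn(len(samplePool))])
	}
	return out
}

func main() {
	// Both lines print the same sequence because the seed is identical.
	fmt.Println(pickSequence(1, 5))
	fmt.Println(pickSequence(1, 5))
}
```

Seeding from a varying value such as the current time, or relying on the automatic seeding in Go 1.20 and later, avoids the repeated sequence.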
