Pagination fix #19

Conversation
Pull request overview
This pull request adds pagination support to fix incomplete data retrieval when querying the Cloudsmith API. The changes enable fetching all pages of results instead of just the first page, and add support for client-side filtering when API-level filtering is insufficient.
Changes:
- Added pagination logic to handle Link and X-Pagination headers from the Cloudsmith API
- Implemented client-side filtering for download criteria when the API doesn't support aggregated stats
- Increased the page size from the default to 100 for better efficiency
```python
scheme, netloc, path, params, query, fragment = urlparse(start_url)
query_params = parse_qs(query)
query_params['page'] = [str(current_page + 1)]

new_query = urlencode(query_params, doseq=True)
next_url = urlunparse((scheme, netloc, path, params, new_query, fragment))
```
Copilot AI (Jan 27, 2026)
Using parse_qs followed by urlencode with doseq=True may not preserve the original query parameter order or format. If parse_qs returns list values and some parameters appeared multiple times in the original URL, this could alter the query string structure. Consider using urllib.parse.parse_qsl and urlencode without doseq if you want to preserve the original structure more faithfully, or verify that the API accepts query parameters in any order with list-style encoding.
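For illustration, a minimal sketch of the parse_qsl round trip; the build_next_url wrapper and the drop-and-append handling of the page parameter are assumptions, not code from this PR:

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

def build_next_url(start_url, current_page):
    # build_next_url is a hypothetical name for this sketch
    scheme, netloc, path, params, query, fragment = urlparse(start_url)
    # parse_qsl returns an ordered list of (key, value) pairs, so repeated
    # keys and their relative order survive the round trip unchanged
    pairs = [(k, v) for k, v in parse_qsl(query) if k != 'page']
    pairs.append(('page', str(current_page + 1)))
    return urlunparse((scheme, netloc, path, params, urlencode(pairs), fragment))
```

Note that urlencode on a list of pairs needs no doseq flag: each pair is emitted exactly once, in order.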
```python
if isinstance(data, list):
    results.extend(data)
```
Copilot AI (Jan 27, 2026)
When the response is not a list (line 610) or not a dict with 'repositories' (line 1096), the function silently skips adding results but continues pagination. This could mask API response format changes or errors. Consider adding a warning log when the response format is unexpected to aid debugging.
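A minimal sketch of that warning, assuming a module-level logger; collect_page is a hypothetical stand-in for the inline loop body:

```python
import logging

logger = logging.getLogger(__name__)  # assumed module-level logger

def collect_page(data, results, page_url):
    # collect_page is a hypothetical helper for illustration
    if isinstance(data, list):
        results.extend(data)
    else:
        # surface unexpected response shapes instead of skipping them silently
        logger.warning("Unexpected response format from %s: got %s, expected list",
                       page_url, type(data).__name__)
```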
```python
except ValueError:
    pass
```
Copilot AI (Jan 27, 2026)
When parse_qs or int() conversion fails in the X-Pagination header parsing (lines 1118-1131), the exception is silently caught with a bare except ValueError: pass. This could hide legitimate parsing errors. Consider logging the error at debug level to help diagnose pagination issues in production.
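Sketched with a hypothetical parse helper (the function and argument names are illustrative, not from the PR):

```python
import logging

logger = logging.getLogger(__name__)

def parse_page_count(raw_value):
    # parse_page_count and raw_value are illustrative names
    try:
        return int(raw_value)
    except ValueError as exc:
        # keep the skip-on-failure behaviour, but leave a trace for debugging
        logger.debug("Could not parse X-Pagination value %r: %s", raw_value, exc)
        return None
```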
```python
def fetch_paginated_results(start_url, is_catalog=False):
    results = []
    next_url = start_url
    while next_url:
```
Copilot AI (Jan 27, 2026)
The pagination loop lacks a safety limit on the number of pages fetched. If the API returns a circular Link header or if there's a bug in the next URL construction, this could result in an infinite loop. Consider adding a maximum page limit (e.g., 1000 pages) or a safety counter to prevent runaway pagination.
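As a rough sketch of that guard (the MAX_PAGES value and the set-based cycle check are assumptions, not part of this PR):

```python
MAX_PAGES = 1000  # assumed ceiling; tune to the largest expected result set

def fetch_paginated_results(start_url, is_catalog=False):
    results = []
    next_url = start_url
    seen_urls = set()
    pages_fetched = 0
    while next_url:
        # stop on a circular Link header or runaway pagination
        if next_url in seen_urls or pages_fetched >= MAX_PAGES:
            break
        seen_urls.add(next_url)
        pages_fetched += 1
        # ...fetch the page and compute next_url exactly as the PR does...
        next_url = None  # placeholder so this sketch terminates on its own
    return results
```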
```python
def fetch_all_pages(start_url):
    results = []
    next_url = start_url
    while next_url:
```
Copilot AI (Jan 27, 2026)
The pagination loop lacks a safety limit on the number of pages fetched. If the API returns a circular Link header, this could result in an infinite loop. Consider adding a maximum page limit or detecting if the same URL is visited twice.
```python
# Helper to handle pagination
def fetch_all_pages(start_url):
    results = []
    next_url = start_url
    while next_url:
        resp = make_request(next_url, {"Cache-Control": "no-cache"}, return_headers=True)
        if not resp:
            break

        data, headers = resp
        if isinstance(data, list):
            results.extend(data)

        # Check Link header for next page
        # Format: <url>; rel="next", <url>; rel="last"
        next_url = None
        link_header = headers.get('Link')
        if link_header:
            for link in link_header.split(','):
                parts = link.split(';')
                if len(parts) >= 2 and 'rel="next"' in parts[1]:
                    # Extract URL from <url>
                    next_url = parts[0].strip().strip('<>')
                    break
    return results
```
Copilot AI (Jan 27, 2026)
This is a duplicate of the fetch_all_pages function defined at lines 601-624. The pagination logic should be extracted to a single module-level helper function to avoid code duplication and ensure consistent behavior across all API calls.
Suggested change:

```diff
-# Helper to handle pagination
-def fetch_all_pages(start_url):
-    results = []
-    next_url = start_url
-    while next_url:
-        resp = make_request(next_url, {"Cache-Control": "no-cache"}, return_headers=True)
-        if not resp:
-            break
-        data, headers = resp
-        if isinstance(data, list):
-            results.extend(data)
-        # Check Link header for next page
-        # Format: <url>; rel="next", <url>; rel="last"
-        next_url = None
-        link_header = headers.get('Link')
-        if link_header:
-            for link in link_header.split(','):
-                parts = link.split(';')
-                if len(parts) >= 2 and 'rel="next"' in parts[1]:
-                    # Extract URL from <url>
-                    next_url = parts[0].strip().strip('<>')
-                    break
-    return results
```
```python
stripped, criteria = parse_filter_criteria(args.filter)
if criteria:
    if args.output == 'table':
        console.print("[yellow]No direct matches. Expanding search to check aggregated download stats...[/yellow]")
    sq_stripped = f"{stripped} AND format:docker" if stripped else "format:docker"
    fq = urlencode({'query': sq_stripped, 'page_size': 100})
    su = f"https://api.cloudsmith.io/v1/packages/{args.org}/{args.repo}/?{fq}"
    found_packages = fetch_paginated_results(su)
```
Copilot AI (Jan 27, 2026)
Inconsistent indentation detected. This line uses an extra space at the beginning. All other lines in this block use consistent indentation, but this line appears to have an additional space character.
Suggested change:

```python
stripped, criteria = parse_filter_criteria(args.filter)
if criteria:
    if args.output == 'table':
        console.print("[yellow]No direct matches. Expanding search to check aggregated download stats...[/yellow]")
    sq_stripped = f"{stripped} AND format:docker" if stripped else "format:docker"
    fq = urlencode({'query': sq_stripped, 'page_size': 100})
    su = f"https://api.cloudsmith.io/v1/packages/{args.org}/{args.repo}/?{fq}"
    found_packages = fetch_paginated_results(su)
```
```python
for link in link_header.split(','):
    parts = link.split(';')
    if len(parts) >= 2 and 'rel="next"' in parts[1]:
        # Extract URL from <url>
        next_url = parts[0].strip().strip('<>')
```
Copilot AI (Jan 27, 2026)
The Link header parsing is fragile and may fail if the header format varies slightly. For example, if there are spaces around the angle brackets or the semicolon, the parsing could break. Consider using a more robust parsing approach, such as splitting on commas first, then using regex to extract the URL and rel value, or using a library designed for parsing Link headers.
Suggested change:

```python
# Split on commas to get individual link-values
for link in link_header.split(','):
    # Robustly extract URL and rel using regex to tolerate whitespace
    match = re.search(r'\s*<\s*([^>]+?)\s*>\s*;\s*rel\s*=\s*"(.*?)"', link)
    if not match:
        continue
    url_part, rel_value = match.groups()
    if rel_value == "next":
        next_url = url_part
```
```python
while next_url:
    resp = make_request(next_url, {"Cache-Control": "no-cache"}, return_headers=True)
    if not resp:
        break

    data, headers = resp
    if isinstance(data, list):
        results.extend(data)
```
Copilot AI (Jan 27, 2026)
The pagination loop lacks a safety limit on the number of pages fetched. If the API returns a circular Link header, this could result in an infinite loop. Consider adding a maximum page limit or detecting if the same URL is visited twice.
Suggested change:

```python
visited_urls = set()
max_pages = 1000  # safety limit to avoid infinite pagination loops
page_count = 0
while next_url:
    # break if we detect a pagination cycle
    if next_url in visited_urls:
        break
    visited_urls.add(next_url)
    # break if we exceed a reasonable maximum number of pages
    if page_count >= max_pages:
        break
    page_count += 1
    resp = make_request(next_url, {"Cache-Control": "no-cache"}, return_headers=True)
    if not resp:
        break
    data, headers = resp
    if isinstance(data, list):
        results.extend(data)
```
```python
if not filter_str: return "", []
criteria = []
# Matches: downloads:5, downloads>5, downloads<=10
pattern = re.compile(r'\bdownloads\s*([:><]=?)\s*(\d+)', re.IGNORECASE)
```
Copilot AI (Jan 27, 2026)
The regex pattern does not handle the '!=' operator which might be expected for 'not equals' comparisons. If this is intentional, consider documenting the supported operators. If '!=' should be supported, add it to the pattern and handle it in the matches_criteria function.
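If '!=' should be supported, one possible extension; matches_criteria's real signature is not shown in this diff, so the (operator, value) pair format is an assumption:

```python
import re

# '!=' added alongside the existing :, >, >=, <, <= operators
pattern = re.compile(r'\bdownloads\s*(!=|[:><]=?)\s*(\d+)', re.IGNORECASE)

def matches_criteria(download_count, criteria):
    # assumes criteria is a list of (operator, value) pairs from the pattern
    ops = {
        ':': lambda a, b: a == b,
        ':=': lambda a, b: a == b,  # the pattern also admits ':='
        '>': lambda a, b: a > b,
        '>=': lambda a, b: a >= b,
        '<': lambda a, b: a < b,
        '<=': lambda a, b: a <= b,
        '!=': lambda a, b: a != b,
    }
    return all(ops[op](download_count, int(value)) for op, value in criteria)
```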
📄 Summary
Briefly describe the purpose of this pull request. What problem does it solve or feature does it add?
🔍 Related Issues
Link to any related GitHub issues (e.g., Fixes #12, Closes #34):

🧪 Type of Change
Please check the relevant type tag for this PR title:
- [FIX] Bug fix
- [NEW] New thing
- [REFACTOR] Internal changes such as code restructuring or optimization that does not alter functionality
- [DOC] Documentation-only changes
- [CHORE] Maintenance, cleanup, or CI configuration

🧪 How Has This Been Tested?
Describe how you tested your changes. Include CI runs, local tests, manual verification, or screenshots if applicable.
📸 Screenshots (if applicable)
If UI or logs are affected, include before/after screenshots or output.
✅ Checklist