Conversation

colinmoynes commented Jan 27, 2026

📄 Summary

Briefly describe the purpose of this pull request. What problem does it solve, or what feature does it add?

🔍 Related Issues

Link to any related GitHub issues (e.g., Fixes #12, Closes #34):

🧪 Type of Change

Please check the type tag that applies to this PR's title:

  • [FIX] Bug fix
  • [NEW] New feature
  • [REFACTOR] Internal changes, such as code restructuring or optimization, that do not alter functionality
  • [DOC] Documentation-only changes
  • [CHORE] Maintenance, cleanup, or CI configuration

🧪 How Has This Been Tested?

Describe how you tested your changes. Include CI runs, local tests, manual verification, or screenshots if applicable.

📸 Screenshots (if applicable)

If UI or logs are affected, include before/after screenshots or output.

✅ Checklist

  • I’ve read and followed the CONTRIBUTING.md.
  • I’ve added or updated documentation as needed.
  • I’ve verified the change is tested and works as intended.
  • CI/CD checks pass and do not break existing functionality.
  • My code follows the style guidelines of this project.

colinmoynes self-assigned this Jan 27, 2026
Copilot AI review requested due to automatic review settings January 27, 2026 14:57
colinmoynes merged commit dfb8324 into main Jan 27, 2026
5 checks passed

Copilot AI left a comment

Pull request overview

This pull request adds pagination support to fix incomplete data retrieval when querying the Cloudsmith API. The changes enable fetching all pages of results instead of just the first page, and add support for client-side filtering when API-level filtering is insufficient.

Changes:

  • Added pagination logic to handle Link and X-Pagination headers from the Cloudsmith API
  • Implemented client-side filtering for download criteria when the API doesn't support aggregated stats
  • Increased the page size from the default to 100 for better efficiency (illustrated below)
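
As a rough illustration of the larger page size described above, the first-page request might be built like this (the org and repo names are placeholders, not values from this PR):

from urllib.parse import urlencode

# Hypothetical first-page URL mirroring the PR's page_size bump;
# 'my-org' and 'my-repo' are illustrative placeholders.
params = urlencode({'query': 'format:docker', 'page_size': 100})
url = f"https://api.cloudsmith.io/v1/packages/my-org/my-repo/?{params}"
print(url)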

Comment on lines +1124 to +1129
scheme, netloc, path, params, query, fragment = urlparse(start_url)
query_params = parse_qs(query)
query_params['page'] = [str(current_page + 1)]

new_query = urlencode(query_params, doseq=True)
next_url = urlunparse((scheme, netloc, path, params, new_query, fragment))

Copilot AI Jan 27, 2026

Using parse_qs followed by urlencode with doseq=True may not preserve the original query parameter order or format. If parse_qs returns list values and some parameters appeared multiple times in the original URL, this could alter the query string structure. Consider using urllib.parse.parse_qsl and urlencode without doseq if you want to preserve the original structure more faithfully, or verify that the API accepts query parameters in any order with list-style encoding.
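
A minimal sketch of that alternative, assuming the goal is to bump the page number while keeping the original pair order (parse_qsl yields (key, value) pairs, so repeated keys survive as-is):

from urllib.parse import parse_qsl, urlencode

query = "page_size=100&tag=stable&tag=latest"
# Drop any existing page parameter, then append the next page number.
pairs = [(k, v) for k, v in parse_qsl(query) if k != 'page']
pairs.append(('page', '2'))
print(urlencode(pairs))  # page_size=100&tag=stable&tag=latest&page=2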

Comment on lines +610 to +611
if isinstance(data, list):
    results.extend(data)

Copilot AI Jan 27, 2026

When the response is not a list (line 610) or not a dict with 'repositories' (line 1096), the function silently skips adding results but continues pagination. This could mask API response format changes or errors. Consider adding a warning log when the response format is unexpected to aid debugging.
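
One possible shape for that warning, assuming a module-level logger (the helper name collect_page and the logger are illustrative, not from the diff):

import logging

logger = logging.getLogger(__name__)

def collect_page(data, results, page_url):
    # Extend results for list responses; warn on anything unexpected
    # instead of silently skipping the page.
    if isinstance(data, list):
        results.extend(data)
    else:
        logger.warning("Unexpected response type %s from %s; skipping page",
                       type(data).__name__, page_url)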

Comment on lines +1130 to +1131
except ValueError:
    pass

Copilot AI Jan 27, 2026

When parse_qs or int() conversion fails in the X-Pagination header parsing (lines 1118-1131), the exception is silently caught with except ValueError: pass. This could hide legitimate parsing errors. Consider logging the error at debug level to help diagnose pagination issues in production.
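
A hedged sketch of that logging, with the parsing wrapped in a small helper (parse_next_page and the logger are illustrative names, not from the diff):

import logging
from urllib.parse import parse_qs, urlparse

logger = logging.getLogger(__name__)

def parse_next_page(url):
    # Return the page number from the URL's query string, logging parse
    # failures at debug level rather than swallowing them.
    try:
        page_values = parse_qs(urlparse(url).query).get('page', ['1'])
        return int(page_values[0])
    except ValueError as exc:
        logger.debug("Could not parse page number from %s: %s", url, exc)
        return None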

def fetch_paginated_results(start_url, is_catalog=False):
    results = []
    next_url = start_url
    while next_url:

Copilot AI Jan 27, 2026

The pagination loop lacks a safety limit on the number of pages fetched. If the API returns a circular Link header or if there's a bug in the next URL construction, this could result in an infinite loop. Consider adding a maximum page limit (e.g., 1000 pages) or a safety counter to prevent runaway pagination.

def fetch_all_pages(start_url):
    results = []
    next_url = start_url
    while next_url:

Copilot AI Jan 27, 2026

The pagination loop lacks a safety limit on the number of pages fetched. If the API returns a circular Link header, this could result in an infinite loop. Consider adding a maximum page limit or detecting if the same URL is visited twice.

Comment on lines +753 to +778
# Helper to handle pagination
def fetch_all_pages(start_url):
    results = []
    next_url = start_url
    while next_url:
        resp = make_request(next_url, {"Cache-Control": "no-cache"}, return_headers=True)
        if not resp:
            break

        data, headers = resp
        if isinstance(data, list):
            results.extend(data)

        # Check Link header for next page
        # Format: <url>; rel="next", <url>; rel="last"
        next_url = None
        link_header = headers.get('Link')
        if link_header:
            for link in link_header.split(','):
                parts = link.split(';')
                if len(parts) >= 2 and 'rel="next"' in parts[1]:
                    # Extract URL from <url>
                    next_url = parts[0].strip().strip('<>')
                    break
    return results

Copilot AI Jan 27, 2026

This is a duplicate of the fetch_all_pages function defined at lines 601-624. The pagination logic should be extracted to a single module-level helper function to avoid code duplication and ensure consistent behavior across all API calls.

Suggested change
# Helper to handle pagination
def fetch_all_pages(start_url):
    results = []
    next_url = start_url
    while next_url:
        resp = make_request(next_url, {"Cache-Control": "no-cache"}, return_headers=True)
        if not resp:
            break
        data, headers = resp
        if isinstance(data, list):
            results.extend(data)
        # Check Link header for next page
        # Format: <url>; rel="next", <url>; rel="last"
        next_url = None
        link_header = headers.get('Link')
        if link_header:
            for link in link_header.split(','):
                parts = link.split(';')
                if len(parts) >= 2 and 'rel="next"' in parts[1]:
                    # Extract URL from <url>
                    next_url = parts[0].strip().strip('<>')
                    break
    return results

Comment on lines +1160 to +1167
stripped, criteria = parse_filter_criteria(args.filter)
if criteria:
    if args.output == 'table':
        console.print("[yellow]No direct matches. Expanding search to check aggregated download stats...[/yellow]")
    sq_stripped = f"{stripped} AND format:docker" if stripped else "format:docker"
    fq = urlencode({'query': sq_stripped, 'page_size': 100})
    su = f"https://api.cloudsmith.io/v1/packages/{args.org}/{args.repo}/?{fq}"
    found_packages = fetch_paginated_results(su)

Copilot AI Jan 27, 2026

Inconsistent indentation detected. This line uses an extra space at the beginning. All other lines in this block use consistent indentation, but this line appears to have an additional space character.

Suggested change
stripped, criteria = parse_filter_criteria(args.filter)
if criteria:
    if args.output == 'table':
        console.print("[yellow]No direct matches. Expanding search to check aggregated download stats...[/yellow]")
    sq_stripped = f"{stripped} AND format:docker" if stripped else "format:docker"
    fq = urlencode({'query': sq_stripped, 'page_size': 100})
    su = f"https://api.cloudsmith.io/v1/packages/{args.org}/{args.repo}/?{fq}"
    found_packages = fetch_paginated_results(su)
stripped, criteria = parse_filter_criteria(args.filter)
if criteria:
    if args.output == 'table':
        console.print("[yellow]No direct matches. Expanding search to check aggregated download stats...[/yellow]")
    sq_stripped = f"{stripped} AND format:docker" if stripped else "format:docker"
    fq = urlencode({'query': sq_stripped, 'page_size': 100})
    su = f"https://api.cloudsmith.io/v1/packages/{args.org}/{args.repo}/?{fq}"
    found_packages = fetch_paginated_results(su)

Comment on lines +618 to +622
for link in link_header.split(','):
    parts = link.split(';')
    if len(parts) >= 2 and 'rel="next"' in parts[1]:
        # Extract URL from <url>
        next_url = parts[0].strip().strip('<>')

Copilot AI Jan 27, 2026

The Link header parsing is fragile and may fail if the header format varies slightly. For example, if there are spaces around the angle brackets or the semicolon, the parsing could break. Consider using a more robust parsing approach, such as splitting on commas first, then using regex to extract the URL and rel value, or using a library designed for parsing Link headers.

Suggested change
for link in link_header.split(','):
    parts = link.split(';')
    if len(parts) >= 2 and 'rel="next"' in parts[1]:
        # Extract URL from <url>
        next_url = parts[0].strip().strip('<>')
# Split on commas to get individual link-values
for link in link_header.split(','):
    # Robustly extract URL and rel using regex to tolerate whitespace
    match = re.search(r'\s*<\s*([^>]+?)\s*>\s*;\s*rel\s*=\s*"(.*?)"', link)
    if not match:
        continue
    url_part, rel_value = match.groups()
    if rel_value == "next":
        next_url = url_part

Comment on lines +757 to +765
while next_url:
    resp = make_request(next_url, {"Cache-Control": "no-cache"}, return_headers=True)
    if not resp:
        break

    data, headers = resp
    if isinstance(data, list):
        results.extend(data)

Copilot AI Jan 27, 2026

The pagination loop lacks a safety limit on the number of pages fetched. If the API returns a circular Link header, this could result in an infinite loop. Consider adding a maximum page limit or detecting if the same URL is visited twice.

Suggested change
while next_url:
    resp = make_request(next_url, {"Cache-Control": "no-cache"}, return_headers=True)
    if not resp:
        break
    data, headers = resp
    if isinstance(data, list):
        results.extend(data)
visited_urls = set()
max_pages = 1000  # safety limit to avoid infinite pagination loops
page_count = 0
while next_url:
    # break if we detect a pagination cycle
    if next_url in visited_urls:
        break
    visited_urls.add(next_url)
    # break if we exceed a reasonable maximum number of pages
    if page_count >= max_pages:
        break
    page_count += 1
    resp = make_request(next_url, {"Cache-Control": "no-cache"}, return_headers=True)
    if not resp:
        break
    data, headers = resp
    if isinstance(data, list):
        results.extend(data)

if not filter_str: return "", []
criteria = []
# Matches: downloads:5, downloads>5, downloads<=10
pattern = re.compile(r'\bdownloads\s*([:><]=?)\s*(\d+)', re.IGNORECASE)

Copilot AI Jan 27, 2026

The regex pattern does not handle the '!=' operator which might be expected for 'not equals' comparisons. If this is intentional, consider documenting the supported operators. If '!=' should be supported, add it to the pattern and handle it in the matches_criteria function.
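
If '!=' should be supported, a sketch of the extended pattern and operator dispatch might look like this (matches_criteria here is a stand-in for the function mentioned above, not the PR's actual implementation):

import re

# Extended pattern: '!=' added alongside the comparison operators.
pattern = re.compile(r'\bdownloads\s*(!=|>=|<=|[:<>])\s*(\d+)', re.IGNORECASE)

OPS = {
    ':': lambda a, b: a == b,   # downloads:5 means equality
    '>': lambda a, b: a > b,
    '>=': lambda a, b: a >= b,
    '<': lambda a, b: a < b,
    '<=': lambda a, b: a <= b,
    '!=': lambda a, b: a != b,
}

def matches_criteria(downloads, op, value):
    # Dispatch on the captured operator; value arrives as a digit string.
    return OPS[op](downloads, int(value))

m = pattern.search("downloads != 10")
print(matches_criteria(5, m.group(1), m.group(2)))  # True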
