Skip to content

feat(utils): add format_file_size() and refactor print_memory_usage()#230

Open
cz-03 wants to merge 1 commit intop-e-w:masterfrom
cz-03:patch-1
Open

feat(utils): add format_file_size() and refactor print_memory_usage()#230
cz-03 wants to merge 1 commit intop-e-w:masterfrom
cz-03:patch-1

Conversation

@cz-03
Copy link
Copy Markdown

@cz-03 cz-03 commented Mar 15, 2026

Adds format_file_size(size_in_bytes) to utils.py to convert a byte count into a human-readable string across B, KB, MB, GB, and TB. Refactors print_memory_usage() to call format_file_size() instead of the hardcoded / (1024**3):.2f} GB inline expression, removing duplicated formatting logic and making memory reporting consistent with any future callers of the new helper. No new dependencies pure Python arithmetic, matching the style of the existing format_duration() utility.

Adds format_file_size(size_in_bytes) to utils.py to convert a byte count into a human-readable string across B, KB, MB, GB, and TB. Refactors print_memory_usage() to call format_file_size() instead of the hardcoded / (1024**3):.2f} GB inline expression, removing duplicated formatting logic and making memory reporting consistent with any future callers of the new helper. No new dependencies pure Python arithmetic, matching the style of the existing format_duration() utility.
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the utils module by adding a versatile function for formatting file sizes into human-readable units. This new utility is then integrated into the existing memory usage reporting, streamlining the code and promoting consistency across the codebase for displaying size-related information. Additionally, several outdated or redundant comments were removed to improve code clarity.

Highlights

  • New Utility Function: Introduced format_file_size(size_in_bytes) in utils.py to convert byte counts into human-readable strings (e.g., "512 B", "4.00 KB", "7.00 GB").
  • Code Refactoring: Refactored print_memory_usage() to utilize the new format_file_size() function, eliminating duplicated formatting logic and ensuring consistent memory reporting.
  • Code Clarity: Removed several outdated or redundant explanatory comments from various functions to improve code readability.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • src/heretic/utils.py
    • Added format_file_size function for human-readable byte conversion.
    • Updated print_memory_usage to use format_file_size.
    • Removed several explanatory comments from is_notebook, load_prompts, empty_cache, and get_readme_intro functions.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a useful format_file_size utility and refactors print_memory_usage to use it, which is a good improvement for code clarity and reuse. My review includes a suggestion to make the new function more robust and maintainable. However, the PR also contains several unrelated changes that remove comments from existing functions. This violates the repository's style guide (rule #11), which requires PRs to be focused on a single change. Please revert the comment removals to adhere to the contribution guidelines.

Comment thread src/heretic/utils.py
Comment on lines +54 to +63
if size_in_bytes < 1024:
return f"{size_in_bytes} B"

for unit in ("KB", "MB", "GB"):
size_in_bytes /= 1024
if size_in_bytes < 1024:
return f"{size_in_bytes:.2f} {unit}"

size_in_bytes /= 1024
return f"{size_in_bytes:.2f} TB"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This implementation is correct, but it could be made more robust and maintainable. The docstring mentions non-negative input, but there's no validation for it. Also, the logic can be simplified into a single loop that is more easily extensible to larger units (like Petabytes) in the future, avoiding the special handling for the TB unit.

Suggested change
if size_in_bytes < 1024:
return f"{size_in_bytes} B"
for unit in ("KB", "MB", "GB"):
size_in_bytes /= 1024
if size_in_bytes < 1024:
return f"{size_in_bytes:.2f} {unit}"
size_in_bytes /= 1024
return f"{size_in_bytes:.2f} TB"
assert size_in_bytes >= 0, "size_in_bytes cannot be negative"
if size_in_bytes < 1024:
return f"{size_in_bytes} B"
size = float(size_in_bytes)
unit = "B"
for next_unit in ("KB", "MB", "GB", "TB", "PB"):
size /= 1024
unit = next_unit
if size < 1024:
break
return f"{size:.2f} {unit}"

Comment thread src/heretic/utils.py
Comment on lines 58 to -60
@@ -55,13 +87,9 @@ def p(label: str, size_in_bytes: int):


def is_notebook() -> bool:
# Check for specific environment variables (Colab, Kaggle).
# This is necessary because when running as a subprocess (e.g. !heretic),
# get_ipython() might not be available or might not reflect the notebook environment.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This comment removal, along with others in this file, is unrelated to the main purpose of this pull request (adding format_file_size). Per the repository style guide (rule #11), unrelated changes to existing code, including comments, should be avoided in a PR. Please revert these comment removals to keep the PR focused on a single, semantic change.

References
  1. PRs must not change existing code unless the changes are directly related to the PR. This includes changes to formatting and comments. (link)

@p-e-w
Copy link
Copy Markdown
Owner

p-e-w commented Mar 16, 2026

Thanks for the PR. Please revert the unrelated changes so I can do a proper review. Comments should not be deleted, they are there for a reason.

Copy link
Copy Markdown

@JiwaniZakir JiwaniZakir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The format_file_size function reuses the size_in_bytes parameter as a float via /= inside the loop, despite the type annotation declaring it as int. This is a minor but real type annotation drift that static analysis tools will flag. The docstring states "Non-negative byte count" but there's no assertion or guard — passing a negative value silently produces output like "-0.50 KB", which could happen if a caller passes an unvalidated value from a system API.

The bulk comment removal in empty_cache() is the more significant concern: the deleted block explicitly referenced PR #17 and explained why gc.collect() must be called both before and after emptying the backend cache to avoid OOM errors. That's non-obvious behavior with a documented rationale, and stripping it leaves future contributors no way to know why the call ordering matters without digging through git history. Similarly, the comment in get_readme_intro explaining that the path is hidden to protect private information was a useful security note, not just noise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants