
Fix imports, remove dead code, and make token handling model-aware in chat_paper and chat_arxiv#302

Draft
Copilot wants to merge 2 commits into main from copilot/optimize-project-performance

Conversation


Copilot AI commented Mar 2, 2026

Several correctness and robustness issues in chat_paper.py and chat_arxiv.py caused inaccurate token counting, silent text-truncation bugs, and hardcoded context limits that broke for GPT-4 and 16k models. This PR fixes them.

Changes

  • Fix duplicate import os / multi-module import (chat_paper.py): the line import fitz, io, os redundantly re-imported os and mixed stdlib with third-party modules; split into clean, separate stdlib and third-party import blocks.

  • Remove unused images variable (both files): images = page.get_images() was assigned but never read; the subsequent for … in page.get_images() loop called the method again directly. Dead assignment removed.

  • Model-aware max_token_num (both files): Replaced the hardcoded 4096 with a lookup dict so GPT-3.5-turbo-16k, GPT-4, GPT-4-32k, and GPT-4-turbo users get the correct context window. Unknown models fall back to 4096.

```python
model_max_tokens = {
    'gpt-3.5-turbo': 4096,
    'gpt-3.5-turbo-16k': 16384,
    'gpt-4': 8192,
    'gpt-4-32k': 32768,
    'gpt-4-turbo': 128000,
}
self.max_token_num = model_max_tokens.get(self.chatgpt_model, 4096)
```
  • Model-aware tiktoken encoding (both files): tiktoken.get_encoding("gpt2") gives wrong token counts for GPT-3.5/4 (which use cl100k_base). Replaced with tiktoken.encoding_for_model(self.chatgpt_model) with a cl100k_base fallback for unknown models.

  • Bounds-check clip_text_index (both files, all three chat methods): The index calculation could yield 0 or negative when prompt token budget ≥ text length, producing an empty or inverted slice. Wrapped with max(1, …).
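A minimal sketch of the guarded slice; the function name and parameters are illustrative, not the actual method signature:

```python
def clip_text(text, text_token_count, prompt_token_count, max_token_num):
    """Clip `text` so that prompt + clipped text fit in max_token_num tokens.
    max(1, ...) prevents an empty or inverted slice when the prompt alone
    meets or exceeds the token budget."""
    budget = max_token_num - prompt_token_count
    clip_text_index = max(1, int(len(text) * budget / text_token_count))
    return text[:clip_text_index]
```

Without the guard, a zero or negative index would produce an empty string or silently slice from the end of the text instead of truncating it.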

  • Read model from config in chat_arxiv.py: The model was hardcoded to "gpt-3.5-turbo" in all three openai.ChatCompletion.create calls. Now reads CHATGPT_MODEL from apikey.ini via self.chatgpt_model, consistent with chat_paper.py.
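Reading the model from apikey.ini might look like the sketch below; the section name "OpenAI" and the gpt-3.5-turbo fallback are assumptions for illustration, not confirmed from the repo:

```python
import configparser

def load_chatgpt_model(ini_path="apikey.ini"):
    """Read CHATGPT_MODEL from the config file, falling back to
    gpt-3.5-turbo when the section or key is missing (assumed default)."""
    config = configparser.ConfigParser()
    config.read(ini_path)  # silently yields an empty config if the file is absent
    return config.get("OpenAI", "CHATGPT_MODEL", fallback="gpt-3.5-turbo")
```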


Co-authored-by: kaixindelele <28528386+kaixindelele@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Optimize project for better performance" to "Fix imports, remove dead code, and make token handling model-aware in chat_paper and chat_arxiv" on Mar 2, 2026