Fix imports, remove dead code, and make token handling model-aware in chat_paper and chat_arxiv #302
Several correctness and robustness issues in `chat_paper.py` and `chat_arxiv.py` caused inaccurate token counting, silent text-truncation bugs, and hardcoded context limits that broke GPT-4 and 16k models.

### Changes
**Fix duplicate `import os` / multi-module import (`chat_paper.py`).** `import fitz, io, os` duplicated the earlier `import os`; the imports were split into clean, separate stdlib and third-party blocks.
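A minimal sketch of the corrected layout; the exact set of modules beyond `fitz`, `io`, and `os` is assumed:

```python
# Standard library imports, one per line.
import io
import os

# Third-party imports, kept in their own block.
import fitz  # PyMuPDF
```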
**Remove unused `images` variable (both files).** `images = page.get_images()` was assigned and immediately shadowed by the `for … in page.get_images()` loop; the dead assignment was removed.
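Roughly the shape of the fix; the loop variable and per-image handling here are hypothetical:

```python
# Before: `images` was assigned but never read, because the loop
# iterates page.get_images() directly.
images = page.get_images()
for img in page.get_images():
    handle_image(img)  # hypothetical per-image processing

# After: the dead assignment is dropped.
for img in page.get_images():
    handle_image(img)
```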
**Model-aware `max_token_num` (both files).** Replaced the hardcoded 4096 with a lookup dict so GPT-3.5-turbo-16k, GPT-4, GPT-4-32k, and GPT-4-turbo users get the correct context window; unknown models fall back to 4096.
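A sketch of the lookup; the dict name is hypothetical, and a local variable stands in for `self.chatgpt_model`:

```python
chatgpt_model = "gpt-4"  # stand-in for self.chatgpt_model

# Context-window sizes for the models named above.
MODEL_CONTEXT_WINDOWS = {
    "gpt-3.5-turbo": 4096,
    "gpt-3.5-turbo-16k": 16384,
    "gpt-4": 8192,
    "gpt-4-32k": 32768,
    "gpt-4-turbo": 128000,
}

# Unknown model names fall back to the previous 4096 limit.
max_token_num = MODEL_CONTEXT_WINDOWS.get(chatgpt_model, 4096)
```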
**Model-aware tiktoken encoding (both files).** `tiktoken.get_encoding("gpt2")` gives wrong token counts for GPT-3.5/4, which use `cl100k_base`. Replaced with `tiktoken.encoding_for_model(self.chatgpt_model)`, with a `cl100k_base` fallback for unknown models.
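The fallback pattern, sketched; `tiktoken.encoding_for_model` raises `KeyError` for model names it does not recognize:

```python
import tiktoken

chatgpt_model = "gpt-4"  # stand-in for self.chatgpt_model

try:
    # Maps e.g. "gpt-4" and "gpt-3.5-turbo" to cl100k_base.
    encoding = tiktoken.encoding_for_model(chatgpt_model)
except KeyError:
    # Unknown model name: fall back to the GPT-3.5/4 encoding.
    encoding = tiktoken.get_encoding("cl100k_base")

token_count = len(encoding.encode("some paper text"))
```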
**Bounds-check `clip_text_index` (both files, all three chat methods).** The index calculation could yield 0 or a negative value when the prompt token budget ≥ the text length, producing an empty or inverted slice. Wrapped with `max(1, …)`.
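A sketch of the guard; the token arithmetic and variable names are assumed, not taken from the diff:

```python
text = "..."          # stand-in for the full paper text
text_token = 12000    # assumed: tokens in `text`
prompt_token = 5000   # assumed: tokens already spent on the prompt
max_token_num = 4096  # model context window

raw_index = int(len(text) * (max_token_num - prompt_token) / text_token)

# Without the guard, raw_index can be 0 or negative once the prompt
# consumes the whole budget, making text[:raw_index] empty or inverted.
clip_text_index = max(1, raw_index)
clip_text = text[:clip_text_index]
```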
**Read the model from config in `chat_arxiv.py`.** The model was hardcoded to `"gpt-3.5-turbo"` in all three `openai.ChatCompletion.create` calls; it now reads `CHATGPT_MODEL` from `apikey.ini` via `self.chatgpt_model`, consistent with `chat_paper.py`.
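A sketch of the config read and its use; the `apikey.ini` section name is an assumption:

```python
import configparser

config = configparser.ConfigParser()
config.read("apikey.ini")
# The section name "OpenAI" is assumed; adjust to the actual ini layout.
chatgpt_model = config.get("OpenAI", "CHATGPT_MODEL", fallback="gpt-3.5-turbo")

# Each of the three calls then uses the configured model:
# response = openai.ChatCompletion.create(
#     model=chatgpt_model,   # previously hardcoded "gpt-3.5-turbo"
#     messages=messages,
# )
```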