refactor(client): migrate HTTP backend from scrapling/curl_cffi to wreq #4

Merged
lifefloating merged 1 commit into main from feat-wrep on Apr 24, 2026
Conversation

Owner

@lifefloating lifefloating commented Apr 24, 2026


Summary by Sourcery

Migrate the TCaptcha HTTP client from the scrapling/curl_cffi stack to a stateful wreq blocking client with Chrome TLS/HTTP2 emulation, updating configuration, docs, and tests accordingly.

New Features:

  • Add configurable wreq Chrome emulation profile via the TCAPTCHA_EMULATION setting to control the TLS/HTTP2 fingerprint used by the HTTP client.

Bug Fixes:

  • Ensure HTTP errors and empty image responses from the TCaptcha endpoints raise clear NetworkError exceptions.
  • Fix Node.js TDC executor timeout handling by raising a standard TimeoutError on subprocess timeouts.

Enhancements:

  • Refactor TCaptchaClient to maintain a reusable wreq client with shared connection pool and cookie store across captcha flows.
  • Improve header handling for prehandle, image download, and verify requests, including consistent User-Agent, Referer, Origin, and Content-Type values aligned with real Chrome behavior.
  • Add robust resolution and logging for unknown emulation profiles, falling back safely to a default Chrome version.
  • Strengthen type and logging choices (e.g., Enum usage annotations) for better compatibility and diagnostics.
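
The emulation fallback described in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the `Emulation` enum is a local stand-in for `wreq.Emulation`, and `resolve_emulation`, `emulation_from_env`, and `DEFAULT_PROFILE` are hypothetical names. Only the `TCAPTCHA_EMULATION` variable and the `Chrome137` default come from the summary.

```python
import logging
import os
from enum import Enum

logger = logging.getLogger(__name__)


class Emulation(Enum):
    """Local stand-in for wreq.Emulation (assumption: only these members)."""

    Chrome131 = "chrome_131"
    Chrome134 = "chrome_134"
    Chrome137 = "chrome_137"


DEFAULT_PROFILE = Emulation.Chrome137  # default named in the PR summary


def resolve_emulation(name: str) -> Emulation:
    """Map a profile name to an Emulation member, falling back to the default."""
    try:
        return Emulation[name]
    except KeyError:
        logger.warning(
            "unknown emulation profile %r, falling back to %s",
            name,
            DEFAULT_PROFILE.name,
        )
        return DEFAULT_PROFILE


def emulation_from_env() -> Emulation:
    """Read TCAPTCHA_EMULATION, defaulting to "Chrome137" when it is unset."""
    return resolve_emulation(os.environ.get("TCAPTCHA_EMULATION", "Chrome137"))
```

Falling back with a warning, rather than raising, keeps a mistyped profile name from breaking client construction while still leaving a trace in the logs.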

Build:

  • Bump the minimum supported Python version to 3.11 and align Ruff configuration with the new target.
  • Replace the scrapling dependency with wreq in the core project dependencies.

Documentation:

  • Update README, architecture, reverse engineering notes, and agent documentation to describe the new wreq-based HTTP layer and configuration options.

Tests:

  • Replace respx/httpx-based HTTP mocking in client tests with a lightweight fake wreq client, expanding coverage for HTTP error handling, empty bodies, and header/query construction behavior.

Chores:

  • Regenerate or update lockfiles and auxiliary project metadata to reflect the new HTTP stack and dependency set.


sourcery-ai Bot commented Apr 24, 2026


Reviewer's Guide

Refactors the TCaptcha HTTP client to use a persistent wreq blocking.Client with Chrome emulation instead of scrapling/curl_cffi, updates configuration and docs accordingly, and replaces respx-based tests with local fake wreq stubs that assert headers, query params, and error handling.

Sequence diagram for TCaptchaClient verify flow using wreq blocking.Client

sequenceDiagram
    actor Caller
    participant TCaptchaClient
    participant WreqClient
    participant TCaptchaServer

    Caller->>TCaptchaClient: verify(sess, ans, pow_answer, pow_calc_time, collect, eks)
    TCaptchaClient->>TCaptchaClient: build body dict
    TCaptchaClient->>TCaptchaClient: origin = _origin_of(_entry_url)
    TCaptchaClient->>TCaptchaClient: verify_headers = _common_headers copy
    alt entry_url is set
        TCaptchaClient->>TCaptchaClient: set Referer and Origin in verify_headers
    end
    TCaptchaClient->>TCaptchaClient: set Accept header if missing
    TCaptchaClient->>TCaptchaClient: set Content_Type to application_x_www_form_urlencoded
    TCaptchaClient->>TCaptchaClient: urlencode body to bytes
    TCaptchaClient->>WreqClient: post(url, body, headers)
    WreqClient->>TCaptchaServer: HTTPS POST /verify with Chrome emulation
    TCaptchaServer-->>WreqClient: HTTP response (status, body)
    WreqClient-->>TCaptchaClient: Response
    TCaptchaClient->>TCaptchaClient: status = resp.status.as_int()
    alt status != 200
        TCaptchaClient->>Caller: raise NetworkError("verify failed: HTTP {status}")
    else status == 200
        TCaptchaClient->>TCaptchaClient: parse_jsonp(resp.text())
        TCaptchaClient->>Caller: VerifyResponse
    end
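
The header and body preparation in the verify flow above might look roughly like this. It is a sketch under assumptions: `origin_of` and `build_verify_request` are hypothetical stand-ins for the client's private helpers, and the `Accept` default shown here is a guess rather than the value used in the PR.

```python
from urllib.parse import urlencode, urlsplit


def origin_of(url: str) -> str:
    """Return scheme://host[:port] for a URL, or "" if it has no host."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        return ""
    return f"{parts.scheme}://{parts.netloc}"


def build_verify_request(common_headers: dict, entry_url: str, body: dict):
    """Build the headers and the URL-encoded body for the verify POST."""
    headers = dict(common_headers)  # copy so the shared headers stay untouched
    if entry_url:
        headers["Referer"] = entry_url
        headers["Origin"] = origin_of(entry_url)
    headers.setdefault("Accept", "*/*")  # assumption: the real default may differ
    headers["Content-Type"] = "application/x-www-form-urlencoded"
    return headers, urlencode(body).encode()
```

Encoding the body explicitly makes the bytes on the wire independent of how the HTTP client would serialize a dict.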

Sequence diagram for TCaptchaClient prehandle and image download with persistent wreq client

sequenceDiagram
    actor Caller
    participant TCaptchaClient
    participant WreqClient
    participant TCaptchaServer

    Caller->>TCaptchaClient: __init__(settings, ua, timeout, proxy)
    TCaptchaClient->>TCaptchaClient: _common_headers = {User_Agent: ua}
    TCaptchaClient->>TCaptchaClient: emu = _resolve_emulation(settings.emulation)
    TCaptchaClient->>TCaptchaClient: build client_kw (emulation, user_agent, timeout, cookie_store, proxies)
    TCaptchaClient->>WreqClient: create blocking Client with client_kw
    TCaptchaClient-->>Caller: client instance

    Caller->>TCaptchaClient: prehandle(aid, subsid, entry_url)
    TCaptchaClient->>TCaptchaClient: compute params and referer
    TCaptchaClient->>TCaptchaClient: headers = _common_headers + Referer
    TCaptchaClient->>WreqClient: get(url, query=params, headers=headers)
    WreqClient->>TCaptchaServer: HTTPS GET /prehandle with Chrome emulation
    TCaptchaServer-->>WreqClient: HTTP 200 JSONP
    WreqClient-->>TCaptchaClient: Response
    TCaptchaClient->>TCaptchaClient: status = resp.status.as_int()
    alt status != 200
        TCaptchaClient->>Caller: raise NetworkError
    else status == 200
        TCaptchaClient->>TCaptchaClient: raw_text = resp.text()
        TCaptchaClient->>TCaptchaClient: data = parse_jsonp(raw_text)
        TCaptchaClient-->>Caller: PrehandleResponse
    end

    Caller->>TCaptchaClient: get_image(img_url)
    TCaptchaClient->>TCaptchaClient: full = absolute url
    TCaptchaClient->>TCaptchaClient: headers = _common_headers + Referer
    TCaptchaClient->>WreqClient: get(full, headers=headers)
    WreqClient->>TCaptchaServer: HTTPS GET image with Chrome emulation
    TCaptchaServer-->>WreqClient: HTTP response (status, bytes)
    WreqClient-->>TCaptchaClient: Response
    TCaptchaClient->>TCaptchaClient: status = resp.status.as_int()
    TCaptchaClient->>TCaptchaClient: body = resp.bytes()
    alt status != 200 or len(body) == 0
        TCaptchaClient->>Caller: raise NetworkError
    else
        TCaptchaClient-->>Caller: image bytes
    end

    Caller->>TCaptchaClient: close()
    TCaptchaClient->>TCaptchaClient: closer = getattr(_http, close)
    alt closer is callable
        TCaptchaClient->>TCaptchaClient: call closer() with suppressed exceptions
    end
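
The `parse_jsonp` step in the prehandle flow above can be approximated as below. This is a minimal sketch, assuming the endpoint wraps its JSON in a single `callback(...)` call. The real helper in the repository may handle more cases.

```python
import json
import re

# Optional callback name, then the JSON payload inside one pair of parentheses.
_JSONP_RE = re.compile(r"^\s*[\w$.]*\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def parse_jsonp(raw_text: str) -> dict:
    """Strip an optional callback(...) wrapper and decode the JSON inside."""
    match = _JSONP_RE.match(raw_text)
    payload = match.group(1) if match else raw_text
    return json.loads(payload)
```

Plain JSON without a wrapper passes through unchanged, so the same helper could serve both response styles.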

Updated class diagram for TCaptchaClient and related settings using wreq

classDiagram
    class TCaptchaSettings {
        +str base_url
        +str tdc_node_path
        +str or None proxy
        +str emulation
        +str llm_api_key
        +str llm_base_url
        +str llm_model
        +float llm_timeout
    }

    class TCaptchaClient {
        -str _base_url
        -str _entry_url
        -float _timeout
        -str or None _proxy
        -dict~str, str~ _common_headers
        -WreqClient _http
        +__init__(settings, ua, timeout, proxy)
        +close() None
        +prehandle(aid, subsid, entry_url) PrehandleResponse
        +get_image(img_url) bytes
        +get_fg_image_url(bg_img_url) str
        +verify(sess, ans, pow_answer, pow_calc_time, collect, eks) VerifyResponse
        +__enter__() TCaptchaClient
        +__exit__(exc_type, exc_value, traceback) None
    }

    class WreqClient {
        +WreqClient(emulation, user_agent, timeout, cookie_store, proxies)
        +get(url, query, headers) WreqResponse
        +post(url, body, headers) WreqResponse
        +close() None
    }

    class WreqResponse {
        +status Status
        +status.as_int() int
        +text() str
        +bytes() bytes
    }

    class Emulation {
        <<enumeration>>
        Chrome137
        Chrome134
        Chrome131
    }

    class Logger {
        +warning(msg, name)
        +info(msg, args)
    }

    class HelperFunctions {
        +_resolve_emulation(name) Emulation
        +_origin_of(url) str
        +parse_jsonp(raw_text) dict
    }

    TCaptchaSettings "1" --> "1" TCaptchaClient : provides_config
    TCaptchaClient ..> WreqClient : creates_and_uses
    TCaptchaClient ..> Emulation : uses_for_chrome_profile
    TCaptchaClient ..> Logger : logs_events
    TCaptchaClient ..> HelperFunctions : calls_helpers
    WreqClient --> WreqResponse : returns
    Emulation <.. HelperFunctions : returned_by

File-Level Changes

Change: Replace scrapling Fetcher + curl_cffi HTTP layer with a stateful wreq blocking.Client using configurable Chrome emulation, adjusting client behavior and error handling.
Details:
  • Introduce a module-level logger and a helper _resolve_emulation to map settings.emulation strings to wreq.Emulation values with a safe fallback and warning.
  • Change TCaptchaClient from a stateless facade to a stateful wrapper around wreq.blocking.Client, storing common headers, configuring timeout, proxy, cookie_store, and emulation on construction.
  • Replace Fetcher.get/Fetcher.post calls with self._http.get/self._http.post, adapting parameter names (query vs params, body vs data), response handling (status.as_int, text(), bytes()), and error messages.
  • Ensure verify() explicitly URL-encodes the POST body and sets Content-Type and Accept headers, while building Referer/Origin headers from entry_url and its origin.
  • Update get_image() to log status/size using wreq responses, and to raise NetworkError on non-200 or empty bodies, returning raw bytes from resp.bytes().
  • Implement a close() method that best-effort calls underlying client.close() if present, suppressing exceptions.
Files:
  • src/crack_tcaptcha/client.py

Change: Extend configuration and documentation to describe wreq-based Chrome TLS/HTTP2 emulation and the new emulation setting, while bumping the required Python version and base dependencies.
Details:
  • Add an emulation field to TCaptchaSettings with default "Chrome137" and comments explaining mapping to wreq.Emulation and supported values.
  • Replace scrapling/curl_cffi references in AGENTS.md, README.md, docs/architecture.md, and docs/reverse-notes.md with wreq usage details, including Emulation.Chrome137, HTTP/2 fingerprinting, and TCAPTCHA_EMULATION environment variable.
  • Update pyproject.toml to depend on wreq>=0.11 instead of scrapling[fetchers], bump requires-python to >=3.11, and set ruff target-version to py311.
  • Clarify that wreq is a required dependency for Chrome TLS/HTTP2 fingerprint emulation and should not be replaced with plain httpx.
  • Add a noqa: UP042 comment on TCaptchaType to justify keeping str+Enum for pydantic/JSON compatibility.
Files:
  • src/crack_tcaptcha/settings.py
  • AGENTS.md
  • README.md
  • docs/architecture.md
  • docs/reverse-notes.md
  • pyproject.toml
  • src/crack_tcaptcha/models.py

Change: Rework HTTP-layer tests to stub the wreq client directly, removing respx/httpx usage and adding more precise assertions on headers, URL construction, and error paths.
Details:
  • Introduce _FakeStatus and _FakeResponse helpers in tests to mimic wreq response API (status.as_int, text, bytes, json).
  • Add a _patch_http helper that injects a SimpleNamespace with get/post/close methods into TCaptchaClient, capturing call arguments for assertions while returning preconfigured fake responses.
  • Rewrite TestPrehandle tests to use the fake client, assert the prehandle happy path, and verify URL, Referer header fallback, and query params; add a new test_prehandle_http_error to assert error handling on non-200 responses.
  • Rewrite TestGetImage tests to use the fake client, asserting Referer header and HTTPS URL construction, and add test_empty_body_raises to validate empty-body NetworkError behavior.
  • Rewrite TestVerify tests to use the fake client, asserting correct Referer/Origin/Content-Type headers and urlencoded body for success, and maintaining failure-path behavior with errorCode 15; remove respx and httpx usage and all skip markers tied to scrapling.
  • Keep TestFgImageUrl active (no longer skipped) since HTTP mocking is no longer required there.
Files:
  • tests/test_client.py

Change: Miscellaneous robustness and compatibility tweaks unrelated to the main HTTP backend swap.
Details:
  • Change the tdc Node.js jsdom collector to treat timeouts as TimeoutError instead of asyncio.TimeoutError when killing the subprocess and raising TDCError.
  • Create a placeholder llms.txt file in the repo (likely for documentation or tooling integration).
Files:
  • src/crack_tcaptcha/tdc/nodejs_jsdom.py
  • llms.txt
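
The fake-client approach described for tests/test_client.py could be sketched as follows. The class and function names mirror the ones mentioned above without the leading underscore, but the bodies are assumptions written for illustration and are not copied from the PR.

```python
from types import SimpleNamespace


class FakeStatus:
    """Mimics the status object, which exposes as_int()."""

    def __init__(self, code: int) -> None:
        self._code = code

    def as_int(self) -> int:
        return self._code


class FakeResponse:
    """Mimics the parts of a response the client reads."""

    def __init__(self, code: int, body: bytes = b"") -> None:
        self.status = FakeStatus(code)
        self._body = body

    def text(self) -> str:
        return self._body.decode()

    def bytes(self) -> bytes:
        return self._body


def patch_http(client, get=None, post=None):
    """Swap client._http for a stub and return the list that records each call."""
    calls = []

    def _make(method, response):
        def _call(url, **kwargs):
            calls.append((method, url, kwargs))
            return response

        return _call

    client._http = SimpleNamespace(
        get=_make("GET", get), post=_make("POST", post), close=lambda: None
    )
    return calls
```

Because every call is recorded with its keyword arguments, a test can assert on `query`, `headers`, and `body` without touching the network.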



@sourcery-ai sourcery-ai Bot left a comment


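Hey - I've found 5 issues, and left some high level feedback:

  • In tdc/nodejs_jsdom.py, changing except asyncio.TimeoutError to except TimeoutError means asyncio.wait_for timeouts will no longer be caught, so the subprocess may not be killed on timeout; you likely want to keep catching asyncio.TimeoutError (or both).
  • In TCaptchaClient.verify, the body is manually encoded with urllib.parse.urlencode(body).encode(), which differs from the previous behavior where the HTTP client encoded a dict; if any field can be a sequence (e.g., multi-valued params), consider using doseq=True or otherwise confirming this encoding matches the upstream API expectations.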
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `tdc/nodejs_jsdom.py`, changing `except asyncio.TimeoutError` to `except TimeoutError` means `asyncio.wait_for` timeouts will no longer be caught, so the subprocess may not be killed on timeout; you likely want to keep catching `asyncio.TimeoutError` (or both).
- In `TCaptchaClient.verify`, the body is manually encoded with `urllib.parse.urlencode(body).encode()`, which differs from the previous behavior where the HTTP client encoded a dict; if any field can be a sequence (e.g., multi-valued params), consider using `doseq=True` or otherwise confirming this encoding matches the upstream API expectations.

## Individual Comments

### Comment 1
<location path="src/crack_tcaptcha/client.py" line_range="354" />
<code_context>
-            resp = Fetcher.post(url, data=body, **fetch_kw)
-            if resp.status != 200:
-                raise NetworkError(f"verify failed: HTTP {resp.status}")
+            resp = self._http.post(url, body=urllib.parse.urlencode(body).encode(), headers=verify_headers)
+            status = resp.status.as_int()
+            if status != 200:
</code_context>
<issue_to_address>
**issue:** Using `urllib.parse.urlencode` without `doseq=True` may subtly change how list-valued fields are encoded compared to the previous client.

The previous `Fetcher.post(..., data=body)` likely encoded sequences as repeated keys, while `urllib.parse.urlencode` will stringify list/tuple values unless `doseq=True` is used. If any `verify` payload fields are or become multi-valued, this changes the wire format and can break backend compatibility. Please use `urllib.parse.urlencode(body, doseq=True)` to preserve the prior semantics for sequence values.
</issue_to_address>
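
The difference the comment describes is easy to reproduce with the standard library alone. The field names and values below are made up for illustration; the source does not say whether any real `verify` field is list-valued.

```python
from urllib.parse import urlencode

body = {"sess": "abc", "ans": ["1", "2"]}  # hypothetical payload

# Without doseq the list is converted with str() and then percent-encoded.
plain = urlencode(body)

# With doseq=True each element becomes its own key=value pair.
repeated = urlencode(body, doseq=True)

print(plain)     # sess=abc&ans=%5B%271%27%2C+%272%27%5D
print(repeated)  # sess=abc&ans=1&ans=2
```

For payloads that contain only scalar values, both calls produce the same string.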

### Comment 2
<location path="tests/test_client.py" line_range="136-141" />
<code_context>
+        assert "Referer" in kw["headers"]
+        assert kw["query"]["aid"] == "12345"
+
+    def test_prehandle_http_error(self):
+        with TCaptchaClient() as c:
+            _patch_http(c, get=_FakeResponse(500, body=b""))
+            with pytest.raises(Exception) as exc_info:
+                c.prehandle("12345")
+            assert "prehandle failed" in str(exc_info.value)


</code_context>
<issue_to_address>
**suggestion (testing):** Assert the specific NetworkError type instead of a generic Exception in prehandle error tests

Since failures are wrapped in `NetworkError`, this test can be stricter by asserting that type (and optionally its attributes), e.g.:

```python
with pytest.raises(NetworkError) as exc_info:
    c.prehandle("12345")
assert "prehandle failed" in str(exc_info.value)
```
This avoids passing when an unrelated exception is raised and better captures the public error contract.
</issue_to_address>

### Comment 3
<location path="tests/test_client.py" line_range="159-164" />
<code_context>
+        assert url.startswith("https://")
+        assert kw["headers"]["Referer"] == "https://turing.captcha.gtimg.com/"
+
+    def test_empty_body_raises(self):
+        with TCaptchaClient() as c:
+            _patch_http(c, get=_FakeResponse(200, body=b""))
+            with pytest.raises(Exception) as exc_info:
+                c.get_image("/img?x=1")
+            assert "empty body" in str(exc_info.value)


</code_context>
<issue_to_address>
**suggestion (testing):** Tighten the get_image error-path assertion to the expected NetworkError type

Here it would be better to assert that `get_image` raises `NetworkError` (as in `test_prehandle_http_error`) instead of a bare `Exception`, e.g.:

```python
with pytest.raises(NetworkError) as exc_info:
    c.get_image("/img?x=1")
assert "empty body" in str(exc_info.value)
```

This keeps the test focused on the public exception type rather than any arbitrary error.

Suggested implementation:

```python
            _patch_http(c, get=_FakeResponse(200, body=b""))
            with pytest.raises(NetworkError) as exc_info:
                c.get_image("/img?x=1")
            assert "empty body" in str(exc_info.value)

```

To compile and run correctly, `NetworkError` needs to be imported in this test module. In `tests/test_client.py`, update the existing import from your client module (likely where `TCaptchaClient` is imported) to also import `NetworkError`, e.g. change:

```python
from crack_tcaptcha.client import TCaptchaClient
```

to:

```python
from crack_tcaptcha.client import TCaptchaClient, NetworkError
```

or otherwise import `NetworkError` from the appropriate module where it is defined.
</issue_to_address>

### Comment 4
<location path="tests/test_client.py" line_range="149" />
<code_context>


 class TestGetImage:
-    @pytest.mark.skip(reason=_SKIP_REASON)
-    @respx.mock
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test for non-200 HTTP status in get_image

Please also cover the `HTTP != 200` branch now that `get_image` raises `NetworkError(f"image download failed: HTTP {status}")`. For instance, patch `_http.get` to return `_FakeResponse(500, body=b"error")` and assert that a `NetworkError` is raised with the expected message fragment, so the new wreq-based error handling is verified.
</issue_to_address>
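
A test along the lines suggested here might look like the sketch below. Everything in it is a stand-in so that the example runs on its own: `NetworkError`, `FakeResponse`, and the simplified `get_image` function are assumptions, and only the message fragment `image download failed: HTTP {status}` is taken from the comment.

```python
from types import SimpleNamespace


class NetworkError(Exception):
    """Stand-in for the project's NetworkError."""


class FakeResponse:
    """Minimal fake exposing status.as_int() and bytes()."""

    def __init__(self, code: int, body: bytes = b"") -> None:
        self.status = SimpleNamespace(as_int=lambda: code)
        self._body = body

    def bytes(self) -> bytes:
        return self._body


def get_image(http_get, url: str) -> bytes:
    """Simplified stand-in for TCaptchaClient.get_image()."""
    resp = http_get(url)
    status = resp.status.as_int()
    if status != 200:
        raise NetworkError(f"image download failed: HTTP {status}")
    body = resp.bytes()
    if not body:
        raise NetworkError("image download failed: empty body")
    return body


def test_get_image_http_error() -> None:
    """A 500 response must surface as NetworkError, not as image bytes."""
    try:
        get_image(lambda url: FakeResponse(500, b"error"), "https://example.com/img?x=1")
    except NetworkError as exc:
        assert "HTTP 500" in str(exc)
    else:
        raise AssertionError("expected NetworkError")
```

In the real test module the same check would more naturally use `pytest.raises(NetworkError)` together with the existing `_patch_http` helper.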

### Comment 5
<location path="docs/reverse-notes.md" line_range="71" />
<code_context>
 - Canvas 指纹 —— jsdom 返回空白,目前仍能通过(腾讯侧未强校验)
 - 请求频率 —— 大规模场景用代理池,通过 `TCAPTCHA_PROXY``solve()` 的 proxy 参数配置
-- TLS 指纹 —— 普通 Python HTTP 库会被 403;`client.py` 使用 `scrapling.Fetcher`(基于 `curl_cffi`)做 Chrome TLS 模拟
+- TLS 指纹 —— 普通 Python HTTP 库会被 403;`client.py` 使用 `wreq.blocking.Client``Emulation.Chrome137`,可通过 `TCAPTCHA_EMULATION` 切换)做 Chrome TLS / HTTP2 指纹模拟
</code_context>
<issue_to_address>
**nitpick (typo):** Align `HTTP2` with `HTTP/2` spelling used elsewhere.

This line also uses `HTTP2`; consider changing it to `HTTP/2` to match the spelling used in the other docs.
</issue_to_address>

Original comment in English

Hey - I've found 5 issues, and left some high level feedback:

  • In tdc/nodejs_jsdom.py, changing except asyncio.TimeoutError to except TimeoutError means asyncio.wait_for timeouts will no longer be caught, so the subprocess may not be killed on timeout; you likely want to keep catching asyncio.TimeoutError (or both).
  • In TCaptchaClient.verify, the body is manually encoded with urllib.parse.urlencode(body).encode(), which differs from the previous behavior where the HTTP client encoded a dict; if any field can be a sequence (e.g., multi-valued params), consider using doseq=True or otherwise confirming this encoding matches the upstream API expectations.
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="src/crack_tcaptcha/client.py" line_range="354" />
<code_context>
-            resp = Fetcher.post(url, data=body, **fetch_kw)
-            if resp.status != 200:
-                raise NetworkError(f"verify failed: HTTP {resp.status}")
+            resp = self._http.post(url, body=urllib.parse.urlencode(body).encode(), headers=verify_headers)
+            status = resp.status.as_int()
+            if status != 200:
</code_context>
<issue_to_address>
**issue:** Using `urllib.parse.urlencode` without `doseq=True` may subtly change how list-valued fields are encoded compared to the previous client.

The previous `Fetcher.post(..., data=body)` likely encoded sequences as repeated keys, while `urllib.parse.urlencode` will stringify list/tuple values unless `doseq=True` is used. If any `verify` payload fields are or become multi-valued, this changes the wire format and can break backend compatibility. Please use `urllib.parse.urlencode(body, doseq=True)` to preserve the prior semantics for sequence values.
</issue_to_address>
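
To make the difference concrete, here is a small standalone sketch of how `urllib.parse.urlencode` treats a list value with and without `doseq=True`. The payload keys (`aid`, `ans`, `tags`) are illustrative assumptions, not the actual `verify` fields.

```python
from urllib.parse import urlencode

# Hypothetical payload; "tags" stands in for any multi-valued field.
payload = {"aid": "12345", "tags": ["a", "b"]}

# Default: the list is passed through str() and then percent-encoded.
default_encoded = urlencode(payload)
# -> aid=12345&tags=%5B%27a%27%2C+%27b%27%5D

# doseq=True: each element becomes its own key=value pair.
doseq_encoded = urlencode(payload, doseq=True)
# -> aid=12345&tags=a&tags=b

# For scalar-only payloads the two modes produce identical output.
scalar_payload = {"aid": "12345", "ans": "1,2"}
scalar_same = urlencode(scalar_payload) == urlencode(scalar_payload, doseq=True)

print(default_encoded)
print(doseq_encoded)
print(scalar_same)
```

If every `verify` field is a plain string, the two calls are byte-for-byte equivalent, so `doseq=True` only matters once a sequence value appears.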

### Comment 2
<location path="tests/test_client.py" line_range="136-141" />
<code_context>
+        assert "Referer" in kw["headers"]
+        assert kw["query"]["aid"] == "12345"
+
+    def test_prehandle_http_error(self):
+        with TCaptchaClient() as c:
+            _patch_http(c, get=_FakeResponse(500, body=b""))
+            with pytest.raises(Exception) as exc_info:
+                c.prehandle("12345")
+            assert "prehandle failed" in str(exc_info.value)


</code_context>
<issue_to_address>
**suggestion (testing):** Assert the specific NetworkError type instead of a generic Exception in prehandle error tests

Since failures are wrapped in `NetworkError`, this test can be stricter by asserting that type (and optionally its attributes), e.g.:

```python
with pytest.raises(NetworkError) as exc_info:
    c.prehandle("12345")
assert "prehandle failed" in str(exc_info.value)
```
This avoids passing when an unrelated exception is raised and better captures the public error contract.
</issue_to_address>

### Comment 3
<location path="tests/test_client.py" line_range="159-164" />
<code_context>
+        assert url.startswith("https://")
+        assert kw["headers"]["Referer"] == "https://turing.captcha.gtimg.com/"
+
+    def test_empty_body_raises(self):
+        with TCaptchaClient() as c:
+            _patch_http(c, get=_FakeResponse(200, body=b""))
+            with pytest.raises(Exception) as exc_info:
+                c.get_image("/img?x=1")
+            assert "empty body" in str(exc_info.value)


</code_context>
<issue_to_address>
**suggestion (testing):** Tighten the get_image error-path assertion to the expected NetworkError type

Here it would be better to assert that `get_image` raises `NetworkError` (as in `test_prehandle_http_error`) instead of a bare `Exception`, e.g.:

```python
with pytest.raises(NetworkError) as exc_info:
    c.get_image("/img?x=1")
assert "empty body" in str(exc_info.value)
```

This keeps the test focused on the public exception type rather than any arbitrary error.

Suggested implementation:

```python
            _patch_http(c, get=_FakeResponse(200, body=b""))
            with pytest.raises(NetworkError) as exc_info:
                c.get_image("/img?x=1")
            assert "empty body" in str(exc_info.value)

```

To compile and run correctly, `NetworkError` needs to be imported in this test module. In `tests/test_client.py`, update the existing import from the client module (likely where `TCaptchaClient` is imported) to also import `NetworkError`. Assuming the package name matches the `src/crack_tcaptcha/` path, change:

```python
from crack_tcaptcha.client import TCaptchaClient
```

to:

```python
from crack_tcaptcha.client import TCaptchaClient, NetworkError
```

or otherwise import `NetworkError` from the module where it is actually defined.
</issue_to_address>

### Comment 4
<location path="tests/test_client.py" line_range="149" />
<code_context>


 class TestGetImage:
-    @pytest.mark.skip(reason=_SKIP_REASON)
-    @respx.mock
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test for non-200 HTTP status in get_image

Please also cover the `HTTP != 200` branch now that `get_image` raises `NetworkError(f"image download failed: HTTP {status}")`. For instance, patch `_http.get` to return `_FakeResponse(500, body=b"error")` and assert that a `NetworkError` is raised with the expected message fragment, so the new wreq-based error handling is verified.
</issue_to_address>

### Comment 5
<location path="docs/reverse-notes.md" line_range="71" />
<code_context>
 - Canvas 指纹 —— jsdom 返回空白,目前仍能通过(腾讯侧未强校验)
 - 请求频率 —— 大规模场景用代理池,通过 `TCAPTCHA_PROXY``solve()` 的 proxy 参数配置
-- TLS 指纹 —— 普通 Python HTTP 库会被 403;`client.py` 使用 `scrapling.Fetcher`(基于 `curl_cffi`)做 Chrome TLS 模拟
+- TLS 指纹 —— 普通 Python HTTP 库会被 403;`client.py` 使用 `wreq.blocking.Client`(`Emulation.Chrome137`,可通过 `TCAPTCHA_EMULATION` 切换)做 Chrome TLS / HTTP2 指纹模拟
</code_context>
<issue_to_address>
**nitpick (typo):** Align `HTTP2` with `HTTP/2` spelling used elsewhere.

This line also uses `HTTP2`; consider changing it to `HTTP/2` to match the spelling used elsewhere in the docs.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@lifefloating lifefloating merged commit a67b6e9 into main Apr 24, 2026
8 checks passed