refactor(client): migrate HTTP backend from scrapling/curl_cffi to wreq #4
Merged
Reviewer's Guide
Refactors the TCaptcha HTTP client to use a persistent wreq blocking.Client with Chrome emulation instead of scrapling/curl_cffi, updates configuration and docs accordingly, and replaces respx-based tests with local fake wreq stubs that assert headers, query params, and error handling.
Sequence diagram for TCaptchaClient verify flow using wreq blocking.Client
sequenceDiagram
actor Caller
participant TCaptchaClient
participant WreqClient
participant TCaptchaServer
Caller->>TCaptchaClient: verify(sess, ans, pow_answer, pow_calc_time, collect, eks)
TCaptchaClient->>TCaptchaClient: build body dict
TCaptchaClient->>TCaptchaClient: origin = _origin_of(_entry_url)
TCaptchaClient->>TCaptchaClient: verify_headers = _common_headers copy
alt entry_url is set
TCaptchaClient->>TCaptchaClient: set Referer and Origin in verify_headers
end
TCaptchaClient->>TCaptchaClient: set Accept header if missing
TCaptchaClient->>TCaptchaClient: set Content_Type to application_x_www_form_urlencoded
TCaptchaClient->>TCaptchaClient: urlencode body to bytes
TCaptchaClient->>WreqClient: post(url, body, headers)
WreqClient->>TCaptchaServer: HTTPS POST /verify with Chrome emulation
TCaptchaServer-->>WreqClient: HTTP response (status, body)
WreqClient-->>TCaptchaClient: Response
TCaptchaClient->>TCaptchaClient: status = resp.status.as_int()
alt status != 200
TCaptchaClient->>Caller: raise NetworkError("verify failed: HTTP {status}")
else status == 200
TCaptchaClient->>TCaptchaClient: parse_jsonp(resp.text())
TCaptchaClient->>Caller: VerifyResponse
end
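The verify flow in the diagram above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the PR's actual code: `build_verify_headers`, `post_verify`, `FakeHttp`, and `FakeResponse` are hypothetical names, the `Accept` default and the use of the entry URL as `Referer` are assumptions, and the fake client only mimics the `post(url, body=..., headers=...)` and `resp.status.as_int()` shapes named in the diagram.

```python
import urllib.parse
from typing import Dict, Optional


class NetworkError(Exception):
    """Stand-in for the project's NetworkError."""


def origin_of(url: str) -> str:
    """Return scheme://host[:port] for a URL (hypothetical helper)."""
    parts = urllib.parse.urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def build_verify_headers(common: Dict[str, str], entry_url: Optional[str]) -> Dict[str, str]:
    headers = dict(common)  # copy so the shared headers are not mutated
    if entry_url:
        headers["Referer"] = entry_url  # assumption: Referer is the entry page
        headers["Origin"] = origin_of(entry_url)
    headers.setdefault("Accept", "*/*")  # assumed default, only set if missing
    headers["Content-Type"] = "application/x-www-form-urlencoded"
    return headers


class FakeStatus:
    def __init__(self, code: int) -> None:
        self._code = code

    def as_int(self) -> int:
        return self._code


class FakeResponse:
    def __init__(self, code: int, text: str) -> None:
        self.status = FakeStatus(code)
        self._text = text

    def text(self) -> str:
        return self._text


class FakeHttp:
    """Records calls instead of touching the network."""

    def __init__(self, response: FakeResponse) -> None:
        self.response = response
        self.calls = []

    def post(self, url, body, headers):
        self.calls.append((url, body, headers))
        return self.response


def post_verify(http, url: str, body: Dict[str, str], headers: Dict[str, str]) -> str:
    """Form-encode the body, POST it, and return the raw JSONP text."""
    resp = http.post(url, body=urllib.parse.urlencode(body).encode(), headers=headers)
    status = resp.status.as_int()
    if status != 200:
        raise NetworkError(f"verify failed: HTTP {status}")
    return resp.text()
```

Copying the shared headers before adding `Referer` and `Origin` keeps per-request values from leaking into later requests on the same persistent client.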
Sequence diagram for TCaptchaClient prehandle and image download with persistent wreq client
sequenceDiagram
actor Caller
participant TCaptchaClient
participant WreqClient
participant TCaptchaServer
Caller->>TCaptchaClient: __init__(settings, ua, timeout, proxy)
TCaptchaClient->>TCaptchaClient: _common_headers = {User_Agent: ua}
TCaptchaClient->>TCaptchaClient: emu = _resolve_emulation(settings.emulation)
TCaptchaClient->>TCaptchaClient: build client_kw (emulation, user_agent, timeout, cookie_store, proxies)
TCaptchaClient->>WreqClient: create blocking Client with client_kw
TCaptchaClient-->>Caller: client instance
Caller->>TCaptchaClient: prehandle(aid, subsid, entry_url)
TCaptchaClient->>TCaptchaClient: compute params and referer
TCaptchaClient->>TCaptchaClient: headers = _common_headers + Referer
TCaptchaClient->>WreqClient: get(url, query=params, headers=headers)
WreqClient->>TCaptchaServer: HTTPS GET /prehandle with Chrome emulation
TCaptchaServer-->>WreqClient: HTTP 200 JSONP
WreqClient-->>TCaptchaClient: Response
TCaptchaClient->>TCaptchaClient: status = resp.status.as_int()
alt status != 200
TCaptchaClient->>Caller: raise NetworkError
else status == 200
TCaptchaClient->>TCaptchaClient: raw_text = resp.text()
TCaptchaClient->>TCaptchaClient: data = parse_jsonp(raw_text)
TCaptchaClient-->>Caller: PrehandleResponse
end
Caller->>TCaptchaClient: get_image(img_url)
TCaptchaClient->>TCaptchaClient: full = absolute url
TCaptchaClient->>TCaptchaClient: headers = _common_headers + Referer
TCaptchaClient->>WreqClient: get(full, headers=headers)
WreqClient->>TCaptchaServer: HTTPS GET image with Chrome emulation
TCaptchaServer-->>WreqClient: HTTP response (status, bytes)
WreqClient-->>TCaptchaClient: Response
TCaptchaClient->>TCaptchaClient: status = resp.status.as_int()
TCaptchaClient->>TCaptchaClient: body = resp.bytes()
alt status != 200 or len(body) == 0
TCaptchaClient->>Caller: raise NetworkError
else
TCaptchaClient-->>Caller: image bytes
end
Caller->>TCaptchaClient: close()
TCaptchaClient->>TCaptchaClient: closer = getattr(_http, close)
alt closer is callable
TCaptchaClient->>TCaptchaClient: call closer() with suppressed exceptions
end
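The `close()` step at the end of the diagram, together with the `__enter__`/`__exit__` pair listed in the class diagram, can be sketched like this. It is a minimal sketch, assuming the backend may or may not expose a `close` method; `ClientSketch` and `NoisyHttp` are hypothetical names and do not reflect the real `TCaptchaClient` constructor.

```python
import contextlib


class ClientSketch:
    """Hypothetical stand-in showing only the close/context-manager logic."""

    def __init__(self, http) -> None:
        self._http = http

    def close(self) -> None:
        # Look the method up defensively: not every backend has close().
        closer = getattr(self._http, "close", None)
        if callable(closer):
            # Cleanup is best effort, so errors raised while closing are swallowed.
            with contextlib.suppress(Exception):
                closer()

    def __enter__(self) -> "ClientSketch":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        self.close()


class NoisyHttp:
    """Fake backend whose close() always fails."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True
        raise RuntimeError("already closed")
```

With this shape, `with ClientSketch(NoisyHttp()): ...` exits cleanly even though the backend's `close()` raises.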
Updated class diagram for TCaptchaClient and related settings using wreq
classDiagram
class TCaptchaSettings {
+str base_url
+str tdc_node_path
+str or None proxy
+str emulation
+str llm_api_key
+str llm_base_url
+str llm_model
+float llm_timeout
}
class TCaptchaClient {
-str _base_url
-str _entry_url
-float _timeout
-str or None _proxy
-dict~str, str~ _common_headers
-WreqClient _http
+__init__(settings, ua, timeout, proxy)
+close() None
+prehandle(aid, subsid, entry_url) PrehandleResponse
+get_image(img_url) bytes
+get_fg_image_url(bg_img_url) str
+verify(sess, ans, pow_answer, pow_calc_time, collect, eks) VerifyResponse
+__enter__() TCaptchaClient
+__exit__(exc_type, exc_value, traceback) None
}
class WreqClient {
+WreqClient(emulation, user_agent, timeout, cookie_store, proxies)
+get(url, query, headers) WreqResponse
+post(url, body, headers) WreqResponse
+close() None
}
class WreqResponse {
+status Status
+status.as_int() int
+text() str
+bytes() bytes
}
class Emulation {
<<enumeration>>
Chrome137
Chrome134
Chrome131
}
class Logger {
+warning(msg, name)
+info(msg, args)
}
class HelperFunctions {
+_resolve_emulation(name) Emulation
+_origin_of(url) str
+parse_jsonp(raw_text) dict
}
TCaptchaSettings "1" --> "1" TCaptchaClient : provides_config
TCaptchaClient ..> WreqClient : creates_and_uses
TCaptchaClient ..> Emulation : uses_for_chrome_profile
TCaptchaClient ..> Logger : logs_events
TCaptchaClient ..> HelperFunctions : calls_helpers
WreqClient --> WreqResponse : returns
Emulation <.. HelperFunctions : returned_by
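Two of the helpers in the class diagram, `_resolve_emulation` and `parse_jsonp`, might look roughly like the sketch below. The implementations are assumptions: the `Emulation` enum here is a local stand-in listing only the members shown in the diagram (it is not imported from wreq), the fallback to `Chrome137` with a logged warning is a guess based on the `Logger.warning(msg, name)` entry, and the JSONP regex is illustrative.

```python
import enum
import json
import logging
import re

logger = logging.getLogger("tcaptcha.sketch")  # hypothetical logger name


class Emulation(enum.Enum):
    """Local stand-in for the emulation profiles shown in the diagram."""

    Chrome137 = "Chrome137"
    Chrome134 = "Chrome134"
    Chrome131 = "Chrome131"


DEFAULT_EMULATION = Emulation.Chrome137  # assumed default


def resolve_emulation(name: str) -> Emulation:
    """Map a settings string to a profile, falling back to the default."""
    try:
        return Emulation[name]
    except KeyError:
        logger.warning("unknown emulation %r, using %s", name, DEFAULT_EMULATION.name)
        return DEFAULT_EMULATION


# Optional callback name, then the payload in parentheses, optional semicolon.
_JSONP_RE = re.compile(r"^\s*[\w$.]*\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def parse_jsonp(raw_text: str) -> dict:
    """Strip a JSONP callback wrapper if present, then decode the JSON."""
    match = _JSONP_RE.match(raw_text)
    payload = match.group(1) if match else raw_text
    return json.loads(payload)
```

Falling back instead of raising keeps a mistyped `TCAPTCHA_EMULATION` value from breaking client construction, at the cost of silently using a different fingerprint; raising would be the stricter alternative.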
File-Level Changes
Hey - I've found 5 issues, and left some high level feedback:
- In `tdc/nodejs_jsdom.py`, changing `except asyncio.TimeoutError` to `except TimeoutError` means `asyncio.wait_for` timeouts will no longer be caught on Python versions before 3.11 (where `asyncio.TimeoutError` is not yet an alias of the built-in `TimeoutError`), so the subprocess may not be killed on timeout; if those versions are supported, you likely want to keep catching `asyncio.TimeoutError` (or both).
- In `TCaptchaClient.verify`, the body is manually encoded with `urllib.parse.urlencode(body).encode()`, which differs from the previous behavior where the HTTP client encoded a dict; if any field can be a sequence (e.g., multi-valued params), consider using `doseq=True` or otherwise confirming this encoding matches the upstream API expectations.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
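- In `tdc/nodejs_jsdom.py`, changing `except asyncio.TimeoutError` to `except TimeoutError` means `asyncio.wait_for` timeouts will no longer be caught on Python versions before 3.11 (where `asyncio.TimeoutError` is not yet an alias of the built-in `TimeoutError`), so the subprocess may not be killed on timeout; if those versions are supported, you likely want to keep catching `asyncio.TimeoutError` (or both).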
- In `TCaptchaClient.verify`, the body is manually encoded with `urllib.parse.urlencode(body).encode()`, which differs from the previous behavior where the HTTP client encoded a dict; if any field can be a sequence (e.g., multi-valued params), consider using `doseq=True` or otherwise confirming this encoding matches the upstream API expectations.
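The timeout concern in the first comment can be illustrated with a small sketch that catches both exception names, so the child process is killed regardless of Python version. `run_with_timeout` is a hypothetical helper, not the function in `tdc/nodejs_jsdom.py`.

```python
import asyncio
import sys
from typing import Optional, Sequence


async def run_with_timeout(cmd: Sequence[str], timeout: float) -> Optional[int]:
    """Run cmd; return its exit code, or None if it was killed on timeout."""
    proc = await asyncio.create_subprocess_exec(*cmd)
    try:
        return await asyncio.wait_for(proc.wait(), timeout=timeout)
    except (asyncio.TimeoutError, TimeoutError):
        # Both names are listed because they are distinct classes before
        # Python 3.11 and the same class from 3.11 onward.
        proc.kill()
        await proc.wait()  # reap the killed child
        return None
```

For example, `asyncio.run(run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], 0.5))` returns `None` after killing the sleeping child.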
## Individual Comments
### Comment 1
<location path="src/crack_tcaptcha/client.py" line_range="354" />
<code_context>
- resp = Fetcher.post(url, data=body, **fetch_kw)
- if resp.status != 200:
- raise NetworkError(f"verify failed: HTTP {resp.status}")
+ resp = self._http.post(url, body=urllib.parse.urlencode(body).encode(), headers=verify_headers)
+ status = resp.status.as_int()
+ if status != 200:
</code_context>
<issue_to_address>
**issue:** Using `urllib.parse.urlencode` without `doseq=True` may subtly change how list-valued fields are encoded compared to the previous client.
The previous `Fetcher.post(..., data=body)` likely encoded sequences as repeated keys, while `urllib.parse.urlencode` will stringify list/tuple values unless `doseq=True` is used. If any `verify` payload fields are or become multi-valued, this changes the wire format and can break backend compatibility. Please use `urllib.parse.urlencode(body, doseq=True)` to preserve the prior semantics for sequence values.
</issue_to_address>
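The difference described in this comment is easy to see in isolation. The payload below is hypothetical (the field names and the list-valued `ans` are illustrative, not the real `verify` body); only `urllib.parse.urlencode` and its `doseq` parameter are real.

```python
from urllib.parse import urlencode

# Hypothetical payload with one list-valued field.
body = {"sess": "abc", "ans": ["1", "2"]}

# Without doseq, the list is passed through str() and percent-encoded whole.
flat = urlencode(body)

# With doseq=True, each element becomes its own key=value pair.
multi = urlencode(body, doseq=True)

print(flat)
print(multi)  # sess=abc&ans=1&ans=2
```

If every field in the payload is a plain string or number, the two calls produce identical output, so `doseq=True` only matters once a sequence value appears.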
### Comment 2
<location path="tests/test_client.py" line_range="136-141" />
<code_context>
+ assert "Referer" in kw["headers"]
+ assert kw["query"]["aid"] == "12345"
+
+ def test_prehandle_http_error(self):
+ with TCaptchaClient() as c:
+ _patch_http(c, get=_FakeResponse(500, body=b""))
+ with pytest.raises(Exception) as exc_info:
+ c.prehandle("12345")
+ assert "prehandle failed" in str(exc_info.value)
</code_context>
<issue_to_address>
**suggestion (testing):** Assert the specific NetworkError type instead of a generic Exception in prehandle error tests
Since failures are wrapped in `NetworkError`, this test can be stricter by asserting that type (and optionally its attributes), e.g.:
```python
with pytest.raises(NetworkError) as exc_info:
c.prehandle("12345")
assert "prehandle failed" in str(exc_info.value)
```
This avoids passing when an unrelated exception is raised and better captures the public error contract.
</issue_to_address>
### Comment 3
<location path="tests/test_client.py" line_range="159-164" />
<code_context>
+ assert url.startswith("https://")
+ assert kw["headers"]["Referer"] == "https://turing.captcha.gtimg.com/"
+
+ def test_empty_body_raises(self):
+ with TCaptchaClient() as c:
+ _patch_http(c, get=_FakeResponse(200, body=b""))
+ with pytest.raises(Exception) as exc_info:
+ c.get_image("/img?x=1")
+ assert "empty body" in str(exc_info.value)
</code_context>
<issue_to_address>
**suggestion (testing):** Tighten the get_image error-path assertion to the expected NetworkError type
Here it would be better to assert that `get_image` raises `NetworkError` (as in `test_prehandle_http_error`) instead of a bare `Exception`, e.g.:
```python
with pytest.raises(NetworkError) as exc_info:
c.get_image("/img?x=1")
assert "empty body" in str(exc_info.value)
```
This keeps the test focused on the public exception type rather than any arbitrary error.
Suggested implementation:
```python
_patch_http(c, get=_FakeResponse(200, body=b""))
with pytest.raises(NetworkError) as exc_info:
c.get_image("/img?x=1")
assert "empty body" in str(exc_info.value)
```
To compile and run correctly, `NetworkError` needs to be imported in this test module. In `tests/test_client.py`, update the existing import from your client module (likely where `TCaptchaClient` is imported) to also import `NetworkError`, e.g. change:
```python
from tcaptcha.client import TCaptchaClient
```
to:
```python
from tcaptcha.client import TCaptchaClient, NetworkError
```
or otherwise import `NetworkError` from the appropriate module where it is defined.
</issue_to_address>
### Comment 4
<location path="tests/test_client.py" line_range="149" />
<code_context>
class TestGetImage:
- @pytest.mark.skip(reason=_SKIP_REASON)
- @respx.mock
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test for non-200 HTTP status in get_image
Please also cover the `HTTP != 200` branch now that `get_image` raises `NetworkError(f"image download failed: HTTP {status}")`. For instance, patch `_http.get` to return `_FakeResponse(500, body=b"error")` and assert that a `NetworkError` is raised with the expected message fragment, so the new wreq-based error handling is verified.
</issue_to_address>
### Comment 5
<location path="docs/reverse-notes.md" line_range="71" />
<code_context>
- Canvas 指纹 —— jsdom 返回空白,目前仍能通过(腾讯侧未强校验)
- 请求频率 —— 大规模场景用代理池,通过 `TCAPTCHA_PROXY` 或 `solve()` 的 proxy 参数配置
-- TLS 指纹 —— 普通 Python HTTP 库会被 403;`client.py` 使用 `scrapling.Fetcher`(基于 `curl_cffi`)做 Chrome TLS 模拟
+- TLS 指纹 —— 普通 Python HTTP 库会被 403;`client.py` 使用 `wreq.blocking.Client`(`Emulation.Chrome137`,可通过 `TCAPTCHA_EMULATION` 切换)做 Chrome TLS / HTTP2 指纹模拟
</code_context>
<issue_to_address>
**nitpick (typo):** Align `HTTP2` with `HTTP/2` spelling used elsewhere.
This line also uses `HTTP2`; consider changing it to `HTTP/2` to match the spelling used in the other docs.
</issue_to_address>
Summary by Sourcery
Migrate the TCaptcha HTTP client from the scrapling/curl_cffi stack to a stateful wreq blocking client with Chrome TLS/HTTP2 emulation, updating configuration, docs, and tests accordingly.
New Features:
- Add a configurable wreq Chrome emulation profile via the `TCAPTCHA_EMULATION` setting, which controls the TLS/HTTP2 fingerprint used by the HTTP client.
Bug Fixes:
- … `NetworkError` exceptions.
- … `TimeoutError`.
Enhancements:
- … `TCaptchaClient` so that it maintains a reusable wreq client, sharing a connection pool and cookie store across the captcha flow.
- … values such as `User-Agent`, `Referer`, `Origin`, and `Content-Type` consistent and aligned with real Chrome behavior.
Build:
Documentation:
Tests:
Chores: