
Bump chardet from 5.2.0 to 7.4.1 in /docker/google-vision-api #43734

Open
dependabot[bot] wants to merge 1 commit into master from dependabot/pip/docker/google-vision-api/chardet-7.4.1

dependabot bot commented on behalf of github on Apr 8, 2026

Bumps chardet from 5.2.0 to 7.4.1.

Release notes

Sourced from chardet's releases.

7.4.1

Bug Fixes

  • BOM-prefixed UTF-16/32 input now returns utf-16/utf-32 instead of utf-16-le/utf-16-be/utf-32-le/utf-32-be. The endian-specific codecs don't strip the BOM on decode, so callers were getting a stray U+FEFF at the start of their text. BOM-less detection is unchanged. (#364, #365)

Full Changelog: chardet/chardet@7.4.0...7.4.1

7.4.0

chardet 7.4.0 brings accuracy up to 99.3% (from 98.6% in 7.3.0) and a significantly faster cold start thanks to a new dense model format.

What's New

Performance:

  • New dense zlib-compressed model format (v2) drops cold start (import + first detect) from ~75ms to ~13ms with mypyc

Accuracy (98.6% → 99.3%):

  • Eliminated train/test data overlap via content fingerprinting
  • Added MADLAD-400 and Wikipedia as supplemental training sources
  • Improved non-ASCII bigram scoring: high-byte bigrams are now preserved during training and weighted by per-bigram IDF
  • Encoding-aware substitution filtering (substitutions only apply for characters the target encoding can't represent)
  • Increased training samples from 15K to 25K per language/encoding pair
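As a rough illustration of the per-bigram IDF weighting mentioned above, the sketch below scores high-byte bigrams by how few encodings they appear in. It is not chardet's training code: `byte_bigrams`, `bigram_idf`, the sample data, and the plain `log(N / df)` formula are all assumptions made for the example.

```python
import math
from collections import Counter


def byte_bigrams(data: bytes) -> set[bytes]:
    """Return the distinct two-byte windows that contain at least one high byte."""
    return {
        data[i:i + 2]
        for i in range(len(data) - 1)
        if data[i] >= 0x80 or data[i + 1] >= 0x80
    }


def bigram_idf(samples: dict[str, bytes]) -> dict[bytes, float]:
    """Weight each bigram by log(N / df), where df counts the encodings containing it."""
    doc_freq = Counter()
    for data in samples.values():
        doc_freq.update(byte_bigrams(data))
    total = len(samples)
    return {bigram: math.log(total / count) for bigram, count in doc_freq.items()}


# Hypothetical training text, encoded three ways.
samples = {
    "latin-1": "é ".encode("latin-1"),
    "cp1252": "é ".encode("cp1252"),
    "utf-8": "é ".encode("utf-8"),
}
idf = bigram_idf(samples)
```

Here the bigram `b"\xc3\xa9"` occurs only in the UTF-8 sample, so it gets a higher weight than `b"\xe9 "`, which two encodings share.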

Bug fixes:

  • Added dedicated structural analyzers for CP932, CP949, and Big5-HKSCS (these were previously sharing their base encoding's byte-range analyzer, missing extended ranges)

Metrics

| | chardet 7.4.0 (mypyc) | chardet 6.0.0 | charset-normalizer 3.4.6 |
| --- | --- | --- | --- |
| Accuracy (2,517 files) | 99.3% | 88.2% | 85.4% |
| Speed | 551 files/s | 12 files/s | 376 files/s |
| Language detection | 95.7% | 40.0% | 59.2% |

Full changelog: https://chardet.readthedocs.io/en/latest/changelog.html

7.3.0

License

  • 0BSD license — the project license has been changed from MIT to 0BSD, a maximally permissive license with no attribution requirement. All prior 7.x releases should also be considered 0BSD licensed as of this release.

Features

  • Added mime_type field to detection results — identifies file types for both binary (via magic number matching) and text content. Returned in all detect(), detect_all(), and UniversalDetector results. (#350)
  • New pipeline/magic.py module detects 40+ binary file formats including images, audio/video, archives, documents, executables, and fonts. ZIP-based formats (XLSX, DOCX, JAR, APK, EPUB, wheel, OpenDocument) are distinguished by entry filenames. (#350)
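To make "distinguished by entry filenames" concrete, here is a minimal sketch using only the standard library. It is not the code in `pipeline/magic.py`: the `MARKERS` table, `sniff_zip`, and `make_zip` are hypothetical names, and the real module may read entry names differently and covers more formats.

```python
import io
import zipfile

# Hypothetical marker table: entry-name prefix -> MIME type.
MARKERS = [
    ("word/", "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
    ("xl/", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"),
    ("META-INF/MANIFEST.MF", "application/java-archive"),
]


def sniff_zip(data: bytes) -> str:
    """Guess a MIME type for ZIP-based content from the names of its entries."""
    # A non-empty ZIP archive starts with the local file header magic number.
    if not data.startswith(b"PK\x03\x04"):
        return "application/octet-stream"
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = archive.namelist()
    for prefix, mime in MARKERS:
        if any(name.startswith(prefix) for name in names):
            return mime
    return "application/zip"


def make_zip(*names: str) -> bytes:
    """Build an in-memory ZIP with empty entries, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, b"")
    return buffer.getvalue()
```

For instance, `sniff_zip(make_zip("word/document.xml"))` yields the DOCX MIME type, while an archive holding only `readme.txt` falls back to `application/zip`.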

Bug Fixes

  • Fixed incorrect equivalence between UTF-16-LE and UTF-16-BE in accuracy testing — these are distinct encodings with different byte order, not interchangeable

Performance

... (truncated)

Changelog

Sourced from chardet's changelog.

7.4.1 (2026-04-07)

Bug Fixes:

  • BOM-prefixed UTF-16 and UTF-32 input now reports utf-16 and utf-32 instead of the endian-specific variants. Python's utf-16-le/utf-16-be/utf-32-le/utf-32-be codecs keep the BOM as a U+FEFF in the decoded string, while utf-16/utf-32 strip it, so callers passing the detection result directly to .decode() were getting a stray BOM at the start of their text. BOM-less UTF-16/32 detection (via null-byte patterns) is unchanged and still returns the endian-specific name. ([Dan Blanchard](https://github.com/dan-blanchard) via Claude, [#364](https://github.com/chardet/chardet/issues/364), [#365](https://github.com/chardet/chardet/pull/365))
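The codec behavior behind this fix can be reproduced with the standard library alone, without chardet. The sample bytes are made up for illustration.

```python
import codecs

# UTF-16 little-endian bytes for "hi", prefixed with a byte order mark.
data = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")

# The endian-specific codec decodes the BOM as an ordinary character.
kept = data.decode("utf-16-le")

# The generic codec consumes the BOM and uses it to choose the byte order.
stripped = data.decode("utf-16")

print(repr(kept))      # '\ufeffhi'
print(repr(stripped))  # 'hi'
```

Code that feeds the detected encoding name straight into `.decode()` therefore stops seeing the leading U+FEFF once the detector reports `utf-16` for BOM-prefixed input.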

7.4.0 (2026-03-26)

Performance:

  • Switched to dense zlib-compressed model format (v2): models are now stored as contiguous memoryview slices of a single decompressed blob, eliminating per-model struct.unpack overhead. Cold start (import + first detect) dropped from ~75ms to ~13ms with mypyc. ([Dan Blanchard](https://github.com/dan-blanchard) via Claude, [#354](https://github.com/chardet/chardet/pull/354))
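The general idea of "one decompressed blob, many memoryview slices" might look like the sketch below. The layout is an assumption: `pack_models`, `load_models`, and the `(offset, length)` index are hypothetical and do not describe chardet's actual v2 file format.

```python
import zlib


def pack_models(models: dict[str, bytes]) -> tuple[bytes, dict[str, tuple[int, int]]]:
    """Concatenate model payloads, compress them once, and record where each one sits."""
    index: dict[str, tuple[int, int]] = {}
    chunks = []
    offset = 0
    for name, payload in models.items():
        index[name] = (offset, len(payload))
        chunks.append(payload)
        offset += len(payload)
    return zlib.compress(b"".join(chunks)), index


def load_models(compressed: bytes, index: dict[str, tuple[int, int]]) -> dict[str, memoryview]:
    """Decompress once, then hand out zero-copy slices instead of parsing each model."""
    blob = memoryview(zlib.decompress(compressed))
    return {name: blob[start:start + length] for name, (start, length) in index.items()}


# Hypothetical model payloads.
models = {"utf-8": bytes(range(4)), "cp932": b"\x09\x08"}
compressed, index = pack_models(models)
loaded = load_models(compressed, index)
```

Each value in `loaded` is a view into the same underlying buffer, so loading costs one `zlib.decompress` call plus cheap slicing rather than a separate unpacking step per model.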

Accuracy:

  • Accuracy improved from 98.6% to 99.3% (2499/2517 files) through a combination of training and scoring improvements:

    • Eliminated train/test data overlap by content-fingerprinting test suite articles and excluding them from training data ([#351](https://github.com/chardet/chardet/pull/351))
    • Added MADLAD-400 and Wikipedia as supplemental training sources to fill gaps left by exclusion filtering ([#351](https://github.com/chardet/chardet/pull/351))
    • Improved non-ASCII bigram scoring: high-byte bigrams are now preserved during training (instead of being crushed by global normalization), and weighted by per-bigram IDF so encoding-specific byte patterns contribute proportionally to how discriminative they are ([#352](https://github.com/chardet/chardet/pull/352))
    • Added encoding-aware substitution filtering: character substitutions during training now only apply for characters the target encoding cannot represent
    • Increased training samples from 15K to 25K per language/encoding pair ([Dan Blanchard](https://github.com/dan-blanchard) via Claude)
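The overlap-exclusion step in the first bullet can be sketched as follows. `fingerprint` and `drop_overlap` are hypothetical names, and the changelog does not say which hash or normalization chardet uses, so SHA-256 over lowercased, whitespace-collapsed text is an assumption.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash a normalized form of the text so trivial formatting differences don't matter."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def drop_overlap(training: list[str], test: list[str]) -> list[str]:
    """Keep only training documents whose fingerprint is absent from the test set."""
    held_out = {fingerprint(doc) for doc in test}
    return [doc for doc in training if fingerprint(doc) not in held_out]


# Hypothetical documents: the first training item duplicates a test article.
training_docs = ["Hello   World", "An unrelated article"]
test_docs = ["hello world"]
filtered = drop_overlap(training_docs, test_docs)
```

Filtering like this shrinks the training pool, which is consistent with the second bullet's note that supplemental sources were added to fill the gaps.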

... (truncated)

Commits
  • d9ae78d docs: changelog for 7.4.1
  • 2a54c68 Return utf-16/utf-32 (not -le/-be) when a BOM is present (#365)
  • c63c632 Address GitHub code quality findings and add missing test coverage
  • 1ad8e6a Revert "Add PyInstaller hook to collect mypyc shared runtime library (#359)" ...
  • 7fb0563 Add PyInstaller hook to collect mypyc shared runtime library (#359)
  • 2d75e6d Link to blogpost in README
  • e37cf3c fix: prevent dirty-tree version in Windows mypyc wheel builds
  • f9f5af2 Fix a couple errors in the changelog
  • 53755de chore: add .superpowers/ to .gitignore
  • 3a20df6 docs: update README examples with correct outputs
  • Additional commits viewable in compare view

dependabot bot added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update Python code) labels on Apr 8, 2026
@xsoar-bot

Docker Image Ready - Dev

Docker automatic build has deployed your docker image: devdemisto/google-vision-api:1.0.0.8138304
It is available now on docker hub at: https://hub.docker.com/r/devdemisto/google-vision-api/tags
Get started by pulling the image:

docker pull devdemisto/google-vision-api:1.0.0.8138304

Docker Metadata

  • Image Size: 115.89 MB
  • Image ID: sha256:af5ee3fb9ef0ddd56ac381a3f98d69460a32b691e9470aca0ea964435d655698
  • Created: 2026-04-08T14:24:27.302050041Z
  • Arch: linux/amd64
  • Command: ["python3"]
  • Environment:
    • PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
    • LANG=C.UTF-8
    • GPG_KEY=7169605F62C751356D054A26A821E680E5FA6305
    • PYTHON_VERSION=3.12.12
    • PYTHON_SHA256=fb85a13414b028c49ba18bbd523c2d055a30b56b18b92ce454ea2c51edc656c4
    • DOCKER_IMAGE=devdemisto/google-vision-api:1.0.0.8138304
  • Labels:
    • org.opencontainers.image.authors:Demisto <containers@demisto.com>
    • org.opencontainers.image.revision:8eedd0cfc1c723119806f7f74708b6ba1a4b1854
    • org.opencontainers.image.version:1.0.0.8138304

Bumps [chardet](https://github.com/chardet/chardet) from 5.2.0 to 7.4.1.
- [Release notes](https://github.com/chardet/chardet/releases)
- [Changelog](https://github.com/chardet/chardet/blob/main/docs/changelog.rst)
- [Commits](chardet/chardet@5.2.0...7.4.1)

---
updated-dependencies:
- dependency-name: chardet
  dependency-version: 7.4.1
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot bot force-pushed the dependabot/pip/docker/google-vision-api/chardet-7.4.1 branch from 8eedd0c to 493a3c0 on April 8, 2026 23:12
@xsoar-bot

Docker Image Ready - Dev

Docker automatic build has deployed your docker image: devdemisto/google-vision-api:1.0.0.8142274
It is available now on docker hub at: https://hub.docker.com/r/devdemisto/google-vision-api/tags
Get started by pulling the image:

docker pull devdemisto/google-vision-api:1.0.0.8142274

Docker Metadata

  • Image Size: 115.90 MB
  • Image ID: sha256:5f0b0a6e259bf169f9201dab69c99e4172b3fcadf76e0a3b6aa93735fc3f6058
  • Created: 2026-04-08T23:17:57.493749985Z
  • Arch: linux/amd64
  • Command: ["python3"]
  • Environment:
    • PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
    • LANG=C.UTF-8
    • GPG_KEY=7169605F62C751356D054A26A821E680E5FA6305
    • PYTHON_VERSION=3.12.12
    • PYTHON_SHA256=fb85a13414b028c49ba18bbd523c2d055a30b56b18b92ce454ea2c51edc656c4
    • DOCKER_IMAGE=devdemisto/google-vision-api:1.0.0.8142274
  • Labels:
    • org.opencontainers.image.authors:Demisto <containers@demisto.com>
    • org.opencontainers.image.revision:493a3c09a4cdfdd18706d732b722847d954f8292
    • org.opencontainers.image.version:1.0.0.8142274
