Skip to content

Unify ZAC_CHAR handling for cross-platform signedness differences#7

Closed
WJiangzhi wants to merge 1 commit into
masterfrom
codex/check-signed-vs-unsigned-char-compatibility
Closed

Unify ZAC_CHAR handling for cross-platform signedness differences#7
WJiangzhi wants to merge 1 commit into
masterfrom
codex/check-signed-vs-unsigned-char-compatibility

Conversation

@WJiangzhi
Copy link
Copy Markdown
Member

Motivation

  • Different platforms treat char as signed or unsigned which can cause mismatches or incorrect indexing when treating bytes as array indices, leading to crashes or incorrect behavior in trie/state-machine code.
  • The library heavily indexes trie tables using byte values (0..255), so a consistent unsigned-byte view is required to avoid signedness bugs across PC and mobile.
  • The single-header distribution must remain consistent with the main headers/sources to avoid divergence when consumers include the header-only variant.

Description

  • Introduced a platform-aware alias ZAC_CHAR (unsigned char on Android/ARM/AArch64, char elsewhere) and made ZAC_SV a std::basic_string_view<ZAC_CHAR> for C++17 or a compatible fallback class using ZAC_CHAR storage.
  • Updated internal loops to iterate over ZAC_CHAR (e.g. for (const ZAC_CHAR i : from)) and consistently cast characters to unsigned char for trie indexing (static_cast<unsigned char>(...)).
  • Added explicit reinterpret_cast<const char*> conversions when interacting with std::string APIs (e.g. std::string::append, outputs.emplace_back) to bridge the ZAC_CHAR* representation to char*-based APIs safely.
  • Applied the same changes to the single-header file ZACLib_single.hpp to keep the header-only distribution in sync with the library implementation.

Testing

  • Ran the build: cmake -S /workspace/ZACLib -B /workspace/ZACLib/build && cmake --build /workspace/ZACLib/build, which completed successfully (library and example built).

Codex Task

@WJiangzhi WJiangzhi added the codex Created by Codex AI label Mar 3, 2026 — with ChatGPT Codex Connector
@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical cross-platform compatibility issue where the signedness of the char type can vary, leading to incorrect behavior, particularly in trie-based indexing. By introducing a platform-aware ZAC_CHAR type and adapting string views and internal logic to consistently use unsigned character values for indexing, the changes ensure robust and predictable behavior across diverse environments like PC and mobile. The modifications also include necessary explicit casts to safely integrate with standard library string functions, thereby enhancing the library's portability and reliability.

Highlights

  • Platform-aware character type: Introduced a new type alias, ZAC_CHAR, which resolves to unsigned char on Android/ARM/AArch64 platforms and char elsewhere, to standardize character handling across different environments.
  • String view adaptation: Modified ZAC_SV (string_view equivalent) to use ZAC_CHAR as its underlying character type, ensuring consistency with the new platform-aware character definition.
  • Consistent trie indexing: Updated internal loops to iterate using ZAC_CHAR and explicitly cast characters to unsigned char when used for trie indexing, preventing signedness-related bugs.
  • Safe API bridging: Added explicit reinterpret_cast<const char*> conversions when interacting with std::string APIs to safely bridge between ZAC_CHAR* and char* representations.
  • Single-header synchronization: Applied all the aforementioned changes to ZACLib_single.hpp to maintain consistency between the main library and its header-only distribution.
Changelog
  • ZACLib/ZACLib.cpp
    • Updated character iteration loops to use ZAC_CHAR.
    • Applied reinterpret_cast<const char*> for outputs.emplace_back calls.
    • Applied reinterpret_cast<const char*> for result.append calls.
    • Ensured explicit static_cast<unsigned char> for trie indexing.
  • ZACLib/ZACLib_Types.hpp
    • Defined ZAC_CHAR as a platform-dependent character type.
    • Modified ZAC_SV to use ZAC_CHAR for its internal data pointer and methods.
    • Added reinterpret_cast for std::string to ZAC_SV conversions and std::strlen usage within ZAC_SV.
  • ZACLib/ZACLib_single.hpp
    • Defined ZAC_CHAR as a platform-dependent character type.
    • Modified ZAC_SV to use ZAC_CHAR for its internal data pointer and methods.
    • Added reinterpret_cast for std::string to ZAC_SV conversions and std::strlen usage within ZAC_SV.
    • Updated character iteration loops to use ZAC_CHAR.
    • Applied reinterpret_cast<const char*> for outputs.emplace_back calls.
    • Applied reinterpret_cast<const char*> for result.append calls.
    • Ensured explicit static_cast<unsigned char> for trie indexing.
Activity
  • The build process was successfully run using cmake -S /workspace/ZACLib -B /workspace/ZACLib/build && cmake --build /workspace/ZACLib/build, confirming that the library and example built without issues.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively addresses cross-platform character signedness issues by introducing a ZAC_CHAR type and updating the library to use it, ensuring correct behavior on platforms where char is unsigned. The use of static_cast for trie indexing and reinterpret_cast for API compatibility is appropriate. However, I've identified a critical issue in the C++17 implementation of ZAC_SV. Aliasing it directly to std::basic_string_view<ZAC_CHAR> breaks compilation on platforms like ARM where ZAC_CHAR is unsigned char, as it prevents construction from std::string or string literals. I have provided suggestions to resolve this by using a wrapper class, which will restore API compatibility.

Comment thread ZACLib/ZACLib_Types.hpp
namespace ZACLib {
#if __cplusplus >= 201703L
using ZAC_SV = std::string_view;
using ZAC_SV = std::basic_string_view<ZAC_CHAR>;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

On platforms where ZAC_CHAR is unsigned char (e.g., ARM/Android), aliasing ZAC_SV directly to std::basic_string_view<ZAC_CHAR> breaks API compatibility for C++17 and later. Code that passes string literals or std::string objects to functions expecting a ZAC_SV will fail to compile because there's no implicit conversion from std::string (which uses char) to std::basic_string_view<unsigned char>.

To fix this while retaining the benefits of std::string_view, ZAC_SV can be a wrapper class that inherits from std::basic_string_view<ZAC_CHAR> and provides the necessary constructor from std::string.

class ZAC_SV : public std::basic_string_view<ZAC_CHAR> {
    public:
        using std::basic_string_view<ZAC_CHAR>::basic_string_view;
        ZAC_SV() = default;
        ZAC_SV(const std::string& s) noexcept
            : std::basic_string_view<ZAC_CHAR>(reinterpret_cast<const ZAC_CHAR*>(s.data()), s.size())
        {}
    };

Comment thread ZACLib/ZACLib_single.hpp
namespace ZACLib {
#if __cplusplus >= 201703L
using ZAC_SV = std::string_view;
using ZAC_SV = std::basic_string_view<ZAC_CHAR>;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

On platforms where ZAC_CHAR is unsigned char (e.g., ARM/Android), aliasing ZAC_SV directly to std::basic_string_view<ZAC_CHAR> breaks API compatibility for C++17 and later. Code that passes string literals or std::string objects to functions expecting a ZAC_SV will fail to compile because there's no implicit conversion from std::string (which uses char) to std::basic_string_view<unsigned char>.

To fix this while retaining the benefits of std::string_view, ZAC_SV can be a wrapper class that inherits from std::basic_string_view<ZAC_CHAR> and provides the necessary constructor from std::string.

class ZAC_SV : public std::basic_string_view<ZAC_CHAR> {
    public:
        using std::basic_string_view<ZAC_CHAR>::basic_string_view;
        ZAC_SV() = default;
        ZAC_SV(const std::string& s) noexcept
            : std::basic_string_view<ZAC_CHAR>(reinterpret_cast<const ZAC_CHAR*>(s.data()), s.size())
        {}
    };

@WJiangzhi WJiangzhi closed this Mar 6, 2026
@WJiangzhi WJiangzhi deleted the codex/check-signed-vs-unsigned-char-compatibility branch March 6, 2026 04:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

codex Created by Codex AI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant