Make string view and char handling platform-aware (introduce ZAC_CHAR) #8

Closed
WJiangzhi wants to merge 1 commit into master from codex/check-signed-vs-unsigned-char-compatibility-rty0rw

@WJiangzhi (Member)

Motivation

  • Fix incorrect assumptions about char signedness across platforms (Android/ARM) and eliminate UB from mixing char and unsigned char when indexing the trie and appending slices to std::string.
  • Provide a portable ZAC_SV abstraction so library APIs work correctly with both char and unsigned char storage across toolchains.
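
To make the signedness hazard concrete, here is a minimal sketch, not the library's code. Node, byte_index, and count_bytes are hypothetical names introduced only for illustration. Where plain char is signed, a byte such as 0xE4 is negative, so using it directly as an array index would read before the start of the table. Casting to unsigned char first gives an index in 0..255 on every platform.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical 256-way table standing in for one trie node.
struct Node {
    std::array<int, 256> next{};  // value-initialized, so every slot starts at 0
};

// Convert a byte stored in a plain char to an index in 0..255.
// Where char is signed, bytes >= 0x80 are negative, so using `c` directly
// as an index would address memory before the start of the array.
inline std::size_t byte_index(char c) {
    return static_cast<unsigned char>(c);
}

// Count how often each byte value occurs in `text`.
inline Node count_bytes(const std::string& text) {
    Node node;
    for (const char c : text) {
        ++node.next[byte_index(c)];
    }
    return node;
}
```

The cast is a no-op where char is already unsigned, so the same code is correct whichever signedness the toolchain picks.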

Description

  • Add a platform-aware alias ZAC_CHAR that is unsigned char on __ANDROID__/__arm__/__aarch64__ and char otherwise.
  • Replace usages of raw char in the string-view wrapper with ZAC_SV based on std::basic_string_view<ZAC_CHAR> for C++17 and a compatible custom ZAC_SV for pre-C++17, adding constructors that accept std::string and const char* with appropriate reinterpret_casts.
  • Update loops to iterate with for (const ZAC_CHAR i : from) and cast each element to unsigned char when computing trie indices, so that every index falls in 0..255.
  • Use reinterpret_cast<const char*> when constructing std::string outputs and when calling result.append(...) from ZAC_SV::data() to safely convert the library's internal view bytes into std::string bytes.
  • Apply the above changes consistently in both the multi-file implementation (ZACLib.cpp/ZACLib_Types.hpp) and the single-header variant (ZACLib_single.hpp).
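
The alias and the two casts described above can be sketched as follows. This is an illustration under assumptions, not the PR's diff. ByteView, make_view, and slice_to_string are hypothetical names, and the view is a plain pointer plus length rather than std::basic_string_view<ZAC_CHAR>. That choice is deliberate: the standard only requires std::char_traits specializations for char, wchar_t, char8_t, char16_t, and char32_t, so a pointer-based sketch avoids depending on char_traits<unsigned char>.

```cpp
#include <cstddef>
#include <string>

// Same platform condition as described in the PR.
#if defined(__ANDROID__) || defined(__arm__) || defined(__aarch64__)
using ZAC_CHAR = unsigned char;
#else
using ZAC_CHAR = char;
#endif

// Hypothetical stand-in for the library's view type: a pointer plus a length.
struct ByteView {
    const ZAC_CHAR* data;
    std::size_t size;
};

// View the bytes of a std::string as ZAC_CHAR without copying them.
inline ByteView make_view(const std::string& s) {
    return ByteView{reinterpret_cast<const ZAC_CHAR*>(s.data()), s.size()};
}

// Copy the slice [pos, pos + n) of the view into a std::string,
// clamping the slice to the bounds of the view.
inline std::string slice_to_string(ByteView v, std::size_t pos, std::size_t n) {
    std::string out;
    if (pos > v.size) return out;
    if (n > v.size - pos) n = v.size - pos;
    out.append(reinterpret_cast<const char*>(v.data) + pos, n);
    return out;
}
```

Both casts only reinterpret the pointer type. The bytes themselves are unchanged, so a slice copied out of the view compares equal to the corresponding bytes of the original string.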

Testing

  • Built the project locally with cmake --build .; the build completed successfully on the tested x86_64 toolchain.
  • No automated unit tests were added or modified in this change. Existing test suites were not altered by this PR.
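
Since no tests accompany the change, the following is one possible shape for a check that exercises every byte value. It is a sketch, not part of the PR. all_bytes and round_trips are hypothetical helpers, and the check uses unsigned char directly instead of ZAC_CHAR so that the unsigned path is exercised even on a host where the alias would presumably resolve to char.

```cpp
#include <cstddef>
#include <string>

// Build a string that contains every byte value 0..255 exactly once.
inline std::string all_bytes() {
    std::string s;
    for (int b = 0; b < 256; ++b) {
        s.push_back(static_cast<char>(static_cast<unsigned char>(b)));
    }
    return s;
}

// View the bytes as unsigned char, check each one against the original,
// then copy them back into a std::string and compare with the input.
inline bool round_trips(const std::string& in) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(in.data());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (p[i] != static_cast<unsigned char>(in[i])) return false;
    }
    const std::string out(reinterpret_cast<const char*>(p), in.size());
    return out == in;
}
```

A test runner could assert round_trips(all_bytes()) on each target toolchain, including an ARM one, to cover bytes at and above 0x80.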

Codex Task

@WJiangzhi added the codex label (Created by Codex AI) on Mar 3, 2026, with ChatGPT Codex Connector
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the library's character and string view handling by making it platform-aware. It addresses potential issues with char signedness, particularly on Android/ARM, by introducing a new ZAC_CHAR type and adapting the ZAC_SV string view wrapper. These changes eliminate undefined behavior and ensure consistent, portable string processing across various toolchains, enhancing the overall robustness of the library.

Highlights

  • Platform-Aware Character Type: Introduced a new type alias, ZAC_CHAR, which is defined as unsigned char on Android/ARM architectures and char otherwise, to correctly handle character signedness across different platforms.
  • Refactored String View (ZAC_SV): Modified the ZAC_SV string view wrapper to use ZAC_CHAR as its underlying character type, ensuring consistent and portable string view behavior.
  • Safe Type Conversions: Implemented explicit reinterpret_cast operations when converting between std::string and ZAC_SV, and when appending ZAC_SV data to std::string, to safely bridge ZAC_CHAR based internal views with char based standard library functions.
  • Consistent Character Iteration and Indexing: Updated character iteration loops and trie indexing logic to consistently use ZAC_CHAR and static_cast<unsigned char> for correct byte value interpretation, preventing undefined behavior related to char signedness.

Changelog

  • ZACLib/ZACLib.cpp
    • Updated character iteration loops to use ZAC_CHAR instead of char.
    • Applied reinterpret_cast<const char*> when calling outputs.emplace_back and result.append with ZAC_SV data.
    • Modified character indexing to explicitly cast to unsigned char for trie lookups.
  • ZACLib/ZACLib_Types.hpp
    • Defined ZAC_CHAR as a platform-dependent character type (unsigned char for Android/ARM, char otherwise).
    • Refactored ZAC_SV to be a std::basic_string_view<ZAC_CHAR> for C++17 or a custom class using ZAC_CHAR for older standards.
    • Added platform-specific constructors to ZAC_SV to handle std::string and const char* with appropriate reinterpret_cast for compatibility.
    • Adjusted ZAC_SV's internal data pointer, iterators, and operator[] to consistently use ZAC_CHAR.
  • ZACLib/ZACLib_single.hpp
    • Incorporated the ZAC_CHAR definition and ZAC_SV refactoring, mirroring changes in ZACLib_Types.hpp.
    • Applied updates to character iteration, reinterpret_cast for appends, and explicit unsigned char casting for indexing, consistent with changes in ZACLib.cpp.

Activity

  • No human activity (comments, reviews, or progress updates) has been recorded for this pull request since its creation.

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces platform-aware character handling by defining ZAC_CHAR and updating the ZAC_SV string view wrapper. The changes correctly address potential issues with char signedness across different platforms, which is a solid improvement. My review focuses on improving code consistency and maintainability. I've suggested adding noexcept to constructors where appropriate, using s.data() consistently, and simplifying preprocessor logic to reduce repetition. These changes are recommended for both ZACLib_Types.hpp and the single-header variant ZACLib_single.hpp.

Comment thread ZACLib/ZACLib_Types.hpp
Comment on lines +37 to +45
#if defined(__ANDROID__) || defined(__arm__) || defined(__aarch64__)
    ZAC_SV(const char* s, const size_t n)
        : std::basic_string_view<ZAC_CHAR>(reinterpret_cast<const ZAC_CHAR*>(s), n) {}

    ZAC_SV(const char* s)
        : std::basic_string_view<ZAC_CHAR>(
              reinterpret_cast<const ZAC_CHAR*>(s),
              s ? std::char_traits<char>::length(s) : 0) {}
#endif

medium

The preprocessor condition defined(__ANDROID__) || defined(__arm__) || defined(__aarch64__) is repeated multiple times in this file. To improve maintainability and avoid potential inconsistencies, consider defining a single macro for this check where ZAC_CHAR is defined, and reusing it.

For example:

// At file scope
#if defined(__ANDROID__) || defined(__arm__) || defined(__aarch64__)
#define ZAC_CHAR_IS_UNSIGNED 1
#else
#define ZAC_CHAR_IS_UNSIGNED 0
#endif

// ... then later
#if ZAC_CHAR_IS_UNSIGNED
// ...
#endif
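
Spelled out a little further, the suggestion might look like the sketch below. The macro and alias names follow the comment above. plain_char_is_unsigned is a hypothetical helper added for comparison: it reports what the toolchain actually does with plain char (via CHAR_MIN from <climits>), which need not match the platform list if the compiler's default has been overridden.

```cpp
#include <climits>
#include <type_traits>

// Evaluate the platform condition once, where ZAC_CHAR is defined.
#if defined(__ANDROID__) || defined(__arm__) || defined(__aarch64__)
#define ZAC_CHAR_IS_UNSIGNED 1
#else
#define ZAC_CHAR_IS_UNSIGNED 0
#endif

#if ZAC_CHAR_IS_UNSIGNED
using ZAC_CHAR = unsigned char;
#else
using ZAC_CHAR = char;
#endif

// Whichever branch is taken, ZAC_CHAR must remain a single byte.
static_assert(sizeof(ZAC_CHAR) == 1, "ZAC_CHAR must be one byte wide");

// Hypothetical helper: true when plain char is unsigned on this toolchain.
constexpr bool plain_char_is_unsigned() { return CHAR_MIN == 0; }

// The <climits> view and the <type_traits> view must agree.
static_assert(plain_char_is_unsigned() == std::is_unsigned<char>::value,
              "CHAR_MIN and std::is_unsigned<char> disagree");
```

Later code can then write #if ZAC_CHAR_IS_UNSIGNED instead of repeating the three-way defined(...) test.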

Comment thread ZACLib/ZACLib_Types.hpp
Comment on lines +64 to 65
ZAC_SV(const std::string& s) : m_data(reinterpret_cast<const ZAC_CHAR*>(s.c_str())),
                               m_size(s.size()) {} // Mimics std::string_view: implicit construction is allowed

medium

For consistency with std::string_view and the C++17 version of ZAC_SV, several constructors in the pre-C++17 implementation should be marked noexcept as they don't perform any operations that can throw. This includes this constructor, and the ones at lines 55 and 59.

Additionally, for consistency with the C++17 version and modern C++ practice, consider using s.data() instead of s.c_str() in this constructor.

        ZAC_SV(const std::string& s) noexcept : m_data(reinterpret_cast<const ZAC_CHAR*>(s.data())),
                                                m_size(s.size()) {} // Mimics std::string_view: implicit construction is allowed

Comment thread ZACLib/ZACLib_single.hpp
Comment on lines +52 to +60
#if defined(__ANDROID__) || defined(__arm__) || defined(__aarch64__)
    ZAC_SV(const char* s, const size_t n)
        : std::basic_string_view<ZAC_CHAR>(reinterpret_cast<const ZAC_CHAR*>(s), n) {}

    ZAC_SV(const char* s)
        : std::basic_string_view<ZAC_CHAR>(
              reinterpret_cast<const ZAC_CHAR*>(s),
              s ? std::char_traits<char>::length(s) : 0) {}
#endif

medium

The preprocessor condition defined(__ANDROID__) || defined(__arm__) || defined(__aarch64__) is repeated multiple times in this file. To improve maintainability and avoid potential inconsistencies, consider defining a single macro for this check where ZAC_CHAR is defined, and reusing it.

For example:

// At file scope
#if defined(__ANDROID__) || defined(__arm__) || defined(__aarch64__)
#define ZAC_CHAR_IS_UNSIGNED 1
#else
#define ZAC_CHAR_IS_UNSIGNED 0
#endif

// ... then later
#if ZAC_CHAR_IS_UNSIGNED
// ...
#endif

Comment thread ZACLib/ZACLib_single.hpp
Comment on lines +80 to 81
ZAC_SV(const std::string& s) : m_data(reinterpret_cast<const ZAC_CHAR*>(s.c_str())),
                               m_size(s.size()) {} // Mimics std::string_view: implicit construction is allowed

medium

For consistency with std::string_view and the C++17 version of ZAC_SV, several constructors in the pre-C++17 implementation should be marked noexcept as they don't perform any operations that can throw. This includes this constructor, and the ones at lines 71 and 75.

Additionally, for consistency with the C++17 version and modern C++ practice, consider using s.data() instead of s.c_str() in this constructor.

        ZAC_SV(const std::string& s) noexcept : m_data(reinterpret_cast<const ZAC_CHAR*>(s.data())),
                                                m_size(s.size()) {} // Mimics std::string_view: implicit construction is allowed

@WJiangzhi closed this on Mar 6, 2026
@WJiangzhi deleted the codex/check-signed-vs-unsigned-char-compatibility-rty0rw branch on March 6, 2026 at 04:28