Update and Clean Up Project Files #11
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.
This pull request significantly improves the project's build-system flexibility and cross-platform compatibility. It adds dedicated build configurations for the Android NDK, Make, and Visual Studio, and removes platform-specific examples that hindered portability. A major refactoring of the …
Code Review
This pull request introduces a number of valuable updates, including improved build system support for NDK, Make, and Visual Studio, and refactoring to handle character types more robustly. However, a security vulnerability was identified in the Replace::AddRule function due to an explicit narrowing cast from size_t to int. This cast can lead to an integer overflow if a large number of rules are added, potentially resulting in out-of-bounds memory access. Additionally, other critical and high-severity issues include an incorrect Makefile configuration attempting to build an executable instead of a static library, and further instances of incorrect casting between char* and unsigned char* which affects portability. Detailed comments and suggestions have been provided for each issue.
$(TARGET): $(OBJ)
	$(CXX) $(OBJ) -o $(TARGET)
const auto c = static_cast<unsigned char>(i);
if (trie[node].next[c] == -1) {
trie[node].next[c] = trie.size(); // NOLINT(*-narrowing-conversions)
trie[node].next[c] = static_cast<Node::value_type>(trie.size());
trie[node].output_id = static_cast<Node::value_type>(outputs.size());
trie[node].match_len = from.size();
outputs.emplace_back(to.data(), to.size());
outputs.emplace_back(ArmCastChar(to.data()), to.size());
The ArmCastChar macro is being used incorrectly here. outputs is a std::vector<std::string>, and std::string's constructor expects a const char*, not a const ZAC_CHAR* (which can be const unsigned char*). This can lead to non-portable behavior. You should reinterpret_cast to const char* when constructing the std::string.
Suggested change:
- outputs.emplace_back(ArmCastChar(to.data()), to.size());
+ outputs.emplace_back(reinterpret_cast<const char*>(to.data()), to.size());
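A minimal sketch of the suggested pattern, assuming the character type is configured as unsigned char (represented by the hypothetical alias ByteChar); the helpers AsChars, StoreOutput, and AppendSpan are illustrative names, not part of ZACLib. Because std::string's (pointer, count) constructor and append() take const char*, the byte pointer is converted explicitly at each call site:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumption: ByteChar stands in for ZAC_CHAR in its unsigned-char
// configuration. All names in this sketch are hypothetical.
using ByteChar = unsigned char;

// std::string's (pointer, count) constructor and append() take const char*,
// so the byte pointer is converted explicitly. char may alias any object,
// so reading the bytes through const char* is well-defined.
inline const char* AsChars(const ByteChar* p) {
    return reinterpret_cast<const char*>(p);
}

// Same shape as the suggested emplace_back fix: store a rule output.
inline void StoreOutput(std::vector<std::string>& outputs,
                        const ByteChar* data, std::size_t size) {
    outputs.emplace_back(AsChars(data), size);
}

// Same shape as the suggested append fix: copy the unmatched input
// span [last_pos, cursor) into the result.
inline void AppendSpan(std::string& result, const ByteChar* input,
                       std::size_t last_pos, std::size_t cursor) {
    result.append(AsChars(input + last_pos), cursor - last_pos);
}
```

The cast only reinterprets the pointer, so the bytes are copied unchanged; the same conversion applies to the from.data(), to.data(), and input.data() + last_pos call sites in this review.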
if (input.empty()) return result;
if (max_rule_len == 0) {
result.append(input.data(), input.size());
result.append(ArmCastChar(input.data()), input.size());
The ArmCastChar macro is used incorrectly here. std::string::append expects a const char*, but ArmCastChar(input.data()) may resolve to const unsigned char*, which is not portable. You should reinterpret_cast to const char*.
Suggested change:
- result.append(ArmCastChar(input.data()), input.size());
+ result.append(reinterpret_cast<const char*>(input.data()), input.size());
}

result.append(input.data() + last_pos, cursor - last_pos);
result.append(ArmCastChar(input.data() + last_pos), cursor - last_pos);
The ArmCastChar macro is used incorrectly here. std::string::append expects a const char*, but ArmCastChar(input.data() + last_pos) may resolve to const unsigned char*, which is not portable. You should reinterpret_cast to const char*.
Suggested change:
- result.append(ArmCastChar(input.data() + last_pos), cursor - last_pos);
+ result.append(reinterpret_cast<const char*>(input.data() + last_pos), cursor - last_pos);
trie[node].output_id = outputs.size();
trie[node].match_len = from.size();
outputs.emplace_back(from.data(), from.size());
outputs.emplace_back(ArmCastChar(from.data()), from.size());
The ArmCastChar macro is being used incorrectly here. outputs is a std::vector<std::string>, and std::string's constructor expects a const char*, not a const ZAC_CHAR* (which can be const unsigned char*). This can lead to non-portable behavior. You should reinterpret_cast to const char* when constructing the std::string.
Suggested change:
- outputs.emplace_back(ArmCastChar(from.data()), from.size());
+ outputs.emplace_back(reinterpret_cast<const char*>(from.data()), from.size());
for (const unsigned char c : from) {
if (trie[node].next[c] == -1) {
trie[node].next[c] = trie.size(); // NOLINT(*-narrowing-conversions)
trie[node].next[c] = static_cast<Node::value_type>(trie.size());
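The for (const unsigned char c : from) loop is what keeps a 256-entry next table safe to index. Plain char is signed on typical x86 targets but unsigned on ARM Linux and Android, so a loop over plain char can produce negative indices on one platform while appearing to work on the other. A sketch under stated assumptions: the Node layout (a std::array<int, 256> next table with -1 meaning "no child") and the Insert function are hypothetical, chosen to match the next[c] == -1 checks in the diff:

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node layout assumed for this sketch: a 256-way next table
// with -1 meaning "no child", matching the next[c] == -1 checks in the diff.
struct Node {
    using value_type = int;
    std::array<value_type, 256> next;
    Node() { next.fill(-1); }
};

// Inserts key into the trie (node 0 is the root) and returns the index of
// the node reached by its last byte.
inline int Insert(std::vector<Node>& trie, const std::string& key) {
    int node = 0;
    // Binding each byte to unsigned char keeps c in [0, 255] on every
    // platform. With plain char, bytes >= 0x80 are negative where char is
    // signed (typical x86) and would index outside next.
    for (const unsigned char c : key) {
        if (trie[node].next[c] == -1) {
            // Record the index the new node will occupy, then append it.
            trie[node].next[c] = static_cast<Node::value_type>(trie.size());
            trie.emplace_back();
        }
        node = trie[node].next[c];
    }
    return node;
}
```

Multi-byte UTF-8 input such as "\xE4\xBD\xA0" is where the difference shows up: every byte is at least 0x80.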
ZAC_SV(const std::string& s) : m_data(s.c_str()),
m_size(s.size()) {} // mimics std::string_view; implicit construction is not disabled
ZAC_SV(const ZAC_CHAR* d) : m_data(d),
m_size(d ? std::strlen(ArmCastChar(d)) : 0) {}
The strlen function expects a const char* argument, but you are passing it ArmCastChar(d). When CHAR_UNSIGNED is defined, d is a const unsigned char* and ArmCastChar is a no-op, resulting in non-portable code. You should cast d to const char* before passing it to strlen.
Suggested change:
- m_size(d ? std::strlen(ArmCastChar(d)) : 0) {}
+ m_size(d ? std::strlen(reinterpret_cast<const char*>(d)) : 0) {}
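A hypothetical string_view-like type (ByteView, not ZACLib's ZAC_SV) sketches both constructors from the diff, assuming the character type is unsigned char. The std::string constructor stays implicit, mirroring std::string_view, and the pointer is cast to const char* before it reaches std::strlen:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical view type modeled on the ZAC_SV constructors in the diff,
// assuming the character type is unsigned char.
class ByteView {
public:
    using char_type = unsigned char;

    // Implicit, like std::string_view, so a std::string converts directly.
    ByteView(const std::string& s)
        : m_data(reinterpret_cast<const char_type*>(s.c_str())),
          m_size(s.size()) {}

    // std::strlen takes const char*, so cast before calling it; a null
    // pointer yields an empty view instead of calling strlen on null.
    ByteView(const char_type* d)
        : m_data(d),
          m_size(d ? std::strlen(reinterpret_cast<const char*>(d)) : 0) {}

    const char_type* data() const { return m_data; }
    std::size_t size() const { return m_size; }

private:
    const char_type* m_data;
    std::size_t m_size;
};
```

Like std::string_view, the view does not own its bytes, so a ByteView built from a std::string must not outlive that string.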
if (from.size() > trie[node].match_len) {
trie[node].output_id = outputs.size(); // NOLINT(*-narrowing-conversions)
trie[node].output_id = static_cast<Node::value_type>(outputs.size());
In Replace::AddRule (ZACLib/ZACLib.cpp), a critical security vulnerability exists due to an explicit narrowing cast from outputs.size() (type std::size_t) to Node::value_type (which is int) when assigning to trie[node].output_id. This cast can lead to an integer overflow if the number of rules exceeds INT_MAX, potentially causing out-of-bounds memory access when indexing the outputs vector later in Replace::Do. This is inconsistent with other parts of the code like Search::AddRule where no such cast is performed. Removing this unnecessary cast will ensure correctness and safety.
Suggested change:
- trie[node].output_id = static_cast<Node::value_type>(outputs.size());
+ trie[node].output_id = outputs.size();
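One caveat: converting std::size_t to int narrows whether it happens through static_cast or implicitly on assignment, so once outputs.size() exceeds INT_MAX both forms produce the same out-of-range result. A minimal sketch of an explicit guard, assuming the index type is int as the comment states (CheckedIndex is a hypothetical helper name):

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>

// Hypothetical helper: converts a container size to an int index, assuming
// int is the trie's index type (Node::value_type per the review comment).
// Both static_cast<int>(n) and an implicit int x = n narrow silently; this
// version rejects sizes that do not fit instead.
inline int CheckedIndex(std::size_t n) {
    if (n > static_cast<std::size_t>(std::numeric_limits<int>::max())) {
        throw std::length_error("index does not fit in int");
    }
    return static_cast<int>(n);
}
```

At the call site this would read trie[node].output_id = CheckedIndex(outputs.size());, turning an oversized rule table into an exception rather than a corrupted index.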
SRC := ZACLib.cpp
OBJ := $(SRC:.cpp=.o)
TARGET := ZACLib
…in-casting Fix Makefile static library target and harden trie/index & char conversions
/gemini review
Code Review
This pull request introduces a number of valuable updates and cleanups. It enhances build system support for NDK, Make, and Visual Studio, removes non-cross-platform examples, and deprecates the single-header version. Key improvements include fixing incorrect unsigned char handling for better cross-platform compatibility and restoring C++17 std::string compatibility for ZAC_SV. I have one suggestion regarding the Visual Studio project file for improved robustness.
<ItemGroup>
  <ClInclude Include=".\**\*.h*" />
</ItemGroup>
unsigned char judgment
std::string compatibility
Fixes #9