Skip to content

Add built flag and fallback path for unbuilt tries in Search/Has#6

Closed
WJiangzhi wants to merge 2 commits into
masterfrom
codex/ensure-search-and-has-operate-like-replace
Closed

Add built flag and fallback path for unbuilt tries in Search/Has#6
WJiangzhi wants to merge 2 commits into
masterfrom
codex/ensure-search-and-has-operate-like-replace

Conversation

@WJiangzhi
Copy link
Copy Markdown
Member

Motivation

  • Ensure Search and Has can be used safely without calling Build() by providing a fallback runtime path when the trie hasn't been compacted (i.e. next entries may be -1).

Description

  • Add a bool built member to Search and Has and mark it false on AddRule() and true in Build() to track whether the trie was built.
  • Change Do() implementations to branch on built so that when not built they follow fail links and handle -1 next entries, and when built they use the fast direct next lookup.
  • Modify trie initialization in the single-header variant to construct the root via the constructor (emplace_back) instead of an inline initializer to keep initialization consistent with the new built handling.

Testing

  • No automated tests were executed for this change.

Codex Task

@WJiangzhi WJiangzhi added the codex Created by Codex AI label Mar 3, 2026 — with ChatGPT Codex Connector
@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request improves the robustness of the Search and Has trie implementations by introducing a state management mechanism. It ensures that these data structures can operate correctly and safely, even if the Build() method has not been explicitly invoked, by providing an appropriate runtime path for uncompacted tries. This change prevents potential issues with -1 next entries and enhances the overall reliability of the trie operations.

Highlights

  • Introduction of built flag: A new boolean member built has been added to both Search and Has classes to track whether the trie has been compacted by calling Build().
  • Conditional logic in AddRule() and Build(): The AddRule() method now sets built to false, indicating that the trie structure has changed and is no longer in a 'built' state. Conversely, the Build() method sets built to true upon completion.
  • Runtime fallback for unbuilt tries: The Do() implementations in both Search and Has now include conditional logic. If the trie is not built, they use a fallback path that correctly handles -1 next entries and follows fail links. If built is true, they use the optimized direct next lookup.
  • Consistent trie initialization: In the single-header variant, the trie root is now constructed via emplace_back instead of an inline initializer, ensuring consistency with the new built flag handling.
Changelog
  • ZACLib/ZACLib.cpp
    • Added built = false to Search::AddRule and Has::AddRule methods.
    • Added built = true to Search::Build and Has::Build methods.
    • Implemented conditional logic in Search::Do and Has::Do to use a fallback path when the trie is not built.
  • ZACLib/ZACLib.hpp
    • Introduced a bool built = false; member to the Search and Has class definitions.
  • ZACLib/ZACLib_single.hpp
    • Modified the initialization of the trie vector to remove direct Node{} initialization.
    • Added a bool built = false; member to the Search and Has class definitions.
    • Inserted built = false in Search::AddRule and Has::AddRule methods.
    • Inserted built = true in Search::Build and Has::Build methods.
    • Updated Search::Do and Has::Do methods to include conditional logic based on the built flag.
Activity
  • No human activity has been recorded on this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a built flag to the Search and Has classes to allow them to be used without an explicit call to Build(). However, a critical security vulnerability exists where the built flag is set at the start of the Build() method; if an exception occurs during the build, the trie can be left in an incomplete state while the flag is true, potentially leading to out-of-bounds memory access in Do() methods. Additionally, the fallback logic for un-built tries in Search::Do and Has::Do is flawed, as it incorrectly uses uninitialized fail links, resulting in an incorrect Aho-Corasick search. The absence of automated tests for this functionality is also a concern.

Comment thread ZACLib/ZACLib.cpp Outdated
Comment on lines 228 to 239
if (!built) {
while (state != 0 && trie[state].next[c] == -1) {
state = trie[state].fail;
}
if (trie[state].next[c] != -1) {
state = trie[state].next[c];
} else {
state = 0;
}
} else {
state = 0;
state = trie[state].next[c];
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The fallback logic for when the trie is not built (!built) is incorrect. It relies on fail links (trie[state].fail) to traverse the automaton, but these links are only computed within the Build() method. Before Build() is called, all fail links are 0 (as initialized by the Node constructor).

This means the !built path behaves as follows:

while (state != 0 && trie[state].next[c] == -1) {
    state = 0; // Since trie[state].fail is 0
}
if (trie[state].next[c] != -1) {
    state = trie[state].next[c];
} else {
    state = 0;
}

This degenerates the Aho-Corasick search into a simple trie search that resets to the root on a mismatch. This will fail to find many patterns that an Aho-Corasick automaton is designed to find (e.g., overlapping patterns).

For example, with patterns "ab" and "bc", and input "abc", it will find "ab" but miss "bc".

This is a critical correctness issue. The fallback path does not function as a proper Aho-Corasick search. The same issue exists in Has::Do.

Comment thread ZACLib/ZACLib.cpp Outdated
Comment on lines +301 to +312
if (!built) {
while (state != 0 && trie[state].next[c] == -1) {
state = trie[state].fail;
}
if (trie[state].next[c] != -1) {
state = trie[state].next[c];
} else {
state = 0;
}
} else {
state = trie[state].next[c];
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Similar to the issue in Search::Do, the fallback logic for !built is incorrect because it relies on fail links that have not been computed. This leads to incorrect search behavior and missed matches. Please see the comment on Search::Do for a detailed explanation.

Comment thread ZACLib/ZACLib_single.hpp Outdated
Comment on lines 311 to 322
if (!built) {
while (state != 0 && trie[state].next[c] == -1) {
state = trie[state].fail;
}
if (trie[state].next[c] != -1) {
state = trie[state].next[c];
} else {
state = 0;
}
} else {
state = 0;
state = trie[state].next[c];
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The fallback logic for when the trie is not built (!built) is incorrect. It relies on fail links (trie[state].fail) which are not computed until Build() is called. Before Build(), fail links are 0, causing the search to behave like a simple trie search and miss matches. This is a critical correctness issue. For a more detailed explanation, please refer to the comment on the same issue in ZACLib/ZACLib.cpp.

Comment thread ZACLib/ZACLib_single.hpp Outdated
Comment on lines +392 to +403
if (!built) {
while (state != 0 && trie[state].next[c] == -1) {
state = trie[state].fail;
}
if (trie[state].next[c] != -1) {
state = trie[state].next[c];
} else {
state = 0;
}
} else {
state = trie[state].next[c];
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The fallback logic for !built is incorrect for the same reasons as in Search::Do. It uses uninitialized fail links, leading to incorrect search results.

Comment thread ZACLib/ZACLib.cpp Outdated
}

void Search::Build() {
built = true;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The built flag is set to true at the beginning of the Build() method. If an exception occurs during the execution of Build() (e.g., std::bad_alloc when allocating the queue or during trie traversal), the built flag will remain true even though the trie has not been fully constructed. Subsequent calls to Do() will then use the direct next lookup path, which assumes all next entries are valid indices. Since the trie is incomplete, some next entries may still be -1, leading to an out-of-bounds access when trie[state] is called with state = -1.

    void Search::Build() {
        std::queue<int> q;
        for (int c = 0; c < 256; ++c) {
            int nxt = trie[0].next[c];
            if (nxt != -1) {
                trie[nxt].fail = 0;
                q.push(nxt);
            } else trie[0].next[c] = 0;
        }

        while (!q.empty()) {
            const int u = q.front();
            q.pop();
            for (int c = 0; c < 256; ++c) {
                int v = trie[u].next[c];
                if (v != -1) {
                    trie[v].fail = trie[trie[u].fail].next[c];

                    int f = trie[v].fail;
                    if (trie[f].match_len > trie[v].match_len) {
                        trie[v].match_len = trie[f].match_len;
                        trie[v].output_id = trie[f].output_id;
                    }

                    q.push(v);
                } else {
                    trie[u].next[c] = trie[trie[u].fail].next[c];
                }
            }
        }
        built = true;
    }

Comment thread ZACLib/ZACLib.cpp Outdated
}

void Has::Build() {
built = true;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

Similar to the Search::Build implementation, the built flag is set to true before the trie is actually built. If the build process fails, the object is left in an inconsistent state where built is true but the trie contains -1 entries, leading to out-of-bounds access in Do().

    void Has::Build() {
        std::queue<int> q;
        for (int c = 0; c < 256; ++c) {
            int nxt = trie[0].next[c];
            if (nxt != -1) {
                trie[nxt].fail = 0;
                q.push(nxt);
            } else {
                trie[0].next[c] = 0;
            }
        }

        while (!q.empty()) {
            const int u = q.front(); q.pop();
            for (int c = 0; c < 256; ++c) {
                int v = trie[u].next[c];
                if (v != -1) {
                    trie[v].fail = trie[trie[u].fail].next[c];
                    q.push(v);
                } else {
                    trie[u].next[c] = trie[trie[u].fail].next[c];
                }
            }
        }
        built = true;
    }

Comment thread ZACLib/ZACLib_single.hpp Outdated
}

void Build() {
built = true;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The built flag is set to true at the beginning of the Build() method in the single-header variant as well. This should be moved to the end of the method to ensure the trie is fully built before the flag is set.

        void Build() {
            std::queue<int> q;
            for (int c = 0; c < 256; ++c) {
                int nxt = trie[0].next[c];
                if (nxt != -1) {
                    trie[nxt].fail = 0;
                    q.push(nxt);
                } else trie[0].next[c] = 0;
            }
            while (!q.empty()) {
                const int u = q.front();
                q.pop();
                for (int c = 0; c < 256; ++c) {
                    int v = trie[u].next[c];
                    if (v != -1) {
                        trie[v].fail = trie[trie[u].fail].next[c];
                        const int f = trie[v].fail;
                        if (trie[f].match_len > trie[v].match_len) {
                            trie[v].match_len = trie[f].match_len;
                            trie[v].output_id = trie[f].output_id;
                        }
                        q.push(v);
                    } else {
                        trie[u].next[c] = trie[trie[u].fail].next[c];
                    }
                }
            }
            built = true;
        }

Comment thread ZACLib/ZACLib_single.hpp Outdated
}

void Build() {
built = true;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The built flag is set to true at the beginning of the Build() method for the Has class in the single-header variant. This should be moved to the end of the method.

        void Build() {
            std::queue<int> q;
            for (int c = 0; c < 256; ++c) {
                int nxt = trie[0].next[c];
                if (nxt != -1) {
                    trie[nxt].fail = 0;
                    q.push(nxt);
                } else {
                    trie[0].next[c] = 0;
                }
            }

            while (!q.empty()) {
                const int u = q.front();
                q.pop();
                for (int c = 0; c < 256; ++c) {
                    int v = trie[u].next[c];
                    if (v != -1) {
                        trie[v].fail = trie[trie[u].fail].next[c];
                        q.push(v);
                    } else {
                        trie[u].next[c] = trie[trie[u].fail].next[c];
                    }
                }
            }
            built = true;
        }

@WJiangzhi
Copy link
Copy Markdown
Member Author

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a built flag to the Search and Has classes, enabling a fallback mechanism for Do() operations when the Aho-Corasick trie has not been explicitly built via Build(). The built flag is correctly managed, being set to false on AddRule() and true on Build(). The optimized Do() path for built tries is also correctly implemented. However, the fallback paths for unbuilt tries in both Search::Do and Has::Do implement a brute-force string matching algorithm. This approach is highly inefficient and deviates from the stated motivation of handling uncompacted trie states by leveraging the trie structure and fail links, instead bypassing the trie entirely.

Comment thread ZACLib/ZACLib.cpp
Comment on lines +227 to +245
if (!built) {
for (size_t i = 0; i < input.size(); ++i) {
size_t best_len = 0;
size_t best_rule = Node::kInvalidOutput;
for (size_t rule_id = 0; rule_id < outputs.size(); ++rule_id) {
const auto& rule = outputs[rule_id];
const size_t len = rule.size();
if (len == 0 || len > i + 1) continue;
if (std::memcmp(input.data() + i + 1 - len, rule.data(), len) == 0 && len > best_len) {
best_len = len;
best_rule = rule_id;
}
}

if (best_rule != Node::kInvalidOutput) {
result.push_back(Match{i + 1 - best_len, best_len, best_rule});
}
}
return result;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The fallback path for !built in Search::Do performs a brute-force string comparison for each rule against the input. This approach is highly inefficient, with a time complexity of O(N * M * L) (where N is input length, M is number of rules, L is max rule length). The original Aho-Corasick algorithm, even in an uncompacted state, would typically offer better performance by leveraging the trie structure and fail links. The current implementation completely bypasses the trie for unbuilt states, which contradicts the motivation of handling -1 next entries and following fail links in an uncompacted trie. Consider implementing a less optimized, but still trie-based, search for unbuilt tries, or explicitly documenting this performance characteristic.

Comment thread ZACLib/ZACLib.cpp
Comment on lines +312 to +320
if (!built) {
for (size_t i = 0; i < input.size(); ++i) {
for (const auto& rule : outputs) {
const size_t len = rule.size();
if (len == 0 || len > i + 1) continue;
if (std::memcmp(input.data() + i + 1 - len, rule.data(), len) == 0) return true;
}
}
return false;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The fallback path for !built in Has::Do also performs a brute-force string comparison for each rule. This is highly inefficient, with a time complexity of O(N * M * L) (where N is input length, M is number of rules, L is max rule length). This approach completely bypasses the trie structure and its potential, even in an unbuilt state, for more efficient searching. Consider implementing a trie-based search for unbuilt tries, or explicitly documenting this performance characteristic.

Comment thread ZACLib/ZACLib_single.hpp
Comment on lines +309 to +327
if (!built) {
for (size_t i = 0; i < input.size(); ++i) {
size_t best_len = 0;
size_t best_rule = Node::kInvalidOutput;
for (size_t rule_id = 0; rule_id < outputs.size(); ++rule_id) {
const auto& rule = outputs[rule_id];
const size_t len = rule.size();
if (len == 0 || len > i + 1) continue;
if (std::memcmp(input.data() + i + 1 - len, rule.data(), len) == 0 && len > best_len) {
best_len = len;
best_rule = rule_id;
}
}

if (best_rule != Node::kInvalidOutput) {
result.push_back(Match{i + 1 - best_len, best_len, best_rule});
}
}
return result;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The fallback path for !built in Search::Do (single-header version) performs a brute-force string comparison, which is highly inefficient (O(N * M * L)). This bypasses the trie structure entirely for unbuilt states, which contradicts the stated motivation of handling -1 next entries and following fail links in an uncompacted trie. Consider a trie-based search for unbuilt tries, or clearly document this performance characteristic.

Comment thread ZACLib/ZACLib_single.hpp
Comment on lines +402 to +410
if (!built) {
for (size_t i = 0; i < input.size(); ++i) {
for (const auto& rule : outputs) {
const size_t len = rule.size();
if (len == 0 || len > i + 1) continue;
if (std::memcmp(input.data() + i + 1 - len, rule.data(), len) == 0) return true;
}
}
return false;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The fallback path for !built in Has::Do (single-header version) performs a brute-force string comparison, which is highly inefficient (O(N * M * L)). This bypasses the trie structure entirely for unbuilt states, which contradicts the stated motivation of handling -1 next entries and following fail links in an uncompacted trie. Consider a trie-based search for unbuilt tries, or clearly document this performance characteristic.

@WJiangzhi WJiangzhi closed this Mar 6, 2026
@WJiangzhi WJiangzhi deleted the codex/ensure-search-and-has-operate-like-replace branch March 6, 2026 04:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

codex Created by Codex AI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant