feat(factory): put few known_issues cards into factory and add log #21
Open
Dong1017 wants to merge 1 commit into vigo999:refactor-arch-4 from
Summary
Expand `incubating/factory/cards/known_issues` with a first batch of high-frequency runtime failure cards so Factory can route common Ascend/GPU failures before deeper reasoning.
What Changed
Added these `known_issue` cards:
- `missing-cann-environment`
- `device-out-of-memory`
- `distributed-communication-timeout`
- `ms-context-empty`
- `ms-tbe-operator-compilation-error`
- `stack-version-mismatch`
Also updated the Factory manifest inventory in `incubating/factory/manifests/pack.yaml` so the new cards are included in the stable pack.
Why
The existing `known_issues` set covered only a small slice of failure signatures. In practice, many first-line diagnosis requests cluster around:
These are strong candidates for Factory because they are:
Card Design Notes
Each card is kept in the current `known_issue` schema shape and includes:
- `id`
- `symptom: failure`
The intent is to improve triage and reuse, not to encode full repair playbooks.
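To make the card shape concrete, here is a minimal sketch of a shape check in Python. Only the `id` field and the `symptom: failure` value come from the description above. The function name `check_card_shape`, the `REQUIRED_FIELDS` tuple, and the idea that a card loads into a plain dict are assumptions for illustration, not the repository's actual validator.

```python
# Hypothetical shape check. Only `id` and `symptom: failure` are taken from
# the PR description; every other name here is an illustrative assumption.
REQUIRED_FIELDS = ("id", "symptom")


def check_card_shape(card):
    """Return a list of problems for one card; an empty list means it passes."""
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in card
    ]
    # The cards in this batch are described as carrying `symptom: failure`.
    if "symptom" in card and card["symptom"] != "failure":
        problems.append(f"unexpected symptom: {card['symptom']!r}")
    return problems


# A card reduced to the two fields named in the description.
example_card = {"id": "device-out-of-memory", "symptom": "failure"}
print(check_card_shape(example_card))
```

A check this small fits the stated intent: it confirms a card can be triaged and reused without requiring it to carry a full repair playbook.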
Impact
This increases the `known_issues` inventory in the pack and gives Factory better coverage for bootstrap/runtime failures that appear before model- or operator-specific analysis.
Expected user-facing effect:
Validation
- `known_issue` card shape.
- `incubating/factory/manifests/pack.yaml` includes the new card entries and updated `known_issues` count.
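The manifest check above could be automated along these lines. This is a sketch under stated assumptions: the real layout of `pack.yaml` is not shown in this PR, so the `known_issues` section with `count` and `cards` keys is hypothetical, and the manifest is represented as an already-parsed dict rather than read from disk.

```python
# Card ids added in this PR, as listed under "What Changed".
NEW_CARD_IDS = [
    "missing-cann-environment",
    "device-out-of-memory",
    "distributed-communication-timeout",
    "ms-context-empty",
    "ms-tbe-operator-compilation-error",
    "stack-version-mismatch",
]


def check_manifest(manifest, expected_ids=NEW_CARD_IDS):
    """Return a list of problems; an empty list means the manifest is consistent.

    Assumes a hypothetical layout: manifest["known_issues"] holds a `cards`
    list of card ids and a `count` integer. The real pack.yaml may differ.
    """
    section = manifest.get("known_issues", {})
    listed = section.get("cards", [])
    problems = [f"card not listed: {cid}" for cid in expected_ids if cid not in listed]
    declared = section.get("count")
    if declared != len(listed):
        problems.append(f"count is {declared}, but {len(listed)} cards are listed")
    return problems


# Example: a manifest that lists all six cards but carries a stale count.
stale = {"known_issues": {"count": 5, "cards": list(NEW_CARD_IDS)}}
print(check_manifest(stale))
```

Checking the declared count against the listed entries catches the case where a card is added to the inventory but the count is not updated with it.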