facebook · meta-codesync · May 23, 2026
@@ -0,0 +1,81 @@
+---
+name: fboss-code-standards
+description: FBOSS coding standards and patterns. Auto-loaded when writing code in fboss/ to catch architecture violations, SAI/SDK misuse, thrift_cow pitfalls, platform config errors, and testing gaps. For explicit multi-reviewer review, use /fboss-review instead.
+---
+
+# FBOSS Code Standards
+
+## Overview
+
+Passive coding guidance for FBOSS. Applied automatically while writing or modifying code under `fboss/`.
+
+## Scope
+
+Currently `fboss/` only. TODO: extend to `configerator/source/neteng/fboss`, `neteng/netcastle`, `neteng/fboss`.
+
+## Quick Checklist
+
+| Area | Pattern | Check |
+|------|---------|-------|
+| Agent | Mono/multi-switch duality | State changes must work in both modes |
+| Agent | Warmboot serialization | New SwitchState fields must serialize/deserialize |
+| SAI | SaiApiTable registration | New SAI attributes must be registered |
+| SAI | SaiStore consistency | SAI objects must be tracked, no orphans |
+| FSDB | State/Stats duality | Always handle both trees |
+| FSDB | extern template | New ThriftStructNode instantiations need extern template |
+| thrift_cow | COW modification | Use `modify()`, never mutate shared nodes |
+| Platform | JSON + Thrift sync | Config changes update both schemas |
+| Testing | Naming convention | Follow `AgentHw<Feature>Test` pattern |
+| Testing | NSDB impact | Core FSDB changes must run NSDB tests |
+| General | Follow local patterns | Be consistent with existing code in the directory |
+| Testing | No GTEST_SKIP | Use ProductionFeatures filtering, never `GTEST_SKIP()` |
+| Testing | DSF counters | Check reassembly errors on fabric ports, not discards |
+| Agent | Non-coalescing behavioral deltas | Mark delta-significant updates non-coalescing |
+| Agent | Rolled-out flag cleanup | Remove feature flags that are fully rolled out |
+| Agent | Early return on empty deltas | Check for empty deltas and return early |
+| Agent | No hardcoded ASIC types | Use feature/property lookups, not ASIC name checks |
+| SAI | Explicit cancellation | Distinguish reconnectable errors from shutdown signals |
+| General | No private fn defaults | Don't default parameters all callers override |
+| General | Explicit state flags | Pass all enable/disable flags, no direction assumptions |
+| General | Verify before deleting | Provide evidence files are unused before deleting |
+| General | CHECK over assert | Use CHECK for production invariants, not assert() |
+| General | emplace over operator[] | Use emplace/insert for map insertions |
+| General | ASIC feature gating | Gate HW-specific constraints with ASIC features |
+| General | Concise method naming | Don't include class name in method name |
+| General | Explicit operator precedence | Parenthesize compound boolean/comparison expressions |
+| General | Inline trivial comparisons | Don't wrap one-line checks in helper methods |
+| General | Fix lint before landing | Run `arc f` and `arc lint` before submitting |
+| Testing | Targeted unit tests | New modules need dedicated unit tests, not just HW tests |
+| Agent | Documentation-value methods | Keep per-ASIC limit methods as architecture docs |
+| General | Rate-limit verbose logging | Use XLOG_EVERY_MS, never log at line rate |
+| General | Perspective-aware naming | Use fromX/toX instead of ambiguous Rx/Tx |
+| General | Update OSS build files | Update CMake alongside BUCK when adding/moving files |
+| Testing | Deep comparison over size checks | Compare content, not just collection size |
+| Testing | Verify negative cases | Explicitly check absence, not just no-throw |
+| Testing | Test overflow and error paths | Cover resource exhaustion, not just happy path |
+| Testing | Optimize test execution time | Use smallest packet sizes, avoid unnecessary waits |
+| Agent | ODS counters for debugging | Add ODS counters for drop/error events, not just logs |
+| Agent | Validate external config limits | Reject unreasonable externally configurable values |
+| Agent | Populate warmboot cache first | Fill all caches from warmboot state before processing |
+| General | Empty vector over optional\<vector\> | Use empty vector, not optional\<vector\> |
+| General | Map-based config | Use map lookups, not if-else chains for platform values |
+| General | Log IDs + names | Include both numeric ID and name in log messages |
+| Agent | Delta processing order | Remove → Change → Add; document in comments |
+| Agent | Validator extraction | Extract complex validation into dedicated classes |
+| Agent | Gate validations with flags | New enforcement behind FLAGS for gradual rollout |
+| Testing | Test transient states | Test intermediate invalid states, not just final |
+| Testing | Consolidate test helpers | Move shared helpers to base classes |
+| Testing | Validation UTs | Pure validation → unit tests, not just HW tests |
+| Testing | No setup changes to existing tests | Create new tests instead; setup changes break warmboot roundtrip CI |
+
+## When to Load References
+
+| Topic | Action |
+|-------|--------|
+| Team coding conventions (C++ style, control flow, naming) | Read `references/general-patterns.md` |
+| Agent code (SwSwitch, HwSwitch, mono/multi-switch, warmboot) | Read `references/agent-patterns.md` |
+| SAI/SDK layer (SaiApi, SaiStore, SaiManager, vendor SDK) | Read `references/sai-sdk-patterns.md` |
+| FSDB or thrift_cow (subscriptions, COW nodes, PatchBuilder) | Read `references/thrift-cow-fsdb-patterns.md` |
+| Platform services or config (sensor_service, fan_service, JSON configs) | Read `references/platform-config-patterns.md` |
+| Tests (agent HW tests, multinode tests, naming, fixtures) | Read `references/testing-patterns.md` |
+| Adding new patterns | Read `references/contributing.md` |
@@ -0,0 +1,92 @@
+# Agent Architecture Patterns
+
+Patterns for code under `fboss/agent/`, including SwSwitch, HwSwitch, and state management.
+
+### 1. Mono/Multi-Switch Duality
+**Check**: Any state change or control path must work in both mono mode (`FLAGS_multi_switch=false`, `agent/single/`) and multi-switch mode (`FLAGS_multi_switch=true`, `agent/mnpu/`)
+**Why**: FBOSS supports both single-process and multi-process architectures. Code that only works in one mode causes production failures on the other deployment type.
+**Confidence**: HIGH (always a bug if only one mode is handled)
+
+### 2. Warmboot State Preservation
+**Check**: New fields added to SwitchState or its child nodes must be serialized and deserialized during warmboot. Verify the field appears in the warmboot state Thrift structure and is handled in `fromThrift()`/`toThrift()`.
+**Why**: Missing serialization causes state loss during warmboot, leading to traffic disruption or config drift after agent restart.
+**Confidence**: HIGH (always a bug if new state fields are not serialized)
+
+### 3. OperDelta Sync in Multi-Switch
+**Check**: In multi-switch mode, HwSwitch state updates must go through OperDelta sync (`OperDeltaSyncer`), never by direct HwSwitch mutation from SwSwitch.
+**Why**: Direct mutation bypasses the Thrift IPC boundary, causing state divergence between SwSwitch and HwSwitch processes.
+**Confidence**: HIGH (always a bug in multi-switch mode)
+
+### 4. State Delta Ordering
+**Check**: When applying state deltas, respect dependency order: VLANs before interfaces, interfaces before routes, ACLs before applying to interfaces.
+**Why**: Out-of-order delta application causes transient failures or crashes when referenced objects don't exist yet.
+**Confidence**: HIGH (ordering violations cause crashes)
+
+### 5. Lock Ordering in SwSwitch
+**Check**: When acquiring multiple locks, follow the established lock hierarchy. Never hold `updateLock_` while acquiring `stateLock_`; `stateLock_` must be acquired first if both are needed.
+**Why**: Lock ordering violations cause deadlocks that hang the agent.
+**Confidence**: HIGH (always a bug)
+
+### 6. Non-Coalescing Updates for Behavioral Deltas
+**Check**: State updates where the delta carries behavioral significance (e.g., deltaApplicationBehavior is set) must use non-coalescing update paths. Do not allow the state update framework to coalesce such deltas.
+**Why**: Coalesced updates collapse intermediate states. A sequence of enable→disable→enable is different from just enable, but coalescing reduces it to the final state, losing behavioral intent.
+**Confidence**: HIGH (always a bug if coalesced when delta behavior matters)
+
+### 7. Clean Up Fully Rolled-Out Feature Flags
+**Check**: When a feature is fully rolled out across all platforms, remove the feature flag and the dead code path. Do not leave conditional checks for flags that always evaluate to true.
+**Why**: Dead flag branches add complexity, confuse readers, and accumulate debt. If every platform has the feature, the flag serves no purpose.
+**Confidence**: MEDIUM (requires confirming rollout is complete on all platforms)
+
+### 8. Early Return on Empty Deltas
+**Check**: Delta processing functions must check for empty deltas at the start and return early. Do not process all state updates when only a specific delta changed.
+**Why**: Processing empty deltas wastes CPU and can trigger unnecessary side effects (e.g., recomputing local switch IDs on every state update instead of only when DSF node map changes).
+**Confidence**: HIGH (always add an early return for empty deltas)
+
+### 9. Don't Hardcode ASIC Types
+**Check**: Do not check for specific ASIC types (e.g., `if asicType == J3`) when the information can be derived from config, DSF node map, or ASIC feature flags. Look up ASIC properties dynamically.
+**Why**: Hardcoded ASIC checks break when new ASICs are added and create maintenance burden. Use feature-based gating instead.
+**Confidence**: HIGH (always use feature/property lookups over hardcoded ASIC names)
+
+### 10. Retain Documentation-Value Per-ASIC Methods
+**Check**: Keep per-ASIC limit methods or constants that document platform-specific capabilities (e.g., max neighbor entries per ASIC), even when a unified gflag-based limit is used at runtime. Do not delete them during cleanup unless the platform is fully deprecated.
+**Why**: These methods serve as architecture documentation. Without them, future engineers must reverse-engineer per-platform limits from scattered config files or vendor docs.
+**Confidence**: MEDIUM (requires judgment on whether the platform is still relevant)
+
+### 11. Use ODS Counters for Production Debugging
+**Check**: Add ODS counters (not just logs) for events that need production debugging — packet drops, errors, resource exhaustion. Add follow-up configerator diffs to permit export of new counters to ODS.
+**Why**: Logs are ephemeral and hard to aggregate. ODS counters provide exact counts, time-series trending, and alerting. Without counters, production packet drops require log scraping to debug.
+**Confidence**: HIGH (always add ODS counters for drop/error events)
+
+### 12. Validate External Configuration Limits
+**Check**: Validate externally configurable limits (thrift config, gflags, configerator) to prevent values that could destabilize the system. Reject unreasonable values (e.g., 1M pkts/ms rate limit) that would undo built-in protections like thrift overload guards.
+**Why**: External users can configure arbitrary values. Without bounds checking, a misconfiguration can cause CPU exhaustion, memory exhaustion, or bypass overload protection.
+**Confidence**: HIGH (always validate external config bounds)
+
+### 13. Populate Warmboot Cache Before Processing
+**Check**: During warmboot, populate all caches and ID maps from the warmboot state before processing any new state changes. Do not lazily discover IDs as routes or entries are added.
+**Why**: Lazy discovery causes ID collisions — new allocations may reuse IDs that exist in the warmboot cache but haven't been discovered yet, corrupting forwarding state.
+**Confidence**: HIGH (always populate warmboot caches before processing new state)
+
+### 14. Delta Processing Order
+**Check**: Process deltas in consistent order: removals → changes → additions (per interface, per object type). Document the ordering in comments.
+**Why**: Wrong order creates transient invalid states (e.g., two neighbors with same MAC during update). Documented order prevents future refactors from breaking it.
+**Source**: D96253394 reviewer feedback
+**Confidence**: HIGH
+
+### 15. Extract Validation into Dedicated Classes
+**Check**: Complex validation logic should be extracted into dedicated validator classes with clear `isValidDelta()` and `stateChanged()` APIs, not embedded in state update handlers.
+**Why**: Single responsibility, easier to unit test in isolation, easier to maintain ordering semantics.
+**Source**: D96253394 reviewer feedback
+**Confidence**: MEDIUM
+
+### 16. Gate New Validations Behind Feature Flags
+**Check**: New validation/enforcement logic must be gated behind a FLAGS (e.g., `FLAGS_enforce_single_nbr_mac_per_intf`) for gradual rollout.
+**Why**: Allows disabling if issues arise in production, enables A/B comparison, prevents big-bang rollout risk.
+**Source**: D96253394 reviewer feedback
+**Confidence**: HIGH
+
+### 17. Rebuild Validator State on Rejection
+**Check**: When `updateRejected()` is called after partial validation, rebuild validator state from scratch via `stateChanged(empty → oldState)` rather than attempting incremental undo.
+**Why**: Incremental undo is error-prone. Clean rebuild ensures validator state stays in sync even when validation fails partway through.
+**Source**: D96253394 reviewer feedback
+**Confidence**: MEDIUM
@@ -0,0 +1,38 @@
+# Contributing Patterns
+
+## Where to Add Your Pattern
+
+```
+Is your pattern FBOSS-specific?
++-- YES -> Which area?
+|   +-- Agent/SwSwitch/HwSwitch    -> agent-patterns.md
+|   +-- SAI/SDK                     -> sai-sdk-patterns.md
+|   +-- FSDB/thrift_cow            -> thrift-cow-fsdb-patterns.md
+|   +-- Platform/Config            -> platform-config-patterns.md
+|   +-- Testing                    -> testing-patterns.md
+|   +-- Cross-cutting design judgment (not a single Check/Why rule)
+|       -> fboss-review/references/fboss-specific-reviewers.md (Architect reviewer)
++-- NO -> Is it a coding convention the team wants enforced?
+    +-- YES -> general-patterns.md
+    +-- NO  -> Probably doesn't belong here
+```
+
+## Pattern Template
+
+Every pattern must follow this format:
+
+```markdown
+### N. Pattern Name
+**Check**: What specifically to look for
+**Why**: Consequence of violation
+**Confidence**: HIGH (always a bug) or MEDIUM (context-dependent)
+```
+
+## Guidelines
+
+- **Evidence-based only**: Patterns must come from real bugs, SEVs, or repeated review feedback. No hypothetical issues.
+- **False positive test**: Before adding, check 3-5 recent diffs to confirm the pattern would not produce false positives.
+- **HIGH confidence**: Use only when violation is always a bug, regardless of context.
+- **MEDIUM confidence**: Use when violation is usually a problem but some exceptions exist.
+- **No style preferences**: Style-only preferences (brace placement, blank lines) do not belong here.
+- **Diff prefix**: When submitting pattern changes, use `[fboss][claude]` prefix in the diff title.