e2e: increase waitForRoutesTimeout from 90s to 120s #2799
Open
martinsander00 wants to merge 1 commit into main
Force-pushed from c9d0d0d to 99d53f3.
Add diagnostic test for BGP propagation timing investigation.
Force-pushed from 99d53f3 to 7ab6230.
Important
The primary goal of this PR is to increase waitForRoutesTimeout from 90s to 120s. The diagnostic test file (qa_bgp_propagation_test.go) is included for reference but will be removed before merging.
Summary
- Increase waitForRoutesTimeout from 90s to 120s to reduce flaky QA test failures
- Add a diagnostic test (TestQA_BGPPropagationVariance) used to investigate the root cause
Investigation
QA tests (TestQA_UnicastConnectivity, multicast tests) were intermittently timing out at the "waiting for routes to be installed" step, particularly for the fra↔sgp (Frankfurt↔Singapore) route.
What we tested
Findings
Total = Sgp Connect + Route Propagation (timing starts when sgp begins connecting)
Additional context - Fra initial routes at start of each iteration:
Key observations:
Root cause
There's a delay between "BGP Session Up" status and actual route exchange completing. In cold-start scenarios (first test of the day, or after BGP state reset), route propagation between distant exchanges (xfra↔xsin) can take 65-80s total, which approaches or exceeds the 90s timeout.
Solution
Increase waitForRoutesTimeout from 90s to 120s to provide sufficient headroom for worst-case BGP propagation times.
Testing Verification
- TestQA_BGPPropagationVariance 5 iterations - all passed with new 120s timeout
- TestQA_UnicastConnectivity with all 4 hosts - passed in 113s
- go build -tags=qa ./e2e/... passes