fix: reduce tcp_retries2 to 3 #2912
Open
Force-pushed from ef6e6e2 to 666d827.
Force-pushed from 35af818 to e7a8b48.
Merging these commits will result in the following changelog entries:

Changelogs

integration (qa_1625__mender_connect_test_in_poor_network_conditions_fail_with_404_from_time_to_time_tcp-retries2)

New changes in integration since master:

Bug Fixes
@mender-test-bot start integration pipeline please
Hello 😺 I created a pipeline for you here: Pipeline-2564031608
4.1.x:
fix: reduce tcp_retries2 to 3
Some of the tests in mender_connect, among them test_in_poor*, expect
the TCP/IP stack to react within a given time. The Linux kernel, as
currently implemented, does not guarantee this.
To make the tests pass every time, we have to make the kernel give up
retransmitting sooner (fewer RTO-spaced retries), so that the TCP stack
notices the broken connectivity and treats the mender-connect link to
deviceconnect as broken within the times hardcoded in the tests.
One way of doing that is to lower tcp_retries2 from the default of 15,
with which the timeout may (and does, as the failures of this test
prove) exceed 10 minutes, to a new, turbo-sensitive 3.
Refs:
https://docs.kernel.org/networking/ip-sysctl.html
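
For a rough feel of what the change means in seconds, here is a minimal Python sketch of the "hypothetical timeout" that the kernel documentation describes for tcp_retries2. It is an illustration, not part of this change: the function name is made up, and the constants assume the usual Linux values of 200 ms for the minimum RTO and 120 s for the maximum RTO. The real RTO depends on the measured round-trip time, so these figures are lower bounds only.

```python
# Sketch only: models the lower bound of the TCP retransmission timeout
# for a given tcp_retries2 value. Names and constants are assumptions.

TCP_RTO_MIN = 0.2    # seconds; assumed Linux minimum RTO
TCP_RTO_MAX = 120.0  # seconds; assumed Linux maximum RTO


def hypothetical_timeout(retries, rto_min=TCP_RTO_MIN, rto_max=TCP_RTO_MAX):
    """Sum the exponentially backed-off RTOs over retries + 1 intervals."""
    total = 0.0
    rto = rto_min
    for _ in range(retries + 1):
        total += min(rto, rto_max)  # the RTO doubles, capped at rto_max
        rto *= 2
    return total


if __name__ == "__main__":
    for retries in (15, 3):
        print(f"tcp_retries2={retries}: >= {hypothetical_timeout(retries):.1f} s")
```

Under these assumptions, the default of 15 gives the 924.6 seconds quoted in the kernel documentation, while 3 gives about 3 seconds.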
A note to reviewers @danielskinstad @kjaskiewiczz: I started 4 independent pipelines in 4.1.x, listed below. I restarted integration:3 many times and it never failed. You can see the logs from all the jobs and how the other failures were distributed (the change does not introduce any additional instability).
Changelog: Title
Ticket: QA-1625
Signed-off-by: Peter Grzybowski <peter@northern.tech>