Skip to content

Prevent overflow on StatsCollector.java#3932

Merged
andrijapanicsb merged 1 commit into
apache:4.13from
CLDIN:statscollector-potential-issues
Mar 13, 2020
Merged

Prevent overflow on StatsCollector.java#3932
andrijapanicsb merged 1 commit into
apache:4.13from
CLDIN:statscollector-potential-issues

Conversation

@GabrielBrascher
Copy link
Copy Markdown
Member

@GabrielBrascher GabrielBrascher commented Mar 4, 2020

Description

Did a quick check and found an operation that can result in an overflow. Additionally, I took the liberty to do a couple of code enhancements/clean ups.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

s_logger.warn("Usage stats job aggregation range is to small, using the minimum value of " + UsageUtils.USAGE_AGGREGATION_RANGE_MIN);
_usageAggregationRange = UsageUtils.USAGE_AGGREGATION_RANGE_MIN;
}
_diskStatsUpdateExecutor.scheduleAtFixedRate(new VmDiskStatsUpdaterTask(), (endDate - System.currentTimeMillis()), (_usageAggregationRange * 60 * 1000),
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential overflow in 2 Values before it is converted to long by use in an invocation context.

for (AutoScaleVmGroupPolicyMapVO asVmgPmap : listMap) {
AutoScalePolicyVO policyVO = _asPolicyDao.findById(asVmgPmap.getPolicyId());
if (policyVO != null) {
Integer quitetime = policyVO.getQuietTime();
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable 'quitetime' is only assigned values of primitive type and is never 'null', but it is declared with the boxed type 'Integer'.

Comment thread server/src/main/java/com/cloud/server/StatsCollector.java
@GabrielBrascher
Copy link
Copy Markdown
Member Author

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@GabrielBrascher a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link
Copy Markdown

Packaging result: ✖centos6 ✔centos7 ✔debian. JID-1000

@yadvr yadvr added this to the 4.14.0.0 milestone Mar 5, 2020
@yadvr
Copy link
Copy Markdown
Member

yadvr commented Mar 5, 2020

@blueorangutan test

@blueorangutan
Copy link
Copy Markdown

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link
Copy Markdown

Trillian test result (tid-1191)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 28797 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3932-t1191-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_ssvm.py
Smoke tests completed. 81 look OK, 0 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File

@DaanHoogland DaanHoogland reopened this Mar 6, 2020
@GabrielBrascher GabrielBrascher changed the title Prevent overflow and null pointer on StatsCollector.java Prevent overflow on StatsCollector.java Mar 6, 2020
Copy link
Copy Markdown
Member

@weizhouapache weizhouapache left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code LGTM

@GabrielBrascher
Copy link
Copy Markdown
Member Author

Thanks for all the reviews!

Just to mention. I did a few manual tests to ensure that no overflow will happen. Setting ONE_MINUTE_IN_MILLISCONDS to Long (or long) will indeed solve overflow issues.

@DaanHoogland
Copy link
Copy Markdown
Contributor

@GabrielBrascher as this is a pure fix, will it apply to 4.13?

Copy link
Copy Markdown
Contributor

@borisstoyanov borisstoyanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GabrielBrascher can you share some steps to reproduce this? I'm trying to find out what exactly we're trying to fix here.

@GabrielBrascher GabrielBrascher changed the base branch from master to 4.13 March 9, 2020 11:47
@GabrielBrascher GabrielBrascher changed the base branch from 4.13 to master March 9, 2020 11:48
@GabrielBrascher
Copy link
Copy Markdown
Member Author

@borisstoyanov the overflow is not something that will happen easily; however, the code is there and it still something that might happen one day depending on the value stored at _usageAggregationRange.

The operation that I extracted can result in an overflow as it is an integer operation stored in an integer memory (and then after storing it as an integer it is cast into a long). I extracted it into a long = int * Long, which prevents a overflow as it is not an integer operation anymore but converted into a long operation.

@GabrielBrascher
Copy link
Copy Markdown
Member Author

@DaanHoogland I see no problem in applying this at 4.13.1.

@DaanHoogland
Copy link
Copy Markdown
Contributor

ok, can you rebase @GabrielBrascher ?

@GabrielBrascher GabrielBrascher changed the base branch from master to 4.13 March 9, 2020 12:32
@GabrielBrascher GabrielBrascher force-pushed the statscollector-potential-issues branch from ff95beb to 38755ed Compare March 9, 2020 12:44
@GabrielBrascher
Copy link
Copy Markdown
Member Author

@DaanHoogland rebased it to 4.13 🙂

@GabrielBrascher GabrielBrascher force-pushed the statscollector-potential-issues branch from 38755ed to c4ea2ad Compare March 9, 2020 12:52
@GabrielBrascher GabrielBrascher force-pushed the statscollector-potential-issues branch from c4ea2ad to b9aad0f Compare March 9, 2020 13:03
@borisstoyanov
Copy link
Copy Markdown
Contributor

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@borisstoyanov a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link
Copy Markdown

Packaging result: ✖centos6 ✔centos7 ✔debian. JID-1037

@borisstoyanov
Copy link
Copy Markdown
Contributor

@blueorangutan test

@blueorangutan
Copy link
Copy Markdown

@borisstoyanov a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link
Copy Markdown

Trillian test result (tid-1225)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 33837 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3932-t1225-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_internal_lb.py
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Smoke tests completed. 76 look OK, 1 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
test_02_vpc_privategw_static_routes Failure 219.74 test_privategw_acl.py
test_03_vpc_privategw_restart_vpc_cleanup Failure 210.20 test_privategw_acl.py
test_04_rvpc_privategw_static_routes Failure 942.54 test_privategw_acl.py

@DaanHoogland DaanHoogland modified the milestones: 4.14.0.0, 4.13.1.0 Mar 10, 2020
@andrijapanicsb
Copy link
Copy Markdown
Contributor

2 x LGTMs, all tests passing (those privategtw tests are "OK", fixed in newer ACS code - this PR is 150+ commits behind master)

Merging.

@andrijapanicsb andrijapanicsb merged commit cd6f0cb into apache:4.13 Mar 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants