Skip to content

Fix #3203 usage server broken in 4.11/4.12#3207

Closed
GabrielBrascher wants to merge 5 commits into
apache:masterfrom
CLDIN:usage-aggregation-issue
Closed

Fix #3203 usage server broken in 4.11/4.12#3207
GabrielBrascher wants to merge 5 commits into
apache:masterfrom
CLDIN:usage-aggregation-issue

Conversation

@GabrielBrascher
Copy link
Copy Markdown
Member

@GabrielBrascher GabrielBrascher commented Mar 11, 2019

Fixes issue #3203.

Due to the migration to systemd, the configuration -Dpid=$$ does not return the pid value but the string '$$'; thus, the _pid variable received '$$', causing a silenced exception.

This PR adds the file /etc/cloudstack/usage/cloudstack-usage.service.pid that has the process pid. The file is configured after the ExecStart, by running the systemctl show command configured as:

ExecStartPost=/bin/sh -c "systemctl show -p MainPID cloudstack-usage.service 2>/dev/null | cut -d= -f2 > /etc/cloudstack/usage/cloudstack-usage.service.pid"

Description

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Screenshots (if appropriate):

How Has This Been Tested?

CloudStack usage service run as expected after updating with the changes from this PR.

'$$'; thus, the _pid variable received '$$', causing an exception.
@GabrielBrascher
Copy link
Copy Markdown
Member Author

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@GabrielBrascher a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@GabrielBrascher GabrielBrascher changed the title Fix #3203 usage server broken in 4.12.0.0 Fix #3203 usage server broken in 4.11 / 4.12 Mar 11, 2019
@GabrielBrascher GabrielBrascher changed the title Fix #3203 usage server broken in 4.11 / 4.12 Fix #3203 usage server broken in 4.11/4.12 Mar 11, 2019
@GabrielBrascher
Copy link
Copy Markdown
Member Author

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@GabrielBrascher a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@konstantinjfk
Copy link
Copy Markdown

Would you think to put pid file to the /run folder? like /run/cloudstack-usage.service.pid

@wido
Copy link
Copy Markdown
Contributor

wido commented Mar 11, 2019

I would say that /run or /var/run would be better as well.

But why does the usage server actually needs it's PID?

@andrijapanicsb
Copy link
Copy Markdown
Contributor

I would also ask (sorry if silly question) - why any difference to 4.11.2 at all (did the code around systemd changed) ? - i.e. I observe no pid with 4.11.2, as far as I can see.

But definitively (if PID required) would put it to default location as /var/run/*

Happy to test, just want to avoid doing it 2 times... ?

@ustcweizhou
Copy link
Copy Markdown
Contributor

I agree with @wido @konstantinjfk that it would be better to save pid file as "/var/run/cloudstack-usage.service.pid"

I have applied @GabrielBrascher 's patch in cloudstack 4.11.2, changed pid file to "/var/run/cloudstack-usage.service.pid" and tested it. all seems fine.

so this pull request is good for me, except the pid file location.

thanks @GabrielBrascher for your work.

exception in case of null or blank pid (NumberUtils.toInt returns zero
if the conversion fails).
@ustcweizhou
Copy link
Copy Markdown
Contributor

code looks good to me.

@GabrielBrascher
Copy link
Copy Markdown
Member Author

Thanks for the feedback @wido @andrijapanic @konstantinjfk @ustcweizhou!
I have changed the pid file to /var/run/cloudstack-usage.service.pid.

@GabrielBrascher
Copy link
Copy Markdown
Member Author

@wido, as far as I know, the implementation used the pid to have a unique id regarding the cloudstack-usage.service (in case of multiple cloudstack-usage nodes running). To change this approach some refactoring is needed.

Regarding the codebase (prior to this PR), the exception was happening at line 274 from UsageManagerImpl.java.
_pid = Integer.parseInt(System.getProperty("pid")); and silenced at the UsageServer as it would print the stack only in case of running the service from the command line and not a systemd service (which was not the case for any of us). Explaining why the usage.log did not print details and/or an exception.

try {
    ComponentContext.initComponentsLifeCycle();
} catch (Exception e) {
    e.printStackTrace();
}

If the _pid is not a valid pid for the service (for instance zero) the runInContextInternal execution flow is ignored.

UsageJobVO job = _usageJobDao.isOwner(_hostname, _pid);

@GabrielBrascher
Copy link
Copy Markdown
Member Author

@andrijapanic the _pid is there at 4.11.2. However, I am not sure if the service was alrady migrate to systemd at 4.11.2 or it was migrated in the 4.11 branch after 4.11.2 was released.
https://github.com/apache/cloudstack/blob/4.11.2.0/usage/src/com/cloud/usage/UsageManagerImpl.java#L353

Comment thread usage/src/main/java/com/cloud/usage/UsageManagerImpl.java Outdated
@andrijapanicsb
Copy link
Copy Markdown
Contributor

@GabrielBrascher thx a lot for the fixes ! I'll wait for commit and kick packaging once more.

@GabrielBrascher
Copy link
Copy Markdown
Member Author

Fixed! Thanks for the help @andrijapanicsb!

@GabrielBrascher GabrielBrascher force-pushed the usage-aggregation-issue branch from 3510470 to 4c948e0 Compare March 11, 2019 14:32
Copy link
Copy Markdown
Contributor

@wido wido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@GabrielBrascher GabrielBrascher force-pushed the usage-aggregation-issue branch from 4d7ad0f to d9d3da9 Compare March 11, 2019 19:03
@ustcweizhou
Copy link
Copy Markdown
Contributor

@GabrielBrascher could you squash the commits to one commit ?

@GabrielBrascher GabrielBrascher force-pushed the usage-aggregation-issue branch from d9d3da9 to a42bc9f Compare March 11, 2019 22:01
@GabrielBrascher
Copy link
Copy Markdown
Member Author

@ustcweizhou sure, I will Squash when merging (with the Squash and merge).

@GabrielBrascher
Copy link
Copy Markdown
Member Author

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@GabrielBrascher a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan
Copy link
Copy Markdown

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-2624

@GabrielBrascher
Copy link
Copy Markdown
Member Author

@blueorangutan test

@blueorangutan
Copy link
Copy Markdown

@GabrielBrascher a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

# specific language governing permissions and limitations
# under the License.

JAVA_OPTS="-Dpid=$$ -Xms256m -Xmx2048m"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GabrielBrascher What's the issue, this should work?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, re-read the description. Alright keep this change, I'll advise you a simpler fix.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, this without quotes would work

[Service]
Type=simple
EnvironmentFile=/etc/default/cloudstack-usage
ExecStart=/usr/bin/java $JAVA_OPTS -cp $CLASSPATH $JAVA_CLASS
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put this and this would fix the pid issue:

Environment=JAVA_PID=$$
ExecStart=/usr/bin/java $JAVA_DEBUG -Dpid=$JAVA_PID $JAVA_OPTS -cp $CLASSPATH $JAVA_CLASS

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment: this would need to be fixed for 4.11 as well, let me send a quick fix instead.

Copy link
Copy Markdown
Member

@yadvr yadvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too complicated, I'll submit a simpler change.

@yadvr
Copy link
Copy Markdown
Member

yadvr commented Mar 12, 2019

Please review/test - #3210

@blueorangutan
Copy link
Copy Markdown

Trillian test result (tid-3416)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 28344 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3207-t3416-kvm-centos7.zip
Smoke tests completed. 70 look OK, 0 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File

@andrijapanicsb
Copy link
Copy Markdown
Contributor

I did not review this one (test it), but already had env for #3210 and confirm it works fine - usage jobs does kick in as it should.

@GabrielBrascher
Copy link
Copy Markdown
Member Author

Closing this in favor of #3210

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants