Skip to content

Cache libvirt domain state#819

Draft
ben-grande wants to merge 17 commits into
QubesOS:mainfrom
ben-grande:cache-running
Draft

Cache libvirt domain state#819
ben-grande wants to merge 17 commits into
QubesOS:mainfrom
ben-grande:cache-running

Conversation

@ben-grande

@ben-grande ben-grande commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

The state changes rarely, but querying it can take a considerable time, that is blocking, when looping through the state of multiple domains.

For: QubesOS/qubes-issues#10569
For: QubesOS/qubes-issues#9902


Didn't run openqa locally, only experiment some things:

  • start
  • shutdown
  • kill
  • pause
  • unpause
  • checking Domains Widget
  • restart qubesd
  • qubed-query -e dom0 admin.vm.CurrentState QUBE

TODO:

@ben-grande

Copy link
Copy Markdown
Contributor Author

PipelineRetryFailed

@ben-grande

Copy link
Copy Markdown
Contributor Author

Didn't run openqa locally, only experiment some things, start, shutdown, kill, checking Domains Widget.

@ben-grande

Copy link
Copy Markdown
Contributor Author

PipelineRetryFailed

@codecov

codecov Bot commented Jun 1, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 43.11927% with 62 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.12%. Comparing base (9196c9a) to head (367ae38).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
qubes/vm/qubesvm.py 31.42% 48 Missing ⚠️
qubes/device_protocol.py 0.00% 5 Missing ⚠️
qubes/vm/adminvm.py 20.00% 4 Missing ⚠️
qubes/ext/pci.py 87.50% 3 Missing ⚠️
qubes/app.py 33.33% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #819      +/-   ##
==========================================
- Coverage   70.42%   70.12%   -0.30%     
==========================================
  Files          61       61              
  Lines       14143    14174      +31     
==========================================
- Hits         9960     9940      -20     
- Misses       4183     4234      +51     
Flag Coverage Δ
unittests 70.12% <43.11%> (-0.30%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ben-grande ben-grande force-pushed the cache-running branch 2 times, most recently from 5eff83f to aa2e823 Compare June 1, 2026 21:26
@ben-grande

Copy link
Copy Markdown
Contributor Author

PipelineRetryFailed

1 similar comment
@ben-grande

Copy link
Copy Markdown
Contributor Author

PipelineRetryFailed

@ben-grande ben-grande marked this pull request as draft June 2, 2026 07:55
@ben-grande

Copy link
Copy Markdown
Contributor Author

Marked as draft as there still is some caching issues for the _power_state.

@marmarek

marmarek commented Jun 2, 2026

Copy link
Copy Markdown
Member

If necessary (or simpler), it's IMO okay to use cache only for is_running() (that is used frequently), but still do active check for get_power_state(). It could be also a safety valve for cases when cache are not reliable (like during startup or shutdown).

@ben-grande

Copy link
Copy Markdown
Contributor Author

If necessary (or simpler), it's IMO okay to use cache only for is_running() (that is used frequently), but still do active check for get_power_state(). It could be also a safety valve for cases when cache are not reliable (like during startup or shutdown).

Experimenting with cache a bit more and it's getting better, but if I don't get it to a hundred percent, will not cache the power state.

@ben-grande

ben-grande commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

From my tests, it seems to be working. My nemesis, OpenQA, should try me. Do you know a subset of tests that are interesting before joining this PR in the full run?

Looking at this recent test:

  • system_tests_suspend (soft-failed)
  • system_tests_usbproxy
  • system_tests_devices
  • system_tests_gui_tools
  • system_tests_guivm_gui_interactive

Line to use after this PR is not draft anymore:

openQArun TEST=system_tests_suspend,system_tests_usbproxy,system_tests_devices,system_tests_gui_tools,system_tests_guivm_gui_interactive

@marmarek

marmarek commented Jun 2, 2026

Copy link
Copy Markdown
Member

I'd include also at least one of the guivm tests

@ben-grande ben-grande force-pushed the cache-running branch 4 times, most recently from 9538d21 to 6011ca3 Compare June 2, 2026 21:30
@ben-grande

Copy link
Copy Markdown
Contributor Author

PipelineRetryFailed

@ben-grande

Copy link
Copy Markdown
Contributor Author

Last CI failed on Fedora 43 to due to:

======================================================================
ERROR: qubes.tests.api_admin/TC_00_VMs/test_150_pool_info
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib64/python3.14/contextlib.py", line 85, in inner
    return func(*args, **kwds)
  File "/home/gitlab-runner/builds/QubesOS/qubes-core-admin/qubes/tests/api_admin.py", line 76, in setUp
    super().setUp()
    ~~~~~~~~~~~~~^^
  File "/home/gitlab-runner/builds/QubesOS/qubes-core-admin/qubes/tests/__init__.py", line 516, in setUp
    self.loop = asyncio.get_event_loop()
                ~~~~~~~~~~~~~~~~~~~~~~^^
  File "/usr/lib64/python3.14/asyncio/events.py", line 715, in get_event_loop
    raise RuntimeError('There is no current event loop in thread %r.'
                       % threading.current_thread().name)
RuntimeError: There is no current event loop in thread 'MainThread'.

So it finally reached this repo.

@ben-grande

Copy link
Copy Markdown
Contributor Author

PipelineRetryFailed

@ben-grande

Copy link
Copy Markdown
Contributor Author

openQArun TEST=system_tests_suspend,system_tests_usbproxy,system_tests_devices,system_tests_gui_tools,system_tests_guivm_gui_interactive

@ben-grande ben-grande force-pushed the cache-running branch 4 times, most recently from 7bda37a to a0a7e60 Compare June 12, 2026 16:12
Since there is "resume", and there is a moment when resuming that
domain might still be suspended, or event suspending a domain on
purposed, allow clients to know the domain state.
Useful when debugging to know what's happening. I thought of logging the
pretty name of the detail, but that got really big and I think it's out
of scope from the Qubes OS project and in scope of the python-libvirt
package.
- Keep events related to domain state in the same place
- Shorter exception messages, as it's already prefixed with qube name
- Much easier to read with less body on the if statements
- Will help on a future commit to access private attributes without
  having to disable pylint checks
- The events are related to a domain, not to the larger app instance
@ben-grande ben-grande force-pushed the cache-running branch 2 times, most recently from f140ed3 to 29678d5 Compare June 13, 2026 00:14
The state changes rarely, but querying it can take a considerable
blocking time, worse when looping through the state of multiple domains.

For: QubesOS/qubes-issues#10569
For: QubesOS/qubes-issues#9902
@ben-grande ben-grande force-pushed the cache-running branch 2 times, most recently from 2d4a274 to fd6ef64 Compare June 17, 2026 12:03
These data only changes once when domain is running.
This function reimports the same information over and over and is time
consuming when looping through multiple domains. Reduce listing USB
devices while there is no cache from 2.8s to 0.5s.
Method 'startwith' adds considerable overhead.
Libvirt API can be quite slow on the methods for "listAllDevices" and
"XMLDesc", aggravated in loops.
Assuming that hotplug is unsupported.
Each call was recording 1ms, now the calls are below that.
Although it doesn't help reduce the time to get the XML, as XMLDesc() is
super slow, this helps cleanup the code a bit.
Skip PVH when listing attached PCI devices, as they can't possibly be
there.

@DemiMarie DemiMarie left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

XPath variables are super nice. They are the XPath equivalent of SQL parameterized queries.

Comment thread qubes/storage/__init__.py
if source.get("dev") == "/dev/%s" % volume.vid:
return True
disks = parsed_xml.xpath(
"//domain/devices/disk/source[@dev='/dev/{}']".format(volume.vid)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of injecting the variable with .format(), use XPath variables. Something like this (not tested):

        disks = parsed_xml.xpath(
            "//domain/devices/disk/source[@dev = $dev]",
            dev = "/dev/" + volume.vid)

This avoids an XPath injection vulnerability if volume.vid winds up containing an apostraphe. You can use a regular expression to validate volume.vid, but even then it is much better to use XPath variables.

.find() is even faster if it’s an option.

Comment thread qubes/storage/__init__.py
Comment on lines +673 to +675
disks = parsed_xml.xpath(
"//domain/devices/disk/source[@dev='/dev/{}']".format(volume.vid)
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use an XPath variable.

@ben-grande ben-grande marked this pull request as draft June 24, 2026 12:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants