
New test runner#213

Merged
brenns10 merged 21 commits into main from newtests
Jun 3, 2026

Conversation

@brenns10
Member

@brenns10 brenns10 commented Jun 1, 2026

Introduce a new test runner, testing.vm, which will replace both the testing.litevm and testing.heavyvm runners, while adding more features and being overall... better.

The new test runner has the following benefits:

  • It runs tests in an Oracle Linux rootfs. This means the tests run in a real OL userspace and can use the official drgn RPMs in our tests. (Compared to the old "litevm" runner, which ran in the host's rootfs.)
  • The rootfs is built with dnf --installroot in a container, which is much faster than scripting the ISO installer, and easier to update than the prebuilt cloud VM images.
  • Since the rootfs is manually built and part of the host filesystem, we can install just what we need, keeping the size small. And since we can share the rootfs among all the test kernels associated with it, we have less duplication. My rootfs directory is less than 2 GiB for all three combined.
  • Since the rootfs is not contained in an image, it's easy to run maintenance commands (e.g. installing new packages, compiling kernel modules) in a chroot rather than spinning up a full VM.
  • As a direct consequence, we can now build and insert a kernel module for our VM tests.
  • The runner now supports testing against Red Hat kernels (RHCK) -- including with the test kernel module.
  • Tests run on an Oracle Linux host with only standard RPMs; just be sure to use the latest KVM utils stream.
  • Tests boot and run much faster! My laptop can run both the DWARF and CTF tests for OL8/UEK6 in around 15 seconds, and this requires booting twice.

Some other notes:

  • This is a large amount of addition without many changes or deletions. The idea is to add this new framework in parallel to the others. I'll remove the others (testing.litevm, testing.heavyvm) once this is stable on GitHub CI.
  • I have some follow-ups that I will tackle later based on priority:
    • Make it possible to run tests against specific Python versions in each rootfs. (OL8: Python 3.6, 3.12; OL9: 3.9, 3.12, 3.14 soon; OL10: 3.12, 3.14 soon).
    • Enable running vmcore tests in the chroot for each OL version. This ensures that vmcore tests also use the real bits from our RPMs.
    • Make it easy to drop-in new drgn/drgn-tools RPMs to test them (replacing my ad-hoc testing workflow for testing new RPM releases).
    • Expand support to aarch64 and add this to GitLab.

There is a known failure on UEK-NEXT due to changes in the slab allocator. Imran is currently taking a look at it, but I don't think this should block merging this.

brenns10 added 12 commits April 21, 2026 16:22
Oracle Linux uses /usr/libexec/qemu-kvm, so test that first and fall back
to the standard name of qemu.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
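
The lookup order this commit describes could be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function name, the injectable helpers, and the fallback name `qemu-system-x86_64` are assumptions; only `/usr/libexec/qemu-kvm` comes from the commit message.

```python
import os
import shutil
from typing import Callable, Optional

# Path named in the commit message.
OL_QEMU_PATH = "/usr/libexec/qemu-kvm"
# Assumption: the commit does not say which "standard name" it falls back to.
FALLBACK_NAMES = ("qemu-system-x86_64",)


def find_qemu(
    is_executable: Callable[[str], bool] = lambda p: os.access(p, os.X_OK),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first usable QEMU binary, or None if none is found."""
    # Prefer the Oracle Linux location when it exists and is executable.
    if is_executable(OL_QEMU_PATH):
        return OL_QEMU_PATH
    # Otherwise search PATH for the fallback names, in order.
    for name in FALLBACK_NAMES:
        path = which(name)
        if path is not None:
            return path
    return None
```

The two helpers are parameters only so that the lookup order can be exercised without touching the real filesystem.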
The "litevm" runner runs an Oracle Linux kernel, but it does so in
whatever host rootfs is available. This means that we are not running
tests of the actual drgn RPM which Oracle Linux provides. It also means
that if we want to create a test kmod (similar to drgn), it is very
difficult to build it.

The "heavyvm" runner runs an Oracle Linux kernel within a full Oracle
Linux userspace disk image. This solves both problems, but since the
disk image is very large and slow to interact with, it cannot really be
used in Github Actions and it's cumbersome to use locally. What's more,
sharing the code for testing & module builds is difficult, so in
practice we don't actually get much benefit. Plus, rebuilding the image
is a maintenance chore.

To simplify and sidestep these issues, a new "vm" runner will replace
both. Tests will run within an Oracle Linux userspace chroot which we
create by using the official podman images to run "dnf --installroot".
The chroot contains necessary build dependencies for kernel modules as
well as drgn and anything else the tests need. With this, the chroots
can be created quickly enough to be useful in CI (say, a couple
minutes). They are much smaller than the disk images used by heavyvm, with
the flexibility of allowing bind mounts and kernel module builds.

We also add a "hello world" kernel module which will become the test
kmod over time. The test runner itself is stubbed out, we will add it
next.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
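
As a rough illustration of the "dnf --installroot in a container" approach described above, the helper below assembles such a podman command line. It is a sketch under stated assumptions: the function name, the `/rootfs` mount point inside the container, and the image and package names in the usage note are hypothetical and not taken from this PR.

```python
from typing import List, Sequence


def dnf_installroot_cmd(
    image: str,
    rootfs_dir: str,
    releasever: str,
    packages: Sequence[str],
) -> List[str]:
    """Build a podman command that populates rootfs_dir using dnf.

    rootfs_dir should be an absolute host path; it is bind-mounted at
    /rootfs (an arbitrary choice for this sketch) inside the container.
    """
    return [
        "podman", "run", "--rm",
        # Expose the host directory that will become the rootfs.
        "--volume", f"{rootfs_dir}:/rootfs",
        image,
        # Install into the mounted directory instead of the container's /.
        "dnf", "--installroot=/rootfs", f"--releasever={releasever}",
        "--assumeyes", "install", *packages,
    ]
```

For example, `dnf_installroot_cmd("oraclelinux:8", "/var/tmp/ol8-rootfs", "8", ["bash", "make"])` returns an argument list that could be passed to `subprocess.run`. The image tag and package list here are placeholders.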
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
This allows networking to function well enough inside the chroot, at
least if you're running a local resolver.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
With the new vm test runner, we can actually have our own kernel module.
So try that module name first. Export a symbol so that it will show up
in the smoke test.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
The previous behavior loaded DWARF debuginfo for the kmod during CTF
testing, which resulted in some collisions between the two type systems.
In particular:

  AttributeError: 'struct address_space' has no member 'host'
  drgn_tools/kernfs_memcg.py: 236: AttributeError

Thankfully, our drgn CTF support has (somewhat hidden) support for
loading kernel module CTF, if we have the ELF file. Let's use that so
we're completely CTF in CTF-mode, and completely DWARF in DWARF-mode.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
* Update the logger to print out what's happening in non-verbose mode,
for things like building kmod or rootfs. We don't want *no* output, we
just don't want to see all the output.

* Use "echo b>/proc/sysrq-trigger" to shut down, avoiding the "poweroff"
command which brings in a systemd dependency. This is directly lifted
from drgn's approach.

* Use "exec switch_root" rather than "chroot" in the init. This is
important because "chroot" leaves behind the old mounts, which
means that drgn will interpret paths in the old mount namespace. This
causes tests to fail (e.g. pstack, mounts) because they expect the mount
namespace to match their view.

* Automatically detect non-pytest commands and set "interactive", and
don't run them twice (DWARF/CTF).

* Use "stty" to set the terminal size, similar to how drgn does it.
* Add "hostname" to the rootfs so bash shows a relevant prompt.
* When a DWARF test fails, don't skip the CTF test for that target!

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
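
Two of the behaviours listed in this commit (detecting non-pytest commands, and still running CTF after a DWARF failure) might look something like the sketch below. All names are hypothetical and the pytest detection heuristic is an assumption; the commit message does not say how the runner recognises a pytest command.

```python
import os
from typing import Callable, List, Optional, Sequence, Tuple

DEBUGINFO_MODES = ("DWARF", "CTF")


def is_pytest_command(argv: Sequence[str]) -> bool:
    """Guess whether argv runs pytest directly or as 'python -m pytest'."""
    if not argv:
        return False
    if os.path.basename(argv[0]).startswith("pytest"):
        return True
    # Look for an adjacent "-m pytest" pair, e.g. ["python3", "-m", "pytest"].
    return any(a == "-m" and b == "pytest" for a, b in zip(argv, argv[1:]))


def plan_runs(argv: Sequence[str]) -> List[Tuple[Optional[str], bool]]:
    """Return (debuginfo_mode, interactive) pairs, one per VM boot."""
    if is_pytest_command(argv):
        # Tests run once per debuginfo type, non-interactively.
        return [(mode, False) for mode in DEBUGINFO_MODES]
    # Any other command runs once, attached to the console.
    return [(None, True)]


def run_all(
    argv: Sequence[str],
    run_one: Callable[[Sequence[str], Optional[str], bool], bool],
) -> bool:
    """Run every planned boot, even if an earlier one failed."""
    # Build the full result list first so a DWARF failure never
    # short-circuits the CTF run.
    results = [run_one(argv, mode, inter) for mode, inter in plan_runs(argv)]
    return all(results)
```

Here `run_one` stands in for whatever actually boots the VM; it is passed in so the planning logic can be checked on its own.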
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
This is embarrassing. I never committed this one. Oh well.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Historically we have not maintained RHCK testing infrastructure, but
with the new rootfs based testing, it's honestly not too hard to do
this. Add the necessary configuration plumbing. There's one failure on
OL8 RHCK that I'll need to fix.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
OL8 RHCK does not set CONFIG_CMA. Even with the config disabled, the
variable totalcma_pages still exists, but of course it's zero and
nothing appears in /proc/meminfo. This causes a test failure, because we
expect that meminfo entries which are present in procfs are also present
in corelens. Resolve the test failure by properly detecting CONFIG_CMA as
unset. In that case, CMA stats are not returned.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
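
The effect of the fix can be pictured with a small sketch: CMA entries are only reported when CMA is configured. The function and parameter names are hypothetical, values are treated as opaque integers, and how `cma_enabled` is actually derived from the kernel is not stated in the commit message, so it is simply passed in here.

```python
from typing import Dict


def meminfo_stats(
    base: Dict[str, int],
    cma_enabled: bool,
    cma_total: int,
    cma_free: int,
) -> Dict[str, int]:
    """Return meminfo-style stats, adding CMA fields only if CMA is enabled."""
    stats = dict(base)  # copy so the caller's dict is left untouched
    if cma_enabled:
        # /proc/meminfo reports these two fields only with CONFIG_CMA set.
        stats["CmaTotal"] = cma_total
        stats["CmaFree"] = cma_free
    return stats
```

With `cma_enabled=False` the result contains no `CmaTotal` or `CmaFree` keys, matching a /proc/meminfo that omits them, even though `totalcma_pages` still exists (as zero) in the kernel.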
@oracle-contributor-agreement oracle-contributor-agreement Bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Jun 1, 2026
@brenns10 brenns10 force-pushed the newtests branch 5 times, most recently from a4e3056 to 7b640f2 Compare June 1, 2026 17:02
Now that all tests except UEK-NEXT are passing, switch the Github CI
runner to use the new test framework. There are definitely still missing
features here, primarily that we do not delete the extracted RPMs after
the test. We'll see whether that becomes an issue.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
brenns10 added 5 commits June 1, 2026 10:14
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
The virtiofsd on the latest Ubuntu image on Github Actions does not seem
to have the --readonly option. This enforces read-only behavior from the
host side, which would be better for security, but we trust our guest.
We are really just making the FS read-only as an assertion that we do
not expect the tests to modify anything. So let's drop --readonly from
the virtiofsd command line, and instead use a mount option in the guest
to make it read-only. This is good enough for the intended purpose.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
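
The guest-side alternative described above amounts to passing the generic `ro` mount option when mounting the virtiofs share. A minimal sketch, with a hypothetical helper name and placeholder tag and mount point:

```python
from typing import List


def guest_virtiofs_mount_cmd(
    tag: str, mountpoint: str, read_only: bool = True
) -> List[str]:
    """Build the guest's mount command for a virtiofs share."""
    cmd = ["mount", "-t", "virtiofs"]
    if read_only:
        # Enforced by the guest kernel only; the host-side virtiofsd
        # would still accept writes, which is acceptable for a trusted guest.
        cmd += ["-o", "ro"]
    cmd += [tag, mountpoint]
    return cmd
```

For instance, `guest_virtiofs_mount_cmd("rootfs", "/mnt")` yields `mount -t virtiofs -o ro rootfs /mnt`, where `rootfs` is whatever tag the share was exported under.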
This feature from litevm is definitely still necessary for running in
the Github CI.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Interactive mode looks nicer because output is directly attached to the
console, but the trade-off is that we cannot detect the pass/fail status
of the test, which kind of defeats the purpose of CI.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
@brenns10 brenns10 changed the title [draft] New test runner New test runner Jun 1, 2026
@brenns10 brenns10 requested a review from biger410 June 1, 2026 21:09
brenns10 added 2 commits June 1, 2026 17:22
To run the test framework under SELinux we can add relabel=private. This
is just like the "Z" flag in the "-v" option.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
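
Assuming this refers to podman's `--mount` option for bind mounts, the argument could be assembled as in the sketch below. The helper name and the paths in the usage note are hypothetical.

```python
def bind_mount_arg(
    source: str, destination: str, relabel_private: bool = True
) -> str:
    """Build a podman --mount argument for a bind mount."""
    parts = ["type=bind", f"source={source}", f"destination={destination}"]
    if relabel_private:
        # Counterpart of the ":Z" suffix accepted by -v/--volume.
        parts.append("relabel=private")
    return "--mount=" + ",".join(parts)
```

For example, `bind_mount_arg("/var/tmp/ol8-rootfs", "/rootfs")` returns `--mount=type=bind,source=/var/tmp/ol8-rootfs,destination=/rootfs,relabel=private`.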
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
@biger410
Member

biger410 commented Jun 2, 2026

Does the new runner run inside another OS, or in QEMU with Oracle Linux?

@brenns10
Member Author

brenns10 commented Jun 3, 2026

The new runner is QEMU with Oracle Linux user space.
Previously it was:

  • QEMU with whatever user space the host provides (Ubuntu for GitHub CI). This was the litevm tests.
  • QEMU with an Oracle Linux user space (as a qcow2 image that needed rebuilding every so often). This was the heavyvm tests on GitLab.

We should be able to replace both with the new runner.

@brenns10 brenns10 merged commit 7dd105f into main Jun 3, 2026
3 of 5 checks passed
@brenns10 brenns10 deleted the newtests branch June 3, 2026 17:20

Labels

allow-missing-latest OCA Verified All contributors have signed the Oracle Contributor Agreement.

2 participants