Kernel Oops report #13

@YWHyuk

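When I started GiantVM across two hosts, dell-3 and dell-4, the two QEMU instances failed to connect to each other, and the kernel on each host then hit an Oops. The QEMU output and dmesg from both hosts are below.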

dell-3:~/Tutorial/guest_image$ ./run_gvm.sh -c 8 -m 8192 -s 4 -l 4 -i "10.10.20.53 10.10.20.54"
CPU Info
Total: 8
Local: 4 [4-7]
Remote: 4[ 0 1 2 3 ]
KVM API version[12], QEMU version[12]
WARNING: Image format was not specified for 'user-data.img' and probing guessed raw.
         Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
         Specify the 'raw' format explicitly to remove the restrictions.
CPU 0 is remote CPU, pause
CPU 1 is remote CPU, pause
CPU 2 is remote CPU, pause
CPU 3 is remote CPU, pause
start kvm dsm server, total memory size: 8589934592
run_on_cpu: not local CPU, ignore here. This could be a latent bug.
run_on_cpu: not local CPU, ignore here. This could be a latent bug.
run_on_cpu: not local CPU, ignore here. This could be a latent bug.
run_on_cpu: not local CPU, ignore here. This could be a latent bug.
error: msi send vector in range 0-15
error: cannot find current apic
run_on_cpu: not local CPU, ignore here. This could be a latent bug.
run_on_cpu: not local CPU, ignore here. This could be a latent bug.
run_on_cpu: not local CPU, ignore here. This could be a latent bug.
run_on_cpu: not local CPU, ignore here. This could be a latent bug.
run_on_cpu: not local CPU, ignore here. This could be a latent bug.
run_on_cpu: not local CPU, ignore here. This could be a latent bug.
run_on_cpu: not local CPU, ignore here. This could be a latent bug.
run_on_cpu: not local CPU, ignore here. This could be a latent bug.
QEMU nums: 2, Total CPU nums: 8, CPU per QEMU: 4
connect_io_router...
QEMU 0 wait for RDMA connection on 10.10.20.54:40004
RDMA ERROR: Error: could not rdma_bind_addr!
RDMA init fail on 10.10.20.54:40004
RDMA ERROR: Error: could not rdma_bind_addr!
RDMA init fail on 10.10.20.54:40005
QEMU 0 wait for RDMA connection on 10.10.20.54:40005
source_resolve_host RDMA Device opened: kernel name mlx5_0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/mlx5_0, transport: (1) Infiniband
rdma_get_cm_event != EVENT_ESTABLISHED after rdma_connect: No such file or directory
RDMA ERROR: connecting to destination!
RDMA connect to 10.10.20.53:40002 fail, retrying...
source_resolve_host RDMA Device opened: kernel name mlx5_0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/mlx5_0, transport: (1) Infiniband
rdma_get_cm_event != EVENT_ESTABLISHED after rdma_connect: No such file or directory
RDMA ERROR: connecting to destination!
RDMA connect to 10.10.20.53:40002 fail, retrying...
source_resolve_host RDMA Device opened: kernel name mlx5_0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/mlx5_0, transport: (1) Infiniband
rdma_get_cm_event != EVENT_ESTABLISHED after rdma_connect: No such file or directory
RDMA ERROR: connecting to destination!
RDMA connect to 10.10.20.53:40002 fail, retrying...
source_resolve_host RDMA Device opened: kernel name mlx5_0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/mlx5_0, transport: (1) Infiniband

dell-4:~/Tutorial/guest_image$ ./run_gvm.sh -c 8 -m 8192 -s 0 -l 4 -i "10.10.20.53 10.10.20.54"
qemu-system-x86_64: -redir tcp:5556::22: The -redir option is deprecated. Please use '-netdev user,hostfwd=...' instead.
CPU Info
Total: 8
Local: 4 [0-3]
Remote: 4[ 4 5 6 7 ]
KVM API version[12], QEMU version[12]
WARNING: Image format was not specified for 'user-data.img' and probing guessed raw.
         Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
         Specify the 'raw' format explicitly to remove the restrictions.
CPU 4 is remote CPU, pause
CPU 5 is remote CPU, pause
CPU 6 is remote CPU, pause
CPU 7 is remote CPU, pause
start kvm dsm server, total memory size: 8589934592
run_on_cpu: not local CPU, ignore here. This could be a latent bug.
run_on_cpu: not local CPU, ignore here. This could be a latent bug.
run_on_cpu: not local CPU, ignore here. This could be a latent bug.
run_on_cpu: not local CPU, ignore here. This could be a latent bug.
error: msi send vector in range 0-15
error: cannot find current apic
run_on_cpu: not local CPU, ignore here. This could be a latent bug.
run_on_cpu: not local CPU, ignore here. This could be a latent bug.
run_on_cpu: not local CPU, ignore here. This could be a latent bug.
run_on_cpu: not local CPU, ignore here. This could be a latent bug.
run_on_cpu: not local CPU, ignore here. This could be a latent bug.
run_on_cpu: not local CPU, ignore here. This could be a latent bug.
run_on_cpu: not local CPU, ignore here. This could be a latent bug.
run_on_cpu: not local CPU, ignore here. This could be a latent bug.
QEMU nums: 2, Total CPU nums: 8, CPU per QEMU: 4
connect_io_router...
QEMU 1 wait for RDMA connection on 10.10.20.53:40002
RDMA ERROR: Error: could not rdma_bind_addr!
RDMA init fail on 10.10.20.53:40002
RDMA ERROR: Error: could not rdma_bind_addr!
RDMA init fail on 10.10.20.53:40003
QEMU 1 wait for RDMA connection on 10.10.20.53:40003
source_resolve_host RDMA Device opened: kernel name mlx5_0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/mlx5_0, transport: (1) Infiniband
rdma_get_cm_event != EVENT_ESTABLISHED after rdma_connect: No such file or directory
RDMA ERROR: connecting to destination!
RDMA connect to 10.10.20.54:40004 fail, retrying...
source_resolve_host RDMA Device opened: kernel name mlx5_0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/mlx5_0, transport: (1) Infiniband
rdma_get_cm_event != EVENT_ESTABLISHED after rdma_connect: No such file or directory
RDMA ERROR: connecting to destination!
RDMA connect to 10.10.20.54:40004 fail, retrying...
source_resolve_host RDMA Device opened: kernel name mlx5_0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/mlx5_0, transport: (1) Infiniband

dell-3 $ dmesg
[13710.960715] kvm_dsm_init: Enable kvm dsm mode, this kvm instance will be node-1
[13710.960717] kvm_dsm_init: kvm_dsm_init: kvm 1 use RDMA connection
[13710.960834] rdma_bind_addr failed, ret -99
[13803.984311] kvm-dsm: node-1 stopping dsm server
[13803.984319] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[13804.033143] PGD 0 P4D 0
[13804.081451] Oops: 0002 [#1] SMP PTI
[13804.129920] CPU: 6 PID: 2229 Comm: qemu-vcpu/4 Not tainted 4.18.20-gvm1 #1
[13804.179828] Hardware name: Dell Inc. PowerEdge R530/0CN7X8, BIOS 2.7.1 01/26/2018
[13804.231376] RIP: 0010:wait_for_completion+0x8c/0x150
[13804.283866] Code: 00 48 c7 44 24 10 b0 45 eb 86 c7 04 24 01 00 00 00 49 89 54 24 18 48 bd ff ff ff ff ff ff ff 7f 48 89 4c 24 18 48 89 44 24 20 <48> 89 10 eb 05 48 85 ed 74 3d 65 48 8b 04 25 00 5c 01 00 48 89 df
[13804.400531] RSP: 0018:ffffa8164950bbc0 EFLAGS: 00010046
[13804.460394] RAX: 0000000000000000 RBX: ffff96a955673d00 RCX: ffff96a955673d08
[13804.522268] RDX: ffffa8164950bbd8 RSI: 0000000000000246 RDI: ffff96a955673d00
[13804.585257] RBP: 7fffffffffffffff R08: ffff969ec0a9cb00 R09: 0000000000000004
[13804.649471] R10: 0000000000000000 R11: 0000000000000001 R12: ffff96a955673cf8
[13804.714336] R13: ffff96a108db1540 R14: ffffa816494d19a0 R15: ffffa816494d19a0
[13804.779665] FS:  00007fbb8ddf9700(0000) GS:ffff96a15fcc0000(0000) knlGS:0000000000000000
[13804.846461] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13804.913393] CR2: 0000000000000000 CR3: 000000004780a002 CR4: 00000000001626e0
[13804.981141] Call Trace:
[13805.047998]  ? wake_up_q+0x70/0x70
[13805.114769]  kthread_stop+0x42/0xf0
[13805.180976]  kvm_dsm_free+0xf7/0x150 [kvm]
[13805.247716]  kvm_arch_destroy_vm+0x148/0x1a0 [kvm]
[13805.315126]  kvm_put_kvm+0x146/0x250 [kvm]
[13805.382700]  kvm_vm_release+0x1d/0x30 [kvm]
[13805.450205]  __fput+0xd8/0x210
[13805.518145]  task_work_run+0x8a/0xb0
[13805.586954]  do_exit+0x2e0/0xb30
[13805.656252]  ? get_futex_key+0x2ed/0x3d0
[13805.726392]  do_group_exit+0x3a/0xa0
[13805.796996]  get_signal+0x27a/0x5b0
[13805.867358]  do_signal+0x36/0x6d0
[13805.937525]  ? do_sigtimedwait+0xc6/0x230
[13806.008676]  exit_to_usermode_loop+0x89/0xf0
[13806.080775]  do_syscall_64+0xf3/0x110
[13806.153527]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[13806.226495] RIP: 0033:0x7fbb96324ad3
[13806.298945] Code: Bad RIP value.
[13806.371863] RSP: 002b:00007fbb8ddf8ac0 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[13806.447905] RAX: fffffffffffffe00 RBX: 0000557f16ffebf0 RCX: 00007fbb96324ad3
[13806.525252] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 0000557f16ffec18
[13806.603414] RBP: 0000557f16ffec14 R08: 0000000000000000 R09: 0000000000000000
[13806.682121] R10: 0000000000000000 R11: 0000000000000246 R12: 0000557f16ffec18
[13806.760447] R13: 0000000000000000 R14: 0000557f15016240 R15: 0000000000000008
[13806.837714] Modules linked in: ib_umad ib_ipoib rpcrdma sunrpc rdma_ucm intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif irqbypass intel_cstate ipmi_si mei_me ipmi_devintf dcdbas intel_rapl_perf input_leds lpc_ich mei ipmi_msghandler acpi_power_meter mac_hid sch_fq_codel ib_iser rdma_cm configfs iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd hid_generic cryptd mlx5_core usbhid mxm_wmi glue_helper ahci mlxfw hid tg3 megaraid_sas devlink
[13807.362582]  libahci wmi
[13807.454423] CR2: 0000000000000000
[13807.546754] ---[ end trace 2029a70eb51272bd ]---
[13807.685127] RIP: 0010:wait_for_completion+0x8c/0x150
[13807.778873] Code: 00 48 c7 44 24 10 b0 45 eb 86 c7 04 24 01 00 00 00 49 89 54 24 18 48 bd ff ff ff ff ff ff ff 7f 48 89 4c 24 18 48 89 44 24 20 <48> 89 10 eb 05 48 85 ed 74 3d 65 48 8b 04 25 00 5c 01 00 48 89 df
[13807.977468] RSP: 0018:ffffa8164950bbc0 EFLAGS: 00010046
[13808.076352] RAX: 0000000000000000 RBX: ffff96a955673d00 RCX: ffff96a955673d08
[13808.177519] RDX: ffffa8164950bbd8 RSI: 0000000000000246 RDI: ffff96a955673d00
[13808.279722] RBP: 7fffffffffffffff R08: ffff969ec0a9cb00 R09: 0000000000000004
[13808.383363] R10: 0000000000000000 R11: 0000000000000001 R12: ffff96a955673cf8
[13808.488500] R13: ffff96a108db1540 R14: ffffa816494d19a0 R15: ffffa816494d19a0
[13808.594944] FS:  00007fbb8ddf9700(0000) GS:ffff96a15fcc0000(0000) knlGS:0000000000000000
[13808.703752] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13808.812343] CR2: 00007fbb96324aa9 CR3: 000000004780a002 CR4: 00000000001626e0
[13808.920989] Fixing recursive fault but reboot is needed!

dell-4 $ dmesg
[13139.622137] kvm_dsm_init: Enable kvm dsm mode, this kvm instance will be node-0
[13139.622142] kvm_dsm_init: kvm_dsm_init: kvm 0 use RDMA connection
[13139.622249] rdma_bind_addr failed, ret -99
[13485.424318] kvm-dsm: node-0 stopping dsm server
[13485.424328] general protection fault: 0000 [#1] SMP PTI
[13485.490123] CPU: 9 PID: 15591 Comm: qemu-vcpu/1 Not tainted 4.18.20-gvm1 #1
[13485.558142] Hardware name: Dell Inc. PowerEdge R530/0CN7X8, BIOS 2.7.1 01/26/2018
[13485.628262] RIP: 0010:native_queued_spin_lock_slowpath+0x174/0x1c0
[13485.699911] Code: ff 0f 84 e6 fe ff ff e9 1c ff ff ff c1 ee 12 83 e0 03 83 ee 01 48 c1 e0 04 48 63 f6 48 05 00 39 02 00 48 03 04 f5 00 57 96 9d <48> 89 10 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 32
[13485.855008] RSP: 0018:ffffac1ee8f3bbb0 EFLAGS: 00010002
[13485.934340] RAX: 655f636970639864 RBX: ffff8f2134f5a5e0 RCX: 0000000000280000
[13486.014489] RDX: ffff8f217fd23900 RSI: 0000000000000d3b RDI: ffff8f2134f5a5e0
[13486.094801] RBP: ffff8f2134f5a5a0 R08: ffff8f19493d4b00 R09: 0000000000000004
[13486.175357] R10: ffff8f18ffb58000 R11: 0000000000000001 R12: ffff8f2134f5a5d8
[13486.257301] R13: ffff8f1956703300 R14: ffffac1ee8f219a0 R15: ffffac1ee8f219a0
[13486.339979] FS:  00007fefe3fff700(0000) GS:ffff8f217fd00000(0000) knlGS:0000000000000000
[13486.423255] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13486.505578] CR2: 00007fefe8b139e0 CR3: 0000000f19e0a005 CR4: 00000000001626e0
[13486.588065] Call Trace:
[13486.668682]  _raw_spin_lock_irq+0x24/0x27
[13486.748686]  wait_for_completion+0x32/0x150
[13486.828076]  kthread_stop+0x42/0xf0
[13486.906720]  kvm_dsm_free+0xf7/0x150 [kvm]
[13486.985207]  kvm_arch_destroy_vm+0x148/0x1a0 [kvm]
[13487.063958]  kvm_put_kvm+0x146/0x250 [kvm]
[13487.142681]  kvm_vm_release+0x1d/0x30 [kvm]
[13487.221281]  __fput+0xd8/0x210
[13487.299228]  task_work_run+0x8a/0xb0
[13487.376875]  do_exit+0x2e0/0xb30
[13487.454414]  ? get_futex_key+0x2ed/0x3d0
[13487.531897]  do_group_exit+0x3a/0xa0
[13487.608484]  get_signal+0x27a/0x5b0
[13487.685019]  do_signal+0x36/0x6d0
[13487.761178]  ? do_sigtimedwait+0xc6/0x230
[13487.836909]  exit_to_usermode_loop+0x89/0xf0
[13487.912082]  do_syscall_64+0xf3/0x110
[13487.986656]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[13488.061447] RIP: 0033:0x7fefeb028ad3
[13488.135665] Code: Bad RIP value.
[13488.209347] RSP: 002b:00007fefe3ffeac0 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[13488.284627] RAX: fffffffffffffe00 RBX: 000056402a720310 RCX: 00007fefeb028ad3
[13488.359882] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 000056402a720338
[13488.435598] RBP: 000056402a720334 R08: 0000000000000000 R09: 0000000000000000
[13488.511586] R10: 0000000000000000 R11: 0000000000000246 R12: 000056402a720338
[13488.587710] R13: 0000000000000000 R14: 0000564029e34240 R15: 0000000000000008
[13488.663808] Modules linked in: cpuid ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs rpcrdma sunrpc rdma_ucm ib_umad ib_ipoib intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif irqbypass intel_cstate mei_me dcdbas intel_rapl_perf input_leds lpc_ich mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter mac_hid sch_fq_codel ib_iser rdma_cm configfs iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc hid_generic aesni_intel usbhid aes_x86_64 crypto_simd cryptd mlx5_core mxm_wmi
[13489.195165]  glue_helper ahci tg3 hid mlxfw megaraid_sas devlink libahci wmi
[13489.289787] ---[ end trace ecf1afaf252a6817 ]---
[13489.430667] RIP: 0010:native_queued_spin_lock_slowpath+0x174/0x1c0
[13489.527375] Code: ff 0f 84 e6 fe ff ff e9 1c ff ff ff c1 ee 12 83 e0 03 83 ee 01 48 c1 e0 04 48 63 f6 48 05 00 39 02 00 48 03 04 f5 00 57 96 9d <48> 89 10 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 32
[13489.731112] RSP: 0018:ffffac1ee8f3bbb0 EFLAGS: 00010002
[13489.832850] RAX: 655f636970639864 RBX: ffff8f2134f5a5e0 RCX: 0000000000280000
[13489.935296] RDX: ffff8f217fd23900 RSI: 0000000000000d3b RDI: ffff8f2134f5a5e0
[13490.039156] RBP: ffff8f2134f5a5a0 R08: ffff8f19493d4b00 R09: 0000000000000004
[13490.144322] R10: ffff8f18ffb58000 R11: 0000000000000001 R12: ffff8f2134f5a5d8
[13490.250821] R13: ffff8f1956703300 R14: ffffac1ee8f219a0 R15: ffffac1ee8f219a0
[13490.358434] FS:  00007fefe3fff700(0000) GS:ffff8f217fd00000(0000) knlGS:0000000000000000
[13490.468331] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13490.579143] CR2: 00007fefeb028aa9 CR3: 0000000f19e0a005 CR4: 00000000001626e0
[13490.690580] Fixing recursive fault but reboot is needed!
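Both dmesg logs show `rdma_bind_addr failed, ret -99` immediately after kvm_dsm_init, and QEMU reports `could not rdma_bind_addr!` on both hosts. On Linux, errno 99 is EADDRNOTAVAIL ("Cannot assign requested address"), which generally means the IP being bound is not assigned to any local network interface. Note that dell-3 waits for connections on 10.10.20.54 while dell-4 waits on 10.10.20.53. Whether an address mismatch is the root cause is not confirmed by these logs. A quick way to check is to see which addresses from the `-i` list each host actually owns. The `check_local_addrs` function below is a hypothetical helper, not part of GiantVM or run_gvm.sh, and it assumes iproute2's `ip` command is installed.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of GiantVM): for each address in a
# space-separated list (same format as run_gvm.sh -i), report whether it is
# assigned to an interface on this host. Assumes iproute2's `ip` is installed.
check_local_addrs() {
    # $1 is intentionally unquoted so the list is split on whitespace.
    for addr in $1; do
        # `ip -o addr show` prints one line per address, e.g. "inet 10.10.20.53/24 ...".
        # -F: match a fixed string (dots are literal).
        # -w: do not match 10.10.20.5 inside 10.10.20.53.
        if ip -o addr show 2>/dev/null | grep -qwF "inet $addr"; then
            echo "$addr: local"
        else
            echo "$addr: not local"
        fi
    done
}

check_local_addrs "10.10.20.53 10.10.20.54"
```

On a correctly configured pair, each host should report exactly one of the two addresses as local. If a host reports neither, or its local address is not the one its QEMU instance waits on, that would be consistent with the EADDRNOTAVAIL failure above.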

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
