
Only the first device result aligns with the host’s computation #4

@mengllm


@and-ivanov @benrothen

Hi,

Regardless of whether I use generated.bin or extracted.bin, and whether I use checksum_kernel or checksum_kernel_from_data, the device’s checksum result changes with every cuLaunchKernel call. Only the first device result matches the host’s computation. Is this behavior expected? If so, how should the device result be verified against the host?

I added a debug print to the code:
```cpp
// Timing loop with a debug print after every launch (PRIx32 needs <cinttypes>)
int warmup_repeats = warmup ? 10 : 0;
State device_result_temp;
for (int iters = 0; iters < warmup_repeats + repeats; iters++) {
    Time t1 = timer::now();
    CUDA_DRV_CHECK(cuLaunchKernel(checksum_kernel,
                                  /* grid size */ gridSize, 1, 1,
                                  /* block size */ blockSize, 1, 1,
                                  /* shared mem */ 0, /* stream */ nullptr, args, 0));
    CUDA_DRV_CHECK(cuCtxSynchronize());  // wait for the kernel to finish
    Time t2 = timer::now();
    if (iters >= warmup_repeats) {
        runtime += seconds(t2 - t1);  // only time the non-warmup iterations
    }
    // Copy the device state back after each launch and print its checksum field
    CUDA_DRV_CHECK(cuMemcpyDtoH(&device_result_temp, device_state, sizeof(State)));
    printf("execution %d: device_result_temp.c: %" PRIx32 " \n", iters, device_result_temp.c);
}
```

Logs:
```
execution 0: device_result_temp.c: 8442a62d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 1: device_result_temp.c: 8387062d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 2: device_result_temp.c: 203062d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 3: device_result_temp.c: 26ab062d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 4: device_result_temp.c: f224462d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 5: device_result_temp.c: 6a21462d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 6: device_result_temp.c: edec462d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 7: device_result_temp.c: 22efe62d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 8: device_result_temp.c: c0d4062d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 9: device_result_temp.c: 81fa862d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 10: device_result_temp.c: 7129e62d
checksum Runtime: 1.1418 s
result DtoH took 2.1416e-05 s size:104
GPU clocks: 1609585358
Optimal clocks 684800000
Observed 43 % of peak performance
Computing checksum on host... 100%
Verification on host took 376.599 s
verification FAILED! dev: 7129e62d host: 8442a62d
```

Meanwhile, regardless of how many times the host computes the checksum, the result remains unchanged:
```
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
```
