@and-ivanov @benrothen
Hi,
Regardless of whether I use generated.bin or extracted.bin, and whether I use checksum_kernel or checksum_kernel_from_data, the device's checksum result changes with every call to cuLaunchKernel. Only the first device result matches the host's computation. Is this behavior expected? If so, how should the device result be verified against the host?
I added a print statement to the launch loop:
int warmup_repeats = warmup ? 10 : 0;
State device_result_temp;
for (int iters = 0; iters < warmup_repeats + repeats; iters++) {
    Time t1 = timer::now();
    CUDA_DRV_CHECK(cuLaunchKernel(checksum_kernel,
        /* grid size */ gridSize, 1, 1, /* block size */ blockSize, 1, 1,
        /* shared mem */ 0, /* stream */ nullptr, args, 0));
    CUDA_DRV_CHECK(cuCtxSynchronize()); // wait for the kernel to finish
    Time t2 = timer::now();
    if (iters >= warmup_repeats) {
        runtime += seconds(t2 - t1); // only timed iterations count toward the runtime
    }
    // Added for debugging: copy the state back after every launch and print its checksum field.
    CUDA_DRV_CHECK(cuMemcpyDtoH(&device_result_temp, device_state, sizeof(State)));
    printf("execution %d: device_result_temp.c: %" PRIx32 " \n", iters, device_result_temp.c);
}
Logs:
execution 0: device_result_temp.c: 8442a62d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 1: device_result_temp.c: 8387062d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 2: device_result_temp.c: 203062d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 3: device_result_temp.c: 26ab062d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 4: device_result_temp.c: f224462d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 5: device_result_temp.c: 6a21462d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 6: device_result_temp.c: edec462d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 7: device_result_temp.c: 22efe62d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 8: device_result_temp.c: c0d4062d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 9: device_result_temp.c: 81fa862d
base address 0x7f0d9c000000
Data pointer alignment is good!
execution 10: device_result_temp.c: 7129e62d
checksum Runtime: 1.1418 s
result DtoH took 2.1416e-05 s size:104
GPU clocks: 1609585358
Optimal clocks 684800000
Observed 43 % of peak performance
Computing checksum on host... 100%
Verification on host took 376.599 s
verification FAILED! dev: 7129e62d host: 8442a62d
Meanwhile, no matter how many times the host computes the checksum, its result stays the same:
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
Computing checksum on host... 100%
host verification result: 8442a62d
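One guess on my side, which I have not confirmed from the code: if device_state is initialized once before the loop and the kernel updates it in place, each launch would start from the state left by the previous launch, and only the first launch would start from the same initial state as the host. The host-only sketch below reproduces that pattern with a toy checksum. The State struct, toy_checksum_launch, and run_launches are hypothetical stand-ins for illustration, not the repository's actual types or kernels.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the benchmark's State; only the printed field is modeled.
struct State {
    uint32_t c;
};

// Toy stand-in for one kernel launch: folds the data into the state in place,
// starting from whatever value the state already holds.
void toy_checksum_launch(State& state, const std::vector<uint32_t>& data) {
    for (uint32_t word : data) {
        state.c = state.c * 31u + word;
    }
}

// Runs `launches` launches over the same data and records the result of each.
// With reset_each_launch == false the state carries over between launches
// (the assumed situation); with true it is restored to `initial` before each one.
std::vector<uint32_t> run_launches(const std::vector<uint32_t>& data,
                                   State initial,
                                   int launches,
                                   bool reset_each_launch) {
    std::vector<uint32_t> results;
    State state = initial;
    for (int i = 0; i < launches; ++i) {
        if (reset_each_launch) {
            state = initial;
        }
        toy_checksum_launch(state, data);
        results.push_back(state.c);
    }
    return results;
}
```

Calling run_launches(data, State{0}, 3, false) gives a different value for every launch, while passing true gives the same value each time. If the assumption holds for the real benchmark, the equivalent would be to copy the initial State back to device_state (for example with cuMemcpyHtoD) before each cuLaunchKernel, outside the timed region, or to compare only the result of the first launch with the host.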