GPU-enabled CPU runs are slow #26

If I build `mnist.cu.cpp` with `-DGINN_ENABLE_GPU=0` and run it, a single epoch takes ~3.5s. If I build it with `-DGINN_ENABLE_GPU=1` and run it in the same Docker instance (no GPU, so it falls back to the CPU), a single epoch takes ~30s.
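
To make the two numbers directly comparable, the epoch can be timed the same way in both builds. Below is a minimal sketch of such a timer; `time_seconds` is a hypothetical helper, not part of ginn, and it assumes the epoch can be wrapped in a callable.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not part of ginn): runs `work` once and returns
// the elapsed wall-clock time in seconds, using a monotonic clock.
template <typename F>
double time_seconds(F&& work) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<F>(work)();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

Usage would look like `double s = time_seconds([&] { /* one training epoch */ });`, compiled once with `-DGINN_ENABLE_GPU=0` and once with `-DGINN_ENABLE_GPU=1`.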

Could be:
 - Optimizer flags are not set properly or are insufficient
   - However, I verified with a verbose build that the ptxas and compiler options are set to at least `-O3`. Is anything else missing?
 - nvcc is somehow doing a poor job
   - Maybe test with CUDA > 11.1
