In my environments, an Intel Core i7-5930K + GeForce 980 Ti + GeForce 980 machine on Windows 8.1 is 4 times slower than an Intel Core i7-5820K + GeForce Titan X x2 machine on Ubuntu Linux 16.04 LTS.
According to the following link, this is caused by CUDA kernel dispatch latency under WDDM (the Windows Display Driver Model):
http://stackoverflow.com/questions/19944429/cuda-performance-penalty-when-running-in-windows
Add code to explicitly flush queued CUDA kernels, so that launches are submitted to the GPU promptly instead of waiting in the WDDM command batch.
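
For illustration, here is a minimal sketch of one commonly cited way to flush: calling `cudaStreamQuery` right after a kernel launch. It is non-blocking and, under WDDM, is generally reported to make the driver submit its queued commands. The kernel `scale` and the helpers `flush_status_ok` and `flush_stream` are hypothetical names invented for this example, not necessarily what this change uses.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel; stands in for whatever kernels the project launches.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: cudaSuccess (stream idle) and cudaErrorNotReady
// (work still running) are both normal outcomes of a flush attempt.
static bool flush_status_ok(cudaError_t status) {
    return status == cudaSuccess || status == cudaErrorNotReady;
}

// Hypothetical helper: query the stream without blocking. The query itself
// is the point; it is assumed to prompt the driver to submit batched work.
static bool flush_stream(cudaStream_t stream) {
    return flush_status_ok(cudaStreamQuery(stream));
}

int main() {
    const int n = 1 << 20;
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    for (int iter = 0; iter < 100; ++iter) {
        scale<<<blocks, threads>>>(d_data, 1.0f, n);
        // Flush after each launch so the kernel does not sit in the batch.
        if (!flush_stream(0)) {
            std::fprintf(stderr, "flush reported a CUDA error\n");
            break;
        }
    }

    cudaDeviceSynchronize();  // wait for all queued kernels to finish
    cudaFree(d_data);
    return 0;
}
```

Flushing after every launch is the simplest placement. Flushing once per group of launches may be enough in practice, and on Linux or with the TCC driver the extra call should be close to a no-op.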