## To-Do's:

* [ ] Create a `GpuArray` class mirroring `Array`
* [ ] Integrate with GPU backends:
  * [ ] CUDA (for NVIDIA cards)
  * [ ] OpenCL / ROCm (for AMD)
  * [ ] (Optional later) Metal for Apple M-series
* [ ] GPU memory management abstraction
* [x] Port CPU ops to GPU kernels:
  * [x] Elementwise ops
  * [x] Reductions (sum, mean, etc.)
  * [x] Matrix multiplication & dot products
* [ ] Auto-select backend (CPU vs GPU) or allow manual selection
* [ ] Async GPU execution (streams, queues)
* [ ] GPU-CUDA kernel loader system
* [ ] Performance benchmarking against CuPy / PyTorch / NumPy
* [ ] GPU unit test framework
* [ ] GPU error handling and safe fallbacks
* [ ] Support for hybrid ops (GPU-to-CPU and vice versa)
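
The backend-selection and safe-fallback items could start from something like the sketch below. It is only an illustration, not the project's design: `register_backend`, `is_available`, `select_backend`, and the backend names `"cuda"`, `"opencl"`, and `"cpu"` are all hypothetical, and the availability probes are placeholders for real device checks.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry: backend name -> probe that returns True when usable.
_BACKEND_PROBES: Dict[str, Callable[[], bool]] = {}

# Assumed preference order for automatic selection; CPU is the last resort.
_AUTO_ORDER = ["cuda", "opencl", "cpu"]


def register_backend(name: str, probe: Callable[[], bool]) -> None:
    """Register (or replace) the availability probe for a backend."""
    _BACKEND_PROBES[name] = probe


def is_available(name: str) -> bool:
    """Return True if the backend is registered and its probe succeeds."""
    probe = _BACKEND_PROBES.get(name)
    if probe is None:
        return False
    try:
        return bool(probe())
    except Exception:
        # A probe that raises (missing driver, no device) counts as unavailable.
        return False


def select_backend(requested: Optional[str] = None) -> str:
    """Pick a backend: honor a manual request, otherwise auto-select."""
    if requested is not None:
        # Manual selection fails loudly instead of silently switching devices.
        if not is_available(requested):
            raise RuntimeError(f"backend {requested!r} is not available")
        return requested
    for name in _AUTO_ORDER:
        if is_available(name):
            return name
    return "cpu"  # safe fallback when nothing else is usable


# The CPU backend is always usable in this sketch.
register_backend("cpu", lambda: True)
```

With only the CPU probe registered, `select_backend()` returns `"cpu"`, while `select_backend("cuda")` raises a `RuntimeError`. Raising on an explicit request but falling back during auto-selection is one possible policy; the project may prefer a warning plus fallback in both cases.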