From 159d84788d9e2106271d69760bcdb0880c5aa0a7 Mon Sep 17 00:00:00 2001 From: Shahar Sandhaus Date: Mon, 8 Sep 2025 23:03:41 -0700 Subject: [PATCH 1/8] More docs updates --- TODOs.txt | 5 +- docs/getting_started.rst | 86 +++++++++---------- docs/index.rst | 21 +++-- docs/public_api/buffer_tutorial.rst | 35 -------- docs/public_api/index.rst | 13 --- docs/tutorials/buffer_tutorial.rst | 70 +++++++++++++++ docs/tutorials/building_from_source.rst | 83 ++++++++++++++++++ .../context_system.rst | 20 ++--- docs/tutorials/index.rst | 12 +++ examples/first_buffer.py | 14 --- test.py | 4 +- 11 files changed, 238 insertions(+), 125 deletions(-) delete mode 100644 docs/public_api/buffer_tutorial.rst delete mode 100644 docs/public_api/index.rst create mode 100644 docs/tutorials/buffer_tutorial.rst create mode 100644 docs/tutorials/building_from_source.rst rename docs/{public_api => tutorials}/context_system.rst (79%) create mode 100644 docs/tutorials/index.rst delete mode 100644 examples/first_buffer.py diff --git a/TODOs.txt b/TODOs.txt index ea8d0fd1..18486845 100644 --- a/TODOs.txt +++ b/TODOs.txt @@ -2,4 +2,7 @@ This file is to keep track of all the features that I hope to implement eventual - Implement dynamic local size through specialization constants - proper device ordering to match CUDA - - CUDA/Vulkan memory bridging \ No newline at end of file + - CUDA/Vulkan memory bridging + +# offline webUI +HF_HUB_OFFLINE=1 open-webui serve \ No newline at end of file diff --git a/docs/getting_started.rst b/docs/getting_started.rst index 56b59ca8..011a4a99 100644 --- a/docs/getting_started.rst +++ b/docs/getting_started.rst @@ -1,4 +1,4 @@ -Getting Started with vkdispatch +Getting Started/Installation =============================== Welcome to vkdispatch! This guide will help you install the library and run your first GPU-accelerated code. @@ -7,22 +7,27 @@ Welcome to vkdispatch! 
This guide will help you install the library and run your vkdispatch requires a Vulkan-compatible GPU and drivers installed on your system. Please ensure your system meets these requirements before proceeding. -Installation ------------- +PyPI +---- -You can install `vkdispatch` directly from PyPI using `pip`. We recommend -using a `virtual environment`_ for your projects. +The default installation method for `vkdispatch` is through PyPI (pip): .. code-block:: bash - # Create a virtual environment (optional, but recommended) - python -m venv venv - source venv/bin/activate # On Windows, use `venv\Scripts\activate` - - # Install vkdispatch + # Install the package pip install vkdispatch -.. _virtual environment: https://docs.python.org/3/library/venv.html +On mainstream platforms — Windows (x86_64), macOS (x86_64 and Apple Silicon/arm64), +and Linux (x86_64) — pip will download a **prebuilt wheel** (built with `cibuildwheel` +on GitHub Actions and tagged as *manylinux* where applicable), so no compiler is needed. + +On less common platforms (e.g., non-Apple ARM or other niche architectures), pip may +fall back to a **source build**, which takes a few minutes. See :doc:`Building From Source <tutorials/building_from_source>` +for toolchain requirements and developer-oriented instructions. + +.. note:: + If you see output like ``Building wheel for vkdispatch (pyproject.toml)``, + you’re compiling from source. Verifying Your Installation --------------------------- @@ -31,56 +36,47 @@ To ensure `vkdispatch` is installed correctly and can detect your GPU, run this simple Python script: .. code-block:: bash + # Run the example script to verify installation vdlist # If the above command fails, you can try this alternative python3 -m vkdispatch -If the installation was successful, you should see output listing your GPU(s). 
- -Your First GPU Buffer ---------------------- +If the installation was successful, you should see output listing your GPU(s) which may look something like this: -Let's create a simple GPU buffer and fill it with data. +.. code-block:: text -.. literalinclude:: ../examples/first_buffer.py - :language: python - :linenos: - :caption: examples/first_buffer.py + Device 0: Apple M2 Pro + Vulkan Version: 1.2.283 + Device Type: Integrated GPU -.. raw:: html + Features: + Float32 Atomic Add: True -.. code-block:: text + Properties: + 64-bit Float Support: False + 16-bit Float Support: True + 64-bit Int Support: True + 16-bit Int Support: True + Max Push Constant Size: 4096 bytes + Subgroup Size: 32 + Max Compute Shared Memory Size: 32768 - # Expected Output: - # Original CPU data: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15] - # Data downloaded from GPU: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15] - + Queues: + 0 (count=1, flags=0x7): Graphics | Compute + 1 (count=1, flags=0x7): Graphics | Compute + 2 (count=1, flags=0x7): Graphics | Compute + 3 (count=1, flags=0x7): Graphics | Compute -.. admonition:: What's happening here? - :class: tip - 1. We import `vkdispatch` and `numpy` (a common dependency for numerical data). - 2. A `BufferBuilder` is used to define the characteristics of our GPU buffer (size, usage). - 3. `buffer.upload()` transfers data from your CPU's memory to the GPU. - 4. `buffer.download()` retrieves data back from the GPU to the CPU. - 5. Error checking is crucial in GPU programming, so `check_for_errors()` ensures operations completed successfully. Next Steps ---------- -Now that you've got `vkdispatch` up and running, consider exploring: - -* **Public API Reference:** Our curated guide to the most commonly used classes and functions. -* **Full Python API Reference:** A comprehensive list of all Python-facing components. -* **C++/Cython API Reference:** Dive deep into the backend details. - -Happy GPU programming! 
+Now that you've got `vkdispatch` up and running, consider exploring the following: -.. seealso:: +* :doc:`Tutorials`: Our curated guide to the most commonly used classes and functions. +* :doc:`Full Python API Reference`: A comprehensive list of all Python-facing components. - :doc:`public_api/index` - Start here for a guided tour of core features. - :doc:`buffer_management` - Detailed information on working with GPU buffers. \ No newline at end of file +Happy GPU programming! \ No newline at end of file diff --git a/docs/index.rst b/docs/index.rst index 7b17085a..13302d57 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -6,20 +6,29 @@ Welcome to vkdispatch's documentation! ====================================== +**[WARNING: This site is still under heavy development, and has many missing sections]** + +Welcome to the vkdispatch documentation website! + +To learn how to install vkdispatch, go to the :doc:`Getting Started` Section. + +Additionally, below are a set of tutorials on vkdispatch usage and a full API reference. + .. toctree:: - :maxdepth: 2 - :caption: User Guide: + :hidden: getting_started - public_api/index .. toctree:: :maxdepth: 2 - :caption: Full Internal Documentation: - internal_api + Tutorials + +.. toctree:: + :maxdepth: 2 + + Full Internal Documentation -.. include:: ../README.md Indices and tables diff --git a/docs/public_api/buffer_tutorial.rst b/docs/public_api/buffer_tutorial.rst deleted file mode 100644 index 2ec10716..00000000 --- a/docs/public_api/buffer_tutorial.rst +++ /dev/null @@ -1,35 +0,0 @@ -Buffer Tutorial -================= - -The Buffer system is the heart of vkdispatch. All GPU memory operations -go through Buffer objects. - -.. note:: - Always use BufferBuilder to create buffers - direct Buffer construction - is not supported. - -Buffer Class ------------- - -.. 
autoclass:: vkdispatch.Buffer - :members: __init__, _destroy, write, read - :show-inheritance: - - **Location:** vkdispatch.base.Buffer - - **Example Usage:** - - .. code-block:: python - - buffer = vd.Buffer((1000000,), vd.float32) - buffer.write(my_data) - result = buffer.read() - - - -Buffer Builder -------------- - -.. autoclass:: vkdispatch.BufferBuilder - :members: - :show-inheritance: \ No newline at end of file diff --git a/docs/public_api/index.rst b/docs/public_api/index.rst deleted file mode 100644 index 96392574..00000000 --- a/docs/public_api/index.rst +++ /dev/null @@ -1,13 +0,0 @@ -Public API Reference -==================== - -The essential API for most users. - -.. toctree:: - :maxdepth: 2 - - context_system - buffer_tutorial -.. compute_operations - shader_system - utilities \ No newline at end of file diff --git a/docs/tutorials/buffer_tutorial.rst b/docs/tutorials/buffer_tutorial.rst new file mode 100644 index 00000000..6dec2a24 --- /dev/null +++ b/docs/tutorials/buffer_tutorial.rst @@ -0,0 +1,70 @@ +Buffer Creation and Usage +========================= + +The Buffer system is the heart of vkdispatch. All GPU memory operations +go through Buffer objects. + +.. note:: + Always use BufferBuilder to create buffers - direct Buffer construction + is not supported. + +Buffer Class +------------ + +.. autoclass:: vkdispatch.Buffer + :members: __init__, _destroy, write, read + :show-inheritance: + + **Location:** vkdispatch.base.Buffer + + **Example Usage:** + + .. code-block:: python + + buffer = vd.Buffer((1000000,), vd.float32) + buffer.write(my_data) + result = buffer.read() + + +Your First GPU Buffer +--------------------- + + + +.. 
code-block:: python + + import vkdispatch as vd + import numpy as np + + # Create a simple numpy array + cpu_data = np.arange(16, dtype=np.int32) + print(f"Original CPU data: {cpu_data}") + + # Create a GPU buffer + gpu_buffer = vd.asbuffer(cpu_data) + + # Read data back from GPU to CPU to verify + downloaded_data = gpu_buffer.read(0) + print(f"Data downloaded from GPU: {downloaded_data.flatten()}") + + # Expected Output: + # Original CPU data: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15] + # Data downloaded from GPU: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15] + + +.. admonition:: What's happening here? + :class: tip + + 1. We import `vkdispatch` and `numpy` (a common dependency for numerical data). + 2. `vd.asbuffer()` creates a GPU buffer whose shape and dtype match the numpy array. + 3. Creating the buffer from an existing array also uploads its contents from CPU memory to the GPU. + 4. `gpu_buffer.read(0)` downloads the data back from the GPU to the CPU. + 5. Comparing the printed arrays confirms the data survived the round trip intact. + + +Buffer Builder +-------------- + +.. autoclass:: vkdispatch.BufferBuilder + :members: + :show-inheritance: \ No newline at end of file diff --git a/docs/tutorials/building_from_source.rst b/docs/tutorials/building_from_source.rst new file mode 100644 index 00000000..b1b561e7 --- /dev/null +++ b/docs/tutorials/building_from_source.rst @@ -0,0 +1,83 @@ +Building from Source +==================== + +This page is for contributors and power users who want to **clone the repository and +modify vkdispatch**, or for platforms where a prebuilt wheel is not available and pip falls +back to a source build. + +Who should use this? +-------------------- +- You plan to edit vkdispatch and need an **editable/development install**. +- You’re on a **non-standard architecture** (e.g., non-Apple ARM/aarch64 or niche OS), + where pip cannot find a prebuilt wheel. +- You want to **rebuild a wheel locally** for testing or distribution. 
+ +Prerequisites +------------- +Most builds succeed with just a modern compiler and Python. For clarity: + +- **Compilation Requirements**: + - A **C++17-capable compiler** + - Linux: GCC ≥ 9 or Clang ≥ 10 + - macOS: Xcode Command Line Tools (``xcode-select --install``) + - Windows: Microsoft C++ Build Tools or Visual Studio 2019+ (x64) + - **Python development headers** + + .. code-block:: bash + + # Debian/Ubuntu + sudo apt-get update + sudo apt-get install -y build-essential python3-dev + + # Fedora/RHEL + sudo dnf groupinstall -y "Development Tools" + sudo dnf install -y python3-devel + +Quick start (clone → editable install) +-------------------------------------- +Use an editable install to iterate on the code without reinstalling each change. + +.. code-block:: bash + + # 1) Clone your fork or the upstream repo (replace with your URL) + git clone https://github.com/sharhar/vkdispatch.git + cd vkdispatch + + # 2) Download the source code of dependencies + python fetch_dependencies.py + + # 3) Create/activate a clean environment (recommended) + python -m venv .venv && . .venv/bin/activate # on macOS/Linux + # .venv\Scripts\activate # on Windows (PowerShell/CMD) + + # 4) Install in editable mode + pip install -e . + +Build a wheel locally (optional) +-------------------------------- +If you prefer a built artifact (e.g., CI, packaging, testing import behavior): + +.. code-block:: bash + + # Build a wheel into ./dist + pip wheel . -w dist + + # Or using the 'build' frontend (creates sdist + wheel under ./dist) + python -m build + +Troubleshooting +--------------- +- **error: Python.h: No such file or directory** + Install your distro’s Python headers (``python3-dev`` / ``python3-devel``). + +- **error: Missing header** + Fetch the source dependencies by calling ``python3 fetch_dependencies.py``. + +Clean rebuild tips +------------------ +.. 
code-block:: bash + + # Remove previous builds/artifacts and reinstall verbosely + pip uninstall -y vkdispatch + rm -rf build/ dist/ *.egg-info + pip install -e . -v diff --git a/docs/public_api/context_system.rst b/docs/tutorials/context_system.rst similarity index 79% rename from docs/public_api/context_system.rst rename to docs/tutorials/context_system.rst index 248c3416..59f57be3 100644 --- a/docs/public_api/context_system.rst +++ b/docs/tutorials/context_system.rst @@ -23,19 +23,19 @@ to a vkdispatch API, a few examples are provided below: .. code-block:: python - import vkdispatch as vd + import vkdispatch as vd - # Enables debug mode, which allows for printing from shaders - vd.initialize(debug_mode=True) + # Enables debug mode, which allows for printing from shaders + vd.initialize(debug_mode=True) - # Sets the environment variable `VK_LOADER_DEBUG` to 'all'. - # This enables debug log outputs for the vulkan loader, which - # can be useful for debugging driver loading issues - vd.initialize(loader_debug_logs=True) + # Sets the environment variable `VK_LOADER_DEBUG` to 'all'. + # This enables debug log outputs for the vulkan loader, which + # can be useful for debugging driver loading issues + vd.initialize(loader_debug_logs=True) - # Sets the default logging level to INFO, which enables detailed printouts - # of internal vkdispatch operations, useful for debugging internal issues. - vd.initialize(log_level=vd.LogLevel.INFO) + # Sets the default logging level to INFO, which enables detailed printouts + # of internal vkdispatch operations, useful for debugging internal issues. + vd.initialize(log_level=vd.LogLevel.INFO) .. note:: The debug_mode flag enables the `VK_EXT_debug_utils `_ Vulkan extension and signals the creation of a `VkDebugUtilsMessengerEXT `_ object. This allows for printing from shaders, but also significantly reduces performance by introducing runtime debugging tools. 
Therefore, it is recommended this option remain off unless needed for in shader debugging. diff --git a/docs/tutorials/index.rst b/docs/tutorials/index.rst new file mode 100644 index 00000000..393b7523 --- /dev/null +++ b/docs/tutorials/index.rst @@ -0,0 +1,12 @@ +Vkdispatch tutorials +==================== + +A collection of tutorials covering how to use and modify the vkdispatch library. + +.. toctree:: + :maxdepth: 2 + + context_system + buffer_tutorial + + building_from_source \ No newline at end of file diff --git a/examples/first_buffer.py b/examples/first_buffer.py deleted file mode 100644 index f19d8919..00000000 --- a/examples/first_buffer.py +++ /dev/null @@ -1,14 +0,0 @@ -import vkdispatch as vd -import numpy as np - -# Create a simple numpy array -cpu_data = np.arange(16, dtype=np.int32) -print(f"Original CPU data: {cpu_data}") - -# Create a GPU buffer -gpu_buffer = vd.asbuffer(cpu_data) - -# Read data back from GPU to CPU to verify -downloaded_data = gpu_buffer.read(0) -print(f"Data downloaded from GPU: {downloaded_data.flatten()}") - diff --git a/test.py b/test.py index 8a7997d5..1ceada87 100644 --- a/test.py +++ b/test.py @@ -187,4 +187,6 @@ def do_numpy_convolution(buffer: np.ndarray, kernel: np.ndarray, signal_shape) - assert np.allclose(buff.read(0), numpy_result, atol=1e-5) #vd.fft.fft(buff, axis=0, print_shader=True) -#vd.vkfft.fft(buff, axis=0, print_shader=True) \ No newline at end of file +#vd.vkfft.fft(buff, axis=0, print_shader=True) + + From c83cf284923ea916e8b6dc692d8d0066841a63f2 Mon Sep 17 00:00:00 2001 From: Shahar Sandhaus Date: Mon, 8 Sep 2025 23:07:04 -0700 Subject: [PATCH 2/8] Adding workflow --- .github/workflows/deploy_docs.yml | 65 +++++++++++++++++++++++++++++++ 1 file changed, 65 insertions(+) create mode 100644 .github/workflows/deploy_docs.yml diff --git a/.github/workflows/deploy_docs.yml b/.github/workflows/deploy_docs.yml new file mode 100644 index 00000000..3b62cb20 --- /dev/null +++ b/.github/workflows/deploy_docs.yml @@ 
-0,0 +1,65 @@ +name: Deploy Sphinx docs to GitHub Pages + +on: + push: + branches: [ docs ] + paths: + - 'docs/**' + - '**.py' + - 'pyproject.toml' + - 'setup.py' + - 'README*' + workflow_dispatch: {} # Manual “Run workflow” button + +permissions: + contents: read + pages: write + id-token: write + +concurrency: + group: 'pages' + cancel-in-progress: true + +jobs: + build: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - uses: actions/setup-python@v5 + with: + python-version: '3.x' + cache: 'pip' + cache-dependency-path: 'docs/requirements.txt' + + - name: Install build dependencies + run: | + python -m pip install --upgrade pip + if [ -f docs/requirements.txt ]; then pip install -r docs/requirements.txt; fi + # If your docs import your package (autodoc), uncomment: + # pip install -e . + + - name: Build Sphinx HTML + working-directory: docs + run: | + make clean html + + - name: Add .nojekyll (avoid Jekyll processing on GitHub Pages) + run: | + touch docs/_build/html/.nojekyll + + - name: Upload Pages artifact + uses: actions/upload-pages-artifact@v3 + with: + path: docs/_build/html + + deploy: + needs: build + runs-on: ubuntu-latest + environment: + name: github-pages + url: ${{ steps.deployment.outputs.page_url }} + steps: + - name: Deploy to GitHub Pages + id: deployment + uses: actions/deploy-pages@v4 From 5293cf1ed0de87b97a704cbda28829cd26663cd3 Mon Sep 17 00:00:00 2001 From: Shahar Sandhaus Date: Mon, 8 Sep 2025 23:09:22 -0700 Subject: [PATCH 3/8] Adding docs reqs --- docs/requirements.txt | 1 + 1 file changed, 1 insertion(+) create mode 100644 docs/requirements.txt diff --git a/docs/requirements.txt b/docs/requirements.txt new file mode 100644 index 00000000..c3f492b7 --- /dev/null +++ b/docs/requirements.txt @@ -0,0 +1 @@ +sphinx>=7 \ No newline at end of file From c89e4871b3c2b0cbc34e5efe91bdc0ff7e4c5181 Mon Sep 17 00:00:00 2001 From: Shahar Sandhaus Date: Mon, 8 Sep 2025 23:13:20 -0700 Subject: [PATCH 4/8] Installing docs reqs --- 
.github/workflows/deploy_docs.yml | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/.github/workflows/deploy_docs.yml b/.github/workflows/deploy_docs.yml index 3b62cb20..07692c0f 100644 --- a/.github/workflows/deploy_docs.yml +++ b/.github/workflows/deploy_docs.yml @@ -34,10 +34,21 @@ jobs: - name: Install build dependencies run: | + sudo apt-get update + # Install system packages required for exhale/breathe + sudo apt-get install -y doxygen graphviz + python -m pip install --upgrade pip - if [ -f docs/requirements.txt ]; then pip install -r docs/requirements.txt; fi - # If your docs import your package (autodoc), uncomment: - # pip install -e . + # Always install sphinx and required extensions + python -m pip install \ + "sphinx>=7,<9" \ + sphinx-rtd-theme \ + myst-parser \ + breathe \ + exhale + + # If your docs import your package (autodoc), also install it: + # python -m pip install -e . - name: Build Sphinx HTML working-directory: docs From 3528fe08e07ce317c6b6e773bf6edd96d7cdf8b4 Mon Sep 17 00:00:00 2001 From: Shahar Sandhaus Date: Mon, 8 Sep 2025 23:14:52 -0700 Subject: [PATCH 5/8] removed file --- docs/requirements.txt | 1 - 1 file changed, 1 deletion(-) delete mode 100644 docs/requirements.txt diff --git a/docs/requirements.txt b/docs/requirements.txt deleted file mode 100644 index c3f492b7..00000000 --- a/docs/requirements.txt +++ /dev/null @@ -1 +0,0 @@ -sphinx>=7 \ No newline at end of file From e285c03186778d939a059c3985d8ddd296820095 Mon Sep 17 00:00:00 2001 From: Shahar Sandhaus Date: Mon, 8 Sep 2025 23:20:23 -0700 Subject: [PATCH 6/8] Removing doxygen for now --- .github/workflows/deploy_docs.yml | 5 +- TODOs.txt | 8 -- docs/conf.py | 40 +++---- docs/cpp_api.rst | 6 +- test.py | 192 ------------------------------ 5 files changed, 24 insertions(+), 227 deletions(-) delete mode 100644 TODOs.txt delete mode 100644 test.py diff --git a/.github/workflows/deploy_docs.yml b/.github/workflows/deploy_docs.yml index 
07692c0f..dccde4a0 100644 --- a/.github/workflows/deploy_docs.yml +++ b/.github/workflows/deploy_docs.yml @@ -42,10 +42,7 @@ jobs: # Always install sphinx and required extensions python -m pip install \ "sphinx>=7,<9" \ - sphinx-rtd-theme \ - myst-parser \ - breathe \ - exhale + sphinx-rtd-theme # If your docs import your package (autodoc), also install it: # python -m pip install -e . diff --git a/TODOs.txt b/TODOs.txt deleted file mode 100644 index 18486845..00000000 --- a/TODOs.txt +++ /dev/null @@ -1,8 +0,0 @@ -This file is to keep track of all the features that I hope to implement eventually: - - - Implement dynamic local size through specialization constants - - proper device ordering to match CUDA - - CUDA/Vulkan memory bridging - -# offline webUI -HF_HUB_OFFLINE=1 open-webui serve \ No newline at end of file diff --git a/docs/conf.py b/docs/conf.py index c80ad607..0bff39f5 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -21,31 +21,31 @@ extensions = [ 'sphinx.ext.autodoc', # For Python docstrings 'sphinx.ext.napoleon', # To support Google/Numpy style docstrings - 'breathe', # The Doxygen-Sphinx bridge - 'exhale', # The automation layer for Breathe - 'myst_parser', # For Markdown support +# 'breathe', # The Doxygen-Sphinx bridge +# 'exhale', # The automation layer for Breathe +# 'myst_parser', # For Markdown support ] -breathe_projects = { - "vkdispatch": "../build/doxyoutput/xml" # Path to Doxygen's XML output -} -breathe_default_project = "vkdispatch" - -exhale_args = { - # These arguments are required - "containmentFolder": "./api", - "rootFileName": "library_root.rst", - "doxygenStripFromPath": "..", - # Suggested optional arguments - "createTreeView": True, - "exhaleExecutesDoxygen": True, - "exhaleDoxygenStdin": "INPUT = ../vkdispatch_native2", # Path to your C++ source -} +# breathe_projects = { +# "vkdispatch": "../build/doxyoutput/xml" # Path to Doxygen's XML output +# } +#breathe_default_project = "vkdispatch" + +# exhale_args = { +# # These 
arguments are required +# "containmentFolder": "./api", +# "rootFileName": "library_root.rst", +# "doxygenStripFromPath": "..", +# # Suggested optional arguments +# "createTreeView": True, +# "exhaleExecutesDoxygen": True, +# "exhaleDoxygenStdin": "INPUT = ../vkdispatch_native2", # Path to your C++ source +# } # Tell sphinx what the primary language being documented is. -primary_domain = 'cpp' +# primary_domain = 'cpp' # Tell sphinx what the pygments highlight language should be. -highlight_language = 'cpp' +# highlight_language = 'cpp' templates_path = ['_templates'] exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] diff --git a/docs/cpp_api.rst b/docs/cpp_api.rst index 3721290d..eceb40c5 100644 --- a/docs/cpp_api.rst +++ b/docs/cpp_api.rst @@ -1,7 +1,7 @@ C++/Cython API Reference ======================== -.. toctree:: - :maxdepth: 3 +.. .. toctree +.. :maxdepth: 3 - api/library_root \ No newline at end of file +.. api/library_root \ No newline at end of file diff --git a/test.py b/test.py deleted file mode 100644 index 1ceada87..00000000 --- a/test.py +++ /dev/null @@ -1,192 +0,0 @@ -import vkdispatch as vd -import vkdispatch.codegen as vc -import tqdm -import numpy as np -from matplotlib import pyplot as plt - - -def transpose_kernel( - correlation_buffer: vd.Buffer, - image_dft_buffer: vd.Buffer, - image_dft_buffer_transposed: vd.Buffer): - - @vd.map_registers([vc.c64]) - def kernel_mapping( - kernel_buffer: vc.Buffer[vc.c64], - kernel_transposed_buffer: vc.Buffer[vc.c64]): - - read_register = vc.mapping_registers()[1] - - # We skip batches other than the first one, since we only have one kernel - vc.if_statement( - vc.mapping_index() >= correlation_buffer.shape[1] * correlation_buffer.shape[2] - ) - vc.return_statement() - vc.end() - - # Calculate the invocation within this FFT batch - in_group_index = vc.local_invocation().y * vc.workgroup_size().x + vc.local_invocation().x - out_group_index = vc.workgroup().y * vc.num_workgroups().x + vc.workgroup().x 
- workgroup_index = in_group_index + out_group_index * ( - vc.workgroup_size().x * vc.workgroup_size().y - ) - - # Calculate the batch index of the FFT - batch_index = vc.mapping_index() / ( - vc.workgroup_size().x * vc.workgroup_size().y * - vc.num_workgroups().x * vc.num_workgroups().y - ) - - # Calculate the transposed index - transposed_index = workgroup_index + batch_index * ( - vc.workgroup_size().x * vc.workgroup_size().y * - vc.num_workgroups().x * vc.num_workgroups().y - ) - - read_register[:] = kernel_buffer[vc.mapping_index()] - kernel_transposed_buffer[transposed_index] = read_register - - vd.fft.convolve( - correlation_buffer, - image_dft_buffer, - image_dft_buffer_transposed, - kernel_map=kernel_mapping, - axis=1 - ) - -def padded_cross_correlation( - buffer: vd.Buffer, - kernel: vd.Buffer, - signal_shape: tuple, - graph: vd.CommandGraph): - - - # Fill input buffer with zeros where needed - @vd.map_registers([vc.c64]) - def initial_input_mapping(input_buffer: vc.Buffer[vc.c64]): - vc.if_statement(vc.mapping_index() % buffer.shape[2] < signal_shape[1]) - - in_layer_index = vc.mapping_index() % (signal_shape[1] * buffer.shape[2]) - out_layer_index = vc.mapping_index() / (signal_shape[1] * buffer.shape[2]) - actual_index = in_layer_index + out_layer_index * (buffer.shape[1] * buffer.shape[2]) - - vc.mapping_registers()[0][:] = input_buffer[actual_index] - vc.else_statement() - vc.mapping_registers()[0][:] = "vec2(0)" - vc.end() - - # Remap output indicies to match the actual buffer shape - @vd.map_registers([vc.c64]) - def initial_output_mapping(output_buffer: vc.Buffer[vc.c64]): - in_layer_index = vc.mapping_index() % (signal_shape[1] * buffer.shape[2]) - out_layer_index = vc.mapping_index() / (signal_shape[1] * buffer.shape[2]) - actual_index = in_layer_index + out_layer_index * (buffer.shape[1] * buffer.shape[2]) - output_buffer[actual_index] = vc.mapping_registers()[0] - - # Do the first FFT on the correlation buffer accross the first axis - 
vd.fft.fft( - buffer, - buffer, - buffer_shape=( - buffer.shape[0], - signal_shape[1], - buffer.shape[2] - ), - input_map=initial_input_mapping, - output_map=initial_output_mapping, - graph=graph - ) - - @vd.map_registers([vc.c64]) - def kernel_mapping(kernel_buffer: vc.Buffer[vc.c64]): - img_val = vc.mapping_registers()[0] - read_register = vc.mapping_registers()[1] - - in_group_index = vc.local_invocation().y * vc.workgroup_size().x + vc.local_invocation().x - out_group_index = vc.workgroup().y * vc.num_workgroups().x + vc.workgroup().x - workgroup_index = in_group_index + out_group_index * ( - vc.workgroup_size().x * vc.workgroup_size().y - ) - - batch_index = ( - vc.mapping_index() % (kernel.shape[0] * kernel.shape[1]) - ) / ( - vc.workgroup_size().x * vc.workgroup_size().y * - vc.num_workgroups().x * vc.num_workgroups().y - ) - - transposed_index = workgroup_index + batch_index * ( - vc.workgroup_size().x * vc.workgroup_size().y * - vc.num_workgroups().x * vc.num_workgroups().y - ) - - read_register[:] = kernel_buffer[transposed_index] - img_val[:] = vc.mult_conj_c64(read_register, img_val) - - @vd.map_registers([vc.c64]) - def input_mapping(input_buffer: vc.Buffer[vc.c64]): - in_layer_index = vc.mapping_index() % ( - buffer.shape[1] * buffer.shape[2] - ) - - vc.if_statement(in_layer_index / buffer.shape[2] < signal_shape[1]) - vc.mapping_registers()[0][:] = input_buffer[vc.mapping_index()] - vc.else_statement() - vc.mapping_registers()[0][:] = "vec2(0)" - vc.end() - - vd.fft.convolve( - buffer, - buffer, - kernel, - #kernel_map=kernel_mapping, - input_map=input_mapping, - axis=1, - graph=graph - ) - - vd.fft.ifft(buffer, graph=graph) - -def do_numpy_convolution(buffer: np.ndarray, kernel: np.ndarray, signal_shape) -> np.ndarray: - print(buffer.shape, kernel.shape, signal_shape) - - padded_buffer = np.zeros((buffer.shape[0], buffer.shape[1], buffer.shape[2]), dtype=np.complex64) - padded_buffer[:, :signal_shape[1], :signal_shape[2]] = buffer[:, 
:signal_shape[1], :signal_shape[2]] - - f_buffer = np.fft.fft2(padded_buffer, axes=(-2, -1)) - convolved = np.fft.ifft2(f_buffer * np.conj(kernel), axes=(-2, -1)) - - return convolved - -data = np.random.rand(1, 64, 64).astype(np.complex64) -kernel_data = np.random.rand(1, 64, 64).astype(np.complex64) - -buff = vd.asbuffer(data) -kernel = vd.asbuffer(kernel_data) - -kernel_transposed = vd.asbuffer(np.zeros_like(kernel_data)) - -transpose_kernel(buff, kernel, kernel_transposed) - -graph = vd.CommandGraph() -padded_cross_correlation(buff, kernel_transposed, (1, 16, 16), graph) -graph.submit() - -numpy_result = do_numpy_convolution(data, kernel_data, (1, 16, 16)) - -plt.imshow(np.abs(buff.read(0)[0]), cmap='gray') -plt.title('Vkdispatch Result') -plt.colorbar() -plt.savefig('vkdispatch_result.png') - -plt.imshow(np.abs(numpy_result[0]), cmap='gray') -plt.title('Numpy Result') -plt.colorbar() -plt.savefig('numpy_result.png') - -assert np.allclose(buff.read(0), numpy_result, atol=1e-5) - -#vd.fft.fft(buff, axis=0, print_shader=True) -#vd.vkfft.fft(buff, axis=0, print_shader=True) - - From 0931bfa53ff51252ffb951638749c7e431366bef Mon Sep 17 00:00:00 2001 From: Shahar Sandhaus Date: Mon, 8 Sep 2025 23:31:11 -0700 Subject: [PATCH 7/8] readme update --- README.md | 68 ++++++++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 55 insertions(+), 13 deletions(-) diff --git a/README.md b/README.md index 89b4ac45..d8f77cdf 100644 --- a/README.md +++ b/README.md @@ -1,24 +1,66 @@ -# vkdispatch -A Python module for orchestrating and dispatching large computations across multi-GPU systems using Vulkan. +# Getting Started / Installation +Welcome to **vkdispatch**! This guide will help you install the library and run your first GPU-accelerated code. -## Instillation +> **Note:** vkdispatch requires a Vulkan-compatible GPU and drivers installed on your system. Please ensure your system meets these requirements before proceeding. 
-The vkdispatch package can be installed via Pypi using +## PyPI -``` +The default installation method for `vkdispatch` is through PyPI (pip): + +```bash +# Install the package pip install vkdispatch ``` -### Local instillation +On mainstream platforms — Windows (x86_64), macOS (x86_64 and Apple Silicon/arm64), and Linux (x86_64) — pip will download a **prebuilt wheel** (built with `cibuildwheel` on GitHub Actions and tagged as *manylinux* where applicable), so no compiler is needed. + +On less common platforms (e.g., non-Apple ARM or other niche architectures), pip may fall back to a **source build**, which takes a few minutes. See **[Building From Source](https://sharhar.github.io/vkdispatch/tutorials/building_from_source.html)** for toolchain requirements and developer-oriented instructions. +*(Replace the link above with your actual GitHub Pages URL once deployed.)* + +> **Tip:** If you see output like `Building wheel for vkdispatch (pyproject.toml)`, you’re compiling from source. -If you want a local install of vkdispatch (e.g. for development purposes), then use the following steps to build from source. -Note that its recommended to use a Python environment manager for development +## Verifying Your Installation +To ensure `vkdispatch` is installed correctly and can detect your GPU, run: + +```bash +# Quick device listing +vdlist + +# If the above command is unavailable, try: +python3 -m vkdispatch ``` -git clone https://github.com/sharhar/vkdispatch.git -cd vkdispatch -python fetch_dependencies.py -pip install -r requirements.txt -pip install -e . 
+ +If the installation was successful, you should see output listing your GPU(s), for example: + +```text +Device 0: Apple M2 Pro + Vulkan Version: 1.2.283 + Device Type: Integrated GPU + + Features: + Float32 Atomic Add: True + + Properties: + 64-bit Float Support: False + 16-bit Float Support: True + 64-bit Int Support: True + 16-bit Int Support: True + Max Push Constant Size: 4096 bytes + Subgroup Size: 32 + Max Compute Shared Memory Size: 32768 + + Queues: + 0 (count=1, flags=0x7): Graphics | Compute + 1 (count=1, flags=0x7): Graphics | Compute + 2 (count=1, flags=0x7): Graphics | Compute + 3 (count=1, flags=0x7): Graphics | Compute ``` + +## Next Steps + +- **[Tutorials](https://sharhar.github.io/vkdispatch/tutorials/index.html)** — our curated guide to common workflows and examples +- **[Full Python API Reference](https://sharhar.github.io/vkdispatch/python_api.html)** — comprehensive reference for Python-facing components + +Happy GPU programming! From b3b46c28d1fbe15fb6dfc4795afdaa4e519893d9 Mon Sep 17 00:00:00 2001 From: Shahar Sandhaus Date: Mon, 8 Sep 2025 23:33:18 -0700 Subject: [PATCH 8/8] Removed uneeded sentence from readme --- README.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/README.md b/README.md index d8f77cdf..495348e5 100644 --- a/README.md +++ b/README.md @@ -15,8 +15,7 @@ pip install vkdispatch On mainstream platforms — Windows (x86_64), macOS (x86_64 and Apple Silicon/arm64), and Linux (x86_64) — pip will download a **prebuilt wheel** (built with `cibuildwheel` on GitHub Actions and tagged as *manylinux* where applicable), so no compiler is needed. -On less common platforms (e.g., non-Apple ARM or other niche architectures), pip may fall back to a **source build**, which takes a few minutes. See **[Building From Source](https://sharhar.github.io/vkdispatch/tutorials/building_from_source.html)** for toolchain requirements and developer-oriented instructions. 
-*(Replace the link above with your actual GitHub Pages URL once deployed.)* +On less common platforms (e.g., non-Apple ARM or other niche architectures), pip may fall back to a **source build**, which takes a few minutes. See **[Building From Source](https://sharhar.github.io/vkdispatch/tutorials/building_from_source.html)** for toolchain requirements and developer-oriented instructions. > **Tip:** If you see output like `Building wheel for vkdispatch (pyproject.toml)`, you’re compiling from source.
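A note appended after the patch series (trailing text after the final hunk is ignored by `git am`): the round-trip verification that the buffer tutorial in PATCH 1/8 performs by comparing printed arrays can also be scripted. The sketch below is a CPU-only stand-in: it replaces the `vd.asbuffer()` upload and `gpu_buffer.read(0)` download with a serialize/deserialize pair, so it runs without a GPU or vkdispatch installed, and only the equality check itself is illustrated.

```python
import numpy as np

# The data the tutorial uploads
cpu_data = np.arange(16, dtype=np.int32)

# Stand-in for the GPU round trip: serialize ("upload") then
# reconstruct ("download"); on a real install, vd.asbuffer() and
# gpu_buffer.read(0) play these two roles.
raw_bytes = cpu_data.tobytes()
downloaded = np.frombuffer(raw_bytes, dtype=np.int32)

# The verification the tutorial performs by eye, done programmatically
assert np.array_equal(cpu_data, downloaded)
print("Round trip verified:", downloaded.tolist())
```

On a machine with vkdispatch installed, swapping the two marked lines for the vkdispatch calls shown in the tutorial turns this into an actual GPU round-trip test.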