
Commit 9bb75b6

Updated webpage with more details
1 parent d900d64 commit 9bb75b6

1 file changed

Lines changed: 1 addition & 1 deletion

File tree

research/research_home.html

@@ -89,7 +89,7 @@ <h3> Thematic Questions </h3>
         </ul>
     </li>
 
-    <li> <b> Are the existing resource managers for clusters (such as SLURM, Kubernetes, or general cloud-infra) efficient, portable, and friendly enough? </b>
+    <li> <b> Are the existing resource managers for clusters (such as SLURM, Kubernetes, or general cloud-infra) efficient, portable, and friendly enough to <em> nicely </em> support AI workloads? </b>
         <ul>
         <!-- <li> How should this higher-level resource manager interact with collective programming frameworks, such as Nvidia's NCCL, AMD's RCCL, or Intel's oneCCL? Is this as efficient and scalable as it could be? <em> What about building a system which supports vendor-agnostic collective programming? </em> </li> -->
         <!-- <li> <em> Given the explosion in architectures and accelerators, we would ideally like a system that is compatible with hardware from various vendors. </em> There is current support for CUDA devices, but this support is a second-class priority and the configuration does not appear to be user-friendly or scalable. SLURM interfaces with Nvidia's Multi-Process Service (MPS) and Multi-Instance GPU (MIG) so multiple jobs can share an individual device's resources; however, there are limitations and this current structure will not be compatible with the advanced GPUs being developed by other vendors. I believe there is room for improved system design. </li>
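The commented-out note above refers to SLURM scheduling GPU resources for jobs (with device sharing handled underneath by MPS or MIG). A minimal sketch of how a job requests GPUs through SLURM's generic-resource (GRES) interface; the job name, script name, and resource counts are placeholders, not taken from the source:

```shell
#!/bin/bash
# Hypothetical SLURM batch script: request GPUs via the GRES interface.
#SBATCH --job-name=ai-train        # placeholder job name
#SBATCH --nodes=1                  # run on a single node
#SBATCH --gres=gpu:2               # request two GPUs on that node
#SBATCH --time=01:00:00            # one-hour wall-clock limit

# srun launches the step inside the allocation; on a typical gres/gpu
# setup SLURM exposes the granted devices to the task (e.g. through
# CUDA_VISIBLE_DEVICES), so the script never hard-codes device IDs.
srun python train.py               # train.py is a placeholder workload
```

This per-node GPU-count model is the vendor-specific, CUDA-centric configuration the commented-out text criticizes as a second-class priority.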
