
updating dev-docker README 20250214 #426

Draft
arakowsk-amd wants to merge 645 commits into ROCm:main from arakowsk-amd:20250214_mainfest


arakowsk-amd commented Feb 14, 2025

  • Updated build instructions
  • Updated commit

dllehr-amd and others added 30 commits September 27, 2024 16:44
* Enable RPD for single/multi gpu

Co-authored-by: AdrianAbeyta <adrian.abeyta@amd.com>

* Add rpd build instructions to Dockerfile.rocm

* Handle env path

* Fix code errors

* Move RPD based profiling over to profiling folder

* use envs vs os.getenv

---------

Co-authored-by: AdrianAbeyta <adrian.abeyta@amd.com>
* adding cython into docker file with flag

* correcting if
Co-authored-by: Charlie Fu <Charlie.Fu@amd.com>
* make rpdtracer import optional

* fix rpd_mark

* convert rpd_mark to try/except

* move rpd_trace import down

* move import
…OCm#218)

* Automatically set rpd env var with profile flag

* Add readme

* Fix lint errors

---------

Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
* llama3.2 + cross attn test

* lint issues fix

* mypy errors

* making yapf happy

* cut off WA for tuned gemms

* try and catch for non continuous tensor

---------

Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
* Optimize CAR for ROCm

* tune block numbers
* increase cutoff to RCCL fallback to 16 MB
* scope atomics
* remove volatiles

* Pacify linters.
* enable custom PA with max seqlen 128k

* custom PA support to write out scaled fp8 value

* use regular divide for scaling

* enable custom PA to write out fp8 with scaling factor in llama

* linter fixes

* clang-format  fixes

* update abstract attn impl with fp8_out_scale

* add optional fp8_out_scale arg to all attn backend classes

* clang format fix

* add env var to enable cpa fp8 write out

* isort fix
gshtras and others added 22 commits February 12, 2025 17:49
* Using upstream FA repo. Building aiter in the base docker image

* Renaming the file to match upstream naming
* fused_moe config for DSv3 on MI300X updated

* Add tuning script and post processing script

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

* Add modification to fp8_utils for tuning

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

* update tuning script and add the configs

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

* slightly better tunings

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

* benchmark_moe.py is updated to generate more accurate MoE configs and a specific MoE config for DSv3 is added

* Bug in sgl_moe_align_block_size() is fixed by Greg

* Generate fp8_w8a8 config for MI300XHF

* tunings that don't give garbage output

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

* More accurate tunings

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

* More accurate tunings and reject inaccurate configs

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

* add new tunings

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

* rename tuning script and add benchmark script to use for optimizing blockwise quant

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

* remove white space from file names

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

* remove white space from file names

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

* Remove some unnecessary changes

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

* don't use space in file names

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

* remove XHF tunings

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

* remove OAM from file name

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

* remove OAM from file names

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

* yapf

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

* update config name

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

* remove benchmark_moe.py changes

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

* remove is_contiguous

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

* use more recent fp8_utils.py

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

* remove is_contiguous

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

---------

Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Co-authored-by: qli88 <qiang.li2@amd.com>
…ed to each following path for their ownership to apply (ROCm#427)
Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: isotr0py <2037008807@qq.com>
arakowsk-amd marked this pull request as ready for review February 18, 2025 02:20
arakowsk-amd marked this pull request as draft February 18, 2025 02:22
gshtras force-pushed the main branch 2 times, most recently from 1d2c43d to eb9d4de September 9, 2025 16:43