updating dev-docker README 20250214#426
Draft
arakowsk-amd wants to merge 645 commits into ROCm:main from
* Enable RPD for single/multi gpu
  Co-authored-by: AdrianAbeyta <adrian.abeyta@amd.com>
* Add rpd build instructions to Dockerfile.rocm
* Handle env path
* Fix code errors
* Move RPD based profiling over to profiling folder
* use envs vs os.getenv
---------
Co-authored-by: AdrianAbeyta <adrian.abeyta@amd.com>
* adding cython into docker file with flag
* correcting if
Co-authored-by: Charlie Fu <Charlie.Fu@amd.com>
Upstream merge 24 09 27 0.6.2
* make rpdtracer import optional
* fix rpd_mark
* convert rpd_mark to try/except
* move rpd_trace import down
* move import
…OCm#218)
* Automatically set rpd env var with profile flag
* Add readme
* Fix lint errors
---------
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
…orts setuptools_scm (ROCm#221)
* llama3.2 + cross attn test
* lint issues fix
* mypy errors
* making yapf happy
* cut off WA for tuned gemms
* try and catch for non continuous tensor
---------
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
* Optimize CAR for ROCm
* tune block numbers
* increase cutoff to RCCL fallback to 16 MB
* scope atomics
* remove volatiles
* Pacify linters.
* enable custom PA with max seqlen 128k
* custom PA support to write out scaled fp8 value
* use regular divide for scaling
* enable custom PA to write out fp8 with scaling factor in llama
* linter fixes
* clang-format fixes
* update abstract attn impl with fp8_out_scale
* add optional fp8_out_scale arg to all attn backend classes
* clang format fix
* add env var to enable cpa fp8 write out
* isort fix
Upstream merge 25 02 10
* Using upstream FA repo. Building aiter in the base docker image
* Renaming the file to match upstream naming
* fused_moe config for DSv3 on MI300X updated
* Add tuning script and post processing script
* Add modification to fp8_utils for tuning
* update tuning script and add the configs
* slightly better tunings
* benchmark_moe.py is updated to generate more accurate MoE configs and a specific MoE config for DSv3 is added
* Bug in sgl_moe_align_block_size() is fixed by Greg
* Generate fp8_w8a8 config for MI300XHF
* tunings that don't give garbage output
* More accurate tunings
* More accurate tunings and reject inaccurate configs
* add new tunings
* rename tuning script and add benchmark script to use for optimizing blockwise quant
* remove white space from file names
* remove white space from file names
* Remove some unnecessary changes
* don't use space in file names
* remove XHF tunings
* remove OAM from file name
* remove OAM from file names
* yapf
* update config name
* remove benchmark_moe.py changes
* remove is_contiguous
* use more recent fp8_utils.py
* remove is_contiguous
---------
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Co-authored-by: qli88 <qiang.li2@amd.com>
…ed to each following path for their ownership to apply (ROCm#427)
Signed-off-by: isotr0py <2037008807@qq.com>
Upstream merge 25 02 17
1d2c43d to eb9d4de