Co-authored-by: suxiangM <maxiang992@128.com>
Co-authored-by: jinchengxiong <jinchengxiong@baidu.com>
* add [angle, dot, index_put, nan_to_num, polar] supported * fix angle
* fix-speedup:rand * fix-randn: change unroll to 8 * change blocksize 512 to 1024 * seed+1 * fix-speedup:all, normal
* fix codestyle * update * [MultiBackend] update MultiBackend Framework * update * update multibackend README
* [kunlunxin] fix any buffer_size_limit param * fix all
* [METAX] modify metax backend debug message * [METAX] improve index_select and repeat_interleave performance * [METAX] add max_int accuracy test for metax --------- Co-authored-by: mx-flaggems-user <m01080@metax-tech.com>
…rm_interface and upsample_bicubic2d_aa. MTHREADS: Fix op vdot and fill_. MTHREADS: Fix some ops. MTHREADS: Fixed the bug that the op under backend _mthreads cannot be recognized. Mthreads: Skip two ops in the benchmark that are not supported, enable op all.
MTHREADS: Add addmm kernel for _mthreads backend, and fix a bug of mm kernel. MTHREADS: Add bmm kernel for _mthreads backend.
* [hygon] fix accuracy error for trunc div * [hygon] fix isclose accuracy error --------- Co-authored-by: suxiangM <maxiang992@128.com>
* [Huawei] Ascend code for FlagGems (flagos-ai#608) * Add files via upload * Update __init__.py * Update device.py * Update commom_utils.py * Update __init__.py * Update gelu_and_mul.py * Update angle.py * Update div.py * Update gelu.py * Update isinf.py * Update isnan.py * Update nan_to_num.py * Update pow.py * Update tanh.py * Update vector_norm.py * Update performance_utils.py * Update test_binary_pointwise_perf.py * Update test_reduction_perf.py * Update test_unary_pointwise_perf.py * Update test_binary_pointwise_ops.py * Update test_reduction_ops.py * Update test_unary_pointwise_ops.py * Create __init__.py * Update pointwise_dynamic.py * Update test_blas_ops.py * Update test_blas_ops.py * Update test_general_reduction_ops.py * Update test_reduction_ops.py * Update test_binary_pointwise_ops.py * Update test_unary_pointwise_ops.py * Update test_binary_pointwise_ops.py * Update test_blas_ops.py * Update test_special_ops.py * Update test_binary_pointwise_ops.py * Update test_unary_pointwise_ops.py * Update test_binary_pointwise_ops.py * Update test_unary_pointwise_ops.py * Update test_norm_ops.py * Update test_binary_pointwise_ops.py * Update test_binary_pointwise_ops.py * Update test_binary_pointwise_ops.py * Update test_reduction_ops.py * Update test_binary_pointwise_ops.py * Update test_general_reduction_ops.py * Update test_general_reduction_ops.py * Update test_general_reduction_ops.py * Update test_blas_ops.py * Update test_blas_ops.py * Update test_special_ops.py * Update test_binary_pointwise_ops.py * Update test_binary_pointwise_ops.py * Update test_general_reduction_ops.py * Update test_reduction_ops.py * Update test_general_reduction_ops.py * Update test_binary_pointwise_ops.py * Update test_reduction_ops.py * Update test_special_ops.py * Update test_unary_pointwise_ops.py * Update pointwise_dynamic.py * Update __init__.py * Update test_binary_pointwise_perf.py * Update test_reduction_perf.py * Update test_unary_pointwise_perf.py * Update 
test_blas_perf.py * Update test_binary_pointwise_perf.py * Update test_reduction_perf.py * Update test_binary_pointwise_ops.py * Update test_blas_ops.py * Update test_general_reduction_ops.py * Update test_norm_ops.py * Update test_reduction_ops.py * Update test_unary_pointwise_ops.py * Update test_binary_pointwise_ops.py * Update test_binary_pointwise_ops.py * Update test_general_reduction_ops.py * Update test_general_reduction_ops.py * Update test_unary_pointwise_ops.py * Update test_binary_pointwise_ops.py * Update __init__.py * Update test_binary_pointwise_ops.py * Update test_unary_pointwise_ops.py * Update test_blas_perf.py * Delete src/flag_gems/runtime/backend/_ascend/ops directory * Update test_binary_pointwise_ops.py * [BACKEND] Init ascend backend --------- Co-authored-by: Jiang_wj <62932620+Sans1J@users.noreply.github.com>
* [Doc] update README with citation * [no ci]update * fix cpp doc
* [KUNLUN] Speed Up Full/Ones/Zeros * [KUNLUN] Fix Ones/Zeros --------- Co-authored-by: root <root@zzjg-isa-ai-p800-klxnode04.zzjg.baidu.com>
flagos-ai#532) * write to tmp file & os.replace, so as to avoid writing a module per process in PointwiseDynamicFunction, add test for multiprocessing & multithreading * update for other operators * Update tests/test_pointwise_dynamic.py * use os.replace to write the same contents to the same path concurrently --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
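The "write to tmp file & os.replace" approach in the commit above can be sketched as follows. This is a minimal illustration, not the code from the PR: the helper name `atomic_write_text` and its signature are assumptions made for the example. The idea is that each writer fills its own temporary file and then renames it over the target, so concurrent processes or threads that write the same contents to the same path never expose a partially written module.

```python
import os
import tempfile


def atomic_write_text(path: str, contents: str) -> None:
    """Hypothetical helper: write `contents` to `path` without exposing partial files."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(contents)
        # os.replace overwrites the destination in a single rename step.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if the write or the rename failed.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Because every writer produces identical contents, it does not matter which rename lands last; a reader sees either no file or a complete one.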
* [Operator] register backward independently for tanh * [Operator] register backward independently for gelu * [Operator] implement threshold fwd and bwd, as bwd of relu at the same time * [Operator] register sigmoid independently * [Operator] register silu backward independently * [Operator] register dropout backward independently * [Operator] register embedding backward * [Operator] register group_norm backward * [Operator] register layer_norm backward * [Test] test backward with torch.ops.aten functions * [Operator] optimize group_norm_backward to allow larger input * [Bugfix] wrong call of threshold_backward * [Operator] register backward of softmax * [Operator] register log_softmax backward * [Operator] register batch_norm backward * [Operator] register weightnorm_interface_backward * [Operator] modify weight_norm * [Bugfix] weight_norm test error * [Bugfix] diagonal_backward * [Bugfix] initialize cuda context properly and reduce test cases * remove backward for inplace ops * impl dropout on train=False and fix error in groupnorm * [Operator] move ops weight_norm/instance_norm/outer/celoss into fused directory, which are registered as AutogradCUDA before * reformat * rename some variables for better understanding; use torch.nn's get_enum to convert reduction string to integer * delete useless definition of REDUCTION * misspell fix * Update weight_norm.py for ci * Update weight_norm.py * fix redefinition of test_accuracy_polar --------- Co-authored-by: Clement Chan <iclementine@outlook.com> Co-authored-by: Bowen <81504862+Bowen12992@users.noreply.github.com>
flagos-ai#631) * [bugfix] reorder the computation of weight_norm_backward to pass unit test * [bugfix] allow grad of weight_norm to be nan --------- Co-authored-by: i3wanna2 <2535184404@qq.com>
* [LIBENTRY] fix triton 3.3.x support * [LIBENTRY] Fix tune and heur config when using Triton 3.3
* set environment variable for liboperators.so to find source of triton kernel code * clean cmake files * update doc and workflow for building c extensions
…gos-ai#802) * tmp disable * format change * tmp disable * tmp disable * tmp disable * tmp skip
flagos-ai#804) Adaptation for MThreads backend: - update heuristics_config - enable scatter op - enable scatter_ op - enable layernorm op
* tmp disable and open operator * format
* add environment variable for libtuner cache * rename env flag * rename env flag * rename . * rename .
Define `get_torch_device_ctx` in runtime and use it in place of torch_device_fn.device(device). In utils/pointwise_dynamic, add compatibility code around `_DeviceGuard`.
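One way such a device-context helper could behave is sketched below. This is an assumption-laden illustration, not the implementation in this PR: the name `get_device_ctx`, its two-argument signature, and the choice to return a no-op context on CPU are all hypothetical. `device_fn` stands in for a backend module that exposes `.device(...)`, as `torch_device_fn` does in the commit message.

```python
import contextlib


def get_device_ctx(device_fn, device):
    """Hypothetical helper: return a context manager that selects `device`."""
    # Assumption: CPU has no "current device" to switch, so a no-op context suffices.
    if str(device).split(":")[0] == "cpu":
        return contextlib.nullcontext()
    # Otherwise defer to the backend's own device guard.
    return device_fn.device(device)
```

Call sites would then write `with get_device_ctx(device_fn, device):` regardless of whether the tensor lives on CPU or on an accelerator.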
The current implementation of `where` cannot run on CPU.
… `run_all_perf_tests.sh`
[cpu] modify
Here's a summary of tests:
For the full CPU porting guide, see CPUPorting.md in the project.
Attached is a screenshot of the last test to complete:
