Abstract Syntax Tree for Programming Language Understanding and Representation: How Far Are We?
@article{2023-AST4PLU,
title={Abstract Syntax Tree for Programming Language Understanding and Representation: How Far Are We?},
author={Weisong Sun and Chunrong Fang and Yun Miao and Yudu You and Mengzhe Yuan and Yuchen Chen and Quanjun Zhang and An Guo and Xiang Chen and Yang Liu and Zhenyu Chen},
journal={arXiv preprint arXiv:2312.00413},
year={2023}
}
This repository contains the implementation code for the experiments in the paper "Abstract Syntax Tree for Programming Language Understanding and Representation: How Far Are We?".
data-process - Contains code for generating ASTs with JDT, srcML, ANTLR and Tree-sitter, and code for preprocessing ASTs into BFS sequence, SBT, AST Path, Binary Tree and Split AST.
code-clone-detection - Contains code for the code clone detection task using BiLSTM, Transformer, Child-Sum TreeLSTM, N-ary TreeLSTM and AST-Trans.
code-summarization - Contains code for the code summarization task using BiLSTM, Transformer, Child-Sum TreeLSTM, N-ary TreeLSTM and AST-Trans.
code-search - Contains code for the code search task using BiLSTM, Transformer, Child-Sum TreeLSTM, N-ary TreeLSTM and AST-Trans.
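To give a rough idea of two of the AST preprocessing formats listed under data-process, here is a minimal sketch of a BFS sequence and an SBT (structure-based traversal) sequence. It is an illustration only and is not taken from this repository: the `Node` class, the function names and the toy tree are hypothetical, and the SBT bracket convention (`( label children ) label`) is assumed, so the actual scripts in data-process may differ.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical minimal AST node: a label plus ordered children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def bfs_sequence(root: Node) -> List[str]:
    """Return node labels in breadth-first (level) order."""
    order: List[str] = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node.label)
        queue.extend(node.children)
    return order


def sbt_sequence(root: Node) -> List[str]:
    """Assumed SBT form: '(' label, each child's SBT, then ')' label."""
    tokens = ["(", root.label]
    for child in root.children:
        tokens.extend(sbt_sequence(child))
    tokens.extend([")", root.label])
    return tokens


# Toy tree for the expression (a * b) + c; labels are made up for illustration.
tree = Node("Add", [
    Node("Mul", [Node("Name:a"), Node("Name:b")]),
    Node("Name:c"),
])

print(bfs_sequence(tree))
print(" ".join(sbt_sequence(tree)))
```

The BFS sequence visits the tree level by level, so it drops the parent-child structure, while the SBT sequence brackets every subtree so the tree shape can be recovered from the token stream.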
The supplemental material for our paper is available at ./TOSEM 2024_Supplemental Material.pdf.
pip install -r requirements.txt
pytorch-geometric (used for AST-Trans) can be installed by pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://pytorch-geometric.com/whl/torch-1.10.0%2Bcu102.html
Please refer to the README in the data-process directory.
The dataset we preprocessed can be downloaded here.
Run run.py in the code directory.
For specific parameter settings, please refer to the README under each subfolder.
For code clone detection, run evaluator.py in the evaluator directory.
For other tasks, please refer to the README under each subfolder.
Results of our experiments can be found here.