Code intelligence is an emerging domain in software engineering that aims to improve the effectiveness and efficiency of various code-related tasks. Recent research suggests that incorporating contextual information beyond the basic original task inputs (i.e., source code) can substantially enhance model performance. Such contextual signals may be obtained directly or indirectly from sources such as API documentation or from intermediate representations such as abstract syntax trees. Despite growing academic interest, there is a lack of systematic analysis of context in code intelligence. To address this gap, we conduct an extensive literature review of 146 relevant studies published between September 2007 and August 2024. Our investigation yields four main contributions: (1) a quantitative analysis of the research landscape, including publication trends, venues, and the explored domains; (2) a novel taxonomy of context types used in code intelligence; (3) a task-oriented analysis investigating context integration strategies across diverse code intelligence tasks; and (4) a critical assessment of evaluation methodologies for context-aware methods. Based on these findings, we identify fundamental challenges in context utilization in current code intelligence systems and propose a research roadmap that outlines key opportunities for future research.
Note: This repository contains the paper collection from our systematic survey. The repository will be continuously updated as new relevant studies are published.
- Systematic Review: This is the first systematic review synthesizing context utilization in code intelligence tasks.
- Context Taxonomy: We build a taxonomy of context types and task-specific integration patterns.
- Quantitative Analysis: We conduct a quantitative analysis of methodological trends and evaluation practices.
- Research Roadmap: We provide a research roadmap addressing scalability, generalizability, and assessment gaps, identifying key challenges and emerging opportunities for enhancing code intelligence through effective context utilization.
- 2025 - (New*) Hybrid Intelligent Architecture for Context-Driven Code Clone Detection
- 2024 - Prism: Decomposing Program Semantics for Code Clone Detection through Compilation
- 2023 - Graph-based code semantics learning for efficient semantic code clone detection
- 2020 - CCGraph: a PDG-based code clone detector with approximate graph matching
- 2020 - VGRAPH: A Robust Vulnerable Code Clone Detection System Using Code Property Triplets
- 2019 - From Local to Global Semantic Clone Detection
- 2019 - CLCDSA: Cross Language Code Clone Detection using Syntactical Features and API Documentation
- 2018 - Semantic Clone Detection: Can Source Code Comments Help?
- 2018 - CroLSim: Cross Language Software Similarity Detector Using API Documentation
- 2015 - An Execution-Semantic and Content-and-Context-Based Code-Clone Detection and Analysis
- 2013 - Towards Contextual and On-Demand Code Clone Management by Continuous Monitoring
- 2025 - (New*) Challenge on Optimization of Context Collection for Code Completion
- 2025 - (New*) API-Repo: API-centric Repository-level Code Completion
- 2025 - RLCoder: Reinforcement Learning for Repository-Level Code Completion
- 2024 - RepoHyper: Better Context Retrieval Is All You Need for Repository-Level Code Completion
- 2024 - Repoformer: Selective Retrieval for Repository-Level Code Completion
- 2024 - RepoMinCoder: Improving Repository-Level Code Generation Based on Information Loss Screening
- 2024 - Contextual API Completion for Unseen Repositories Using LLMs
- 2024 - REPOFUSE: Repository-Level Code Completion with Fused Dual Context
- 2024 - Dataflow-Guided Retrieval Augmentation for Repository-Level Code Completion
- 2024 - Prompt-based Code Completion via Multi-Retrieval Augmented Generation
- 2024 - Enhancing LLM-Based Coding Tools through Native Integration of IDE-Derived Static Context
- 2024 - GraphCoder: Enhancing Repository-Level Code Completion via Code Context Graph-based Retrieval and Language Model
- 2024 - R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models
- 2024 - RepoMasterEval: Evaluating Code Completion via Real-World Repositories
- 2024 - STALL+: Boosting LLM-based Repository-level Code Completion with Static Analysis
- 2023 - Guiding Language Models of Code with Global Context using Monitors
- 2023 - Better Context Makes Better Code Language Models: A Case Study on Function Call Argument Completion
- 2023 - CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context
- 2023 - Repository-Level Prompt Generation for Large Language Models of Code
- 2023 - RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation
- 2023 - RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems
- 2023 - RepoFusion: Training Code Models to Understand Your Repository
- 2023 - LongCoder: A Long-Range Pre-trained Language Model for Code Completion
- 2023 - CROSSCODEEVAL: A Diverse and Multilingual Benchmark for Cross-File Code Completion
- 2022 - Reacc: A retrieval-augmented code completion framework
- 2022 - All you need is logs: Improving code completion by learning from anonymous ide usage logs
- 2016 - A Simple, Efficient, Context-sensitive Approach for Code Completion
- 2016 - Probabilistic Model for Code with Decision Trees
- 2014 - CSCC: Simple, Efficient, Context Sensitive Code Completion
- 2012 - Graph-Based Pattern-Oriented, Context-Sensitive Source Code Completion
- 2025 - (New*) SolEval: Benchmarking Large Language Models for Repository-level Solidity Code Generation
- 2025 - (New*) Fea-bench: A benchmark for evaluating repository-level code generation for feature implementation
- 2025 - (New*) GraphCodeAgent: Dual Graph-Guided LLM Agent for Retrieval-Augmented Repo-Level Code Generation
- 2025 - (New*) Improving Project-Level Code Generation Using Combined Relevant Context
- 2024 - Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback
- 2024 - CoderEval: A Benchmark of Pragmatic Code Generation with Generative Pre-trained Models
- 2024 - A^3-CodGen: A Repository-Level Code Generation Framework for Code Reuse with Local-Aware, Global-Aware, and Third-Party-Library-Aware
- 2024 - CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks
- 2024 - EvoCodeBench: An Evolving Code Generation Benchmark Aligned with Real-World Code Repositories
- 2024 - Teaching Code LLMs to Use Autocompletion Tools in Repository-Level Code Generation
- 2024 - REPOEXEC: Evaluate Code Generation with a Repository-Level Executable Benchmark
- 2024 - Codeagent: Enhancing code generation with tool-integrated agent systems for real-world repo-level coding challenges
- 2024 - Towards more realistic evaluation of LLM-based code generation: an experimental study and beyond
- 2024 - Enhancing Repository-Level Code Generation with Integrated Contextual Information
- 2024 - Compositional API Recommendation for Library-Oriented Code Generation
- 2024 - Bigcodebench: Benchmarking code generation with diverse function calls and complex instructions
- 2023 - Private-Library-Oriented Code Generation with Large Language Models
- 2023 - BIOCODER: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge
- 2023 - DocPrompting: Generating Code by Retrieving the Docs
- 2023 - ToolCoder: Teach Code Generation Models to use API search tools
- 2023 - CodeGen4Libs: A Two-Stage Approach for Library-Oriented Code Generation
- 2022 - When Language Model Meets Private Library
- 2022 - Code Generation for Unknown Libraries via Reading API Documentations
- 2022 - Compilable neural code generation with compiler feedback
- 2021 - Embedding API dependency graph for neural code generation
- 2019 - Automatic Code Generation From UML State Chart Diagrams
- 2019 - JuICe: A Large Scale Distantly Supervised Dataset for Open Domain Context-based Code Generation
- 2025 - (New*) Hierarchical Repository-Level Code Summarization for Business Applications Using Local LLMs
- 2025 - (New*) Code Summarization Beyond Function Level
- 2024 - Revisiting File Context for Source Code Summarization
- 2024 - Context-based transfer learning for low resource code summarization
- 2024 - Context-aware Code Summary Generation
- 2024 - Learning to Generate Structured Code Summaries from Hybrid Code Context
- 2024 - Programmer Visual Attention During Context-Aware Code Summarization
- 2023 - Statement-based Memory for Neural Source Code Summarization
- 2023 - APIContext2Com: Code Comment Generation by Incorporating Pre-Defined API Documentation
- 2023 - Snippet Comment Generation Based on Code Context Expansion
- 2023 - Function Call Graph Context Encoding for Neural Source Code Summarization
- 2023 - Evaluating Code Comment Generation With Summarized API Docs
- 2023 - Code structure guided transformer for source code summarization
- 2023 - An Intra-Class Relation Guided Approach for Code Comment Generation
- 2022 - Improved Automatic Summarization of Subroutines via Attention to File Context
- 2021 - API2Com: On the Improvement of Automatically Generated Code Comments Using API Documentations
- 2021 - Project-Level Encoding for Neural Source Code Summarization of Subroutines
- 2021 - LANGUAGE-AGNOSTIC REPRESENTATION LEARNING OF SOURCE CODE FROM STRUCTURE AND CONTEXT
- 2021 - Improving Code Summarization with Block-wise Abstract Syntax Tree Splitting
- 2021 - RETRIEVAL-AUGMENTED GENERATION FOR CODE SUMMARIZATION VIA HYBRID GNN
- 2021 - Learning to Generate Code Comments from Class Hierarchies
- 2020 - CoCoSum: Contextual Code Summarization with Multi-Relational Graph Neural Network
- 2020 - Towards Context-Aware Code Comment Generation
- 2019 - Augmenting Java method comments generation with context information based on neural networks
- 2016 - Automatic source code summarization of context for Java methods
- 2015 - CloCom: Mining Existing Source Code for Automatic Comment Generation
- 2025 - (New*) Consider What Humans Consider: Optimizing Commit Message Leveraging Contexts Considered By Human
- 2025 - (New*) Capturing the context-aware code change via dynamic control flow graph for commit message generation
- 2024 - Multi-grained contextual code representation learning for commit message generation
- 2024 - COMET: Generating Commit Messages using Delta Graph Context Representation
- 2024 - Only diff Is Not Enough: Generating Commit Messages Leveraging Reasoning and Action of Large Language Model
- 2024 - Context Conquers Parameters: Outperforming Proprietary LLM in Commit Message Generation
- 2023 - Context-Encoded Code Change Representation for Automated Commit Message Generation
- 2023 - From Commit Message Generation to History-Aware Commit Message Completion
- 2022 - Combining Code Context and Fine-grained Code Difference for Commit Message Generation
- 2021 - CoreGen: Contextualized Code Representation Learning for Commit Message Generation
- 2021 - Context-aware Retrieval-based Deep Commit Message Generation
- 2017 - Mining Version Control System for Automatically Generating Commit Comment
- 2016 - On Automatic Summarization of What and Why Information in Source Code Changes
- 2014 - On Automatically Generating Commit Messages via Summarization of Source Code Changes
- 2025 - (New*) Towards Practical Defect-Focused Automated Code Review
- 2025 - (New*) CAPRA: Context-Aware patch risk assessment for detecting immature vulnerability in open-source software
- 2024 - Vulnerability Detection via Multiple-Graph-Based Code Representation
- 2024 - ReposVul: A Repository-Level High-Quality Vulnerability Dataset
- 2024 - VulEval: Towards Repository-Level Evaluation of Software Vulnerability Detection
- 2024 - COPS: An improved information retrieval-based bug localization technique using context-aware program simplification
- 2024 - Cross-project defect prediction via semantic and syntactic encoding
- 2024 - Fine-Grained Bug Localization Based on Rich Context using Attention Tree-GRU
- 2024 - Software Defect Prediction Based on Deep Representation Learning of Source Code From Contextual Syntax and Semantic Graph
- 2024 - CITADEL: Context Similarity Based Deep Learning Framework Bug Finding
- 2024 - Context and Multi-Features-Based Vulnerability Detection: A Vulnerability Detection Frame Based on Context Slicing and Multi-Features
- 2023 - Detecting Condition-Related Bugs with Control Flow Graph Neural Network
- 2023 - Identifying Vulnerability Patches by Comprehending Code Commits with Comprehensive Change Contexts
- 2023 - Semantic feature learning for software defect prediction from source code and external knowledge
- 2023 - ContextAug: model-domain failing test augmentation with contextual information
- 2023 - An empirical evaluation of defect prediction approaches in within-project and cross-project context
- 2022 - CPGBERT: An Effective Model for Defect Detection by Learning Program Semantics via Code Property Graph
- 2021 - Defect Prediction With Semantics and Context Features of Codes Based on Graph Representation Learning
- 2019 - Improving bug detection via context-based code representation learning and attention-based neural networks
- 2019 - Source code retrieval for bug localization using bug report
- 2017 - Improved bug localization based on code change histories and bug reports
- 2015 - Combining Deep Learning with Information Retrieval to Localize Buggy Files for Bug Reports
- 2015 - Bug Localization Based on Code Change Histories and Bug Reports
- 2014 - Version history, similar report, and structure: putting them together for improved bug localization
- 2013 - Improving bug localization using structured information retrieval
- 2012 - Locating Source Code to be Fixed based on Initial Bug Reports - A Case Study on the Eclipse Project
- 2012 - Where Should the Bugs Be Fixed? More Accurate Information Retrieval-Based Bug Localization Based on Bug Reports
- 2012 - Locus: locating bugs from software changes
- 2011 - Context-sensitive interprocedural defect detection based on a unified symbolic procedure summary model
- 2009 - Taming Coincidental Correctness: Coverage Refinement with Context Patterns to Improve Fault Localization
- 2007 - Context-based detection of clone-related bugs
- 2025 - (New*) EXPEREPAIR: Dual-Memory Enhanced LLM-based Repository-Level Program Repair
- 2025 - (New*) Enhancing repository-level software repair via repository-aware knowledge graphs
- 2024 - When Large Language Models Confront Repository-Level Automatic Program Repair: How Well They Done?
- 2024 - How to Understand Whole Software Repository?
- 2024 - Hybrid Automated Program Repair by Combining Large Language Models and Program Analysis
- 2024 - Out of context: How important is local context in neural program repair?
- 2024 - SWE-BENCH: CAN LANGUAGE MODELS RESOLVE REAL-WORLD GITHUB ISSUES?
- 2023 - Better Automatic Program Repair by Using Bug Reports and Tests Together
- 2023 - Embedding Context as Code Dependencies for Neural Program Repair
- 2022 - TransRepair: Context-aware Program Repair for Compilation Errors
- 2022 - Context-Aware Code Change Embedding for Better Patch Correctness Assessment
- 2021 - RepairNet: Contextual Sequence-to-Sequence Network for Automated Program Repair
- 2021 - SEQUENCER: Sequence-to-Sequence Learning for End-to-End Program Repair
- 2020 - DLFix: context-based code transformation learning for automated program repair
- 2020 - CoCoNuT: Combining Context-Aware Neural Translation Models using Ensemble for Program Repair
- 2019 - iFixR: Bug Report driven Program Repair
- 2019 - The effectiveness of context-based change application on automatic program repair
- 2019 - Automatic patch generation with context-based change application
- 2018 - Shaping program repair space with existing patches and similar code
- 2018 - Context-Aware Patch Generation for Better Automated Program Repair
- 2013 - R2Fix: Automatically Generating Bug Fixes from Bug Reports
- 2012 - Making Automatic Repair for Large-scale Programs More Efficient Using Weak Recompilation
If you use this repository in your research, please cite:
@article{wang2025towards,
title={Towards an Understanding of Context Utilization in Code Intelligence},
author={Wang, et al.},
journal={arXiv preprint arXiv:2504.08734},
year={2025}
}