Official implementation of "SpareCodeSearch: How to search for code context when you do not have a spare GPU", accepted at the ASE 2025 Context Collection Workshop.
This solution also won the gold prize in the Kotlin track 🥇 and the silver prize in the Python track 🥈 of the corresponding Context Collection Competition organized by JetBrains and Mistral AI.
SpareCodeSearch uses Zoekt for keyword-based search, so no GPU is required. Make sure Docker Engine and Docker Compose are installed to run the submodules and microservices included in this project.
git clone --recurse-submodules https://github.com/minhna1112/spare-code-search.git
git submodule update --remote zoekt
The zoekt submodule is a fork of the original repository provided by Sourcegraph. Instead of using the current Zoekt image on Docker Hub, we build the image from scratch, installing ctags, zoekt-indexserver, and zoekt-webserver in the process. See the Dockerfile for more details.
Run the following command to build the Zoekt Docker image:
cd zoekt && docker build -t zoekt-local:v0.0.1 . && docker tag zoekt-local:v0.0.1 zoekt-local:latest
The structure for data is kept in accordance with the original Code Context competition data structure, which is as follows:
data
├── {language}-{stage}.jsonl # Competition data
└── repositories-{language}-{stage} # Folder with repositories
    └── {owner}__{repository}-{revision} # Repository revision used for collecting context
        └── repository contents
More details on how to download and prepare the original competition data can be found in its starter repository.
To use your own custom data, make sure to follow the same directory structure as the original competition data. Each datapoint in the JSONL file should correspond to the following format.
{
"id": "revisionID",
"repo": "owner/repoName",
"revision": "repoRevision",
"path": "path/to/the/file/containing/completion/point",
"modified": [
"paths/to/files, that/are/modified, in/the/same/revision"
],
"prefix": "incompletedCodeBeforeCompletionPoint",
"suffix": "incompletedcodeAftercompletionPoint",
"archive": "this is optional"
}
After preparing the data, run the zoekt-indexer service to index it. This service reads the data from the data folder and indexes it for searching. Each datapoint (one JSON line in the {language}-{stage}.jsonl file) is indexed as a single shard in the Zoekt index. To start each of the services below, use Docker Compose:
STAGE=public LANGUAGE=kotlin docker-compose up zoekt-indexer
With Docker CLI versions newer than v20.10, the Compose plugin can be used in place of the standalone docker-compose application:
STAGE=public LANGUAGE=kotlin docker compose up zoekt-indexer
You can modify the environment variables to match your use case (stage and language). Indexing takes some time, depending on the size of the data and your machine's performance. On an M3 MacBook Air, indexing took about 10-15 minutes for a dataset of 300-400 repositories. More on how to configure memory and CPU usage for indexing can be found in this Zoekt thread.
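Before starting the indexer on custom data, it can help to check that each datapoint carries the fields listed in the format above and that its repository folder exists. The following is a minimal sketch and is not part of this repository: the function names are hypothetical, and the mapping from "owner/repoName" to the {owner}__{repository}-{revision} folder name (replacing the slash with a double underscore) is an assumption inferred from the directory layout.

```python
import json
from pathlib import Path

# Keys taken from the datapoint format above; "archive" is optional there.
REQUIRED_KEYS = {"id", "repo", "revision", "path", "modified", "prefix", "suffix"}


def repo_dir(data_root, language, stage, datapoint):
    """Build the expected repository folder for one datapoint.

    Assumption: "owner/repoName" maps to "{owner}__{repository}" by
    replacing the slash with a double underscore.
    """
    folder = f"{datapoint['repo'].replace('/', '__')}-{datapoint['revision']}"
    return Path(data_root) / f"repositories-{language}-{stage}" / folder


def check_datapoints(data_root, language, stage):
    """Return a list of human-readable problems found in the JSONL file."""
    problems = []
    jsonl_path = Path(data_root) / f"{language}-{stage}.jsonl"
    with jsonl_path.open(encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            datapoint = json.loads(line)
            missing = REQUIRED_KEYS - datapoint.keys()
            if missing:
                problems.append(f"line {line_no}: missing keys {sorted(missing)}")
                continue
            if not repo_dir(data_root, language, stage, datapoint).is_dir():
                problems.append(f"line {line_no}: repository folder not found")
    return problems
```

Calling check_datapoints("data", "kotlin", "public") returns an empty list when every datapoint passes both checks.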
Once indexing is complete, you can start the zoekt-webserver service, which exposes /api/search on the default port 6070.
STAGE=public LANGUAGE=kotlin docker-compose up zoekt-webserver
or, with the Compose plugin:
STAGE=public LANGUAGE=kotlin docker compose up zoekt-webserver
On your local machine, a web browser can be used to access the search interface at http://localhost:6070/.
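The search API can also be queried from a script. The sketch below uses only the Python standard library and is not taken from this repository: the helper names are hypothetical, and the request body shape (a "Q" query string plus an "Opts" object with "MaxDocDisplayCount") is an assumption based on Zoekt's JSON search API, so check it against the Zoekt version you built.

```python
import json
import urllib.request

# Default endpoint as exposed by zoekt-webserver on the local machine.
ZOEKT_URL = "http://localhost:6070/api/search"


def build_search_request(query, url=ZOEKT_URL, max_docs=5):
    """Build a POST request for the Zoekt JSON search endpoint.

    Assumption: the endpoint accepts {"Q": ..., "Opts": {...}} as its body.
    """
    payload = {"Q": query, "Opts": {"MaxDocDisplayCount": max_docs}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query, **kwargs):
    """Send the query and return the decoded JSON response as a dict."""
    request = build_search_request(query, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the web server running, search("suspend fun") returns the raw response, which can be inspected to see which fields your Zoekt build provides.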

Once Zoekt is set up, you can build and run the Spare Code Context Docker image. Every time you want to run Spare Code Context, run the following command, which rebuilds the image and recreates the container.
docker-compose up --build --force-recreate spare-code-context
or, with the Compose plugin:
docker compose up --build --force-recreate spare-code-context
You can modify the docker-compose.yml file to set the appropriate environment variables for your STAGE and LANGUAGE. All other volumes and environment variables are set in the docker-compose.yml file.
environment:
- STAGE=${STAGE:-public} # Default to 'public' stage if not set
- LANGUAGE=${LANGUAGE:-kotlin} # Default to 'kotlin' language if not set
- ZOEKT_URL=http://zoekt-webserver:6070/api/search # URL of the zoekt web server
ZOEKT_URL is the URL of the Zoekt web server that provides the search API. Leave it as is unless you have a custom setup.
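To illustrate how these variables and their defaults could be consumed inside a container, here is a minimal sketch. It is not the service's actual code: Settings and load_settings are hypothetical names, and the fallback for ZOEKT_URL is added only for illustration, since the compose file sets that value directly.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    stage: str
    language: str
    zoekt_url: str


def load_settings(env=None):
    """Read the settings, mirroring the ${VAR:-default} compose syntax.

    Like ":-" in the compose file, "or" falls back to the default when
    the variable is unset or empty.
    """
    env = os.environ if env is None else env
    return Settings(
        stage=env.get("STAGE") or "public",
        language=env.get("LANGUAGE") or "kotlin",
        # Hypothetical fallback; the compose file provides this value.
        zoekt_url=env.get("ZOEKT_URL") or "http://zoekt-webserver:6070/api/search",
    )
```

Passing a plain dict instead of os.environ, for example load_settings({"LANGUAGE": "python"}), makes the defaults easy to check without touching the real environment.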
All the volumes are mounted to the spare_code_context container
volumes:
- ./data:/data # Mount local data directory
- ./predictions:/predictions # Mount local predictions directory
- ./queries:/queries # Mount local queries directory
After every run, you can find the predictions in the predictions folder, which is created if it does not exist. The predictions are saved as {language}-{stage}-predictions.jsonl, where language and stage are the same as in the docker-compose.yml file.
The Spare Code Context service generates two types of output:
- Predictions: These are the outputs used for submission to the competition. They represent the code context found by our solution inside each shard, which serves as input for the code language models that generate the missing completions. They are saved in a single JSONL file in the predictions folder. Format of each prediction point:
{
"context": "The joined string of every context found by Spare Code Search",
"prefix": "prefix string ",
"suffix": "suffix string"
}
- Queries: These are the queries used to retrieve the code context by sending them to the Zoekt web server's search API. They are saved in the queries folder, also as a JSONL file, formatted as:
{
"candidates": [
{
"query_type_a": "The query string for type A"
},
{
"query_type_b": "The query string for type B"
}
]
}
Details on the inner workings of SpareCodeSearch can be found in the technical design document.
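After a run, the predictions file can be loaded and checked against the prediction format described above. This is a minimal sketch with hypothetical helper names and is not part of this repository. It assumes only the {language}-{stage}-predictions.jsonl naming and the three keys shown in the prediction example.

```python
import json
from pathlib import Path

# Keys shown in the prediction format above.
PREDICTION_KEYS = ("context", "prefix", "suffix")


def predictions_path(predictions_dir, language, stage):
    """Build the predictions file path from the documented naming scheme."""
    return Path(predictions_dir) / f"{language}-{stage}-predictions.jsonl"


def read_predictions(path):
    """Load all predictions, raising ValueError if a record lacks a key."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = [key for key in PREDICTION_KEYS if key not in record]
            if missing:
                raise ValueError(f"line {line_no}: missing {missing}")
            records.append(record)
    return records
```

For example, read_predictions(predictions_path("predictions", "kotlin", "public")) returns one dict per prediction point.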
If you want to use the code from this repository, please cite it as follows:
@misc{spare-code-search,
author = {Minh Nguyen},
title = {Spare Code Search},
year = {2025},
publisher = {GitHub},
journal = {GitHub Repository},
url = {https://github.com/minhna1112/spare-code-context}
}

