This is the artifact for the ICSE'26 submission:
ConfuGuard: Using Metadata to Detect Active and Stealthy Package Confusion Attacks Accurately and at Scale.
Update (Sep 2025): The previous version of the artifact did not include the latest ConfuDB dataset. We have updated the artifact and the README.md file accordingly.
Update (July 2025): We made some changes to the artifact as part of the ICSE'26 Revision.
- We fixed some refactoring issues in the artifact. (Thanks to Reviewer A!)
- We added the missing `org_allowlist.json` file to the artifact.
- We added the `prompt_v1.py` and `prompt_v2.py` scripts to the artifact.
- We added the required external resources and extra setup instructions to the artifact.
| Item | Description | Corresponding content in the paper | Path |
|---|---|---|---|
| Empirical Analysis | The empirical analysis scripts and data | §4, Figure 2, Table 3 | attack_analysis |
| Implementation of ConfuGuard | The implementation of ConfuGuard, including 6 parts | §5, Figure 3 | confuguard |
| Prompt | The prompt we used for Step 5 Benignity Check | Listing 1 | prompt |
| Dataset | We provide the dataset we used, including NeupaneDB (1840 packages) and ConfuDB (2361 packages) | §6.1 | datasets |
| Evaluation | The evaluation scripts and results | §6.2-§6.6, Table 4-6, Figure 4-5 | eval |
- Installation
- Legit Packages Update
- Update Popular Packages
- HTTP Service
- Example Requests
- Example Outputs
- Supported Registries
pip install -r requirements.txt
This project requires the following external setup:
OpenAI API:
- Set the `OPENAI_API_KEY` environment variable with your API key

Google Cloud PostgreSQL (for cloud deployment):
- Set the `GCP_PROJECT_ID` environment variable
- Set the `GCP_REGION` environment variable
- Set the `GCP_INSTANCE_NAME` environment variable
- Set the `DB_USER` environment variable
- Set the `DB_PASS` environment variable
- Set the `DB_NAME` environment variable

Bearer Token:
- Set the `TYPOSQUAT_BEARER_TOKEN` environment variable for API authentication
Package Metadata Database:
- As noted in the paper, ConfuGuard reads package metadata from a metadata database. This database must be installed and running before you run ConfuGuard.
- The ConfuGuard implementation supports the following metadata databases:
  - Google Cloud PostgreSQL database (for cloud deployment)
  - Local PostgreSQL with the pgvector extension (for local testing)
  - Local SQLite database (for local testing)
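Before starting the service, it can help to confirm that the environment variables named in the setup list are present. The following is a minimal sketch, not part of the artifact: the variable names come from the list above, while the grouping keys (`openai`, `gcp_postgres`, `auth`) and the helper `missing_env_vars` are hypothetical names introduced only for illustration.

```python
import os

# Variable names are taken from the setup list in this README.
# The group names are hypothetical and exist only for this sketch.
REQUIRED_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "gcp_postgres": [
        "GCP_PROJECT_ID",
        "GCP_REGION",
        "GCP_INSTANCE_NAME",
        "DB_USER",
        "DB_PASS",
        "DB_NAME",
    ],
    "auth": ["TYPOSQUAT_BEARER_TOKEN"],
}


def missing_env_vars(groups, environ=None):
    """Return the variables in the selected groups that are unset or empty."""
    env = os.environ if environ is None else environ
    return [
        name
        for group in groups
        for name in REQUIRED_VARS[group]
        if not env.get(name)
    ]
```

For a cloud deployment, `missing_env_vars(["openai", "gcp_postgres", "auth"])` returns the names that still need to be set. For local testing, the `gcp_postgres` group can presumably be left out.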
The full script includes the following steps:
- Legit Packages Update
- Update Popular Packages
- Start HTTP Service
The entrypoint.sh script runs all of these steps, so you can execute the full pipeline with a single command:
./entrypoint.sh
Run the get_legit_packages.py script.
python confuguard/Part2/get_legit_packages.py
This saves the legit packages from each ecosystem to `legit_packages/{ecosystem}_legit_packages.json`.
python confuguard/Part2/get_legit_packages.py --push_to_postgres
This pushes the legit packages from each ecosystem into a PostgreSQL table named `{ecosystem}_pop_packages`.
python confuguard/Part2/update_pop_pkgs.py
This updates the popular packages in the database on a weekly basis.
python confuguard/app.py
Endpoints:
- `/detect`: Detects confusing packages by comparing a package against its neighbors. NOTE: This is the main endpoint used in the pipeline. The rest are mainly for testing purposes.
- `/get_neighbors`: Retrieves neighboring packages based on vector and name-based similarity.
- `/add_package`: Adds or updates package details in the PostgreSQL database.
- `/similarity`: Computes cosine similarity between two package embeddings.
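As a reminder of what the `/similarity` endpoint is described as computing, here is a small self-contained sketch of cosine similarity on plain Python lists. It is illustrative only: the vectors are toy values, and the embedding model and storage that ConfuGuard actually uses are not shown here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)
```

Vectors pointing in the same direction score 1.0 and orthogonal vectors score 0.0, so a higher score between two package-name embeddings indicates that the names are closer in the embedding space.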
curl -X POST http://localhost:5444/detect \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $BEARER_TOKEN" \
-d '{"package_name": "matplotlip", "registry": "pypi"}'
curl -X POST http://localhost:5444/get_neighbors \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $BEARER_TOKEN" \
-d '{"package_name": "dotenv", "registry": "pypi"}'
curl -X POST http://localhost:5444/add_package \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $BEARER_TOKEN" \
-d '{"package_name": "lodash", "registry": "npm"}'
curl -X POST http://localhost:5444/similarity \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $BEARER_TOKEN" \
-d '{"package_name1": "matplotlib", "package_name2": "catplotlib", "registry": "pypi"}'
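The same requests can be issued from Python. The sketch below uses only the standard library and mirrors the curl examples: the host, port, endpoint paths, headers, and payload fields are taken from those examples, while the helper names `build_request` and `post_json` are hypothetical. It assumes the service replies with a JSON body.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5444"  # host and port from the curl examples


def build_request(endpoint, payload, token):
    """Build a POST request with the same headers as the curl examples."""
    return urllib.request.Request(
        BASE_URL + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )


def post_json(endpoint, payload, token, timeout=60):
    """Send the request and decode the reply (assumed to be JSON)."""
    request = build_request(endpoint, payload, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, `post_json("/detect", {"package_name": "matplotlip", "registry": "pypi"}, token)` corresponds to the first curl command, where `token` holds the same value as `$BEARER_TOKEN`.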
{
typo_results: [
{
explanation: "The package name 'matplobblib' is very similar to 'matplotlib', with minor character changes that could confuse users. The description of 'matplobblib' is vague and does not indicate a distinct purpose. The maintainer 'Ackrome' is not recognized as a known maintainer in the community. These factors suggest it could be a package confusion attack.",
metadata_missing: false,
package_name: 'matplotlib',
typo_category: '1-step D-L dist'
}
]
}
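The `typo_category` value `'1-step D-L dist'` in the output above presumably refers to a Damerau-Levenshtein edit distance of 1, although this README does not spell out the abbreviation. Under that assumption, the sketch below computes the restricted variant (optimal string alignment), which counts insertions, deletions, substitutions, and swaps of adjacent characters. It is an illustration of the metric, not ConfuGuard's implementation, and the function name `osa_distance` is hypothetical.

```python
def osa_distance(a, b):
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Swap of two adjacent characters counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

For the package name used in the `/detect` request example, `osa_distance("matplotlip", "matplotlib")` is 1, since a single substitution turns one name into the other.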
{
"npm",
"pypi",
"ruby",
"maven",
"golang",
"hf"
}