This project uses the K-Nearest Neighbors (KNN) algorithm to classify small images. It is trained on a labeled dataset; given a new image, it finds the K closest training examples and returns the most frequent class among them (e.g. “dog”, “ship”). Three distance metrics are implemented: Manhattan, Chebyshev, and Levenshtein.
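The classification step described above can be sketched roughly as follows. This is a minimal illustration, not the code in knn.py: the function names (`manhattan`, `chebyshev`, `levenshtein`, `knn_predict`) are hypothetical, and applying Levenshtein to the flattened sequence of pixel values is an assumption, since this README does not say how the project applies it to images.

```python
from collections import Counter

import numpy as np


def manhattan(a, b):
    # Sum of absolute per-pixel differences (cast first to avoid uint8 wrap-around).
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())


def chebyshev(a, b):
    # Largest absolute per-pixel difference.
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).max())


def levenshtein(a, b):
    # Classic edit distance between two sequences (assumed here to be
    # flattened pixel values); O(len(a) * len(b)) time, so slow on full images.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def knn_predict(train_x, train_y, query, k=3, distance=manhattan):
    # Distance from the query to every training row, then a majority vote
    # among the k closest rows.
    dists = [distance(row, query) for row in train_x]
    nearest = np.argsort(dists, kind="stable")[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

For example, with four toy "images" of three pixels each, `knn_predict(train_x, train_y, query, k=3, distance=chebyshev)` returns the label that wins the vote among the three nearest rows. Ties are resolved in favor of the label encountered first, which is a choice made for this sketch only.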
The code uses the CIFAR-10 dataset:
- 60,000 32×32 color images in 10 classes (6,000 images per class).
- 50,000 training images (in 5 batches of 10,000) and 10,000 test images.
- Classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.
The dataset is split into pickled batch files: data_batch_1 … data_batch_5, test_batch, and batches.meta (label names). This layout matches the official Python version of CIFAR-10.
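Reading this layout might look like the sketch below. It assumes the standard structure of the official Python version, where each batch is a pickled dict with the byte-string keys `b"data"` (a 10000×3072 uint8 array) and `b"labels"`, and batches.meta holds `b"label_names"`. The helper names (`load_batch`, `load_label_names`, `load_training_set`) are hypothetical and may not match what knn.py actually defines.

```python
import pickle
from pathlib import Path

import numpy as np


def load_batch(path):
    # The batches were pickled under Python 2, so keys come back as bytes.
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    # Each row is one image: 1024 red, then 1024 green, then 1024 blue values.
    data = np.asarray(batch[b"data"], dtype=np.uint8)
    labels = np.asarray(batch[b"labels"], dtype=np.int64)
    return data, labels


def load_label_names(meta_path):
    # batches.meta maps numeric labels (0-9) to class names.
    with open(meta_path, "rb") as f:
        meta = pickle.load(f, encoding="bytes")
    return [name.decode("utf-8") for name in meta[b"label_names"]]


def load_training_set(root):
    # Concatenate data_batch_1 ... data_batch_5 into one training set.
    root = Path(root)
    parts = [load_batch(root / f"data_batch_{i}") for i in range(1, 6)]
    data = np.concatenate([d for d, _ in parts])
    labels = np.concatenate([l for _, l in parts])
    return data, labels
```

To view a single row as a 32×32 RGB image, `row.reshape(3, 32, 32).transpose(1, 2, 0)` gives the height × width × channel layout that Pillow expects.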
- Course / subject: Artificial Intelligence
- When: Feb 2018
- Create a virtual environment and install dependencies (once):
  python3 -m venv .venv
  source .venv/bin/activate   # on Windows: .venv\Scripts\activate
  pip install -r requirements.txt
- Download the CIFAR-10 dataset and place it in the project:
  - Go to the CIFAR-10 and CIFAR-100 datasets page and download the CIFAR-10 Python version (~163 MB).
  - Unpack the archive. Put the folder that contains data_batch_1, test_batch, and batches.meta in this project:
    - Either name it cifar-10-batches-py at the same level as knn.py, or
    - Keep the default structure cifar-10-python/cifar-10-batches-py.
- Run the script:
  python3 knn.py
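The two accepted dataset locations from the download step could be resolved with a lookup along these lines. This is only a sketch: `find_cifar_dir` and `REQUIRED_FILES` are hypothetical names, and the real lookup in knn.py may check different files or search in a different order.

```python
from pathlib import Path

# Files named in the setup steps; their presence is used to recognise the folder.
REQUIRED_FILES = ("data_batch_1", "test_batch", "batches.meta")


def find_cifar_dir(project_root):
    root = Path(project_root)
    # The two layouts described above, tried in order.
    candidates = [
        root / "cifar-10-batches-py",
        root / "cifar-10-python" / "cifar-10-batches-py",
    ]
    for candidate in candidates:
        if all((candidate / name).is_file() for name in REQUIRED_FILES):
            return candidate
    raise FileNotFoundError(
        "CIFAR-10 batches not found; looked in: "
        + ", ".join(str(c) for c in candidates)
    )
```

Calling `find_cifar_dir(Path(__file__).parent)` from a script next to the data would return whichever folder exists, and building paths with `pathlib` keeps the lookup independent of the platform's path separator.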
Python, NumPy. Optional: Pillow (PIL) for the image-display function.
Versions: Python 3.8+; NumPy 1.20+ (or 2.x). Recommended: Python 3.10+ and NumPy 1.24+ or 2.x. NumPy 2.4+ may show a deprecation warning when loading CIFAR-10 pickles; the script suppresses it.
main.tex is the LaTeX source for the project report (in Spanish). It describes the KNN methodology, the three distance functions (Manhattan, Chebyshev, Levenshtein) with equations, the experiments run on CIFAR-10, and the conclusions. The PDF is built from this file together with biblio.bib.
View the report online (read-only, Spanish):
Reporte en Overleaf — K Nearest Neighbors Analysis
- Dataset and description: CIFAR-10 and CIFAR-100 datasets (Alex Krizhevsky, Vinod Nair, Geoffrey Hinton).
- Citation: Learning Multiple Layers of Features from Tiny Images, Alex Krizhevsky, 2009. If you use CIFAR-10, please cite the tech report as indicated on the dataset page.
Archived coursework; code has been lightly adapted for cross-platform paths and data lookup.