The code consists of 7 relevant Python files, which can be found in 'clusteringAndClassification':
- main.py: the main code file with which the assignments were done.
- dataETL.py: file containing functions for extraction, transformation and loading of the data. These functions were designed specifically for the format of the data that was provided for this project.
- dataProcessing.py: file containing some basic data processing functions that are not specific to the data format.
- clustering.py: file containing functions for k-means clustering and HCS clustering.
- Graph.py: file containing the Graph class that was used for the HCS clustering.
- classification.py: file containing functions for k-nearest-neighbour classification and classification in general.
- test.py: test file for in-between testing. Does not contribute to the project.
The report can be found in the folder 'Report'.
Documentation can be found in 'build/latex/8cc00_clusteringandclassification.pdf'.