This is a subset of the code files for our course in Social Network Analysis. This part of the code parses the input graph of Flickr users, collected using the Flickr API. It is optimised for extracting familiarity between users who are similar in their selection of groups.
The main file reads the input .csv files and converts them into a temporary intermediate format. Converting string IDs to numbers reduces memory requirements during processing, and working from the smaller intermediate files makes it faster to test different logic without ingesting the full input every time. Bin lookups also use a map data structure so that the right bin is found faster. The code was rewritten from scratch in C++ to use more suitable data structures and to eliminate the inefficient memory usage observed in the previous Python implementation, giving roughly a 50x reduction in main-memory requirements and a 3x reduction in running time.
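A minimal sketch of the two ideas above, string-to-number conversion and map-based bin lookup, is shown below. It assumes each CSV line is a plain `user_id,group_id` pair with no header or quoting; the names `IdInterner`, `parseMemberships` and `findBin` are hypothetical and not taken from the course code.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <iterator>
#include <map>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: gives each distinct string ID (e.g. a Flickr user or
// group ID) a dense integer, so later stages only handle numbers.
class IdInterner {
public:
    std::uint32_t intern(const std::string& key) {
        auto it = ids_.find(key);
        if (it != ids_.end()) return it->second;
        const auto id = static_cast<std::uint32_t>(names_.size());
        ids_.emplace(key, id);
        names_.push_back(key);
        return id;
    }
    const std::string& name(std::uint32_t id) const { return names_.at(id); }
    std::size_t size() const { return names_.size(); }

private:
    std::unordered_map<std::string, std::uint32_t> ids_;
    std::vector<std::string> names_;  // reverse lookup: id -> original string
};

// Assumed input format: one "user_id,group_id" per line.
// Returns integer (user, group) rows, i.e. a compact intermediate form.
std::vector<std::pair<std::uint32_t, std::uint32_t>>
parseMemberships(std::istream& in, IdInterner& users, IdInterner& groups) {
    std::vector<std::pair<std::uint32_t, std::uint32_t>> rows;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // CRLF files
        const auto comma = line.find(',');
        if (comma == std::string::npos) continue;  // skip malformed lines
        rows.emplace_back(users.intern(line.substr(0, comma)),
                          groups.intern(line.substr(comma + 1)));
    }
    return rows;
}

// Bin lookup with std::map: keys are inclusive lower bin edges, values are
// bin indices. upper_bound finds the first edge greater than value in
// O(log #bins); the matching bin is the entry just before it.
int findBin(const std::map<int, int>& lowerEdges, int value) {
    assert(!lowerEdges.empty());
    auto it = lowerEdges.upper_bound(value);
    if (it == lowerEdges.begin()) return -1;  // value is below the first edge
    return std::prev(it)->second;
}
```

For example, parsing `std::istringstream("alice,g1\nbob,g1\nalice,g2\n")` yields the rows (0,0), (1,0), (0,1), and with edges `{{0,0},{6,1},{13,2}}` a count of 6 or 12 falls in bin 1. The integer rows could then be written to disk once and reloaded for faster experiments.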
Parallel data reading and processing was initially planned, but supporting multi-processing on both Linux and Windows would have added unnecessary complexity.
The correlation between familiarity and similarity peaked for pairs of users who shared 6 to 12 groups.
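Finding which user pairs share 6 to 12 groups requires a shared-group count for every pair. One common way to get it, sketched here under the assumption that group memberships have already been converted to integer user IDs with no duplicates inside a group, is an inverted index: for each group, every pair of its members gets +1. The names `countSharedGroups` and `pairsInRange` are hypothetical, and the sketch does not compute familiarity or the correlation itself.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using UserId = std::uint32_t;
using PairCounts = std::map<std::pair<UserId, UserId>, int>;

// groupMembers[g] lists the distinct users in group g (assumed input).
// Each unordered member pair of a group gets +1, so the final value for a
// pair is the number of groups the two users have in common.
PairCounts countSharedGroups(const std::vector<std::vector<UserId>>& groupMembers) {
    PairCounts shared;
    for (const auto& members : groupMembers) {
        for (std::size_t i = 0; i < members.size(); ++i) {
            for (std::size_t j = i + 1; j < members.size(); ++j) {
                UserId a = members[i], b = members[j];
                if (a > b) std::swap(a, b);  // canonical key: (smaller, larger)
                ++shared[{a, b}];
            }
        }
    }
    return shared;
}

// Keeps only the pairs whose shared-group count lies in [lo, hi].
std::vector<std::pair<UserId, UserId>> pairsInRange(const PairCounts& shared, int lo, int hi) {
    assert(lo <= hi);
    std::vector<std::pair<UserId, UserId>> out;
    for (const auto& [userPair, count] : shared)
        if (count >= lo && count <= hi) out.push_back(userPair);
    return out;
}
```

Calling `pairsInRange(countSharedGroups(groups), 6, 12)` would select the range mentioned above. The inner loops are quadratic in group size, so very large groups dominate the cost; a real implementation may need to cap or sample them.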
