ADAPTLab · AdityaAS · Oct 5, 2020 · Oct 5, 2020 · Oct 6, 2020 · Dec 19, 2020
diff --git a/README.md b/README.md
@@ -1,7 +1,97 @@
 # MuDBSCAN
-A fast, exact, and scalable algorithm for DBSCAN clustering. This repository contains a sequential as well as a distributed memory implementation for the same.
+A fast, exact, and scalable algorithm for DBSCAN clustering.
+This repository contains sequential and distributed implementations of the spatial clustering algorithm proposed in the paper `μDBSCAN: An Exact Scalable DBSCAN Algorithm for Big Data Exploiting Spatial Locality`.
+Paper: https://adityaas.github.io/documents/MuDBSCAN_CLUSTER19.pdf
 
-# Code Execution
+We propose an efficient way to compute neighbourhood queries that not only improves the average time complexity but also exhibits super-linear speedup on large astronomical datasets. Using the distributed variant of our algorithm, we were **able to cluster 1 billion 3D points in under 42 minutes**.
+
+To cite our work, please use:
+```
+A. Sarma et al., "μDBSCAN: An Exact Scalable DBSCAN Algorithm for Big Data Exploiting Spatial Locality," 2019 IEEE International Conference on Cluster Computing (CLUSTER), Albuquerque, NM, USA, 2019, pp. 1-11, doi: 10.1109/CLUSTER.2019.8891020.
+```
+
+## Setup
+1. Clone the repository
+2. Install dependencies (gcc/g++, make, [Open MPI](https://www.open-mpi.org/))
+3. To run the distributed variant of the algorithm, set up an MPI cluster (see this [tutorial](https://mpitutorial.com/tutorials/running-an-mpi-cluster-within-a-lan/))
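
Before building, it can help to confirm that the required tools are on your `PATH`. The sketch below is a hypothetical helper (`check_deps` is not part of the repository) that only checks for `gcc` and `g++`, `make` (called by `rund.sh`) and `mpirun` (needed only for the distributed variant). The package names in the comment assume a Debian/Ubuntu system.

```shell
# Hypothetical helper (not part of the repository): report which of the
# given commands are missing from PATH. Returns non-zero if any is missing.
check_deps() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# gcc/g++ build the code, make is invoked by rund.sh, and mpirun is only
# needed for the distributed variant. On Debian/Ubuntu (an assumption for
# other systems) these usually come from the packages build-essential,
# openmpi-bin and libopenmpi-dev.
check_deps gcc g++ make mpirun || echo "install the missing tools before running the scripts" >&2
```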
+
+## Running the algorithm
+1. Re-format your input according to the template below and store the file in a folder named `datasets` (`rund.sh` reads input files from `../datasets/`, relative to the directory it is run from).
+
+```
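+<number_of_data_points>
+<dimension>
+<data_1_dim_1> <data_1_dim_2> ... <data_1_dim_dimension>
+<data_2_dim_1> <data_2_dim_2> ... <data_2_dim_dimension>
+...
+<data_n_dim_1> <data_n_dim_2> ... <data_n_dim_dimension>
+```
+Each of the `n` = `<number_of_data_points>` lines after the two header lines holds one point, with its coordinates separated by spaces.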
+
+For example, a dataset of 10 two-dimensional points:
+```
+10
+2
+1 20
+2 20
+2 19
+8 15
+8 14
+7 15
+9 14
+9 17
+12 17
+11 18
+```
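
If your points are already stored one per line without the two header lines, a small awk wrapper can add them. This is a hypothetical helper (`to_mudbscan_format` is not part of the repository) that assumes whitespace-separated numeric coordinates: it takes the dimension from the first point, rejects rows with a different number of coordinates, skips blank lines, and normalizes separators to single spaces.

```shell
# Hypothetical helper (not part of the repository): prepend the
# <number_of_data_points> and <dimension> header lines to a file that
# holds one whitespace-separated point per line.
to_mudbscan_format() {
    awk '
        NF == 0 { next }                        # skip blank lines
        {
            if (dim == 0) dim = NF              # dimension of the first point
            else if (NF != dim) {
                printf "line %d: %d coordinates, expected %d\n", NR, NF, dim > "/dev/stderr"
                bad = 1
                exit 1
            }
            $1 = $1                             # rebuild the line with single spaces
            row[++n] = $0
        }
        END {
            if (bad || n == 0) exit 1           # invalid or empty input
            print n                             # <number_of_data_points>
            print dim                           # <dimension>
            for (i = 1; i <= n; i++) print row[i]
        }
    ' "$1"
}

# Example (file names are placeholders):
# to_mudbscan_format raw_points.txt > ../datasets/example.txt
```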
+
+2. Running the sequential algorithm
 ```shell
 ./runs.sh <dataset> <epsilon> <minpts> <MinDegree Rtree> <MaxDegree Rtree>
+```
+
+where
+- `<dataset>` is the name of the input file (formatted as described in step 1) inside the `datasets` folder
+- `<epsilon>` is the neighbourhood radius: any point within distance epsilon of a given point is considered its neighbour
+- `<minpts>` is the density parameter: the minimum number of neighbours a point needs to be classified as `dense`
+- `<MinDegree Rtree>` and `<MaxDegree Rtree>` are hyper-parameters giving the minimum and maximum degree of the custom-defined μC-RTree
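
To make the roles of `<epsilon>` and `<minpts>` concrete, the brute-force sketch below prints the 1-based indices of the dense (core) points in a file that follows the input format. It is not μDBSCAN's neighbourhood computation: it compares every pair of points, so it takes O(n²) time and only suits tiny inputs. It assumes Euclidean distance, treats a point at distance exactly epsilon as a neighbour, and counts the point itself toward `<minpts>` as in the original DBSCAN definition; μDBSCAN's own conventions may differ. `core_points` is a hypothetical name.

```shell
# Hypothetical brute-force check (not part of the repository).
# Usage: core_points <file> <epsilon> <minpts>
core_points() {
    awk -v eps="$2" -v minpts="$3" '
        NR == 1 { n = $1; next }                # <number_of_data_points>
        NR == 2 { d = $1; next }                # <dimension>
        NF > 0 {
            k++
            for (j = 1; j <= d; j++) x[k, j] = $j
        }
        END {
            if (k != n) {
                printf "expected %d points, found %d\n", n, k > "/dev/stderr"
                exit 1
            }
            for (p = 1; p <= k; p++) {
                count = 0
                for (q = 1; q <= k; q++) {       # q == p is included, so a point counts itself
                    s = 0
                    for (j = 1; j <= d; j++) { t = x[p, j] - x[q, j]; s += t * t }
                    if (s <= eps * eps) count++  # squared distance avoids sqrt
                }
                if (count >= minpts) print p     # 1-based index of a dense point
            }
        }
    ' "$1"
}
```

On the 10-point example from step 1 with epsilon 1.5 and minpts 3, this prints indices 1 through 7, one per line; points 8, 9 and 10 each have fewer than 3 points (themselves included) within distance 1.5.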
+
+3. Running the distributed algorithm
+```shell
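+./rund.sh <dataset> <epsilon> <minpts> <nodes> <hostfile> <MinDegree Rtree> <MaxDegree Rtree>
 ```
+
+where `<dataset>`, `<epsilon>`, `<minpts>`, `<MinDegree Rtree>` and `<MaxDegree Rtree>` are the same as before, and
+- `<nodes>` is the number of nodes to use within the cluster; `rund.sh` launches this many MPI processes with `mpirun -np <nodes> --map-by node`, placing them round-robin across the hosts in `<hostfile>`
+- `<hostfile>` is an Open MPI hostfile listing the hostnames of the cluster nodes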
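
The hostfile given to `rund.sh` is passed straight to `mpirun --hostfile`. Open MPI hostfiles list one host per line, optionally followed by a `slots=N` count. The hypothetical helper below (not part of the repository) writes one with a single slot per host; with `<nodes>` equal to the number of hosts, the `--map-by node` placement used in `rund.sh` should then start one process on each node. Host and file names are placeholders.

```shell
# Hypothetical helper (not part of the repository): print an Open MPI
# hostfile with one slot per host given on the command line.
make_hostfile() {
    for host in "$@"; do
        printf '%s slots=1\n' "$host"
    done
}

# Example (host and file names are placeholders):
# make_hostfile node01 node02 node03 node04 > hosts.txt
# ./rund.sh <dataset> <epsilon> <minpts> 4 hosts.txt <MinDegree Rtree> <MaxDegree Rtree>
```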
+
+## Overview
+### Overview of the Algorithm
+![overview](images/overview.png)
+
+### Time complexity
+![complexity](images/table1.png)
+
+## Results
+1. **Proposed sequential algorithm compared with existing sequential clustering algorithms**
+![sequential](images/table2.png)
+
+2. **Proposed algorithm compared with existing clustering algorithms on 32 nodes**
+![distributed](images/table5.png)
+
+3. **Run-time breakdown across the steps of μDBSCAN**
+- **Sequential algorithm breakdown**
+![seq](images/table3.png)
+
+- **Distributed algorithm breakdown**
+![dist](images/table7.png)
+
+4. **Speedup across the steps of μDBSCAN**
+![speedup](images/table8.png)
+
+5. **Peak memory consumption of μDBSCAN**
+![memory](images/table4.png)
+
+6. **Scalability of μDBSCAN**
+![scalability](images/fig6_7.png)
+
diff --git a/images/fig5.png b/images/fig5.png
diff --git a/images/fig6_7.png b/images/fig6_7.png
diff --git a/images/overview.png b/images/overview.png
diff --git a/images/table1.png b/images/table1.png
diff --git a/images/table2.png b/images/table2.png
diff --git a/images/table3.png b/images/table3.png
diff --git a/images/table4.png b/images/table4.png
diff --git a/images/table5.png b/images/table5.png
diff --git a/images/table7.png b/images/table7.png
diff --git a/images/table8.png b/images/table8.png
diff --git a/rund.sh b/rund.sh
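@@ -0,0 +1,27 @@
+#!/bin/bash
+# Usage: ./rund.sh <dataset> <epsilon> <minpts> <nodes> <hostfile> <MinDegree Rtree> <MaxDegree Rtree>
+if [ "$#" -ne 7 ]; then
+    echo "usage: $0 <dataset> <epsilon> <minpts> <nodes> <hostfile> <m> <M>" >&2
+    exit 1
+fi
+
+# Directory holding the input datasets (relative to the current directory)
+path=../datasets/
+
+input=$1     # dataset file name inside $path
+eps=$2       # neighbourhood radius (epsilon)
+minpts=$3    # density threshold
+
+nodes=$4     # number of MPI processes, placed round-robin across nodes
+hostfile=$5  # Open MPI hostfile listing the cluster nodes
+m=$6         # minimum degree of the μC-RTree
+M=$7         # maximum degree of the μC-RTree
+
+# Output file name records the run parameters
+output="output_${input}_EPS=${eps}_Minpts=${minpts}_nodes=${nodes}_m=${m}_M=${M}.txt"
+
+# Rebuild the binary before running; stop if the build fails
+make clean
+make || exit 1
+
+mpirun -np "$nodes" --map-by node --hostfile "$hostfile" ./output "$path$input" "$eps" "$minpts" "$m" "$M" "$output"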
diff --git a/runs.sh b/runs.sh
@@ -13,4 +13,4 @@ output=output_$1\_EPS=$eps\_Minpts=$minpts\_m=$m\_M=$M.txt
 debug=debug_$1\_EPS=$eps\_Minpts=$minpts\_m=$m\_M=$M.txt
 neighbour=neighbour_$1\_EPS=$eps\_Minpts=$minpts\_m=$m\_M=$M.txt
 
-./output $path$input $eps $minpts $m $M $output
+./output $path$input $eps $minpts $m $M $output