---
title: "A Graphical User Interface for Well-Connected Clusters"
tags:
  - Python
  - Data Science
  - Network Science
authors:
  - name: Joao Alfredo Cardoso Lamy
    orcid: 0009-0005-4744-4754
    affiliation: 1
  - name: Tomas Alessi
    orcid: 0009-0006-2658-5758
    affiliation: 1
  - name: Tandy Warnow
    orcid: 0000-0001-7717-3514
    affiliation: 2
  - name: George Chacko
    orcid: 0000-0002-2127-1892
    affiliation: 2
  - name: Minhyuk Park
    orcid: 0000-0002-8676-7565
    affiliation: 2
    corresponding: true
affiliations:
  - name: Insper Instituto de Ensino e Pesquisa, Sao Paulo, Brazil
    index: 1
  - name: Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign, IL 61801, USA
    index: 2
date: 7 March 2025
bibliography: paper.bib
---

Introduction

Community detection in networks has broad applications [@Fortunato2022]. Beyond the intuitive expectation that communities have greater edge density than the network background, an important but sometimes overlooked quality of "good clusters" is that they should be internally well-connected [@park2024well;@traag2019louvain]. Post-clustering techniques such as the Connectivity Modifier (CM) [@park2024well;@Ramavarapu2024] and Well-Connected Clusters (WCC) [@park2024improved] modify a clustering to ensure that all clusters meet a minimum standard defined by the size of their min-cut. Here, we present a simple, user-friendly graphical user interface (GUI) that reduces the burden of installation and the complexity of command-line operations for non-expert users. The GUI enables a user to cluster a network and apply a post-treatment that enforces well-connectedness of clusters.

Background

The CM pipeline

CM was designed to enforce well-connectedness in clusters generated during a community detection process [@park2024well;@Ramavarapu2024]. Whether a cluster is well-connected is determined by the size of its min-cut, the smallest set of edges whose removal splits the cluster. In the CM pipeline, a cluster is considered well-connected if the size of its min-cut is above a user-specified threshold. If not, the min-cut edges are removed and the two resulting clusters are re-clustered and re-tested until every cluster is well-connected. The threshold specified in [@park2024well] is a mild standard of $\log_{10} n$, where $n$ is the number of nodes in the cluster, but the pipeline allows users to specify their own criteria through custom functions.
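The loop described above can be illustrated with a minimal sketch in plain Python. This is not the CM implementation: the function names are hypothetical, a textbook Stoer-Wagner routine stands in for whatever min-cut solver the pipeline actually uses, the re-clustering of the two sides after each cut is omitted for brevity, and single-node clusters are simply kept (an assumption). It is meant only for small example graphs in which each undirected edge is listed once.

```python
import math


def global_min_cut(nodes, edges):
    """Stoer-Wagner global min-cut of an unweighted, undirected graph.

    Returns (cut_size, side), where `side` is the node set on one side.
    """
    weight = {u: {} for u in nodes}
    for u, v in edges:
        if u != v:
            weight[u][v] = weight[u].get(v, 0) + 1
            weight[v][u] = weight[v].get(u, 0) + 1
    merged = {u: {u} for u in nodes}  # original nodes inside each super-node
    best_size, best_side = math.inf, set()
    while len(weight) > 1:
        # One phase: repeatedly add the most tightly attached super-node.
        start = next(iter(weight))
        order = [start]
        attach = {u: weight[start].get(u, 0) for u in weight if u != start}
        while attach:
            nxt = max(attach, key=attach.get)
            phase_cut = attach.pop(nxt)
            order.append(nxt)
            for v, w in weight[nxt].items():
                if v in attach:
                    attach[v] += w
        s, t = order[-2], order[-1]
        if phase_cut < best_size:
            best_size, best_side = phase_cut, set(merged[t])
        # Merge the last-added super-node t into s.
        merged[s] |= merged.pop(t)
        for v, w in weight.pop(t).items():
            del weight[v][t]
            if v != s:
                weight[s][v] = weight[s].get(v, 0) + w
                weight[v][s] = weight[v].get(s, 0) + w
    return best_size, best_side


def is_well_connected(cut_size, n):
    """Default criterion from the text: min-cut size above log10(n)."""
    return cut_size > math.log10(n)


def enforce_well_connectedness(cluster, edges):
    """Split a cluster along min-cuts until every part passes the test."""
    pending, done = [set(cluster)], []
    while pending:
        c = pending.pop()
        if len(c) < 2:
            done.append(c)  # assumption: singletons are kept as-is
            continue
        inner = [(u, v) for u, v in edges if u in c and v in c]
        size, side = global_min_cut(c, inner)
        if is_well_connected(size, len(c)):
            done.append(c)
        else:
            pending.extend([side, c - side])  # CM would re-cluster here
    return done


# Two 5-cliques joined by one edge: the min-cut has size 1, which is not
# above log10(10) = 1, so the 10-node cluster is split into the two cliques.
k5_a = [(u, v) for u in range(5) for v in range(u + 1, 5)]
k5_b = [(u + 5, v + 5) for u, v in k5_a]
clusters = enforce_well_connectedness(range(10), k5_a + k5_b + [(4, 5)])
```

A custom criterion, as mentioned in the text, would correspond here to replacing `is_well_connected` with another function of the cut size and cluster size.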

Well-Connected Clusters

One use of the WCC or CM post-treatment is to address the “resolution limit” [@fortunato2007resolution], in which a clustering method fails to recover valid communities that are too small for it to detect. The classical example is a ring of cliques, where each pair of adjacent cliques is connected by a single edge. As established in [@fortunato2007resolution], for large enough numbers of cliques, modularity clustering will group adjacent cliques into single clusters, which is a failure to detect the obvious communities. Here we show that the same problem occurs for Leiden-CPM, given a ring of 40 cliques, each of size 6. Before CM or WCC, Leiden-CPM groups 5 of the cliques together into a single cluster, but CM or WCC post-treatment breaks up these incorrect clusters so that only the cliques are returned as clusters (\autoref{fig:cpm-wcc}).

\textbf{Leiden-CPM(0.01) without and with WCC treatment on a ring-of-cliques network (40 cliques with 6 nodes each)} Left: Zoomed in view of Leiden-CPM with resolution value 0.01. Right: Zoomed in view of Leiden-CPM with resolution value 0.01 and post-treated using WCC. The visualization uses colors to denote different clusters. Leiden-CPM by itself merges adjacent cliques into a single large cluster whereas WCC post-treatment is able to separate out individual cliques into their own clusters. \label{fig:cpm-wcc}
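To make the ring-of-cliques example concrete, the following sketch builds the 40-by-6 network as an edge list and checks why a cluster of five adjacent cliques fails the $\log_{10} n$ standard while a single clique passes. The helper names, the node numbering, and the choice of which nodes carry the connecting edges are assumptions made for illustration; no clustering is run here.

```python
import math
from itertools import combinations


def ring_of_cliques(num_cliques, clique_size):
    """Edge list for a ring of cliques with consecutive integer node IDs.

    Assumption: the last node of clique i is linked to the first node of
    clique i + 1 (wrapping around at the end of the ring).
    """
    edges = []
    for i in range(num_cliques):
        first = i * clique_size
        edges.extend(combinations(range(first, first + clique_size), 2))
        next_first = ((i + 1) % num_cliques) * clique_size
        edges.append((first + clique_size - 1, next_first))
    return edges


def is_connected(nodes, edges):
    """Depth-first search over the subgraph induced by `nodes`."""
    nodes = set(nodes)
    adj = {u: set() for u in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for v in adj[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


edges = ring_of_cliques(40, 6)

# A merged cluster of five adjacent cliques (nodes 0..29).
merged = set(range(30))
inner = [(u, v) for u, v in edges if u in merged and v in merged]

# Removing the single edge between the first two cliques disconnects the
# merged cluster, so its min-cut has size 1.
without_bridge = [e for e in inner if e != (5, 6)]
merged_min_cut = 1 if not is_connected(merged, without_bridge) else None

# A 6-clique has min-cut size 5 (every node has degree 5 inside the clique).
clique_min_cut = 5

merged_ok = merged_min_cut > math.log10(len(merged))  # 1 vs. about 1.48
clique_ok = clique_min_cut > math.log10(6)            # 5 vs. about 0.78
```

Under these assumptions `merged_ok` is false and `clique_ok` is true, which matches the behavior described in the text: the merged cluster is cut apart and the individual cliques are retained.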

Statement of need

Clustering has broad applications. The selection of a clustering method and choice of parameter settings is often assisted by exploratory analysis. A user-friendly GUI enables such initial exploratory analysis and lowers the barrier for entry. The GUI described here can also be used as an instructional tool in introductory classes on community detection.

The GUI

GUI Architecture

The GUI is split into front-end and back-end components. It is implemented in Python, using Streamlit [@streamlit] for the front-end and FastAPI [@ramirez_fastapi_2018] for the back-end, and supports remote hosting on a website. \autoref{fig:gui-interface} shows the main interface of the GUI.

\textbf{Main interface} Example successful execution of clustering using the Leiden algorithm optimizing for CPM and following up with CM post-treatment. \label{fig:gui-interface}

\autoref{fig:gui-options} shows the clustering options available in the GUI. At present, four clustering algorithms are supported: Leiden-CPM (Constant Potts Model) [@traag2019louvain;@traag2011narrow], Leiden-Modularity [@traag2019louvain], Infomap [@Rosvall2008], and Stochastic Block Models (SBM) [@peixoto_graph-tool_2014]. Each algorithm takes a set of parameters that must be specified before running the pipeline. The parameters are explained in the CM pipeline documentation and in the CM-GUI documentation.

The user is required to upload a network as an edge list, which is then clustered using the method selected by the user, after which well-connectedness is enforced. The user may also upload a pre-computed clustering of the network and skip the initial clustering stage of the CM pipeline. Results can be downloaded at the end of the run by clicking the "Download Clustering data as CSV" button.

\textbf{Example options for GUI} Left: Choices for clustering algorithms. Right: Choices for post-treatment. In the GUI, the algorithm choices dropdown menu specifies the clustering algorithm for the initial clustering. The post-treatment specifies which, if any, post-treatment should be applied to the clustering. If an existing clustering is not uploaded, a clustering algorithm must be specified in order to use CM as post-treatment. \label{fig:gui-options}
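As an illustration of the upload and download steps, the sketch below writes an edge list to a file and groups the rows of a downloaded clustering by cluster. The file layouts are assumptions, not taken from the CM-GUI documentation: the edge list is written as two tab-separated node IDs per line, and the clustering CSV is assumed to contain `node_id,cluster_id` rows without a header. The function names are hypothetical, and the documentation should be consulted for the formats the GUI actually expects.

```python
import csv
import os
import tempfile
from collections import defaultdict


def write_edge_list(edges, path, delimiter="\t"):
    """Write one edge per line as two node IDs (assumed layout)."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter=delimiter, lineterminator="\n")
        writer.writerows(edges)


def read_clustering(path, delimiter=","):
    """Group node IDs by cluster ID.

    Assumes headerless rows of (node_id, cluster_id); adjust if the
    downloaded file has a header or a different column order.
    """
    clusters = defaultdict(list)
    with open(path, newline="") as fh:
        for node_id, cluster_id in csv.reader(fh, delimiter=delimiter):
            clusters[cluster_id].append(node_id)
    return dict(clusters)


# Round trip on temporary files with made-up toy data.
workdir = tempfile.mkdtemp()
edge_path = os.path.join(workdir, "network.tsv")
write_edge_list([(0, 1), (1, 2), (2, 0)], edge_path)

clustering_path = os.path.join(workdir, "clustering.csv")
with open(clustering_path, "w") as fh:
    fh.write("0,a\n1,a\n2,b\n")  # stand-in for a downloaded result

clusters = read_clustering(clustering_path)
sizes = {cid: len(members) for cid, members in clusters.items()}
```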

Installing the GUI

To install the CM pipeline GUI, the user has two options: via Docker or a manual install.

We recommend the Docker install as it simplifies installation of necessary packages for running the GUI. A local installation option without Docker is available and described in the README of the repository.

Conclusions

The GUI makes the CM pipeline accessible to entry-level users and can be used as an instructional tool. The size of the networks that this GUI can handle depends on the hardware on which it is installed. Planned future work includes retrieving basic cluster statistics and visualizations of the resulting clusterings through the GUI.

Acknowledgements

We would like to thank Ian Wei Chen and The-Anh Vu-Le for their help in testing the software. Work on this project was supported by funds from the Illinois-Insper Partnership.

References