| title | A Graphical User Interface for Well-Connected Clusters |
|---|---|
| tags | |
| authors | |
| affiliations | |
| date | 7 March 2025 |
| bibliography | paper.bib |

Community detection in networks has broad applications [@Fortunato2022]. Beyond the intuitive expectation that communities have greater edge density than the network background, an important but sometimes overlooked quality of "good clusters" is that they should be internally well-connected [@park2024well;@traag2019louvain]. Post-clustering techniques such as the Connectivity Modifier (CM) [@park2024well;@Ramavarapu2024] and Well-Connected Clusters (WCC) [@park2024improved] modify a clustering to ensure that all clusters meet a minimum standard defined by the size of their min-cut. Here, we present a simple, user-friendly graphical user interface (GUI) that reduces the burden of installation and the complexity of command-line operations for non-expert users. The GUI enables a user to cluster a network and then apply a post-treatment that enforces well-connectedness of the clusters.
CM was designed to enforce well-connectedness in clusters generated during a community detection process [@park2024well;@Ramavarapu2024]. Whether a cluster is considered well-connected is determined by the size of its min-cut, the smallest set of edges whose removal splits the cluster. In the CM pipeline, a cluster is considered well-connected if the size of its min-cut is above a user-specified threshold. If it is not, the min-cut edges are removed, and the two resulting clusters are re-clustered and re-tested for their min-cuts until every cluster is well-connected. The threshold specified in [@park2024well] is a mild standard of
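The min-cut test and recursive splitting described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the CM implementation: the function names, the adjacency-dictionary input, and the constant `threshold` parameter are assumptions made for the example. The re-clustering step of the pipeline is omitted, so the two sides of each cut are only re-tested. The min-cut is computed with the Stoer-Wagner algorithm, and the graph is assumed to be undirected, unweighted, and free of self-loops.

```python
def min_cut(adj):
    """Stoer-Wagner global min-cut of an undirected, unweighted graph.

    adj maps each node to the set of its neighbours.
    Returns (cut_size, nodes_on_one_side_of_the_cut).
    """
    # w[u][v] is the number of original edges between super-nodes u and v.
    w = {u: {v: 1 for v in nbrs} for u, nbrs in adj.items()}
    members = {u: {u} for u in adj}  # original nodes merged into each super-node
    best_size, best_side = float("inf"), set()
    while len(w) > 1:
        start = next(iter(w))
        order = [start]
        attach = {v: w[start].get(v, 0) for v in w if v != start}
        while attach:
            # Add the node most tightly attached to the nodes added so far.
            nxt = max(attach, key=attach.get)
            cut_of_phase = attach.pop(nxt)
            order.append(nxt)
            for v, wt in w[nxt].items():
                if v in attach:
                    attach[v] += wt
        s, t = order[-2], order[-1]
        if cut_of_phase < best_size:
            best_size, best_side = cut_of_phase, set(members[t])
        # Merge t into s before the next phase.
        members[s] |= members.pop(t)
        for v, wt in w.pop(t).items():
            del w[v][t]
            if v != s:
                w[s][v] = w[s].get(v, 0) + wt
                w[v][s] = w[s][v]
    return best_size, best_side


def enforce_well_connected(adj, cluster, threshold):
    """Split `cluster` along min-cuts until every piece has min-cut > threshold."""
    if len(cluster) < 2:
        return [set(cluster)]  # simplification: singletons are returned unchanged
    sub = {u: adj[u] & cluster for u in cluster}  # subgraph induced by the cluster
    size, side = min_cut(sub)
    if size > threshold:
        return [set(cluster)]
    # The real pipeline re-clusters both sides here; this sketch only re-tests them.
    return (enforce_well_connected(adj, side, threshold)
            + enforce_well_connected(adj, cluster - side, threshold))


# Two triangles joined by a single edge, initially treated as one cluster.
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
                 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
clusters = enforce_well_connected(two_triangles, set(two_triangles), threshold=1)
print(sorted(sorted(c) for c in clusters))  # [[0, 1, 2], [3, 4, 5]]
```

With `threshold=1`, the joined cluster has a min-cut of size 1 and is split, while each triangle has a min-cut of size 2 and is kept.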
One use of the WCC or CM post-treatment is to address the “resolution limit” [@fortunato2007resolution], in which a clustering method fails to find valid communities because they are too small for the method to detect. The classical example is a ring of cliques, where each pair of adjacent cliques is connected by a single edge. As established in [@fortunato2007resolution], for a large enough number of cliques, modularity clustering will group adjacent cliques into single clusters, which is a failure to detect the obvious communities. Here we show that the same problem occurs for Leiden-CPM, given a ring of 40 cliques, each of size 6. Without post-treatment, Leiden-CPM will group 5 of the cliques together into a single cluster, but applying the CM or WCC post-treatment breaks up these incorrect clusters so that only the cliques are returned as clusters (\autoref{fig:cpm-wcc}).
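The modularity case from [@fortunato2007resolution] can be checked directly on the same ring of 40 cliques of size 6. The sketch below is an illustration only and does not reproduce the Leiden-CPM experiment: it builds the ring in plain Python and compares the modularity of the clique partition with that of a partition that merges adjacent cliques in pairs. The helper names are assumptions made for the example.

```python
from itertools import combinations


def ring_of_cliques(num_cliques, clique_size):
    """Edge list of a ring of cliques; adjacent cliques share a single edge."""
    edges = []
    for c in range(num_cliques):
        nodes = range(c * clique_size, (c + 1) * clique_size)
        edges.extend(combinations(nodes, 2))  # all edges inside the clique
        first_of_next = ((c + 1) % num_cliques) * clique_size
        edges.append((nodes[-1], first_of_next))  # single edge to the next clique
    return edges


def modularity(edges, membership):
    """Newman-Girvan modularity of a partition given as node -> cluster id."""
    m = len(edges)
    internal, degree = {}, {}
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


edges = ring_of_cliques(40, 6)
one_per_clique = {v: v // 6 for v in range(240)}   # each clique is a cluster
merged_pairs = {v: v // 12 for v in range(240)}    # adjacent cliques merged in pairs
print(modularity(edges, one_per_clique))
print(modularity(edges, merged_pairs))
```

The merged partition scores higher (about 0.9188 versus about 0.9125), so a method that maximizes modularity prefers it even though each merged cluster is held together by a single edge. A min-cut test with any threshold of at least 1 would split such a cluster.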
Clustering has broad applications. Selecting a clustering method and its parameter settings is often assisted by exploratory analysis. A user-friendly GUI enables such initial exploratory analysis and lowers the barrier to entry. The GUI described here can also be used as an instructional tool in introductory classes on community detection.
The GUI is modularized into front-end and back-end components. It is implemented in Python, using Streamlit [@streamlit] for the front-end and FastAPI [@ramirez_fastapi_2018] for the back-end, and supports remote hosting on a website. \autoref{fig:gui-interface} shows the main interface of the GUI.
\autoref{fig:gui-options} shows the clustering options available in the GUI. At present, four clustering algorithms are supported: Leiden-CPM (Constant Potts Model) [@traag2019louvain;@traag2011narrow], Leiden-Modularity [@traag2019louvain], Infomap [@Rosvall2008], and Stochastic Block Models (SBM) [@peixoto_graph-tool_2014]. Each algorithm takes a set of parameters that must be specified before running the pipeline. The parameters are explained in the CM pipeline documentation and in the CM-GUI documentation.
The user is required to upload a network as an edge list, which is then clustered using the method selected by the user, after which well-connectedness is enforced. The user may also upload a pre-computed clustering of the network and skip the initial clustering stage of the CM pipeline. Results can be downloaded at the end of the run by clicking the "Download Clustering data as CSV" button.
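For readers preparing input outside the GUI, the following sketch shows one way to parse an edge list and write a clustering to CSV with the Python standard library. The file layouts are assumptions made for illustration: a tab-separated edge list with one edge per line, and an output CSV with hypothetical `node_id` and `cluster_id` columns. The formats the GUI actually accepts and produces are described in its documentation.

```python
import csv
import io


def read_edge_list(handle, delimiter="\t"):
    """Read (source, target) pairs; skips blank lines and '#' comment lines."""
    edges = []
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < 2 or row[0].startswith("#"):
            continue
        edges.append((row[0], row[1]))
    return edges


def write_clustering_csv(membership, handle):
    """Write a node -> cluster mapping with a header row (hypothetical columns)."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["node_id", "cluster_id"])
    for node, cluster in sorted(membership.items()):
        writer.writerow([node, cluster])


# In-memory stand-ins for an uploaded edge list and a downloaded result.
uploaded = io.StringIO("# source\ttarget\na\tb\nb\tc\n\nc\ta\n")
print(read_edge_list(uploaded))  # [('a', 'b'), ('b', 'c'), ('c', 'a')]

downloaded = io.StringIO()
write_clustering_csv({"a": 0, "b": 0, "c": 0}, downloaded)
print(downloaded.getvalue())
```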
To install the CM pipeline GUI, the user has two options: via Docker or a manual install.
We recommend the Docker install as it simplifies installation of necessary packages for running the GUI. A local installation option without Docker is available and described in the README of the repository.
The GUI makes the CM pipeline accessible to entry-level users and can be used as an instructional tool. The size of the networks that the GUI can handle depends on the hardware on which it is installed. Planned future work includes retrieving basic cluster statistics and visualizations of the resulting clusterings through the GUI.
We would like to thank Ian Wei Chen and The-Anh Vu-Le for their help in testing the software. Work on this project was supported by funds from the Illinois-Insper Partnership.


