This is the open-source repository for the paper "HINTs: Sensemaking on large collections of documents with Hypergraph visualization and INTelligent agents": https://doi.org/10.1109/TVCG.2024.3459961
(a) The peripheral area of Cluster View shows the mentioned keywords of highlighted documents using Gilbert curves.
(b) The center area of Cluster View shows the topic structure of the corpus using Gosper curves.
(c) The Document View shows a list of selected documents.
(d) The Chatbot View provides a chatbot interface to answer user questions with the option to insert selected documents in the prompt.
- Click: left-click on a cluster to expand it.
- Cmd (Mac) or Ctrl (Windows): hover over a cluster, then hold Cmd or Ctrl to temporarily expand the cluster and see its sub-clusters.
- Zoom: by default, use the mouse wheel or trackpad gestures.
- Pan: drag on empty space in the layout to pan.
Reproduction consists of two stages: data processing and launching the server.
We provide preprocessed data so that interested individuals can skip the data processing step. Data can be downloaded via this Link.
After downloading, decompress the data and put it under the server directory. You should see server/data/VisPub and server/data/AllTheNews.
If you wish to reproduce the data processing step yourself, follow the instructions under Reproducing the data processing stage.
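As a quick sanity check of the layout described above, a small Python sketch along these lines could confirm that both dataset directories are in place before the server is started. The helper name `missing_data_dirs` is hypothetical and not part of this repository; it only checks that the two directories named above exist, not what they contain.

```python
from pathlib import Path

# Directory names taken from the instructions above.
EXPECTED_DATASETS = ("VisPub", "AllTheNews")


def missing_data_dirs(server_dir="server"):
    """Return the expected dataset directories that are absent under <server_dir>/data."""
    data_root = Path(server_dir) / "data"
    return [
        str(data_root / name)
        for name in EXPECTED_DATASETS
        if not (data_root / name).is_dir()
    ]
```

Calling `missing_data_dirs()` from the repository root returns an empty list when both directories are present, and the missing paths otherwise.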
The backend server is implemented with Flask. In addition, the server needs an OpenAI API key. Follow these steps to set up the required environment.
- Navigate to the `server` directory.
- Create a file called `openai_api_key` under the `server` directory, and copy and paste the API key from OpenAI's website.
- Set up the Python environment with:

    # typical
    pip install -r requirements.txt
    # or sometimes on Mac
    python3 -m pip install -r requirements.txt

- Make sure the data files are present (see Where to put the data).
- Run `flask run`.
- The server should be up and listening to `localhost:5000`. It might take 3-5 minutes to start the server.
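Because startup can take several minutes, it may help to poll the port rather than guess when the server is ready. The following is a minimal sketch, not part of the repository: `wait_for_port` is a hypothetical helper, and the timeout and polling interval are arbitrary defaults. It only tests that something accepts TCP connections on the port, which is assumed here to be a reasonable proxy for the Flask server being up.

```python
import socket
import time


def wait_for_port(host="localhost", port=5000, timeout=600.0, interval=5.0):
    """Poll until a TCP connection to host:port succeeds.

    Returns True as soon as a connection is accepted, or False once
    `timeout` seconds have elapsed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, calling `wait_for_port()` in a second terminal after `flask run` blocks until the port opens or ten minutes pass.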
The frontend is implemented with Vite and Svelte.
To launch, navigate to frontend-svelte and run:

    npm i
    npm run dev

The frontend should be up and listening to localhost:5173.
Note that frontend-baseline is the interface for the ChatGPT control group in our user study. frontend-deprecated is a deprecated folder and can be ignored.
Reproduction on the AllTheNews and VisPub datasets can be done under the reproduce directory.
The directory is organized as follows:
- `Summarization`, `Keyword Extraction`, `Keyword Disambiguation`, `Embedding Generation`, and `Hypergraph Construction` are done with separate prompts for different datasets, so they are separated in `reproduce/AllTheNews` and `reproduce/VisPub`.
- For `Hierarchical Clustering` and `Topic Label Generation`, the two datasets use the same algorithm, so they are under `reproduce/clustering.py` and `reproduce/hierarchical_topics.py`.
Follow these steps to reproduce (using VisPub as an example):
- Download the raw dataset from this link.
- Decompress the raw datasets and put them under `reproduce`: for `VisPub`, you should see `reproduce/VisPub/data/raw/` (or `reproduce/AllTheNews/data/` for `AllTheNews`).
- The data processing needs an OpenAI API key. Request one from OpenAI, create a file called `openai_api_key` under `reproduce`, and copy and paste the key from OpenAI's website.
- Navigate to `reproduce/VisPub/`, and run `bash execute_pipeline.sh`.
- Once finished, copy the files under `reproduce/VisPub/data/result/server/` to `server/data/VisPub/`.
- Then navigate back to `reproduce` for clustering and hierarchical topic generation.
- Run the following commands:

    # process articles on VisPub
    python3 clustering.py -dataset VisPub -data_type article
    python3 hierarchical_topics.py -dataset VisPub -data_type article
    # process entities (keywords) on VisPub
    python3 clustering.py -dataset VisPub -data_type entity
    python3 hierarchical_topics.py -dataset VisPub -data_type entity

- Once finished, copy the JSON files under `reproduce/VisPub/data/result/server/` to `server/data/VisPub/`.
- Follow the instructions for launching the backend server to launch the server.
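The clustering commands and the final copy step above could be scripted roughly as follows. This is a sketch under assumptions, not a script shipped with the repository: the helper names `build_commands`, `run_all`, and `copy_json_results` are hypothetical, and the commands simply mirror the four VisPub invocations listed above. Passing another dataset name such as `AllTheNews` assumes the scripts accept the same flags for it.

```python
import shutil
import subprocess
from pathlib import Path

# Order matches the commands listed above: articles first, then entities.
DATA_TYPES = ("article", "entity")
SCRIPTS = ("clustering.py", "hierarchical_topics.py")


def build_commands(dataset):
    """Build the clustering and topic-generation commands for one dataset."""
    return [
        ["python3", script, "-dataset", dataset, "-data_type", data_type]
        for data_type in DATA_TYPES
        for script in SCRIPTS
    ]


def run_all(dataset, cwd="reproduce"):
    """Run each command from the reproduce directory, stopping at the first failure."""
    for cmd in build_commands(dataset):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, cwd=cwd, check=True)


def copy_json_results(src_dir, dst_dir):
    """Copy every .json file from src_dir into dst_dir; return the copied file names."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(src_dir).glob("*.json")):
        shutil.copy2(path, dst / path.name)
        copied.append(path.name)
    return copied
```

From the repository root, `run_all("VisPub")` followed by `copy_json_results("reproduce/VisPub/data/result/server", "server/data/VisPub")` would correspond to the last manual steps above.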
