HINTs

This is the open-source repository for the paper "HINTs: Sensemaking on Large Collections of Documents with Hypergraph Visualization and INTelligent Agents": https://doi.org/10.1109/TVCG.2024.3459961

Overview

Overview of the HINTs system.

(a) The peripheral area of the Cluster View uses Gilbert curves to show the keywords mentioned in the highlighted documents.

(b) The center area of the Cluster View uses Gosper curves to show the topic structure of the corpus.

(c) The Document View shows a list of the selected documents.

(d) The Chatbot View provides a chatbot interface that answers user questions, with the option to insert the selected documents into the prompt.
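The README does not describe how the Gosper layout is computed. As a rough illustration only, the standard Gosper (flowsnake) curve can be generated with a two-rule L-system interpreted by a turtle that turns in 60-degree steps. This is a generic sketch; the names gosper_lsystem and gosper_points are hypothetical and are not taken from the HINTs code.

```python
import math


def gosper_lsystem(order: int) -> str:
    """Expand the standard Gosper (flowsnake) L-system `order` times."""
    # Both A and B mean "draw forward"; + and - are 60-degree turns.
    rules = {"A": "A-B--B+A++AA+B-", "B": "+A-AA++A+B--B-A"}
    s = "A"
    for _ in range(order):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s


def gosper_points(order: int, step: float = 1.0) -> list[tuple[float, float]]:
    """Turtle-interpret the L-system and return the curve's vertices in order."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for ch in gosper_lsystem(order):
        if ch in "AB":
            x += step * math.cos(math.radians(heading))
            y += step * math.sin(math.radians(heading))
            points.append((x, y))
        elif ch == "+":
            heading += 60.0  # turn left
        elif ch == "-":
            heading -= 60.0  # turn right
    return points
```

Each additional order multiplies the number of segments by 7, so gosper_points(2) returns 50 points for 49 segments. How HINTs maps clusters onto positions along such a curve is not covered here.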

Interactions

Click: left-click a cluster to expand it.
Cmd (Mac) or Ctrl (Windows): hover over a cluster, then hold Cmd or Ctrl to temporarily expand it and see its subclusters.
Zoom: use the mouse wheel or trackpad gestures (the default behavior).
Pan: drag on an empty space in the layout.

Reproduce

Reproduction consists of two parts: data processing and launching the server. We provide preprocessed data so that you can skip the data processing step. The data can be downloaded via this link.

Where to put the data

After downloading, decompress the archive and put the data under the server directory. You should then see server/data/VisPub and server/data/AllTheNews.
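A quick way to confirm the data is in place before starting the server is a small check like the one below. It is a minimal sketch, assuming it is run from the repository root; missing_data_dirs is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import Union

# Dataset folder names as listed in this README.
EXPECTED_DATASETS = ("VisPub", "AllTheNews")


def missing_data_dirs(server_dir: Union[str, Path] = "server") -> list[str]:
    """Return the dataset folders that do not yet exist under <server_dir>/data/."""
    data_root = Path(server_dir) / "data"
    return [name for name in EXPECTED_DATASETS if not (data_root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print("Missing under server/data/: " + ", ".join(missing))
    else:
        print("All dataset folders found.")
```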

If you wish to reproduce the data processing step yourself, follow the instructions under Reproducing the data processing stage.

Launching the backend server

The backend server is implemented with Flask and requires an OpenAI API key. Follow these steps to set up the required environment:

Navigate to the server directory.
Create a file named openai_api_key in the server directory and paste your API key from OpenAI's website into it.
Set up the Python environment with:

# typical
pip install -r requirements.txt
# or sometimes on Mac
python3 -m pip install -r requirements.txt

Make sure the data files are present (see Where to put the data).
Start the server with flask run.
The server should then be listening on localhost:5000. Startup may take 3 to 5 minutes.
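Because startup can take several minutes, a script that depends on the backend may want to wait until it accepts connections. The sketch below polls a URL using only the standard library. wait_for_server, its default URL, and its timings are assumptions; it treats any HTTP response, including a 404 (the README does not say which routes exist), as a sign that the server is up.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str = "http://localhost:5000/", timeout: float = 300.0,
                    interval: float = 5.0) -> bool:
    """Poll `url` until the server responds or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True  # successful response: the server is up
        except urllib.error.HTTPError:
            return True  # an error status (e.g. 404) still means the server answered
        except OSError:
            pass  # connection refused or timed out: probably still starting
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
```

HTTPError is caught before OSError on purpose: it is a subclass of URLError, which is itself an OSError, so the broader handler would otherwise swallow it.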

Launching the frontend

The frontend is implemented with Vite and Svelte. To launch it, navigate to frontend-svelte and run:

npm i
npm run dev

The frontend should then be listening on localhost:5173. Note that frontend-baseline contains the interface for the ChatGPT control group in our user study, and frontend-deprecated is a deprecated folder that can be ignored.

Reproducing the data processing stage

Reproduction on the AllTheNews and VisPub datasets is done under the reproduce directory, which is organized as follows:

Summarization, Keyword Extraction, Keyword Disambiguation, Embedding Generation, and Hypergraph Construction use separate prompts for each dataset, so they are split between reproduce/AllTheNews and reproduce/VisPub.
Hierarchical Clustering and Topic Label Generation use the same algorithm for both datasets, so they live in reproduce/clustering.py and reproduce/hierarchical_topics.py.

Follow these steps to reproduce (using VisPub as an example):

Download the raw dataset from this link.
Decompress the raw dataset and put it under reproduce: for VisPub you should see reproduce/VisPub/data/raw/ (or reproduce/AllTheNews/data/ for AllTheNews).
The data processing needs an OpenAI API key. Request one from OpenAI, create a file named openai_api_key under reproduce, and paste the key into it.
Navigate to reproduce/VisPub/ and run bash execute_pipeline.sh.
Once finished, copy the files under reproduce/VisPub/data/result/server/ to server/data/VisPub/.
Then navigate back to reproduce for clustering and hierarchical topic generation.
Run the following commands:

# process articles on VisPub 
python3 clustering.py -dataset VisPub -data_type article
python3 hierarchical_topics.py -dataset VisPub -data_type article
# process entities (keywords) on VisPub 
python3 clustering.py -dataset VisPub -data_type entity
python3 hierarchical_topics.py -dataset VisPub -data_type entity

Once finished, copy the JSON files under reproduce/VisPub/data/result/server/ to server/data/VisPub/.
Follow the instructions under Launching the backend server to launch the server.
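The clustering, topic-generation, and final copy steps above can also be scripted. This is a hedged sketch, assuming execute_pipeline.sh has already finished and repo_root points at the repository root; build_commands, copy_results, and run_clustering are hypothetical names. It invokes the scripts through sys.executable rather than a literal python3, so it uses whichever interpreter runs the sketch.

```python
import shutil
import subprocess
import sys
from pathlib import Path

# Script names and flags as given in the steps above.
STAGES = ("clustering.py", "hierarchical_topics.py")
DATA_TYPES = ("article", "entity")


def build_commands(dataset: str) -> list[list[str]]:
    """Commands in README order: articles first, then entities (keywords)."""
    return [
        [sys.executable, script, "-dataset", dataset, "-data_type", data_type]
        for data_type in DATA_TYPES
        for script in STAGES
    ]


def copy_results(dataset: str, repo_root: Path) -> list[Path]:
    """Copy reproduce/<dataset>/data/result/server/*.json into server/data/<dataset>/."""
    src = repo_root / "reproduce" / dataset / "data" / "result" / "server"
    dst = repo_root / "server" / "data" / dataset
    dst.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy2(f, dst)) for f in sorted(src.glob("*.json"))]


def run_clustering(dataset: str, repo_root: Path) -> None:
    """Run each command from reproduce/ (stop on first failure), then copy the JSON output."""
    for cmd in build_commands(dataset):
        subprocess.run(cmd, cwd=repo_root / "reproduce", check=True)
    copy_results(dataset, repo_root)
```

For example, run_clustering("VisPub", Path(".")) from the repository root; presumably "AllTheNews" works the same way. The sketch only covers the final JSON copy, not the earlier copy after execute_pipeline.sh.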


