Character Insight Extraction Using LLM (DeepStack Assignment)

Demo Video

DeepStack_Video_demo.mp4

Technologies Used

1. LangChain and Python

LangChain:
- Provides a framework for LLM integration, prompt chaining, and workflow management.
Python:
- Core language used for implementing the project, ensuring flexibility and integration with ML frameworks.

2. Large Language Models (LLMs)

Model: mixtral-8x7b (inference served through the Groq API)
Used for generating summaries, identifying relationships, and classifying character types.
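
As a rough illustration of how the three tasks might be requested in a single call, the sketch below builds a prompt from retrieved context and validates a JSON reply. The function names `build_prompt` and `parse_reply`, the prompt wording, and the JSON keys are all hypothetical; they are not taken from this repository, and no LLM call is made here.

```python
import json

# Hypothetical output keys; the project's real schema may differ.
EXPECTED_KEYS = {"name", "summary", "relations", "role"}


def build_prompt(character, context_chunks):
    """Assemble a prompt asking for a summary, relations, and a role."""
    context = "\n\n".join(context_chunks)
    return (
        f"Using only the context below, describe the character '{character}'.\n"
        "Reply with a JSON object containing the keys: "
        "name, summary, relations, role.\n\n"
        f"Context:\n{context}"
    )


def parse_reply(reply):
    """Parse the model's JSON reply and check that every expected key is present."""
    data = json.loads(reply)
    missing = EXPECTED_KEYS - data.keys()
    if missing:
        raise ValueError(f"Reply is missing keys: {sorted(missing)}")
    return data


prompt = build_prompt("Eliza", ["Eliza met her brother at the market."])
```

Validating the reply before using it is a design choice in this sketch: LLM output is not guaranteed to follow the requested format, so a failed check is a natural place to retry.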

3. Vector Databases

ChromaDB: Stores and retrieves text embeddings for efficient context-based search.
Embeddings: Generated using HuggingFace's all-MiniLM-L6-v2 model for high-quality semantic understanding.
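
To make context-based search concrete, here is a minimal, dependency-free sketch of similarity retrieval. It deliberately swaps in a toy bag-of-words vector for the all-MiniLM-L6-v2 embeddings and a plain Python list for ChromaDB, so it shows only the ranking idea, not the project's actual code. The names `embed`, `cosine`, and `top_k` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


chunks = [
    "Eliza walked to the market with her brother.",
    "The storm destroyed the old bridge.",
    "Her brother Tom teased Eliza about the letter.",
]
results = top_k("Eliza brother", chunks, k=2)
```

A real vector store performs the same ranking over dense vectors, usually with an index that avoids comparing the query against every stored chunk.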

4. Text Splitting

RecursiveCharacterTextSplitter: Splits large text into manageable chunks with overlaps to ensure continuity of context.
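
The effect of overlapping chunks can be sketched with a fixed-width sliding window. This is a simplification: RecursiveCharacterTextSplitter first tries to split on separators such as paragraph and line breaks, whereas the hypothetical `split_with_overlap` below cuts purely by character count. The sizes are illustrative, not the project's settings.

```python
def split_with_overlap(text, chunk_size=200, chunk_overlap=50):
    """Cut text into fixed-width chunks that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


demo = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
# demo == ["abcd", "cdef", "efgh", "ghij"]
```

Each chunk repeats the tail of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk whenever it is no longer than the overlap.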

File Structure

Installation

Clone the repository:

git clone https://github.com/VivekShinde7/Character_Insight_Extractor_Using_LLM.git

cd Character_Insight_Extractor_Using_LLM

Set Up a Python Environment:

 conda create --prefix ./env python=3.9 -y

conda activate ./env

Install Dependencies:
```
 pip install -r requirements.txt
```

Set Up Environment Variables:

Create a .env file in the project root:

 HF_TOKEN=your_huggingface_token
 GROQ_API_KEY=your_groq_api_key
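
Before running the scripts, it can help to confirm that both variables are visible to Python. The check below is a small standard-library sketch; `missing_keys` is a hypothetical helper, not part of this repository. It assumes the values from .env have already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`, if the project uses that package).

```python
import os

# Variable names taken from the .env example above.
REQUIRED_KEYS = ("HF_TOKEN", "GROQ_API_KEY")


def missing_keys(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```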

Usage

1. Preprocess the Text

Use the compute_embeddings.py script to process your text and store embeddings in the vector database:
```
python src/compute_embeddings.py data/
```

2. Analyze a Character

Run the get_character_info.py script to extract character details:
```
python src/get_character_info.py "<character_name>"
```

Example

python src/get_character_info.py "Eliza"

Output

Future Enhancements

Integration with Neo4j: Enable graph-based relationship visualization to provide a clearer and more interactive representation of character connections.
Fine-Tuning the LLM: Improve role classification and relationship detection accuracy by fine-tuning the LLM on a domain-specific dataset.
