
VivekShinde7/Character_Insight_Extractor_Using_LLM


Character Insight Extraction Using LLM (DeepStack Assignment)

Demo Video

DeepStack_Video_demo.mp4

Technologies Used

1. LangChain and Python

  • LangChain:
    • Provides the framework for integrating the LLM, chaining prompts, and managing multi-step workflows.
  • Python:
    • Core implementation language; it works directly with the LangChain, Hugging Face, and ChromaDB Python libraries used in the project.
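LangChain composes these steps with its pipe syntax (for example `prompt | llm | parser`). The sketch below imitates that composition in plain Python so it runs without LangChain or an API key. The `Step` class and the stub model are hypothetical stand-ins, not LangChain's actual `Runnable` implementation.

```python
from typing import Any, Callable


class Step:
    """Minimal stand-in for a LangChain runnable: wraps a function and supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value: Any) -> Any:
        return self.fn(value)


# Hypothetical stages; the model stub echoes its input instead of calling an API.
prompt = Step(lambda name: f"Summarize the character {name} in one sentence.")
fake_llm = Step(lambda text: f"[model output for: {text}]")
parser = Step(lambda reply: reply.strip("[]"))

chain = prompt | fake_llm | parser
print(chain.invoke("Eliza"))
```

Because each stage only sees the previous stage's output, replacing the stub with a real chat model would leave the rest of the chain unchanged.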

2. Large Language Models (LLMs)

  • Model: mixtral-8x7b (inference served through the Groq API)
  • Used for generating summaries, identifying relationships, and classifying character types.
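The README does not show the project's prompt or output format. As a hypothetical sketch, a prompt can ask the model for JSON and the reply can be validated before use; the key names (`name`, `summary`, `relations`, `character_type`) and the helper names are assumptions, while the three role labels come from the project description.

```python
import json

# Role labels from the project description; the JSON keys are hypothetical.
ROLES = {"Protagonist", "Antagonist", "Side character"}
REQUIRED_KEYS = {"name", "summary", "relations", "character_type"}

PROMPT_TEMPLATE = (
    "Using only the context below, describe the character {name}.\n"
    "Answer with a JSON object with the keys name, summary, relations "
    "(a list of objects with the keys name and relation) and character_type "
    "(one of: Protagonist, Antagonist, Side character).\n\n"
    "Context:\n{context}"
)


def build_prompt(name: str, chunks: list[str]) -> str:
    """Fill the template with the character name and the retrieved text chunks."""
    return PROMPT_TEMPLATE.format(name=name, context="\n---\n".join(chunks))


def parse_response(raw: str) -> dict:
    """Parse the model's JSON reply and check that it has the expected shape."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["character_type"] not in ROLES:
        raise ValueError(f"unknown character_type: {data['character_type']!r}")
    return data
```

Rejecting replies with an unknown role keeps the classification restricted to the three labels instead of whatever free text the model produces.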

3. Vector Databases

  • ChromaDB: Stores and retrieves text embeddings for efficient context-based search.
  • Embeddings: Generated with Hugging Face's sentence-transformers model all-MiniLM-L6-v2, which maps each text chunk to a 384-dimensional vector for semantic similarity search.
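Conceptually, a context search embeds the query and returns the stored chunks whose vectors are closest to it. The sketch below does this by brute force with cosine similarity over made-up 3-dimensional vectors; ChromaDB instead maintains an approximate nearest-neighbor (HNSW) index, and real all-MiniLM-L6-v2 vectors have 384 dimensions. The function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], store: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k stored ids most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings", made up for illustration.
store = {
    "chunk-0": [1.0, 0.0, 0.0],
    "chunk-1": [0.0, 1.0, 0.0],
    "chunk-2": [0.9, 0.1, 0.0],
}
print(top_k([1.0, 0.0, 0.0], store, k=2))  # ranks chunk-0 first, then chunk-2
```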

4. Text Splitting

  • RecursiveCharacterTextSplitter: Splits large text into manageable chunks with overlaps to ensure continuity of context.
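In LangChain the splitter is configured with `chunk_size` and `chunk_overlap`, e.g. `RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_text(text)` (illustrative values; the README does not state the ones this project uses). To show the idea, here is a simplified pure-Python re-implementation under a hypothetical name. It is not LangChain's code: for example, it drops overlap at the boundary where a piece has to be split recursively.

```python
def split_recursive(text: str, chunk_size: int = 100, chunk_overlap: int = 20,
                    separators: tuple[str, ...] = ("\n\n", "\n", " ", "")) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Tries coarse separators first (paragraphs, lines, words) and only falls back
    to finer ones for pieces that are still too long. Assumes
    0 <= chunk_overlap < chunk_size and that separators ends with "".
    """
    # Use the first separator that occurs in the text ("" always matches).
    for i, sep in enumerate(separators):
        if sep == "" or sep in text:
            break
    pieces = list(text) if sep == "" else text.split(sep)
    finer = separators[i + 1:]

    chunks: list[str] = []
    window: list[str] = []  # pieces that make up the chunk currently being built

    for piece in pieces:
        if len(piece) > chunk_size:
            # Too long on its own: close the current chunk and recurse with finer separators.
            if window:
                chunks.append(sep.join(window))
                window = []
            chunks.extend(split_recursive(piece, chunk_size, chunk_overlap, finer))
            continue
        if window and len(sep.join(window + [piece])) > chunk_size:
            chunks.append(sep.join(window))
            # Carry trailing pieces into the next chunk as overlap, as long as
            # they fit within chunk_overlap and leave room for the new piece.
            while window and (len(sep.join(window)) > chunk_overlap
                              or len(sep.join(window + [piece])) > chunk_size):
                window.pop(0)
        window.append(piece)

    if window:
        chunks.append(sep.join(window))
    # Drop chunks that contain only whitespace.
    return [c for c in chunks if c.strip()]
```

The overlap means the last few words of one chunk reappear at the start of the next, so a sentence cut at a boundary is still seen whole by at least one chunk's embedding.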

File Structure

[Diagram: file_structure]

Installation

  1. Clone the repository:

    git clone https://github.com/VivekShinde7/Character_Insight_Extractor_Using_LLM.git
    cd Character_Insight_Extractor_Using_LLM
  2. Set Up a Python Environment:

    conda create --prefix ./env python=3.9 -y
    conda activate ./env
  3. Install Dependencies:

    pip install -r requirements.txt
  4. Set Up Environment Variables:

    • Create a .env file in the project root containing:
      HF_TOKEN=your_huggingface_token
      GROQ_API_KEY=your_groq_api_key
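How the scripts read these keys is not shown in the README; a library such as python-dotenv (`load_dotenv()`) is the usual choice. As a hypothetical stdlib-only sketch, a loader can read the file, tolerate spaces around `=`, and fail early when a key is missing. The function name `load_env_file` is an assumption.

```python
import os
from pathlib import Path

# Keys required by the project's .env file.
REQUIRED_KEYS = ("HF_TOKEN", "GROQ_API_KEY")


def load_env_file(path: str = ".env") -> dict:
    """Read KEY=value lines into os.environ without overriding variables that are already set."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate spaces around "=" and optional surrounding quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    for key, value in values.items():
        os.environ.setdefault(key, value)
    missing = [k for k in REQUIRED_KEYS if not os.environ.get(k)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return values
```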

Usage

1. Preprocess the Text

  • Run the compute_embeddings.py script on the directory that holds your text (here data/) to split it into chunks, embed them, and store the embeddings in the vector database:
    python src/compute_embeddings.py data/
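The repository's script is not reproduced here. As a hypothetical sketch, the preprocessing step could walk the directory and build the parallel `ids`, `documents` and `metadatas` lists that a Chroma collection's `add()` method accepts. It assumes plain `.txt` files and splits on blank lines for brevity, whereas the project uses RecursiveCharacterTextSplitter; `build_records` is an illustrative name.

```python
from pathlib import Path


def build_records(data_dir: str) -> tuple[list[str], list[str], list[dict]]:
    """Collect paragraph-sized chunks from every .txt file under data_dir."""
    root = Path(data_dir)
    ids, documents, metadatas = [], [], []
    for path in sorted(root.rglob("*.txt")):
        text = path.read_text(encoding="utf-8")
        # Paragraph split for brevity; a real pipeline would use a proper text splitter.
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        rel = path.relative_to(root).as_posix()  # keeps ids unique across subfolders
        for i, paragraph in enumerate(paragraphs):
            ids.append(f"{rel}-{i}")
            documents.append(paragraph)
            metadatas.append({"source": rel, "chunk": i})
    return ids, documents, metadatas
```

A command-line wrapper would call `build_records(sys.argv[1])`, embed the documents, and pass the three lists to `collection.add(ids=ids, documents=documents, metadatas=metadatas)`.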

2. Analyze a Character

  • Run the get_character_info.py script to extract character details:
    python src/get_character_info.py "<character_name>"
  • Example:
    python src/get_character_info.py "Eliza"
  • Output: [screenshot: output]
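The README does not describe how the script selects context for a given name. One plausible pre-filter, shown here as a hypothetical sketch, keeps only chunks that mention the character as a whole word, case-insensitively, so that "Eliza" does not match "Elizabeth". The helper name is an assumption.

```python
import re


def chunks_mentioning(name: str, chunks: list[str]) -> list[str]:
    """Return the chunks that mention `name` as a whole word, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
    return [chunk for chunk in chunks if pattern.search(chunk)]


chunks = ["Eliza opened the letter.", "Elizabeth stayed home.", "Nobody saw ELIZA leave."]
print(chunks_mentioning("Eliza", chunks))
```

Exact matching misses pronouns and nicknames, which is why it would complement, not replace, the embedding search.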

Future Enhancements

  • Integration with Neo4j: Enable graph-based relationship visualization to provide a clearer and more interactive representation of character connections.
  • Fine-Tuning the LLM: Improve role classification and relationship detection accuracy by fine-tuning the LLM on a domain-specific dataset.
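For the planned Neo4j integration, extracted relations could be written as parameterized Cypher `MERGE` statements, which create a node or relationship only if it does not already exist. The sketch only builds query strings and parameters and does not connect to a database; the `Character` label, `RELATED_TO` type, `relation` property and the helper name are hypothetical choices. With the official Neo4j Python driver, each pair would be passed to `session.run(query, params)`.

```python
# MERGE avoids duplicate nodes and edges when the same character is analyzed twice.
MERGE_RELATION = (
    "MERGE (a:Character {name: $source}) "
    "MERGE (b:Character {name: $target}) "
    "MERGE (a)-[:RELATED_TO {relation: $relation}]->(b)"
)


def relation_statements(character: str, relations: list[dict]) -> list[tuple[str, dict]]:
    """Turn [{"name": ..., "relation": ...}, ...] into (query, parameters) pairs."""
    return [
        (MERGE_RELATION, {"source": character, "target": rel["name"], "relation": rel["relation"]})
        for rel in relations
    ]
```

Passing names as parameters rather than formatting them into the query string avoids Cypher injection from unusual character names.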

About

This project uses Large Language Models (LLMs) and a vector database to extract structured information about characters from literary texts. Given a text, it identifies key characters and determines each one's summary, relationships, and role (e.g., Protagonist, Antagonist, or Side character).
