This project provides an end-to-end pipeline for analyzing financial news sentences with NLP techniques. It includes:
- Data cleaning and exploratory analysis
- Topic modeling with LDA
- Sentiment classification using FinBERT and open-source LLMs
- Retrieval-Augmented Generation (RAG) for improved LLM performance
- Fine-tuning FinBERT for domain-specific sentiment analysis
Preprocessing & EDA:
- Load and clean the Financial PhraseBank dataset.
- Analyze sentiment distribution and text statistics.
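The loading and statistics steps above can be sketched as follows. This is a minimal illustration, not the notebook's code: it assumes the dataset is stored as one `sentence@label` pair per line, the function names are hypothetical, the sample sentences are invented, and it sticks to the standard library even though the notebook may use pandas.

```python
from collections import Counter

VALID_LABELS = {"positive", "neutral", "negative"}

def parse_phrasebank(lines):
    """Parse 'sentence@label' lines into (sentence, label) pairs.

    Blank lines, malformed lines, unknown labels, and duplicate
    sentences are dropped.
    """
    seen, rows = set(), []
    for raw in lines:
        line = raw.strip()
        if not line or "@" not in line:
            continue
        # Split on the last '@' so a sentence containing '@' stays intact.
        sentence, label = line.rsplit("@", 1)
        sentence = " ".join(sentence.split())  # collapse repeated whitespace
        label = label.strip().lower()
        if sentence and label in VALID_LABELS and sentence not in seen:
            seen.add(sentence)
            rows.append((sentence, label))
    return rows

def summarize(rows):
    """Return the row count, label shares, and mean whitespace-token length."""
    if not rows:
        return {"n": 0, "label_share": {}, "mean_tokens": 0.0}
    labels = Counter(label for _, label in rows)
    lengths = [len(sentence.split()) for sentence, _ in rows]
    return {
        "n": len(rows),
        "label_share": {k: v / len(rows) for k, v in labels.items()},
        "mean_tokens": sum(lengths) / len(lengths),
    }

# Invented sample lines in the assumed format (not taken from the dataset).
sample = [
    "Operating profit rose compared with last year .@positive",
    "The company has no plans to move production .@neutral",
    "Sales fell in the third quarter .@negative",
    "Sales fell in the third quarter .@negative",  # duplicate, dropped
    "",
]
rows = parse_phrasebank(sample)
stats = summarize(rows)
print(stats)

# Reading a real file might look like this (path and encoding are assumptions):
# with open("Sentences_AllAgree.txt", encoding="latin-1") as f:
#     rows = parse_phrasebank(f)
```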
Topic Modeling:
- Apply LDA to discover and interpret latent topics in financial text.
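One possible way to fit LDA and read off its topics is shown below. It assumes scikit-learn's implementation (scikit-learn is among the listed dependencies, but the notebook may use a different library), and `fit_lda`, `top_words`, and all parameter values are illustrative choices rather than the project's settings.

```python
def top_words(topic_word_weights, vocabulary, n=5):
    """Return the n highest-weighted words for each topic row."""
    result = []
    for weights in topic_word_weights:
        ranked = sorted(range(len(vocabulary)), key=lambda i: weights[i], reverse=True)
        result.append([vocabulary[i] for i in ranked[:n]])
    return result

def fit_lda(sentences, n_topics=5, seed=0):
    """Fit LDA on bag-of-words counts; return one topic id per sentence and top words per topic."""
    # Imported lazily so top_words can be used without scikit-learn installed.
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    # min_df=2 drops words seen in only one sentence (an assumed setting).
    vectorizer = CountVectorizer(stop_words="english", min_df=2)
    counts = vectorizer.fit_transform(sentences)

    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    doc_topics = lda.fit_transform(counts)  # shape: (n_sentences, n_topics)

    vocabulary = list(vectorizer.get_feature_names_out())
    assignments = doc_topics.argmax(axis=1)  # most probable topic per sentence
    return assignments, top_words(lda.components_, vocabulary)
```

Interpreting a topic then amounts to reading its top words and a few sentences assigned to it; the number of topics usually needs some trial and error.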
Sentiment Analysis:
- Use FinBERT for sentiment prediction.
- Evaluate with accuracy, F1-score, and confusion matrix.
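The prediction and evaluation steps might look like the sketch below. The `ProsusAI/finbert` checkpoint is an assumption (the README does not say which FinBERT weights are used), and the metrics are written out by hand only to make their definitions explicit; `sklearn.metrics` provides `accuracy_score`, `f1_score`, and `confusion_matrix` for the same purpose.

```python
LABELS = ["negative", "neutral", "positive"]

def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Count matrix with true labels as rows and predicted labels as columns."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true, pred in zip(y_true, y_pred):
        matrix[index[true]][index[pred]] += 1
    return matrix

def accuracy_and_macro_f1(matrix):
    """Compute accuracy and the unweighted mean of per-class F1 scores."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(size))
    f1_scores = []
    for i in range(size):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(size)) - tp
        fn = sum(matrix[i]) - tp
        denom = 2 * tp + fp + fn
        # A class with no true and no predicted samples scores 0 here.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return correct / total, sum(f1_scores) / size

def predict_finbert(sentences, model_name="ProsusAI/finbert"):
    """Predict one sentiment label per sentence (model name is an assumption)."""
    # Imported lazily so the metric helpers work without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_name)
    outputs = classifier(list(sentences), truncation=True)
    return [out["label"].lower() for out in outputs]

# Toy labels to exercise the metrics; not real model output.
y_true = ["positive", "neutral", "negative", "neutral"]
y_pred = ["positive", "neutral", "neutral", "neutral"]
matrix = confusion_matrix(y_true, y_pred)
accuracy, macro_f1 = accuracy_and_macro_f1(matrix)
print(matrix, accuracy, macro_f1)
```

Macro-averaged F1 is used in this sketch because sentiment datasets are often imbalanced, and a weighted average or plain accuracy can hide weak performance on a minority class.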
Local LLM Sentiment Analysis:
- Run Mistral-7B or similar LLMs for zero-shot/few-shot sentiment classification.
- Compare with FinBERT results.
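Zero-shot and few-shot classification differ only in whether labeled examples are placed in the prompt. The sketch below illustrates that idea; the prompt wording, the function names, and the `generate` callable (any function that maps a prompt string to the model's text output) are hypothetical and not taken from the notebook.

```python
import re

def build_prompt(sentence, examples=()):
    """Build a zero-shot prompt (no examples) or a few-shot prompt (with examples)."""
    lines = [
        "Classify the sentiment of the financial sentence as positive, neutral, or negative.",
        "Answer with one word.",
    ]
    for text, label in examples:
        lines += ["", f"Sentence: {text}", f"Sentiment: {label}"]
    lines += ["", f"Sentence: {sentence}", "Sentiment:"]
    return "\n".join(lines)

def parse_label(completion, default="neutral"):
    """Return the first label word in the model output, or a default if none is found."""
    match = re.search(r"\b(positive|neutral|negative)\b", completion.lower())
    return match.group(1) if match else default

def classify(sentence, generate, examples=()):
    """Classify one sentence with any text-generation callable."""
    return parse_label(generate(build_prompt(sentence, examples)))

# A stand-in for a real LLM call, used only to show the control flow.
fake_llm = lambda prompt: " Negative. Profit declined."
print(classify("Profit fell sharply.", fake_llm))
```

Because the parser returns lowercase labels, the predictions can be scored with the same metrics as FinBERT's, which keeps the comparison like for like.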
RAG:
- Retrieve similar examples using FAISS and sentence-transformers.
- Augment LLM prompts for better sentiment prediction.
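The retrieve-then-augment flow could be sketched as below. The embedding model name, `k`, the score threshold, and every function name are assumptions for illustration. The sketch uses an exact inner-product FAISS index over normalized embeddings, which is equivalent to cosine similarity; the notebook may configure retrieval differently.

```python
def build_index(sentences, model_name="all-MiniLM-L6-v2"):
    """Embed labeled sentences and store them in an exact inner-product index."""
    # Imported lazily so the pure helpers below work without these packages.
    import faiss
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    vectors = model.encode(
        list(sentences), normalize_embeddings=True, convert_to_numpy=True
    ).astype("float32")
    index = faiss.IndexFlatIP(vectors.shape[1])  # inner product == cosine on unit vectors
    index.add(vectors)
    return model, index

def retrieve(model, index, query, k=3):
    """Return similarity scores and corpus positions of the k nearest sentences."""
    q = model.encode(
        [query], normalize_embeddings=True, convert_to_numpy=True
    ).astype("float32")
    scores, ids = index.search(q, k)
    return scores[0].tolist(), ids[0].tolist()

def select_examples(scores, ids, labeled_corpus, min_score=0.0):
    """Map hits to (sentence, label) pairs; FAISS pads missing hits with id -1."""
    return [
        labeled_corpus[i]
        for score, i in zip(scores, ids)
        if i >= 0 and score >= min_score
    ]

def augment_prompt(query, examples):
    """Place the retrieved labeled sentences ahead of the query sentence."""
    parts = [
        "Classify the sentiment as positive, neutral, or negative.",
        "Similar labeled sentences:",
    ]
    parts += [f"Sentence: {text}\nSentiment: {label}" for text, label in examples]
    parts += [f"Sentence: {query}\nSentiment:"]
    return "\n\n".join(parts)
```

When the query sentence is itself part of the indexed corpus, it should be excluded from the retrieved examples; otherwise its own label leaks into the prompt and inflates the measured accuracy.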
Fine-Tuning:
- Fine-tune FinBERT on the dataset.
- Evaluate the fine-tuned model and visualize the results.
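A fine-tuning run along these lines might use the Hugging Face `Trainer`. Everything specific here is an assumption rather than the notebook's configuration: the `ProsusAI/finbert` checkpoint, the hyperparameters, the output directory, and the helper names. The split helper is plain Python so it can be checked on its own.

```python
import random

def stratified_split(rows, val_fraction=0.2, seed=0):
    """Split (sentence, label) rows so each label keeps a similar share in both parts."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[1], []).append(row)
    train, val = [], []
    for group in by_label.values():
        rng.shuffle(group)
        # Keep at least one validation row per label, unless the label has a single row.
        cut = max(1, round(len(group) * val_fraction)) if len(group) > 1 else 0
        val += group[:cut]
        train += group[cut:]
    return train, val

def fine_tune(train_rows, val_rows, model_name="ProsusAI/finbert",
              output_dir="finbert-finetuned"):
    """Fine-tune a sequence classifier and return the validation metrics."""
    # Imported lazily so stratified_split works without torch or transformers.
    import torch
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    # Read the label mapping from the checkpoint instead of hard-coding an order.
    label2id = {name.lower(): idx for name, idx in model.config.label2id.items()}

    class SentenceDataset(torch.utils.data.Dataset):
        def __init__(self, rows):
            texts = [sentence for sentence, _ in rows]
            self.encodings = tokenizer(texts, truncation=True, padding=True, max_length=128)
            self.labels = [label2id[label] for _, label in rows]

        def __len__(self):
            return len(self.labels)

        def __getitem__(self, i):
            item = {key: torch.tensor(values[i]) for key, values in self.encodings.items()}
            item["labels"] = torch.tensor(self.labels[i])
            return item

    args = TrainingArguments(output_dir=output_dir, num_train_epochs=3,
                             per_device_train_batch_size=16, learning_rate=2e-5)
    trainer = Trainer(model=model, args=args,
                      train_dataset=SentenceDataset(train_rows),
                      eval_dataset=SentenceDataset(val_rows))
    trainer.train()
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)
    return trainer.evaluate()
```

Holding out a stratified validation set matters here because the fine-tuned model is only a meaningful improvement if it is scored on sentences it was not trained on.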
Outputs:
- Preprocessed data and topic assignments
- Sentiment predictions from FinBERT, LLM, and RAG
- Evaluation metrics and visualizations
- Fine-tuned FinBERT model
Requirements:
- Python (with Jupyter Notebook)
- pandas, numpy, matplotlib, seaborn, nltk, scikit-learn
- torch, transformers, sentence-transformers, faiss
Usage:
Open DLP_Assignment_3.ipynb in Jupyter and run the cells in order. The comments in each cell explain what that step does.