This repository contains two Persian text analysis projects: Sentence Tokenization and Sentiment Analysis.
This project tokenizes sentences in Persian texts. It splits the input text into individual sentences and labels them based on the given ratings.
- Sentence Tokenization: Splits the text into separate sentences.
- Sentiment Labeling: Labels sentences as positive, neutral, or negative based on their rating.
- Install Required Libraries
- Python 3.6 or 3.7 is suggested for the sentence tokenization project and for Parsinorm.
- It's suggested to run the tokenizing project in a conda environment.
- Please visit the link and add any necessary files to the Anaconda directories.
- It's recommended to use Python 3.11 for running sentiment analysis.
Execute the script using Python. The script will prompt you for the following inputs:
- File Path: Path to the input CSV file.
- Text Column Index: Index of the column containing text data (e.g., 0 for the first column).
- Rating Column Index: Index of the column containing rating data (e.g., 1 for the second column).
- Output File Name: Name of the output CSV file where the tokenized data will be saved (e.g., tokenized_output.csv).
The script will read the CSV file, tokenize sentences, and save the results to the specified output file.
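The read, tokenize, and save flow described above can be sketched as follows. This is only an illustration and not the repository's script: it splits on sentence-ending punctuation with a regular expression instead of using Parsinorm, the function names are hypothetical, and the rating thresholds (2 and below is negative, 4 and above is positive) are assumptions, since the README does not state how ratings map to labels.

```python
import csv
import re

# Assumption: a sentence ends at '.', '!', '?', or the Persian question mark,
# followed by whitespace. The real project may split differently.
_SENTENCE_END = re.compile(r"(?<=[.!?؟])\s+")


def split_sentences(text):
    """Split text into sentences, keeping the closing punctuation."""
    parts = _SENTENCE_END.split(text.strip())
    return [p.strip() for p in parts if p.strip()]


def rating_to_label(rating, low=2, high=4):
    """Map a numeric rating to a label. The thresholds are hypothetical."""
    value = float(rating)
    if value <= low:
        return "negative"
    if value >= high:
        return "positive"
    return "neutral"


def tokenize_csv(in_path, text_col, rating_col, out_path):
    """Write one (sentence, label) row per sentence found in the input CSV."""
    with open(in_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(["sentence", "label"])
        for row in csv.reader(src):
            try:
                text = row[text_col]
                label = rating_to_label(row[rating_col])
            except (ValueError, IndexError):
                # Skip header rows, short rows, and non-numeric ratings.
                continue
            for sentence in split_sentences(text):
                writer.writerow([sentence, label])
```

Every sentence inherits the label of the row it came from, which matches the description of labeling sentences by the given rating. A call such as `tokenize_csv("reviews.csv", 0, 1, "tokenized_output.csv")` mirrors the four prompts listed above; `reviews.csv` is a placeholder file name.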
This project performs sentiment analysis on Persian sentences using the pre-trained "roberta-sentiment-persian" model. It classifies sentences into one of three categories: negative, positive, or neutral.
- Sentiment Analysis: Classifies sentences into negative, positive, or neutral.
- Pre-trained Model: Uses the "roberta-sentiment-persian" model for sentiment classification.
- Install Required Libraries
Install the necessary libraries using the following command:
pip install transformers torch pandas
Execute the script using Python. The script will prompt you for the following inputs:
- File Path: Path to the input CSV file.
- Text Column Index: Index of the column containing text data (e.g., 0 for the first column).
- Output File Name: Name of the output CSV file where the sentiment labels will be saved (e.g., sentiment_output.csv).
The script will load the model, analyze the sentiment of each sentence, and save the results to the specified output file.
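The load, classify, and save steps can be sketched roughly as below. This is a sketch under assumptions, not the repository's script: the function names are hypothetical, the input file is assumed to have no header row, and `model_id` must be the full Hugging Face Hub ID or local path of the "roberta-sentiment-persian" model, which this README does not spell out. The sketch uses the standard `csv` module rather than pandas to stay dependency-light.

```python
import csv


def load_classifier(model_id):
    """Build a Hugging Face text-classification pipeline for model_id.

    model_id is an assumption supplied by the caller: the full hub ID or a
    local path for the "roberta-sentiment-persian" model.
    """
    from transformers import pipeline  # imported lazily because it is heavy
    return pipeline("text-classification", model=model_id)


def label_sentences(sentences, classifier):
    """Return one predicted label per sentence.

    classifier is any callable that takes a list of strings and returns a
    list of dicts with a "label" key, as transformers pipelines do.
    """
    sentences = list(sentences)
    if not sentences:
        return []
    return [result["label"] for result in classifier(sentences)]


def analyze_csv(in_path, text_col, out_path, classifier):
    """Classify the text column of a CSV and save sentence/label pairs."""
    with open(in_path, newline="", encoding="utf-8") as src:
        sentences = [
            row[text_col]
            for row in csv.reader(src)
            if len(row) > text_col and row[text_col].strip()
        ]
    labels = label_sentences(sentences, classifier)
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(["sentence", "sentiment"])
        writer.writerows(zip(sentences, labels))
```

Passing the classifier in as an argument keeps the model load separate from the file handling, so the CSV logic can be checked with a stand-in callable before downloading the model. The exact label strings written to the output depend on the model's own configuration.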