Right-Wing Discourse Classifier

Overview

The RWD (Right-Wing Discourse) Classifier is a natural language processing system that classifies text against a structured dictionary of right-wing discourse themes and subthemes. It fine-tunes a BERT model on this dictionary, which categorizes terms by their ideological weight and association with right-wing perspectives.

Key Features

Thematic Classification: Identifies right-wing discourse themes and subthemes in text
Weighted Dictionary: Uses a carefully curated dictionary with weighted terms (+2 to -2 scale)
BERT-based Model: Fine-tunes a pretrained BERT transformer (bert-base-uncased by default) for classification
Batch Processing: Can analyze individual texts or process entire Excel files
Explainable AI: Provides supporting evidence (key terms and descriptions) for classifications

Installation

Prerequisites

Python 3.7 or higher
pip package manager

Steps

Clone the repository:

git clone https://github.com/codestreamhubio/rw-discourse-classifier.git
cd rw-discourse-classifier

Install required packages:
```bash
pip install -r requirements.txt
```

Usage

Training the Model

To train a new classification model using your dictionary:

```bash
python train_model.py
```

This runs with the default settings and reads the dictionary from RWDictionary.xlsx.

For custom training:

```bash
python train_model.py --input_file RWDictionary.xlsx --model_name bert-base-uncased --epochs 15 --batch_size 16 --output_dir rwd_classifier
```

Arguments:

--input_file: Path to Excel dictionary file (default: RWDictionary.xlsx)
--model_name: Pretrained BERT model name (default: bert-base-uncased)
--epochs: Number of training epochs (default: 15)
--batch_size: Training batch size (default: 16)
--max_length: Maximum token sequence length (default: 128)
--output_dir: Directory to save trained model (default: rwd_classifier)
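
The flags and defaults listed above can be expressed as an argparse parser. The following is a minimal sketch, not the actual contents of train_model.py: the function name `build_train_parser` and the help strings are assumptions, and only the flag names and default values come from the list above.

```python
import argparse


def build_train_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented training flags and defaults."""
    parser = argparse.ArgumentParser(description="Train the RWD classifier")
    parser.add_argument("--input_file", default="RWDictionary.xlsx",
                        help="Path to the Excel dictionary file")
    parser.add_argument("--model_name", default="bert-base-uncased",
                        help="Pretrained BERT model name")
    parser.add_argument("--epochs", type=int, default=15,
                        help="Number of training epochs")
    parser.add_argument("--batch_size", type=int, default=16,
                        help="Training batch size")
    parser.add_argument("--max_length", type=int, default=128,
                        help="Maximum token sequence length")
    parser.add_argument("--output_dir", default="rwd_classifier",
                        help="Directory to save the trained model")
    return parser


# Override two flags and leave the rest at their documented defaults.
args = build_train_parser().parse_args(["--epochs", "5", "--max_length", "64"])
print(vars(args))
```

Any flag that is omitted falls back to its default, so `python train_model.py` with no arguments is equivalent to passing every default explicitly.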

Analyzing Text

To analyze an Excel file containing text data:

```bash
python analyze_text.py
```

This reads input_data.xlsx and, because no output file is given, writes the results back to the same file.

For custom analysis:

```bash
python analyze_text.py --input input_data.xlsx --output output_data.xlsx --text_column Text --model_path rwd_classifier
```

Arguments:

--input: Input Excel file path (default: input_data.xlsx)
--output: Output Excel file path (default: overwrites input file)
--text_column: Column name containing text to classify (default: 'Text')
--model_path: Path to model directory (default: rwd_classifier)
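
Because `--output` defaults to overwriting the input file, it can help to see how that fallback might be resolved before anything is written. The sketch below is illustrative only: `resolve_output_path` and `check_text_column` are hypothetical helpers, not functions taken from analyze_text.py.

```python
from pathlib import Path
from typing import Iterable, Optional


def resolve_output_path(input_path: str, output_path: Optional[str] = None) -> Path:
    """Return the file to write: the explicit output, or the input file itself."""
    return Path(output_path) if output_path else Path(input_path)


def check_text_column(columns: Iterable[str], text_column: str = "Text") -> None:
    """Raise a readable error if the requested text column is absent."""
    available = list(columns)
    if text_column not in available:
        raise ValueError(
            f"Column {text_column!r} not found; available columns: {available}"
        )


# With no --output given, results go back into the input file.
print(resolve_output_path("input_data.xlsx"))
# With --output given, the input file is left untouched.
print(resolve_output_path("input_data.xlsx", "output_data.xlsx"))
check_text_column(["ID", "Text"], text_column="Text")
```

Passing `--output` explicitly is the safer choice when the original spreadsheet needs to be preserved.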

Dictionary Structure

The system requires an Excel dictionary file with two sheets:

1. Weighted Sheet

Contains terms organized by:

Theme (e.g., "Nationalism", "Traditional Values")
Sub-theme (e.g., "Border Security", "Family Structure")
Weight categories:
- +2 (Strongly Supports RW View)
- +1 (Moderately Supports RW View)
- 0 (Neutral/Ambiguous)
- -1 (Moderately Opposes RW View)
- -2 (Strongly Opposes RW View)
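
One way to picture a single row of the Weighted sheet is as a small record with a validated weight. This is a sketch under assumptions: the `DictionaryEntry` class and its field names are hypothetical, the term shown is a placeholder, and the actual column layout of the sheet may differ. Only the five weight categories come from the list above.

```python
from dataclasses import dataclass

# The five weight categories from the Weighted sheet.
WEIGHT_LABELS = {
    2: "Strongly Supports RW View",
    1: "Moderately Supports RW View",
    0: "Neutral/Ambiguous",
    -1: "Moderately Opposes RW View",
    -2: "Strongly Opposes RW View",
}


@dataclass(frozen=True)
class DictionaryEntry:
    """Hypothetical in-memory form of one weighted dictionary term."""
    theme: str
    subtheme: str
    term: str
    weight: int

    def __post_init__(self) -> None:
        # Reject anything outside the documented +2 to -2 scale.
        if self.weight not in WEIGHT_LABELS:
            raise ValueError(
                f"weight must be one of {sorted(WEIGHT_LABELS)}, got {self.weight}"
            )

    @property
    def weight_label(self) -> str:
        return WEIGHT_LABELS[self.weight]


# "example term" is a placeholder, not an entry from the real dictionary.
entry = DictionaryEntry("Nationalism", "Border Security", "example term", 2)
print(entry.weight_label)
```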

2. Typology Sheet

Contains detailed descriptions for each sub-theme.

Output Interpretation

The classifier provides:

Predicted Theme: Broad ideological category
Predicted Subtheme: Specific discourse element
Subtheme Description: Explanation of the subtheme
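
The three output fields can be thought of as one record per input text. The sketch below assumes that the model predicts a subtheme and that the theme and description are then looked up from the dictionary. That flow, the lookup tables, the theme/subtheme pairings, and the `build_output_row` helper are all assumptions made for illustration; the descriptions are placeholders.

```python
from typing import Dict

# Hypothetical lookup tables; in practice these would be read from the dictionary file.
SUBTHEME_TO_THEME = {
    "Border Security": "Nationalism",
    "Family Structure": "Traditional Values",
}
SUBTHEME_DESCRIPTIONS = {
    "Border Security": "Placeholder description for Border Security",
    "Family Structure": "Placeholder description for Family Structure",
}


def build_output_row(text: str, predicted_subtheme: str) -> Dict[str, str]:
    """Assemble the three documented output fields for one classified text."""
    return {
        "Text": text,
        "Predicted Theme": SUBTHEME_TO_THEME.get(predicted_subtheme, "Unknown"),
        "Predicted Subtheme": predicted_subtheme,
        "Subtheme Description": SUBTHEME_DESCRIPTIONS.get(predicted_subtheme, ""),
    }


row = build_output_row("Some input text", "Border Security")
print(row)
```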

Technical Details

Model Architecture

Base Model: BERT (bert-base-uncased)
Classification Head: Single linear layer
Training: Fine-tuned with AdamW optimizer
Learning Rate: 2e-5 with 500 warmup steps
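
To make the warmup setting concrete, the sketch below computes the learning rate at a given step. It assumes linear warmup followed by linear decay to zero, which is a common default for BERT fine-tuning but is not confirmed by this README. The function name `learning_rate_at` and the total of 2000 steps are hypothetical.

```python
def learning_rate_at(step: int, total_steps: int,
                     peak_lr: float = 2e-5, warmup_steps: int = 500) -> float:
    """Linear warmup to peak_lr, then linear decay to zero (assumed schedule)."""
    if step < warmup_steps:
        # Ramp up from 0 to peak_lr over the warmup steps.
        return peak_lr * step / max(1, warmup_steps)
    # Decay linearly from peak_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run of 2000 optimizer steps.
for step in (0, 250, 500, 1250, 2000):
    print(step, learning_rate_at(step, total_steps=2000))
```

The rate starts at zero, reaches 2e-5 at step 500, and falls back to zero by the final step.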

Data Processing

Tokenization: BERT WordPiece tokenizer
Sequence Length: 128 tokens (truncated/padded)
Label Encoding: sklearn LabelEncoder
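
The truncation/padding and label-encoding steps can be illustrated without loading the tokenizer. This is a simplified pure-Python sketch, not the project's code: `pad_or_truncate` and `encode_labels` are hypothetical helpers, the token IDs passed in are arbitrary, and `encode_labels` imitates the sorted-class ordering of sklearn's LabelEncoder rather than calling it.

```python
from typing import Dict, List, Tuple

# Special-token IDs in the bert-base-uncased vocabulary.
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102


def pad_or_truncate(token_ids: List[int],
                    max_length: int = 128) -> Tuple[List[int], List[int]]:
    """Return (input_ids, attention_mask), both exactly max_length long.

    Assumes max_length >= 2 so that [CLS] and [SEP] always fit.
    """
    body = token_ids[: max_length - 2]  # leave room for [CLS] and [SEP]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    padding = max_length - len(input_ids)
    return input_ids + [PAD_ID] * padding, attention_mask + [0] * padding


def encode_labels(labels: List[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each label to an integer by sorted order of the unique labels."""
    mapping = {label: index for index, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


ids, mask = pad_or_truncate([2000, 2001, 2002], max_length=8)
print(ids, mask)
encoded, mapping = encode_labels(["Nationalism", "Traditional Values", "Nationalism"])
print(encoded, mapping)
```

The attention mask marks real tokens with 1 and padding with 0, so padded positions are ignored by the model.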
