GitHub - Social-AI-Studio/HateXScore

A public repository containing datasets and code for the paper HateXScore: A Metric Suite for Evaluating Reasoning Quality in Hate Speech Explanations (EACL 2026).

HateXScore is a metric suite for evaluating the quality of model-generated explanations in hate speech detection. It focuses on whether an explanation clearly states a conclusion, quotes relevant evidence faithfully, identifies the targeted protected group, and remains logically consistent with the model prediction.

This repository provides a modular implementation of HateXScore with the following components:

HTC: Hate-Type Check
QF: Quotation Faithfulness
TGI: Target-Group Identification
CC: Consistency Check

The final HateXScore is computed from these four sub-metrics. By default, all metrics have equal weight, and the repository also supports configurable metric weights.

Repository Structure

hatexscore/
├── __init__.py
├── htc.py
├── qf.py
├── tgi.py
├── cc.py
└── utils.py

What Each Module Does

hatexscore/htc.py: conclusion detection logic and label extraction utilities
hatexscore/qf.py: quotation overlap extraction, masking, probability estimation, and quotation faithfulness scoring
hatexscore/tgi.py: protected-group matching with language-aware tokenization and lemmatization
hatexscore/cc.py: consistency rule between prediction, QF, and TGI
hatexscore/utils.py: evaluator class, CLI entrypoint, dataset loading, protected-group lists, and final score aggregation

Features

Supports English, Chinese, and Korean
Supports multiple protected-group inventories
Works on JSONL reasoning outputs paired with CSV datasets
Keeps the original evaluation logic while exposing configurable final metric weights
Designed to align with the HateXScore paper workflow

Installation

Create and activate a Python environment first, then install the dependencies used in the current implementation.

pip install numpy pandas spacy jieba fuzzysearch openai konlpy

You may also need:

a spaCy English model such as en_core_web_sm
Java installed for konlpy in Korean settings

Example:

python -m spacy download en_core_web_sm

Input Format

The current implementation expects:

A JSONL file containing generated reasoning outputs
A CSV file containing the original input text and gold labels

Each JSONL line should contain fields used by the script such as:

{"ID": .., "text": "...", "raw": "...model explanation...", "label": "hateful", "flag": true}

The CSV schema depends on the dataset selected with --dataset. The script already contains dataset-specific column mappings for:

implicit
hatexplain
hatecheck
toxicn
hasoc
kold

Usage

Because the code uses package-relative imports, run it from the parent directory of hatexscore with module mode:

python -m hatexscore.utils \
  --dataset hatexplain \
  --data_path /path/to/reason_output.json \
  --input_csv /path/to/input.csv \
  --output_dir /path/to/output_dir \
  --model gpt \
  --protected_group un_en \
  --lang en

Configurable Metric Weights

The original code used a simple average across the four metrics. This repository keeps the same default behavior by setting all weights to 1.0, but also lets you adjust the final aggregation.

Available arguments:

--weight_htc
--weight_qf
--weight_tgi
--weight_cc

Example:

python -m hatexscore.utils \
  --dataset hatexplain \
  --data_path /path/to/reason_output.json \
  --input_csv /path/to/input.csv \
  --output_dir /path/to/output_dir \
  --model gpt \
  --protected_group un_en \
  --lang en \
  --weight_htc 1.0 \
  --weight_qf 2.0 \
  --weight_tgi 1.0 \
  --weight_cc 1.0

The final score is computed as a weighted average:

HateXScore =
  (HTC * w_htc + QF * w_qf + TGI * w_tgi + CC * w_cc)
  / (w_htc + w_qf + w_tgi + w_cc)

Protected Group Inventories

The current implementation includes several built-in inventories:

facebook
youtube
un_en
un_zh
un_kr

These are selected through --protected_group.

Output

For each sample, the script writes a JSON object containing:

input text
reasoning
gold label
predicted label
per-metric scores
final HateXScore

The output file is written to:

{output_dir}/{dataset}_metric_{model}.json

Programmatic Use

You can also import the evaluator directly:

from hatexscore import ReasoningMetricsEvaluator

weights = {
    "HTC": 1.0,
    "Quotation Faithfulness": 1.0,
    "TGI": 1.0,
    "Consistency Check": 1.0,
}

evaluator = ReasoningMetricsEvaluator(
    language="en",
    metric_weights=weights,
    runtime_args=args,
)

Then evaluate a sample:

sample = {
    "text": "example text",
    "reasoning": "example explanation",
    "gold_label": "hateful",
    "prediction": "hateful",
}

result = evaluator.evaluate_sample(sample, target_group_list)

Notes

The current implementation preserves the original logic of the provided codebase as closely as possible.
QF uses the OpenRouter-compatible OpenAI client configured in the source code.
If you run python utils.py directly inside the hatexscore/ directory, relative imports may fail. Use python -m hatexscore.utils instead.
Some dependencies in the original larger script are not needed in this simplified modular repo.

Citation

If you use this repository in academic work, please cite the HateXScore paper.

@article{hu2026hatexscore,
  title={HateXScore: A Metric Suite for Evaluating Reasoning Quality in Hate Speech Explanations},
  author={Hu, Yujia and Lee, Roy Ka-Wei},
  journal={arXiv preprint arXiv:2601.13547},
  year={2026}
}

Disclaimer

This repository is intended for research use on hate speech detection and explanation evaluation. It may process sensitive or offensive text as part of the evaluation pipeline.

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
dataset		dataset
hatexscore		hatexscore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Repository Structure

What Each Module Does

Features

Installation

Input Format

Usage

Configurable Metric Weights

Protected Group Inventories

Output

Programmatic Use

Notes

Citation

Disclaimer

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Repository Structure

What Each Module Does

Features

Installation

Input Format

Usage

Configurable Metric Weights

Protected Group Inventories

Output

Programmatic Use

Notes

Citation

Disclaimer

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages