Train on English. Works on Hindi. No translation. No multilingual training. This project demonstrates that multilingual intelligence is not about data quantity: it is about abstraction.
Languages look different, but meaning is shared.
Instead of teaching the model what English words mean, we force it to learn what sentences mean, independent of language.
If a Hindi sentence expresses the same idea as an English one,
their internal representations should be close.
The model is trained to focus on semantics, not surface words. It learns:
- positivity vs negativity
- intent and sentiment
- meaning patterns rather than memorizing English vocabulary.
During training, each English sentence is paired with a semantic variant of itself
(shuffled, reordered, lightly perturbed).
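The exact augmentation recipe is not spelled out here, but "shuffled, reordered, lightly perturbed" can be sketched roughly like this (a hypothetical illustration, not the project's actual code):

```python
import random

def semantic_variant(sentence: str, seed: int = 0) -> str:
    """Build a surface-level variant of a sentence: same meaning cues,
    different word order, with one word dropped as a light perturbation."""
    rng = random.Random(seed)
    words = sentence.split()
    rng.shuffle(words)                        # reorder the surface form
    if len(words) > 4:
        words.pop(rng.randrange(len(words)))  # lightly perturb: drop one word
    return " ".join(words)

# The variant keeps the sentence's words but scrambles the surface form.
print(semantic_variant("the movie was surprisingly good and fun"))
```

The point of the pair is that the two surface forms differ while the underlying sentiment does not.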
The model is trained so that:
- same meaning → embeddings closer
- different meaning → embeddings farther

This forces abstraction. As a result, Hindi sentences with similar meaning naturally fall into the same regions of the embedding space.
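One minimal way to implement the "closer / farther" objective is a margin-based contrastive loss over sentence embeddings (a sketch; the project's actual loss function and margin value are assumptions):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def contrastive_loss(anchor, positive, negative, margin=0.5):
    """Pull same-meaning pairs together (cosine toward 1) and push
    different-meaning pairs apart (cosine below the margin)."""
    pull = 1.0 - cosine(anchor, positive)               # same meaning → closer
    push = max(0.0, cosine(anchor, negative) - margin)  # different meaning → farther
    return pull + push

a = np.array([1.0, 0.0])   # anchor sentence embedding
p = np.array([0.9, 0.1])   # its semantic variant: nearly parallel
n = np.array([0.0, 1.0])   # unrelated sentence: orthogonal
print(contrastive_loss(a, p, n))  # small loss: the pair is already well separated
```

Minimizing this over English pairs shapes the embedding space by meaning, which is exactly the property Hindi inputs later exploit.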
We classify entire sentences, not individual words. Why?
- Meaning transfers across languages better than syntax
- Sentence semantics are more universal than grammar rules

This makes cross-lingual generalization practical.
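Sentence-level classification can be sketched as mean-pooling token vectors into one sentence vector and applying a linear head (an assumed architecture for illustration; the released model's internals may differ):

```python
import numpy as np

def sentence_embedding(token_vectors: np.ndarray) -> np.ndarray:
    """Mean-pool per-token vectors into a single sentence vector.
    The classifier only ever sees this sentence-level representation,
    never individual words."""
    return token_vectors.mean(axis=0)

def classify(sentence_vec: np.ndarray, W: np.ndarray, b: np.ndarray) -> int:
    """Linear head over the sentence vector: returns a label index."""
    return int(np.argmax(W @ sentence_vec + b))

rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 8))              # 6 tokens, 8-dim vectors
W, b = rng.normal(size=(2, 8)), np.zeros(2)   # 2 labels: negative / positive
print(classify(sentence_embedding(tokens), W, b))
```

Because the decision is made on the pooled sentence vector, word-order and vocabulary differences between languages are absorbed before classification happens.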
The model can read Hindi characters, but:
- it is never trained on Hindi data
- it never sees Hindi labels
Hindi capability is emergent, not supervised.
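Why can the model read Hindi characters at all? One common mechanism (an illustration of the idea, not necessarily this project's tokenizer) is byte-level input: every Unicode string, Devanagari included, collapses into the same small vocabulary seen during English training:

```python
def to_byte_ids(text: str) -> list:
    """Byte-level 'tokenization': every script maps into the same
    256-symbol vocabulary, so Hindi needs no vocab entries of its own."""
    return list(text.encode("utf-8"))

print(to_byte_ids("film"))   # ASCII: one byte per character → [102, 105, 108, 109]
print(to_byte_ids("फिल्म"))  # Devanagari: multi-byte, but same 0–255 id range
```

Readable input is only half the story; the shared semantic space built during training is what turns readable Hindi into correctly classified Hindi.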
- Uses only English data
- Learns sentiment classification
- Learns language-independent sentence representations
- Accepts:
- English input
- Hindi input
- Code-mixed input (partial)
- Outputs a classification label

No translation. No retraining.
A language-agnostic text classifier trained only on English that generalizes to Hindi by learning meaning instead of language.
To use the model, visit: 🤗 Hugging Face
- Shiv Prakash Verma