Train on English. Works on Hindi. No translation. No multilingual training. This project demonstrates that multilingual intelligence is not about data quantity: it is about abstraction.
Languages look different, but meaning is shared.
Instead of teaching the model what English words mean, we force it to learn what sentences mean, independent of language.
If a Hindi sentence expresses the same idea as an English one,
their internal representations should be close.
The model is trained to focus on semantics, not surface words. It learns:
- positivity vs negativity
- intent and sentiment
- meaning patterns rather than memorizing English vocabulary.
During training, each English sentence is paired with a semantic variant of itself
(shuffled, reordered, lightly perturbed).
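The exact augmentation recipe is not spelled out here, but "shuffled, reordered, lightly perturbed" can be sketched roughly like this (a hypothetical illustration, not the project's actual code):

```python
import random

def semantic_variant(sentence: str, seed: int = 0) -> str:
    """Build a surface-level variant of a sentence: same meaning cues,
    different word order, with one word dropped as a light perturbation."""
    rng = random.Random(seed)
    words = sentence.split()
    rng.shuffle(words)                        # reorder the surface form
    if len(words) > 4:
        words.pop(rng.randrange(len(words)))  # lightly perturb: drop one word
    return " ".join(words)

# The variant keeps the sentence's words but scrambles the surface form.
print(semantic_variant("the movie was surprisingly good and fun"))
```

The point of the pair is that the two surface forms differ while the underlying sentiment does not.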
The model is trained so that:
- same meaning → embeddings closer
- different meaning → embeddings farther

This forces abstraction. As a result, Hindi sentences with similar meaning naturally fall into the same regions of the embedding space.
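One minimal way to implement the "closer / farther" objective is a margin-based contrastive loss over sentence embeddings (a sketch; the project's actual loss function and margin value are assumptions):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def contrastive_loss(anchor, positive, negative, margin=0.5):
    """Pull same-meaning pairs together (cosine toward 1) and push
    different-meaning pairs apart (cosine below the margin)."""
    pull = 1.0 - cosine(anchor, positive)               # same meaning → closer
    push = max(0.0, cosine(anchor, negative) - margin)  # different meaning → farther
    return pull + push

a = np.array([1.0, 0.0])   # anchor sentence embedding
p = np.array([0.9, 0.1])   # its semantic variant: nearly parallel
n = np.array([0.0, 1.0])   # unrelated sentence: orthogonal
print(contrastive_loss(a, p, n))  # small loss: the pair is already well separated
```

Minimizing this over English pairs shapes the embedding space by meaning, which is exactly the property Hindi inputs later exploit.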
We classify entire sentences, not individual words. Why?
- Meaning transfers across languages better than syntax
- Sentence semantics are more universal than grammar rules

This makes cross-lingual generalization practical.
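Sentence-level classification can be sketched as mean-pooling token vectors into one sentence vector and applying a linear head (an assumed architecture for illustration; the released model's internals may differ):

```python
import numpy as np

def sentence_embedding(token_vectors: np.ndarray) -> np.ndarray:
    """Mean-pool per-token vectors into a single sentence vector.
    The classifier only ever sees this sentence-level representation,
    never individual words."""
    return token_vectors.mean(axis=0)

def classify(sentence_vec: np.ndarray, W: np.ndarray, b: np.ndarray) -> int:
    """Linear head over the sentence vector: returns a label index."""
    return int(np.argmax(W @ sentence_vec + b))

rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 8))              # 6 tokens, 8-dim vectors
W, b = rng.normal(size=(2, 8)), np.zeros(2)   # 2 labels: negative / positive
print(classify(sentence_embedding(tokens), W, b))
```

Because the decision is made on the pooled sentence vector, word-order and vocabulary differences between languages are absorbed before classification happens.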
The model can read Hindi characters, but:
- it is never trained on Hindi data
- it never sees Hindi labels
Hindi capability is emergent, not supervised.
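Why can the model read Hindi characters at all? One common mechanism (an illustration of the idea, not necessarily this project's tokenizer) is byte-level input: every Unicode string, Devanagari included, collapses into the same small vocabulary seen during English training:

```python
def to_byte_ids(text: str) -> list:
    """Byte-level 'tokenization': every script maps into the same
    256-symbol vocabulary, so Hindi needs no vocab entries of its own."""
    return list(text.encode("utf-8"))

print(to_byte_ids("film"))   # ASCII: one byte per character → [102, 105, 108, 109]
print(to_byte_ids("फिल्म"))  # Devanagari: multi-byte, but same 0–255 id range
```

Readable input is only half the story; the shared semantic space built during training is what turns readable Hindi into correctly classified Hindi.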
- Uses only English data
- Learns sentiment classification
- Learns language-independent sentence representations
- Accepts:
- English input
- Hindi input
- Code-mixed input (partial)
- Outputs a classification label

No translation. No retraining.
A language-agnostic text classifier trained only on English that generalizes to Hindi by learning meaning instead of language.
To use the model, visit: 🤗 Hugging Face
- Shiv Prakash Verma