This project implements a deep learning approach to next-syllable prediction for the Myanmar language. Because Myanmar script does not mark word boundaries, the project uses a linguistically aware syllable-breaking strategy.
Demo - https://sanlinnaing.github.io/my-next-syllable-predict/
Myanmar text lacks explicit word boundaries, requiring tokenization at the syllable level:
- Syllable Extraction: The `syllable_base_pattern` groups consonants and diacritics into single phonological units.
- Real-time Prediction Logic: The `syllable_break_pattern` isolates the onset from the rhyme, so suggestions can be offered even while a syllable is only partly typed.
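The syllable-level tokenization can be illustrated with a small Python sketch. The project's actual `syllable_base_pattern` and `syllable_break_pattern` are not reproduced here. The regex below is an assumption, adapted from the common "sylbreak" rule: insert a break before every consonant that is neither stacked (preceded by the virama U+1039) nor closed by an asat (U+103A). The helpers `segment`, `split_onset` and `candidates_for_partial` are hypothetical names used only for illustration.

```python
import re

# Hypothetical sylbreak-style rules (NOT the project's own patterns).
CONSONANTS = "\u1000-\u1021"   # Myanmar consonants (ka .. a)
MEDIALS = "\u103b-\u103e"      # medial ya, ra, wa, ha
ASAT = "\u103a"                # asat: closes a syllable-final consonant
VIRAMA = "\u1039"              # virama: stacks the following consonant
# Independent vowels, digits, punctuation and Latin characters also start a token.
OTHER_STARTERS = "a-zA-Z0-9\u1023-\u1027\u1029\u102a\u103f\u1040-\u104f"

# A syllable starts at a consonant that is not stacked and not closed by asat/virama.
BREAK_BEFORE = re.compile(
    f"(?<!{VIRAMA})[{CONSONANTS}](?![{ASAT}{VIRAMA}])|[{OTHER_STARTERS}]"
)
# Onset = initial consonant plus any medials; the remainder is treated as the rhyme.
ONSET = re.compile(f"[{CONSONANTS}][{MEDIALS}]*")


def segment(text: str) -> list[str]:
    """Split Myanmar text into syllables by inserting a space before each syllable start."""
    return BREAK_BEFORE.sub(r" \g<0>", text).split()


def split_onset(syllable: str) -> tuple[str, str]:
    """Return (onset, rhyme) for a syllable; onset is empty if none is found."""
    match = ONSET.match(syllable)
    if match is None:
        return "", syllable
    return match.group(0), syllable[match.end():]


def candidates_for_partial(partial: str, vocabulary: list[str]) -> list[str]:
    """Keep known syllables that extend a partially typed syllable."""
    return [s for s in vocabulary if s.startswith(partial)]


if __name__ == "__main__":
    print(segment("မြန်မာစာ"))
    print(split_onset("မြန်"))
```

Working on Unicode code points keeps the rules independent of fonts. Note that this sketch assumes correctly ordered Unicode input; legacy Zawgyi-encoded text would need conversion first.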
Training data is prepared as follows:
- Source: Text from the Wikimedia/Wikipedia Myanmar corpus.
- Sliding Window: A `SEQUENCE_LENGTH` of 5 syllables is used as context.
- Supervised Mapping: The pipeline generates `Input (5 syllables) -> Target (1 next syllable)` pairs, as well as partial-syllable "tail" mappings.
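The sliding-window step can be sketched in plain Python with NumPy. This is a sketch under assumptions: the names `make_pairs`, `build_vocab` and `to_arrays` are hypothetical, and reserving index 0 for unknown syllables is an illustrative choice. Only `SEQUENCE_LENGTH = 5` comes from this README. The partial-syllable "tail" mappings are not shown because their exact format is not described here.

```python
import numpy as np

SEQUENCE_LENGTH = 5  # context size used by the project


def make_pairs(syllables, seq_len=SEQUENCE_LENGTH):
    """Slide a window over the syllables: seq_len context syllables -> next syllable."""
    return [
        (syllables[i : i + seq_len], syllables[i + seq_len])
        for i in range(len(syllables) - seq_len)
    ]


def build_vocab(syllables):
    """Map each distinct syllable to an integer id (0 is reserved for unknown syllables)."""
    return {s: i + 1 for i, s in enumerate(sorted(set(syllables)))}


def to_arrays(pairs, vocab):
    """Encode (context, target) pairs as int32 arrays suitable for a Keras model."""
    x = np.array([[vocab.get(s, 0) for s in ctx] for ctx, _ in pairs], dtype=np.int32)
    y = np.array([vocab.get(t, 0) for _, t in pairs], dtype=np.int32)
    return x, y


if __name__ == "__main__":
    syllables = ["a", "b", "c", "d", "e", "f", "g"]  # stand-ins for segmented syllables
    pairs = make_pairs(syllables)
    x, y = to_arrays(pairs, build_vocab(syllables))
    print(x.shape, y)
```

A corpus of `n` syllables yields `n - 5` training pairs, each shifted by one syllable from the previous one.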
The model uses a Recurrent Neural Network (RNN) implemented in Keras:
- Embedding Layer: Maps syllables into a dense 256-dimensional vector space.
- Bidirectional LSTM: Reads the input window in both directions to capture forward and backward linguistic context.
- Mixed Precision: Uses the `mixed_float16` policy for faster training.
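A minimal Keras sketch of this architecture follows. The 256-dimensional embedding, the bidirectional LSTM and the `mixed_float16` policy come from this README. The LSTM size, optimizer, loss and the `build_model` helper are assumptions for illustration. Following the Keras mixed-precision guidance, the final softmax is kept in `float32` for numerical stability.

```python
import keras
import numpy as np
from keras import layers


def build_model(vocab_size, seq_len=5, embed_dim=256, lstm_units=256, mixed_precision=True):
    """Hypothetical next-syllable model: Embedding -> Bidirectional LSTM -> softmax."""
    # The dtype policy must be set before the layers are created.
    keras.mixed_precision.set_global_policy("mixed_float16" if mixed_precision else "float32")

    inputs = keras.Input(shape=(seq_len,), dtype="int32")       # syllable ids
    x = layers.Embedding(vocab_size, embed_dim)(inputs)          # dense syllable vectors
    x = layers.Bidirectional(layers.LSTM(lstm_units))(x)         # both reading directions
    x = layers.Dense(vocab_size)(x)                              # one logit per syllable
    outputs = layers.Activation("softmax", dtype="float32")(x)   # float32 probabilities

    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # targets are integer syllable ids
        metrics=["accuracy"],
    )
    return model


if __name__ == "__main__":
    model = build_model(vocab_size=1000)
    probs = model.predict(np.zeros((1, 5), dtype="int32"), verbose=0)
    print(probs.shape)
```

Mixed precision mainly speeds up training on GPUs with float16 support; on a CPU the policy still works but may not be faster.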
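At prediction time:
- Temperature Sampling: Rescales the output probability distribution before sampling; lower temperatures favor the most likely syllables, while higher temperatures yield more varied ("creative") suggestions.
- Top-N Predictions: Returns the N most likely candidate syllables, similar to the suggestion bar of a mobile predictive-text keyboard.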
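Both options can be sketched with NumPy on the model's output probability vector. The function names are hypothetical, and the project's own implementation may differ. Temperature `T` rescales each probability to `p_i^(1/T)` and renormalizes. Because this transformation preserves the ranking, it changes which syllable is *sampled* but not the order of the top-N list.

```python
import numpy as np


def apply_temperature(probs, temperature=1.0):
    """Return probs rescaled as p**(1/T) and renormalized (T < 1 sharpens, T > 1 flattens)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    logits = np.log(np.clip(probs, 1e-12, 1.0)) / temperature
    logits -= logits.max()  # avoid overflow in exp
    scaled = np.exp(logits)
    return scaled / scaled.sum()


def top_n(probs, index_to_syllable, n=3, temperature=1.0):
    """Return the n most likely (syllable, probability) pairs, best first."""
    scaled = apply_temperature(probs, temperature)
    best = np.argsort(scaled)[::-1][:n]
    return [(index_to_syllable[i], float(scaled[i])) for i in best]


def sample(probs, temperature=1.0, rng=None):
    """Draw one syllable index from the temperature-adjusted distribution."""
    rng = rng if rng is not None else np.random.default_rng()
    scaled = apply_temperature(probs, temperature)
    return int(rng.choice(len(scaled), p=scaled))


if __name__ == "__main__":
    probs = np.array([0.1, 0.6, 0.3])
    print(top_n(probs, ["a", "b", "c"], n=2))
    print(sample(probs, temperature=0.5, rng=np.random.default_rng(0)))
```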
It is highly recommended to use a Python Virtual Environment to manage dependencies.
- Create and activate a virtual environment:
  ```bash
  # Create environment
  python -m venv venv
  # Activate (macOS/Linux)
  source venv/bin/activate
  # Activate (Windows)
  .\venv\Scripts\activate
  ```
- Install the required Python packages and Jupyter environment:
  ```bash
  pip install datasets mwparserfromhell seaborn scikit-learn tensorflow keras jupyterlab
  ```
- Launch JupyterLab (installed above as `jupyterlab`) and open the notebook:
  ```bash
  jupyter lab
  ```
This project is licensed under the GPL-3.0 License. See the LICENSE file for details.
This project was developed with the assistance of Gemini Code Assist.