A lightweight N-gram based predictive text generator with multiple zero-dependency interfaces:
- Local Web UI using only Python's standard library (no external packages)
- Desktop UI using Tkinter (standard library)
- Command Line Interface (CLI)
- Optional Streamlit UI (requires dependencies)
- Robust N-gram model (configurable N from 2–4)
- Preprocessing and tokenization with NLTK when available, with a built-in regex fallback
- Laplace smoothing (configurable `k`)
- Interpolated prediction combining different N-gram orders
- User dictionary boosts: global word boosts and context-specific boosts
- Model persistence: save to / load from JSON
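The smoothing and interpolation features above can be sketched as follows. This is a minimal illustration, not the actual API of `ngram_model.py`; all function names and the toy corpus are invented for the example:

```python
from collections import Counter

def train_ngrams(tokens, n):
    """Count n-grams and their (n-1)-gram contexts."""
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    contexts = Counter(tuple(tokens[i:i + n - 1]) for i in range(len(tokens) - n + 2))
    return counts, contexts

def laplace_prob(word, context, counts, contexts, vocab_size, k=1.0):
    """Add-k (Laplace) smoothed P(word | context)."""
    return (counts[context + (word,)] + k) / (contexts[context] + k * vocab_size)

tokens = "the cat sat on the mat the cat ran".split()
vocab = sorted(set(tokens))
bi_counts, bi_ctx = train_ngrams(tokens, 2)
tri_counts, tri_ctx = train_ngrams(tokens, 3)

def interpolated(word, history, lam=0.7):
    """Blend trigram and bigram estimates; weights sum to 1."""
    p3 = laplace_prob(word, tuple(history[-2:]), tri_counts, tri_ctx, len(vocab))
    p2 = laplace_prob(word, tuple(history[-1:]), bi_counts, bi_ctx, len(vocab))
    return lam * p3 + (1 - lam) * p2

best = max(vocab, key=lambda w: interpolated(w, ["cat", "sat"]))
print(best)  # "on" is the only word seen after "cat sat" in the toy corpus
```

Interpolation lets higher orders dominate when their context has been seen, while smoothing keeps unseen continuations from collapsing to zero probability.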
.
├── app.py # Streamlit UI (optional)
├── cli.py # Command Line Interface
├── ngram_model.py # Core N-gram model and utilities
├── tk_app.py # Tkinter Desktop UI (fallback visual interface)
├── web_server.py # Zero-dependency local Web UI (standard library)
├── user_dict.json # Optional user boosts file
├── requirements.txt # Dependencies for Streamlit UI
└── model.json # Example persisted model file
You can use the Web UI, Desktop UI, or CLI without any external packages.
- Start the server: `python web_server.py`
- Open in your browser: http://127.0.0.1:8000/
- Use the controls:
- Parameters: set N-gram, Laplace k, Top-K, then click "Apply"
- Corpus: paste text and click "Ingest Corpus"
- Compose: type in the input field to see suggestions; click a suggestion pill to append
- Boosts:
- Global: word + value, click "Add"
- Context: context + word + value, click "Add"
- Model Persistence:
- Path: set a file path (e.g., `model.json`)
- Click "Save" / "Load"
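The zero-dependency server builds only on the standard library's `http.server`. A minimal sketch of the pattern (handler class, routes, and the JSON payload shape here are illustrative, not `web_server.py`'s actual ones):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class PredictorHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the page as a Unicode string, encoded only at write time
        # (a bytes literal here would hit the ASCII SyntaxError covered in troubleshooting).
        html = "<html><body><h1>Predictor</h1></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.end_headers()
        self.wfile.write(html.encode("utf-8"))

    def do_POST(self):
        # Hypothetical JSON endpoint returning suggestions for posted text.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        reply = {"text": payload.get("text", ""), "suggestions": []}  # model output would go here
        body = json.dumps(reply).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

def run(host="127.0.0.1", port=8000):
    """Serve until interrupted (Ctrl+C)."""
    HTTPServer((host, port), PredictorHandler).serve_forever()
```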
- Launch: `python tk_app.py`
- Similar controls for parameters, corpus ingestion, compose area, boosts, and save/load
- Note: Tkinter is part of the standard library, but on some systems may require installing "tk" support
- Run: `python cli.py --n 3 --top-k 5`
- Options:
  - `--n`: N-gram order (2–4)
  - `--laplace-k`: Laplace smoothing constant (default 1.0)
  - `--top-k`: number of suggestions to show (default 5)
  - `--corpus PATH`: (optional) load text file(s) into the model
- Interactive usage:
- Type text and press Enter to see suggestions
- Built-in help is available within the CLI (type `help`) and includes commands for global/context boosts and saving/loading the model
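The flags above map naturally onto `argparse`. A sketch of how `cli.py` could declare them (illustrative, not the project's exact code):

```python
import argparse

def build_parser():
    """Declare the CLI options listed above."""
    parser = argparse.ArgumentParser(description="N-gram predictive text CLI")
    parser.add_argument("--n", type=int, default=3, choices=(2, 3, 4),
                        help="N-gram order (2-4)")
    parser.add_argument("--laplace-k", type=float, default=1.0,
                        help="Laplace smoothing constant")
    parser.add_argument("--top-k", type=int, default=5,
                        help="number of suggestions to show")
    parser.add_argument("--corpus", metavar="PATH", default=None,
                        help="optional text file to load into the model")
    return parser

args = build_parser().parse_args(["--n", "3", "--top-k", "5"])
print(args.n, args.laplace_k, args.top_k)  # 3 1.0 5
```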
If you prefer Streamlit and can install dependencies:
- Create/use a Python virtual environment (one is already present in the project as `.venv`)
- Upgrade pip (example for this project, where the venv uses `bin`): `.venv\bin\python.exe -m pip install --upgrade pip`
- Install dependencies:
pip install -r requirements.txt
- Run Streamlit:
python -m streamlit run app.py
If you encounter SSL/certificate or proxy issues when installing:
- Configure pip to use your corporate proxy and certificate store
- Try installing only `streamlit` first: `pip install streamlit`
- Prefer precompiled wheels for packages like `numpy` and `pandas`
- Save: choose a path (e.g., `model.json`) in the UI or CLI command
- Load: choose the same path to restore
- The model JSON contains N, smoothing, counts, and user boosts
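A save/load round-trip in this style can be sketched as follows. The field names and the tuple-key encoding are illustrative assumptions, not the actual schema of the project's `model.json`:

```python
import json, os, tempfile

def save_model(path, n, k, counts, boosts):
    """Persist model state; tuple n-gram keys are joined so JSON can store them."""
    data = {
        "n": n,
        "laplace_k": k,
        "counts": {" ".join(gram): c for gram, c in counts.items()},
        "boosts": boosts,
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)

def load_model(path):
    """Restore model state, rebuilding the tuple keys."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    data["counts"] = {tuple(key.split()): c for key, c in data["counts"].items()}
    return data

path = os.path.join(tempfile.gettempdir(), "model_demo.json")
save_model(path, 3, 1.0, {("the", "cat"): 2}, {"global": {"hello": 1.5}})
restored = load_model(path)
print(restored["counts"][("the", "cat")])  # 2
```

JSON objects only allow string keys, which is why tuple n-grams need some serialization step like the join/split above.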
- NLTK-based tokenization is used when available
- If NLTK is unavailable or `punkt` data is missing, the model automatically falls back to a regex tokenizer
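The fallback logic can be sketched like this (the regex pattern is illustrative; the actual one in `ngram_model.py` may differ):

```python
import re

def tokenize(text):
    """Prefer NLTK's word_tokenize; fall back to a regex if NLTK or punkt is absent."""
    try:
        from nltk.tokenize import word_tokenize
        return word_tokenize(text.lower())
    except (ImportError, LookupError):
        # LookupError covers 'punkt' data missing at tokenize time.
        return re.findall(r"[a-z0-9']+", text.lower())

print(tokenize("Hello world"))
```

Catching `LookupError` as well as `ImportError` matters because NLTK raises the former lazily, on the first tokenize call, when its `punkt` data has not been downloaded.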
- `SyntaxError: bytes can only contain ASCII literal characters`
  - Ensure HTML strings in `web_server.py` use Unicode (no `b"""` prefix)
- Tkinter not found
  - Install Python with Tk support or your OS-specific Tk package
- Streamlit / pip install fails (SSL, proxies)
  - Use your system/corporate proxy settings, trusted certificates, or precompiled wheels
- Internal/learning project. Add your preferred license as needed.