ayd1ndemirci/bigram

Word Predictor

🌐 Available Languages: Türkçe 🇹🇷 | English 🇬🇧

Welcome to the Word Predictor project...

Text-Based Word Predictor

This project is a simple language modeling application that learns word bigrams (and, optionally, trigrams) from a given text source. It predicts the most likely next word after a user-provided word or sentence, and serves as a starting point for basic language processing tasks such as autocomplete or text generation.
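To make the idea concrete, here is a minimal sketch of bigram counting and next-word prediction. It is not the project's actual code: the function names (cleanText, buildBigrams, predictNext) and the alphabetical tie-breaking rule are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// cleanText lowercases the input and drops punctuation, keeping only
// letters, digits and whitespace. Hypothetical helper for illustration.
func cleanText(s string) []string {
	var b strings.Builder
	for _, r := range strings.ToLower(s) {
		switch {
		case unicode.IsLetter(r), unicode.IsDigit(r):
			b.WriteRune(r)
		case unicode.IsSpace(r):
			b.WriteRune(' ')
		}
	}
	return strings.Fields(b.String())
}

// buildBigrams counts how often each word is followed by each other word.
func buildBigrams(words []string) map[string]map[string]int {
	counts := make(map[string]map[string]int)
	for i := 0; i+1 < len(words); i++ {
		prev, next := words[i], words[i+1]
		if counts[prev] == nil {
			counts[prev] = make(map[string]int)
		}
		counts[prev][next]++
	}
	return counts
}

// predictNext returns the most frequent follower of word, or "" if the word
// was never seen. Ties are broken alphabetically (an assumption) so that the
// result does not depend on map iteration order.
func predictNext(counts map[string]map[string]int, word string) string {
	best, bestCount := "", 0
	for next, c := range counts[word] {
		if c > bestCount || (c == bestCount && next < best) {
			best, bestCount = next, c
		}
	}
	return best
}

func main() {
	words := cleanText("The cat sat. The cat ran! The dog sat.")
	model := buildBigrams(words)
	fmt.Println(predictNext(model, "the")) // prints "cat"
}
```

In this toy corpus "the" is followed by "cat" twice and by "dog" once, so "cat" is the prediction.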


Features

  • Text Cleaning: Input text is converted to lowercase and stripped of punctuation, so that variants such as "Word", "word," and "word." are counted as the same token.
  • Bigram Generation: Builds a basic language model by counting the frequency of word pairs.
  • Trigram Support (Optional): Supports trigram usage to make more contextual predictions by considering the two previous words. This feature can be enabled or disabled via config.json.
  • Chained Prediction: After a user-provided word or sentence, it can predict a chain of words based on the configured predictionChainLength.
  • Configuration Management: Application settings (input file path, prediction chain length, trigram usage) can be easily managed through the config.json file.
  • Modular Structure: The code is divided into separate packages (config, textprocessor, languagemodel, utils) with distinct responsibilities, making it easier to read and maintain.
  • Robust Error Handling: Unexpected conditions such as file read errors or configuration issues are reported to the user with clear messages.

Installation & Usage

Prerequisites

  • Go must be installed (version 1.16 or later is recommended).

Steps

  1. Clone or Download the Project: Download or copy the project files to your local machine.

  2. Ensure Folder Structure: Make sure your project directory follows this structure:

    project_folder/
    ├── main.go
    ├── internal/
    │   ├── config/
    │   │   └── config.go
    │   ├── textprocessor/
    │   │   └── text_processing.go
    │   ├── languagemodel/
    │   │   └── language_model.go
    │   └── utils/
    │       └── utils.go
    ├── config.json
    └── input.json
    
  3. Initialize the Go Module: Navigate to the root directory of your project (project_folder) in the terminal and initialize the Go module:

    go mod init bigram # 'bigram' is your module name.
                       # You can use a different name if preferred.

    This command will create a go.mod file in your project root.

  4. Download (or Verify) Required Dependencies: While still in the project root (where go.mod is located), run:

    go mod tidy

    This command adds any missing module requirements to go.mod, removes unused ones, and downloads whatever is needed to build the project.

  5. Prepare the input.json File: The input.json file should contain the text to be used for model training. A sample input.json is included with the project. Its structure should look like this:

    {
      "text": "This is the entire text content that will be placed here. The model will learn from this text."
    }

  6. Configure the config.json File: The config.json file controls the behavior of the application:

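    {
      "inputFilePath": "input.json",
      "predictionChainLength": 3,
      "useTrigrams": true
    }

    The keys have the following meaning:

      • inputFilePath: Path to the input text file.
      • predictionChainLength: Number of words to predict.
      • useTrigrams: Whether to enable trigrams (true or false).

    Standard JSON does not allow comments, and a strict parser such as Go's encoding/json rejects them, so keep explanations like these out of the file itself. You can adjust the values based on your needs.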

  7. Run the Application: From the project root (where the bigram module is defined), run the application:

    go run .

    The application will prompt you to enter a word or sentence:

    Enter a word or sentence:
    

    Enter your input and press Enter. The application will then display the predicted word sequence.


Usage Examples
