
# FakeSentences

Generates plausible-sounding nonsense sentences by training a Markov chain on any plain-text corpus.

Words are stored as nodes in a weighted directed graph. Each node's outgoing edges count how often each next word followed it, and each node records whether its word ever ended a sentence in the training data (`IsLeaf`, `IsNotLeaf`, or `IsMaybeLeaf`). Sentences are generated by walking the graph from a random starting word, choosing each next word weighted by frequency, and stopping at leaf nodes or probabilistically at maybe-leaf nodes.
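The node structure can be sketched as follows. This is a minimal Python illustration, not the project's actual C# code; the field names (`edges`, `ended`, `continued`) are hypothetical stand-ins for the real implementation:

```python
from collections import defaultdict

class Node:
    """One word in the graph (illustrative; field names are hypothetical)."""
    def __init__(self, word):
        self.word = word
        self.edges = defaultdict(int)  # next word -> times it followed this word
        self.ended = False             # word ever ended a sentence in training
        self.continued = False         # word ever appeared mid-sentence

    @property
    def leaf_state(self):
        # Corresponds to IsLeaf / IsNotLeaf / IsMaybeLeaf
        if self.ended and self.continued:
            return "maybe-leaf"
        return "leaf" if self.ended else "not-leaf"
```

A word becomes a maybe-leaf only after it has been observed in both positions, which is why the state can change as more training data arrives.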

## Requirements

The .NET SDK (the `dotnet` CLI is used to build, run, and test the project).

## Building

```shell
dotnet build FakeSentences.sln
```

## Running

```shell
dotnet run --project FakeSentences/FakeSentences.csproj
```

The app prompts for one or more plain-text files to train on, then generates sentences from the combined graph. Two Project Gutenberg texts are included:

| File | Contents |
| --- | --- |
| `FakeSentences/pg11.txt` | *Alice's Adventures in Wonderland*, Lewis Carroll |
| `FakeSentences/pg2591.txt` | *Grimms' Fairy Tales*, Brothers Grimm |

## How it works

Training on three sentences — "The cat sat.", "The cat ran.", "The dog sat." — builds this graph:

```mermaid
graph LR
    ROOT(["ROOT\n(sentence start)"])
    ROOT -->|"3"| the
    the -->|"2"| cat
    the -->|"1"| dog
    cat -->|"1"| sat_a["sat (leaf)"]
    cat -->|"1"| ran["ran (leaf)"]
    dog -->|"1"| sat_b["sat (leaf)"]

    style ROOT fill:#ddd,stroke:#999
    style sat_a fill:#ffe0b2,stroke:#e65100
    style ran   fill:#ffe0b2,stroke:#e65100
    style sat_b fill:#ffe0b2,stroke:#e65100
```

Edge weights are counts. *(leaf)* marks leaf nodes: words that ended a sentence in training. A word seen both mid-sentence and at sentence end becomes a maybe-leaf, and generation stops there with 50% probability, producing shorter, more varied output.

Multiple files train into the same graph — edges from later files simply increment counts on existing nodes, so word-pair frequencies blend across all corpora.
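Blending corpora by accumulating counts can be sketched like this. It is an illustrative Python version (the project itself is C#), and it omits the leaf bookkeeping described above for brevity:

```python
import re
from collections import defaultdict

ROOT = "<ROOT>"  # sentinel node for sentence starts

def train(graph, text):
    """Fold one corpus into an existing graph of word-pair counts.
    Counts accumulate across calls, so later files blend in."""
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = sentence.split()
        if not words:
            continue
        prev = ROOT
        for word in words:
            graph[prev][word] += 1
            prev = word

graph = defaultdict(lambda: defaultdict(int))
train(graph, "The cat sat. The cat ran.")
train(graph, "The dog sat.")   # a second "file" increments existing counts
# graph[ROOT]["the"] is now 3; graph["the"]["cat"] is 2
```

Because the second call increments counts on nodes the first call created, word-pair frequencies reflect all corpora at once, matching the example graph above.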

To generate a sentence the program:

1. Picks a starting word from ROOT's children, weighted by count
2. Follows an edge to the next word, weighted by count
3. Stops at a leaf, or with 50% probability at a maybe-leaf; otherwise repeats step 2

## Sample run

Training on both included texts (*Alice's Adventures in Wonderland* and *Grimms' Fairy Tales*):

```text
Enter training files one per line, then press Enter to start:
FakeSentences/pg11.txt
  -> 'FakeSentences/pg11.txt'
Whole file was read!
Done processing training data
FakeSentences/pg2591.txt
  -> 'FakeSentences/pg2591.txt'
Whole file was read!
Done processing training data

Top 5 most common sentence starting words:
  1: the             (3608)
  2: and             (1525)
  3: a               (992)
  4: he              (941)
  5: you             (745)
The the owner of when the quicker she would that and when the the king the forest. You the father. The workmanship there you as the the dwarf and on the the king the wand and at the the sun. As what you but give the the other. Many a copy a dormouse was sitting between them fast asleep and a piece.
```

## Running tests

```shell
dotnet test FakeSentences.Tests/FakeSentences.Tests.csproj
```
