Bubble Sort on a Turing Machine — Specification

1. Overview

This project implements bubble sort on a Turing Machine. It consists of two deliverables:

A YAML file (bubble_sort.yaml) describing the Turing Machine — its input, blank symbol, start state, and state transition table. This file must run correctly on turingmachine.io (and the locally downloaded copy of that page).
A Python emulator (emulator.py) that reads any turingmachine.io-compatible YAML file, executes the machine, and reports results.

The target input is the array [3, 2, 7, 1, 5, 4]. The expected output on the tape is 1 2 3 4 5 7.

2. Turing Machine Design

2.1 Tape Representation

Alphabet: {1, 2, 3, 4, 5, 6, 7} (single-symbol-per-number encoding).
Blank symbol: ' ' (space).
Input string: '327154' — each digit occupies one tape cell, no delimiters.
Blank cells on either side of the input mark the boundaries of the data.

This encoding is general: the state table will correctly sort any permutation of any subset of the digits 1–7. Only the input: line is specific to the problem instance.

2.2 Algorithm

Standard bubble sort via repeated passes:

Start at the leftmost cell.
Read the left element of an adjacent pair (remembering its value in the state).
Move right, read the right element, and compare.
If left > right, swap the two values on the tape and flag the pass as "dirty."
If left ≤ right, no swap; continue.
Move to the next pair (the right element becomes the left element of the next comparison).
When the head reaches a blank (end of data):
- If the pass was clean (no swaps): halt — the array is sorted.
- If the pass was dirty (at least one swap): rewind to the leftmost cell and start a new pass.

2.3 States (25 total)

State	Count	Description
`right_clean`	1	Beginning of a pair comparison. No swaps have occurred this pass. Read the left element.
`right_dirty`	1	Beginning of a pair comparison. At least one swap has occurred this pass. Read the left element.
`hold_N_clean` (N = 1–7)	7	Holding value N (the left element). Moving right to read the right element. No swaps yet this pass.
`hold_N_dirty` (N = 1–7)	7	Holding value N (the left element). Moving right to read the right element. At least one swap this pass.
`put_N` (N = 1–7)	7	Performing a swap: writing the smaller value N into the left cell, then resuming rightward in dirty mode.
`rewind`	1	Rewinding (moving left) to the start of the tape after a dirty pass.
`done`	1	Halt state. The array is sorted.

Total: 2 + 7 + 7 + 7 + 1 + 1 = 25 states.

2.4 Transitions

`right_clean`

Read	Write	Move	Next State
N (1–7)	—	R	`hold_N_clean`

(Blank is never encountered here in normal operation — the head is always positioned on a digit at the start of a pair.)

`right_dirty`

Read	Write	Move	Next State
N (1–7)	—	R	`hold_N_dirty`

`hold_N_clean` (for each N from 1 to 7)

Read	Condition	Write	Move	Next State
M (1–7)	N ≤ M	—	R	`hold_M_clean`
M (1–7)	N > M	N	L	`put_M`
`' '`	—	—	R	`done`

`hold_N_dirty` (for each N from 1 to 7)

Read	Condition	Write	Move	Next State
M (1–7)	N ≤ M	—	R	`hold_M_dirty`
M (1–7)	N > M	N	L	`put_M`
`' '`	—	—	L	`rewind`

`put_N` (for each N from 1 to 7)

Read	Write	Move	Next State
any (1–7)	N	R	`right_dirty`

(The cell being overwritten always contains the original left value; put_N writes the smaller value N regardless of what it reads.)

`rewind`

Read	Write	Move	Next State
any digit (1–7)	—	L	`rewind`
`' '`	—	R	`right_clean`

`done`

No transitions. Halt state.

2.5 Detailed Swap Walkthrough

Example: Tape is ...5 2..., head at the 5, state right_clean.

Step	State	Head Pos	Read	Action	Tape
1	`right_clean`	i	`5`	move R	`...5 2...`
2	`hold_5_clean`	i+1	`2`	write `5`, move L	`...5 5...`
3	`put_2`	i	`5`	write `2`, move R	`...2 5...`
4	`right_dirty`	i+1	`5`	move R	`...2 5...`
5	`hold_5_dirty`	i+2	...	continue	`...2 5...`

Step 4 is a re-read of the value we already know (5). This costs one extra step per swap but keeps the state count at 25 instead of 25 + 49.

2.6 YAML File Format

The output YAML must follow the turingmachine.io format:

input: '327154'
blank: ' '
start state: right_clean
table:
  right_clean:
    1: {R: hold_1_clean}
    2: {R: hold_2_clean}
    ...
  hold_3_clean:
    1: {write: 3, L: put_1}   # 3 > 1, swap
    2: {write: 3, L: put_2}   # 3 > 2, swap
    3: {R: hold_3_clean}       # 3 = 3, no swap
    4: {R: hold_4_clean}       # 3 < 4, no swap
    ...
    ' ': {R: done}             # end of pass, clean
  put_2:
    [1,2,3,4,5,6,7]: {write: 2, R: right_dirty}
  rewind:
    [1,2,3,4,5,6,7]: L
    ' ': {R: right_clean}
  done:

Note: the [1,2,3,4,5,6,7] syntax matches any of the listed symbols.

3. Python Emulator

3.1 Purpose

The emulator reads a turingmachine.io-compatible YAML file, simulates the Turing Machine, and reports results. It serves as a fast testing and debugging tool during development.

3.2 YAML Parsing

The emulator must handle the full turingmachine.io YAML format, including:

input: initial tape contents (string).
blank: the blank symbol (single character, default ' ').
start state: the initial state name.
table: a dictionary of states, each containing transitions.
Transition shorthand: R, L, {R: state}, {write: sym, L: state}, etc.
Multi-symbol keys: [1,2,3] matching any listed symbol.
Halt states: states with no transitions (empty or absent body).

3.3 Execution Modes

The emulator supports three modes, controlled by command-line flags:

Mode 1: Default (summary only)

$ python emulator.py bubble_sort.yaml

Output:

Final tape contents (trimmed to non-blank region).
Whether the result is correctly sorted.
Total number of steps executed.
Final state.

Example:

Final tape: 1 2 3 4 5 7
Sorted:     yes
Steps:      487
Final state: done

Mode 2: Verbose / Trace (`-v`)

$ python emulator.py -v bubble_sort.yaml

Output:

Everything from Mode 1, plus:
Every transition printed on one line: step number, state, head position, symbol read, action (write/move/new state), and a tape snapshot with the head position marked.

Example trace line:

  42 | hold_3_dirty  | pos=3 | read='1' | write='3' L -> put_1 | tape: 1 2 [3] 1 5 4

Mode 3: Step limit (`-n LIMIT`)

$ python emulator.py -n 10000 bubble_sort.yaml

Halt execution after LIMIT steps if the machine has not reached a halt state.
Print a warning indicating the machine did not halt.
Combinable with -v: python emulator.py -v -n 5000 bubble_sort.yaml

3.4 Command-Line Interface

usage: emulator.py [-h] [-v] [-n LIMIT] yamlfile

Turing Machine emulator compatible with turingmachine.io YAML format.

positional arguments:
  yamlfile       Path to the YAML file describing the Turing Machine.

optional arguments:
  -h, --help     Show this help message and exit.
  -v, --verbose  Print every transition (trace mode).
  -n, --limit N  Maximum number of steps before forced halt.

3.5 Exit Codes

Code	Meaning
0	Machine halted normally.
1	Machine did not halt (step limit reached).
2	Error (bad YAML, missing state, invalid transition, etc.).

4. Development Plan

Phase 1: Emulator

Build the Python emulator first. Test it against known-good YAML files (e.g., the binary addition example from INITIAL_IDEAS.md and the examples in EXAMPLE_YAML.md). This gives us a reliable tool for rapid iteration on the Turing Machine itself.

Phase 2: Bubble Sort YAML

Generate the bubble_sort.yaml file. This can be done programmatically (a Python script that emits the YAML) or by hand. Given the regularity of the state table (25 states, ~120 transitions, all following a pattern), programmatic generation is recommended to avoid typos.

Phase 3: Testing and Verification

Run bubble_sort.yaml through the emulator; verify sorted output.
Run bubble_sort.yaml on the local copy of turingmachine.io; visually confirm.
Optionally test with other inputs (e.g., already-sorted, reverse-sorted, duplicates).

5. Version 2 — Bubble Sort on Unary Numbers

This version sorts a list of positive integers encoded in unary, using bubble sort (repeated left-to-right passes until a pass completes with no swaps).

The design mirrors Version 1’s “clean/dirty pass” structure, but replaces constant-time digit comparison with a marking-based unary comparison and a swap implemented by moving “excess” units across the delimiter.

5.1 Tape Encoding

Symbols (tape alphabet)

Data:
- '1': unary unit
- '0': delimiter between numbers
Temporary (used only while processing one adjacent pair):
- 'A': marked unit from the left number (matched)
- 'B': marked unit from the right number (matched)
- 'X': excess unit from the left number (proves Left > Right)
Blank:
- ' ' (space)

Number list representation

A number (n) is encoded as (n) copies of '1'.
Numbers are separated by a single '0'.
No leading or trailing delimiter.
All numbers are positive (no empty numbers).

Example: ([3,2,1]) → 11101101

Target instance: ([3,2,7,1,5,4])

Input (no spaces on tape): 111011011111110101111101111
Expected output: ([1,2,3,4,5,7]) → 101101110111101111101111111

(For readability: 1 0 11 0 111 0 1111 0 11111 0 1111111)

5.2 High-Level Bubble Sort Control

A pass scans adjacent pairs left-to-right:

Compare each adjacent pair ((L, R)).
If (L > R), swap them and mark the pass dirty.
If the pass reaches end-of-tape:
- clean pass → halt (done)
- dirty pass → rewind to start and begin a new pass in clean mode

As in Version 1, we maintain the pass flag in the control state:

*_clean states: no swap has occurred in this pass yet
*_dirty states: at least one swap occurred this pass

5.3 Head Position Invariant (critical)

Invariant P (Pair Start): At the start of processing any adjacent pair, the head is positioned on the delimiter '0' between the left number and the right number, and the pair contains only '1' symbols (no A/B/X), i.e.:

… (left as 111…) [0] (right as 111…) …

This invariant is enforced by the local restore+advance phase (Section 5.6).

5.4 Comparison Algorithm (Unary, “B-first” marking)

Goal: decide whether (L \le R) (no swap) or (L > R) (swap).

We work only within the current pair L 0 R, using marks.

Step C1: Mark one unit in R (B-first)

From the middle delimiter:

Scan right within R:
- Skip 'B'
- On '1': write 'B' (consume one unit of R), then go match on the left
- On '0' or blank: R exhausted → go check excess in L (Step C3)

Step C2: Match one unit in L

After marking a 'B', scan left:

Move left back across right group and the middle '0'
In left group:
- Skip 'A'
- On '1': write 'A' (consume one unit of L), then return to the middle delimiter and repeat Step C1
- On left boundary ('0' or blank): L exhausted while R still had a unit ⇒ (L < R) → no swap, proceed to local restore

Step C3: Right exhausted → check for excess in L

When R is exhausted:

Return to the middle delimiter, then scan left in L:
- Skip 'A'
- If an unmarked '1' exists: excess ⇒ (L > R)
  - Mark that '1' as 'X'
  - Continue left marking all remaining unmarked '1' as 'X'
  - Proceed to swap phase
- If no unmarked '1' exists (hit '0' or blank): (L = R)
  - no swap, proceed to local restore

After comparison finishes, the tape in the pair region is one of:

(L < R): A…A 0 B…B 1…1 (right has remainder)
(L = R): A…A 0 B…B
(L > R): X…X A…A 0 B…B (left has excess X’s)

5.5 Swap Algorithm (only when (L > R))

We must transform:

X^e A^r 0 B^r → A^r 0 X^e B^r

Conceptually: move each excess unit (X) from the left side to the right side of the middle delimiter.

Swap strategy: “bubble an X right until it crosses the 0”

Repeat until no X remains in the left group:

From the left boundary of the left number, scan right to find the first 'X'.
Move that 'X' right by repeatedly swapping it with the symbol to its right:
- If right neighbor is 'A': swap X A → A X and continue
- If right neighbor is '0': swap X 0 → 0 X (the X has crossed to the right side; stop moving this X)
Return to the left boundary and find the next 'X'.

After this completes, all X’s are on the right side immediately after the delimiter.

Pass flag: performing any swap forces the pass into dirty mode for the remainder of the pass.

5.6 Local Restore + Advance to Next Pair

After either:

no-swap case ((L \le R)), or
swap complete ((L > R)),

we must:

Convert marks back to data:
- A → 1, B → 1, X → 1
Advance to the delimiter for the next adjacent pair in this pass.

Local restore behavior (must be local, not whole-tape)

Starting anywhere within the pair region:

Restore left group while moving to the middle delimiter:
- Convert A/X → 1
- Stop when the head reaches the middle '0'
Restore right group while moving to its right boundary:
- Convert B/X → 1
- Move right until reaching:
  - the next '0' (delimiter after right number), or
  - blank (end of tape)

If restore stops on '0': that '0' is now the Pair Start delimiter for the next comparison (Invariant P). Continue the pass.

If restore stops on blank: end-of-pass:

if current mode is *_clean → done
if current mode is *_dirty → rewind and start a new pass in clean mode

5.7 State Structure (recommended)

You can implement this cleanly with state families:

Pass entry / navigation

seek_first_delim_clean, seek_first_delim_dirty
- Move right over '1' to the first '0'
- If blank before any '0', list has ≤1 element → done

Comparison (two variants to carry pass flag)

cmpR_clean, cmpR_dirty (mark B in R or detect R exhausted)
cmpL_clean, cmpL_dirty (match A in L or detect L exhausted)
excess_scan_clean, excess_scan_dirty (detect/mark X or equality)

Swap (single set; always leads to dirty restore)

swap_find_X (from left boundary scan to first X; if hit middle 0, swap done)
swap_shift + helper states to swap X with A or 0

Restore + advance (two variants)

restore_clean, restore_dirty
- Restore marks locally and land on next delimiter or blank
- On blank: branch to done (clean) or rewind (dirty)

Pass reset / halt

rewind (move left to blank, then right; go to seek_first_delim_clean)
done (halt)

Note: This will likely be more than 25 states once you include the swap helper states. That’s normal; unary swapping needs extra “memory-in-state” to perform local two-cell swaps deterministically.

5.8 YAML Compatibility Notes

Quote symbols in YAML to avoid YAML numeric typing ambiguity:
- Use '0', '1', 'A', 'B', 'X', and ' ' for blank.
turingmachine.io supports multi-symbol keys like [ 'A','X' ], but if you want maximum compatibility with your current Python preprocessor, keep them on one line with proper indentation.

5.9 Testing Expectations

Minimum tests for bubble_sort_unary.yaml:

Target input 111011011111110101111101111 Output must be 101101110111101111101111111
Already sorted 101101110111101111101111111 Should halt after one clean pass.
Reverse-sorted (for stress) for 1–4 1111011101101 (i.e., 4,3,2,1) Should sort correctly (more steps).
Duplicates (if supported): e.g., [2,2,1] 1101101 → 1011011

FilesExpand file tree

SPECIFICATION.md

Latest commit

History