Alignments for Switchboard Dialog Act (SWDA) tags with Mississippi State (MS-State) word-level transcripts with timing information.
This project aligns the dialog act annotations from the Switchboard Dialog Act Corpus with the word-level time-aligned transcripts from the Mississippi State Switchboard annotations. The result is a dataset where each utterance has:
- Dialog act tags (from SWDA)
- Word-level timing information (from MS-State transcripts)
- Speaker identification
- Aligned transcripts from both sources
- SWDA (Switchboard Dialog Act Corpus): Dialog act annotations for Switchboard conversations
  - 1,155 conversations with DA tags
  - Utterance-level annotations with act tags (statement, question, backchannel, etc.)
- MS-State Transcripts: Word-level time-aligned transcripts
  - 2,438 conversations available
  - Word-level start/end timestamps
  - Located in swb_ms98_transcriptions/
align_swda_mstrans.py is the main alignment script. It:
- Loads SWDA annotations and MS-State transcripts
- Normalizes text for better matching (handles variants like "uh-huh", "um-hum", etc.)
- Performs sequence alignment using difflib.SequenceMatcher
- Prioritizes longer consecutive match sequences over isolated single-word matches
- Automatically detects and corrects speaker channel flips (84 conversations had flipped channels)
- Outputs word-level alignments to aligned_words/
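The normalization and alignment steps above can be sketched roughly as follows. This is a minimal illustration, not the code in align_swda_mstrans.py: the `VARIANTS` table, `normalize`, and `align_words` are hypothetical names, and the sketch omits the lookahead logic that favors longer match runs. Labels follow the convention used in this README (a deletion is a word found only in SWDA, an insertion a word found only in the MS-State transcript).

```python
from difflib import SequenceMatcher
from itertools import zip_longest

# Hypothetical variant table: the real script's mapping is not shown here.
VARIANTS = {"um-hum": "uh-huh", "uh-hum": "uh-huh"}


def normalize(word):
    """Lowercase, strip edge punctuation, and collapse known backchannel variants."""
    word = word.lower().strip(".,?!\"")
    return VARIANTS.get(word, word)


def align_words(swda_words, trans_words):
    """Return (alignment_type, swda_word, trans_word) tuples for two word lists."""
    a = [normalize(w) for w in swda_words]
    b = [normalize(w) for w in trans_words]
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    rows = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        swda_span, trans_span = swda_words[i1:i2], trans_words[j1:j2]
        if tag == "equal":
            rows += [("match", s, t) for s, t in zip(swda_span, trans_span)]
        else:
            # "delete", "insert", and "replace" all reduce to pairing what is
            # available and labelling the one-sided leftovers.
            for s, t in zip_longest(swda_span, trans_span):
                if t is None:
                    rows.append(("deletion", s, None))   # word only in SWDA
                elif s is None:
                    rows.append(("insertion", None, t))  # word only in MS-State
                else:
                    rows.append(("mismatch", s, t))
    return rows
```

Matching is done on the normalized forms, but the original spellings are kept in the output so that both transcripts can be compared afterwards.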
Key features:
- Text normalization for common backchannel variations
- Lookahead logic to skip short matches in favor of longer sequences
- Speaker flip detection based on alignment quality (match rate < 50% threshold)
- Processes all 1,155 conversations with both SWDA and transcript data
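One way to picture the speaker flip check is the sketch below. Only the 50% threshold comes from this README; the helper names, the way the match rate is computed, and the comparison against the swapped assignment are assumptions made for illustration.

```python
from difflib import SequenceMatcher

FLIP_THRESHOLD = 0.5  # from the README: a match rate below 50% suggests a flip


def match_rate(swda_words, trans_words):
    """Fraction of SWDA words that fall inside a matching block."""
    if not swda_words:
        return 0.0
    matcher = SequenceMatcher(None, swda_words, trans_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(swda_words)


def channels_flipped(swda_by_speaker, trans_by_speaker):
    """Return True when the labelled channels align poorly and swapping helps.

    Both arguments are dicts with keys "A" and "B" holding word lists.
    """
    straight = (match_rate(swda_by_speaker["A"], trans_by_speaker["A"])
                + match_rate(swda_by_speaker["B"], trans_by_speaker["B"])) / 2
    swapped = (match_rate(swda_by_speaker["A"], trans_by_speaker["B"])
               + match_rate(swda_by_speaker["B"], trans_by_speaker["A"])) / 2
    return straight < FLIP_THRESHOLD and swapped > straight
```

Requiring the swapped assignment to score higher guards against flagging conversations that simply align badly in both directions.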
Output: aligned_words/aligned_conv_{conv_num}.csv containing:
- speaker: A or B
- alignment_type: match, mismatch, insertion, or deletion
- swda_word: word from SWDA transcript
- trans_word: word from MS-State transcript
- utterance_index, subutterance_index: SWDA utterance IDs
- act_tag: Dialog act tag
- conv_no: Conversation number
- turn_id: MS-State turn ID
- start, end: Word timing in seconds
process_aligned.py post-processes word-level alignments to create turn-level utterances:
- Drops deletion alignments (words only in SWDA, not in transcript)
- Sorts by timestamp
- Fills missing act tags using forward/backward/nearest neighbor passes
- Groups words by utterance to reconstruct full turns
- Outputs turn-level data to aligned_turns/
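The post-processing steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the code in process_aligned.py: rows are dicts keyed by the word-level column names, a missing tag is represented as `None`, the fill copies the utterance IDs along with the act tag, and only a forward and a backward pass are shown (the nearest-neighbor pass is left out). The function names are hypothetical.

```python
FILL_FIELDS = ("utterance_index", "subutterance_index", "act_tag")


def fill_missing(rows):
    """Fill untagged words from an earlier tagged word of the same speaker,
    and, failing that, from a later one."""
    for ordered in (rows, rows[::-1]):  # forward pass, then backward pass
        donor_by_speaker = {}
        for row in ordered:
            if row["act_tag"] is None:
                donor = donor_by_speaker.get(row["speaker"])
                if donor is not None:
                    for field in FILL_FIELDS:
                        row[field] = donor[field]
            else:
                donor_by_speaker[row["speaker"]] = row


def to_turns(word_rows):
    """Drop deletions, sort by time, fill tags, and group words into utterances."""
    rows = [dict(r) for r in word_rows if r["alignment_type"] != "deletion"]
    rows.sort(key=lambda r: r["start"])
    fill_missing(rows)

    turns = {}
    for r in rows:
        key = (r["speaker"], r["utterance_index"], r["subutterance_index"])
        turn = turns.setdefault(key, {
            "speaker": r["speaker"],
            "utterance_index": r["utterance_index"],
            "subutterance_index": r["subutterance_index"],
            "act_tag": r["act_tag"],
            "start": r["start"],
            "end": r["end"],
            "words": [],
        })
        turn["end"] = max(turn["end"], r["end"])
        turn["words"].append(r["trans_word"])

    result = []
    for turn in turns.values():
        turn["transcript_ms"] = " ".join(turn.pop("words"))
        result.append(turn)
    return sorted(result, key=lambda t: t["start"])
```

Deletions are removed before sorting because they carry no MS-State timestamps to sort on.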
Output: aligned_turns/aligned_turns_{conv_num}.csv containing:
- speaker: A or B
- utterance_index, subutterance_index: SWDA utterance IDs
- start, end: Utterance timing in seconds
- turn_id: MS-State turn ID
- transcript_swda: Full utterance text from SWDA
- transcript_ms: Full utterance text from MS-State
- act_tag: Dialog act tag (simplified DAMSL tags)
convert_to_coarse_tags.py converts fine-grained dialog act tags into 4 coarse categories, to simplify analysis and the extraction of question-answer turn pairs:
- Handles continuation markers: when act_tag is "+", replaces it with the previous turn's tag from the same speaker
- Maps fine-grained tags to coarse categories:
  - question: tags starting with "q" (e.g., qy, qw, qh)
  - statement: tags starting with "s" (e.g., sd, sv, sv^d)
  - answer: tags starting with "a", "n", or "b" (e.g., aa, ny, bh)
  - other: all remaining tags (e.g., h, fp, x)
- Outputs coarse-tagged data to coarse_tags/
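The mapping rules above translate almost directly into code. The sketch below is illustrative: `coarse_tag` and `add_coarse_tags` are hypothetical names, turns are assumed to be dicts in conversation order, and a "+" with no earlier turn from the same speaker is assumed to stay as "+" (and therefore map to "other").

```python
def coarse_tag(act_tag):
    """Map a fine-grained act tag to one of the four coarse categories."""
    if act_tag.startswith("q"):
        return "question"
    if act_tag.startswith("s"):
        return "statement"
    if act_tag.startswith(("a", "n", "b")):
        return "answer"
    return "other"


def add_coarse_tags(turns):
    """Resolve "+" continuations per speaker and add an act_tag_merge field."""
    last_tag = {}  # speaker -> most recent resolved tag
    tagged = []
    for turn in turns:
        tag = turn["act_tag"]
        if tag == "+":
            # Continuation: reuse the same speaker's previous tag if there is one.
            tag = last_tag.get(turn["speaker"], tag)
        last_tag[turn["speaker"]] = tag
        tagged.append({**turn, "act_tag": tag, "act_tag_merge": coarse_tag(tag)})
    return tagged
```

Tracking the last tag per speaker, rather than the last tag overall, keeps a continuation from inheriting the other speaker's intervening turn.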
Output: coarse_tags/aligned_turns_{conv_num}.csv containing all columns from turn-level files plus:
- act_tag_merge: Coarse dialog act category (question, answer, statement, other)
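As a usage illustration, the coarse-tagged files could be read and scanned for question-answer pairs along these lines. This is a sketch, not part of the repository: the function names are hypothetical, and the pairing rule (a question turn immediately followed by an answer turn from the other speaker) is one simple assumption among several reasonable definitions.

```python
import csv


def load_turns(path):
    """Read one coarse_tags CSV into a list of row dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def question_answer_pairs(turns):
    """Pair each question turn with the answer turn that directly follows it."""
    pairs = []
    for prev, cur in zip(turns, turns[1:]):
        if (prev["act_tag_merge"] == "question"
                and cur["act_tag_merge"] == "answer"
                and prev["speaker"] != cur["speaker"]):
            pairs.append((prev, cur))
    return pairs
```

For example, `question_answer_pairs(load_turns("coarse_tags/aligned_turns_2768.csv"))` would return the pairs for conversation 2768, assuming the rows in the file are already in time order.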
# Clone and setup SWDA data
bash download.sh
# Install dependencies
pip install -r requirements.txt
# Step 1: Word-level alignment
python align_swda_mstrans.py
# Step 2: Turn-level aggregation
python process_aligned.py
# Step 3: Convert to coarse tags (optional)
python convert_to_coarse_tags.py

- Word-level alignment: ~5 minutes on Intel Core Ultra 7 155U
- Turn-level aggregation: ~3 minutes on Intel Core Ultra 7 155U
- Coarse tag conversion: ~1 minute on Intel Core Ultra 7 155U
- Aligned all 1,155 conversations from SWDA
- 84 conversations had speaker channels flipped (automatically corrected)
- ~99.7% word coverage on average (missing act_tags tracked in logs)
Most conversations have >99% of words successfully aligned with act tags. Conversations with the most unfilled act_tags:
- 2768: 46/2121 words (2.2%)
- 2884: 45/2377 words (1.9%)
- 2386: 42/2215 words (1.9%)
See logs.txt for complete processing details.
- align_swda_mstrans.py: Main word-level alignment script
- process_aligned.py: Turn-level aggregation script
- convert_to_coarse_tags.py: Coarse tag conversion script
- compare_alignment.py: Comparison of alignment algorithms (difflib's Ratcliff-Obershelp algorithm vs. Needleman–Wunsch algorithm)
- download.sh: Setup script for SWDA data
- flipped_channels.txt: List of conversations with flipped speaker channels
- logs.txt: Processing logs and statistics
- aligned_words/: Word-level alignment outputs (1,155 files)
- aligned_turns/: Turn-level aggregated outputs (1,155 files)
- coarse_tags/: Coarse-tagged turn-level outputs (1,155 files)
See LICENSE file for details.
- Switchboard Dialog Act Corpus
- DAMSL Annotation Manual
- Mississippi State Switchboard resegmentation project
- My old code - a monstrosity that would probably cause most people to cringe, but it was surprisingly useful in giving me better, more focused ideas to prompt Claude with and reminding me of pitfalls I had run into previously.