Linguistic Pattern Laboratory: Advanced NLP pipeline for text analysis, entity extraction, and pattern recognition.
- Tokenization: Custom Graffl tokenizer with intelligent handling of contractions, abbreviations, and punctuation
- Parsing: Deep linguistic analysis with POS tagging, dependency parsing, and WordNet integration
- Entity Extraction: Pattern-based extraction of people and topics with anaphora resolution
- Segmentation: Paragraph and sentence boundary detection
- Rich Annotations: Sentiment, lemmatization, stemming, and morphological features
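One way to picture the anaphora resolution feature is as grouping every mention of a person under a shared key. The sketch below is a simplified illustration, not LingPatLab's actual algorithm: the function name `group_mentions_by_surname` is hypothetical, and it assumes the last word of each mention is the surname.

```python
from collections import defaultdict


def group_mentions_by_surname(mentions):
    """Group person mentions under the last word of each mention (assumed surname)."""
    groups = defaultdict(list)
    for mention in mentions:
        words = mention.split()
        if not words:
            continue  # skip empty or whitespace-only mentions
        surname = words[-1]
        if mention not in groups[surname]:
            groups[surname].append(mention)  # keep first-seen order, no duplicates
    return dict(groups)


mentions = ["Admiral William Halsey", "Halsey", "Admiral Nimitz"]
grouped = group_mentions_by_surname(mentions)
print(grouped)
```

A real resolver also has to handle pronouns, titles, and people who share a surname, none of which this heuristic attempts.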
pip install lingpatlab

from lingpatlab import LingPatLab
api = LingPatLab()
# Parse text into structured tokens
sentence = api.parse_input_text("Admiral Nimitz commanded the Pacific Fleet.")
print(sentence.to_string())
# Extract people with anaphora resolution
text = "Admiral William Halsey led the fleet. Halsey was known for his aggressive tactics."
sentence = api.parse_input_text(text)
people = api.extract_people(sentence)
# Returns: {'Halsey': ['Admiral William Halsey', 'Halsey']}
# Extract topics and named entities
topics = api.extract_topics(sentence)

lines = [
    "The Battle of Midway was a turning point.",
    "Admiral Nimitz made crucial decisions."
]
sentences = api.parse_input_lines(lines)
for sentence in sentences:
    print(sentence.to_string())

from lingpatlab import segment_input_text
text = "First sentence. Second sentence. Third sentence."
segments = segment_input_text(text)
# Returns: ['First sentence.', 'Second sentence.', 'Third sentence.']

sentence = api.parse_input_text("The quick brown fox jumps.")
for token in sentence:
    print(f"Text: {token.text}")
    print(f"POS: {token.pos}")
    print(f"Lemma: {token.normal}")
    print(f"Is WordNet: {token.is_wordnet}")
    print(f"Dependency: {token.dep}")

- Sentence: Single sentence with token list
- Sentences: Collection of sentences
- SpacyResult: Individual token with full linguistic annotation
- OtherInfo: Additional morphological and dependency metadata
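To make the relationship between a sentence and its tokens concrete, here is a minimal sketch using dataclasses. The class names `TokenSketch` and `SentenceSketch` are hypothetical stand-ins, not the library's definitions; the field names (`text`, `pos`, `normal`, `is_wordnet`, `dep`) and the `to_string()` method are taken from the usage shown above, and the types are assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class TokenSketch:
    """Hypothetical stand-in for a single annotated token."""
    text: str
    pos: str
    normal: str       # lemma / normalized form
    is_wordnet: bool  # whether the token is found in WordNet
    dep: str          # dependency label


@dataclass
class SentenceSketch:
    """Hypothetical stand-in for a sentence: an iterable list of tokens."""
    tokens: List[TokenSketch] = field(default_factory=list)

    def __iter__(self) -> Iterator[TokenSketch]:
        return iter(self.tokens)

    def to_string(self) -> str:
        # Join token texts with single spaces (an assumed rendering)
        return " ".join(token.text for token in self.tokens)


sentence = SentenceSketch([
    TokenSketch("foxes", "NOUN", "fox", True, "nsubj"),
    TokenSketch("jump", "VERB", "jump", True, "ROOT"),
])
print(sentence.to_string())
```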
LingPatLab
├── tokenizer/ # Custom tokenization with Graffl
├── parser/ # spaCy integration + enhancements
├── analyzer/ # Entity extraction with pattern matching
├── segmenter/ # Sentence and paragraph segmentation
└── utils/ # WordNet, Porter stemmer, utilities
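The layout above suggests a staged pipeline: segment the text, then tokenize each sentence, then analyze. The sketch below shows that flow with two toy stages built on regular expressions. The functions `segment`, `tokenize`, and `run_pipeline` are hypothetical and far simpler than the Graffl tokenizer or the library's segmenter; for example, they do not handle abbreviations.

```python
import re
from typing import List


def segment(text: str) -> List[str]:
    """Naive sentence split: break on whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> List[str]:
    """Keep words with internal apostrophes whole; emit punctuation separately."""
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", sentence)


def run_pipeline(text: str) -> List[List[str]]:
    """Chain the stages: one token list per detected sentence."""
    return [tokenize(sentence) for sentence in segment(text)]


result = run_pipeline("Halsey didn't wait. The fleet sailed.")
print(result)
```

Keeping each stage as a separate function mirrors the one-module-per-stage layout and lets a stage be swapped without touching the others.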
- Python 3.10+
- spaCy 3.8.2
- spaCy model: en_core_web_sm
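Before running the pipeline, it can help to confirm that these requirements are met. The helper below is a small sketch that uses only the standard library; `missing_requirements` is a hypothetical name, and it checks only that the modules can be found, not that spaCy is at version 3.8.2.

```python
import sys
from importlib.util import find_spec


def missing_requirements(min_python=(3, 10), modules=("spacy", "en_core_web_sm")):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ is required")
    for name in modules:
        # find_spec returns None when a top-level module is not installed
        if find_spec(name) is None:
            problems.append(f"missing module: {name}")
    return problems


print(missing_requirements())
```

If the model is reported missing, spaCy's own downloader (`python -m spacy download en_core_web_sm`) is the usual way to install it.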
# Install with dev dependencies
pip install -e ".[linting,testing]"
# Run tests
pytest
# Run regression suite
python regression/regression_runner.py

MIT License - see LICENSE for details.
Craig Trim - craigtrim@gmail.com
More NLP articles and demos at craigtrim.com