diff --git a/README.md b/README.md
index 014797b..164b398 100644
--- a/README.md
+++ b/README.md
@@ -20,49 +20,206 @@ higher are the bare minimum.
 
 # Building
 
-With Visual Studio just build the solution. With the .net core tooling use `dotnet build`
+## From Source
+
+To build USFMToolsSharp from source:
+
+```bash
+# Clone the repository
+git clone https://github.com/WycliffeAssociates/USFMToolsSharp.git
+cd USFMToolsSharp
+
+# Build using .NET CLI
+dotnet build
+
+# Run tests
+dotnet test
+```
+
+Or open `USFMToolsSharp.sln` in Visual Studio and build the solution.
 
 # Contributing
 
-Yes please! A couple things would be very helpful
+We welcome contributions! Here are some ways you can help:
+
+- **Testing**: Test with various USFM documents and report any parsing or rendering issues
+- **Marker Support**: Add support for additional USFM markers to the parser
+- **Documentation**: Improve examples and documentation
+- **Renderers**: Create new renderers (LaTeX, PDF, EPUB, etc.) or enhance existing ones
 
-- Testing: Because I can't test every single possible USFM document in existance. If you find something that doesn't look right in the parsing or rendering please submit an issue.
-- Adding support for other markers to the parser. There are still plenty of things in the USFM spec that aren't implemented.
-- Adding support for other markers to the HTML renderer
-- Adding other renderers (LaTeX, PDF, EPUB, JSON, etc.). Some of those renderers might not be possible in .net standard and if that is the case we'll just need to create another repo to contain the renderer
+Please submit issues for bugs or feature requests, and pull requests for contributions.
 
 # Usage
 
-There a couple useful classes that you'll want to use
+USFMToolsSharp provides a parser and document model for working with USFM (Unified Standard Format Markers) content. Below are detailed examples to help you get started.
+
+## Installation
+
+Install the package from NuGet:
 
-## USFMDocument
+**.NET CLI**
+```bash
+dotnet add package USFMToolsSharp
+```
+
+**Package Manager Console**
+```powershell
+Install-Package USFMToolsSharp
+```
+
+**PackageReference**
+```xml
+
+```
 
-This class is a tree of objects that represent a USFM document There are a couple of methods and properties that you'll find useful
+Or visit the [NuGet package page](https://www.nuget.org/packages/USFMToolsSharp/).
+
+## Quick Start
 
 ```csharp
-USFMDocument output = new USFMDocument();
-// The contents of the document
-output.Contents;
-// To find all the child markers of a certain type (in this case chapters)
-output.GetChildMarkers<CMarker>();
-// To merge the contents of one USFMDocument with another
-USFMDocument otherDocument = new USFMDocument();
-output.Insert(otherDocument);
+using USFMToolsSharp;
+using USFMToolsSharp.Models.Markers;
+
+// Create a parser
+USFMParser parser = new USFMParser();
+
+// Parse USFM content
+string usfmContent = @"\id GEN
+\h Genesis
+\c 1
+\v 1 In the beginning God created the heavens and the earth.
+\v 2 The earth was without form and void.";
+
+USFMDocument document = parser.ParseFromString(usfmContent);
 ```
 
-## USFMParser
+## Core Classes
+
+### USFMParser
 
-This class creates an abstract syntax tree from a USFM string. It can also be passed a
-list of specific markers as strings into its constructor to ignore them if needed.
+The `USFMParser` class converts USFM text into an abstract syntax tree (`USFMDocument`).
 
-Example:
+#### Basic Parsing
 
 ```csharp
 USFMParser parser = new USFMParser();
 var contents = File.ReadAllText("01-GEN.usfm");
-USFMDocument output = parser.ParseFromString(contents);
+USFMDocument document = parser.ParseFromString(contents);
+```
+
+#### Ignoring Specific Markers
+
+You can configure the parser to ignore certain markers during parsing:
+
+```csharp
+// Ignore bold markers
+var markersToIgnore = new List<string> { "bd", "bd*" };
+USFMParser parser = new USFMParser(markersToIgnore);
+
+string usfm = @"\v 1 In the beginning \bd God \bd* created";
+USFMDocument document = parser.ParseFromString(usfm);
+// The bold markers will be ignored, text "God " will be preserved
+```
+
+#### Ignoring Unknown Markers
+
+To ignore markers that aren't part of the USFM specification:
+
+```csharp
+// Second parameter controls unknown marker handling
+USFMParser parser = new USFMParser(null, ignoreUnknownMarkers: true);
+USFMDocument document = parser.ParseFromString(usfmContent);
+```
+
+### USFMDocument
+
+The `USFMDocument` class represents a parsed USFM document as a tree structure. Each node in the tree is a `Marker` object.
+
+#### Accessing Document Contents
+
+```csharp
+USFMDocument document = parser.ParseFromString(usfmContent);
+
+// Access all top-level markers
+List<Marker> contents = document.Contents;
+
+// Get total number of markers parsed
+int markerCount = document.NumberOfTotalMarkersAtParse;
+```
+
+#### Finding Specific Markers
+
+Use `GetChildMarkers<T>()` to find all markers of a specific type:
+
+```csharp
+// Find all chapters in the document
+var chapters = document.GetChildMarkers<CMarker>();
+
+foreach (var chapter in chapters)
+{
+    Console.WriteLine($"Chapter {chapter.Number}");
+}
+
+// Find all verses
+var verses = document.GetChildMarkers<VMarker>();
+
+foreach (var verse in verses)
+{
+    Console.WriteLine($"Verse {verse.VerseNumber}");
+}
+
+// Find all section headings
+var sections = document.GetChildMarkers<SMarker>();
+```
+
+#### Working with Chapters and Verses
+
+```csharp
+// Get the first chapter
+var firstChapter = document.GetChildMarkers<CMarker>().FirstOrDefault();
+
+if (firstChapter != null)
+{
+    Console.WriteLine($"Chapter {firstChapter.Number}");
+
+    // Get verses within this chapter
+    var verses = firstChapter.GetChildMarkers<VMarker>();
+
+    foreach (var verse in verses)
+    {
+        // Get the text content of the verse
+        var textBlocks = verse.Contents.OfType<TextBlock>();
+        string verseText = string.Join("", textBlocks.Select(t => t.Text));
+        Console.WriteLine($" Verse {verse.VerseNumber}: {verseText}");
+    }
+}
+```
+
+#### Merging Documents
+
+You can merge multiple USFM documents together:
+
+```csharp
+USFMDocument document1 = parser.ParseFromString(content1);
+USFMDocument document2 = parser.ParseFromString(content2);
+
+// Merge document2 into document1
+document1.Insert(document2);
+
+// Or insert individual markers
+Marker marker = new PMarker();
+document1.Insert(marker);
+
+// Insert multiple markers at once
+document1.InsertMultiple(listOfMarkers);
 ```
 
+## Performance Tips
+
+- **Reuse Parser Instances**: Create one parser and reuse it for multiple documents
+- **Use Specific Queries**: Use `GetChildMarkers<T>()` instead of traversing all markers
+- **Batch Hierarchy Queries**: If you need to get the hierarchy to multiple markers, use `GetHierachyToMultipleMarkers()` instead of calling `GetHierarchyToMarker()` in a loop
+
 # Renderers
 
 ## HTMLRenderer
@@ -74,3 +231,32 @@ For more information, please look into the [repository](https://github.com/Wycli
 ## JSONRenderer
 For more information, please look into the [repository](https://github.com/WycliffeAssociates/USFMToolsSharp.Renderers.JSON).
 > JSON Renderer for USFM
+
+# Practical Examples
+
+## Extract Verse Text
+
+```csharp
+var chapter1 = document.GetChildMarkers<CMarker>().FirstOrDefault(c => c.Number == 1);
+var verses = chapter1?.GetChildMarkers<VMarker>() ?? new List<VMarker>();
+foreach (var verse in verses)
+{
+    var text = string.Join("", verse.Contents.OfType<TextBlock>().Select(t => t.Text));
+    Console.WriteLine($"{verse.VerseNumber}. {text.Trim()}");
+}
+```
+
+## Process Footnotes
+
+```csharp
+var verses = document.GetChildMarkers<VMarker>();
+foreach (var verse in verses)
+{
+    var footnotes = verse.GetChildMarkers<FMarker>();
+    foreach (var footnote in footnotes)
+    {
+        var refMarker = footnote.Contents.OfType<FRMarker>().FirstOrDefault();
+        Console.WriteLine($"Verse {verse.VerseNumber} has footnote: {refMarker?.VerseReference}");
+    }
+}
+```
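
The "Reuse Parser Instances" tip in this change has no accompanying example, so here is a minimal sketch of what it might look like in practice. It is not part of the patch. The class name `ParserReuseExample` and the in-memory sample strings are hypothetical, and the generic arguments assume the `CMarker` and `VMarker` types from `USFMToolsSharp.Models.Markers`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using USFMToolsSharp;
using USFMToolsSharp.Models.Markers;

// Hypothetical example class; not part of USFMToolsSharp.
public static class ParserReuseExample
{
    public static void Main()
    {
        // Hypothetical in-memory inputs; real code would likely read .usfm files.
        var sources = new Dictionary<string, string>
        {
            ["GEN"] = "\\id GEN\n\\c 1\n\\v 1 In the beginning God created the heavens and the earth.\n\\v 2 The earth was without form and void.",
            ["EXO"] = "\\id EXO\n\\c 1\n\\v 1 These are the names of the sons of Israel."
        };

        // One parser instance, created once and reused for every document.
        var parser = new USFMParser();

        foreach (var entry in sources)
        {
            USFMDocument document = parser.ParseFromString(entry.Value);

            // Typed queries instead of walking document.Contents by hand.
            int chapters = document.GetChildMarkers<CMarker>().Count();
            int verses = document.GetChildMarkers<VMarker>().Count();

            Console.WriteLine($"{entry.Key}: {chapters} chapter(s), {verses} verse(s)");
        }
    }
}
```

The sketch only exercises calls that already appear in the README examples (`ParseFromString` and `GetChildMarkers<T>()`). It says nothing about thread safety, so sharing one parser across threads would need to be checked separately.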