Small FASTA/FASTQ I/O for Crystal.
Add this to shard.yml:

dependencies:
  fastx:
    github: bio-cr/fastx.cr

Then run:

shards install

- Read FASTA
- Read FASTQ
- Write FASTA
- Write FASTQ
- Auto-handle gzip when the path ends with .gz
- Stream large files with low-allocation readers

require "fastx"

Fastx::Fasta::Reader.open("reads.fa.gz") do |reader|
  reader.each do |header, sequence|
    puts "#{header}\t#{sequence.bytesize}"
  end
end

Use #each_copy if you need to keep records after the current iteration:

Fastx::Fasta::Reader.open("reads.fa") do |reader|
  reader.each_copy do |header, sequence|
    stored_header = header
    stored_sequence = sequence
  end
end

Fastx::Fastq::Reader.open("reads.fq.gz") do |reader|
  reader.each do |identifier, sequence, quality|
    puts "#{identifier}\t#{sequence.bytesize}\t#{quality.bytesize}"
  end
end

Use #each_copy if you need safe String copies:

Fastx::Fastq::Reader.open("reads.fq") do |reader|
  reader.each_copy do |identifier, sequence, quality|
    saved_id = identifier
    saved_sequence = sequence
    saved_quality = quality
  end
end

Fastx::Fasta::Writer.open("out.fa", line_width: 80) do |writer|
  writer.write("seq1", "ACGTACGT")
  writer.write("seq2", "TTTTCCCC")
end

Set line_width: nil to write one sequence line per record.

Fastx::Fastq::Writer.open("out.fq.gz") do |writer|
  writer.write("seq1", "ACGT", "!!!!")
  writer.write("seq2", "TTTT", "####")
end

Fastx::Fastq::Writer raises ArgumentError if sequence and quality lengths differ.
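
The length check can be handled with an ordinary rescue. The following is a minimal sketch, not the library's documented pattern: it assumes that Fastx::Fastq::Writer.new accepts an IO in the same way Fastx::Fasta::Writer.new does, and the record contents are made up for illustration.

```crystal
require "fastx"

# Assumption: Fastx::Fastq::Writer.new takes an IO, mirroring Fastx::Fasta::Writer.new.
io = IO::Memory.new
writer = Fastx::Fastq::Writer.new(io)

rejected = false
begin
  # Four bases but only three quality characters, so the lengths differ.
  writer.write("seq1", "ACGT", "!!!")
rescue ex : ArgumentError
  rejected = true
  STDERR.puts "record rejected: #{ex.message}"
end
```

Rescuing per record like this lets a caller skip or log a malformed record instead of aborting the whole run.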
Path-based APIs auto-detect gzip from the filename. IO-based APIs do not.
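
Since IO-based readers do not detect gzip, a compressed stream presumably has to be decompressed before it reaches the reader. The sketch below uses Compress::Gzip from the Crystal standard library for that; it assumes the reader accepts any IO subclass, not only IO::Memory, and the in-memory payload is a stand-in for a real compressed stream.

```crystal
require "compress/gzip"
require "fastx"

# Build a small gzip-compressed FASTQ payload in memory for the demo.
compressed = IO::Memory.new
Compress::Gzip::Writer.open(compressed) do |gz|
  gz << "@seq1\nACGT\n+\n!!!!\n"
end
compressed.rewind

# Decompress explicitly, then hand the plain-text stream to the reader.
ids = [] of String
Compress::Gzip::Reader.open(compressed) do |gz|
  reader = Fastx::Fastq::Reader.new(gz)
  reader.each_copy do |identifier, sequence, quality|
    ids << identifier
    puts "#{identifier}\t#{sequence}\t#{quality}"
  end
end
```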

io = IO::Memory.new("@seq1\nACGT\n+\n!!!!\n")
reader = Fastx::Fastq::Reader.new(io)
reader.each_copy do |identifier, sequence, quality|
  puts "#{identifier}\t#{sequence}\t#{quality}"
end

io = IO::Memory.new
writer = Fastx::Fasta::Writer.new(io, line_width: 4)
writer.write("seq1", "ACGTGG")
puts io.to_s

Fastx.open is a convenience API based on file extension:

Fastx.open("reads.fa") do |reader|
  reader.as(Fastx::Fasta::Reader).each do |header, sequence|
    puts "#{header}\t#{sequence.bytesize}"
  end
end

You can also pass the format explicitly:

Fastx.open("output", "w", Fastx::Format::FASTQ) do |writer|
  writer.as(Fastx::Fastq::Writer).write("seq1", "ACGT", "!!!!")
end

- Reader#each reuses internal buffers for performance.
- Values yielded by #each are only valid until the next iteration.
- Use #each_copy if you want values you can store safely.
- Readers are one-pass. Create a new reader to read again.
- Reader and writer instances are not thread-safe.
- Fastx::Fastq::Reader currently supports four-line FASTQ records only.
- Multi-line FASTQ is not supported.
- FASTQ reader and writer validate sequence/quality length equality.
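
Two of the notes above, storing values safely and the one-pass behavior, can be sketched together. This is an illustration under assumptions: it assumes Fastx::Fasta::Reader.new accepts an IO like Fastx::Fastq::Reader.new does and that #each_copy yields Strings for FASTA as well; FASTA_TEXT is a hypothetical input made up for the demo.

```crystal
require "fastx"

# Hypothetical in-memory input, so the sketch needs no file on disk.
FASTA_TEXT = ">seq1\nACGT\n>seq2\nTTTTCC\n"

# Assumption: Fastx::Fasta::Reader.new accepts an IO, like Fastx::Fastq::Reader.new.
lengths = {} of String => Int32
reader = Fastx::Fasta::Reader.new(IO::Memory.new(FASTA_TEXT))
reader.each_copy do |header, sequence|
  # each_copy yields copies, so the header can be kept as a Hash key.
  lengths[header] = sequence.bytesize
end

# Readers are one-pass: a second pass gets a fresh reader over fresh input.
count = 0
second = Fastx::Fasta::Reader.new(IO::Memory.new(FASTA_TEXT))
second.each { |_header, _sequence| count += 1 }

puts "#{lengths.size} records stored, #{count} seen on the second pass"
```

The second loop only counts records and keeps nothing, so the buffer-reusing #each is enough there.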

encoded = Fastx.encode_bases("AcGtNxyz")
decoded = Fastx.decode_bases(encoded)

Unknown bases are normalized to N by default. Use strict: true to raise:

Fastx.normalize_base('X'.ord.to_u8, strict: true)

scores = Fastx.encode_phred("IIIIHGF")
quality = Fastx.decode_phred(scores)

MIT License