Questions about the tvr_consensus in tlens_by_allele.tsv file #33

Hello Zach,
Thank you for the wonderful software; I am using it on Schmidt’s data. In other discussions you suggested matching the read IDs in the supporting reads column of `tlens_by_allele.tsv` against the FASTA records in `tel_reads.fa.gz` and pulling out the matching sequences.
I have a question about this, because there is more than one read ID in the supporting reads column of the TSV file. I extracted the corresponding sequences, and they are not identical, yet they are still clustered together. May I know why?
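For reference, the extraction step described above can be sketched roughly as follows. This is a minimal sketch, not Telogator's own code: it assumes the TSV column is named `supporting_reads`, that read IDs in it are comma-separated, and that each ID equals the first whitespace-delimited token of a FASTA header. The helper names (`open_text`, `read_fasta`, `supporting_read_ids`, `extract_reads`) are hypothetical, and the demo data is made up.

```python
import csv
import gzip
import io


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "r")


def read_fasta(handle):
    """Yield (read_id, sequence) pairs from an open FASTA text handle."""
    read_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, "".join(chunks)
            # Assumption: the ID is the first whitespace-delimited header token.
            read_id, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if read_id is not None:
        yield read_id, "".join(chunks)


def supporting_read_ids(tsv_handle, column="supporting_reads", sep=","):
    """Return one list of read IDs per TSV row (column name and separator are assumed)."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    return [[rid for rid in row[column].split(sep) if rid] for row in reader]


def extract_reads(fasta_handle, wanted_ids):
    """Return {read_id: sequence} for the FASTA records whose ID is wanted."""
    wanted = set(wanted_ids)
    return {rid: seq for rid, seq in read_fasta(fasta_handle) if rid in wanted}


# Tiny in-memory demo with made-up read IDs and sequences (not real output).
demo_tsv = io.StringIO("allele_id\tsupporting_reads\n0\tread_a,read_c\n")
demo_fa = io.StringIO(">read_a\nTTAGGG\nTTAGGG\n>read_b\nACGT\n>read_c\nTTAGGGTTGGGG\n")
demo_ids = supporting_read_ids(demo_tsv)[0]
demo_reads = extract_reads(demo_fa, demo_ids)
```

With real files, the same functions would be called on handles from `open_text("tlens_by_allele.tsv")` and `open_text("tel_reads.fa.gz")`.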
I got the following result from the `tvr_consensus` in the tsv file:

> CCCCCCAAAAAAAAVVVVVVVVCCCCCCCGGGGGCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCNNNNNNNNNNCCCNCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCTTTTTTTCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCTTTTTTTTTTTTTCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFCFFFFFFFFFFFFFFCFFFCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCVVVCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCTTTTTTTTTTTTCCCCCCCCCCCCCCCCCCCTTTTTTTTTTTTTTTTTTTVVVVVVVCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCVVVVVVCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCVCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCTTTTTTTTTTTTTCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHVVVCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCLLLLLLLCCCCCCCCCCCCCCCCLLLLLLCCCCCCCCLLLCCCCCCCLLLLLLLLLLLLLLLLLLLCCCCMCLLLLLLLLLLLLLLLLLLLCCCLLLLLLLLLLLLCVVCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCTTTTTTTTTTTTTTTTTTTTTT

The canonical motif sequence TTAGGG is represented by the symbol C. Does each C in the `tvr_consensus` string above represent that symbol, or the base C from the usual A/C/G/T nitrogenous bases of DNA?
Do the colours in `all_final_alleles.png` correspond to the symbols in the `tvr_consensus` string?
I matched the reads in `tel_reads.fa.gz` against the ‘supporting reads’ column in `tlens_by_allele.tsv`, then reduced each sequence by replacing the sequence motifs with their respective symbols. However, my result differs from the `tvr_consensus` string in the TSV file. Is there an explanation for the difference?
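The reduction described above can be sketched as a simple greedy scan. This is only an illustration under stated assumptions, not how Telogator builds `tvr_consensus`: it emits one symbol per matched motif and a placeholder `N` for each unmatched base, it does not handle reverse-complement reads, and `encode_motifs` is a hypothetical helper. The only mapping taken from this thread is C = TTAGGG; any other motif-to-symbol pair would have to come from the tool's own definitions.

```python
def encode_motifs(seq, motif_to_symbol, unknown="N"):
    """Greedy left-to-right scan: one symbol per matched motif,
    `unknown` for each base at which no known motif starts."""
    # Try longer motifs first so a short motif cannot shadow a longer one.
    motifs = sorted(motif_to_symbol, key=len, reverse=True)
    seq = seq.upper()
    out = []
    i = 0
    while i < len(seq):
        for motif in motifs:
            if seq.startswith(motif, i):
                out.append(motif_to_symbol[motif])
                i += len(motif)
                break
        else:
            # No motif starts here: mark the base and move on by one.
            out.append(unknown)
            i += 1
    return "".join(out)


# Only C = TTAGGG is taken from the thread; the input sequence is made up.
example = encode_motifs("TTAGGGTTAGGGAC", {"TTAGGG": "C"})  # "CCNN"
```

A string produced this way has one character per motif, so it may not line up with a consensus that uses a different convention (for example one character per base, or a consensus built across several reads).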

Looking forward to your clarification!
Thank you.
Flora

