This isn't really an issue so much as a question. First off, I want to say thanks for posting this tutorial. I'm analyzing some Tnseq data for the first time and your pipeline has been incredibly helpful to me. One other thing I'm trying to accomplish is to compute the saturation of Tn insert sites (% of TA sites in the reference that are covered).
In my initial calculation, I took the number of rows in the hits.txt file and divided it by the number of TAs in the reference genome, which I had counted with a custom script. One mistake I made (at least I'm pretty sure it was a mistake) was that I also counted the ATs. I'm used to thinking of TA and AT as pretty much the same thing, since TA on one strand is AT on the other. After reviewing your diagrams here, though, I believe this was the wrong move and gave a much lower saturation than it should have (there are many more ATs than TAs in my reference sequence).
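A minimal Python sketch of the calculation described above, counting only TAs. The function names are hypothetical and not part of the tutorial's pipeline, and the sketch assumes each row of hits.txt corresponds to one distinct TA site with at least one insertion, which may not match the actual file layout. The reverse-complement helper is included only to show that, read 5'→3', TA on one strand is also TA on the other, so counting TAs on the forward strand alone covers every site once.

```python
# Sketch only: helper names are hypothetical, not taken from the tutorial.

# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, read 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_ta_sites(sequence: str) -> int:
    """Count TA dinucleotides on the forward strand of a linear sequence.

    Two TA occurrences can never overlap (T != A), so the non-overlapping
    str.count gives the exact number. A TA spanning the origin of a
    circular genome is not counted here.
    """
    return sequence.upper().count("TA")


def percent_saturation(n_hit_sites: int, n_ta_sites: int) -> float:
    """Percentage of TA sites that carry at least one insertion.

    n_hit_sites is assumed to be the number of distinct TA sites with
    insertions (e.g. the row count of hits.txt, if that is what each row
    represents).
    """
    if n_ta_sites == 0:
        raise ValueError("reference contains no TA sites")
    return 100.0 * n_hit_sites / n_ta_sites


if __name__ == "__main__":
    reference = "GGTACCTATAATCG"  # toy sequence, not real data
    total = count_ta_sites(reference)
    print(f"TA sites: {total}")
    print(f"saturation: {percent_saturation(2, total):.1f}%")
```

Because TA and AT are each their own reverse complement, AT sites are a separate set of positions rather than the other-strand view of TA sites, which is why including them inflates the denominator.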
I was wondering if you could confirm my suspicion that ATs shouldn't be counted or, if I'm going about this the wrong way entirely, recommend a method for computing % saturation.
Thanks so much, and thanks again for the great tutorial.