README.md: 3 additions & 3 deletions
@@ -2,10 +2,10 @@
### Documentation
-Refer to [this documentation](https://csb5-page.github.io/documentation/) for guidelines on contributing to the website, including how to add blog posts and people, and how to preview the website before deployment.
+Refer to [this documentation](https://csb5.github.io/documentation/) for guidelines on contributing to the website, including how to add blog posts and people, and how to preview the website before deployment.
-If you are a website developer, refer to [this documentation for developers](https://csb5-page.github.io/documentation_developer/) on steps to update content/styles of the pages.
+If you are a website developer, refer to [this documentation for developers](https://csb5.github.io/documentation_developer/) on steps to update content/styles of the pages.
### Blog post elements
-Refer to [this page](https://csb5-page.github.io/elements/) for examples on how to format the text, images and videos in the blog posts.
+Refer to [this page](https://csb5.github.io/elements/) for examples on how to format the text, images and videos in the blog posts.
Taha did an internship with us in late 2025, where he worked with Rafael on adaptive sampling and machine learning methods to improve viral classification.
-Given a set of sequenced reads, how to determine if the sequencing run is good or not? Finding the quality of the reads, which includes estimating sequencing error rates and bias, has been an important first step in numerous Bioinformatics pipelines.
+Given a set of sequenced reads, how can we determine if the sequencing run is good or not? Finding the quality of the reads, which includes estimating sequencing error rates and bias, has been an important first step in numerous Bioinformatics pipelines.
-Previous ways of estimating sequencing error rates include **mapping the reads** to reference genomes and inferring error rates from **Phred quality scores**. Unfortunately, the reference genomes may be missing or different from the genomes that are actually sequenced, especially in metagenomic samples. On the other hand, Phred quality scores can produce biased estimates if they are uncalibrated.
+Previous ways of estimating sequencing error rates include **mapping the reads** to reference genomes and inferring error rates from **Phred quality scores**. Reference genomes may however be missing or different from the genomes that are actually sequenced, especially in metagenomic samples. On the other hand, Phred quality scores can produce biased estimates if they are uncalibrated.
-We therefore propose a new framework of estimating sequencing error and bias, called *skiver*, which works without the need for reference genome or relying on Phred scores.
+We therefore propose a new framework for estimating sequencing error and bias, called *skiver*, which works without the need for reference genome or relying on Phred scores.
-The key ideas of *skiver* is to use (*k*, *v*)-mer sketches to represent the large amount of sequencing reads. A (*k*, *v*)-mer is a segment of length *k*+*v*, where the first *k* bases are the *key* and the last *v* bases are the *value*. By grouping the (*k*, *v*)-mers with the same key together, we can identify the consensus value, as well as estimate the frequency of sequencing errors.
+The key ideas in *skiver* is to use (*k*, *v*)-mer sketches to represent the large amount of sequencing reads. A (*k*, *v*)-mer is a segment of length *k*+*v*, where the first *k* bases are the *key* and the last *v* bases are the *value*. By grouping the (*k*, *v*)-mers with the same key together, we can identify the consensus value, as well as estimate the frequency of sequencing errors.
-Experiments on various real datasets show that skiver is able to accurately estimate the sequencing error rate and infer the percentage of *k*-mers in the read set that are free of sequencing errors. In addition, skiver can estimate the substitution, insertion, and deletion rates, revealing the bias of various sequencing platforms.
+Experiments on various real datasets show that skiver is able to accurately estimate the sequencing error rate and infer the percentage of *k*-mers in the read set that are free of sequencing errors. In addition, skiver can estimate the substitution, insertion, and deletion rates, revealing the biases of various sequencing platforms.
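
For readers of the skiver post changed above, the (*k*, *v*)-mer grouping it describes can be illustrated with a small Python sketch. This is not skiver's implementation: the function names (`kv_mers`, `group_by_key`, `non_consensus_fraction`) and the `min_count` parameter are hypothetical, no sketching (subsampling of keys) is performed, and the fraction it returns is only a naive proxy for how often a value disagrees with its key's consensus, not a calibrated per-base error rate.

```python
from collections import Counter, defaultdict


def kv_mers(read, k, v):
    """Yield (key, value) pairs for every window of length k + v in the read.

    The first k bases of the window are the key, the last v bases the value.
    """
    for i in range(len(read) - (k + v) + 1):
        yield read[i:i + k], read[i + k:i + k + v]


def group_by_key(reads, k, v):
    """Count how often each value follows each key across all reads."""
    groups = defaultdict(Counter)
    for read in reads:
        for key, value in kv_mers(read, k, v):
            groups[key][value] += 1
    return groups


def non_consensus_fraction(groups, min_count=2):
    """Fraction of (k, v)-mers whose value differs from their key's consensus.

    Keys seen fewer than min_count times are skipped, since a consensus
    cannot be meaningfully defined for them (an assumption of this sketch).
    """
    total = 0
    off_consensus = 0
    for counts in groups.values():
        n = sum(counts.values())
        if n < min_count:
            continue
        consensus_count = counts.most_common(1)[0][1]
        total += n
        off_consensus += n - consensus_count
    return off_consensus / total if total else 0.0


# Three identical reads plus one read with a single substituted base.
reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGTTC"]
groups = group_by_key(reads, k=3, v=2)
print(groups["ACG"].most_common(1)[0][0])  # consensus value for key ACG: TA
print(non_consensus_fraction(groups))      # 0.25
```

In this toy input the single substitution affects both windows of the fourth read, so 2 of the 8 (*k*, *v*)-mers disagree with their key's consensus. A real estimator would also need to account for keys that occur at several genomic positions and for one error touching several overlapping windows.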