<!DOCTYPE html>
<html lang="en"><head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1"><!-- Begin Jekyll SEO tag v2.8.0 -->
<title>Research | Algorithmic Bioinformatics</title>
<meta name="generator" content="Jekyll v4.1.1" />
<meta property="og:title" content="Research" />
<meta property="og:locale" content="en_US" />
<link rel="canonical" href="https://username.github.io/research.html" />
<meta property="og:url" content="https://username.github.io/research.html" />
<meta property="og:site_name" content="Algorithmic Bioinformatics" />
<meta property="og:type" content="website" />
<meta name="twitter:card" content="summary" />
<meta property="twitter:title" content="Research" />
<script type="application/ld+json">
{"@context":"https://schema.org","@type":"WebPage","headline":"Research","url":"https://username.github.io/research.html"}</script>
<!-- End Jekyll SEO tag -->
<link rel="stylesheet" href="/assets/main.css"><link type="application/atom+xml" rel="alternate" href="https://username.github.io/feed.xml" title="Algorithmic Bioinformatics" /></head>
<body><header class="site-header" role="banner">

  <div class="wrapper"><a class="site-title" rel="author" href="/">Algorithmic Bioinformatics</a><nav class="site-nav">
        <input type="checkbox" id="nav-trigger" class="nav-trigger" />
        <label for="nav-trigger">
          <span class="menu-icon">
            <svg viewBox="0 0 18 15" width="18px" height="15px">
              <path d="M18,1.484c0,0.82-0.665,1.484-1.484,1.484H1.484C0.665,2.969,0,2.304,0,1.484l0,0C0,0.665,0.665,0,1.484,0 h15.032C17.335,0,18,0.665,18,1.484L18,1.484z M18,7.516C18,8.335,17.335,9,16.516,9H1.484C0.665,9,0,8.335,0,7.516l0,0 c0-0.82,0.665-1.484,1.484-1.484h15.032C17.335,6.031,18,6.696,18,7.516L18,7.516z M18,13.516C18,14.335,17.335,15,16.516,15H1.484 C0.665,15,0,14.335,0,13.516l0,0c0-0.82,0.665-1.483,1.484-1.483h15.032C17.335,12.031,18,12.695,18,13.516L18,13.516z"/>
            </svg>
          </span>
        </label>

        <div class="trigger"><a class="page-link" href="/research.html">Research</a><a class="page-link" href="/people.html">People</a><a class="page-link" href="/publications.html">Publications</a><a class="page-link" href="/contact.html">Contact</a></div>
      </nav></div>
</header>
<main class="page-content" aria-label="Content">
      <div class="wrapper">
        <article class="post">

  <header class="post-header">
    <h1 class="post-title">Research</h1>
  </header>

  <div class="post-content">
    <p>We are a purely computational lab developing new algorithms for solving bioinformatics problems on sequence data.
The problems we address include read alignment, variant detection &amp; genotyping, genome assembly, whole-genome alignment and more.
We implement our algorithms in new software tools and apply them to sequence data to gain new insights into human genetics and, more recently, the immune system.</p>

<p><strong>Some of our ongoing projects address:</strong></p>

<ul>
  <li>Structural variant detection in tens of thousands of genomes simultaneously  (<a href="https://github.com/kehrlab/PopDel">PopDel</a>)</li>
  <li>Non-reference sequence found in a subset of genomes in a population  (<a href="https://github.com/kehrlab/PopIns2">PopIns2</a>)</li>
  <li>Linked read data analysis: Barcode correction, Barcode mapping, Local assembly, and SV detection  (<a href="https://github.com/kehrlab/bcmap">bcmap</a>)</li>
  <li>Interactive, exploratory workflows for genome analysis</li>
  <li>Cancer genomics: Rearrangements in neuroblastoma genomes</li>
</ul>

<div style="background-color: white; padding-left: 10px; padding-right: 10px; padding-top: 2px">

  <h3 style="margin-bottom: 0px">Structural variant detection</h3>
  <h4>&emsp;&emsp;&emsp;in tens of thousands of genomes simultaneously</h4>

  <table class="project-table">
  <tbody>
  <tr>


    <td class="project-description-cell">

      <p>Our software tool <a href="https://github.com/kehrlab/PopDel">PopDel</a> can simultaneously analyze tens of thousands of (short-read sequenced) genomes to reliably detect and accurately genotype structural variants, that is, differences between genomes that affect at least 50 bp of DNA sequence.
The program currently focuses on deletions, but we are extending PopDel to other types of structural variants, including inversions, duplications, and translocations.</p>

<p>Using PopDel, we previously identified a rare deletion in the <em>LDLR</em> gene that causes extremely low levels of LDL cholesterol in the blood (<strong><a href="https://www.ahajournals.org/doi/10.1161/CIRCGEN.120.003029">Björnsson E et al. 2021, Circulation: Genomic and Precision Medicine</a></strong>).
Its superior scalability, high accuracy, fast running time, and ease of use make PopDel an attractive alternative to previous approaches.
At the core of PopDel is a space-efficient (binary) read pair profile format and a structural variant detection algorithm that is based on a likelihood ratio test.</p>
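<p>To illustrate the general idea of a likelihood ratio test on read pair data, here is a minimal sketch. It is not PopDel's actual model: the normal insert size distribution, the function names, and the restriction to a homozygous deletion of known length are all assumptions made for illustration.</p>

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution (assumed insert size model)."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def deletion_log_likelihood_ratio(insert_sizes, mean, sd, deletion_length):
    """Compare 'deletion present' against 'no deletion' for one window.

    Hypothetical simplification: read pairs spanning a deletion appear to
    have an insert size that is larger by the deletion length.
    """
    ll_reference = sum(normal_logpdf(x, mean, sd) for x in insert_sizes)
    ll_deletion = sum(
        normal_logpdf(x, mean + deletion_length, sd) for x in insert_sizes
    )
    # Positive values favor the deletion hypothesis.
    return 2.0 * (ll_deletion - ll_reference)


if __name__ == "__main__":
    # Illustrative numbers only: library mean 500 bp, sd 50 bp, 300 bp deletion.
    print(deletion_log_likelihood_ratio([790, 805, 810], 500.0, 50.0, 300))
```

<p>In this sketch, a positive statistic means the observed insert sizes are better explained by the shifted distribution than by the reference distribution.</p>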


      <p>More details in <b><a href="https://www.nature.com/articles/s41467-020-20850-5">Niehus S et al. 2021, Nature Communications</a></b></p>


      <p><b>Code:</b> <a href="https://github.com/kehrlab/PopDel">https://github.com/kehrlab/PopDel</a></p>


      <p style="font-size:small; font-weight: bold; color: gray">Funded by the BMBF Computational Life Sciences Initiative</p></td>


    <td class="project-figure-cell">
      <img src="assets/images/projects/popdel.png" style="max-width:250px;" alt="" />
    </td>

  </tr>

  </tbody>
  </table>
</div>

<div style="background-color: white; padding-left: 10px; padding-right: 10px; padding-top: 2px">

  <h3 style="margin-bottom: 0px">Non-reference sequence</h3>
  <h4>&emsp;&emsp;&emsp;found in a subset of genomes in a population</h4>

  <table class="project-table">
  <tbody>
  <tr>


    <td class="project-description-cell">

      <p>Our software tool <a href="https://github.com/kehrlab/PopIns2">PopIns2</a> (successor of <a href="https://github.com/bkehr/PopIns">PopIns</a>) identifies a type of genomic structural variant that involves non-repetitive sequence not found in the reference genome.
We call these variants “non-reference sequence variants”, or NRS variants for short.
We previously showed that the majority of human non-reference sequence is ancestral (and not newly inserted) and described an association of an NRS variant in the <em>SREBF1</em> gene with myocardial infarction (<strong><a href="http://rdcu.be/pDbJ">Kehr et al. 2017, Nature Genetics</a></strong>).</p>

<p>The detection of NRS variants from short read data is particularly challenging as it inevitably involves a de novo assembly of the non-reference sequence.
We combine data from many individuals simultaneously for reliable NRS assembly.
PopIns2 realizes this by representing non-reference sequence data in colored de Bruijn graphs.</p>


      <p>More details in <b><a href="https://doi.org/10.1093/bioinformatics/btab749">Krannich T et al. 2021, Bioinformatics</a></b></p>


      <div align="middle">
        <img src="assets/images/projects/NRNRs.png" style="max-width:100%;" alt="Examples of structural variants called with PopIns in WGS data of 15,219 Icelanders." />
      </div>

      <div class="figure-caption">
        <p>
        Examples of structural variants called with PopIns in WGS data of 15,219 Icelanders.


        <br />
        from <a href="https://www.nature.com/articles/ng.3801">Kehr et al. 2017, Nature Genetics</a>

        </p>
      </div>


      <p><b>Code:</b> <a href="https://github.com/kehrlab/PopIns2">https://github.com/kehrlab/PopIns2</a></p>


      </td>


  </tr>

  </tbody>
  </table>
</div>

<div style="background-color: white; padding-left: 10px; padding-right: 10px; padding-top: 2px">

  <h3 style="margin-bottom: 0px">Linked read data analysis:</h3>
  <h4>&emsp;&emsp;&emsp;Barcode correction, Barcode mapping, Local assembly, and SV detection</h4>

  <table class="project-table">
  <tbody>
  <tr>


    <td class="project-description-cell">

      <p>Our software tools bcmap and bcctools address the analysis of linked read sequencing data.
Linked read data provides long-range information through barcode labels on accurate short reads, i.e., all reads labeled with the same barcode originate from a small set of long DNA molecules.</p>

<p>Our software tool <a href="https://github.com/kehrlab/bcmap">bcmap</a> efficiently determines genomic intervals of long DNA molecules.
We refer to the addressed problem as “barcode mapping”.
The output of bcmap enables efficient retrieval of reads from genomic regions of interest without the need to compute a full read alignment.
Our barcode mapping approach is significantly faster than read alignment.
Bcmap uses an open-addressing k-mer index and minimizers for efficiently determining genomic intervals of sets of reads labeled with the same barcode.</p>
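<p>The minimizer idea can be sketched in a few lines: from every window of w consecutive k-mers, only the smallest one is kept, so similar sequences tend to select the same k-mers while far fewer k-mers need to be indexed. The sketch below is an assumption-laden illustration and not bcmap's implementation: it orders k-mers lexicographically and uses a hypothetical function name.</p>

```python
def minimizers(seq, k, w):
    """Return sorted (position, k-mer) pairs of window minimizers.

    Assumption for illustration: k-mers are compared lexicographically and
    ties are broken by the leftmost position in the window.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        offset = min(range(w), key=lambda j: (window[j], j))
        picked.add((start + offset, window[offset]))
    return sorted(picked)


if __name__ == "__main__":
    print(minimizers("ACGTACGT", 3, 2))
```

<p>Adjacent windows often share their minimizer, which is why the selected set is smaller than the set of all k-mers.</p>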

<p><a href="https://github.com/kehrlab/bcctools">Bcctools</a> is a toolbox for pre-processing linked read data.
It trims barcodes from the reads, infers a whitelist of barcodes, and implements an efficient index data structure for retrieving corrected barcode sequences in constant time.
Pre-processing linked-read data with bcctools is several times faster than with LongRanger.</p>
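<p>One simple way to obtain constant-time barcode correction is to precompute a lookup table of all sequences within one mismatch of a whitelisted barcode. The sketch below illustrates this idea under stated assumptions and does not describe the index that bcctools actually uses: the function name, the one-mismatch limit, and the rule of discarding ambiguous sequences are hypothetical.</p>

```python
def build_correction_index(whitelist):
    """Map observed barcodes to whitelisted barcodes (at most one mismatch).

    Sequences that are one mismatch away from two or more whitelisted
    barcodes are treated as ambiguous and left out of the index.
    """
    whitelist = set(whitelist)
    index = {barcode: barcode for barcode in whitelist}
    ambiguous = set()
    for barcode in whitelist:
        for pos in range(len(barcode)):
            for base in "ACGT":
                if base == barcode[pos]:
                    continue
                neighbor = barcode[:pos] + base + barcode[pos + 1:]
                if neighbor in whitelist:
                    continue  # exact whitelist matches take precedence
                if neighbor in index and index[neighbor] != barcode:
                    ambiguous.add(neighbor)
                index[neighbor] = barcode
    for neighbor in ambiguous:
        del index[neighbor]
    return index


if __name__ == "__main__":
    index = build_correction_index(["AAAA", "CCCC"])
    # A dictionary lookup corrects a barcode in expected constant time.
    print(index.get("AAAT"), index.get("GGGG"))
```

<p>The table trades memory for speed: every correction is a single dictionary lookup, at the cost of storing all one-mismatch neighbors.</p>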


      <div align="middle">
        <img src="assets/images/projects/linkedReads.png" style="max-width:100%;" alt="Linked reads are short reads with barcode labels that provide information about longer DNA molecules." />
      </div>

      <div class="figure-caption">
        <p>
        Linked reads are short reads with barcode labels that provide information about longer DNA molecules.


        <br />
        from <a href="https://www.degruyter.com/document/doi/10.1515/medgen-2021-2072/html">Schwarz et al. 2021, Medizinische Genetik</a>

        </p>
      </div>


      <p><b>Code:</b> <a href="https://github.com/kehrlab/bcmap">https://github.com/kehrlab/bcmap</a></p>


      <p style="font-size:small; font-weight: bold; color: gray">Funded by the DFG, FOR 2841: "Beyond the exome", Project P3
</p></td>


  </tr>

  </tbody>
  </table>
</div>

<div style="background-color: white; padding-left: 10px; padding-right: 10px; padding-top: 2px">

  <h3 style="margin-bottom: 0px">Interactive, exploratory workflows</h3>
  <h4>&emsp;&emsp;&emsp;for genome analysis</h4>

  <table class="project-table">
  <tbody>
  <tr>


    <td class="project-description-cell">

      <p>In collaboration with the <a href="https://www.matthiasweidlich.com/">Weidlich lab</a> at Humboldt-Universität zu Berlin, we develop interactive workflows that support the exploration of genomic data.
Developing a workflow, such as those implemented for PopDel and PopIns2, is usually a dynamic process where various setups and alternatives are tested over a period of time.
With the workflows developed in this project, we simplify and systematically document this process while, at the same time, allowing and tracking user intervention during workflow execution.</p>


      <div align="middle">
        <img src="assets/images/projects/workflow.png" style="max-width:100%;" alt="" />
      </div>

      <div class="figure-caption">
        <p>


        </p>
      </div>


      <p style="font-size:small; font-weight: bold; color: gray">Funded by the DFG, SFB 1404: "FONDA – Foundations of Workflows for Large-Scale Scientific Data Analysis", Project A6
</p></td>


  </tr>

  </tbody>
  </table>
</div>

<div style="background-color: white; padding-left: 10px; padding-right: 10px; padding-top: 2px">

  <h3 style="margin-bottom: 0px">Cancer genomics:</h3>
  <h4>&emsp;&emsp;&emsp;Rearrangements in neuroblastoma genomes</h4>

  <table class="project-table">
  <tbody>
  <tr>


    <td class="project-description-cell">

      <p>In collaboration with the pediatric oncology group led by Prof. Dr. Johannes Schulte at Charité, we study rearrangements in neuroblastoma genomes.
Neuroblastoma is a pediatric tumor affecting the sympathetic nervous system.
It is a model cancer whose high-risk cases are predominantly driven by copy number variation.</p>


      <p style="font-size:small; font-weight: bold; color: gray">Funded by the DFG, GRK 2424: "CompCancer – Computational Methods for Oncology: Towards Personalized Therapies in Cancer"
</p></td>


  </tr>

  </tbody>
  </table>
</div>


  </div>

</article>

      </div>
    </main><footer class="site-footer h-card">
  <data class="u-url" href="/"></data>

  <div class="wrapper">

<!--    <h2 class="footer-heading">Algorithmic Bioinformatics</h2>   -->

    <div class="footer-col-wrapper">
      <div class="footer-col footer-col-1"><ul class="social-media-list"><li><a href="https://github.com/kehrlab"><svg class="svg-icon"><use xlink:href="/assets/minima-social-icons.svg#github"></use></svg> <span class="username">kehrlab</span></a></li></ul>
</div>

      <div class="footer-col footer-col-3">
        <p></p>
      </div>
    </div>

  </div>

</footer>
</body>

</html>