---
layout: default
---
<h1 id="introduction">Introduction</h1>
<h2 id="goals-of-the-query-ui">Goals of the Query UI</h2>
<p>
This tool will help you translate your research questions into Timur queries,
which you can run in your browser to generate a data frame. Most research
questions begin with a root question and a set of desired data points, as well
as some filtering criteria. For example, "I’m curious about patients with
positive disease status, under the age of 50. In particular, I also want to
know their sex at birth and IL-6 levels." Once you receive a data frame with
this information, you can then run various analyses on your own computer to
explore relationships among the data. In this document we’ll discuss:
</p>
<ol>
<li>
<p>
The thought process behind converting your research question into a
question that fits the Timur models.
</p>
</li>
<li><p>How to input that question into the Query UI.</p></li>
<li>
<p>
How to make sure you have a "flat" data frame, which will make your
subsequent analyses simpler.
</p>
</li>
</ol>
<p>
We’ll start by showing how to use the Map view to understand a project’s
models, then walk through the overall Query UI, and finally step through some
concrete examples that show how to get from a research question to a data
frame.
</p>
<h2 id="section:map">Map and Model Overview</h2>
<p>
Once you navigate to your project, you can open the Timur Map view by clicking
the Map link in the top navigation bar.
</p>
<figure>
<img
src="/assets/images/timur/map/NavigationBar_Map.png"
id="fig:navigation_bar_map"
alt="A project’s Map link"
/>
<figcaption aria-hidden="true">A project’s Map link</figcaption>
</figure>
<p>
Clicking it brings up a graphical representation of the project’s models,
along with a table of the selected model’s attributes. By default, the Project
model is selected.
</p>
<figure>
<img
src="/assets/images/timur/map/ModelView.png"
id="fig:ipi_map"
alt="Map for the IPI project"
/>
<figcaption aria-hidden="true">Map for the IPI project</figcaption>
</figure>
<p>
We won’t describe the entire Map view here, only the portions relevant to
constructing queries. Each box in the map represents a specific model, and
each attribute represents some aspect of that model. You can think of a model
as a high-level concept (e.g., bulk RNA sequencing), and the attributes as
specific details within that concept (e.g., Eisenberg score). Within the Timur
database, each model contains individual records: concrete data points, such
as an individual RNASeq library or a CyTOF tube, each identified by a unique
identifier string.
</p>
<h3 id="paths-between-models">Paths Between Models</h3>
<p>
Throughout this document we will discuss paths between models, and the
intervening models that appear along these paths. When you specify a filter or
column, the Query UI attempts to find the shortest path between the root model
and your filter or column model, where "shortest" means the fewest intervening
models between the starting point and the destination. You can see these paths
on the map. You do not have to select the path yourself, but you will want to
note the intervening models, because they may affect how you toggle the filter
and column settings in your query.
</p>
<p>
Let’s use the Patient subtree in Figure <a
href="#fig:ipi_map_patient_subtree"
data-reference-type="ref"
data-reference="fig:ipi_map_patient_subtree"
>1.3</a
> for some examples. If your root model is Patient and you want to add a
filter based on some Flow attribute, the path would be Patient -> Sample ->
Flow. If your root model is RnaSeq and you want to filter based on some
Demographic value, the path is RnaSeq -> Sample -> Patient -> Demographic.
Links are non-parent-child relationships between models; you can find them by
viewing a model’s attributes (links currently do not appear visually on the
graph). Paths may also traverse these links if they provide a shorter route
between two models.
</p>
<figure>
<img
src="/assets/images/timur/map/Map_PatientSubtree.png"
id="fig:ipi_map_patient_subtree"
alt="Patient Subtree"
/>
<figcaption aria-hidden="true">Patient Subtree</figcaption>
</figure>
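<p>
The Query UI finds these paths for you, but "fewest intervening models" is
exactly what a breadth-first search over the model graph computes. The Python
sketch below is illustrative only: the graph is a hypothetical, hand-written
subset of the IPI models (using snake_case model names), and it is not the
Query UI’s actual implementation.
</p>
<pre><code class="language-python">from collections import deque

# Hypothetical subset of the IPI model graph (parent-child edges only).
# Edges are treated as undirected, because a query path may travel both
# up and down the tree.
EDGES = [
    ("project", "patient"),
    ("patient", "sample"),
    ("patient", "demographic"),
    ("sample", "flow"),
    ("sample", "rna_seq"),
]


def build_graph(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search: return the path with the fewest models, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


graph = build_graph(EDGES)
print(" -> ".join(shortest_path(graph, "patient", "flow")))
# patient -> sample -> flow
print(" -> ".join(shortest_path(graph, "rna_seq", "demographic")))
# rna_seq -> sample -> patient -> demographic
</code></pre>
<p>
In this sketch, a link between two models would simply be one more edge in the
graph, which is how a link can provide a shorter path between two models.
</p>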
<h3 id="travel-directionality">Travel Directionality</h3>
<p>
You’ll notice that the map looks like a tree, with the project at the top. We
will therefore frequently use the terms "up the tree" and "down the tree" to
describe relationships between models. The direction of travel helps determine
whether a relationship is one-to-one or one-to-many, which in turn affects the
shape of your final data frame.
</p>
<h4 id="section:up-the-tree">Up the tree</h4>
<p>
Because of how the models are designed, every relationship that goes "up the
tree" is one-to-one: each record has exactly one parent. In a query, a
one-to-one relationship produces a single value in each data frame cell, which
makes analysis simpler.
</p>
<p>
For example, in Figure <a
href="#fig:ipi_map"
data-reference-type="ref"
data-reference="fig:ipi_map"
>1.2</a
>, we can see that Patient is above Sample in the tree. So going from Sample
to Patient means we’re moving up the tree, and each Sample belongs to one and
only one Patient. In a data frame, this might look like Table <a
href="#table:up-the-tree-example"
data-reference-type="ref"
data-reference="table:up-the-tree-example"
>1.1</a
>.
</p>
<div id="table:up-the-tree-example">
<table>
<caption>
Example data frame going up the tree
</caption>
<thead>
<tr class="header">
<th style="text-align: center">Sample</th>
<th style="text-align: center">Patient</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center">Patient001.T1</td>
<td style="text-align: center">Patient001</td>
</tr>
<tr class="even">
<td style="text-align: center">Patient001.N1</td>
<td style="text-align: center">Patient001</td>
</tr>
<tr class="odd">
<td style="text-align: center">Patient002.T1</td>
<td style="text-align: center">Patient002</td>
</tr>
<tr class="even">
<td style="text-align: center">Patient003.N1</td>
<td style="text-align: center">Patient003</td>
</tr>
</tbody>
</table>
</div>
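<p>
Because each Sample has exactly one parent Patient, an "up the tree" column
can be filled by a plain lookup that returns one value per row. A minimal
Python sketch using the example records from Table 1.1 (hypothetical data):
</p>
<pre><code class="language-python"># Sample -> Patient parent mapping, as in Table 1.1 (hypothetical records).
# A dict holds exactly one value per key, mirroring the rule that each
# Sample belongs to one and only one Patient.
sample_parent = {
    "Patient001.T1": "Patient001",
    "Patient001.N1": "Patient001",
    "Patient002.T1": "Patient002",
    "Patient003.N1": "Patient003",
}

# Root model = Sample, so every Patient cell holds a single value.
rows = list(sample_parent.items())
for sample, patient in rows:
    print(f"{sample}\t{patient}")
</code></pre>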
<p>
Note that knowing the relationship "up the tree" tells us nothing about the
inverse direction, down the tree. To determine the relationship type going
down the tree, we need to inspect the parent model’s attributes.
</p>
<h4 id="down-the-tree">Down the tree</h4>
<p>
Most relationships that go "down the tree" are one-to-many, which can result
in data frames with nested information. Taking our previous example of
Patients and Samples, we can look at the Patient attributes to see what kind
of relationship exists between the two when going down the tree, as shown in
Figure <a
href="#fig:ipi_patient_model"
data-reference-type="ref"
data-reference="fig:ipi_patient_model"
>1.4</a
>.
</p>
<figure>
<img
src="/assets/images/timur/map/PatientModel.png"
id="fig:ipi_patient_model"
alt="Patient model"
/>
<figcaption aria-hidden="true">Patient model</figcaption>
</figure>
<p>
We see that Sample is a "collection" type attribute. This means that a single
Patient has zero or more Samples. In a non-flat data frame, querying this
relationship results in a nested data frame, as in Table <a
href="#table:nested-data-frame-example"
data-reference-type="ref"
data-reference="table:nested-data-frame-example"
>1.2</a
>.
</p>
<div id="table:nested-data-frame-example">
<table>
<caption>
Example nested data frame
</caption>
<thead>
<tr class="header">
<th style="text-align: center">Patient</th>
<th style="text-align: center">Sample</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center">Patient001</td>
<td style="text-align: center">Patient001.T1, Patient001.N1</td>
</tr>
<tr class="even">
<td style="text-align: center">Patient002</td>
<td style="text-align: center">Patient002.T1</td>
</tr>
<tr class="odd">
<td style="text-align: center">Patient003</td>
<td style="text-align: center">Patient003.N1</td>
</tr>
</tbody>
</table>
</div>
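<p>
Grouping the same records the other way, by parent, is what produces nested
cells like those in Table 1.2. A short Python sketch with the same
hypothetical records:
</p>
<pre><code class="language-python">from collections import defaultdict

# (sample, patient) records, as in Table 1.1 (hypothetical data).
records = [
    ("Patient001.T1", "Patient001"),
    ("Patient001.N1", "Patient001"),
    ("Patient002.T1", "Patient002"),
    ("Patient003.N1", "Patient003"),
]

# Root model = Patient: one Patient can have several Samples, so each
# Sample cell holds a list of values instead of a single value.
nested = defaultdict(list)
for sample, patient in records:
    nested[patient].append(sample)

for patient, samples in nested.items():
    print(f"{patient}\t{', '.join(samples)}")
</code></pre>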
<p>
To analyze a nested data frame like this, you would have to extract the nested
data yourself. To get a flat data frame, you can either restructure your query
to take advantage of <a href="#section:up-the-tree"
>up the tree relationships</a
>, or use <a href="#section:column-slicing">column slicing</a> and extract
only the data values you are interested in.
</p>
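<p>
Extracting nested data yourself usually means splitting each nested cell and
repeating the root identifier once per value (what pandas calls "exploding" a
column). The Python sketch below uses only the standard library and assumes
nested values are separated by ", " as displayed in Table 1.2; the delimiter in
your own downloaded data may differ, so check it first.
</p>
<pre><code class="language-python"># Hypothetical nested rows: (patient, "sample1, sample2, ...").
# Assumption: nested values are joined with ", " as shown in Table 1.2.
nested_rows = [
    ("Patient001", "Patient001.T1, Patient001.N1"),
    ("Patient002", "Patient002.T1"),
    ("Patient003", "Patient003.N1"),
]


def explode(rows, sep=", "):
    """Repeat the root identifier once per nested value, giving a flat table.

    Root records with an empty nested cell are dropped.
    """
    flat = []
    for root_id, cell in rows:
        for value in cell.split(sep):
            if value:
                flat.append((root_id, value))
    return flat


for patient, sample in explode(nested_rows):
    print(f"{patient}\t{sample}")
</code></pre>
<p>
Note that the flattened table is keyed by the (Patient, Sample) pair rather
than by Patient alone, so the root identifier is no longer unique per row.
</p>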
<p>
For one-to-one "down the tree" relationships (they are rare, but do exist in
several projects), you do not have to worry about nested data.
</p>
<h2 id="query-ui-overview">Query UI Overview</h2>
<p>
Once you navigate to your project, you can open the Timur Query UI by clicking
the Query link in the top navigation bar.
</p>
<figure>
<img
src="/assets/images/timur/query/NavigationBar_Query.png"
id="fig:navigation_bar_query"
alt="A project’s Query link"
/>
<figcaption aria-hidden="true">A project’s Query link</figcaption>
</figure>
<p>
Clicking it brings up an empty form builder with a selector for a "Root
Model".
</p>
<figure>
<img
src="/assets/images/timur/query/Query_RootModel.png"
id="fig:query_unselected_root_model"
alt="Query form with unselected root model"
/>
<figcaption aria-hidden="true">
Query form with unselected root model
</figcaption>
</figure>
<p>Once you select a root model, the rest of the form builder appears.</p>
<figure>
<img
src="/assets/images/timur/query/Query_BlankForm.png"
id="fig:query_blank_form"
alt="Blank query form"
/>
<figcaption aria-hidden="true">Blank query form</figcaption>
</figure>
<h3 id="root-model">Root Model</h3>
<p>
The root model forms the starting point of your query. Its identifier is the
left-most column of your data frame and uniquely identifies each row of the
output. Generally, the root model is the subject of the question you’ve
formulated. For example, if you want to know "All the patients that are COVID
positive and their IL-6 levels", you would most likely start with Patient as
your root model. We will explore a couple of different ways to extract the
same data with different root models, but in each example we’ll start with
this more straightforward approach.
</p>
<p>
Note that once you select a root model, its identifier is automatically added
as a <a href="#section:columns">column</a> and appears in the <a
href="#section:data-frame"
>output data frame</a
>. Unlike other columns which you will add later, this column can only be
removed by selecting a different root model.
</p>
<h3 id="where-filters">Where Filters</h3>
<p>
Most research questions have some sort of constraint on the data that you want
to analyze. You can think of Where Filters as applying those constraints and
limiting the rows that appear in your data frame. This section of the form
builder allows you to specify the set of filters that will narrow down the
returned data.
</p>
<p>
Where filters are also one way to narrow a nested data frame down to a flat
data frame.
</p>
<h4 id="specifying-a-filter">Specifying a filter</h4>
<p>
To specify a filter, you will need the following pieces of information:
</p>
<ol>
<li><p>The model you want to filter on.</p></li>
<li>
<p>
Clauses that you want to apply to the filter model or its child models.
Each clause is composed of:
</p>
<ol>
<li><p>(sometimes) An "Any" or "Every" statement.</p></li>
<li><p>The clause model (can be same as the filter model).</p></li>
<li><p>The clause model’s attribute you want to filter on.</p></li>
<li><p>The operator you want to apply.</p></li>
<li><p>(sometimes) The operand to evaluate the operator against.</p></li>
</ol>
</li>
</ol>
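<p>
To make the pieces of a filter concrete, here is a hypothetical Python model
of a filter and its clauses, evaluated against a plain dictionary. The class
names, field names, and the operator names <code>Equals</code> and
<code>Greater than</code> are assumptions for illustration; only
<code>Is present</code> and <code>Is missing</code> are operator names taken
from this guide. The sketch also ignores Any / Every and clauses on child
models.
</p>
<pre><code class="language-python">from dataclasses import dataclass, field
from typing import Any, Optional

# Hypothetical operator table: each entry takes (attribute value, operand).
OPERATORS = {
    "Is present": lambda value, _operand: value is not None,
    "Is missing": lambda value, _operand: value is None,
    "Equals": lambda value, operand: value == operand,
    "Greater than": lambda value, operand: value is not None and value > operand,
}


@dataclass
class Clause:
    model: str                     # clause model (may be the filter model itself)
    attribute: str                 # attribute on the clause model
    op: str                        # key into OPERATORS
    operand: Optional[Any] = None  # unused by Is present / Is missing


@dataclass
class Filter:
    model: str
    clauses: list = field(default_factory=list)

    def matches(self, record: dict) -> bool:
        """True if every clause holds; multiple clauses on one filter are ANDed."""
        return all(
            OPERATORS[c.op](record.get(c.attribute), c.operand)
            for c in self.clauses
        )


older_than_50 = Filter("demographic", [Clause("demographic", "age", "Greater than", 50)])
print(older_than_50.matches({"age": 62}))  # True
print(older_than_50.matches({"age": 41}))  # False
print(older_than_50.matches({}))           # False: age is missing
</code></pre>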
<p>
For example, if you wanted to filter on Patients older than 50, you would
need to know that "age" in IPI is contained in the Demographic table.
Note that only models that have a valid path from your root model will appear
in the filter’s Model selector.
</p>
<p>
Once you have selected a filter model, you will need to add one or more
clauses to your filter. A clause is a condition on a specific model – either
the filter model itself, or a child of the filter model. If the clause model
has a one-to-many relationship with the filter model, you will also be able to
select <code>Any</code> or <code>Every</code> as part of the condition.
</p>
<p>
Once you determine the right clause model, you’ll have to determine the
attribute name. For non-table models, you can use the <a href="#section:map"
>Map view</a
>
to determine the attribute you want. For tables, you will have to inspect the
table and find the right "Name" or "Value" that you want to filter on. The
simplest way to do that is probably to view the raw table data on the Search
page.
</p>
<p>
The <code>Is present</code> and <code>Is missing</code> operators can be
tricky to apply. When you want to know whether an attribute on a model is
populated, you might specify a filter like "Samples where
<code>tissue_type</code> Is present", meaning "Sample records where the
<code>tissue_type</code> field has some data provided". You cannot do the same
with a "collection" type attribute (e.g., Samples with RnaSeq data): a
"Samples where RnaSeq Is present" query is not possible, since
<code>rna_seq</code> will not appear as an attribute for Sample. Instead, you
have to add a filter like "RnaSeq where <code>tube_name</code> Is present",
using the identifier of the RnaSeq model.
</p>
<p>
If you use the <code>In</code> or <code>Not in</code> operator, you must
provide a comma-separated string with no spaces. For example, if you want
Patient IDs in the set Patient5, Patient9, and Patient11, you would construct
the operand as:
</p>
<pre><code>Patient5,Patient9,Patient11</code></pre>
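<p>
If your IDs come from a spreadsheet or a script, a small helper can produce
this format reliably. A minimal Python sketch; the function name is
hypothetical, and it simply trims surrounding whitespace from each ID and joins
the IDs with commas:
</p>
<pre><code class="language-python">def to_in_operand(ids):
    """Join IDs into the comma-separated, no-space string used by In / Not in."""
    cleaned = [str(i).strip() for i in ids]
    # Drop blanks left behind by stray commas or empty spreadsheet cells.
    return ",".join(i for i in cleaned if i)


print(to_in_operand(["Patient5", " Patient9", "Patient11 "]))
# Patient5,Patient9,Patient11
</code></pre>
<p>
The same format applies to the list of Ensembl gene IDs used for matrix
slices, described in the Slicing section.
</p>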
<p>
As you add and edit filters, you can see them appear in the <a
href="#section:query-preview"
>query preview</a
>. While the exact syntax of that section may not be very intuitive (and that
is okay!), hopefully as you edit the filters you can see how your filters
affect the query and the output data frame.
</p>
<p>
See Table <a
href="#table:appendix-where-filter-operators"
data-reference-type="ref"
data-reference="table:appendix-where-filter-operators"
>3.1</a
>
for a list of operators for each type of attribute.
</p>
<h4 id="any-vs.-every">Any vs. Every</h4>
<p>
When filters traverse across models, each one-to-many relationship along the
path also lets you specify an "Any" or "Every" operator. The tool identifies
these relationships, and their selectors appear to the left of the filter
model. For example, when using Patient as the root model, if we add a Sample
filter to return only samples with <code>tissue_type</code> "Primary", a
selector appears with "Any Sample" or "Every Sample" as options, as shown in
Figure <a
href="#fig:query_any_every_example"
data-reference-type="ref"
data-reference="fig:query_any_every_example"
>1.8</a
>.
</p>
<figure>
<img
src="/assets/images/timur/query/Query_AnyEveryExample.png"
id="fig:query_any_every_example"
alt="Any or every sample"
/>
<figcaption aria-hidden="true">Any or every sample</figcaption>
</figure>
<p>
By default, "Any" is selected. This means that you want records where any of
the Samples meets the criteria. If a Patient has three Samples, at least one
of them must be labelled as Primary for the Patient to be included in the
output data frame; only Patients with zero Primary Samples are left out. If
you select "Every", you expect every single Sample attached to that Patient to
meet the criteria. So if a Patient has three Samples, all of them must be
labelled as Primary for that Patient to be included in the output data frame,
and Patients with one or more non-Primary Samples are left out.
</p>
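<p>
In Python terms, the two settings behave like the built-in <code>any()</code>
and <code>all()</code> applied to each Patient’s Samples. The records below are
hypothetical. One caveat: Python’s <code>all()</code> returns
<code>True</code> for an empty list, so a Patient with no Samples passes
"Every" in this sketch. This guide does not say how the Query UI treats that
case, so check it against real query output if it matters for your analysis.
</p>
<pre><code class="language-python"># Hypothetical Patient -> Sample tissue types.
patients = {
    "Patient001": ["Primary", "Primary", "Primary"],
    "Patient002": ["Primary", "Normal", "Metastatic"],
    "Patient003": ["Normal", "Normal"],
    "Patient004": [],  # no Samples at all
}


def is_primary(tissue_type):
    return tissue_type == "Primary"


any_primary = [p for p, types in patients.items() if any(map(is_primary, types))]
every_primary = [p for p, types in patients.items() if all(map(is_primary, types))]

print(any_primary)    # ['Patient001', 'Patient002']
print(every_primary)  # ['Patient001', 'Patient004'] (all() is vacuously True)
</code></pre>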
<p>
When you have multiple models in the path from your root model to the filter
model, you will have an Any / Every toggle for each one-to-many relationship
in the path, and each combination of selections can produce a different set
of output data. An example is shown in Figure <a
href="#fig:query_multiple_any_every_example"
data-reference-type="ref"
data-reference="fig:query_multiple_any_every_example"
>1.9</a
>: Patient has a one-to-many relationship with Sample, and Sample has a
one-to-many relationship with RnaSeq, so two Any / Every toggles appear when
you add an RnaSeq filter.
</p>
<figure>
<img
src="/assets/images/timur/query/Query_MultipleAnyEverySelectors.png"
id="fig:query_multiple_any_every_example"
alt="Multiple any or every selectors"
/>
<figcaption aria-hidden="true">Multiple any or every selectors</figcaption>
</figure>
<p>
Note that the same Any / Every reasoning applies to the clauses within a given
filter, so you can control how a clause affects the filter’s output using
those selectors.
</p>
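<p>
With two one-to-many steps, such as Patient -> Sample -> RnaSeq, the two
toggles nest: one quantifies over a Patient’s Samples, the other over each
Sample’s RnaSeq records. The Python sketch below uses made-up records in which
each boolean stands for whether one RnaSeq record meets a hypothetical filter
condition, and it enumerates all four combinations.
</p>
<pre><code class="language-python">from itertools import product

# Hypothetical Patient -> Sample -> RnaSeq records. Each boolean says
# whether one RnaSeq record meets the filter condition.
patients = {
    "Patient001": {"Patient001.T1": [True, False], "Patient001.N1": [True, False]},
    "Patient002": {"Patient002.T1": [True], "Patient002.N1": [False]},
    "Patient003": {"Patient003.T1": [True, True], "Patient003.N1": [True]},
    "Patient004": {"Patient004.T1": [False], "Patient004.N1": [False, False]},
}

QUANTIFIERS = {"Any": any, "Every": all}

results = {}
for sample_q, rna_q in product(QUANTIFIERS, repeat=2):
    outer, inner = QUANTIFIERS[sample_q], QUANTIFIERS[rna_q]
    # Outer toggle quantifies over Samples, inner toggle over each Sample's RnaSeq.
    results[(sample_q, rna_q)] = [
        patient
        for patient, samples in patients.items()
        if outer(inner(rna_seqs) for rna_seqs in samples.values())
    ]
    print(f"{sample_q} Sample / {rna_q} RnaSeq: {results[(sample_q, rna_q)]}")
</code></pre>
<p>
With this made-up data, each combination selects a different set of Patients:
only Patient003 passes Every / Every, while Any / Any also admits Patient001
and Patient002.
</p>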
<h4 id="and-and-or">And and Or</h4>
<p>
Sometimes you may want to combine filters using AND or OR logic. For example,
you might want Patients who are older than 50 OR younger than 25 (or,
conversely, Patients who are younger than 50 AND older than 25). Currently,
the Query UI supports a very simple version of this. If you need to construct
more complicated queries, please let the Data Library Engineering team know,
and we can add that ability as a feature request.
</p>
<p>
By default, all filters are combined with AND. However, the checkboxes on the
left of each filter, as shown in Figure <a
href="#fig:query_any_every_example"
data-reference-type="ref"
data-reference="fig:query_any_every_example"
>1.8</a
>, let you add one layer of OR filters. All checked filters are aggregated
into a single OR statement that is then ANDed with the other filters.
</p>
<p>
For example, a query with filters as shown in Figure <a
href="#fig:query_patient_between_25_50"
data-reference-type="ref"
data-reference="fig:query_patient_between_25_50"
>1.10</a
>
would be translated as "Patients between the ages of 25 and 50".
</p>
<figure>
<img
src="/assets/images/timur/query/Query_PatientBetween25_50.png"
id="fig:query_patient_between_25_50"
alt="Patients between ages of 25 and 50"
/>
<figcaption aria-hidden="true">Patients between ages of 25 and 50</figcaption>
</figure>
<p>
By contrast, a query with filters as shown in Figure <a
href="#fig:query_patient_not_between_25_50"
data-reference-type="ref"
data-reference="fig:query_patient_not_between_25_50"
>1.11</a
>
would be translated as "Patients older than 50 OR younger than 25".
</p>
<figure>
<img
src="/assets/images/timur/query/Query_PatientOlder50_Younger25.png"
id="fig:query_patient_not_between_25_50"
alt="Patients older than 50 or younger than 25"
/>
<figcaption aria-hidden="true">
Patients older than 50 or younger than 25
</figcaption>
</figure>
<p>
You can then combine these with other filters, as shown in Figure <a
href="#fig:query_patient_not_between_25_50_bmi_35"
data-reference-type="ref"
data-reference="fig:query_patient_not_between_25_50_bmi_35"
>1.12</a
>, which would be translated as "Patients older than 50 OR younger than 25,
who have a BMI greater than 35".
</p>
<figure>
<img
src="/assets/images/timur/query/Query_PatientOlder50_Younger25_bmi.png"
id="fig:query_patient_not_between_25_50_bmi_35"
alt="Patients older than 50 or younger than 25 and with bmi greater than 35"
/>
<figcaption aria-hidden="true">
Patients older than 50 or younger than 25 and with bmi greater than 35
</figcaption>
</figure>
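<p>
The rule that checked filters form a single OR group, which is then ANDed with
the unchecked filters, can be written as a small boolean function. This Python
sketch encodes the query from Figure 1.12 against hypothetical patient
records; the function and field names are illustrative only, and treating an
empty OR group as "no constraint" is an assumption.
</p>
<pre><code class="language-python"># Hypothetical patient records.
patients = [
    {"id": "Patient001", "age": 62, "bmi": 37.0},
    {"id": "Patient002", "age": 19, "bmi": 36.2},
    {"id": "Patient003", "age": 40, "bmi": 38.5},
    {"id": "Patient004", "age": 71, "bmi": 24.0},
]

# Each filter is (checked, predicate). Checked filters form one OR group.
filters = [
    (True, lambda p: p["age"] > 50),
    (True, lambda p: 25 > p["age"]),  # younger than 25
    (False, lambda p: p["bmi"] > 35),
]


def matches(record, filters):
    or_group = [pred for checked, pred in filters if checked]
    and_group = [pred for checked, pred in filters if not checked]
    # Assumption: with no checked filters, the OR group imposes no constraint.
    or_ok = any(pred(record) for pred in or_group) if or_group else True
    return or_ok and all(pred(record) for pred in and_group)


print([p["id"] for p in patients if matches(p, filters)])
# ['Patient001', 'Patient002']
</code></pre>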
<p>
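Note that you can also combine "And"-type statements as multiple clauses on a
single filter, as shown in Figure <a
href="#fig:query_patient_between_25_50_clauses"
data-reference-type="ref"
data-reference="fig:query_patient_between_25_50_clauses"
>1.13</a
>. Sometimes (especially with child models), this is required to answer a
specific question. Whether to use multiple clauses or multiple filters
depends on the question, and in some cases either approach gives the same
answer.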
</p>
<figure>
<img
src="/assets/images/timur/query/Query_PatientBetween25_50_clauses.png"
id="fig:query_patient_between_25_50_clauses"
alt="Patients between ages of 25 and 50, using clauses"
/>
<figcaption aria-hidden="true">
Patients between ages of 25 and 50, using clauses
</figcaption>
</figure>
<h3 id="section:columns">Columns</h3>
<p>
Where filters determine which records are included, and thus affect the rows
of your data frame. But it takes more than rows to fill a data frame. In the
Columns section, you’ll pick which attributes fill your columns. This is the
third section of the UI, and it is auto-populated with the identifier of your
root model. You cannot directly remove this column, but you can provide an
alternate label to change how it appears in your data frame. The default label
is simply <code>model_name.attribute_name</code>.
</p>
<figure>
<img
src="/assets/images/timur/query/Query_ColumnSection.png"
id="fig:query_column_section"
alt="Column section with root model identifier"
/>
<figcaption aria-hidden="true">
Column section with root model identifier
</figcaption>
</figure>
<p>
Note that the columns are independent of the filters, so you can select
columns on models that do not have filters.
</p>
<h4 id="specifying-a-column">Specifying a Column</h4>
<p>
To specify a column, you only need the model name and the attribute name:
</p>
<ol>
<li><p>The model you want data from.</p></li>
<li><p>The model’s attribute you want data for.</p></li>
</ol>
<p>
Note that the first input box for each column is for a
<code>Display Label</code> – this is optional, and it will default to
<code>model_name.attribute_name</code> if you leave the input blank.
</p>
<figure>
<img
src="/assets/images/timur/query/Query_NewColumnRow.png"
id="fig:query_new_column_row"
alt="New column row"
/>
<figcaption aria-hidden="true">New column row</figcaption>
</figure>
<p>
Once you define the column’s model and attribute, it will appear in the <a
href="#section:data-frame"
>data frame</a
>
at the bottom of the page. If you edit the <code>Display Label</code> for a
column, the data frame column heading should also change. You may need to do
this to prevent duplicate column headings, which would confuse downstream R or
Python analysis.
</p>
<figure>
<img
src="/assets/images/timur/query/Query_ColumnDisplayLabel.png"
id="fig:query_column_display_label"
alt="Edited column labels"
/>
<figcaption aria-hidden="true">Edited column labels</figcaption>
</figure>
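<p>
Because blank labels default to <code>model_name.attribute_name</code>, two
columns collide when they use the same model and attribute (for example, the
same attribute added twice with different slices) or when custom labels
repeat. A hypothetical Python helper for checking your planned columns before
downstream analysis; the column tuples and attribute names are made up:
</p>
<pre><code class="language-python">from collections import Counter


def column_labels(columns):
    """columns: list of (display_label, model_name, attribute_name) tuples.

    A blank display label falls back to model_name.attribute_name.
    """
    return [label or f"{model}.{attribute}" for label, model, attribute in columns]


def duplicate_labels(labels):
    return sorted(label for label, count in Counter(labels).items() if count > 1)


columns = [
    ("", "patient", "name"),
    ("", "sample", "tissue_type"),
    ("tumor_tissue_type", "sample", "tissue_type"),
    ("", "demographic", "age"),
    ("", "demographic", "age"),
]
labels = column_labels(columns)
print(labels)
print(duplicate_labels(labels))  # ['demographic.age']
</code></pre>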
<p>
As you add and edit columns and slices, you can also see them appear in the
<a href="#section:query-preview">query preview</a>. While the exact syntax of
that section may not be very intuitive (and that is okay!), hopefully as you
edit the columns and slices you can intuit how they affect the query and the
output data frame.
</p>
<h4 id="section:column-slicing">Slicing</h4>
<p>
When your research question traverses models that have one-to-many
relationships, you are often interested in only a subset of those
relationships. To prevent nested data in your data frame, you can construct
one or more Slices that select a subset of the nested data and give you a flat
data frame.
</p>
<p>
For example, you may only be interested in Tumor samples, in which case you
might slice on the Patient -> Sample relationship. Column slicing gives you
a subset of column data (as opposed to Filters, which give you a subset of row
data). Going back to our example in Table <a
href="#table:nested-data-frame-example"
data-reference-type="ref"
data-reference="table:nested-data-frame-example"
>1.2</a
>,
where Patient is one-to-many with Sample, we said that one way to construct a
flat data frame with the same information was to use Sample as the root model.
However, perhaps we are collecting additional data, and we really want to keep
Patient as our root model. If we only want Tumor samples, we could construct a
slice to select only the Tumor samples from the Sample column, as in Figure <a
href="#fig:query_sample_slice_tumor"
data-reference-type="ref"
data-reference="fig:query_sample_slice_tumor"
>1.17</a
>.
</p>
<figure>
<img
src="/assets/images/timur/query/Query_SampleSliceTumor.png"
id="fig:query_sample_slice_tumor"
alt="Slice out tumor samples"
/>
<figcaption aria-hidden="true">Slice out tumor samples</figcaption>
</figure>
<p>
Assuming only a single tumor sample per patient, this would also result in a
flat data frame. Slicing columns in this fashion is particularly helpful with
clinical data, which generally appear in Timur as tables.
</p>
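<p>
Conceptually, a slice filters the values inside each nested cell rather than
removing rows. The Python sketch below applies a "Tumor" slice to made-up
nested Sample data (the sample-type field is hypothetical); when each Patient
has at most one Tumor sample, every cell ends up with zero or one value and can
be stored as a flat column.
</p>
<pre><code class="language-python"># Hypothetical nested data: Patient -> list of (sample_name, sample_type).
nested = {
    "Patient001": [("Patient001.T1", "Tumor"), ("Patient001.N1", "Normal")],
    "Patient002": [("Patient002.T1", "Tumor")],
    "Patient003": [("Patient003.N1", "Normal")],
}


def slice_cell(values, attribute_index, wanted):
    """Keep only the nested values whose attribute equals `wanted`."""
    return [v for v in values if v[attribute_index] == wanted]


flat = {}
for patient, samples in nested.items():
    tumors = slice_cell(samples, 1, "Tumor")
    if len(tumors) > 1:
        raise ValueError(f"{patient} has several Tumor samples; the cell stays nested")
    # Unlike a where filter, the row is kept even when the slice is empty.
    flat[patient] = tumors[0][0] if tumors else None

print(flat)
</code></pre>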
<p>
Note that constructing a slice requires the same information as a filter
clause:
</p>
<ol>
<li><p>The model you want to filter on.</p></li>
<li><p>The model’s attribute you want to filter on.</p></li>
<li><p>The operator you want to apply.</p></li>
<li><p>(sometimes) The operand to evaluate the operator against.</p></li>
</ol>
<p>
You can also slice Matrix data, which is currently used for gene expression
and gene count data in the RnaSeq model. The main difference is that the only
slice operator in this case is <code>Slice</code>, and you then provide a
comma-separated list of Ensembl gene IDs, with no spaces:
</p>
<pre><code>ENSG00000000003,ENSG00000000005,ENSG00000000419</code></pre>
<p>
An example of a matrix slice is shown in Figure <a
href="#fig:query_matrix_slicing_example"
data-reference-type="ref"
data-reference="fig:query_matrix_slicing_example"
>1.18</a
>.
</p>
<figure>
<img
src="/assets/images/timur/query/Query_MatrixSlicingExample.png"
id="fig:query_matrix_slicing_example"
alt="Matrix Slicing Example"
/>
<figcaption aria-hidden="true">Matrix Slicing Example</figcaption>
</figure>
<p>
See Table <a
href="#table:appendix-column-slice-operators"
data-reference-type="ref"
data-reference="table:appendix-column-slice-operators"
>3.2</a
>
for a list of operators for each type of column slice attribute.
</p>
<h3 id="section:query-preview">Query Preview</h3>
<p>
As you construct your query by adding filters and columns, a small text
window shows you what the raw Timur query will look like. It has a green
outline, as seen in Figure <a
href="#fig:query_preview"
data-reference-type="ref"
data-reference="fig:query_preview"
>1.19</a
>. While you should not expect to understand the exact syntax of this string,
it should help you see how changing the filters and columns changes the raw
query. As you get more familiar with the Query UI, you may be able to infer
what kind of data your data frame will contain from this raw query string.
</p>
<figure>
<img
src="/assets/images/timur/query/Query_Preview.png"
id="fig:query_preview"
alt="Query preview string"
/>
<figcaption aria-hidden="true">Query preview string</figcaption>
</figure>
<h3 id="section:data-frame">Data Frame</h3>
<p>
The bottom pane of the tool includes a set of control toggles and buttons as
well as a data frame. As you construct your query, the columns appear in the
data frame, giving you an idea of what your final data will look like.
</p>
<figure>
<img
src="/assets/images/timur/query/Query_EmptyDataFrame.png"
id="fig:query_empty_data_frame"
alt="Empty Data Frame"
/>
<figcaption aria-hidden="true">Empty Data Frame</figcaption>
</figure>
<h4 id="reset-query">Reset query</h4>
<p>
If you would like to reset the entire query, click the Reset query button to
remove all form entries and start over.
</p>
<h4 id="query">Query</h4>
<p>
Once you are satisfied with your data frame, click the <code>Query</code>
button at the top right of the data frame. Once data is returned from the
server, it appears in the data frame. You can navigate between pages or set a
different number of items per page. Note that after changing the number of
items per page, you need to click the <code>Query</code> button again to
re-fetch data.
</p>
<figure>
<img
src="/assets/images/timur/query/Query_PopulatedDataFrame_Flattened_Expanded.png"
id="fig:query_populated_data_frame_flattened_expanded"
alt="Populated Data Frame"
/>
<figcaption aria-hidden="true">Populated Data Frame</figcaption>
</figure>
<h4 id="download-tsv">Download TSV</h4>
<p>
To download all records as a tab-separated values (TSV) file, click the
<code>Download TSV</code> button. The file contains every record in the query
results, not just the current page. Note that with larger data frames or matrix
data, the download may take a while to complete.
</p>
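<p>
If you plan to analyze the downloaded file in Python, a minimal sketch using
only the standard library <code>csv</code> module might look like the
following. It assumes the first line of the TSV is a header row. The helper
name <code>read_query_tsv</code>, the column names, and the sample contents are
hypothetical stand-ins for a real download, whose headers depend on the columns
in your query.
</p>
<pre><code>import csv
import os
import tempfile


def read_query_tsv(path):
    """Read a tab-separated file into a list of dicts keyed by the header row."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# Hypothetical stand-in for a downloaded file; real headers depend on your query.
sample = "record_name\teisenberg_score\nRECORD_1\t8\nRECORD_2\t9\n"
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as tmp:
    tmp.write(sample)

rows = read_query_tsv(tmp.name)
os.remove(tmp.name)  # clean up the temporary sample file
print(len(rows), rows[0]["record_name"], rows[1]["eisenberg_score"])
</code></pre>
<p>
Every value is read back as a string, so convert numeric columns (for example
with <code>float()</code>) before doing arithmetic on them.
</p>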
<h4 id="copy-link">Copy link</h4>
<p>
You can share and bookmark the query you’ve built using its URL. For
convenience, clicking this button copies the entire URL to your clipboard, so
you can paste it and share it with others. You can also bookmark the URL in
your browser to save a specific query.
</p>
<h4 id="nesting-and-expanding-matrices">Nesting and Expanding Matrices</h4>
<p>
One thing you may have noticed from <a
href="#fig:query_populated_data_frame_flattened_expanded"
data-reference-type="ref"
data-reference="fig:query_populated_data_frame_flattened_expanded"
>1.21</a
>
is that we specified only two columns, yet the data frame contains four. This
is because, by default, the tool expands matrix slices so that each data point
occupies its own cell, which is generally more convenient for analysis.
</p>
<p>
A toggle lets you change this behavior. Switching it to nested matrices
produces the expected two columns, but leaves the matrix data joined in a
single cell, with no clear labeling of which data point belongs to which
Ensembl gene ID, as seen in <a
href="#fig:query_populated_data_frame_nested_matrix"
data-reference-type="ref"
data-reference="fig:query_populated_data_frame_nested_matrix"
>1.22</a
>.
</p>
<figure>
<img
src="/assets/images/timur/query/Query_PopulatedDataFrame_NestedMatrix.png"
id="fig:query_populated_data_frame_nested_matrix"
alt="Data Frame with Nested Matrix"
/>
<figcaption aria-hidden="true">Data Frame with Nested Matrix</figcaption>
</figure>
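<p>
To make the difference between the two matrix toggles concrete, here is a
small Python sketch of what expanding a sliced matrix amounts to. It is an
illustration, not the tool’s implementation: the
<code>expand_matrix_row</code> helper, the <code>record_name</code> column, the
column labels of the form <code>gene_tpm.GENE_ID</code>, and the values are all
hypothetical.
</p>
<pre><code>def expand_matrix_row(row, matrix_column, gene_ids):
    """Split one nested matrix cell into one labeled column per gene ID."""
    values = row[matrix_column]
    if len(values) != len(gene_ids):
        raise ValueError("expected one matrix value per gene ID")
    # Copy every non-matrix column unchanged.
    expanded = {key: value for key, value in row.items() if key != matrix_column}
    # Give each matrix value its own column, labeled with its gene ID.
    for gene_id, value in zip(gene_ids, values):
        expanded[f"{matrix_column}.{gene_id}"] = value
    return expanded


genes = ["ENSG00000000003", "ENSG00000000005", "ENSG00000000419"]
# Nested view: two columns, with all sliced gene_tpm values in one cell.
nested_row = {"record_name": "RECORD_1", "gene_tpm": [1.5, 0.0, 12.25]}
# Expanded view: four columns, one value per labeled cell.
expanded_row = expand_matrix_row(nested_row, "gene_tpm", genes)
print(list(expanded_row))
</code></pre>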
<p>
If you include a matrix attribute as a column without slicing it, and the
toggle is set to <code>Expand matrices</code>, the UI expands the column into
one column per possible gene ID: more than 58,000 columns! The UI renders at
most 10 of these columns; to see the entire data set, you will have to download
the TSV. A warning appears in this case, as seen in <a
href="#fig:query_column_limit_warning"
data-reference-type="ref"
data-reference="fig:query_column_limit_warning"
>1.23</a
>. Downloading this TSV may be slow, because the TSV is generated in your
browser. If you need an entire gene expression matrix, the Search page may
offer faster performance once you have identified the <code>rna_seq</code>
records you want data for. Alternatively, you can add a slice to the column and
specify just the genes you need.
</p>
<figure>
<img
src="/assets/images/timur/query/Query_ColumnLimitWarning.png"
id="fig:query_column_limit_warning"
alt="Warning about unrendered columns"
/>
<figcaption aria-hidden="true">Warning about unrendered columns</figcaption>
</figure>
<h4 id="nested-or-flattened-data-frame">Nested or Flattened Data Frame</h4>
<p>
Another data frame toggle switches between a nested and a flattened view; the
default is flattened. Ideally you should construct your query so that it yields
flat data, but it can be hard to tell when it does not. Changing this toggle
reveals whether your data is actually flat. If it is, switching between
<code>Nested</code> and <code>Flattened</code> will not change the data frame.
If it is not, switching to <code>Nested</code> reveals additional labels and
data points, as can be seen in <a
href="#fig:query_populated_data_frame_nested_expanded"
data-reference-type="ref"
data-reference="fig:query_populated_data_frame_nested_expanded"
>1.24</a
>.
</p>
<figure>
<img
src="/assets/images/timur/query/Query_PopulatedDataFrame_Nested_Expanded.png"
id="fig:query_populated_data_frame_nested_expanded"
alt="Populated Data Frame, Nested"
/>
<figcaption aria-hidden="true">Populated Data Frame, Nested</figcaption>
</figure>
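<p>
One way to think about "flat" data is that every cell holds a single value
rather than a list or a nested record. The Python sketch below encodes that
simplified definition. The <code>is_flat</code> helper and the example rows
are hypothetical and are not part of the Query UI.
</p>
<pre><code>def is_flat(rows):
    """Return True if every cell holds a single value, not a list or dict."""
    return all(
        not isinstance(value, (list, dict))
        for row in rows
        for value in row.values()
    )


# Hypothetical rows: in the second, one cell still holds several values.
flat_rows = [{"record_name": "RECORD_1", "compartment": "stroma"}]
nested_rows = [{"record_name": "RECORD_1", "sample_names": ["SAMPLE_A", "SAMPLE_B"]}]

print(is_flat(flat_rows), is_flat(nested_rows))
</code></pre>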
<h3 id="section:errors">Messages and Errors</h3>
<p>
As you explore queries, you may encounter messages at the top of the screen.
You can dismiss a message by clicking the green checkmark on the left. Messages
persist until you dismiss them, even if you take other actions on the page,
such as re-running a query.
</p>
<figure>
<img
src="/assets/images/timur/query/Query_ErrorMessage.png"
id="fig:query_error_message"
alt="Error Notification"
/>
<figcaption aria-hidden="true">Error Notification</figcaption>
</figure>
<p>
If your query has no results, you will see a message like
<code>Page 1 not found</code>.
</p>
<p>
However, you may sometimes construct an invalid query, either because of a bug
or, more likely, because of an invalid operand value (e.g. entering text as the
operand for a date attribute). In that case, the message may be more obscure,
such as:
</p>
<pre><code> Server Error</code></pre>
<p>
In those cases, check your query to make sure you have valid selections for
filters and slices. We are working to improve error handling and messaging and
to reduce how often obscure error messages are returned. If you are unsure how
to proceed, or if you think a valid query is triggering a bug, please reach out
to the Data Library team.
</p>
<h1 id="examples">Examples</h1>
<p>
In this section we will dissect several example queries across different
projects. You may find that one query is similar to your research question and
that you can build on it. Each example is structured with the following
sections:
</p>
<ol>
<li><p>Specify the research question.</p></li>
<li><p>Identify the relevant models to answer the question.</p></li>
<li><p>Input the query into the Query UI.</p></li>
<li><p>Understand the output data frame.</p></li>
</ol>
<h2 id="ipi">IPI</h2>
<h3 id="gene-expression-from-rna-seq-single-compartment">
Gene expression from RNA Seq, single compartment
</h3>
<h4 id="question">Question</h4>
<p>
We are interested in exploring the gene expression from Bulk RNA Seq data in
the IPI data set, specifically for the stroma compartment. We have a list of
specific genes that we are interested in, so we will want to extract their
data only.
</p>
<h4 id="models">Models</h4>
<p>
From the IPI Map view, we can see that there is a model called
<code>rna_seq</code>, so this seems like a good place to start. Once we click
on it, we’ll see that there are a lot of attributes to get familiar with!
</p>
<figure>
<img
src="/assets/images/timur/query/Query_RnaSeq_Model.png"
id="fig:query_rna_seq_model"
alt="IPI Bulk RNA Seq Model"
/>
<figcaption aria-hidden="true">IPI Bulk RNA Seq Model</figcaption>
</figure>
<p>
We know that IPI uses the <code>eisenberg_score</code> attribute as a quality
control metric for Bulk RNA Seq data, and we want only "good" records in our
final data set. Let’s keep in mind that we will need to add a filter on this
attribute.
</p>
<p>
We see that gene expression data is kept in the
<code>gene_tpm</code> attribute. Since it is a matrix keyed by Ensembl gene
IDs, we will need the Ensembl IDs (not HUGO gene names) of the genes we are
interested in. For this example, we will simply use the first three options:
</p>
<pre><code> ENSG00000000003,ENSG00000000005,ENSG00000000419</code></pre>
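<p>
If your gene list comes from a spreadsheet or a text file, a short Python
helper can turn it into a comma-separated operand in the same format as the
list above. This is a sketch: the <code>to_slice_operand</code> name is
hypothetical, and it assumes the slice operand should be a comma-separated list
with no spaces, as in this example.
</p>
<pre><code>def to_slice_operand(gene_ids):
    """Join gene IDs with commas, dropping blanks and duplicates in order."""
    unique_ids = []
    for gene_id in gene_ids:
        cleaned = gene_id.strip()  # remove stray spaces and tabs
        if cleaned and cleaned not in unique_ids:
            unique_ids.append(cleaned)
    return ",".join(unique_ids)


# A pasted list with a blank line, extra whitespace, and a duplicate.
pasted = """
ENSG00000000003
ENSG00000000005
 ENSG00000000419
ENSG00000000003
"""
print(to_slice_operand(pasted.splitlines()))
</code></pre>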
<p>
Because we are only interested in exploring the stroma compartment, we will
also have to use the <code>compartment</code> attribute to narrow down our
results.
</p>
<p>
Looking at the above criteria, it seems like our research question might be
formulated along the lines of:
</p>
<pre><code> From the rna_seq model, I want the gene_tpm data for genes ENSG00000000003, ENSG00000000005, and ENSG00000000419, but only in the stroma compartment and with an eisenberg_score greater than 7 (since someone told me that 7 is a good cutoff).</code></pre>
<h4 id="ui-input">UI Input</h4>
<p>Let’s translate the general question we’ve formulated into the Query UI.</p>
<pre><code> From the rna_seq model</code></pre>
<p>indicates that the root model should be <code>rna_seq</code>.</p>
<pre><code> I want the gene_tpm data for genes ENSG00000000003, ENSG00000000005, and ENSG00000000419,</code></pre>
<p>
becomes a column whose model is <code>rna_seq</code> and whose attribute is
<code>gene_tpm</code>. Because we want only a subset of the gene data, we’ll
add a slice with the operand
</p>
<pre><code> ENSG00000000003,ENSG00000000005,ENSG00000000419</code></pre>
<p>Lastly,</p>
<pre><code> but only in the stroma compartment and with an eisenberg_score greater than 7</code></pre>
<p>
could become two different filters. They would have the following settings:
</p>
<div id="table:filter-settings-for-rna-seq-example">
<table>
<caption>
Filter settings for Bulk RNA Seq example
</caption>
<thead>
<tr class="header">
<th style="text-align: center">Filter Model</th>
<th style="text-align: center">Clause Model</th>
<th style="text-align: center">Attribute</th>
<th style="text-align: center">Operator</th>
<th style="text-align: center">Operand</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center">rna_seq</td>
<td style="text-align: center">rna_seq</td>
<td style="text-align: center">compartment</td>
<td style="text-align: center">Equals</td>
<td style="text-align: center">stroma</td>
</tr>
<tr class="even">
<td style="text-align: center">rna_seq</td>
<td style="text-align: center">rna_seq</td>
<td style="text-align: center">eisenberg_score</td>
<td style="text-align: center">Greater than</td>
<td style="text-align: center">7</td>
</tr>
</tbody>
</table>
</div>
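<p>
To check your understanding of what these two filters and the slice ask for,
the Python sketch below applies the same logic to a few in-memory records. It
only illustrates the semantics and is not how the server evaluates queries.
The <code>run_example_query</code> helper, the record layout (<code>name</code>,
<code>compartment</code>, <code>eisenberg_score</code>, and a
<code>gene_tpm</code> dictionary keyed by gene ID), and all values are made up.
</p>
<pre><code>GENES = "ENSG00000000003,ENSG00000000005,ENSG00000000419".split(",")


def run_example_query(records, genes):
    """Keep records that pass both filters, then slice gene_tpm to genes."""
    results = []
    for record in records:
        # Filter 1: compartment Equals stroma.
        in_stroma = record["compartment"] == "stroma"
        # Filter 2: eisenberg_score Greater than 7 (exactly 7 does not pass).
        passes_qc = record["eisenberg_score"] > 7
        if in_stroma and passes_qc:
            # Slice: keep only the requested genes from the gene_tpm matrix.
            sliced = {gene: record["gene_tpm"][gene] for gene in genes}
            results.append({"name": record["name"], "gene_tpm": sliced})
    return results


# Made-up records; the fourth gene is not in the slice and gets dropped.
tpm = {
    "ENSG00000000003": 1.5,
    "ENSG00000000005": 0.0,
    "ENSG00000000419": 12.25,
    "ENSG00000000457": 3.0,
}
records = [
    {"name": "RECORD_1", "compartment": "stroma", "eisenberg_score": 8, "gene_tpm": tpm},
    {"name": "RECORD_2", "compartment": "tumor", "eisenberg_score": 9, "gene_tpm": tpm},
    {"name": "RECORD_3", "compartment": "stroma", "eisenberg_score": 7, "gene_tpm": tpm},
]
results = run_example_query(records, GENES)
print(results)
</code></pre>
<p>
Only RECORD_1 survives: RECORD_2 is outside the stroma compartment, and
RECORD_3 has a score of exactly 7, which is not greater than 7.
</p>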
<p>
Note that in this case, you could also create a single filter with two
clauses:
</p>
<div id="table:filter-settings-for-rna-seq-example-clauses">
<table>
<caption>
Filter settings for Bulk RNA Seq example, using clauses
</caption>
<thead>
<tr class="header">