This repository contains examples of using BigQuery with genomics data. The code in each language-specific folder runs the same set of queries against the Platinum Genomes dataset. For more detail about this data, see Google Genomics Public Data.
- Go to the BigQuery Browser Tool.
- Click on "Compose Query".
- Copy and paste the following query into the dialog box and click on "Run Query":
    SELECT
      reference_name,
      COUNT(reference_name) AS num_records,
      COUNT(call.call_set_name) AS num_calls
    FROM
      [genomics-public-data:platinum_genomes.variants]
    GROUP BY
      reference_name
    ORDER BY
      reference_name
View the results!
- Try a few more queries in the sql subdirectory.
- variant-level-data-for-brca1.sql
- sample-level-data-for-brca1.sql
- sample-variant-counts-for-brca1.sql
- Replace `_THE_TABLE_` with `genomics-public-data:platinum_genomes.variants` or your own table if you have exported variants from Google Genomics to BigQuery.
- New to BigQuery?
- See the query reference.
- See the BigQuery book, Google BigQuery Analytics.
- New to working with variants?
- See an overview of the VCF data format.
- Looking for more advanced sample queries?
- See BigQuery Examples.
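
The `_THE_TABLE_` substitution described for the files in the sql subdirectory can also be scripted rather than done by hand. The following is a minimal Python sketch, not part of the repository: the helper names `fill_table` and `load_query` are hypothetical, and the one-line template in the demo is invented purely to show the replacement (the real `.sql` files are longer and may wrap the placeholder differently).

```python
from pathlib import Path

# Placeholder used in the repository's .sql files, per the README.
PLACEHOLDER = "_THE_TABLE_"
# Public Platinum Genomes table named in the README.
DEFAULT_TABLE = "genomics-public-data:platinum_genomes.variants"


def fill_table(sql_text: str, table: str = DEFAULT_TABLE) -> str:
    """Return sql_text with every placeholder replaced by the table name."""
    if PLACEHOLDER not in sql_text:
        raise ValueError(f"no {PLACEHOLDER} placeholder found in query")
    return sql_text.replace(PLACEHOLDER, table)


def load_query(path: str, table: str = DEFAULT_TABLE) -> str:
    """Read a .sql file and substitute the table name into it."""
    return fill_table(Path(path).read_text(encoding="utf-8"), table)


if __name__ == "__main__":
    # Hypothetical template, only to demonstrate the substitution.
    template = "SELECT reference_name FROM [_THE_TABLE_] LIMIT 10"
    print(fill_table(template))
```

To point a query at your own exported variants, pass your table name as the second argument, for example `load_query("sql/variant-level-data-for-brca1.sql", "my-project:my_dataset.my_variants")`, where the project, dataset, and table names are placeholders.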
Instead of using the browser tool to send queries to BigQuery, you can use code in many languages to call the BigQuery API.
- Try the "getting started" samples in one or more languages by navigating to the subdirectory in this repository for the desired language.
- All languages will require a Project ID from a project that has the BigQuery API enabled.
- Follow the BigQuery sign up instructions if you do not yet have a valid project. (Note: you do not need to enable billing for the small examples in this repository.)
- You can find the Project ID for your new project in the Google Developers Console.
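
As a rough illustration of what calling the BigQuery API from code involves, the sketch below builds a `jobs.query` request against the BigQuery v2 REST endpoint using only the Python standard library. It is not the code from this repository's samples, which may well use a Google client library instead. The function names `build_query_request` and `run_query` are hypothetical, and the Project ID and OAuth 2.0 access token are values you must supply yourself.

```python
import json
import urllib.request

# Legacy SQL query from the browser walkthrough in this README.
QUERY = """
SELECT
  reference_name,
  COUNT(reference_name) AS num_records,
  COUNT(call.call_set_name) AS num_calls
FROM
  [genomics-public-data:platinum_genomes.variants]
GROUP BY
  reference_name
ORDER BY
  reference_name
"""


def build_query_request(project_id: str, access_token: str,
                        sql: str = QUERY) -> urllib.request.Request:
    """Build (but do not send) a jobs.query POST request."""
    url = ("https://bigquery.googleapis.com/bigquery/v2/"
           f"projects/{project_id}/queries")
    body = {
        "query": sql,
        # The bracketed [project:dataset.table] syntax is legacy SQL.
        "useLegacySql": True,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_query(project_id: str, access_token: str, sql: str = QUERY) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_query_request(project_id, access_token, sql)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `run_query("your-project-id", token)` needs network access and a valid token (one way to obtain one, assuming the Google Cloud SDK is installed, is `gcloud auth print-access-token`). When the job finishes within the request timeout, the result rows are returned under the `rows` key of the response.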