JBrowse Configuration

Kim Rutherford edited this page Nov 14, 2024 · 18 revisions

Track configuration file

The main configuration file (the "JBrowse metadata file") is in the pombase-config Git repo.

This file is included in every nightly load.

Metadata file columns

data_type

users see (in documentation, submission template, etc.): Data type

e.g. "Chromatin binding" or "Poly(A) sites"

display_in_jbrowse

If "yes", this track will be displayed in JBrowse. Use "no" for tracks that aren't ready. This column is hidden from users.

show_feature_label

If "yes", the label will be shown beside the feature in the track. This column is hidden from users.

label

users see: Track label

The track label displayed to the user and also the key used by JBrowse to identify tracks internally. Labels can't include commas (use semicolons to separate parts).
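The comma rule above can be checked before a nightly load. The following is only a minimal sketch: the function name check_labels is hypothetical, and the duplicate check is an assumption based on the label also being the internal track key, not a rule stated on this page.

```python
from collections import Counter


def check_labels(labels):
    """Return (labels_with_commas, duplicated_labels) for a list of track labels.

    The comma check follows the rule documented for the label column.
    The duplicate check is an assumption: labels double as JBrowse's
    internal track keys, so repeated labels are likely to be a problem.
    """
    with_commas = [label for label in labels if "," in label]
    counts = Counter(labels)
    duplicated = [label for label, n in counts.items() if n > 1]
    return with_commas, duplicated
```

For example, check_labels(["a; b", "a, b"]) reports "a, b" as containing a comma and finds no duplicates.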

comment

A free-text comment. Ignored by JBrowse.

assayed_gene_product

Required only for chromatin binding data; specifies the protein that binds to chromatin.

background

users see: Strain background

Any background alleles present in cells for technical purposes; can include mating type, ploidy, markers, etc. For vegetative cells haploid is assumed unless diploid is specified.

e.g. h90 or h+/h+ pat1-114 or h- ura4-D18 leu1-32

alleles

users see: WT or mutant

Notes whether the strain is wild type (WT) or has one or more mutations of experimental interest. WT designation ignores background mutations.

mutant(s)

users see: Mutant alleles

Specify mutant alleles of interest to the experiment.

e.g. prp2-1 or pcr1delta

conditions

e.g. YES, high temperature or glucose MM

growth_phase_or_response

e.g. vegetative growth or meiosis or quiescence or glucose starvation or oxidative stress or heat shock

strand

"forward" or "reverse". The default is no strand.

assay_type

e.g. "DamID / tiling array", "NGS" or "mRNA end sequencing"

first_author

publication_year

pmed_id

PubMed ID if known.

DB

e.g. GEO or ArrayExpress

study_id

e.g. GSE41773 or E-MTAB-1154

sample_id

e.g. GSM1024004 or ERS078453

data_file_type

"bigwig", "rnaseq", "GFF", "bed" or "vcf".

The data files for "rnaseq" should be .bam format.
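A rough sketch of how the two rules above could be checked for one row. The names ALLOWED_TYPES and check_data_file_type are hypothetical, and treating the values as case-sensitive exactly as written above is an assumption.

```python
# Allowed data_file_type values, copied from the list above.
# Assumption: the values are case-sensitive exactly as written.
ALLOWED_TYPES = {"bigwig", "rnaseq", "GFF", "bed", "vcf"}


def check_data_file_type(data_file_type, file_name):
    """Return an error message for a bad type/file pairing, or None if it looks fine."""
    if data_file_type not in ALLOWED_TYPES:
        return f"unknown data_file_type: {data_file_type!r}"
    # Files for "rnaseq" tracks should be in .bam format.
    if data_file_type == "rnaseq" and not file_name.lower().endswith(".bam"):
        return 'data files for "rnaseq" should be in .bam format'
    return None
```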

source_url

The URL of the data file on pombase.org.

Raw files from publications should be placed on oliver1 in a subdirectory of /data/pombase/external_datasets/originals/ like Thodberg_2018_PMID_30566651.

These original files often need to be fixed to have the correct chromosome IDs. Once fixed, they need to be processed for use by JBrowse (sorted, compressed and indexed, depending on the format). Once processed, they should be copied to a subdirectory of /data/pombase/external_datasets/processed/, for example Thodberg_2018_PMID_30566651.

Name directories following the convention: Author_YYYY_PMID_nnnnnnn_(optional-short-description).
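The naming convention can be expressed as a regular expression. This is a sketch only: the pattern and the function name parse_dataset_dir are hypothetical, and it assumes the author part contains no underscores and the PMID is any run of digits.

```python
import re

# Author_YYYY_PMID_nnnnnnn_(optional-short-description)
# Assumptions: no underscore in the author part; PMID is one or more digits.
DIR_NAME = re.compile(
    r"(?P<author>[^_/]+)_(?P<year>\d{4})_PMID_(?P<pmid>\d+)"
    r"(?:_(?P<description>[^/]+))?"
)


def parse_dataset_dir(name):
    """Return the parts of a dataset directory name, or None if it doesn't match."""
    match = DIR_NAME.fullmatch(name)
    return match.groupdict() if match else None
```

With this pattern, Thodberg_2018_PMID_30566651 parses with an empty description, while a name missing the PMID_ part is rejected.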

The URL should begin with https://www.pombase.org/external_datasets/, then have the dataset directory name (Thodberg_2018_PMID_30566651), then the file name (GSE110976_TSSs.sorted.bed.gz).

So if the data file has this path:

  • /data/pombase/external_datasets/processed/Thodberg_2018_PMID_30566651/GSE110976_TSSs.sorted.bed.gz

use this URL:

  • https://www.pombase.org/external_datasets/Thodberg_2018_PMID_30566651/GSE110976_TSSs.sorted.bed.gz

(Note that the URL doesn't include "processed".)
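The path-to-URL mapping described above can be sketched as a small helper. The function name source_url_for is hypothetical, and the check that a path has exactly a dataset directory plus a file name is an assumption drawn from the example.

```python
from pathlib import PurePosixPath

# Both values are taken from the description above.
PROCESSED_ROOT = PurePosixPath("/data/pombase/external_datasets/processed")
URL_PREFIX = "https://www.pombase.org/external_datasets/"


def source_url_for(path):
    """Map a processed data file path to its source_url value.

    Raises ValueError if the path is not under PROCESSED_ROOT.
    """
    relative = PurePosixPath(path).relative_to(PROCESSED_ROOT)
    # Assumption: files sit directly inside a dataset directory.
    if len(relative.parts) != 2:
        raise ValueError(f"expected <dataset directory>/<file name>, got: {relative}")
    return URL_PREFIX + relative.as_posix()
```

Calling it on the example path above returns the example URL; the "processed" component is dropped because it is part of the root that gets stripped.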

See Formatting-data-files-for-JBrowse for notes on processing files for JBrowse.

Notes

Users submit metadata in a file that collects entries for most of the columns (see https://www.pombase.org/documentation/data-submission-form-for-HTP-sequence-linked-data); we have to add display_in_jbrowse, show_feature_label, and source_url.

Two columns, ensembl_source_name and short_description, were carried over from the Ensembl Genomes configuration file for a while but have since been deprecated and removed.