From 878d781fce084b81b53b941fd2ccf80581df849c Mon Sep 17 00:00:00 2001 From: Jamie Lentin Date: Mon, 3 Nov 2025 10:57:45 +0000 Subject: [PATCH 1/7] README: Assume python executable is called python3 There is very likely no /usr/bin/python set up on the OS nowadays, so assuming python3 is more C+P-friendly. --- README.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.markdown b/README.markdown index d889713..ce4e3d3 100644 --- a/README.markdown +++ b/README.markdown @@ -9,7 +9,7 @@ be used to populate a running OneZoom instance. The first step to using this repo is to create a Python virtual environment and activate it: # From the root of the repo, create a Python environment and activate it - python -m venv .venv + python3 -m venv .venv source .venv/bin/activate # Install it From 95f68a43f7671e5638969ae5d1fac97b0eb3eaac Mon Sep 17 00:00:00 2001 From: Jamie Lentin Date: Mon, 3 Nov 2025 11:20:49 +0000 Subject: [PATCH 2/7] Use dots in OT_VERSION, not underscores The OpenTree API returns . and uses . in its URL, so using a . here too means we can use the environment variable to fetch a given version directly. Otherwise the underscore just seems to be a convention for the draftversion${OT_VERSION}.tre naming, so this shouldn't upset anything else. --- README.markdown | 2 +- data/OpenTree/README.markdown | 2 +- .../taxon_mapping_and_popularity/CSV_base_table_creator.py | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/README.markdown b/README.markdown index ce4e3d3..e47eef5 100644 --- a/README.markdown +++ b/README.markdown @@ -48,7 +48,7 @@ You can check the most recent version of both the synthetic tree (`synth_id`) an [API](https://github.com/OpenTreeOfLife/germinator/wiki/Open-Tree-of-Life-Web-APIs) e.g. by running `curl -X POST https://api.opentreeoflife.org/v3/tree_of_life/about`. Later in the build, we use specific environment variables set to these version numbers. 
Assuming you are in a bash shell or similar, you can set them as follows: ``` -OT_VERSION=14_9 #or whatever your OpenTree version is +OT_VERSION=14.9 #or whatever your OpenTree version is OT_TAXONOMY_VERSION=3.6 OT_TAXONOMY_EXTRA=draft1 #optional - the draft for this version, e.g. `draft1` if the taxonomy_version is 3.6draft1 ``` diff --git a/data/OpenTree/README.markdown b/data/OpenTree/README.markdown index b094d95..16d5b6b 100755 --- a/data/OpenTree/README.markdown +++ b/data/OpenTree/README.markdown @@ -10,7 +10,7 @@ Files herein are .gitignored. To get the site working, this folder should contai Removing the `mrca***` labels can be done by using a simple regular expression substitution, as in the following perl command: ``` - # assumes you have defined OT_VERSION as an environment variable, e.g. > OT_VERSION=14_7 + # assumes you have defined OT_VERSION as an environment variable, e.g. > OT_VERSION=14.7 perl -pe 's/\)mrcaott\d+ott\d+/\)/g; s/[ _]+/_/g;' labelled_supertree_simplified_ottnames.tre > draftversion${OT_VERSION}.tre ``` diff --git a/oz_tree_build/taxon_mapping_and_popularity/CSV_base_table_creator.py b/oz_tree_build/taxon_mapping_and_popularity/CSV_base_table_creator.py index 48f9e10..fac5f3c 100755 --- a/oz_tree_build/taxon_mapping_and_popularity/CSV_base_table_creator.py +++ b/oz_tree_build/taxon_mapping_and_popularity/CSV_base_table_creator.py @@ -50,7 +50,7 @@ To test, try e.g. Usage: -OT_VERSION=9_1 +OT_VERSION=9.1 ServerScripts/TaxonMappingAndPopularity/CSV_base_table_creator.py \ ../static/FinalOutputs/Life_full_tree.phy data/OpenTree/ott/taxonomy.tsv \ data/EOL/identifiers.csv data/Wiki/wd_JSON/* data/Wiki/wp_SQL/* \ From b0644d9132a1c43f5ec8211b85bf4f41a7a4065a Mon Sep 17 00:00:00 2001 From: Jamie Lentin Date: Mon, 3 Nov 2025 11:06:08 +0000 Subject: [PATCH 3/7] oz_tree_build/README: Collect required setup Collect together the required setup steps from /README & data/README into oz_tree_build, so it's closer to a single runnable script. 
* Set environment variables from /README * Summarise/script download steps from data/README --- oz_tree_build/README.markdown | 58 +++++++++++++++++++++++++++-------- 1 file changed, 46 insertions(+), 12 deletions(-) diff --git a/oz_tree_build/README.markdown b/oz_tree_build/README.markdown index 7f6820b..18a7393 100755 --- a/oz_tree_build/README.markdown +++ b/oz_tree_build/README.markdown @@ -6,28 +6,62 @@ The instructions below are primarily intended for creating a full tree of all li The output files created by the tree building process (database files and files to feed to the js, and which can be loaded into the database and for the tree viewer) are saved in `output_files`. +## Environment -## Settings - -Assuming that the environment variables OT_VERSION and OT_TAXONOMY_VERSION have already -been set as described in the [main README file](../README.markdown), and the -appropriate data files downloaded as described [here](../data/README.markdown). -the instructions below only require the following environmental variable to be -set up: +The following environment variables should be set: ``` OZ_TREE=AllLife # a tree directory in data/OZTreeBuild OZ_DIR=../OZtree # the path to the OneZoom/OZtree github directory (here we assume the `tree-build` repo is a sibling to the `OZtree` repo) ``` -# Preliminaries +You also need to select the OpenTree version to build against. 
+You can discover the most recent version of both the synthetic tree (`synth_id`) and the taxonomy (`taxonomy_version`) via the +[API](https://github.com/OpenTreeOfLife/germinator/wiki/Open-Tree-of-Life-Web-APIs): + +```bash +$ curl -s -X POST https://api.opentreeoflife.org/v3/tree_of_life/about | grep -E '"synth_id"|"taxonomy_version"' + "synth_id": "opentree15.1", + "taxonomy_version": "3.7draft2" +``` + +You should then set these as environment variables: + +``` +OT_VERSION=15.1 #or whatever your OpenTree version is +OT_TAXONOMY_VERSION=3.7 +OT_TAXONOMY_EXTRA=draft2 #optional - the draft for this version, e.g. `draft1` if the taxonomy_version is 3.6draft1 +``` + +## Downloads + +Follow [the download instructions](../data/README.markdown) to fetch required files. In summary, this should entail: + +``` +## Open Tree of Life +wget -cP data/OpenTree/ "https://files.opentreeoflife.org/synthesis/opentree${OT_VERSION}/output/labelled_supertree/labelled_supertree_simplified_ottnames.tre" +wget -cP data/OpenTree/ "https://files.opentreeoflife.org/ott/ott${OT_TAXONOMY_VERSION}/ott${OT_TAXONOMY_VERSION}.tgz" + +## Wikimedia +wget -cP data/Wiki/wp_SQL/ https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-page.sql.gz +wget -cP data/Wiki/wd_JSON/ https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2 + +## Pre-processed PageViews - see https://github.com/OneZoom/tree-build/releases +wget -cP data/Wiki/wp_pagecounts/ https://github.com/OneZoom/tree-build/releases/download/pageviews-202306-202403/OneZoom_pageviews-202306-202403.tar.gz + +## EoL +# TODO: In theory fetchable from https://opendata.eol.org/dataset/identifier-map, but currently broken +cp provider_ids.csv.gz data/EOL/ + +``` -Follow [these instructions](../data/README.markdown) to download all required files. Note that as documented in that readme, -you will also need to create a `draftversionXXX.tre` file containing no `mrca` strings, e.g. 
-via the following in the OpenTree directory +Note that as documented in that readme, +you will also need to create a `draftversionXXX.tre` file containing no `mrca` strings: ``` -perl -pe 's/\)mrcaott\d+ott\d+/\)/g; s/[ _]+/_/g;' labelled_supertree_simplified_ottnames.tre > draftversion${OT_VERSION}.tre +perl -pe 's/\)mrcaott\d+ott\d+/\)/g; s/[ _]+/_/g;' \ + data/OpenTree/labelled_supertree_simplified_ottnames.tre \ + > data/OpenTree/draftversion${OT_VERSION}.tre ``` # Building a tree From f997b80daf2206315827682c9f995a90f041b629 Mon Sep 17 00:00:00 2001 From: Jamie Lentin Date: Mon, 3 Nov 2025 11:07:10 +0000 Subject: [PATCH 4/7] oz_tree_build/README: Reminder to activate venv Someone might have forgotten... --- oz_tree_build/README.markdown | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/oz_tree_build/README.markdown b/oz_tree_build/README.markdown index 18a7393..dc3660e 100755 --- a/oz_tree_build/README.markdown +++ b/oz_tree_build/README.markdown @@ -72,6 +72,13 @@ If you already have your own newick tree with open tree ids on it already, and d ## Create the tree +0. The following steps assume the venv has been activated: + + ``` + . .venv/bin/activate + ``` + + If not created, see installation steps in the [main README](../README.markdown). 1. (20 secs) Use the [OpenTree API](https://github.com/OpenTreeOfLife/germinator/wiki/Synthetic-tree-API-v3) to add OTT ids to any non-opentree taxa in our own bespoke phylogenies (those in `*.phy` or `*.PHY` files). The new `.phy` and `.PHY` files will be created in a new directory within `data/OZTreeBuild/${OZ_TREE}/BespokeTree`, and a symlink to that directory will be created called `include_files` From 9d0a40f0393018bf876abbea8051a4f1270484f2 Mon Sep 17 00:00:00 2001 From: Jamie Lentin Date: Mon, 3 Nov 2025 11:07:58 +0000 Subject: [PATCH 5/7] oz_tree_build/README: Create include_OTTxxx if not already present The script tries to empty it, if there's nothing there it will fall over. 
As a quick bodge, put something in it before deleting it. --- oz_tree_build/README.markdown | 2 ++ 1 file changed, 2 insertions(+) diff --git a/oz_tree_build/README.markdown b/oz_tree_build/README.markdown index dc3660e..ebc09da 100755 --- a/oz_tree_build/README.markdown +++ b/oz_tree_build/README.markdown @@ -83,6 +83,8 @@ If you already have your own newick tree with open tree ids on it already, and d 1. (20 secs) Use the [OpenTree API](https://github.com/OpenTreeOfLife/germinator/wiki/Synthetic-tree-API-v3) to add OTT ids to any non-opentree taxa in our own bespoke phylogenies (those in `*.phy` or `*.PHY` files). The new `.phy` and `.PHY` files will be created in a new directory within `data/OZTreeBuild/${OZ_TREE}/BespokeTree`, and a symlink to that directory will be created called `include_files` ``` + mkdir -p "data/OZTreeBuild/${OZ_TREE}/BespokeTree/include_OTT${OT_TAXONOMY_VERSION}${OT_TAXONOMY_EXTRA}" + touch "data/OZTreeBuild/${OZ_TREE}/BespokeTree/include_OTT${OT_TAXONOMY_VERSION}${OT_TAXONOMY_EXTRA}/dir" rm data/OZTreeBuild/${OZ_TREE}/BespokeTree/include_OTT${OT_TAXONOMY_VERSION}${OT_TAXONOMY_EXTRA}/* && \ add_ott_numbers_to_trees \ --savein data/OZTreeBuild/${OZ_TREE}/BespokeTree/include_OTT${OT_TAXONOMY_VERSION}${OT_TAXONOMY_EXTRA} \ From fb3093aeb69aa9320e3db1fee6b014a0a647c54a Mon Sep 17 00:00:00 2001 From: Jamie Lentin Date: Mon, 3 Nov 2025 11:08:41 +0000 Subject: [PATCH 6/7] oz_tree_build/README: Untar ott${OT_TAXONOMY_VERSION}.tgz There doesn't seem to be any explicit step to untar the taxonomy, so add one here. 
--- oz_tree_build/README.markdown | 1 + 1 file changed, 1 insertion(+) diff --git a/oz_tree_build/README.markdown b/oz_tree_build/README.markdown index ebc09da..317fde3 100755 --- a/oz_tree_build/README.markdown +++ b/oz_tree_build/README.markdown @@ -137,6 +137,7 @@ If you already have your own newick tree with open tree ids on it already, and d From the data folder, run the `generate_filtered_files` script: ``` + tar -C data/OpenTree -zxvf data/OpenTree/ott${OT_TAXONOMY_VERSION}.tgz (cd data && generate_filtered_files OZTreeBuild/AllLife/AllLife_full_tree.phy OpenTree/ott${OT_TAXONOMY_VERSION}/taxonomy.tsv EOL/provider_ids.csv.gz Wiki/wd_JSON/latest-all.json.bz2 Wiki/wp_SQL/enwiki-latest-page.sql.gz Wiki/wp_pagecounts/pageviews*.bz2) ``` From e380dd583d435cf072e68ac1f58a1457efc31973 Mon Sep 17 00:00:00 2001 From: Jamie Lentin Date: Mon, 3 Nov 2025 15:28:50 +0000 Subject: [PATCH 7/7] oz_tree_build/README: Need extracted pageviews If we just give it the tarball, then we try and read the tar as file contents. Which nearly works, bar the tarball noise at either end. 
--- oz_tree_build/README.markdown | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/oz_tree_build/README.markdown b/oz_tree_build/README.markdown index 317fde3..d3336f5 100755 --- a/oz_tree_build/README.markdown +++ b/oz_tree_build/README.markdown @@ -47,7 +47,7 @@ wget -cP data/Wiki/wp_SQL/ https://dumps.wikimedia.org/enwiki/latest/enwiki-late wget -cP data/Wiki/wd_JSON/ https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2 ## Pre-processed PageViews - see https://github.com/OneZoom/tree-build/releases -wget -cP data/Wiki/wp_pagecounts/ https://github.com/OneZoom/tree-build/releases/download/pageviews-202306-202403/OneZoom_pageviews-202306-202403.tar.gz +curl -L https://github.com/OneZoom/tree-build/releases/download/pageviews-202306-202403/OneZoom_pageviews-202306-202403.tar.gz | tar -zxC data/Wiki/wp_pagecounts/ ## EoL # TODO: In theory fetchable from https://opendata.eol.org/dataset/identifier-map, but currently broken
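
Note on the version variables set in PATCH 3/7: the three `OT_*` values could probably be derived from the two strings the `about` endpoint returns, rather than typed by hand. The following is only a sketch under assumptions that the series itself does not confirm: that `synth_id` always begins with the literal prefix `opentree`, and that any draft suffix in `taxonomy_version` starts at its first letter. The lower-case variable names `synth_id` and `taxonomy_version` are illustrative, and the sample values are the ones shown in the README hunk.

```bash
#!/usr/bin/env bash
# Sample values copied from the `about` output quoted in PATCH 3/7.
# In practice these would be extracted from the curl response.
synth_id="opentree15.1"
taxonomy_version="3.7draft2"

# Assumption: synth_id is "opentree" followed by the dotted version.
# Strip that prefix: opentree15.1 -> 15.1
OT_VERSION="${synth_id#opentree}"

# Assumption: the draft suffix begins at the first letter.
# Remove everything from the first letter onwards: 3.7draft2 -> 3.7
OT_TAXONOMY_VERSION="${taxonomy_version%%[A-Za-z]*}"

# Whatever remains after the numeric part is the optional extra: draft2
# (this is empty when taxonomy_version has no draft suffix).
OT_TAXONOMY_EXTRA="${taxonomy_version#"$OT_TAXONOMY_VERSION"}"

echo "OT_VERSION=${OT_VERSION}"
echo "OT_TAXONOMY_VERSION=${OT_TAXONOMY_VERSION}"
echo "OT_TAXONOMY_EXTRA=${OT_TAXONOMY_EXTRA}"
```

Because only shell parameter expansion is used, this needs no extra tools beyond bash; with dots in `OT_VERSION` (PATCH 2/7) the derived value can be used directly in the `opentree${OT_VERSION}` download URL.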