diff --git a/.gitignore b/.gitignore
index 497e825..42d27dd 100644
--- a/.gitignore
+++ b/.gitignore
@@ -4,6 +4,3 @@
 .httr-oauth
 .DS_Store
 ramptools.Rproj
-*.html
-*.Rd
-docs
diff --git a/R/make_maps.R b/R/make_maps.R
new file mode 100644
index 0000000..e69de29
diff --git a/docs/404.html b/docs/404.html
new file mode 100644
index 0000000..1e9ceac
--- /dev/null
+++ b/docs/404.html
@@ -0,0 +1,115 @@
+YEAR: 2023
+COPYRIGHT HOLDER: ramptools authors
Copyright (c) 2023 ramptools authors
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
cloud.Rmd
The pipeline stores data in Google BigQuery under project uganda-malaria, dataset uga_facility_data.
Tables:

| Table | Description |
|---|---|
| raw_{frequency}_data | Append-only versioned raw DHIS pulls |
| raw_{frequency}_version_metadata | Provenance metadata per version |
| clean_{frequency}_data | Latest clean aggregated data (overwritten each run) |
| imputed_{frequency}_facility_data | Facility-level imputed data (overwritten each run) |

To initialize:
+
+library(ramptools)
+bq_init_tables(frequency = "both")

installation.Rmd
To install ramptools, you will need the devtools package.
+install.packages("devtools")
+devtools::install_github("dd-harp/ramptools")

Shared R package for the Uganda RAMP (Routine Assessment of Malaria Programs) project. Provides metadata tables, shapefiles, and database utilities used by the ETL pipeline (uga-etl-facility-data) and analytics repos (outbreak).
An R package providing utility functions and reference data for working with health information systems databases, particularly DHIS2 (District Health Information System 2) data from Uganda.
ramptools simplifies working with versioned health data by providing:
- get_latest_version() - Retrieve the most recent data version from a database
- get_version_metadata() - Access metadata for specific data versions
- get_data() - Query versioned data with filtering by ID variables and version
- get_db_diff() - Identify new or changed data compared to existing database records
- get_id_vars() / get_value_var() - Discover database schema information
- get_period_range() - Generate sequences of DHIS2-formatted periods (weekly/monthly)
- make_week_map() - Create mappings between ISO weeks and dates
- make_month_map() - Create mappings between months and dates
- get_output_dir() - Create versioned output directories with standardized naming (YYYY_MM_DD.VV)
- get_latest_output_date_index() - Find the latest version index for a given date
- make_human_readable() - Merge human-readable location and indicator names onto raw DHIS2 data

The package includes reference data for Uganda:
+# install.packages("devtools")
+devtools::install_github("dd-harp/ramptools")

| Object | Description |
|---|---|
| loc_table | Location hierarchy: 11,229 facilities and admin units with parent-child relationships (Uganda → Region → District → DLG → Subcounty → Facility) |
| health_facility_table | 8,676 health facilities with ownership, status, coordinates, facility type |
| indicator_table | ~100 DHIS2 data element mappings (DHIS ID → code_name, display_name, frequency, dhis_version) |
| district_pop | District-level population estimates |
| age_sex_table | Age-sex disaggregated indicator definitions |
| uga_district_shp | District-level shapefile |
| uga_subcounty_shp | Subcounty-level shapefile |
| uga_region_shp | Region-level shapefile |
| uga_water_shp | Water body geometries |

- get_data() - Read versioned data from a SQLite database
- get_db_diff() - Compare new pull against stored DB, return only new/changed rows
- get_latest_version() - Get latest version number from DB
- get_version_metadata() - Get metadata for a specific version
- get_id_vars() / get_value_var() - Introspect database schema
- make_human_readable() - Join DHIS IDs to human-readable names
- bq_connect() - Create a BigQuery connection
- bq_get_data() - Read raw data from BigQuery (with version/filter support)
- bq_get_clean_data() - Read clean aggregated data from BigQuery
- bq_get_db_diff() - Diff new data against BigQuery
- bq_get_latest_version() - Get latest version from BigQuery
- bq_append_raw_data() / bq_append_version_metadata() - Append to BigQuery
- bq_write_clean_data() / bq_write_imputed_data() - Overwrite clean outputs
- bq_init_tables() - Initialize BigQuery schema
- get_period_range() - Generate DHIS-formatted period vectors
- make_week_map() / make_month_map() - Date lookup tables for DHIS periods
- get_output_dir() - Create versioned output directories

The pipeline stores data in Google BigQuery under project uganda-malaria, dataset uga_facility_data. Tables:

| Table | Description |
|---|---|
| raw_{frequency}_data | Append-only versioned raw DHIS pulls |
| raw_{frequency}_version_metadata | Provenance metadata per version |
| clean_{frequency}_data | Latest clean aggregated data (overwritten each run) |
| imputed_{frequency}_facility_data | Facility-level imputed data (overwritten each run) |

To initialize:
+
+library(ramptools)
+bq_init_tables(frequency = "both")
+devtools::install_github("dd-harp/ramptools")
age_sex_table.Rd
Age-sex indicator table
+age_sex_table
An object of class data.table (inherits from data.frame) with 14 rows and 4 columns.

bq_append_raw_data.Rd
Append raw data to BigQuery
+bq_append_raw_data(dt, con = NULL, frequency = "monthly", chunk_size = 500000L)

bq_append_version_metadata.Rd
Write version metadata to BigQuery
+bq_append_version_metadata(version_df, con = NULL, frequency = "monthly")

bq_connect.Rd
Read/write functions for interacting with Google BigQuery as the primary data backend for DHIS2 facility data.
Create a BigQuery connection
+bq_connect(
+  project = "uganda-malaria",
+  dataset = "uga_facility_data",
+  billing = project
+)
A DBI connection object

bq_get_clean_data.Rd
Read clean aggregated data from BigQuery
+bq_get_clean_data(
+  con = NULL,
+  frequency = "monthly",
+  code_names = NULL,
+  levels = NULL
+)
data.table with clean aggregated data

bq_get_data.Rd
Retrieves the latest version of each observation, analogous to get_data for SQLite.
+bq_get_data(
+  con = NULL,
+  frequency = "monthly",
+  id_list = NULL,
+  version_id = NULL
+)
data.table with the requested data

bq_get_db_diff.Rd
Compares new_data against what is currently stored and returns only the diff (new or changed values).
+bq_get_db_diff(new_data, con = NULL, frequency = "monthly")
data.table of rows that are new or have changed values

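The diff semantics described for bq_get_db_diff (return only rows that are new or whose value changed) can be illustrated locally without a BigQuery connection. The following is a minimal sketch in base R, not the package's implementation: the function name db_diff_sketch, the ID columns location_id and period, and the value column name are hypothetical, and it assumes plain data.frame inputs where the stored table holds one non-missing value per ID combination.

```r
# Hypothetical illustration only; not the ramptools implementation.
db_diff_sketch <- function(new_data, stored, id_vars, value_var = "value") {
  # Keep just the ID columns and the value from the stored snapshot.
  stored <- stored[, c(id_vars, value_var), drop = FALSE]
  # Left-join stored values onto the new pull; unmatched rows get NA.
  merged <- merge(new_data, stored, by = id_vars, all.x = TRUE,
                  suffixes = c("", ".stored"))
  new_val <- merged[[value_var]]
  old_val <- merged[[paste0(value_var, ".stored")]]
  # New rows have no stored value; changed rows have a different one.
  keep <- is.na(old_val) | is.na(new_val) | old_val != new_val
  merged[keep, c(id_vars, value_var), drop = FALSE]
}

# Assumed example data: "a" is unchanged, "b" changed, "c" is new.
new_pull <- data.frame(location_id = c("a", "b", "c"),
                       period = "202301", value = c(1, 5, 7))
stored <- data.frame(location_id = c("a", "b"),
                     period = "202301", value = c(1, 4))
db_diff_sketch(new_pull, stored, id_vars = c("location_id", "period"))
```

Only the rows for "b" and "c" come back, which is the set that would need to be appended as a new version.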
bq_get_imputed_sample.Rd
Returns a random sample of rows from the imputed facility data table. Useful for computing outlier summaries without reading the full table.
+bq_get_imputed_sample(con = NULL, frequency = "monthly", n = 500000L)
data.table with sampled imputed data

bq_get_latest_version.Rd
Get the latest version number from BigQuery
+bq_get_latest_version(con = NULL, frequency = "monthly")
Integer value of the latest version

bq_get_outbreak_data.Rd
Read pre-computed outbreak index data from BigQuery
+bq_get_outbreak_data(con = NULL)
data.table with outbreak index data

bq_get_version_metadata.Rd
Get version metadata from BigQuery
+bq_get_version_metadata(con = NULL, frequency = "monthly", version_id = NULL)
data.table with version metadata

bq_init_tables.Rd
Creates the required tables in BigQuery if they don't exist. Run once during initial setup.
+bq_init_tables(con = NULL, frequency = "both")

bq_write_clean_data.Rd
Overwrites the clean data table with the latest processed output.
+bq_write_clean_data(
+  dt,
+  con = NULL,
+  frequency = "monthly",
+  chunk_size = 500000L,
+  append_mode = FALSE
+)

bq_write_imputed_data.Rd
Write imputed facility data to BigQuery (full replace)
+bq_write_imputed_data(
+  dt,
+  con = NULL,
+  frequency = "monthly",
+  chunk_size = 500000L,
+  append_mode = FALSE
+)

bq_write_outbreak_data.Rd
Overwrites the outbreak_data table with the latest computed outbreak indices. Designed to be called after clean/aggregate to keep the Shiny app read-only.
+bq_write_outbreak_data(dt, con = NULL)

district_pop.Rd
District population data
+district_pop
An object of class data.table (inherits from data.frame) with 146 rows and 2 columns.

get_data.Rd
Get the data from a database, with the option to subset to a specified version and ID values
+get_data(db_path, id_list = NULL, version_id = NULL)

get_db_diff.Rd
Get the rows of new data that differ from what is present in the database
+get_db_diff(new_data, db_path)
data.frame with the rows of new data that differ from what is present in the database

get_id_vars.Rd
Get the names of the variables that uniquely identify an observation
+get_id_vars(db_path)
Vector of column names that uniquely identify an observation

get_latest_output_date_index.Rd
Directories are assumed to be named in YYYY_MM_DD.VV format with sane year/month/date/version values.
+get_latest_output_date_index(dir, date)
Largest version in the directory tree, or 0 if there are no versions or the directory tree does not exist

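The YYYY_MM_DD.VV naming scheme lends itself to a small base R illustration. This is a sketch under stated assumptions, not the package's code: latest_index_sketch and next_output_dir_sketch are hypothetical names, only the immediate children of the root are scanned, and the first version for a date is assumed to be 01.

```r
# Hypothetical helpers illustrating YYYY_MM_DD.VV directory versioning.
latest_index_sketch <- function(root, date) {
  # A missing directory tree counts as "no versions yet".
  if (!dir.exists(root)) return(0L)
  prefix <- format(as.Date(date), "%Y_%m_%d")
  dirs <- list.dirs(root, full.names = FALSE, recursive = FALSE)
  # Keep only names of the form <prefix>.VV (two-digit version).
  hits <- dirs[grepl(paste0("^", prefix, "\\.[0-9]{2}$"), dirs)]
  if (length(hits) == 0) return(0L)
  max(as.integer(sub("^.*\\.", "", hits)))
}

next_output_dir_sketch <- function(root, date) {
  # Assumption: each call claims the next unused version for that date.
  idx <- latest_index_sketch(root, date) + 1L
  prefix <- format(as.Date(date), "%Y_%m_%d")
  path <- file.path(root, sprintf("%s.%02d", prefix, idx))
  dir.create(path, recursive = TRUE)
  path
}

# Example run inside a throwaway directory.
root <- file.path(tempdir(), "ramp_output_sketch")
unlink(root, recursive = TRUE)
first <- next_output_dir_sketch(root, "2023-05-01")
second <- next_output_dir_sketch(root, "2023-05-01")
```

Whether the real get_output_dir always increments or reuses an existing directory is not stated in these docs, so the incrementing behavior here is an assumption.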
get_output_dir.Rd
Returns an appropriate path to save results in, creating it if necessary.
+get_output_dir(root, date)

get_period_range.Rd
Get a vector of periods, following DHIS formatting, that spans the start and end periods provided
+get_period_range(frequency, year_start, sub_year_start, year_end, sub_year_end)
A vector of periods following the DHIS formatting standard of either YYYYW{week number with no leading zero} or YYYY{month number with a leading zero for single digit integers}

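The two period formats described above can be reproduced with base R. The sketch below is illustrative only and is not the package's get_period_range: dhis_periods_sketch is a hypothetical name, and the weekly branch assumes ISO 8601 week numbering, which the docs imply but do not state outright.

```r
# Hypothetical sketch of DHIS-style period strings; not the ramptools code.
dhis_periods_sketch <- function(frequency, year_start, sub_start,
                                year_end, sub_end) {
  if (frequency == "monthly") {
    # A continuous month index lets year boundaries roll over cleanly.
    idx <- seq(year_start * 12 + (sub_start - 1),
               year_end * 12 + (sub_end - 1))
    # YYYY followed by a zero-padded month, e.g. 202301.
    return(sprintf("%d%02d", idx %/% 12, idx %% 12 + 1))
  }
  if (frequency == "weekly") {
    # January 4th always falls in ISO week 1, so step back to its Monday.
    iso_monday <- function(year, week) {
      jan4 <- as.Date(sprintf("%d-01-04", year))
      jan4 - (as.integer(format(jan4, "%u")) - 1L) + 7 * (week - 1)
    }
    mondays <- seq(iso_monday(year_start, sub_start),
                   iso_monday(year_end, sub_end), by = "week")
    # YYYYW followed by the week number without a leading zero.
    return(paste0(format(mondays, "%G"), "W",
                  as.integer(format(mondays, "%V"))))
  }
  stop("frequency must be 'monthly' or 'weekly'")
}

monthly <- dhis_periods_sketch("monthly", 2023, 11, 2024, 2)
weekly <- dhis_periods_sketch("weekly", 2020, 52, 2021, 2)
```

Stepping through actual Mondays rather than counting 1 to 52 handles years such as 2020 that contain 53 ISO weeks.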
get_version_metadata.Rd
Get metadata for a specific version
+get_version_metadata(db_path, version_id = NULL)
Integer value of latest version

health_facility_table.Rd
A comprehensive list of health facilities and administrative hierarchies, including ownership, operational status, and geographic coordinates.
+health_facility_table
A data frame with 8676 rows and 22 columns:
Name of Health facility
DHIS2 assigned ID for health facility
DHIS2 assigned ID of parent health facility (subcounty)
DHIS2 assigned hierarchical level
The governing body or authority responsible for the facility
The original classification level of the health facility
The medical bureau affiliation (e.g., UPMB, UCMB)
Current status of the facility (e.g., Functional, Non-Functional)
Type of ownership (e.g., Public, Private, PNFP)
Indicator for private sector classification (PFP, PNFP)
Indicator for public sector classification (e.g., MOH, BOU, Local Government)
Current status of data reporting compliance
String of DHIS2 IDs from current health facility to the top-level parent
Name of region associated with health facility
Name of district associated with health facility
Name of District Local Government (DLG) associated with health facility
Name of subcounty associated with health facility
Geographic coordinate: North-South position
Geographic coordinate: East-West position
Indicator if geometry was added through external web-scraping and not present in DHIS2
Secondary facility level classification (clinic, hospital, drugshop, other)
Secondary facility label created to match other surveys on health seeking (private, gov_HC, private_hosp, drugshop, gov_hosp, other)
Metadata
Data Objects

- Location hierarchy table
- Health Facility and Administrative Units Table
- DHIS indicator table
- District population data
- Age-sex indicator table

Data Utilities
Utilities to Work with Metadata

- Get the latest index given an output dir and a date
- Get output directory for results to save in
- Get a vector of periods, following DHIS formatting, that spans the start and end periods provided
- Make a table that provides the date information associated with a DHIS period
- Make a table that provides the date information associated with a DHIS period

Shapefiles
GIS Shapefiles

- Subcounty shapefile version 2.0
- District shapefile version 1.0
- Region shapefile version 1.0
- Water shapefile version 1.0

SQLite Utilities
Data Objects

- Get the data from a database with the option to subset to a specified version and id values
- Get the data from new data that differs from what is present in the database
- Get the latest version number from the database
- Get metadata for a specific version
- Get the names of the variables that uniquely identify an observation
- Get the name of the value column
- Merge on human readable columns to raw data

Cloud Utilities
Utilities to Access the Data Warehouse

- BigQuery Utilities for RAMP Data Pipeline
- Get the latest version number from BigQuery
- Get version metadata from BigQuery
- Read raw DHIS data from BigQuery
- Get new/changed rows not yet in BigQuery
- Read clean aggregated data from BigQuery
- Sample imputed facility data from BigQuery
- Append raw data to BigQuery
- Write version metadata to BigQuery
- Write clean aggregated data to BigQuery (full replace)
- Write imputed facility data to BigQuery (full replace)
- Initialize BigQuery tables for the DHIS data pipeline
- Write pre-computed outbreak index data to BigQuery
- Read pre-computed outbreak index data from BigQuery
- ramptools: RAMP Uganda Malaria Data Tools

indicator_table.Rd
DHIS indicator table
+indicator_table
An object of class data.table (inherits from data.frame) with 118 rows and 6 columns.

loc_table.Rd
A list of all health facilities and administrative units, and their parent-child relationships
+loc_table
A data frame with 11219 rows and 9 columns:
Name of Geographic Entity
DHIS2 assigned ID for geographic entity
DHIS2 assigned ID of parent geographic entity
DHIS2 assigned hierarchical level
String of DHIS2 IDs from current geographic entity to Uganda as a whole
Name of region associated with geographic entity. Blank if entity is a region
Name of district associated with geographic entity. Blank if entity is a district
Name of DLG (District Local Government) associated with geographic entity. Blank if entity is a DLG
Name of subcounty associated with geographic entity. Blank if entity is a subcounty

make_month_map.Rd
Make a table that provides the date information associated with a DHIS period
+make_month_map(min_date = "2013-01-01")
A table with the period in DHIS format, year, month, date_start, date_mid, and date_end

make_week_map.Rd
Make a table that provides the date information associated with a DHIS period
+make_week_map(min_date = "2012-12-31")
A table with the period in DHIS format, year, week, date_start, date_mid, and date_end

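A lookup table with the columns listed above can be approximated in base R. This is a hedged sketch rather than the package's make_week_map: week_map_sketch and its max_date argument are hypothetical, weeks are assumed to follow ISO 8601 (Monday start), and date_mid is assumed to be the Thursday of each week.

```r
# Hypothetical sketch of a weekly DHIS period lookup table.
week_map_sketch <- function(min_date = "2012-12-31", max_date = Sys.Date()) {
  start <- as.Date(min_date)
  # Snap back to the Monday that opens the ISO week containing min_date.
  start <- start - (as.integer(format(start, "%u")) - 1L)
  date_start <- seq(start, as.Date(max_date), by = "week")
  iso_year <- as.integer(format(date_start, "%G"))
  iso_week <- as.integer(format(date_start, "%V"))
  data.frame(
    period = paste0(iso_year, "W", iso_week),  # e.g. 2013W1
    year = iso_year,
    week = iso_week,
    date_start = date_start,
    date_mid = date_start + 3,  # assumed midpoint: Thursday
    date_end = date_start + 6,  # Sunday
    stringsAsFactors = FALSE
  )
}

week_map <- week_map_sketch("2012-12-31", "2013-01-20")
```

The documented default of 2012-12-31 is a Monday that opens ISO week 1 of 2013, so the first row of the sketch is period 2013W1.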
ramptools-package.Rd
Shared metadata tables and utility functions for the Uganda RAMP (Routine Assessment of Malaria Programs) project. Provides location hierarchies, indicator mappings, shapefiles, and database read/write utilities for both local SQLite and Google BigQuery backends.
+Useful links:

uga_district_shp.Rd
District shapefile version 1.0
+uga_district_shp
An object of class sf (inherits from tbl_df, tbl, data.frame) with 146 rows and 3 columns.

uga_region_shp.Rd
Region shapefile version 1.0
+uga_region_shp
An object of class sf (inherits from tbl_df, tbl, data.frame) with 15 rows and 3 columns.

uga_subcounty_shp.Rd
An R-readable, valid shapefile with subcounty shapes compatible with administrative hierarchies in loc_table
+uga_subcounty_shp
A shapefile with 2204 rows and 8 variables
Name of Subcounty
DHIS2 assigned ID of Subcounty
geometry of subcounty (polygon or multipolygon)
indicator flag if the geometry is shared with another subcounty and cannot be resolved
indicator flag that geometry needs to be split, but has not been done
indicator flag if the subcounty shape is unknown and has therefore been assigned the district shape
Name of parent district of subcounty

uga_water_shp.Rd
Water shapefile version 1.0
+uga_water_shp
An object of class sf (inherits from tbl_df, tbl, data.frame) with 217 rows and 2 columns.

The pipeline stores data in Google BigQuery under project uganda-malaria, dataset uga_facility_data.
Tables:

| Table | Description |
|---|---|
| raw_{frequency}_data | Append-only versioned raw DHIS pulls |
| raw_{frequency}_version_metadata | Provenance metadata per version |
| clean_{frequency}_data | Latest clean aggregated data (overwritten each run) |
| imputed_{frequency}_facility_data | Facility-level imputed data (overwritten each run) |

To initialize:

To install ramptools, you will need the devtools package.