This repository was archived by the owner on Sep 14, 2023. It is now read-only.
Proof-of-concept of code to make the FracFocus chemical disclosures into a usable database.
gwallison/FF-POC
README for FF-POC repository and project
This CodeOcean capsule is a Proof of Concept version of code to transform
the online chemical disclosure site for hydraulic fracturing, FracFocus.org,
into a usable database. The code demonstrates cleaning, filtering, and
curating techniques to yield organized data sets and sample analyses
from a notoriously messy collection of chemical records.
The sample analyses are available in the results section as Jupyter notebooks,
and downloadable versions of the final data are also available there.
For most records, the mass of each chemical is calculated.
(The FracFocus data used here were downloaded on June 25, 2019.)
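The README does not give the mass formula. One plausible approach, sketched here under stated assumptions rather than taken from the capsule, scales everything from the reported base-water volume: convert gallons of water to pounds, divide by water's percent of the job to get the total job mass, then apply each chemical's percent of the job. The function name, its parameters, and the 8.34 lb/gal water density are assumptions for illustration.

```python
# Assumed density of fresh water, in pounds per US gallon.
WATER_LB_PER_GAL = 8.34


def chemical_masses(water_gal, water_pct, chem_pcts):
    """Estimate the mass (lb) of each chemical in one fracking event.

    water_gal: total base-water volume, in US gallons
    water_pct: water's share of the total job mass, in percent (e.g. 90.0)
    chem_pcts: mapping of CAS number -> percent of total job mass
    """
    if water_pct <= 0:
        # Without a positive water percentage the total mass cannot be derived.
        raise ValueError("water must be the carrier with a positive percentage")
    water_mass = water_gal * WATER_LB_PER_GAL
    total_mass = water_mass / (water_pct / 100.0)
    return {cas: total_mass * pct / 100.0 for cas, pct in chem_pcts.items()}


# Hypothetical event: 1,000,000 gal of water making up 90% of the job mass,
# with hydrochloric acid (CAS 7647-01-0) at 0.1% of the job.
masses = chemical_masses(1_000_000, 90.0, {"7647-01-0": 0.1})
print(round(masses["7647-01-0"]))  # about 9,267 lb
```

Under this approach, an event without a usable water percentage has no anchor for the total mass, which is one reason a water-carrier requirement makes sense.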
To be included in the final data sets:
- fracking events must use water as the carrier, and their percentages must be
consistent and within tolerance;
- chemicals must be identified by a match with an authoritative CAS number
or be labeled proprietary.
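The two inclusion rules might be applied roughly as follows. This is a minimal sketch under several assumptions: the water carrier is detected by water's CAS number (7732-18-5), "consistent and within tolerance" is read as the event's percentages summing to 100 within an assumed 5-point tolerance, and the authoritative CAS list and proprietary labels are small hypothetical stand-ins. The field names `cas` and `pct` are also hypothetical.

```python
# Hypothetical stand-in for an authoritative CAS reference list.
AUTHORITATIVE_CAS = {"7732-18-5", "7647-01-0", "14808-60-7"}  # water, HCl, quartz
PROPRIETARY_LABELS = {"proprietary", "trade secret", "confidential"}
WATER_CAS = "7732-18-5"
TOLERANCE = 5.0  # assumed allowed deviation of the percent total from 100


def event_passes(records, tol=TOLERANCE):
    """Keep an event only if water is present and its percentages total about 100."""
    has_water = any(r["cas"].strip() == WATER_CAS for r in records)
    total_pct = sum(r["pct"] for r in records)
    return has_water and abs(total_pct - 100.0) <= tol


def record_passes(rec):
    """Keep a record if its CAS number is authoritative or it is labeled proprietary."""
    cas = rec["cas"].strip()
    return cas in AUTHORITATIVE_CAS or cas.lower() in PROPRIETARY_LABELS


def filter_events(events):
    """Apply the event-level checks, then the record-level check within each event."""
    return {
        key: [r for r in records if record_passes(r)]
        for key, records in events.items()
        if event_passes(records)
    }


events = {
    "A": [{"cas": "7732-18-5", "pct": 90.0}, {"cas": "14808-60-7", "pct": 9.5},
          {"cas": "Proprietary", "pct": 0.4}, {"cas": "bogus", "pct": 0.1}],
    "B": [{"cas": "14808-60-7", "pct": 60.0}],  # no water and total far from 100
}
kept = filter_events(events)
# kept holds only event "A", minus its unmatched "bogus" record
```

Here the percent total is checked on all of an event's records before unmatched records are dropped, so an unidentified ingredient does not by itself push an otherwise consistent event out of tolerance.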
Further, portions of the raw data that are filtered out include:
- fracking events with no chemical records (mostly 2011-May 2013).
- fracking events with multiple entries (and no indication of which entries
are correct).
- chemical records that are identified as redundant within the event.
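A sketch of these three removals, assuming each event is a dict with hypothetical keys `api` (well identifier), `date` (job date), and `records`. "Multiple entries" is assumed to mean more than one disclosure for the same well and date, and a "redundant" record is assumed to be an exact repeat of the same CAS number and percentage; the capsule's actual criteria may differ.

```python
from collections import Counter


def drop_problem_data(events):
    """Remove empty events, ambiguous duplicate events, and redundant records."""
    # Count disclosures per (well, job date) across all events, including empty ones.
    counts = Counter((e["api"], e["date"]) for e in events)
    cleaned = []
    for e in events:
        # Events with no chemical records carry nothing to analyze.
        if not e["records"]:
            continue
        # More than one disclosure for the same job: no way to tell which is correct.
        if counts[(e["api"], e["date"])] > 1:
            continue
        # Keep only the first occurrence of each (CAS, percent) pair in the event.
        seen = set()
        unique = []
        for r in e["records"]:
            key = (r["cas"], r["pct"])
            if key not in seen:
                seen.add(key)
                unique.append(r)
        cleaned.append({**e, "records": unique})
    return cleaned


events = [
    {"api": "well-1", "date": "2018-05-01",
     "records": [{"cas": "7732-18-5", "pct": 90.0}, {"cas": "7732-18-5", "pct": 90.0}]},
    {"api": "well-2", "date": "2018-06-01", "records": []},
    {"api": "well-3", "date": "2018-07-01", "records": [{"cas": "7732-18-5", "pct": 88.0}]},
    {"api": "well-3", "date": "2018-07-01", "records": [{"cas": "7732-18-5", "pct": 91.0}]},
]
cleaned = drop_problem_data(events)
# Only well-1 survives, with its repeated water record collapsed to one.
```

Both copies of a duplicated job are discarded rather than picking one, matching the README's point that there is no indication which entry is correct.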
Finally, I clean up some of the labeling fields by consolidating multiple
versions of a single category into an easily searchable name. For instance,
I collapse the 80+ versions of the supplier 'Halliburton' to a single name.
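One way to do this kind of consolidation, sketched here as an assumption rather than the capsule's actual method, is to normalize each raw name (lowercase, punctuation turned into spaces, whitespace collapsed) and map it through a hand-curated prefix table. The table entries and function names are hypothetical.

```python
import re

# Hypothetical curated table: normalized prefix -> canonical supplier name.
CANONICAL = {
    "halliburton": "Halliburton",
    "schlumberger": "Schlumberger",
}


def normalize(raw):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", raw.lower())
    return " ".join(text.split())


def canonical_supplier(raw):
    """Return the canonical name if the normalized string starts with a known prefix."""
    norm = normalize(raw)
    for prefix, name in CANONICAL.items():
        if norm.startswith(prefix):
            return name
    return raw.strip()  # unmatched names are left as-is for manual review


for raw in ["HALLIBURTON", "Halliburton Energy Services, Inc.", " halliburton  "]:
    print(canonical_supplier(raw))  # prints "Halliburton" each time
```

A prefix table catches case, punctuation, and suffix variants, but misspellings such as "Haliburton" would need their own entries.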
By removing or cleaning the difficult data from this unique data source,
I produce a data set that should facilitate more in-depth
analyses of chemical use in the fracking industry.
****** Version explanation ******
Version 5: adjusted the formatting in a number of the figures in the Jupyter
notebooks to better display x and y axes and, especially, improve the
log-based displays. Some text in those notebooks was changed to reflect
the changes in the figures.
Version 4: added a Jupyter notebook that finds the overlap between the
filtered FF dataset and the TEDX endocrine disruptor list. The resulting
list is deposited in the results section, along with an HTML version of the notebook.
Version 3: corrected a mislabeled figure (the first one) in the
Summary_of_cleaned_FF_data notebook.