msnoshain/OilPrice-DataProcess

Oil Price Data Process (in Python)

The raw data is the daily oil price from 2017.10.24 to 2022.10.24, with values missing on some days. The steps below process the data:


1. 🔢 Interpolate data with cubic interpolation

Fill in the missing days by cubic interpolation, using the scipy.interpolate package. See code here. See output file here.
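
As a rough illustration of this step, the sketch below fills gaps with `scipy.interpolate.interp1d` and `kind="cubic"`. The function name `fill_missing_cubic`, the use of integer day offsets as the x-axis, and the sample prices are all assumptions made for the example; the repository's own script may be organized differently.

```python
import numpy as np
from scipy.interpolate import interp1d


def fill_missing_cubic(days, prices):
    """Fill NaN prices by cubic interpolation through the known points.

    `days` are numeric day offsets (an assumption for this sketch).
    Missing days must lie inside the range of known days, because
    interp1d does not extrapolate by default.
    """
    days = np.asarray(days, dtype=float)
    prices = np.asarray(prices, dtype=float)
    known = ~np.isnan(prices)
    # Build the cubic interpolant from the days that do have a price.
    interpolant = interp1d(days[known], prices[known], kind="cubic")
    filled = prices.copy()
    filled[~known] = interpolant(days[~known])
    return filled


# Made-up sample: days 2 and 5 have no recorded price.
days = np.arange(7)
prices = np.array([0.0, 1.0, np.nan, 27.0, 64.0, np.nan, 216.0])
filled = fill_missing_cubic(days, prices)
```

Known prices are passed through unchanged; only the gaps are replaced by interpolated values.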


2. 📰 Get titles and detail-page links of all the news from 2017.10.24 to 2022.10.24

Fetch the HTML pages from OilPrice with the requests package and parse them with BeautifulSoup 4. Use pandas to export .xls files. See code here. See output file here.
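
A minimal sketch of this step might look like the following. The helper names `fetch_html` and `parse_listing`, the default CSS selector, and the sample markup are hypothetical; the real listing pages will need a narrower selector matched to their actual structure, which is not described here.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download one listing page and return its HTML text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def parse_listing(html, selector="a[href]"):
    """Return one {"title", "link"} row per anchor matched by `selector`.

    The default selector matches every link; it is a placeholder, not
    the selector used by the repository.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for anchor in soup.select(selector):
        title = anchor.get_text(strip=True)
        if title:  # skip image-only or empty links
            rows.append({"title": title, "link": anchor["href"]})
    return rows


# Hypothetical markup standing in for a downloaded listing page.
sample_html = """
<div class="listing">
  <a href="/news/example-1.html">Example headline one</a>
  <a href="/news/example-2.html">Example headline two</a>
</div>
"""
news_table = pd.DataFrame(parse_listing(sample_html))
```

In a real run, `parse_listing(fetch_html(url))` would be called once per listing page and the rows concatenated. Exporting would then go through `DataFrame.to_excel`, which depends on an Excel writer engine being installed for the chosen file extension.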


3. 📓 Get contents of all the links

Fetch the content of every news article and save each one as a separate file in /News, named [index].txt. See code here. See output files here.
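
The file-naming part of this step can be sketched with the standard library alone. The function name `save_articles` and the choice to start the index at 0 are assumptions; downloading the article text is left out, and the demo writes into a temporary directory rather than the repository's /News folder.

```python
import tempfile
from pathlib import Path


def save_articles(texts, out_dir="News"):
    """Write each article to <out_dir>/<index>.txt and return the paths.

    Indexing from 0 is an assumption; the repository may start at 1.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for index, text in enumerate(texts):
        file_path = out_path / f"{index}.txt"
        file_path.write_text(text, encoding="utf-8")
        written.append(file_path)
    return written


# Demo with placeholder article bodies in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    paths = save_articles(
        ["first article text", "second article text"],
        Path(tmp) / "News",
    )
    names = [path.name for path in paths]
    first_text = paths[0].read_text(encoding="utf-8")
```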


4. 🧹 Clean the data

Clean the text: replace every character that is not a lowercase letter with a space, then split the text on spaces and trim the whitespace from each word. See code here.
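
The cleaning rule can be sketched with a single regular expression. Two details here are assumptions rather than something the description states: the text is lowercased first (otherwise capital letters would be replaced by spaces too), and empty strings left over from consecutive spaces are dropped. The name `clean_text` is also hypothetical.

```python
import re


def clean_text(text, lowercase=True):
    """Return the list of cleaned words in `text`."""
    if lowercase:
        # Assumption: lowercase first so capitalized words are kept.
        text = text.lower()
    # Replace every character that is not a lowercase letter with a space.
    spaced = re.sub(r"[^a-z]", " ", text)
    # Split on spaces and trim each piece.
    words = [word.strip() for word in spaced.split(" ")]
    # Assumption: discard the empty strings produced by repeated spaces.
    return [word for word in words if word]


cleaned = clean_text("Oil prices rose 3%, OPEC said.")
```

Digits and punctuation disappear entirely under this rule, so a figure such as "3%" leaves no token behind.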

5. 🧮 Word frequency calculation

Calculate the frequency of each word for each day. Export to a .csv file. See code here. See output file here.
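
One possible shape for this step is a table with one row per day and one column per word. The sketch below treats "frequency" as a raw count, which is an assumption, as are the function name `daily_word_counts`, the date strings, and the sample words.

```python
from collections import Counter

import pandas as pd


def daily_word_counts(words_by_day):
    """Build a day-by-word table of counts from {day: [word, ...]}."""
    counts = {day: Counter(words) for day, words in words_by_day.items()}
    table = pd.DataFrame.from_dict(counts, orient="index")
    # Words that never appear on a given day show up as NaN; store 0.
    return table.fillna(0).astype(int)


# Made-up cleaned words for two days.
words_by_day = {
    "2017-10-24": ["oil", "prices", "oil"],
    "2017-10-25": ["crude", "oil"],
}
frequency_table = daily_word_counts(words_by_day)

# With no path argument, to_csv returns the CSV text instead of writing it.
csv_text = frequency_table.to_csv()
```

Passing a file name to `to_csv` would write the same content to disk.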


6. ➡️ Sort columns by words

Sort the columns of the frequency table alphabetically by word. See code here. See output file here.
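
If the frequency table is held in a pandas DataFrame with one column per word, the sort can be sketched with `DataFrame.sort_index(axis=1)`. The function name and the sample table are hypothetical, and plain alphabetical order is assumed.

```python
import pandas as pd


def sort_columns_by_word(table):
    """Return a copy of `table` with its columns in alphabetical order."""
    return table.sort_index(axis=1)


# Made-up table whose columns are in order of first appearance.
unsorted_table = pd.DataFrame(
    {"oil": [2, 1], "crude": [0, 1], "barrel": [1, 0]},
    index=["2017-10-24", "2017-10-25"],
)
sorted_table = sort_columns_by_word(unsorted_table)
```

Only the column order changes; the rows and the counts stay as they were.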


About

Oil price data, Energy -> Oil-Prices news contents, and word-frequency statistics from 2017.10 to 2022.10
