Binary file added .DS_Store
Binary file not shown.
3 changes: 3 additions & 0 deletions Gemfile
@@ -0,0 +1,3 @@
source 'https://rubygems.org'
gem 'mechanize', '~>2.7.2', '<2.7.3'
gem 'pry' # because you'll probably need it
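The two constraints on `mechanize` combine. `~> 2.7.2` (the pessimistic operator) means `>= 2.7.2, < 2.8`, and the extra `< 2.7.3` narrows that to 2.7.2 itself plus any 2.7.2.x builds. Below is a minimal sketch that uses RubyGems' own `Gem::Requirement` to check which versions the line admits. The version strings are arbitrary test inputs, and `allowed?` is a hypothetical helper name.

```ruby
require 'rubygems'

# Same constraints as the Gemfile line for mechanize.
MECHANIZE_REQ = Gem::Requirement.new('~> 2.7.2', '< 2.7.3')

# Returns true when +version+ (a String) satisfies both constraints.
def allowed?(version)
  MECHANIZE_REQ.satisfied_by?(Gem::Version.new(version))
end

%w[2.7.1 2.7.2 2.7.2.1 2.7.3 2.8.0].each do |v|
  puts "#{v}: #{allowed?(v)}"
end
```

If the intent is simply to lock to 2.7.2, `gem 'mechanize', '2.7.2'` would be a stricter and more explicit alternative.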
44 changes: 44 additions & 0 deletions Gemfile.lock
@@ -0,0 +1,44 @@
GEM
  remote: https://rubygems.org/
  specs:
    coderay (1.1.1)
    domain_name (0.5.20170404)
      unf (>= 0.0.5, < 1.0.0)
    http-cookie (1.0.3)
      domain_name (~> 0.5)
    mechanize (2.7.2)
      domain_name (~> 0.5, >= 0.5.1)
      http-cookie (~> 1.0.0)
      mime-types (~> 1.17, >= 1.17.2)
      net-http-digest_auth (~> 1.1, >= 1.1.1)
      net-http-persistent (~> 2.5, >= 2.5.2)
      nokogiri (~> 1.4)
      ntlm-http (~> 0.1, >= 0.1.1)
      webrobots (>= 0.0.9, < 0.2)
    method_source (0.8.2)
    mime-types (1.25.1)
    mini_portile2 (2.1.0)
    net-http-digest_auth (1.4.1)
    net-http-persistent (2.9.4)
    nokogiri (1.7.1)
      mini_portile2 (~> 2.1.0)
    ntlm-http (0.1.1)
    pry (0.10.4)
      coderay (~> 1.1.0)
      method_source (~> 0.8.1)
      slop (~> 3.4)
    slop (3.6.0)
    unf (0.1.4)
      unf_ext
    unf_ext (0.0.7.4)
    webrobots (0.1.2)

PLATFORMS
  ruby

DEPENDENCIES
  mechanize (~> 2.7.2, < 2.7.3)
  pry

BUNDLED WITH
   1.14.3
9 changes: 8 additions & 1 deletion README.md
@@ -1,4 +1,11 @@
# assignment_web_scraper
Bottom feeding your way across the web
Andrea A

[A Ruby web scraping and parsing project using Nokogiri, Mechanize and open-uri, from the Viking Code School.](http://www.vikingcodeschool.com)

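# Run the code

Install the gems with `bundle install`, then start `irb` (or `pry`) in the project directory and run the following. Results are appended to `csv_file.csv` and pretty-printed to the console.
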
```
load 'web_scraper.rb'
WebScraper.new.run
```
21 changes: 21 additions & 0 deletions csv_file.csv
@@ -0,0 +1,21 @@
Ruby on Rails Developer - Ruby Developer - RoR - Back End,Energon Global,London,http://uk.dice.com/IT-Job/Ruby-on-Rails-Developer-Ruby-Developer-RoR-Back-End/9964802/en/?xc=247&SearchTerms=Ruby&LocationSearchTerms=London&LocationId=2047895&Radius=10&source=search,12-Apr-2017,9964802
Ruby Developer - Ruby on Rails - £35-55k,Spectrum IT,London,http://uk.dice.com/IT-Job/Ruby-Developer-Ruby-on-Rails-3555k/9964999/en/?xc=247&SearchTerms=Ruby&LocationSearchTerms=London&LocationId=2047895&Radius=10&source=search,12-Apr-2017,9964999
"Head of Development (Java, AI, Python, Ruby, IOS - Development)",Ampersand Consulting,London,http://uk.dice.com/IT-Job/Head-of-Development-Java-AI-Python-Ruby-IOS-Development/9978119/en/?xc=247&SearchTerms=Ruby&LocationSearchTerms=London&LocationId=2047895&Radius=10&source=search,04-May-2017,9978119
Ruby Developer ( Mobile Cloud MySQL ),Client ServerClient Server,London London,http://uk.dice.com/IT-Job/Ruby-Developer-Mobile-Cloud-MySQL-/9973773/en/?xc=247&SearchTerms=Ruby&LocationSearchTerms=London&LocationId=2047895&Radius=10&source=search,27-Apr-201724-Apr-2017,9973773
Ruby Developer - Mobile Marketing,Client Server,London,http://uk.dice.com/IT-Job/Ruby-Developer-RoR-TDD-Web-CICD/9970528/en/?xc=247&SearchTerms=Ruby&LocationSearchTerms=London&LocationId=2047895&Radius=10&source=search,20-Apr-2017,9970528
"Ruby/Rails Developer - Linux, PostgreSQL, Ruby on Rails, GIT",TEKsystemsClient Server,Surrey London,http://uk.dice.com/IT-Job/Ruby-Developer-Mobile-Marketing/9969124/en/?xc=247&SearchTerms=Ruby&LocationSearchTerms=London&LocationId=2047895&Radius=10&source=search,06-May-201713-Apr-2017,9969124
Senior Ruby on Rails Developer - Circa £70k,Spectrum IT,London,http://uk.dice.com/IT-Job/RubyRails-Developer-Linux-PostgreSQL-Ruby-on-Rails-GIT/9962615/en/?xc=247&SearchTerms=Ruby&LocationSearchTerms=London&LocationId=2047895&Radius=10&source=search,03-May-2017,9962615
Ruby on Rails Developer - London - Circa £70k,Spectrum IT,London,http://uk.dice.com/IT-Job/Ruby-Developer-Senior-RoR-SQL-Linux/9965708/en/?xc=247&SearchTerms=Ruby&LocationSearchTerms=London&LocationId=2047895&Radius=10&source=search,26-Apr-2017,9965708
Ruby Automation Tester- Mobile App Automation Engineer- Calabash- London,Opus Recruitment,London,http://uk.dice.com/IT-Job/Senior-Ruby-on-Rails-Developer-Circa-70k/9977179/en/?xc=247&SearchTerms=Ruby&LocationSearchTerms=London&LocationId=2047895&Radius=10&source=search,07-May-2017,9977179
Senior Ruby on Rails Developer - London - Circa £70k,Spectrum IT,City of London,http://uk.dice.com/IT-Job/Ruby-on-Rails-Developer-London-Circa-70k/9973098/en/?xc=247&SearchTerms=Ruby&LocationSearchTerms=London&LocationId=2047895&Radius=10&source=search,18-Apr-2017,9973098
Software Developer - Ruby on Rails - £35-55k,Spectrum IT,London,http://uk.dice.com/IT-Job/Ruby-Automation-Tester-Mobile-App-Automation-Engineer-Calabash-London/9972324/en/?xc=247&SearchTerms=Ruby&LocationSearchTerms=London&LocationId=2047895&Radius=10&source=search,10-Apr-2017,9972324
"Python Backend Engineer (Engineer, Ruby, Python)",Winton,"London, UK",http://uk.dice.com/IT-Job/Senior-Ruby-on-Rails-Developer-London-Circa-70k/9967174/en/?xc=247&SearchTerms=Ruby&LocationSearchTerms=London&LocationId=2047895&Radius=10&source=search,07-May-2017,9967174
Web Developer - Ruby on Rails - £35-55k,Spectrum IT,London,http://uk.dice.com/IT-Job/Software-Developer-Ruby-on-Rails-3555k/9963426/en/?xc=247&SearchTerms=Ruby&LocationSearchTerms=London&LocationId=2047895&Radius=10&source=search,02-May-2017,9963426
Web Developer - Ruby on Rails - £35-55k,Spectrum IT,London,http://uk.dice.com/IT-Job/Python-Backend-Engineer-Engineer-Ruby-Python/9970455/en/?xc=247&SearchTerms=Ruby&LocationSearchTerms=London&LocationId=2047895&Radius=10&source=search,20-Apr-2017,9970455
Mid-Level Ruby Developer,Opus Recruitment,London,http://uk.dice.com/IT-Job/Web-Developer-Ruby-on-Rails-3555k/9975898/en/?xc=247&SearchTerms=Ruby&LocationSearchTerms=London&LocationId=2047895&Radius=10&source=search,07-May-2017,9975898
Ruby on Rails Developer - Guildford,PS Computer ServicesThornton GregoryOpus Recruitment,"Guildford, Surrey London London",http://uk.dice.com/IT-Job/Web-Developer-Ruby-on-Rails-3555k/9969147/en/?xc=247&SearchTerms=Ruby&LocationSearchTerms=London&LocationId=2047895&Radius=10&source=search,04-May-201702-May-201707-May-2017,9969147
Junior Developer Java JavaScript Ruby - FinTech,Client Server,London,http://uk.dice.com/IT-Job/MidLevel-Ruby-Developer/9972252/en/?xc=247&SearchTerms=Ruby&LocationSearchTerms=London&LocationId=2047895&Radius=10&source=search,12-Apr-2017,9972252
Ruby Developer: Creative Problem-Solver in Code - Elixir training prov,James Chase SolutionsBentley Whitaker,London Central London,http://uk.dice.com/IT-Job/Ruby-on-Rails-Developer-Guildford/9978013/en/?xc=247&SearchTerms=Ruby&LocationSearchTerms=London&LocationId=2047895&Radius=10&source=search,06-May-201707-Apr-2017,9978013
Full Stack Ruby on Rails Developer - Blockchain,Harvey Nash,City of London City of London City of London London,http://uk.dice.com/IT-Job/Ruby-on-Rails-Developer-Contract-London/9976052/en/?xc=247&SearchTerms=Ruby&LocationSearchTerms=London&LocationId=2047895&Radius=10&source=search,02-May-201702-May-201702-May-201702-May-2017,9976052
Lead Full Stack Ruby on Rails Developer-Blockchain,G2 Recruitment Solutions,City of London London,http://uk.dice.com/IT-Job/Automation-Test-Analyst-Selenium-Ruby-Calabash-Java-London-Up-to-70000/9972334/en/?xc=247&SearchTerms=Ruby&LocationSearchTerms=London&LocationId=2047895&Radius=10&source=search,24-Apr-201713-Apr-2017,9972334
Linux Engineer / DevOps Engineer,People Source Consulting,London,http://uk.dice.com/IT-Job/Junior-Developer-Java-JavaScript-Ruby-FinTech/9964836/en/?xc=247&SearchTerms=Ruby&LocationSearchTerms=London&LocationId=2047895&Radius=10&source=search,20-Apr-2017,9964836
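csv_file.csv has no header row; its columns follow the order of the `Job` struct in web_scraper.rb (title, company, location, link, post_date, job_id). Below is a minimal sketch for loading it back with Ruby's standard `csv` library, supplying those names as headers, and flagging rows where several postings were merged into one (for example a `post_date` of `27-Apr-201724-Apr-2017`). `HEADERS`, `DATE_PATTERN` and `merged_rows` are hypothetical names, and the two sample rows are taken from the file above with the links replaced by a placeholder.

```ruby
require 'csv'

# Column order mirrors Job = Struct.new(:title, :company, :location, :link, :post_date, :job_id)
HEADERS = %w[title company location link post_date job_id].freeze
# Matches one Dice-style date such as 20-Apr-2017.
DATE_PATTERN = /\d{2}-[A-Z][a-z]{2}-\d{4}/

# Returns the rows whose post_date holds more than one date, i.e. rows
# where the scraper glued several postings together.
def merged_rows(csv_text)
  CSV.parse(csv_text, headers: HEADERS).select do |row|
    row['post_date'].to_s.scan(DATE_PATTERN).size > 1
  end
end

sample = <<~ROWS
  Ruby Developer - Mobile Marketing,Client Server,London,link-omitted,20-Apr-2017,9970528
  Ruby Developer ( Mobile Cloud MySQL ),Client ServerClient Server,London London,link-omitted,27-Apr-201724-Apr-2017,9973773
ROWS

merged_rows(sample).each { |row| puts row['job_id'] } # prints 9973773
```

To check the real file, `merged_rows(File.read('csv_file.csv'))` would apply the same test to every row.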
14 changes: 14 additions & 0 deletions csv_writer.rb
@@ -0,0 +1,14 @@

require 'csv'

class CsvWriter

  # Appends one row per result to csv_file.csv. Append mode ('a') means
  # each run adds to the file instead of overwriting earlier scrapes.
  def create_file(results)
    CSV.open('csv_file.csv', 'a') do |csv|
      results.each do |item|
        csv << item.to_a
      end
    end
  end
end
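`CSV.open` with mode `'a'` appends, so each run adds rows to the end of csv_file.csv rather than replacing it. Because the writer never emits a header row, a reader has to know the column order in advance. One possible variation, sketched below, writes a header only when the file does not exist yet and converts each struct to an array explicitly. `append_rows` is a hypothetical helper, not part of this PR, and the `Job` struct is redeclared here only so the sketch runs on its own.

```ruby
require 'csv'
require 'tmpdir'

# Mirrors the Job struct declared in web_scraper.rb.
Job = Struct.new(:title, :company, :location, :link, :post_date, :job_id)

# Appends +jobs+ to +path+; writes a header row first if the file is new.
def append_rows(path, jobs)
  new_file = !File.exist?(path)
  CSV.open(path, 'a') do |csv|
    csv << Job.members.map(&:to_s) if new_file
    jobs.each { |job| csv << job.to_a }
  end
end

# Run twice against a throwaway file: one header, then two data rows.
lines = Dir.mktmpdir do |dir|
  path = File.join(dir, 'jobs.csv')
  job = Job.new('Ruby Developer', 'Spectrum IT', 'London', 'link', '12-Apr-2017', '9964999')
  append_rows(path, [job])
  append_rows(path, [job]) # second run: no second header
  File.readlines(path)
end

puts lines.size # prints 3
```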
70 changes: 70 additions & 0 deletions web_scraper.rb
@@ -0,0 +1,70 @@
# Web scraper that searches uk.dice.com job postings
require 'rubygems'
require 'bundler/setup'
require 'mechanize'
require 'csv'
require 'pp' # needed for `pp` on Ruby versions before 2.5
require_relative 'csv_writer'

Job = Struct.new(:title, :company, :location, :link, :post_date, :job_id)

class WebScraper

  attr_accessor :results, :scraper, :page, :csv_file

  def initialize
    # Instantiate a new Mechanize agent
    @scraper = Mechanize.new

    # Sleep 0.5 seconds after every page Mechanize adds to its history,
    # i.e. after every request. Set this before the first request so
    # that request is throttled too. Don't forget it!
    @scraper.history_added = Proc.new { sleep 0.5 }

    # Fetch and parse the start page in one step
    @page = @scraper.get('http://uk.dice.com/')
    @results = []
    @csv_file = CsvWriter.new
  end

  def run
    create_search
    extract_job_details
    @csv_file.create_file(@results)
    # Pretty-print the scraped jobs
    pp @results
  end

  def create_search
    dice_form = @page.form
    # Fill in the search terms
    dice_form.SearchTerms = "Ruby"
    dice_form.LocationSearchTerms = "London"
    dice_form.Radius = 10
    dice_form.checkbox_with(:name => 'JobTypeFilter_2').check
    button = dice_form.button_with(:value => "Search")
    # Submit the form and keep the results page
    @page = @scraper.submit(dice_form, button)
  end

  def extract_job_details
    links = @page.parser.css("h2.standardLink").children

    # Split the results text into one chunk per posting. Local variables
    # keep @page pointing at the Mechanize results page.
    text = @page.parser.css("div#SearchResults").text.strip
    postings = text.split("\"MESSAGE.ADVERT_SHORTLIST_COUNT_ALERT\" NOT FOUND\n\n\n\n\n\n\n\n\n\n\n")

    # The i-th text chunk is paired with the i-th link, which assumes the
    # split yields exactly one chunk per link. The merged fields and
    # mismatched links in csv_file.csv show this does not always hold.
    postings.each_with_index do |job, i|
      link = links[i]
      break if link.nil? # more chunks than links: stop instead of crashing

      listing = Job.new
      listing.title = job.scan(/\A(.*)\n\nSalary/).join.strip
      listing.company = job.scan(/Advertiser\n\n(.*)/).join.strip
      listing.location = job.scan(/Location:\n(.*)/).join.strip
      listing.link = link['href']
      listing.post_date = job.scan(/Last Updated Date\n\n(.*)/).join.strip
      listing.job_id = link['id'].to_s.scan(/TITLE\[([0-9]+)\]/).join.strip
      @results << listing
    end
  end
end
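`extract_job_details` pulls each field out of a posting's plain text with `String#scan` and joins the captures. The self-contained sketch below runs that step on a made-up chunk (the text layout is assumed from the regexes, not captured from Dice) and shows why a chunk holding two postings yields values like `Client ServerClient Server` or `27-Apr-201724-Apr-2017`: `scan` returns every match and `join` concatenates them. Taking only the first capture with `String#[]` is one way to keep a single value, though it would not fix the chunk/link pairing. `field_all` and `field_first` are hypothetical helper names.

```ruby
# Hypothetical chunk: two postings that the split failed to separate.
chunk = "Ruby Developer\n\nSalary\nAdvertiser\n\nClient Server\n" \
        "Location:\nLondon\nLast Updated Date\n\n27-Apr-2017\n" \
        "Advertiser\n\nClient Server\nLocation:\nLondon\n" \
        "Last Updated Date\n\n24-Apr-2017\n"

# Approach used in web_scraper.rb: every match is joined together.
def field_all(text, regex)
  text.scan(regex).join.strip
end

# Alternative: keep only the first capture group, or "" when absent.
def field_first(text, regex)
  text[regex, 1].to_s.strip
end

company = /Advertiser\n\n(.*)/
date    = /Last Updated Date\n\n(.*)/

puts field_all(chunk, company)   # prints "Client ServerClient Server"
puts field_first(chunk, company) # prints "Client Server"
puts field_all(chunk, date)      # prints "27-Apr-201724-Apr-2017"
puts field_first(chunk, date)    # prints "27-Apr-2017"
```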