Cloud Computing Search Engine - MiniGoogle

A Google-style web search engine built on Hadoop MapReduce and run on Amazon EC2, consisting of a crawler, an indexer, a PageRank engine, and a UI. Click Here to View Demo. Spring 2013

Skills

Language: Java
Web: HTML, CSS, Servlet, JSP, jQuery, AJAX
Cloud: Hadoop, MapReduce, Amazon EC2, Amazon EMR, FreePastry
Database: Amazon S3, Berkeley DB

Contribution

  1. Developed a scalable, Google-style crawler that distributed requests across multiple crawling peers over Pastry nodes.
  2. Developed a TF-IDF indexer for inverted index computation and a PageRank engine for link analysis based on MapReduce.
  3. Improved search relevance by weighting ten ranking parameters, tuned with AJAX feedback and an SVM classifier.
  4. Implemented fault tolerance through Berkeley DB revert, and integrated RESTful web services from the Yahoo, Amazon, YouTube, Yelp, Wiki, MaxMind, and eBay APIs.
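
To make the indexing and link-analysis steps above concrete, here is a minimal in-memory sketch of TF-IDF weighting over an inverted index and of PageRank by power iteration. It is not the project's code: the project ran these computations as MapReduce jobs, whereas this sketch uses plain Java collections. The class name `RankingSketch`, the method names, the `ln(N / df)` form of IDF, the damping factor of 0.85, and the uniform redistribution of rank from pages with no out-links are all assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and parameter choices are assumptions.
class RankingSketch {

    // Builds term -> (docId -> weight), where weight = tf(t, d) * ln(N / df(t)).
    // tf is the term count divided by the document length (one common variant).
    static Map<String, Map<Integer, Double>> tfIdfIndex(List<List<String>> docs) {
        // Inverted index of raw counts: term -> (docId -> occurrences).
        Map<String, Map<Integer, Integer>> counts = new HashMap<>();
        for (int d = 0; d < docs.size(); d++) {
            for (String term : docs.get(d)) {
                counts.computeIfAbsent(term, k -> new HashMap<>())
                      .merge(d, 1, Integer::sum);
            }
        }
        Map<String, Map<Integer, Double>> index = new HashMap<>();
        for (Map.Entry<String, Map<Integer, Integer>> e : counts.entrySet()) {
            // df is the number of documents that contain the term.
            double idf = Math.log((double) docs.size() / e.getValue().size());
            Map<Integer, Double> postings = new HashMap<>();
            for (Map.Entry<Integer, Integer> p : e.getValue().entrySet()) {
                double tf = (double) p.getValue() / docs.get(p.getKey()).size();
                postings.put(p.getKey(), tf * idf);
            }
            index.put(e.getKey(), postings);
        }
        return index;
    }

    // PageRank by power iteration; outLinks[i] lists the pages that page i links to.
    static double[] pageRank(int[][] outLinks, double damping, int iterations) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0; // rank held by pages with no out-links
            for (int i = 0; i < n; i++) {
                if (outLinks[i].length == 0) {
                    dangling += rank[i];
                    continue;
                }
                double share = rank[i] / outLinks[i].length;
                for (int j : outLinks[i]) {
                    next[j] += share;
                }
            }
            for (int j = 0; j < n; j++) {
                // Dangling rank is spread uniformly so the ranks keep summing to 1.
                next[j] = (1 - damping) / n + damping * (next[j] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("cloud", "search", "cloud"),
            Arrays.asList("search", "engine"));
        System.out.println(tfIdfIndex(docs));

        // Page 0 links to 1 and 2, page 1 links to 2, page 2 links to 0.
        int[][] graph = {{1, 2}, {2}, {0}};
        System.out.println(Arrays.toString(pageRank(graph, 0.85, 50)));
    }
}
```

In a MapReduce setting, the inner loop that hands each page's `share` to its out-links would correspond roughly to the map phase, and the per-page summation to the reduce phase, with one job per iteration.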

About

  • Course: CIS 555, Internet & Web Systems, Spring 2013, University of Pennsylvania
  • Teamwork: Yayang Tian, Michael Collis, Angela Wu, Krishna Choksi

Snapshots

index.html search.html video.html image.html