Chapter 1: Overview and Frequently Asked Questions

What is Behavioral Analytics Starter Kit?

Behavioral Analytics Starter Kit is a cloud lab for developers where they can learn how to create powerful behavioral analytics applications using Hadoop and Mahout, and deploy them on Amazon Cloud with Qubell Adaptive PaaS. This starter kit presents a complete example of a common application of behavioral analytics - a Product Recommendation engine for an eCommerce store.

This Starter Kit has everything one needs to:

  • Understand how a recommendation engine works
  • Walk through all aspects of designing a simple recommendation engine from scratch for a website that didn't have recommendations
  • Learn how to configure, deploy and monitor a Hadoop cluster
  • Experiment with the recommendation engine’s behavior by changing the default product catalog, the default transaction history or the default configuration of the analytical jobs
  • Explore how the system can be improved and extended by bringing your own catalog, integrating with a different web store, or writing completely different analytics algorithms

Why is it called Behavioral Analytics Starter Kit, not Recommendations Engine Starter Kit?

We hope to use this kit to demonstrate, in a very hands-on way, how to write, deploy and manage a wide class of analytics applications. For the purposes of demonstration we chose a particular application that is both common and not too complex to develop and execute. While much of our discussion focuses on how this specific recommendation engine works, the principles behind it apply to a wide range of analytics applications. We consider this kit a suitable starting point for many of them.

No prior experience with Hadoop or Qubell is required

This Starter Kit is designed to teach people how to write analytics applications, and it provides all the necessary materials. You don’t need to be proficient in Hadoop in order to start. The only core requirement is a strong working knowledge of Java.

What to bring with you

We will provide you with all the code, data, frameworks, tools and technologies needed to deploy, execute and modify all elements of the starter kit. All third-party software used in the kit, with the exception of the two services listed below, is available under an open source license and distributed with the kit. We will provide references to the external documentation for each tool used. You will need to bring two things with you:

  • An account on Qubell.com, which can be opened for free here
  • An account on Amazon EC2 and S3, which can be opened here

Who wrote this Starter Kit and why?

This Behavioral Analytics Starter Kit was developed by Grid Dynamics, in partnership with Qubell, to promote understanding of how to design, implement, deploy and support modern analytics applications, such as Product Recommendations.

How does this Kit work?

Behavioral Analytics Starter Kit has three main parts:

  1. A web store that sells consumer products over the Internet. It is written in Java and based on the open source Broadleaf framework. Out of the box, the Broadleaf framework doesn’t have product recommendations, so we added them as part of the check-out process. The product recommendation logic matches the items in the shopping cart with other products that are often bought together and suggests additional items if any are found. This web store operates on three data sets that are relevant to the recommendation system:
  • Product catalog: The catalog included with our starter kit is not native to Broadleaf. We took a sample product catalog distributed with another open source eCommerce platform called Magento and used it as a base because it contains enough items to make for a good example.
  • Transaction log: In the real world, the transaction log is recorded by the web store based on the actual purchases made by customers. The recommendation algorithm finds patterns in that transaction log and uses them to identify and recommend related products. Our sample store has no real users, so the transaction log is generated by a Hadoop job based on the product catalog and a set of configuration choices for the generator’s logic provided by the developer.
  • Product recommendations: These are generated by the Recommendation Engine based on the transaction log and a set of configuration choices for the recommendation algorithms provided by the developer.
  2. A Recommendation Engine that produces product recommendations for the online store based on its transaction log. This recommendation engine consists of a Hadoop cluster, a transaction log generator and a recommendation processor written with Apache Mahout.

  3. A deployment and configuration management automation platform based on Qubell Adaptive PaaS technology. This platform does the heavy lifting: it automatically deploys new instances of the web store to Amazon at the click of a button, provisions the recommendation engine (including Hadoop, HDFS and Mahout) on Amazon EC2 and S3, runs Hadoop jobs to generate new transaction logs and recommendations, loads new product recommendations into the web store, runs Ganglia’s monitoring agents, and executes all other automated administrative activities.
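
To make the "often bought together" idea from the web store description more concrete, here is a minimal single-process sketch in plain Java. It is not the kit's Mahout-based recommendation processor: the class name `CoPurchaseSketch`, its methods and the sample product names are all hypothetical, and the scoring (raw co-purchase counts) is simply the most basic technique that fits the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: not the kit's Mahout-based implementation.
final class CoPurchaseSketch {

    // Builds coCounts.get(a).get(b) = number of transactions that contain both a and b.
    static Map<String, Map<String, Integer>> countPairs(List<List<String>> transactions) {
        Map<String, Map<String, Integer>> coCounts = new HashMap<>();
        for (List<String> transaction : transactions) {
            // De-duplicate so a product bought twice in one order counts once.
            Set<String> items = new HashSet<>(transaction);
            for (String a : items) {
                for (String b : items) {
                    if (!a.equals(b)) {
                        coCounts.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
                    }
                }
            }
        }
        return coCounts;
    }

    // Scores each product not already in the cart by its summed co-purchase count with the cart items.
    static List<String> recommend(Map<String, Map<String, Integer>> coCounts,
                                  Collection<String> cart, int limit) {
        Map<String, Integer> scores = new HashMap<>();
        for (String item : cart) {
            Map<String, Integer> partners = coCounts.getOrDefault(item, new HashMap<>());
            for (Map.Entry<String, Integer> e : partners.entrySet()) {
                if (!cart.contains(e.getKey())) {
                    scores.merge(e.getKey(), e.getValue(), Integer::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        // Highest score first; ties are broken alphabetically so the output is deterministic.
        ranked.sort(Comparator.<String>comparingInt(p -> -scores.get(p))
                .thenComparing(Comparator.naturalOrder()));
        return new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size())));
    }

    public static void main(String[] args) {
        // Made-up transaction log, one inner list per order.
        List<List<String>> log = Arrays.asList(
                Arrays.asList("camera", "memory-card", "tripod"),
                Arrays.asList("camera", "memory-card"),
                Arrays.asList("laptop", "mouse"),
                Arrays.asList("camera", "tripod", "memory-card"));
        Map<String, Map<String, Integer>> counts = countPairs(log);
        System.out.println(recommend(counts, Arrays.asList("camera"), 2)); // prints [memory-card, tripod]
    }
}
```

A real transaction log is far too large for in-memory nested maps like these, which is one reason the kit runs this kind of analysis as Hadoop jobs instead.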

Workflow Diagram

Workflow Diagram

System Architecture

![System Architecture](/Images/cloud diagram.png)

Main Use Cases

Behavioral Analytics Starter Kit supports the following main use cases out of the box:

  • Deploy web store
  • See how recommendations work on the web store
  • Deploy recommendation engine
  • Review recommendation algorithm
  • Generate synthetic transaction log
  • Generate product recommendations based on transaction log
  • Push new recommendations to the web store
  • Use the Ganglia monitoring tool to monitor the cluster
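
As a rough picture of the "Generate synthetic transaction log" use case, the sketch below builds a small log in plain Java. In the kit this step is a Hadoop job with its own configuration; everything here (the class `SyntheticLogSketch`, the notion of "affinity groups", and the parameters `pInGroup`, `pNoise` and `seed`) is an assumption made for illustration, not the kit's actual generator logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only: the kit's generator is a Hadoop job with its own configuration.
final class SyntheticLogSketch {

    // Each transaction picks one affinity group (products assumed to sell together),
    // includes each member with probability pInGroup, and with probability pNoise
    // adds one random catalog item so the log is not perfectly clean.
    static List<List<String>> generate(List<String> catalog, List<List<String>> affinityGroups,
                                       int transactionCount, double pInGroup, double pNoise,
                                       long seed) {
        if (catalog.isEmpty() || affinityGroups.isEmpty()) {
            throw new IllegalArgumentException("catalog and affinityGroups must be non-empty");
        }
        if (pInGroup <= 0.0 && pNoise <= 0.0) {
            throw new IllegalArgumentException("at least one probability must be positive");
        }
        Random random = new Random(seed); // a fixed seed keeps runs reproducible
        List<List<String>> log = new ArrayList<>();
        while (log.size() < transactionCount) {
            List<String> group = affinityGroups.get(random.nextInt(affinityGroups.size()));
            List<String> transaction = new ArrayList<>();
            for (String product : group) {
                if (random.nextDouble() < pInGroup) {
                    transaction.add(product);
                }
            }
            if (random.nextDouble() < pNoise) {
                String extra = catalog.get(random.nextInt(catalog.size()));
                if (!transaction.contains(extra)) {
                    transaction.add(extra);
                }
            }
            if (!transaction.isEmpty()) {
                log.add(transaction); // empty baskets are skipped and retried
            }
        }
        return log;
    }

    public static void main(String[] args) {
        // Made-up catalog and groups, not the Magento sample catalog shipped with the kit.
        List<String> catalog = Arrays.asList("camera", "memory-card", "tripod", "laptop", "mouse");
        List<List<String>> groups = Arrays.asList(
                Arrays.asList("camera", "memory-card", "tripod"),
                Arrays.asList("laptop", "mouse"));
        for (List<String> transaction : generate(catalog, groups, 5, 0.8, 0.1, 42L)) {
            System.out.println(transaction);
        }
    }
}
```

Because the purchase patterns are planted by configuration, you know in advance which products belong together and can check whether the generated recommendations recover those groups.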

How can I go beyond the Starter Kit to write production-ready analytics?

Our starter kit is designed to give you a quick jump-start in developing your own analytics systems. Some common ways to extend this kit to cover your needs include:

  • Integrate this analytics backend with your own web store
  • Replace Amazon cloud with production infrastructure, including your own servers
  • Write different analytics applications on top of self-deployable Hadoop infrastructure

How can I get help taking this Starter Kit to production systems?

The Starter Kit is provided under the Apache 2.0 license. Although this kit comes without a support contract, you can report bugs, request feature enhancements and ask questions here, and the kit's development team will get back to you as soon as possible.

Next Chapter: Chapter 2- Getting Started