Lead Scoring using Regression with Random Forest

Author : Rajdeep Dua
Last Updated : Sep 28 2016

Introduction

In this example we will learn how to use a Random Forest based regression algorithm for Lead Scoring with PredictionIO.

Template Code : https://github.com/rajdeepd/template-scala-parallel-leadscoring

Download the Template Code

The Lead Scoring template has to be downloaded into the PredictionIO home directory, /home/ubuntu/work/PredictionIO-0.12.0-incubating/

cd /home/ubuntu/work/PredictionIO-0.12.0-incubating
git clone https://github.com/rajdeepd/template-scala-parallel-leadscoring

Event Server

Go to the home directory of the PredictionIO distribution, /home/ubuntu/work/PredictionIO-0.12.0-incubating/

  1. Start the Event Server

    ./bin/pio eventserver &
    

    You can check the status of the Event Server by going to the following URL: http://localhost:7070

    ../_images/pio-eventserver-status.png
  2. Create a new App and give it an access key. You can generate your own access key at https://www.uuidgenerator.net/version4

    cd template-scala-parallel-leadscoring
    ../bin/pio app new lead-scoring --access-key b3b06bc9-edbb-4c4d-9adb-a69dcccb4327
    

    Note that the app name is lead-scoring.

    Output

    [INFO] [App$] Initialized Event Store for this app ID: 7.
    [INFO] [Pio$] Created a new app:
    [INFO] [Pio$]       Name: lead-scoring
    [INFO] [Pio$]         ID: 7
    [INFO] [Pio$] Access Key: b3b06bc9-edbb-4c4d-9adb-a69dcccb4327
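
An access key in the same UUID version 4 format can also be generated locally instead of on a website. The following is a minimal sketch using Python's standard uuid module; the function name new_access_key is only illustrative and is not part of PredictionIO.

```python
import uuid


def new_access_key() -> str:
    """Return a random UUID (version 4) as a string, suitable for --access-key."""
    return str(uuid.uuid4())


if __name__ == "__main__":
    # Prints a fresh key each time it is run.
    print(new_access_key())
```

The printed value can be passed to the --access-key option shown above.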
    
  3. Export the Access Key into an Environment Variable

    export ACCESS_KEY=b3b06bc9-edbb-4c4d-9adb-a69dcccb4327
    
  4. Data used for Lead Scoring

    We use the following dataset from the data folder:

  • 10 Users
  • 50 Items
  • 20 Pages
  • 10 Referral IDs

View Data:

User, lands on page, referrer, browser
u4,example.com/page13,referrer3.com, Safari
u5,example.com/page13,referrer5.com, Internet Explorer
u8,example.com/page14,referrer9.com, Safari
u1,example.com/page18,referrer6.com, Firefox
u4,example.com/page17,referrer8.com, Internet Explorer
u3,example.com/page20,referrer1.com, Chrome
u10,example.com/page14,referrer6.com, Safari

Buy Data:

User, buys items
u4,i13
u4,i12
u4,i50
u3,i20
u3,i18
u10,i29
u10,i15
u10,i4
u1,i44
u1,i3
u1,i31
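
To make the shape of the two data sets concrete, the sketch below parses a few of the sample rows listed above and counts purchases per user. It only illustrates the record layout: the helper names parse_views and buys_per_user are hypothetical, and the way the template itself turns views and buys into training labels is not shown here.

```python
import csv
from collections import Counter
from io import StringIO

# A few rows copied from the View Data and Buy Data samples above.
VIEW_ROWS = """u4,example.com/page13,referrer3.com, Safari
u5,example.com/page13,referrer5.com, Internet Explorer
u8,example.com/page14,referrer9.com, Safari
u1,example.com/page18,referrer6.com, Firefox
"""
BUY_ROWS = """u4,i13
u4,i12
u4,i50
u3,i20
u1,i44
"""


def parse_views(text):
    """Yield (user, page, referrer, browser) tuples with whitespace stripped."""
    for row in csv.reader(StringIO(text)):
        if row:
            yield tuple(field.strip() for field in row)


def buys_per_user(text):
    """Count how many items each user bought."""
    return Counter(row[0].strip() for row in csv.reader(StringIO(text)) if row)


if __name__ == "__main__":
    counts = buys_per_user(BUY_ROWS)
    for user, page, referrer, browser in parse_views(VIEW_ROWS):
        print(user, page, referrer, browser, "buys:", counts.get(user, 0))
```

Note that the browser column carries a leading space in the sample rows, which is why each field is stripped.
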

  5. Import Data into Event Server

    python data/import_eventserver.py --access_key $ACCESS_KEY
    

    If the data is imported successfully, the output will be similar to the listing below:

    Namespace(access_key='b3b06bc9-edbb-4c4d-9adb-a69dcccb4327', url='http://localhost:7070')
    
    Importing data...
    
    http://localhost:7070/events.json?accessKey=b3b06bc9-edbb-4c4d-9adb-a69dcccb4327&limit=-1
    153 events are imported.
    
  6. View the events in the browser at the URL http://localhost:7070/events.json?accessKey=b3b06bc9-edbb-4c4d-9adb-a69dcccb4327&limit=-1

Events are inserted in the following format:

{
  "eventId": "12dd84702e194222ab3fc4290d2a09de",
  "event": "$set",
  "entityType": "user",
  "entityId": "0",
  "properties": {
    "attr2": 12,
    "plan": 0,
    "attr0": 51,
    "attr1": 35
  },
  "eventTime": "2016-11-28T11:49:02.927Z",
  "creationTime": "2016-11-28T11:49:03.452Z"
}
../_images/pio-eventserver-lead-scoring.png
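
The same events.json URL can be queried from a script instead of the browser. The sketch below uses only the Python standard library and assumes that the Event Server is running on localhost:7070 and that the endpoint returns a JSON array of event objects like the one shown above. The function names events_url and fetch_events are illustrative, not part of PredictionIO.

```python
import json
import os
from urllib.parse import urlencode
from urllib.request import urlopen


def events_url(access_key, base_url="http://localhost:7070", limit=-1):
    """Build the events.json URL used in this tutorial."""
    query = urlencode({"accessKey": access_key, "limit": limit})
    return f"{base_url}/events.json?{query}"


def fetch_events(access_key, base_url="http://localhost:7070", limit=-1):
    """Fetch events from a running Event Server and decode the JSON body."""
    with urlopen(events_url(access_key, base_url, limit)) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Needs a running Event Server and the ACCESS_KEY variable exported earlier.
    events = fetch_events(os.environ["ACCESS_KEY"])
    print(len(events), "events returned")
```

Passing limit=-1, as in the URL above, asks for all events rather than a limited number of them.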

Lead Scoring Regression Engine

  1. Build the Engine

    ../bin/pio build --verbose
    
    ...
    
    [INFO] [Console$] Your engine is ready for training.
    
  2. Train the Engine

    ../bin/pio train
    

    The output will be similar to the listing below:

    [INFO] [Engine$] EngineWorkflow.train completed
    [INFO] [Engine] engineInstanceId=ee64cab3-38fa-4225-b8e7-8bfccc6b5b6f
    [INFO] [CoreWorkflow$] Inserting persistent model
    [INFO] [CoreWorkflow$] Updating engine instance
    [INFO] [CoreWorkflow$] Training completed successfully.
    
  3. Start the Engine

    ../bin/pio deploy
    

    The output will be similar to the listing below. In our case the engine is listening at http://0.0.0.0:8000

    [INFO] [HttpListener] Bound to /0.0.0.0:8000
    [INFO] [MasterActor] Engine is deployed and running.
    Engine API is live at http://0.0.0.0:8000.
    

    Browse to the link http://0.0.0.0:8000 to see the Engine output.

    ../_images/pio-engine-random-forest.png

    You can also inspect the Feature Map for each Category

    ../_images/pio-engine-random-forest-tree.png
Tree

     forest: [TreeEnsembleModel regressor with 5 trees ]

     featureIndex: Map(
     landingPage -> 0, referrer -> 1, browser -> 2)

     featureCategoricalIntMap: Map(

     landingPage -> Map( -> 17, example.com/page9 -> 3, example.com/page17 -> 12,
          example.com/page12 -> 10, example.com/page13 -> 16,
          example.com/page6 -> 5, example.com/page18 -> 2, example.com/page14 -> 0,
          example.com/page2 -> 8, example.com/page3 -> 11, example.com/page15 -> 6,
          example.com/page20 -> 13, example.com/page10 -> 1, example.com/page16 -> 9,
          example.com/page19 -> 7, example.com/page8 -> 14, example.com/page4 -> 15,
          example.com/page1 -> 4),

     referrer -> Map( -> 10, referrer10.com -> 2, referrer3.com -> 8, referrer9.com -> 3,
             referrer2.com -> 7, referrer1.com -> 5, referrer6.com -> 6,
             referrer4.com -> 1, referrer7.com -> 9, referrer8.com -> 0, referrer5.com -> 4),

     browser -> Map( -> 4, Safari -> 0, Internet Explorer -> 2, Chrome -> 3, Firefox -> 1))
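
The featureIndex and featureCategoricalIntMap above suggest how a page visit becomes a numeric feature vector: each categorical value is replaced by its integer code and written to the position given by featureIndex. The sketch below illustrates that idea with an abbreviated copy of the maps. It is not the template's Scala code, the name encode is hypothetical, and falling back to the empty-string category for unseen values is an assumption.

```python
# Abbreviated copies of the maps printed in the model output above.
FEATURE_INDEX = {"landingPage": 0, "referrer": 1, "browser": 2}
CATEGORICAL_INT_MAP = {
    "landingPage": {"": 17, "example.com/page9": 3, "example.com/page14": 0},
    "referrer": {"": 10, "referrer10.com": 2, "referrer8.com": 0},
    "browser": {"": 4, "Safari": 0, "Firefox": 1,
                "Internet Explorer": 2, "Chrome": 3},
}


def encode(features):
    """Turn a dict of categorical values into a numeric feature vector.

    Assumption: missing or unseen values use the empty-string category.
    """
    vector = [0.0] * len(FEATURE_INDEX)
    for name, position in FEATURE_INDEX.items():
        categories = CATEGORICAL_INT_MAP[name]
        value = features.get(name, "")
        vector[position] = float(categories.get(value, categories[""]))
    return vector


if __name__ == "__main__":
    visit = {"landingPage": "example.com/page9",
             "referrer": "referrer10.com",
             "browser": "Firefox"}
    print(encode(visit))  # prints [3.0, 2.0, 1.0]
```

A vector of this kind is what a regression forest consumes; each of its trees produces a value for it, and a regression forest typically averages those values into the final score.
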

  4. Predict the Lead Score

    $ curl -H "Content-Type: application/json" -d '{
             "landingPageId" : "example.com/page9",
             "referrerId" : "referrer10.com",
             "browser": "Firefox" }' http://localhost:8000/queries.json
    

    Output

    {"score":0.15}
    

    As can be seen, the lead score for a visit to page9 from referrer10.com using Firefox is 0.15.
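
The same query can be sent from Python instead of curl. The sketch below uses only the standard library and assumes the engine is deployed on localhost:8000 and answers with a JSON object containing a score field, as in the output above. The function names build_query_request and predict_score are illustrative.

```python
import json
from urllib.request import Request, urlopen


def build_query_request(landing_page_id, referrer_id, browser,
                        url="http://localhost:8000/queries.json"):
    """Build a POST request carrying the query as a JSON body."""
    payload = {
        "landingPageId": landing_page_id,
        "referrerId": referrer_id,
        "browser": browser,
    }
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_score(landing_page_id, referrer_id, browser):
    """Send the query to a deployed engine and return the 'score' field."""
    request = build_query_request(landing_page_id, referrer_id, browser)
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["score"]


if __name__ == "__main__":
    # Needs the engine from the deploy step to be running.
    print(predict_score("example.com/page9", "referrer10.com", "Firefox"))
```

Keeping the request construction separate from the network call makes it possible to inspect the payload without a running engine.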

Summary

In this tutorial we learnt how to apply a Random Forest based regression template to predict a lead score for a web page visit.