Lead Scoring using Regression with Random Forest

Author : Rajdeep Dua
Last Updated : Sep 28 2016


In this example, we will learn how to use a Random Forest regression algorithm for lead scoring with PredictionIO.

Template Code : https://github.com/rajdeepd/template-scala-parallel-leadscoring

Download the Template Code

The Lead Scoring template has to be downloaded to /home/ubuntu/work/PredictionIO-0.12.0-incubating/

cd /home/ubuntu/work/PredictionIO-0.12.0-incubating
git clone https://github.com/rajdeepd/template-scala-parallel-leadscoring

Event Server

Go to the home directory of the PredictionIO distribution, /home/ubuntu/work/PredictionIO-0.12.0-incubating/

  1. Start the Event Server

    ./bin/pio eventserver &

    You can check the status of the Event Server by going to the following URL: http://localhost:7070

  2. Create a new App and give it an access key. You can generate your own access key (a version 4 UUID) here: https://www.uuidgenerator.net/version4

    cd template-scala-parallel-leadscoring
    ../bin/pio app new lead-scoring --access-key b3b06bc9-edbb-4c4d-9adb-a69dcccb4327

    Note that the app name is lead-scoring.

    [INFO] [App$] Initialized Event Store for this app ID: 7.
    [INFO] [Pio$] Created a new app:
    [INFO] [Pio$]       Name: lead-scoring
    [INFO] [Pio$]         ID: 7
    [INFO] [Pio$] Access Key: b3b06bc9-edbb-4c4d-9adb-a69dcccb4327
  3. Export the Access Key into an Environment Variable

    export ACCESS_KEY=b3b06bc9-edbb-4c4d-9adb-a69dcccb4327
  4. Data used for Lead Scoring

    We will use the following dataset, located in the template's data folder:

  • 10 Users
  • 50 Items
  • 20 Pages
  • 10 Referral IDs

View Data:

User, lands on page, referrer, browser
u4,example.com/page13,referrer3.com, Safari
u5,example.com/page13,referrer5.com, Internet Explorer
u8,example.com/page14,referrer9.com, Safari
u1,example.com/page18,referrer6.com, Firefox
u4,example.com/page17,referrer8.com, Internet Explorer
u3,example.com/page20,referrer1.com, Chrome
u10,example.com/page14,referrer6.com, Safari
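
To work with these rows programmatically, a minimal Python sketch along the following lines may help. It assumes the view data is available as comma-separated text with the header shown above. The function name `parse_views` and the field names are illustrative and are not part of the template.

```python
import csv
import io

# Sample rows copied from the View Data listing above.
VIEW_DATA = """User, lands on page, referrer, browser
u4,example.com/page13,referrer3.com, Safari
u5,example.com/page13,referrer5.com, Internet Explorer
u8,example.com/page14,referrer9.com, Safari
"""

# Hypothetical field names chosen for this sketch.
FIELDS = ("user", "landingPage", "referrer", "browser")


def parse_views(text):
    """Parse view rows into dicts, dropping the stray space after each comma."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skip the header row
    return [
        dict(zip(FIELDS, (cell.strip() for cell in row)))
        for row in reader
        if row  # ignore blank lines
    ]


views = parse_views(VIEW_DATA)
print(views[0])
```

Note that the browser column carries a leading space in the raw data, so values should be trimmed before they are used as categories.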

Buy Data:

User, buys items
  5. Import Data into the Event Server

    python data/import_eventserver.py --access_key $ACCESS_KEY

    If your data is imported successfully, the output will be similar to the listing below:

    , url='http://localhost:7070')
    Importing data...
    153 events are imported.
  6. View the events in a browser at the URL http://localhost:7070/events.json?accessKey=b3b06bc9-edbb-4c4d-9adb-a69dcccb4327&limit=-1
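
The same event listing can also be retrieved from a script. The sketch below is a minimal example, assuming the Event Server runs on localhost:7070 as in the steps above. The helper names `events_url` and `fetch_events` are hypothetical, and `fetch_events` only works while the Event Server is running.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def events_url(access_key, host="http://localhost:7070", limit=-1):
    """Build the events.json URL used in the browser step above."""
    query = urlencode({"accessKey": access_key, "limit": limit})
    return f"{host}/events.json?{query}"


def fetch_events(access_key):
    """Fetch the stored events as parsed JSON (needs a running Event Server)."""
    with urlopen(events_url(access_key)) as response:
        return json.loads(response.read().decode("utf-8"))


url = events_url("b3b06bc9-edbb-4c4d-9adb-a69dcccb4327")
print(url)
```

Passing the access key through `urlencode` keeps the URL valid even if a key contains characters that need escaping.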

Events are inserted with the following format:

{
  "eventId": "12dd84702e194222ab3fc4290d2a09de",
  "event": "$set",
  "entityType": "user",
  "entityId": "0",
  "properties": {
    "attr2": 12,
    "plan": 0,
    "attr0": 51,
    "attr1": 35
  },
  "eventTime": "2016-11-28T11:49:02.927Z",
  "creationTime": "2016-11-28T11:49:03.452Z"
}

Lead Scoring Regression Engine

  1. Build the Engine

    ../bin/pio build --verbose
    [INFO] [Console$] Your engine is ready for training.
  2. Train the Engine

    ../bin/pio train

    The output will be similar to the listing below:

    [INFO] [Engine$] EngineWorkflow.train completed
    [INFO] [Engine] engineInstanceId=ee64cab3-38fa-4225-b8e7-8bfccc6b5b6f
    [INFO] [CoreWorkflow$] Inserting persistent model
    [INFO] [CoreWorkflow$] Updating engine instance
    [INFO] [CoreWorkflow$] Training completed successfully.
  3. Start the Engine

    ../bin/pio deploy

    The output will be similar to the listing below. In our case the engine is listening at

    [INFO] [HttpListener] Bound to /
    [INFO] [MasterActor] Engine is deployed and running.
    Engine API is live at

    Browse to the link to see the Engine output.


    You can also inspect the feature map for each category:


     forest: [TreeEnsembleModel regressor with 5 trees ]

     featureIndex: Map(
     landingPage -> 0, referrer -> 1, browser -> 2)

     featureCategoricalIntMap: Map(

     landingPage -> Map( -> 17, example.com/page9 -> 3, example.com/page17 -> 12,
          example.com/page12 -> 10, example.com/page13 -> 16,
          example.com/page6 -> 5, example.com/page18 -> 2, example.com/page14 -> 0,
          example.com/page2 -> 8, example.com/page3 -> 11, example.com/page15 -> 6,
          example.com/page20 -> 13, example.com/page10 -> 1, example.com/page16 -> 9,
          example.com/page19 -> 7, example.com/page8 -> 14, example.com/page4 -> 15,
          example.com/page1 -> 4),

     referrer -> Map( -> 10, referrer10.com -> 2, referrer3.com -> 8, referrer9.com -> 3,
             referrer2.com -> 7, referrer1.com -> 5, referrer6.com -> 6,
             referrer4.com -> 1, referrer7.com -> 9, referrer8.com -> 0, referrer5.com -> 4),

     browser -> Map( -> 4, Safari -> 0, Internet Explorer -> 2, Chrome -> 3, Firefox -> 1))
  4. Predict the Lead Score

    curl -H "Content-Type: application/json" -d '{
             "landingPageId" : "example.com/page9",
             "referrerId" : "referrer10.com",
             "browser": "Firefox" }' http://localhost:8000/queries.json



    For this query, the engine returns a lead score of 0.15 for a visit to example.com/page9 from referrer10.com.
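
The same query can be sent from Python instead of curl. The following is a minimal sketch, assuming the engine is reachable at http://localhost:8000/queries.json as in the curl command above. The helper names `build_query`, `query_request` and `predict` are hypothetical, and the shape of the JSON response depends on the template, so it is returned as-is.

```python
import json
from urllib.request import Request, urlopen


def build_query(landing_page, referrer, browser):
    """Return a query body with the same fields as the curl command above."""
    return {
        "landingPageId": landing_page,
        "referrerId": referrer,
        "browser": browser,
    }


def query_request(query, url="http://localhost:8000/queries.json"):
    """Wrap the query in a JSON POST request for the deployed engine."""
    return Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(query):
    """Send the query and return the parsed response (needs a running engine)."""
    with urlopen(query_request(query)) as response:
        return json.loads(response.read().decode("utf-8"))


query = build_query("example.com/page9", "referrer10.com", "Firefox")
request = query_request(query)
print(json.dumps(query))
```

Calling `predict(query)` while the engine is deployed should return the same result as the curl command.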


In this tutorial, we learnt how to use a Random Forest based template to predict the lead score for a web page visit.