Classification : Naive Bayes

Author : Rajdeep Dua
Last Updated : Oct 7 2017

Introduction

In this example we will learn how to use the Naive Bayes classifier with PredictionIO.

Template Code

We will use the template code from https://github.com/rajdeepd/incubator-predictionio-template-attribute-based-classifier.

Template Code Dependencies

  • Scala Version : 2.11.8
  • Spark Version : 2.1.1
  • PredictionIO : 0.12.0-incubating

The Naive Bayes classification template has to be downloaded to /home/ubuntu/work/PredictionIO-0.12.0-incubating/incubator-predictionio-template-attribute-based-classifier

Event Server

Go to the home directory of the PredictionIO distribution, /home/ubuntu/work/PredictionIO-0.12.0-incubating/

  1. Start the Event Server

    ./bin/pio eventserver &
    

    You can check the status of the Event Server by going to the following URL: http://localhost:7070

    ../_images/pio-eventserver-status.png
  2. Create a new app and give it an access key. You can generate your own access key at https://www.uuidgenerator.net/version4

    cd incubator-predictionio-template-attribute-based-classifier
    ../bin/pio app new AppClassifierOne --access-key b3b06bc9-edbb-4c4d-9adb-a69dcccb4326
    

    Output

    [INFO] [App$] Initialized Event Store for this app ID: 14.
    [INFO] [App$] Created new app:
    [INFO] [App$]       Name: AppClassifierOne
    [INFO] [App$]         ID: 14
    [INFO] [App$] Access Key: b3b06bc9-edbb-4c4d-9adb-a69dcccb4326
    
  3. Export the Access Key into an Environment Variable

    export ACCESS_KEY=b3b06bc9-edbb-4c4d-9adb-a69dcccb4326
    
  4. Data used for Classification

    We will use the following dataset from the data folder. Each row has a plan label and three attributes.

    Data Format

    plan, attr0 attr1 attr2
    
    0,51 35 12
    0,49 30 12
    0,47 32 12
    0,46 31 12
    0,50 36 12
    0,54 39 14
    0,46 34 13
    0,50 34 12
    0,44 29 12
    0,49 31 11
    0,54 37 12
    
  5. Import Data into Event Server

    python data/import_eventserver.py --access_key $ACCESS_KEY
    

    If your data is imported successfully, the output will be similar to the listing below:

    Namespace(access_key='b3b06bc9-edbb-4c4d-9adb-a69dcccb4326',
    file='./data/data.txt', url='http://localhost:7070')
    
    Importing data...
    
    http://localhost:7070/events.json?
    accessKey=b3b06bc9-edbb-4c4d-9adb-a69dcccb4326&limit=-1
    153 events are imported.
    
  6. View the Events in the Browser at the url http://localhost:7070/events.json?accessKey=b3b06bc9-edbb-4c4d-9adb-a69dcccb4326&limit=-1
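The same URL can also be queried from a script instead of a browser. The following is a minimal Python sketch, assuming the Event Server is running at http://localhost:7070 and returns the events as a JSON array. The helper names events_url, fetch_events and count_by_plan are hypothetical and are not part of PredictionIO.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def events_url(access_key, base_url="http://localhost:7070", limit=-1):
    """Build the events.json URL used in this tutorial (limit=-1 asks for all events)."""
    query = urlencode({"accessKey": access_key, "limit": limit})
    return f"{base_url}/events.json?{query}"


def fetch_events(access_key, base_url="http://localhost:7070", limit=-1):
    """Fetch events from a running Event Server (assumed to return a JSON array)."""
    with urlopen(events_url(access_key, base_url, limit)) as response:
        return json.loads(response.read().decode("utf-8"))


def count_by_plan(events):
    """Count how many events carry each value of the 'plan' property."""
    counts = {}
    for event in events:
        plan = event.get("properties", {}).get("plan")
        counts[plan] = counts.get(plan, 0) + 1
    return counts


# Example usage (requires a running Event Server and the ACCESS_KEY variable):
#   import os
#   events = fetch_events(os.environ["ACCESS_KEY"])
#   print(len(events), count_by_plan(events))
```

Counting events per plan is a quick way to confirm that every class label in the data file reached the Event Server before training.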

Events are inserted in the following format:

{
  "eventId": "12dd84702e194222ab3fc4290d2a09de",
  "event": "$set",
  "entityType": "user",
  "entityId": "0",
  "properties": {
    "attr2": 12,
    "plan": 0,
    "attr0": 51,
    "attr1": 35
  },
  "eventTime": "2016-11-28T11:49:02.927Z",
  "creationTime": "2016-11-28T11:49:03.452Z"
}
../_images/pio-classification-events-naive-bayes.png
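To make the relationship between a row of the data file and the event above concrete, here is a minimal Python sketch. The function row_to_event is a hypothetical helper, not the code in data/import_eventserver.py, and using a running row index as the entity ID is an assumption based on the sample event shown above. The eventId, eventTime and creationTime fields are left out on the assumption that the Event Server fills them in.

```python
def row_to_event(line, entity_id):
    """Convert one row such as '0,51 35 12' into a $set event dictionary.

    The row format is 'plan,attr0 attr1 attr2' as described in this tutorial.
    entity_id is assumed to be a running row index (hypothetical choice).
    """
    label, attrs = line.strip().split(",")
    values = [int(v) for v in attrs.split()]
    if len(values) != 3:
        raise ValueError(f"expected 3 attributes, got {len(values)}: {line!r}")
    return {
        "event": "$set",
        "entityType": "user",
        "entityId": str(entity_id),
        "properties": {
            "plan": int(label),
            "attr0": values[0],
            "attr1": values[1],
            "attr2": values[2],
        },
    }


# Example usage with the first row of the sample data:
#   row_to_event("0,51 35 12", 0)
```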

Classification Engine

  1. Build the Engine

    ../bin/pio build --verbose
    
    ...
    
    [INFO] [Console$] Your engine is ready for training.
    
  2. Train the Engine

    ../bin/pio train
    

    The output will be similar to the listing below:

    [INFO] [Engine$] EngineWorkflow.train completed
    [INFO] [Engine] engineInstanceId=ee64cab3-38fa-4225-b8e7-8bfccc6b5b6f
    [INFO] [CoreWorkflow$] Inserting persistent model
    [INFO] [CoreWorkflow$] Updating engine instance
    [INFO] [CoreWorkflow$] Training completed successfully.
    
  3. Start the Engine

    ../bin/pio deploy
    

    The output will be similar to the listing below. In our case the engine is listening at http://0.0.0.0:8000

    [INFO] [HttpListener] Bound to /0.0.0.0:8000
    [INFO] [MasterActor] Engine is deployed and running.
    Engine API is live at http://0.0.0.0:8000.
    

    Open http://0.0.0.0:8000 in a browser to see the engine server page.

    ../_images/pio-classification-engine-server.png

    You can also verify the algorithm class used by the classification engine:

    ../_images/pio-classification-engine-naive-bayes.png
  4. Predict the Class Label

    curl -H "Content-Type: application/json" \
    -d '{ "attr0":2, "attr1":0, "attr2":0 }' http://0.0.0.0:8000/queries.json
    

    Output

    {"label":1.0}
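The curl call above can also be issued from Python. This is a minimal sketch using only the standard library, assuming the engine is deployed at http://0.0.0.0:8000 as in this tutorial. The names build_query, parse_label and predict are hypothetical helpers, not part of PredictionIO.

```python
import json
from urllib.request import Request, urlopen


def build_query(attr0, attr1, attr2):
    """Build the query body with the three attributes used in this tutorial."""
    return {"attr0": attr0, "attr1": attr1, "attr2": attr2}


def parse_label(response_body):
    """Extract the predicted label from a response such as '{"label":1.0}'."""
    return json.loads(response_body)["label"]


def predict(query, engine_url="http://0.0.0.0:8000"):
    """POST the query to queries.json on a deployed engine and return the label."""
    request = Request(
        engine_url + "/queries.json",
        data=json.dumps(query).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request) as response:
        return parse_label(response.read().decode("utf-8"))


# Example usage (requires the engine started with "pio deploy"):
#   print(predict(build_query(2, 0, 0)))
```

Keeping the request building and response parsing in separate functions lets them be checked without a running engine.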
    

Summary

In this tutorial we learned how to use a template based on the Naive Bayes algorithm to predict the class label in a two-class dataset.