Classification : Naive Bayes

Author : Rajdeep Dua
Last Updated : Oct 7 2017


In this example we will learn how to use the Naive Bayes classifier with PredictionIO.

Template Code

We are going to use the Template code from

Template code Dependency

  • Scala Version : 2.11.8
  • Spark Version : 2.1.1
  • PredictionIO : 0.12.0-incubating

The Naive Bayes Classification template has to be downloaded to /home/ubuntu/work/PredictionIO-0.12.0-incubating/incubator-predictionio-template-attribute-based-classifier

Event Server

Go to the home directory of the PredictionIO distribution, /home/ubuntu/work/PredictionIO-0.12.0-incubating/

  1. Start the Event Server

    ./bin/pio eventserver &

    You can check the status of the Event Server by going to the following URL: http://localhost:7070

  2. Create a new App by giving an access key. You can generate your own access key here

    cd incubator-predictionio-template-attribute-based-classifier
    ../bin/pio app new AppClassifierOne --access-key b3b06bc9-edbb-4c4d-9adb-a69dcccb4326


    [INFO] [App$] Initialized Event Store for this app ID: 14.
    [INFO] [App$] Created new app:
    [INFO] [App$]       Name: AppClassifierOne
    [INFO] [App$]         ID: 14
    [INFO] [App$] Access Key: b3b06bc9-edbb-4c4d-9adb-a69dcccb4326
  3. Export the Access Key into an Environment Variable

    export ACCESS_KEY=b3b06bc9-edbb-4c4d-9adb-a69dcccb4326
  4. Data used for Classification

    We will use the following dataset from the data folder. Each row has a plan label, then a comma, then three space-separated attributes.

    Data Format

    plan, attr0 attr1 attr2
    0,51 35 12
    0,49 30 12
    0,47 32 12
    0,46 31 12
    0,50 36 12
    0,54 39 14
    0,46 34 13
    0,50 34 12
    0,44 29 12
    0,49 31 11
    0,54 37 12
  5. Import Data into Event Server

    python data/ --access_key $ACCESS_KEY

    If your data is successfully imported, the output will be similar to the listing below

    file='./data/data.txt', url='http://localhost:7070')
    Importing data...
    153 events are imported.
  6. View the Events in the Browser at the url http://localhost:7070/events.json?accessKey=b3b06bc9-edbb-4c4d-9adb-a69dcccb4326&limit=-1
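
Steps 4 to 6 can be sketched in Python. The snippet below is only an illustration, not the template's import script: it assumes each data line becomes one `$set` event shaped like the sample in this section, that `entityId` is simply a running row index, and that the Event Server runs at http://localhost:7070 as in step 1. The helper names `line_to_event` and `events_url` are made up for this example.

```python
from urllib.parse import urlencode


def line_to_event(line, entity_id):
    """Turn one 'plan,attr0 attr1 attr2' line into a $set event dict.

    The event shape mirrors the sample event in this tutorial; using a
    running index as entityId is an assumption made for illustration.
    """
    plan, attrs = line.strip().split(",")
    attr0, attr1, attr2 = (int(value) for value in attrs.split())
    return {
        "event": "$set",
        "entityType": "user",
        "entityId": str(entity_id),
        "properties": {
            "attr0": attr0,
            "attr1": attr1,
            "attr2": attr2,
            "plan": int(plan),
        },
    }


def events_url(access_key, base_url="http://localhost:7070", limit=-1):
    """Build the events.json URL used in step 6 to list stored events."""
    query = urlencode({"accessKey": access_key, "limit": limit})
    return f"{base_url}/events.json?{query}"


if __name__ == "__main__":
    print(line_to_event("0,51 35 12", entity_id=0))
    print(events_url("b3b06bc9-edbb-4c4d-9adb-a69dcccb4326"))
```

With the access key from step 2, `events_url` produces the same address that step 6 opens in the browser.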

Events are inserted in the following format

  {
    "eventId": "12dd84702e194222ab3fc4290d2a09de",
    "event": "$set",
    "entityType": "user",
    "entityId": "0",
    "properties": {
      "attr2": 12,
      "plan": 0,
      "attr0": 51,
      "attr1": 35
    },
    "eventTime": "2016-11-28T11:49:02.927Z",
    "creationTime": "2016-11-28T11:49:03.452Z"
  }

Classification Engine

  1. Build the Engine

    ../bin/pio build --verbose
    [INFO] [Console$] Your engine is ready for training.
  2. Train the Engine

    ../bin/pio train

    Output will be similar to the listing below

    [INFO] [Engine$] EngineWorkflow.train completed
    [INFO] [Engine] engineInstanceId=ee64cab3-38fa-4225-b8e7-8bfccc6b5b6f
    [INFO] [CoreWorkflow$] Inserting persistent model
    [INFO] [CoreWorkflow$] Updating engine instance
    [INFO] [CoreWorkflow$] Training completed successfully.
  3. Start the Engine

    ../bin/pio deploy

    Output will be similar to the listing below. In our case the engine is listening at

    [INFO] [HttpListener] Bound to /
    [INFO] [MasterActor] Engine is deployed and running.
    Engine API is live at

    Browse to the link to see the Engine output.


    You can also verify the Classification engine’s class

  4. Predict the class Label

    curl -H "Content-Type: application/json" \
    -d '{ "attr0":2, "attr1":0, "attr2":0 }'
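
The same query can also be sent from Python. This is a minimal sketch using only the standard library, not part of the template: `engine_url` is a placeholder for the query endpoint of the deployed engine (it is deliberately not filled in here), and the function names `build_query` and `predict` are hypothetical.

```python
import json
import urllib.request


def build_query(attr0, attr1, attr2):
    """Encode the three attributes as the JSON body sent with curl above."""
    payload = {"attr0": attr0, "attr1": attr1, "attr2": attr2}
    return json.dumps(payload).encode("utf-8")


def predict(engine_url, attr0, attr1, attr2):
    """POST the query to the engine and return the decoded JSON reply.

    engine_url is an assumption: pass the query endpoint of your own
    deployed engine. Nothing is sent until this function is called.
    """
    request = urllib.request.Request(
        engine_url,
        data=build_query(attr0, attr1, attr2),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only show the request body; calling predict() needs a running engine.
    print(build_query(2, 0, 0).decode("utf-8"))
```

Keeping the body construction separate from the network call makes it easy to check the query format before the engine is deployed.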




In this tutorial we learnt how to use a template based on the Naive Bayes algorithm to predict the class of a record in a two-class dataset.