The dataset describes patients who had undergone surgery for breast cancer.
Features of the dataset:
- Age - Age of patient at time of operation.
- Year - Patient's year of operation (year - 1900).
- Nodes - Number of positive axillary nodes detected.
- Class(Survived):
1 - the patient survived 5 years or longer
2 - the patient died within 5 years
Given the details of a patient, we need to predict whether the patient survived.
Import required libraries
    # For mathematical calculation
    import numpy as np
    # For handling datasets
    import pandas as pd
    # For plotting graphs
    from matplotlib import pyplot as plt
    # Import the sklearn library for Naive Bayes
    from sklearn.naive_bayes import GaussianNB
Import dataset
    # Import the csv file
    df = pd.read_csv('data.csv')
    print(df.head())

    '''
    Output:
       Age  Year  Nodes  Survived
    0   30    64      1         1
    1   30    62      3         1
    2   30    65      0         1
    3   31    59      2         1
    4   31    65      4         1
    '''
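Before plotting, it can help to check that the file loaded as expected. The sketch below is only an illustration: it rebuilds the five rows printed by df.head() above as an in-memory CSV (assuming data.csv has a header row with these column names), then looks at column types, missing values and the number of rows per class.

```python
import io

import pandas as pd

# Stand-in for data.csv: only the five rows shown in the df.head() output.
sample_csv = io.StringIO(
    "Age,Year,Nodes,Survived\n"
    "30,64,1,1\n"
    "30,62,3,1\n"
    "30,65,0,1\n"
    "31,59,2,1\n"
    "31,65,4,1\n"
)
df = pd.read_csv(sample_csv)

# Missing values per column
missing = df.isnull().sum()
# Number of rows in each class
class_counts = df['Survived'].value_counts()

print(df.dtypes)
print(missing)
print(class_counts)
```

On the full file, class_counts shows how balanced the two classes are, which is worth knowing before reading too much into the predictions.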
Plot the classes against features.
    # We plot the data to see dependency of any
    # feature on the class
    plt.xlabel('Feature')
    plt.ylabel('Survived')

    X = df.loc[:, 'Age']
    Y = df.loc[:, 'Survived']
    plt.scatter(X, Y, color='blue', label='Age')

    X = df.loc[:, 'Year']
    Y = df.loc[:, 'Survived']
    plt.scatter(X, Y, color='green', label='Year')

    X = df.loc[:, 'Nodes']
    Y = df.loc[:, 'Survived']
    plt.scatter(X, Y, color='red', label='Nodes')

    plt.legend(loc=4, prop={'size': 7})
    plt.show()
Prepare data for training
    # Prepare the training set
    X = df.loc[:, 'Age':'Nodes']
    Y = df.loc[:, 'Survived']
Train the model
    clf = GaussianNB()
    # Train the model
    clf.fit(X, Y)
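To see roughly what fit and predict do for a Gaussian Naive Bayes model, here is a minimal sketch in NumPy. It assumes each feature is normally distributed within a class, estimates a prior, mean and variance per class, and picks the class with the highest log score. The eight rows are hypothetical stand-in values, not rows from data.csv, and fit_gaussian_nb and predict_gaussian_nb are illustrative helper names. scikit-learn's own implementation differs in details such as how it smooths the variance.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical stand-in rows (Age, Year, Nodes); NOT taken from data.csv.
X = np.array([
    [30, 64, 1],
    [34, 60, 0],
    [41, 65, 2],
    [52, 62, 3],
    [45, 59, 14],
    [54, 66, 23],
    [61, 63, 9],
    [66, 58, 18],
], dtype=float)
y = np.array([1, 1, 1, 1, 2, 2, 2, 2])


def fit_gaussian_nb(X, y):
    """Estimate the prior and the per-feature mean and variance of each class."""
    params = {}
    for label in np.unique(y):
        rows = X[y == label]
        params[label] = (
            len(rows) / len(X),       # class prior P(class)
            rows.mean(axis=0),        # per-feature mean
            rows.var(axis=0) + 1e-9,  # per-feature variance with a tiny floor
        )
    return params


def predict_gaussian_nb(params, X):
    """Return, for each row, the class with the largest log posterior score."""
    labels = sorted(params)
    scores = []
    for label in labels:
        prior, mean, var = params[label]
        # Log of the normal density, evaluated per feature
        log_likelihood = -0.5 * (np.log(2 * np.pi * var) + (X - mean) ** 2 / var)
        # Naive assumption: features are independent, so their logs add up
        scores.append(np.log(prior) + log_likelihood.sum(axis=1))
    return np.array(labels)[np.argmax(np.array(scores), axis=0)]


params = fit_gaussian_nb(X, y)
manual = predict_gaussian_nb(params, X)
library = GaussianNB().fit(X, y).predict(X)

print(manual)
print(library)
```

On this toy data the hand-written version and GaussianNB should agree, which is a quick way to check that the sketch follows the same idea.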
Test the model
    # Test the model (returns the class)
    prediction = clf.predict([[12, 70, 12], [13, 20, 13]])
    print(prediction)

    '''
    Output:
    [1 2]
    '''
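Predicting two hand-written rows shows that predict returns a class label, but it does not say how well the model does on patients it has not seen. One common way to check that is to hold back part of the data, as in the sketch below. Since data.csv is not bundled here, the sketch builds a synthetic stand-in DataFrame with the same column names; the values and the link between Nodes and Survived are made up for illustration, so the printed accuracy says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for data.csv: same column names, made-up values.
rng = np.random.default_rng(0)
n = 200
survived = rng.choice([1, 2], size=n, p=[0.7, 0.3])
df = pd.DataFrame({
    'Age': rng.integers(30, 80, size=n),
    'Year': rng.integers(58, 70, size=n),
    # Assumption for illustration only: class 2 tends to have more nodes.
    'Nodes': rng.poisson(np.where(survived == 2, 8, 2)),
    'Survived': survived,
})

X = df.loc[:, 'Age':'Nodes']
Y = df.loc[:, 'Survived']

# Keep 25% of the rows aside; stratify so both classes appear in each part.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0, stratify=Y)

# Train only on the training part
clf = GaussianNB()
clf.fit(X_train, Y_train)

# Compare predictions on the held-out rows with their true classes
predictions = clf.predict(X_test)
accuracy = accuracy_score(Y_test, predictions)
print('Held-out accuracy: %.2f' % accuracy)
```

With the real file, replacing the synthetic DataFrame with pd.read_csv('data.csv') should be the only change needed.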