The Analytics Edge Lecture Code in Python, Week 3: Modeling the Expert

Video 4

Read in dataset

Look at structure

Table outcome

Baseline accuracy
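
As a rough sketch of the baseline calculation: the baseline model always predicts the most frequent outcome, so its accuracy is simply that outcome's share of the rows. The column name `outcome` and the values below are made up for illustration and are not the lecture data.

```python
import pandas as pd

# Hypothetical outcome column; not the lecture data.
df = pd.DataFrame({"outcome": [0, 0, 0, 1, 0, 1, 0, 0]})

# "Table" of the outcome: how many rows fall in each class
counts = df["outcome"].value_counts()

# Always predicting the most frequent class gets this share of rows right
baseline_accuracy = counts.max() / counts.sum()

print(counts)
print(baseline_accuracy)
```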


Create training and testing sets

I will be using train_test_split from scikit-learn to split the dataset into training and testing sets

By default, the data returned by train_test_split came back with dtype object in my case, so you need to convert the columns back to their original types when rebuilding the DataFrames with pandas.

Also, simply calling train_test_split on the entire DataFrame does not preserve the ratio of 1s to 0s in the dependent variable across the two sets. So I split the two classes separately and then merge the pieces back into one training set and one testing set.
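
The split-then-merge idea can be sketched roughly as follows. The data is synthetic and the column names (`x1`, `x2`, `outcome`), the 75% training share and the seed are all assumptions for illustration. The second call shows the `stratify` argument of train_test_split, which is an alternative way to get the same effect in one step.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic data with hypothetical column names; 25% of rows are positive.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x1": rng.normal(size=200),
    "x2": rng.normal(size=200),
    "outcome": np.repeat([0, 1], [150, 50]),
})

# Split each class separately so both sets keep roughly the same ratio...
parts = [
    train_test_split(group, train_size=0.75, random_state=0)
    for _, group in df.groupby("outcome")
]
# ...then merge the per-class pieces back together.
train = pd.concat([p[0] for p in parts])
test = pd.concat([p[1] for p in parts])

# Alternative: let scikit-learn stratify on the outcome in a single call.
train2, test2 = train_test_split(
    df, train_size=0.75, random_state=0, stratify=df["outcome"]
)

print(train["outcome"].mean(), test["outcome"].mean())
print(train2["outcome"].mean(), test2["outcome"].mean())
```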

Logistic Regression Model Using statsmodels

The Logit class in statsmodels does not add an intercept by itself, so you need to create a constant column for it.

Make predictions on training set

Analyze predictions

I am taking advantage of the describe() and groupby() functions of DataFrame objects to do the calculations.
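
Something along these lines, using made-up predictions (the column names `actual` and `predicted` are my own choice for the sketch): `describe()` summarizes the predicted probabilities, and `groupby()` gives the average prediction for each true outcome.

```python
import pandas as pd

# Made-up actual outcomes and predicted probabilities
results = pd.DataFrame({
    "actual": [0, 0, 0, 0, 1, 1, 1, 0],
    "predicted": [0.10, 0.20, 0.15, 0.55, 0.80, 0.60, 0.30, 0.05],
})

# Summary statistics of the predicted probabilities
print(results["predicted"].describe())

# Average predicted probability for each true outcome
mean_by_outcome = results.groupby("actual")["predicted"].mean()
print(mean_by_outcome)
```

If the model is doing something useful, the average prediction for the rows with outcome 1 should be noticeably higher than for the rows with outcome 0.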

Video 5

Confusion matrix for threshold of 0.5

Building the table takes a little more work here.
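
One way to build the table, sketched with made-up predictions, is `pd.crosstab` on the actual outcomes against the thresholded predictions. The names `actual` and `predicted` are hypothetical.

```python
import pandas as pd

# Made-up actual outcomes and predicted probabilities
results = pd.DataFrame({
    "actual": [0, 0, 0, 0, 1, 1, 1, 0],
    "predicted": [0.10, 0.20, 0.15, 0.55, 0.80, 0.60, 0.30, 0.05],
})

# Turn probabilities into 0/1 predictions with a threshold of 0.5
predicted_class = (results["predicted"] > 0.5).astype(int)

# Rows are the actual outcomes, columns are the predicted classes
table = pd.crosstab(
    results["actual"], predicted_class,
    rownames=["actual"], colnames=["predicted"],
)
print(table)
```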

Sensitivity and specificity
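
For reference, sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). A small sketch with a hypothetical helper function and made-up predictions shows how both move as the threshold changes:

```python
def sensitivity_specificity(actual, predicted_prob, threshold):
    """Return (sensitivity, specificity) for a given probability threshold."""
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted_prob):
        predicted_class = 1 if p > threshold else 0
        if a == 1 and predicted_class == 1:
            tp += 1          # true positive
        elif a == 1:
            fn += 1          # false negative
        elif predicted_class == 0:
            tn += 1          # true negative
        else:
            fp += 1          # false positive
    return tp / (tp + fn), tn / (tn + fp)


# Made-up actual outcomes and predicted probabilities
actual = [0, 0, 0, 0, 1, 1, 1, 0]
predicted = [0.10, 0.20, 0.15, 0.55, 0.80, 0.60, 0.30, 0.05]

for t in (0.5, 0.7, 0.2):
    sens, spec = sensitivity_specificity(actual, predicted, t)
    print(t, sens, spec)
```

Raising the threshold lowers sensitivity and raises specificity. Lowering it does the opposite.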

Confusion matrix for threshold of 0.7

Sensitivity and specificity

Confusion matrix for threshold of 0.2

This time using confusion_matrix from scikit-learn
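
A short sketch with made-up predictions. `confusion_matrix` puts the actual classes in the rows and the predicted classes in the columns, so for a 0/1 outcome `ravel()` returns TN, FP, FN, TP in that order.

```python
from sklearn.metrics import confusion_matrix

# Made-up actual outcomes and predicted probabilities
actual = [0, 0, 0, 0, 1, 1, 1, 0]
predicted = [0.10, 0.20, 0.15, 0.55, 0.80, 0.60, 0.30, 0.05]

# Threshold of 0.2
predicted_class = [int(p > 0.2) for p in predicted]

cm = confusion_matrix(actual, predicted_class)
tn, fp, fn, tp = cm.ravel()

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(cm)
print(sensitivity, specificity)
```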

Sensitivity and specificity

Video 6

Performance function

Using metrics from scikit-learn
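
Roughly like this, again on made-up predictions: `roc_curve` returns the false positive rates, true positive rates and the thresholds at which they were computed, and `roc_auc_score` gives the area under the curve.

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Made-up actual outcomes and predicted probabilities
actual = [0, 0, 0, 0, 1, 1, 1, 0]
predicted = [0.10, 0.20, 0.15, 0.55, 0.80, 0.60, 0.30, 0.05]

# One (fpr, tpr) point per threshold
fpr, tpr, thresholds = roc_curve(actual, predicted)

# Area under the ROC curve
auc = roc_auc_score(actual, predicted)

print(fpr)
print(tpr)
print(thresholds)
print(auc)
```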

Plot ROC curve

I have given up on coloring the line... and the thresholds are not as nice as R's ROCR package.
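
For what it is worth, a plain version with the thresholds printed next to the points can be sketched like this. The data is made up, the output file name `roc_sketch.png` is hypothetical, and the first threshold is skipped because scikit-learn uses it only as a placeholder for the (0, 0) point.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to get a window
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

# Made-up actual outcomes and predicted probabilities
actual = [0, 0, 0, 0, 1, 1, 1, 0]
predicted = [0.10, 0.20, 0.15, 0.55, 0.80, 0.60, 0.30, 0.05]

fpr, tpr, thresholds = roc_curve(actual, predicted)

fig, ax = plt.subplots()
ax.plot(fpr, tpr, marker="o")
ax.plot([0, 1], [0, 1], linestyle="--")  # diagonal for a random classifier

# Label each point with its threshold, skipping the placeholder first entry
for x, y, t in zip(fpr[1:], tpr[1:], thresholds[1:]):
    ax.annotate(f"{t:.2f}", (x, y), textcoords="offset points", xytext=(5, -10))

ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.set_title("ROC curve")
fig.savefig("roc_sketch.png")
```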

