The Analytics Edge lecture code in Python: Week 4, Supreme Court

VIDEO 4

Read in the data
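The code cell for this step did not survive, so here is a minimal sketch of the loading step. It assumes the data is a CSV with the columns shown in the preview further down; to keep the example self-contained it reads those five rows from an in-memory string. The file name `stevens.csv` in the comment is an assumption, not something stated in this post.

```python
import io

import pandas as pd

# Five rows copied from the preview in this post. In practice you would pass
# the path of the course CSV (assumed to be "stevens.csv") to read_csv instead.
sample_csv = io.StringIO(
    "Docket,Term,Circuit,Issue,Petitioner,Respondent,LowerCourt,Unconst,Reverse\n"
    "93-1408,1994,2nd,EconomicActivity,BUSINESS,BUSINESS,liberal,0,1\n"
    "93-1577,1994,9th,EconomicActivity,BUSINESS,BUSINESS,liberal,0,1\n"
    "93-1612,1994,5th,EconomicActivity,BUSINESS,BUSINESS,liberal,0,1\n"
    "94-623,1994,1st,EconomicActivity,BUSINESS,BUSINESS,conser,0,1\n"
    "94-1175,1995,7th,JudicialPower,BUSINESS,BUSINESS,conser,0,1\n"
)

stevens = pd.read_csv(sample_csv)
print(stevens.head())
```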

 

CART Model

Proposed formula: Reverse ~ Circuit + Issue + Petitioner + Respondent + LowerCourt + Unconst

Notice that all predictors except Unconst are categorical.

So even before splitting the dataset, we need to do some extra work.

    Docket  Term  Circuit             Issue  Petitioner  Respondent  LowerCourt  Unconst  Reverse
0  93-1408  1994      2nd  EconomicActivity    BUSINESS    BUSINESS     liberal        0        1
1  93-1577  1994      9th  EconomicActivity    BUSINESS    BUSINESS     liberal        0        1
2  93-1612  1994      5th  EconomicActivity    BUSINESS    BUSINESS     liberal        0        1
3   94-623  1994      1st  EconomicActivity    BUSINESS    BUSINESS      conser        0        1
4  94-1175  1995      7th     JudicialPower    BUSINESS    BUSINESS      conser        0        1

Extra work for Python

Encoding categorical predictors using one-hot encoding

First: Create a dictionary with the categorical data points for each row

Second: Transform our dictionary to a binary one-hot encoded array for each row

Third: Construct a separate dataframe with the one-hot encoded data and name the columns

Finally: Construct the transformed dataset
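The four steps above can be sketched roughly as follows. This is an assumption about how the lost code worked: the mention of a dictionary suggests scikit-learn's `DictVectorizer`, but the variable names (`row_dicts`, `onehot_df`, `encoded`) are made up for illustration, and the tiny DataFrame only stands in for the full dataset.

```python
import pandas as pd
from sklearn.feature_extraction import DictVectorizer

# Tiny stand-in for the full dataset (values copied from the preview above).
stevens = pd.DataFrame({
    "Circuit": ["2nd", "9th", "5th", "1st", "7th"],
    "Issue": ["EconomicActivity"] * 4 + ["JudicialPower"],
    "Petitioner": ["BUSINESS"] * 5,
    "Respondent": ["BUSINESS"] * 5,
    "LowerCourt": ["liberal", "liberal", "liberal", "conser", "conser"],
    "Unconst": [0] * 5,
    "Reverse": [1] * 5,
})
cat_cols = ["Circuit", "Issue", "Petitioner", "Respondent", "LowerCourt"]

# First: one dictionary of categorical values per row.
row_dicts = stevens[cat_cols].to_dict(orient="records")

# Second: turn the dictionaries into a dense binary one-hot array.
vec = DictVectorizer(sparse=False)
onehot = vec.fit_transform(row_dicts)

# Third: wrap the array in a DataFrame with names such as "Circuit=2nd".
onehot_df = pd.DataFrame(onehot, columns=vec.feature_names_, index=stevens.index)

# Finally: put the numeric predictor and the target next to the encoded columns.
encoded = pd.concat([onehot_df, stevens[["Unconst", "Reverse"]]], axis=1)
print(encoded.columns.tolist())
```

If you do not need the intermediate dictionaries, `pd.get_dummies(stevens, columns=cat_cols)` is a shorter route to a similar result.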

Split the data
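A minimal sketch of the split, using `train_test_split`. The 70/30 ratio, the stratification, and the seed are assumptions, and the random arrays only stand in for the encoded predictors and the Reverse column.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the one-hot encoded predictors and the binary target.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 8))
y = rng.integers(0, 2, size=200)

# test_size and random_state are illustrative choices; stratify=y keeps the
# share of each class roughly equal in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
print(X_train.shape, X_test.shape)
```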

CART model

The tree algorithm implemented by scikit-learn is CART. Set the minimum number of data points in each leaf node to 25 (min_samples_leaf = 25).
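As a rough illustration of fitting such a tree, the sketch below uses `DecisionTreeClassifier` with `min_samples_leaf=25`. The data comes from `make_classification` purely as a stand-in, and the leaf-size check at the end is only there to show what the parameter guarantees.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; replace with the encoded training set.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

# min_samples_leaf=25: no leaf may contain fewer than 25 training samples.
tree = DecisionTreeClassifier(min_samples_leaf=25, random_state=0)
tree.fit(X, y)

# Leaves are the nodes without children (children_left == -1).
is_leaf = tree.tree_.children_left == -1
smallest_leaf = tree.tree_.n_node_samples[is_leaf].min()
print(tree.get_n_leaves(), smallest_leaf)
```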

Plot the tree. For this part, you need to have Graphviz installed on your machine and its executables added to your PATH environment variable.
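One way to get a plottable tree is `export_graphviz`, which produces Graphviz DOT source. The sketch below only generates the DOT text, so it runs without Graphviz; rendering it to an image is where the installation is needed. The feature names and class labels are placeholders, not the real column names.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Synthetic stand-in data and a tree fitted as in the CART step.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
tree = DecisionTreeClassifier(min_samples_leaf=25, random_state=0).fit(X, y)

# Placeholder names; use the encoded column names in practice.
feature_names = [f"x{i}" for i in range(X.shape[1])]

# out_file=None makes export_graphviz return the DOT source as a string.
dot_source = export_graphviz(
    tree,
    out_file=None,
    feature_names=feature_names,
    class_names=["affirm", "reverse"],
    filled=True,
)
print(dot_source[:200])

# To render: save dot_source to tree.dot, then run
#   dot -Tpng tree.dot -o tree.png
```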

Look at the tree

[Figure: plot of the fitted CART tree]

Make predictions
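A sketch of the prediction step on a held-out set, with a confusion matrix and accuracy as the summary. The data and the split are synthetic stand-ins, so the numbers it prints say nothing about the Supreme Court data.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data, split and fitted as in the earlier steps.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
tree = DecisionTreeClassifier(min_samples_leaf=25, random_state=0)
tree.fit(X_train, y_train)

# Class predictions (0 or 1) for the test set.
pred = tree.predict(X_test)

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_test, pred)
acc = accuracy_score(y_test, pred)
print(cm)
print(acc)
```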

ROC curve

We need real-valued prediction output (predicted probabilities rather than class labels) to draw the ROC curve.
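In scikit-learn the real-valued output can come from `predict_proba`. The sketch below takes the probability of class 1 and feeds it to `roc_curve`; the data is again a synthetic stand-in, and plotting is left as a comment so the example does not depend on matplotlib.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data, split and fitted as in the earlier steps.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
tree = DecisionTreeClassifier(min_samples_leaf=25, random_state=0)
tree.fit(X_train, y_train)

# Column 1 of predict_proba is the predicted probability of class 1.
scores = tree.predict_proba(X_test)[:, 1]

# False positive rate and true positive rate at each threshold.
fpr, tpr, thresholds = roc_curve(y_test, scores)
roc_auc = auc(fpr, tpr)
print(roc_auc)

# With matplotlib installed, plt.plot(fpr, tpr) draws the curve.
```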

[Figure: ROC curve for the CART model]

VIDEO 5 - Random Forests

Make predictions
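For the random forest, fitting and predicting can be sketched as below. The settings `n_estimators=200` and `min_samples_leaf=25` are assumptions chosen to mirror the CART step, not values recovered from this post, and the data is a synthetic stand-in.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Assumed settings: 200 trees, each leaf holding at least 25 samples.
forest = RandomForestClassifier(
    n_estimators=200, min_samples_leaf=25, random_state=0
)
forest.fit(X_train, y_train)

# Majority-vote class predictions on the test set.
forest_pred = forest.predict(X_test)
forest_acc = accuracy_score(y_test, forest_pred)
print(forest_acc)
```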

VIDEO 6

scikit-learn's CART model has no complexity parameter cp like the one in R (newer releases, from 0.22 onward, offer ccp_alpha for cost-complexity pruning instead). We got min_samples_leaf = 25 from the lecturer; are there other options besides that? I will use cross-validation to choose a min_samples_leaf value for training the model.
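A minimal sketch of that search with `GridSearchCV`. The candidate values, the 10 folds, and the accuracy scoring are all assumptions, and because the data is a synthetic stand-in, the value it selects is not meaningful for the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the training set.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

# Candidate leaf sizes to compare (illustrative values).
param_grid = {"min_samples_leaf": [5, 10, 25, 50, 75, 100]}

# 10-fold cross-validation, scoring each candidate by accuracy.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=10,
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_, search.best_score_)

# best_estimator_ is a tree refit on all of X, y with the chosen value.
best_tree = search.best_estimator_
```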

Got a different value
