The Analytics Edge lecture code in Python, Week 1

VIDEO 2

Basic Calculations

Python uses ** for power
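A minimal sketch of the power operator; the numbers are illustrative only. Note that ^, which means power in R, is bitwise XOR in Python.

```python
# ** raises a number to a power
squared = 8 ** 2
cubed = 2 ** 3

# ^ is bitwise XOR in Python, not power
xor_result = 8 ^ 2

print(squared, cubed, xor_result)
```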

Typing '\' followed by a line break lets you continue typing the code on a new line
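A small sketch of line continuation, with made-up numbers. The backslash has to be the last character on the line; wrapping the expression in parentheses is a common alternative.

```python
# A trailing backslash continues the statement on the next line
total = 1 + 2 + \
        3 + 4

# Parentheses allow line breaks without a backslash
grouped = (1 + 2 +
           3 + 4)

print(total, grouped)
```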

 

Functions

You can take a square root without using a function...

You can also use libraries like math, numpy or scipy for the sqrt function
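A quick sketch of both approaches, using only the standard-library math module so it runs without extra installs. numpy's np.sqrt behaves the same on a single number and also works element-wise on arrays.

```python
import math

root_by_power = 2 ** 0.5       # no function call needed
root_by_math = math.sqrt(2)    # standard-library function

print(root_by_power, root_by_math)
```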

The absolute value function
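abs() is a built-in, so no import is needed. The input value is just an example.

```python
# abs() returns the magnitude of a number
magnitude = abs(-65)
print(magnitude)
```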

To see help on a function or object in Python:
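A minimal sketch using the built-in help(). In IPython or Jupyter, typing abs? should give a similar result.

```python
# help() prints the documentation for a function or object
help(abs)

# The same text is available as the object's docstring
doc_text = abs.__doc__
print(doc_text)
```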

 

Variables

Store values in variables using the assignment operator '='. R's '<-' will not work in Python
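A short sketch of assignment; the variable names are illustrative. One thing to watch out for: Python does not raise a syntax error on x <- 5 when x already exists, it silently reads it as the comparison x < -5.

```python
# Assignment uses =
sqrt2 = 2 ** 0.5
hours_year = 365 * 24

# R-style "<-" is parsed as "less than minus five", not as assignment
x = 10
comparison = x <- 5    # same as x < (-5), so False here; x is unchanged

print(hours_year, x, comparison)
```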

See the names in the current scope. You will see a much shorter list if you are using the basic Python prompt. You can use 'name' in dir() to test whether an object by that name exists in the scope.

Similarly, you can use locals().keys()
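A minimal sketch of both checks; hours_year and no_such_name are example names only.

```python
hours_year = 365 * 24

# dir() with no arguments lists the names in the current scope
names_in_scope = dir()

# Membership tests tell you whether a name is defined
exists = 'hours_year' in dir()
missing = 'no_such_name' in dir()

# locals() gives the same names as dictionary keys
also_exists = 'hours_year' in locals().keys()

print(exists, missing, also_exists)
```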

VIDEO 3

Vectors

Use a numpy array to store a vector

Use print to see the output in a nicer way

Getting an element from a numpy array using its index (indexing starts from 0 rather than 1 as in R)
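A small sketch of creating, printing and indexing vectors. The country names and values are illustrative stand-ins for the lecture's data.

```python
import numpy as np

# Vectors stored as numpy arrays (example values)
countries = np.array(["Brazil", "China", "India", "Switzerland", "USA"])
life_expectancy = np.array([74, 76, 65, 83, 79])

# print() shows the array contents without the array(...) wrapper
print(countries)
print(life_expectancy)

# Indexing starts at 0, so [0] is the first element
first_country = countries[0]
third_value = life_expectancy[2]
print(first_country, third_value)
```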

Create an array of fixed-step integers using arange(). Notice the end point is excluded, so you need +1
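A sketch of the end-point behaviour, assuming we want the even numbers from 0 to 100 inclusive (the range itself is just an example).

```python
import numpy as np

# Without +1 the stop value 100 is left out
without_end = np.arange(0, 100, 2)

# Adding 1 to the stop value includes 100
with_end = np.arange(0, 100 + 1, 2)

print(without_end[-1], with_end[-1])
```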

 

Data Frames

A DataFrame is a table-like data structure. There are many ways to create a dataframe; I prefer to pass in the data as a dictionary. This way, it is very clear which vector becomes which column.
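A minimal sketch of the dictionary approach. Each key becomes a column name and each list becomes that column; the values are illustrative.

```python
import pandas as pd

# Keys are column names, values are the column contents
country_data = pd.DataFrame({
    "Country": ["Brazil", "China", "India", "Switzerland", "USA"],
    "LifeExpectancy": [74, 76, 65, 83, 79],
})

print(country_data)
```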

Use dataframeName['columnName'] to access a column in the dataframe. Assigning to a name that does not exist yet adds a new column, here one named 'Population'
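A sketch of reading one column and adding another. The population figures are example values, not taken from the data file.

```python
import pandas as pd

country_data = pd.DataFrame({
    "Country": ["Brazil", "China", "India", "Switzerland", "USA"],
    "LifeExpectancy": [74, 76, 65, 83, 79],
})

# Access an existing column
life = country_data["LifeExpectancy"]

# Assigning to a new name creates the column (example values, in thousands)
country_data["Population"] = [199000, 1390000, 1240000, 7997, 318000]

print(country_data)
```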

Example of creating a new, shorter dataframe and appending it to the one above by row

Notice the index is not correctly updated, so it needs a quick fix

or we can add a second parameter when calling the
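A sketch of appending by row and repairing the index. DataFrame.append was removed in pandas 2.0, so this uses pd.concat instead. ignore_index=True is assumed here to be the kind of extra parameter that avoids the separate fix; the data values are illustrative.

```python
import pandas as pd

first = pd.DataFrame({"Country": ["Brazil", "China", "India"],
                      "LifeExpectancy": [74, 76, 65]})
second = pd.DataFrame({"Country": ["Australia", "Greece"],
                       "LifeExpectancy": [82, 81]})

# Plain concatenation keeps each frame's own index: 0, 1, 2, 0, 1
stacked = pd.concat([first, second])

# Quick fix: rebuild the index from 0
fixed = stacked.reset_index(drop=True)

# Or ask concat to renumber the rows in one step
direct = pd.concat([first, second], ignore_index=True)

print(stacked.index.tolist(), fixed.index.tolist(), direct.index.tolist())
```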

VIDEO 4

Change the working directory to where the data file is located...
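A minimal sketch with os.chdir. The system temp directory stands in for the real data folder, which is not given here.

```python
import os
import tempfile

original_dir = os.getcwd()

# Hypothetical data folder: replace with the directory holding the csv file
data_dir = tempfile.gettempdir()

os.chdir(data_dir)
print(os.getcwd())

# Switch back so the rest of the session is unaffected
os.chdir(original_dir)
```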

Loading csv files

see the first 5 rows in the dataframe

Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
0 Afghanistan Eastern Mediterranean 29825 47.42 3.82 5.40 60 98.5 54.26 NaN 1140 NaN NaN
1 Albania Europe 3162 21.33 14.93 1.75 74 16.7 96.39 NaN 8820 NaN NaN
2 Algeria Africa 38482 27.42 7.17 2.83 73 20.0 98.99 NaN 8310 98.2 96.4
3 Andorra Europe 78 15.20 22.86 NaN 82 3.2 75.49 NaN NaN 78.4 79.4
4 Angola Africa 20821 47.58 3.84 6.10 51 163.5 48.38 70.1 5230 93.1 78.2
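Output like the above could come from read_csv followed by head(). To keep this sketch runnable without the data file, it reads a few of the rows and columns shown above from an in-memory string; with the real file you would pass its path to pd.read_csv instead.

```python
import io
import pandas as pd

# Stand-in for the csv file: a subset of the rows and columns shown above
csv_text = """Country,Region,Population,Under15,Over60,FertilityRate,LifeExpectancy
Afghanistan,Eastern Mediterranean,29825,47.42,3.82,5.40,60
Albania,Europe,3162,21.33,14.93,1.75,74
Algeria,Africa,38482,27.42,7.17,2.83,73
Andorra,Europe,78,15.20,22.86,,82
Angola,Africa,20821,47.58,3.84,6.10,51
"""

who = pd.read_csv(io.StringIO(csv_text))

# head() returns the first 5 rows by default; empty fields become NaN
first_rows = who.head()
print(first_rows)
```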

to see the structure of the data frame

I can't find an exact equivalent of R's str() function
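A sketch of the closest pandas tools I know of: info(), dtypes and shape together cover most of what str() reports in R. The small frame below reuses a few values from the rows shown earlier.

```python
import pandas as pd

who = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Algeria"],
    "Population": [29825, 3162, 38482],
    "Under15": [47.42, 21.33, 27.42],
})

# Column names, non-null counts and types in one printout
who.info()

# The pieces are also available separately
print(who.dtypes)
print(who.shape)
```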

Print out a summary of variables in the dataframe

The output information is slightly different from R's summary()

Country Region Population Under15 Over60 FertilityRate LifeExpectancy ChildMortality CellularSubscribers LiteracyRate GNI PrimarySchoolEnrollmentMale PrimarySchoolEnrollmentFemale
count 194 194 194.000000 194.000000 194.000000 183.000000 194.000000 194.000000 184.000000 103.000000 162.000000 101.000000 101.000000
unique 194 6 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
top Russian Federation Europe NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
freq 1 53 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
mean NaN NaN 36359.974227 28.732423 11.163660 2.940656 70.010309 36.148969 93.641522 83.710680 13320.925926 90.850495 89.632673
std NaN NaN 137903.141241 10.534573 7.149331 1.480984 9.259075 37.992935 41.400447 17.530645 15192.988650 11.017147 12.817614
min NaN NaN 1.000000 13.120000 0.810000 1.260000 47.000000 2.200000 2.570000 31.100000 340.000000 37.200000 32.500000
25% NaN NaN 1695.750000 18.717500 5.200000 1.835000 64.000000 8.425000 63.567500 71.600000 2335.000000 87.700000 87.300000
50% NaN NaN 7790.000000 28.650000 8.530000 2.400000 72.500000 18.600000 97.745000 91.800000 7870.000000 94.700000 95.100000
75% NaN NaN 24535.250000 37.752500 16.687500 3.905000 76.000000 55.975000 120.805000 97.850000 17557.500000 98.100000 97.900000
max NaN NaN 1390000.000000 49.990000 31.920000 7.580000 83.000000 181.600000 196.410000 99.800000 86440.000000 100.000000 100.000000
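The rows unique, top and freq in the output above suggest describe() was called with include='all', which is what this sketch assumes. The small frame reuses values from the first rows shown earlier.

```python
import pandas as pd

who = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "Region": ["Eastern Mediterranean", "Europe", "Africa", "Europe", "Africa"],
    "Population": [29825, 3162, 38482, 78, 20821],
})

# include='all' summarises text columns (unique/top/freq) as well as numeric ones
summary = who.describe(include="all")
print(summary)
```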

Subsetting
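A sketch of row subsetting with a boolean mask. Filtering on Region == 'Europe' is an assumed example condition; the rows come from the output shown earlier.

```python
import pandas as pd

who = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "Region": ["Eastern Mediterranean", "Europe", "Africa", "Europe", "Africa"],
})

# The comparison gives one True/False per row; indexing keeps the True rows
who_europe = who[who["Region"] == "Europe"]
print(who_europe)
```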

Writing csv files
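A minimal sketch with to_csv. The file name WHO_Europe.csv and the temp-directory location are hypothetical; index=False leaves the row index out of the file.

```python
import os
import tempfile
import pandas as pd

who_europe = pd.DataFrame({
    "Country": ["Albania", "Andorra"],
    "Population": [3162, 78],
})

# Hypothetical output path
out_path = os.path.join(tempfile.gettempdir(), "WHO_Europe.csv")
who_europe.to_csv(out_path, index=False)

# Read it back to confirm what was written
round_trip = pd.read_csv(out_path)
print(round_trip)
```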

Removing variables
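Python has no rm(); the del statement plays that role. A small sketch, with a plain dictionary standing in for the dataframe being removed:

```python
# Stand-in object; in practice this would be a dataframe
who_europe = {"Country": ["Albania", "Andorra"]}

# del removes the name from the current scope
del who_europe

still_there = 'who_europe' in dir()
print(still_there)
```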

VIDEO 5

Basic data analysis
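A sketch of a few basic column statistics, assuming the Under15 column is the one being explored. The five rows come from the output shown earlier, so the results apply to this sample only. pandas std() divides by n - 1 by default, like sd() in R.

```python
import pandas as pd

who = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "Under15": [47.42, 21.33, 27.42, 15.20, 47.58],
})

mean_under15 = who["Under15"].mean()
std_under15 = who["Under15"].std()
print(who["Under15"].describe())

# idxmin/idxmax return the row label of the smallest/largest value
lowest = who.loc[who["Under15"].idxmin(), "Country"]
highest = who.loc[who["Under15"].idxmax(), "Country"]
print(mean_under15, std_under15, lowest, highest)
```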

An equivalent way of setting the value:

Scatterplot
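A minimal matplotlib sketch, assuming the plot is GNI against FertilityRate. The seven points are the rows listed in the subsetting table further down, not the full data set. The Agg backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")    # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt

gni = [14550, 25620, 13740, 27110, 11250, 14510, 24700]
fertility = [2.71, 5.04, 4.18, 2.92, 2.52, 2.52, 2.76]

fig, ax = plt.subplots()
ax.scatter(gni, fertility)
ax.set_xlabel("GNI")
ax.set_ylabel("FertilityRate")

# fig.savefig("scatter.png") would write the figure to disk
```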

[Figure: scatterplot]

Subsetting

     Country              GNI  FertilityRate
22   Botswana           14550           2.71
55   Equatorial Guinea  25620           5.04
62   Gabon              13740           4.18
82   Israel             27110           2.92
87   Kazakhstan         11250           2.52
130  Panama             14510           2.52
149  Saudi Arabia       24700           2.76
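Every row in the table above has GNI above 10000 and FertilityRate above 2.5, so the filter was presumably something like the one below; the exact thresholds are an assumption. The sketch mixes rows from the earlier output with two rows from this table.

```python
import pandas as pd

who = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Algeria", "Angola", "Botswana", "Israel"],
    "GNI": [1140, 8820, 8310, 5230, 14550, 27110],
    "FertilityRate": [5.40, 1.75, 2.83, 6.10, 2.71, 2.92],
})

# Combine conditions with & and wrap each one in parentheses
mask = (who["GNI"] > 10000) & (who["FertilityRate"] > 2.5)

# Keep matching rows and only the columns of interest
outliers = who.loc[mask, ["Country", "GNI", "FertilityRate"]]
print(outliers)
```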

VIDEO 6

Histograms
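A minimal histogram sketch, assuming the CellularSubscribers column is the one plotted. Only the five values from the earlier output are used, so the shape says nothing about the full data.

```python
import matplotlib
matplotlib.use("Agg")    # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt

cellular = [54.26, 96.39, 98.99, 75.49, 48.38]

fig, ax = plt.subplots()
counts, bin_edges, patches = ax.hist(cellular, bins=5)
ax.set_xlabel("CellularSubscribers")
ax.set_ylabel("Frequency")

print(counts, bin_edges)
```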

[Figure: histogram]

Boxplot

Using default settings
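A sketch of a grouped boxplot, assuming LifeExpectancy is split by Region. It uses plain matplotlib with one array per group, which may differ from the call used for the figure; the values are the five rows shown earlier.

```python
import matplotlib
matplotlib.use("Agg")    # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

who = pd.DataFrame({
    "Region": ["Eastern Mediterranean", "Europe", "Africa", "Europe", "Africa"],
    "LifeExpectancy": [60, 74, 73, 82, 51],
})

# Collect one array of values per region (groupby sorts the keys)
region_names = []
region_values = []
for region, group in who.groupby("Region"):
    region_names.append(region)
    region_values.append(group["LifeExpectancy"].to_numpy())

fig, ax = plt.subplots()
result = ax.boxplot(region_values)    # one box per region
ax.set_xticklabels(region_names)
ax.set_ylabel("LifeExpectancy")
```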

[Figure: boxplot with default settings]

Tweaking the parameters a little to get a nicer plot. The code is a bit messy....

[Figure: boxplot with adjusted parameters]

Summary Tables

A bit more complicated than in R
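A sketch of what seem to be the nearest pandas counterparts to R's table() and tapply(): value_counts() for counts per category and groupby() for a statistic per group. The choice of Region and Over60 is an assumption; the values are the five rows shown earlier.

```python
import pandas as pd

who = pd.DataFrame({
    "Region": ["Eastern Mediterranean", "Europe", "Africa", "Europe", "Africa"],
    "Over60": [3.82, 14.93, 7.17, 22.86, 3.84],
})

# Like table(): number of rows per region
region_counts = who["Region"].value_counts()

# Like tapply(): mean of Over60 within each region
mean_over60 = who.groupby("Region")["Over60"].mean()

print(region_counts)
print(mean_over60)
```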
