Is There A Relationship Between Voter Gender And Preference Among Candidates?


I'm interested in whether there is a relationship between voters' gender and their preference among candidates. Specifically, I want to test whether there is a statistically significant difference in voting preference between male and female voters. The results of this research might raise questions such as whether voting is biased by voters' gender in general, which would be an interesting topic for further research.


I am using the American National Election Studies (ANES) [1] dataset for this research. The data are collected by surveying voters in the United States before and after every presidential election. Although the surveys are carried out on a random sample of U.S. voters, no experiment is involved in the data collection, so my analysis of this dataset is an observational study. My results can be generalized only to the people who intended to vote in the 2008 election; they cannot be generalized to all Americans or to the 2012 election. Nor can I draw a causal relationship between voters' gender and their voting preference.

Each case in the dataset represents a voter who was surveyed by ANES. I work with two variables: the voter's gender and the candidate the voter was interested in voting for in the 2008 election. Voter gender is coded as “gender_respondent” in the dataset; it is a categorical variable taking the values “Female” and “Male”. The candidate the voter was interested in voting for is coded as “interest_whovote2008”; it is also a categorical variable, taking the values “Barack Obama”, “John Mccain” and “Other”.

Exploratory data analysis:

Since the dataset also includes people who did not vote in the 2008 election, I first create a subset of the original dataset that contains only people who actually voted in 2008, keeping just the two variables of interest. Among the people who reported voting in 2008, those who did not report which candidate they voted for are then excluded from the subset. The resulting dataset, anesVot, is used in the rest of this analysis.
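The filtering step above can be sketched in Python (the analysis in this post is done in R, so this is only an illustration). A minimal sketch, assuming each survey response is a dict: the turnout field `voted_2008` is a hypothetical name because the text does not name that variable, and a missing candidate is assumed to be stored as `None`. The helper names `build_anes_vot` and `contingency_table` are also made up for illustration.

```python
from collections import Counter


def build_anes_vot(records):
    """Return the anesVot subset: 2008 voters who reported a candidate.

    Each record is assumed to be a dict. "voted_2008" is a HYPOTHETICAL
    name for the turnout variable; "gender_respondent" and
    "interest_whovote2008" are the variable names used in the text.
    A missing candidate is assumed to be stored as None.
    """
    subset = []
    for r in records:
        # keep only people who voted AND reported whom they voted for
        if r.get("voted_2008") and r.get("interest_whovote2008") is not None:
            # keep only the two variables of interest
            subset.append({
                "gender_respondent": r["gender_respondent"],
                "interest_whovote2008": r["interest_whovote2008"],
            })
    return subset


def contingency_table(rows):
    """Count voters in each (gender, candidate) cell."""
    return Counter(
        (r["gender_respondent"], r["interest_whovote2008"]) for r in rows
    )


# Made-up records for illustration only (NOT ANES data)
records = [
    {"gender_respondent": "Female", "voted_2008": True, "interest_whovote2008": "Barack Obama"},
    {"gender_respondent": "Male", "voted_2008": True, "interest_whovote2008": None},
    {"gender_respondent": "Male", "voted_2008": False, "interest_whovote2008": "Other"},
    {"gender_respondent": "Male", "voted_2008": True, "interest_whovote2008": "Other"},
]
anes_vot = build_anes_vot(records)
print(len(anes_vot))  # 2
print(contingency_table(anes_vot))
```

The second record is dropped because no candidate was reported, and the third because the respondent did not vote.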

The resulting dataset, anesVot, contains 4520 observations; each case represents an individual who not only voted in the 2008 election but also reported whom he or she voted for.




Overall, there are 2342 female and 2178 male voters in the dataset. Of these, 2704 voted for Barack Obama, 1702 voted for John McCain, and 114 voted for candidates other than these two.
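These totals are the margins of the gender-by-candidate table. A short Python check, using only the numbers reported above, that both margins describe the same 4520 voters, along with each candidate's overall share of the vote:

```python
# Marginal totals of anesVot as reported in the text
n = 4520
gender_totals = {"Female": 2342, "Male": 2178}
candidate_totals = {"Barack Obama": 2704, "John McCain": 1702, "Other": 114}

# Both sets of margins must add up to the same number of voters
print(sum(gender_totals.values()) == n, sum(candidate_totals.values()) == n)  # True True

# Overall share of the vote for each candidate, in percent
shares = {c: round(100 * k / n, 1) for c, k in candidate_totals.items()}
print(shares)  # {'Barack Obama': 59.8, 'John McCain': 37.7, 'Other': 2.5}
```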




63% of female voters voted for Barack Obama, compared with 56% of male voters.



The percentage of voters who voted for John McCain is the same for both genders: 38%.



3% of female voters and 2% of male voters voted for candidates other than the two major competitors. These calculations suggest that, compared with male voters, female voters slightly favored Barack Obama over John McCain and the other candidates.
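The percentages above are conditional on gender: each cell count is divided by its row (gender) total, not by the overall total. A minimal sketch of that calculation, with a hypothetical helper `row_percentages` and a small made-up table standing in for the real cell counts, which are not reproduced here:

```python
def row_percentages(table):
    """Convert {gender: {candidate: count}} into within-gender percentages."""
    result = {}
    for gender, counts in table.items():
        total = sum(counts.values())  # row total for this gender
        result[gender] = {c: 100 * k / total for c, k in counts.items()}
    return result


# Hypothetical counts for illustration only, NOT the ANES data
toy = {
    "Female": {"Barack Obama": 6, "John McCain": 3, "Other": 1},
    "Male": {"Barack Obama": 5, "John McCain": 4, "Other": 1},
}
pct = row_percentages(toy)
print(pct["Female"]["Barack Obama"], pct["Male"]["Barack Obama"])  # 60.0 50.0
```

Each gender's percentages sum to 100, which is what makes the female and male distributions directly comparable.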



To see this graphically, the plot above shows the difference in preferred candidates between the genders, as described before.

This difference could be real, but it could also simply be due to chance: if we drew another sample from the people who voted in the 2008 election, the difference might not appear at all. So I want to carry out a formal hypothesis test.


Since I want to test whether two categorical variables are related, I will use a chi-square test of independence.

The null hypothesis $H_0$ for my chi-square test of independence is that voter gender and preference among candidates are independent. The alternative hypothesis $H_A$ is that voter gender and preference among candidates are dependent.

Let's check the conditions for carrying out the chi-square test of independence. The data were collected from a random sample of U.S. voters, drawn without replacement. According to the official Federal Election Commission report [2], 131,313,820 people voted in the 2008 election.


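The 4520 voters in anesVot are therefore far fewer than 10% of that population, so the sampled observations can be treated as independent of one another.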
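The 10% check is a single division. A quick Python sketch using the two numbers given in the text:

```python
sample_size = 4520          # observations in anesVot
population = 131_313_820    # 2008 voters according to the FEC report [2]

fraction = sample_size / population
print(f"{fraction:.4%}")    # 0.0034%
print(fraction < 0.10)      # True: the 10% condition holds
```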

The expected count for each cell is calculated using the formula

$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$
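Because the expected counts depend only on the row and column totals, they can be reproduced from the margins reported earlier. A minimal Python sketch (the helper `expected_counts` is hypothetical, not part of the original analysis):

```python
def expected_counts(row_totals, col_totals):
    """Expected count per cell: row total * column total / table total."""
    n = sum(row_totals.values())  # table total
    return {
        (r, c): rt * ct / n
        for r, rt in row_totals.items()
        for c, ct in col_totals.items()
    }


# Margins of anesVot as reported in the text
gender_totals = {"Female": 2342, "Male": 2178}
candidate_totals = {"Barack Obama": 2704, "John McCain": 1702, "Other": 114}

expected = expected_counts(gender_totals, candidate_totals)
for cell, e in expected.items():
    print(cell, round(e, 2))
print("all expected counts >= 5:", all(e >= 5 for e in expected.values()))
```

With these margins the expected counts come to roughly 1401.05, 881.88 and 59.07 for female voters and 1302.95, 820.12 and 54.93 for male voters (Obama, McCain, Other), so the smallest expected count is about 55.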

Calculating expected values for each cell will produce a table as follows:


This satisfies the condition that every cell has an expected count of at least 5.

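Recall that the chi-square statistic is

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ and $E_i$ are the observed and expected counts in cell $i$ and $k$ is the number of cells, and that the degrees of freedom are

$$df = (R - 1) \times (C - 1)$$

where $R$ and $C$ are the numbers of rows and columns. For this 2 × 3 table, $df = (2 - 1) \times (3 - 1) = 2$.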





R has a built-in function that can do this calculation for us.

The output agrees with the calculation carried out earlier, which suggests it is correct.
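Because the table has 2 rows and 3 columns, the test has $df = (2-1)(3-1) = 2$. For 2 degrees of freedom the chi-square upper-tail probability has the closed form $P(\chi^2_2 > x) = e^{-x/2}$, which gives the critical value at the 0.05 level directly:

$$e^{-x^*/2} = 0.05 \quad\Longrightarrow\quad x^* = -2\ln(0.05) = 2\ln 20 \approx 5.99$$

So any chi-square statistic larger than about 5.99 yields a p-value below 0.05.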

The resulting p-value is less than 0.05, which suggests that we should reject the null hypothesis $H_0$ and conclude that voter gender and preference among candidates are dependent.
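For readers without R, the same kind of test can be sketched in Python. This is an illustrative sketch, not the function used in this post: `chi_square_independence` is a hypothetical helper, and it relies on the fact that with $df = 2$ the chi-square upper-tail probability is exactly $e^{-x/2}$, so it only handles tables with $df = 2$, such as a 2 × 3 table. The example table is made up and is not the ANES data.

```python
import math


def chi_square_independence(observed):
    """Chi-square test of independence for a table given as a list of rows.

    Returns (statistic, df, p_value). The p-value uses the closed form
    P(X > x) = exp(-x / 2), which is exact only when df == 2; other table
    shapes raise ValueError in this sketch.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count
            stat += (o - e) ** 2 / e

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    if df != 2:
        raise ValueError("closed-form p-value in this sketch needs df == 2")
    return stat, df, math.exp(-stat / 2)


# Hypothetical 2 x 3 table (rows: Female, Male; columns: Obama, McCain, Other),
# made up for illustration; these are NOT the ANES counts.
toy = [[30, 10, 10],
       [10, 30, 10]]
stat, df, p = chi_square_independence(toy)
print(stat, df)   # 20.0 2
print(p < 0.05)   # True
```

For other table shapes, a general chi-square tail probability (for example from a statistics library) would be needed instead of the closed form.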


This research finds that voters' gender is associated with whom they voted for in the 2008 election. It seems that Barack Obama was more successful at winning women's votes than John McCain was.

For future studies, I would like to find data on earlier and later elections and see whether this kind of dependent relationship appears there as well.


[1] “The American National Election Studies”

[2] “Official Federal Election Commission report”
