Final Coursework

Introduction to Quantitative Research Methods (PUBL0055)

Instructions

• The coursework will be posted on Moodle on 13th December 2019 at 6pm, and is due on 13th January 2020 at 2pm. Please follow all designated SPP submission guidelines for online submission as detailed on the PUBL0055 Moodle page. Standard late submission penalties apply.

• This is an assessed piece of coursework (worth 75% of your final module mark) for the PUBL0055 module; collaboration and/or discussion of the coursework with anyone is strictly prohibited. The rules for plagiarism apply and any cases of suspected plagiarism of published work or the work of classmates will be taken seriously.

• As this is an assessed piece of work, you may not email/ask the course tutors or teaching fellows questions about the coursework.

• Along with the coursework itself, the datasets for the coursework can be found in the PUBL0055 page on Moodle.

• Coursework should be submitted via the ‘Turnitin Submission: PUBL0055 Essay 2’ link on the course Moodle page. You will need to click the ‘Submit Paper’ link at the bottom of the page. When presented with the ‘Submit Paper’ box, the ‘Submission Title’ should be your candidate number, and you should upload your document into the box provided.

– Please remember to state ONLY your candidate number on your coursework (your candidate

number is made up of four letters and one number e.g. ABCD5). Your name and/or student

number MUST NOT appear on your coursework.

• The coursework consists of 8 questions. The marks allocated for each question are indicated in the text.

• Unless otherwise stated, answers should be written in complete sentences. Be sure to answer all parts of the questions posed and interpret the results.

• The word count for this assessment is 3000 words. This does not include the appendix, or any words (or numbers) contained within tables.

• Please submit your type-written (numbered) answers in a single document. Create an appendix section at the end which contains all the R code needed to reproduce your results (you do not need to include the code that failed to run, but just the cleaned-up version. Your code has to work when we run it).

• You may assume the methods you have used (e.g. difference in means, linear regression, etc) are understood by the reader and do not need definitions, but you do need to explain how they apply to answering the question.

• Round all numbers to two digits after the decimal point.

• Do not copy and paste any raw R output (e.g. lm(y ~ x)) into your answers. Create a formatted table that is easy to read.

• Assign every table and figure a title and a number and refer to the number in the text when discussing a specific figure or table.

• All variable names in the coursework are written in this_font.
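
As an illustration of the formatting instructions above (two decimal places, a readable table rather than raw console output), the following is a minimal sketch in base R. It uses simulated toy data, and the names toy, fit and reg_table are hypothetical; it is one possible approach, not a required one.

```r
# Toy data for illustration only; this is NOT the coursework data.
set.seed(1)
toy <- data.frame(x = rnorm(50))
toy$y <- 1 + 2 * toy$x + rnorm(50)

fit <- lm(y ~ x, data = toy)

# coef(summary(fit)) is a matrix with the columns
# "Estimate", "Std. Error", "t value" and "Pr(>|t|)".
coefs <- coef(summary(fit))

# formatC(..., format = "f", digits = 2) always prints two decimals,
# e.g. "2.00" rather than "2".
two_dp <- function(v) formatC(v, format = "f", digits = 2)

reg_table <- data.frame(
  Term      = rownames(coefs),
  Estimate  = two_dp(coefs[, "Estimate"]),
  Std.Error = two_dp(coefs[, "Std. Error"]),
  p.value   = two_dp(coefs[, "Pr(>|t|)"]),
  row.names = NULL,
  stringsAsFactors = FALSE
)

print(reg_table)
```

The resulting data frame can then be given a title and number and pasted or exported into the written answer in place of the console output.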

The Effects of Educational Television

Is educational television an effective teaching aid? “The Electric Company” was a television programme that

ran on US TV from 1971 to 1977. The programme used sketch comedy to provide an entertaining way of

helping elementary school children develop their grammar and reading skills. It was widely credited by many

teachers in US schools as having important effects on the literacy skills of second-, third-, and fourth-grade

children. In this section, you will analyse data from an experiment that involved randomly assigning classes

of children to watch “The Electric Company”. You will investigate what reading gains, if any, were made

by classes as part of this experiment.

The unit of analysis in this data is a class of children, and there are 192 classes in the data. Each class was

either treated (to watch the program) or control (to not watch the program). The outcome of interest is the

average score on a reading test administered at the end of each year called post.score. In addition to the

treatment and outcome, the data also contains information on the year grade of the class and the score on

the same reading test as administered before the treatment took place:

Name        Description
grade       The school grade of the class (1st through 4th)
treatment   1 if the class was treated, 0 otherwise (randomized)
pre.score   Class reading score before treatment, at the beginning of the school year
post.score  Class reading score at the end of the school year

The data is stored in electric-company.csv. Once you have downloaded this file and placed it in the

relevant folder, it can be loaded into R as follows:

electric <- read.csv("data/electric-company.csv")

Question 1 (16 marks)

a. Calculate and interpret the average effect of the treatment on the class reading score at the end of the

school year.

b. Explain whether we can interpret your answer to part a as the causal effect of television on student scores.

c. Calculate the standard error of the difference in means. Show your work.

d. Conduct a hypothesis test for the difference in means. Can we reject the null hypothesis of no effect of the

treatment at the 95% and 99% confidence levels?

e. Calculate and interpret the 95% confidence intervals for the difference in means estimate.

f. Explain the concept of a “sampling distribution”. What is the shape of the sampling distribution in this

example?

Question 2 (10 marks)

a. Make a scatter plot which compares student scores at the beginning of the year to student scores at the

end of the year.

b. Make a box plot which depicts student scores at the end of the year as a function of the grade they are in.

c. Estimate three linear regression models. The first should predict post.score with only the treatment

variable. The second regression model should be the same as the first, but should also control for student

grade. The third model should be the same as the second, but should also control for pre.score.

d. Summarise these models in terms of how much of the variation in post.score they “explain”. What does

this tell us about the relationships between 1) the grade a student is in and reading ability, and 2) students’

prior performance on the test and current performance on the test?

e. Are the estimates of the treatment coefficient different across the three models? Why do you think that is?

You may wish to provide evidence from the data to support your argument. You may also wish to refer to

your answers to parts a and b of this question.

Question 3 (6 marks)

Use the grade variable to subset the data, and then use linear regression models to evaluate the effect of

treatment within each grade. How does the effect of the treatment differ as grade increases? Comment on

both the substantive and statistical significance of these results.

Question 4 (6 marks)

Write a short paragraph summarising your findings from these analyses. You should write as if you are trying

to communicate the results to someone who is interested in the effects of television on learning, but who has

not taken a course in quantitative methods. You may wish to create a visualisation to help communicate the

findings.

Political Parties and Policy Outcomes

Does which political party is in power matter for policy outcomes? This is an important question for political

scientists to answer, not least because many theories of voting assume that voters hold governing parties

to account on the basis of their performance in office. If such “retrospective voting” is to occur, it must be

the case that different political coalitions have clear and consistent effects on policy outcomes in the time

between elections.

To determine whether this is the case, in this section you will use data from the US to analyse the effects of

the party in power in US state governments (specifically, which party holds the governorship of the state) on

a number of different policy outcomes. The data comes from 864 elections across 50 states in the US, plus

the District of Columbia. The variables included in the data are:

Name                 Description
dem_governor         The party of the governor (1 if Democrat, 0 if Republican)
dem_margin           The Democratic electoral margin in the relevant election for governor.
                     This variable is measured as the Democratic candidate’s vote share minus
                     the Republican candidate’s vote share, such that negative values indicate
                     the Democratic candidate lost the election, positive values indicate the
                     Democratic candidate won the election.
dem_majority_house   Whether the lower house of the state legislature is controlled by the
                     Democrats (1) or Republicans (0)
dem_majority_senate  Whether the upper house of the state legislature is controlled by the
                     Democrats (1) or Republicans (0)
state                The state.
year                 The year of the governor’s election.
unemployment         The unemployment rate, measured 2 years after the governor was elected.
murder               The number of murders (per 100,000 people), measured 2 years after the
                     governor was elected.
income_top_one_pct   The percentage of income held by the top 1% of the population, measured
                     2 years after the governor was elected.
house_prices         Quarterly Housing Price Index, measured 2 years after the governor was
                     elected.

The data is stored in governors.csv. Once you have downloaded this file and placed it in the relevant folder,

it can be loaded into R as follows:

governors <- read.csv("data/governors.csv")

Question 5 (6 marks)

a. For each of the 4 outcome variables, estimate a linear regression where dem_governor is the only explanatory

variable. Present the results in a table.

b. Interpret the regression coefficients in each model.

Question 6 (13 marks)

a. Adapt the regression models that you estimated above to include two additional control variables:

dem_majority_house and dem_majority_senate. Estimate these regression models and present the results

in a table.

b. Why might it be important to include these additional variables in your regression?

c. Interpret each of your four regressions, paying particular attention to the coefficient associated with the

dem_governor variable. Can the coefficient be interpreted causally in these models? Explain why or why not.

d. Your goal is to identify the causal effect of Democratic governors on these outcome variables. Imagine that

you had unlimited time and unlimited budget: describe one variable that you would ideally control for in

these models. Why?

Question 7 (18 marks)

In the paper on which this example is based, the authors use a regression discontinuity (RD) design. In this

design, the authors use the Democratic candidate’s electoral margin variable to make comparisons between

states that narrowly elected a Democrat and states that narrowly elected a Republican for governor. In this

question, you will replicate parts of the original RD analysis.
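
To make the logic of the design described above concrete, here is a minimal sketch of a regression discontinuity comparison in R. It uses simulated data only (not governors.csv); the variable names margin, treated and outcome, the bandwidth of 10 points, and the simulated jump of 1.5 are all assumptions chosen for illustration, and this is not necessarily the specification used in the original paper.

```r
# Simulated data for illustration only; all names and values are hypothetical.
set.seed(42)
n <- 2000
margin  <- runif(n, -50, 50)        # running variable: vote-share margin
treated <- as.numeric(margin > 0)   # treatment switches on at the cutoff of 0
outcome <- 5 + 0.02 * margin + 1.5 * treated + rnorm(n)

sim <- data.frame(margin, treated, outcome)

# Keep only "close" elections; the 10-point bandwidth is an arbitrary choice.
bandwidth <- 10
close <- sim[abs(sim$margin) <= bandwidth, ]

# Allow a separate slope on each side of the cutoff. The coefficient on
# `treated` estimates the jump in the outcome at margin = 0.
rd_fit <- lm(outcome ~ treated * margin, data = close)
rd_effect <- unname(coef(rd_fit)["treated"])

round(rd_effect, 2)
```

The choice of bandwidth trades off bias against precision: a narrower window makes the two groups more comparable but leaves fewer observations on each side of the cutoff.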

a. Write a short paragraph discussing why using a regression discontinuity design of this type might be better

than simply comparing states that have Democratic governors to states that have Republican governors.

Explain also one disadvantage of using a regression discontinuity design in the context of this study.

b. Use the dem_margin variable to compare policy outcomes between states that narrowly elected a Democratic

governor and states that narrowly elected a Republican governor. Report and interpret the regression

discontinuity treatment effect for all four outcome variables.

c. Produce four plots that depict the regression discontinuity design graphically. Each plot should depict

the relationship between the Democratic electoral margin and one of the policy outcomes. Your plot should

include two lines depicting the relationship on either side of the cutoff, and a vertical line to show the location

of the cutoff on the x-axis.

d. Write a short paragraph which compares your findings from the regression discontinuity design analysis

here to your findings from the regressions that you estimated in questions 5 and 6. What do you conclude

about whether political parties have important effects on policy outcomes?

Religion and the Electoral Success of the Nazi Party in 1932

In Weimar Germany, the Catholic Church vehemently warned ordinary parishioners about the dangers of

extremist parties. During this period, the church in Germany was particularly active in discouraging Catholics

from supporting the Nationalsozialistische Deutsche Arbeiterpartei (NSDAP), which is commonly known

in English as the Nazi party. Alerted by the Nazis’ sudden success at the polls and afraid of anticlerical

movements within the party, Catholic bishops took an explicit anti-Hitler stand in the autumn of 1930.

Historians have long contended that this anti-Nazi position from Catholic religious leaders had consequences

for the level of support amongst Catholic citizens, particularly in the context of the Reichstag elections in

1932.

In this section, you will investigate whether Catholic areas of Germany were less likely to support the Nazi

party in the elections held in November 1932. The data includes election results from 982 counties, and

includes a number of variables:

Name                    Description
nsdap_share             The NSDAP % of the vote in the county
pct_catholic            The Catholic percentage of the population in the county
pct_jewish              The Jewish percentage of the population in the county
pct_other_rel           The percentage of the population in the county of another Religion
pct_women               The female percentage of the population in the county
log_population          The total (log) population of the county
female_workforce        The female percentage of the workforce in the county
urban_county            If the county was urban (1) or rural (0)
unemp_white_collar      The unemployment rate amongst white-collar workers in the county
unemp_blue_collar       The unemployment rate amongst blue-collar workers in the county
unemp_domestic          The unemployment rate amongst domestic servants in the county
industry_manufacturing  The percentage of the county’s workforce in manufacturing industries
industry_trade          The percentage of the county’s workforce in trade and commerce
industry_services       The percentage of the county’s workforce in public and private services
industry_domestic       The percentage of the county’s workforce employed in domestic service
distance_to_berlin      Distance of the county to Berlin (KM)
literacy_rate           The literacy rate in the county
bavaria                 If the county was located in the state of Bavaria (1) or not (0)
log_income_per_capita   County log income per capita
religiosity             A binary indicator for whether the county is above average in terms
                        of “religiosity”, defined on the basis of how many citizens regularly
                        participate in religious festivals. 1 = more religious, 0 = less
                        religious.

The data is stored in reichstag.csv. Once you have downloaded this file and placed it in the relevant folder,

it can be loaded into R as follows:

reichstag <- read.csv("data/reichstag.csv")

Question 8 (25 marks)

Your task in this section is to investigate the relationship between the share of Catholics in a district and the

NSDAP vote share in that district in the election in order to answer the research question outlined above. In

particular, you should implement two linear regression models with nsdap_share as the dependent variable.

In the first model, the only explanatory variable should be the pct_catholic variable. For the second

model, you should build a model which – in addition to the pct_catholic variable – includes exactly three

additional explanatory variables that you think might be useful to include from the supplied dataset. You

should explain why you think these particular variables are important to include, given that our main interest

is in the relationship between Catholicism and Nazi vote share. Please note that, for the second model, you

should not estimate several different models and present the results, but rather you should argue theoretically

why you chose certain variables.

You should write up the results of these models as if they were to be published in a political science journal

article with a focus on communicating the substantive meaning of your results. In your discussion of these

models, you should focus on communicating the substantive implications of the regression that you implement,

paying particular attention to the relationship between the Catholic population of a district and Nazi vote

share in the election. You may wish to focus on the following:

• Provide descriptive statistics and/or plots to provide the reader with an overview of the dependent variable and the important explanatory variable(s) that you intend to use.

• Provide a well-formatted table of regression output which includes the key information about the models you have estimated.

• Discuss both the statistical and substantive significance of the relationships that you illustrate.

• Discuss model fit, using appropriate statistics.

• Discuss whether or not we should consider the estimates you present to be causally identified.

• Discuss weaknesses of your analysis, and potential alternative analysis designs that you might use (given different data) to evaluate this research question.
