
Date: 2019-01-16 10:14

Coursework

Univariate Statistics and Methodology using R - 2017/2018

Martin Corley and Milan Valášek

Read this whole document before you do anything else.

Overview

For the course assignment, you will be expected to retrieve, clean, and analyse a data set. In this document we provide the primary research questions to be answered, information on the structure and format of the final report, information on the code that should be submitted, and a brief overview of the marking criteria. You can find the codebook for the data set and the R script template on LEARN. The data for this assignment come from the Timed Picture Naming in Seven Languages study (Bates et al., 2003), available as part of the International Picture Naming Project (https://crl.ucsd.edu/experiments/ipnp/).

It can be tempting to over-complicate assessments like this, particularly if you have a long time to complete them. The labs have been designed to prepare you for this assignment: to explore data, to conduct appropriate analyses for given data types, and to make decisions that you can justify. Bear in mind that completing this assessment does not require any knowledge that wasn’t covered in lectures, labs, and readings.

What you need to submit

For your assessment you need to submit two documents: your report and your R code. Instructions on how to submit are given further below; here, we provide more detail on what to submit.

Report

You need to produce a report answering the assignment questions below. Your report should include appropriate analyses to provide answers to these questions while describing the process and utilising graphics where necessary to illustrate your points.

Your report should clearly identify the decisions you made in analysing the data, as well as summarising what can be concluded from your analysis.

Figures and tables should be numbered and captioned, and referred to in the text; important statistical outcomes should be summarised in the text.


Reporting should follow APA 6th Edition guidelines for the presentation of tables, figures, and statistical results (see the final lecture for more information). Alternative style is acceptable so long as it is clear and consistent.

Your report should be a maximum of 4 sides of A4 (including tables and figures), in a standard 12-point font with normal 1-inch margins.

Code

Your report must be accompanied by an R script (a text file with the extension .R, the default file type when saving a script from RStudio) that can be used to reproduce exactly the results set out in your submitted report. It should include all steps taken in data cleaning and all analyses. Every answer to the assignment tasks/questions below must be accompanied by the code used to find the answer. You should provide clear and informative comments within the file describing the steps taken. Please download the script template from LEARN and use it to write your script.

Important: Do not edit the lines of code in the script template that read in the data sets! These lines obtain the data to be used for this assignment from the internet and assign them to data frames.

We will check that the code runs and produces the results presented in your report.

Any code copied and pasted or otherwise adapted from internet examples should be cited appropriately in the comments. An appropriate citation should include the URL where the code was found, the name of the website or blog, and the original author’s name. In the absence of a proper name, you can cite the contributor’s nickname or alias.

You can work on the R script in small groups (no more than 4 students) if you prefer. If you do this, it is important that you take the following steps:

1. At the very start of your script, include a comment line (a line starting with #) that lists the exam numbers (not the names) of those you worked with. For example:

# Produced in collaboration with students B045329 and B018429

2. Within the script, point out (again using comments) which blocks of code are shared.

3. Please ensure that your acknowledgements match those of others in your group (if you say you produced the script in collaboration with B045329, we expect B045329 to acknowledge you).

Important: While the code can be worked on in small groups, the written report must be produced entirely independently. It is not acceptable to include sections in the written report that were written collaboratively.


Submission and Marking

Submitting your work

All coursework must be submitted before 12:00 (noon) on Monday the 21st of January 2019 via Turnitin. You can access it by clicking on the “Assessment details and submission” tab of the course page on LEARN. There are two sections there, one for each of the two files you are required to submit.

You will be asked to provide your name and submission title. The submission title must be your exam number (and nothing more). Your name will not appear anywhere in the documents accessed by the markers. To ensure that the marking is entirely anonymous, please do not include your name or student number anywhere in either of the submitted files.

Remember, the files you are required to submit are:

Report, as described above. The filename must be your exam number with whatever extension is provided by your chosen word processor (e.g., ‘B045329.docx’). The file you create should have your exam number on each page (e.g., in the header or footer).

R script, which runs all of the data cleaning and the final analyses reported. The filename must be your exam number with the .R extension (e.g., ‘B045329.R’).

Please ensure that you name your documents exactly as above. File names such as ‘R Script for B04329.R’ or ‘B044329 Report final.docx’ slow down document matching and marking and will result in a loss of marks.

Please check LEARN for detailed instructions on the submission process prior to submitting.

Marking Criteria

The code is worth 30% of the coursework marks, and the report is worth 70% of the coursework marks.

Work will automatically fail (max mark of 30%) unless both components are submitted.

You will be assessed on the following:

1. Appropriate cleaning of the data set and key variables of interest, making appropriate and justified decisions on the steps you take.

2. Selection of appropriate statistical tests and variables to answer the primary research questions, and the justifications provided for your selections.

3. Interpretation of the results of the selected analyses.

4. R code that runs without errors all the way through and is clear and appropriately commented. For handy tips on writing good code, see http://adv-r.had.co.nz/Style.html (there is no need to stick religiously to the guidelines, but following them does make code nice and tidy).

5. Last but not least: clarity of writing and formatting. The report should conform to the APA 6th Edition style guidelines for formatting text, tables, and figures, reporting results of statistical analyses, writing style, etc. However, alternative style is acceptable provided it is comparably clear and consistent. For a useful resource, see https://owl.purdue.edu/owl/research_and_citation/apa_style/apa_style_introduction.html.

Data

You are given four separate data sets:

df_e is a data set of 520 pictures and their associated variables in an English-language picture-naming study

df_c, df_h, and df_s are data sets of 173 pictures from the Chinese, Hungarian, and Spanish picture-naming studies, respectively. Each data set uses a different picture set.

The codebook for the data sets can be found on LEARN.

Assignment Questions

Question 1

Is there a relationship between the frequency of a target word in the English corpus and reaction time (RT) on a picture-naming task? Once you are content that the data are appropriately cleaned, run the following model:

m1 <- lm(rttar ~ lnfreq, data = df_e)

Question 1.1

Concisely report and interpret the results of the model.
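A minimal sketch of where the quantities usually reported for a simple regression live in a fitted lm object is given below. It runs on a simulated toy data frame (invented values; only the column names rttar and lnfreq follow the assignment), so it illustrates the R functions rather than any result for df_e.

```r
# Simulated stand-in for df_e: values are invented, only the column names
# rttar and lnfreq mirror the assignment.
set.seed(1)
toy <- data.frame(lnfreq = log(runif(100, min = 1, max = 500)))
toy$rttar <- 1200 - 40 * toy$lnfreq + rnorm(100, sd = 50)

m_toy <- lm(rttar ~ lnfreq, data = toy)
s <- summary(m_toy)

coef(s)          # estimate, SE, t value and p value for each coefficient
confint(m_toy)   # 95% confidence intervals for the coefficients
s$r.squared      # proportion of variance in RT explained
s$fstatistic     # F value with its numerator and denominator df
```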

Question 1.2

What is the predicted RT for a word with a frequency of 20?

Hint: Don’t forget that the frequency variable is log-transformed (see the codebook for details) and that, in R, exp() is the inverse function of log().
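As a sketch of the hint, the example below predicts from a simple regression after putting the new value on the predictor's log scale. It assumes lnfreq is the natural log of the raw frequency; if the codebook defines it differently (another base, an added constant, a per-million scale), the same transformation should be applied instead. The data are simulated.

```r
set.seed(1)
toy <- data.frame(lnfreq = log(runif(100, min = 1, max = 500)))
toy$rttar <- 1200 - 40 * toy$lnfreq + rnorm(100, sd = 50)
m_toy <- lm(rttar ~ lnfreq, data = toy)

# The model was fitted on log frequency, so a raw frequency must be
# log-transformed before prediction (assumes lnfreq is a natural log).
new_word <- data.frame(lnfreq = log(20))
pred <- predict(m_toy, newdata = new_word)

# The same value by hand: intercept + slope * log(20)
by_hand <- coef(m_toy)[["(Intercept)"]] + coef(m_toy)[["lnfreq"]] * log(20)

# exp() undoes log(), e.g. to recover a raw frequency from a log value
raw_freq <- exp(new_word$lnfreq)

print(c(predicted = unname(pred), by_hand = by_hand, raw_freq = raw_freq))
```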

Question 1.3

Produce and interpret a diagnostic plot of the model that shows whether or not the model residuals are normally distributed.
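One common way to produce such a plot is sketched below on simulated data (the toy model is an assumption, not df_e). Calling plot() on an lm object with which = 2 draws a normal Q-Q plot of the standardised residuals; qqnorm() and qqline() give a base-graphics equivalent on the raw residuals.

```r
set.seed(1)
toy <- data.frame(lnfreq = log(runif(100, min = 1, max = 500)))
toy$rttar <- 1200 - 40 * toy$lnfreq + rnorm(100, sd = 50)
m_toy <- lm(rttar ~ lnfreq, data = toy)

# Normal Q-Q plot of standardised residuals (the second built-in lm plot).
# Points close to the reference line suggest roughly normal residuals;
# systematic curvature or heavy tails suggest otherwise.
plot(m_toy, which = 2)

# Equivalent base-graphics version on the raw residuals
res <- resid(m_toy)
qqnorm(res)
qqline(res)
```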


Question 2

Do target word length and the number of synonyms a word has have additional effects on RT above and beyond that of word frequency in the English language data set?

Question 2.1

Fit an appropriate model to test this question.
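One common approach, sketched here under assumptions, is to compare a frequency-only model with a model that adds the two extra predictors, using an incremental F-test via anova(). The column names length_syl and n_synonyms are hypothetical placeholders for whatever the codebook calls word length and number of synonyms, and the data are simulated.

```r
set.seed(2)
n <- 200
toy <- data.frame(
  lnfreq     = log(runif(n, min = 1, max = 500)),
  length_syl = sample(1:4, n, replace = TRUE),  # hypothetical name for word length
  n_synonyms = sample(1:3, n, replace = TRUE)   # hypothetical name for synonym count
)
toy$rttar <- 1100 - 35 * toy$lnfreq + 30 * toy$length_syl +
  25 * toy$n_synonyms + rnorm(n, sd = 60)

# Baseline model (frequency only) and extended model
m_freq <- lm(rttar ~ lnfreq, data = toy)
m_full <- lm(rttar ~ lnfreq + length_syl + n_synonyms, data = toy)

# Incremental F-test: do the two extra predictors explain variance over and
# above frequency? Both models must be fitted to exactly the same rows.
comparison <- anova(m_freq, m_full)
print(comparison)
```

With real data, missing values in the added predictors would leave the larger model fitted to fewer rows, and anova() would then refuse the comparison; fitting both models to the same complete-case subset avoids that.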

Question 2.2

Run model diagnostics and, if needed, re-fit the model.
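A sketch of typical diagnostics followed by a refit is shown below, on simulated data in which length_syl and n_synonyms are hypothetical column names. The Cook's distance cut-off of 4/n is one common rule of thumb, used here as an assumption; other cut-offs are defensible if justified in the report.

```r
set.seed(2)
n <- 200
toy <- data.frame(
  lnfreq     = log(runif(n, min = 1, max = 500)),
  length_syl = sample(1:4, n, replace = TRUE),  # hypothetical column name
  n_synonyms = sample(1:3, n, replace = TRUE)   # hypothetical column name
)
toy$rttar <- 1100 - 35 * toy$lnfreq + 30 * toy$length_syl +
  25 * toy$n_synonyms + rnorm(n, sd = 60)
m_full <- lm(rttar ~ lnfreq + length_syl + n_synonyms, data = toy)

# The four standard lm diagnostic plots on one page
op <- par(mfrow = c(2, 2))
plot(m_full)
par(op)

# Flag influential cases with Cook's distance > 4/n (rule of thumb)
cd <- cooks.distance(m_full)
keep <- cd <= 4 / nrow(toy)
print(which(!keep))

# Refit without the flagged cases. Logical indexing stays safe even when no
# case is flagged, whereas toy[-which(...), ] would then drop every row.
m_refit <- update(m_full, data = toy[keep, ])
summary(m_refit)
```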

Question 2.3

Report and interpret the results of the final model.

Question 3

Do variables entered as predictors in the Question 2 model predict whether or not at least one participant will produce an error response on a picture naming task in the English language data set? Pictures for which there are any invalid or incorrect responses should be coded as containing errors.
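The binary outcome can be derived with a logical test, as in this minimal sketch. n_invalid and n_incorrect are hypothetical column names standing in for however the codebook records invalid and incorrect responses; the rows are invented.

```r
# Toy rows; n_invalid and n_incorrect are hypothetical stand-ins for the
# codebook's columns recording invalid and incorrect responses.
toy <- data.frame(
  n_invalid   = c(0, 2, 0, 0, 1),
  n_incorrect = c(0, 0, 3, 0, 1)
)

# 1 = at least one invalid or incorrect response, 0 = none
toy$any_error <- as.integer(toy$n_invalid > 0 | toy$n_incorrect > 0)
print(toy$any_error)  # 0 1 1 0 1
```

If either column can be NA, the | operator returns NA for rows where the other test is FALSE, so missing values are worth checking before coding.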

Question 3.1

Fit, report, and interpret an appropriate model to test this question.
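A logistic regression (glm() with family = binomial) is a standard choice for a binary outcome such as this. The sketch below fits one to simulated data; length_syl and n_synonyms are hypothetical column names and the simulated effects are invented.

```r
set.seed(3)
n <- 300
toy <- data.frame(
  lnfreq     = log(runif(n, min = 1, max = 500)),
  length_syl = sample(1:4, n, replace = TRUE),  # hypothetical column name
  n_synonyms = sample(1:3, n, replace = TRUE)   # hypothetical column name
)
# Simulate a binary outcome from an invented logistic relationship
lin_pred <- 1 - 0.5 * toy$lnfreq + 0.3 * toy$length_syl + 0.4 * toy$n_synonyms
toy$any_error <- rbinom(n, size = 1, prob = plogis(lin_pred))

# Logistic regression: binomial family with the default logit link
m_err <- glm(any_error ~ lnfreq + length_syl + n_synonyms,
             family = binomial, data = toy)
summary(m_err)

# Coefficients are log-odds; exponentiating gives odds ratios
exp(coef(m_err))
```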

Question 3.2

What is the predicted probability of a correct response on a picture whose name has a frequency of 12, is 3 syllables long and has only one form?
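A sketch of the prediction step follows, on simulated data with hypothetical column names. Two assumptions are worth flagging: frequency enters the model as a natural log, so 12 becomes log(12); and "has only one form" is taken here to mean a value of 1 on the hypothetical n_synonyms column, which should be checked against the codebook. Because the outcome codes errors as 1, the probability of an error-free response is the complement of the predicted value.

```r
set.seed(3)
n <- 300
toy <- data.frame(
  lnfreq     = log(runif(n, min = 1, max = 500)),
  length_syl = sample(1:4, n, replace = TRUE),  # hypothetical column name
  n_synonyms = sample(1:3, n, replace = TRUE)   # hypothetical column name
)
lin_pred <- 1 - 0.5 * toy$lnfreq + 0.3 * toy$length_syl + 0.4 * toy$n_synonyms
toy$any_error <- rbinom(n, size = 1, prob = plogis(lin_pred))
m_err <- glm(any_error ~ lnfreq + length_syl + n_synonyms,
             family = binomial, data = toy)

# New case on the predictors' own scales: frequency is log-transformed
new_pic <- data.frame(lnfreq = log(12), length_syl = 3, n_synonyms = 1)

# type = "response" gives P(any_error = 1); the error-free outcome is
# its complement
p_error   <- predict(m_err, newdata = new_pic, type = "response")
p_correct <- 1 - p_error

# Cross-check: inverse logit of the linear predictor
eta <- predict(m_err, newdata = new_pic, type = "link")
print(c(p_error = unname(p_error), p_correct = unname(p_correct),
        via_plogis = unname(plogis(eta))))
```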

Question 4

Does the effect of target word frequency on RT vary significantly between Chinese, Hungarian, and Spanish?

Hint: You will need to construct a single data frame to answer this question.
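A minimal way to stack the three data sets is sketched below with tiny invented stand-ins. It assumes rttar and lnfreq are the column names shared by df_c, df_h, and df_s; tag_language() is a hypothetical helper, not part of the script template.

```r
# Toy stand-ins for df_c, df_h and df_s. Values are invented; rttar and
# lnfreq are assumed to be the shared column names (check the codebook).
df_c_toy <- data.frame(rttar = c(900, 950), lnfreq = c(2.1, 3.4), extra = 1:2)
df_h_toy <- data.frame(rttar = c(1000, 880), lnfreq = c(1.5, 4.0))
df_s_toy <- data.frame(rttar = c(870, 920), lnfreq = c(2.8, 3.1))

keep_cols <- c("rttar", "lnfreq")

# Hypothetical helper: keep only the shared columns and tag every row
# with its language
tag_language <- function(df, language) {
  out <- df[keep_cols]
  out$language <- language
  out
}

# rbind() needs identical column names, hence the selection above
df_all <- rbind(
  tag_language(df_c_toy, "Chinese"),
  tag_language(df_h_toy, "Hungarian"),
  tag_language(df_s_toy, "Spanish")
)
df_all$language <- factor(df_all$language)
str(df_all)
```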


Question 4.1

Fit an appropriate model to test this question. Run model diagnostics and re-fit the model if needed.
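One way to test whether the frequency effect differs by language is to compare a model with a common frequency slope against one with a frequency-by-language interaction. The sketch uses simulated data with invented per-language slopes; the same model diagnostics as for a single-language regression would apply.

```r
set.seed(4)
n_per <- 150
df_all <- data.frame(
  language = factor(rep(c("Chinese", "Hungarian", "Spanish"), each = n_per)),
  lnfreq   = log(runif(3 * n_per, min = 1, max = 500))
)
# Invented per-language slopes, used only to simulate RTs
true_slopes <- c(Chinese = -30, Hungarian = -45, Spanish = -20)
df_all$rttar <- 1100 +
  unname(true_slopes[as.character(df_all$language)]) * df_all$lnfreq +
  rnorm(3 * n_per, sd = 60)

# Same frequency slope for every language vs. a slope per language
m_main <- lm(rttar ~ lnfreq + language, data = df_all)
m_int  <- lm(rttar ~ lnfreq * language, data = df_all)

# F-test on the two lnfreq:language interaction terms
int_test <- anova(m_main, m_int)
print(int_test)
```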

Question 4.2

Report and interpret the results of the final model.

Question 4.3

Which language has the weakest effect of frequency on RT? Describe it in terms of the change in RT associated with a one-unit change in log-frequency.
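With R's default treatment coding, the interaction coefficients are differences from the reference language's slope, so each language's slope is obtained by adding them to the lnfreq coefficient. The sketch below does this on simulated data (invented slopes) and cross-checks the result against a nested parameterisation that estimates one slope per language directly.

```r
set.seed(4)
n_per <- 150
df_all <- data.frame(
  language = factor(rep(c("Chinese", "Hungarian", "Spanish"), each = n_per)),
  lnfreq   = log(runif(3 * n_per, min = 1, max = 500))
)
true_slopes <- c(Chinese = -30, Hungarian = -45, Spanish = -20)  # invented
df_all$rttar <- 1100 +
  unname(true_slopes[as.character(df_all$language)]) * df_all$lnfreq +
  rnorm(3 * n_per, sd = 60)
m_int <- lm(rttar ~ lnfreq * language, data = df_all)

# Chinese (first factor level) is the reference: its slope is the lnfreq
# coefficient; the other slopes add their interaction terms.
b <- coef(m_int)
slope_by_lang <- c(
  Chinese   = b[["lnfreq"]],
  Hungarian = b[["lnfreq"]] + b[["lnfreq:languageHungarian"]],
  Spanish   = b[["lnfreq"]] + b[["lnfreq:languageSpanish"]]
)
print(slope_by_lang)  # change in RT per one-unit increase in log-frequency

# Cross-check: this parameterisation estimates one slope per language
m_nested <- lm(rttar ~ language + language:lnfreq, data = df_all)
nested_slopes <- coef(m_nested)[paste0("language", names(slope_by_lang), ":lnfreq")]
print(nested_slopes)

# Weakest effect = slope closest to zero
names(which.min(abs(slope_by_lang)))
```

Each slope reads as the change in predicted RT for a one-unit increase in log-frequency, in whatever units rttar is recorded.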

Remember:

Explore and describe the data.

Build appropriate models, evaluate them and their associated assumptions, and interpret the results.

Let your models be informed by the research questions they are supposed to address. There is seldom a need for mind-bogglingly complex and borderline uninterpretable 6-way interaction models.

GOOD LUCK!

