MATH223 (Statistics)

Assignment

Due: 11:55 pm on 5 June 2020

This assignment must be lodged as a single PDF document (with each page numbered) in “Assignment 2 - submit here” on the Moodle site on or before the due date. Penalties for late submission are given in the subject outline.

The assignment has 3 parts, and it will be assessed for completeness, accuracy, and clarity of expression. Use your own words to express your answers. You must show your working to earn marks.

Important: Computer output must not form part of your submitted assignment unless it has been properly annotated. It is your job to say what each piece of output means, and not the marker’s job to guess. Be selective about what you include. The output should be pasted into the relevant part of your submission, and not be left as an Appendix unless specified otherwise.

This assignment must not exceed seven A4 pages, excluding the appendix. An “A4 page” is one side of a piece of A4 paper.

Part I: Calculation-based questions (2+2+2+2)

1. Let X be a random variable with the probability function f(x) given in the following table:

x      1    2    3    4    5
f(x)   0.1  0.5  0.2  0.1  0.1

Find E(X) and Var(X). (2 marks)
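As a hedged illustration of the definitions behind Question 1, the mean and variance of a discrete random variable follow from E(X) = Σ x f(x) and Var(X) = E(X²) − [E(X)]². The sketch below applies these to the table values. The variable names are illustrative assumptions, and the code is meant as a check on hand working, not a replacement for it.

```r
# Support and probability function from the table in Question 1
x  <- c(1, 2, 3, 4, 5)
fx <- c(0.1, 0.5, 0.2, 0.1, 0.1)

# A valid probability function must sum to 1
stopifnot(isTRUE(all.equal(sum(fx), 1)))

# E(X) = sum of x * f(x)
ex <- sum(x * fx)

# Var(X) = E(X^2) - [E(X)]^2
ex2   <- sum(x^2 * fx)
var_x <- ex2 - ex^2

ex
var_x
```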

2. A random variable X has probability density function f(x) defined as follows:

\[
f(x) =
\begin{cases}
0.5x, & 0 < x < 1, \\
1 - 0.5x, & 1 \le x < 2, \\
0.5x - 1, & 2 \le x < 4.
\end{cases}
\]

(a) Find P(0 ≤ X ≤ 0.5).

(b) Find P(0.5 ≤ X ≤ 1.2).

(2 marks)

3. Consider an investment whose return is normally distributed with a mean of 5% and a standard deviation of 10%.

(a) Determine the probability of losing money.

(b) Find the probability of losing money when the standard deviation is equal to 20%.

(2 marks)
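One way to check the hand calculation for Question 3 is with R's normal distribution function `pnorm`. The sketch assumes that "losing money" means a return below 0% and measures returns in percentage points. The variable names are illustrative.

```r
# Returns are assumed Normal with mean 5, measured in percentage points
mu <- 5

# (a) P(return < 0) with a standard deviation of 10
p_loss_sd10 <- pnorm(0, mean = mu, sd = 10)

# (b) The same probability with a standard deviation of 20
p_loss_sd20 <- pnorm(0, mean = mu, sd = 20)

# Equivalent standardised form: P(Z < (0 - mu) / sd)
z10 <- (0 - mu) / 10
stopifnot(isTRUE(all.equal(p_loss_sd10, pnorm(z10))))

c(sd10 = p_loss_sd10, sd20 = p_loss_sd20)
```

The standardised z value is the one to look up in a normal table when working by hand.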

4. The weight of a typical roll of toilet paper is normally distributed with a mean of 230g and a standard deviation of 15g. Use this information to answer the following:

(a) In some stores toilet rolls are available individually. What is the probability that a consumer buys a roll of toilet paper that weighs at least 250g?

(b) A standard pack of toilet paper contains 10 rolls. What is the probability that the average weight of the rolls in the pack is less than 227g?

(2 marks)
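The two parts of Question 4 differ in which distribution applies: a single roll in (a), and the mean of 10 rolls in (b). The sketch below assumes the rolls in a pack are independent, so the sample mean is normal with standard deviation σ/√n. The variable names are illustrative.

```r
mu    <- 230  # mean weight of one roll, in grams
sigma <- 15   # standard deviation of one roll, in grams
n     <- 10   # rolls in a standard pack

# (a) P(X >= 250) for a single roll
p_single <- pnorm(250, mean = mu, sd = sigma, lower.tail = FALSE)

# (b) The mean of n independent rolls is Normal(mu, sigma / sqrt(n))
se_mean <- sigma / sqrt(n)
p_pack  <- pnorm(227, mean = mu, sd = se_mean)

c(single = p_single, pack = p_pack)
```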


Part II: R analysis questions (5+9+8)

We will perform analysis on data from a study of online storage usage run by a new cloud storage company. Researchers collected information from 25 users on the following aspects of interest:

gender: gender of the account holder. Male is coded 0 and female 1.
OS: operating system used by the account holder
size: cloud storage used, in GB
month: number of months since account activation

For the questions in this section, each student must obtain their own dataset. Instructions for obtaining your dataset are as follows:

1. Run the following code in R to open an HTML page:

install.packages("shiny")
library(shiny)
# open the data download page
shiny::runGist("16a860341eafa12d2be4f2f1ac4b0ef5")

2. Select “MATH223” and download the CSV file.
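Once the file is downloaded, it can be loaded into R along the following lines. This is a minimal sketch under stated assumptions: the column names are assumed to match the variable names listed above, the three rows are made-up placeholder values, and `csv_path` is a hypothetical path to be replaced with the location of the real file.

```r
# Hypothetical stand-in for the downloaded file: a tiny CSV written to a
# temporary location so this sketch runs on its own. Replace `csv_path`
# with the path of the file downloaded from the Shiny page.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "gender,OS,size,month",
  "0,Microsoft,110.5,12",
  "1,Apple,98.2,7",
  "1,Linux,143.0,20"
), csv_path)

# Read the data and inspect its structure
storage <- read.csv(csv_path, stringsAsFactors = FALSE)
str(storage)
head(storage)

# gender is coded 0/1, so a labelled factor can be easier to read
storage$gender <- factor(storage$gender, levels = c(0, 1),
                         labels = c("Male", "Female"))
table(storage$gender)
```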

Instructions for questions in Part II:

• You must include your dataset as an Appendix.

• Use R to answer the questions in Part II unless specified otherwise.

• Any R output must be accompanied by the relevant R code.

If you fail to follow the instructions given above, you will receive no marks for the questions in Part II.

5. Summary statistics. Make sure to show both R code and the output when necessary.

(a) What is the variable type for size? What is the variable type for month? (1 mark)

(b) Produce a frequency table for OS. What is the relative frequency for “Microsoft”? Express your answer as a percentage to 1 decimal place. (1 mark)

(c) Show the boxplot for month. From the boxplot (no extra coding is needed), what are the approximate values of the five-number summary? Are any outliers shown in the boxplot? Explain your reasoning or indicate them on the graph. (1.5 marks)

(d) Show the histogram for size. Describe the distribution of size. (1.5 marks)
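The base R calls relevant to Question 5 can be sketched as follows. The data frame here is hypothetical (eight made-up rows, not any student's dataset) and exists only so the calls run. The `fivenum` and `boxplot.stats` lines are included purely for comparison with values read off the plot.

```r
# Hypothetical data: NOT the assignment dataset, just values to run the calls on
storage <- data.frame(
  OS    = c("Microsoft", "Apple", "Microsoft", "Linux", "Apple",
            "Microsoft", "Microsoft", "Apple"),
  size  = c(110.5, 98.2, 143.0, 120.7, 87.4, 131.9, 104.3, 125.6),
  month = c(12, 7, 20, 15, 5, 18, 9, 40)
)

# Frequency table and relative frequencies (percent, 1 decimal place)
os_counts <- table(storage$OS)
os_pct    <- round(100 * prop.table(os_counts), 1)
os_counts
os_pct

# Boxplot of month; points beyond 1.5 * IQR from the box are drawn separately
boxplot(storage$month, main = "Months since activation", ylab = "month")

# Five-number summary and flagged points, to compare with the plot
five_num <- fivenum(storage$month)
outliers <- boxplot.stats(storage$month)$out
five_num
outliers

# Histogram of size
hist(storage$size, main = "Cloud storage used", xlab = "size (GB)")
```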

6. The company estimates that the average cloud storage usage across the server is 120 GB. Answer the following questions. Make sure to show both R code and the output when necessary.

(a) Use R to find the mean and standard deviation of size. (1 mark)

(b) To test this hypothesis, what statistical test should we conduct? (1 mark)

(c) Define the null and alternative hypotheses. (1 mark)

(d) Calculate by hand the test statistic. (1 mark)

(e) Calculate by hand the degrees of freedom for this test. (1 mark)

(f) Given a specified significance level of 0.05, find the rejection region using the t statistic table. (1 mark)

(g) Based on the rejection region, draw a conclusion about the company’s estimate. (2 marks)

(h) Use R to perform a test for this question. Paste your R output. (1 mark)
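The hand calculation and the R test in Question 6 should agree, and the sketch below shows one way to check that. It assumes a one-sample t test with a two-sided alternative, which is one reasonable reading of the question, and it uses a hypothetical sample of eight values, not any student's dataset.

```r
# Hypothetical sample: NOT the assignment dataset
size <- c(110.5, 98.2, 143.0, 120.7, 87.4, 131.9, 104.3, 125.6)
mu0  <- 120   # hypothesised mean from the question, in GB

n    <- length(size)
xbar <- mean(size)
s    <- sd(size)

# Test statistic and degrees of freedom, as they would be computed by hand
t_stat <- (xbar - mu0) / (s / sqrt(n))
df     <- n - 1

# Two-sided rejection region at alpha = 0.05: reject when |t| > t_crit
alpha  <- 0.05
t_crit <- qt(1 - alpha / 2, df)
reject <- abs(t_stat) > t_crit

# The built-in test should reproduce the same statistic and df
res <- t.test(size, mu = mu0)
res
```

Here `qt` plays the role of the t statistic table: `t_crit` is the tabulated critical value for the chosen significance level and degrees of freedom.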

7. The analytic team is interested in the relationship between size and month. The team hypothesised that the longer a user is with the company, the larger their storage usage will be. The analytic team requests that you answer the following questions. Make sure to show both R code and the output when necessary.

(a) Show a scatterplot of month and size. Explain your reason(s) for choosing the variable on each axis. What relationship between month and size does the scatterplot show? (1.5 marks)

(b) Use R to find the correlation coefficient between month and size. (0.5 marks)

(c) Use R to fit a least squares line to the data. What is the regression equation? (1 mark)

(d) For month = 6, calculate by hand the predicted value for storage size. (1 mark)

(e) Using the R output from (c), what is the standard error for size? (1 mark)

(f) Calculate by hand the 95% confidence interval for size. (1 mark)

(g) To test whether month has any effect on size, define the null and alternative hypotheses. (1 mark)

(h) Using the p-value from the R output in (c), draw a conclusion for the test in (g). (1 mark)
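The R calls behind Question 7 can be sketched as follows, again on a hypothetical data frame and not on any student's dataset. Treating month as the explanatory variable follows the team's stated hypothesis. The interval computed at the end is for the slope, which is only one possible reading of part (f) and is an assumption of this sketch.

```r
# Hypothetical data: NOT the assignment dataset
storage <- data.frame(
  month = c(12, 7, 20, 15, 5, 18, 9, 24),
  size  = c(110.5, 98.2, 143.0, 120.7, 87.4, 131.9, 104.3, 150.2)
)

# Scatterplot with the explanatory variable (month) on the x-axis
plot(size ~ month, data = storage, xlab = "month", ylab = "size (GB)")

# Correlation coefficient
r <- cor(storage$month, storage$size)

# Least squares line: size = b0 + b1 * month
fit <- lm(size ~ month, data = storage)
b   <- coef(fit)
abline(fit)

# Predicted size at month = 6, by the formula and via predict()
pred_hand <- b[["(Intercept)"]] + b[["month"]] * 6
pred_r    <- predict(fit, newdata = data.frame(month = 6))

# Coefficient table: estimates, standard errors, t values, p-values
coefs <- summary(fit)$coefficients
coefs

# 95% confidence interval for the slope: estimate +/- t * SE
se_b1 <- coefs["month", "Std. Error"]
ci_b1 <- b[["month"]] + c(-1, 1) * qt(0.975, df.residual(fit)) * se_b1
ci_b1
```

The p-value in the `month` row of the coefficient table is the one that relates to a test of whether the slope is zero.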


Part III: Statistical report critique (3+5)

8. A researcher wrote a report on a study of renal transplant patients. The report is named “stats report” and is available on Moodle under the Assignment 2 section. Based on the guidelines for writing a statistical report in the Section 1 lecture notes, you will

(a) comment on the existing “stats report” (no more than one page); (3 marks) and

(b) using the statistical information given in the existing report, write a standard statistical report appropriate for a general audience (no more than one page in length). (5 marks)
