Data Analysis and R Programming Tutoring, Design, and Debugging for International Students - 留学生程序网

Date: 2020-04-02 10:22

1. (Wine Data Set) These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents (Alcohol, Malic acid, Ash, Alcalinity of ash, Magnesium, Total phenols, Flavanoids, Nonflavanoid phenols, Proanthocyanins, Color intensity, Hue, OD280/OD315 of diluted wines, and Proline) found in each of the three types of wines. The sample size is 178. The dataset is available on the course site. The main interest of this dataset is the multiclass classification of the three types of wines. Let $\hat{y}$ denote the predicted class of an observation.

(a) Use the nominal logistic regression of Section 2.3 to carry out the multiclass classification. The R function is multinom. In addition, summarize the confusion table for $y$ and $\hat{y}$, use macro-averaged metrics to evaluate recall, precision, and F-measure, and then assess the classification performance.

(b) Use linear discriminant analysis and quadratic discriminant analysis to obtain $\hat{y}$. In addition, summarize the confusion table for $y$ and $\hat{y}$, use macro-averaged metrics to evaluate recall, precision, and F-measure, and then assess the classification performance.

(c) [...] In addition, summarize the confusion table for $y$ and $\hat{y}$, use macro-averaged metrics to evaluate recall, precision, and F-measure, and then assess the classification performance.

(d) Summarize your findings in (a)-(c).
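The confusion table and macro-averaged metrics requested in parts (a)-(c) can be sketched as follows. This is a minimal illustration in Python rather than R. The function names `confusion_table` and `macro_metrics` are hypothetical. It assumes the macro F-measure is the unweighted mean of the per-class F-measures; course notes may instead define it from the macro-averaged precision and recall.

```python
from collections import Counter


def confusion_table(y_true, y_pred, labels):
    """table[i][j] = number of observations with true class labels[i]
    and predicted class labels[j]."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def macro_metrics(table):
    """Macro-averaged recall, precision and F-measure from a square
    confusion table (rows = true class, columns = predicted class).

    Assumption: macro F is the plain average of per-class F-measures.
    """
    k = len(table)
    recalls, precisions, f_scores = [], [], []
    for c in range(k):
        tp = table[c][c]
        n_true = sum(table[c])                       # truly in class c
        n_pred = sum(table[r][c] for r in range(k))  # predicted as class c
        recall = tp / n_true if n_true else 0.0
        precision = tp / n_pred if n_pred else 0.0
        denom = precision + recall
        f_score = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        precisions.append(precision)
        f_scores.append(f_score)
    return {
        "recall": sum(recalls) / k,
        "precision": sum(precisions) / k,
        "f_measure": sum(f_scores) / k,
    }


# Toy labels for illustration only (not the wine data).
y = [1, 1, 1, 2, 2, 3, 3, 3]
y_hat = [1, 1, 2, 2, 2, 3, 3, 1]
table = confusion_table(y, y_hat, labels=[1, 2, 3])
metrics = macro_metrics(table)
```

The same two functions apply unchanged to the predictions of any classifier, since they only need the vectors of true and predicted classes.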

2. (Simulation studies) Consider the following linear model:

$$y = X_1\beta_1 + X_2\beta_2 + X_3\beta_3 + X_4\beta_4 - 4\sqrt{\rho}\,X_5\beta_5 + \varepsilon, \tag{1}$$

where $X = (X_1, \cdots, X_p)$ is a $p$-dimensional vector of covariates and each $X_k$ is generated from $N(0, 1)$. The correlations of all $X_k$ except $X_5$ are $\rho$, while $X_5$ has the correlation $\sqrt{\rho}$ with all other $p - 1$ variables. Suppose that the sample size is $n = 200$.
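One way to draw covariates with this correlation structure is a shared-factor construction, sketched below in Python. It is an illustration under stated assumptions, not the assignment's prescribed method: the covariates are taken to be jointly normal, the noise term is assumed to be standard normal (the source does not state its distribution), and the helper names are hypothetical. With independent standard normal draws, setting the fifth covariate equal to a common factor z0 and every other covariate to sqrt(rho)*z0 + sqrt(1-rho)*zk gives unit variances, correlation rho among the other covariates, and correlation sqrt(rho) with the fifth.

```python
import math
import random


def generate_covariates(n, p, rho, rng):
    """Draw n rows (X1, ..., Xp) via a shared-factor construction (assumed):
    X5 = z0 and Xk = sqrt(rho)*z0 + sqrt(1-rho)*zk for k != 5."""
    if p < 5:
        raise ValueError("the model needs at least 5 covariates")
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    rows = []
    for _ in range(n):
        z0 = rng.gauss(0.0, 1.0)
        row = [a * z0 + b * rng.gauss(0.0, 1.0) for _ in range(p)]
        row[4] = z0  # X5 sits at 0-based index 4
        rows.append(row)
    return rows


def generate_response(rows, beta, rho, rng):
    """y = X1*b1 + ... + X4*b4 - 4*sqrt(rho)*X5*b5 + eps,
    with eps ~ N(0, 1) (an assumption of this sketch)."""
    c = 4.0 * math.sqrt(rho)
    return [
        sum(x[k] * beta[k] for k in range(4)) - c * x[4] * beta[4]
        + rng.gauss(0.0, 1.0)
        for x in rows
    ]


def sample_corr(u, v):
    """Pearson sample correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


rng = random.Random(2020)
X = generate_covariates(n=200, p=10, rho=0.7, rng=rng)
y = generate_response(X, beta=[1.0] * 5, rho=0.7, rng=rng)
```

Calling `sample_corr` on the fifth column and `y` for a large sample gives a quick empirical check of the marginal relationship between the fifth covariate and the response.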

(a) Show that $X_5$ is marginally independent of $y$.

(b) Now, consider $p = 1500$ and generate the artificial data based on model (1) for 1000 repetitions. Specifically, let $\beta_i = 1$ for every $i = 1, \cdots, 5$ and set $\rho = 0.7$. After that, use the SIS and iterated SIS methods to do variable selection and estimate the parameters associated with the selected covariates. Finally, summarize the estimators in the following table:

Table 1: Simulation result for (b)

Method          $\|\Delta\beta\|_1$    $\|\Delta\beta\|_2$    #S    #FN
SIS
Iterated SIS

(c) [...] with entry $(j, k)$ in $\Sigma_X$ being $0.5^{|j-k|}$ for $j, k = 1, \cdots, p$. We generate the artificial data based on (1) for 1000 repetitions with $\beta_i = 1$ for every $i = 1, \cdots, 5$. After that, use the lasso, adaptive lasso, and Elastic net (set $\alpha = 0.5$) methods to estimate the parameters. Finally, summarize the numerical results in the following table.

Table 2: Simulation result for (c)

Method                         $\|\Delta\beta\|_1$    $\|\Delta\beta\|_2$    #S    #FN
lasso
adaptive lasso
Elastic net ($\alpha = 0.5$)

(d) Summarize your findings for parts (b) and (c), respectively.
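For orientation, the screening step of plain (non-iterated) SIS can be sketched as below: rank the covariates by absolute marginal correlation with the response and keep the top d. This is a Python illustration, not the R workflow the assignment expects. The function name `sis_screen` is hypothetical, and the default d = floor(n / log n) is one commonly used choice rather than something the assignment specifies. Iterated SIS and the refit on the selected covariates are not shown.

```python
import math
import random


def sis_screen(rows, y, d=None):
    """Return the sorted 0-based indices of the d covariates with the
    largest absolute marginal correlation with y.

    Assumption: if d is not given, use floor(n / log(n)).
    """
    n, p = len(rows), len(rows[0])
    if d is None:
        d = max(1, int(n / math.log(n)))
    d = min(d, p)

    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    y_ss = sum(v * v for v in yc)

    scores = []
    for j in range(p):
        col = [row[j] for row in rows]
        m = sum(col) / n
        xc = [v - m for v in col]
        x_ss = sum(v * v for v in xc)
        cov = sum(a * b for a, b in zip(xc, yc))
        denom = math.sqrt(x_ss * y_ss)
        scores.append(abs(cov) / denom if denom > 0 else 0.0)

    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:d])


# Toy data for illustration only: two active covariates among 30.
rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0) for _ in range(30)] for _ in range(100)]
y = [3.0 * x[0] - 2.0 * x[7] + 0.5 * rng.gauss(0.0, 1.0) for x in X]
selected = sis_screen(X, y, d=2)
```

Because the ranking uses marginal correlations only, a covariate that matters jointly but is marginally uncorrelated with the response is ranked no higher than noise covariates, which is the situation the iterated variant is designed to address.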

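Note: Let $\hat{\beta}$ be the estimator; then $\Delta\beta$ is defined as $\Delta\beta = \hat{\beta} - \beta$, with the $i$th component being $\hat{\beta}_i - \beta_i$. Therefore, $\|\Delta\beta\|_1$ and $\|\Delta\beta\|_2$ are defined as

$$\|\Delta\beta\|_1 = \sum_{i=1}^{p} |\hat{\beta}_i - \beta_i|, \qquad \|\Delta\beta\|_2 = \Big\{ \sum_{i=1}^{p} (\hat{\beta}_i - \beta_i)^2 \Big\}^{1/2}.$$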
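The table entries for a single repetition could be computed along these lines. This is a hedged Python sketch: it uses the usual l1 and l2 norms of the difference between the estimate and the true coefficient vector. The source does not define #S and #FN, so the code assumes #S is the number of nonzero estimated coefficients (the selected model size) and #FN is the number of truly nonzero coefficients estimated as exactly zero. The function name `estimation_summary` is hypothetical.

```python
import math


def estimation_summary(beta_hat, beta_true):
    """Error norms and selection counts for one fitted coefficient vector.

    Assumptions (not stated in the source):
      S  = number of nonzero entries of beta_hat
      FN = number of entries with beta_true != 0 but beta_hat == 0
    """
    diff = [bh - bt for bh, bt in zip(beta_hat, beta_true)]
    l1 = sum(abs(d) for d in diff)
    l2 = math.sqrt(sum(d * d for d in diff))
    n_selected = sum(1 for bh in beta_hat if bh != 0.0)
    n_false_neg = sum(
        1 for bh, bt in zip(beta_hat, beta_true) if bt != 0.0 and bh == 0.0
    )
    return {"l1": l1, "l2": l2, "S": n_selected, "FN": n_false_neg}


# Toy vectors for illustration only.
beta_true = [1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
beta_hat = [0.9, 1.2, 0.0, 1.0, 0.5, 0.3, 0.0, 0.0]
summary = estimation_summary(beta_hat, beta_true)
```

Covariates that were screened out or shrunk to zero enter `beta_hat` as exact zeros, so the full-length vectors can be compared directly.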

Hint: Regarding simulation studies with 1000 repetitions.

In Question 2, you are asked to use simulation studies with 1000 repetitions to estimate the parameters. Specifically, based on the $k$th artificial dataset, each generated independently, you are able to obtain the estimator, denoted by $\hat{\beta}^{(k)}$. As a result, with 1000 repetitions, [...]
