Date: 2019-08-03 11:46

Stats 101C Kaggle Competition Final Project

There are two competitions:

- A classification competition
- A regression competition

Each competition accounts for 16% of your final course grade. Each competition is scored separately.

Grading of each competition:

- 4% Competition performance and R Script verification

- 12% Report

Competition performance grading:

First place and ties for first place: 4 points

Last place and ties for last place: 0 points

Everyone between first and last place earns points scaled linearly by leaderboard position.

For example, let’s say that at the end of the competition there are 12 unique positions on the leaderboard. There might be more than 12 teams, but with ties, let’s say there are only 12 unique positions. Having 12 unique positions means there will be 11 gaps. Each gap will be 4/11 = 0.3636 points. And the scoring will be as follows:

- 1st place (and any ties): 4 points
- 2nd place (and any ties): 3.636 points
- 3rd place (and any ties): 3.273 points
- Etc.
- 10th place (and any ties): 0.727 points
- 11th place (and any ties): 0.364 points
- 12th place (and any ties): 0 points
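The scaling in the 12-position example can be sketched as a small function. This is only an illustration in Python (the course itself uses R), and the name `competition_points` and its parameters are hypothetical, not part of the course materials. It assumes the linear rule described above: equal gaps between consecutive unique positions, 4 points at the top and 0 at the bottom.

```python
def competition_points(position, unique_positions, max_points=4.0):
    """Points for a leaderboard position under linear scaling.

    position: 1-based rank among the unique leaderboard positions.
    unique_positions: number of distinct positions after merging ties.
    """
    if unique_positions < 2:
        # The handout does not say what happens when everyone is tied.
        raise ValueError("need at least two unique positions")
    if not 1 <= position <= unique_positions:
        raise ValueError("position must be between 1 and unique_positions")
    # Each step down the leaderboard costs max_points / (unique_positions - 1).
    return max_points * (unique_positions - position) / (unique_positions - 1)


# Reproduce the 12-position example from the text.
for place in (1, 2, 3, 10, 11, 12):
    print(place, round(competition_points(place, 12), 3))
```

With 12 unique positions this gives the same values as the list above, for example 3.636 points for 2nd place and 0.364 points for 11th place.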

Each competition is scored separately. It is possible for a team/individual to get first place in one competition and earn 4 points in that competition while ending up in last place for the other competition and getting 0 points for that one.

R Script verification

You will submit an R script that shows how your predictions were made. I have provided a starting template that imports the data and produces the necessary output file to submit to Kaggle.

Your R script will be run to verify that it does indeed produce the predictions you submitted to Kaggle. If the predictions you submitted to Kaggle do not match the output produced by your R script, you will get a 0 for the competition performance portion of your project grade. This rule is to prevent students from making a model in R and then manually changing the predictions in the submission file to get a higher score in the competition.

Similarly, your R script should not make predictions manually. It must use the trained model for making predictions.
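Before submitting, it may be worth checking for yourself that a fresh run of your script reproduces the file you uploaded. The sketch below shows one way such a comparison could work. It is written in Python purely for illustration and is not the instructor's actual verification procedure. The column names `Id` and `Predicted`, the function names, and the numeric tolerance are all assumptions.

```python
import csv
import io
import math


def read_predictions(csv_text):
    """Parse CSV text with a header row into {id: prediction_string}."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row (assumed to be present)
    return {row[0]: row[1] for row in reader}


def predictions_match(submitted, reproduced, tolerance=1e-8):
    """Return True when both files hold the same prediction for every id."""
    if submitted.keys() != reproduced.keys():
        return False
    for key, value in submitted.items():
        other = reproduced[key]
        try:
            # Numeric predictions (regression): allow tiny floating-point noise.
            if not math.isclose(float(value), float(other), abs_tol=tolerance):
                return False
        except ValueError:
            # Non-numeric predictions (class labels): require an exact match.
            if value != other:
                return False
    return True


# Hypothetical file contents, inlined so the example is self-contained.
submitted = read_predictions("Id,Predicted\n1,0.25\n2,1.50\n")
reproduced = read_predictions("Id,Predicted\n1,0.25\n2,1.5\n")
edited = read_predictions("Id,Predicted\n1,0.25\n2,9.99\n")

print(predictions_match(submitted, reproduced))  # same values, different formatting
print(predictions_match(submitted, edited))      # one prediction changed by hand
```

If your model involves any randomness, fixing the random seed in your script (for example with `set.seed` in R) is one way to make repeated runs produce the same output.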

Report guidelines:

You will submit a PDF report explaining the model you fit. The report is worth 12 points.

The report will describe anything that is done to the data before the model is fit. This includes any data cleaning, data manipulation, or data transformation that was performed. It includes any variable selection or dimension reduction process or any new variables that were created. You don’t need to do any of the above things in your script to get full credit, but if you do any of the above steps, they must be explained in the report.

The report will describe what kind of model was chosen for the final prediction and submission.

The report will explain why you think your model is a good choice and/or any shortcomings of the model and areas of improvement. This section should include how you evaluated your model performance. (Your evaluation of model performance should not be, “I submitted the predictions to Kaggle and got a score.”)
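The handout does not prescribe an evaluation method. One common option that relies only on the training data is k-fold cross-validation, sketched below for a regression setting. The code is Python for illustration only (the course uses R), every function name is hypothetical, and the least-squares line is just a placeholder standing in for whatever model you actually fit. A classification model would need a different metric, such as the misclassification rate.

```python
import math
import random


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices and split them into k nearly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_rmse(x, y, fit, predict, k=5, seed=0):
    """Pooled RMSE over all held-out predictions from k-fold cross-validation."""
    squared_errors = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        # Fit only on the rows outside the held-out fold.
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in held_out:
            squared_errors.append((predict(model, x[i]) - y[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def fit_line(xs, ys):
    """Placeholder model: least-squares line through a single predictor."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((v - mean_x) ** 2 for v in xs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_line(model, x_value):
    slope, intercept = model
    return slope * x_value + intercept


# Made-up data: a noisy linear trend, generated with a fixed seed.
rng = random.Random(1)
x = [float(v) for v in range(40)]
y = [2.0 * v + 1.0 + rng.gauss(0, 1) for v in x]

print(cross_validated_rmse(x, y, fit_line, predict_line, k=5))
```

Because each row is predicted by a model that never saw it during fitting, the resulting error estimates how the model behaves on new data, and it does so without using the Kaggle leaderboard score.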

The report should be about 2 pages long.

Grading Rubric for the report.

Each criterion is graded as Good, Basic, or Needs Improvement.

Overall writing:

- Good [minus ~0 pts]: Explanations are correct, complete, and convincing. Assumptions are made explicit and given justification.
- Basic [minus ~1 pt]: Explanations are partially correct but incomplete or unconvincing. Assumptions are made explicit but not justified.
- Needs Improvement [minus ~3 pts]: Explanations are illogical, incorrect, or incoherent. Assumptions are not made explicit.

Description of things done to the data before fitting the model:

- Good [minus ~0 pts]: Explanation of any data manipulation is complete and without mistakes. Any and all steps that are performed in the script are explained. Reasons for each step are provided and are justifiable.
- Basic [minus ~1 pt]: Any and all steps that are performed in the script are explained. Reasons for each step are unconvincing or questionable.
- Needs Improvement [minus ~3 pts]: Explanation of any data manipulation is not complete. There are steps performed in the script that are not explained. Reasons for each step are not provided or are not justified.

Description of final model:

- Good [minus ~0 pts]: Explanation of the model is complete and without mistakes. Report describes how many / what variables are used. Report describes properties of the model (e.g. parametric vs nonparametric). Report provides reasons for using this particular model.
- Basic [minus ~1 pt]: Explanation of the model is complete but has minor mistakes. Report describes how many / what variables are used. Report describes properties of the model. Report provides reasons for using this particular model.
- Needs Improvement [minus ~3 pts]: Explanation of the model contains serious mistakes. The model used is not adequately described. Report does not provide reasons for using this particular model.

Discussion of model strengths and weaknesses and model performance:

- Good [minus ~0 pts]: Evaluation of model performance is complete and reasonable. Report discusses model strengths and weaknesses / possible improvement. Discussion is correct and justifiable.
- Basic [minus ~1 pt]: Evaluation of model performance is provided but contains minor mistakes. Report discusses model strengths and weaknesses / possible improvement.
- Needs Improvement [minus ~3 pts]: Evaluation of model performance is missing or contains serious mistakes. Discussion of model strengths and weaknesses is missing or contains serious mistakes.

