Projects #1 and #2
Analyzing Big Data I
Project 1. Classification is the process of predicting the class of given data points. Classes
are sometimes called targets, labels, or categories. Classification predictive modeling is
the task of approximating a mapping function (f) from input variables (X) to discrete output
variables (y).
For example, spam detection by an email service provider is a classification
problem. It is a binary classification because there are only two classes: spam and not
spam. A classifier uses training data to learn how the input variables relate
to the class. In this case, known spam and non-spam emails serve as the training
data. Once the classifier has been trained accurately, it can be used to classify a
previously unseen email. Classification is a form of supervised learning, in which the
targets are provided along with the input data. Classification has applications in many
domains, such as credit approval, medical diagnosis, and target marketing.
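To make the spam example concrete, here is a minimal sketch of a binary classifier. It is written in Python for illustration only and is not the R code provided for the course. It assumes a naive Bayes model with add-one smoothing, which is one possible classifier among many, and the six training messages are invented toy data. The names TRAIN, train, and predict are hypothetical.

```python
import math
from collections import Counter

# Toy training data, invented for illustration only: (text, label) pairs.
TRAIN = [
    ("win cash prize now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap loans win big", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("lunch on monday", "not spam"),
    ("project report attached", "not spam"),
]


def train(examples):
    """Count words per class and examples per class."""
    word_counts = {}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior score for the text."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Start from the log prior of the class.
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            # Add-one (Laplace) smoothing so unseen words do not zero out a class.
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict(model, "claim free cash"))
print(predict(model, "monday project meeting"))
```

Here X is the text of a message and y is one of two discrete labels. A real project would use a much larger labeled dataset and would measure accuracy on emails held out from training.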
Working in a group of no more than two, perform a complete data science evaluation of a
dataset suitable for classification. You will need to:
• Find an appropriate dataset.
• Use the code provided to you in R to perform the analysis.
• Write up a detailed synopsis of your analysis. Please keep it brief, say no more than
five pages.
Project 2. Prediction is the process of estimating the values of a given process based on
input data. Here the task is to approximate a mapping function (f) from input
variables (X) to continuous output variables (y), rather than to discrete classes. For
example, regression can be used to model unknown y values given X. Prediction has
applications in many domains, such as stock analysis, loan sizing, and econometrics.
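As an illustration of regression, here is a minimal sketch that fits a straight line y = intercept + slope * x by ordinary least squares. It is written in Python for illustration only and is not the R code provided for the course. The five data points are invented toy values, and fit_line is a hypothetical helper name.

```python
# Toy data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.9, 5.1, 7.0, 9.2, 10.8]


def fit_line(x_values, y_values):
    """Return (slope, intercept) of the ordinary least squares line."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    # Slope = covariance of x and y divided by the variance of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    s_xx = sum((x - mean_x) ** 2 for x in x_values)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(xs, ys)

# Predict y for an x value that was not in the training data.
new_x = 6.0
prediction = intercept + slope * new_x
print(f"slope={slope:.2f} intercept={intercept:.2f} prediction={prediction:.2f}")
```

Unlike the classification case, the predicted y is a number on a continuous scale. A real project would also report an error measure, such as root mean squared error, on data held out from fitting.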
Working in a group of no more than two, perform a complete data science evaluation of a
dataset suitable for prediction. You will need to:
• Find an appropriate dataset.
• Use the code provided to you in R to perform the analysis.
• Write up a detailed synopsis of your analysis. Please keep it brief, say no more than
five pages.