The customer churn dataset captures customer interactions with an online retail store. This document describes the data fields.
For this dataset, build two classifiers to predict Churn: one using AdaBoost and one using a random forest. Use a suitable evaluation metric to compare the performance of the classifiers.
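As a rough illustration of the comparison asked for here, the sketch below fits both classifiers on the same train/test split and reports F1 and ROC AUC. It assumes Python with scikit-learn. The data is a synthetic stand-in generated by make_classification, because the real field names are given in the data description and not here; the function name compare_churn_classifiers, the 75/25 split, and the choice of metrics are illustrative assumptions, not requirements of the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split


def compare_churn_classifiers(X, y, seed=0):
    """Fit AdaBoost and a random forest on one split; return test-set scores."""
    # Stratify so both splits keep the same proportion of churners.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    models = {
        "adaboost": AdaBoostClassifier(n_estimators=100, random_state=seed),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        predicted = model.predict(X_test)
        # Probability of the positive class (label 1), needed for ROC AUC.
        positive_probability = model.predict_proba(X_test)[:, 1]
        scores[name] = {
            "f1": float(f1_score(y_test, predicted)),
            "roc_auc": float(roc_auc_score(y_test, positive_probability)),
        }
    return scores


# Synthetic, imbalanced stand-in for the churn data (assumption: about 20% churners).
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, weights=[0.8, 0.2], random_state=0
)
churn_scores = compare_churn_classifiers(X_demo, y_demo)
for name, result in churn_scores.items():
    print(name, result)
```

F1 and ROC AUC are used here because churn labels are often imbalanced, in which case plain accuracy can look good for a model that rarely predicts churn. Whether that holds for this dataset should be checked against the actual class counts.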
The supermarket sales dataset has sales data for a supermarket with three branches. This document describes the data fields.
You have two tasks for this dataset.
Build two classifiers to predict Gender, one using a decision tree and one using a random forest. Use a suitable evaluation metric to compare the performance of the classifiers.
Build two models to predict Rating, one using linear regression and one using a decision tree regressor. Use a suitable evaluation metric to compare the performance of the models.
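The two tasks above share the same shape: fit two models on one split and score both with one metric. A minimal sketch of that pattern follows, assuming Python with scikit-learn. The helper names score_models and rmse are hypothetical, the data is synthetic (make_classification and make_regression stand in for the Gender and Rating targets), and accuracy and RMSE are only example metrics.

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


def score_models(models, X, y, metric, seed=0):
    """Fit every model on the same training split and score it on the test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = float(metric(y_test, model.predict(X_test)))
    return scores


# Task 1 stand-in: a binary target playing the role of Gender.
X_cls, y_cls = make_classification(n_samples=500, n_features=8, random_state=0)
gender_scores = score_models(
    {
        "decision_tree": DecisionTreeClassifier(random_state=0),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    },
    X_cls, y_cls, accuracy_score,
)

# Task 2 stand-in: a numeric target playing the role of Rating.
X_reg, y_reg = make_regression(n_samples=500, n_features=6, noise=10.0, random_state=0)
rating_scores = score_models(
    {
        "linear_regression": LinearRegression(),
        "decision_tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    },
    X_reg, y_reg, rmse,
)

print("accuracy:", gender_scores)
print("RMSE:", rating_scores)
```

With the real data, categorical fields would need encoding before fitting (for example with pandas.get_dummies), and higher accuracy is better while lower RMSE is better, so the two score dictionaries should not be read on the same scale.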
You can use any programming language, including Python and R. You can make use of standard packages for analytics and machine learning. Clearly document any external packages used by your code.
Submit the following via Moodle, as a Jupyter notebook if you are using Python and as a single archive (zip, tar.gz, …) otherwise:
The code you used to solve the assignment.
If you have voluminous output to report, save it somewhere on the cloud and provide a link.
A short write-up describing how your code ran on the datasets: the parameters used, time taken, space required, and anything else of interest. This should include a comparative evaluation of the models built for each task.
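One possible way to collect the time and space figures for the write-up is sketched below, using only the Python standard library. The helper measure and the toy workload build_squares are hypothetical names. tracemalloc reports memory allocated through Python's allocator, so the peak may undercount memory used inside native libraries and should be treated as an estimate.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run func once; return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak_bytes


def build_squares(n):
    """Toy workload standing in for a model-fitting call."""
    return [i * i for i in range(n)]


squares, seconds, peak = measure(build_squares, 100_000)
print(f"time: {seconds:.4f} s, peak traced memory: {peak / 1e6:.2f} MB")
```

In a notebook, the same helper could wrap a call such as model.fit(X_train, y_train), with one measurement per model so the figures can be reported side by side.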
You may work alone or in groups of two. Each group makes a single submission to Moodle, using either member's Moodle account. The submission should mention the names of all group members.
There will be a short oral presentation and question/answer session for each submission.