Install R and RStudio, or Python and Jupyter Notebook, along with the appropriate packages (if you work in Python, I highly recommend that you download the bamboolib package). Remember to think analytically, and if you need or want to do some research on the packages, do so.
Based on the dataset provided by your instructor (Download here), become knowledgeable about the data it contains.
You are an analyst for the company IceCubed, and you are running a fundraising effort to raise money for your product, which makes instant ice cream (think Keurig for ice cream). You are asking each donor to contribute at least $100 to your fund (these are only donors; donating does not mean they will buy a device).
This file contains information about your donors, which will give you insight into the people who are interested in your product. Explore the variables, data types, values, etc. Calculate appropriate summary statistics and create appropriate graphs that will give you insights into the data. Write a 3-4 page report that summarizes your key findings. Include the answers to the following questions (answer in APA style, which includes a narrative and formatted sections, not bullet points):
1. What is the overall goal of what you are trying to solve for, and how can machine learning solve this?
2. What did you do with the data in the context of exploration?
3. Was there missing data? How clean was the data? (You need to clean the data too and discuss how it was cleaned)
4. Were there outliers or suspicious data?
5. What did you find? What intrigued you about the data? Why does that matter and what is the business value? (Include charts and explain why you researched those specific columns. You don’t need to explore them all, but you should discuss why you felt that each one mattered.)
6. What would your proposed next steps be? How do you plan to approach the cleansing of the data?
7. There should be NO modeling (regression, decision trees) in this assignment.
8. There should be no code in the essay.
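For students working in Python, the exploration asked for in questions 2 through 4 (summary statistics, missing data, outliers) might start from something like the sketch below. It is only an illustration under stated assumptions: the small `donors` table and its column names (`donor_id`, `age`, `donation`) are hypothetical stand-ins, not the contents of the instructor's file, and the 1.5 × IQR rule is one common convention for flagging outliers, not a required method. Code like this belongs in the separate code document, not in the essay.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the donor file; the real column names
# and values in the instructor's dataset will differ.
donors = pd.DataFrame({
    "donor_id": [1, 2, 3, 4, 5, 6],
    "age": [34, 41, np.nan, 29, 52, 38],
    "donation": [100, 120, 110, 105, 5000, 115],
})

# Summary statistics (count, mean, std, quartiles) for numeric columns.
summary = donors.describe()

# Number of missing values in each column.
missing_counts = donors.isna().sum()

# Flag suspicious donations with the 1.5 * IQR rule (an assumed convention).
q1 = donors["donation"].quantile(0.25)
q3 = donors["donation"].quantile(0.75)
iqr = q3 - q1
is_outlier = (donors["donation"] < q1 - 1.5 * iqr) | (donors["donation"] > q3 + 1.5 * iqr)
outliers = donors[is_outlier]

print(summary)
print(missing_counts)
print(outliers)
```

Flagged rows are candidates for a closer look rather than automatic deletions; the report should explain how each one was handled and why.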
Discuss/recap the steps in the exercise and their usefulness. Be sure to include screenshots that are no larger than 25% of the page and attach your code in a separate document.
Exceeds Standard
Describes the data preparation process, including data cleaning, data imputation, and data transformation. Includes identification of variables, with explicit explanations of how data was handled that are supported by reason and logic.
Data visualizations and tables are pertinent for the level and type of analysis. Cohesively and succinctly incorporates interpretation of the output, graphs, figures, charts, and tables and the significance of the results in the analysis.
Provides all code and the outputs.
Completely free of errors in grammar, spelling, and punctuation, with completely correct usage of title page, citations, and references.