1. (25 points) Identify a data set. The data set must have at least two variables and at least 10 observations. Bigger is better, but it doesn’t need to be huge. Read the data into R. If your data set is really big, take a simple random sample to keep it manageable. I’ll provide R code to show you how. Don’t try to analyze 5 million rows of airline departure data, for example. For most analyses, 1000 rows should be fine. You can also use specific selection criteria to reduce the size, such as filtering by date or by country. You can do some of your data manipulation, filtering, random selection, and analysis in Excel if you want.

2. (25 points) Perform and submit descriptive statistics to figure out what kinds of variables you have and how they are distributed. For nominal and ordinal variables, you must provide a list of values or levels. For ordinal and interval variables, you must provide a summary of the range and at least one measure each of central tendency and spread. Present your descriptive statistics in a neatly formatted table with clear labels. Don’t use raw R output right out of the console. Take the time to pick the information you and your team feel is most important to present, and format it as if it were a real business presentation. Assume your audience knows very little about statistics. Explain what the information means.

3. (25 points) Perform and submit at least one data visualization. It can be as simple as a histogram or a scatterplot. The data visualization must be submitted as part of a document (PDF is ideal) so that I can see the result. If you want to get more creative than that, great. However, be sure to adhere to the standards of good data visualization design. For example, don’t use line graphs unless you are representing time series data. Don’t use cute graphics, such as ducks or houses, to represent observations. Make it look statistical. Use appropriate labels and axis values. Maximize the data-to-ink ratio.

4. (25 points) Perform and submit some analysis of the shared variability between at least two of the variables in your data set. The type of analysis you do should be dictated by the question you want to answer and by the levels of measurement and distributions of your variables. For example, if you have one nominal variable and one interval variable, your analysis might consist of means on the interval variable grouped by levels of the nominal variable. If you have two interval-level variables, you might do a scatterplot, calculate a correlation, or perform a bivariate regression. There are many possibilities.

As your assignment submission, write up a short report and submit it as a single PDF document that includes your descriptive statistics, visualizations, and some text telling the story of the data source, what it represents, and what you concluded from the data as a result of your analysis. Include your R code as an appendix within the same PDF document. Your R code should include everything needed to read in your data, perform any recodes, filtering steps, or data cleaning needed, and reproduce all of your analyses.
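
To make the four steps concrete, here is a minimal base R sketch of one way they might fit together. It is an illustration only, not a required template. It assumes R's built-in `mtcars` data set as a stand-in for your own data, and the object names (`dat`, `samp`, `mpg_summary`, and so on), the seed value, and the output file name `scatter.pdf` are all hypothetical choices. The variable `cyl` is treated here as a grouping factor purely for demonstration.

```r
# Stand-in data (assumption): the built-in mtcars data set.
# For your own data you might use something like read.csv() instead.
dat <- mtcars

# Step 1: simple random sample without replacement, capped at 1000 rows.
set.seed(42)                          # arbitrary seed so the sample is reproducible
n_keep <- min(1000, nrow(dat))        # keep everything if fewer than 1000 rows
samp <- dat[sample(nrow(dat), n_keep), ]

# Step 2: descriptive statistics.
# Grouping variable: list the levels and how often each occurs.
samp$cyl <- factor(samp$cyl)
cyl_counts <- table(samp$cyl)

# Interval variable: range, central tendency, and spread in one small table.
mpg_summary <- data.frame(
  variable = "mpg",
  min      = min(samp$mpg),
  max      = max(samp$mpg),
  mean     = mean(samp$mpg),
  median   = median(samp$mpg),
  sd       = sd(samp$mpg)
)

# Step 3: a labeled scatterplot written to a PDF file.
pdf("scatter.pdf")
plot(samp$wt, samp$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "Fuel economy versus weight",
     pch  = 19)
dev.off()

# Step 4: shared variability.
# One grouping variable and one interval variable: group means.
group_means <- aggregate(mpg ~ cyl, data = samp, FUN = mean)

# Two interval variables: correlation and a bivariate regression.
r   <- cor(samp$wt, samp$mpg)
fit <- lm(mpg ~ wt, data = samp)

print(cyl_counts)
print(mpg_summary)
print(group_means)
print(r)
print(coef(fit))
```

The printed objects are raw console output, so in a report they would still need to be reformatted into labeled tables and explained in plain language, as item 2 requires.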