I’m working on an R exercise and need a sample draft to help me learn.
Document your exploration in an R script file and in a PowerPoint (or equivalent) presentation that includes the following elements:
- R code that you use at each step.
- Plots that you create with your R code.
- Your description of the information you find in the data and the plots; describe how you interpret the plots and the information provided by R.
In the process of completing this project, you will want to save your plots to files so you can add them to your PowerPoint presentation. You can do this in RStudio as follows: after you have created the plot, click the Export button at the top of the Plots pane, then click the Save as Image... menu item. This brings up a dialog box in which you can choose the image format, the directory (folder) in which the image will be saved, and the name of the image file.
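As an alternative to the Export button, a plot can also be written to a file from the script itself, which keeps the saved images reproducible. The following is a minimal sketch using base R graphics devices; the file name example_plot.png, the temporary directory, and the toy data are placeholders chosen for illustration, not part of the assignment.

```r
# Hypothetical output location; tempdir() is used so the example leaves no
# files in the working directory. Replace with your own project folder.
out_file <- file.path(tempdir(), "example_plot.png")

# Open a PNG graphics device; everything plotted next goes to the file.
png(filename = out_file, width = 800, height = 600)

# Placeholder plot; in the project this would be one of the Cigarette plots.
plot(1:10, (1:10)^2, main = "Example plot", xlab = "x", ylab = "x squared")

# Close the device so the file is actually written to disk.
dev.off()

# Confirm that the image file was created.
file.exists(out_file)
```

The same pattern works with pdf() or jpeg() in place of png() if a different image format is preferred.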
For this Final Project, you will use the Cigarette data set in the Ecdat package. To do this, you first must install the Ecdat package with the command install.packages("Ecdat").
To make this package accessible to your R session, type the command library(Ecdat).
You should now be able to see the Cigarette data frame; head(Cigarette) prints its first six rows:
  state year   cpi      pop   packpc    income  tax    avgprs     taxs
1    AL 1985 1.076  3973000 116.4863  46014968 32.5 102.18167 33.34834
2    AR 1985 1.076  2327000 128.5346  26210736 37.0 101.47500 37.00000
3    AZ 1985 1.076  3184000 104.5226  43956936 31.0 108.57875 36.17042
4    CA 1985 1.076 26444000 100.3630 447102816 26.0 107.83734 32.10400
5    CO 1985 1.076  3209000 112.9635  49466672 31.0  94.26666 31.00000
6    CT 1985 1.076  3201000 109.2784  60063368 42.0 128.02499 51.48333
Each row provides data about a given state in a given year. This data set has the following variables:
- state: the two-letter abbreviation for the state.
- year: the year.
- cpi: consumer price index for the year.
- pop: state population.
- packpc: average number of packs of cigarettes per capita per year.
- income: total state personal income.
- tax: average state, federal, and average local excise taxes for fiscal year.
- avgprs: average price per pack during fiscal year, including sales taxes, in cents.
- taxs: average excise taxes per pack for fiscal year, including sales taxes, in cents.
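Before starting the analysis, it can help to check the variable list above against the data itself. The snippet below is a small sketch that assumes the Ecdat package is already installed; it only uses base R inspection functions and draws no conclusions about the data.

```r
library(Ecdat)  # assumes install.packages("Ecdat") has already been run
data("Cigarette", package = "Ecdat")

str(Cigarette)                   # variable names and their types
dim(Cigarette)                   # number of rows and columns
table(Cigarette$year)            # how many rows (states) there are per year
length(unique(Cigarette$state))  # number of distinct states
colSums(is.na(Cigarette))        # count of missing values in each column
```

If every year shows the same number of rows, the data form a balanced panel, which matters later when pairing 1985 and 1995 values by state.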
Complete the project by working through the following tasks. Remember to document your process and your results in your PowerPoint presentation.
- Create a boxplot of the average number of packs per capita by state. Which states have the highest number of packs? Which have the lowest?
- For each year, find the median number of packs per capita across all the states. Plot this median value for the years from 1985 to 1995. What can you say about cigarette usage in these years?
- Create a scatter plot of price per pack vs number of packs per capita for all states and years.
- Are the price and the per capita packs positively correlated, negatively correlated, or uncorrelated? Explain why your answer would be expected.
- Change your scatter plot to show the points for each year in a different color. Does the relationship between the two variables change over time?
- Do a linear regression for these two variables. How much variability does the line explain?
- The plot above does not adjust for inflation. You can adjust the price of a pack of cigarettes for inflation by dividing the avgprs variable by the cpi variable. Create an adjusted price for each row, then re-do your scatter plot and linear regression using this adjusted price.
- Create a data frame with just the rows from 1985. Create a second data frame with just the rows from 1995. Then, from each of these data frames, get a vector of the number of packs per capita. Make sure both vectors list the states in the same order so that each pair refers to one state. Use a paired t-test to see whether the number of packs per capita in 1995 was significantly different from the number in 1985.
- In the process of doing this project, have any questions come to mind that this data set could answer? If so, pick one and do the analysis to find the answer to your question.
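The tasks above could be approached roughly as in the following sketch. It is one possible draft, not the expected solution: it assumes the Ecdat package is installed, uses base R graphics rather than any particular plotting package, and the names cig, adjprice, fit_nominal, fit_real, cig85, and cig95 are chosen here for illustration. The interpretation of each plot and number is left to you.

```r
library(Ecdat)  # assumes install.packages("Ecdat") has already been run
data("Cigarette", package = "Ecdat")
cig <- Cigarette                                  # local copy to modify
cig$year <- as.integer(as.character(cig$year))    # make sure year is numeric

# Boxplot of packs per capita by state, plus per-state medians for ranking.
boxplot(packpc ~ state, data = cig, las = 2, cex.axis = 0.6,
        xlab = "State", ylab = "Packs per capita",
        main = "Packs per capita by state")
state_medians <- sort(tapply(cig$packpc, cig$state, median))
head(state_medians, 3)   # states with the lowest median consumption
tail(state_medians, 3)   # states with the highest median consumption

# Median across states for each year, plotted over time.
yearly_median <- aggregate(packpc ~ year, data = cig, FUN = median)
plot(yearly_median$year, yearly_median$packpc, type = "b",
     xlab = "Year", ylab = "Median packs per capita",
     main = "Median packs per capita across states")

# Scatter plot of price vs. packs per capita, one color per year.
year_f <- factor(cig$year)
cols <- rainbow(nlevels(year_f))
plot(cig$avgprs, cig$packpc, col = cols[year_f], pch = 19,
     xlab = "Average price per pack (cents)", ylab = "Packs per capita")
legend("topright", legend = levels(year_f), col = cols, pch = 19, cex = 0.7)
cor(cig$avgprs, cig$packpc)          # sign shows the direction of correlation

# Linear regression on the nominal price; R-squared is the share of
# variability in packpc explained by the line.
fit_nominal <- lm(packpc ~ avgprs, data = cig)
abline(fit_nominal)
summary(fit_nominal)$r.squared

# Inflation-adjusted price: avgprs divided by cpi, then repeat the analysis.
cig$adjprice <- cig$avgprs / cig$cpi
plot(cig$adjprice, cig$packpc, col = cols[year_f], pch = 19,
     xlab = "Inflation-adjusted price per pack", ylab = "Packs per capita")
fit_real <- lm(packpc ~ adjprice, data = cig)
abline(fit_real)
summary(fit_real)$r.squared

# Paired t-test, 1995 vs. 1985. Both subsets are sorted by state so that
# element i of each vector refers to the same state.
cig85 <- cig[cig$year == 1985, ]
cig95 <- cig[cig$year == 1995, ]
cig85 <- cig85[order(cig85$state), ]
cig95 <- cig95[order(cig95$state), ]
stopifnot(identical(as.character(cig85$state), as.character(cig95$state)))
t.test(cig95$packpc, cig85$packpc, paired = TRUE)
```

Each plot() or boxplot() call replaces the previous plot in the Plots pane, so save each image before running the next plotting step. For the open-ended last task, the same building blocks (a derived column, a scatter plot, and lm()) can be reused with other variables such as tax or income.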
Be sure to zip and submit your entire document when finished!