6 Portfolio

Instead of a traditional final exam, you are tasked with writing a short portfolio on concepts we have discussed throughout the semester and then subsequently writing responses to three pre-determined prompts during our final exam time.

The final portfolio consists of the following pieces.

Mini-Project Summary OR Homework Summary (35 points)

For this part of the portfolio, complete one of the following two options (either Option 1 OR Option 2).

Option 1: Mini-Projects

Collaboration and AI: If you have questions or issues with the coding or Quarto part of this, you may ask other students, the PQRC, or myself and you may use AI. However, you may not collaborate or use AI at all to assist in writing your reflection.

For this option, you will create a Quarto Blog/Website containing your 5 mini-projects, and write a reflection that ties the mini-projects together and gives biggest take-aways from each project.

Your reflection should either go in the about.qmd file that appears at the root directory of the R Project, the index.qmd file that also appears at the root directory of the R Project, or in a separate blog post. The rendered .html file from about.qmd essentially serves as the home page for the blog. The reflection should be about 5 paragraphs or so and should address:

how each mini-project ties in to other content in the course.
how each mini-project ties in with some other mini-projects (note that some mini-projects are more connected than others: you do not need to “connect” every single mini-project with every other mini-project).
your biggest take-aways from these mini-projects (these should not be copy-pasted from the “take-away” section of any of your original mini-projects: the expectation is that, now that we have gone through the entire course, your take-aways may be more sophisticated now).

Some extra notes:

mini-project 1 is the only mini-project that had substantial handwritten work. You do not need to include the handwritten derivations in the blog.
if, for other mini-projects, you have things that you submitted that you wrote by hand, you can embed these as images in quarto using knitr::include_graphics("path_to_image.jpg").
as you are creating your five posts for the five mini-projects, note that the .qmd file must be named index.qmd within each folder and that the folder itself cannot have any spaces or special characters in its name.

What to Submit: For this part, you should submit either a url link to the blog (if you have published it through GitHub pages) OR a link to the GitHub repository with the blog (either a public repository, or, if private, add me as a collaborator @highamm).

Where to Start: If you already have R Studio connected to GitHub, start creating the blog following the directions at https://highamm.github.io/dataviz334/02-software.html#creating-a-quarto-blogwebsite. If you do not have R Studio connected to GitHub, then start by following directions at https://highamm.github.io/dataviz334/02-software.html#installing-git-and-using-github-class-prep.

Option 2: Homework

Collaboration and AI: You may not collaborate or use AI at all to assist in this part of the portfolio.

For this option, you will choose one homework problem from each of the five sections of the course (Sampling Distributions, Estimation, Confidence Intervals, Bayesian, and Hypothesis Testing). Three of the problems should be problems that you have already completed (and were assigned as homework already) while two of the problems should be problems that you have not yet completed (that were not assigned on any homework). You should submit both a formal write-up on each problem as well as a more elaborate description of the problem itself, how it connects to other topics within that particular section, and why you chose that problem in the first place.

A few notes on the problems themselves:

it is up to you whether you want to type up your solutions, or, if you want to write them neatly on a sheet of paper.
if you choose an “easy” problem (e.g., there was a one-line homework problem in the estimation section showing that the standard estimator for \(p\) in a binomial setting is unbiased), then the expectation is that your written description is much more lengthy. For a more challenging homework problem, the written description might be one long paragraph or two short paragraphs per problem. For an “easier” homework problem, the written description should be closer to 4-5 paragraphs in order to demonstrate breadth of understanding.

A few notes on the descriptions:

for each homework problem description, you should (1) explicitly say why you selected that particular problem, (2) how the problem directly connects to topics within that particular section, and (3) how the problem connects to other topics in the course and to statistics as a whole. Use this description to convince me both that you understand the problem itself and to explain how that particular problem has aided your understanding of statistics.

What to Submit: For this part, you should submit a hard copy of each of the homework problems along with typed descriptions of each problem.

Reflection (20 points)

AI Usage: You may not use generative AI for this reflection at all. I want to know what your thoughts are here! Use of AI will result in a 0 on the entire portfolio.

Each prompt is worth 5 points. For each, 3 points will be awarded for a complete answer while 5 points will be awarded for a thoughtful and sophisticated answer.

Reflect on your overall experience in this class by describing an interesting idea that you learned, why it was interesting to you, and what it tells you about practicing statistics.
Which assessment (homework assignment, recap task, mini-project, or in-class assessment) do you feel you learned the most from? Explain what you learned from this assessment and/or why the assessment was valuable to you.
What was a concept you learned in this course that you hope you remember 5 years from now? Why is it important that you remember this concept?
What statistical ideas are you curious to know more about as a result of taking this class? Try to give two examples and explain why you would like to know more about them.

What to Submit: For this part, you should typed responses to each of the 4 prompts.

In-Class Assessment (45 points)

In class during our final exam time, you will be prompted with 3 short “essay”-style questions that are based off of the following prompts. Note that the questions posed will not be exact verbatim matches to the five conceptual questions above, but they will be extremely similar.

Collaboration and AI: To prepare, you may work with any others in the class and you may use AI as a tool to help you prepare. However, I want to assess what you have learned, not what you might copy down from an AI response: therefore, you may not use a cheatsheet for the in-class component.

Our course was broken up into 5 sections: Sampling Distributions, Estimation, Confidence Intervals, Bayesian Statistics, and Hypothesis Testing. Below are some questions that are inspired by each of the sections. In class, you will be prompted to write a written response (about two short paragraphs) on three (not-so-randomly-selected) prompts that will be very similar to three of the prompts below.

Section 1 (Sampling Distributions). Throughout most of the section on sampling distributions, we supposed that we knew a probability model that could generate data and that we knew the values of the parameters in that probability model (e.g., \(\mu = 10\), \(\sigma = 2\) in a normal probability model). Given that in practice, we rarely know the true values of the parameters in a model, what do you think the overall purpose of this section was? In other words, why complete this section if we rarely actually know the values of parameters?
Section 2 (Estimation). In the section on estimation, we used a few different methods (method of moments, maximum likelihood estimation) to derive estimators for parameters in probability models using data. Something that we emphasized is the idea of a “bias variance trade-off.” Explain what the bias variance trade-off is. In your explanation, you should address what bias is and what variance is by explaining what would happen if you applied an estimator with high bias to repeated (thousands of) samples of data and what would happen if you applied an estimator with high variance to repeated (thousands of) samples of data.
Section 3 (Confidence Intervals). We spent some time in class discussing what is meant by “95% confidence.” Now, consider the simple linear regression model \(Y_i = \beta_0 + \beta_1 x_i + \epsilon_i\), where \(\epsilon_i\) are iid \(\text{N}(0, \sigma^2)\) random variables. Suppose that you use obtain a 95% confidence interval for \(\beta_1\), the slope parameter in the model using R, Python, or other statistical software. Explain what is meant by “95% confidence” in this setting (perhaps reviewing Lab 3.2).
Section 4 (Bayesian). What are the primary differences (conceptually) between working in a “frequentist” framework (all statistics you would have likely seen up to this section) and a “Bayesian” framework?
Section 5. Consider again the simple linear regression model \(Y_i = \beta_0 + \beta_1 x_i + \epsilon_i\), where \(\epsilon_i\) are iid \(\text{N}(0, \sigma^2)\) random variables, and suppose that your friend collects some data to conduct a hypothesis test \(H_0: \beta_1 = 0\) vs. \(H_a: \beta_1 \neq 0\). They correctly obtain a p-value of 0.994 from the test and conclude: “My p-value is very large and quite close to 1, so the null hypothesis is extremely likely to be true.” Explain what is wrong with your friend’s conclusion.

You will have the full 3 hours to complete the assessment, but I would expect the average time to complete it to be about 45 minutes.

What to Submit: Nothing before the assessment! You’ll submit your written responses to three of these prompts during our final exam time.