DATA / STAT 234

Syllabus and Course Information

Welcome! This is the course materials website for DATA / STAT 234. Though we will use other sources at times, we will use materials on this site most heavily.

General Information

Instructor Information

  • Professor: Matt Higham
  • Office: Bewkes 123
  • Email:
  • Semester: Fall 2024
  • Sections:
    • MWF 10:30 - 11:30
  • Office Hours: 15-minute slots bookable at my Calendly page.
    • Note that you must book a time for office hours at least 12 hours in advance to guarantee that I am present and available at that time.

Course Materials

  • DATA_STAT 234 Materials Bundle. This will be our primary source of materials.
  • Textbooks (only used as references):
    • R for Data Science by Grolemund and Wickham, found here in a free online version.
    • Modern Data Science with R by Baumer, Kaplan, and Horton, found here in a free online version.
  • Computer with Internet access.


Course Information

Welcome to DATA/STAT 234! The overall purpose of this course is to learn the data science skills necessary to complete large-scale data analysis projects. The tool that we will use to achieve this goal is the statistical software language R. We will work with a wide variety of interesting data sets throughout the semester to build our R skills. In particular, we will focus on the Data Analysis Life Cycle (Grolemund and Wickham 2020):

We will put more emphasis on the Import, Tidy, Transform, Visualize, and Communicate parts of the cycle, because an introduction to the Modeling part is covered in STAT 213.

Use of R and RStudio

We will use the statistical software R to construct graphs and analyze data. A few notes:

  • R and RStudio are both free to use.
  • Additionally, we will be using Quarto for data analysis reports.
  • Note: It’s always nice to start assignments and projects as early as possible, but this is particularly important for assignments and projects involving R. It’s no fun to try to figure out why code is not working at the last minute. If you start early enough, though, you will have plenty of time to seek help and won’t lose much time to a coding error.


General Course Outcomes

  1. Import data of a few different types into R for analysis.

  2. Tidy data into a form that can be more easily visualized, summarised, and modeled.

  3. Transform, Wrangle, and Visualize variables in a data set to assess patterns in the data.

  4. Communicate the results of your analysis to a target audience with a written report or, possibly, an oral presentation.

  5. Practice reproducible statistical practices through the use of Quarto for data analysis projects.

  6. Explain why it is ethically important to consider the context that a data set comes in.

  7. Develop the necessary skills to be able to ask and answer future data analysis questions on your own, either using R or another program, such as Python.

To paraphrase the R for Data Science textbook, about 80% of the skills necessary to do a complete data analysis project can be learned through coursework in classes like this one. But 20% of any particular project will involve learning new things that are specific to that project. Achieving Outcome 7 will allow you to learn this extra 20% on your own.



How You Will Be Assessed

The components of your grade are described below:

  • Modules

Aside from the first module (which has a unique structure), the modules in the course will be composed of either:

  1. An exercise set due on Monday (worth 5 points), a take-home quiz due on Wednesday (worth 10 points), and a handwritten in-class quiz on Wednesday (worth 60 points). We should have 4 total modules follow this structure.

  2. An exercise set due on Monday (worth 5 points), a take-home quiz due on Wednesday (worth 10 points), and an in-class coding quiz taken on a computer on Wednesday (worth 60 points). We should have 4 total modules follow this structure.

  3. An exercise set due on Monday (worth 5 points), a participation assessment (worth 15 points), and a Project due Wednesday (worth 55 points). We should have 3 total modules follow this structure.

The lowest module score will be dropped from your grade, so the total number of points available from all modules is 11 * 75 = 825 points.

Additionally, for one module, you are permitted to complete whatever is due for that module late and turn it in by the following class period. If there is an in-class quiz as part of the module, you should contact me via email to schedule a make-up time to take the quiz (again, this time should be scheduled before the following class).

Finally, if you choose to take the (optional) in-person final exam (described below), the score that you earn on that final will replace your second lowest module score, provided the final exam score is the higher of the two.

  • Final Project

There is one final project, worth 100 points. The primary purpose of the final project is to give you an opportunity to assemble topics throughout the course into one coherent data analysis. You will be able to choose the data set you use for your final project, so you might begin thinking about a particular topic or data set you are interested in exploring. The final project will be presented in a format to be decided later in the semester.

  • Final Exam

There is an optional Final Exam worth 75 points. You must be on campus for our final exam time if you would like to take the optional final exam. If you do not take the optional final, then the average of your 11 highest module scores will be used for the 75 points for this exam.

If you take the final exam and if your final exam score percentage is better than your 2nd lowest module grade, then that grade will be replaced with your final exam score. For example, suppose your 12 module scores are: 75, 70, 70, 70, 65, 65, 59, 58, 40, 34, 15, 0. You take the final exam and score a 60 / 75. Then, the 60 is used for your final exam score and your new module scores would be: 75, 70, 70, 70, 65, 65, 59, 58, 40, 34, 60, 0. The 0 would still be dropped as your lowest module score.
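The example above can be checked with a short sketch. It is written in Python purely for illustration and is not an official grade calculator; the function name `module_and_exam_points` is hypothetical, and the code assumes every module and the final exam are each scored out of 75 points, so comparing raw scores is the same as comparing percentages.

```python
def module_and_exam_points(module_scores, final_exam=None):
    """Return (module_points, exam_points) under the rules described above.

    module_scores: the 12 module scores, each out of 75 (assumption).
    final_exam: final exam score out of 75, or None if the exam is not taken.
    """
    scores = sorted(module_scores)  # ascending: scores[0] is the lowest

    if final_exam is None:
        # No final: drop the lowest module; the exam slot is the
        # average of the 11 highest module scores.
        kept = scores[1:]
        return sum(kept), sum(kept) / len(kept)

    # Final taken: it replaces the 2nd lowest module only if it is better.
    if final_exam > scores[1]:
        scores[1] = final_exam

    # The lowest module is still dropped.
    kept = scores[1:]
    return sum(kept), final_exam


example = [75, 70, 70, 70, 65, 65, 59, 58, 40, 34, 15, 0]
print(module_and_exam_points(example, final_exam=60))  # (666, 60)
```

In this example, the 15 is replaced by the 60 and the 0 is dropped, giving 666 of the 825 module points plus 60 of the 75 final exam points.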

Breakdown

  • 825 points for Modules
  • 75 points for the (optional) Final Exam
  • 100 points for the Final Project

Points add up to 1000 so your grade at the end of the semester will be the number of points you’ve earned across all categories divided by 1000.

Grading Scale

The following is a rough grading scale. I reserve the right to make any changes to the scale if necessary.

Grade Points
4.0 950-1000
3.75 920-949
3.5 890-919
3.25 860-889
3.0 830-859
2.75 810-829
2.5 770-809
2.25 750-769
2.0 720-749
1.75 700-719
1.5 670-699
1.25 640-669
1.0 600-639
0.0 0-599
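As a rough illustration of how the scale reads, the lookup can be sketched in Python. The function name `rough_grade` is hypothetical. The sketch assumes that the lower end of each range acts as the cutoff, so a fractional total such as 949.5 falls in the lower band. Because the scale itself is subject to change, the result is only an estimate.

```python
from bisect import bisect_right

# Lower bound of each band above 0.0, taken from the scale above.
CUTOFFS = [600, 640, 670, 700, 720, 750, 770, 810, 830, 860, 890, 920, 950]
# Grade for each band, from lowest to highest.
GRADES = [0.0, 1.0, 1.25, 1.5, 1.75, 2.0, 2.25, 2.5, 2.75, 3.0, 3.25, 3.5, 3.75, 4.0]


def rough_grade(points):
    """Map total points (out of 1000) to a grade on the rough scale."""
    # bisect_right counts how many cutoffs are <= points,
    # which is exactly the index of the matching band.
    return GRADES[bisect_right(CUTOFFS, points)]


print(rough_grade(829))  # 2.75
print(rough_grade(830))  # 3.0
```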


Collaboration, Diversity, Accessibility, and Academic Integrity

Rules for Collaboration

Collaboration with your classmates on exercises, take-home quizzes, and projects is encouraged, but you must follow these guidelines:

  • you must state the name(s) of who you collaborated with at the top of each assessment.
  • all work must be your own. This means that you should never send someone your code via email or let someone directly type code off of your screen. Instead, you can talk about strategies for solving problems and help or ask someone about a coding error.
  • you may use the Internet and StackExchange, but you should not copy and paste code directly from a website without citing that you did so. Policies about AI will be clearly stated on each individual assignment: for most assignments, you are permitted to use AI as long as you clearly state all queries that you make and clearly state what you used from the AI responses to those queries.


Diversity Statement

Diversity encompasses differences in age, colour, ethnicity, national origin, gender, physical or mental ability, religion, socioeconomic background, veteran status, sexual orientation, and marginalized groups. The interaction of different human characteristics brings about a positive learning environment. Diversity is both respected and valued in this classroom.



Accessibility Statement

The message below is copied from the Student Accessibility Services Office:

Your experience in this class is important to me. It is the policy and practice of St. Lawrence University to create inclusive and accessible learning environments consistent with federal and state law. If you have established accommodations with the Student Accessibility Services Office in the past, please activate your accommodations so we can discuss how they will be implemented in this course.

If you have not yet established services through the Student Accessibility Services Office but have a temporary health condition or permanent disability that requires accommodations (conditions include, but are not limited to, mental health, attention-related, learning, vision, hearing, physical, or health impacts), please contact the Student Accessibility Services Office directly to set up a meeting. The Student Accessibility Services Office will work with you on the interactive process that establishes reasonable accommodations.

Color Vision Deficiency: The Student Accessibility Services office can loan glasses for students who are color vision deficient. Please contact the office to make an appointment.

For more specific information about setting up an appointment with Student Accessibility Services please see the options listed below:

Telephone: 315.229.5537

Email: studentaccessibility@stlawu.edu

Website: https://www.stlawu.edu/offices/student-accessibility-services



Academic Dishonesty

Academic dishonesty will not be tolerated. Any specific policies for this course are supplementary to the Honor Code. According to the St. Lawrence University Academic Honor Policy,

  1. It is assumed that all work is done by the student unless the instructor/mentor/employer gives specific permission for collaboration.
  2. Cheating on examinations and tests consists of knowingly giving or using or attempting to use unauthorized assistance during examinations or tests.
  3. Dishonesty in work outside of examinations and tests consists of handing in or presenting as original work which is not original, where originality is required.

Claims of ignorance and academic or personal pressure are unacceptable as excuses for academic dishonesty. Students must learn what constitutes one’s own work and how the work of others must be acknowledged.

For more information, refer to www.stlawu.edu/acadaffairs/academic_honor_policy.pdf.

To avoid academic dishonesty, it is important that you follow all directions and collaboration rules and ask for clarification if you have any questions about what is acceptable for a particular assignment or exam. If I suspect academic dishonesty, a score of zero will be given for the entire assignment in which the academic dishonesty occurred for all individuals involved, and the Academic Honor Council will be notified. If a pattern of academic dishonesty is found to have occurred, a grade of 0.0 for the entire course can be given.

It is important to work in a way that maximizes your learning. Be aware that students who rely too much on others for the homework and projects tend to do poorly on the quizzes and exams.

Please note that, in addition to the above, any assignment on which your score is reduced due to academic dishonesty will not be dropped according to the drop policy. For example, if you receive a zero on a quiz because of academic dishonesty, it will not be dropped from your grade.



Tentative Schedule

Week Date Topics
0 8/28 Introduction to R, RStudio
1 9/2 Graphics with ggplot2
2 9/9 Data Wrangling with dplyr
3 9/16 Communication with Quarto and ggplot2
4 9/23 Soft Skills and Workflow
5 9/30 Data Tidying with tidyr
6 10/7 Base R
7 10/14 Factors with forcats and Data Ethics
8 10/21 Data Import with readr, jsonlite, rvest, and tibble
9 10/28 Data Merging with dplyr
10 11/4 Intro to Statistical/Machine Learning with knn
11 11/11 Text Data with tidytext and stringr
12 11/18 Dates and Times with lubridate
13 12/2 Connections to STAT and CS
14 12/9 Databases and SQL with dbplyr