11 Data Import

Goals:

Use readr to read in data to R from .csv, .txt, and .tsv files.
Use rvest to scrape data from public websites.
Use jsonlite to read in data in JSON (Java Script Object Notation) files.

11.1 `readr` to Read in Data

Up to now, we have mostly worked with data that was “R Ready”: meaning that it was in a nice .csv file that could be read into R easily with read_csv() from the readr package. We will begin by looking at some options in the read_csv() function and then move into formats other than .csv that data are commonly stored in.

11.1.1 `read_csv()` Options

The mtcarsex.csv has observations on different car models with variables that include things like gas mileage, number of cylinders, etc. Read in the mtcarsex.csv data set with the following code. Then, examine the data set with head().

library(tidyverse)
library(here)
cars_df <- read_csv(here("data/mtcarsex.csv"))
head(cars_df)
#> # A tibble: 6 × 11
#>   This is …¹ ...2  ...3  ...4  ...5  ...6  ...7  ...8  ...9 
#>   <chr>      <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 "I'm a na… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
#> 2 "mpg"      cyl   disp  hp    drat  wt    qsec  vs    am   
#> 3  <NA>      <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
#> 4  <NA>      <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
#> 5 "-999"     6     160   110   3.9   2.62  16.46 0     1    
#> 6 "21"       6     160   110   3.9   2.875 17.02 0     1    
#> # … with 2 more variables: ...10 <chr>, ...11 <chr>, and
#> #   abbreviated variable name
#> #   ¹`This is a data set about cars.`

What do you notice about the data set that seems odd? Open the .csv file with Excel or some other program to examine the data set outside of R.

Type in ?read_csv in the bottom-left window and look at some of the options in read_csv(). In particular, we will use the na and the skip arguments to fix up our reading.

Let’s start with skip so that we aren’t reading in the first two rows of the data set:

cars_df <- read_csv(here("data/mtcarsex.csv"), skip = 2)
## first two lines will be skipped
head(cars_df)
#> # A tibble: 6 × 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1   NA      NA    NA    NA NA    NA     NA      NA    NA
#> 2   NA      NA    NA    NA NA    NA     NA      NA    NA
#> 3 -999       6   160   110  3.9   2.62  16.5     0     1
#> 4   21       6   160   110  3.9   2.88  17.0     0     1
#> 5   22.8     4   108    93  3.85  2.32  18.6     1     1
#> 6   21.4     6   258   110  3.08  3.22  19.4     1     0
#> # … with 2 more variables: gear <dbl>, carb <dbl>

That looks better, but there are still a couple of problems. What do you notice?

Go the help and read about the na argument. Let’s add that as an option to fix the missing value issue.

cars_df <- read_csv(here("data/mtcarsex.csv"), na = c("NA", "-999"), skip = 2)
head(cars_df)
#> # A tibble: 6 × 11
#>     mpg   cyl  disp    hp  drat    wt  qsec    vs    am
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  NA      NA    NA    NA NA    NA     NA      NA    NA
#> 2  NA      NA    NA    NA NA    NA     NA      NA    NA
#> 3  NA       6   160   110  3.9   2.62  16.5     0     1
#> 4  21       6   160   110  3.9   2.88  17.0     0     1
#> 5  22.8     4   108    93  3.85  2.32  18.6     1     1
#> 6  21.4     6   258   110  3.08  3.22  19.4     1     0
#> # … with 2 more variables: gear <dbl>, carb <dbl>

Now look at the classes of each variable. Which classes look like they are incorrect?

We’ve talked about how to re-specify classes of variables using mutate() and the as.factor() or as.Date() or as.numeric() functions, but sometimes it’s easier just to respecify the class when we are reading in the data. Notice how, when we use read_csv(), R gives us a message about each of the column types. This is actually an argument in read_csv() called col_types. We can add a |> spec() piping statement after a read_csv() statement to tell R to print the col_types so that it’s easy for us to copy and paste it into read_csv() and change any classes.

read_csv(here("data/mtcarsex.csv"), na = c("NA", "-999"), skip = 2) |>
  spec()
#> Rows: 34 Columns: 11
#> ── Column specification ────────────────────────────────────
#> Delimiter: ","
#> dbl (11): mpg, cyl, disp, hp, drat, wt, qsec, vs, am, ge...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> cols(
#>   mpg = col_double(),
#>   cyl = col_double(),
#>   disp = col_double(),
#>   hp = col_double(),
#>   drat = col_double(),
#>   wt = col_double(),
#>   qsec = col_double(),
#>   vs = col_double(),
#>   am = col_double(),
#>   gear = col_double(),
#>   carb = col_double()
#> )

For example, notice how cyl = col_double() is changed to cyl = col_factor() in the code chunk below:

cars_df <- read_csv(here("data/mtcarsex.csv"), na = c(NA, "-999"), skip = 2,
  col_types = cols(
  mpg = col_double(),
  cyl = col_factor(),
  disp = col_double(),
  hp = col_double(),
  drat = col_double(),
  wt = col_double(),
  qsec = col_double(),
  vs = col_factor(),
  am = col_double(),
  gear = col_double(),
  carb = col_double()
))

Finally, there are two rows with all missing values. These aren’t providing anything useful so we can slice() them out:

cars_df <- read_csv(here("data/mtcarsex.csv"), na = c("NA", "-999"), skip = 2,
  col_types = cols(
  mpg = col_double(),
  cyl = col_factor(),
  disp = col_double(),
  hp = col_double(),
  drat = col_double(),
  wt = col_double(),
  qsec = col_double(),
  vs = col_factor(),
  am = col_double(),
  gear = col_double(),
  carb = col_double()
)) |>
  slice(-(1:2))
head(cars_df)
#> # A tibble: 6 × 11
#>     mpg cyl    disp    hp  drat    wt  qsec vs       am
#>   <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
#> 1  NA   6       160   110  3.9   2.62  16.5 0         1
#> 2  21   6       160   110  3.9   2.88  17.0 0         1
#> 3  22.8 4       108    93  3.85  2.32  18.6 1         1
#> 4  21.4 6       258   110  3.08  3.22  19.4 1         0
#> 5  NA   8       360   175  3.15  3.44  17.0 0         0
#> 6  18.1 6       225   105  2.76  3.46  20.2 1         0
#> # … with 2 more variables: gear <dbl>, carb <dbl>

There are many other possible file formats for data storage. For example, there is a data set called oscars.tsv, which is a tab-separated file. You can read it in with read_tsv() instead of read_csv().

oscars_df <- read_tsv(here("data/oscars.tsv"))
head(oscars_df)
#> # A tibble: 6 × 51
#>   FilmName    Oscar…¹ Durat…² Rating Direc…³ Direc…⁴ Oscar…⁵
#>   <chr>         <dbl>   <dbl>  <dbl> <chr>     <dbl>   <dbl>
#> 1 Crash          2006     113      4 Haggis        0       1
#> 2 Brokeback …    2006     134      4 Lee           0       0
#> 3 Capote         2006     114      4 Miller        0       0
#> 4 Good Night…    2006      93      2 Clooney       0       0
#> 5 Munich         2006     164      4 Spielb…       0       0
#> 6 The Depart…    2007     151      4 Scorse…       0       1
#> # … with 44 more variables: GenreName <chr>,
#> #   Genre_Drama <dbl>, Genre_Bio <dbl>, CountryName <chr>,
#> #   ForeignandUSA <dbl>, ProductionName <chr>,
#> #   ProductionCompany <dbl>, BudgetRevised <chr>,
#> #   Budget <chr>, DomesticBoxOffice <dbl>,
#> #   WorldwideRevised <dbl>, WorldwideBoxOffice <dbl>,
#> #   DomesticPercent <dbl>, LimitedOpeningWnd <dbl>, …

You’ll be able to work with .txt files and Excel files in the Exercises. Check out https://rawgit.com/rstudio/cheatsheets/master/data-import.pdf for a data import cheatsheet.

The final issue that we will discuss in this section occurs when a data set has units within its cells. Consider the earlier example that we used in the reprex section:

test_df <- read_csv(here("data/parsedf.csv"))
head(test_df)
#> # A tibble: 3 × 2
#>   x                   y
#>   <chr>           <dbl>
#> 1 20,000 dollars      1
#> 2 40 dollars          2
#> 3 only 13 dollars     3

The parse_number() function is really useful if you just want the number (no commas, no units, etc.). The function is often paired with mutate() since we are creating a new variable:

test_df |> mutate(x2 = parse_number(x))
#> # A tibble: 3 × 3
#>   x                   y    x2
#>   <chr>           <dbl> <dbl>
#> 1 20,000 dollars      1 20000
#> 2 40 dollars          2    40
#> 3 only 13 dollars     3    13

11.1.2 Exercises

Exercises marked with an * indicate that the exercise has a solution at the end of the chapter at 11.5.

* The birthdays.txt file has information on the birthdays of various animals on my Animal Crossing island. There are also columns for the Animal’s Name, Animal Type, and how long the animal has lived on the island (in weeks). Click on the file to open it to look at the format of the data.

Start with the following code chunk and use the options of read_delim() to read in the data (?read_delim). The delim argument that’s already provided specifies that the delimiter (separator) that you’ll use is a -, as opposed to, for example, a , in a .csv file. Arguments that you may need to change include

skip
col_names
na
trim_ws
col_types

library(tidyverse)
df <- read_delim(here("data/birthdays.txt"), delim = " - ")
head(df)

* Another common format for data to be stored in is an Excel file. Often, it’s easiest just to save the Excel file as a .csv file and read it in using read_csv(). But, sometimes this route can be difficult (for example, if your Excel file has thousands of sheets). To read in directly from Excel, you’ll need to install the readxl with install.packages("readxl"). Once installed, load the package with library(readxl), and read in the first sheet evals_prof.xlsx data set, a similar data set as the one that will be used for Project 2, with the read_excel() function.
* Now, read in the second sheet in the Excel file, using the help file for ?read_excel to change one of the arguments.

11.2 Data Scraping with `rvest`

Sometimes, you might want data from a public website that isn’t provided in a file format. To obtain this data, you’ll need to use web scraping, a term which just means “getting data from a website.” The easiest way to do this in R is with the rvest package. Note that we could spend an entire semester talking about web scraping, but we will focus only on websites where the scraping of data is “easy” and won’t give us any major errors.

Go to the following website and suppose that you wanted the table of gun violence statistics in R: https://en.wikipedia.org/wiki/Gun_violence_in_the_United_States_by_state. You could try copy-pasting the table into Excel and reading the data set in with read_excel(). Depending on the format of the table, that strategy may work but it may not. Another way is to scrape it directly with rvest. Additionally, if the website continually updates (standings for a sports league, enrollment data for a school, best-selling products for a company, etc.), then scraping is much more convenient, as you don’t need to continually copy-paste for updated data.

In the following code chunk, read_html() reads in the entire html file from the url provided while html_nodes() extracts only the tables on the website.

library(tidyverse)
library(rvest)

## provide the URL and name it something (in this case, url).
url <- "https://en.wikipedia.org/wiki/Gun_violence_in_the_United_States_by_state"

## convert the html code into something R can read
h <- read_html(url)

## grabs the tables
tab <- h |> html_nodes("table")

You’ll see that, for this example, there are 3 tables provided. The tables are stored in a list and we can reference the first table using [[1]], the second table using [[2]], etc. For the purposes of this class, we will figure out which of the 3 tables is the one we actually want using trial and error.

The html_table() function converts the table into a data.frame object.

test1 <- tab[[1]] |> html_table()
test2 <- tab[[2]] |> html_table()
test3 <- tab[[3]] |> html_table()

head(test1)
head(test2)
head(test3)

Which of the 3 tables is the one that we would want to use for an analysis on gun violence in the United States?

As another example, consider scraping data from SLU’s athletics page. In particular, suppose we want to do an analysis on SLU’s baseball team.

Go to the following website to look at the table of data that we want to scrape: https://saintsathletics.com/sports/baseball/stats/2021.

After looking at the website, use the following code to scrape the data set.

url <- "https://saintsathletics.com/sports/baseball/stats/2021"
h <- read_html(url)
tab <- h |> html_nodes("table")
tab
obj <- tab[[1]] |> html_table(fill = TRUE)
head(obj)
tail(obj)
obj2 <- tab[[2]] |> html_table(fill = TRUE)
head(obj2)
tail(obj2)

There’s now 72 different tables! See if you can figure out where the first few tables are coming from on the website.

11.2.1 Exercises

Exercises marked with an * indicate that the exercise has a solution at the end of the chapter at 11.5.

Choose a topic/person/place/etc. that interests you that has tables on Wikipedia and scrape the table that is related to that topic.
* SLU keeps track of diversity of faculty through time and makes this data public on the following website: https://www.stlawu.edu/offices/institutional-research/faculty-diversity. Use rvest to scrape the data tables into R.

Hint: You may need to use an extra argument in html_table() like fill.

11.3 JSON Files with `jsonlite`

A final common data format that we will discuss is JSON (JavaScript Object Notation). We will only cover the very basics of JSON data and use the jsonlite package in R to read in some .json files. JSON files are read in to R as a list object.

11.3.1 Everything Working Well

First, consider data from the mobile game Clash Royale. Install the jsonlite package and then use it to read in the json file with the function fromJSON():

## install.packages("jsonlite")
library(jsonlite)
cr_cards <- fromJSON(here("data/clash_royale_card_info.json"))

You should get a warning message, which we will investigate in class.

Next, type View(cr_cards) in your console (bottom-left) window to look at the data. See if you can pull out the data set by clicking on some things in the View() window.

The following give a couple of ways to grab the data using code. The as_tibble() function converts a rectangular object into our familiar tibble.

The first option specifies the name of the table that’s in the JSON file (in this case, the name is "cards"):

library(tidyverse)
cr_cards_flat <- cr_cards[["cards"]]
cr_cards_df <- as_tibble(cr_cards_flat)
head(cr_cards_df)
#> # A tibble: 6 × 8
#>   key     name      elixir type  rarity arena descr…¹     id
#>   <chr>   <chr>      <int> <chr> <chr>  <int> <chr>    <int>
#> 1 knight  Knight         3 Troop Common     0 A toug… 2.6 e7
#> 2 archers Archers        3 Troop Common     0 A pair… 2.60e7
#> 3 goblins Goblins        2 Troop Common     1 Three … 2.60e7
#> 4 giant   Giant          5 Troop Rare       0 Slow b… 2.60e7
#> 5 pekka   P.E.K.K.A      7 Troop Epic       4 A heav… 2.60e7
#> 6 minions Minions        3 Troop Common     0 Three … 2.60e7
#> # … with abbreviated variable name ¹description

The second method uses the flatten() function from the purrr package, the only package in the core tidyverse that we do not talk about in detail in this class. There is also a different flatten() function in the jsonlite package. In the code below, we specify that we want to use flatten() from purrr with purrr::flatten(). If we wanted to use flatten() from jsonlite, we’d use jsonlite::flatten()

cr_cards_flat2 <- purrr::flatten(cr_cards)
cr_cards_df2 <- as_tibble(cr_cards_flat2)
head(cr_cards_df2)
#> # A tibble: 6 × 8
#>   key     name      elixir type  rarity arena descr…¹     id
#>   <chr>   <chr>      <int> <chr> <chr>  <int> <chr>    <int>
#> 1 knight  Knight         3 Troop Common     0 A toug… 2.6 e7
#> 2 archers Archers        3 Troop Common     0 A pair… 2.60e7
#> 3 goblins Goblins        2 Troop Common     1 Three … 2.60e7
#> 4 giant   Giant          5 Troop Rare       0 Slow b… 2.60e7
#> 5 pekka   P.E.K.K.A      7 Troop Epic       4 A heav… 2.60e7
#> 6 minions Minions        3 Troop Common     0 Three … 2.60e7
#> # … with abbreviated variable name ¹description

Both methods give a tibble that we can then use our usual tidyverse tools ggplot2, dplyr, tidyr, etc. on.

11.3.2 Things Aren’t Always So Easy

Now let’s try to look at some animal crossing data that were obtained from https://github.com/jefflomacy/villagerdb. We first just want to look at the data from one individual villager (ace) in the file ace.json.

acedata <- fromJSON(here("data/ace.json"))
aceflat <- purrr::flatten(acedata)
head(aceflat)
#> $gender
#> [1] "male"
#> 
#> $species
#> [1] "bird"
#> 
#> $birthday
#> [1] "3-13"
#> 
#> $ac
#> $ac$personality
#> [1] "jock"
#> 
#> $ac$clothes
#> [1] "spade-shirt"
#> 
#> $ac$song
#> [1] "K.K. Parade"
#> 
#> $ac$phrase
#> [1] "ace"
#> 
#> 
#> $`afe+`
#> $`afe+`$personality
#> [1] "jock"
#> 
#> $`afe+`$clothes
#> [1] "spade-shirt"
#> 
#> $`afe+`$song
#> [1] "K.K. Parade"
#> 
#> 
#> $name
#> [1] "Ace"

Things are now….more complicated. This example is just to show that it’s not always easy working with JSON data. Lists can be nested and that creates problems when trying to convert a deeply nested list into our “rectangular” format that’s easy to work with.

There’s also the added problem of reading in the .json files from all villagers at the same time We could do this with a for loop or a mapping function from purrr to download and read in the JSON files for all villagers. We won’t delve any more deeply into this, but there’s a lot more to all of the file formats that we’ve discussed this week, particularly web scraping and .json files.

11.3.3 Exercises

Exercises marked with an * indicate that the exercise has a solution at the end of the chapter at 11.5.

* Read in the pokedex.json file, a data set that has information on the 151 original Pokemon. Then, use the flatten() function from purrr to flatten the list.
* Use as_tibble() to convert your flattened list to a tibble.
* Use parse_number() with mutate() to tidy two of the variables in the data set.
* Look at the type variable. What looks odd about it? What happens when you try to use it, either in a plot, or using a dplyr function?

You can unnest() the Type variable with the unnest() function from tidyr. We didn’t discuss this function but feel free to read about it with ?unnest

pokemon_unnest <- unnest(pokemon_df, cols = c(type))

There are 6 pokemon with a spawn_chance of 0. Figure out what these 6 pokemon are.
Figure out what the 5 most common Pokemon types are in the first generation (you’ll need to use the unnest()-ed data set for this: why?).

11.4 Chapter Exercises

Exercises marked with an * indicate that the exercise has a solution at the end of the chapter at 11.5.

Choose a sports team at SLU, and go to that team’s website (by simply googling SLU name_of_sport). Scrape the data tables from the “Results” or “Statistics” section of this sport. After you scrape the data, tidy the data set. Then, choose one of the following options (different options might make more/less sense for different sports)

(a). Summarise different team statistics, either numerically or graphically. Perhaps make some graphs showing different statistics through time.

(b). Summarise different individual statistics, either numerically or graphically.

(c). Ask and answer any other questions that make sense for the particular sport that you are looking at!

Note: A few sports (men’s and women’s golf, for example), give results in PDF format. PDF format is generally a horrible way to record and share data, as it’s very difficult to read in to almost any program. Therefore, avoid sports with PDF results for the purposes of this exercise.

11.5 Exercise Solutions

11.5.1 `readr` S

* The birthdays.txt file has information on the birthdays of various animals on my Animal Crossing island. There are also columns for the Animal’s Name, Animal Type, and how long the animal has lived on the island (in weeks). Click on the file to open it to look at the format of the data.

skip
col_names
na
trim_ws
col_types

library(tidyverse)
df <- read_delim(here("data/birthdays.txt"), delim = " - ")
head(df)

read_delim(here("data/birthdays.txt"), delim = "-", skip = 4,
  col_names = c("Birthday", "Name",
    "Animal", "Island"),
  trim_ws = TRUE,
  col_types = list(
    col_character(), col_character(), col_character(), col_number()
  ), na = c("N/A", "?"))

* Another common format for data to be stored in is an Excel file. Often, it’s easiest just to save the Excel file as a .csv file and read it in using read_csv(). But, sometimes this route can be difficult (for example, if your Excel file has thousands of sheets). To read in directly from Excel, you’ll need to install the readxl with install.packages("readxl"). Once installed, load the package with library(readxl), and read in the first sheet evals_prof.xlsx data set with the read_excel() function.

## install.packages("readxl")
library(readxl)
read_excel(here("data/evals_prof.xlsx"))

* Now, read in the second sheet, using the help file for ?read_excel to change one of the arguments.

read_excel(here("data/evals_prof.xlsx"), sheet = 2)

11.5.2 `rvest` and Data Scraping S

* SLU keeps track of diversity of faculty through time and makes this data public on the following website: https://www.stlawu.edu/ir/diversity/faculty. Use rvest to scrape the data tables into R.

url <- "https://www.stlawu.edu/ir/diversity/faculty"
h <- read_html(url)
tab <- h |> html_nodes("table")
obj <- tab[[1]] |> html_table(fill = TRUE)
obj

11.5.3 JSON with `jsonlite` S

* Read in the pokedex.json file, a data set that has information on the 151 original Pokemon. Then, use the flatten() function from purrr to flatten the list.

library(jsonlite)
pokedex <- fromJSON(here("data/pokedex.json"))
df <- purrr::flatten(pokedex)

* Use as_tibble() to convert your flattened list to a tibble.

pokemon_df <- as_tibble(df)

* Use parse_number() with mutate() to tidy two of the variables in the data set.

pokemon_df <- pokemon_df |> mutate(height = parse_number(height),
                                    weight = parse_number(weight))

* Look at the type variable. What looks odd about it? What happens when you try to use it, either in a plot, or using a dplyr function?

## it's a variable of lists....this is happening because some 
## pokemon have more than one type.

## most ggplot() and dplyr() functions won't work, or
## won't work as you'd expect

You can unnest() the Type variable with the unnest() function from tidyr. We didn’t discuss this function but feel free to read about it with ?unnest

pokemon_unnest <- unnest(pokemon_df, cols = c(type))

11.5.4 Chapter Exercises S

11.6 Non-Exercise `R` Code

library(tidyverse)
library(here)
cars_df <- read_csv(here("data/mtcarsex.csv"))
head(cars_df)
cars_df <- read_csv(here("data/mtcarsex.csv"), skip = 2)
## first two lines will be skipped
head(cars_df)
cars_df <- read_csv(here("data/mtcarsex.csv"), na = c("NA", "-999"), skip = 2)
head(cars_df)
cars_df <- read_csv(here("data/mtcarsex.csv"), na = c(NA, "-999"), skip = 2,
  col_types = cols(
  mpg = col_double(),
  cyl = col_factor(),
  disp = col_double(),
  hp = col_double(),
  drat = col_double(),
  wt = col_double(),
  qsec = col_double(),
  vs = col_factor(),
  am = col_double(),
  gear = col_double(),
  carb = col_double()
))
cars_df <- read_csv(here("data/mtcarsex.csv"), na = c("NA", "-999"), skip = 2,
  col_types = cols(
  mpg = col_double(),
  cyl = col_factor(),
  disp = col_double(),
  hp = col_double(),
  drat = col_double(),
  wt = col_double(),
  qsec = col_double(),
  vs = col_factor(),
  am = col_double(),
  gear = col_double(),
  carb = col_double()
)) |>
  slice(-(1:2))
head(cars_df)
oscars_df <- read_tsv(here("data/oscars.tsv"))
head(oscars_df)
test_df <- read_csv(here("data/parsedf.csv"))
head(test_df)
test_df |> mutate(x2 = parse_number(x))
library(tidyverse)
library(rvest)

## provide the URL and name it something (in this case, url).
url <- "https://en.wikipedia.org/wiki/Gun_violence_in_the_United_States_by_state"

## convert the html code into something R can read
h <- read_html(url)

## grabs the tables
tab <- h |> html_nodes("table")
test1 <- tab[[1]] |> html_table()
test2 <- tab[[2]] |> html_table()
test3 <- tab[[3]] |> html_table()

head(test1)
head(test2)
head(test3)
url <- "https://saintsathletics.com/sports/baseball/stats/2021"
h <- read_html(url)
tab <- h |> html_nodes("table")
tab
obj <- tab[[1]] |> html_table(fill = TRUE)
head(obj)
tail(obj)
obj2 <- tab[[2]] |> html_table(fill = TRUE)
head(obj2)
tail(obj2)
## install.packages("jsonlite")
library(jsonlite)
cr_cards <- fromJSON(here("data/clash_royale_card_info.json"))
library(tidyverse)
cr_cards_flat <- cr_cards[["cards"]]
cr_cards_df <- as_tibble(cr_cards_flat)
head(cr_cards_df)
cr_cards_flat2 <- purrr::flatten(cr_cards)
cr_cards_df2 <- as_tibble(cr_cards_flat2)
head(cr_cards_df2)
acedata <- fromJSON(here("data/ace.json"))
aceflat <- purrr::flatten(acedata)
head(aceflat)

10 Data Ethics

12 Merging with dplyr

11 Data Import

11.1 readr to Read in Data

11.1.1 read_csv() Options

11.1.2 Exercises

11.2 Data Scraping with rvest

11.2.1 Exercises

11.3 JSON Files with jsonlite

11.3.1 Everything Working Well

11.3.2 Things Aren’t Always So Easy

11.3.3 Exercises

11.4 Chapter Exercises

11.5 Exercise Solutions

11.5.1 readr S

11.5.2 rvest and Data Scraping S

11.5.3 JSON with jsonlite S

11.5.4 Chapter Exercises S

11.6 Non-Exercise R Code

11.1 `readr` to Read in Data

11.1.1 `read_csv()` Options

11.2 Data Scraping with `rvest`

11.3 JSON Files with `jsonlite`

11.5.1 `readr` S

11.5.2 `rvest` and Data Scraping S

11.5.3 JSON with `jsonlite` S

11.6 Non-Exercise `R` Code