The goal of ‘meltr’ is to provide a fast and friendly way to read non-rectangular data (like ragged forms of ‘csv’, ‘tsv’, and ‘fwf’).
Standard tools like readr::read_csv() can cope to some extent with unusual inputs, like files with empty rows or newlines embedded in strings. But some files are so wacky that standard tools don’t work at all, and instead you have to take the file to pieces and reassemble it to get structured data you can work with.
The meltr package provides tools to do this.
You can install the released version of meltr from CRAN with:
install.packages("meltr")
Or you can install the development version with:
# install.packages("devtools")
devtools::install_github("r-lib/meltr")
Here’s a contrived example that breaks two assumptions made by common tools like readr::read_csv(): that every row has the same number of cells, and that each column holds a single data type.
In contrast, the melt_csv() function reads the file one cell at a time, importing each cell of the file into a whole row of the final data frame.
writeLines("Help,,007,I'm
1960-09-30,FALSE,trapped in,7,1.21
non-rectangular,data,NA", "messy.csv")
library(meltr)
melt_csv("messy.csv")
#> # A tibble: 12 × 4
#>      row   col data_type value
#>    <dbl> <dbl> <chr>     <chr>
#>  1     1     1 character Help
#>  2     1     2 missing   <NA>
#>  3     1     3 character 007
#>  4     1     4 character I'm
#>  5     2     1 date      1960-09-30
#>  6     2     2 logical   FALSE
#>  7     2     3 character trapped in
#>  8     2     4 integer   7
#>  9     2     5 double    1.21
#> 10     3     1 character non-rectangular
#> 11     3     2 character data
#> 12     3     3 missing   <NA>
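The melted form also makes the raggedness of the file easy to measure. As a minimal sketch (dplyr::count() is brought in here for illustration and is not part of the example above), counting the cells in each row shows that the three rows have different lengths:

```r
library(meltr)
library(dplyr)

# Recreate the example file so this sketch runs on its own.
writeLines("Help,,007,I'm
1960-09-30,FALSE,trapped in,7,1.21
non-rectangular,data,NA", "messy.csv")

data <- melt_csv("messy.csv")

# One output row per input row, with `n` giving the number of cells found.
# Based on the melted output above, rows 1, 2 and 3 hold 4, 5 and 3 cells.
cells_per_row <- count(data, row)
cells_per_row
```

A rectangular file would give the same n for every row, so unequal counts are a quick sign that a file needs this cell-by-cell treatment.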
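The output of melt_csv() gives us one row for each cell of the input, recording the cell’s row and column position, a guessed data type, and the raw value. No type conversion is attempted: every value is imported as a string, and the data_type column merely gives meltr’s best guess of what the data types ought to be.
What are some ways you can use this? To begin with, you can do some simple manipulations with ordinary functions.
For example, you could extract the words.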
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
data <- melt_csv("messy.csv")
data %>%
  filter(data_type == "character")
#> # A tibble: 6 × 4
#>     row   col data_type value
#>   <dbl> <dbl> <chr>     <chr>
#> 1     1     1 character Help
#> 2     1     3 character 007
#> 3     1     4 character I'm
#> 4     2     3 character trapped in
#> 5     3     1 character non-rectangular
#> 6     3     2 character data
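Because the value column is always character, any cells you want to treat as numbers have to be converted explicitly. The following is a rough sketch of one way to do that with dplyr. The choice of as.numeric() and the numbers variable name are illustrative assumptions, not something meltr prescribes:

```r
library(meltr)
library(dplyr)

# Recreate the example file so this sketch runs on its own.
writeLines("Help,,007,I'm
1960-09-30,FALSE,trapped in,7,1.21
non-rectangular,data,NA", "messy.csv")

data <- melt_csv("messy.csv")

# Keep the cells whose guessed type is numeric, then convert the strings.
# `numbers` is a hypothetical name used only for this illustration.
numbers <- data %>%
  filter(data_type %in% c("integer", "double")) %>%
  mutate(value = as.numeric(value))
numbers
```

Note that "007" is guessed as character rather than integer, so it is left out of this selection. Whether that is what you want depends on the file, which is why the conversion is left to you.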
Or check whether there are any missing entries.