While reading through Brian Espinoza’s introduction to stringr
, I thought I would test some speeds.
library(magrittr)
library(stringr)
library(microbenchmark)
library(data.table)
library(ggplot2)
source("~/Dropbox/Research/.Rprofile")
reps <- 1e1
After loading the libraries I need, I figured I also need a big list of strings. I’ll use my list of first names compiled from various sources. I turn that into a character vector with 1119 strings.
# load the firstname database
fn_db <- fread("~/Dropbox/Research/_knowledgebase/censuslink/out/fn_links_men_5.csv") %>%
.[, list(name1)] %>%
unique() %>%
as.vector() %>%
unlist() %>%
unname()
microbenchmark(
str_to_upper(fn_db),
toupper(fn_db),
str_to_lower(fn_db),
tolower(fn_db),
str_to_title(fn_db),
times = reps
) %>%
autoplot()
Seems like stringr
is faster (though not always). str_to_title
is slow but there’s nothing to compare it to in base R. (Not that I’m complaining, but the proper
command in stata
works just fine.)
microbenchmark(
str_c(fn_db, sep = ""),
paste(fn_db, sep = ""),
paste0(fn_db),
str_length(fn_db),
nchar(fn_db),
times = reps
) %>%
autoplot()
Again, win for stringr
, especially when counting string lengths!
microbenchmark(
str_sub(fn_db, start = 3),
str_sub(fn_db, start = 3, end = 5),
substr(fn_db, start = 3, stop = 5),
times = reps
) %>%
autoplot()
Close but stringr
wins (and only str_sub
works without the end
or stop
option, even if it is slower).
The rest of the introduction to stringr
from Brian goes through a lot of regular expression commands (str_replace
, str_detect
, str_extract
, etc). I will have to take it on faith that these are faster than their base counterparts but they are also much easier to use and write than the base alternatives as well.