#' ---
#' title: "What is vectorization in R?"
#' date: "2021-11-03"
#' author: "Jose https://ajuda.multifarm.top"
#' output:
#'   html_document:
#'     code_folding: show
#'     toc: yes
#'     toc_float:
#'       smooth_scroll: true
#'     df_print: paged
#'     highlight: zenburn
#' ---

#' One operation that is slow in R, and somewhat slow in all languages, is memory allocation. One of the slower ways to write a for loop is therefore to grow a vector inside the loop, which forces R to re-allocate memory repeatedly, like this:

# Grow j one element at a time, starting from length 1.
# With only 10 iterations the elapsed time is close to zero;
# a much longer loop makes the cost visible.
j <- 1
system.time(for (i in 1:10) {
  j[i] <- 10
})

# Same loop, iterating over the indices of an existing vector.
n <- 1:10
j <- 1
system.time(for (i in seq_along(n)) {
  j[i] <- 10
})

# Same loop wrapped in a function.
fxn <- function(j) {
  for (i in 1:10) {
    j[i] <- 10
  }
  return(j)
}
system.time(fxn(j))

#' Here, in each repetition of the for loop, R has to resize the vector and re-allocate memory. It has to find the vector in memory, create a new vector that will fit more data, copy the old data over, insert the new data, and erase the old vector. This can get very slow as vectors get big.

#' If one pre-allocates a vector that fits all the values, R doesn’t have to re-allocate memory on each iteration, and the loop can run much faster. Here’s how you’d do that for the above case:

# Pre-allocate all 10 slots, then fill them in place.
j <- rep(NA, 10)
system.time(for (i in 1:10) {
  j[i] <- 10
})

# Same loop, iterating over the indices of the pre-allocated vector.
j <- rep(NA, 10)
system.time(for (i in seq_along(j)) {
  j[i] <- 10
})

#' There are still situations in which it may make sense to use for loops instead of vectorized functions, though. These include:
#'
#' - Using functions that don’t take vector arguments
#' - Loops where each iteration is dependent on the results of previous iterations
#'
#' Note that the second case is tricky. In some cases where the obvious implementation of an algorithm uses a for loop, there’s a vectorized way around it. For instance, a random walk can be implemented with vectorized code. In these cases, you often want to call functions that are essentially C/Fortran implementations of loop operations, so that the loop does not run in R itself.
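#' The random walk mentioned above can be sketched as follows. This is a minimal illustration, not a benchmark: the names `n_steps`, `steps`, `walk_loop`, and `walk_vec` are hypothetical, and the walk is assumed to take unit steps of -1 or +1. Each position depends on the previous one, yet the running total is what `cumsum` already computes.

```r
set.seed(42)        # fixed seed so the example is reproducible
n_steps <- 1000     # assumed walk length, chosen only for illustration
steps <- sample(c(-1, 1), n_steps, replace = TRUE)

# Loop version: position i is the previous position plus the current step.
walk_loop <- numeric(n_steps)   # pre-allocate the result
walk_loop[1] <- steps[1]
for (i in 2:n_steps) {
  walk_loop[i] <- walk_loop[i - 1] + steps[i]
}

# Vectorized version: the same running total, computed in compiled code.
walk_vec <- cumsum(steps)

# Both versions should produce the same positions.
identical(walk_loop, walk_vec)
```

#' Wrapping each version in `system.time()` with a larger `n_steps` is one way to compare them on your own machine.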
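#' Examples of such functions include `cumsum` (cumulative sums), `rle` (run-length encoding, which counts consecutive repeated values), and `ifelse` (vectorized if…else statements).

#' ## Using 'rle'
#' Compute the lengths and values of runs of equal values in a vector, or the reverse operation.

#' Building data
x <- c("952345172", "alju12", "amou79", "amou91", "baab81", NA)
code <- rep(x, c(5, 10, 10, 20, 2, 7))
df <- data.frame(id = seq_along(code), code, stringsAsFactors = FALSE)

# Run-length encode the column. rle() treats every NA as its own run,
# so the 7 trailing NAs become 7 runs of length 1.
rle_code <- rle(df$code)
class(rle_code)
attributes(rle_code)
rle_code$values
rle_code$lengths

# Which runs are longer than 6, and which values do they belong to?
rle_code$lengths > 6
rle_code$values[rle_code$lengths > 6]

# The first list element is `lengths`, so this is the same test as above.
rle_code[[1]] > 6

# Reverse operation: rebuild the original vector from the encoding.
inverse.rle(rle_code)

#' ## Using 'cumsum'
#'
#' ## Using 'ifelse'

# Open the help page for do.call.
?do.call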
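#' As a minimal sketch of `ifelse`, the block below labels a vector element by element in a loop and then does the same in one vectorized call. The vector `run_lengths`, the threshold of 6, and the labels "long" and "short" are hypothetical, picked to resemble the run lengths in the `rle` section.

```r
# Hypothetical input, including one missing value.
run_lengths <- c(5, 10, 10, 20, 2, NA)

# Loop version: test one element at a time.
label_loop <- character(length(run_lengths))
for (i in seq_along(run_lengths)) {
  if (is.na(run_lengths[i])) {
    label_loop[i] <- NA
  } else if (run_lengths[i] > 6) {
    label_loop[i] <- "long"
  } else {
    label_loop[i] <- "short"
  }
}

# Vectorized version: ifelse() tests the whole vector at once
# and returns NA wherever the test itself is NA.
label_vec <- ifelse(run_lengths > 6, "long", "short")

# Both versions should produce the same labels.
identical(label_loop, label_vec)
```

#' The loop needs an explicit `is.na()` branch because `if` fails on a missing condition, whereas `ifelse` passes the `NA` through.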