Vectorization

Jose 2022-10-27 12:56:36 -03:00
parent 2512eccd4b
commit 76873af5a9
2 changed files with 104 additions and 0 deletions


@@ -17,6 +17,10 @@ data using the R programming environment
* [[./script/extractpdf.R][Reading pdf files]]
** Vectorization
* [[./script/vectorization.R][Vectorization]]
** Iteration
* [[./script/iteration.R][Lapply, apply and for loop: brief introduction]]

script/vectorization.R Executable file

@@ -0,0 +1,100 @@
#' ---
#' title: "What is vectorization in R?"
#' date: "2021-11-03"
#' author: "Jose https://ajuda.multifarm.top"
#' output:
#' html_document:
#' code_folding: show
#' toc: yes
#' toc_float:
#' smooth_scroll: true
#' df_print: paged
#' highlight: zenburn
#' ---
#' One operation that is slow in R, and somewhat slow in all languages, is memory allocation. So one of the slower ways to write a for loop is to resize a vector repeatedly, so that R has to re-allocate memory repeatedly, like this:
j <- 1
system.time(for (i in 1:10) {
  j[i] <- 10  # j grows by one element on each iteration
})
n <- 1:10
j <- 1
system.time(for (i in seq_along(n)) {
  j[i] <- 10
})
# The same loop wrapped in a function
fxn <- function(j) {
  for (i in 1:10) {
    j[i] <- 10
  }
  return(j)
}
system.time(fxn(j))
#' Here, in each repetition of the for loop, R has to re-size the vector and re-allocate memory. It has to find the vector in memory, create a new vector that will fit more data, copy the old data over, insert the new data, and erase the old vector. This can get very slow as vectors get big.
#' If one pre-allocates a vector that fits all the values, R doesn't have to re-allocate memory on each iteration, and the code can run much faster. Here's how you'd do that for the above case:
# NA_real_ keeps the vector numeric, so the first assignment does not coerce it
j <- rep(NA_real_, 10)
system.time(for (i in 1:10) {
  j[i] <- 10
})
j <- rep(NA_real_, 10)
system.time(for (i in seq_along(j)) {
  j[i] <- 10
})
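#' As a rough sketch (the names `n_big`, `grow`, `prealloc` and `vec` are illustrative and not part of the original script), the growing loop, the pre-allocated loop and a fully vectorized call can be compared on a larger vector, since 10 iterations finish too quickly for `system.time` to measure. Timings depend on the machine and the R version, so none are quoted here.
n_big <- 1e5
grow <- 1
system.time(for (i in seq_len(n_big)) {
  grow[i] <- 10        # the vector grows on each iteration
})
prealloc <- numeric(n_big)
system.time(for (i in seq_len(n_big)) {
  prealloc[i] <- 10    # fills a vector that already has its final size
})
system.time(vec <- rep(10, n_big))  # vectorized: no R-level loop at all
# All three approaches should produce the same vector
identical(grow, prealloc) && identical(prealloc, vec)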
#' There are still situations where it may make sense to use for loops instead of vectorized functions, though. These include:
#'
#' * Using functions that don't take vector arguments
#' * Loops where each iteration depends on the results of previous iterations
#'
#' Note that the second case is tricky. In some cases where the obvious implementation of an algorithm uses a for loop, there's a vectorized way around it. For instance, a random walk can be implemented with vectorized code. In these cases, you often want to call functions that are essentially C/FORTRAN implementations of loop operations, to avoid the loop in R. Examples of such functions include cumsum (cumulative sums), rle (lengths of runs of repeated values), and ifelse (vectorized if…else statements).
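#'
#' A minimal sketch of the random-walk case, assuming steps of +1 or -1 with equal probability (the names `steps`, `walk_loop` and `walk_vec` are illustrative). Each position depends on the previous one, yet `cumsum` yields the same path without an R-level loop.
set.seed(42)
steps <- sample(c(-1, 1), size = 1000, replace = TRUE)
# Loop version: each position is the previous position plus the current step
walk_loop <- numeric(length(steps))
walk_loop[1] <- steps[1]
for (i in 2:length(steps)) {
  walk_loop[i] <- walk_loop[i - 1] + steps[i]
}
# Vectorized version: the cumulative sum of the steps is the walk itself
walk_vec <- cumsum(steps)
all.equal(walk_loop, walk_vec)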
#' ## Using 'rle'
#' Compute the lengths and values of runs of equal values in a vector, or the reverse operation.
#' Building data
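x <- c("952345172", "alju12", "amou79", "amou91", "baab81", NA)
code <- rep(x, c(5, 10, 10, 20, 2, 7))
df <- data.frame(id = seq_along(code), code, stringsAsFactors = FALSE)
rle_code <- rle(df$code)
class(rle_code)
attributes(rle_code)
rle_code$values
# rle treats every NA as a run of its own, so the 7 NAs give 7 runs of length 1
rle_code$lengths
# Which runs are longer than 6 elements?
rle_code$lengths > 6
# Values of the runs that are longer than 6 elements
rle_code$values[rle_code$lengths > 6]
# Same test as above, taking the lengths by position
rle_code[[1]] > 6
# Reverse operation: rebuild the original vector
inverse.rle(rle_code)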
#' ## Using 'cumsum'
#'
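#' `cumsum` returns the running total of a numeric vector. A small sketch with made-up values (`sales` and `running_total` are illustrative names, not part of the original script):
sales <- c(3, 1, 4, 1, 5)
running_total <- cumsum(sales)
running_total        # 3 4 8 9 14
#' A logical vector is coerced to 0/1, so `cumsum` can also count how many `TRUE` values have been seen so far:
cumsum(sales > 2)    # 1 1 2 2 3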
#' ## Using 'ifelse'
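#' `ifelse(test, yes, no)` checks a condition element by element and picks the value from `yes` or `no` accordingly. A small sketch with made-up values (`score` and `result` are illustrative names, not part of the original script). Note that a missing value in the test gives a missing value in the result.
score <- c(35, 80, NA, 60)
result <- ifelse(score >= 60, "pass", "fail")
result   # "fail" "pass" NA "pass"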
?do.call