Grouping functions (tapply, by, aggregate) and the *apply family
Whenever I want to do something "map"py in R, I usually try to use a function in the apply family.
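For concreteness, by "map"py I mean calling a function once per element of a collection, as in this minimal sketch (the vector `x` and the squaring function are made up for illustration):

```r
# Made-up input vector
x <- c(1, 2, 3, 4)

# lapply calls the function once per element and collects the results in a list,
# which is roughly what "map" does in most functional languages
squared <- lapply(x, function(e) e^2)

str(squared)
```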
However, I've never quite understood the differences between them (how {sapply, lapply, etc.} apply the function to the input/grouped input, what the output will look like, or even what the input can be), so I often just go through them all until I get what I want.
Can someone explain how to use which one when?
My current (probably incorrect/incomplete) understanding is...
- `sapply(vec, f)`: input is a vector. Output is a vector/matrix, where element `i` is `f(vec[i])`, giving you a matrix if `f` has a multi-element output.
- `lapply(vec, f)`: same as `sapply`, but output is a list?
- `apply(matrix, 1/2, f)`: input is a matrix. Output is a vector, where element `i` is `f`(row/col `i` of the matrix).
- `tapply(vector, grouping, f)`: output is a matrix/array, where an element in the matrix/array is the value of `f` at a grouping `g` of the vector, and `g` gets pushed to the row/col names.
- `by(dataframe, grouping, f)`: let `g` be a grouping. Apply `f` to each column of the group/dataframe. Pretty-print the grouping and the value of `f` at each column.
- `aggregate(matrix, grouping, f)`: similar to `by`, but instead of pretty-printing the output, `aggregate` sticks everything into a data frame.
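A minimal sketch for checking each bullet above against base R. The objects `v`, `m` and `df` are made-up toy data, and `aggregate` is called with its data-frame-plus-`by` interface (it also accepts a formula). One thing it makes visible: `by` hands `f` each group's whole sub-data-frame rather than one column at a time.

```r
# Toy data, made up purely for illustration
v <- c(1, 2, 3)
m <- matrix(1:6, nrow = 2)          # 2 x 3 matrix, filled column-wise
df <- data.frame(
  g = c("a", "a", "b", "b"),         # grouping column
  x = c(1, 2, 3, 4),
  y = c(10, 20, 30, 40)
)

# sapply: one-element results simplify to a vector ...
s1 <- sapply(v, function(e) e * 10)        # c(10, 20, 30)
# ... and multi-element results simplify to a matrix, one column per input element
s2 <- sapply(v, function(e) c(e, e^2))     # 2 x 3 matrix

# lapply: same per-element call, but the result is always a list
l1 <- lapply(v, function(e) e * 10)        # list(10, 20, 30)

# apply: MARGIN = 1 iterates over rows, MARGIN = 2 over columns
row_sums <- apply(m, 1, sum)               # c(9, 12)
col_sums <- apply(m, 2, sum)               # c(3, 7, 11)

# tapply: split a vector by a grouping, call f on each piece;
# the group labels become the names of the result
t1 <- tapply(df$x, df$g, sum)              # a = 3, b = 7

# by: split a data frame by a grouping and call f on each sub-data-frame
b1 <- by(df[, c("x", "y")], df$g, colMeans)

# aggregate: apply f to each column within each group, returning a data frame
a1 <- aggregate(df[, c("x", "y")], by = list(g = df$g), FUN = sum)

print(b1)
print(a1)
```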
Side question: I still haven't learned plyr or reshape. Would plyr or reshape replace all of these entirely?