Vector Look-Ups and Safer Sampling
A collection of utility functions that facilitate looking up vector values from a lookup table, annotate values in a table for clearer viewing, and support a safer approach to vector sampling, sequence generation, and aggregation.
Installation
You can install the released version of zmisc from CRAN with:
install.packages("zmisc")
You can use pak to install the development version of zmisc from GitHub with:
pak::pak("torfason/zmisc")
Quick and easy value lookups
The functions lookup() and lookuper() are used to look up values from a lookup table, which can be supplied as a vector, a list, or a data.frame. The functions are in some ways similar to the Excel function VLOOKUP(), but are designed to work smoothly in an R workflow, in particular within pipes.
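To make the VLOOKUP() analogy concrete, here is a minimal base-R sketch for a data.frame lookup table. The helper lookup_df_sketch() is hypothetical and is not how zmisc implements lookup(). It assumes that keys are in the first column and values in the second, which may differ from how lookup() treats data.frames. The native pipe |> requires R 4.1 or later.

```r
# Hypothetical helper, not zmisc::lookup(): find each element of x in the
# first column of a two-column data.frame (like VLOOKUP's key column) and
# return the corresponding value from the second column
lookup_df_sketch <- function(x, lookup_table, default = NA) {
  # Row position of each element of x among the keys (NA if absent)
  idx <- match(x, lookup_table[[1]])
  result <- lookup_table[[2]][idx]
  # Elements without a matching key get the default value
  result[is.na(idx)] <- default
  result
}

fruit_colors <- data.frame(
  fruit = c("apple", "banana"),
  color = c("red", "yellow"),
  stringsAsFactors = FALSE
)
c("apple", "kiwi", "banana") |> lookup_df_sketch(fruit_colors)
# Returns c("red", NA, "yellow")
```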
Get the character representation of a labelled variable
Returns a character representation of a labelled variable, using the value labels to look up the label for a given value.
The default behavior of this function is similar to labelled::to_character(). The options, however, are slightly different. Most importantly, instead of specifying NA handling using parameters, the function relies on the default parameter to determine what happens for unlabelled values, allowing users to keep the original values of x instead of the labels, return NA, or return a specific string value. Also, the default behavior is to drop any variable label attribute, in line with the default as.character() method.
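The fallback behavior described above can be sketched in base R. This assumes haven-style value labels stored in a "labels" attribute as a named vector (names are the label text, values are the codes). labelled_to_character_sketch() is a hypothetical helper for illustration, not the zmisc function.

```r
# Hypothetical helper: value labels are assumed to be stored haven-style
# in the "labels" attribute (names = label text, values = codes)
labelled_to_character_sketch <- function(x, default = x) {
  labels <- attr(x, "labels", exact = TRUE)
  # Position of each value among the labelled codes (NA if unlabelled)
  idx <- match(x, labels)
  out <- names(labels)[idx]
  # Unlabelled values fall back to default: the original values of x,
  # NA, or a fixed string, recycled to the length of x
  fallback <- rep_len(as.character(default), length(x))
  out[is.na(idx)] <- fallback[is.na(idx)]
  # out is a plain character vector, so no label attributes are kept
  out
}

x <- structure(c(1, 2, 3), labels = c(Low = 1, High = 2))
labelled_to_character_sketch(x)                   # "Low" "High" "3"
labelled_to_character_sketch(x, default = NA)     # "Low" "High" NA
labelled_to_character_sketch(x, default = "n/a")  # "Low" "High" "n/a"
```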
Safer sampling, sequencing and aggregation
The functions zample(), zeq(), and zingle() are intended to make your code less likely to break in mysterious ways when you encounter unexpected boundary conditions. The zample() and zeq() functions are almost identical to the sample() and seq() functions, but a bit safer.
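As an illustration of the kind of boundary condition meant here, consider the common pattern of building an index with seq(1, length(x)). The snippet uses only base R, not zeq(), and shows how that pattern misbehaves for an empty input while the base-R idiom seq_len() does not.

```r
x <- character(0)

# Intended as "1 to n", but for n = 0 this counts down and yields 1, 0
seq(1, length(x))

# The same pitfall applies to 1:length(x), which also yields 1, 0
1:length(x)

# seq_len() returns an empty integer vector for n = 0,
# so a loop over it runs zero times
seq_len(length(x))
for (i in seq_len(length(x))) print(x[i])  # prints nothing
```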
recode_tilde: Recode values using tilde syntax
Recode elements of a vector using a series of formulas (lhs ~ rhs) passed via the ... argument. Each lhs is matched against elements of x, and the corresponding rhs provides the new value.
This function is closely based on dplyr::case_match(), with minimal changes to make it more intuitive for recoding tasks. In particular, rather than being set to NA by default, unmatched values are set to .default, which itself defaults to x, so they remain unchanged. The output type can be controlled with .ptype, which defaults to .default. This means that the output type can be changed by setting .default either to NA or to a value of the same type as the rhs values. Incompatibility between the rhs values and .ptype results in a type error.
Examples
recode_tilde(letters, "a" ~ "first", "z" ~ "last")
recode_tilde(1:5, 1 ~ 10, 2 ~ 20)
# Recoding to a different type requires an explicit .default value
recode_tilde(1:4, 1 ~ "low", 2 ~ "medium", 3 ~ "high", .default = NA)
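For readers unfamiliar with the formula-based interface, the mechanics can be sketched in base R. recode_sketch() is a hypothetical helper, not recode_tilde() or dplyr::case_match(). It assumes each rhs is a single value and gives the first matching formula precedence. Unlike recode_tilde(), it relies on R's ordinary coercion rules instead of raising a type error.

```r
# Hypothetical helper, not recode_tilde(): each argument in ... is a
# two-sided formula lhs ~ rhs, where rhs is assumed to be a single value
recode_sketch <- function(x, ..., .default = x) {
  # Unmatched elements keep their .default value (x unless overridden)
  out <- rep_len(.default, length(x))
  matched <- rep(FALSE, length(x))
  for (f in list(...)) {
    # Evaluate both sides in the environment where the formula was created
    lhs <- eval(f[[2]], environment(f))
    rhs <- eval(f[[3]], environment(f))
    # Only elements not matched by an earlier formula are recoded
    hit <- !matched & x %in% lhs
    out[hit] <- rhs
    matched <- matched | hit
  }
  out
}

recode_sketch(letters, "a" ~ "first", "z" ~ "last")
recode_sketch(1:4, 1 ~ "low", 2 ~ "medium", 3 ~ "high", .default = NA)
# Returns c("low", "medium", "high", NA)
```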
zample: Sample from a vector in a safe way
The zample() function duplicates the functionality of sample(), with the exception that it does not attempt the (sometimes dangerous) user-friendliness of switching to sampling from 1:n when the first argument is a single number n. zample() always treats its first argument as a vector containing the elements to be sampled, so your code won’t break in unexpected ways when the input vector happens to be of length 1.
Examples
# For vectors of length 2 or more, zample() and sample() are identical
set.seed(42); zample(7:11)
set.seed(42); sample(7:11)
# For vectors of length 1, zample() will still sample from the vector,
# whereas sample() will "magically" switch to interpreting the input
# as a number n, and sampling from the vector 1:n.
set.seed(42); zample(7)
set.seed(42); sample(7)
# The other arguments work in the same way as for sample()
set.seed(42); zample(7:11, size=13, replace=TRUE, prob=(5:1)^3)
set.seed(42); sample(7:11, size=13, replace=TRUE, prob=(5:1)^3)
# Of course, sampling more than the available elements without
# setting replace=TRUE will result in an error
set.seed(42); tryCatch(zample(7, size=2), error=function(e) conditionMessage(e))
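The core idea behind a length-safe sample() can be illustrated with the resample() idiom from the examples on R's own ?sample help page: always sample positions with sample.int() and index into x, so a length-1 input is never reinterpreted as 1:n. This is a sketch of the idea, not necessarily how zample() is implemented.

```r
# Sample positions of x rather than x itself, so that a length-1 x is
# treated as a one-element vector (idiom from the ?sample examples)
resample <- function(x, ...) x[sample.int(length(x), ...)]

set.seed(42); resample(7)      # always 7
set.seed(42); resample(7:11)   # same draw as set.seed(42); sample(7:11)
set.seed(42); resample(7:11, size=13, replace=TRUE, prob=(5:1)^3)
```

Because sample() itself uses x[sample.int(length(x), ...)] for inputs longer than one element, the two agree under the same seed whenever the input has length 2 or more.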
Getting a better view on variables
The notate() function adds annotations to factor and labelled variables that make it easier to see both values and labels/levels when using the View() function.
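One way to get this kind of View()-friendly display for a plain factor is to fold the integer code into each level label. annotate_levels_sketch() is hypothetical and the "code: label" format is an assumption; the annotations that notate() actually produces may look different.

```r
# Hypothetical helper, not zmisc::notate(): prefix each factor level with
# its integer code so both are visible, e.g. in View()
annotate_levels_sketch <- function(f) {
  stopifnot(is.factor(f))
  # Renaming levels keeps the underlying integer codes and level order
  levels(f) <- paste0(seq_along(levels(f)), ": ", levels(f))
  f
}

sizes <- factor(c("medium", "low", "high"), levels = c("low", "medium", "high"))
annotate_levels_sketch(sizes)
# Levels become "1: low", "2: medium", "3: high"
```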
lookuper: Construct a lookup function based on a specific lookup table
The lookuper() function returns a function equivalent to the lookup() function, except that instead of taking a lookup table as an argument, the lookup table is embedded in the function itself.
This can be very useful, in particular when using the lookup function as an argument to other functions that expect a function which maps character -> character (or other data types), but do not offer a good way to pass additional arguments to that function.
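The embedding works through an ordinary R closure. The sketch uses a hypothetical make_lookup_fn() (not zmisc::lookuper()) that captures a named-vector lookup table and returns a one-argument function. That function can then be handed to higher-order functions such as vapply() without any extra arguments.

```r
# Hypothetical helper, not zmisc::lookuper(): return a function that
# looks up its argument in a captured named-vector lookup table
make_lookup_fn <- function(lookup_table, default = NA) {
  # Force the arguments so the closure captures their current values
  force(lookup_table)
  force(default)
  function(x) {
    idx <- match(x, names(lookup_table))
    out <- unname(lookup_table[idx])
    out[is.na(idx)] <- default
    out
  }
}

to_color <- make_lookup_fn(c(apple = "red", banana = "yellow"))
to_color(c("banana", "apple"))   # "yellow" "red"
# The one-argument function can go wherever a mapping function is expected
vapply(c("apple", "kiwi"), to_color, character(1), USE.NAMES = FALSE)
# Returns c("red", NA)
```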