Datasets are stored in data/, not as regular R objects
in the package. This means you need to document them in a slightly
different way: instead of documenting the data object directly, you write the
roxygen2 block above the dataset’s name, given as a quoted string. For example,
this is the roxygen2 block used for ggplot2::diamonds:
#' Prices of over 50,000 round cut diamonds
#'
#' A dataset containing the prices and other attributes of almost 54,000
#' diamonds. The variables are as follows:
#'
#' @format A data frame with 53940 rows and 10 variables:
#' \describe{
#' \item{price}{price in US dollars ($326--$18,823)}
#' \item{carat}{weight of the diamond (0.2--5.01)}
#' \item{cut}{quality of the cut (Fair, Good, Very Good, Premium, Ideal)}
#' \item{color}{diamond colour, from D (best) to J (worst)}
#' \item{clarity}{a measurement of how clear the diamond is (I1 (worst), SI2,
#' SI1, VS2, VS1, VVS2, VVS1, IF (best))}
#' \item{x}{length in mm (0--10.74)}
#' \item{y}{width in mm (0--58.9)}
#' \item{z}{depth in mm (0--31.8)}
#' \item{depth}{total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43--79)}
#' \item{table}{width of top of diamond relative to widest point (43--95)}
#' }
#'
#' @source {ggplot2} tidyverse R package.
"diamonds"
Datasets should never be exported with @export because
they are not found in the NAMESPACE. Instead, datasets will
either be automatically available if you set LazyData: true
in your DESCRIPTION, or available after calling
data() if not. This field also affects the default usage.
If you have LazyData: true, the usage will be just the
dataset name (e.g. diamonds). Otherwise, the usage will be
wrapped in data() (e.g. data(diamonds)).
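To make the two access routes concrete, here is a minimal sketch. It uses the built-in datasets package rather than a package of your own, an assumption made only so that the example runs on any R installation. That package lazy-loads its data, so mtcars can be reached by name, while data() shows the explicit route you need when a package does not set LazyData: true.

```r
# Lazy-loaded access: the dataset is reachable by name (here via `::`)
# without any prior call to data().
lazy_copy <- datasets::mtcars

# Explicit access: data() loads the dataset into the environment you name.
# A scratch environment is used so the global workspace is left untouched.
scratch <- new.env()
data("mtcars", package = "datasets", envir = scratch)

# Compare the objects obtained through the two routes.
same <- identical(lazy_copy, scratch$mtcars)
```

Loading into a separate environment is a design choice for the sketch only; in interactive use, data(mtcars) simply places the object in your workspace.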
Note the use of two additional tags that are particularly useful for documenting data:
@format, which gives an overview of the structure of the dataset. This should include a definition list that describes each variable. There’s currently no way to generate this with Markdown, so this is one of the few places you’ll need to use Rd markup directly.
@source, which records where you got the data from, often a URL.
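As a rough template for your own data, the block below applies both tags to a small dataset. The dataset reaction_times, its columns, and its values are hypothetical and invented for illustration. It is built inline here only so that the example is self-contained; in a real package the object would be saved in data/ and only the roxygen2 block would live in R/.

```r
# Hypothetical dataset, created inline so the example runs on its own.
reaction_times <- data.frame(
  subject = c("s01", "s02", "s03"),
  rt_ms   = c(412, 388, 451)
)

#' Reaction times from a small pilot study
#'
#' @format A data frame with 3 rows and 2 variables:
#' \describe{
#'   \item{subject}{participant identifier}
#'   \item{rt_ms}{response time in milliseconds}
#' }
#' @source Simulated values for illustration only; for real data, give a
#'   URL or citation here.
"reaction_times"
```

The row and variable counts in @format are written by hand, so they need to be updated whenever the data changes.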
