R Guide

Applies to: R 4.1+, Statistical Computing, Data Analysis, R Packages, Shiny Apps

Core Principles

Tidyverse First: Use tidyverse conventions for data manipulation, visualization, and functional programming; fall back to base R only when performance demands it
Vectorize Everything: Prefer vectorized operations and purrr::map() over explicit for loops; R is optimized for vector operations
Reproducibility: Every analysis must be reproducible -- use renv for dependency management, set.seed() for stochastic operations, and R Markdown/Quarto for literate programming
Functional Style: Write pure functions with no side effects; avoid modifying global state or relying on .GlobalEnv
Explicit Over Implicit: No reliance on partial matching, implicit type coercion, or positional argument passing for non-trivial functions

Guardrails

Version & Dependencies

Target R 4.1+ (native pipe |>, lambda shorthand \(x))
Manage dependencies with renv -- always commit renv.lock
For packages, declare all dependencies in DESCRIPTION (Imports:, Suggests:)
Pin CRAN snapshot dates in renv for full reproducibility
Audit new dependencies: check CRAN status, reverse dependencies, license (GPL compatibility)

Code Style

Follow the tidyverse style guide
Run styler::style_pkg() and lintr::lint_package() before every commit
Naming: snake_case for functions/variables, PascalCase for R6/S4 classes
Max line length: 80 characters
Use <- for assignment (not = outside function arguments)
Explicit library() at top of scripts; never use require()
Always use TRUE/FALSE (never T/F -- they can be overwritten)
No attach() or setwd() -- use here::here() for project-relative paths

Vectorization

Prefer vectorized operations: x * 2 not for (i in seq_along(x)) x[i] * 2
Use dplyr::mutate() / dplyr::summarise() for column-wise transformations
Use purrr::map() family for list iteration (map_dbl(), map_chr(), map_dfr())
Use dplyr::across() for applying functions to multiple columns
Reserve for loops for side effects only (writing files, API calls)
Use vapply() over sapply() when base R is required (explicit return type)

Error Handling

Use rlang::abort() / cli::cli_abort() over stop() for structured conditions
Validate inputs at the start of every exported function
Use stopifnot() or rlang::arg_match() for argument validation
Never use try() -- always tryCatch() or purrr::safely()

validate_dataframe <- function(df, required_cols) {
  if (!is.data.frame(df)) {
    cli::cli_abort("{.arg df} must be a data frame, not {.obj_type_friendly {df}}.")
  }
  missing_cols <- setdiff(required_cols, names(df))
  if (length(missing_cols) > 0) {
    cli::cli_abort(
      "Missing required column{?s}: {.field {missing_cols}}.",
      class = "validation_error"
    )
  }
  invisible(df)
}

Reproducibility

Always use set.seed() before stochastic operations; document the seed
Use renv::snapshot() after adding or updating packages
Never use absolute paths -- use here::here() for project-relative paths
Use R Markdown (.Rmd) or Quarto (.qmd) for analysis reports
Include sessioninfo::session_info() at the end of reports

Project Structure

mypackage/                          myanalysis/
├── R/             # Source files   ├── R/             # Reusable functions
│   ├── data-clean.R                ├── analysis/      # Rmd/Quarto (numbered)
│   └── utils.R                     │   ├── 01-exploration.Rmd
├── tests/                          │   └── 02-modeling.qmd
│   ├── testthat.R  # Runner        ├── data/
│   └── testthat/                   │   ├── raw/       # Immutable input
│       └── test-data-clean.R       │   └── processed/ # Generated output
├── man/            # roxygen2      ├── output/        # Figures, reports
├── vignettes/                      ├── tests/testthat/
├── data-raw/       # Data scripts  ├── renv.lock
├── DESCRIPTION                     └── README.md
├── NAMESPACE       # roxygen2
├── renv.lock
└── README.md

Use roxygen2 for all docs; never edit man/ or NAMESPACE by hand
Raw data is immutable -- store in data/raw/, process into data/processed/

Key Patterns

Tidyverse Pipe Chains

# Prefer native pipe |> (R 4.1+) over magrittr %>%
result <- raw_data |>
  dplyr::filter(year >= 2020, !is.na(revenue)) |>
  dplyr::mutate(
    revenue_m = revenue / 1e6,
    growth = (revenue - dplyr::lag(revenue)) / dplyr::lag(revenue)
  ) |>
  dplyr::summarise(
    mean_revenue = mean(revenue_m, na.rm = TRUE),
    .by = region
  )

Tidy Evaluation

# Use {{ }} (embrace) for column names passed as arguments
summarise_by <- function(df, group_col, value_col) {
  df |>
    dplyr::summarise(
      mean_val = mean({{ value_col }}, na.rm = TRUE),
      n = dplyr::n(),
      .by = {{ group_col }}
    )
}

# Use .data pronoun for string column references
filter_column <- function(df, col_name, threshold) {
  df |> dplyr::filter(.data[[col_name]] > threshold)
}

# Use across() for multiple columns
standardize_numeric <- function(df) {
  df |>
    dplyr::mutate(dplyr::across(
      where(is.numeric),
      \(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
    ))
}

ggplot2 Grammar of Graphics

plot_distribution <- function(df, x_col, fill_col = NULL) {
  ggplot2::ggplot(df, ggplot2::aes(x = {{ x_col }})) +
    ggplot2::geom_histogram(ggplot2::aes(fill = {{ fill_col }}), bins = 30, alpha = 0.7) +
    ggplot2::labs(title = "Distribution", x = NULL, y = "Count") +
    ggplot2::theme_minimal(base_size = 14)
}

Functional Programming with purrr

# Type-stable map variants -- read and combine CSV files
results <- purrr::map_dfr(file_paths, \(path) {
  readr::read_csv(path, show_col_types = FALSE) |>
    dplyr::mutate(source_file = basename(path))
})

# Safe execution -- capture errors without stopping
safe_read <- purrr::safely(readr::read_csv)
reads <- purrr::map(file_paths, safe_read)
successes <- purrr::map(purrr::keep(reads, \(x) is.null(x$error)), "result")

Testing

Standards

Use testthat 3rd edition (Config/testthat/edition: 3 in DESCRIPTION)
Test files: test-*.R (mirror source: data-clean.R -> test-data-clean.R)
Test names describe behavior: test_that("filter_active removes inactive users", ...)
Coverage target: >80% for business logic, >60% overall (measured with covr)
Use snapshot tests (expect_snapshot()) for complex output (plots, printed tables)
No test interdependencies -- each test_that() block is self-contained
Use withr::local_*() for temporary state changes (env vars, options, files)

testthat Examples

test_that("summarise_by computes correct group means", {
  df <- tibble::tibble(
    region = c("east", "east", "west", "west"),
    revenue = c(100, 200, 300, 400)
  )
  result <- summarise_by(df, region, revenue)
  expect_equal(nrow(result), 2)
  expect_equal(result$mean_val[result$region == "east"], 150)
})

test_that("validate_dataframe errors on missing columns", {
  df <- tibble::tibble(a = 1, b = 2)
  expect_error(validate_dataframe(df, c("a", "c")), class = "validation_error")
})

Tooling

Essential Commands

Rscript -e 'styler::style_pkg()'        # Format package code
Rscript -e 'lintr::lint_package()'       # Lint package
Rscript -e 'devtools::test()'            # Run tests
Rscript -e 'covr::package_coverage()'    # Coverage report
Rscript -e 'devtools::check()'           # Full R CMD check
Rscript -e 'renv::snapshot()'            # Lock dependencies
Rscript -e 'devtools::document()'        # Rebuild roxygen2 docs
quarto render analysis/report.qmd        # Render Quarto document

References

For detailed patterns and examples, see:

references/patterns.md -- dplyr pipelines, ggplot2 recipes, purrr functional patterns

External References

Tidyverse Style Guide
R for Data Science (2e)
Advanced R (2e)
R Packages (2e)
Tidy Evaluation
testthat 3e Documentation
ggplot2 Documentation
renv Documentation
Quarto Guide
lintr Documentation

R Guide

Applies to: R 4.1+, Statistical Computing, Data Analysis, R Packages, Shiny Apps

Core Principles

Tidyverse First: Use tidyverse conventions for data manipulation, visualization, and functional programming; fall back to base R only when performance demands it
Vectorize Everything: Prefer vectorized operations and purrr::map() over explicit for loops; R is optimized for vector operations
Reproducibility: Every analysis must be reproducible -- use renv for dependency management, set.seed() for stochastic operations, and R Markdown/Quarto for literate programming
Functional Style: Write pure functions with no side effects; avoid modifying global state or relying on .GlobalEnv
Explicit Over Implicit: No reliance on partial matching, implicit type coercion, or positional argument passing for non-trivial functions

Guardrails

Version & Dependencies

Target R 4.1+ (native pipe |>, lambda shorthand \(x))
Manage dependencies with renv -- always commit renv.lock
For packages, declare all dependencies in DESCRIPTION (Imports:, Suggests:)
Pin CRAN snapshot dates in renv for full reproducibility
Audit new dependencies: check CRAN status, reverse dependencies, license (GPL compatibility)

Code Style

Follow the tidyverse style guide
Run styler::style_pkg() and lintr::lint_package() before every commit
Naming: snake_case for functions/variables, PascalCase for R6/S4 classes
Max line length: 80 characters
Use <- for assignment (not = outside function arguments)
Explicit library() at top of scripts; never use require()
Always use TRUE/FALSE (never T/F -- they can be overwritten)
No attach() or setwd() -- use here::here() for project-relative paths

Vectorization

Prefer vectorized operations: x * 2 not for (i in seq_along(x)) x[i] * 2
Use dplyr::mutate() / dplyr::summarise() for column-wise transformations
Use purrr::map() family for list iteration (map_dbl(), map_chr(), map_dfr())
Use dplyr::across() for applying functions to multiple columns
Reserve for loops for side effects only (writing files, API calls)
Use vapply() over sapply() when base R is required (explicit return type)

Error Handling

Use rlang::abort() / cli::cli_abort() over stop() for structured conditions
Validate inputs at the start of every exported function
Use stopifnot() or rlang::arg_match() for argument validation
Never use try() -- always tryCatch() or purrr::safely()

validate_dataframe <- function(df, required_cols) {
  if (!is.data.frame(df)) {
    cli::cli_abort("{.arg df} must be a data frame, not {.obj_type_friendly {df}}.")
  }
  missing_cols <- setdiff(required_cols, names(df))
  if (length(missing_cols) > 0) {
    cli::cli_abort(
      "Missing required column{?s}: {.field {missing_cols}}.",
      class = "validation_error"
    )
  }
  invisible(df)
}

Reproducibility

Always use set.seed() before stochastic operations; document the seed
Use renv::snapshot() after adding or updating packages
Never use absolute paths -- use here::here() for project-relative paths
Use R Markdown (.Rmd) or Quarto (.qmd) for analysis reports
Include sessioninfo::session_info() at the end of reports

Project Structure

mypackage/                          myanalysis/
├── R/             # Source files   ├── R/             # Reusable functions
│   ├── data-clean.R                ├── analysis/      # Rmd/Quarto (numbered)
│   └── utils.R                     │   ├── 01-exploration.Rmd
├── tests/                          │   └── 02-modeling.qmd
│   ├── testthat.R  # Runner        ├── data/
│   └── testthat/                   │   ├── raw/       # Immutable input
│       └── test-data-clean.R       │   └── processed/ # Generated output
├── man/            # roxygen2      ├── output/        # Figures, reports
├── vignettes/                      ├── tests/testthat/
├── data-raw/       # Data scripts  ├── renv.lock
├── DESCRIPTION                     └── README.md
├── NAMESPACE       # roxygen2
├── renv.lock
└── README.md

Use roxygen2 for all docs; never edit man/ or NAMESPACE by hand
Raw data is immutable -- store in data/raw/, process into data/processed/

Key Patterns

Tidyverse Pipe Chains

# Prefer native pipe |> (R 4.1+) over magrittr %>%
result <- raw_data |>
  dplyr::filter(year >= 2020, !is.na(revenue)) |>
  dplyr::mutate(
    revenue_m = revenue / 1e6,
    growth = (revenue - dplyr::lag(revenue)) / dplyr::lag(revenue)
  ) |>
  dplyr::summarise(
    mean_revenue = mean(revenue_m, na.rm = TRUE),
    .by = region
  )

Tidy Evaluation

# Use {{ }} (embrace) for column names passed as arguments
summarise_by <- function(df, group_col, value_col) {
  df |>
    dplyr::summarise(
      mean_val = mean({{ value_col }}, na.rm = TRUE),
      n = dplyr::n(),
      .by = {{ group_col }}
    )
}

# Use .data pronoun for string column references
filter_column <- function(df, col_name, threshold) {
  df |> dplyr::filter(.data[[col_name]] > threshold)
}

# Use across() for multiple columns
standardize_numeric <- function(df) {
  df |>
    dplyr::mutate(dplyr::across(
      where(is.numeric),
      \(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
    ))
}

ggplot2 Grammar of Graphics

plot_distribution <- function(df, x_col, fill_col = NULL) {
  ggplot2::ggplot(df, ggplot2::aes(x = {{ x_col }})) +
    ggplot2::geom_histogram(ggplot2::aes(fill = {{ fill_col }}), bins = 30, alpha = 0.7) +
    ggplot2::labs(title = "Distribution", x = NULL, y = "Count") +
    ggplot2::theme_minimal(base_size = 14)
}

Functional Programming with purrr

# Type-stable map variants -- read and combine CSV files
results <- purrr::map_dfr(file_paths, \(path) {
  readr::read_csv(path, show_col_types = FALSE) |>
    dplyr::mutate(source_file = basename(path))
})

# Safe execution -- capture errors without stopping
safe_read <- purrr::safely(readr::read_csv)
reads <- purrr::map(file_paths, safe_read)
successes <- purrr::map(purrr::keep(reads, \(x) is.null(x$error)), "result")

Testing

Standards

Use testthat 3rd edition (Config/testthat/edition: 3 in DESCRIPTION)
Test files: test-*.R (mirror source: data-clean.R -> test-data-clean.R)
Test names describe behavior: test_that("filter_active removes inactive users", ...)
Coverage target: >80% for business logic, >60% overall (measured with covr)
Use snapshot tests (expect_snapshot()) for complex output (plots, printed tables)
No test interdependencies -- each test_that() block is self-contained
Use withr::local_*() for temporary state changes (env vars, options, files)

testthat Examples

test_that("summarise_by computes correct group means", {
  df <- tibble::tibble(
    region = c("east", "east", "west", "west"),
    revenue = c(100, 200, 300, 400)
  )
  result <- summarise_by(df, region, revenue)
  expect_equal(nrow(result), 2)
  expect_equal(result$mean_val[result$region == "east"], 150)
})

test_that("validate_dataframe errors on missing columns", {
  df <- tibble::tibble(a = 1, b = 2)
  expect_error(validate_dataframe(df, c("a", "c")), class = "validation_error")
})

Tooling

Essential Commands

Rscript -e 'styler::style_pkg()'        # Format package code
Rscript -e 'lintr::lint_package()'       # Lint package
Rscript -e 'devtools::test()'            # Run tests
Rscript -e 'covr::package_coverage()'    # Coverage report
Rscript -e 'devtools::check()'           # Full R CMD check
Rscript -e 'renv::snapshot()'            # Lock dependencies
Rscript -e 'devtools::document()'        # Rebuild roxygen2 docs
quarto render analysis/report.qmd        # Render Quarto document

References

For detailed patterns and examples, see:

references/patterns.md -- dplyr pipelines, ggplot2 recipes, purrr functional patterns

External References

Tidyverse Style Guide
R for Data Science (2e)
Advanced R (2e)
R Packages (2e)
Tidy Evaluation
testthat 3e Documentation
ggplot2 Documentation
renv Documentation
Quarto Guide
lintr Documentation

Adoption

ar4mirez/r-guide

$ install --global

Security Scan Results

SKILL.md

R Guide

Core Principles

Guardrails

Version & Dependencies

Code Style

Vectorization

Error Handling

Reproducibility

Project Structure

Key Patterns

Tidyverse Pipe Chains

Tidy Evaluation

ggplot2 Grammar of Graphics

Functional Programming with purrr

Testing

Standards

testthat Examples

Tooling

Essential Commands

References

External References

Related Skills

ar4mirez/zig-guide

ar4mirez/wordpress

ar4mirez/webapp-testing

ar4mirez/web-artifacts-builder

ar4mirez/r-guide

$ install --global

Security Scan Results

SKILL.md

R Guide

Core Principles

Guardrails

Version & Dependencies

Code Style

Vectorization

Error Handling

Reproducibility

Project Structure

Key Patterns

Tidyverse Pipe Chains

Tidy Evaluation

ggplot2 Grammar of Graphics

Functional Programming with purrr

Testing

Standards

testthat Examples

Tooling

Essential Commands

References

External References

Related Skills

ar4mirez/zig-guide

ar4mirez/wordpress

ar4mirez/webapp-testing

ar4mirez/web-artifacts-builder