R Research Package Development

When to Use This Skill

Use this skill when you need to:

Create an R package for research statistical methods
Write roxygen2 documentation with parameter descriptions and examples
Test R functions with testthat framework
Build and check packages for CRAN submission
Create vignettes demonstrating package usage
Set up continuous integration for R packages (R CMD check)
Interface R packages with Python using rpy2

Trigger keywords: R package, devtools, roxygen2, testthat, CRAN, BioConductor, R CMD check, DESCRIPTION file, NAMESPACE, vignette, pkgdown, usethis, rextendr, S3 class, S4 class, R6 class, rpy2, reticulate, statistical package, CRAN submission.

Background & Key Concepts

R Package Structure

mypackage/
├── DESCRIPTION          # Package metadata
├── NAMESPACE            # Exported functions (auto-generated by roxygen2)
├── R/                   # R source files
│   ├── utils.R
│   └── main_functions.R
├── man/                 # Documentation (auto-generated by roxygen2)
├── tests/
│   └── testthat/
│       └── test-main_functions.R
├── vignettes/
│   └── introduction.Rmd
└── inst/
    └── extdata/         # Example data files

roxygen2 Documentation Tags

@title, @description: Function title/description
@param name desc: Parameter documentation
@return: Return value description
@examples: Executable examples
@export: Add to NAMESPACE (public function)
@importFrom pkg fun: Import from dependency
@seealso, @references: Cross-references

S3 vs S4 vs R6

S3: Simple, list-based. class(x) <- "myclass", print.myclass <- function(x, ...)
S4: Formal. setClass(), setMethod(), slot access via @
R6: Reference semantics (mutable). Ideal for stateful objects.

DESCRIPTION File

Key fields:

Package, Title, Description, Version
Authors@R: ORCID-linked author metadata
Depends, Imports, Suggests
License, Encoding, LazyData

Environment Setup

# Install development tools (run in R)
# install.packages(c("devtools", "roxygen2", "testthat", "usethis", "pkgdown"))

# Python interface (optional)
pip install rpy2>=3.5 pandas>=2.0 numpy>=1.24

# Verify R tools
Rscript -e "packageVersion('devtools')"
Rscript -e "packageVersion('testthat')"

Core Workflow

Step 1: Create Package Scaffolding

# Run in R console or Rscript
library(usethis)
library(devtools)

# Create new package
usethis::create_package("~/research/mystatpack")
setwd("~/research/mystatpack")

# Set up testing
usethis::use_testthat()

# Set up version control
usethis::use_git()

# Set up CI (GitHub Actions)
usethis::use_github_action("check-standard")

# Add a license
usethis::use_mit_license()

# Create vignette template
usethis::use_vignette("introduction")

# File: DESCRIPTION
Package: mystatpack
Title: Statistical Methods for Research Analysis
Version: 0.1.0
Authors@R:
    person("Research", "Team",
           email = "[email protected]",
           role = c("aut", "cre"),
           comment = c(ORCID = "0000-0001-2345-6789"))
Description: Provides functions for common statistical analyses in
    academic research, including effect size computation, power
    analysis, and reproducible reporting utilities.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.2.3
Imports:
    stats,
    utils
Suggests:
    testthat (>= 3.0.0),
    knitr,
    rmarkdown
Config/testthat/edition: 3

Step 2: Write Documented R Functions

# File: R/effect_size.R

#' Compute Effect Size Between Two Groups
#'
#' @description
#' Calculates Cohen's d, Glass's delta, or Hedges' g effect size for
#' comparing means between two independent groups. Includes bootstrap
#' confidence intervals when \code{ci = TRUE}.
#'
#' @param group1 Numeric vector. Observations from the first group.
#' @param group2 Numeric vector. Observations from the second group.
#' @param method Character string. Effect size measure:
#'   \code{"cohens_d"} (default), \code{"glass_delta"}, or \code{"hedges_g"}.
#' @param ci Logical. Compute bootstrap confidence interval? Default \code{FALSE}.
#' @param n_boot Integer. Number of bootstrap resamples if \code{ci = TRUE}. Default 1000.
#' @param alpha Numeric. Significance level for CI. Default 0.05.
#'
#' @return An object of class \code{"effect_size"} with components:
#' \describe{
#'   \item{estimate}{Point estimate of the effect size.}
#'   \item{method}{Effect size method used.}
#'   \item{ci_lower}{Lower CI bound (if \code{ci = TRUE}, else \code{NA}).}
#'   \item{ci_upper}{Upper CI bound (if \code{ci = TRUE}, else \code{NA}).}
#'   \item{n1}{Sample size of group 1.}
#'   \item{n2}{Sample size of group 2.}
#'   \item{interpretation}{Qualitative interpretation following Cohen (1988).}
#' }
#'
#' @references
#' Cohen, J. (1988). \emph{Statistical Power Analysis for the Behavioral Sciences}
#' (2nd ed.). Lawrence Erlbaum Associates.
#'
#' @seealso \code{\link{power_analysis}}, \code{\link{cohens_d_ci}}
#'
#' @examples
#' set.seed(42)
#' g1 <- rnorm(50, mean = 10, sd = 2)
#' g2 <- rnorm(50, mean = 8, sd = 2)
#'
#' # Cohen's d
#' result <- compute_effect_size(g1, g2)
#' print(result)
#'
#' # Hedges' g with bootstrap CI
#' result_g <- compute_effect_size(g1, g2, method = "hedges_g", ci = TRUE)
#' summary(result_g)
#'
#' @export
compute_effect_size <- function(
    group1,
    group2,
    method = c("cohens_d", "glass_delta", "hedges_g"),
    ci = FALSE,
    n_boot = 1000L,
    alpha = 0.05
) {
  method <- match.arg(method)

  # Input validation
  if (!is.numeric(group1) || !is.numeric(group2)) {
    stop("'group1' and 'group2' must be numeric vectors.", call. = FALSE)
  }
  if (length(group1) < 2 || length(group2) < 2) {
    stop("Both groups must have at least 2 observations.", call. = FALSE)
  }
  if (alpha <= 0 || alpha >= 1) {
    stop("'alpha' must be between 0 and 1 exclusive.", call. = FALSE)
  }

  # Compute effect size
  estimate <- .es_compute(group1, group2, method)

  # Bootstrap CI
  ci_lower <- NA_real_
  ci_upper <- NA_real_
  if (ci) {
    boot_estimates <- vapply(
      seq_len(n_boot),
      function(i) {
        b1 <- sample(group1, replace = TRUE)
        b2 <- sample(group2, replace = TRUE)
        .es_compute(b1, b2, method)
      },
      numeric(1L)
    )
    ci_lower <- stats::quantile(boot_estimates, alpha / 2, names = FALSE)
    ci_upper <- stats::quantile(boot_estimates, 1 - alpha / 2, names = FALSE)
  }

  # Interpret
  abs_d <- abs(estimate)
  interpretation <- dplyr_like_case(
    abs_d < 0.2  ~ "negligible",
    abs_d < 0.5  ~ "small",
    abs_d < 0.8  ~ "medium",
    TRUE         ~ "large"
  )

  structure(
    list(
      estimate       = estimate,
      method         = method,
      ci_lower       = ci_lower,
      ci_upper       = ci_upper,
      n1             = length(group1),
      n2             = length(group2),
      interpretation = interpretation
    ),
    class = "effect_size"
  )
}

# Internal helper (not exported)
.es_compute <- function(g1, g2, method) {
  n1 <- length(g1); n2 <- length(g2)
  mean_diff <- mean(g1) - mean(g2)

  switch(method,
    cohens_d = {
      pooled_sd <- sqrt(((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2))
      mean_diff / pooled_sd
    },
    glass_delta = mean_diff / sd(g2),
    hedges_g = {
      d <- .es_compute(g1, g2, "cohens_d")
      j <- 1 - 3 / (4 * (n1 + n2 - 2) - 1)
      d * j
    }
  )
}

# Workaround for dplyr::case_when without dependency
dplyr_like_case <- function(...) {
  cases <- list(...)
  for (formula in cases) {
    cond <- eval(formula[[2]])
    if (isTRUE(cond)) return(eval(formula[[3]]))
  }
  NA_character_
}

#' @export
print.effect_size <- function(x, ...) {
  cat(sprintf(
    "Effect Size (%s)\n  Estimate: %.3f [%s]\n  n1 = %d, n2 = %d\n",
    x$method, x$estimate, x$interpretation, x$n1, x$n2
  ))
  if (!is.na(x$ci_lower)) {
    cat(sprintf("  95%% CI: [%.3f, %.3f]\n", x$ci_lower, x$ci_upper))
  }
  invisible(x)
}

#' @export
summary.effect_size <- function(object, ...) {
  print(object, ...)
  cat(sprintf(
    "  Interpretation: %s effect (%s, 1988)\n",
    object$interpretation, "Cohen"
  ))
  invisible(object)
}

# File: R/power_analysis.R

#' Sample Size Determination for Two-Sample t-Test
#'
#' @description
#' Computes the required sample size per group to achieve specified
#' statistical power for detecting a given effect size.
#'
#' @param d Numeric. Expected Cohen's d effect size.
#' @param power Numeric. Desired statistical power (default 0.80).
#' @param alpha Numeric. Significance level (default 0.05, two-tailed).
#' @param ratio Numeric. Ratio n2/n1 for unequal group sizes (default 1).
#'
#' @return Named numeric vector with \code{n1}, \code{n2}, and \code{total}.
#'
#' @examples
#' # Power for medium effect, 80% power, alpha = 0.05
#' power_analysis(d = 0.5)
#'
#' # Small effect, 90% power
#' power_analysis(d = 0.2, power = 0.90)
#'
#' @export
power_analysis <- function(d, power = 0.80, alpha = 0.05, ratio = 1) {
  if (d <= 0) stop("'d' must be positive.")
  if (power <= 0 || power >= 1) stop("'power' must be in (0, 1).")

  z_alpha <- stats::qnorm(1 - alpha / 2)
  z_beta  <- stats::qnorm(power)

  # Derivation for unequal samples:
  # n1 = (z_alpha + z_beta)^2 * (1 + 1/ratio) / d^2
  n1 <- ceiling((z_alpha + z_beta)^2 * (1 + 1 / ratio) / d^2)
  n2 <- ceiling(n1 * ratio)

  c(n1 = n1, n2 = n2, total = n1 + n2)
}

Step 3: testthat Unit Tests

# File: tests/testthat/test-effect_size.R

library(testthat)

test_that("compute_effect_size returns correct class", {
  set.seed(42)
  g1 <- rnorm(50, mean = 10, sd = 2)
  g2 <- rnorm(50, mean = 8, sd = 2)
  result <- compute_effect_size(g1, g2)
  expect_s3_class(result, "effect_size")
})

test_that("Cohen's d is approximately 1.0 for known groups", {
  set.seed(42)
  n <- 1000
  g1 <- rnorm(n, mean = 2, sd = 1)  # true d = (2-0)/1 = 2
  g2 <- rnorm(n, mean = 0, sd = 1)
  result <- compute_effect_size(g1, g2, method = "cohens_d")
  expect_equal(result$estimate, 2, tolerance = 0.1)
})

test_that("Reversing groups negates effect size", {
  set.seed(42)
  g1 <- rnorm(50, mean = 10, sd = 2)
  g2 <- rnorm(50, mean = 8, sd = 2)
  d_fwd <- compute_effect_size(g1, g2)$estimate
  d_rev <- compute_effect_size(g2, g1)$estimate
  expect_equal(d_fwd, -d_rev, tolerance = 1e-10)
})

test_that("Hedges' g is smaller than Cohen's d", {
  set.seed(42)
  g1 <- rnorm(20, mean = 10, sd = 2)
  g2 <- rnorm(20, mean = 8, sd = 2)
  d <- abs(compute_effect_size(g1, g2, method = "cohens_d")$estimate)
  g <- abs(compute_effect_size(g1, g2, method = "hedges_g")$estimate)
  expect_lt(g, d)
})

test_that("Bootstrap CI has correct coverage", {
  set.seed(42)
  g1 <- rnorm(100, mean = 10, sd = 2)
  g2 <- rnorm(100, mean = 8, sd = 2)
  result <- compute_effect_size(g1, g2, ci = TRUE, n_boot = 200)
  expect_true(!is.na(result$ci_lower))
  expect_true(result$ci_lower < result$estimate)
  expect_true(result$ci_upper > result$estimate)
})

test_that("Invalid inputs raise errors", {
  expect_error(compute_effect_size("a", c(1, 2, 3)), "numeric")
  expect_error(compute_effect_size(c(1), c(1, 2, 3)), "at least 2")
  expect_error(compute_effect_size(c(1, 2), c(1, 2), alpha = 1.5), "alpha")
})

test_that("power_analysis gives sensible n for d=0.5", {
  result <- power_analysis(d = 0.5, power = 0.80, alpha = 0.05)
  expect_true(result["n1"] >= 60 && result["n1"] <= 70)
})

test_that("All effect size methods return numeric", {
  g1 <- c(8, 9, 10, 11, 12)
  g2 <- c(6, 7, 8, 9, 10)
  for (m in c("cohens_d", "glass_delta", "hedges_g")) {
    result <- compute_effect_size(g1, g2, method = m)
    expect_type(result$estimate, "double")
  }
})

# Run all tests
Rscript -e "devtools::test()"

# Run R CMD check (full CRAN check)
Rscript -e "devtools::check()"

# Install package
Rscript -e "devtools::install()"

# Build documentation
Rscript -e "devtools::document()"

# Build pkgdown website
Rscript -e "pkgdown::build_site()"

Advanced Usage

rpy2: Call R Package from Python

import os
import numpy as np
import pandas as pd

# Set up rpy2 environment
os.environ["R_HOME"] = os.popen("R RHOME").read().strip()

try:
    import rpy2.robjects as ro
    from rpy2.robjects import numpy2ri, pandas2ri
    from rpy2.robjects.packages import importr
    numpy2ri.activate()
    pandas2ri.activate()

    # Install package if needed (run once)
    # utils = importr("utils")
    # utils.install_packages("mystatpack")

    # Import the R package
    # mystatpack = importr("mystatpack")

    # Call R function from Python
    # g1_py = np.random.normal(10, 2, 50)
    # g2_py = np.random.normal(8, 2, 50)
    # result_r = mystatpack.compute_effect_size(g1_py, g2_py)
    # print(f"Cohen's d (via rpy2): {result_r.rx2('estimate')[0]:.3f}")

    # Use base R stats functions
    stats = importr("stats")
    g1 = np.random.normal(10, 2, 50)
    g2 = np.random.normal(8, 2, 50)
    t_test = stats.t_test(g1, g2, alternative="two.sided")
    print(f"t-test via rpy2: t = {t_test.rx2('statistic')[0]:.3f}, "
          f"p = {t_test.rx2('p.value')[0]:.4f}")

except ImportError:
    print("rpy2 not available — using Python statsmodels instead")
    import scipy.stats as stats
    np.random.seed(42)
    g1 = np.random.normal(10, 2, 50)
    g2 = np.random.normal(8, 2, 50)
    t_stat, p_val = stats.ttest_ind(g1, g2)
    print(f"t-test (scipy): t = {t_stat:.3f}, p = {p_val:.4f}")

Package Vignette Template

# File: vignettes/introduction.Rmd

---
title: "Introduction to mystatpack"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to mystatpack}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

\`\`\`{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, fig.width = 7, fig.height = 4)
library(mystatpack)
set.seed(42)
\`\`\`

## Basic Usage

### Computing Effect Sizes

\`\`\`{r effect-size-example}
# Generate two groups
group1 <- rnorm(50, mean = 10, sd = 2)
group2 <- rnorm(50, mean = 8, sd = 2)

# Cohen's d
result <- compute_effect_size(group1, group2)
print(result)
\`\`\`

### Power Analysis

\`\`\`{r power-example}
# Required sample size for medium effect (d=0.5), 80% power
n_required <- power_analysis(d = 0.5)
cat("Required n per group:", n_required["n1"], "\n")
cat("Total sample size:   ", n_required["total"], "\n")
\`\`\`

## Advanced Usage

### Bootstrap Confidence Intervals

\`\`\`{r bootstrap-ci}
result_ci <- compute_effect_size(
  group1, group2,
  method = "hedges_g",
  ci = TRUE,
  n_boot = 1000
)
summary(result_ci)
\`\`\`

Troubleshooting

| Problem | Cause | Fix | |---------|-------|-----| | R CMD check: no visible binding | Undeclared variable in function | Use utils::globalVariables() or add :: prefix | | roxygen2 doesn't generate NAMESPACE | Forgot @export tag | Add @export to all public functions | | Test fails with testthat edition error | Edition mismatch | Add Config/testthat/edition: 3 to DESCRIPTION | | rpy2 import error | R not found | Set R_HOME to R RHOME output | | pkgdown fails | Missing GitHub Pages setup | Use usethis::use_pkgdown_github_pages() | | CRAN check warning: missing examples | @examples not runnable | Wrap time-consuming examples in \dontrun{} |

External Resources

R Packages (2e) by Hadley Wickham
Writing R Extensions (CRAN)
roxygen2 documentation
testthat documentation
rpy2 documentation
pkgdown — automated R package website

Examples

Example 1: S3 Class for Statistical Models

# R: Define a custom S3 class for regression results

#' Fit a Robust Linear Model
#'
#' @param formula R formula object
#' @param data Data frame
#' @return Object of class "robust_lm"
#' @export
robust_lm <- function(formula, data) {
  fit <- lm(formula, data = data)
  res <- residuals(fit)

  # HC3 robust standard errors (manual implementation)
  X <- model.matrix(fit)
  n <- nrow(X); k <- ncol(X)
  leverage <- hat(X)
  e2 <- res^2 / (1 - leverage)^2
  meat <- crossprod(X * sqrt(e2))
  bread <- solve(crossprod(X))
  vcov_hc3 <- n / (n - k) * bread %*% meat %*% bread

  structure(
    list(
      fit       = fit,
      vcov_hc3  = vcov_hc3,
      coef      = coef(fit),
      se_hc3    = sqrt(diag(vcov_hc3))
    ),
    class = "robust_lm"
  )
}

#' @export
print.robust_lm <- function(x, ...) {
  cat("Robust Linear Model (HC3 SE)\n")
  coef_table <- data.frame(
    Estimate = x$coef,
    `HC3 SE` = x$se_hc3,
    `t value` = x$coef / x$se_hc3,
    check.names = FALSE
  )
  print(coef_table)
  invisible(x)
}

Example 2: Python Interface via reticulate

# Python script that calls R statistical routines via rpy2

import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

def compare_groups(g1, g2, paired=False):
    """Compare two groups using multiple tests.

    Provides results from both Python (scipy) implementations.
    In a full setup, would also call R via rpy2 for consistency checks.

    Args:
        g1, g2: numpy arrays of observations
        paired: whether groups are paired
    Returns:
        dict of test results
    """
    g1, g2 = np.asarray(g1), np.asarray(g2)
    t_stat, t_p = ttest_ind(g1, g2, equal_var=False)  # Welch's t-test
    u_stat, u_p = mannwhitneyu(g1, g2, alternative="two-sided")

    # Cohen's d
    pooled_sd = np.sqrt((np.var(g1, ddof=1) + np.var(g2, ddof=1)) / 2)
    d = (np.mean(g1) - np.mean(g2)) / pooled_sd

    return {
        "t_stat": t_stat, "t_p": t_p,
        "u_stat": u_stat, "u_p": u_p,
        "cohens_d": d,
        "n1": len(g1), "n2": len(g2),
    }

np.random.seed(42)
g1 = np.random.normal(10, 2, 50)
g2 = np.random.normal(8, 2, 50)
results = compare_groups(g1, g2)
print("Group comparison results:")
for k, v in results.items():
    print(f"  {k}: {v:.4f}" if isinstance(v, float) else f"  {k}: {v}")

R Research Package Development

When to Use This Skill

Use this skill when you need to:

Create an R package for research statistical methods
Write roxygen2 documentation with parameter descriptions and examples
Test R functions with testthat framework
Build and check packages for CRAN submission
Create vignettes demonstrating package usage
Set up continuous integration for R packages (R CMD check)
Interface R packages with Python using rpy2

Background & Key Concepts

R Package Structure

mypackage/
├── DESCRIPTION          # Package metadata
├── NAMESPACE            # Exported functions (auto-generated by roxygen2)
├── R/                   # R source files
│   ├── utils.R
│   └── main_functions.R
├── man/                 # Documentation (auto-generated by roxygen2)
├── tests/
│   └── testthat/
│       └── test-main_functions.R
├── vignettes/
│   └── introduction.Rmd
└── inst/
    └── extdata/         # Example data files

roxygen2 Documentation Tags

@title, @description: Function title/description
@param name desc: Parameter documentation
@return: Return value description
@examples: Executable examples
@export: Add to NAMESPACE (public function)
@importFrom pkg fun: Import from dependency
@seealso, @references: Cross-references

S3 vs S4 vs R6

S3: Simple, list-based. class(x) <- "myclass", print.myclass <- function(x, ...)
S4: Formal. setClass(), setMethod(), slot access via @
R6: Reference semantics (mutable). Ideal for stateful objects.

DESCRIPTION File

Key fields:

Package, Title, Description, Version
Authors@R: ORCID-linked author metadata
Depends, Imports, Suggests
License, Encoding, LazyData

Environment Setup

# Install development tools (run in R)
# install.packages(c("devtools", "roxygen2", "testthat", "usethis", "pkgdown"))

# Python interface (optional)
pip install rpy2>=3.5 pandas>=2.0 numpy>=1.24

# Verify R tools
Rscript -e "packageVersion('devtools')"
Rscript -e "packageVersion('testthat')"

Core Workflow

Step 1: Create Package Scaffolding

# Run in R console or Rscript
library(usethis)
library(devtools)

# Create new package
usethis::create_package("~/research/mystatpack")
setwd("~/research/mystatpack")

# Set up testing
usethis::use_testthat()

# Set up version control
usethis::use_git()

# Set up CI (GitHub Actions)
usethis::use_github_action("check-standard")

# Add a license
usethis::use_mit_license()

# Create vignette template
usethis::use_vignette("introduction")

# File: DESCRIPTION
Package: mystatpack
Title: Statistical Methods for Research Analysis
Version: 0.1.0
Authors@R:
    person("Research", "Team",
           email = "[email protected]",
           role = c("aut", "cre"),
           comment = c(ORCID = "0000-0001-2345-6789"))
Description: Provides functions for common statistical analyses in
    academic research, including effect size computation, power
    analysis, and reproducible reporting utilities.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.2.3
Imports:
    stats,
    utils
Suggests:
    testthat (>= 3.0.0),
    knitr,
    rmarkdown
Config/testthat/edition: 3

Step 2: Write Documented R Functions

# File: R/effect_size.R

#' Compute Effect Size Between Two Groups
#'
#' @description
#' Calculates Cohen's d, Glass's delta, or Hedges' g effect size for
#' comparing means between two independent groups. Includes bootstrap
#' confidence intervals when \code{ci = TRUE}.
#'
#' @param group1 Numeric vector. Observations from the first group.
#' @param group2 Numeric vector. Observations from the second group.
#' @param method Character string. Effect size measure:
#'   \code{"cohens_d"} (default), \code{"glass_delta"}, or \code{"hedges_g"}.
#' @param ci Logical. Compute bootstrap confidence interval? Default \code{FALSE}.
#' @param n_boot Integer. Number of bootstrap resamples if \code{ci = TRUE}. Default 1000.
#' @param alpha Numeric. Significance level for CI. Default 0.05.
#'
#' @return An object of class \code{"effect_size"} with components:
#' \describe{
#'   \item{estimate}{Point estimate of the effect size.}
#'   \item{method}{Effect size method used.}
#'   \item{ci_lower}{Lower CI bound (if \code{ci = TRUE}, else \code{NA}).}
#'   \item{ci_upper}{Upper CI bound (if \code{ci = TRUE}, else \code{NA}).}
#'   \item{n1}{Sample size of group 1.}
#'   \item{n2}{Sample size of group 2.}
#'   \item{interpretation}{Qualitative interpretation following Cohen (1988).}
#' }
#'
#' @references
#' Cohen, J. (1988). \emph{Statistical Power Analysis for the Behavioral Sciences}
#' (2nd ed.). Lawrence Erlbaum Associates.
#'
#' @seealso \code{\link{power_analysis}}, \code{\link{cohens_d_ci}}
#'
#' @examples
#' set.seed(42)
#' g1 <- rnorm(50, mean = 10, sd = 2)
#' g2 <- rnorm(50, mean = 8, sd = 2)
#'
#' # Cohen's d
#' result <- compute_effect_size(g1, g2)
#' print(result)
#'
#' # Hedges' g with bootstrap CI
#' result_g <- compute_effect_size(g1, g2, method = "hedges_g", ci = TRUE)
#' summary(result_g)
#'
#' @export
compute_effect_size <- function(
    group1,
    group2,
    method = c("cohens_d", "glass_delta", "hedges_g"),
    ci = FALSE,
    n_boot = 1000L,
    alpha = 0.05
) {
  method <- match.arg(method)

  # Input validation
  if (!is.numeric(group1) || !is.numeric(group2)) {
    stop("'group1' and 'group2' must be numeric vectors.", call. = FALSE)
  }
  if (length(group1) < 2 || length(group2) < 2) {
    stop("Both groups must have at least 2 observations.", call. = FALSE)
  }
  if (alpha <= 0 || alpha >= 1) {
    stop("'alpha' must be between 0 and 1 exclusive.", call. = FALSE)
  }

  # Compute effect size
  estimate <- .es_compute(group1, group2, method)

  # Bootstrap CI
  ci_lower <- NA_real_
  ci_upper <- NA_real_
  if (ci) {
    boot_estimates <- vapply(
      seq_len(n_boot),
      function(i) {
        b1 <- sample(group1, replace = TRUE)
        b2 <- sample(group2, replace = TRUE)
        .es_compute(b1, b2, method)
      },
      numeric(1L)
    )
    ci_lower <- stats::quantile(boot_estimates, alpha / 2, names = FALSE)
    ci_upper <- stats::quantile(boot_estimates, 1 - alpha / 2, names = FALSE)
  }

  # Interpret
  abs_d <- abs(estimate)
  interpretation <- dplyr_like_case(
    abs_d < 0.2  ~ "negligible",
    abs_d < 0.5  ~ "small",
    abs_d < 0.8  ~ "medium",
    TRUE         ~ "large"
  )

  structure(
    list(
      estimate       = estimate,
      method         = method,
      ci_lower       = ci_lower,
      ci_upper       = ci_upper,
      n1             = length(group1),
      n2             = length(group2),
      interpretation = interpretation
    ),
    class = "effect_size"
  )
}

# Internal helper (not exported)
.es_compute <- function(g1, g2, method) {
  n1 <- length(g1); n2 <- length(g2)
  mean_diff <- mean(g1) - mean(g2)

  switch(method,
    cohens_d = {
      pooled_sd <- sqrt(((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2))
      mean_diff / pooled_sd
    },
    glass_delta = mean_diff / sd(g2),
    hedges_g = {
      d <- .es_compute(g1, g2, "cohens_d")
      j <- 1 - 3 / (4 * (n1 + n2 - 2) - 1)
      d * j
    }
  )
}

# Workaround for dplyr::case_when without dependency
dplyr_like_case <- function(...) {
  cases <- list(...)
  for (formula in cases) {
    cond <- eval(formula[[2]])
    if (isTRUE(cond)) return(eval(formula[[3]]))
  }
  NA_character_
}

#' @export
print.effect_size <- function(x, ...) {
  cat(sprintf(
    "Effect Size (%s)\n  Estimate: %.3f [%s]\n  n1 = %d, n2 = %d\n",
    x$method, x$estimate, x$interpretation, x$n1, x$n2
  ))
  if (!is.na(x$ci_lower)) {
    cat(sprintf("  95%% CI: [%.3f, %.3f]\n", x$ci_lower, x$ci_upper))
  }
  invisible(x)
}

#' @export
summary.effect_size <- function(object, ...) {
  print(object, ...)
  cat(sprintf(
    "  Interpretation: %s effect (%s, 1988)\n",
    object$interpretation, "Cohen"
  ))
  invisible(object)
}

# File: R/power_analysis.R

#' Sample Size Determination for Two-Sample t-Test
#'
#' @description
#' Computes the required sample size per group to achieve specified
#' statistical power for detecting a given effect size.
#'
#' @param d Numeric. Expected Cohen's d effect size.
#' @param power Numeric. Desired statistical power (default 0.80).
#' @param alpha Numeric. Significance level (default 0.05, two-tailed).
#' @param ratio Numeric. Ratio n2/n1 for unequal group sizes (default 1).
#'
#' @return Named numeric vector with \code{n1}, \code{n2}, and \code{total}.
#'
#' @examples
#' # Power for medium effect, 80% power, alpha = 0.05
#' power_analysis(d = 0.5)
#'
#' # Small effect, 90% power
#' power_analysis(d = 0.2, power = 0.90)
#'
#' @export
power_analysis <- function(d, power = 0.80, alpha = 0.05, ratio = 1) {
  if (d <= 0) stop("'d' must be positive.")
  if (power <= 0 || power >= 1) stop("'power' must be in (0, 1).")

  z_alpha <- stats::qnorm(1 - alpha / 2)
  z_beta  <- stats::qnorm(power)

  # Derivation for unequal samples:
  # n1 = (z_alpha + z_beta)^2 * (1 + 1/ratio) / d^2
  n1 <- ceiling((z_alpha + z_beta)^2 * (1 + 1 / ratio) / d^2)
  n2 <- ceiling(n1 * ratio)

  c(n1 = n1, n2 = n2, total = n1 + n2)
}

Step 3: testthat Unit Tests

# File: tests/testthat/test-effect_size.R

library(testthat)

test_that("compute_effect_size returns correct class", {
  set.seed(42)
  g1 <- rnorm(50, mean = 10, sd = 2)
  g2 <- rnorm(50, mean = 8, sd = 2)
  result <- compute_effect_size(g1, g2)
  expect_s3_class(result, "effect_size")
})

test_that("Cohen's d is approximately 1.0 for known groups", {
  set.seed(42)
  n <- 1000
  g1 <- rnorm(n, mean = 2, sd = 1)  # true d = (2-0)/1 = 2
  g2 <- rnorm(n, mean = 0, sd = 1)
  result <- compute_effect_size(g1, g2, method = "cohens_d")
  expect_equal(result$estimate, 2, tolerance = 0.1)
})

test_that("Reversing groups negates effect size", {
  set.seed(42)
  g1 <- rnorm(50, mean = 10, sd = 2)
  g2 <- rnorm(50, mean = 8, sd = 2)
  d_fwd <- compute_effect_size(g1, g2)$estimate
  d_rev <- compute_effect_size(g2, g1)$estimate
  expect_equal(d_fwd, -d_rev, tolerance = 1e-10)
})

test_that("Hedges' g is smaller than Cohen's d", {
  set.seed(42)
  g1 <- rnorm(20, mean = 10, sd = 2)
  g2 <- rnorm(20, mean = 8, sd = 2)
  d <- abs(compute_effect_size(g1, g2, method = "cohens_d")$estimate)
  g <- abs(compute_effect_size(g1, g2, method = "hedges_g")$estimate)
  expect_lt(g, d)
})

test_that("Bootstrap CI has correct coverage", {
  set.seed(42)
  g1 <- rnorm(100, mean = 10, sd = 2)
  g2 <- rnorm(100, mean = 8, sd = 2)
  result <- compute_effect_size(g1, g2, ci = TRUE, n_boot = 200)
  expect_true(!is.na(result$ci_lower))
  expect_true(result$ci_lower < result$estimate)
  expect_true(result$ci_upper > result$estimate)
})

test_that("Invalid inputs raise errors", {
  expect_error(compute_effect_size("a", c(1, 2, 3)), "numeric")
  expect_error(compute_effect_size(c(1), c(1, 2, 3)), "at least 2")
  expect_error(compute_effect_size(c(1, 2), c(1, 2), alpha = 1.5), "alpha")
})

test_that("power_analysis gives sensible n for d=0.5", {
  result <- power_analysis(d = 0.5, power = 0.80, alpha = 0.05)
  expect_true(result["n1"] >= 60 && result["n1"] <= 70)
})

test_that("All effect size methods return numeric", {
  g1 <- c(8, 9, 10, 11, 12)
  g2 <- c(6, 7, 8, 9, 10)
  for (m in c("cohens_d", "glass_delta", "hedges_g")) {
    result <- compute_effect_size(g1, g2, method = m)
    expect_type(result$estimate, "double")
  }
})

# Run all tests
Rscript -e "devtools::test()"

# Run R CMD check (full CRAN check)
Rscript -e "devtools::check()"

# Install package
Rscript -e "devtools::install()"

# Build documentation
Rscript -e "devtools::document()"

# Build pkgdown website
Rscript -e "pkgdown::build_site()"

Advanced Usage

rpy2: Call R Package from Python

import os
import numpy as np
import pandas as pd

# Set up rpy2 environment
os.environ["R_HOME"] = os.popen("R RHOME").read().strip()

try:
    import rpy2.robjects as ro
    from rpy2.robjects import numpy2ri, pandas2ri
    from rpy2.robjects.packages import importr
    numpy2ri.activate()
    pandas2ri.activate()

    # Install package if needed (run once)
    # utils = importr("utils")
    # utils.install_packages("mystatpack")

    # Import the R package
    # mystatpack = importr("mystatpack")

    # Call R function from Python
    # g1_py = np.random.normal(10, 2, 50)
    # g2_py = np.random.normal(8, 2, 50)
    # result_r = mystatpack.compute_effect_size(g1_py, g2_py)
    # print(f"Cohen's d (via rpy2): {result_r.rx2('estimate')[0]:.3f}")

    # Use base R stats functions
    stats = importr("stats")
    g1 = np.random.normal(10, 2, 50)
    g2 = np.random.normal(8, 2, 50)
    t_test = stats.t_test(g1, g2, alternative="two.sided")
    print(f"t-test via rpy2: t = {t_test.rx2('statistic')[0]:.3f}, "
          f"p = {t_test.rx2('p.value')[0]:.4f}")

except ImportError:
    print("rpy2 not available — using Python statsmodels instead")
    import scipy.stats as stats
    np.random.seed(42)
    g1 = np.random.normal(10, 2, 50)
    g2 = np.random.normal(8, 2, 50)
    t_stat, p_val = stats.ttest_ind(g1, g2)
    print(f"t-test (scipy): t = {t_stat:.3f}, p = {p_val:.4f}")

Package Vignette Template

# File: vignettes/introduction.Rmd

---
title: "Introduction to mystatpack"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to mystatpack}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

\`\`\`{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, fig.width = 7, fig.height = 4)
library(mystatpack)
set.seed(42)
\`\`\`

## Basic Usage

### Computing Effect Sizes

\`\`\`{r effect-size-example}
# Generate two groups
group1 <- rnorm(50, mean = 10, sd = 2)
group2 <- rnorm(50, mean = 8, sd = 2)

# Cohen's d
result <- compute_effect_size(group1, group2)
print(result)
\`\`\`

### Power Analysis

\`\`\`{r power-example}
# Required sample size for medium effect (d=0.5), 80% power
n_required <- power_analysis(d = 0.5)
cat("Required n per group:", n_required["n1"], "\n")
cat("Total sample size:   ", n_required["total"], "\n")
\`\`\`

## Advanced Usage

### Bootstrap Confidence Intervals

\`\`\`{r bootstrap-ci}
result_ci <- compute_effect_size(
  group1, group2,
  method = "hedges_g",
  ci = TRUE,
  n_boot = 1000
)
summary(result_ci)
\`\`\`

Troubleshooting

External Resources

R Packages (2e) by Hadley Wickham
Writing R Extensions (CRAN)
roxygen2 documentation
testthat documentation
rpy2 documentation
pkgdown — automated R package website

Examples

Example 1: S3 Class for Statistical Models

# R: Define a custom S3 class for regression results

#' Fit a Robust Linear Model
#'
#' @param formula R formula object
#' @param data Data frame
#' @return Object of class "robust_lm"
#' @export
robust_lm <- function(formula, data) {
  fit <- lm(formula, data = data)
  res <- residuals(fit)

  # HC3 robust standard errors (manual implementation)
  X <- model.matrix(fit)
  n <- nrow(X); k <- ncol(X)
  leverage <- hat(X)
  e2 <- res^2 / (1 - leverage)^2
  meat <- crossprod(X * sqrt(e2))
  bread <- solve(crossprod(X))
  vcov_hc3 <- n / (n - k) * bread %*% meat %*% bread

  structure(
    list(
      fit       = fit,
      vcov_hc3  = vcov_hc3,
      coef      = coef(fit),
      se_hc3    = sqrt(diag(vcov_hc3))
    ),
    class = "robust_lm"
  )
}

#' @export
print.robust_lm <- function(x, ...) {
  cat("Robust Linear Model (HC3 SE)\n")
  coef_table <- data.frame(
    Estimate = x$coef,
    `HC3 SE` = x$se_hc3,
    `t value` = x$coef / x$se_hc3,
    check.names = FALSE
  )
  print(coef_table)
  invisible(x)
}

Example 2: Python Interface via reticulate

# Python script that calls R statistical routines via rpy2

import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

def compare_groups(g1, g2, paired=False):
    """Compare two groups using multiple tests.

    Provides results from both Python (scipy) implementations.
    In a full setup, would also call R via rpy2 for consistency checks.

    Args:
        g1, g2: numpy arrays of observations
        paired: whether groups are paired
    Returns:
        dict of test results
    """
    g1, g2 = np.asarray(g1), np.asarray(g2)
    t_stat, t_p = ttest_ind(g1, g2, equal_var=False)  # Welch's t-test
    u_stat, u_p = mannwhitneyu(g1, g2, alternative="two-sided")

    # Cohen's d
    pooled_sd = np.sqrt((np.var(g1, ddof=1) + np.var(g2, ddof=1)) / 2)
    d = (np.mean(g1) - np.mean(g2)) / pooled_sd

    return {
        "t_stat": t_stat, "t_p": t_p,
        "u_stat": u_stat, "u_p": u_p,
        "cohens_d": d,
        "n1": len(g1), "n2": len(g2),
    }

np.random.seed(42)
g1 = np.random.normal(10, 2, 50)
g2 = np.random.normal(8, 2, 50)
results = compare_groups(g1, g2)
print("Group comparison results:")
for k, v in results.items():
    print(f"  {k}: {v:.4f}" if isinstance(v, float) else f"  {k}: {v}")

Adoption

xjtulyc/r-package-dev

$ install --global

Security Scan Results

SKILL.md

R Research Package Development

When to Use This Skill

Background & Key Concepts

R Package Structure

roxygen2 Documentation Tags

S3 vs S4 vs R6

DESCRIPTION File

Environment Setup

Core Workflow

Step 1: Create Package Scaffolding

Step 2: Write Documented R Functions

Step 3: testthat Unit Tests

Advanced Usage

rpy2: Call R Package from Python

Package Vignette Template

Troubleshooting

External Resources

Examples

Example 1: S3 Class for Statistical Models

Example 2: Python Interface via reticulate

Related Skills

xjtulyc/quarto-reporting

xjtulyc/python-package-dev

xjtulyc/latex-workflow

xjtulyc/data-version-control

xjtulyc/r-package-dev

$ install --global

Security Scan Results

SKILL.md

R Research Package Development

When to Use This Skill

Background & Key Concepts

R Package Structure

roxygen2 Documentation Tags

S3 vs S4 vs R6

DESCRIPTION File

Environment Setup

Core Workflow

Step 1: Create Package Scaffolding

Step 2: Write Documented R Functions

Step 3: testthat Unit Tests

Advanced Usage

rpy2: Call R Package from Python

Package Vignette Template

Troubleshooting

External Resources

Examples

Example 1: S3 Class for Statistical Models

Example 2: Python Interface via reticulate

Related Skills

xjtulyc/quarto-reporting

xjtulyc/python-package-dev

xjtulyc/latex-workflow

xjtulyc/data-version-control