What is hypothesize?

The hypothesize package provides a consistent API for hypothesis testing in R. It is designed around three principles from Structure and Interpretation of Computer Programs (SICP):

  1. Data Abstraction: Hide implementation details behind a clean interface
  2. Closure Property: Combining tests yields tests
  3. Higher-Order Functions: Transform tests into new tests

This vignette introduces the package through examples, building from simple primitives to more sophisticated compositions.

The Hypothesis Testing Interface

Every hypothesis test in this package implements the same interface:

  • pval(x) — extract the p-value
  • test_stat(x) — extract the test statistic
  • dof(x) — extract degrees of freedom
  • is_significant_at(x, alpha) — check significance at level α

This abstraction means you can work with any test the same way, regardless of its internal implementation.

Primitive Tests

The Z-Test: Simplest Case

The z-test is the most basic hypothesis test. It tests whether a population mean equals a hypothesized value when the population standard deviation is known:

# A manufacturer claims widgets weigh 100g on average
# We test 30 widgets (population SD is known to be 5g from historical data)
set.seed(42)
weights <- rnorm(30, mean = 102, sd = 5)

# Test H0: mu = 100 vs H1: mu != 100
z <- z_test(weights, mu0 = 100, sigma = 5)
z
#> Hypothesis test ( z_test )
#> -----------------------------
#> Test statistic:  2.566556 
#> P-value:  0.01027141 
#> Degrees of freedom:  Inf 
#> Significant at 5% level:  TRUE

We can extract components:

test_stat(z)   # The z-statistic
#> [1] 2.566556
pval(z)        # The p-value
#> [1] 0.01027141
dof(z)         # Degrees of freedom (Inf for z-test)
#> [1] Inf

The Wald Test: General Parameters

The Wald test generalizes the z-test to any asymptotically normal estimator. If you have an estimate and its standard error, you can test whether the true parameter equals a hypothesized value:

# Suppose we fit a regression and got coefficient = 2.3 with SE = 0.8
# Test H0: beta = 0 (no effect) vs H1: beta != 0
w <- wald_test(estimate = 2.3, se = 0.8, null_value = 0)
w
#> Hypothesis test ( wald_test )
#> -----------------------------
#> Test statistic:  8.265625 
#> P-value:  0.004040275 
#> Degrees of freedom:  1 
#> Significant at 5% level:  TRUE

is_significant_at(w, 0.05)
#> [1] TRUE

The Wald statistic is z², which follows a chi-squared distribution with 1 df:

w$z            # The z-score
#> [1] 2.875
test_stat(w)   # z² (the Wald statistic)
#> [1] 8.265625

The Likelihood Ratio Test: Comparing Models

The LRT compares nested models by examining the ratio of their likelihoods:

# Suppose we fit two models:
# Null model (simpler): log-likelihood = -250
# Alt model (complex):  log-likelihood = -240
# The alt model has 3 extra parameters

test <- lrt(null_loglik = -250, alt_loglik = -240, dof = 3)
test
#> Hypothesis test ( likelihood_ratio_test )
#> -----------------------------
#> Test statistic:  20 
#> P-value:  0.0001697424 
#> Degrees of freedom:  3 
#> Significant at 5% level:  TRUE

# Is the more complex model justified?
is_significant_at(test, 0.05)
#> [1] TRUE

Duality: Tests and Confidence Intervals

Hypothesis tests and confidence intervals are two sides of the same coin. A 95% confidence interval contains exactly those parameter values that would not be rejected at the 5% level.

For tests that store their estimates, we can extract confidence intervals:

w <- wald_test(estimate = 5.2, se = 1.1)
confint(w)               # 95% CI
#>   lower   upper 
#> 3.04404 7.35596
confint(w, level = 0.99) # 99% CI
#>    lower    upper 
#> 2.366588 8.033412

Notice the duality: the CI contains 0 if and only if the test is not significant:

# Is 0 in the 95% CI?
ci <- confint(w)
ci["lower"] < 0 && 0 < ci["upper"]
#> [1] FALSE

# Is the test significant at 5%?
is_significant_at(w, 0.05)
#> [1] TRUE

The Closure Property: Combining Tests

A key principle from SICP is the closure property: when you combine things, you get back the same kind of thing. In our case, combining hypothesis tests yields a hypothesis test.

Fisher’s method combines p-values from independent tests:

# Three independent studies test the same hypothesis
# None is individually significant, but together...
p1 <- 0.08
p2 <- 0.12
p3 <- 0.06

combined <- fisher_combine(p1, p2, p3)
combined
#> Hypothesis test ( fisher_combined_test )
#> -----------------------------
#> Test statistic:  14.91881 
#> P-value:  0.02089771 
#> Degrees of freedom:  6 
#> Significant at 5% level:  TRUE

is_significant_at(combined, 0.05)
#> [1] TRUE

You can also combine actual test objects:

t1 <- wald_test(estimate = 1.2, se = 0.8)
t2 <- wald_test(estimate = 0.9, se = 0.6)
t3 <- wald_test(estimate = 1.5, se = 1.0)

fisher_combine(t1, t2, t3)
#> Hypothesis test ( fisher_combined_test )
#> -----------------------------
#> Test statistic:  12.07678 
#> P-value:  0.06027762 
#> Degrees of freedom:  6 
#> Significant at 5% level:  FALSE

Because the result is itself a hypothesis_test, you can use all the same methods on it:

combined <- fisher_combine(0.05, 0.10, 0.15)
pval(combined)
#> [1] 0.02556195
test_stat(combined)
#> [1] 14.39087
dof(combined)
#> [1] 6

Higher-Order Functions: Transforming Tests

When performing multiple tests, we need to adjust p-values to control error rates. The adjust_pval() function transforms tests by adjusting their p-values:

# Perform 5 tests
tests <- list(
  wald_test(estimate = 2.5, se = 1.0),
  wald_test(estimate = 1.8, se = 0.9),
  wald_test(estimate = 1.2, se = 0.7),
  wald_test(estimate = 0.9, se = 0.8),
  wald_test(estimate = 2.1, se = 1.1)
)

# Original p-values
sapply(tests, pval)
#> [1] 0.01241933 0.04550026 0.08647627 0.26058903 0.05625037

# Bonferroni-adjusted p-values
adjusted <- adjust_pval(tests, method = "bonferroni")
sapply(adjusted, pval)
#> [1] 0.06209665 0.22750132 0.43238133 1.00000000 0.28125183

# Less conservative: Benjamini-Hochberg (FDR control)
adjusted_bh <- adjust_pval(tests, method = "BH")
sapply(adjusted_bh, pval)
#> [1] 0.06209665 0.09375061 0.10809533 0.26058903 0.09375061

The adjusted tests retain all original properties:

adj <- adjusted[[1]]
test_stat(adj)           # Same as original
#> [1] 6.25
adj$original_pval        # Can access unadjusted p-value
#> [1] 0.01241933
adj$adjustment_method    # Method used
#> [1] "bonferroni"

Because adjusted tests are still hypothesis tests, they compose naturally:

# First adjust, then combine
adjusted <- adjust_pval(tests[1:3], method = "bonferroni")
fisher_combine(adjusted[[1]], adjusted[[2]], adjusted[[3]])
#> Hypothesis test ( fisher_combined_test )
#> -----------------------------
#> Test statistic:  13.26117 
#> P-value:  0.03907096 
#> Degrees of freedom:  6 
#> Significant at 5% level:  TRUE

Extending the Package

The hypothesis_test() constructor makes it easy to add new test types:

# Example: Create a simple chi-squared goodness-of-fit test wrapper
chisq_gof <- function(observed, expected) {
  stat <- sum((observed - expected)^2 / expected)
  df <- length(observed) - 1
  p.value <- pchisq(stat, df = df, lower.tail = FALSE)

  hypothesis_test(
    stat = stat,
    p.value = p.value,
    dof = df,
    superclasses = "chisq_gof_test",
    observed = observed,
    expected = expected
  )
}

# Use it
result <- chisq_gof(
  observed = c(45, 35, 20),
  expected = c(40, 40, 20)
)
result
#> Hypothesis test ( chisq_gof_test )
#> -----------------------------
#> Test statistic:  1.25 
#> P-value:  0.5352614 
#> Degrees of freedom:  2 
#> Significant at 5% level:  FALSE

# It works with all the standard functions
pval(result)
#> [1] 0.5352614
is_significant_at(result, 0.05)
#> [1] FALSE

Summary

The hypothesize package demonstrates that a small set of well-designed primitives and combinators can provide a powerful, extensible framework for hypothesis testing:

Function Purpose SICP Principle
hypothesis_test() Base constructor Data abstraction
z_test(), wald_test(), lrt() Primitive tests Building blocks
pval(), test_stat(), dof() Accessors Abstraction barrier
fisher_combine() Combine tests Closure property
adjust_pval() Transform tests Higher-order functions
confint() Extract CI Duality

This design makes hypothesis testing composable, extensible, and easy to reason about.