Supplemental material
We refactored the code for independence testing as a function that returns all relevant calculations given a matrix of cross-sectional data.
```r
# csdata is cross-sectional data (observed counts) entered as an IxJ matrix.
# Tests for independence P(X,Y) = P(X)P(Y) using the X^2 statistic.
independence_test_X2 <- function(csdata) {
  # define the dimensions of the table
  I <- dim(csdata)[1]
  J <- dim(csdata)[2]

  # compute the overall sample size, the row sample sizes, and the column sample sizes
  total   <- sum(csdata)
  row.sum <- apply(csdata, 1, sum)
  col.sum <- apply(csdata, 2, sum)

  # use matrix algebra (outer product of the margins) to compute the
  # table of expected cell counts under independence
  expected <- matrix(row.sum) %*% t(matrix(col.sum)) / total
  dimnames(expected) <- dimnames(csdata)

  # compute the X^2 statistic, degrees of freedom, and p-value
  X2 <- sum((csdata - expected)^2 / expected)
  df <- (I - 1) * (J - 1)
  p.value.X2 <- pchisq(X2, df, lower.tail = FALSE)

  # return computed values as a list
  list(expected_counts = expected,
       estimate = expected / total,
       X2 = X2,
       df = df,
       p.value = p.value.X2)
}
```
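As a quick sanity check, this function should agree with R's built-in `chisq.test` when the continuity correction is disabled; for example (the table below is an arbitrary illustrative choice):

```r
tbl <- matrix(c(10, 20, 30, 40), nrow = 2)
out <- independence_test_X2(tbl)
ref <- chisq.test(tbl, correct = FALSE)
all.equal(out$X2, unname(ref$statistic))  # TRUE
all.equal(out$p.value, ref$p.value)       # TRUE
```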
We also refactored the code for independence testing under the assumption of a monotonic association (\(\gamma\) correlation).
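A minimal sketch of this function, assuming the standard Goodman-Kruskal estimator \(\hat\gamma = (C - D)/(C + D)\) (with \(C\) and \(D\) the numbers of concordant and discordant pairs), its delta-method asymptotic standard error, and a two-sided \(Z\)-test; the interface mirrors `independence_test_X2` above:

```r
# csdata is cross-sectional data (observed counts) entered as an IxJ matrix
# whose rows and columns are ordered categories.
# Tests H0: gamma = 0 using the Goodman-Kruskal gamma statistic.
independence_test_gamma <- function(csdata) {
  I <- dim(csdata)[1]
  J <- dim(csdata)[2]

  # for each cell, count the observations concordant (con) and
  # discordant (dis) with an observation in that cell
  con <- matrix(0, I, J)
  dis <- matrix(0, I, J)
  for (i in 1:I) {
    for (j in 1:J) {
      con[i, j] <- sum(csdata[row(csdata) > i & col(csdata) > j]) +
                   sum(csdata[row(csdata) < i & col(csdata) < j])
      dis[i, j] <- sum(csdata[row(csdata) > i & col(csdata) < j]) +
                   sum(csdata[row(csdata) < i & col(csdata) > j])
    }
  }

  # P and Q are twice the numbers of concordant and discordant pairs;
  # the factor of two cancels in gamma.hat
  P <- sum(csdata * con)
  Q <- sum(csdata * dis)
  gamma.hat <- (P - Q) / (P + Q)

  # delta-method asymptotic standard error of gamma.hat
  se.gamma <- 4 * sqrt(sum(csdata * (Q * con - P * dis)^2)) / (P + Q)^2

  Z <- gamma.hat / se.gamma
  p.value <- 2 * pnorm(abs(Z), lower.tail = FALSE)

  list(gamma = gamma.hat, se = se.gamma, Z = Z, p.value = p.value)
}
```

For the `obs` table defined in Problem 1, this sketch yields \(\hat\gamma \approx 0.36\) and \(Z^* \approx 3.21\), consistent with the values reported in Problem 2.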
Problem 1
The observed data is cross-sectional with a general model given by \[\{n_{ij}\} \sim \operatorname{MULT}(n, \{\pi_{ij}\}).\]

We consider a simpler model in which \(X\) and \(Y\) are independent, i.e., \(\Pr(X,Y) = \Pr(X)\Pr(Y)\). Our task is then to measure how compatible the observed data is with the null hypothesis \[H_0 : \pi_{ij} = \pi_{i+}\pi_{+j}.\] We prefer \(H_0\) to the more general model, which requires estimating (and thus carries greater variance) and interpreting more parameters.

We perform a chi-square test for independence. The test statistic is given by \[X^2 = \sum_{i=1}^{4}\sum_{j=1}^{4} \frac{(n_{ij} - \hat{m}_{ij})^2}{\hat{m}_{ij}},\] where \(\hat{m}_{ij} = n \hat{\pi}_{ij0}\) and \(\hat{\pi}_{ij0} = \frac{n_{i+}n_{+j}}{n^2}\), so that \(\hat{m}_{ij} = n_{i+}n_{+j}/n\).

Under the null model, \(X^2\) is approximately distributed \(\chi^2\) with \(\mathrm{df} = (4-1)(4-1) = 9\) degrees of freedom.

We compute the \(X^2\) statistic and \(p\)-value with the following R code:
```r
obs <- matrix(c(7, 7, 2, 3,
                2, 8, 3, 7,
                1, 5, 4, 9,
                2, 8, 9, 14),
              nrow = 4, byrow = TRUE)
colnames(obs) <- c("never/occasionally", "fairly often",
                   "very often", "almost always")
rownames(obs) <- colnames(obs)
print(obs)
```

```
##                    never/occasionally fairly often very often almost always
## never/occasionally                  7            7          2             3
## fairly often                        2            8          3             7
## very often                          1            5          4             9
## almost always                       2            8          9            14
```
```r
x2_test <- independence_test_X2(obs)
```
The observed \(X_0^2\) is 16.955 and the \(p\)-value is 0.049. We consider this \(p\)-value to be moderate evidence against the null model. Stated differently, the observed data is moderately incompatible with the independence model.
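The reported \(p\)-value can be recovered directly from the upper tail of the \(\chi^2_9\) distribution:

```r
pchisq(16.955, df = 9, lower.tail = FALSE)  # approximately 0.049
```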
For completeness, the estimates of \(\pi_{ij}\) under the independence model (with no additional assumptions) for the given cross-sectional data are:
```
##                    never/occasionally fairly often very often almost always
## never/occasionally              0.028        0.064      0.041         0.076
## fairly often                    0.029        0.068      0.043         0.080
## very often                      0.028        0.064      0.041         0.076
## almost always                   0.048        0.112      0.072         0.132
```
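These probabilities are the `estimate` component of the list returned by `independence_test_X2`, rounded for display:

```r
round(x2_test$estimate, 3)
```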
Problem 2
The observed data is cross-sectional with a general model given by \[\{n_{ij}\} \sim \operatorname{MULT}(n, \{\pi_{ij}\}).\]

We consider a simpler model based on \(\gamma\), \[H_0 : \gamma = 0,\] since \(\gamma\) is zero whenever \(X\) and \(Y\) are independent.

The test statistic is given by \[Z^* = \frac{\hat\gamma - 0}{\hat\sigma(\hat\gamma)},\] which is asymptotically distributed \(\mathcal{N}(0,1)\) under the null model.

We compute \(Z^*\) and the \(p\)-value with the following R code:
```r
gamma_test <- independence_test_gamma(obs)
```
The observed \(Z^*\) is 3.207 and the \(p\)-value is 0.001. We consider this \(p\)-value to be very strong evidence against the null model. That is, when the test is directed at a monotonic association, the observed data provides very strong evidence against the independence model.

For completeness, the estimate for \(\gamma\) is \(\hat\gamma = 0.36\).
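Assuming the test is two-sided, the reported \(p\)-value follows from the standard normal tail probability:

```r
2 * pnorm(3.207, lower.tail = FALSE)  # approximately 0.001
```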
Problem 3
Estimates from simpler models have smaller variances than estimates from more complicated models, because the simpler model pools the data across fewer parameters. The trade-off is bias when the simpler model does not hold.
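A small simulation sketch can illustrate this for the independence model of Problem 1 (the marginal probabilities, sample size, and replication count below are arbitrary illustrative choices): when independence truly holds, the model-based estimate \(\hat\pi_{i+}\hat\pi_{+j}\) of a cell probability has smaller sampling variance than the saturated-model estimate \(n_{ij}/n\).

```r
# Compare sampling variances of two estimates of pi_11 when the
# independence model is true: the saturated estimate n_11/n versus
# the independence-model estimate (n_1+ n_+1)/n^2.
set.seed(1)
pi.row <- c(0.5, 0.5)
pi.col <- c(0.3, 0.7)
probs  <- outer(pi.row, pi.col)  # true cell probabilities (independent)
n <- 91
sat <- ind <- numeric(10000)
for (b in 1:10000) {
  tbl    <- matrix(rmultinom(1, n, probs), nrow = 2)
  sat[b] <- tbl[1, 1] / n                        # saturated-model estimate
  ind[b] <- sum(tbl[1, ]) * sum(tbl[, 1]) / n^2  # independence-model estimate
}
c(saturated = var(sat), independence = var(ind))
```

For these settings the independence-model estimate should show the smaller variance (roughly 0.0008 versus 0.0014 by the delta method).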