From 167655de96a2482df40a6b31c6aa69410d742ba7 Mon Sep 17 00:00:00 2001 From: David Palmer Date: Mon, 30 Sep 2024 16:18:03 -0600 Subject: [PATCH] Changed fig 3. image to the correct F distribution and cited the source --- ancova.qmd | 17 +- docs/Anova_F-test.html | 383 +-- docs/Anova_F-test.qmd | 42 +- .../libs/bootstrap/bootstrap-icons.css | 2078 +++++++++++++++++ .../libs/bootstrap/bootstrap-icons.woff | Bin 0 -> 176200 bytes .../libs/bootstrap/bootstrap.min.css | 12 + .../libs/bootstrap/bootstrap.min.js | 7 + .../libs/clipboard/clipboard.min.js | 7 + .../libs/quarto-html/anchor.min.js | 9 + .../libs/quarto-html/popper.min.js | 6 + .../quarto-syntax-highlighting.css | 205 ++ .../libs/quarto-html/quarto.js | 908 +++++++ .../libs/quarto-html/tippy.css | 1 + .../libs/quarto-html/tippy.umd.min.js | 2 + docs/images/p-value_example.jpg | Bin 16448 -> 30171 bytes images/p-value_example.jpg | Bin 16448 -> 30171 bytes split_plot.qmd | 18 +- 17 files changed, 3356 insertions(+), 339 deletions(-) create mode 100644 docs/Anova_F-test_files/libs/bootstrap/bootstrap-icons.css create mode 100644 docs/Anova_F-test_files/libs/bootstrap/bootstrap-icons.woff create mode 100644 docs/Anova_F-test_files/libs/bootstrap/bootstrap.min.css create mode 100644 docs/Anova_F-test_files/libs/bootstrap/bootstrap.min.js create mode 100644 docs/Anova_F-test_files/libs/clipboard/clipboard.min.js create mode 100644 docs/Anova_F-test_files/libs/quarto-html/anchor.min.js create mode 100644 docs/Anova_F-test_files/libs/quarto-html/popper.min.js create mode 100644 docs/Anova_F-test_files/libs/quarto-html/quarto-syntax-highlighting.css create mode 100644 docs/Anova_F-test_files/libs/quarto-html/quarto.js create mode 100644 docs/Anova_F-test_files/libs/quarto-html/tippy.css create mode 100644 docs/Anova_F-test_files/libs/quarto-html/tippy.umd.min.js diff --git a/ancova.qmd b/ancova.qmd index 20c50ab..2fb755f 100644 --- a/ancova.qmd +++ b/ancova.qmd @@ -2,14 +2,13 @@ title: "ANCOVA" --- -```{=html} -``` + 
```{r} #| label: setup #| include: false @@ -227,13 +226,13 @@ Though the formula may seem a little different, the good news is that R will do The assumptions for the ANCOVA model are essentially the same as for [CB\[1\]](cb1.qmd), with the addition that the response and the covariate need to have a linear relationship: -| Requirements | Method for Checking | What You Hope to See | -|--------------------|--------------------------|--------------------------| -| Linear relationship between covariate and response | Residual vs. Fitted Plot | No trend or pattern | -| Constant variance across factor levels | Residual vs. Fitted Plot | No wedge or megaphone shape | -| Normally Distributed Residuals | Normal Q-Q plot | Straight line, majority of points in boundaries | -| Independent residuals | Order plot (only if applicable) | No pattern/trend | -| | Familiarity with/critical thinking about the experiment | No potential source for bias | +| Requirements | Method for Checking | What You Hope to See | +|----|----|----| +| Linear relationship between covariate and response | Residual vs. Fitted Plot | No trend or pattern | +| Constant variance across factor levels | Residual vs. Fitted Plot | No wedge or megaphone shape | +| Normally Distributed Residuals | Normal Q-Q plot | Straight line, majority of points in boundaries | +| Independent residuals | Order plot (only if applicable) | No pattern/trend | +| | Familiarity with/critical thinking about the experiment | No potential source for bias | The following example illustrates how to conduct an ANOVA in R. diff --git a/docs/Anova_F-test.html b/docs/Anova_F-test.html index a6a2a6d..0787367 100644 --- a/docs/Anova_F-test.html +++ b/docs/Anova_F-test.html @@ -2,12 +2,12 @@ - + -Math326 Notebook - ANOVA and the F-Test +ANOVA and the F-Test - - - - - - - - - - - - - - - - - + + + + + + + + + + - + - - + -
-
- -
- -
- - - -
@@ -367,7 +139,7 @@

Regression

Analysis of Variance

Though ANOVA may seem intimidating at first, the underlying idea is relatively simple. If we want to determine if a factor has a significant effect on a response variable, we can look at the variance, or spread, in the resulting data. Specifically, we will compare the spread between factor level means with the spread of observations within a factor level.

Figure 1 illustrates some made-up data for a very simple, generic example where there are just two factor levels: X and O.

-
+
@@ -379,7 +151,7 @@

Analysis of Variance

Each individual observation is represented by a black X or O above the number line. The factor level mean for each factor is in blue, plotted beneath the number line. It is apparent that the X factor level mean and the O factor level mean are statistically different. The spread between factor level means is quite large relative to the spread of observations within each factor level.

Now consider the case, in Figure 2, where the spread between factor level means is quite small relative to the spread of observations within a factor level. This suggests that the difference in factor level means could just as easily be due to random chance as due to a real difference between factor levels.
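The comparison in Figures 1 and 2 can be sketched in a few lines of R. All of the numbers below are invented for illustration; the point is only the ratio of between-level spread to within-level spread:

```r
# Made-up data mimicking Figure 1: two factor levels, X and O
x <- c(1.8, 2.1, 2.4)  # observations in level X
o <- c(5.7, 6.0, 6.3)  # observations in level O

# Spread between the two level means vs. typical spread within a level
between <- abs(mean(x) - mean(o))   # distance between the two level means
within  <- mean(c(sd(x), sd(o)))    # average within-level standard deviation

between / within  # a large ratio: the level means look genuinely different
```

Shrinking the gap between the two clusters (Figure 2) drives this ratio toward values explainable by chance alone.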

-
+
@@ -409,8 +181,8 @@

ANOVA F-test: A Variance Ratio

\]

-

The F distribution is defined by 2 values for degrees of freedom. One degrees of freedom value for the variance estimate in the numerator, and another for the variance estimate in the denominator. The p-value for the ANOVA F-test is calculated as the area under the F distribution curve to the right of the F statistic, as shown in Figure 3 .

-
+

The F distribution is defined by 2 values for degrees of freedom. One degrees of freedom value for the variance estimate in the numerator, and another for the variance estimate in the denominator. The p-value for the ANOVA F-test is calculated as the area under the F distribution curve to the right of the F statistic, as shown in Figure 3 2.
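That right-tail area can be computed directly with R's `pf()`. The F statistic and degrees of freedom below are made-up values for illustration:

```r
# Hypothetical F statistic with 2 numerator and 27 denominator degrees of freedom
f_stat <- 4.5
p_value <- pf(f_stat, df1 = 2, df2 = 27, lower.tail = FALSE)  # area to the right of F
p_value  # about 0.021
```

`lower.tail = FALSE` asks for the upper-tail probability, which is exactly the shaded region in Figure 3.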

+
@@ -423,19 +195,19 @@

ANOVA F-test: A Variance Ratio

If the ratio of variances is large (as in Figure 1), then we get a large F statistic and a small p-value. If the p-value is less than our chosen level of significance, we conclude that at least one factor level effect is non-zero. On the other hand, as the F ratio gets smaller and approaches 1 (similar to Figure 2) our p-value increases and we may have insufficient evidence to say that the treatment factor has an effect on the response.

An F-test is only performed on structural factors. A test of the grand mean is not interesting, and a test of the residual error factor does not make sense. In fact, doing an F-test of the residual error factor is not possible, since the mean square error is typically the denominator in our F statistic.

The question then follows, how do we estimate variation between factor levels and within factor levels?

-

Similar to the way in which we were able to break down each observation into its component parts using the effects model, we now want to break down the variability in the dataset into its component parts. We will estimate how much of the total variability in the dataset comes from each of the terms (or factors) in the effects model.

+

Similar to the way in which we were able to break down each observation into its component parts using the effects model, we now want to break down the variability in the dataset into its component parts. We will estimate how much of the total variability in the dataset comes from each of the terms (or factors) in the effects model.

The effects model:

\(y_\text{ij} = \mu + \alpha_i + \epsilon_\text{ij}\)

The breakdown of variability and the resulting F-statistic and p-value are often displayed in an ANOVA summary table. Table 1 is a blank version of the ANOVA summary table with just 1 structural factor/treatment. This table is essentially a container to provide perspective about how variability in the dataset is allocated across factors.

-
+
Table 1: Blank ANOVA summary table for an experiment with 1 treatment factor
- +
@@ -492,20 +264,20 @@

Mean Squares (MS)

\[ F = \frac{\text{Variation between factor levels}}{\text{Variation within factor levels}} = \frac{\text{Mean squares of treatment factor means}}{\text{Mean squares of residual errors}} \]

With this is mind, we can fill in the F column of the ANOVA table for Treatment Factor.

-
+
Table 2: Blank ANOVA summary table for an experiment with 1 treatment factor
-
Source
+
------++++++ @@ -557,13 +329,13 @@

Mean Squares (MS)

To find the variation between factor level means, calculate the sample variance of factor level means and multiply it by the number of replicates in each factor level. (In the case of unbalanced data where you do not have the same number of replicates in each factor level, weighting by sample size is needed).

The Mean Squares of the residual error factor (i.e. Mean Squared Error, MSE) represents the within factor level variation. To calculate it you can find the sample variance within each factor level and then take the mean of those variances. (In the case of unbalanced data where you do not have the same number of replicates in each factor level, you would take a weighted average).
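A minimal sketch of those two calculations on simulated, balanced data (3 levels, 15 replicates each; every number is invented), checked against R's built-in ANOVA:

```r
set.seed(42)  # made-up, balanced data: 3 factor levels, 15 replicates each
y <- c(rnorm(15, mean = 10), rnorm(15, mean = 12), rnorm(15, mean = 14))
level <- factor(rep(c("A", "B", "C"), each = 15))

# Between: sample variance of the level means, times the replicates per level
ms_treatment <- 15 * var(tapply(y, level, mean))
# Within: mean of the sample variances inside each level
mse <- mean(tapply(y, level, var))
f_stat <- ms_treatment / mse

# The same F statistic falls out of aov()
summary(aov(y ~ level))
```

Because the data are balanced, these shortcut formulas reproduce the mean squares in the `aov()` table exactly; with unbalanced data the weighted versions described above are needed.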

-

Figure 4 (a) and Figure 4 (b) below show deviations necessary to calculate the between group and within group variance (or in other words the mean squares treatment and the mean squares error). The figures are based on data from an experiment with 3 factor levels. Figure 4 (a) shows variation between factor level means. It shows the factor level means plotted as blue lines and the grand mean as a red line. In this chart we see the deviation from each factor level mean to the grand mean represented as a gray dashed line. As mentioned above, the mean squares for treatment could be computed as the variance of the 3 factor level means2, and multiplied by 15 (the number of replicates in each factor level).

+

Figure 4 (a) and Figure 4 (b) below show deviations necessary to calculate the between group and within group variance (or in other words the mean squares treatment and the mean squares error). The figures are based on data from an experiment with 3 factor levels. Figure 4 (a) shows variation between factor level means. It shows the factor level means plotted as blue lines and the grand mean as a red line. In this chart we see the deviation from each factor level mean to the grand mean represented as a gray dashed line. As mentioned above, the mean squares for treatment could be computed as the variance of the 3 factor level means3, and multiplied by 15 (the number of replicates in each factor level).

-
+
@@ -575,7 +347,7 @@

Mean Squares (MS)

-
+
@@ -593,7 +365,7 @@

Mean Squares (MS)

-

Figure 4 (b) shows variation within factor levels. Each point is plotted in a cluster according to the factor level it belongs to. The deviation from each point to its respective factor level mean is depicted with a gray dashed line. The mean square error could be computed by finding the sample variance within each group3 and then taking the mean of those 3 variance estimates.

+

Figure 4 (b) shows variation within factor levels. Each point is plotted in a cluster according to the factor level it belongs to. The deviation from each point to its respective factor level mean is depicted with a gray dashed line. The mean square error could be computed by finding the sample variance within each group4 and then taking the mean of those 3 variance estimates.

Thinking about things in this way is helpful for understanding conceptually what is happening. However, the ANOVA summary table captures the interim steps for calculating mean squares slightly differently. Since Mean Squares is synonymous with variance, now is a good time to review the sample variance formula.

\[ s^2 = \frac{\sum{(y_i - \bar{y})}^2}{n-1} \]

@@ -607,14 +379,14 @@

Mean Squares (MS)

  • denominator: the number of independent pieces of information used to create the sum in the numerator
  • Therefore, in a mean square calculation, the numerator is the sum of squares and the denominator is the degrees of freedom.
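A quick numeric check of the formula, using three invented values:

```r
y <- c(4, 7, 10)               # made-up values
ss <- sum((y - mean(y))^2)     # numerator: sum of squared deviations = 18
df <- length(y) - 1            # denominator: degrees of freedom = 2
ss / df                        # 9, identical to var(y)
```

Dividing the sum of squares by the degrees of freedom is exactly what `var()` does internally, which is why mean squares and variances are interchangeable terms here.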

    -
    +
    ------++++++ @@ -667,14 +439,14 @@

    Sum of Squares (SS)

    Let’s talk about the numerator first; this will be the sum of squared deviations, or Sum of Squares for short. The Sum of Squares (SS) is a measure of the total variability in a dataset. A naïve approach to calculating total variability in a dataset is to measure the deviation of each value from the mean of the dataset. The problem with this approach is that those deviations will always sum to zero.

    To avoid this problem, statisticians square the distances before summing them. This results in a value that summarizes the total amount of spread in the dataset. This quantity, the Sum of Squares, is important and so it has its own column in the ANOVA summary table.

    In the table below an equation for each factor’s SS is listed using terms from the factor effects model. We’ll walk through the meaning of each of those equations.

    -
    +
    ------++++++ @@ -755,19 +527,19 @@

    Degrees of Freedom

    In our dataset we have a certain number of observations. All those observations can be used to estimate the variance in the dataset. But you will notice in Equation 2 the data has already been used to estimate the grand mean (\(\bar{y}\) estimates \(\mu\)). In other words, before we can estimate the variance we must use the data to estimate the mean. Estimating the mean “uses up” one degree of freedom. This is why the denominator of the sample variance formula divides by \(n-1\) instead of by \(n\).

    For additional explanation, consider this simple example. There are three data points, and you know that the mean of these 3 data points is 10. The value of the first data point could be any number; it is free to vary. The value of the second data point could also be any number; it is free to vary. The third number’s value is not free to vary. It is constrained by the fact that the mean of the 3 data points must be 10. The values of the first two data points will determine the value of the third under the constraint of a known (or estimated) mean.

    The example described above is summarized in Table 3. The first number is represented as an \(a\) and the second number is represented with a \(b\).
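The constraint can be sketched in a couple of lines (the values chosen for the two free numbers are arbitrary):

```r
target_mean <- 10; n <- 3
a <- 4; b <- 12                      # the first two values: free to vary
third <- n * target_mean - a - b     # the third value is forced: 30 - 4 - 12 = 14
mean(c(a, b, third))                 # 10, as required
```

Whatever values `a` and `b` take, `third` is determined, so only \(n-1 = 2\) of the three values carry information once the mean is fixed.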

    -
    +
    Table 3: Only n-1 values are free to vary when the mean of the values is known
    -
    +
    ---+++ @@ -820,8 +592,6 @@

    Degrees of Freedom

    \]

    - - @@ -839,10 +609,12 @@

    Degrees of Freedom

    \[ F = \frac{\text{variance within factor levels}}{\text{variance within factor levels}} = 1 \]

    The values for degrees of freedom affect the shape and the spread of the F distribution. Visit this applet to learn more and interact with the family of F distributions.↩︎

    -
  • The variance would be calculated by squaring the deviations, summing them up and then dividing by the degrees of freedom, which is 2 in this case.↩︎

  • -
  • The variance is calculated by squaring the deviations, and then summing them together and dividing by the degrees of freedom, which in this case is 14 for each group.↩︎

  • +
  • From the F distribution applet (https://homepage.divms.uiowa.edu/~mbognar/applets/f.html), copyright 2021 by Matt Bognar, Department of Statistics and Actuarial Science, University of Iowa↩︎

  • +
  • The variance would be calculated by squaring the deviations, summing them up and then dividing by the degrees of freedom, which is 2 in this case.↩︎

  • +
  • The variance is calculated by squaring the deviations, and then summing them together and dividing by the degrees of freedom, which in this case is 14 for each group.↩︎

  • - + + -``` + ```{r} #| label: setup #| include: false @@ -157,12 +156,12 @@ $$ ANOVA tests are appropriate for a split-plot analysis if the following requirements are satisfied: -| Requirements | Method for Checking | What You Hope to See | -|----------------------------------------|---------------------------------------------------------|----------------------------------------------------------| -| Constant variance across factor levels | Residual vs. Fitted Plot | No major disparity in vertical spread of point groupings | -| The residuals are normally distributed | Normal Q-Q plot | Straight line, majority of points in boundaries | -| Independent residuals | Order plot | No pattern/trend | -| | Familiarity with/critical thinking about the experiment | No potential source for bias | +| Requirements | Method for Checking | What You Hope to See | +|----|----|----| +| Constant variance across factor levels | Residual vs. Fitted Plot | No major disparity in vertical spread of point groupings | +| The residuals are normally distributed | Normal Q-Q plot | Straight line, majority of points in boundaries | +| Independent residuals | Order plot | No pattern/trend | +| | Familiarity with/critical thinking about the experiment | No potential source for bias | # Design @@ -503,7 +502,6 @@ $$ Similarly, we get the sum of squares for the remaining factors. -```{=tex} \begin{align} SS_\text{Fungicide} &= 12 * 5.\overline{4} = 65.\overline{3}\\ \\ @@ -516,7 +514,7 @@ SS_\text{Fung. x Variety Interaction} &= 4 * ( 1.36\overline{1} + 4.3402\overlin SS_\text{Residuals} &= 2 * (2.\overline{7} + 25 + .69\overline{4} + 9 + .69\overline{4} + 4) = 84.\overline{3} \end{align} -``` + Putting this information into the ANOVA table gets us the result shown in @tbl-ss_anova. ```{r echo = FALSE}