6  Statistics module

6.1 General

This module of FXMATE lets you determine NOEC and LOEC values depending on the data type (continuous and quantal are currently supported) and further characteristics of your data.

The data type is set automatically by the selected guideline. You can change it, and the app will then try to convert your data to suit the respective methods, but this is rarely appropriate and will likely produce erroneous results.

Results from testing normality, variance homogeneity, and monotonicity can also be overruled based on expert judgement by ticking the checkboxes Assume … anyway.

FXMATE supports your choice of method visually in the Decision tree tab, where your current choice of method is always framed in red, while the optimal choice determined by the app is highlighted in green.

R error and warning messages

For reasons of transparency, error and warning messages from the R backend are displayed in this module. Error messages indicate that a particular test could not be performed; FXMATE also informs you about this in a pop-up box. Warning messages, however, can provide important insights into testing performance. A common message, for instance, reports the presence of ties (equal values in the data), which can affect the precision with which some methods compute p-values. If you want to know more about a warning message, you can enter “R <your test> <warning message>” in an online search engine.

Tests in this module come from the R packages car, DescTools, ecotoxicology, PMCMRplus, rstatix, and survey. If the source of a test is not explicitly mentioned below, the implementation was taken from base R.

Note that FXMATE generally uses an alpha of 0.05 for all null-hypothesis significance tests. Statistical significance is hence assumed if p-values are < 0.05.

6.2 Data type

6.2.1 Continuous data

Continuous data can take an infinite number of possible values. A good example is algae growth data. For mortality data, it gets more complicated. In theory, such data comes from a binomial distribution because animals can only be either alive or dead. However, many test guidelines house several animals in one test unit, producing aggregated estimates of mortality, which are then handled as continuous in the context of NOEC and LOEC determination. A good example is acute Daphnia data. This is also why data types can differ between the Model and Statistics modules (e.g., Daphnia data is handled as quantal in the former).

6.2.1.1 Normality tests

FXMATE offers you two ways to assess if your data is normally distributed:

  1. As done customarily, the respective test is performed for each treatment. If the test is statistically significant for at least one treatment, data are assumed not to originate from a normal distribution.

  2. When you click the checkbox Test normality globally, the normality test is applied to your data after subtracting the respective treatment means. This option is offered because ANOVA and other so-called parametric testing methods (see Section 6.2.1.4) do not actually assume the treatment data themselves to be normally distributed, but rather the model residuals (i.e., data minus treatment means).
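
As a minimal illustration of what the global option operates on, the following Python sketch pools the residuals by subtracting each treatment mean from its replicates. The function name and the example values are hypothetical; FXMATE itself performs this step in its R backend, and this is not its code.

```python
from statistics import mean

def treatment_residuals(groups):
    """Return all values minus their own treatment mean, pooled over treatments."""
    residuals = []
    for values in groups.values():
        group_mean = mean(values)
        residuals.extend(v - group_mean for v in values)
    return residuals

# Hypothetical example: three replicates per treatment
data = {"control": [10.0, 12.0, 11.0], "low": [8.0, 9.0, 10.0]}
pooled = treatment_residuals(data)
print(pooled)
```

A single normality test would then be applied to the pooled residuals instead of one test per treatment.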

To support your decision about normality visually, a Q-Q plot is provided in the Normality panel. It shows the quantiles of your samples plotted against theoretical quantiles from a normal distribution. The more closely the points follow the 1:1 line, the more likely it is that your data originates from a normal distribution. A Q-Q plot is particularly useful if your sample size is very large, because even small deviations from a normal distribution can then result in significant tests when using the methods listed below.
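
The points of such a Q-Q plot can be sketched in Python as follows. This is an illustration under stated assumptions, not the plotting code used by the app: the reference distribution is a normal distribution with the sample mean and standard deviation, and the plotting positions (i - 0.5) / n are one common convention chosen here for simplicity.

```python
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Pair theoretical normal quantiles with the sorted sample values."""
    n = len(sample)
    ordered = sorted(sample)
    reference = NormalDist(mean(sample), stdev(sample))
    # Plotting positions (i - 0.5) / n are an assumption of this sketch
    theoretical = [reference.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical sample; points near the 1:1 line suggest normality
for theoretical_q, sample_q in qq_points([2.0, 4.0, 1.0, 5.0, 3.0]):
    print(round(theoretical_q, 3), sample_q)
```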

6.2.1.1.1 Shapiro-Wilk test
  • General: tests if a sample comes from a normally distributed population
  • Prerequisites: sample size \(\geq\) 3
  • Test statistic: W
  • Interpretation: if significant, data cannot be assumed to originate from a normal distribution
  • Original publication: Shapiro and Wilk (1965)
6.2.1.1.2 Kolmogorov-Smirnov test
  • General: tests if a sample comes from a given reference probability distribution (here: normal)
  • Test statistic: D
  • Interpretation: if significant, data cannot be assumed to originate from a normal distribution
  • Remarks: known to be less sensitive than the Shapiro-Wilk test
  • Original publication: Kolmogorov (1933)

6.2.1.2 Tests for variance homogeneity

To support your decision about variance homogeneity visually, a graph combining treatment-wise boxplots with standard deviations is provided in the Variance homogeneity panel.

6.2.1.2.1 Levene’s test
  • General: tests for equality of variances for a variable with two or more groups
  • Test statistic: F
  • Interpretation: if significant, variances cannot be assumed to be equal
  • Original publication: Levene (1960)
  • Realized using: car
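
For orientation, the F statistic of Levene's test can be sketched in Python as a one-way analysis of variance on absolute deviations. The sketch assumes deviations are taken from the group medians (the Brown-Forsythe variant); whether FXMATE uses this centring is an assumption, and the function name is hypothetical.

```python
from statistics import mean, median

def levene_f(groups):
    """F statistic of a median-centred Levene test for a list of samples."""
    # Absolute deviation of each value from its own group median
    deviations = [[abs(v - median(g)) for v in g] for g in groups]
    k = len(deviations)
    n_total = sum(len(d) for d in deviations)
    grand_mean = mean([v for d in deviations for v in d])
    between = sum(len(d) * (mean(d) - grand_mean) ** 2 for d in deviations)
    within = sum((v - mean(d)) ** 2 for d in deviations for v in d)
    return (between / (k - 1)) / (within / (n_total - k))

# Hypothetical data: the second group is more variable than the first
print(levene_f([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]))
```
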
6.2.1.2.2 Bartlett’s test
  • General: tests if multiple samples are from populations with equal variances
  • Prerequisites: normal distribution
  • Test statistic: K-squared
  • Interpretation: if significant, variances cannot be assumed to be equal
  • Original publication: Bartlett (1937)

6.2.1.3 Monotonicity tests

To support your decision about monotonicity visually, the Monotonicity panel provides a plot in which control and treatment means are connected by lines. If the effect direction is the same across treatments, this indicates monotonicity.

6.2.1.3.1 Linear vs. quadratic contrasts
  • General: linear and quadratic contrasts of normalized rank statistics are constructed
  • Interpretation: if the quadratic trend is significant and the linear trend is not, the response is not considered monotonic, otherwise it is
  • Original publication: OECD (2006)
  • Realized using: own code
6.2.1.3.2 Kendall rank correlation coefficient
  • General: measures the ordinal association between two variables
  • Test statistic: tau
  • Interpretation: if significant, a monotonic relationship can be assumed with the sign of tau indicating the effect direction
  • Original publication: Kendall (1938)
6.2.1.3.3 Spearman’s rank correlation coefficient
  • General: assesses how well the relationship between two variables can be described using a monotonic function
  • Test statistic: S/rho
  • Interpretation: if significant, a monotonic relationship can be assumed with the sign of rho (not displayed in app) indicating the effect direction
  • Original publication: Spearman (1904)

6.2.1.4 NOEC/LOEC determination

6.2.1.4.1 Omnibus tests

These are statistical tests that determine if there is a difference between multiple groups. They do not test which groups actually differ - that’s where post-hoc tests come into play. A significant omnibus test is usually considered a prerequisite for post-hoc testing. Results of omnibus tests are displayed in the NOEC/LOEC panel.

6.2.1.4.1.1 ANOVA
  • General: analysis of variance testing if two or more population means are equal
  • Prerequisites: normal distribution, variance homogeneity
  • Test statistic: F
  • Interpretation: significance indicates that some of the group means are different
  • Original publication: Fisher (1921)
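
The F statistic of a one-way ANOVA is the ratio of the between-group to the within-group mean square. A minimal Python sketch with a hypothetical function name and made-up data (the app relies on base R for this test):

```python
from statistics import mean

def anova_f(groups):
    """Return F and its degrees of freedom for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([v for g in groups for v in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical control and two treatments
print(anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]))
```
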
6.2.1.4.1.2 Welch’s ANOVA
6.2.1.4.1.3 Kruskal-Wallis test
  • General: tests if samples originate from the same distribution
  • Prerequisites: identically shaped and scaled distribution for all groups; note that this is hard to test for and usually ignored but violation may result in significant results not being due to differences in medians
  • Test statistic: Chi-squared
  • Interpretation: if significant, at least one population median of one group is different from the population median of at least one other group
  • Original publication: Kruskal and Wallis (1952)
6.2.1.4.2 Post-hoc tests
6.2.1.4.2.1 P-value corrections

When comparing several groups to a common control, as done in post-hoc testing for determining the NOEC and LOEC values, we face the multiple comparisons problem. This means that the likelihood of falsely rejecting at least one null hypothesis (here: that groups do not differ) increases with each comparison, and we need to correct for this. Many methods listed below use an internal procedure to this end, but methods designed to compare only two groups (e.g., Student’s t-test) do not, so the correction has to be applied separately.

FXMATE offers two different correction methods, which are implemented as p-value adjustments:

  1. Bonferroni correction

This technique, developed by Bonferroni (1936), is straightforward: p-values derived from two-sample tests are simply multiplied by the number of comparisons (i.e., number of groups minus one). However, this makes the Bonferroni correction quite conservative (particularly if you have many comparisons) and might result in an unnecessary loss of statistical power (i.e., the probability of detecting a true effect).

  2. Holm-Bonferroni method

This technique, developed by Holm (1979), is a bit more complicated. All p-values from the two-sample tests are sorted from lowest to highest and then compared sequentially against increasingly lenient significance thresholds:

\[ \frac{\alpha}{m}, \frac{\alpha}{m - 1}, \ldots, \frac{\alpha}{2}, \frac{\alpha}{1} \]

\(\alpha\) is the significance level, which is set to 0.05 in FXMATE, and \(m\) is the number of comparisons. The sorted p-values are checked for significance one after another, and the procedure stops at the first non-significant comparison; this comparison and all remaining ones are considered non-significant. Expressed as a p-value adjustment, this corresponds to multiplying the smallest p-value by \(m\), the second smallest by \(m - 1\), and so on, while ensuring that adjusted p-values never decrease along the sorted sequence. While being a bit more complex, the method is generally more powerful than the Bonferroni correction and is hence used as the default in the app. If desired, you can change this in the Select p-value adjustment method dropdown menu in the sidebar.
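
Both corrections can be written as p-value adjustments in a few lines. The Python sketch below is for illustration only, with hypothetical function names and made-up p-values; adjusted values are capped at 1, and the Holm variant keeps the adjusted values non-decreasing along the sorted sequence.

```python
def bonferroni(p_values):
    """Multiply every p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def holm(p_values):
    """Holm adjustment: step through the p-values from smallest to largest."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, ...
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted p-values must not decrease along the sorted sequence
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical raw p-values from three treatment-vs-control comparisons
raw = [0.01, 0.04, 0.03]
print(bonferroni(raw))
print(holm(raw))
```

With these example values and an alpha of 0.05, only the first comparison stays significant under both adjustments, but the Holm-adjusted p-values of the other two are smaller than their Bonferroni counterparts.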

6.2.1.4.2.2 Student’s t-test with correction
  • General: tests if means of two samples differ
  • Prerequisites: normal distribution, variance homogeneity
  • P-value correction: Bonferroni or Holm (see Section 6.2.1.4.2.1)
  • Test statistic: t
  • Interpretation: if significant, means of samples differ
  • Original publication: Student (1908)
  • Realized using: rstatix
6.2.1.4.2.3 Welch’s t-test with correction
  • General: adaptation of Student’s t-test that can handle unequal variances
  • Prerequisites: normal distribution
  • P-value correction: Bonferroni or Holm (see Section 6.2.1.4.2.1)
  • Test statistic: t
  • Interpretation: if significant, means of samples differ
  • Original publication: Welch (1947)
  • Realized using: rstatix
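
What distinguishes Welch's t-test from Student's version is that each sample contributes its own variance, and the degrees of freedom are approximated with the Welch-Satterthwaite equation. A minimal Python sketch with a hypothetical function name and made-up data, not the rstatix implementation:

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean, without pooling variances
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical control and treatment replicates
print(welch_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```
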
6.2.1.4.2.4 Wilcoxon rank-sum test with correction
  • General: tests if samples from two populations have the same distribution
  • P-value correction: Bonferroni or Holm (see Section 6.2.1.4.2.1)
  • Prerequisites: dispersions and shapes of sample distributions are equal; note that this is hard to test for and usually ignored but violation may result in significant results not being due to differences in medians
  • Test statistic: U/W
  • Interpretation: if significant, medians of samples differ
  • Original publication: Mann and Whitney (1947)
6.2.1.4.2.5 Dunn’s many-to-one rank comparison test
  • General: tests for differences in mean ranks of groups
  • P-value correction: Bonferroni or Holm (see Section 6.2.1.4.2.1)
  • Prerequisites: same as for Kruskal-Wallis test
  • Test statistic: z
  • Interpretation: if significant, medians of samples differ
  • Original publication: Dunn (1964)
  • Realized using: PMCMRplus
6.2.1.4.2.6 Dunnett’s test
  • General: tests if any treatment mean differs from the control mean
  • P-value correction: internal control
  • Prerequisites: normal distribution, variance homogeneity
  • Test statistic: t
  • Interpretation: if significant, treatment mean differs from the control mean
  • Original publication: Dunnett (1955)
  • Realized using: PMCMRplus
6.2.1.4.2.7 Williams’ trend test
  • General: tests if any treatment mean differs from the control mean using a step-down procedure
  • P-value correction: internal control
  • Prerequisites: normal distribution, variance homogeneity, monotonicity
  • Test statistic: t
  • Interpretation: if significant (indicated in the app if decision is reject), treatment mean differs from the control mean
  • Remarks: as this test uses a step-down procedure, it stops at the first/highest concentration/dose that is not significant (indicated in the app if decision is accept); the test is a one-sided test and alternatives greater or less are automatically selected according to the selected guideline but this can be changed by the user in the Alternative dropdown menu in the sidebar
  • Original publication: Williams (1971)
  • Realized using: PMCMRplus
6.2.1.4.2.8 Jonckheere’s trend test
  • General: tests if population medians have an a priori ordering using a step-down procedure
  • P-value correction: internal control
  • Prerequisites: monotonicity
  • Test statistic: z
  • Interpretation: if significant, treatment differs from the control
  • Remarks: as this test uses a step-down procedure, it stops at the first/highest concentration/dose that is not significant; by default, a two-sided test is performed, which can be changed by the user in the Alternative dropdown menu in the sidebar
  • Original publication: Jonckheere (1954)
  • Realized using: PMCMRplus

6.2.2 Quantal data

Quantal data arises from binary (0 or 1) data. A classical example is mortality data, where animals can only be either dead or alive.

6.2.2.1 Monotonicity test

To judge if proportions arising from quantal data are monotonically increasing or decreasing, FXMATE uses the functions IsMonotonicallyIncreasing and IsMonotonicallyDecreasing from the ecotoxicology package. These simply test:

\[ p_0 \leq p_1 \leq p_2 \leq ... \leq p_m \]

or

\[ p_0 \geq p_1 \geq p_2 \geq ... \geq p_m \]

\(p_0\) is the proportion in the control and \(p_m\) the proportion observed at the highest dose/concentration.

You can see the result of these tests in the sidebar under Monotonicity.
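
The two inequality chains amount to pairwise comparisons of neighbouring proportions. The following Python functions are an analogue written for illustration; they are not the code of the ecotoxicology package, and their names are hypothetical.

```python
def is_monotonically_increasing(proportions):
    """True if p_0 <= p_1 <= ... <= p_m."""
    return all(a <= b for a, b in zip(proportions, proportions[1:]))

def is_monotonically_decreasing(proportions):
    """True if p_0 >= p_1 >= ... >= p_m."""
    return all(a >= b for a, b in zip(proportions, proportions[1:]))

# Hypothetical mortality proportions from control to highest concentration
print(is_monotonically_increasing([0.0, 0.1, 0.1, 0.6]))
print(is_monotonically_increasing([0.0, 0.3, 0.1, 0.6]))
```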

6.2.2.2 Test for extrabinomial variance

Extrabinomial variance occurs if the observed variance is larger than would be expected for a binomial distribution. As the Cochran-Armitage test is sensitive to extrabinomial variation, the data need to be tested for it to decide whether the Rao-Scott test is required as an alternative.

FXMATE uses Tarone’s Z test (Tarone, 1979), which is implemented in the app following suggestions discussed here.

A significant Tarone’s Z test indicates extrabinomial variation, which points to using the Rao-Scott test.
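
A common form of Tarone's Z statistic compares a Pearson-type dispersion sum with its expectation under a binomial model. The Python sketch below illustrates that form; it is not the R code used by the app, the function name and data are hypothetical, and the one-sided p-value (large Z indicating overdispersion) is an assumption of this sketch. It fails with a division by zero if the pooled proportion is exactly 0 or 1.

```python
from math import sqrt
from statistics import NormalDist

def tarone_z(affected, totals):
    """Tarone's Z for counts of affected individuals out of totals per replicate."""
    p_hat = sum(affected) / sum(totals)  # pooled proportion
    s = sum((x - n * p_hat) ** 2 for x, n in zip(affected, totals))
    s /= p_hat * (1 - p_hat)
    z = (s - sum(totals)) / sqrt(2 * sum(n * (n - 1) for n in totals))
    # One-sided p-value: an assumption of this sketch
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

# Hypothetical replicates with ten animals each
print(tarone_z([0, 10], [10, 10]))   # strongly heterogeneous replicates
print(tarone_z([5, 5], [10, 10]))    # identical replicates
```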

6.2.2.3 NOEC/LOEC determination

6.2.2.3.1 Cochran-Armitage test
  • General: tests for the presence of an association between a variable with two categories (here 0 and 1) and an ordinal variable with several categories with an ordering in the effects using a step-down procedure
  • P-value correction: internal control
  • Prerequisites: monotonicity, no extrabinomial variance
  • Test statistic: Z
  • Interpretation: if significant, treatment differs from the control
  • Remarks: as this test uses a step-down procedure, it stops at the first/highest concentration/dose that is not significant; by default, a two-sided test is performed, which can be changed by the user in the Alternative dropdown menu in the sidebar
  • Original publication: Cochran (1954)
  • Realized using: DescTools
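
As a rough illustration of the trend statistic underlying this test, the Python sketch below computes Z for one set of treatments. It assumes equally spaced scores 0, 1, 2, ... for the treatments and covers a single test only, not the step-down procedure; it is not the DescTools implementation, and the function name and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def cochran_armitage_z(affected, totals, scores=None):
    """Trend statistic Z and two-sided p-value for ordered binomial groups."""
    if scores is None:
        scores = list(range(len(totals)))  # equally spaced scores (assumption)
    n_all = sum(totals)
    p_bar = sum(affected) / n_all
    score_mean = sum(n * d for n, d in zip(totals, scores)) / n_all
    numerator = sum(x * (d - score_mean) for x, d in zip(affected, scores))
    spread = sum(n * (d - score_mean) ** 2 for n, d in zip(totals, scores))
    z = numerator / sqrt(p_bar * (1 - p_bar) * spread)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical mortality counts in control, low, and high treatment
print(cochran_armitage_z([0, 5, 10], [10, 10, 10]))
```
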
6.2.2.3.2 Rao-Scott test
  • General: version of the Cochran-Armitage test adjusted for extrabinomial variance
  • P-value correction: internal control
  • Prerequisites: monotonicity
  • Test statistic: Chi-squared
  • Interpretation: if significant, treatment differs from the control
  • Remarks: as this test uses a step-down procedure, it stops at the first/highest concentration/dose that is not significant
  • Original publication: Rao and Scott (1992)
  • Realized using: survey
6.2.2.3.3 Chi-squared test with correction
  • General: tests if there is a difference between the expected frequencies and the observed frequencies in one or more categories of a contingency table
  • P-value correction: Bonferroni or Holm (see Section 6.2.1.4.2.1)
  • Prerequisites: large sample size (values in Expected contingency table in the NOEC/LOEC panel need to be 5 or larger)
  • Test statistic: Chi-squared
  • Interpretation: if significant, control and treatment frequencies differ
  • Original publication: Pearson (1900)
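
The sample size prerequisite can be checked by computing the expected contingency table from the row and column totals. A minimal Python sketch with hypothetical function names and made-up counts:

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def meets_sample_size_rule(table, minimum=5):
    """True if every expected count is at least `minimum`."""
    return all(v >= minimum for row in expected_counts(table) for v in row)

# Hypothetical table: rows are control and treatment, columns alive and dead
print(expected_counts([[18, 2], [10, 10]]))
print(meets_sample_size_rule([[18, 2], [10, 10]]))
print(meets_sample_size_rule([[9, 1], [8, 2]]))
```

If the rule is not met, Fisher's exact test is the usual alternative because it does not rely on a large-sample approximation.
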
6.2.2.3.4 Fisher’s exact test with correction
  • General: tests if there is a nonrandom association between two categorical variables in a contingency table
  • P-value correction: Bonferroni or Holm (see Section 6.2.1.4.2.1)
  • Test statistic: odds ratio
  • Interpretation: if significant, control and treatment frequencies differ
  • Original publication: Fisher (1922)
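
For a 2x2 table, the two-sided p-value of Fisher's exact test can be obtained from the hypergeometric distribution by summing the probabilities of all tables with the same margins that are no more likely than the observed one. The Python sketch below illustrates this idea; the function name and counts are hypothetical, and the small relative tolerance for comparing probabilities is an assumption of this sketch.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Tolerance guards against floating-point noise (assumption)
    total = sum(prob(x) for x in range(lowest, highest + 1)
                if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, total)

# Hypothetical counts: rows are control and treatment, columns alive and dead
print(fisher_exact_two_sided(8, 2, 1, 5))
```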