A general modular framework for gene set enrichment analysis

The transformation of the gene level statistic has a substantial impact
Transformations help to find gene sets that contain up- and downregulated genes
Combination of square transformation and rank transformation shows the best overall performance
Binary transformation (i.e. using a cutpoint) and FDRs decrease the performance
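The effect described in the points above can be illustrated with a minimal sketch. This section does not spell out all five transformations, so only the three named here (square, rank, binary) are shown; applying the cutpoint to the absolute value, the cutpoint itself and the toy values are assumptions made for illustration.

```python
import numpy as np

def square(t):
    """Square transformation: discards the sign of each gene-level statistic."""
    return np.asarray(t, dtype=float) ** 2

def rank(t):
    """Rank transformation (1 = smallest); ties are broken by position."""
    t = np.asarray(t, dtype=float)
    ranks = np.empty(len(t), dtype=float)
    ranks[np.argsort(t, kind="stable")] = np.arange(1, len(t) + 1)
    return ranks

def binary(t, cutpoint):
    """Binary transformation: 1 if |t| exceeds the cutpoint, else 0.
    Using the absolute value is an assumption of this sketch."""
    return (np.abs(np.asarray(t, dtype=float)) > cutpoint).astype(float)

# Toy gene set with two up- and two downregulated genes (made-up values).
t_set = np.array([3.0, -3.0, 2.5, -2.5])
print(t_set.mean())          # signs cancel in the untransformed mean
print(square(t_set).mean())  # the squared statistics keep the signal
print(rank(square(t_set)))   # square followed by rank
```

In the toy set the untransformed mean is zero although every gene is regulated, which is why a sign-removing transformation helps for sets with mixed regulation.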

Outcomes O1 and O2 are presented as Table 2 in the original publication.

3.3 Outcome O3: Gene set statistics

"mean and the maxmean statistic produce ... overall very good results"
"median and the Wilcoxon test are primarily advantageous if the competitive null hypothesis is tested, or if there are many outliers in the data"
"conditional FDR ... vary strongly with the choice of the gene-level statistic, transformation and permutation approach."
The ES score showed a rather weak performance

Outcomes O3 are presented as Table 3 in the original publication.

3.4 Outcome O4: Significance assessment

The parametric approach has the best power but is overoptimistic if the assumption of statistical independence is violated
Permutation seems to slightly outperform resampling
"restandardization procedure performs very similar to resampling"

Outcomes O4 are presented as Table 4 in the original publication.

3.5 Outcome O5: Global approaches

The performance of the globaltest procedure "is not better than that of the less sophisticated univariate methods" but "is computationally a little bit faster".
For Hotelling's T2-test:
- an "overall poor" performance was obtained
- "the uncorrelated sets are found with the same reliability as with univariate approaches. However, ... the sets with correlation ... are hardly detected."
- shows "improved performance with sample label permutation as opposed to gene sampling."

Outcomes O5 are presented in Table 5 for the globaltest and in Table 6 for Hotelling's T2-test in the original publication.

3.6 Further outcomes

4 Study design and evidence level

4.1 General aspects

100 data sets were simulated
The simulated data sets have 600 features (genes) and 20 samples (10 vs. 10)
The data were simulated with normally distributed noise with variance equal to one
520 genes were considered uninformative (delta=0, rho=0)
Altogether, nine different simulation data sets were generated that consist of the following combinations:
- Gene sets with different levels of differential expression (delta \in {0, 0.75, 1, -1}) were simulated
- Gene sets with varying levels of intra-group correlation (rho \in {0, 0.6, -0.6}) were simulated
- Gene sets that contain regulated and unregulated genes (half/half) were generated, as well as gene sets that contain up- and downregulated genes.
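The simulation of a single gene set can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the shared-factor construction and the gene set size are assumptions. It covers only 0 <= rho < 1, because this section does not describe how the negative correlation (rho = -0.6) was constructed.

```python
import numpy as np

def simulate_gene_set(n_genes, n_per_group=10, delta=0.0, rho=0.0, rng=None):
    """Return an (n_genes x 2*n_per_group) matrix with unit-variance normal noise.

    delta: mean shift added to the second group (differential expression).
    rho:   pairwise correlation between the genes of the set, 0 <= rho < 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    if not 0.0 <= rho < 1.0:
        raise ValueError("this sketch only supports 0 <= rho < 1")
    n_samples = 2 * n_per_group
    # A factor shared by all genes induces correlation rho between any
    # two genes while keeping the variance of every gene equal to one.
    shared = rng.standard_normal(n_samples)
    noise = rng.standard_normal((n_genes, n_samples))
    x = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * noise
    x[:, n_per_group:] += delta
    return x

rng = np.random.default_rng(0)
uninformative = simulate_gene_set(520, rng=rng)           # delta=0, rho=0
regulated = simulate_gene_set(20, delta=1.0, rho=0.6, rng=rng)
print(uninformative.shape, regulated.shape)
```

The set size of 20 in the usage lines is a made-up value; only the 520 uninformative genes and the 10 vs. 10 design are taken from the description above.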
"The gene set statistic ES was not combined with a binary transformation since the latter does not allow a sensible ranking of the genes."
In total
- 3 gene level statistics ×
- 5 transformations ×
- 6 gene set statistics ×
- 3 significance assessments
- minus 9 insensible combinations
- = 261 (in total) variants of gene set analyses were considered
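The count can be checked by enumerating the combinations. In the sketch below, the three gene level statistics and the six gene set statistics are the ones named on this page; the labels for the remaining transformations and for the significance assessments are placeholders, since only their number matters for the arithmetic.

```python
from itertools import product

gene_level = ["t", "moderated t", "correlation"]
# Only "binary" matters for the exclusion rule; the other labels are placeholders.
transformations = ["binary", "transformation 2", "transformation 3",
                   "transformation 4", "transformation 5"]
gene_set = ["mean", "maxmean", "median", "ES", "conditional FDR", "Wilcoxon"]
significance = ["assessment 1", "assessment 2", "assessment 3"]  # placeholders

all_variants = list(product(gene_level, transformations, gene_set, significance))

# ES is not combined with the binary transformation (see the quote above).
sensible = [v for v in all_variants if not (v[1] == "binary" and v[2] == "ES")]

print(len(all_variants))                  # 3 * 5 * 6 * 3
print(len(all_variants) - len(sensible))  # excluded combinations
print(len(sensible))                      # variants actually considered
```

The nine excluded variants are the 3 gene level statistics × 3 significance assessments in which ES would meet the binary transformation.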
The authors count how frequently the p-values that assess significance at the gene-set level fall below the significance level of 0.05
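As a minimal sketch of this evaluation criterion, the fraction of simulation runs with a gene-set p-value below the significance level can be computed as shown here. The function name and the example p-values are made up for illustration.

```python
import numpy as np

def detection_rate(p_values, alpha=0.05):
    """Fraction of simulated data sets whose gene-set p-value is below alpha."""
    p = np.asarray(p_values, dtype=float)
    return float(np.mean(p < alpha))

# Made-up p-values of one gene set over four simulated data sets.
print(detection_rate([0.01, 0.20, 0.049, 0.05]))
```

For a truly regulated gene set this rate estimates the power of a variant; for an unregulated set it estimates the false positive rate.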

4.2 Design for Outcome O1: Gene level statistics

The authors consider the impact of the approach selected for module 1 (see summary above)
Three approaches were considered: t, moderated t and correlation
These approaches were evaluated for five different transformations (see O2)

Multiple other approaches
The authors already provide the important hint that the dependency on the gene level test statistic might be more relevant for smaller sample size (e.g. 3 vs 3)

4.3 Design for Outcome O2: Transformation of the gene level statistics

The outcome was generated for five different transformations (and three gene level statistics)

4.4 Design for Outcome O3: Gene set statistics

Six gene set statistics were investigated:
- mean
- maxmean
- median
- ES
- conditional FDR
- Wilcoxon
These analyses were performed for the moderated t statistic (gene level) using the quadratic transformation. For significance assessment, resampling was applied.
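Three of the listed gene set statistics are simple enough to sketch directly. The maxmean definition below follows the commonly used form (the larger of the averaged positive parts and the averaged negative parts, both divided by the full set size) and is an assumption about the exact variant used in the paper; ES, conditional FDR and Wilcoxon are not covered.

```python
import numpy as np

def set_mean(z):
    """Mean of the (transformed) gene-level statistics in the set."""
    return float(np.mean(z))

def set_median(z):
    """Median of the (transformed) gene-level statistics in the set."""
    return float(np.median(z))

def set_maxmean(z):
    """Maxmean: larger of the mean positive part and the mean negative part."""
    z = np.asarray(z, dtype=float)
    positive_part = np.mean(np.maximum(z, 0.0))
    negative_part = np.mean(np.maximum(-z, 0.0))
    return float(max(positive_part, negative_part))

# Made-up gene-level statistics of one gene set.
z = np.array([2.0, -1.0, 4.0, -3.0])
print(set_mean(z), set_median(z), set_maxmean(z))
# After the quadratic transformation all values are non-negative,
# so mean and maxmean coincide.
print(set_mean(z ** 2), set_maxmean(z ** 2))
```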

4.5 Design for Outcome O4: Significance assessment

Four different approaches for assessing significance at the gene set level were evaluated:
- parametric
- resampling
- permutation
- restandardization
This analysis was performed by using the moderated t as the gene level statistic in combination with a quadratic transformation and the mean as the gene set statistic
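The difference between resampling genes and permuting sample labels can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: an ordinary pooled-variance t statistic stands in for the moderated t, and all function names and the number of draws are hypothetical. The parametric and restandardization approaches are not shown.

```python
import numpy as np

def t_statistics(x, labels):
    """Pooled-variance two-sample t statistic per gene (rows of x)."""
    a, b = x[:, labels == 0], x[:, labels == 1]
    na, nb = a.shape[1], b.shape[1]
    pooled = ((na - 1) * a.var(axis=1, ddof=1)
              + (nb - 1) * b.var(axis=1, ddof=1)) / (na + nb - 2)
    return (b.mean(axis=1) - a.mean(axis=1)) / np.sqrt(pooled * (1 / na + 1 / nb))

def set_score(x, labels, set_idx):
    """Mean of the squared gene-level statistics within the gene set."""
    return float(np.mean(t_statistics(x, labels)[set_idx] ** 2))

def gene_sampling_pvalue(x, labels, set_idx, n_draws=1000, rng=None):
    """Null distribution from random gene sets of the same size."""
    rng = np.random.default_rng() if rng is None else rng
    scores = t_statistics(x, labels) ** 2
    observed = scores[set_idx].mean()
    null = np.array([
        scores[rng.choice(len(scores), size=len(set_idx), replace=False)].mean()
        for _ in range(n_draws)
    ])
    return float((1 + np.sum(null >= observed)) / (1 + n_draws))

def label_permutation_pvalue(x, labels, set_idx, n_draws=1000, rng=None):
    """Null distribution from permuted sample labels (gene correlation is kept)."""
    rng = np.random.default_rng() if rng is None else rng
    observed = set_score(x, labels, set_idx)
    null = np.array([
        set_score(x, rng.permutation(labels), set_idx) for _ in range(n_draws)
    ])
    return float((1 + np.sum(null >= observed)) / (1 + n_draws))
```

Gene sampling compares the set against other genes of the same data set, whereas label permutation recomputes the statistic on relabeled samples and therefore preserves the correlation structure between genes.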

4.6 Design for Outcome O5: Global approaches

The globaltest and Hotelling's T2-test with a shrinkage covariance matrix were considered
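A two-sample Hotelling's T2 statistic with a shrunken covariance matrix can be sketched as follows. The page only states that a shrinkage covariance matrix was used; shrinking the pooled covariance towards its diagonal with a fixed intensity `lam` is an assumption of this sketch, and the function name is hypothetical.

```python
import numpy as np

def hotelling_t2_shrunk(x1, x2, lam=0.2):
    """Two-sample Hotelling's T2 for one gene set.

    x1, x2: arrays of shape (samples, genes) for the two groups.
    lam:    fixed shrinkage intensity in (0, 1]; an assumption of this sketch.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2 = x1.shape[0], x2.shape[0]
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    pooled = ((n1 - 1) * np.cov(x1, rowvar=False)
              + (n2 - 1) * np.cov(x2, rowvar=False)) / (n1 + n2 - 2)
    pooled = np.atleast_2d(pooled)
    # Shrinking towards the diagonal keeps the matrix invertible even
    # when the set contains more genes than there are samples.
    shrunk = (1.0 - lam) * pooled + lam * np.diag(np.diag(pooled))
    return float(n1 * n2 / (n1 + n2) * diff @ np.linalg.solve(shrunk, diff))

rng = np.random.default_rng(0)
group1 = rng.standard_normal((10, 5))
group2 = rng.standard_normal((10, 5)) + 1.0  # shifted group, made-up data
print(hotelling_t2_shrunk(group1, group2))
```

Significance of the resulting statistic would still have to be assessed, for example by sample label permutation as noted under Outcome O5.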

5 Further comments and aspects

Simulation is NOT based on characteristics or gene sets derived from real data
The paper provides very comprehensive outcomes in terms of combinations of approaches
After the paper was published, another type of gene set statistic appeared that is based on the Kolmogorov-Smirnov test. This approach is applied, e.g., in GSEA.
