A general modular framework for gene set enrichment analysis

Revision as of 14:43, 25 February 2020 by Ckreutz

1 Citation

M Ackermann and K Strimmer, A general modular framework for gene set enrichment analysis, 2009, BMC Bioinformatics, 10:47.

2 Summary

Gene set analyses have a modular structure, i.e. they consist of the following steps:

  1. gene level statistics
  2. gene level significance assessment
  3. gene set statistics
  4. gene set significance assessment
  5. statistical conclusion

Alternatively, steps 1-3 can be replaced by a single global test.
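
As a rough illustration of how such modules chain together, the following Python sketch runs one possible variant on toy data. It is not the authors' implementation: the function names, the mean of squared t statistics as gene set statistic, the sample-label permutation, the 5% threshold, and the toy data are all assumptions made for illustration. The sketch skips the gene level significance assessment and applies a transformation to the gene level statistics instead (see Outcome O2).

```python
import numpy as np

def t_statistic(x, labels):
    # Gene level statistic: pooled-variance two-sample t, one value per row (gene).
    a, b = x[:, labels == 0], x[:, labels == 1]
    na, nb = a.shape[1], b.shape[1]
    pooled = ((na - 1) * a.var(axis=1, ddof=1)
              + (nb - 1) * b.var(axis=1, ddof=1)) / (na + nb - 2)
    return (b.mean(axis=1) - a.mean(axis=1)) / np.sqrt(pooled * (1 / na + 1 / nb))

def gene_set_test(x, labels, gene_set, transform=np.square, n_perm=200, seed=0):
    rng = np.random.default_rng(seed)

    def set_statistic(lab):
        # Gene set statistic: transformed gene level statistics averaged over the set.
        return transform(t_statistic(x, lab))[gene_set].mean()

    observed = set_statistic(labels)
    # Gene set significance: permute the sample labels to build a null distribution.
    null = np.array([set_statistic(rng.permutation(labels)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p_value

# Toy data (assumed): 600 genes, 10 vs. 10 samples, unit-variance normal noise,
# with the first 20 genes shifted by 1 in the second group.
rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 10)
x = rng.normal(size=(600, 20))
x[:20, labels == 1] += 1.0

observed, p_value = gene_set_test(x, labels, gene_set=np.arange(20))
significant = p_value < 0.05  # statistical conclusion at an assumed 5% level
```

Swapping `t_statistic`, `transform`, the averaging step, or the permutation scheme for another choice gives a different variant of the procedure, which is the sense in which the analysis is modular.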

In this paper, 261 different variants of gene set enrichment procedures were evaluated based on simulated and experimental data.

3 Study outcomes

3.1 Outcome O1: Gene level statistics

  • The choice of the gene-level statistic (t, moderated t, or correlation) does NOT have a great impact
  • The t statistic, moderated t, and correlation fail to find gene sets that contain both up- and downregulated genes

Outcomes O1 and O2 are presented as Table 2 in the original publication.

3.2 Outcome O2: Transformation of the gene level statistics

  • The transformation has a substantial impact
  • Transformations help to find gene sets that contain both up- and downregulated genes
  • The combination of the square transformation and the rank transformation shows the best overall performance

Outcomes O1 and O2 are presented as Table 2 in the original publication.
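
Why a transformation helps with mixed gene sets can be seen from a small arithmetic example. The sketch below uses made-up t statistics (not values from the paper) and a simple mean as gene set statistic, which is an assumption for illustration: without a transformation, opposite signs cancel, whereas squaring removes the sign.

```python
import numpy as np

# Hypothetical gene level t statistics for a set of 10 genes:
# five upregulated and five downregulated by the same amount.
t_values = np.array([3.0] * 5 + [-3.0] * 5)

mean_raw = t_values.mean()                  # opposite signs cancel
mean_squared = np.square(t_values).mean()   # squaring removes the sign

print(mean_raw, mean_squared)  # prints: 0.0 9.0
```

The untransformed mean is indistinguishable from that of an unregulated set, while the mean of the squared statistics clearly is not.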


4 Study design and evidence level

4.1 General aspects

  • 100 data sets were simulated
  • The simulated data sets have 600 features (genes) and 20 samples (10 vs. 10)
  • The data were simulated with normally distributed noise with variance equal to one
  • 520 genes were considered uninformative (delta=0, rho=0)
  • Altogether, nine different simulation data sets were generated that consist of the following combinations:
    • Gene sets with different levels of differential expression (delta ∈ {0, 0.75, 1, -1}) were simulated
    • Gene sets with varying levels of intra-group correlation (rho ∈ {0, 0.6, -0.6}) were simulated
    • Gene sets that contain regulated and unregulated genes (half/half) were generated, as well as gene sets that contain up- and downregulated genes.
  • "The gene set statistic ES was not combined with a binary transformation since the latter does not allow a sensible ranking of the genes."
  • In total
    • 3 gene level statistics ×
    • 5 transformations ×
    • 6 gene set statistics ×
    • 3 significance assessments
    • minus 9 combinations that are not sensible (ES with the binary transformation: 3 gene level statistics × 3 significance assessments)
    • = 261 variants of gene set analyses were considered in total
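
The count can be checked by enumerating the combinations. In the sketch below, only the names that appear on this page are real (t, moderated t, correlation; binary, square, rank; ES); the remaining entries are placeholders because the page does not name them. Treating the three terms noted under 4.3 (resampling, permutation, restandardization) as the significance assessments is also an assumption.

```python
from itertools import product

gene_level_stats = ["t", "moderated t", "correlation"]
# "binary", "square" and "rank" are named on this page; the rest are placeholders.
transformations = ["binary", "square", "rank", "transformation 4", "transformation 5"]
# Only "ES" is named on this page; the rest are placeholders.
gene_set_stats = ["ES"] + [f"gene set statistic {i}" for i in range(2, 7)]
# Assumed from the note under 4.3.
significance = ["resampling", "permutation", "restandardization"]

all_combinations = list(
    product(gene_level_stats, transformations, gene_set_stats, significance)
)

# Drop ES combined with the binary transformation (no sensible gene ranking).
variants = [
    combo for combo in all_combinations
    if not (combo[1] == "binary" and combo[2] == "ES")
]

print(len(all_combinations), len(variants))  # prints: 270 261
```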


4.2 Design for Outcome O1: Gene level statistics

  • The authors consider the impact of the approach selected for module 1 (see summary above)
  • Three approaches were considered: t, moderated t and correlation
  • These approaches were evaluated for five different transformations (see O2)
  • Multiple other approaches
  • The authors note that the dependency on the gene level test statistic might be more relevant for smaller sample sizes (e.g. 3 vs. 3)

4.3 Design for Outcome O2: Transformation of the gene level statistics

  • The outcome was generated for five different transformations (and three gene level statistics)

(resampling, permutation, restandardization)

5 Further comments and aspects

  • The simulation is NOT based on characteristics or gene sets derived from real data
  • The paper provides very comprehensive outcomes in terms of the combinations of approaches considered

