Gene set analysis methods: a systematic comparison
1 Citation
Mathur, R., Rotroff, D., Ma, J., Shojaie, A., & Motsinger-Reif, A. (2018). Gene set analysis methods: a systematic comparison. BioData Mining, 11(1), 8.
2 Summary
Approaches for gene set analysis were assessed using simulated data generated from real experimental data sets.
- The authors compared four different methods:
  - Gene Set Enrichment Analysis (GSEA)
  - Significance Analysis of Function and Expression (SAFE)
  - sigPathway
  - Correlation Adjusted Mean Rank (CAMERA)
3 Study outcomes
3.1 Outcome O1: False positives under null distribution
The frequency of false positives was assessed at a significance level of alpha = 0.05. Accordingly, all approaches except FET-1k showed a false-positive rate of around 5% or less. FET-1k ("FET global statistic in SAFE") had a rate of around 20%.
Outcome O1 is presented as Figure 2 in the original publication for the prostate data template and in the "Additional File 1" for the other templates.
The bottom line of this outcome is that all approaches except FET-1k perform similarly well in terms of false positives.
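To make the criterion behind O1 concrete, the following is a minimal sketch of how an empirical false-positive rate can be computed from p-values obtained under a null simulation. The function name `false_positive_rate` and the uniformly distributed example p-values are assumptions for illustration, not taken from the publication.

```python
import random


def false_positive_rate(p_values, alpha=0.05):
    """Fraction of null p-values at or below alpha."""
    if not p_values:
        raise ValueError("p_values must not be empty")
    return sum(p <= alpha for p in p_values) / len(p_values)


# Hypothetical null p-values: a well-calibrated test yields
# (approximately) uniform p-values when there is no regulation.
rng = random.Random(0)
null_p = [rng.random() for _ in range(10_000)]

fpr = false_positive_rate(null_p)
print(f"empirical false-positive rate: {fpr:.3f}")
```

A method whose rate lies clearly above alpha, as reported for FET-1k, rejects the null more often than its nominal level allows.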
3.2 Outcome O2
- sigPathway showed superior performance
- SAFE-Wilcoxon could NOT detect the differentially regulated pathway(s).
- In general, the performance increases with increasing fraction of regulated genes (parameter pi in the paper), except for "Comp GSEA Q" that shows counterintuitive performance.
Outcome O2 is presented as Figure 3 in the original publication; the numbers are provided in the supplement.
3.3 Outcome O3
- SAFE again performs weakly for most configurations
- Only "aveDiff-boot" seems to have good power, which improves with increasing magnitude tau of regulation
- FET-1k and FET-10k could identify the regulated pathway but show counterintuitive performance (i.e. decreasing performance with increasing magnitude of regulation)
Outcome O3 is presented as Figure 4 in the original publication.
3.4 Outcome O4
- Comp-GSEA-FDR and Self-GSEA-FDR showed superior performance
- Comp-GSEA-Q and Self-GSEA-Q showed counterintuitive performance, i.e. the performance decreases with increasing effect size tau
4 Study design and evidence level
4.1 General aspects
- The authors consider different sizes of the gene sets
- The authors consider different proportions of regulated genes in the gene sets
- The authors consider different magnitudes of the underlying effect size (i.e. log-fold-changes)
- The authors consider three null simulations (without regulation) as reference for outcome O1
- In this publication, the authors introduced a novel simulation approach termed FANGS
- The simulation approach is available as an R package (FANGS), which offers the opportunity to reproduce the simulations and repeat the analysis for other gene set methods.
- The authors provide a comprehensive list of the used configuration parameters
- The authors evaluated the following alternative configurations:
  - For GSEA one alternative
  - For SAFE five alternative setups
  - For sigPathway and CAMERA no other configurations were considered
- Three experimental data sets were used as foundations for simulating data:
  - prostate cancer (264 cases, 160 controls)
  - ischemic stroke (20 cases, 20 controls)
  - normal brain tissue (21 cases, 20 controls)
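The roles of gene set size, the proportion pi of regulated genes, and the effect magnitude tau listed above can be illustrated with a small spike-in sketch. This is not the FANGS implementation. The function `spike_in_effect`, the gene names, and the all-zero toy expression values are hypothetical and only show how the three parameters interact.

```python
import random


def spike_in_effect(expr, labels, gene_set, pi, tau, rng, case_label="case"):
    """Add a log-fold-change tau to a fraction pi of the genes in gene_set.

    expr maps gene name -> list of log-expression values (one per sample).
    Only samples labelled case_label are shifted; controls stay untouched.
    Returns the modified copy and the set of regulated genes.
    """
    n_regulated = round(pi * len(gene_set))
    regulated = rng.sample(sorted(gene_set), n_regulated)
    out = {gene: list(values) for gene, values in expr.items()}
    for gene in regulated:
        out[gene] = [
            value + tau if label == case_label else value
            for value, label in zip(out[gene], labels)
        ]
    return out, set(regulated)


# Toy input (assumed): 20 genes, 4 samples, a gene set of size 10.
rng = random.Random(0)
labels = ["case", "case", "control", "control"]
expr = {f"g{i}": [0.0, 0.0, 0.0, 0.0] for i in range(20)}
gene_set = {f"g{i}" for i in range(10)}

sim, regulated = spike_in_effect(expr, labels, gene_set, pi=0.5, tau=1.0, rng=rng)
print(sorted(regulated))
```

Varying the size of `gene_set`, `pi`, and `tau` independently corresponds to the three design dimensions named in the list.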
4.2 Design for Outcome O1
- The authors consider three null simulations (without regulation) as reference:
  - permutation of class labels
  - independently sampled expression of all features (=genes)
  - centering the simulated data, i.e. set effect size to zero
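One possible reading of the three null simulations above is sketched below. The details are assumptions rather than the authors' procedure: in particular, "independently sampled" is interpreted here as resampling each gene on its own with replacement, and "centering" as subtracting each gene's class mean. All function names are hypothetical.

```python
import random
from statistics import mean


def permute_labels(labels, rng):
    """Null 1: shuffle class labels, breaking any link to expression."""
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


def resample_genes_independently(expr, rng):
    """Null 2 (assumed reading): resample each gene's values on its own,
    with replacement, so neither class differences nor gene-gene
    correlation are preserved."""
    return [[rng.choice(row) for _ in row] for row in expr]


def center_by_class(expr, labels):
    """Null 3 (assumed reading): subtract each gene's class mean,
    so the difference between class means is exactly zero."""
    centered = []
    for row in expr:
        class_means = {
            c: mean(v for v, lab in zip(row, labels) if lab == c)
            for c in set(labels)
        }
        centered.append([v - class_means[lab] for v, lab in zip(row, labels)])
    return centered


# Toy data (assumed): rows are genes, columns are samples.
rng = random.Random(1)
labels = ["case"] * 3 + ["control"] * 3
expr = [
    [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],  # gene with a class difference
    [2.0, 2.5, 3.0, 2.0, 2.5, 3.0],  # gene without a class difference
]

shuffled_labels = permute_labels(labels, rng)
resampled = resample_genes_independently(expr, rng)
centered = center_by_class(expr, labels)
print(centered[0])
```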
- Default configuration parameters and the alternative parameters described above were evaluated
- Only the prostate cancer data set was considered as template for simulations
4.3 Design for Outcome O2
- The outcome was generated by simulating differential expression of one pathway
- The analysis was repeated for all three data sets as template
- For each of the three data sets the analysis was repeated by selecting two different pathways as differentially regulated.
- In total, six analyses were performed (3 data sets x 2 regulated pathways)
- Default configuration parameters were chosen
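The resulting design grid for O2 can be enumerated in a few lines. The data set names come from the list in Section 4.1. The pathway labels are placeholders, because this summary does not state which pathways were selected, and they may differ between data sets.

```python
from itertools import product

datasets = ["prostate cancer", "ischemic stroke", "normal brain tissue"]
# Placeholder slots, not actual pathway names.
pathway_slots = ["first regulated pathway", "second regulated pathway"]

analyses = list(product(datasets, pathway_slots))
for dataset, pathway in analyses:
    print(f"{dataset}: {pathway}")
print(f"total analyses: {len(analyses)}")
```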
4.4 Design for Outcome O3
- The weak performance of SAFE with the default configuration in O2 appears to have motivated the investigation of other SAFE configurations
- Outcome O3 was generated for only one data set (prostate cancer) and two regulated pathways