The effects of nonignorable missing data on label-free mass spectrometry proteomics experiments
Citation
O'Brien JJ, Gunawardena HP, Paulo JA, Chen X, Ibrahim JG, Gygi SP, Qaqish BF (2018). The effects of nonignorable missing data on label-free mass spectrometry proteomics experiments. Ann Appl Stat. 12(4):2075-95. doi: 10.1214/18-AOAS1144
Summary
This paper analyzes how missing data bias the estimation of parameter contrasts and introduces a Bayesian selection model to reduce this bias and recover interblock information. The proposed model is compared with other imputation strategies as well as with complete-case analyses.
Study outcomes
The introduced selection model for proteomics (SMP) tries to capture the missing data mechanisms of the specific dataset.
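As a generic illustration of what a selection model is (not the paper's exact specification), the joint distribution of a log-intensity $y$ and its response indicator $r$ ($r = 1$ if observed) is factorized into an outcome model and a missingness mechanism that depends on $y$. The logistic form and the symbols $\theta$, $\alpha_0$, $\alpha_1$ are assumptions chosen for this sketch:

```latex
\begin{align}
f(y, r \mid \theta, \alpha) &= f(y \mid \theta)\, f(r \mid y, \alpha), \\
\Pr(r = 1 \mid y, \alpha) &= \frac{1}{1 + \exp\{-(\alpha_0 + \alpha_1 y)\}} .
\end{align}
```

With $\alpha_1 \neq 0$ the probability of observing a value depends on the possibly unobserved value itself, so the missingness is nonignorable; $\alpha_1 = 0$ reduces to data missing completely at random.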
Outcome O1
The SMP model improves accuracy, depth of discovery and interval coverage (Figures 1, 2, 3).
Outcome O2
The mixed model and two-way ANOVA, which rely on intrablock estimation, outperform the one-way ANOVA and the other imputation methods (Min, Mean, SVD, KNN), which rely on interblock information, on all datasets (Figures 1, 2, 3).
Further outcomes
Missing data lead to biased estimates of the contrasts between conditions.
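The following minimal simulation sketches how intensity-dependent missingness can bias a complete-case contrast. The setup is hypothetical and not taken from the paper: normal log-intensities, a true contrast of 1.0, and a logistic observation probability with slope 2.0 are all assumptions made for illustration.

```python
import numpy as np

def simulate_contrast_bias(true_log_fc=1.0, n=20000, seed=0):
    """Return the complete-case contrast estimate and the true contrast.

    Hypothetical setup: log-intensities are normal, and the chance of
    observing a value rises with its intensity (logistic in y).
    """
    rng = np.random.default_rng(seed)
    control = rng.normal(loc=0.0, scale=1.0, size=n)
    treated = rng.normal(loc=true_log_fc, scale=1.0, size=n)

    def observed(y):
        # Low intensities are more likely to go missing (assumed mechanism).
        p_obs = 1.0 / (1.0 + np.exp(-2.0 * y))
        return y[rng.random(y.size) < p_obs]

    # Complete-case analysis: average only the values that were observed.
    estimate = observed(treated).mean() - observed(control).mean()
    return estimate, true_log_fc

estimate, truth = simulate_contrast_bias()
print(f"complete-case contrast: {estimate:.2f}, true contrast: {truth:.2f}")
```

In this setup the control condition loses more of its low values than the treated condition does, so the complete-case estimate is pulled toward zero relative to the true contrast.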
Study design and evidence level
General aspects
Imputation performance is analyzed separately for estimable and inestimable protein contrasts.
Nine methods are compared: SMP, one-way and two-way ANOVA, mean, column minimum, peptide minimum, SVD, KNN and a mixed model. Most of these are quite simple models.
Accuracy as well as interval coverage are assessed.
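To make the simplest imputation baselines concrete, here is a minimal sketch of column-minimum and mean imputation on a peptide-by-sample matrix. The function names, the toy values and the choice of axis (column = sample, row = peptide) are assumptions for illustration; the paper's exact definitions may differ.

```python
import numpy as np

def impute_column_min(x):
    """Replace each NaN with the minimum observed value of its column (sample)."""
    filled = x.copy()
    col_min = np.nanmin(x, axis=0)
    rows, cols = np.where(np.isnan(x))
    filled[rows, cols] = col_min[cols]
    return filled

def impute_row_mean(x):
    """Replace each NaN with the mean of the observed values in its row (peptide)."""
    filled = x.copy()
    row_mean = np.nanmean(x, axis=1)
    rows, cols = np.where(np.isnan(x))
    filled[rows, cols] = row_mean[rows]
    return filled

# Toy log-intensity matrix: rows are peptides, columns are samples (hypothetical values).
log_intensity = np.array([
    [20.0, 21.0, np.nan],
    [18.0, np.nan, 19.0],
    [22.0, 23.0, 24.0],
])
print(impute_column_min(log_intensity))
print(impute_row_mean(log_intensity))
```

Both functions are single-imputation rules: every missing entry is replaced by one fixed number, so downstream interval estimates treat imputed values as if they had been measured.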
Further comments and aspects
The data simulation favors the SMP model.
References
Model similar to:
Luo R, Colangelo CM, Sessa WC, Zhao H (2009). Bayesian Analysis of iTRAQ Data with Nonrandom Missingness: Identification of Differentially Expressed Proteins. Stat Biosci. 1(2):228-45.