Patrik Edén

Senior lecturer

A community effort to identify and correct mislabeled samples in proteogenomic studies

Author

Seungyeul Yoo
Zhiao Shi
Bo Wen
SoonJye Kho
Renke Pan
Hanying Feng
Hong Chen
Anders Carlsson
Patrik Edén
Weiping Ma
Michael Raymer
Ezekiel J. Maier
Zivana Tezak
Elaine Johansson
Denise Hinton
Henry Rodriguez
Jun Zhu
Emily Boja
Pei Wang
Bing Zhang

Summary, in English

Sample mislabeling or misannotation has been a long-standing problem in scientific research, and it is particularly prevalent in large-scale, multi-omic studies because of the complexity of multi-omic workflows. There is an urgent need for quality controls that automatically screen for and correct sample mislabels or misannotations in multi-omic studies. Here, we describe the crowdsourced precisionFDA NCI-CPTAC Multi-omics Enabled Sample Mislabeling Correction Challenge, which provides a framework for systematic benchmarking and evaluation of mislabel identification and correction methods for integrative proteogenomic studies. The challenge received a large number of submissions from domestic and international data scientists, with highly variable performance observed across the submitted methods. Post-challenge collaboration between the top-performing teams and the challenge organizers produced COSMO, an open-source software tool with demonstrated high accuracy and robustness in mislabeling identification and correction in simulated and real multi-omic datasets.

Department/s

Computational Biology and Biological Physics - Has been reorganised
Computational Science for Health and Environment
Centre for Environmental and Climate Science (CEC)

Publishing year

2021-05-14

Language

English

Publication/Series

Patterns

Document type

Journal article

Publisher

Cell Press

Topic

Bioinformatics and Systems Biology

Keywords

proteomics, genomics, mislabeling

Status

Published

Research group

Computational Science for Health and Environment

ISBN/ISSN/Other

ISSN: 2666-3899
