Patrik Edén

Senior lecturer

A community effort to identify and correct mislabeled samples in proteogenomic studies

Authors

  • Seungyeul Yoo
  • Zhiao Shi
  • Bo Wen
  • SoonJye Kho
  • Renke Pan
  • Hanying Feng
  • Hong Chen
  • Anders Carlsson
  • Patrik Edén
  • Weiping Ma
  • Michael Raymer
  • Ezekiel J. Maier
  • Zivana Tezak
  • Elaine Johansson
  • Denise Hinton
  • Henry Rodriguez
  • Jun Zhu
  • Emily Boja
  • Pei Wang
  • Bing Zhang

Summary, in English

Sample mislabeling or misannotation has been a long-standing problem in scientific research, particularly prevalent in large-scale, multi-omic studies due to the complexity of multi-omic workflows. There exists an urgent need for implementing quality controls to automatically screen for and correct sample mislabels or misannotations in multi-omic studies. Here, we describe a crowdsourced precisionFDA NCI-CPTAC Multi-omics Enabled Sample Mislabeling Correction Challenge, which provides a framework for systematic benchmarking and evaluation of mislabel identification and correction methods for integrative proteogenomic studies. The challenge received a large number of submissions from domestic and international data scientists, with highly variable performance observed across the submitted methods. Post-challenge collaboration between the top-performing teams and the challenge organizers has created an open-source software, COSMO, with demonstrated high accuracy and robustness in mislabeling identification and correction in simulated and real multi-omic datasets.

Department/s

  • Computational Biology and Biological Physics - Has been reorganised
  • Computational Science for Health and Environment
  • Centre for Environmental and Climate Science (CEC)

Publishing year

2021-05-14

Language

English

Publication/Series

Patterns

Volume

2

Issue

5

Document type

Journal article

Publisher

Cell Press

Topic

  • Bioinformatics and Systems Biology

Keywords

  • proteomics
  • genomics
  • mislabeling

Status

Published

Research group

  • Computational Science for Health and Environment

ISBN/ISSN/Other

  • ISSN: 2666-3899