Feature Set Optimization in Biomarker Discovery From Genome Scale Data

Fortino, V; Scala, G; Greco, D

dc.contributor.author	Fortino, V
dc.contributor.author	Scala, G
dc.contributor.author	Greco, D
dc.date.accessioned	2020-06-15T12:10:27Z
dc.date.available	2020-06-15T12:10:27Z
dc.date.issued	2020
dc.identifier.uri	https://erepo.uef.fi/handle/123456789/8188
dc.description.abstract	Motivation: Omics technologies have the potential to facilitate the discovery of new biomarkers. However, only a few omics-derived biomarkers have been successfully translated into clinical applications to date. Feature selection is a crucial step in this process that identifies small sets of features with high predictive power. Models consisting of a limited number of features are not only more robust in analytical terms, but also ensure cost effectiveness and clinical translatability of new biomarker panels. Here we introduce GARBO, a novel multi-island adaptive genetic algorithm to simultaneously optimize accuracy and set size in omics-driven biomarker discovery problems. Results: Compared to existing methods, GARBO enables the identification of biomarker sets that best optimize the trade-off between classification accuracy and number of biomarkers. We tested GARBO and six alternative selection methods on two highly relevant topics in precision medicine: cancer patient stratification and drug sensitivity prediction. We found multivariate biomarker models from different omics data types such as mRNA, miRNA, copy number variation, mutation and DNA methylation. The top performing models were evaluated by using two different strategies: Pareto-based selection, and the weighted sum of accuracy and set size (w = 0.5). Pareto-based preferences show the ability of the proposed algorithm to search for minimal subsets of relevant features that can be used to build accurate random forest-based classification systems. Moreover, on larger omics data types, such as gene expression and DNA methylation, GARBO systematically identified biomarker panels that exhibit higher classification accuracy or employ far fewer features than those discovered with other methods. These results were confirmed on independent datasets.
dc.language.iso	English
dc.publisher	Oxford University Press (OUP)
dc.relation.ispartofseries	Bioinformatics
dc.relation.uri	http://dx.doi.org/10.1093/bioinformatics/btaa144
dc.rights	In copyright 1.0
dc.title	Feature Set Optimization in Biomarker Discovery From Genome Scale Data
dc.description.version	final draft
dc.contributor.department	School of Medicine / Biomedicine
uef.solecris.id	68978748	en
dc.type.publication	Scientific journal articles
dc.relation.doi	10.1093/bioinformatics/btaa144
dc.description.reviewstatus	peerReviewed
dc.format.pagerange	3393-3400
dc.relation.issn	1367-4803
dc.relation.issue	11
dc.relation.volume	36
dc.rights.accesslevel	openAccess
dc.type.okm	A1
uef.solecris.openaccess	No
dc.rights.copyright	© The Author(s) 2020
dc.type.displayType	article	en
dc.type.displayType	artikkeli	fi
dc.rights.url	https://rightsstatements.org/page/InC/1.0/

Files in this item

Name: 1592223187796876689.pdf
Size: 450.3Kb
Format: PDF
Description: Article

This item appears in the following Collection(s)

Terveystieteiden tiedekunta [1793]
Terveystieteiden tiedekunta / Faculty of Health Sciences
