SciPort RLP

Large scale protein sequence clustering - not solved but solvable

Current Bioinformatics. Vol. 1, No. 2, 2006, pp. 247-254

Year of publication: 2006

ISBN/ISSN: 1574-8936

Publication type: Journal article

Language: English

DOI/URN: 10.2174/157489306777011987


Verified by:

Library

Abstract

Protein sequence clustering is one of the oldest problems addressed in the field of computational biology. Back in the 1960s, when the first protein sequence database was published in printed form, Margaret Dayhoff defined the basic principles of this discipline with only a small number of sequences at hand. With up to a million sequences available in public databases nowadays and several well-known methods for automatically grouping proteins into families, subfamilies and superfamilies that are in some sense biologically meaningful, the problem seems to be satisfactorily solved. Nevertheless, apart from the problem of handling such a huge amount of data, several pitfalls have emerged since Dayhoff's time: databases fill up as fast as genomes are sequenced, and a great many of these sequences are fragmentary or disappear again once they are identified as transcripts of wrongly predicted genes or hypothetical products of pseudogenes. This article first reviews the different approaches developed during the last decades. These insights are then used to point out possible challenges that lie ahead.

Protein sequence, protein domain, protein family, protein sequence clustering, protein sequence database

Authors

Krause, Antje (Author)

Classification

DFG subject area:
4.43 - Computer Science

DDC subject group:
Natural sciences

Linked persons

Antje Krause
Professor
(Department 2 - Technology, Computer Science and Business)
