Comparative Genomics of Metabolic Pathways in Microbial Genomes

From Luo et al. (2011) in Microbial Population Genetics

Understanding regulatory mechanisms should make it possible to explore the engineering of pathways with predetermined expression patterns (i.e. expression that is activated by a given compound or under a specific environmental or physiological condition). Metabolic pathways have evolved to execute their functions efficiently while tolerating perturbations, such as changes in environmental parameters or in the physiological status of the cell. Below we describe some of the databases and programs available for integrated analyses of metabolic pathways.

KEGG
KEGG (Kyoto Encyclopedia of Genes and Genomes) is a database of biological systems that integrates genomic, chemical and systemic functional information. KEGG provides a reference knowledge base for linking genomes to life through the process of PATHWAY mapping. The PATHWAY database contains information about conserved sub-pathways (or pathway motifs), which are often encoded by positionally coupled genes on the chromosome and are especially useful for predicting gene functions. The genomic information is stored in the GENES database, a collection of gene catalogs for all completely sequenced genomes and some partial genomes, with up-to-date annotation of gene functions. A third database in KEGG is LIGAND, which includes information about chemical compounds, enzyme molecules and enzymatic reactions. In addition, KEGG provides a reference knowledge base for linking genomes to the environment, for example for the analysis of drug-target relationships, through the process of BRITE mapping. KEGG BRITE is an ontology database representing functional hierarchies of various biological objects, including molecules, cells, organisms, diseases and drugs, as well as the relationships among them. The KEGG resource is also being expanded to meet the needs of practical applications: KEGG DRUG contains all drugs approved in the US and Japan, and KEGG DISEASE is a new database linking disease genes, pathways, drugs and diagnostic markers.

KEGG provides Java graphics tools for browsing genome maps, comparing two genome maps and manipulating expression maps, as well as computational tools for sequence comparison, graph comparison and path computation.
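
As a rough illustration of what linking gene catalogs to pathways involves, the Python sketch below indexes a two-column, tab-separated gene-to-pathway table in both directions. The table layout, the identifiers (org:g0001, path:org00010 and so on) and the function names parse_links and shared_pathways are assumptions made for this example; they are not taken from KEGG and are not meant to describe how KEGG itself stores or serves its data.

```python
from collections import defaultdict

# Illustrative two-column, tab-separated gene-to-pathway table.
# The identifiers are placeholders in a KEGG-like style, not real records.
SAMPLE_LINKS = """\
org:g0001\tpath:org00010
org:g0002\tpath:org00010
org:g0002\tpath:org00020
org:g0003\tpath:org00020
"""


def parse_links(text):
    """Return (gene -> pathways, pathway -> genes) from a two-column table."""
    gene_to_pathways = defaultdict(set)
    pathway_to_genes = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        gene, pathway = line.split("\t")
        gene_to_pathways[gene].add(pathway)
        pathway_to_genes[pathway].add(gene)
    return dict(gene_to_pathways), dict(pathway_to_genes)


def shared_pathways(gene_to_pathways, gene_a, gene_b):
    """Pathways that contain both genes (empty set if there are none)."""
    return gene_to_pathways.get(gene_a, set()) & gene_to_pathways.get(gene_b, set())


if __name__ == "__main__":
    g2p, p2g = parse_links(SAMPLE_LINKS)
    for pathway in sorted(p2g):
        print(pathway, sorted(p2g[pathway]))
```

Keeping both directions of the mapping makes the two most common questions cheap to answer: which pathways a gene takes part in, and which genes of a genome map onto a given pathway.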

BioCyc
The BioCyc database collection is a set of 160 pathway/genome databases (PGDBs) for most eukaryotic and prokaryotic species whose genomes have been completely sequenced to date. The PGDBs offer a wealth of genomic and metabolic information on individual organisms, including P. aeruginosa and S. cerevisiae. Each database provides information on an organism's annotated genome, on the biochemical reaction(s) that each gene product catalyses and on the organism's metabolic pathways, which are predicted from the annotated genome by a program called PathoLogic. The information in each database is comprehensive and complex. In addition, each bacterial PGDB includes predicted operons for the corresponding species. The BioCyc collection provides a unique resource for computational systems biology, namely global and comparative analyses of genomes and metabolic networks, and serves as a supplement to the curated PGDBs in BioCyc. The Omics viewer available through the BioCyc website allows scientists to visualize combinations of gene expression, proteomics and metabolomics data on the metabolic maps of these organisms.

MetaCyc
MetaCyc is a universal database of metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are curated from the primary scientific literature, and the small-molecule metabolic pathways are experimentally determined. Each reaction in a MetaCyc pathway is annotated with one or more well-characterized enzymes. Because MetaCyc contains only experimentally elucidated knowledge, it provides a uniquely high-quality resource for metabolic pathways and enzymes. MetaCyc stores pathways involved in both primary and secondary metabolism, together with the compounds, proteins, protein complexes and genes associated with these pathways. It is extensively linked to other biological databases containing protein and nucleic-acid sequence data, bibliographic data and protein structures. MetaCyc also contains objects for the genes that encode many of the enzymes within the database. While it does not contain primary sequence data, MetaCyc does contain links to external sequence databases.

EcoCyc
EcoCyc is a bioinformatics database that describes the genome and the biochemical machinery of Escherichia coli K-12 MG1655. The long-term goal of the project is to describe the molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists and for biologists who work with related microorganisms. EcoCyc contains the complete genome sequence of E. coli and describes the nucleotide position and function of every E. coli gene. The annotation of the E. coli K-12 genome in the EcoCyc database is one of the most accurate, complete and multidimensional genome annotations. EcoCyc information was derived from 15 000 publications. The database contains extensive descriptions of E. coli cellular networks, describing its metabolic, transport and transcriptional regulatory processes. Database queries to EcoCyc can survey the global properties of E. coli cellular networks and illuminate the extent of information gaps for E. coli, such as dead-end metabolites. EcoCyc provides a genome browser with novel properties and a novel interactive display of transcriptional regulatory networks.
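
To make the idea of a dead-end metabolite concrete, the Python sketch below flags metabolites in a toy reaction network that are produced but never consumed, or consumed but never produced. This is a simplified working definition adopted for illustration only; the criteria EcoCyc actually applies (for example, how transport and cellular compartments are handled) are not given here. The reaction records, the metabolite names A to E and the function name find_dead_ends are all hypothetical.

```python
def find_dead_ends(reactions):
    """Report metabolites that lack a producing or a consuming reaction.

    Each reaction is a dict with "substrates", "products" and an optional
    "reversible" flag. A reversible reaction counts in both directions.
    """
    produced, consumed = set(), set()
    for rxn in reactions:
        consumed.update(rxn["substrates"])
        produced.update(rxn["products"])
        if rxn.get("reversible", False):
            # Read the reaction backwards as well.
            consumed.update(rxn["products"])
            produced.update(rxn["substrates"])
    return {
        "only_produced": sorted(produced - consumed),
        "only_consumed": sorted(consumed - produced),
    }


# Hypothetical toy network; names are placeholders, not real compounds.
TOY_NETWORK = [
    {"substrates": ["A"], "products": ["B"]},
    {"substrates": ["B"], "products": ["C"]},
    {"substrates": ["C"], "products": ["D"], "reversible": True},
    {"substrates": ["B"], "products": ["E"]},
]

if __name__ == "__main__":
    print(find_dead_ends(TOY_NETWORK))
```

In this toy network E is produced but never consumed, and A is consumed but never produced. In a real model a metabolite like A might simply be a nutrient supplied by transport, so such hits are candidates for review rather than confirmed gaps. Note also that D escapes detection here because a single reversible reaction both produces and consumes it; a stricter definition would need to count the distinct reactions a metabolite takes part in.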

Suggested reading:
1. Microbial Population Genetics
2. Genomics books