Chapter 1: Web Basics
Abstract
The exponential growth of the Internet and of nucleotide sequence data has changed the face of molecular biology. Wet labs in many research facilities are welcoming computational biologists and exchanging sterile hood space for computer server racks. With this new technology comes new terminology and more room for confusion. This chapter is a quick review of the fundamental Internet technologies that the book builds upon when describing the tools available for Internet research. One aspect of the technology, the client-server relationship, has defined the way that personal computers interact with remote computers around the world. Other technologies, such as the Hypertext Transfer Protocol (HTTP) and the File Transfer Protocol (FTP), define the standards for sharing files between remote computers. Combined, these technologies make the Internet and computational molecular biology possible.
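As a rough illustration of the client-server exchange that HTTP standardizes, the Python sketch below formats the text a client sends to request a page and reads the status line a server sends back. The function names and the host `example.org` are hypothetical and chosen only for this example; no network connection is made.

```python
from typing import Tuple


def build_get_request(host: str, path: str = "/") -> str:
    """Format a minimal HTTP/1.1 GET request as a client would send it."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, resource, protocol version
        f"Host: {host}",          # the Host header is required in HTTP/1.1
        "Connection: close",      # ask the server to close the connection afterwards
        "",                       # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_status_line(response: str) -> Tuple[str, int, str]:
    """Split the first line of a server response into version, code, and reason."""
    status_line = response.split("\r\n", 1)[0]
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


# Hypothetical exchange: the client asks for a page, the server answers "200 OK".
request = build_get_request("example.org", "/index.html")
version, code, reason = parse_status_line(
    "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
)
```

The client always initiates the exchange and the server only responds, which is the relationship the chapter describes.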
Chapter 2: Primary Nucleotide Sequence Databases
Abstract
Primary sequence databases are the cornerstone of bioinformatics research. Databases such as GenBank and EMBL accept genome data from sequencing projects around the world and make it available to researchers via the World Wide Web. The underlying organization of these databases has shaped the way computer-based molecular biology research is conducted both at these facilities and in related secondary databases. Understanding primary nucleotide sequence databases is key to understanding molecular biology on the web.
Chapter 3: Primary Protein Sequence Databases
Abstract
Primary protein sequence databases are to protein sequences what GenBank, EMBL, and DDBJ are to nucleotide sequences: the central location for protein sequence data submissions. PIR's Protein Sequence Database (PSD) and SWISS-PROT are the two main databases. Both provide a variety of ways to access data, as well as analysis tools to use once you have retrieved the sequence you were looking for. A detailed review of accessing data through PIR's Selection List is provided. Other databases mentioned are OWL, Entrez's Protein database, and the Peptide/Protein Sequence Database (PRF/SEQDB).
Chapter 4: Secondary Nucleotide Databases
Abstract
Secondary nucleotide databases pull specific types of data from the primary nucleotide databases (GenBank, EMBL, and DDBJ) to generate a subject-specific set of data. They offer extensive resources on the subject they cover, including background information, pertinent literature, and more thoroughly annotated sequences. These databases cover all realms of nucleotide sequence: uRNA, tmRNA, ESTs (Expressed Sequence Tags), STSs (Sequence Tagged Sites), plasmids, vectors, subviral RNAs, etc. Because of their small size, the sites themselves are not very difficult to navigate.
Chapter 5: Protein Classification Databases
Abstract
Classifying proteins organizes them with respect to similarity. Proteins are placed into groups of similar proteins, usually called protein families. Several online databases are dedicated to protein classification, because there are different methods of classification. The methods vary depending on how proteins are compared: sequence versus structure, global alignment versus local alignment, and manual efforts versus automated efforts. Some databases integrate multiple databases into their results, and others classify proteins based on their own method. Whatever the case, they are all slightly different, but all quite informative. Determining which family a protein belongs to can go a long way towards defining its function.
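To make the distinction between global and local alignment concrete, here is a minimal Python sketch that scores two sequences both ways with the same dynamic-programming table. It is an illustration only, not the method of any particular classification database: the scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions, and real tools use substitution matrices and more elaborate gap penalties.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-1):
    """Return the best alignment score of sequences a and b.

    local=False scores the sequences end to end (global alignment);
    local=True scores only the best-matching region (local alignment).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # In a global alignment, leading gaps are penalized.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # A local alignment may restart anywhere, so scores never drop below zero.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    return best if local else score[-1][-1]


# A short motif embedded in a longer sequence (made-up sequences).
global_score = alignment_score("WWWACDEWWW", "ACDE")
local_score = alignment_score("WWWACDEWWW", "ACDE", local=True)
```

With these made-up sequences the local score is higher than the global one, because the global alignment must pay for the unmatched flanking residues. This is one reason databases built on different comparison methods can group the same protein differently.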
Chapter 6: Molecular Structure Databases
Abstract
Sequence data holds only so much information, especially when it comes to proteins. Three-dimensional (3-D) protein structures represent the molecule as it functions in the cell. Knowing what a molecule looks like in its biologically active state is a powerful piece of information. Molecular structure databases are portals into the three-dimensional configurations of molecules. Structure databases, primarily concerned with proteins, are used for functional and evolutionary studies of molecules. Databases developed for structural studies of DNA and RNA also exist, though the amount of structural data they contain is minimal.
Chapter 7: Gene Function Databases: Enzymes, Interactions, Expression, and Pathways
Abstract
Identifying genes through sequencing efforts does not solve functional secrets. Knowing the function of a gene involves knowing where it fits in the complex system of interactions inside cells: what molecules it interacts with, what the products of those interactions are, and so on. Getting these answers involves many steps, including determining the function of individual genes by looking at the other molecules they interact with and piecing together individual interactions into a web of relations. The databases in this chapter fill voids at these steps to create a tangible model of how a cell operates. Molecular pathway databases (KEGG, WIT, etc.) present tangible models incorporating multiple interactions; databases of interactions (DIP, EMP, BRENDA, ENZYME, etc.) provide the pieces; and gene expression studies (Stanford Microarray Database, etc.) lead to clues in satisfying both quests. This chapter explains the databases involved in this effort.
Chapter 8: Genomics Centers
Abstract
Multiple genome analysis requires computing power and manpower beyond a typical lab's capabilities. Genomic centers have the resources to construct secondary genome databases for more than one genome. These genome libraries showcase the progress of the sequencing efforts and are another entry point into the primary sequence data. Genomic centers make use of genome map viewers and text-based browsers to view genome data comparatively.
Chapter 9: Genomes
Abstract
Genome sequencing projects are steadily finishing genome after genome. Currently, over 800 genomes are either being sequenced or have already been sequenced. Enough genomes are now complete that many websites have been created, each dedicated to a specific organism. The dedication results in something far greater than a collection of nucleotide sequences from one organism. This chapter analyzes nine different genomic groups, from human to plant to virus, pointing out the features that distinguish genome databases from collections of sequence.
Chapter 10: Genomics Tools
Abstract
Genomics tools are computational methods that augment experimental genomic analysis. Finding and aligning nucleotide sequences, creating phylogenetic trees, identifying genes from genomic sequence, designing primers, and analyzing microarray data are areas of genomic analysis that are supported by software found on the web. There are multiple tools for each task, and they vary in their capabilities. This chapter compares the tools, explains some of the differences between them, and tells you where to find them. The analysis includes Entrez, SRS, PHYLIP, Primer3, CLUSTALW, and more.
Chapter 11: Proteomics Tools
Abstract
With the discovery of more genes, faster methods to characterize them are needed. Proteomics tools, computer programs that assist molecular biological research, expedite the process by generating data with minimal experimentation. The tools described in this chapter include more than 100 related to protein sequences and various 3-D structure applications. Applications of the tools vary from determining the amino acid composition of a protein sequence to generating a 3-D structure from a sequence.
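The simplest of these tasks, determining the amino acid composition of a protein sequence, can be sketched in a few lines of Python. This is an illustration rather than the implementation of any tool covered in the chapter; the function name and the example sequence are made up.

```python
from collections import Counter


def amino_acid_composition(sequence):
    """Return the fraction of each residue in a one-letter protein sequence."""
    # Keep letters only, so whitespace or line breaks in pasted sequences are ignored.
    residues = [c for c in sequence.upper() if c.isalpha()]
    total = len(residues)
    if total == 0:
        return {}
    counts = Counter(residues)
    return {residue: n / total for residue, n in counts.items()}


# Made-up four-residue peptide: one Met, two Lys, one Ala.
composition = amino_acid_composition("MKKA")
```

Generating a 3-D structure from a sequence, at the other end of the range, is far beyond a sketch of this size.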
Chapter 12: Genome and Database Resources
Abstract
Web resources for the molecular biologist are diverse. There are databases of links to web resources; there are databases of literature references; and there are databases of lab supplies. There are also publications dedicated to genome research. The community resources in this chapter are great portals into genome research online.
Current Books: