
The Internet for Cell and Molecular Biologists (2nd Edition) Chapter Abstracts


Chapter 1.
Internet: All You Wanted to Know and Didn't Dare to Ask
Lorenzo M. Catucci and Manuela Helmer-Citterich

Over the last 10-15 years the computer has become an essential companion for cell and molecular biologists. At first, personal computers were mainly used as word processors or to produce nice pictures for papers or talks. In many research institutes, mainframes were set up as mail servers and to host and run the first packages of accessible bioinformatic tools: the Staden, Intelligenetics (Intelligenetics Suite, Intelligenetics, Inc., Mountain View, CA) and GCG packages. Sequence databases were slowly starting to grow. Little or no organized information about the new tools was available in academia, but a lot of know-how was passing from hand to hand in the research labs.

Since then, things have changed a lot. Each personal computer is now much more powerful and flexible than those old mainframes. Many new and sophisticated tools have been developed to help biologists in their work and, perhaps most importantly, many tools for advanced communication have become commonplace. This has had a very strong impact on the experimental biologist's life.

Every computer, if equipped with an ethernet card (or a modem) and an internet connection, represents a node in an immense network. It therefore becomes a window to the outside world, and the outside world offers a lot of interesting information, from Medline access to web pages dedicated to specific hot topics of biological interest. There is no hope of giving an exhaustive list of all the useful and interesting places that can be visited during an internet trip. It is always worth trying to look around, to bookmark new sites and explore different tools.

Almost every biologist has experience in the use of electronic mail and internet browsers, but sometimes does not feel completely at ease with the matter. Very few biologists have received well-organized instruction in the use of informatics tools for biology, so what do we do when we need something more, such as choosing the best parameters in a sequence search and understanding all the implications of a complicated and sometimes almost unreadable output file? We look around and find nice web pages, full of information, and we would like to be able to design our own web site: do we really need to ask someone else for help? We want to visualize a protein structure on the screen or try to understand the possible consequences of a residue mutation: can we afford to play with molecular graphics?

This manual was designed as a sort of 'cook book' to try to fill some of the gaps that may affect the life of a biologist who missed an organized preparation in basic informatics, but still wants to be able to take advantage of skilful use of computers and of the rich internet tool-set.


Chapter 2.
Select the Right Computer
Michele Quondam

This is a simple guide to the computer market: if you need a computer, you can now discover how to get the best and cheapest solution for your specific needs. This chapter also provides some information about computer components and their impact on the overall computer performance.


Chapter 3.
Feeling Safe? Think Again: Internet Security
Michele Quondam

Some simple rules and information to help avoid the most common problems with viruses, hackers and email attacks, along with some general security issues.


Chapter 4.
Design and Build Your Own Lab/Departmental Home Page
Andrea Cabibbo

It is increasingly likely that people wishing to contact you or to find information on your research activities will look for your departmental or personal web page. If you have not already done so, the moment has come to build one. You will see that this is much easier than you might think.

This chapter is about building web sites. It will be assumed that the reader is not familiar with concepts such as HTML, web servers and FTP; everything will be explained from scratch. After a global overview of the process, enough details will be given on how to plan and build the site to allow the reader to perform all the required steps unaided.

The world wide web was originally based on the Hyper Text Markup Language, or HTML, which allows the display of both text and images on a page and provides tools to format the appearance of these elements. At that time, the web was basically a collection of static pages, often containing hyperlinks to other pages, so as to form a real "web" network.

Since those early days, the landscape has been enriched by a number of more sophisticated programming tools, such as JavaScript, Java, Perl, PHP, XML and others, that allow much tighter control of the appearance, function and behavior of web sites. They often turn sites into sophisticated online applications that allow, for example, searching complex databases directly over the web and formatting the results according to your needs. This is the case, for instance, with web sites such as PubMed, which allow access to Medline and to sequence/structure databases.

The following chapter will focus exclusively on building sites the simple, old way, that is by using HTML. The basic concepts can be easily learnt with minimal initial effort. Once the basics are acquired, the reader will be ready to move to more sophisticated implementations.

It should be noted that HTML, despite being simple and old, is extremely powerful and will allow you to publish on the web nearly everything you could think of: text, data, images, downloadable files (documents, multimedia, powerpoint files etc.).
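
As an illustration of how little is needed for a static page of the kind described above, the following is a minimal sketch that assembles one as a string. It is written in Python purely for convenience; the chapter itself works with plain HTML. The function name `build_page`, the lab name and the paragraph text are hypothetical and do not come from the book.

```python
from html import escape


def build_page(title, paragraphs, links):
    """Return a minimal static HTML page as a string.

    `links` is a list of (label, url) pairs rendered as a bulleted list.
    """
    body = [f"<h1>{escape(title)}</h1>"]
    body += [f"<p>{escape(text)}</p>" for text in paragraphs]
    items = [
        f'<li><a href="{escape(url, quote=True)}">{escape(label)}</a></li>'
        for label, url in links
    ]
    if items:
        body.append("<ul>\n" + "\n".join(items) + "\n</ul>")
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f"<title>{escape(title)}</title>\n"
        "</head>\n<body>\n" + "\n".join(body) + "\n</body>\n</html>\n"
    )


page = build_page(
    "Example Lab",                      # hypothetical lab name
    ["We study protein folding."],      # hypothetical description
    [("PubMed", "https://pubmed.ncbi.nlm.nih.gov/")],
)
print(page)
```

Saving the printed text to a file such as `index.html` and opening it in a browser should show a heading, one paragraph and one hyperlink.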


Chapter 5.
Using Search Engines Effectively to Find People and Information
Andrea Cabibbo

It is estimated that more than one billion web pages currently exist, and thousands of new pages are created every day. In this scenario, finding specific information seems very difficult. However, thousands of 'indexes', 'directories' and 'search engines' exist that attempt to categorize the contents of the internet by various means. The directories range from subject-specific ones, such as biological directories or architecture directories, to global directories that attempt to cover all possible contents. A typical example of the latter type is the Yahoo directory. In Yahoo, all content is arranged into 14 parent categories (e.g. Art and humanities, Business and economy), each of which is subdivided into subcategories, in turn subdivided into sub-sub-categories, down to very specific subjects. For instance, information about PCR in Yahoo has the following path: Home>Science>Biology>Molecular_Biology>PCR. In search engines, the contents are not pre-distributed in categories but rather are searched by keywords. In this chapter we will provide essential information on directories and search engines, together with tips on how to use these resources efficiently, in order to find the right needle in the internet haystack. We will also briefly review the PubMed Boolean search syntax that allows very precise searches for specific research articles.
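
To give a feel for what a Boolean PubMed query looks like, here is a small sketch that composes one from tagged terms. The helper names `field` and `combine` are hypothetical, and the example terms are invented for illustration; the field tags `[Title]` and `[Title/Abstract]` and the uppercase operators AND, OR and NOT are, to the best of my knowledge, accepted by PubMed, but the chapter's own examples may differ.

```python
def field(term, tag):
    """Attach a PubMed field tag, quoting multi-word terms as a phrase."""
    if " " in term:
        term = f'"{term}"'
    return f"{term}[{tag}]"


def combine(operator, *parts):
    """Join sub-queries with AND, OR or NOT and wrap them in parentheses."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError("operator must be AND, OR or NOT")
    return "(" + f" {operator} ".join(parts) + ")"


# Either spelling of the technique in the title ...
topic = combine(
    "OR",
    field("PCR", "Title"),
    field("polymerase chain reaction", "Title"),
)
# ... and a second concept anywhere in the title or abstract.
query = combine("AND", topic, field("primer design", "Title/Abstract"))
print(query)
```

The parentheses make the grouping explicit, so the OR is evaluated before the AND regardless of how the search engine orders operators by default.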


Chapter 6.
Online Tools for Basic Sequence Manipulation, Restriction Analysis, PCR Primers Generation and Evaluation
Andrea Cabibbo

The analysis of biological sequences often requires some preliminary basic manipulations. For instance, it is often necessary to obtain the complementary sequence of a DNA sequence, to reverse a sequence, to get a list of the restriction enzyme cutting sites in a sequence, to translate a DNA sequence into a protein sequence, and so on. Many tools are available online to perform all these operations easily, and often more than one option is available to the user. We list here a number of tools freely available online. These and other links are also reported in the "sequence analysis tools" section of the Bio-Web, at Cellbiol.com.
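
The basic manipulations listed above are simple enough to sketch in a few lines. The following Python example is an illustration only and is not one of the online tools the chapter lists; the function names and the test sequence are made up. It uses the standard genetic code and the EcoRI recognition site GAATTC as an example restriction site.

```python
from itertools import product

# Map each base to its complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# Standard genetic code, written in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate frame 1 of a DNA sequence; unknown codons become 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def find_sites(dna, site):
    """Return 0-based start positions of every occurrence of `site`."""
    dna, site = dna.upper(), site.upper()
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


seq = "ATGGAATTCAAATAG"                 # made-up example sequence
print(reverse_complement(seq))          # CTATTTGAATTCCAT
print(translate(seq))                   # MEFK*
print(find_sites(seq, "GAATTC"))        # [3]
```

A real restriction analysis would also search the complementary strand for non-palindromic sites and report the cut position within the site, which this sketch leaves out.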


Chapter 7.
Theoretical Aspects of Sequence Alignments
Barbara Brannetti and Allegra Via

This chapter is dedicated to the theoretical aspects of the analysis of nucleic and amino acid sequences. It consists of two main sections, a 'pair-wise alignments' part and a 'multiple alignments' part, where the reader can find an outline of the concepts underlying pair-wise and multiple (DNA and protein) sequence alignments, together with a theoretical discussion of the principles governing the most important algorithms for sequence analysis. This is not essential for the comprehension and full usage of chapter 8 and chapter 9, but may help the reader who wishes to get a deeper view of the subject.

Therefore those who are interested in the practical use of sequence databases and programs for sequence analysis can skip this chapter and go directly to chapter 8 or chapter 9.
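
For readers curious about what a pair-wise alignment algorithm looks like in practice, the sketch below computes a global alignment score by dynamic programming, in the style of the classic Needleman-Wunsch method. The abstract does not say which algorithms the chapter covers, so this choice is an assumption, as are the scoring values (match +1, mismatch -1, linear gap penalty -2) and the function name.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of sequences a and b.

    Only two rows of the dynamic-programming matrix are kept, which is
    enough to get the score (but not to recover the alignment itself).
    """
    # Row 0: aligning the empty prefix of `a` against prefixes of `b`.
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a prefix of `a` against the empty prefix of `b`.
        current = [i * gap]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            current.append(
                max(
                    previous[j - 1] + pair,   # align a[i-1] with b[j-1]
                    previous[j] + gap,        # gap in b
                    current[j - 1] + gap,     # gap in a
                )
            )
        previous = current
    return previous[-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # 7
print(global_alignment_score("ACGT", "AGT"))         # 1
```

Real protein alignments replace the flat match/mismatch values with a substitution matrix and usually use separate gap-opening and gap-extension penalties.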


Chapter 8.
Analyze a DNA Sequence with Your Browser
Barbara Brannetti

The enormous amount of data coming from the various genome projects is stored within biological databases. Different tools have been developed both to search within the databases and to analyse and annotate the data they contain. The aim of this chapter is to describe the most useful and widely used nucleic acid databases and to introduce the tools developed to analyse nucleic acid sequences. It is organized into three main sections. The first (8.1) deals with a description of the GenBank database, with details of the structure of the files containing sequence data together with some annotation. The second section (8.2) provides a user-friendly description of tools (FASTA and BLAST) for the comparison of a query sequence with a nucleic acid database. A detailed description of the most useful tools available for gene structure prediction is given in section 8.3. The prediction of functional sites in a raw genomic sequence is still a hot research topic (cf. Fortna and Gardiner, 2001), and no easy solution or completely reliable tool can be presented so far. We therefore suggest trying different tools in order to compare the different predictions and identify the method that seems to be most reliable for the reader's specific problem.


Chapter 9.
Practical Aspects of Protein Sequence Analysis
Allegra Via

This chapter is dedicated to the analysis of amino acid sequences. It is organized in five subsections. In the first and second the reader can find a user-friendly description of sequence databases and instructions to use some of the main tools for pair-wise alignments and database searches. Section 9.3 is dedicated to multiple alignments while section 9.4 is a very short introduction to Hidden Markov Models. Finally, section 9.5 is an overview of the most important pattern and domain databases and describes tools to use them for protein sequence analysis.
Given one or a set of sequences you can essentially perform:

1. Database searches looking for identical or similar sequences (for the detection of homology in the context of phylogenetic analysis and/or inference of function).

For these purposes, sections 9.1 and 9.2 provide a description of the most widely used protein sequence databases and tools (programs and servers) for searches in such databases.

For this analysis we suggest the following steps:

  • identify the most suitable database for your needs;
  • select the most appropriate searching program;
  • perform your search.

The results of your search may be more or less biologically relevant. You can influence relevance and reliability by modifying the parameters of the searching program. If you do not feel confident in handling program parameters, we suggest using the default ones provided by the program itself.

2. A multiple alignment.
  (a) one can align a single sequence to a multiple alignment of sequences provided by databases of protein families.
  (b) one can build a multiple alignment starting from a new set of sequences.

You can find the tools for both of these in section 9.3.

3. Pattern matching.
You may be interested in the identification of functional sites in a protein sequence (phosphorylation sites, glycosylation sites, etc.).

Section 9.5 provides a description of databases and tools for the identification of biologically relevant signatures in protein sequences.

Many of the programs described in this section can be used directly through the WWW. Others can be downloaded from the appropriate web site and installed on a local computer.
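
The pattern matching mentioned under point 3 can be illustrated with a regular expression. The sketch below scans a protein sequence for the N-glycosylation consensus written in PROSITE style as N-{P}-[ST]-{P} (asparagine, any residue except proline, serine or threonine, any residue except proline). The abstract does not give this pattern, so it is supplied here as an assumption; the function name and the test sequence are made up.

```python
import re

# N-{P}-[ST]-{P} translated to a regular expression. The lookahead makes
# the match zero-width so that overlapping sites are all reported.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(protein, pattern=N_GLYCOSYLATION):
    """Return (1-based position, matched residues) for every motif hit."""
    return [
        (match.start() + 1, match.group(1))
        for match in pattern.finditer(protein.upper())
    ]


hits = find_motif("MKNGSALNPTQNNTSV")   # made-up example sequence
print(hits)  # [(3, 'NGSA'), (12, 'NNTS'), (13, 'NTSV')]
```

A match only shows that the sequence fits the consensus; such short patterns occur frequently by chance, so hits are candidates rather than confirmed modification sites.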


Chapter 10.
From Sequence to Structure: an Easy Approach to Protein Structure Prediction
Fabrizio Ferré

The analysis of the three-dimensional structure of a protein can be very helpful in the design of experimental procedures aimed at understanding protein function. Experimental techniques such as X-ray diffraction and Nuclear Magnetic Resonance are used to determine protein structures, which are then stored in freely accessible databases. Molecular graphics software is also freely or commercially available to examine these structures. Protein structure generally depends only on the primary structure and on environmental conditions. Extrinsic factors, such as chaperones or the creation of disulfide bridges, may assist the folding process but are often not essential to it. Consequently, the three-dimensional structure of a protein may in principle be inferred from the sequence itself. While the experimental procedures to determine protein three-dimensional structure are becoming faster and more reliable, the number of known sequences far exceeds the number of known structures. Several methods have been developed to predict protein structure from sequence, and a number of them are freely available on the internet and easy to use. Modeling by homology is the most reliable method to predict protein structure: it is based on the assumption that, if two proteins share a high (or reasonably high) sequence identity, their 3D structures will also be similar (or reasonably similar) with good reliability.
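
Since homology modeling rests on sequence identity, it may help to see how that quantity can be computed from a pair-wise alignment. The sketch below is one possible definition, counting identical residues over the aligned columns that contain no gap; other conventions divide by the alignment length or by the length of the shorter sequence, and the abstract does not say which one the chapter uses. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the ungapped columns of a pair-wise alignment.

    Both arguments must be rows of the same alignment, so they must have
    equal length, with `gap` marking inserted positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue          # skip columns where either row has a gap
        compared += 1
        identical += x == y   # True counts as 1
    return 100.0 * identical / compared if compared else 0.0


# Made-up aligned fragments: 5 identical residues in 6 ungapped columns.
print(round(percent_identity("ACDE-FG", "ACDQKFG"), 1))  # 83.3
```

Whether a given identity value is high enough for a reliable model also depends on the length of the alignment, so the number should be read together with the alignment itself.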


Chapter 11.
Genomics and Bioinformatics
Giorgio Valle, Alessandro Vezzi and Nicola Cannata

One of the most exciting achievements in recent years is the sequencing of whole genomes. The first complete genomic sequence of a free-living organism was released in 1995 by Craig Venter and collaborators; it was the 1.8 Mbp genome of the bacterium Haemophilus influenzae. The following year an international consortium completed the genome of the yeast Saccharomyces cerevisiae (the first from a eukaryotic organism), which is about 13 Mbp long. In 1998, the 100 Mbp genome of the worm Caenorhabditis elegans was released, representing the first animal genome; then, in 2000, it was the turn of the first plant genome, Arabidopsis thaliana (115 Mbp), as well as the fruit fly Drosophila melanogaster (130 Mbp). More recently the human and mouse genomes (3000 and 2700 Mbp respectively) were completed, amongst several other eukaryotic genomes, while the number of sequenced bacterial and archaeal genomes is well over a hundred. Table 11.1 lists some of the most important genomes that have been sequenced to date. A more comprehensive list is available at NCBI (www.ncbi.nlm.nih.gov) and TIGR (www.tigr.org).

The excitement associated with genomic projects is well justified. First, the huge amount and variety of data that are obtained is unprecedented in biology. Second, the problems opened by genomic research are totally new and require the development of new ideas, strategies and implementations. The most challenging tasks are probably in the field of bioinformatics, which plays a central role in genomic and post-genomic research both for managing and analysing data.

This chapter is divided into three main parts: genomic sequencing, genome browsers and comparative genomics.


Chapter 12.
Gene Expression Analysis by Microarray
Emanuele De Rinaldis

With the advent of genomics our knowledge of the genes encoded by many organisms has increased tremendously. Different approaches and techniques have been developed to make the best use of these data and to annotate them with quantitative information about the expression of genes in different contexts. This chapter aims to give a general description of the most frequently used methods to measure gene expression at a genomic, high-throughput scale. After a theoretical introduction, the potential and limitations of the individual techniques are illustrated, with special attention to the practical and computational issues. The final part of the chapter introduces some of the most relevant web sites where public gene expression data can be accessed and software tools are made available for data analysis.


Chapter 13.
Let Others Solve your Problems: the Newsgroups
Richard P. Grant

Newsgroups permit individuals to take part in a worldwide discussion on a specific topic of interest. A message is "posted" to a newsgroup, usually by email or web form. Any other member of that discussion group can read and reply to the message. The BIOSCI bionet newsgroup network allows easy communication between life scientists worldwide. This chapter provides a complete listing and a brief description of the bionet newsgroups and describes in detail the use of these newsgroups via a web browser and through dedicated news reader software.


Chapter 14.
The Roaming Scientist: Get Online, Manage Your E-mail and Exchange Files from Everywhere
Andrea Cabibbo

Science is an international business. Scientists often travel to other countries for variable periods of time and need to keep in touch and exchange material and information with their home lab and with collaborators worldwide. One of the most effective and simple ways to communicate and exchange documents, images, data and more general information is e-mail. In most cases you will be able to use your e-mail account from all over the world, provided that the correct settings are entered in your e-mail application. E-mail does, however, have some limitations as to the size of files that can be exchanged. Depending on the e-mail account, there is a variable limit on the size of attachments that can be sent and received. A limit also exists on the total number of megabytes that can be stored in a personal mailbox on a mail server. This means that for the exchange of very large documents or very large amounts of data, e-mail might not be well suited, and other systems have to be used, such as ftp, web sharing, the setting up of temporary simple web sites (see also chapter 4) or an online storage facility.

In this chapter we will summarize the essential information required to read and send e-mail from everywhere (well, almost) and will provide some tips for the efficient exchange of files of any (reasonable) size.


Chapter 15.
Bio-Bookmarks
Andrea Cabibbo and Manuela Helmer-Citterich

Beyond the topics covered in the different chapters of this book, there are several other internet resources that can be of interest to biologists. In this chapter we shall try to give an overview of such resources, in order to complete the picture of the 'Bio-Web'. These and further links are available at Cellbiol.com. This list is by no means complete or exhaustive.
