The Internet for Cell and Molecular Biologists Chapter Abstracts


Chapter 1

The Internet: All You Wanted to Know and Didn't Dare to Ask

Lorenzo M. Catucci and Manuela Helmer-Citterich

Contents

1.1 A network of nodes

1.2 Network services

1.3 Short glossary

1.4 References

Abstract

Over the last 10-15 years the computer has become an essential companion for cell and molecular biologists. At first, personal computers were mainly used as word processors or to produce nice pictures for papers or talks. In many research institutes mainframes were set up as mail servers and to host and run the first packages of accessible bioinformatic tools: the Staden (http://www.mrc-lmb.cam.ac.uk/pubseq/staden_home.html; Staden et al., 2000), Intelligenetics (Intelligenetics Suite, Intelligenetics, Inc., Mountain View, CA) and GCG (now at http://www.accelrys.com/about/gcg.html) packages. Sequence databases were slowly starting to grow. Little or no organized information about the new tools was available in academia, but a lot of know-how was passed from hand to hand in the research labs.

Since then, things have changed a lot. Each personal computer is now much more powerful and flexible than those old mainframes. Many new and sophisticated tools were developed to help biologists in their work and, most importantly maybe, many tools for advanced communication became commonplace. This had a very strong impact on the experimental biologist's life.

Every computer, if equipped with an ethernet card (or a modem) and an internet connection, represents a node in an immense network. It becomes therefore a window to the outside world and the outside world offers a lot of interesting information: from Medline access to web pages dedicated to specific hot topics of biological interest. There is no hope of giving an exhaustive list of all the useful and interesting places that can be visited during an internet trip. It is always worth trying to look around, to bookmark new sites and explore different tools.

Almost every biologist has experience with electronic mail and internet browsers, but may not feel completely at ease with them. Very few biologists have received well-organized instruction in the use of informatics tools for biology. So what do we do when we need something more, such as choosing the best parameters for a sequence search or understanding all the implications of a complicated and sometimes almost unreadable output file? We look around and find nice web pages, full of information, and we would like to be able to design our own web site: do we really need to ask someone else for help? We want to visualize a protein structure on the screen or try to understand the possible consequences of a residue mutation: can we afford to play with molecular graphics?

This manual was designed as a sort of 'cook book' to try to fill some of the gaps that may affect the life of a biologist who missed an organized preparation in basic informatics, but still wants to be able to take advantage of skilful use of computers and of the rich internet tool-set.

Let us start with a short description of the basic elements in computer networking.


Chapter 2

Select the Right Computer

Michele Quondam

Contents

2.1 CPU

2.2 Memory

2.3 Hard Disk

2.4 Video Card

2.5 Monitors

2.6 Other parts

2.7 The computer power and costs

2.8 A computer to do what?

2.9 Choosing the operating system

Abstract

This is a simple guide to the computer market: if you need a computer, you can now discover how to get the best and cheapest solution for your specific needs. This chapter also provides some information about computer components and their impact on the overall computer performance.


Chapter 3

Personal Internet Security

Michele Quondam

Contents

3.1 What is a virus

3.2 What is a hacker

3.3 Protection Software

3.3.1 Firewalls

3.3.2 Antivirus Software

3.3.3 Hardware Router with firewall features

3.4 Special e-mail attacks

3.4.1 Bombing

3.4.2 Spamming

3.5 Simple general security rules

Abstract

This chapter provides some simple rules and information for avoiding the most common problems caused by viruses, hackers and e-mail attacks, and covers some general security issues.


Chapter 4

Design and Build Your Own Lab or Departmental Home Page

Andrea Cabibbo

Contents

4.1 A global view

4.2 Designing and building the web site

4.2.1 Planning the site with pencil and paper

4.2.2 Building the site

4.2.2.1 Visual html editors

4.2.2.2 Bells and whistles (forms, counters, boards)

4.2.2.3 Short course of HTML: the basics

Abstract

It is increasingly likely that people wishing to contact you or to find information on your research activities will look for your departmental or personal web page. If you have not already built one, the moment has come to do so. You will see that this is much easier than you might think.

This chapter is about building web sites. It will be assumed that the reader is not familiar with concepts such as HTML, web server and FTP; everything will be explained from scratch. After a global overview of the process, enough details will be given on how to plan and build the site to allow the reader to perform all the required steps unaided.

The world wide web was originally based on the Hyper Text Markup Language or HTML, which allows the display of both text and images on a page and provides tools to format the appearance of these elements. At this time, the web was basically a collection of static pages, often containing hyperlinks to other pages, so as to form a real "web" network.

Since these early days, the panorama has been enriched by the appearance of a number of more sophisticated programming tools, such as JavaScript, Java, Perl, PHP, XML and others, that allow much tighter control of the appearance, function and behavior of web sites, often turning them into sophisticated online applications that allow, for example, searching of complex databases directly over the web and formatting the results according to your needs. This is the case, for instance, with web sites such as PubMed, which allow access to Medline and sequence/structure databases.

The following chapter will focus exclusively on building sites the simple, old way, that is by using HTML. The basic concepts can be easily learnt with minimal initial effort. Once the basics are acquired, the reader will be ready to move to more sophisticated implementations.

It should be noted that HTML, despite being simple and old, is extremely powerful and will allow you to publish on the web nearly everything you could think of: text, data, images, downloadable files (documents, multimedia, powerpoint files etc.).
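As a rough illustration of how little markup a static page needs, the following Python sketch assembles a minimal HTML document containing a title, a heading, a paragraph and a list of hyperlinks. The function name `build_page` and the example lab name are hypothetical and are not taken from the chapter; only the cellbiol.com address appears elsewhere on this page.

```python
from html import escape


def build_page(title, paragraphs, links):
    """Return a minimal static HTML page as a single string.

    `links` is a list of (label, url) pairs. All names here are
    illustrative assumptions, not the book's own code.
    """
    body = [f"<h1>{escape(title)}</h1>"]
    for text in paragraphs:
        body.append(f"<p>{escape(text)}</p>")
    if links:
        body.append("<ul>")
        for label, url in links:
            # An <a> element with an href attribute is a hyperlink.
            body.append(f'  <li><a href="{escape(url)}">{escape(label)}</a></li>')
        body.append("</ul>")
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f"<title>{escape(title)}</title>\n"
        "</head>\n<body>\n" + "\n".join(body) + "\n</body>\n</html>\n"
    )


page = build_page(
    "Example Lab",  # hypothetical lab name
    ["We study protein interactions."],
    [("Bio-Web", "http://cellbiol.com")],
)
print(page)
```

Saving the printed text to a file ending in .html and opening it in a browser should display the page; publishing it on a web server is a separate step.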


Chapter 5

Using Search Engines and PubMed Effectively

Andrea Cabibbo

Contents

5.1 Directories and search engines

5.1.1 Directories

5.1.2 Search engines

5.2 Search syntax: the mathematics of search engines

5.3 Searching for scientific literature: the NCBI PubMed site

Abstract

It is estimated that at present more than one billion web pages exist, and thousands of new pages are created every day. In this scenario, finding specific information seems very difficult. However, thousands of 'indexes', 'directories' and 'search engines' exist that attempt to categorize the contents of the internet by various means. The directories range from topic-specific ones, such as biological directories or architecture directories, to global directories that attempt to review all possible contents. A typical example of this latter type is the Yahoo directory (http://www.yahoo.com/). In Yahoo, all content is arranged into 14 parent categories (e.g. Art and humanities, Business and economy), each of which is subdivided into subcategories, in turn subdivided into sub-sub-categories, down to very specific subjects. For instance, information about PCR in Yahoo has the following path: Home>Science>Biology>Molecular_Biology>PCR. In search engines, the contents are not pre-distributed in categories but rather are searched by keywords. In this chapter we will provide essential information on directories and search engines, together with tips on how to use these resources efficiently, in order to find the right needle in the internet haystack. We will also briefly review the PubMed Boolean search syntax that allows very precise searches for specific research articles.
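To make the idea of a Boolean search concrete, here is a small Python sketch that composes a query string from the operators AND, OR and NOT. The helper names (`phrase`, `any_of`, `all_of`, `excluding`) and the search terms are hypothetical choices for illustration; the sketch only builds the text of a query and does not contact any search service.

```python
def phrase(term):
    """Quote multi-word terms so they are treated as one phrase."""
    term = term.strip()
    return f'"{term}"' if " " in term else term


def any_of(*terms):
    """Combine alternatives with OR, grouped in parentheses."""
    return "(" + " OR ".join(phrase(t) for t in terms) + ")"


def all_of(*clauses):
    """Require every clause by joining them with AND."""
    return " AND ".join(clauses)


def excluding(clause, term):
    """Drop results that mention `term` by appending NOT."""
    return f"{clause} NOT {phrase(term)}"


# Illustrative search: either name for PCR, plus primer design,
# leaving out anything that mentions "review".
query = excluding(
    all_of(any_of("PCR", "polymerase chain reaction"), phrase("primer design")),
    "review",
)
print(query)
```

The parentheses around the OR group matter: without them the alternatives could be combined with the neighbouring AND clause in an unintended order.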


Chapter 6

Online Tools for Basic Sequence Manipulation, Restriction Analysis, PCR Primer Generation and Evaluation

Andrea Cabibbo

Contents

6.1 Restriction analysis

6.2 Basic sequence manipulation

6.3 PCR Primers generation and analysis

6.4 Sequence analysis servers and links

Abstract

The analysis of biological sequences often requires some preliminary basic manipulations. For instance it is often necessary to obtain the complementary sequence to a DNA sequence, to reverse a sequence, to get a list of the restriction enzymes cutting sites in a sequence, to translate a DNA sequence to a protein sequence, and so on. Many tools are available online to perform all these operations easily. Often more than one possibility is available to the user. We list here a number of tools freely available online. These and other links are also reported in the "sequence analysis tools" section of the Bio-Web, at http://cellbiol.com.
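The manipulations listed above are simple enough to sketch in a few lines of Python. This is an illustrative sketch, not one of the online tools the chapter describes: the function names are hypothetical, the translation uses the standard genetic code in reading frame 1 only, and the restriction search looks for a plain recognition sequence (EcoRI, GAATTC, in the example) on the given strand.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Complement every base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate frame 1; '*' marks a stop, 'X' an unknown codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def find_sites(seq, site):
    """Return 0-based start positions of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]


dna = "ATGGAATTCTAA"  # made-up example sequence
print(reverse_complement(dna))
print(translate(dna))
print(find_sites(dna, "GAATTC"))
```

Reversing a sequence without complementing it is simply `dna[::-1]`.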


Chapter 7

Theoretical Aspects of Sequence Alignments

Barbara Brannetti and Allegra Via

Contents

7.1 Pairwise alignments

7.1.1 Alignments

7.1.2 Global and local alignment

7.1.3 Substitutions

7.1.4 Insertions and deletions

7.1.5 Statistical significance of alignments

7.2 Multiple alignments

7.2.1 Introduction

7.2.2 Multiple alignments: why do we need them?

7.2.3 Global and local alignments

7.2.4 Substitutions, deletions and insertions

7.2.5 How do we obtain a multiple alignment?

7.2.6 Gene prediction and pattern matching

7.3 References

Abstract

This chapter is dedicated to the theoretical aspects of the analysis of nucleic and amino acid sequences. It consists of two main sections: a 'pair-wise alignments' part (section 7.1) and a 'multiple alignments' part (section 7.2) where the reader can find an outline of the concepts underlying pair-wise and multiple (DNA and protein) sequence alignments together with a theoretical discussion of the principles regulating the most important algorithms for sequence analysis. This is not essential for the comprehension and full usage of chapter 8 and chapter 9, but may help the reader who wishes to get a deeper view of the subject.

Therefore those who are interested in the practical use of sequence databases and programs for sequence analysis can skip this chapter and go directly to chapter 8 or chapter 9.
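For readers who do want a feel for the difference between global and local alignment, the following Python sketch scores two sequences both ways with a simple dynamic-programming scheme (in the style of the Needleman-Wunsch and Smith-Waterman algorithms). It is not taken from the chapter: the scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions, no substitution matrix is used, and only the score is returned, not the alignment itself.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best score when both sequences are aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Either align the two residues or open a gap in one sequence.
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best score over any pair of subsequences (never below zero)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


print(global_alignment_score("ACGT", "ACT"))
print(local_alignment_score("TTTACGTTT", "GGACGGG"))
```

The only differences between the two functions are the zero floor and the choice of the best cell anywhere in the table, which is what lets a local alignment ignore poorly matching flanking regions.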


Chapter 8

Analyse DNA Sequences With Your Browser

Barbara Brannetti

Contents

8.1 Genbank database

8.1.1 Description of Genbank database records

8.2 Database search

8.2.1 FASTA

8.2.2 How FASTA works, a step by step description

8.2.3 BLAST

8.3 Gene structure prediction

8.3.1 Filters

8.3.1.1 CENSOR

8.3.1.2 RepeatMasker

8.3.2 Looking for functional sites in DNA sequences

8.3.2.1 Promoter Scan

8.3.2.2 GrailEXP

8.3.2.3 GenScan

8.3.2.4 FGENE

8.3.2.5 GeneMark

8.3.2.6 WebGene

8.3.2.7 GeneId

8.3.2.8 PROCRUSTES

8.4 References

Abstract

The enormous amount of data coming from the various genome projects is stored within biological databases. Different tools have been developed both to search within the databases and to analyse and annotate the data they contain. The aim of this chapter is to describe the most useful and widely used nucleic acid databases and to introduce the tools developed to analyse nucleic acid sequences. It is organized into three main sections. The first (8.1) describes the Genbank database, with details of the structure of the files containing sequence data together with some annotation. The second section (8.2) provides a user-friendly description of tools (FASTA and BLAST) for the comparison of a query sequence with a nucleic acid database. A detailed description of the most useful tools available for gene structure prediction is given in section 8.3. The prediction of functional sites in a raw genomic sequence is still a hot research topic (cf. Fortna and Gardiner, 2001) and no easy solution or completely reliable tool can be presented so far. We therefore suggest trying different tools in order to compare their predictions and identify the method that seems most reliable for the reader's specific problem.


Chapter 9

Practical Aspects of Protein Sequence Analysis

Allegra Via

Contents

9.1 Protein sequence databases

9.1.1 Swissprot-TrEMBL

9.1.2 PIR

9.2 Pair-wise alignments and database searches

9.2.1 FASTA

9.2.2 Fasta3 output

9.2.3 BLAST

9.2.4 BLAST output

9.2.5 Alignment of two sequences

9.2.6 PSI-BLAST

9.2.7 PSI-BLAST output

9.3 Multiple alignments

9.3.1 CLUSTALW

9.3.2 MultAlign

9.3.3 Editing a multiple alignment

9.3.3.1 ALSCRIPT

9.3.3.2 CINEMA and JALVIEW

9.3.3.3 BOXSHADE

9.4 Hidden Markov Models (HMMs)

9.5 Motifs and patterns

9.5.1 Pattern and domain databases

9.5.1.1 PROSITE

9.5.1.2 BLOCKS

9.5.1.3 PFam

9.5.1.4 PRINTS

9.5.2 Servers for patterns and domains databases scanning

9.5.2.1 ProfileScan

9.5.2.2 BLOCKS server

9.5.2.3 SMART server

9.6 References

Abstract

This chapter is dedicated to the analysis of amino acid sequences. It is organized in five subsections. In the first and second the reader can find a user-friendly description of sequence databases and instructions to use some of the main tools for pair-wise alignments and database searches. Section 9.3 is dedicated to multiple alignments while section 9.4 is a very short introduction to Hidden Markov Models. Finally, section 9.5 is an overview of the most important pattern and domain databases and describes tools to use them for protein sequence analysis.

Given one or a set of sequences you can essentially perform:

1. Database searches looking for identical or similar sequences (for the detection of homology in the context of phylogenetic analysis and/or inference of function).

For these purposes sections 9.1 and 9.2 provide a description of the most widely used protein sequence databases and tools (programs and servers) for searches in such databases.

For this analysis we suggest the following steps:

· identify the most suitable database for your needs;

· select the most appropriate searching program;

· perform your search.

The results of your search may be more or less biologically relevant. You can influence relevance and reliability by modifying the parameters of the searching program. If you do not feel self-confident in handling program parameters, we suggest using the default ones provided by the program itself.

2. A multiple alignment.

(a) one can align a single sequence to a multiple alignment of sequences provided by databases of protein families.

(b) one can build a multiple alignment starting from a new set of sequences.

You can find the tools for both these in section 9.3.

3. Pattern matching.

You may be interested in the identification of functional sites in a protein sequence (phosphorylation sites, glycosylation sites, etc.).

Section 9.5 provides a description of databases and tools for the identification of biologically relevant signatures in protein sequences.
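As an illustration of pattern matching, the sketch below converts a PROSITE-style pattern into a Python regular expression and scans a sequence with it. It is a minimal sketch under stated assumptions: only the basic syntax is handled (`x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` for the sequence ends), the function names are hypothetical, and the example uses the commonly cited N-glycosylation consensus N-{P}-[ST]-{P} on a made-up sequence.

```python
import re


def prosite_to_regex(pattern):
    """Translate basic PROSITE pattern syntax into a regex string."""
    regex = pattern.rstrip(".").replace("-", "")
    regex = regex.replace("{", "[^").replace("}", "]")  # forbidden residues
    regex = regex.replace("(", "{").replace(")", "}")   # repeat counts
    regex = regex.replace("x", ".")                      # any residue
    return regex.replace("<", "^").replace(">", "$")     # sequence ends


def scan(sequence, pattern):
    """Return (1-based position, matched text) for every hit.

    A lookahead is used so that overlapping hits are all reported.
    """
    regex = re.compile(f"(?=({prosite_to_regex(pattern)}))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


protein = "MNGSAANPTKNRTL"  # made-up example sequence
print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(scan(protein, "N-{P}-[ST]-{P}"))
```

Short patterns such as this one match frequently by chance, so a hit is a candidate site rather than evidence that the site is used.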

Many of the programs described in this section can be used directly through the WWW. Others can be downloaded from the suitable web site and installed on a local computer.


Chapter 10

From Sequence to Structure: an Easy Approach to Protein Structure Prediction

Fabrizio Ferré

Contents

10.1 Principles of protein structure

10.1.1 Introduction

10.1.1.1 Protein structure

10.1.1.2 Techniques for the experimental determination of protein structure

10.1.2 Structures databases

10.1.2.1 The Protein Data Bank and PDBSum

10.1.2.2 SCOP

10.1.2.3 CATH

10.1.2.4 DSSP

10.1.2.5 DALI, FSSP and HSSP

10.1.3 Visualization of molecular structures: molecular graphics tools

10.1.3.1 RasMol

10.1.3.2 SwissPDBViewer

10.1.4 Protein structure comparison

10.2 Protein Structure Prediction

10.2.1 Secondary structure prediction

10.2.1.1 Introduction

10.2.1.2 On the web

10.2.2 Homology Modelling

10.2.2.1 Introduction

10.2.2.2 On the web

10.2.3 Fold Recognition

10.2.3.1 Introduction

10.2.3.2 On the web

10.2.4 Ab initio Prediction

10.2.4.1 Introduction

10.2.4.2 On the web

10.2.5 Evaluation of structure prediction methods

10.3 Transmembrane topology prediction

10.3.1 Introduction

10.3.2 On the web

10.4 Links

10.5 References

Abstract

The analysis of the three-dimensional structure of a protein can be very helpful in the design of experimental procedures aimed at understanding protein function. Experimental techniques such as X-ray diffraction and Nuclear Magnetic Resonance are used to determine protein structures, which are then stored in freely accessible databases. Molecular graphics software for examining these structures is also available, either freely or commercially. Protein structure generally depends only on the primary structure and on environmental conditions. Extrinsic factors, such as chaperones or the creation of disulfide bridges, may assist the folding process but are often not essential to it. Consequently, the three-dimensional structure of a protein may in principle be inferred from the sequence itself. While the experimental procedures for determining three-dimensional structure are becoming faster and more reliable, the number of known sequences far exceeds the number of known structures. Several methods have been developed to predict protein structure from sequence, and a number of them are freely available on the internet and easy to use. Homology modelling is the most reliable of these methods: it is based on the assumption that, if two proteins share a high (or reasonably high) sequence identity, their 3D structures will, with good reliability, also be similar (or reasonably similar).
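Sequence identity, on which homology modelling relies, can be computed from a pairwise alignment in a few lines. The Python sketch below is an illustration only: it assumes the two sequences are already aligned with "-" as the gap character, and it counts identity over columns where neither sequence has a gap, which is one of several conventions in use. The function name and example sequences are hypothetical, and no identity threshold is implied.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over gap-free alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    columns = [
        (x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap
    ]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


# Made-up aligned fragments: 8 gap-free columns, 6 of them identical.
print(percent_identity("MKT-AYIAK", "MKTLAYLAR"))
```

Because the result depends on the denominator chosen, identity values from different programs are only comparable when the same convention is used.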


Chapter 11

Let Others Solve your Problems: the Newsgroups

Richard P. Grant

Contents

11.1 Usenet for beginners

11.2 Bionet

11.3 Access and (n)etiquette

11.4 How to use a news reader

11.5 Whither Bionet?

11.6 Useful links and further reading

Abstract

Newsgroups permit individuals to take part in a worldwide discussion on a specific topic of interest. A message is "posted" to a newsgroup usually by email or web form. Any other member of that discussion group can read and reply to the message. The BIOSCI bionet newsgroup network allows easy communication between life scientists world wide. This chapter provides a complete listing and a brief description of the bionet newsgroups and describes in detail the use of these newsgroups via a web browser and through dedicated news reader software.


Chapter 12

The Roaming Scientist: Get Online, Manage Your E-mail and Exchange Files from Everywhere

Andrea Cabibbo

Contents

12.1 Getting online

12.1.1 Host institution

12.1.2 Connect from home (Dial-up)

12.1.3 Internet Cafes

12.2 E-mail

12.2.1 How to use your work e-mail account from home or from abroad

12.2.2 Using a web-based e-mail account: read and send e-mail from any computer connected to the internet

12.3 Some tips on file exchange

12.3.1 FTP

12.3.2 Web Site

12.3.3 Web Sharing

Abstract

Science is an international business. Scientists often travel to other countries for variable periods of time and need to keep in touch and exchange material and information with their home lab and with collaborators worldwide. One of the most effective and simple ways to communicate and exchange documents, images, data and more general information is e-mail. In most cases you will be able to use your e-mail account from all over the world, provided that the correct settings are entered in your e-mail application. E-mail does, however, have some limitations as to the size of files that can be exchanged. Depending on the e-mail account, there is a variable limit on the size of attachments that can be sent and received. There is also a limit on the total number of megabytes that can be stored in a personal mailbox on a mail server. This means that for the exchange of very large documents or very large amounts of data, e-mail might not be well suited, and other systems have to be used, such as FTP, web sharing, the setting up of temporary simple web sites (see also chapter 4) or an online storage facility.

In this chapter we will summarize the essential information required to read and send e-mail from everywhere (well, almost) and will provide some tips for the efficient exchange of files of any (reasonable) size.


Chapter 13

Bio-Bookmarks

Andrea Cabibbo and Manuela Helmer-Citterich

Contents

13.1 Companies

13.2 Meetings

13.3 Laboratory protocols

13.4 Biological directories and sites

13.5 Microarray resources and databases

13.6 Protein interaction resources

13.7 Useful sites for lessons and presentations

13.8 Biology servers

13.9 Miscellanea

Abstract

Beyond the topics covered in the different chapters of this book, there are several other internet resources that can be of interest to biologists. In this chapter we shall try to give an overview of such resources, in order to complete the picture of the 'Bio-Web'. These and further links are available at http://cellbiol.com. This list is by no means complete or exhaustive.
