The Internet: All You Wanted to Know and Didn't Dare to Ask
Lorenzo M. Catucci and Manuela Helmer-Citterich
Contents
1.1 A network of nodes
1.2 Network services
1.3 Short glossary
1.4 References
Abstract
In the last 10-15 years the computer has become an essential companion for cell and molecular biologists. At first personal computers were mainly used as word processors or to produce nice pictures for papers or talks. In many research institutes mainframes were set up as mail servers and to host and run the first packages of accessible bioinformatic tools: the Staden (http://www.mrc-lmb.cam.ac.uk/pubseq/staden_home.html; Staden et al., 2000), Intelligenetics (Intelligenetics Suite, Intelligenetics, Inc., Mountain View, CA) and GCG (now at http://www.accelrys.com/about/gcg.html) packages. Sequence databases were slowly starting to grow. Little or no organized information about the new tools was available in academia, but a lot of know-how was passing from hand to hand in the research labs.
Since then, things have changed a lot. Each personal computer is now much more powerful and flexible than those old mainframes. Many new and sophisticated tools were developed to help biologists in their work and, most importantly maybe, many tools for advanced communication became commonplace. This had a very strong impact on the experimental biologist's life.
Every computer, if equipped with an ethernet card (or a modem) and an internet connection, represents a node in an immense network. It becomes therefore a window to the outside world and the outside world offers a lot of interesting information: from Medline access to web pages dedicated to specific hot topics of biological interest. There is no hope of giving an exhaustive list of all the useful and interesting places that can be visited during an internet trip. It is always worth trying to look around, to bookmark new sites and explore different tools.
Almost every biologist has experience in the use of electronic mail and internet browsers, but sometimes does not feel completely at ease with the matter. Very few biologists have received well-organized instruction in the use of informatics instruments for biology, so what do we do when we need something more, such as choosing the best parameters in a sequence search or understanding all the implications of a complicated and sometimes almost unreadable output file? We look around and find nice web pages, full of information, and we would like to be able to design our own web site: do we really need to ask someone else for help? We want to visualize a protein structure on the screen or try to understand the possible consequences of a residue mutation: can we afford to play with molecular graphics?
This manual was designed as a sort of 'cook book' to try to fill some of the gaps that may affect the life of a biologist who missed an organized preparation in basic informatics, but still wants to be able to take advantage of skilful use of computers and of the rich internet tool-set.
Let us start with a short description of the basic elements in computer networking.
Select the Right Computer
Michele Quondam
Contents
2.1 CPU
2.2 Memory
2.3 Hard Disk
2.4 Video Card
2.5 Monitors
2.6 Other parts
2.7 The computer power and costs
2.8 A computer to do what?
2.9 Choosing the operating system
Abstract
This is a simple guide to the computer market: if you need a computer, you can now discover how to get the best and cheapest solution for your specific needs. This chapter also provides some information about computer components and their impact on the overall computer performance.
Personal Internet Security
Michele Quondam
Contents
3.1 What is a virus
3.2 What is a hacker
3.3 Protection Software
3.3.1 Firewalls
3.3.2 Antivirus Software
3.3.3 Hardware Router with firewall features
3.4 Special e-mail attacks
3.4.1 Bombing
3.4.2 Spamming
3.5 Simple general security rules
Abstract
Some simple rules and information to avoid the most common problems with viruses, hackers, email attacks, and some general security issues.
Andrea Cabibbo
Contents
4.1 A global view
4.2 Designing and building the web site
4.2.1 Planning the site with pencil and paper
4.2.2 Building the site
4.2.2.1 Visual html editors
4.2.2.2 Bells and whistles (forms, counters, boards)
4.2.2.3 Short course of HTML: the basics
Abstract
It is increasingly likely that people wishing to contact you or to have information on your research activities will look for your departmental or personal web page. If you have not already built one, the moment has come to do so. You will see that this is much easier than you might think.
This chapter is about building web sites. It will be assumed that the reader is not familiar with concepts such as html, web server and FTP; everything will be explained from scratch. After a global overview of the process, enough details will be given on how to plan and build the site to allow the reader to perform all the required steps by himself.
The world wide web was originally based on the Hyper Text Markup Language or HTML, which allows the display of both text and images on a page and provides tools to format the appearance of these elements. At this time, the web was basically a collection of static pages, often containing hyperlinks to other pages, so as to form a real "web" network.
Since these early days, the panorama has been enriched by the appearance of a number of more sophisticated programming tools, such as javascript, java, perl, php, XML and others, that allow a much tighter control of the appearance, function and behavior of web sites, often turning them into sophisticated online applications that allow, for example, searching of complex databases directly over the web and formatting the results according to your needs. This is the case for instance with web sites such as Pubmed, that allow access to Medline and sequence/structure databases.
The following chapter will focus exclusively on building sites the simple, old way, that is by using HTML. The basic concepts can be easily learnt with minimal initial effort. Once the basics are acquired, the reader will be ready to move to more sophisticated implementations.
It should be noted that HTML, despite being simple and old, is extremely powerful and will allow you to publish on the web nearly everything you could think of: text, data, images, downloadable files (documents, multimedia, powerpoint files etc.).
Using Search Engines and PubMed Effectively
Andrea Cabibbo
Contents
5.1 Directories and search engines
5.1.1 Directories
5.1.2 Search engines
5.2 Search syntax: the mathematics of search engines
5.3 Searching for scientific literature: the NCBI PubMed site
Abstract
It is estimated that at present more than one billion web pages exist, and thousands of new pages are created every day. In this scenario, finding specific information seems very difficult. However, thousands of 'indexes', 'directories' and 'search engines' exist that attempt to categorize the contents of the internet by various means. The directories range from argument-specific ones, such as for instance biological directories or architecture directories, to global directories that attempt to review all possible contents. A typical example of this latter type is the Yahoo directory (http://www.yahoo.com/). In Yahoo, all content is arranged into 14 parent categories (e.g. Art and humanities, Business and economy), each of which is subdivided into subcategories, in turn subdivided into sub-sub-categories, down to very specific subjects. For instance, information about PCR in Yahoo has the following path: Home>Science>Biology>Molecular_Biology>PCR. In search engines, the contents are not pre-distributed in categories but rather are searched by keywords. In this chapter we will provide essential information on directories and search engines, together with tips on how to use these resources efficiently, in order to find the right needle in the internet haystack. We will also briefly review the PubMed Boolean search syntax that allows very precise searches for specific research articles.
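The Boolean operators behind such search syntax can be pictured as set operations on the lists of documents that match each keyword. The following is a minimal sketch in Python under that assumption: the keyword index and the article numbers are invented for illustration, and the function names are hypothetical, not part of PubMed or of any search engine.

```python
# Toy keyword index: each keyword maps to the set of (hypothetical)
# article numbers in which it occurs. The data are made up.
INDEX = {
    "pcr": {1, 2, 3, 5},
    "primer": {2, 3, 4},
    "review": {3, 6},
}


def hits(keyword):
    """Return the set of articles matching a single keyword."""
    return INDEX.get(keyword.lower(), set())


def op_and(a, b):
    """AND: articles found in both result sets (intersection)."""
    return a & b


def op_or(a, b):
    """OR: articles found in either result set (union)."""
    return a | b


def op_not(a, b):
    """NOT: articles in the first set that are absent from the second."""
    return a - b


# pcr AND primer
narrow = op_and(hits("pcr"), hits("primer"))
# pcr OR primer
broad = op_or(hits("pcr"), hits("primer"))
# (pcr AND primer) NOT review
filtered = op_not(narrow, hits("review"))

print(sorted(narrow), sorted(broad), sorted(filtered))
```

AND narrows a search, OR widens it and NOT removes unwanted hits, which is why combining them allows very precise queries.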
Online Tools for Basic Sequence Manipulation, Restriction Analysis, PCR Primer Generation and Evaluation
Andrea Cabibbo
Contents
6.1 Restriction analysis
6.2 Basic sequence manipulation
6.3 PCR Primers generation and analysis
6.4 Sequence analysis servers and links
Abstract
The analysis of biological sequences often requires some preliminary basic manipulations. For instance it is often necessary to obtain the complementary sequence to a DNA sequence, to reverse a sequence, to get a list of the restriction enzymes cutting sites in a sequence, to translate a DNA sequence to a protein sequence, and so on. Many tools are available online to perform all these operations easily. Often more than one possibility is available to the user. We list here a number of tools freely available online. These and other links are also reported in the "sequence analysis tools" section of the Bio-Web, at http://cellbiol.com.
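To make these basic manipulations concrete, here is a minimal sketch in Python. It is not the code behind any of the online tools mentioned; the function names are hypothetical, the translation uses the standard genetic code in the first reading frame only, and the restriction search simply looks for an exact recognition sequence (EcoRI, GAATTC, in the example).

```python
# Complement table for unambiguous DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse(seq):
    """Return the sequence read backwards."""
    return seq[::-1]


def complement(seq):
    """Return the complementary strand, in the same left-to-right order."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Return the complementary strand read 5' to 3'."""
    return reverse(complement(seq))


def find_sites(seq, site):
    """Return the 0-based start positions of every exact match of site."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def translate(seq):
    """Translate frame 1 of a DNA sequence; unknown codons become 'X'."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


dna = "ATGGAATTCTAA"
print(reverse_complement(dna))
print(find_sites(dna, "GAATTC"))
print(translate(dna))
```

Real tools also handle ambiguity codes, all six reading frames and enzymes with degenerate recognition sites, which this sketch deliberately leaves out.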
Barbara Brannetti and Allegra Via
Contents
7.1 Pairwise alignments
7.1.1 Alignments
7.1.2 Global and local alignment
7.1.3 Substitutions
7.1.4 Insertions and deletions
7.1.5 Statistical significance of alignments
7.2 Multiple alignments
7.2.1 Introduction
7.2.2 Multiple alignments: why do we need them?
7.2.3 Global and local alignments
7.2.4 Substitutions, deletions and insertions
7.2.5 How do we obtain a multiple alignment?
7.2.6 Gene prediction and pattern matching
7.3 References
Abstract
This chapter is dedicated to the theoretical aspects of the analysis of nucleic and amino acid sequences. It consists of two main sections: a 'pair-wise alignments' part (section 7.1) and a 'multiple alignments' part (section 7.2) where the reader can find an outline of the concepts underlying pair-wise and multiple (DNA and protein) sequence alignments together with a theoretical discussion of the principles regulating the most important algorithms for sequence analysis. This is not essential for the comprehension and full usage of chapter 8 and chapter 9, but may help the reader who wishes to get a deeper view of the subject.
Therefore those who are interested in the practical use of sequence databases and programs for sequence analysis can skip this chapter and go directly to chapter 8 or chapter 9.
Barbara Brannetti
Contents
8.1 Genbank database
8.1.1 Description of Genbank database records
8.2 Database search
8.2.1 FASTA
8.2.2 How FASTA works, a step by step description
8.2.3 BLAST
8.3 Gene structure prediction
8.3.1 Filters
8.3.1.1 CENSOR
8.3.1.2 RepeatMasker
8.3.2 Looking for functional sites in DNA sequences
8.3.2.1 Promoter Scan
8.3.2.2 GrailEXP
8.3.2.3 GenScan
8.3.2.4 FGENE
8.3.2.5 GeneMark
8.3.2.6 WebGene
8.3.2.7 GeneId
8.3.2.8 PROCRUSTES
8.4 References
Abstract
The enormous amount of data coming from the various genome projects is stored within biological databases. Different tools have been developed both to search within the databases and to analyse and annotate the data they contain. The aim of this chapter is to describe the most useful and widely used nucleic acid databases and to introduce the tools developed to analyse nucleic acid sequences. It is organized into three main sections. The first (8.1) deals with a description of the Genbank database, with details of the structure of the files containing sequence data together with some annotation. The second section (8.2) provides a user-friendly description of tools (FASTA and BLAST) for the comparison of a query sequence with a nucleic acid database. A detailed description of the most useful tools available for gene structure prediction is reported in section 8.3. The prediction of functional sites in a raw genomic sequence is still a hot research topic (cf. Fortna and Gardiner, 2001) and no easy solution or completely reliable tool is available so far. We therefore suggest trying different tools in order to compare the different predictions and identify the method that seems to be most reliable for the reader's specific problem.
Practical Aspects of Protein Sequence Analysis
Allegra Via
Contents
9.1 Protein sequence databases
9.1.1 Swissprot-TrEMBL
9.1.2 PIR
9.2 Pair-wise alignments and database searches
9.2.1 FASTA
9.2.2 Fasta3 output
9.2.3 BLAST
9.2.4 BLAST output
9.2.5 Alignment of two sequences
9.2.6 PSI-BLAST
9.2.7 PSI-BLAST output
9.3 Multiple alignments
9.3.1 CLUSTALW
9.3.2 MultAlign
9.3.3 Editing a multiple alignment
9.3.3.1 ALSCRIPT
9.3.3.2 CINEMA and JALVIEW
9.3.3.3 BOXSHADE
9.4 Hidden Markov Models (HMMs)
9.5 Motifs and patterns
9.5.1 Pattern and domain databases
9.5.1.1 PROSITE
9.5.1.2 BLOCKS
9.5.1.3 PFam
9.5.1.4 PRINTS
9.5.2 Servers for patterns and domains databases scanning
9.5.2.1 ProfileScan
9.5.2.2 BLOCKS server
9.5.2.3 SMART server
9.6 References
Abstract
This chapter is dedicated to the analysis of amino acid sequences. It is organized in five subsections. In the first and second the reader can find a user-friendly description of sequence databases and instructions to use some of the main tools for pair-wise alignments and database searches. Section 9.3 is dedicated to multiple alignments while section 9.4 is a very short introduction to Hidden Markov Models. Finally, section 9.5 is an overview of the most important pattern and domain databases and describes tools to use them for protein sequence analysis.
Given one or a set of sequences you can essentially perform:
1. Database searches looking for identical or similar sequences (for the detection of homology in the context of phylogenetic analysis and/or inference of function).
For these purposes sections 9.1 and 9.2 provide a description of the most widely used protein sequence databases and tools (programs and servers) for searches in such databases.
For this analysis we suggest the following steps:
· identify the most suitable database for your needs;
· select the most appropriate searching program;
· perform your search.
The results of your search may be more or less biologically relevant. You can influence relevance and reliability by modifying the parameters of the searching program. If you do not feel self-confident in handling program parameters, we suggest using the default ones provided by the program itself.
2. A multiple alignment.
(a) one can align a single sequence to a multiple alignment of sequences provided by databases of protein families.
(b) one can build a multiple alignment starting from a new set of sequences.
You can find the tools for both these in section 9.3.
3. Pattern matching.
You may be interested in the identification of functional sites in a protein sequence (phosphorylation sites, glycosylation sites, etc.).
Section 9.5 provides a description of databases and tools for the identification of biologically relevant signatures in protein sequences.
Many of the programs described in this section can be used directly through the WWW. Others can be downloaded from the suitable web site and installed on a local computer.
From Sequence to Structure: an Easy Approach to Protein Structure Prediction
Fabrizio Ferré
Contents
10.1 Principles of protein structure
10.1.1 Introduction
10.1.1.1 Protein structure
10.1.1.2 Techniques for the experimental determination of protein structure
10.1.2 Structures databases
10.1.2.1 The Protein Data Bank and PDBSum
10.1.2.2 SCOP
10.1.2.3 CATH
10.1.2.4 DSSP
10.1.2.5 DALI, FSSP and HSSP
10.1.3 Visualization of molecular structures: molecular graphics tools
10.1.3.1 RasMol
10.1.3.2 SwissPDBViewer
10.1.4 Protein structure comparison
10.2 Protein Structure Prediction
10.2.1 Secondary structure prediction
10.2.1.1 Introduction
10.2.1.2 On the web
10.2.2 Homology Modelling
10.2.2.1 Introduction
10.2.2.2 On the web
10.2.3 Fold Recognition
10.2.3.1 Introduction
10.2.3.2 On the web
10.2.4 Ab initio Prediction
10.2.4.1 Introduction
10.2.4.2 On the web
10.2.5 Evaluation of structure prediction methods
10.3 Transmembrane topology prediction
10.3.1 Introduction
10.3.2 On the web
10.4 Links
10.5 References
Abstract
The analysis of the three-dimensional structure of a protein can be very helpful in the design of experimental procedures aimed at the understanding of protein function. Experimental techniques such as X-ray diffraction and Nuclear Magnetic Resonance are used to determine protein structures that are then stored in freely accessible databases. Molecular graphics software is also freely or commercially available to examine these structures. The protein structure generally depends only on the primary structure and on environmental conditions. Extrinsic factors, such as chaperones or the creation of disulfide bridges, may assist the folding process but are often not essential to it. Consequently, the protein three-dimensional structure may in principle be inferred from the sequence itself. While the experimental procedures to determine the protein three-dimensional structure are becoming faster and more reliable, the number of known sequences exceeds by far the number of known structures. Several methods have been developed to predict the protein structure from the sequence, and a number of them are freely available on the internet and easy to use. Modeling by homology is the most reliable method to predict protein structure: it is based on the assumption that, if two proteins share a high (or reasonably high) sequence identity, their 3D structure will also be similar (or reasonably similar) with good reliability.
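Since homology modelling rests on sequence identity, it may help to see how that number can be computed from a pair-wise alignment. The sketch below, in Python, assumes one common convention: identical residues divided by the number of aligned columns in which neither sequence has a gap. Other conventions exist (for example dividing by the full alignment length), and the function name and the two short aligned sequences are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the columns where neither sequence has a gap.

    Both arguments must be rows of the same alignment, so they must
    have equal length. This is one of several possible conventions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Two made-up aligned fragments: 7 gap-free columns, 6 of them identical.
identity = percent_identity("MKT-AYIAK", "MKTSAY-AR")
print(round(identity, 1))
```

The value obtained depends on the alignment itself, so it should always be read together with the method and parameters used to produce that alignment.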
Let Others Solve your Problems: the Newsgroups
Richard P. Grant
Contents
11.1 Usenet for beginners
11.2 Bionet
11.3 Access and (n)etiquette
11.4 How to use a news reader
11.5 Whither Bionet?
11.6 Useful links and further reading
Abstract
Newsgroups permit individuals to take part in a worldwide discussion on a specific topic of interest. A message is "posted" to a newsgroup, usually by email or web form. Any other member of that discussion group can read and reply to the message. The BIOSCI bionet newsgroup network allows easy communication between life scientists worldwide. This chapter provides a complete listing and a brief description of the bionet newsgroups and describes in detail the use of these newsgroups via a web browser and through dedicated news reader software.
The Roaming Scientist: Get Online, Manage Your E-mail and Exchange Files from Everywhere
Andrea Cabibbo
Contents
12.1 Getting online
12.1.1 Host institution
12.1.2 Connect from home (Dial-up)
12.1.3 Internet Cafes
12.2 E-mail
12.2.1 How to use your work e-mail account from home or from abroad
12.2.2 Using a web-based e-mail account: read and send e-mail from any computer connected to the internet
12.3 Some tips on file exchange
12.3.1 FTP
12.3.2 Web Site
12.3.3 Web Sharing
Abstract
Science is an international business. Scientists often travel to other countries for variable periods of time and need to keep in touch and exchange material and information with their home lab and with collaborators worldwide. One of the most effective and simple ways to communicate and exchange documents, images, data and more general information is indeed e-mail. In most cases you will be able to use your e-mail account from all over the world, provided that the correct settings are entered in your e-mail application. E-mail has, however, some limitations as to the size of files that can be exchanged. Depending on the e-mail account, a variable limit exists on the size of attachments that can be sent and received. A limit also exists on the total number of megabytes that can be stored in a personal mailbox on a mail server. This means that for the exchange of very large documents or very large amounts of data, e-mail might not be well suited, and other systems have to be used, such as ftp, web sharing, the setting up of temporary simple web sites (see also chapter 4) or an online storage facility.
In this chapter we will summarize the essential information required to read and send e-mail from everywhere (well, almost) and will provide some tips for the efficient exchange of files of any (reasonable) size.
Bio-Bookmarks
Andrea Cabibbo and Manuela Helmer-Citterich
Contents
13.1 Companies
13.2 Meetings
13.3 Laboratory protocols
13.4 Biological directories and sites
13.5 Microarray resources and databases
13.6 Protein interaction resources
13.7 Useful sites for lessons and presentations
13.8 Biology servers
13.9 Miscellanea
Abstract
Beyond the topics covered in the different chapters of this book, there are several other internet resources that can be of interest to biologists. In this chapter we shall try to give an overview of such resources, in order to complete the picture of the 'Bio-Web'. These and further links are available at http://cellbiol.com. This list is by no means complete or exhaustive.