Gordon Findlay.
The purpose of this chapter is to introduce the Internet, briefly describe its history and governance and show some of the ways in which it can be used for serious scientific purposes.
Chapter 2. MACHINES, USERS, AND ADDRESSES.
Gordon Findlay.
The purpose of this chapter is to introduce the principle of machine addressing. The chapter describes how computers are addressed using a numbering system, how that numbering system is made easier for us to understand using a hierarchical naming system and how several users on one machine can be addressed.
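As a rough illustration of the addressing ideas summarised above, the following minimal Python sketch splits a user address into its user part and the hierarchical labels of the machine name, and checks whether a string is a purely numeric address. The address `jsmith@biochem.example.ac.nz` and the function names are hypothetical, chosen only for this example; they do not come from the chapter.

```python
import ipaddress


def split_address(address):
    """Split 'user@host.domain' into the user name and the host name labels."""
    user, sep, host = address.partition("@")
    if not sep or not user or not host:
        raise ValueError(f"not a user@host address: {address!r}")
    # Hierarchical names read from most specific (left) to most general (right).
    return user, host.split(".")


def is_numeric_address(text):
    """Return True if text is a dotted numeric (IPv4) address."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False


# Hypothetical address, used only for illustration.
user, labels = split_address("jsmith@biochem.example.ac.nz")
print(user, labels)
print(is_numeric_address("192.0.2.1"), is_numeric_address("biochem.example.ac.nz"))
```

The user part distinguishes several users on one machine, while the labels after the `@` sign name the machine within its naming hierarchy.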
Chapter 3. INTERNET SERVICES.
Gordon Findlay.
This chapter describes the ways in which information is distributed across the Internet. There are many different forms of communication and information sharing available, each with its own strengths and weaknesses.
Traditionally, each type of service has been handled by a separate piece of software (for example, gopher, ftp...). Increasingly, the more sophisticated packages, particularly WWW browsers such as Mosaic and Netscape, provide a more integrated environment. The traditional text-based interfaces are being replaced with graphical interfaces, in either the Macintosh or MS-Windows environments, or running over X Windows. However, the text-based services still have some advantages, and will be with us for some time yet, for three reasons. First, some Internet users have only text-based terminals available. Second, text-based systems have a lower bandwidth requirement, and when users must pay by the volume of data moved, text interfaces will be much cheaper. Third, over slow links, text interfaces, with their much lower amount of data to be transmitted, are much more efficient. Of course, text-based systems require more effort to master, and cannot give access to information in graphical form.
The software used to implement these services differs markedly from machine to machine. For example, a Macintosh program will, if it is any good, make full use of the special features of the Macintosh user interface. Obviously the same program would be useless for a dumb terminal connected to a Unix computer.
Chapter 4. CONNECTING.
Gordon Findlay.
There are many ways of bridging the gap between your chair and the Internet. Getting access is generally not a problem. However, given the wide variety of ways in which it is possible to connect to the Internet, it pays to have some understanding of the options, and the strengths and weaknesses of each.
There are two ingredients in the connectivity recipe: the "service provider" and the channel used to connect with the provider. While both are important, it is the combination of these which determines your capabilities, and the range of services available to you.
Chapter 5. SOFTWARE POINTERS
Gordon Findlay.
It is not possible to give a rundown here of even a small proportion of the software packages used by molecular biologists on the Internet.
Rather, we point out some of the things which you should expect packages to do, include some advice about choosing between competing products and add a collection of folkloric observations.
Chapter 6. CONTRIBUTING TO THE INTERNET.
Gordon Findlay.
Publishing your own data, software and ideas using the Internet can be a very satisfying experience, or a very frustrating one. Satisfying if you succeed in distributing worthwhile material to people who can make use of it; frustrating if you spend a lot of your time fiddling with computer and communication problems.
If you have resources you wish to share, there are a few questions which you must answer before you determine how to go about it:
How shall the resources be distributed? By anonymous FTP? Via Gopher, or via Email to requesters? Using WWW pages?
Where should you host the resources? Your own institutional system? Or would it be preferable to have your material distributed through one of the established biomedical Internet sites?
How will the resources be made known?
How often will the material be updated? How can you track users of your resources, to advise them of updates?
Where will you obtain the communication and computer expertise required?
How much time have you to devote to maintaining these resources, answering questions about them and solving the problems which are sure to arise once other scientists start to use them?
What will happen if (when?) you get that fabulous new job in another place, or change your research interests?
Chapter 7. SEQUENCE RETRIEVAL AND ANALYSIS USING ELECTRONIC MAIL SERVERS
T. S. Pillay
This chapter provides an introduction to the various Email servers available to the molecular biologist, in so far as they pertain to sequence analysis. The reader is particularly referred to the summary table of Email servers, which may be copied and pinned up next to the computer terminal. From this, readers can select the server most suited to their needs. Although the most popular servers are discussed at some length (e.g. Retrieve and Blast), a discussion of the services provided by every server is beyond the scope of this chapter, and detailed documentation is readily and rapidly available by sending a "help" message to the electronic mail address listed in the summary table. If, for any reason, communication with the server is problematic, then an address to report problems is also provided in the table. The more advanced reader is referred to the list of references provided at the end of the chapter.
Sequence analysis by Email server is useful for scientists who do not have access to onsite mainframe computers or expensive software for doing sequence analysis. Using a mainframe usually involves obtaining an account on it, and the user is charged a fee according to the amount of CPU time used. Most individuals at universities have free access to electronic mail, and sequence analysis by electronic mail removes this impediment. Furthermore, the user has access to a number of servers and databases which may offer slightly different services. Email servers also use more powerful hardware, and consequently more sophisticated sequence analysis is possible than that available on a personal computer. The only disadvantage, and a minor one, is that the request gets queued, and at busy times the user may have to wait several hours for the results of an Email search or Email analysis.
There are probably two main tasks which users may wish to perform:
a) Obtaining a sequence
b) Analysing it in detail to search for motifs and similarities with other proteins.
The sequence may initially be in the user's possession as a result of a cloning project. In this instance, the first task will be to determine whether the sequence exists in the various databases, with a view to answering the question of whether the gene for this protein has been cloned and sequenced before. This will be dealt with later in the chapter. There are more than 40 Email servers that are able to provide a wide variety of sequence analysis functions, ranging from simple database search and retrieval to powerful sequence analysis and comparison. This essentially means that a user sitting at home or in the lab, equipped with a personal computer and a modem, can have access to a worldwide assortment of extremely sophisticated sequence analysis functions controlled by simple Email text messages. The electronic mail servers are computers programmed to respond in a specific and automated way to text messages received in a rigidly defined format. The message sent to the server is in essence a set of commands in a specific format defined by each server. Any deviation from this causes the server to generate a programmed response, usually consisting of mailing the "help" file for that server.
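The behaviour described above, a server that accepts only rigidly formatted commands and answers anything else with its help file, can be sketched in a few lines of Python. This is an illustrative toy, not the command set of any real server: the `HELP` and `GET` commands, the help text and the accession `X00001` are all assumptions made for the example.

```python
# Hypothetical help file; real servers define their own commands and text.
HELP_TEXT = "Valid commands: HELP | GET <accession>"


def respond(message_body, database):
    """Return the reply a rigid-format mail server might send (toy model)."""
    lines = [line.strip() for line in message_body.splitlines() if line.strip()]
    if len(lines) != 1:
        return HELP_TEXT
    parts = lines[0].split()
    if len(parts) == 2 and parts[0].upper() == "GET":
        accession = parts[1]
        if accession in database:
            return database[accession]
        return f"No entry found for {accession}"
    # HELP itself, and any deviation from the expected format,
    # both result in the help text being mailed back.
    return HELP_TEXT


# Made-up database content, for illustration only.
toy_database = {"X00001": "ACGT"}
print(respond("GET X00001", toy_database))
print(respond("please send me a sequence", toy_database))
```

A free-text request such as the second one does not match the expected format, so the sketch returns the help text rather than attempting to interpret it.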
The first step in sequence analysis usually involves actually extracting the sequence in a computer-readable format for a variety of purposes, including restriction analysis and primer and oligonucleotide design. Published sequences are usually submitted to a particular database, and the sequence is given a unique identifier, the accession number, which makes it possible to extract the sequence by this identifier alone. Not having an accession number makes the task a little more difficult, as one is then compelled to search by using a keyword, author's name, citation, etc.
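The difference between retrieval by accession number and retrieval by keyword or author can be illustrated with a small Python sketch. The records, accession numbers, descriptions and author names below are invented placeholders, not real database entries, and the function names are hypothetical.

```python
# Invented records for illustration; none of these are real database entries.
records = {
    "ACC0001": {
        "description": "example receptor gene",
        "authors": ["Smith J"],
        "sequence": "ATGGCC",
    },
    "ACC0002": {
        "description": "example kinase gene",
        "authors": ["Jones A", "Smith J"],
        "sequence": "ATGAAA",
    },
}


def by_accession(accession):
    """Direct lookup: the accession number identifies at most one record."""
    return records.get(accession)


def by_keyword(keyword):
    """Scan every record; several records (or none) may match."""
    needle = keyword.lower()
    return sorted(
        accession
        for accession, record in records.items()
        if needle in record["description"].lower()
        or any(needle in author.lower() for author in record["authors"])
    )


print(by_accession("ACC0001")["sequence"])
print(by_keyword("smith"))
```

The accession lookup returns a single record directly, whereas the keyword search must examine every entry and can return several candidates that the user then has to sort through.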
Chapter 8. Computational Gene Identification
James W. Fickett and Roderic Guigó
In this chapter, we present a guide to internet resources for identifying genes in nucleic acid sequences. By "identifying" we mean both locating the genes and, when possible, assigning them a tentative function. There has been a great deal of progress in gene identification methods in the last few years. At least in the case of sequence data from mammals, C. elegans, and E. coli, the older coding region identification methods have mostly given way to methods that can suggest the overall structure of genes. For all organisms, computational methods are sufficiently accurate that they give practical help in many projects of biological and medical import.
However, the situation with respect to services is not simple. The choice of a program or programs to use depends, for example, on what organism is being studied, whether one is analyzing single sequence runs or large assembled sequences, and how much effort the user is willing to expend. It would be very convenient if one program, in one place, could do everything needed in the way of gene identification, but we are unlikely to enjoy this ideal soon.
To help the user choose the most appropriate services for each situation, we will briefly describe the kinds of information that each program or database is capable of supplying. The overall flow of the chapter is intended to provide a protocol for gene identification using a number of services (not all of which need to be included in each case).
Only a small fraction of the relevant background, literature, and resources available can be mentioned here. For more details and other points of view there are a number of related reviews, of which the following are perhaps closest to the goals of the current work: [Gribskov & Devereux 1991; Adams, Fields & Venter 1994; Fickett 1995; Gelfand 1995; Snyder & Stormo 1995b].
Most programs require a particular sequence format, have limits on the length and number of sequences that may be submitted for analysis and, except in the case of database searches, are designed for only one or a few specific organisms. These details are constantly changing and will not be given here. Instead we will describe how to obtain the current documentation for the service. Only one recent reference is given for each service; usually it will contain references to related work.
Emphasis is given to services, that is, to programs that are available for remote execution over the internet. All these services respect the privacy of their users, and guarantee not to keep any record of sequences analyzed. However, some investigators may, for maximum security or for higher throughput, prefer to install the analysis programs locally. Thus we also mention, where applicable, availability of source code. Similarly, we will also mention programs which are not strictly internet servers, but whose source code or executables can be obtained through the internet.
Network access information is given for each service. Unless otherwise mentioned, an address labeled "e-mail" is for an e-mail server program; one labeled "inquiries" is answered personally by one of the developers of a program; "ftp" is for an anonymous ftp site where source code or data may be retrieved; and "WWW" is for a World Wide Web page with information about the program and, sometimes, interactive use of the program.
Chapter 9. THE BIOSCI/BIONET ELECTRONIC NEWSGROUP NETWORK FOR BIOLOGISTS
David Kristofferson
Beginning in late 1993, the Internet started making headlines in the mass media, primarily because of the advent of the Mosaic World Wide Web (WWW) browser produced at the University of Illinois' National Center for Supercomputing Applications.
Despite the headlines caused by WWW, when this author polled biologists on several recent seminar tours, it was plain to see that electronic mail was still the dominant Internet application used by research biologists.
The reason for this is simple. The Internet is a communications tool. E-mail is the most widely available communications application, included as a standard software feature on most computer systems beyond the desktop level. Gopher, USENET news software, Mosaic, etc., usually have to be installed after a computer is up and running, and this sometimes presents a barrier for scientists if they don't have good local systems support. However, virtually everyone with an account on a networked computer can access e-mail, the only barrier being the self-imposed one of resistance to learning how to use it. Knowledge of how to send, read, and reply to e-mail is therefore our assumed foundation for the discussion that follows.
Chapter 10. Real-time Collaboration On the Internet: BioMOO, the Biologists' Virtual Meeting Place
Gustavo Glusman, Eric Mercer and Irit Rubin
The Internet, the worldwide computer network, is the embodiment of the virtual universe that is usually called "cyberspace". In this virtual space, biologists can find a wide range of tools not generally available in local university or research environments, such as the massive genome-related databases. Even if many of these tools are available locally, one will generally find on the network larger databases and more powerful computers with state-of-the-art software.
One of the most valuable resources on the Internet, though, is the diversity of the human beings using it. A large database can be copied, a faster machine can be bought, but there is typically a strong limit to the number of people doing closely related work at a single institution. If there are five people in the world working on a specific system, it is unlikely that they'll get together often to share and discuss their work.
In contrast, the Internet provides the connectivity needed to enable these scattered researchers to join efforts. Researchers from many fields now maintain closer connections to their colleagues all over the world via electronic mail and conferences. They don't need to travel physically around the globe to meet each other. Instead, they can 'meet' virtually in cyberspace. The computer network provides a common ground that doesn't have a true physical location, and can be accessed by anyone, from their own networked computer.
For these reasons, considerable work has been put into developing communication protocols and software. There now exists a plethora of computer programs for many different platforms (e.g. UNIX systems, Macintoshes, IBM PCs), each helping users communicate in different ways. These 'Internet services' model well-known, everyday communication systems. For example, sending electronic mail is analogous to sending mail; the UNIX 'talk' protocol is analogous to holding a phone conversation; the World Wide Web (in its simplest form) is analogous to reading magazines, and with its multimedia extensions, it can provide real-time video and audio, like television.
Of all the Internet services, the most exciting are those that provide for real-time communication with other people. Such real-time communication in virtual spaces can contribute to the spontaneity and creativity of a scientific discussion: reading announcements from a campus bulletin board can be useful, but sitting in the cafeteria or sprawling on a campus lawn, and brainstorming with your colleagues, can spell the difference between a simple exchange of data and the spirited discussion that inspires new scientific insights.
The following section describes a visit to a virtual meeting place for scientists, called BioMOO.
Chapter 11. INTERNET RESOURCES FOR HUMAN AND MOUSE MOLECULAR GENETICS
M.A.Kennedy
It is now over five years since Walter Gilbert, in a prescient commentary in "Science", urged molecular biologists to develop computer literacy and to "hook our individual computers into the worldwide network that gives us access to daily changes in the databases and also makes immediate our communications with each other" (Gilbert, 1991). If you have not yet heeded this advice then you have much to gain by learning now because, as Gilbert anticipated, the Internet is where we find many of the key tools with which we perform our craft.
This chapter introduces some of the Internet resources that are of use to researchers working on human and mouse genetics. Given the breadth of this field, and the rate at which Internet resources grow and change, this cannot be a comprehensive review; the aim is simply to give sufficient information to get new users familiar with key resources. The main resources described here offer on-line guides and help documentation that are updated with each improvement to the service offered, and it is neither possible nor wise to commit anything other than an introductory overview to paper! Once a basic familiarity is achieved, it is straightforward to explore the resources in depth and extend your skills without recourse to a written manual.
Most of the services described in this chapter can be accessed in several ways, and where possible details for e-mail servers, gopher clients, or WWW browsers are given. Some of the databases also offer ftp access to provide transfer of large amounts of information to your own computer. Although in most cases similar information can be obtained by several routes, WWW sites for browsers such as Mosaic or Netscape are quickly becoming the standard for most Internet resources.
Chapter 12. INTERNET RESOURCES FOR FUNGI
Kathie T. Hodge
Resources of particular interest to those studying fungi are listed below. The two most comprehensive starting points are the branches of the WWW Virtual Library covering mycology and yeasts.
Chapter 13. INTERNET RESOURCES FOR INVERTEBRATES.
Steven J.M. Jones and David Hodgson
Resources of particular interest to those studying invertebrates.
Chapter 14. INTERNET RESOURCES FOR PLANT MOLECULAR BIOLOGY
Stephen M. Beckstrom-Sternberg, Gail Juvik, Doug Bigwood, John Barnett, Jon Krainak, Marty Sikes, Jill Martin, Michael Shives, Sam Cartinhour, Stephen Heller, and Jerome P. Miksche
Chapter 15. MICROBIOLOGICAL RESOURCES ON THE INTERNET
Martin Latterich