Building and Utilizing Biological Databases

RIT Bioinformatics Workshop for Educators

July 2003

 

Hands-On Exercises

 

 

1.  NCBI exploration, basic navigation, educational resources

 

Go to the NCBI web site:  http://www.ncbi.nih.gov

 

The web pages at this site contain a navigation bar along the left-hand side of the page with important links and information.  Web pages generally also have a row of links across the top of the page to the most popular software available through Entrez (in a narrow dark blue strip).  Right under these links in a light blue strip is the word Search.  This is a direct link to many of the database collections at the NCBI site.

 

1.  Explore the main NCBI page – there are many links as well as recent news and information.  If there is an active link to the SARS web page, click on it; if not, use the direct URL:    http://www.ncbi.nlm.nih.gov/genomes/SARS/SARS.html

Click on the “See 3CL-PRO Related Structures” link on the right side of the page.  On the next web page click the first link on the left-hand side of the page under “Structure” (1LVO).  On the next page click the down arrow next to “with” and select Rasmol.  Click the “View 3D Structure” button.  When the download window comes up, save the file on the desktop and drag it onto the Rasmol icon (also on the desktop) to open and view it.  Close the structure viewing window.

Click the PDB (Protein Data Bank) link (1LVO) – it’s at the end of the line that begins “Reference” (above the “Display” text box).  This takes you to the PDB record for the protein.  From here click the Download/Display file link (on the left).  You can download or display the structure in a variety of formats: 3-D (we already looked at the structure), or as a text or HTML file (try it; it’s interesting but might be slow).

Return to the main NCBI page (the back button is your friend).  Note the links down the right-hand side of the page, including a link to ORF-Finder, which was mentioned in the Gene Identification session on Thursday.

 

2.  Along the top of the page is another set of links.  Click on the PubMed link (under the NCBI logo in the upper left corner) and look around the main PubMed page.  Use the PubMed search by typing leptin in the text box by Search and clicking “Go”.  (Computer scientists, this is the so-called “obesity gene” (the obese mouse) mentioned in the Development and Pathogenesis session on Thursday.)   How many articles on leptin has PubMed indexed?  Click on any article in the resulting list to see its abstract.  There are links from the abstract page to full-text articles (some may not be freely available) as well as other related items (see “Links” on right side of page). 

 

3.  Find the long, dark bar that extends across the web page near the top and click on the Nucleotide link.  Look around the main Nucleotide information page.  In the box containing the text “Human Genome” (in orange) click the link to Map Viewer and click one of the chromosomes to view it. (Click the number to bring up the map.) 

 

Back up to the main Nucleotide page (2 pages back) and click on the “Search for Genes” link (to LocusLink) in the left-hand blue bar.  Under NCBI Genome Guides (in blue bar on left) click on Human.  This page gives a lot of good information you might use in a course (including health information) – it is culled from the vast repository and is a nice collection of different types of information from NCBI.  NCBI has recently put together a complete guide to its resources (The NCBI Handbook) – there is a link from this page to the handbook.

 

Back up to the main Nucleotide page and type leptin in the search box for Nucleotide (press enter or click Go to trigger search).  Notice that there are many patented partial gene sequences in this list.

 

4.  Click on one of the hits from this nucleotide search to view the record (the default view is in “GenBank” format).  Save this record to a file by clicking the “Send To” button – make sure that “file” is selected from the pull-down menu next to “Send To” (it should be the default).  Save this file to the folder for this lab – c:\bio_temp\rit_workshop\db\perl  

IMPORTANT: For “Save As Type” you should make sure it says All Files (the default is probably Document).  Give the file the name my.gb – note that you must type the file extension – gb indicates a file in GenBank format.  With this extension Windows will not know how to open the file but BioPerl will recognize the extension and know that the file is in GenBank format.  We’ll use this file in the next  set of exercises.

 

5. Click on the other links available in the main “Entrez” interface (across dark bar near top of page) – Protein, Genome, Structure, Taxonomy, OMIM, and Books.  The books are a terrific resource, particularly for illustrations and for learning more about biology.

 

6.  Go back to the main NCBI page and scroll down.  There are two great links for educators on the left-hand side of the page: Education and FTP Site

 

7.  Click the Education link – you will see there are tutorials, slides, links to information about the NCBI field courses, etc.  This is a great resource.

 

8.  Go back and click on the FTP site link – it takes you to a page where you can access software and data files for downloading. 

 

9.  Go back to the main NCBI web page and minimize the browser; we’ll use it in a later exercise.

 

Downloads and other cool links:

 

10.  http://www.nature.com/nrn/journal/v2/n1/animation/nrn0101_043a_swf_MEDIA1.html

 

11.  http://www.garlandscience.com/ECB/about.html (book used in Gary’s Central Dogma talk)

 

12.  http://www.tigr.org/software/glimmer/  (Glimmer gene id software)

 

13.  http://www.bernstein-plus-sons.com/software/RasMol_2.7.2.1/README.html (Rasmol -- download version 2.7.1.1 (Windows) by clicking the 8 under binaries (the Windows executable), then raw under help (the help file))

 

14.  http://www.dnai.org/geneboy/index.html “Gene Boy”, easy and very fun tool for playing with sequences (Thanks to Jeffrey Kushner)

 

 

2.  Parsing with BioPerl

 

For this exercise we will use the Windows “Command Prompt” window.  An easy way to start this window is to click on Start, then Run, and in the window that pops up type “cmd” and click “OK”.  (Note that many schools have disabled the Run command in public labs and some have disabled the command prompt window entirely.)  You’ll need to change into the correct directory (folder); to do so, type:

cd c:\bio_temp\rit_workshop\db\perl

 

command     meaning                            example

cd          change directory                   cd c:\"documents and settings"\

dir         list files in current directory    dir

perl        run Perl interpreter               perl seqstuff.pl

notepad     open a Notepad window              notepad myprogram.pl

del         delete a file                      del my.gb

ren         rename a file                      ren my.gb your.gb

 

1.  Open the program seqstuff.pl in notepad (notepad seqstuff.pl)

 

2.  Look through the program to see if you can get the gist of it.  Put simply, the program reads a GenBank record from a text file into a “sequence object” (the variable $seqobj in the program).  A sequence object contains different components corresponding to fields in GenBank format records.  You can access these components by calling special functions on the object; for example, seq() is a function that returns the sequence from the record.  To call a function on an object use the “arrow notation”:  $seqobj->seq() gets the sequence from $seqobj and returns it as a string.  If this is confusing, ask a computer scientist!  Sequence objects, along with their associated functions, are defined as part of BioPerl – a large collection of Perl modules you can incorporate into Perl programs you write.

 

3.  Run this program by typing perl seqstuff.pl at the command prompt.  It will ask you for a file name, so type the name of the file where you saved your GenBank record from the last exercise (my.gb). 

 

4.  Open the GenBank record you downloaded in the last exercise in Notepad (notepad my.gb) and compare what is there to the program output.  The components printed by the program are only a few of those available.  If you want to see more, check the BioPerl web site – http://doc.bioperl.org/releases/bioperl-1.2/   This is the documentation page for all of their modules.  The module of interest is called Seq.

 

5.  Run the program again on at least one other file (b.gb, rb.gb, fly.gb are all in the directory).  To bring back the previous command in the command window just press the up arrow (continuing to press it scrolls back through previously typed commands).  In other words, there’s no need to retype perl seqstuff.pl; just tap the up arrow until you see it appear at the prompt.  Try running the program on a file that is not in GenBank format.  What happens?

 

6.  Close all the notepad windows and minimize, but don’t close, the command prompt window – we’ll use it again soon.
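
To make the idea of "reading a GenBank record into an object with named components" a little more concrete, here is a minimal sketch in Python.  It is not seqstuff.pl and it does not use BioPerl; the function name parse_genbank and the tiny sample record are made up for illustration, and only a few fields are handled (a DEFINITION that runs over several lines, for example, is cut short).

```python
import re

# A tiny, made-up record in GenBank flat-file layout (illustrative only,
# not a real database entry).
SAMPLE_RECORD = """\
LOCUS       DEMO0001                  24 bp    DNA     linear   SYN 01-JAN-2003
DEFINITION  Hypothetical demonstration sequence.
ACCESSION   DEMO0001
ORIGIN
        1 atggccattg taatgggccg ctga
//
"""


def parse_genbank(text):
    """Return a dict holding a few components of one GenBank-format record."""
    record = {"locus": None, "definition": None, "accession": None, "seq": ""}
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):            # end-of-record marker
            break
        if in_origin:
            # Sequence lines look like "   1 atggccattg taatgg..."; keep letters only.
            record["seq"] += re.sub(r"[^A-Za-z]", "", line)
        elif line.startswith("LOCUS"):
            record["locus"] = line.split()[1]
        elif line.startswith("DEFINITION"):
            # Only the first DEFINITION line is kept in this sketch.
            record["definition"] = line[len("DEFINITION"):].strip()
        elif line.startswith("ACCESSION"):
            record["accession"] = line.split()[1]
        elif line.startswith("ORIGIN"):
            in_origin = True                 # sequence lines follow
    return record


rec = parse_genbank(SAMPLE_RECORD)
print(rec["accession"], rec["definition"])
print(len(rec["seq"]), rec["seq"])
```

Given text that is not in GenBank format, this sketch simply returns empty components rather than raising an error; a real parser such as the one in BioPerl may behave differently, which is what step 5 asks you to find out.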

 

 

3.  A Plethora of formats – NCBI and BioPerl conversion programs

 

Maximize your web browser (NCBI web site:  http://www.ncbi.nih.gov )

 

1.  Do a Nucleotide search (you may need to change the search type since you’ve done some searching already – down arrow in the light blue bar across the top of the page next to Search) for Neurofibromatosis 1. 

 

2.  Scroll down the page until you find the link for NM_000267 (may be 5th in list).  Click on this link to display the information in GenBank format.

 

3.  Look at the GenBank record – it’s very large but make sure you find the sequences (protein and nucleotide). 

 

3.  At the top of the page click the down arrow for the box next to the Display button – it should initially say “Default”.  Select, in turn, each of the different formats available for displaying the record.  To update the display click the Display button each time you select a new format.  Pay particular attention to the size of the XML-formatted record – if it’s taking too long to load click the stop button and go on to the next format type. FASTA is a particularly popular way of representing sequence information.

 

4.  When you’re finished viewing the formats, display the record in GenBank format and save it using the “Send To” button.  (If you need to, see the previous instructions for saving a .gb file.)  Put it in the same folder as the other file and call it nf.gb

 

5.  Maximize the command window we used for the second lab exercise.  Type the dir command; you should see nf.gb displayed in the list of files.

 

6.  You will now run two conversion programs that use BioPerl modules.  The first takes a file in GenBank format and outputs it in Fasta format.  The second takes a file in Fasta format and outputs it in EMBL format. 

 

7.  Look at the GenBank to Fasta conversion program in notepad by typing

            notepad gbTOfasta.pl

The program simply inputs from one file and outputs to another, in each case specifying the file format desired.  BioPerl takes care of the conversions.

 

8.  Run the GenBank to Fasta program by typing:

            perl gbTOfasta.pl

When you are prompted for the input file give the name of the file you saved, nf.gb

 

9.  If you type dir again you should see a file called nf.fa that the program created.  Open it using notepad (notepad nf.fa)

 

10.  Look at and run the Fasta to EMBL conversion program in the same way.  Its name is fastaTOembl.pl – for an input file use nf.fa (the file you just created from the GenBank format nf.gb file).  Of course, you can convert directly from GenBank to EMBL format; the purpose of showing you both programs is to give you a pattern for conversion programs that use BioPerl.

 

11.  Once you’ve run that program you should see a file called nf.embl when you type dir.  Look at this file with notepad (notepad nf.embl)

 

12.  BioPerl includes numerous file formats that can be used in simple conversion programs like these.  To see a list you can go to http://doc.bioperl.org/releases/bioperl-1.2/ and look at the documentation for the SeqIO module – the synopsis describes these simple conversions.  A list of supported file formats is near the bottom of the entire module description in the first part of the section on Constructors.

 

13.  Close all of the notepad windows you may have opened as well as the command prompt window.

 

14.  To use BioPerl under Windows (all of the software is free):

(1) download and install Active State Perl

            http://www.activestate.org/

(2) download and install BioPerl – this requires installing a few extra Perl modules, so make sure to read the most recent linked document at the BioPerl site about installing under Windows!

            http://www.bioperl.org/
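
The conversion pattern from steps 6 to 12 (read a record in one format, write it out in another) can also be sketched without BioPerl.  The Python below is an illustrative stand-in, not gbTOfasta.pl: the function name genbank_to_fasta and the sample record are made up, and it handles only the accession, the first DEFINITION line, and the sequence.

```python
import re

# Made-up record in GenBank flat-file layout, used only to exercise the sketch.
SAMPLE_GENBANK = """\
LOCUS       DEMO0001                  24 bp    DNA     linear   SYN 01-JAN-2003
DEFINITION  Hypothetical demonstration sequence.
ACCESSION   DEMO0001
ORIGIN
        1 atggccattg taatgggccg ctga
//
"""


def genbank_to_fasta(genbank_text, width=60):
    """Convert one GenBank-format record (a string) to FASTA-format text."""
    accession = "unknown"
    definition = ""
    seq_parts = []
    in_origin = False
    for line in genbank_text.splitlines():
        if line.startswith("//"):                 # end of record
            break
        if in_origin:
            # Drop position numbers and spaces, keep the bases.
            seq_parts.append(re.sub(r"[^A-Za-z]", "", line))
        elif line.startswith("ACCESSION"):
            fields = line.split()
            if len(fields) > 1:
                accession = fields[1]
        elif line.startswith("DEFINITION"):
            definition = line[len("DEFINITION"):].strip()
        elif line.startswith("ORIGIN"):
            in_origin = True
    sequence = "".join(seq_parts)
    # FASTA: one header line starting with ">", then the sequence in fixed-width lines.
    header = ">" + accession + (" " + definition if definition else "")
    body = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + body) + "\n"


print(genbank_to_fasta(SAMPLE_GENBANK, width=10))
```

To try it on a real file you could read nf.gb into a string and write the result to a new file; notice how much of the GenBank annotation is lost in the FASTA version, which is worth pointing out to students.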

 

 

4.  Relational Databases (MS Access)

 

This exercise is designed to show you a very simple relational database in MS Access.

 

1.  Open the MS Access database called Plant in c:\bio_temp\rit_workshop\db\ 

This database has three tables in it called experiments, personnel, and points.  The main database window that displays the components (tables, queries, etc.) should be open.  Make sure that “Tables” is clicked on the left-hand side of that window.  Double click on each table in turn to open it.  You can have all three open at once – use the minimize button, moving, and resizing to adjust the layout in Access.  Look at the tables and the data they contain.  If you double click on the “picture” field in the experiments table you will display a picture – MS Access allows for the storage of a variety of media.

 

2.  Close the tables and click back in the main database window – you should see the “Relationships” icon appear in the toolbar at the top.

 

3. When you click on the Relationships icon you see a page that shows how fields from different tables are related to one another.  The infinity symbol stands for “many”.  The experiment ID appears only once for each experiment in the experiments table, but may appear many times in the points table (many data points may be associated with the same experiment).  Similarly, the same scientist ID may appear many times in the experiments table (one scientist carries out many experiments), but only once in the personnel table, thus the relationship is many to one from experiments to the personnel table.  NB: You may need to adjust the size of the display boxes in order to display all the fields for each table.  Close the relationships view after you’ve looked it over.

 

6.  Close MS Access.  Caveat: Access does not fully support SQL; some query types don’t work.
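
The one-to-many relationships described in step 3 can be sketched in SQL.  The example below uses Python's built-in sqlite3 module rather than Access, so it is only an analogy to the Plant database: the column names (scientist_id, experiment_id, height_cm) and all of the data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.executescript("""
CREATE TABLE personnel (
    scientist_id  INTEGER PRIMARY KEY,
    name          TEXT
);
CREATE TABLE experiments (
    experiment_id INTEGER PRIMARY KEY,
    scientist_id  INTEGER REFERENCES personnel(scientist_id),
    description   TEXT
);
CREATE TABLE points (
    point_id      INTEGER PRIMARY KEY,
    experiment_id INTEGER REFERENCES experiments(experiment_id),
    height_cm     REAL
);
""")

# One scientist -> many experiments; one experiment -> many data points.
conn.executemany("INSERT INTO personnel VALUES (?, ?)",
                 [(1, "Scientist A"), (2, "Scientist B")])
conn.executemany("INSERT INTO experiments VALUES (?, ?, ?)",
                 [(10, 1, "low light"), (11, 1, "high light"), (12, 2, "control")])
conn.executemany("INSERT INTO points VALUES (?, ?, ?)",
                 [(100, 10, 4.0), (101, 10, 6.0), (102, 11, 9.0), (103, 12, 5.0)])

# Follow the relationships with joins: who ran each experiment,
# how many points it has, and the mean measurement.
rows = conn.execute("""
    SELECT p.name, e.description, COUNT(pt.point_id), AVG(pt.height_cm)
    FROM personnel p
    JOIN experiments e ON e.scientist_id = p.scientist_id
    JOIN points pt     ON pt.experiment_id = e.experiment_id
    GROUP BY e.experiment_id, p.name, e.description
    ORDER BY e.experiment_id
""").fetchall()

for row in rows:
    print(row)
```

Each scientist ID is stored once in personnel and repeated in experiments, and each experiment ID is stored once in experiments and repeated in points, which is the "one" and "many" shown by the infinity symbol in the Relationships view.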

 

 

5.  NCBI: Entrez revisited (suggested exercise to do with students)

 

We will look at some more of the databases available through Entrez, the NCBI interface to all of their data.  Maximize a browser and make sure you are at the main NCBI web page (http://www.ncbi.nih.gov)

 

1.  From the main NCBI web page click on the top link in the left-hand blue bar (SITE MAP).  The site map shows you a comprehensive (linked) listing of NCBI resources.  This is particularly helpful for exploring their databases.

 

2.  Go back to the main NCBI web page.  (Note that the rest of this exercise is based on the Entrez Tutorial – http://www.ncbi.nlm.nih.gov/Entrez/tutor.html.)

 

3.  Do a nucleotide search on “colon cancer”.  The results should comprise nearly 11,000 hits.  Click on “Limits” to pare down the results (the “Limits” link is just under the text box containing the original search string).  Click the down arrow next to “All Fields” and select Title.  This means that only records with “colon cancer” in the title will be retrieved.  Note the many other possible choices.  Click the down arrow next to “Only From” and select RefSeq (a curated database).  Now click “Go” to search using these new restrictions.

 

4.  You should have around 40 results.  Find Accession Number NM_000249 in the list of hits and click on it to bring up its RefSeq record (remember, the search was restricted to RefSeq).  This is the record for the MLH1 gene in Homo sapiens.

 

5.  Click “Links” on the right hand side of the page to bring up a menu of related information.  This feature demonstrates the real power of Entrez and its database integration.

 

6.  Select PubMed from the Links menu – the articles displayed are exactly those that are part of the RefSeq record for the MLH1 gene  – you can check this when you come back to the RefSeq record, which you should do once you’ve looked over the list of references.

 

7.  Select Protein from the Links menu and click on the NP_000240 record.   From this protein data record click on BLink (same location as “Links” on the nucleotide record) – this shows the best alignments already discovered using BLAST searches.  It is no longer always necessary to run a BLAST search yourself; check BLink first to see if it’s already been done.

 

8.  Go back to the protein record and click on Domains (next to Blink).  Click all three different Show buttons on the page that comes up and ask a biologist to explain protein domains (you will learn more about them later today).

 

9.  Get back to the nucleotide record we left in step 7 (click Back as many times as necessary).  In the Links menu click on SNP.  Single Nucleotide Polymorphisms (pronounced “snips”) are locations where individuals’ nucleotides differ – they are places where variation in the genome is found.  Some variations are benign while others are deleterious.  The SNP data is submitted by individual labs to dbSNP and is aligned to the corresponding mRNA using BLAST.  You can examine SNP records by clicking on their respective ID numbers.

 

10.  Return to the nucleotide record and from the Links menu select OMIM (Online Mendelian Inheritance in Man).  OMIM records describe known allelic variants that have been reported in the literature.  From OMIM click on the link to the 120436 record in the MIM database.  The text will be displayed by default.  To see a list of allelic variations click the link in the blue bar on the left-hand side of the page.  Click some of the other links if you are interested. 

 

11.  Minimize your browser; we’ll be going back to the main NCBI web page in exercise 7.
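
The restricted search from step 3 (title field only, RefSeq only) can in principle also be expressed as a query string for NCBI's programmatic interface, the Entrez Programming Utilities.  The sketch below only builds the URL and does not contact NCBI.  The database name "nuccore", the field tag [Title], and the filter srcdb_refseq[PROP] are assumptions that should be checked against the current E-utilities documentation, and build_esearch_url is a made-up helper name.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Base address of the ESearch utility.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, retmax=20):
    """Return an ESearch URL for the given database and search term."""
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)   # urlencode escapes spaces and brackets


# Assumed equivalents of the "Limits" choices made in the browser:
# restrict the phrase to the Title field and keep only RefSeq records.
term = "colon cancer[Title] AND srcdb_refseq[PROP]"
url = build_esearch_url("nuccore", term)
print(url)

# Decoding the query string recovers the original term unchanged.
print(parse_qs(urlparse(url).query)["term"][0])
```

Pasting the printed URL into a browser should return an XML list of matching record IDs, provided the assumed names above are still valid.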

 

 

 

 

6.  Spreadsheets to flat files and databases

 

1)      Download the file microa.xls.  This file contains a small fraction of (processed, not raw) data generated by a microarray experiment. 

 

2)      Open the file (MS Excel).  Select the “Save As” option from the File menu

 

3)      Click the down arrow next to the “Save As Type” text box.  If you are unfamiliar with the many possible formats in which you can save the spreadsheet, scroll through the options.  Make sure that you save the text file in the folder for this lab (browse to c:\bio_temp\rit_workshop\db).  Choose to save this as “Text (Tab delimited)”.  You will see a warning about losing formatting; that’s OK.

 

4)      Close the Excel file.  Note that once you’ve saved the file as text you are working with a text file as far as Excel is concerned, so before it closes you will get a window asking if you want to save what appears to be your Excel file as text; just click “No”.  If you’re going to open an Excel file specifically to create a text version, it’s easiest not to make any changes to the Excel file itself at that time: just save as text, then close Excel.

 

5)      Open the text file and look at it.  It would be relatively easy to parse this file using Perl, correct?  You could use the split command and split on the tab character (“tab delimited”) to get the values that were originally stored in the spreadsheet columns.

 

6)      Now start MS Access (the database program we used earlier) and click “File”, then “New”.  Create a new blank database.  Go back to “File” and select “Get external data”, then “Import”.  In the dialog box change the file type designator from “Microsoft Access” to “All Files” and browse to your MS Excel file (microa).  Access can automatically create a database from an Excel file.  If the Excel file has column headings they will by default become the database field names.  (Access can also create a database from a delimited text file.)  Once you finish, double click the icon for the table that’s been created to view it.  Access can also link to a spreadsheet rather than importing its data.  This means that updates to the spreadsheet would appear in Access, but you could not delete the spreadsheet without the linked database table disappearing as well.

 

7)      You can also open a delimited text file in MS Excel.  Start Excel and open the text file you created – turn it back into an Excel file, give it a new name, and save it.  This gives you some idea of how you can munge through a text file and put it into Excel.  It’s common to save Excel data as tab-delimited text, process it with Perl, then put what you’ve processed back into Excel to take advantage of its many built-in functions (e.g. for statistical analysis).

 

8)      Close Excel and all of the various notepad windows you have opened.
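
The "split on the tab character" idea from step 5, and the save-process-reopen cycle from step 7, look roughly like this in Python (Perl's split function works the same way).  The column names and numbers are invented; they are not the contents of microa.xls.

```python
# Made-up values standing in for a few rows of a tab-delimited export.
SAMPLE_TEXT = (
    "gene\tcontrol\ttreated\n"
    "geneA\t1.0\t2.5\n"
    "geneB\t3.0\t1.5\n"
)


def read_tab_delimited(text):
    """Split each line on tabs; use the first line as the column names."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split("\t")
    return [dict(zip(header, line.split("\t"))) for line in lines[1:]]


def add_ratio(rows):
    """Add a computed column: treated value divided by control value."""
    for row in rows:
        row["ratio"] = float(row["treated"]) / float(row["control"])
    return rows


def write_tab_delimited(rows, columns):
    """Turn the rows back into tab-delimited text that Excel can open."""
    out = ["\t".join(columns)]
    for row in rows:
        out.append("\t".join(str(row[c]) for c in columns))
    return "\n".join(out) + "\n"


rows = add_ratio(read_tab_delimited(SAMPLE_TEXT))
print(write_tab_delimited(rows, ["gene", "control", "treated", "ratio"]))
```

Note that every value comes out of the split as text, so numbers have to be converted (float above) before doing arithmetic on them.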

 

 

7.  BLAST searching

 

From the main NCBI web site click on the BLAST link (along that top bar).  There are many BLAST variants.  Nucleotide BLAST is used when the query sequence and databases to search are nucleic acid sequences.  Protein BLAST is used when query sequence and databases are protein sequences, and Translated BLAST allows you to use a nucleotide sequence query against protein databases (or vice versa).  

There are many parameters you can set for BLAST searches.  Interpreting the results is tricky without guidance.  There is an excellent BLAST tutorial on the NCBI site that is strongly recommended:

            http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html

To get a feel for using BLAST, try the following exercise.  It was created by NCBI (problem sets on-line), adapted by  http://nh-brin.unh.edu/Bioinformatics/Tutorials/DinoDNA/  and then considerably altered for this lab.

"Jurassic Park" Dino-DNA Analysis

In 1990, Michael Crichton published the book "Jurassic Park" about the resurrection of dinosaurs using blood from the stomachs of insects that had been encased in tree sap, which later fossilized into amber. At one point in the book, Dr. Henry Wu is asked to explain some of the DNA techniques used in reconstructing the extinct dinosaur genomes. Dr. Wu describes the use of restriction enzymes and how the fragmented pieces of dino DNA can be spliced together with these enzymes. He also alludes to the fact that they don't have the entire genome but that they "fill in the gaps" with modern day frog DNA. During his discussion he points to a computer screen and remarks "Here you see the actual structure of a small fragment of dinosaur DNA."

>JurassicPark DinoDNA p 103
gcgttgctgg cgtttttcca taggctccgc ccccctgacg agcatcacaa aaatcgacgc
ggtggcgaaa cccgacagga ctataaagat accaggcgtt tccccctgga agctccctcg
tgttccgacc ctgccgctta ccggatacct gtccgccttt ctcccttcgg gaagcgtggc
tgctcacgct gtaggtatct cagttcggtg taggtcgttc gctccaagct gggctgtgtg
ccgttcagcc cgaccgctgc gccttatccg gtaactatcg tcttgagtcc aacccggtaa
agtaggacag gtgccggcag cgctctgggt cattttcggc gaggaccgct ttcgctggag
atcggcctgt cgcttgcggt attcggaatc ttgcacgccc tcgctcaagc cttcgtcact
ccaaacgttt cggcgagaag caggccatta tcgccggcat ggcggccgac gcgctgggct
ggcgttcgcg acgcgaggct ggatggcctt ccccattatg attcttctcg cttccggcgg
cccgcgttgc aggccatgct gtccaggcag gtagatgacg accatcaggg acagcttcaa
cggctcttac cagcctaact tcgatcactg gaccgctgat cgtcacggcg atttatgccg
caagtcagag gtggcgaaac ccgacaagga ctataaagat accaggcgtt tcccctggaa
gcgctctcct gttccgaccc tgccgcttac cggatacctg tccgcctttc tcccttcggg
ctttctcatt gctcacgctg taggtatctc agttcggtgt aggtcgttcg ctccaagctg
acgaaccccc cgttcagccc gaccgctgcg ccttatccgg taactatcgt cttgagtcca
acacgactta acgggttggc atggattgta ggcgccgccc tataccttgt ctgcctcccc
gcggtgcatg gagccgggcc acctcgacct gaatggaagc cggcggcacc tcgctaacgg
ccaagaattg gagccaatca attcttgcgg agaactgtga atgcgcaaac caacccttgg
ccatcgcgtc cgccatctcc agcagccgca cgcggcgcat ctcgggcagc gttgggtcct
gcgcatgatc gtgctagcct gtcgttgagg acccggctag gctggcgggg ttgccttact
atgaatcacc gatacgcgag cgaacgtgaa gcgactgctg ctgcaaaacg tctgcgacct
atgaatggtc ttcggtttcc gtgtttcgta aagtctggaa acgcggaagt cagcgccctg

In 1992 Dr. Mark Boguski at NCBI entered this sequence into a text editor and searched all of the known DNA sequences at the time. Mark wrote up his findings and submitted a manuscript to the journal BioTechniques, as a tongue-in-cheek joke. His manuscript was accepted and published. (Boguski, M.S. A Molecular Biologist Visits Jurassic Park. (1992) BioTechniques 12(5):668-669).  You will reproduce this experiment using BLAST.

EX 1: From the main BLAST page click on Standard Nucleotide-Nucleotide BLAST (blastn).  This brings up a web page where you can specify your query along with various parameters.  Cut and paste the above sequence into the window labeled “Search”, then click the BLAST! button to start the search.  Click the FORMAT! button on the web page that appears.  A new window will appear where your results will eventually be displayed, hopefully in minutes and not on the geological time scale.  If you don’t get results within a few minutes check out dino1.htm.  The file is missing the graphics but you can see most of the information.    Just what kind of DNA is this, anyway?  Read through the list of hits and … ask a biologist.

Mark Boguski’s published article was brought to Crichton's attention. For the sequel, "The Lost World", Mr. Crichton used Mark as a consultant. Mark chose a DNA sequence from a living organism and mixed in some frog (Xenopus) DNA just as Dr. Wu described in the book.  Mark also embedded a message in the protein translation of the DNA sequence which he submitted for use in the book. Here is the sequence Mark gave Crichton for the book "The Lost World":

>LostWorld DinoDNA p 135
gaattccgga agcgagcaag agataagtcc tggcatcaga tacagttgga gataaggacg
gacgtgtggc agctcccgca gaggattcac tggaagtgca ttacctatcc catgggagcc
atggagttcg tggcgctggg ggggccggat gcgggctccc ccactccgtt ccctgatgaa
gccggagcct tcctggggct gggggggggc gagaggacgg aggcgggggg gctgctggcc
tcctaccccc cctcaggccg cgtgtccctg gtgccgtggg cagacacggg tactttgggg
accccccagt gggtgccgcc cgccacccaa atggagcccc cccactacct ggagctgctg
caaccccccc ggggcagccc cccccatccc tcctccgggc ccctactgcc actcagcagc
gggcccccac cctgcgaggc ccgtgagtgc gtcatggcca ggaagaactg cggagcgacg
gcaacgccgc tgtggcgccg ggacggcacc gggcattacc tgtgcaactg ggcctcagcc
tgcgggctct accaccgcct caacggccag aaccgcccgc tcatccgccc caaaaagcgc
ctgcgggtga gtaagcgcgc aggcacagtg tgcagccacg agcgtgaaaa ctgccagaca
tccaccacca ctctgtggcg tcgcagcccc atgggggacc ccgtctgcaa caacattcac
gcctgcggcc tctactacaa actgcaccaa gtgaaccgcc ccctcacgat gcgcaaagac
ggaatccaaa cccgaaaccg caaagtttcc tccaagggta aaaagcggcg ccccccgggg
gggggaaacc cctccgccac cgcgggaggg ggcgctccta tggggggagg gggggacccc
tctatgcccc ccccgccgcc ccccccggcc gccgcccccc ctcaaagcga cgctctgtac
gctctcggcc ccgtggtcct ttcgggccat tttctgccct ttggaaactc cggagggttt
tttggggggg gggcgggggg ttacacggcc cccccggggc tgagcccgca gatttaaata
ataactctga cgtgggcaag tgggccttgc tgagaagaca gtgtaacata ataatttgca
cctcggcaat tgcagagggt cgatctccac tttggacaca acagggctac tcggtaggac
cagataagca ctttgctccc tggactgaaa aagaaaggat ttatctgttt gcttcttgct
gacaaatccc tgtgaaaggt aaaagtcgga cacagcaatc gattatttct cgcctgtgtg
aaattactgt gaatattgta aatatatata tatatatata tatatctgta tagaacagcc
tcggaggcgg catggaccca gcgtagatca tgctggattt gtactgccgg aattc

EX 2: From the main BLAST page click on Standard Nucleotide-Nucleotide BLAST, as above, copy and paste this new “Lost World” sequence into the Search window and submit to BLAST.  If you can’t get results open dino2.htm.

Click the link to the GenBank entry for the highest-scoring match in the list of sequences.  The list begins below the graphical sequence display and the link is the left-most part of the line beginning with “gi”.  Which organism is this DNA sequence from?  To see more about the organism click the ORGANISM link in the GenBank record.  Do the same thing for the second-highest-scoring match.  Is either of these organisms related to dinosaurs?

EX 3: From the main BLAST page click on Nucleotide query - Protein db (blastx) under Translated BLAST Searches.  Copy and paste this same “Lost World” sequence into the Search window and submit it to BLAST.  If you can’t get results open dino3.htm. 

On the results page look at the best alignment by clicking on the top score value in the right hand column (or scroll down past the hit list to the first alignment).  Mark’s message is contained in the query sequence where the subject sequence has gaps (represented by dashes ---).  What is his message?
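
The translated search in EX 3 depends on turning a nucleotide sequence into protein in each of its six possible reading frames (three on each strand) before any comparison is made.  Here is a minimal Python sketch of just that translation step, assuming the standard genetic code.  It performs no alignment or database search, and the function names are made up for illustration.

```python
import re
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def clean(seq):
    """Remove spaces, line breaks, and digits from a pasted sequence; uppercase it."""
    return re.sub(r"[\s\d]", "", seq).upper()


def reverse_complement(seq):
    """Return the opposite strand, read in its own 5' to 3' direction."""
    return clean(seq).translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq, frame=0):
    """Translate from offset `frame` (0, 1, or 2); unknown codons become X."""
    dna = clean(seq)
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))


def six_frames(seq):
    """Return translations for frames +1, +2, +3 and -1, -2, -3."""
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq, offset)
        frames[-(offset + 1)] = translate(reverse_complement(seq), offset)
    return frames


for frame, protein in sorted(six_frames("atggcc taa").items()):
    print(frame, protein)
```

Because clean() strips the spaces and line breaks, a sequence can be pasted in exactly as it is printed in this handout.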