Darwin Help

GenomeSummary

Class GenomeSummary - summary information of a database file

Calling Sequence  GenomeSummary(DB)
Parameters
Name  Type      Description

DB    database  database structure to create a summary
Return Type  GenomeSummary
Selectors
Name          Type          Description

FileName      string        name of external file containing the database
string        string        the entire header of the database as a string
TotAA         posint        number of amino acids or bases in the database
TotChars      posint        number of characters in the database
TotEntries    posint        number of entries in the database
type          string        dna, rna, mixed or peptide
EntryLengths  list(posint)  length of each entry
Id            string        5-letter code (SwissProt) for species/genome
Kingdom       string        either Bacteria, Archaea or Eukaryota
Lineage       list(string)  lineage as a list (from OS tags)
Genus         string        first part of the scientific name
Epithet       string        second part of the scientific name
sgml_tag      string        the contents of the tag of that name in the database header (sgml_tag stands for any tag name)
Methods print, Rand, select, string, type
Synopsis GenomeSummary provides an alternative to loading a database when the sequences themselves are not needed. Typically, the database is loaded, GenomeSummary is run, and its result is stored in a file for later reading. In this way, all of the data except the sequences themselves is available, and many genomes can be loaded into a single Darwin session.

GenomeSummary has all the selectors that are available for a database, except for Entry and Pat, which can only be used if the sequences are present. It also provides a few additional selectors. EntryLengths contains the length of the sequence of each entry. The string selector does not select the entire text of the database, only the text that precedes the first entry. This is normally called the header of the database. The header contains several useful tags that describe the entire database, for example the 5-letter code, kingdom and lineage. This information is available directly through selectors. Any other tagged information in the header can be selected by using the name of the tag as a selector.

Examples
> ReadDb('/home/darwin/DB/genomes/ECOLI/ECOLI.db'):
> gs := GenomeSummary(DB):
> gs[TotAA];
1358990
> gs[Lineage];
[Bacteria, Proteobacteria, Gammaproteobacteria, Enterobacteriales, 
Enterobacteriaceae, Escherichia, Escherichia coli]
> print(gs);
  FileName: /home/darwin/DB/genomes/ECOLI/ECOLI.db
    string: <DBNAME>Escherichia coli K-12 MG1655 complete genome.</DBNAME><D...
     TotAA: 1358990
  TotChars: 6806443
TotEntries: 4289
      type: Peptide
        Id: ECOLI
   Kingdom: Bacteria
   Lineage: [Bacteria, Proteobacteria, Gammaproteobacteria, Enterobacteriales, E
nterobacteriaceae, Escherichia, Escherichia coli]
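As a further sketch of the selectors described in the Synopsis, the session below continues with the gs object created above. It uses only selectors listed on this page. It assumes that max and sum behave as the usual Darwin list functions, and that DBNAME, the tag visible at the start of the header in the print output, can be used as a selector like any other header tag. Outputs are omitted because they depend on the database file in use.
> gs[TotEntries];                            # number of entries in the database
> gs[Kingdom];                               # kingdom taken from the header
> max(gs[EntryLengths]);                     # length of the longest entry (assumes max accepts a list)
> sum(gs[EntryLengths]) / gs[TotEntries];    # average entry length (assumes sum accepts a list)
> gs[DBNAME];                                # contents of the DBNAME header tag (assumed tag selector)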
See also ConsistentGenome,   database,   DB,   Entry,   ReadDb,   Sequence