GenPept - Format Enhancement
With the next full release of GenPept (141), timed to coincide with the next full release of GenBank (141) on ~April 15, 2004, a number of new record types will be added to enhance the data content of GenPept.
New Types:
Version A compound identifier consisting of the GenPept Locus and a numeric version number associated with the current version of the sequence data in the record. This is followed by an integer key (a "GI") assigned to the peptide sequence. Mandatory keyword/exactly one record.
Keywords Short phrases describing gene products and other information, taken directly from the corresponding GenBank entry. Mandatory keyword in all annotated entries/one or more records.
Source Common name of the organism or the name most frequently used in the literature. Mandatory keyword in all annotated entries/one or more records/includes one subkeyword.
PI Isoelectric point. Mandatory keyword/exactly one record.
Comment/NucGI GI of the corresponding nucleotide entry.
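As an illustration of the compound Version identifier described above, the following minimal Python sketch splits a line such as "VERSION X76706_1.1 GI:436055" into its three parts. The function name parse_version_line is hypothetical, and the sketch assumes the fields are separated by whitespace and that the version number follows the last period in the identifier.

```python
def parse_version_line(line):
    """Split a VERSION line into GenPept Locus, version number, and GI.

    Assumption: the line has exactly three whitespace-separated tokens,
    e.g. "VERSION X76706_1.1 GI:436055".
    """
    parts = line.split()
    if len(parts) != 3 or parts[0] != "VERSION" or not parts[2].startswith("GI:"):
        raise ValueError("unexpected VERSION line: %r" % line)
    # The version number is taken to be everything after the last period.
    locus, sep, version = parts[1].rpartition(".")
    if not sep:
        raise ValueError("no version number in identifier: %r" % parts[1])
    return {"locus": locus, "version": int(version), "gi": int(parts[2][3:])}
```

Splitting on the last period rather than the first keeps the sketch usable if a locus name itself ever contains a period, which is an assumption rather than something the format description states.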
The LOCUS line will contain additional information: the number of amino acids, the GenBank division, and the date.
Detailed format for the LOCUS line:
Positions Contents
--------- --------
01-05 'LOCUS'
06-12 spaces
13-25 GenPept Locus name
26-26 space
27-35 GenBank Locus name
36-40 Length of peptide sequence
41-41 space
42-43 'aa'
44-47 spaces
48-50 'PEP'
51-55 spaces
56-61 'linear'
62-64 spaces
65-67 GenBank division code
68-68 space
69-79 Date, in format dd-mmm-yyyy
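The column layout above can be read with plain string slicing. The Python sketch below is only an illustration: the names LOCUS_FIELDS, parse_locus_line, and build_locus_line are hypothetical, and it assumes that each line is padded with spaces exactly as the table specifies, that names are left-justified, and that the length is right-justified (the table does not state the justification).

```python
# Column ranges (1-indexed, inclusive) copied from the layout table above.
LOCUS_FIELDS = {
    "genpept_locus": (13, 25),
    "genbank_locus": (27, 35),
    "length": (36, 40),
    "division": (65, 67),
    "date": (69, 79),
}


def parse_locus_line(line):
    """Extract the variable fields of a fixed-column LOCUS line."""
    if not line.startswith("LOCUS"):
        raise ValueError("not a LOCUS line")
    fields = {
        name: line[start - 1:end].strip()
        for name, (start, end) in LOCUS_FIELDS.items()
    }
    fields["length"] = int(fields["length"])
    return fields


def build_locus_line(genpept_locus, genbank_locus, length, division, date):
    """Assemble a 79-column LOCUS line (justification is an assumption)."""
    return (
        "LOCUS" + " " * 7              # columns 01-12
        + genpept_locus.ljust(13) + " "  # columns 13-26
        + genbank_locus.ljust(9)         # columns 27-35
        + str(length).rjust(5) + " "     # columns 36-41
        + "aa" + " " * 4                 # columns 42-47
        + "PEP" + " " * 5                # columns 48-55
        + "linear" + " " * 3             # columns 56-64
        + division.ljust(3) + " "        # columns 65-68
        + date.ljust(11)                 # columns 69-79
    )
```

For example, parse_locus_line(build_locus_line("X76706_1", "A15H9FIB", 362, "VRL", "29-JAN-1996")) returns the same five values it was given, with the length as an integer.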
Below is an example of the old format followed by the new format of the
reference section of an entry:
OLD:
1-------10--------20--------30--------40--------50--------60--------70------78
LOCUS X76706_1 [A15H9FIB]
DEFINITION Adenovirus type 15H9 (Morrison) fibre gene, nonenveloped DNA.
DATE 29-JAN-1996
ACCESSION X76706
ORGANISM Human adenovirus type 15
Viruses; dsDNA viruses, no RNA stage; Adenoviridae; Mastadenovirus.
COMMENT CDS 50..1138
/gene="fiber gene"
/product="fiber protein"
/protein_id="CAA54127.1"
/db_xref="GI:436055"
/db_xref="GOA:P36846"
/db_xref="Swiss-Prot:P36846"
WEIGHT 39420
LENGTH 362
ORIGIN Translated using phase 1
1-------10--------20--------30--------40--------50--------60--------70------78
NEW:
1-------10--------20--------30--------40--------50--------60--------70------78
LOCUS X76706_1 A15H9FIB 362 aa PEP linear VRL 29-JAN-1996
DEFINITION Adenovirus type 15H9 (Morrison) fibre gene, nonenveloped DNA.
DATE 29-JAN-1996
ACCESSION X76706
VERSION X76706_1.1 GI:436055
KEYWORDS fiber gene; fiber protein.
SOURCE Human adenovirus type 15
ORGANISM Human adenovirus type 15
Viruses; dsDNA viruses, no RNA stage; Adenoviridae; Mastadenovirus.
COMMENT CDS 50..1138
/gene="fiber gene"
/product="fiber protein"
/protein_id="CAA54127.1"
/db_xref="GI:436055"
/db_xref="GOA:P36846"
/db_xref="Swiss-Prot:P36846"
/NucGI="436054"
WEIGHT 39419.48
PI 6.03
LENGTH 362
ORIGIN Translated using phase 1
1-------10--------20--------30--------40--------50--------60--------70------78
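The record counts stated for the new keywords (VERSION and PI exactly once; KEYWORDS and SOURCE at least once in annotated entries) can be checked mechanically. The Python sketch below is a rough illustration, not a validator for the full format: the name check_new_keywords is hypothetical, and it assumes that a keyword is the first whitespace-delimited token on its line and that continuation lines never begin with one of these four words.

```python
from collections import Counter

# Record counts as stated in the list of new types.
EXACTLY_ONE = ("VERSION", "PI")
AT_LEAST_ONE = ("KEYWORDS", "SOURCE")  # applies to annotated entries


def check_new_keywords(record_text):
    """Return a list of problems found in the header of one entry."""
    # Assumption: the first token of each non-blank line is its keyword.
    counts = Counter(
        line.split()[0] for line in record_text.splitlines() if line.strip()
    )
    problems = []
    for keyword in EXACTLY_ONE:
        if counts[keyword] != 1:
            problems.append(
                "%s: expected exactly 1, found %d" % (keyword, counts[keyword])
            )
    for keyword in AT_LEAST_ONE:
        if counts[keyword] < 1:
            problems.append("%s: expected at least 1, found 0" % keyword)
    return problems
```

Applied to the NEW example above, the function should return an empty list; applied to the OLD example, it should report all four keywords as missing.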
ABCC GenPept is available from ftp://ftp.ncifcrf.gov/pub/genpept.
If you have questions or comments please contact: Gary Smythers.
GenPept(R) and GenBank(R) are registered trademarks of the U.S. Department of Health and Human Services for the GenBank Gene Products and the GenBank Genetic Sequence Data Banks.




