This page was produced as an assignment for Genetics 677, an undergraduate course at UW-Madison.

Sequence Motifs

DNA motifs

None of the available sites for finding DNA motifs in a particular sequence, including MOTIF, the MEME Suite, AlignACE, and YMF, could be used, because the human GRM7 genomic DNA sequence is too large.

mRNA motifs

The human GRM7 isoform a mRNA sequence was analyzed using the MOTIF and MEME websites to find motifs occurring in the GRM7 gene (Bailey, et al., 1994; Heinemeyer, et al., 1999). The mRNA was much easier to analyze than the genomic sequence because it is far smaller: 4,021 bp, compared with 880,291 bp for the GRM7 genomic sequence.

A MOTIF search of the TRANSFAC library at a cut-off value of 97 returned eleven motifs (Heinemeyer, et al., 1999). The three most frequently occurring were a Drosophila heat shock factor motif, a yeast heat shock factor motif, and an alcohol dehydrogenase gene regulator motif (Heinemeyer, et al., 1999). Because the MOTIF website does not provide detailed descriptions of these or the other motifs, it is difficult to assess how the motifs found in the GRM7 mRNA relate to the function of the GRM7 protein.
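The consensus sequences MOTIF reports, such as AGAAN for the heat shock factors and NGGRGK for ADR1 (Table 2), use IUPAC ambiguity codes: R = A or G, Y = C or T, W = A or T, M = A or C, K = G or T, N = any base, and so on. The Python sketch below counts exact, possibly overlapping matches to such a consensus on one strand. It is only an approximation of a MOTIF search, which applies a score cut-off rather than requiring an exact consensus match, so it would not be expected to reproduce the counts in Table 2. The function names and the example fragment are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus such as NGGRGK into a regex."""
    return "".join(IUPAC[c] for c in consensus.upper())


def count_sites(seq, consensus):
    """Count exact, possibly overlapping, consensus matches on one strand."""
    seq = seq.upper().replace("U", "T")  # accept RNA or cDNA spelling
    # A zero-width lookahead lets successive matches overlap.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return len(pattern.findall(seq))


if __name__ == "__main__":
    seq = "AGAAAGAAUGGAGT"  # hypothetical fragment, not the GRM7 sequence
    for name, cons in [("HSF", "AGAAN"), ("ADR1", "NGGRGK")]:
        print(name, cons, count_sites(seq, cons))
```

Swapping in the real GRM7 mRNA sequence would give exact-consensus counts that can be compared with, but are not the same measure as, the MOTIF counts.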

The MEME analysis returned ten motifs, ranging in length from 8 bp to 41 bp and in number of repeats from 2 to 5 (Bailey, et al., 1994). Because MEME discovers motifs de novo, as patterns that recur within the input sequence, rather than by matching the sequence against a library of characterized binding sites, it provides no information about the function of the motifs it identifies (Bailey, et al., 1994). Interestingly, MEME did not identify any of the motifs found by MOTIF (Bailey, et al., 1994; Heinemeyer, et al., 1999). This may be partly due to the different width settings of the two programs: MEME was run with a minimum motif width of 6 bp, so it could not report, for example, the 5-bp heat shock factor consensus AGAAN that MOTIF found most often. The two websites also use different methods to find motifs (Bailey, et al., 1994; Heinemeyer, et al., 1999).
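MEME's name, Multiple Em for Motif Elicitation, refers to expectation maximization (EM), the mixture-model fitting method of Bailey and Elkan (1994). The toy Python sketch below illustrates why MEME output carries no functional annotation: EM only asks which pattern is over-represented relative to the background base composition. It is a deliberately simplified illustration, not MEME itself: it fits one motif of fixed width under a one-occurrence-per-sequence model to several hypothetical sequences, starts from a single seed, and computes no E-values. The function names (`background`, `em_oops`, `consensus`) are illustrative.

```python
import random

ALPHABET = "ACGT"


def background(seqs):
    """Single-nucleotide background frequencies pooled over all sequences."""
    counts = {c: 1.0 for c in ALPHABET}  # +1 pseudocount per base
    for s in seqs:
        for c in s:
            counts[c] += 1
    total = sum(counts.values())
    return {c: counts[c] / total for c in ALPHABET}


def em_oops(seqs, width, seed, iters=20, seed_prob=0.7, pseudo=0.1):
    """Toy EM for one motif under a one-occurrence-per-sequence model.

    seqs: uppercase ACGT strings, each at least `width` long.
    seed: a `width`-long string used to initialise the position weight
          matrix (PWM); MEME similarly starts EM from substrings of the input.
    Returns the PWM as a list of {base: probability} dicts.
    """
    bg = background(seqs)
    other = (1.0 - seed_prob) / 3
    pwm = [{c: (seed_prob if c == s else other) for c in ALPHABET} for s in seed]
    for _ in range(iters):
        counts = [{c: pseudo for c in ALPHABET} for _ in range(width)]
        for s in seqs:
            # E-step: motif-vs-background likelihood ratio at every offset;
            # normalising gives the posterior that the site starts there.
            ratios = []
            for j in range(len(s) - width + 1):
                r = 1.0
                for k in range(width):
                    r *= pwm[k][s[j + k]] / bg[s[j + k]]
                ratios.append(r)
            total = sum(ratios)
            # Accumulate expected base counts for each motif column.
            for j, r in enumerate(ratios):
                z = r / total
                for k in range(width):
                    counts[k][s[j + k]] += z
        # M-step: re-estimate the PWM from the expected counts.
        pwm = [{c: col[c] / sum(col.values()) for c in ALPHABET} for col in counts]
    return pwm


def consensus(pwm):
    """Most probable base at each motif column."""
    return "".join(max(col, key=col.get) for col in pwm)


if __name__ == "__main__":
    # Hypothetical data: 10 random 50-bp sequences, each with one planted
    # copy of ACGTTGCA.
    rng = random.Random(0)
    motif = "ACGTTGCA"
    seqs = []
    for _ in range(10):
        s = [rng.choice(ALPHABET) for _ in range(50)]
        pos = rng.randrange(0, 50 - len(motif) + 1)
        s[pos:pos + len(motif)] = motif
        seqs.append("".join(s))
    print(consensus(em_oops(seqs, len(motif), seed=motif)))
```

Every step uses only sequence statistics, so nothing in the fitted matrix says which protein, if any, binds the motif.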


MOTIF:
The MOTIF website allows users to search a query sequence against the TRANSFAC motif library using a chosen cut-off value, gap penalty, and taxonomic classification of the motifs to be searched (all, vertebrates, insects, plants, nematodes, etc.) (Heinemeyer, et al., 1999). The human GRM7 isoform a mRNA was searched against the TRANSFAC motif library using no gap penalty, the "all" classification, and four different cut-off values: 85 (the default), 90, 95, and 97. Table 1 (below) shows the number of motifs identified at each of these cut-off values (Heinemeyer, et al., 1999). Table 2 (below) gives the name, consensus sequence, and classification of each of the 11 motifs found at a cut-off value of 97, as well as the number of times each motif appears in the GRM7 mRNA sequence (Heinemeyer, et al., 1999).
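Table 1 shows the number of motifs falling as the cut-off rises. This page does not define how MOTIF computes the score that is compared with the cut-off, so the Python sketch below assumes one common convention: a weight-matrix score rescaled so that the worst possible site scores 0 and the best scores 100. The matrix, the sequence, and the function names are hypothetical, not TRANSFAC data.

```python
# Hypothetical 4-column weight matrix (not a TRANSFAC matrix).
TOY_PWM = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 5, "C": 0, "G": 5, "T": 0},
    {"A": 4, "C": 2, "G": 2, "T": 2},
]


def relative_score(pwm, site):
    """Site score rescaled so the worst possible site is 0 and the best 100."""
    raw = sum(col[base] for col, base in zip(pwm, site))
    lo = sum(min(col.values()) for col in pwm)
    hi = sum(max(col.values()) for col in pwm)
    return 100.0 * (raw - lo) / (hi - lo)


def hits(pwm, seq, cutoff):
    """Start positions whose relative score is at or above the cut-off."""
    w = len(pwm)
    return [i for i in range(len(seq) - w + 1)
            if relative_score(pwm, seq[i:i + w]) >= cutoff]


if __name__ == "__main__":
    seq = "AGAATAGACT"
    for cutoff in (85, 90, 95, 97):
        print(cutoff, hits(TOY_PWM, seq, cutoff))
```

Under any scheme of this kind, a site that passes a higher cut-off also passes every lower one, so raising the cut-off can shrink the hit list but never grow it, matching the downward trend in Table 1.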

Table 1

Table 1. Number of motifs found at differing cut-off values. The human GRM7 isoform a mRNA was searched against the TRANSFAC motif library on the MOTIF website using the “all” classification and no gap penalty (Heinemeyer, et al., 1999). The cut-off values used for this search and the number of different motifs identified in the mRNA sequence at each particular cut-off value are given in the table (Heinemeyer, et al., 1999).

Cut-off value    Number of motifs
85               136
90               72
95               24
97               11

Table 2. Motifs found at a cut-off value of 97. 11 different motifs were found when the human GRM7 isoform a mRNA was searched against the TRANSFAC motif library on the MOTIF website using the “all” classification, no gap penalty, and a cut-off value of 97 (Heinemeyer, et al., 1999). The name, consensus sequence, and classification of each of the different motifs, as well as the number of times that particular motif appears in the human GRM7 isoform a mRNA, are given in the table (Heinemeyer, et al., 1999).

Motif     Consensus       Occurrences   Classification
HSF       AGAAN           12            heat shock factor (Drosophila)
HSF       AGAAN           5             heat shock factor (yeast)
ADR1      NGGRGK          7             alcohol dehydrogenase gene regulator 1
NIT2      TATCTM          2             activator of nitrogen-regulated genes
SRY       AAACWAM         2             sex-determining region Y gene product
Nkx-2.5   TYAAGTG         2             homeo domain factor Nkx-2.5/Csx, tinman homolog
AML-1a    TGTGGT          2             runt-factor AML-1
CdxA      WWTWMTR         2             CdxA
CREB      NNGNTGACGYNN    1             cAMP-responsive element binding protein
AP-4      CWCAGCTGGN      1             activator protein 4
GATA-X    NGATAAGNMNN     1             GATA binding site

MEME:
The MEME (Multiple Em for Motif Elicitation) website finds motifs shared by one or more related query sequences (Bailey, et al., 1994). The user specifies the minimum and maximum motif widths, the maximum number of motifs to find, and the expected distribution of motif occurrences across the sequences (Bailey, et al., 1994). The human GRM7 isoform a mRNA was queried using a minimum motif width of 6, a maximum motif width of 50, a maximum of 10 motifs to be found, and an unspecified distribution of the number of repetitions across the sequences (Bailey, et al., 1994). Figures 1, 2, and 3 (below) show the three motifs with the lowest E-values (expectation values), and therefore the highest significance, of the ten motifs identified using MEME (Bailey, et al., 1994).
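For readers with a local installation of the MEME Suite, the web settings above can be expressed as options of the `meme` command-line program. The Python sketch below assumes `meme` is on the PATH and that the "unspecified distribution" setting corresponds to MEME's any-number-of-repetitions model (`-mod anr`); the FASTA file name and the `meme_command` helper are hypothetical. It only builds and prints the command.

```python
def meme_command(fasta_path, out_dir="meme_out"):
    """Build a MEME Suite command line mirroring the web settings above."""
    return [
        "meme", fasta_path,
        "-dna",            # DNA alphabet (mRNA records are written with T)
        "-oc", out_dir,    # output directory, overwritten if it exists
        "-mod", "anr",     # assumed: any number of repetitions per sequence
        "-nmotifs", "10",  # maximum number of motifs to report
        "-minw", "6",      # minimum motif width
        "-maxw", "50",     # maximum motif width
    ]


if __name__ == "__main__":
    # Hypothetical file name; pass the list to subprocess.run() to execute it.
    print(" ".join(meme_command("GRM7_isoform_a_mRNA.fasta")))
```

Keeping the arguments as a list, rather than one shell string, avoids quoting problems if the file path contains spaces.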










Figure 1. Motif 1 found by MEME. The human GRM7 isoform a mRNA was input into the MEME: Multiple Em for Motif Elicitation website, using a minimum motif width of 6, a maximum motif width of 50, a maximum of 10 motifs to be found, and an unspecified distribution of the number of repetitions across the sequences (Bailey, et al., 1994). The 38 bp sequence logo shown above represents the motif with the second-lowest E-value (1.6 × 10^3) of all the motifs returned (Bailey, et al., 1994).
In the sequence logo, the total height of each stack of letters represents the information content, in bits, at that position in the motif, while the height of each individual letter within a stack is the probability of that letter occurring at that position multiplied by the total information content of the stack (Bailey, et al., 1994). This motif is found four times in the GRM7 isoform a mRNA (Bailey, et al., 1994).
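The logo heights described above can be computed from the base counts at each position of a motif's aligned sites. The Python sketch below assumes a uniform background over A, C, G, and T, so a position's information content is 2 bits minus its Shannon entropy, and it applies no small-sample correction; MEME's own logos may therefore differ slightly. The counts in the example are hypothetical, chosen for a motif found four times.

```python
import math


def logo_heights(column_counts):
    """Information content and letter heights for each column of a DNA logo.

    column_counts: one {base: count} dict per motif position.
    Returns (information_content_bits, {base: letter_height}) per column,
    assuming a uniform background (maximum 2 bits per position).
    """
    columns = []
    for counts in column_counts:
        total = sum(counts.values())
        freqs = {b: n / total for b, n in counts.items() if n > 0}
        entropy = -sum(p * math.log2(p) for p in freqs.values())
        ic = 2.0 - entropy  # stack height in bits
        # Each letter's height is its frequency times the stack height.
        columns.append((ic, {b: p * ic for b, p in freqs.items()}))
    return columns


if __name__ == "__main__":
    # Hypothetical base counts at three positions of four aligned sites.
    sites = [{"A": 4}, {"A": 2, "G": 2}, {"A": 1, "C": 1, "G": 1, "T": 1}]
    for ic, heights in logo_heights(sites):
        print(round(ic, 2), {b: round(h, 2) for b, h in heights.items()})
```

A fully conserved position reaches the 2-bit maximum, a position split evenly between two bases gets 1 bit (0.5 bit per letter), and a position with all four bases equally often gets 0 bits.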











Figure 2. Motif 2 found by MEME. The human GRM7 isoform a mRNA was input into the MEME: Multiple Em for Motif Elicitation website, using a minimum motif width of 6, a maximum motif width of 50, a maximum of 10 motifs to be found, and an unspecified distribution of the number of repetitions across the sequences (Bailey, et al., 1994). The 23 bp sequence logo shown above represents the motif with the lowest E-value (2.5 × 10^2) of all the motifs returned (Bailey, et al., 1994).
In the sequence logo, the total height of each stack of letters represents the information content, in bits, at that position in the motif, while the height of each individual letter within a stack is the probability of that letter occurring at that position multiplied by the total information content of the stack (Bailey, et al., 1994). This motif is found five times in the GRM7 isoform a mRNA (Bailey, et al., 1994).











Figure 3. Motif 3 found by MEME. The human GRM7 isoform a mRNA was input into the MEME: Multiple Em for Motif Elicitation website, using a minimum motif width of 6, a maximum motif width of 50, a maximum of 10 motifs to be found, and an unspecified distribution of the number of repetitions across the sequences (Bailey, et al., 1994). The 29 bp sequence logo shown above represents the motif with the third-lowest E-value (1.3 × 10^5) of all the motifs returned (Bailey, et al., 1994). In the sequence logo, the total height of each stack of letters represents the information content, in bits, at that position in the motif, while the height of each individual letter within a stack is the probability of that letter occurring at that position multiplied by the total information content of the stack (Bailey, et al., 1994). This motif is found four times in the GRM7 isoform a mRNA (Bailey, et al., 1994).


References


Bailey, T. L., and Elkan, C. (1994). Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology. 28-36.

Heinemeyer, T., Chen, X., Karas, H., Kel, A. E., Kel, O. V., Liebich, I., Meinhardt, T., Reuter, I., Schacherer, F., and Wingender, E. (1999). Expanding the TRANSFAC database towards an expert system of regulatory molecular mechanisms. Nucleic Acids Res. 27:318-322.

Jennifer Wagner
wagner4@wisc.edu
Updated February 28, 2009
http://www.gen677.weebly.com