ENRICHMENT OF OLIGONUCLEOTIDE SETS WITH TRANSCRIPTION CONTROL SIGNALS. 2. MAMMALIAN DNA
VOLINIA, Stefano; SCAPOLI, Chiara; GAMBARI, Roberto; BARRAI, Italo Enrico
1992
Abstract
We studied the frequency distribution of oligonucleotides 10 bp long (decamers) in a 1.6 Mb sample of mammalian genes, comprising 579 sequences from GenBank(R) 55.0, with the aim of detecting transcription control signals. A total of 2216 decamers occurred more than 10 times as often as the mean and were subjected to further statistical analysis. For each of these 2216 decamers (parents), we counted the individual frequencies of the 30 decamers that differ from the parent by a single-base substitution (10 positions x 3 alternative bases; the progeny), and then calculated two variance/mean chi-square statistics for the progeny, one including the parent and one excluding it. We then studied the distribution of the ratio between the two chi squares. Of the 2216 decamers, 346 had a chi-square ratio of 1.9 or larger. Within this final set, which corresponds to less than 0.033 per cent of all 4^10 (1,048,576) possible decamers, 18 decamers were found to contain 23 eukaryotic transcription control elements 5-10 bp in length, such as the Sp1 binding site. Furthermore, compared with 210 random sets of 346 decamers each, this set contains a highly significant excess of the longer signals.
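The selection procedure summarized in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' program. It assumes overlapping decamer counts on one strand only, a mean taken over all 4^10 possible decamers, the index-of-dispersion statistic sum((x - mean)^2) / mean as the "variance/mean chi square", and a ratio defined as the with-parent chi square divided by the without-parent one. All function names (`count_kmers`, `one_mismatch_neighbours`, `dispersion_chi2`, `chi2_ratio`, `select_signals`) are hypothetical.

```python
from collections import Counter

BASES = "ACGT"
VALID = set(BASES)


def count_kmers(seqs, k=10):
    """Count overlapping k-mers on the given strand (assumption: no reverse
    complement); windows containing non-ACGT characters (e.g. N) are skipped."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if set(word) <= VALID:
                counts[word] += 1
    return counts


def one_mismatch_neighbours(word):
    """All words differing from `word` by exactly one base substitution:
    len(word) positions x 3 alternative bases (30 for a decamer)."""
    return [word[:i] + alt + word[i + 1:]
            for i, base in enumerate(word)
            for alt in BASES if alt != base]


def dispersion_chi2(values):
    """Variance/mean (index-of-dispersion) chi square: sum((x - mean)^2) / mean."""
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0
    return sum((x - mean) ** 2 for x in values) / mean


def chi2_ratio(parent, counts):
    """Assumed ratio: chi square of progeny plus parent over chi square of progeny alone."""
    # Counter returns 0 for progeny words never observed in the sample.
    progeny = [counts[w] for w in one_mismatch_neighbours(parent)]
    without_parent = dispersion_chi2(progeny)
    with_parent = dispersion_chi2(progeny + [counts[parent]])
    if without_parent == 0:
        # All progeny counts are equal: any parent excess makes the ratio unbounded.
        return float("inf") if with_parent > 0 else 1.0
    return with_parent / without_parent


def select_signals(seqs, k=10, freq_factor=10.0, ratio_cutoff=1.9):
    """Return {parent: ratio} for over-represented k-mers whose ratio passes the cutoff."""
    counts = count_kmers(seqs, k)
    mean = sum(counts.values()) / 4 ** k  # assumption: mean over all 4^k possible words
    parents = [w for w, c in counts.items() if c > freq_factor * mean]
    ratios = {p: chi2_ratio(p, counts) for p in parents}
    return {p: r for p, r in ratios.items() if r >= ratio_cutoff}
```

On a toy input such as `select_signals(["GGGGGAAAAA"] * 20 + ["TGGGGAAAAA"])`, both decamers pass the frequency threshold. Only `GGGGGAAAAA` passes the ratio cutoff (ratio about 19.69), while its one-mismatch neighbour `TGGGGAAAAA` scores about 0.98. A parent that stands far above its progeny inflates the dispersion, and a word that merely sits next to such a peak does not. As a check on the figure quoted above, 346 / 4**10 is about 0.00033, i.e. just under 0.033 per cent.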