Over an organism’s lifetime, its genome changes very little. What does change, constantly, is which proteins the cell produces in response to damage, changes in the environment, or stages in the reproductive cycle. Protein production is regulated by DNA-binding proteins that have evolved the ability to turn different genes on or off. Because the environment can change quickly, rapid adaptation is key. The DNA-binding proteins must find the correct DNA code among millions of base pairs, and do so fast. When DNA-binding proteins search the genetic code for their target sequence, they slide along the DNA helix to speed up the process. When they finally find the right spot, they stay there; the interaction with the “correct” sequence prevents them from sliding any further. This mechanism has been widely accepted as the description of the search process.
It is an appealing hypothesis, but it presents an annoying problem: the DNA code is full of “almost correct” sequences. If the time a protein resides on a particular DNA motif were determined by the sequence, the searching proteins would constantly linger on sequences that resemble their target. In other words, if the textbook explanation were correct, the DNA-binding proteins would get stuck off target all the time. Gene regulation would be very ineffective, but we know from previous studies that this is not the case. A bacterial protein called LacI, which controls enzymes of lactose metabolism, finds its target sequence among 4.6 million base pairs in a matter of minutes. In an attempt to resolve this paradox, researchers allowed LacI to slide back and forth on thousands of different DNA sequences mounted on a microchip.
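To get a feel for how common “almost correct” sequences can be, here is a rough back-of-the-envelope sketch in Python. It is not taken from the study: it assumes a completely random genome with all four bases equally likely, looks at one strand only, and uses a hypothetical 10-letter target motif. The function name and the motif length are illustrative choices; only the genome size of 4.6 million base pairs comes from the text above.

```python
from math import comb


def expected_near_matches(genome_length: int, motif_length: int, max_mismatches: int) -> float:
    """Expected number of windows in a uniformly random sequence that differ
    from a fixed motif at no more than `max_mismatches` positions."""
    # Count the distinct sequences within the allowed number of mismatches:
    # choose which k positions differ, each with 3 alternative bases.
    variants = sum(comb(motif_length, k) * 3**k for k in range(max_mismatches + 1))
    # Probability that one random window is one of those sequences.
    p_window = variants / 4**motif_length
    # Number of windows of that length on a single strand.
    windows = genome_length - motif_length + 1
    return windows * p_window


GENOME_LENGTH = 4_600_000  # base pairs, as quoted in the article
MOTIF_LENGTH = 10          # hypothetical motif length, for illustration only

for mismatches in range(4):
    count = expected_near_matches(GENOME_LENGTH, MOTIF_LENGTH, mismatches)
    print(f"up to {mismatches} mismatches: about {count:.0f} sites expected by chance")
```

Under these simplifying assumptions, the number of chance look-alikes grows steeply with every mismatch that is tolerated, which is the heart of the paradox. Real genomes are not random and real binding motifs differ in length, so the numbers should be read as an illustration of scale only.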
A fluorescent molecule attached to the LacI protein made it possible to measure how fast LacI bound to the different sequences and how quickly it was released. The result was striking: contradicting previous assumptions, the DNA sequence had little effect on how long LacI remained bound to the DNA. However, the sliding LacI was much more likely to be held up briefly when the sequence was similar to the target sequence. In other words, DNA-binding proteins often leave even the sequence they are intended to regulate, but at the target site they almost always make only a very short journey before finding their way back again. But this is not the only wonder of the magical antenna called DNA. A newly discovered code within DNA (dubbed “spatial grammar”) holds a key to understanding how gene activity is encoded in the human genome.
This breakthrough finding, made by researchers at Washington State University and the University of California San Diego, revealed a long-postulated hidden spatial grammar embedded in DNA. The research could reshape scientists’ understanding of gene regulation and of how genetic variations may influence gene expression in development or disease. Transcription factors (TFs), the proteins that control how genes in the genome are turned on or off, play a crucial role in this code. Each cell of an organism interprets the same genome in a unique way. At the heart of this process are sequence-specific transcription factors. How these regulatory programs are encoded is still largely enigmatic. Many regulatory elements contain sequence motifs for similar sets of TFs, and most TFs display widespread binding to regulatory sequences, often with variable and sometimes minimal consequences for gene regulation.
Consequently, scientists are largely unable to predict gene expression patterns from DNA sequence alone, and it is unclear how the transcription of most human genes is regulated. Previous studies have shown that the spacing, orientation, copy number and affinity of TF-binding sites can influence transcriptional output. However, few generalizable rules exist for how transcription factor binding sites construct gene regulatory programs, which restricts the ability to rationally interpret our genome or to understand how mutations in regulatory sequences affect gene regulation or manifest in disease. Transcription factors have long been thought of as either activators or repressors of gene activity, but this research shows that their function is far more complex.
Contrary to what one may find written in biochemistry textbooks, transcription factors that act as true activators or repressors are surprisingly rare. Rather, the scientists found that most activators can also function as repressors. If one removes an activator, the expectation is that activation is lost, but that was true in only 50% to 60% of cases, so it was obvious that something was off. Looking closer, the researchers found that the function of many transcription factors was highly position-dependent. They discovered that the spacing between transcription factors and their position relative to where a gene’s transcription begins determined the level of gene activity. For example, transcription factors might activate gene expression when positioned upstream of, or ahead of, the point where a gene’s transcription begins, but inhibit it when located downstream of, or after, the transcription start site.
Long story short, it is the spacing, or ‘ambience,’ that determines whether a given transcription factor acts as an activator or a repressor. It just goes to show that, as with learning a new language, to learn how gene expression patterns are encoded in our genome we need to understand both its words and its grammar.
- Edited by Dr. Gianfrancesco Cormaci, PhD, specialist in Clinical Biochemistry.
Scientific references
Duttke SH et al. Nature 2024; 631:891-898.
Marklund E et al. Science 2022; 375(6579):442.
Zeitlinger J. Curr Opin Syst Biol. 2020; 2:22–31.
De Boer CG et al. Nat Biotech. 2020; 38(1):56.