chemREPEAT by Blazon Publishing and Media Ltd

Probing the structure of proteins with low sequence complexity

The same amino acids may be repeated multiple times within a certain region of a protein, and it’s difficult to characterise these regions with traditional structural biology techniques. We spoke to Dr Pau Bernadó about his work in developing new strategies which will help researchers analyse the structure and dynamics of low complexity regions in proteins.

A protein is commonly thought of as a vibrantly coloured, rigid, 3-dimensional structure, yet not all proteins share these characteristics, and some in fact are not structured. A second common preconception, namely that the 20 amino acids that constitute proteins are evenly distributed amongst sequences, is also incorrect, as Dr Pau Bernadó explains. “There are sequences in which 1, 2, 3 or 4 amino acids are repeated within the same sequence, it’s what we call a low complexity sequence,” he outlines. As a chemist and structural biologist, Dr Bernadó has spent a lot of time analysing disordered proteins, those which don’t have a 3-dimensional structure. “This means that you cannot apply the classical, traditional methods to investigate these systems,” he continues. “You have to adapt your methods to reflect the fact that the protein is not rigid and that the amino acid sequence is highly repetitive.”
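To make “low complexity” concrete, one common way to score it is the Shannon entropy of the residue composition in a stretch of sequence: a run of a single amino acid scores 0 bits, while a stretch using all 20 amino acids equally scores log2(20), about 4.32 bits. The sketch below assumes that scoring scheme for illustration only; the function names and the default window size are hypothetical choices, and this is not presented as the method used in chemREPEAT.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy, in bits, of the amino acid composition of seq."""
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)
    counts = Counter(seq)
    # Sum -p*log2(p) over the residue types present; abs() turns -0.0 into 0.0.
    return abs(-sum((c / n) * math.log2(c / n) for c in counts.values()))


def window_entropies(seq: str, window: int = 12) -> list[float]:
    """Entropy of every window of the given length (hypothetical default: 12 residues)."""
    if window <= 0 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    return [shannon_entropy(seq[i:i + window]) for i in range(len(seq) - window + 1)]


if __name__ == "__main__":
    print(shannon_entropy("Q" * 20))  # 0.0: 20 consecutive glutamines
    print(shannon_entropy("GS" * 10))  # 1.0: two alternating residues
    print(round(shannon_entropy("ACDEFGHIKLMNPQRSTVWY"), 2))  # 4.32: all 20, once each
```

A low-complexity region then shows up as a run of windows whose entropy falls well below that of the surrounding sequence; where to place the cut-off is a judgement call this sketch leaves open.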

Low-complexity proteins

This is a topic central to Dr Bernadó’s work as the Principal Investigator of the chemREPEAT project, an initiative based at France’s National Institute of Health and Medical Research (INSERM) in which researchers are investigating the structure and dynamics of low complexity regions (LCRs) in proteins. These LCRs are normally parts of proteins that have no 3-dimensional structure, with an amino acid composition that differs from that of a globular, rigid protein. “If you look at the distribution of amino acids in a globular protein, you will see that it’s more or less standard, it reflects the proteome in general,” says Dr Bernadó. An LCR, by contrast, has a very different amino acid composition. “You may have a glutamine or a glycine which is repeated 20 times consecutively,” explains Dr Bernadó. “This is, essentially, the lowest level of complexity that we can find.” Proteins lacking a rigid structure are very difficult to characterise, as many conformations co-exist at the same time, so researchers typically measure average properties of all those conformations. Methods have been developed to characterise disordered proteins that are not LCRs. “If you have a disordered protein that has a normal (unbiased) sequence, it can be disordered and not be low-complexity,” says Dr Bernadó. In such cases, a technique called Nuclear Magnetic Resonance (NMR) spectroscopy can be applied, together with other methods and tools, to investigate the shape and peculiarities of specific proteins. “For example, glycine-67 and glycine-85 have different isolated peaks in the NMR spectrum,” explains Dr Bernadó. “The fact that you can identify these peaks as glycine-67 and glycine-85 is because their chemical environments are different.” This could mean, for instance, that there is an alanine before glycine-67, while there might be a proline before glycine-85. The presence of these different neighbouring amino acids changes the local chemical environment, which enables researchers to distinguish between glycine-67 and glycine-85. “They are different because their chemical environment is different, because the sequence is different,” says Dr Bernadó.

EU Research

A lot of Dr Bernadó’s attention is devoted to analysing a specific type of LCR called homo-repeats, in particular in Huntingtin (Htt), a protein which is associated with Huntington’s disease. “There are homo-repeats of all kinds of amino acids. One of the most common homo-repeats in eukaryotes is polyglutamine,” he outlines. “In cases where you have 30 consecutive glutamines, it’s extremely difficult to distinguish between glutamine-15 and glutamine-25, for example.” The project aims to help overcome these limitations by incorporating labelled amino acids at chosen positions within these homo-repeats, from which more can then be learnt about their structure and dynamics. Normally, when a protein is produced for NMR analysis, E. coli bacteria are fed isotopically labelled nitrogen-15 (¹⁵N) and carbon-13 (¹³C). “The bacteria eat this carbon and nitrogen and produce proteins that are fully labelled in ¹⁵N and ¹³C, which are NMR sensitive,” explains Dr Bernadó. This approach is not effective with Htt, however, as all the glutamines would be labelled in the same way and would therefore be difficult to tell apart. Instead, Dr Bernadó is developing a method called site-specific isotopic labelling (SSIL). “With SSIL, we essentially trick the system. We want to put the carbon and nitrogen isotopes in the places that we want,” he says.
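Why a homo-repeat is so hard to resolve can be sketched in a few lines: find runs of a single residue, then compare the immediate neighbours of each position in the run. Here the ±2 neighbouring residues serve as a crude, hypothetical stand-in for the “chemical environment” Dr Bernadó describes; real chemical shifts also depend on conformation and on more distant residues. The function names and the toy sequence are made up for illustration, and the toy sequence is not the real Htt sequence.

```python
import re


def find_homorepeats(seq: str, min_len: int = 5) -> list[tuple[str, int, int]]:
    """Return (residue, start, end) for each run of one residue at least min_len long.

    start is 0-based and end is exclusive, like Python slices.
    """
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    # ([A-Z]) captures one residue; \1{n,} requires it to repeat at least n more times.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), m.end()) for m in pattern.finditer(seq)]


def local_contexts(seq: str, start: int, end: int, k: int = 2) -> list[str]:
    """Neighbourhood of +/- k residues around each position in seq[start:end]."""
    return [seq[max(0, i - k): i + k + 1] for i in range(start, end)]


if __name__ == "__main__":
    toy = "MATLEK" + "Q" * 16 + "PPPPP" + "GSAG"  # hypothetical toy sequence
    repeats = find_homorepeats(toy)
    print(repeats)  # [('Q', 6, 22), ('P', 22, 27)]
    _, start, end = repeats[0]
    contexts = local_contexts(toy, start, end)
    print(len(contexts), "glutamines,", len(set(contexts)), "distinct neighbourhoods")
    # 16 glutamines, 5 distinct neighbourhoods
```

Only the glutamines at the edges of the repeat have distinct neighbours; the twelve in the middle all see "QQQQQ". This mirrors why, with uniform labelling, positions such as glutamine-15 and glutamine-25 are hard to tell apart.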

tRNA suppression

A technique called tRNA suppression plays an important role in this respect. Three consecutive bases of mRNA, referred to as a codon, code for an amino acid. As the ribosome reads the mRNA, it links the encoded amino acids together to synthesise the protein. “There are specific codons for each amino acid, so the ribosome knows that if it encounters CAG in the mRNA, then it corresponds to a glutamine, for instance. A different combination of three bases corresponds to a serine,” outlines Dr Bernadó. “The link between the mRNA and the amino acids is made by a small molecule called tRNA. The tRNA comes with an amino acid attached on one side, and a sequence that can recognise the codon on the other. In a way, the ribosome simultaneously binds the mRNA and the appropriate tRNA to keep building the protein with the right sequence. Certain sequences of three bases do not code for any amino acid, however; these sequences are called stop codons. When the ribosome reaches and recognises a stop codon, it simply stops building and the protein is delivered.”
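The decoding process described above, reading three bases at a time and halting at a stop codon, can be sketched with the standard genetic code. This is a simplified model: it assumes the mRNA string begins exactly at the first codon of the coding sequence, and the name `translate` is a hypothetical choice rather than any particular library's API.

```python
# Standard genetic code: codons enumerated in U, C, A, G order at each base position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # "*" marks a stop codon
STOP = "*"


def translate(mrna: str) -> str:
    """Translate an mRNA coding sequence codon by codon until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == STOP:  # UAA, UAG or UGA: chain is released
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(CODON_TABLE["CAG"])  # Q (glutamine)
    print(translate("AUGCAGCAGUCUUAGGGC"))  # MQQS: stops at UAG, GGC is never read
```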

www.euresearcher.com

NMR investigation of Htt with 16 consecutive glutamines. (Top) Overlay of the ¹³C- and ¹⁵N-NMR spectra of all SSIL samples produced. Different colours indicate samples with individually labelled glutamines. This strategy enables unambiguous assignment of the spectra. (Middle and bottom panels) Analysis of the peak positions demonstrates that several helical conformations of different lengths co-exist in solution.
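The series of SSIL samples in the caption, one sample per labelled glutamine, rests on the stop-codon trick explained in this section. The sketch below models it under stated assumptions: that the reassigned stop codon is UAG (the article does not say which stop codon is used), that the suppressor tRNA delivers an isotopically labelled glutamine (written here as a lowercase `q`), that the gene terminates with a different stop codon (UAA), and that suppression is complete. The function names and the toy mRNA are hypothetical.

```python
# Standard genetic code (U, C, A, G order at each base position).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

SUPPRESSED_STOP = "UAG"  # assumption: the stop codon read by the suppressor tRNA
LABELLED_GLN = "q"  # marker for an isotopically labelled glutamine
GLN_CODONS = {"CAA", "CAG"}


def glutamine_positions(mrna: str) -> list[int]:
    """Codon indices (0-based) that encode glutamine."""
    return [k for k in range(len(mrna) // 3) if mrna[3 * k:3 * k + 3] in GLN_CODONS]


def ssil_constructs(mrna: str) -> dict[int, str]:
    """One construct per glutamine, with that single codon swapped for the suppressed stop."""
    constructs = {}
    for k in glutamine_positions(mrna):
        constructs[k] = mrna[:3 * k] + SUPPRESSED_STOP + mrna[3 * k + 3:]
    return constructs


def translate_with_suppression(mrna: str) -> str:
    """Translate, reading SUPPRESSED_STOP as a labelled glutamine instead of stopping."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon == SUPPRESSED_STOP:
            protein.append(LABELLED_GLN)  # suppressor tRNA inserts labelled Gln
            continue
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # any other stop codon ends the chain
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = "AUG" + "CAG" * 4 + "CCA" + "UAA"  # toy gene: M, four Gln, P, stop
    for position, construct in ssil_constructs(mrna).items():
        print(position, translate_with_suppression(construct))
    # 1 MqQQQP
    # 2 MQqQQP
    # 3 MQQqQP
    # 4 MQQQqP
```

Each construct yields the same protein sequence with a label at a different, known glutamine, which is what makes every peak assignable. In a real system, suppression competes with chain termination at the reassigned codon, so this all-or-nothing model is an idealisation.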

The trick here involves synthesising, in vitro, a tRNA that recognises these stop codons and carries an attached amino acid. “We make use of a stop codon to synthesise the protein. It’s a tool to introduce whatever we want, wherever we want,” explains Dr Bernadó. This tRNA suppression method effectively represents a means of hacking the system, he says. “This methodology allows us to bring new chemistry into proteins. Some people introduce amino acids that are not natural, which are called non-canonical amino acids. The novel aspect of our research is that we introduce natural amino acids that are isotopically labelled, which doesn’t count as a chemical modification,” he explains. This approach allows researchers to highlight specific locations within a

Amino acid context of polyQ regions. a) Leucine and b) proline abundance around human polyQ regions. Dashed red lines refer to the background composition of the amino acid. These results suggest that the features found in Htt are very common in human glutamine-rich proteins.