A new way of decoding proteins, the machinery of the cell
The following is based on the article 'In vivo mRNA display enables large-scale proteomics by next generation sequencing' by P. Oikonomou, R. Salatino and S. Tavazoie, first published in the Proceedings of the National Academy of Sciences of the United States of America on October 9 2020. Links to that article and other relevant reads can be found in the 'Read more' section below this post.
Cells, the tiny little compartments that make up you, trees, fungi, and even presidential candidates, rely on even tinier pieces of cellular machinery called proteins. Understanding the proteins in cells, be they our cells, the cells of wheat crops, or the cells of Joe Biden, is thus essential to understanding how a cell functions. Identifying which cells express which proteins (which varies not only between species, but between tissues within a species) is an ongoing area of research in biology. The collection of all of the proteins expressed in a cell at a certain time is called the proteome, and the large-scale study of proteins and their interactions, called proteomics, relies on being able to identify these proteins and work out what they are made of.
So where’s the problem? We can sequence DNA faster and more accurately than ever, so why can’t we do the same with proteins? This is perhaps best understood by thinking about shapes. DNA molecules are long double helices that we can break into smaller double helices, which we can then unravel into straight lines and ‘read’. This is, biochemistry aside, how we sequence DNA. Proteins are…a mess, comparatively. They sprawl, they bunch up, and they’re oftentimes huge and unwieldy. There’s also the matter of ingredients. A letter in the code of DNA can be one of four different bases. We have tests based on this and, all things considered, the four-letter code is pretty simple.* Unfortunately, proteins are not made of a sequence of nucleic acid bases. They are made of a sequence of amino acids, and there are twenty standard amino acids, all of which have shapes and chemical characteristics much more varied than the four bases of DNA. Tricky. This is not to say that decoding proteins cannot be done. It is simply often very expensive compared to DNA sequencing methods, and there isn’t a general standard protocol that can be successfully used to decode all proteins. At least not until, perhaps, now.
mRNA display
The new technique is referred to in a study by Oikonomou, Salatino and Tavazoie as ‘In vivo mRNA display’, but it’s worthwhile breaking down those terms a bit. First, in vivo simply means in a living organism (here, in a living cell). The distinction matters because performing this technique outside of a cell, in a lab setting (in vitro), was already established. Working in vitro is generally much easier, but also much less useful. You get to understand the constituent parts of the protein, but you don’t get to understand where the protein fits in the cell, or what its function is. It’s like an alien landing on Earth and finding an ironing board on the side of the road. If they tinker around with it they can work out all its parts and how it’s made, but it’s less obvious how it’s used inside the home. The harder question is, what is mRNA display?
mRNA display is a technique based on our understanding of the central dogma of molecular biology. This dogma states that DNA is transcribed into RNA, and RNA is translated into protein. Not all RNA gets translated into protein, but all proteins come from translated RNA. The type of RNA that gets translated into protein is called mRNA (short for messenger RNA). Importantly for us, mRNA isn’t transformed into protein. It’s simply read by a cellular machine called a ribosome (made of protein and another type of RNA), which recruits amino acids from around the cell and combines them into a protein. What mRNA display does is attach the mRNA to the protein translated from it, and then purify the protein, so we are left with a lot of our target protein and its mRNA template. Using a technique called reverse transcription, carried out by an enzyme called reverse transcriptase (which I’ll go over in a separate article), we are able to convert that mRNA into DNA, and then sequence that DNA, from which we can infer the sequence of the protein using the genetic code.
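To make that last step concrete, here is a minimal Python sketch of how a protein sequence can be inferred from an mRNA template once it has been copied into DNA. It is a toy illustration rather than the pipeline used in the study: the function names and the example sequence are made up for this post, and it assumes the standard genetic code and a read that covers the whole coding region.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

RNA_TO_COMPLEMENT_DNA = str.maketrans("AUGC", "TACG")
DNA_COMPLEMENT = str.maketrans("ATGC", "TACG")


def reverse_transcribe(mrna):
    """Return the complementary DNA (cDNA) strand, written 5' to 3'."""
    return mrna.upper().translate(RNA_TO_COMPLEMENT_DNA)[::-1]


def coding_strand(cdna):
    """Return the reverse complement of the cDNA, i.e. the coding-sense DNA."""
    return cdna.upper().translate(DNA_COMPLEMENT)[::-1]


def translate(coding_dna):
    """Translate from the first ATG until a stop codon or the end of the read."""
    start = coding_dna.find("ATG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(coding_dna) - 2, 3):
        aa = CODON_TABLE.get(coding_dna[i:i + 3], "X")  # 'X' = unknown codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def protein_from_mrna(mrna):
    """Hypothetical helper chaining the three steps above."""
    return translate(coding_strand(reverse_transcribe(mrna)))


if __name__ == "__main__":
    # Made-up mRNA: AUG (Met), GCU (Ala), UGG (Trp), UAA (stop)
    print(protein_from_mrna("AUGGCUUGGUAA"))  # prints MAW
```

Note that the protein itself is never read directly here. Everything is inferred from the nucleic acid, which is exactly why keeping the mRNA attached to its protein is the crucial trick.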
Until recently, the technique only worked in vitro, with scientists making artificial bonds between proteins and mRNA in test tubes. So how was this done in the cell?
A perfect match: The MCP protein
The answer lay, as it so often does, in stealing work from others. Specifically, stealing work from bacteria and the viruses that infect them, who don’t have any intellectual property and so can be stolen from indefinitely. Their proteins are the basis of almost all our great triumphs in gene technology of the last 50 years: CRISPR, gene cloning, the list goes on. The protein we are appropriating this time is called MS2 bacteriophage coat protein (MCP), and it comes from a virus that infects bacteria. This protein has a unique property that has been very helpful in recent research: it binds tightly to a particular section of the virus’s own RNA, a short hairpin-shaped structure called a stem loop.
In vivo mRNA display relies on this binding to work its magic. The following work was done in budding yeast, a single-celled organism. First, the sequence for the MCP stem loop was artificially inserted into the genome of the yeast, just after the gene encoding the target protein. Then, MCP was attached to the end of the target protein. This led to the mRNA, carrying with it the MCP stem loop, binding to the MCP attached to the target protein. The protein was then isolated and purified, and the mRNA that came along with it was analysed. The protein sequence was then read from the DNA sequence (the gene).
Why?
The previously most reliable way to get an outcome of this kind, a technique called mass spectrometry, is estimated to cost ten times as much as in vivo mRNA display. In vivo mRNA display is therefore a powerful and cost-effective tool for sequencing proteins by first sequencing their corresponding genes. Powerful, but not perfect. The experiment showed that MCP did not readily bind to the N terminus of all proteins targeted in the study, so further work is needed to find a way of making this work for all (or at least, more) proteins. Even so, it’s an effective technique and may prove to be invaluable in the future of proteomics.
I hope you learnt something new!
Thanks for reading,
Jack
Read more:
https://www.pnas.org/content/early/2020/10/08/2002650117 - original article
Article cover image sourced from protein database at https://www.rcsb.org/structure/2MS2, authors K Valegard, L Liljas
*So simple, in fact, that it took forever to convince anybody it could hold enough information to encode living organisms.