Microbial Forensics: Identifying Bacteria and Yeast Using Ribosomal DNA Fingerprints
By Hannah Chang ’22, along with Cassandra K. Hung, Hyunkyung K. Lee, Katherine W. Li, Alejandro J. Lucena, C. Zora Mardjoko, Rositsa Tsarnakova, Jason Wang, Lisa Wang, Claire Zheng
Note from the Editors
This paper appears in the Sigma Journal with permission, courtesy of the PA Governor’s School for the Sciences 2021 Journal.
Abstract
Microorganism identification, specifically using the 16S rRNA gene shared by bacteria, is applicable in many areas of our lives. Existing identification methods are subject to various limitations. Our project focused on developing a computational tool that could aid in the identification of bacteria by analyzing their 16S rRNA gene sequence via PCR and restriction enzyme digests. Biopython was used to load data from the Ribosomal Database Project (RDP) for use in our comparison program. A possibility reducer algorithm identified matching bacterial sequences between the dataset and the test set based on their unique ribosomal DNA restriction fragment lengths. Our match results indicated that the program functions, although some bacteria had no fragment matches. The output data were manually validated to find false positive and false negative fragments that helped explain the results.
Introduction
Background
Microorganisms are found in nearly every setting, and identifying them makes it possible to analyze their function, structure, and uses. Identification plays a crucial part in many fields, including food contamination, food production, environmental cleanup, ecosystem biodiversity, and disease control and prevention.
In recent years, Salmonella has been found as a contaminant of a number of supermarket foods, including salad, ground turkey, and other frozen poultry [1]. Being able to quickly and accurately identify disease-causing agents like Salmonella is imperative to treating and monitoring the spread of such pathogens. Conversely, some microbes are critical to maintaining life. They modulate energy flow within ecosystems by acting as decomposers and are responsible for almost half the photosynthesis that occurs on Earth [6]. Microbes are also important in the production of food products such as cheeses, yogurts, and breads; for instance, modern-day yogurt production involves culturing the milk with live bacteria, including Streptococcus thermophilus and Lactobacillus bulgaricus, which produce lactic acid to thicken the yogurt [7]. However, understanding these phenomena would not have been possible without first knowing the identity of the microorganism being worked with. Whether it is studying infectious diseases, producing foods, or engineering sustainable technology, being able to identify unknown microorganisms is important for a number of scientific applications.
Current methods for identification include denaturing gradient gel electrophoresis (DGGE), fluorescence in situ hybridization (FISH), clone libraries, full genome sequencing, amplified ribosomal DNA restriction analysis (ARDRA), and terminal restriction fragment length polymorphism (T-RFLP). These techniques employ a variety of procedures and often differ in efficacy depending on the microorganism being analyzed. For example, DGGE separates DNA fragments according to their melting behavior, while clone libraries require additional phylogenetic comparison between the unknown organism and available data. Full genome sequencing is another option, but it is comparatively slow. One of the more widely available procedures, ARDRA, identifies organisms by creating a DNA fingerprint of their rRNA genes [5]. Although the laboratory procedure for ARDRA is relatively quick, inexpensive, simple to use, and available in most labs, the analysis is time-intensive because it relies on manually matching DNA fingerprints to a database. Because organism identification is often tedious, a simpler and more straightforward approach to identifying unknown microorganisms is needed.
16S rRNA Gene
A 70S prokaryotic ribosome is composed of a large 50S subunit and a small 30S subunit; the 30S subunit in turn consists of 21 proteins and a 16S rRNA molecule [7]. The associated 16S rRNA gene is of particular interest in studying bacterial taxonomy because it is highly conserved both structurally and functionally across prokaryotic organisms; however, there is still variation within 16S rRNA gene sequences that can be used as markers to differentiate between species [3]. Furthermore, the 16S rRNA gene is approximately 1,500 base pairs long, which is large enough to be used for research in computational informatics [2]. The entire prokaryotic ribosome, together with a diagram of the 16S rRNA gene illustrating conserved and variable regions, is shown in Figure 1.
Figure 1: Prokaryotic Ribosome and the 16S rRNA Gene
Analysis of 16S Gene DNA Using Restriction Fragment Length Polymorphism (RFLP)
Restriction fragment length polymorphism (RFLP) is a technique that allows for genetic fingerprinting. In RFLP analysis, a DNA sequence is first digested into fragments using one or more restriction enzymes. Next, these fragments are separated by length through agarose gel electrophoresis. Because a full analysis can take up to one month and requires a large amount of DNA in the original sample, RFLP is less widely used now [18]. PCR has nonetheless been used in conjunction with RFLP in cleaved amplified polymorphic sequence (CAPS) assays. This technique is more efficient than traditional RFLP, since PCR can amplify a small amount of DNA to levels sufficient for RFLP in two to three hours, so samples can be analyzed in less time. Regardless, RFLP analysis acted as a stepping stone for the development of current techniques like ARDRA and T-RFLP.
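The digest-then-separate idea behind RFLP can be illustrated with a short sketch. This is a minimal illustration in plain Python, not the program described in this paper: the toy sequence is invented, the function names are hypothetical, and only the top strand of a linear molecule is considered. EcoRI (recognition site GAATTC, cutting after the first G) is used as a familiar example enzyme.

```python
import re

# Recognition site and cut offset (bases from the start of the site on the
# top strand). EcoRI cuts G^AATTC, so its offset is 1.
ENZYMES = {"EcoRI": ("GAATTC", 1)}


def digest(sequence, site, cut_offset):
    """Return the fragment lengths, 5' to 3', from cutting a linear sequence."""
    sequence = sequence.upper()
    # A lookahead pattern finds every occurrence, including overlapping ones.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def gel_order(lengths):
    """List fragments longest first; shorter fragments migrate farther on a gel."""
    return sorted(lengths, reverse=True)


# Invented toy sequence containing two EcoRI sites.
toy = "AAAA" + "GAATTC" + "CCCCCC" + "GAATTC" + "TT"
fragments = digest(toy, *ENZYMES["EcoRI"])
print(fragments)             # fragment lengths in sequence order
print(gel_order(fragments))  # the same lengths as they would order on a gel
```

The list of fragment lengths is the "fingerprint": two sequences whose restriction sites fall at different positions yield different lists.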
Current Methods of Bacteria Identification Using Ribosomal DNA: Terminal Restriction Fragment Length Polymorphism (T-RFLP)
Terminal restriction fragment length polymorphism utilizes both PCR and RFLP analysis. The primers used in T-RFLP are labeled with a fluorescent dye, most commonly 6-carboxyfluorescein (6-FAM). After PCR amplification, the amplicons are cut by restriction enzymes. A capillary electrophoresis machine then separates the resulting fragments, and a fluorescence detector determines their sizes. The main advantage of T-RFLP is that the first (terminal) fragment in a sequence can be identified, because the fluorescent label is attached to the 5’ end of the primer. T-RFLP also offers more potential for statistical analysis than other techniques because fragments are grouped into operational taxonomic units (OTUs). Although T-RFLP is relatively fast, the capillary electrophoresis machine it requires is expensive, making it a costly option compared to other methods. On a broader scale, ARDRA and T-RFLP are essentially the same process; the difference is that T-RFLP uses labeled primers while ARDRA does not. T-RFLP also uses a fluorescence detector to detect the labeled primers, which contributes to both its increased efficiency and its expense.
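The difference between the two readouts can be made concrete with a small sketch. In T-RFLP only the labeled 5’ fragment is detected, so each amplicon contributes a single length per enzyme: the distance from its 5’ end to the first cut. The code below is an illustrative sketch only; the organisms and sequences are invented, the function name is hypothetical, and HaeIII (GGCC, cutting between GG and CC) is used as an example enzyme.

```python
def terminal_fragment_length(amplicon, site, cut_offset):
    """Length of the labeled 5' fragment: start of the amplicon to the first cut."""
    first = amplicon.upper().find(site)
    if first == -1:
        return len(amplicon)  # no site: the amplicon stays full length
    return first + cut_offset


# Invented toy amplicons for two hypothetical organisms.
amplicons = {
    "organism_A": "ATAT" + "GGCC" + "ATATATAT" + "GGCC" + "AT",
    "organism_B": "ATATATATAT" + "GGCC" + "ATAT",
}

# HaeIII cuts GG^CC, so the cut offset within the site is 2.
profile = {
    name: terminal_fragment_length(seq, "GGCC", 2)
    for name, seq in amplicons.items()
}
print(profile)
```

An ARDRA-style readout of the same digest would instead report every fragment, which is why it can be read from an ordinary gel without a fluorescence detector.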
Purpose
Current methods for identifying bacteria using ribosomal DNA fingerprints require substantial time and resources, creating barriers to scientific research, since the costly equipment required may not be accessible to everyone. Additionally, in a lab setting, this process would be time-consuming and require several steps. First, the full genome would have to undergo PCR to isolate and amplify the target DNA region. In bacteria, the Uni331F and 1492R primers would be used in this step to target the 16S rRNA gene. Then, the PCR products would be run through a series of three restriction enzyme digests. Once the restriction fragments are obtained, they can be separated by size using gel electrophoresis. Because the lengths of the restriction fragments differ from species to species, the gel would be analyzed and compared to a reference or a database to determine the identity of the organism. Therefore, to overcome these obstacles and make identification more accessible, we aim to design a computational tool that will identify an unknown microorganism by analyzing its 16S rRNA gene sequence using the less expensive ARDRA method while still producing results comparable to those of T-RFLP.
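The final comparison step, matching observed fragment lengths against a reference database, might be sketched as follows. This is not the authors’ algorithm, whose details are not given in this section; it is one plausible way to narrow candidates enzyme by enzyme. The species names, enzyme names, fragment lengths, and the tolerance parameter are all invented for illustration.

```python
def matches(observed, reference, tolerance=0):
    """True if both digests have the same number of bands and, once sorted,
    each pair of lengths differs by at most `tolerance` base pairs."""
    if len(observed) != len(reference):
        return False
    return all(
        abs(o - r) <= tolerance
        for o, r in zip(sorted(observed), sorted(reference))
    )


def reduce_candidates(observed_by_enzyme, database, tolerance=0):
    """Keep only the database entries consistent with every observed digest."""
    candidates = set(database)
    for enzyme, observed in observed_by_enzyme.items():
        candidates = {
            name
            for name in candidates
            if matches(observed, database[name].get(enzyme, []), tolerance)
        }
    return sorted(candidates)


# Invented reference fingerprints: fragment lengths (bp) per enzyme.
database = {
    "species_A": {"enzyme_1": [700, 461], "enzyme_2": [1161], "enzyme_3": [500, 400, 261]},
    "species_B": {"enzyme_1": [700, 461], "enzyme_2": [900, 261], "enzyme_3": [500, 400, 261]},
    "species_C": {"enzyme_1": [1000, 161], "enzyme_2": [1161], "enzyme_3": [661, 500]},
}

# Invented "gel" measurements for an unknown sample, slightly off the true sizes.
observed = {"enzyme_1": [465, 698], "enzyme_2": [905, 258], "enzyme_3": [262, 398, 503]}

print(reduce_candidates(observed, database, tolerance=5))
```

In this toy data the first enzyme cannot separate species_A from species_B, but the second can, which illustrates why several digests are used. The tolerance stands in for the limited resolution of a gel; fragment lengths computed directly from sequences could use a tolerance of zero.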
Methodology
Test Data Generation
In order to establish a baseline for the behavior of our virtual digest program, test data was collected to compare the results of our program with the results of a pre-existing web-based restriction enzyme digest. The overall process for test data generation is shown in Figure 3.
Figure 3: Overall Steps for Test Data Generation
A full bacterial genome was downloaded from the GenBank database, then run through a virtual PCR simulation from the Sequence Manipulation Suite using the Uni331F and 1492R primers (Figure 4). This allowed us to target and amplify the desired 16S rRNA gene sequences in the DNA so that only this region would be used in the virtual digest. After the fragment was obtained through the virtual PCR, a virtual restriction enzyme test was performed.
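What a virtual PCR does can be sketched in a few lines: locate the forward primer on the template strand, locate the reverse complement of the reverse primer downstream of it, and keep everything in between, primers included. The sketch below is a simplified illustration and not the Sequence Manipulation Suite implementation. It requires exact primer matches on a single strand, and the template and primer sequences are invented placeholders, not the actual Uni331F and 1492R sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def virtual_pcr(template, forward_primer, reverse_primer):
    """Return the amplicon bounded by the two primers, or None if either
    primer has no exact match on the template."""
    template = template.upper()
    start = template.find(forward_primer.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so on the template
    # strand we search for its reverse complement, downstream of the forward primer.
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Invented placeholder primers and template.
forward = "ACGTAC"
reverse = "GGATCA"  # its reverse complement is TGATCC
template = "TTTT" + "ACGTAC" + "GGGGCCCCAAAA" + "TGATCC" + "TTTT"

amplicon = virtual_pcr(template, forward, reverse)
print(amplicon, len(amplicon))
```

The returned amplicon is the only sequence passed on to the virtual digest, which mirrors the role of the PCR step described above.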