Meeting of the Minds, 2018




Meeting of the Minds is an annual symposium at Carnegie Mellon University that gives students an opportunity to present their research and project work to an audience of faculty, fellow students, family members, industry representatives and the larger community. Students use posters, videos and other visual aids to present their work in a manner that can be easily understood by both experts and non-experts. Through this experience, students learn how to bridge the gap between conducting research and presenting it to a wider audience. A review committee consisting of external experts and faculty members will review the presentations and choose the best projects and posters. Awards and certificates are presented to the winners.


From the Dean

The Meeting of the Minds student research symposium is a celebration of ingenuity, hard work, scientific exploration and intellectual curiosity. It is a highlight of the academic year, and we are exceptionally proud of the fine body of work produced by our students. Research is an essential element of the undergraduate experience. For some students, this is the beginning of a career in scientific exploration, experimentation and analysis. For others, the intellectual rigor of research is invaluable experience in problem solving, developing critical skills they will use throughout their professional careers. At its heart, scientific research brings together creativity and reason. The projects at Meeting of the Minds 2018 are a showcase of this process. I encourage you to explore the projects, ask questions and learn about the unique perspectives that our students bring to scientific questions.

Michael Trick
Dean
Carnegie Mellon University in Qatar

Undergraduate research at CMU-Q

A research institute like no other, Carnegie Mellon is home to the world’s leading experts in a range of fields. In this tradition, Carnegie Mellon University in Qatar nurtures and develops opportunities for faculty members and students to build regionally relevant research programs in their areas of expertise. Faculty members contribute to the CMU-Q body of work through studies funded by Qatar National Research Fund (QNRF) and internal seed funds. These projects often provide a framework for undergraduates to learn about the research process and contribute to a larger project. Students also undertake senior thesis projects, pursue independent studies guided by faculty mentors, initiate their own projects, and partake in summer research programs within Carnegie Mellon University and Education City. Meeting of the Minds is a showcase of these projects.



Acknowledgements

Special Awards

Carnegie Mellon University in Qatar acknowledges and thanks the Ministry of Development Planning and Statistics and Qatar National Research Fund for recognizing students and researchers with special awards.

Judges

Carnegie Mellon University in Qatar would like to express deep appreciation to the judges, who offer their time, expertise and feedback to make this research symposium a success. Thank you.

Dr. Essam Abdelalim, Hamad Bin Khalifa University

Dr. Serkan Akgüç, Carnegie Mellon University in Qatar

Dr. Ali Alaboudy, Qatar National Research Fund

Dr. Chadi Aoun, Carnegie Mellon University in Qatar

Dr. Salim Bougarn, Sidra Medicine

Dr. Chiara Cigno, Sidra Medicine

Dr. Fatemeh Darakhshan-Rassam, Qatar National Research Fund

Dr. Julie Decock, Hamad Bin Khalifa University

Dr. Mohammed Dehbi, Hamad Bin Khalifa University

Dr. Hasan Demirkoparan, Carnegie Mellon University in Qatar

Dr. Roberto Di Pietro, Hamad Bin Khalifa University

Dr. Muhammad Elnaggar, Sidra Medicine

Dr. Denielle Emans, VCUarts Qatar

Dr. Jason Ford, Sidra Medicine

Dr. Susan Hagan, Carnegie Mellon University in Qatar

Dr. Henning Horn, Hamad Bin Khalifa University

Dr. Karl Richard Alexander Knuth, National Center for Cancer Care and Research, Hamad Medical Corporation

Dr. Donald Love, Sidra Medicine

Dr. Qutaibah Malluhi, Qatar University

Dr. Nayef Mazloum, Weill Cornell Medicine – Qatar

Dr. Hamid Menouar, Qatar Mobility Innovations Center

Dr. Massimono Miele, Sidra Medicine

Dr. Mohamed Mokbel, Qatar Computing Research Institute

Dr. Kemal Oflazer, Carnegie Mellon University in Qatar

Dr. Preslav Nakov, Qatar Computing Research Institute

Dr. Mourad Ouzzani, Qatar Computing Research Institute

Dr. Ahmed Rebai, Qatar National Research Fund

Dr. A.M. Salaz, Carnegie Mellon University in Qatar

Dr. Basem Shomar, Qatar Environment and Energy Research Institute

Dr. Munir Tag, Qatar National Research Fund

Haya Thowfeek, Education Above All Foundation

Dr. Annette Vincent, Carnegie Mellon University in Qatar

Dr. Stephan Vogel, Qatar Computing Research Institute

Dr. Ingmar Weber, Qatar Computing Research Institute

Dr. Hadi Mohammad Yassine, Qatar University

Dr. Barak Yehya, Ministry of Development Planning and Statistics



Undergraduate Posters

Biological Sciences

Effects of different stresses on pluripotent stem cell fate
Are you sure your soy is non-GMO?
Phenotypic and behavioral characterization of MDA-MB 231/468 breast cancer cell lines
Detection of CP4-EPSPS and other GM genes in soy milk variants
A novel post-transcriptional mechanism for inhibiting the expression of PTEN in breast cancer
Mitochondrial dysfunction associated with aspartame toxicity in kidney cells
Is corn syrup used in processed products extracted from genetically modified corn?
Classification of bacterial diversity in Qatar ballast water samples using QIIME bioinformatics pipeline
Identification of post transcriptional regulatory factors of PTEN expression in breast cancer cells
Investigating the presence of 35s promoter, CRY1A(b), Bat and Pat genes as markers for genetic modification in three commercial Zea Mays (Corn) food products
MAPK14 splicing as a novel biomarker in regulating breast cancer
Analysis of genetically modified (GM) marker genes in maize-based products using multiplex PCR and ELISA
Life bacterial detection using RNA extraction from ballast water sample
Truncations of Drs1 arms provide insight into their possible functions
Testing for the presence of genetic modifications in common corn products - tortilla chips and corn flour
Studying phosphorylation of Kindlin F1 loop and interactions with protein partners
Investigating oxidative stress induced by aspartame in human embryonic kidney cells
The varying amount of genetic modifications in non-GMO labeled products from the USA and Europe


Computer Science

An oracle characterization of the polynomial-size alternating hierarchy
Interactive evaluation and training of classifiers
Minimizing cost of accuracy estimation of automated classifiers
Behaviour analysis using multi-sensor data
A learning approach to vision-based coarse robotics localization in industry
Computational analysis of the role of MTCP1 in T-cell leukemia
Mixed initiative system for survivable path planning in cluttered environments
Relating children’s automatically detected facial expressions to their behavior in RoboTutor
Deep learning and pattern analysis for crack detection

Information Systems

Doctor-patient communication in Qatar
Trust in commerce through Instagram in Qatar
Communicate through your eyes: A study of natural interactions with a digital cultural artifact
Parents of children of autism and technology use by the children
RAES: Road accidents and emergency services in the United States
A study on the use of educational tools amongst university students
RISE: Real-time information system for emergency detection
NEOS: Saving receipts electronically
Measuring corporate transparency in sustainability reporting: A study of the energy sector

Postgraduate Posters

Delay tolerant computing
The MADAR Arabic dialect corpus and lexicon
Guidelines and annotation framework for Arabic author profiling
Teams of aquatic and aerial robots for marine environmental monitoring
Offloading mobile storage to underutilized edge devices
Extending the range via ad-hoc communication for cooperative robotic watercraft
RAMOS: A resource-aware multi-objective system for edge computing
MADARi: A web interface for joint Arabic morphological annotation and spelling correction
Event coreference using neural network classifiers
Fine-grained Arabic dialect identification
Formalization of financial trading systems in a concurrent logical framework (CLF)


Undergraduate Posters


Effects of different stresses on pluripotent stem cell fate

Author

Farah AbdelAziz

Advisors

Annette Vincent
Mohamed Emara, Qatar Biomedical Research Institute

Category

Biological Sciences

Abstract

Induced pluripotent stem cells (iPSCs) have a unique capacity to self-renew and differentiate into various types of cells. The ability of iPSCs to direct their fate toward either self-renewal or differentiation is balanced by different extracellular and intracellular factors. One factor that has been found to primarily control this phenomenon is the alteration of micro-environmental conditions (stress conditions). Under stressful conditions (sodium arsenite (SA) and sodium chloride (SC)), pluripotent stem cells form a specific type of granule known as stress granules (SGs). These granules are known to sequester mRNA and proteins and may act as a hub that determines stem cell fate (Palangi et al., 2017). The purpose of this research is to study the co-localization of pluripotent markers and stress granule markers at the mRNA and protein levels to further understand the mechanism behind iPSC fate. mRNA containing AU-rich elements is known to bind to one of the SG proteins (TIA) and become sequestered to SGs. Interestingly, Nanog mRNA contains those elements and is thus expected to be recruited to SGs. To test this hypothesis, we performed a fluorescence in situ hybridization (FISH) experiment to test the recruitment of Nanog mRNA, as well as the mRNAs of other pluripotent markers, to SGs. Our results showed that Nanog mRNA was the only one among the pluripotent markers to be sequestered to SGs. At the protein level, we observed the localization of Oct4 but not Nanog to SGs. These data should increase our understanding of how the stress response program regulates stem cell fate. They will also help us make better use of iPSCs in studying the pathogenesis of different diseases, and may provide possible cures through drug screening and cell therapy.



Effects of Different Stresses on Pluripotent Stem Cell Fate: A Step Forward Towards Enhancing the Use of iPSC in Disease Modeling
Farah AbdelAziz (1), Dr. Annette Vincent (1), and Dr. Mohamed M. Emara (2)
1. Biological Sciences Program, Carnegie Mellon University, Qatar
2. Neurological Disorders Research Center, Qatar Biomedical Research Institute; Hamad Bin Khalifa University

Figure 1. Analysis of hiPSC morphology after SA as Stress Factor. Phase contrast microscopy showing the morphology of non-treated hiPSC colonies and colonies treated with 125 μM SA. Large insets show magnified views of cells, and the scale bar represents 50 μm.

Figure 2. Testing the Co-localization of Pluripotent Marker Oct-4 with Stress Granule Marker G3BP with SA and SC as Stress Factors. The cells were incubated at 37°C for 1 hour in an incubator with 5% CO2.

Figure 3. Testing the Co-localization of Pluripotent Marker Nanog mRNA and Stress Granule Marker G3BP with SA as a Stress Factor

Conclusion

SA and SC stress factors do not affect iPSC morphology. The pluripotent marker protein Oct-4 co-localizes into stress granules with G3BP, whereas Nanog protein does not. Nanog mRNA, on the other hand, was found to co-localize into SGs with G3BP. Thus, different pluripotent markers are sequestered into SGs at the mRNA level and at the protein level, which could determine iPSC fate accordingly.


Are you sure your soy is non-GMO?

Authors

Albandari Al Khater
Aisha Fakhroo

Advisor

Annette Vincent

Category

Biological Sciences

Abstract

Genetically Modified Organisms (GMO) have been widely used since the early 1990s. Nowadays, food products are labeled “organic” in order to indicate that they are not genetically modified. Soybean is one of the most commonly genetically modified crops, and there has been wide interest in detecting transgenic components, mainly proteins, in different soy-containing food products. This project assesses the credibility of soy products that claim to be organic or natural. The tested products include ready-mixed soy powder and a cocoa crunch bar. To study the soybean content, DNA will be extracted using NucleoSpin isolation technology. The soybean DNA will then be analyzed by polymerase chain reaction (PCR) to confirm the presence or absence of genetic modification by screening for the NOS terminator, which is present in 80% of GMOs. Multiplex PCR will then be performed to assess the inserted genes, specifically CP4-EPSPS and Cry1Ab. Sandwich ELISA will further be used to test for the presence of proteins expressed from the CP4-EPSPS gene. Results have shown that the soy milk powder is positive for the Cry1Ab gene, the soy cocoa bar is negative for both the CP4-EPSPS and Cry1Ab genes, and both the negative and positive controls are positive for both genes. In conclusion, all products tested were found to be genetically modified except for the cocoa bar. The negative control (organic soy flour) was found to be genetically modified, contrary to its “organic” label.




Phenotypic and behavioral characterization of MDA-MB 231/468 breast cancer cell lines

Author

Khalid Al-Naemi

Advisor

Mohamed Bouaouina

Category

Biological Sciences

Abstract

The aim of this research is to expand the characterization of the MDA-MB 231 cell line in terms of integrin expression. In addition, we will characterize the phenotype and behavior of the MDA-MB 468 cell line in order to correlate integrin expression with the behavioral and phenotypic differences between the cell lines. These phenotypic differences include adhesion to several extracellular matrix proteins, cell mobility and adhesion-mediated signaling, and they allow for an accurate assessment of the cell lines' pathological behavior. We will use flow cytometry to determine the presence of specific integrins using anti-integrin antibodies. Previously, we determined the presence of integrin-β1 in MDA-MB 231; we will extend this to other integrins and to MDA-MB 468. To assess the cell mobility of each cell line, we will conduct a wound healing assay, which assesses the cell migration characteristics of each cell line. Moreover, we will expand and troubleshoot Western blots for integrin-associated proteins in order to quantify protein expression and correlate it with the characteristics of each cell line. Cancer is one of the common diseases that many people face today: 1 in 3 people will develop cancer at least once in the course of their lifetime. According to the World Health Organization, cancer caused the death of 8.8 million individuals in 2015, which is nearly 1 in 6 of global deaths. These issues demand immediate resolution in order to save millions of lives. It is therefore crucial to identify and characterize all the types of cancer that are observed, as this allows for a more comprehensive understanding of cancer cell behavior.



Phenotypic and Behavioral Characterization of MDA-MB 231/468 breast cancer cell lines
Student: Khalid Al-Naemi
Supervisor: Mohamed Bouaouina
Biological Sciences Program, CMU Qatar

Abstract & Introduction

The aim of this research is to expand the characterization of the MDA-MB 231 cell line in terms of integrin expression. In addition, we will characterize the phenotype and behavior of the MDA-MB 468 cell line in order to correlate integrin expression with the behavioral and phenotypic differences between the cell lines. These phenotypic differences include adhesion to several extracellular matrix proteins, cell mobility and adhesion-mediated signaling, and they allow for an accurate assessment of the cell lines' pathological behavior. We will use flow cytometry to determine the presence of specific integrins using anti-integrin antibodies. Previously, we determined the presence of integrin-β1 in MDA-MB 231; we will extend this to other integrins and to MDA-MB 468. To assess the cell mobility of each cell line, we will conduct a wound healing assay, which assesses the cell migration characteristics of each cell line. Moreover, we will expand and troubleshoot Western blots for integrin-associated proteins in order to quantify protein expression and correlate it with the characteristics of each cell line. Cancer is one of the common diseases that many people face today: 1 in 3 people will develop cancer at least once in the course of their lifetime. According to the World Health Organization, cancer caused the death of 8.8 million individuals in 2015 [1], which is nearly 1 in 6 of global deaths [1]. These issues demand immediate resolution in order to save millions of lives. It is therefore crucial to identify and characterize all the types of cancer that are observed, as this allows for a more comprehensive understanding of cancer cell behavior.

Methods

Cell Culture: The cell lines were provided by HBKU/Weill Cornell University. The cells were grown and cultured in 6 ml Dulbecco's Modified Eagle's Medium (DMEM) supplemented with Fetal Bovine Serum (FBS), Pen-Strep, sodium pyruvate and non-essential amino acids, and incubated at 37 °C in a dry incubator with 5% CO2 flow. The cells were monitored on a daily basis to sustain cell culture propagation.

Flow Cytometry: The cells are counted to 5 x 10^5, transferred into FACS tubes and subjected to 3 conditions: unstained cells with no antibodies added; cells stained with both primary and secondary antibodies; and cells stained with the secondary antibody only. Each antibody staining is incubated for 30 min at 4 °C, while the secondary-only staining is kept in ambient light in addition to the conditions above.

Scratch Assay: The cells were harvested and split into wells depending on the variable being tested. The variables tested were the cell seeding in the wells and the scratch width. The cells are then monitored at specific time points, and the induced scratches are imaged with a cell imager under normal light with a 10X objective. The amount of open wound is assessed with TScratch.

Western Blot: The cells are pelleted and lysed using lysis buffer containing multiple protease inhibitors and lysozyme. The protein concentration of the lysate is determined from a Bradford standard curve. Cell lysate volumes are loaded into the SDS gel wells in order of increasing protein amount. The gel is run for 1.5 hours at 150 volts. The proteins are then transferred onto a nitrocellulose membrane overnight at 4 °C. The membrane is blocked with BSA solution (2 mg/ml) for 1 hour, then incubated with primary antibodies for the proteins of interest: Kindlin2, Paxillin and Kindlin1, with Actin as loading control. This is followed by secondary antibody incubation, and the bands are visualized using a Li-Cor membrane imager.

Results

[Figure panels: Flow Cytometry; Wound Healing (varying scratch width, varying cell seed); Western Blot]

Analysis

Based on the flow cytometry data, we found statistically significant evidence for the expression of 4 of 5 integrins. The main finding is the expression of integrin beta 2, which is not usually expressed in cells of epithelial origin. The scratch assay data are not conclusive, as multiple replicates are required to ensure that the results are not due to random variation. The Western blot data do not show the various other integrin-associated proteins, as we were not able to detect a clear signal for them.

Conclusion & Future Research

Characterization of MDA-MB 231 is still under way, through testing for other integrins and determining the effect of other extracellular matrices. MDA-MB 468 cells did survive earlier, and did not receive new batches. The future aim is to further expand the data and compare the cell lines with one another.

References

[1] WHO Cancer Control Programme. (n.d.). Retrieved March 22, 2018, from http://www.who.int/cancer/en/


Detection of CP4-EPSPS and other GM genes in soy milk variants

Authors

Kawthar Al-Sadat
Najlaa Al-Thani

Advisor

Annette Vincent

Category

Biological Sciences

Abstract

Soy milk is an essential dietary supplement which many consumers utilize on a daily basis. The primary component of soy milk, soy beans, is known to be a major genetically modified plant within consumer-based agriculture. Although soy milk products claim to be GMO free, the many soy milk variants, such as soy whipped cream and soy cooking cream, questionably deviate from the naturally occurring physical form of normal soy milk. This raises the suspicion that the soy beans used in creating these variants could have been genetically altered. In this study, we will be using PCR, multiplex PCR and sandwich ELISA to detect the presence of genetically modified soy beans within soy milk variants by detecting known genes related to genetic modification, such as CP4-EPSPS, Pat and the CaMV35S promoter, along with soy lectin as a soybean internal control, while using organic, non-GMO soy milk as our negative control. GM genes (CaMV35S, Pat and CP4-EPSPS) were detected in soy cooking cream, but little to no EPSPS protein product was detected through ELISA.



Detection of CP4-EPSPS and other GM genes in Soy Milk Variants
Kawthar Al-Sadat; Najlaa Al-Thani
Biological Sciences Department; Carnegie Mellon University Qatar

Introduction

Soymilk has been in high demand with the rise of health-conscious consumerism, encouraging the production of more alternative soymilk products. As of 2016, 94% of countries worldwide adopted genetic modification (GM) technology to grow soybeans, which is a major crop used in several food products, such as soymilk [1]. Common GM genes found in soybeans include CP4-EPSPS (expresses EPSPS enzyme) and Pat (eliminates herbicidal activity). These genes are usually inserted into soybeans using a gene cassette containing the CaMV35S promoter and NOS terminator. In this study, we will be detecting these genes in positive control GMO soybeans, in 2 test products, soy whipped cream (Soyatoo!) and soy cooking cream (Sojade), both being soy milk variants, and in a negative control organic soymilk product (Organic EdenSoy).

Methods

DNA Extraction: Food products to be analyzed were homogenized, and DNA was extracted using a NucleoSpin Technology kit. To measure the purity of the isolated DNA, scans were performed from 200 nm to 400 nm using a UV-Vis 60S Thermo Fisher spectrophotometer. Absorbance was measured at specific wavelengths (230 nm, 260 nm and 280 nm), and the purity ratio was assessed using the absorbance at 260 nm and 280 nm.

Polymerase Chain Reaction (bead, multiplex and uniplex): DNA extracted from the food products was analyzed for genetic modification with a bead PCR that tested for the NOS terminator (which is required in any GMO event), using the internal control as a control. A multiplex PCR was also conducted to test for the presence of multiple genes, namely the promoter (CaMV35S), the Pat gene, CP4EPSPS and the internal control, in the same reaction tube. The concentrations or volumes of primers added are based on their relative abundance. A uniplex PCR was also conducted to test for the presence of one specific GMO gene only, CP4EPSPS in this case, in addition to the internal control. The primers used and the PCR conditions for each reaction are shown in Table 2. PCR products were analyzed on a 1.2% agarose gel to confirm the presence of the genes in the PCR amplicon by comparison with a 1 kb ladder (NEB N3232L) and a 100 base pair ladder (NEB N3231S).

Table 2: Primers and the conditions used in each type of PCR.

Bead PCR
Primers used for each gene (both forward and reverse primers): NOS terminator: HA-nos118; Soybean Lectin (Internal Control): GMO3/GMO4
Initial denaturation: 3 minutes at 95 °C
Number of cycles: 50
Denaturation: 25 seconds at 95 °C
Annealing: 30 seconds at 62 °C
Extension: 45 seconds at 72 °C
Final extension: 7 minutes at 72 °C
Hold: at 4 °C

Multiplex PCR
Primers used for each gene (both forward and reverse primers): CaMV35S: CV35; Pat: Pat-F; CP4EPSP: CP4; Soybean Lectin (Internal Control): GMO3/GMO4
Initial denaturation: 5 minutes at 95 °C
Number of cycles: 30
Denaturation: 30 seconds at 95 °C
Annealing: 60 seconds at 59 °C
Extension: 45 seconds at 72 °C
Final extension: 10 minutes at 72 °C
Hold: at 4 °C

Uniplex PCR
Primers used for each gene (both forward and reverse primers): CP4EPSP: CP4; Soybean Lectin (Internal Control): GMO3/GMO4
Initial denaturation: 5 minutes at 95 °C
Number of cycles: 30
Denaturation: 30 seconds at 95 °C
Annealing: 60 seconds at 56 °C
Extension: 45 seconds at 72 °C
Final extension: 10 minutes at 72 °C
Hold: at 4 °C

ELISA: Proteins from food products were extracted using extraction buffer and adding equal volumes of each food product to the same volume of extraction buffer. Sandwich Enzyme-linked

Results

[Figures]

Discussion

DNA Isolation: The GM soybean (positive control) contained the highest concentration of genomic DNA (127 ug/ml) with the highest purity ratio (1.46). All other samples showed concentrations and purity ratios below these values. Since none of the purity ratios were between 1.7 and 1.9, all samples are of low purity. This could be due to the expected low yield from liquid sample DNA extraction.

PCR: Based on the single PCR data, no bands can be seen that correlate with the NOS terminator gene (125 bp), initially indicating no presence of the GM cassette; however, the positive control does not show this band either. Many primer dimers are seen at the bottom of the gel, suggesting that the primers may have been unable to bind to the NOS terminator region in any of the samples. Internal control bands are clearly seen at ~118 bp, verifying the soybean identity of all samples.

Based on the multiplex/uniplex PCR data, the CP4-EPSPS gene was detected in both PCR reactions of the soy cooking cream (CC) at 604 bp, a much higher band than expected. The Pat, CaMV35S and soy lectin genes were also detected in soy cooking cream (CC) at 1950 bp, 1008 bp and 129 bp, also much higher bands than expected. The positive control still does not show bands for any of the GM genes, suggesting improper genomic DNA extraction, since the GM soybean sample proved more difficult to extract than the other soybean samples. The unexpectedly high bands suggest that the primers may have amplified other genes.

ELISA: Based on the absorbance data and the ELISA standard curve, all samples showed a CP4-EPSPS protein concentration between 0 and 0.10%. This could be due to trace amounts of the EPSPS enzyme, not necessarily to GM intervention.

Conclusion

The originally labeled ‘organic’ soy cooking cream (CC) was found to contain the known GM genes CP4-EPSPS, Pat and CaMV35S. All samples contained the internal control soy lectin, but the NOS terminator was not detected even in the positive control. EPSPS was expressed at low levels in all samples, but this does not necessarily mean that the samples are GM, since EPSPS is naturally expressed at low levels in soybeans. In the future, we would like to retest these genes after re-isolating the genomic DNA from these soybean products. We would also like to test more of the common GM genes used in soybean cultivation and test for both promoter and terminator. Sequencing the PCR products will also verify the identity of the genes amplified.

Acknowledgements

We would like to show our gratitude to Professor Annette Vincent for her guidance throughout this research project. We would also like to thank Maya Kemaldean and Bernadette Bernales for their assistance during lab work. Finally, we are grateful to Carnegie Mellon University Qatar for providing us with the facilities required to carry out our research.

References

[1] International Service for the Acquisition of Agri-biotech Applications (ISAAA); 2016.
[2] Clarke, J. D., Alexander, D. C., Ward, D. P., Ryals, J. A., Mitchell, M. W., Wulff, J. E., & Guo, L. (2013). Assessment of genetically modified soybean in relation to natural variation in the soybean seed metabolome. Scientific Reports, 3, 3082.
[3] James D. et al. 2003. J. Agric. Food Chem.
[4] Querci M. Qualitative Detection of MON810 Maize, Bt-176 Maize and Roundup Ready® Soybean by PCR; WHO.
[5] geat, H.R. et al. 2002. J. Agric. Food Chem.
[6] Querci, M., Jermini, M., & Van den Eede, G. (2006). The analysis of food samples for the presence of genetically modified organisms. TRAINING COURSE ON, 33.
[7] History of Roundup Ready Soybeans. (n.d.). Retrieved from sourcewatch: https://www.sourcewatch.org/index.php/History_of_Roundup_Ready_Soybeans
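The purity check described in this poster (absorbance at 260 nm and 280 nm, with ratios outside 1.7-1.9 treated as low purity) could be sketched as below. This is only an illustration under stated assumptions: the `Reading` structure, the function names and the absorbance values are hypothetical, and the 50 ug/mL per A260 unit factor is the commonly used conversion for double-stranded DNA at a 1 cm path length, not a value taken from the poster.

```python
from dataclasses import dataclass

# Commonly used conversion for double-stranded DNA (assumption, not from the poster):
# an A260 of 1.0 at a 1 cm path length corresponds to roughly 50 ug/mL.
DSDNA_UG_PER_ML_PER_A260 = 50.0

# Purity window quoted in the poster's DNA isolation discussion.
PURITY_MIN, PURITY_MAX = 1.7, 1.9


@dataclass
class Reading:
    """One spectrophotometer reading (hypothetical structure for illustration)."""
    sample: str
    a260: float
    a280: float
    dilution_factor: float = 1.0


def purity_ratio(r: Reading) -> float:
    """Return the A260/A280 ratio used as the purity measure."""
    if r.a280 <= 0:
        raise ValueError(f"A280 must be positive for sample {r.sample!r}")
    return r.a260 / r.a280


def concentration_ug_per_ml(r: Reading) -> float:
    """Estimate the dsDNA concentration from A260 and the dilution factor."""
    return r.a260 * DSDNA_UG_PER_ML_PER_A260 * r.dilution_factor


def is_acceptable(ratio: float) -> bool:
    """True when the ratio falls inside the 1.7-1.9 window."""
    return PURITY_MIN <= ratio <= PURITY_MAX


def summarize(readings: list[Reading]) -> list[str]:
    """Build one report line per sample."""
    lines = []
    for r in readings:
        ratio = purity_ratio(r)
        verdict = "within window" if is_acceptable(ratio) else "outside window"
        lines.append(
            f"{r.sample}: {concentration_ug_per_ml(r):.1f} ug/mL, "
            f"A260/A280 = {ratio:.2f} ({verdict})"
        )
    return lines


if __name__ == "__main__":
    # Made-up absorbance values, for illustration only.
    demo = [Reading("sample A", 0.50, 0.27), Reading("sample B", 0.73, 0.50)]
    for line in summarize(demo):
        print(line)
```

A ratio such as the 1.46 reported for the positive control would be flagged as outside the window by `is_acceptable`, in line with the poster's reading that all samples were of low purity.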


A novel post-transcriptional mechanism for inhibiting the expression of PTEN in breast cancer Authors

Boshra Al-Sulaiti Reem Elasad Ettaib El Marabti

Advisor

Ihab Younis

Category

Biological Sciences

Abstract Splicing of introns in the pre-mRNA is an important post-transcriptional step, as the final protein product depends on the sequence of the mature mRNA. Thus, splicing is normally tightly regulated. A subset of genes contains a specific type of intron, called a minor intron, present in highly conserved genes, including tumor suppressors and oncogenes. We have previously found that a subset of these introns is dysregulated in breast cancer. The regulation of minor introns in breast cancer is not fully understood, so we analyzed the expression of minor intron-containing genes in 1200 breast cancer samples. Analysis of RNA-seq data identified several minor intron-containing genes whose expression seems to be differentially regulated in breast cancer. Next, we used antisense morpholino oligonucleotides (AMOs) to inhibit the splicing of specific introns in a breast cancer cell line (MDA-MB-231) to check whether the inhibition of minor intron splicing affects the behavior of these cells. For this, we used RT-PCR and western blot to check transcript and protein levels, as well as proliferation and motility assays to check cancer cell behavior. Finally, using computational analysis, we curated a database of RNA-binding proteins (RBPs) that interact with specific minor introns and potentially affect their aberrant splicing in breast cancer. Our results show that the pre-mRNA of the tumor suppressor gene PTEN (phosphatase and tensin homolog) contains a minor intron and is dysregulated. The AMO transfection that inhibited minor intron splicing of PTEN showed increased unspliced mRNA transcripts and, consequently, indications of increased proliferation and migration. A list of RBPs that bind to the PTEN minor intron was compiled.
In conclusion, PTEN, a gene commonly dysregulated in cancers, contains a minor intron, and when its splicing is inhibited, cancer cells behave more aggressively, suggesting a novel mechanism for downregulating tumor suppressors, such as PTEN, in breast cancer by altering the splicing of their minor intron. Thus, understanding how the cells alter minor intron splicing of PTEN could be a therapeutic target in breast cancer.




Figure 2: Dysregulation of splicing leads to the formation of different or unusual spliceoforms causing the dysregulation of multiple cellular processes, eventually leading to the onset of diseases, such as neurodegenerative diseases and cancer. Adapted from Srebrow & Kornblihtt [6]

Figure 1: Minor introns are embedded molecular switches, and their splicing is regulated by the levels of U6atac present in the cell. U6atac snRNP has a short half-life, but p38 MAPK stabilizes it. High levels of U6atac, along with the other subunits of the minor spliceosome, splice out the minor intron; otherwise, the mRNA is either degraded or alternatively spliced to make another isoform. Thus, the splicing of minor introns, or the lack of it, determines the fate of the mRNA in the cell. Adapted from Younis et al. [2]

Figure 4: Akt/PKB signaling pathway. PTEN, a tumor suppressor protein, inhibits the AKT protein, in turn inhibiting uncontrolled growth and proliferation and stopping the formation of a tumor. Adapted from Phin et al. [8]

Figure 3: Percentage of breast cancers that utilize different mechanisms to inhibit the function of PTEN

• PTEN is a tumor suppressor protein that inhibits the Akt pathway
• PTEN is commonly silenced or suppressed in breast cancer [7]
• Currently known mechanisms of PTEN silencing:
  - Point mutation in exon 2
  - Promoter hypermethylation

Aim of Project: To explore the potential of minor intron splicing as a novel post-transcriptional mechanism for inhibiting the expression of PTEN in breast cancer.

• Splicing is:
  - Tightly regulated by trans factors (RNA-binding proteins) and cis regulatory elements in the gene promoter (exonic and intronic enhancers and silencers)
  - Linked to disease when dysregulated [3]
• Breast cancer:
  - Has the highest incidence rate and mortality compared to other cancers in women in Qatar [4]
  - Utilizes aberrant splicing of certain genes [5]
• Previous bioinformatics research that compared a breast cancer cell line (MCF7) and a breast epithelial cell line (MCF10A) identified PTEN as a dysregulated minor intron-containing gene

• Pre-mRNAs are made of exons (expressed sequences) and introns (intervening sequences)
• Introns are removed from the mature mRNA
• The mechanism by which these introns are removed is called splicing
• There are two types of introns: major and minor
• Minor introns are specialized introns:
  - Only 600-700 are present, in genes that are conserved and are usually responsible for DNA replication and repair, RNA transcription and processing, regulating the cell cycle, etc. [1]
  - Spliced by the minor spliceosome, which is made up of U11, U12, U5, U4atac, and U6atac (catalytic subunit)

Introduction


Figure 2: PTEN mRNA splicing patterns in MDA-MB-231 cells before and after the suppression of PTEN minor intron splicing. MDA-MB-231 cells were transfected with 0.1 mM control and 0.05 mM PTEN AMO. (a) RT-PCR was carried out on all RNA samples extracted 24 hours post-transfection to amplify exon 1 to intron 1 (unspliced transcript), exon 1 to exon 2 (spliced transcript), exon 1 to exon 3 (spliced transcript), intron 1b to exon 3 (cryptic exon), intron 1 to intron 2 (unspliced transcript), and GAPDH (control). Samples were run on a 1.5% agarose gel. (b) Visual representation of where the PCR primers bind on the PTEN mRNA transcript and which regions they amplify.

Figure 1: Optimization of AMO dosage and incubation time used for the transfection of MDA-MB-231. MDA-MB-231 cells were transfected with 0.1 mM control AMO and 0.1 mM or 0.05 mM PTEN AMO. RNA was extracted after 24 hours and 48 hours of transfection. (a,c) RT-PCR was carried out to amplify exon 1 to exon 2 (spliced transcript), exon 1 to intron 1 (unspliced transcript), and GAPDH (control). Products were run on a 1.5% agarose gel. (b,d) Amounts of unspliced and spliced PTEN transcripts were quantified and normalized to GAPDH, and the ratio of unspliced to spliced mRNA was calculated. (e) From a separate but similar AMO transfection experiment, proteins were extracted after 5 days of transfection and were run in a western blot using mouse anti-PTEN, rabbit anti-MAPK14 (another minor intron-containing gene), and rabbit anti-HPRT as control. (f) Quantification of the amount of PTEN protein in different conditions normalized to HPRT.
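The quantification described in the caption (band amounts normalized to GAPDH, then the unspliced-to-spliced ratio) can be sketched as below. This is only an illustration: the function names and the band intensities are hypothetical and are not taken from the poster's gels.

```python
def normalized(band_intensity: float, gapdh_intensity: float) -> float:
    """Normalize a band intensity to the GAPDH loading control of the same lane."""
    if gapdh_intensity <= 0:
        raise ValueError("GAPDH intensity must be positive")
    return band_intensity / gapdh_intensity


def unspliced_to_spliced(unspliced: float, spliced: float, gapdh: float) -> float:
    """Ratio of GAPDH-normalized unspliced to GAPDH-normalized spliced transcript."""
    spliced_norm = normalized(spliced, gapdh)
    if spliced_norm == 0:
        raise ValueError("spliced band intensity must be non-zero")
    return normalized(unspliced, gapdh) / spliced_norm


# Hypothetical densitometry values (arbitrary units), for illustration only.
control_ratio = unspliced_to_spliced(unspliced=200, spliced=800, gapdh=1000)
pten_amo_ratio = unspliced_to_spliced(unspliced=600, spliced=400, gapdh=1000)
print(control_ratio, pten_amo_ratio)
```

When both bands come from the same lane, the GAPDH term cancels in the ratio; normalization matters mainly when the individual unspliced or spliced amounts are compared across lanes.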

Results: Chemical Assays

*Antisense Morpholino Oligonucleotides

Experimental Setup

Boshra Al-Sulaiti, Reem Elasad, Ettaib El Marabti, Ihab Younis Biological Sciences Program, Carnegie Mellon University in Qatar, Education City, Doha, Qatar

Figure 5 panel labels (RNA-binding proteins): CSTF2T, HNRNPA1, ESWRI, HNRNPC, HNRNPU, CSTF2, FUS

[1] Burge, C. B., Padgett, R. A., & Sharp, P. A. (1998). Evolutionary fates and origins of U12-type introns. Molecular Cell, 2(6), 773-785.
[2] Younis, I., Dittmar, K., Wang, W., Foley, S. W., Berg, M. G., Hu, K. Y., ... & Dreyfuss, G. (2013). Minor introns are embedded molecular switches regulated by highly unstable U6atac snRNA. eLife, 2, e00780.
[3] Shin, C., & Manley, J. L. (2004). Cell signalling and the control of pre-mRNA splicing. Nature Reviews Molecular Cell Biology, 5(9), 727-738.
[4] Donnelly, T. T., Al-Khater, A. H., Al-Kuwari, M., Al-Meer, N., Al-Bader, S. B., Malik, M., ... & Jong, F. C. D. (2011). Study exploring breast cancer screening practices amongst Arabic women living in the State of Qatar. Avicenna, (2011), 1.
[5] Lee, M. P., & Feinberg, A. P. (1997). Aberrant splicing but not mutations of TSG101 in human breast cancer. Cancer Research, 57(15), 3131-3134.
[6] Srebrow, A., & Kornblihtt, A. R. (2006). The connection between splicing and cancer. Journal of Cell Science, 119(13), 2635-2641.
[7] Zhang, H. Y., Liang, F., Jia, Z. L., Song, S. T., & Jiang, Z. F. (2013). PTEN mutation, methylation and expression in breast cancer patients. Oncology Letters, 6(1), 161-168. doi:10.3892/ol.2013.1331.
[8] Phin, S., Moore, M., & Cotter, P. D. (2013). Genomic rearrangements of PTEN in prostate cancer. Frontiers in Oncology, 3, 240.

Bernadette Bernales, Maya Kemaldean, Dr. Mohamed Bouaouina, Nada Abdul Khalique. This research was funded by Qatar Foundation and Carnegie Mellon University in Qatar Seed Grant (PI: Ihab Younis). Special thanks to Hind Al Saad for helping with the design of this poster.

Acknowledgments & References

To optimize the AMO transfections, two different doses and incubation times were tested; the optimal conditions were found to be 0.05 mM AMO and 24 hours of incubation post-transfection. Splicing patterns of MDA-MB-231 cells show a high level of unspliced PTEN mRNA in the cells. This suggests that MDA-MB-231 cells utilize a novel mechanism in which they dysregulate the minor intron splicing of PTEN to stop its tumor suppressor function. It is also interesting that there is a cryptic exon in intron 1, which is activated upon suppression of PTEN minor intron splicing. Inhibition of PTEN minor intron splicing also shows increased proliferation. The migration data are inconclusive. Finally, a list of RNA-binding proteins that potentially regulate the splicing of the PTEN minor intron was compiled. As future directions, the effect of these RBPs on minor intron splicing will be studied further, and the experiments will be repeated with other breast cancer cell lines such as MCF7 and T47D.

Conclusion & Future Directions

Figure 5: RNA-binding protein binding sites within the minor intron region of PTEN. A list of RBPs that are differentially expressed relative to the expression of PTEN in MCF7 breast cancer cells was curated. The RBPs that bind next to or within intron 1, the minor intron, are shown.

1

Figure 4: Migration of MDA-MB-231 cells transfected with AMOs over time. 2 million MDA-MB-231 cells were transfected with 0.05mM control AMOs, 0.005mM U6atac AMOs, and 0.05mM PTEN AMOs on a 6-well plate. After 24 hours, a scratch was made and migration was recorded every 24 hours for 2 days

Results: Computational Analysis

Figure 3: Proliferation of MDA-MB-231 cells transfected with AMOs over time. Cells were transfected with control AMO, U6atac AMO, and PTEN AMO, then plated onto a 96-well plate and incubated for 24 hours. An MTT assay was then carried out every 24 hours for 4 days, and the absorbance was measured at 550 nm. * p < 0.05 (the significance asterisk for each data set has its corresponding color). Representative of one biological replicate.

Results: Functional Assays

A Novel Post-Transcriptional Mechanism for Inhibiting the Expression of PTEN in Breast Cancer


Mitochondrial dysfunction associated with aspartame toxicity in kidney cells Author

Maria Ali

Advisor

Annette Vincent

Category

Biological Sciences

Abstract Artificial sweeteners are food additives that are widely consumed as sugar substitutes. Our previous work on the effect of aspartame on Madin-Darby canine kidney (MDCK) cells shows that short-term exposure to the artificial sweetener increases the production of reactive oxygen species (ROS) in the mitochondria of kidney cells. We hypothesize that ROS production in the mitochondria of aspartame-treated MDCK cells results from aspartame feeding into the malate aspartate shuttle. This would increase the activity of the shuttle and ultimately increase the production of ROS through the electron transport chain.



Mitochondrial Dysfunction Associated with Aspartame Toxicity in Kidney Cells Maria Ali, Annette Vincent

Biological Sciences Program, Carnegie Mellon University in Qatar

Malate Aspartate Shuttle [1]

Proposed link between Malate Aspartate Shuttle and NADPH Oxidase[2]

Methods
• Determine cytosolic and mitochondrial NADH/NADPH concentrations
• Determine changes in mitochondrial membrane potential
• Measure superoxide production in mitochondria
• Measure malate aspartate shuttle activity

Results
NADH concentrations measured through autofluorescence
• NADH concentration increased slightly after aspartame treatment, suggesting elevated activity of the malate aspartate shuttle
• The jump in NADH concentration with combined aminooxyacetate (AOA, a known inhibitor of the malate aspartate shuttle [3]) and aspartame treatment suggests that the malate aspartate shuttle regulates NADH/NAD+ concentrations

Mitochondrial membrane potential measured through JC-1 assay
• No significant change in mitochondrial membrane potential post treatment

Superoxide production measured through MitoSox Red assay
• Treatment with 250 μg/mL of aspartame causes elevated superoxide production in the mitochondria

Conclusion
Treatment of MDCK cells with 250 μg/mL of aspartame for 30 minutes causes elevated mitochondrial superoxide production. There is no change in the mitochondrial membrane potential. Future research should look at how aspartame affects NADPH oxidase activity.

References
1. Lodish, H., et al. (2007). Molecular Biology (6th ed.). New York, NY: Freeman and Company.
2. Dikalov, S. (2011). Cross talk between mitochondria and NADPH oxidases. Free Radical Biology and Medicine, 51(7), 1289-1301.
3. Støttrup, N. B., Løfgren, B., Birkler, R. D., Nielsen, J. M., Wang, L., Caldarone, C. A., ... & Nielsen, T. T. (2010). Inhibition of the malate–aspartate shuttle by preischaemic aminooxyacetate loading of the heart induces cardioprotection. Cardiovascular Research, 88(2), 257-266.


Is corn syrup used in processed products extracted from genetically modified corn? Authors

Sayeda Sakina Amir Saad Rasool

Advisor

Annette Vincent

Category

Biological Sciences

Abstract Corn syrup serves to soften the texture of food, prevent sugar crystallization, add flavoring, and increase volume of food. It is largely utilized in several food dressings including ketchups, tomato pastes and mayonnaise. Some companies use genetically modified (GM) crops to make corn syrup for use in food products. For our project, we plan to use several such condiments, some of which claim to be organic, to test the presence of GM corn in these food products. In order to do this, we intend to use a set of biochemical methods, including PCR and ELISA assay, to determine the presence of certain genes and proteins which are specific for either modified or native corn.



“For our project, we hypothesize that corn syrup used in these food products contains genetically modified corn.”

Table 1: This table shows genes of our interest for the experiment

To do that, we will use multiplex PCR to screen for the CRY3A, CRY1AB, Bar, and CP4EPSPSF genes, which are specific to the corn-containing food products. These genes differ in size from each other, which will make it easy to distinguish and analyze them on the gel.

The purpose of the investigation is to use several different processed tomato products, like tomato ketchup and tomato sauces, and test for the presence of genes and proteins specific for GM corn. The positive control for the experiment was corn, while the organic tomato pastes were the negative controls, as we expected them to only contain unmodified ingredients.

Corn syrup is a food sweetener made from corn starch. It serves to soften the texture of food, prevent sugar crystallization, add flavoring, and increase the volume of food. Some companies use genetically modified (GM) corn strains to produce the corn syrup used in food products.

Genetically Modified (GM) crops are cultivated plants that express agriculturally desirable traits like pest and herbicide resistance. These GM crops have been the topic of the decade, and controversies regarding their ill effects on the environment and public health have been spreading ever since. The main GM crops grown around the world today include corn, soy, canola, cotton, and rice [1].


ELISA

Figure 1: The figure on the left shows the cycle that was run on the PCR machine. Note that the annealing temperature depends on the set of primers used. The lowest annealing temperature used was 51 ºC.

Multiplex PCR

Figure 2: Spectrophotometric data of five food products, with absorbance measured at 230, 260, and 280 nm to determine the concentration and the purity ratio. (A) Heinz Ketchup (B) Al Ahli Tomato Paste (C) KDD Tomato Paste (D) Ami Tomato Paste (E) Mayonnaise

Spectrophotometric analysis

The final step was to use ELISA for identification of CP4 EPSPS enzyme in food samples. For this procedure, we used EnviroLogix QualiPlate kit (Catalog # AP 010). Before starting the procedure, samples were incubated with the Extraction Buffer overnight.

Table 3: This table illustrates the sizes of the amplified fragments for each primer pair used in the experiment.

Uniplex PCR

The initial part of our project involved isolating the genomic DNA from the positive and negative controls, as well as the unknown samples. For this purpose, we used the NucleoSpin Food protocol. After isolation, we performed spectrophotometric scans, measuring absorbance at 230, 260 and 280 nm. The absorbance at 260 nm corresponds to the DNA in the sample, and the ratio of the absorbance at 260 nm to the absorbance at 280 nm is used to determine the purity of the isolated DNA. An ideal ratio is between 1.7 and 1.9.

Genomic DNA Isolation & Spectrophotometric analysis


a. Different products of samples that were assayed
b. This is the OD that was measured at 450 nm. Duplicates were run.
c. The corrected OD values were calculated by subtracting the blank’s absorbance. For example, for the Heinz ketchup, the corrected absorbance is 0.018-0.003=0.005
d. This is the average of the two corrected absorbances for each sample. For example, for the Heinz Ketchup, the average is (0.003+0.007)/2 = 0.005
e. The positive control ratio is calculated by dividing the mean of the corrected absorbance of each sample by that of the positive control. For example, for Heinz Ketchup, the positive control ratio = 0.005/0.8855 = 0.0056

Table 8: This table illustrates the positive control ratio of the samples that were run on ELISA at 450 nm
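The footnote arithmetic above can be written out as a short sketch. The corrected absorbances (0.003 and 0.007) and the positive control mean (0.8855) are the values quoted in the footnotes; the function names are illustrative, and the 0.25 cutoff is an assumption read from the poster's ELISA discussion rather than from the kit documentation.

```python
def corrected_od(raw_od: float, blank_od: float) -> float:
    """Subtract the blank's absorbance from a raw OD reading at 450 nm."""
    return raw_od - blank_od


def positive_control_ratio(sample_corrected_ods, positive_control_mean: float) -> float:
    """Mean corrected OD of the sample duplicates divided by the positive control mean."""
    if positive_control_mean <= 0:
        raise ValueError("positive control mean must be positive")
    sample_mean = sum(sample_corrected_ods) / len(sample_corrected_ods)
    return sample_mean / positive_control_mean


# Heinz ketchup duplicates and positive control mean, as quoted in the footnotes.
heinz_ratio = positive_control_ratio([0.003, 0.007], 0.8855)

# Assumed cutoff: a ratio below 0.25 is read as less than 0.1% GM content.
ASSUMED_CUTOFF = 0.25
heinz_below_cutoff = heinz_ratio < ASSUMED_CUTOFF
print(round(heinz_ratio, 4), heinz_below_cutoff)
```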

Table 7: This table illustrates order of the samples loaded on the gel in Figure 3

ELISA

Figure 4: The five food products were loaded on a 1.2% mini agarose gel with ethidium bromide and run for two hours at 100 volts in 1x Tris-borate EDTA. The gel was imaged under ultraviolet light. A 1kb ladder and a 1bp ladder (NEB) were loaded on the gel to estimate the size of the products. No bands were seen in any lane.

Figure 4: Agarose gel analysis of the five food samples with five primer pairs


Table 6: This table illustrates order of samples loaded on the gel in Figure 2


Multiplex & Uniplex PCR

Figure 3: The five food products were loaded on a 1.2% mini agarose gel with ethidium bromide and run for two hours at 100 volts in 1x Tris-borate EDTA. The gel was imaged under ultraviolet light. A 1kb ladder and a 1bp ladder (NEB) were loaded on the gel to estimate the size of the products.

Figure 3: Agarose gel analysis of the five food samples with the promoter CAMV35S & Zein gene (internal control) primers

Uniplex PCR

a. Different products of samples that were assayed
b. These are the concentrations calculated from the Nanodrop
c. The purity ratio values were obtained from the Nanodrop

Table 5: This table illustrates the purity ratios and concentrations of the five food products calculated from the Nanodrop

a. Different products of samples that were assayed
b. These are the scan readings measured by the spectrophotometer
c. The purity ratio is calculated by dividing the absorbance at 260 nm by the absorbance at 280 nm. For example, for Heinz Ketchup, the purity ratio is 0.089/0.074 = 1.20.
d. The purity ratios and concentrations of four of the five samples could not be determined because the absorbance readings were negative
e. The concentrations were calculated by multiplying the absorbance at 260 nm by 50 ng/ul. For example, for Heinz ketchup, the concentration is 0.089x50 = 4.45 ng/ul
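As a minimal sketch of the calculations in footnotes c and e, the snippet below reproduces the Heinz ketchup example. The function names are illustrative; the 50 ng/ul factor and the 1.7 to 1.9 target range are the values stated on the poster, and rejecting non-positive readings mirrors footnote d.

```python
def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280 ratio used to judge the purity of isolated DNA."""
    if a260 <= 0 or a280 <= 0:
        # Mirrors footnote d: negative readings give no usable ratio.
        raise ValueError("absorbance readings must be positive")
    return a260 / a280


def dna_concentration(a260: float, factor_ng_per_ul: float = 50.0) -> float:
    """DNA concentration in ng/ul, taken as A260 multiplied by 50 ng/ul."""
    if a260 <= 0:
        raise ValueError("A260 must be positive")
    return a260 * factor_ng_per_ul


def in_ideal_range(ratio: float, low: float = 1.7, high: float = 1.9) -> bool:
    """True when the purity ratio falls in the range the poster calls ideal."""
    return low <= ratio <= high


# Heinz ketchup scan readings quoted in footnotes c and e.
heinz_ratio = purity_ratio(0.089, 0.074)
heinz_conc = dna_concentration(0.089)
print(round(heinz_ratio, 2), round(heinz_conc, 2), in_ideal_range(heinz_ratio))
```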

Table 4: This table illustrates the purity ratios and concentrations of the five food products calculated from the scans

Sayeda Sakina Amir, Saad Rasool, Annette Vincent Biological Sciences Program, Carnegie Mellon University Qatar

We would like to thank Ms. Maria Bernales and Ms. Maya Kemaldean for their guidance and support throughout the course of this whole project. Additionally, we are also very thankful to our peers for their help during the laboratory sessions and result analysis.

1. Key, S., Ma, J. K., & Drake, P. M. (2008, June 01). Genetically modified plants and human health. Retrieved March 13, 2018, from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2408621/
2. Doonan, C., & Vincent, A. (n.d.). Experimental Biochemistry, A manual for 03-344.
3. Food produced from insect protected corn line MON 810. (n.d.). Retrieved March 26, 2018, from http://www.foodstandards.gov.au/code/applications/pages/applicationa346/Default.aspx

The ELISA assay was performed to detect the presence of the CP4 EPSPS protein. The experiment was performed in duplicate to account for deviation in our results. Absorbance values for the blank were below 0.03 and the %CV was calculated to be 14.1%, which is below 15%. Therefore, we proceeded with calculating the positive control ratio. The positive control ratios for all the samples were below 0.25. Therefore, ELISA suggests that our samples contain below 0.1% of event 603 corn. However, based on the PCR results, we are not sure about the presence of corn genes in our samples, so we cannot draw any final conclusions about corn in our samples or about our hypothesis.

ELISA

In the case of uniplex PCR, we observe several bands in Lane 1, which had Heinz ketchup with CAMV35S primers. However, none of these bands corresponds to the size expected for the amplified fragment (158 bp). Therefore, we consider them to be a result of non-specific binding of the primer. In Lane 15, which is the positive control with Zein primers, we expected to see a band of 227 bp, and the band we observe is very close to the expected size. However, since this lane contains the positive control DNA, we can only conclude that our internal control primers were successful. In the case of multiplex PCR, we were only able to observe a high amount of primer dimers in Lanes 2-7 (all target primers) and lower amounts in Lanes 10-18 (bar primers), which is as expected given the lower amount of total primers used. However, we do not observe any bands for any of the primers, which indicates that PCR conditions might have to be re-adjusted until we are able to observe a band for the internal control in the positive control DNA.

Uniplex & Multiplex PCR

The results obtained from the spectrophotometer indicate that no DNA was isolated from any of the samples except the Heinz Ketchup. Since only a small volume of genomic DNA was available, we decided to proceed with the Nanodrop, as it requires a very low volume. Considering the purity ratios obtained through the Nanodrop, the DNA from KDD tomato paste shows a ratio close to the ideal purity ratio, while all the other samples indicate protein or RNA contamination.

Spectrophotometric analysis

Is Corn Syrup used in Processed Products extracted from Genetically Modified Corn?


Classification of bacterial diversity in Qatar ballast water samples using QIIME bioinformatics pipeline Author

Mohammad Osaama Bin Shehzad

Advisors

Annette Vincent Basem Shomar, Qatar Environment and Energy Research Institute

Category

Biological Sciences

Abstract According to the World Resources Institute, Qatar ranks among the top five countries expected to be affected by water crises in all sectors (domestic, commercial) by 2040. Currently, Qatar relies on desalinating seawater to meet its demands, which is why its purity must be strictly regulated. However, seawater is becoming polluted with toxic chemicals and foreign bacteria due to the disposal of ballast water from cargo ships into the sea. Cargo ships carry ballast water for stabilization over long voyages, but it is chemically treated with biocides to prevent the formation of biofilms by the bacterial community naturally present in the ballast water. Therefore, this project examines the microbiome diversity in ballast water, with the end goal of designing harmless viruses called bacteriophages that will specifically regulate bacterial species in ballast water. Samples were collected from ballast water, and variable regions of the 16S rRNA gene in bacterial species were amplified and sequenced using the Ion 16S Metagenomics Kit. Once DNA reads were obtained as FASTQ files, they were computationally analyzed using the QIIME microbiome bioinformatics pipeline. Using trained classifiers in QIIME and the Greengenes database, the different bacterial species and their frequencies were recorded in a taxonomy table, and a phylogenetic tree was constructed to identify the ancestral relationships between them. In addition, beta diversity analysis was performed to measure the diversity in bacterial populations between different samples. Now that the different species and their frequencies in ballast water are known, bacteriophages can be designed to specifically target and regulate bacterial species in the water instead of opting for toxic chemical treatments.



Ballast water - stabilizes cargo ships - contains foreign bacterial species - contains toxic chemicals

Ballast water

Ballast Water Samples

Solution

Persian Gulf (Qatar)

Classify bacterial species present in ballast water so that they can be treated with bacteriophages (viruses which only kill bacteria).

Sequence 16S rRNA gene in ballast water to identify different bacterial species using QIIME pipeline

Cargo ship arrives in Qatar - releases toxic ballast water into the sea

Cargo Ship

Effects on seawater in Qatar - polluted by toxins and foreign bacteria from ballast water - purity of water and its ecosystem is disrupted - challenges in purifying water for domestic/commercial purposes

Foreign seas

Ballast water from foreign seas loaded into cargo ships

OTU 3

PCoA axis 1

OTU 2

OTU 1

Phylogenetic Tree

OTU # 1

Taxonomy Table Species Species Sample Type Count Count

3. Apply computationally trained classifiers to identify types of bacteria in all OTUs using Naive Bayes. Summarize in a Taxonomy Table

5. Analyze bacterial diversity (beta diversity) between each sample using Bray-Curtis dissimilarity to construct a Principal Coordinates Analysis (PCoA) graph

4. Construct a “phylogenetic tree” using FastTree algorithm to identify ancestral relationships between bacterial community in ballast water
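For step 5, the Bray-Curtis dissimilarity between two samples can be sketched in a few lines. This is a plain-Python illustration of the formula only, not the QIIME implementation, and the per-taxon counts in the example are hypothetical rather than taken from the ballast water data.

```python
def bray_curtis(counts_a, counts_b) -> float:
    """Bray-Curtis dissimilarity between two samples.

    Each argument lists the counts of the same taxa in the same order.
    The result is 0.0 for identical communities and 1.0 when no taxa are shared.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both samples must cover the same taxa")
    total = sum(counts_a) + sum(counts_b)
    if total == 0:
        raise ValueError("at least one sample must contain counts")
    difference = sum(abs(a - b) for a, b in zip(counts_a, counts_b))
    return difference / total


# Hypothetical counts for three taxa in two samples, for illustration only.
sample_1 = [6, 1, 0]
sample_2 = [2, 0, 4]
distance = bray_curtis(sample_1, sample_2)
print(round(distance, 3))
```

Computing this value for every pair of samples gives the distance matrix that a PCoA plot is built from.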

OTU # 2

---AAT---GGT-----AAG---GAT-----AAC---GTT---

DNA reads

---AAT---GGT-----AAG---GAT-----TTA---GGA-----TTC---GGC---

Each FASTQ file contains thousands of DNA reads from 16S rRNA gene of bacteria in ballast water

sample1 sample2 .fastq .fastq

---TTA---GGA-----TTC---GGC-----TAT---CGC---

2. Group very similar DNA reads together into “OTUs” for each sample

Ballast water samples ready for sequencing using Ion 16S Metagenomics Kit

Sample Sample Sample 1 2 3

1. Sequence 16S rRNA gene in ballast water samples and store DNA reads in human readable FASTQ formatted files
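Since the reads are stored as FASTQ text, a minimal reader can illustrate the format: each record has a header line starting with "@", the sequence, a separator line starting with "+", and a quality string of the same length as the sequence. The sketch below assumes simple four-line records; the read names and sequences are made up for illustration and do not come from the ballast water samples.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality) tuples from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], sequence, quality


# Two made-up reads standing in for the contents of a sample FASTQ file.
example = (
    "@read1\nAATGGTAAGGAT\n+\n" + "I" * 12 + "\n"
    "@read2\nTTAGGATTCGGC\n+\n" + "I" * 12 + "\n"
)
records = list(read_fastq(io.StringIO(example)))
print(len(records), records[0][1])
```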

Methods Overview


Now that we know different bacterial species in ballast water, we can design viruses called “bacteriophages” which will specifically regulate bacterial population without the use of toxic chemicals. This will preserve both purity and ecology of Qatar seawater.

Future Directions

Fig 2.0 maps the bacterial diversity among ballast water samples on a PCoA graph. The further apart the samples are, the more their bacterial populations differ.

Legend

Figure 2.0 - Beta diversity in ballast water samples

Fig 1.0 shows the ancestral relationships between the different bacterial species present in ballast water samples. The greater the phylogenetic distance, the further apart the species are at the genus level.

G4

Figure 1.0 - Phylogenetic tree of bacterial species in ballast water samples

16S rRNA variation | Sample barcode # | Frequency count | Taxonomy level
G1 | 12 | 6 | k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Oceanospirillales;f__Alcanivoracaceae;g__Alcanivorax;s__
G2 | 14 | 1 | Unassigned;__;__;__;__;__;__
G3 | 15 | 6 | k__Bacteria;p__Proteobacteria;c__Epsilonproteobacteria;o__Campylobacterales;f__Helicobacteraceae;g__Sulfurimonas;s__
G4 | 24 | 2 | k__Bacteria;p__Proteobacteria;c__Epsilonproteobacteria;o__Campylobacterales;f__Helicobacteraceae;g__Sulfurimonas;s__
G5 | 0 | 53 | k__Bacteria;__;__;__;__;__;__

Table 1.0 - Species count and species taxon level in ballast water samples

Results

Results & Future Directions

Classification of bacterial diversity in Qatar ballast water samples using QIIME bioinformatics pipeline

Mohammad Osaama Bin Shehzad1, Dr. Annette Vincent1, Dr. Basem Shomar2

1Carnegie Mellon University in Qatar, 2Qatar Environment and Energy Research Institute

Introduction

Problem

#1: Rank of Qatar as a Water Stressed Country by World Resources Institute (WRI). Qatar relies heavily on purifying seawater to meet its commercial and domestic water supply.

Issue


Identification of post-transcriptional regulatory factors of PTEN expression in breast cancer cells

Author

Reem Elasad

Advisor

Ihab Younis

Category

Biological Sciences

Abstract One of the most commonly dysregulated pathways in cancer is the phosphatidylinositol-3-kinase (PI3K)/AKT oncogenic signaling pathway. PI3K generates PIP3 at the plasma membrane, which subsequently activates the AKT signaling pathway. AKT directly regulates various biological processes such as cell proliferation, survival, and growth. The tumor suppressor protein phosphatase and tensin homolog (PTEN) directly antagonizes PI3K function by dephosphorylating PIP3 to PIP2, and thus abrogates relaying of the PIP3 signal. The lipid phosphatase function of PTEN contributes to its potent function as a tumor suppressor, and PTEN is so far the only known lipid phosphatase that counteracts the PI3K pathway. PTEN has been directly associated with the initiation and progression of breast cancer through several mechanisms, including somatic and germline mutations or silencing due to methylation of the PTEN promoter. However, although the first intron of the PTEN gene is a minor intron, and minor introns are known to be “molecular switches” because their splicing is highly regulated, the minor intron regulation of PTEN in cancer has not yet been investigated. Herein, we hypothesize that in MDA-MB-231 breast cancer cells, PTEN is post-transcriptionally regulated via splicing of its minor intron. We analyzed the expression of more than 600 RNA-binding proteins (RBPs) relative to PTEN levels in patient samples from The Cancer Genome Atlas (TCGA). Based on RBP function, expression (Pearson value), and binding site, the list of RBPs was narrowed down to: [1] SETX, a minor intron splicing enhancer; [2] EXOSC4, which binds the 3’ UTR at AU-rich motifs; [3] CSTF2 and [4] CSTF2T, which bind the 3’ UTR at GU-rich motifs; and [5] HNRNPA1, an alternative splicing regulatory RBP.
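The expression screen described above ranks RBPs by how strongly their expression tracks PTEN (the "Pearson value"). A minimal Python sketch of that kind of ranking follows. The expression values and the names RBP_A, RBP_B and RBP_C are hypothetical placeholders, not TCGA data, and ranking by absolute correlation is an assumption about how such a list might be ordered.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(
    target: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Sort candidates by the absolute value of their correlation with the target."""
    scores = [(name, pearson(target, values)) for name, values in candidates.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical expression levels across five patient samples.
pten = [1.0, 2.0, 3.0, 4.0, 5.0]
rbps = {
    "RBP_A": [2.0, 4.0, 6.0, 8.0, 10.0],
    "RBP_B": [5.0, 3.0, 4.0, 1.0, 2.0],
    "RBP_C": [3.0, 1.0, 4.0, 1.0, 5.0],
}

ranked = rank_by_correlation(pten, rbps)
for name, r in ranked:
    print(name, round(r, 3))
```

A strongly negative value is kept near the top of the list on purpose: an RBP whose expression falls as PTEN rises is as interesting a regulatory candidate as one that rises with it.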




Investigating the presence of 35S promoter, CRY1A(b), Bar and Pat genes as markers for genetic modification in three commercial Zea mays (corn) food products

Authors

Reem Elasad, Dina Altarawneh

Advisor

Annette Vincent

Category

Biological Sciences

Abstract As public interest in non-genetically modified organisms (non-GMOs) has grown substantially, there is increasing demand to give questioning customers scientific proof of the integrity of food labels. This project aims to investigate the presence of genetically modified DNA fragments in commercial corn-containing foods. To detect GM fragments, if present, known templates containing genetically engineered DNA will be used as markers. These markers are the P35S promoter and the Cry1A(b), Pat and Bar genes, which have previously been inserted and expressed in crops to enhance insect resistance (Cry and Bar genes) and increase tolerance to the herbicide glufosinate ammonium (Pat and Bar genes). Enzyme-linked immunosorbent assay (ELISA) will be used as a second approach to test for GMO content in food samples. The hypothesis is that genes corresponding to genetically modified corn will be detected in food products that are labeled ‘organic.’ Because ANZFA classifies the Cry1b, Bar and Pat genes as safe for human consumption, there are no requirements or regulations for labelling products that contain these genes. In addition, the cry1Ab, PAT, and Bar genes all encode essential characteristics for maize products.



Investigating the presence of 35S promoter, CRY1A(b), Bar and Pat genes as markers for genetic modification in three commercial Zea mays (corn) food products

Reem Elasad and Dina Altarawneh, Advisor: Dr. Annette Vincent


Introduction

This project uses the food products mentioned above to screen for three genes widely found in genetically modified corn: cry1Ab, Bar and Pat. In addition, we will screen for the P35S promoter. The cry1Ab gene encodes Cry1Ab, a protein found in insect-protected Bt-176 corn, which binds to specific receptors on the intestinal lining of lepidopteran insects and ruptures the cells. Insects stop feeding within two hours of a first bite and, if enough toxin was eaten, die within two to three days [a, b].

Methods

DNA isolation: To analyse DNA samples, assess the integrity of food labels and detect GMO markers, DNA must first be extracted from the food samples. DNA isolation was carried out using the NucleoSpin Food DNA extraction kit (Macherey-Nagel). 200 mg of each food sample was first homogenized, and cell lysis was performed using 550 µL CF at 65˚C; 10 µL proteinase K was then added (65˚C for 30 min). This was followed by centrifugation at >10,000 x g for 10 min, and the clear supernatant (1 vol) was added to 1 vol C4 and 1 vol ethanol. The DNA was bound to a spin column and washed 3 times. It was then eluted in 100 µL CE at 70˚C for 5 min and centrifuged at 11,000 x g for 2 min.

Spectrophotometric readings: Scans were carried out over the 200-400 nm region to assess the purity and concentration of the DNA extract in each sample, specifically at 230 nm, where organic contaminants absorb; 260 nm, where nucleic acids absorb; and 280 nm, where proteins absorb optimally.

Bead-based, uniplex and multiplex PCR: The following PCR reactions were performed and run on a 1.2% agarose gel in 1X TBE: 1) Bead-based PCR, to amplify a promoter region (35S) as a marker for genetic modification, as well as the internal control zein, which is hypothesized to be present in all corn samples. 2) Multiplex PCR, to detect multiple genes (P35S promoter, Cry1A(b), Pat and Bar genes). The thermocycler conditions were set to the following for the multiplex PCR

ELISA: Protein was extracted by solubilizing 5 g of each food sample (Organic Cornmeal, Non-organic Cornmeal and Crispy Corn Curls) in 20 mL of extraction buffer. The protein being detected is PAT (corn). The kit used to perform the ELISA was the QualiPlate kit for LibertyLink PAT (Catalog# AP 014). Absorbance readings were measured at 450 nm.

Data

Figure 1: DNA spectrophotometer readings plotted as absorbance vs. wavelength (nm) for positive and negative controls, Quaker cornmeal and crispy corn curl samples.

Figure 1a: Positive control absorbance vs. wavelength plot (200-400 nm range) at 1.0 nm interval

Figure 1b: Negative control absorbance vs. wavelength plot (200-400 nm range) at 1.0 nm interval

Figure 1c: Quaker cornmeal absorbance vs. wavelength plot (200-400 nm range) at 1.0 nm interval

Figure 1d: Crispy corn curls absorbance vs. wavelength plot (200-400 nm range) at 1.0 nm interval

Table 1: Absorbance readings at 230, 260, 280 nm and DNA concentration for four samples: Positive Control (GMO), Negative Control (Organic Cornmeal), Quaker Cornmeal and Crispy Corn Curls [1]

Sample calculation:

An absorbance of 1 at 260 nm corresponds to 50 µg/mL of DNA, so for an A260 reading of 1.01: DNA concentration = 1.01 x 50 µg/mL = 50.5 µg/mL
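The sample calculation can be written as a small Python helper. This is a sketch only: the 50 µg/mL per absorbance unit factor comes from the calculation above, while the 230 nm and 280 nm readings used in the example are hypothetical and the function names are illustrative.

```python
from typing import Tuple

# Conversion factor from the sample calculation: 1 absorbance unit at 260 nm = 50 µg/mL DNA.
UG_PER_ML_PER_A260 = 50.0


def dna_concentration(a260: float, dilution_factor: float = 1.0) -> float:
    """DNA concentration in µg/mL from an absorbance reading at 260 nm."""
    return a260 * UG_PER_ML_PER_A260 * dilution_factor


def purity_ratios(a230: float, a260: float, a280: float) -> Tuple[float, float]:
    """Return (A260/A280, A260/A230), the two ratios commonly used to judge purity."""
    return a260 / a280, a260 / a230


concentration = dna_concentration(1.01)
print(f"{concentration:.1f} µg/mL")  # the poster's worked example: 50.5 µg/mL

# Hypothetical 230 nm and 280 nm readings, for illustration only.
ratio_280, ratio_230 = purity_ratios(a230=0.50, a260=1.01, a280=0.55)
print(round(ratio_280, 2), round(ratio_230, 2))
```

As a rule of thumb, an A260/A280 ratio near 1.8 and an A260/A230 ratio of about 2.0 or higher are usually taken to indicate DNA with little protein or organic contamination, which is why the 230 nm and 280 nm readings are recorded alongside 260 nm.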

Table 2: summary of genes tested for each PCR reaction tube with their expected and observed sizes

Figure 2 illustrates a 1.2% agarose gel of PCR reactions set up with 50 ng corn DNA, PCR buffer (10 mM Tris-HCl pH 9.0, 50 mM KCl, 1.5 mM MgCl2), 0.2 mM dNTP, 2.5 U/bead Taq, 2.5 µL Zein3 and 2.5 µL Zein4 primers, up to 25 µL total volume. 10 µL of each reaction plus 3 µL of 6X loading dye were loaded on the gel and run in 1X TBE buffer for 80 min at 100 V. Bands smaller than 100 bp are assumed to be primer dimers.

Figure 3: Multiplex PCR with primers to amplify the 35S promoter, CRY1A(b), Bar and Pat genes, and uniplex PCR to amplify the Pat gene. Samples analysed are positive control (GMO), negative control (organic cornmeal), Quaker cornmeal, crispy corn curls, and PCR control.

Figure 3 illustrates a 1.2% agarose gel of PCR reactions set up with 50 ng corn DNA, 1X PCR buffer (100 mM Tris-HCl, 15 mM MgCl2, 500 mM KCl, pH 8.3), 0.2 mM dNTP, 2.5 U Taq, 0.2 µM 35S promoter primer pair, 0.3 µM CRY1A(b) primer pair, 0.4 µM Bar primers, and 0.5 µM Pat primer pair, up to 25 µL total volume. 10 µL of each reaction plus 3 µL of 6X loading dye were loaded on the gel and run in 1X TBE buffer for 80 min at 100 V. Bands of 100 bp and below are assumed to be primer dimers due to their small size.

Figure 1: Process of ingestion of the Cry1b toxin produced by Bacillus thuringiensis [f].

The DBT418 [f] corn line, an insect-protected and herbicide-tolerant line that carries the Bar gene, encodes two proteins, Cry1Ac and PAT. The former, Cry1Ac, is a protein produced by kurstaki, a subspecies of the Bt bacterium, and has insecticidal activity against lepidopteran insects. The PAT protein is the enzyme phosphinothricin acetyltransferase, which inactivates phosphinothricin (PPT), the active component of glufosinate ammonium [c]. Moreover, the Pat gene, which also encodes the PAT protein, is found in corn line T25, a genetically modified corn tolerant to glufosinate ammonium herbicide [d]. For genetically modified foods, we expect to find the P35S promoter in all samples, because it is the most widely used promoter, and we expect to find at least one of the three genes mentioned above in the food samples being tested. However, it is possible to observe the promoter only and none of the three genes (cry1Ab, PAT and Bar), because there are more than 229 GM corn events to date with different gene inserts [e].

Primers used and their annealing temperatures (˚C)

Figure 2: Bead-based PCR for the 35S promoter and zein genes on a 1.2% agarose gel

Figure 4: ELISA readings at 450 nm of four food samples and a blank. ELISA was performed in duplicate to detect the PAT protein using the QualiPlate kit for LibertyLink PAT (Catalog# AP 014) and a microplate reader.

Analysis

Figure 2 and Table 2: Lanes 1, 4, 5, 6 and 8 all show the expected band, but there are also additional bands, such as the 345 bp band in lanes 1 and 4, which may be due to non-specific binding of the 35S promoter primers to DNA. Bands <100 bp are accepted as likely primer dimers.

Figure 3 and Table 2: The multiplex PCR was successful in the sense that the PCR reaction did occur, but the uniplex PCR did not show any bands (lanes 6 through 10). This could be because the annealing temperature (52˚C) was higher than the optimum (49-50˚C) for the Pat primers to anneal to template DNA; the thermocycler was shared with a group that required the higher annealing temperature. Lane 1 in Figure 3 did not show results, which could be a result of not adding primers or adding too little DNA sample. The expected size for Pat was seen in lanes 2, 3 and 4, but the other genes in the multiplex did not show any positive results. A possible reason is that the annealing temperature chosen was not optimal for all the genes, so their amplification was not successful. A follow-up would be to run an annealing temperature gradient to find the optimal temperature for all 4 genes.

ELISA, Figure 4: The ELISA test showed negative results for all samples tested, as shown by an average-minus-blank value <0.2. Thus, from ELISA alone, it would be concluded that the samples are all non-GMO, as PAT protein was not detected based on the low absorbances measured. However, the ELISA result is not reliable, and the procedure should be repeated to troubleshoot.

The labelling of the food products can be summarized as follows (based on PCR data only, excluding ELISA):

1) GM event NK603 corn (positive control): The multiplex PCR in Figure 3 is inconclusive due to the absence of bands. Meanwhile, the bead-based PCR in Figure 2 clearly shows a band of size 123 bp, which corresponds to the expected size of the 35S promoter. The presence of the promoter is an indicator of genetic modification, confirming that the positive control is indeed a GMO. In Figure 2, another band of an unexpected size of 345 bp is observed, which is attributed to non-specific amplification of another DNA stretch by the 35S promoter primers.

2) Negative control (claimed to be organic cornmeal): The bead-based PCR in Figure 2 does not show the expected band (lane 2); however, in lane 6 a band of a size very similar to the internal gene (maize, size 277 bp) is observed, indicating that the PCR was successful and that the absence of a band in lane 2 is due to the absence of the P35S promoter. However, the multiplex PCR in Figure 3 shows a band of size 186 bp, corresponding to the expected size of Pat; thus the sample appears to be genetically modified, with an insert cassette that has a promoter other than P35S. To conclude, the results obtained contradict the food label that classifies the food product as organic.

3) Quaker cornmeal: This product doesn’t have an ‘organic’ or ‘non-GM’ label. Results from the multiplex PCR show that Pat was present in the sample (186 bp), indicating that the product is genetically modified. Results from the bead-based PCR in Figure 2 are inconclusive, because the reaction mixture did not contain enough DNA to successfully amplify the target genes, as shown by the absence of a band in lane 7 for the maize internal gene.

4) Crispy corn curls: This product doesn’t have an ‘organic’ or ‘non-GM’ label. Results from the multiplex PCR show that Pat was present in the sample, with a band of 186 bp (the expected size of Pat), indicating that the product is genetically modified. Results from the bead-based PCR in Figure 2 show the presence of an additional GMO marker, the 35S promoter, with a size of 123 bp. There is also an unexpected band of 345 bp due to non-specific amplification.
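The ELISA decision rule used in the analysis (average of the duplicate readings minus the blank, read as negative when below 0.2) can be sketched in Python as follows. The absorbance values and sample names are hypothetical, and labelling everything at or above the cutoff simply as "above cutoff" is an assumption, since the analysis only states the negative criterion.

```python
from typing import Dict, Sequence, Tuple

# Threshold taken from the analysis text: blank-corrected values below 0.2 are negative.
NEGATIVE_CUTOFF = 0.2


def elisa_call(
    duplicates: Sequence[float], blank: float, cutoff: float = NEGATIVE_CUTOFF
) -> Tuple[float, str]:
    """Return the blank-corrected mean absorbance (450 nm) and a qualitative call."""
    if not duplicates:
        raise ValueError("at least one reading is required")
    corrected = sum(duplicates) / len(duplicates) - blank
    label = "negative" if corrected < cutoff else "above cutoff"
    return corrected, label


# Hypothetical duplicate readings at 450 nm; not the values behind Figure 4.
blank_reading = 0.10
readings: Dict[str, Sequence[float]] = {
    "sample_low": (0.12, 0.14),
    "sample_high": (0.95, 1.05),
}

calls = {name: elisa_call(values, blank_reading) for name, values in readings.items()}
for name, (corrected, label) in calls.items():
    print(name, round(corrected, 3), label)
```

Averaging the duplicates before subtracting the blank keeps a single noisy well from deciding the call on its own, which matters when all readings sit close to the cutoff.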

Conclusion

From multiple tests, including bead-based, uniplex and multiplex PCR as well as ELISA analysis, it can be concluded that the hypothesis stating that some ‘organic’-labelled food products will contain GM genes may hold true. This is seen from the negative control (cornmeal), which was labelled organic but for which multiplex PCR showed bands corresponding to the Pat gene. To confirm these results, duplicates and repeats of the experiments, including ELISA, are needed. To optimise results, the uniplex PCR with Pat primers can be performed again at the optimal annealing temperature (49-50˚C), and a multiplex PCR with an annealing temperature gradient should also be performed.

References

a) Chowdhury, E. H., Kuribara, H., Hino, A., Sultana, P., Mikami, O., Shimada, N., . . . Nakajima, Y. (2003). Detection of corn intrinsic and recombinant DNA fragments and Cry1Ab protein in the gastrointestinal contents of pigs fed genetically modified corn Bt11. Journal of Animal Science, 81(10), 2546-2551. doi:10.2527/2003.81102546x
b) Gewin, V. (2003). Genetically Modified Corn - Environmental Benefits and Risks. PLoS Biol 1(1): e8. https://doi.org/10.1371/journal.pbio.0000008
c) Application A380 - Food protected from insect-protected and glufosinate ammonium-tolerant DBT 418 corn. (n.d.). Retrieved March 26, 2018, from http://www.foodstandards.gov.au/code/applications/pages/applicationa380/index.aspx
d) Wehrmann, A., Vliet, A. V., Opsomer, C., Botterman, J., & Schulz, A. (1996). The similarities of bar and pat gene products make them equally applicable for plant engineers. Nature Biotechnology, 14(10), 1274-1278. doi:10.1038/nbt1096-1274
e) Advanced Search: 229 events found. (n.d.). Retrieved March 26, 2018, from http://www.isaaa.org/gmapprovaldatabase/advsearch/default.asp?CropID=6&TraitTypeID=Any&DeveloperID=Any&CountryID=Any&ApprovalTypeID=Any
f) Bacteria - 2015 03-25 (AGB 12022). (2015, March 27). Retrieved March 26, 2018, from https://www.slideshare.net/Suvanthinis/2015-0325-agb-12022


MAPK14 splicing as a novel biomarker in regulating breast cancer

Authors

Nourhan ElKhatib, Ettaib El Marabti

Advisor

Ihab Younis

Category

Biological Sciences

Abstract Splicing in eukaryotes is the removal of intervening sequences, called introns, from the pre-mRNA. The majority of introns are spliced out by small nuclear ribonucleoproteins (snRNPs), which are part of the spliceosome. A small number of introns, called ‘minor’ introns, have been identified; they comprise less than 0.4% of all introns in humans. These introns have unique sequences that are recognized by the U11, U12, U4atac, U5 and U6atac snRNPs, which are much less abundant than those of the ‘major’ spliceosome. Such specialized introns have been shown to provide a unique opportunity for regulation of the genes that contain them. Minor intron-containing genes play key roles in the cell cycle, transformation, DNA damage repair and signal transduction. Misregulation of any of these functions is associated with diseases, including cancer. Breast cancer is one of the leading causes of cancer mortality in females worldwide, and Qatar’s incidence rate of breast cancer is increasing dramatically. Using RNA-seq data from 1200 breast cancer samples, we have previously shown that splicing of some minor intron-containing genes, including MAPK14, is dysregulated. The MAPK14 gene encodes the p38MAPK protein, a stress-induced mitogen-activated protein kinase that relays different extracellular stimuli and plays a key role in metastasis, a hallmark of cancer. To identify the molecular mechanisms that regulate the expression of MAPK14 post-transcriptionally, specifically the splicing of introns in breast cancer cells, we used Antisense Morpholino Oligonucleotides (AMOs) to regulate MAPK14’s minor intron splicing. We then assessed the effect on breast cancer cell behavior using MTT and migration assays. We also used RNA-seq data and computational analysis to curate a database of RNA-binding proteins that could potentially impact minor intron splicing in breast cancer. In conclusion, we identified MAPK14’s minor intron as a novel biomarker that is regulated in breast cancer cells. Our data also shed new light on its positive role in carcinogenesis, although the mechanism remains unknown.



Abstract

Figure 1: Visualization suggesting that one minor intron is sufficient to be detrimental to mRNA. Low levels of U6atac or RBPs may lead to degradation or production of an alternatively spliced isoform, while high levels of U6atac lead to immediate splicing and production of mRNA that can be translated into a functional protein. This may occur during transcription by RNA pol II or post-transcriptionally by MAPK14.

To identify the molecular mechanisms that regulate the expression of MAPK14 post-transcriptionally, the splicing of minor introns

Introduction

[Figure residue: two bar charts comparing Control AMO, MAPK14 (5 µM) and MAPK14 (10 µM)]

Methodology

Experimental design of the project on MDA-MB-231 triple-negative breast cancer cells. The control and MAPK14 AMOs were used to optimize the concentrations that should be used and the time at which transfection has the highest effect on splicing of MAPK14 and the production of a functional p38MAPK. Control, MAPK14 and U6atac AMOs were also used to determine the effect of inhibiting splicing of minor intron-containing genes on the cells’ metabolic activity and metastasis.

[Figure residue: gel lanes labeled 48 hr control and 48 hr MAPK14 AMO; growth curve series for Control AMO, U6atac AMO and MAPK14 AMO plotted against days after transfection (Day 1 to Day 5); migration panels for MAPK14, U6atac and Control on Day 1 to Day 3]

Figure 2: RT-PCR of the MAPK14 full panel representing the splicing efficiency of its minor intron 8. RNA samples, extracted from MDA-MB-231 cells 24 and 48 hours after transfection with 5 or 10 µL of 1 mM AMOs, were converted to cDNA, amplified with different primers, and run on a 1.5% agarose Midi-gel.

Figure 3: RT-PCR of the MAPK14 full panel representing the splicing efficiency of its minor intron 8. RNA samples, extracted from MDA-MB-231 cells 24 and 48 hours after transfection with 10 µL of 1 mM AMOs, were converted to cDNA, amplified with different primers, and run on a 1.5% agarose Midi-gel.

Figure 4: Growth curve of MDA-MB-231 cells after transfection with AMOs, assessing cell viability over 5 days with the MTT assay. Cells were treated with control, U6atac and MAPK14 AMOs and then plated on a 96-well plate at 5000 cells per well. Absorbance was measured at 550 nm using a microtiter plate reader. Error bars represent 3 replicates, except for day 5 of the control.

Figure 5: Migration of MDA-MB-231 cells over a period of 3 days after transfection with different AMO treatments. Two million cells were transfected with 1 mM control, 0.1 mM U6atac and 1 mM MAPK14 AMOs. Pictures were taken using an EVOS Digital Microscope at 20% light saturation with a 4x objective.


Nourhan ElKhatib, Ettaib El Marabti, Ihab Younis Carnegie Mellon University, Biological Sciences Program, Qatar

Conclusions

Unspliced Ex8-Int8 was high initially in MDA-MB-231 cells (Fig. 2)
Splicing inhibition of minor intron 8 was more efficient with 10 µM of MAPK14 AMO than with 5 µM (Fig. 2)
Spliced Ex8-Ex9a was not present after 48 hrs (Fig. 3)
U6atac AMO leads to the lowest cell proliferation, as it inhibits splicing of MAPK14 and 600 other genes (Fig. 4)
Inhibiting MAPK14 decreases cell proliferation and viability at day 5 (Fig. 4)
Inhibiting MAPK14 splicing with MAPK14 AMO slows down MDA-MB-231 migration (Fig. 5)
MAPK14 AMO was also effective at day 5, as no functional protein was made (Fig. 6)

References

1. Introns and Exons: mRNA Processing. (n.d.). Retrieved November 18, 2017.
2. Younis, I., Dittmar, K., Wang, W., Foley, S. W., Berg, M. G., Hu, K. Y., ... & Dreyfuss, G. (2013). Minor introns are embedded molecular switches regulated by highly unstable U6atac snRNA. Elife, 2, e00780.
3. Liang, N., Zhong, R., Hou, X., Zhao, G., Ma, S., Cheng, G., & Liu, X. (2015). Ataxia-telangiectasia mutated (ATM) participates in the regulation of ionizing radiation-induced cell death via MAPK14 in lung cancer H1299 cells. Cell Proliferation, 48(5), 561-572. doi:10.1111/cpr.12203
4. Study exploring breast cancer screening practices amongst Arabic women living in the state of Qatar. (n.d.). Retrieved August 20, 2017.
5. (n.d.). Retrieved March 19, 2018, from http://lulab.life.tsinghua.edu.cn/clipdb/targetsearch.php

Acknowledgements

Dr. Mohamed Bouaouina, Maria Brenadette Bernales, Maya Kemaldean, Nada Abdul Khalique. Funded by Qatar Foundation and Carnegie Mellon University Qatar Seed Grant

v v v

v v

v v

v ELAV1 binds to the 3’-UTR region of mRNAs to increase their stability. It is activated by MAPK14 gene. v CPSF6 and CSTF2 are required for 3’ RNA cleavage and polyadenylation processing. v AGO2 is required for RNA-mediated gene silencing (RNAi) by the RNA-induced silencing complex (RISC)

Figure 6: Western blot of proteins extracted with Laemmli buffer from MDA-MB-231 cells 5 days after transfection with control, U6atac and MAPK14 AMOs. Proteins were heated at 95ºC, spun down and loaded on an SDS-PAGE gel with 12.5% resolving and 4% stacking gels, then run at 120 V for 1.5 hrs. The gel was transferred to a nitrocellulose membrane and stained overnight with anti-MAPK14, anti-p-MAPK14 and rabbit anti-HPRT antibodies. The membrane was then stained with anti-rabbit and anti-mouse antibodies and imaged.

MAPK14 splicing as a Novel Biomarker in Breast cancer



Analysis of genetically modified (GM) marker genes in maize-based products using multiplex PCR and ELISA

Authors

AlReem Johar, Farah AbdelAziz, Mohammad Osaama Shehzad

Advisor

Annette Vincent

Category

Biological Sciences

Abstract The genetic modification of crops allows them to survive harsh environments and to grow in larger quantities with higher nutritional value. While genetically modified (GM) food crops like maize, soy and rice have a selective advantage, they have also sparked debate about whether they are safe for human consumption. Therefore, this project aimed to test two maize-based products, Ortega Yellow Corn Taco Shells and Libby’s Whole Kernel Sweet Corn, for the presence of GM markers at both the protein and gene level. It was hypothesized that both the canned corn and the taco shells would be GM, as both food products originate from non-European Union countries, which do not hold as strict controls over GM food as European Union countries. Samples were homogenized and their DNA was purified using the NucleoSpin® Food kit, followed by PCR amplification of the CaMV 35S promoter, which is the most commonly found promoter in GM maize food. Multiplex PCR amplification of the genes cry9B, zein, and pat, which are found in GM events in corn, was also performed. At the protein level, ELISA was performed to test for the presence of the PAT protein expressed by pat. Both the taco shells and the canned corn were found to be GM based on PCR, as bands were observed at 186 bp corresponding to the pat gene, yet no results were obtained from ELISA, which could have been due to a low concentration of PAT or its degradation and denaturation. Thus, the combination of gene-based and protein-based experiments allowed for the detection of GM markers in both food products, in line with the hypothesis.




Live bacterial detection using RNA extraction from ballast water sample

Author

AlReem Johar

Advisor

Annette Vincent

Category

Biological Sciences

Abstract Previous studies have demonstrated the presence of diverse bacteria in ballast water samples after treatment for safe reuse. Yet it remains difficult to judge whether the remaining bacteria are viable or not. Therefore, this project aims to increase our knowledge about the viability of the bacteria present in ballast water by detecting their RNA using RT-PCR. The detected RNA will then be correlated with the number of viable cells recorded previously to establish a clear relationship between the presence of bacterial microorganisms and their viability in ballast water after treatment.



Viable Bacteria Detection Using RNA Analysis From Ballast Water Samples

Alreem Ahmed Johar(a), Basem Shomar(b) and Annette Vincent(a)

(a) Biological Sciences Program, Carnegie Mellon University in Qatar; (b) Qatar Environment and Energy Research Institute (QEERI)

Data


Introduction

To reduce the global spread of invasive aquatic species, international regulations require reductions in the number of organisms in ballast water discharged by ships. Thus, different treatment systems were developed and approved through an international procedure. However, these treatment systems are not able to remove the bacterial microorganisms present in ballast water. This research therefore aims to examine the viability of the bacterial microorganisms present in treated ballast water samples. To distinguish between viable and non-viable microorganisms, the RNA of these microorganisms will be used as a viability indicator in an RT-PCR technique that detects the expression of RNA.


Results

Extraction of DNA and RNA from ballast water samples was an arduous task. Upon collection, ballast water samples were immediately filtered through a 0.22 µm membrane, and the membranes were stored at -80˚C until DNA and RNA extraction.

Figure 1: Assessing DNA quality using PCR. The 16S rRNA gene was amplified using specific primers (IDT, US) from 5 ng of DNA. The DNA was isolated using the DNeasy mini kit (Qiagen, US) from ballast water samples filtered through a 0.22 µm membrane. Presence of a 1.5 kb product correlates with 16S. Lane 1: 1 kb ladder; Lane 2: no DNA control; Lanes 3-6: 16S PCR products from ballast water DNA.

Initial assessment of DNA quality was important to allow for better sequencing data. The successful amplification of the 16S rRNA region allowed for further analysis using 16S targeted sequencing via next-generation sequencing. The data will provide an insight into the microbial communities existing in the ballast water samples. However, DNA analysis alone may not give an accurate assessment of viable bacteria within the community. Hence, analyzing the RNA and quantitating it via real-time PCR will give us a quantitative estimate of the relative viability of bacteria within the ballast water. Successful amplification of the cDNA using 16S rRNA specific primers shows that the extracted RNA was of high enough quality to be analysed by real-time PCR. The cDNA will be amplified using 16S primers to be sequenced.

Conclusion

Using the RNeasy and DNeasy mini kits, good-quality DNA and RNA were extracted from ballast water samples. These will be used for further analysis using next-generation sequencing technology and real-time PCR to provide insight into the viable microbial community that inhabits the ballast water samples.

With this information, treatment strategies to combat the contamination of the waters off Qatar could be employed to limit devastation brought about by foreign vessels.

Method

[Flowchart labels: Test bacteria/Ballast water; Real-time PCR]

Figure 2: Assessing RNA quality using PCR. The total RNA was isolated using the RNeasy mini kit (Qiagen, US) from ballast water samples filtered through a 0.22 µm membrane. The extracted RNA was converted to cDNA and the quality was assessed via PCR using primers specific to 16S rRNA and the rrnH operon of Escherichia coli K12. Presence of a 1.5 kb product correlates with 16S. Lane 1: 1 kb ladder; Lane 2: 100 bp ladder; Lane 3: rrnH operon of Escherichia coli K12; Lane 4: no DNA control; Lane 5: PCR products from ballast water DNA.

References

Jahn C.E. et al. Evaluation of isolation methods and RNA integrity for bacterial RNA quantitation. Journal of Microbiological Methods 75 (2008) 318–324.
Revetta R.P. et al. 16S rRNA Gene Sequence Analysis of Drinking Water Using RNA and DNA Extracts as Targets for Clone Library Development. Curr Microbiol (2011) 63:50–59.
Werner J.J. et al. Impact of training sets on classification of high-throughput bacterial 16S rRNA gene surveys. ISME J (2012) 6:94–103.

Acknowledgement

Behind this successful undertaking are the blessing and guidance I received. This formal acknowledgment may not be sufficient to express the gratitude and deep respect I felt during the work. I would like to thank Mrs. Bernadette and Mrs. Maya Kemaldean for their help and supervision during lab work, and for their great support. I would also like to thank Drishya George for her help.


Truncations of Drs1 arms provide insight into their possible functions

Authors

Muhammad Nahin Khan, Dona Ferdinando, Samanda Valente, Jelena Micic

Advisors

John Woolford, Carnegie Mellon University

Category

Biological Sciences

Abstract

Ribosomes are sophisticated devices that build all the essential machinery of our cells: the proteins. Ribosomes are themselves built with the help of other proteins called assembly factors. Drs1 is one such assembly factor. This poster documents attempts to understand the potential functions of the Drs1 arms by mutating them at various points. Drs1 was found to be involved in different stages of the ribosome biogenesis pathway, and its different terminal arms were found to have different specific functions.



Truncations of Drs1 Arms Provide Insight into Their Possible Functions

M. Nahin Khan, Dona Ferdinando, Samanda Valente, Jelena Micic, Dr John Woolford

Introduction

• Ribosomes are sophisticated devices that build proteins.
• Ribosomes are built by proteins called assembly factors, of which there are more than 75 that can be classified into 19 families of proteins (Konikkat, S. and Woolford, J.L. Jr, 2016).

Figure 1: Overview of 60S ribosome subunit assembly. Over 75 known assembly factors come in and out of the preribosome before it reaches full maturity in the cytoplasm. (Source: Woolford Laboratory webpage) [Panel labels: Nucleolus, Nucleoplasm, Cytoplasm]

• Our protein of interest: Drs1, an assembly factor thought to be involved in the removal of the “foot” of the ribosome. Drs1 is an ATPase.
• Structural analysis of the drs1 gene using bioinformatics tools reveals the predicted disordered arms of the Drs1 protein (see Figure 2).

Figure 2: Result from DISOPRED analysis of Drs1 amino acid sequence. Note the disordered sections shown at the N-terminus and C-terminus of the protein. (Source: DISOPRED by UCL).

• Therefore, Drs1 is an ATPase that has two “arms”: the N-terminus arm and the C-terminus arm.

Figure 2: Structure of Drs1 assembly factor. Note the disordered “arms” at the N-terminus and the C-terminus, with a globular structure in the middle.

• An “arm” refers to a stretch of amino acid sequences that are disordered.
• A disordered chain of amino acids can do anything in a cell: potentially interesting function for the Drs1 arms.
• Despite not having amino acid sequences capable of binding to RNA specifically, Drs1 must bind to a specific RNA region of the preribosome.
• Therefore, a protein may be involved in its recruitment.
• The motivation for studying ribosome assembly:
   Diseases that are caused by incorrect assembly, known as ribosomopathies.
   By studying ribosome assembly, ribosomopathies can be better understood and potential cures explored.
• Model organism used in this research: yeast

Research Questions:
• When does Drs1 come into the assembly line?
• What kind of role does Drs1 play in assembly?
• Who is recruiting Drs1 and how?

Methods and Results

1- Site directed mutagenesis

Three plasmids with different deletions were designed using site-directed mutagenesis. The resultant plasmids were delivered to the cells.

Figure 3: The resultant mutations from site-directed mutagenesis [Labels: Complete N terminus (N1); Partial N terminus (N2); Complete C terminus (C1)]

2- Plate to study mutants growth

The drs1 gene is expressed from a plasmid under control of the GAL promoter. Therefore in glucose media it is only the mutant drs1 that is expressed.

Figure 4: Illustration of the effects of the Gal promoter in galactose and glucose media

Spotting was conducted to visually determine the significance of each truncation on the rate of growth of the yeast cells at different temperatures.

Figure 5: Spotting plates highlighting the effects of the mutations on growth of cells at different temperatures and media

Mutation N1 has a cold sensitive phenotype. Mutation N2 has no obvious phenotype. Mutation C1 has a cold sensitive and temperature sensitive phenotype.

3- Grow cultures in liquid media to find doubling times

The doubling times were measured and it was found that the C1 mutation had a significantly slower doubling time.

Strain        Media       Doubling Time
Mutation N1   C-Leu+Gal   6 hours
Mutation N1   C-Leu       6 hours
Mutation N2   C-Leu+Gal   5.5 hours
Mutation N2   C-Leu       4 hours
Mutation C1   C-Leu+Gal   7.5 hours
Mutation C1   C-Leu       7 hours

Figure 6: Doubling times for 3 mutations in galactose and glucose media

4- Protein purification with Nop7-TAP tag

Figure 7: Gel showing concentrations of Ytm1, Erb1, Nop7 for yeast with a) Drs1 depletion and b) mutation C1 [Band labels: Rea1, Erb1, Nop7, Ytm1; lane labels: -Drs1 gal, -Drs1 glu]

5- Western blot reveals changes in assembly factors levels.

The antibodies used bound specifically to the subcomplex of Ytm1, Erb1 and Nop7 and also showed levels of the middle stage assembly factor Tif6 and late stage assembly factor Bud20 for each mutation. This gave an indication of the function of each Drs1 arm in blockage of the assembly pathway at different stages.

Figure 8: Western blot highlighting the levels of the subcomplex Ytm1, Erb1 and Nop7 along with assembly factors Tif6 and Bud20 for each mutation and the depletion

Discussion

Theories formed with regards to the functioning of Drs1 assembly factor in ribosome biogenesis:

Spotting assays (Figure X) and the silver stain gels (Figure X) show that:

• The C-terminus Drs1 arm that was cut off is significant with regards to its function.
• The second-half portion of the N-terminus arm is involved in RNA manipulation during ribosomal assembly.
• Overall, the different effects seen when the C-terminus and N-terminus arms are cut off suggest that the two arms may be involved in binding to different things.

This is summarized by Figure 9:

Figure 9: Possible functions of the arms of Drs1. The C-terminus arm binds to a putative protein X, which binds to a specific portion of the ribosomal RNA and in turn helps Drs1 in being recruited to the correct part of the ribosomal RNA. The N-terminus arm may then be involved in RNA manipulation.

Secondly, it is possible that Drs1 may be blocking ribosomal assembly pathway at different stages when mutated. This conclusion was derived from the western blot results using the following observations:

• Without Drs1, the ribosome biogenesis pathway gets blocked before Tif6 and Bud20 can enter. No bands were seen for Tif6 and Bud20 in wildtype yeast strain without Drs1 (depletion) grown in glucose media in the western blot gel.
• Mutation N1 and mutation N2 strains appear to block the pathway at the same location, but with partial blockage. We observe this with the bands of the two assembly factors appearing with less intensity in glucose as compared to galactose.
• Mutation C1 appears to block the assembly pathway later than the entry of Tif6 and Bud20. This was suggested by the intensities of the two bands being nearly equal for mutation 3 in galactose and glucose media.

This is summarized in Figure 10:

Figure 10: Stages of involvement of Drs1 arms are different. (a): Lack of Drs1 in the cell causes failure of recruitment of Tif6 and Bud20. (b): Lack of Drs1 N-terminus arms results in partial blockage of Tif6 and Bud20 recruitment. This represents a fraction of the ribosome biogenesis failing. (c): Mutation in the C-terminus arm allows recruitment of Tif6 and Bud20. Ribosome biogenesis terminates at a late stage. [Panel labels: Drs1 Depletion; Mutation N1; Mutation N2; Mutation C1; Tif6; Bud20; Complete blockage; Partial blockage; Late blockage]

The conclusions formed in this project highlight some of the potential roles played by the arms of Drs1. In the broader picture, it helps form a step closer to fully understanding ribosomal assembly and potentially finding cures to diseases that are caused by disruption in ribosomal assembly.

Future Directions

• Repeat protein purifications and western blot with yeast cultures shifted to 13˚C.
• Perform mass spectrometry to observe precise changes in the amounts of most of the ~75 ribosomal assembly factors.
• Perform affinity chromatography to search for proteins that bind to the arms of Drs1.
• Perform UV-crosslinking to see if RNA binds directly to a Drs1 arm.
• Repeat all experiments to show consistency in results.

References

Tang L, Sahasranaman A, Jakovljevic J, Schleifman E, Woolford JL Jr. (2008) Interactions among Ytm1, Erb1, and Nop7 Required for Assembly of the Nop7-Subcomplex in Yeast Preribosomes.
Ripmaster T, Vaughn G, Woolford JL Jr. (1993) DRS1 to DRS7, Novel Genes Required for Ribosome Assembly and Function in Saccharomyces cerevisiae.
Miles TD, Jakovljevic J, Horsey E, Harnpicharnchai P, Tang L, Woolford JL Jr. (2005) Ytm1, Nop7, and Erb1 Form a Complex Necessary for Maturation of 66S Preribosomes.
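The doubling times reported in Figure 6 can be estimated from two culture-density readings taken during exponential growth. The sketch below is only an illustration under that assumption; the poster does not state how the doubling times were computed, and the function name and the optical-density readings are hypothetical.

```python
import math


def doubling_time(od_start, od_end, elapsed_hours):
    """Estimate doubling time in hours from two density readings.

    Assumes exponential growth between the readings:
    N(t) = N0 * 2 ** (t / Td), so Td = t * ln(2) / ln(N(t) / N0).
    """
    if od_start <= 0 or od_end <= od_start or elapsed_hours <= 0:
        raise ValueError("readings must show growth over a positive time interval")
    return elapsed_hours * math.log(2) / math.log(od_end / od_start)


# Hypothetical readings: density rises from 0.1 to 0.4 (two doublings) in 12 hours.
example_hours = doubling_time(0.1, 0.4, 12.0)
print(f"Estimated doubling time: {example_hours:.1f} hours")
```

A slower-growing strain, such as the C1 truncation, would show a smaller density increase over the same interval and therefore a longer estimated doubling time.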


Testing for the presence of genetic modifications in common corn products: tortilla chips and corn flour

Authors

Aya Nour, Fatema Abdul Salik

Advisor

Annette Vincent

Category

Biological Sciences

Abstract

Food crops are increasingly genetically modified due to the need for increased productivity. But how often do people realize that the food items they consume contain Genetically Modified Organisms (GMOs)? Our research aims to determine the presence of genetic modifications in corn products. According to the Food and Agriculture Organization of the United Nations, total world production of corn in 2016 was 1.06 billion tonnes. Moreover, according to the United States Department of Agriculture Economic Research Service, up to 92% of the corn produced in the US in 2017 was genetically modified. We therefore hypothesized that the corn products we analyzed are genetically modified. We analyzed four products: a positive GM corn control, a negative organic canned sweet corn control, corn flour, and tortilla chips. We performed a DNA analysis by carrying out uniplex PCR reactions to test for the presence of the P35S promoter, the maize invertase internal control, and the pat gene insert. We also carried out multiplex PCR reactions to test for the presence of the 35S promoter, the maize invertase gene, and the pat and Cry1A(b) gene inserts simultaneously in one reaction tube. We performed a protein analysis by carrying out an ELISA to determine the presence of the PAT protein. Our results show that tortilla chips are genetically modified with the pat gene; however, no conclusions could be drawn for our corn flour and negative organic control samples.




Studying phosphorylation of Kindlin F1 loop and interactions with protein partners

Author

Saad Rasool

Advisor

Mohamed Bouaouina

Category

Biological Sciences

Abstract

The Kindlin family of proteins comprises 3 homologues: Kindlin 1, 2 and 3. All these homologues are composed of a FERM domain, which consists of 4 subdomains: F0, F1, F2 and F3. Mutations in these proteins have been linked to conditions such as Kindler syndrome, leukocyte adhesion deficiency, cancer and other acquired diseases. Different Kindlin subdomains are known to be required for Kindlin-mediated integrin activation. However, a detailed mechanism of Kindlin function is still missing. This project focuses on studying specific phosphorylated residues on the F1 loop of Kindlin to understand their importance in Kindlin function. In addition, we will use biochemical techniques to determine potential protein binding partners of the Kindlin F1 loop. These results will allow us to develop a better understanding of the Kindlin protein-protein interactions required for its integrin activator function.




Investigating oxidative stress induced by aspartame in human embryonic kidney cells

Authors

Fatema Abdul Salik, Reema Subeh

Advisor

Annette Vincent

Category

Biological Sciences

Abstract

Previous research showed that gestational diabetes affected three to five percent of all pregnancies, but new, more rigorous diagnostic criteria put the number closer to 18%. As such, managing blood glucose levels becomes more crucial to minimize the baby’s chances of developing complications such as electrolyte abnormalities and jaundice. Besides gestational diabetes, studying the effect of aspartame is essential due to the high rate of diabetes in Qatar’s population (293,100 diabetic patients reported in 2015). Many people depend on aspartame as an alternative to table sugar as a means of regulating their sugar intake. For the above reasons, we investigated the effect of aspartame on human embryonic kidney cells (HEK293) in our research.



Investigating Oxidative Stress Induced by Aspartame in Human Embryonic Kidney Cells

Fatema Abdul Salik, Reema Subeh, and Dr. Annette Vincent
Biological Sciences Program, Carnegie Mellon University in Qatar

Introduction

Previous research showed that gestational diabetes affected three to five percent of all pregnancies, but new, more rigorous diagnostic criteria put the number closer to 18%. As such, managing blood glucose levels becomes more crucial to minimize the baby's chances of developing complications such as electrolyte abnormalities and jaundice. Besides gestational diabetes, studying the effect of aspartame is essential due to the high rate of diabetes in Qatar’s population (293,100 diabetic patients reported in 2015). Many people depend on aspartame as an alternative to table sugar as a means of regulating their sugar intake.

For the above reasons, we investigated the effect of aspartame on human embryonic kidney cells (HEK293) in our research.

Methodology

Several techniques were used to assess the increase in reactive oxygen species (ROS) with and without the presence of aspartame. Also, the activity of the malate aspartate shuttle was manipulated by introducing aminooxyacetic acid (AOA) as a control. This is because the malate aspartate shuttle produces ROS as one of its functions and is inhibited by AOA. These objectives were tested using microplating, fluorescence microscopy and flow cytometry upon exposure to two oxidative fluorescent dyes: MitoSOX Red for superoxides and dichlorofluorescein-diacetate (DCFHDA) for total ROS.

[Figure panels: (a) MTT Assay; (b) Flow Cytometry; (c) MTT Assay Results for HEK293 Cells Incubated with Varying Aspartame Concentrations]

Results & Discussion

These results show how cell viability changes when cells are incubated with various aspartame concentrations. Interestingly, cell viability decreases when HEK cells are incubated with 100 µg/ml aspartame and increases with 250 µg/ml aspartame. However, the cell viability results with 250 µg/ml aspartame are unreliable due to the large standard deviation. Moreover, the Vincent lab has collected data showing that the viability of HEK cells decreases drastically when they are incubated with 250 µg/ml aspartame for 30 minutes, and that at this aspartame concentration most HEK cells undergo apoptosis.

There was no significant increase in either ROS or superoxide in the mitochondria when analyzed using the two oxidative stress dyes, as shown in Table 1. The t-test p-values for flow cytometry and the fluorescence microplate reader were always higher than 0.05, meaning that total ROS and superoxide were not elevated. The fluorescence microscopy results also showed no elevated oxidative stress. Though we had hypothesized that total ROS and superoxide would increase, our results did not support our hypothesis.

Table 1: Analysis for Significance of Oxidative Stress for Addition of Aspartame to HEK293 Cells Using Different Techniques and Fluorescent Dyes

Fluorescent Dye Used to Measure Oxidative Stress   Technique Used to Assess Oxidative Stress   Significance of Increase in Oxidative Stress
MitoSOX Red                                        Microplate Reader                           Insignificant
MitoSOX Red                                        Fluorescence Microscopy                     Insignificant
MitoSOX Red                                        Flow Cytometry                              Insignificant
DCFHDA                                             Microplate Reader                           Insignificant
DCFHDA                                             Fluorescence Microscopy                     Insignificant
DCFHDA                                             Flow Cytometry                              Insignificant

Conclusion

From the given results, it can be concluded that the addition of aspartame does not elevate oxidative stress in HEK293 cells, even when the activity of the malate aspartate shuttle, which produces ROS, was inhibited using aminooxyacetic acid. Further research needs to be carried out to understand why cells become non-viable at high aspartame concentrations.

References & Acknowledgements

> Stanley, L. (2013). External scientific report for EFSA. Review of data on the food additive aspartame. Retrieved from https://efsa.onlinelibrary.wiley.com/doi/10.2903/sp.efsa.2013.EN-399

> Wang, C., Chen, H., Zhang, M., Zhang, J., Wei, X., & Ying, W. (2016). Malate-aspartate shuttle inhibitor aminooxyacetic acid leads to decreased intracellular ATP levels and altered cell cycle of C6 glioma cells by inhibiting glycolysis. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/27157912

> Qatar. (n.d.). Retrieved from http://www.idf.org/membership/mena/qatar

> [2018] Flow cytometry introduction | Abcam. [online] Available at: http://www.abcam.com/protocols/introduction-to-flow-cytometry [Accessed 18 Mar. 2018].

We would like to thank the committee of QSIURP (Qatar Student Initiated Undergraduate Research Program) at CMUQ for selecting our project for the research grant.


The varying amount of genetic modifications in non-GMO labeled products from the USA and Europe

Authors

Moussa Zekak, Khalid Al-Naemi

Advisor

Annette Vincent

Category

Biological Sciences

Abstract

The European Union has stricter regulations on Genetically Modified Organisms (GMO) and their labeling. If the genome of an organism has more than 1% genetic modification, it must be labelled as a GMO product, whereas in the United States of America the regulations are less strict: up to 5% of the organism can be genetically modified without the need to label the product as a GMO. In our methods, we test for genetically modified events (such as barCAN, oxyCAN and CamV35S [promoter]) in products that contain canola oil, an oil derived from canola seeds that are often genetically modified to survive herbicides and insecticides. We do this at the genomic level using multiplex and uniplex PCR, and at the protein level using ELISA to detect the protein product of the pat genetic modification. The products come from, and are manufactured in, either Europe or the United States. We therefore hypothesize that there will be more GMO events in the USA-based products than in the EU-manufactured products. This is significant because it informs decisions at the consumer and state level on what to buy and import. We found that none of our products contained genetic modifications: the CamV35S promoter and all the genes tested in uniplex and multiplex PCR were absent, and ELISA did not detect any protein product. In conclusion, these products do not appear to be genetically modified, which is consistent with their lack of GMO labelling. We are thus unable to compare regulation cut-offs between the two regions.



Abstract

The European Union has stricter regulations on Genetically Modified Organisms (GMO) and their labeling. If the genome of an organism has more than 1% genetic modification, it must be labelled as a GMO product, whereas in the United States of America the regulations are less strict [3]: up to 5% of the organism can be genetically modified without the need to label the product as a GMO. In our methods, we test for genetically modified events (such as barCAN, oxyCAN and CamV35S [promoter]) in products that contain canola oil, an oil derived from canola seeds that are often genetically modified to survive herbicides and insecticides. We do this at the genomic level using multiplex and uniplex PCR, and at the protein level using ELISA to detect the protein product of the pat genetic modification. The products come from, and are manufactured in, either Europe or the United States. We therefore hypothesize that there will be more GMO events in the USA-based products than in the EU-manufactured products. This is significant because it informs decisions at the consumer and state level on what to buy and import. We found that none of our products contained genetic modifications: the CamV35S promoter and all the genes tested in uniplex and multiplex PCR were absent, and ELISA did not detect any protein product. In conclusion, these products do not appear to be genetically modified, which is consistent with their lack of GMO labelling. We are thus unable to compare regulation cut-offs between the two regions.

Introduction

Regulations of GMOs are stricter in the EU than in the USA. We therefore chose the following canola oil products: bird seeds, European butter, American butter, and organic (non-GMO) canola oil as a negative control. We will test for the GMO events barCAN, oxyCAN, patCAN and CP4canESPS [6] using multiplex PCR, followed by testing for protein products using ELISA (enzyme-linked immunosorbent assay), a procedure that can specifically detect a substance based on specific antibodies, and the amount of the substance based on color change [2]. We will screen for Cruciferin as an internal control that ensures we have canola oil in these products, as Cruciferin is a gene specific to canola [1]. We will also screen for the CamV35S promoter, as this will give us a yes or no answer to whether our product has been genetically modified, since it is the most common promoter used in GMO event insertion. It is found in the DNA cassette that is inserted into the organism’s genome, which consists of a promoter, a gene of interest and a terminator [2][18].

Figure 5: The DNA cassette complex used for insertion of a GMO event.

We base our confidence in the presence of these genes on the fact that they have been found in canola and that multiplex PCR has been used to detect them in the literature [6]. In addition, they are specific to canola in the PCR primer list [8]. All of these genes confer herbicide resistance to the canola plant [11][12].

The purpose of our experiment is therefore to find whether there is a difference in the amount of genetic modifications in products from countries with different regulations. Consumers in Qatar who are provided with products from the USA and the EU may want to consider the country of origin when deciding whether a food is GMO-free, so that they can make more informed choices. On the state level, Qatar may want to be more informed about which products it imports and how strict it wants to be on genetically modified foods. Our goal is thus to test whether a difference really exists between these two regions with different regulations, to allow for more informed choices.

DNA was isolated using the NucleoSpin® genomic DNA prep kit, in which genomic DNA is extracted from only 0.2 g of product. The sample is homogenized and then lysed using proteinase K at 65 °C. After centrifugation, C4 buffer and ethanol are added, and the sample is loaded onto a NucleoSpin food column to allow binding of DNA to the column. After several washes to remove contaminants, the DNA is eluted from the column using CE buffer at 70 °C. The integrity of the isolated genomic DNA is assessed using a spectrophotometer that detects absorbance at wavelengths of 230, 260 and 280 nm.

Acknowledgments: We would like to thank Maria Bernales and Maya Kemaldean.

References

[1]: Campbell, L., Rempel, B. C., Wanasundara, J. P. D. 2016. Canola/Rapeseed Protein: Future Opportunities and Directions – Workshop Proceedings of IRC 2015. MDPI. Retrieved from: file://qatar.win.cmu.edu/files/User/mzekak/MyDocuments/Downloads/plants-05-00017.pdf
[2]: Vincent, A., Doonan, C. 2018. Experimental Biochemistry.
[3]: Vincent, A. 2018. Lecture: Biotechnology/Lab 4 Overview. Experimental Biochemistry.
[4]: New England BioLabs. 2018. 100 bp DNA Ladder. NEB.
[5]: NA. 2013. GM labelling. Food Standards Agency. Retrieved from: https://www.food.gov.uk/science/novel/gm/gm-labelling
[6]: Demeke, T., Ratnayaka, I. 2008. Multiplex qualitative PCR assay for identification of genetically modified canola events and real-time event-specific PCR assay for quantification of the GT73 canola event. Food Control. Retrieved from: https://www.sciencedirect.com/science/article/pii/S0956713507002010
[7]: Schmidt, A. M., Sahota, R., Pope, S. D., Lawrence, S. T., Belton, M. P., Rott, M. E. 2008. Detection of Genetically Modified Canola Using Multiplex PCR Coupled with Oligonucleotide Microarray Hybridization. Journal of Agricultural and Food Chemistry. Retrieved from: https://pubs.acs.org/doi/pdf/10.1021/jf800137q
[8]: Vincent, A. 2018. GMO primer list. Experimental Biochemistry.
[9]: Vincent, A. 2018. PCR PROTOCOL – CAMV35S promoter/NOS terminator. Experimental Biochemistry.
[10]: ThermoFisher. 2018. Multiple Primer Analyzer. ThermoFisher Scientific. Retrieved from: https://www.thermofisher.com/in/en/home/brands/thermo-scientific/molecular-biology/molecular-biology-learning-center/molecularbiology-resource-library/thermo-scientific-web-tools/multiple-primer-analyzer.html
[11]: Event Name: OXY-235. (n.d.). Retrieved March 20, 2018, from http://www.isaaa.org/gmapprovaldatabase/event/default.asp?EventID=7
[12]: Chapter 2 Introduction: The bar gene. (n.d.). Retrieved March 20, 2018, from http://www.bios.net/daisy/Phosph/g2/710.html
[13]: Kim, J.H., Park, B. S., Hong, Y., Kim, H. Y. 2015. Detection of eight genetically modified canola events using two event-specific pentaplex PCR systems. Food Control. Retrieved from: https://www.sciencedirect.com/science/article/pii/S0956713514006616
[14]: Polymerase Chain Reaction Agarose Gel Electrophoresis: What do these bands mean? (n.d.). Retrieved March 17, 2018, from https://prettyincrediblegirls.weebly.com/home/polymerase-chain-reaction-agarose-gel-electrophoresis-what-do-these-bandsmean
[15]: Polymerase Chain Reaction (PCR): Principle, Procedure, Components, Types and Applications. (2015, July 28). Retrieved March 21, 2018, from https://laboratoryinfo.com/polymerase-chain-reaction-pcr/
[16]: New tool for successful end-point multiplex PCR. (2016, July 13). Retrieved March 21, 2018, from http://biomarkerinsights.qiagen.com/2016/07/13/new-tool-for-successful-end-point-multiplex-pcr/
[17]: Cox, K. L. (2014, December 24). Figure 1: [Diagram of a sandwich ELISA...]. - Assay Guidance Manual - NCBI Bookshelf. Retrieved March 21, 2018, from https://www.ncbi.nlm.nih.gov/books/NBK92434/figure/immunometh.F1/
[18]: Bertheau, Y. (2013). Genetically modified and non-genetically modified food supply chains: co-existence and traceability. Chichester, West Sussex, UK: Wiley-Blackwell.

Results

Figure 1: The spectrophotometry readings of 4 Canola products and positive control from NucleoSpin genomic DNA prep kit over the wavelength range 200.0 – 400.0. 10 µl of each sample DNA was added to a cuvette with 90 µl CE (elution) buffer. A blank of just 100 µl CE buffer was run.

Table 1: DNA concentration for each genomic prep of 4 Canola products and positive control

Product Name       Absorbance at 260nm (b)   Absorbance at 280nm   DNA Concentration (ng/µl) (a)   Purity Ratio (Abs260/Abs280) (c)
Positive Control   0.367                     0.274                 183.5                           1.34
Negative Control   0.023                     0.016                 11.5                            1.44
Bird Seeds         0.201                     0.122                 100.5                           1.65
USA Butter         0.016                     0.010                 8                               1.60
EU Butter          0.035                     0.041                 17.5                            0.85

a- DNA concentration: Total volume = 100 µl; 1 Abs = 50 µg. Sample calculation for Bird Seeds: (0.201*50)/Total volume*1000 ng = 100.5
b- The absorbance is taken from the tabular view of the spectrophotometer.
c- Purity ratio calculated by dividing Abs260/Abs280. Sample calculation for positive control: 0.367/0.274 = 1.34. A good purity ratio lies between 1.6 and 1.9 [2].

Figure 2: Uniplex PCR gel for the internal control Cruciferin and the CamV35S promoter to test for genetic modification in 4 food products and one positive control. 15 µl of PCR samples were mixed with 5 µl of 6x Loading Dye. A 1.2% agarose gel was poured and run with 1X TBE buffer. The gel was run for 60 minutes at 100 volts. A 100 bp ladder was loaded, NEB [N3231S] [4].

Table 2: Gene expected and observed fragment length [Columns: Sample Name; Expected size [8]; Observed size. Sample names listed: barCAN, oxyCAN, CP4canESPS, patCAN, Cruciferin, CamV35S. Expected sizes listed: 123, 190 (twice), 207, 238, 288. Observed sizes listed: 193.5 for one entry; Not Observed for the other five.]

Figure 3: Multiplex PCR gel with internal control Cruciferin to test for genetic modification in 3 food products, one positive control and one negative control. 15 µl of PCR samples were mixed with 5 µl of 6x Loading Dye. A 1.2% agarose gel was poured and run with 1X TBE buffer. The gel was run for 60 minutes at 100 volts. A 100 bp ladder, NEB [N3231S], and a 1 kb ladder [N3232S] were loaded [4].

Figure 4: The ELISA reading for 4 Canola products and positive control extracts at the wavelength 450 nm. 50 µl of sample extract was added to each well, followed by 50 µl of Enzyme Conjugate and a wash. 100 µl of substrate was added and incubated for 30 minutes at ambient temperature, and 100 µl of stop solution (1.0 M HCl) was added. Plates were read 30 minutes after adding the stop solution. [Bar chart: Initial Reading and Blank Corrected Reading for Blank, Negative Control, Bird Seed, Liquid USA Butter, Fat USA Butter, Liquid EU Butter, Fat EU Butter and Positive Control; y-axis 0 to 0.06]

PCR Reactions: We were not able to obtain the right band size for all reactions. As Figure 2 shows, many reactions had primer dimers, indicated by bands below the ladder. We may also have observed non-specific amplification at 342bp, which could suggest that there are other binding sites for the Cruciferin primers in the genome of EU butter. The positive control does not show any presence of the CamV35S promoter, which makes it difficult to draw any conclusions about whether there are GMO events in our other products. A possible explanation for the missing promoter band is a non-optimal annealing temperature (it was much higher than the melting temperature of the primers). The positive control and bird seeds show presence of Cruciferin at around 196bp; the fragment length of Cruciferin is 190bp [8], as shown in Table 2.

Further Tests: Our food products tested negative for all the genes tested, as shown in Figure 3 for the multiplex PCR. There are no bands except cruciferin, which appears in the same products as in the uniplex PCR (Figure 2). Before testing other genes, we would first optimize the uniplex PCR so that the positive control shows the CamV35S band and we can determine whether our products are genetically modified in the first place. Other genes that could then be tested are MS1, GT73, RF3 and T45. These genes are inserted in other canola products according to the literature [13].

ELISA Data: The ELISA used does not generate a standard curve; instead it uses a threshold OD reading that defines whether PAT protein is present in the sample. The threshold is a blank-subtracted OD > 0.2 [2]. The ELISA results indicate there is no GMO protein present in our samples: in Figure 4 the red bars do not reach the 0.2 threshold. This is not certain, however, because the positive control also falls under the threshold, which implies that the positive control does not contain PAT protein when in fact we expect that it does. Thus, the ELISA cannot be fully conclusive about the absence of GMO proteins. A possible explanation is that the enzyme on the secondary antibody has lost its activity, affecting ELISA results across the board.

Discussion: Based purely on our data, the labeling is accurate: we were not able to identify GMO events in our samples, so the lack of GMO labelling seems warranted. However, our positive control was also not detected as genetically modified at the genomic level (even though it ought to be), and no GMO insertions or pat proteins were detected in our products. The negative control appears to serve its purpose, as we did not detect any GMO insertions or GM proteins at either the genomic or the protein level. At the same time, we did not detect our internal control cruciferin in it, a pattern found in all our products (except bird seeds and the positive control, which had cruciferin bands). Thus we cannot draw any clear conclusions, as mentioned earlier, since our positive control did not serve its purpose.

Conclusion: Our project did not achieve its main goal: to determine whether there is GMO content in products labeled as non-GMO coming from two different countries with varying regulation. The project requires further optimization, in addition to investigating other GMO insertion genes. However, we can be confident that our bird seeds and positive control do not have the five genes tested, since in Figure 3 we do see our cruciferin band but no bands for our genes. To be more confident in that conclusion, we would run a positive control for each of the genes (one that we know has at least one of the five genes tested) for comparison. Our hypothesis, however, is neither proven nor disproven, as one multiplex PCR for 4 GMO insertion genes does not rule out other possible insertion events that were not tested for. The future aim is to include more genes, using primers with improved complementarity, as our current primers did not have high complementarity [10].

Discussion and Conclusions

interest. Using a 96-well plate whose bottom is coated with anti-pat antibody, we add our extracted protein sample. The protein is extracted from the sample by artificial homogenization of 2g of food product, followed by incubation with extraction buffer for 96 hours. The extraction buffer consists of buffer salts dissolved in sterile water; the composition of the buffer is not disclosed by QualiPlate. After the sample is added, primary antibody is added to bind any pat antigen, followed by a secondary antibody conjugated with alkaline phosphatase that binds our primary antibody, hence "sandwich". The ELISA kit that was used is the QualiPlate Kit for LibertyLink PAT/pat. NPP is the substrate used; it is hydrolyzed by the enzyme into p-nitrophenol, which absorbs in the spectrophotometer at 450nm wavelength. The absorbance is directly proportional to the concentration of pat protein in our samples. A blank of just buffer is used to control for any background noise.
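The blank correction described above, combined with the blank-subtracted OD threshold of 0.2 quoted in the ELISA results, amounts to a simple decision rule. The sketch below is illustrative only: the function names are hypothetical and the readings are made-up numbers, not the poster's data.

```python
THRESHOLD_OD = 0.2  # blank-subtracted OD cutoff quoted in the ELISA results


def blank_corrected(initial_od, blank_od):
    """Subtract the buffer-only blank from the initial reading."""
    return initial_od - blank_od


def is_pat_positive(initial_od, blank_od, threshold=THRESHOLD_OD):
    """A well is called positive only if its corrected OD exceeds the threshold."""
    return blank_corrected(initial_od, blank_od) > threshold


# Hypothetical readings for illustration only; these are NOT the poster's data.
blank = 0.05
readings = {"sample A": 0.31, "sample B": 0.12}
calls = {name: is_pat_positive(od, blank) for name, od in readings.items()}
print(calls)
```

Under this rule a positive control that fails to clear the threshold signals an assay problem rather than a true negative.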

5 ELISA is a technique that allows for the detection of specific proteins of

annealing temperature used for CamV35S is 62°C, and for Cruciferin it is 52°C. The PCR program was: initial denaturation at 95°C for 3 min, then denaturation for 25 seconds, annealing at the annealing temperature of each primer set for 30 seconds, and extension at 72°C for 45 seconds. The number of cycles used was 50 [9], with a final extension at 72°C for 7 min [7], then holding the reaction at 4°C.

3 Multiplex PCR was set up for the following genes: barCAN, oxyCAN, patCAN and CP4canESPS. The reaction had an annealing temperature of 54°C, with initial denaturation at 94°C for 5 min followed by 35 cycles of denaturation at 94°C for 30 seconds, annealing for 30 seconds and extension at 72°C for 40 seconds. The final extension is at 72°C for 7 min [7], followed by holding the reaction at 4°C.

4 Both PCR reactions are loaded in 1.2% Agarose gels alongside a ladder of known band sizes to determine the sizes of any bands and thus characterize whether we have amplified the desired region.
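The cycling conditions described above can be written down as data, which makes the two programs easy to compare. This is a minimal sketch: the `PcrProgram` class and its field names are hypothetical, and the computed time ignores temperature ramping and the final 4°C hold.

```python
from dataclasses import dataclass


@dataclass
class PcrProgram:
    """Step durations in seconds; names are illustrative, values come from the Methods text."""
    initial_denaturation_s: int
    denaturation_s: int
    annealing_s: int
    extension_s: int
    cycles: int
    final_extension_s: int

    def runtime_s(self) -> int:
        """Programmed time, excluding ramping between temperatures and the 4 C hold."""
        per_cycle = self.denaturation_s + self.annealing_s + self.extension_s
        return self.initial_denaturation_s + self.cycles * per_cycle + self.final_extension_s


# Uniplex: 95 C for 3 min, then 50 cycles of 25 s / 30 s / 45 s, final extension 7 min.
uniplex = PcrProgram(3 * 60, 25, 30, 45, 50, 7 * 60)
# Multiplex: 94 C for 5 min, then 35 cycles of 30 s / 30 s / 40 s, final extension 7 min.
multiplex = PcrProgram(5 * 60, 30, 30, 40, 35, 7 * 60)

print(uniplex.runtime_s() / 60, multiplex.runtime_s() / 60)
```

Both programs spend 100 seconds per cycle, so the difference in programmed time comes almost entirely from the cycle count (50 versus 35).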

2 Uniplex PCR was used to detect CamV35S and Cruciferin. The

1

Methods

Biological Sciences

Moussa Zekak, Khalid Al-Naemi and Annette Vincent

The Varying Amount of Genetic Modification in non-GMO Labeled Products from the USA and Europe



An oracle characterization of the polynomial-size alternating hierarchy

Authors

Malek Anabtawi
Sabit Hassan
Mohammad Zakzok

Advisor

Christos Kapoutsis

Category

Computer Science

Abstract In computational complexity theory, the Polynomial-Time Hierarchy (PH) is a tool for describing the difficulty of computational problems. The higher a problem lies in this hierarchy (of classes of problems), the more alternations we believe are necessary in a polynomial-time alternating Turing machine (ATM) that solves it. Nobody knows, however, whether PH is strict, and thus our beliefs derived from it are correct; or collapses at some level, and thus some presumably hard problems are actually easy. To approach this question, we study analogs of PH for the much simpler model of one-way finite automata (1FAs). In previous work, the One-Way Polynomial-Size Hierarchy (1H) was derived from PH in this way, by replacing polynomial-time ATMs with polynomial-size alternating 1FAs, and was proven to be strict. However, PH is also defined equivalently in terms of polynomial-time oracle Turing machines (OTMs). What is its 1FA-analog under this alternative definition? Is it again 1H or is it some other natural hierarchy? In this work, we introduce the concept of an oracle 1FA (1OFA). We prove that its most restricted variant gives rise to a hierarchy identical to 1H, but its most general variant gives rise to a distinct, new hierarchy which is strictly stronger.




Interactive evaluation and training of classifiers

Author

Sabit Hassan

Advisors

Bhiksha Raj, Carnegie Mellon University
Saquib Razak

Category

Computer Science

Abstract In this research, we propose strategies to estimate the accuracy of classifiers and to train classifiers on a dataset when resource limitations restrict the number of instances for which true labels can be obtained. For estimating classifier accuracy, our target scenarios include situations where the classifier outputs labels but no scores, e.g. when the classifier is an inexpert human labeller. Our objective is to optimally select a subset of the data to obtain true labels for, such that they provide the best estimate of classifier accuracy. We use techniques based on stratified sampling to address this problem. However, stratified sampling poses two challenges: i) how best to stratify the data, and ii) how to allocate samples among the strata. We propose a method of stratifying data and then present two novel interactive algorithms to approximate optimal allocation of samples to the strata. Our proposed methods for stratification and allocation are seen to outperform other popular approaches to the problem. We then extend our algorithms to the evaluation of multiple classifiers with the same labeling resources. For active training of classifiers, we consider the scenario where we cannot obtain true labels for the whole dataset but can iteratively request true labels. We introduce the concept of stratified sampling to choose the instances for which we request true labels. Use of stratified sampling reduces the training time of classifiers without compromising accuracy.



Interactive Evaluation and Training of Classifiers
Sabit Hassan
Advisors: Bhiksha Raj and Saquib Razak
Carnegie Mellon University Qatar

Introduction Motivation

Interactive Evaluation of Classifiers

● Classifiers can perform tasks much faster and at much larger scale than humanly possible
● Classifiers need to learn patterns to perform tasks
● Classifiers need to be evaluated before they are deployed
● “True labels” are required for both learning and evaluation

Approach: Stratified Sampling

Method:

● Stratify data into K different regions with weight W_k and variance S_k^2
● Choose n_k samples from the kth region to estimate Ark
● Evaluate classifier on these samples to estimate As
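The three steps above can be sketched with the textbook stratified estimator: the overall accuracy estimate is the weight-averaged per-stratum accuracy, and its variance is the sum of W_k^2 S_k^2 / n_k. The function below is an illustration under that assumption (no finite-population correction); it is not necessarily the exact estimator used on the poster, and the example numbers are invented.

```python
def stratified_accuracy(weights, outcomes_by_stratum):
    """Textbook stratified estimate of accuracy and of its variance.

    weights[k]             -- W_k, the fraction of the data that falls in stratum k
    outcomes_by_stratum[k] -- 0/1 results (1 = classifier correct) for the n_k
                              labelled samples drawn from stratum k
    """
    estimate = 0.0
    variance = 0.0
    for w, outcomes in zip(weights, outcomes_by_stratum):
        n = len(outcomes)
        acc = sum(outcomes) / n
        # Unbiased sample variance of a 0/1 variable; zero when only one sample exists.
        s2 = acc * (1.0 - acc) * n / (n - 1) if n > 1 else 0.0
        estimate += w * acc
        variance += w * w * s2 / n
    return estimate, variance


# Invented example: three strata, four labelled samples each.
weights = [0.5, 0.3, 0.2]
outcomes = [[1, 1, 1, 0], [1, 0, 1, 0], [0, 0, 1, 0]]
estimate, variance = stratified_accuracy(weights, outcomes)
print(estimate, variance)
```

Strata in which the classifier is almost always right or almost always wrong contribute little variance, which is why a good stratification reduces the number of true labels needed.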

Challenges:

Research Goals

● Obtaining “true labels” is expensive
● We want to minimize the number of “true labels” needed (thus reducing cost) while obtaining a classifier of good accuracy during the learning phase and a good estimate of accuracy during the evaluation phase
● Exploit distribution of data: use Stratified Sampling

● How to stratify data
● How to allocate samples in each region
● Optimal Stratification or Optimal Allocation cannot be achieved because the accuracy of the classifier is not known beforehand [1]
● We want to stratify the data and allocate samples so that the variance (equation above) is as small as possible

Stratification Method

Existing Methods
● Use classifier scores or feature vectors for stratification [2]
● Feature vectors do not always result in good stratification
● Classifier score is not always available

Experiment on stratification methods:

Our solution

● Train a logistic regression to learn the relationship between feature vectors and class labels of instances
● Stratify data on the logistic regression score learned

Experiment: Variance in estimation of accuracy for different stratification methods

Results:

● Stratification on our logistic regression score performs better than stratification on feature vectors and similar to stratification on classifier score
● Logistic regression score can be used when classifier score is not available

Allocation Method

Experiment on approximation of optimal allocation:

Results:
● Existing algorithms

Existing Methods

● Allocate samples equally to all strata or in proportion to size of strata, which is not optimal
● Optimal allocation allocates samples in proportion to the variance of accuracy within the strata, which is unknown.
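The allocation schemes listed above differ only in what each stratum's share of the labelling budget is proportional to. The sketch below compares them; it assumes the textbook Neyman form of optimal allocation (n_k proportional to W_k times the within-stratum standard deviation S_k), and the helper names and numbers are hypothetical. In practice S_k is unknown, which is the difficulty the poster's algorithms address.

```python
def allocate(budget, shares):
    """Split an integer budget in proportion to non-negative shares (largest remainder)."""
    total = sum(shares)
    raw = [budget * s / total for s in shares]
    counts = [int(r) for r in raw]
    leftover = budget - sum(counts)
    # Hand the remaining samples to the strata with the largest fractional parts.
    order = sorted(range(len(shares)), key=lambda k: raw[k] - counts[k], reverse=True)
    for k in order[:leftover]:
        counts[k] += 1
    return counts


def equal_allocation(budget, weights):
    return allocate(budget, [1.0] * len(weights))


def proportional_allocation(budget, weights):
    return allocate(budget, weights)


def neyman_allocation(budget, weights, std_devs):
    """Assumed textbook form: n_k proportional to W_k * S_k."""
    return allocate(budget, [w * s for w, s in zip(weights, std_devs)])


# Invented example: a large, easy stratum and two smaller, noisier ones.
weights = [0.5, 0.3, 0.2]
std_devs = [0.1, 0.5, 0.3]
print(equal_allocation(100, weights))
print(proportional_allocation(100, weights))
print(neyman_allocation(100, weights, std_devs))
```

In this example the Neyman-style split moves most of the budget to the second stratum, where accuracy is most variable, even though it is not the largest.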

Our solution

● Approximate optimal allocation
● Use logistic regression score learned earlier to obtain estimate of true variance

Algorithm: I-OPT

Evaluation of Multiple Classifiers

Algorithm: L-OPT

Experiment: Mean Absolute Error (MAE) in estimation of accuracy

such as Equal Allocation can perform poorly
● Among the two proposed algorithms, L-OPT outperforms I-OPT.
● Logistic score is a good indicator of accuracy

Interactive Training of Classifiers

Existing method:

Method:

● Iterate over data to choose sample that provides most information gain

● We can modify our algorithm to evaluate multiple classifiers simultaneously
● We take a weighted average of the estimated variance of multiple classifiers in the earlier algorithms

Proposed method:
● Stratify data based on logistic score
● Iterate over each stratum to choose sample with most information gain

Experiment result:

Experiment result:

● Performs better than randomly choosing samples for evaluation

Experiment: MAE in evaluation of Multiple Classifiers

● Reduction in running time without compromising accuracy

Experiment: Accuracy for different training algorithms

Conclusion and Future Work

● Logistic score is a better stratification metric
● Algorithms can be extended to evaluate multiple classifiers
● In future, stratification of data can be looked at more closely for the classifier training
● Our allocation algorithms are most consistent
● Our proposed algorithm reduces running time of training

References
[1] A. Kumar and B. Raj. Classifier Risk Estimation under Limited Labeling Resources, arXiv preprint arXiv:1607.02665, 2016.
[2] P. N. Bennett and V. R. Carvalho. Online stratified sampling: evaluating classifiers at web-scale. In Proceedings of the 19th ACM international conference on Information and knowledge management, pages 1581–1584. ACM, 2010.


Minimizing cost of accuracy estimation of automated classifiers

Authors

Sabit Hassan
Shaden Shaar

Advisors

Bhiksha Raj, Carnegie Mellon University
Saquib Razak

Category

Computer Science

Abstract In this research, we explore strategies to estimate the accuracy of classifiers. Instead of obtaining true labels for the whole dataset to estimate accuracy, we provide strategies to pick only a few instances for which we obtain true labels and still have a good estimate of the accuracy. We stratify the dataset into different regions and pick samples from each region. We derive a mathematical formulation of iterative optimal stratification for estimating accuracy. Lastly, we propose improvements upon existing strategies for approximating the optimal allocation method, focusing on lowering the bias.




Behaviour analysis using multi-sensor data

Author

Daanish Ali Khan

Advisors

Bhiksha Raj, Carnegie Mellon University
Saquib Razak
Rita Singh, Carnegie Mellon University

Category

Computer Science

Abstract With the rapid growth of the Internet of Things (IoT), extremely large volumes of data are being generated. One of the main applications of IoT devices is surveillance, a field where humans are currently required to analyse the data manually to determine whether or not action needs to be taken. As a result, there is a growing need for Behaviour Analysis and Event Detection systems that can automatically analyse the surveillance data and identify what is happening in the area of interest, track suspicious individuals, etc. Current work on Automatic Behaviour Analysis and Event Detection relies mainly on video data. We investigate utilizing data generated by many different types of IoT sensors to perform automatic behaviour analysis and event detection. We propose a machine learning model that analyses the multi-sensor data in a distributed fashion to address the problem.



Behaviour Analysis Using Multi-Sensor Data
Daanish Ali Khan, Bhiksha Raj, Saquib Razak, Rita Singh

Problem Statement and Significance
Device-free human behaviour analysis is the task of automatically identifying physical behaviour or activity without attaching any sensory devices to the subject. The problem has several applications in security, remote healthcare and smart homes. While the use of cameras and video data for behaviour analysis has provided good results in the past[1], due to privacy issues, visual data may not always be available. Instead, due to the recent ubiquity of WiFi-enabled devices, significant work has been done on WiFi based behaviour analysis techniques[2].

Access Point

Human Activity WiFi Data Collection

WiFi Device

Methodology We collected WiFi Channel State Information (CSI) data for multiple activities. CSI data describes the propagation of the signal from transmitter antennae to receiver antennae.

We de-noised our CSI signals using Principal Component Analysis (PCA) on data with no activity, and projected our remaining data onto the tangential hyperplane of the first principal component.

Figure: Accuracy Over Epochs (x-axis: Epoch Number, 1 to 20; y-axis: Validation Accuracy, 0 to 100)

Model Architecture

Model Accuracy

Results and Evaluation
To test our approach we trained and evaluated our model on a benchmarking CSI dataset used to report the accuracy of the current state of the art approach as 75%[2]. With 10-fold cross validation, we achieved an accuracy of 95%.

Citations

[1] Subetha, TS. (2016, February). A Survey on human activity recognition from videos (ICICES), 2016 International Conference on (pp. 1-7). IEEE.
[2] Yousefi, S., Narui, H., Dayal, S., Ermon, S., & Valaee, S. (2017). A Survey on Behavior Recognition Using WiFi Channel State Information. IEEE Communications Magazine, 55(10), 98-104.

The current state of the art in CSI behaviour analysis uses a Long Short Term Memory (LSTM) Neural Network for classification[2]. We used a stacked GRU with a Convolutional Neural Network feature extractor.

Conclusions and Future Work
• Our results were a significant improvement over the state of the art approach, indicating deep learning models are suitable for CSI analysis
• A model trained on data collected in one room does not perform well on another dataset, so there is a need for a generalizable high-accuracy model
• Using auto-encoders to learn a vector representation of the data could enable one-shot learning, which is more applicable to real-world behaviour analysis scenarios


A learning approach to vision-based coarse robotics localization in industry

Author

Aisha Mohamed

Advisors

Gianni Di Caro

Category

Computer Science

Abstract: One of the main challenges of autonomous assembly of workpieces in industry is the localization of the robot with respect to the workpieces. The robot’s task is to localize the different pieces that are not yet assembled and assemble them. The main challenges in the industrial setting are the lack of datasets of images of each piece, the absence of cameras spread around the site, and the symmetric nature of most workpieces, which makes it challenging for recognition algorithms to differentiate between poses. While creating a dataset of images for each workpiece is a huge overhead, most manufactured objects are designed using CAD models that specify the geometric and volumetric features of the object. We propose a way to coarsely localize a robot with respect to a workpiece based on its CAD data. This system finds the workpiece in the environment, then finds the pose of the robot with respect to it. It learns the features of the object from images rendered using the CAD model, reducing the overhead of collecting a dataset for each workpiece. The first step in localization is to find the object in the environment. Given only the CAD model of an object, we use a Convolutional Neural Network (CNN) trained on rendered images of the object to segment the object from the background. Our system then uses the features extracted by the CNN to build a progressive viewpoint map of the object. The variation in appearance of the workpiece encoded in the CAD model is used to localize the robot in that map. The location in the map provides an initial estimate to a fine localization module for further pose precision. This approach learns the features of the object from the virtual rendered images and uses this knowledge to localize the robot with respect to the workpiece in the real world. It also exploits the progressive variation of poses encoded in the CAD model to incorporate time awareness in the system. The system knows the features distinguishing each pose and the patterns of the neighboring poses in both directions. This awareness of the neighbors’ patterns can help resolve the conflicts arising from the symmetry in most of the features of the workpieces. Our system can thus take a picture of the object in the real world and localize it in the map using clustering algorithms.



Example of a symmetric workpiece before assembly

Our contributions mainly are:
- Learning the appearance of the workpiece from the rendered images using the CAD model.
- Using a CNN to automate extraction of complex and effective features.
- Using patterns of neighbors encoded in the progressive viewpoint map to localize the robot.

We propose a learning approach to coarsely localize a robot with respect to a workpiece based on its CAD data. Our system addresses the feature extraction challenge and the lack of datasets by using synthesized data from the CAD models to learn the features of the workpiece then localize it.

Contribution

- The symmetric nature of most of the workpieces, which makes it challenging to define or hand-craft the features for pattern recognition approaches.

- The lack of datasets of images of each piece which are required for learning approaches

The challenges in localizing workpieces in this setting are:

The robot’s task is to localize the workpiece, estimate its pose with respect to the workpiece, then manipulate it.

Our system learns the features of the object from the virtual rendered images using the CAD models and then transfers this knowledge to localize the robot with respect to the workpiece in the real world.

Using robots to automate assembly is already common in most factories, where robots perform a strict set of instructions. Current advances in robots’ perception and action capabilities are encouraging a more flexible, interactive use of robots to assemble workpieces that are not known to be in a specific location in the factory.

Output

We extract the features of the workpiece from the CNN as the activations of the last convolutional layer. This maps each pose to a vector in the feature space.

Input

Step 2: We assume that the object is within the field of view of the robot. We use a Convolutional Neural Network (CNN) trained on rendered images of the object to segment the object from the background.

Randomize the location of the workpiece in the background that is similar to the factory to ensure the system learns complex effective features.

Step 1: Build a dataset of rendered images using the CAD model of the workpiece from different distances from the observer and from different angles.

Approach

Introduction

Aisha Mohamed, Gianni A. Di Caro Carnegie Mellon University in Qatar

Map of a cube

Map of a sphere

Test loss Vs epoch for the motor

We will use the viewpoint maps to do fine pose estimation. We will use the particle filter algorithm to estimate the pose of the robot given the initial coarse localization.

Future Work

Test loss Vs epoch for the cylinder

The CNN achieves an accuracy of 69% on a CAD model of a cylinder and an accuracy of 60% on a more complex model of a motor.

Results

The cube poses can be clustered in three clusters and the map also encodes the symmetry of the clusters.

The sphere is symmetric along all axes, so all the poses have the same features and are placed in the same cluster.

We build the maps using a Gaussian Mixture Model to cluster the poses. Poses that have similar features are in the same cluster and are shown in the same color.

Step 3: We use the features extracted from this CNN to build a viewpoint map that encodes the variation of the appearance of the object across progressive viewpoints.

A Learning Approach to vision-based coarse robotics localization in industry


Computational analysis of the role of MTCP1 in T-cell leukemia

Author

Rohith Krishnan Pillai

Advisor

Valentin Ilyin

Category

Computer Science

Abstract: MTCP1, or mature T-cell proliferation 1, is a gene found on chromosome X at location Xq28, some of whose translocations are thought to be linked to mature T-cell proliferations. MTCP1 is believed to be an oncogene that is associated with T-cell Prolymphocytic Leukemia. The gene has not been studied very deeply in the past, and this research aims to compile all the information available in different online databases in order to comprehensively understand the role and effects of the gene. We look at MTCP1 and its homologs in model organisms such as mice, and also look at the biological pathways that MTCP1 is part of. We also study the protein and the splice forms of MTCP1 to understand more about its function.



● Multiple sequence alignment (MSA)
● Use datasets to find related genes
● Pathway/Network analysis

● mRNA: Gene starts at 1 and ends at 7239. mRNA: join(1..579,5216..5367,5484..5654,5736..5787,5874..7239)
● CDS: Translation starts at 5263 and ends at 5783. CDS: join(5263..5367,5484..5654,5736..5783)
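The `join(...)` strings above list inclusive start..end coordinates for each spliced segment, so segment lengths can be computed directly. The following is a small illustrative sketch; the helper names are hypothetical, and the regular expression handles only the plain `a..b` form used here, not the full feature-location syntax.

```python
import re


def parse_join(location):
    """Parse a 'join(a..b,c..d)' string into a list of (start, end) pairs."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]


def segment_lengths(segments):
    """Coordinates are inclusive, so each segment spans end - start + 1 bases."""
    return [end - start + 1 for start, end in segments]


MRNA = "join(1..579,5216..5367,5484..5654,5736..5787,5874..7239)"
CDS = "join(5263..5367,5484..5654,5736..5783)"

mrna_lengths = segment_lengths(parse_join(MRNA))
cds_lengths = segment_lengths(parse_join(CDS))

print(mrna_lengths, sum(mrna_lengths))   # exon lengths and spliced mRNA length
print(cds_lengths, sum(cds_lengths))     # coding segment lengths and CDS length
print(sum(cds_lengths) // 3)             # number of codons in the CDS
```

The coding segments total 324 nucleotides, which corresponds to 108 codons.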

The region has a complex gene structure, with a common promoter and 5' exon spliced to two different sets of 3' exons that encode two different proteins. Hence, of the 2 splice forms, only one is associated with the MTCP1 gene, while the downstream 8kDa protein is encoded by the co-located CMC4 gene. The splice form’s gene structure is given below.

Gene Structure

DATA COLLECTION:
• All of the data was collected using existing online databases such as:
○ CCD
○ NCBI
○ GEO
○ PDB
○ OMIM
○ KEGG
○ AmiGo2
○ MGI

Methodology

T-cells with Prolymphocytic Leukemia

● Study the role of MTCP1 and related genes in T-cell Prolymphocytic Leukemia.
● Compile all the information available in different online databases.
● Comprehensively understand the role and effects of the gene. Study the protein and the splice forms of MTCP1 to understand more about its function.

Objectives:

Mature T-cell proliferation 1 (MTCP1) is a gene found on chromosome X at location Xq28, some of whose translocations are thought to be linked to mature T-cell proliferations. MTCP1 is believed to be an oncogene that is associated with the rare T-cell Prolymphocytic Leukemia.

Introduction

Protein Study

p13 has protein-protein interactions with AKT1 and AKT2, and has a sequence length of 108 amino acids. It is part of the PI3K-Akt signaling pathway, which regulates fundamental cellular functions such as transcription, translation, proliferation, growth, and survival.

Structure:

● Involved in cell survival and proliferation.
● Positive regulation of peptidyl-serine phosphorylation and protein serine/threonine kinase activity

Processes:

● Part of the TCL1 family of genes, with conserved domains.
● The protein p13 MTCP1 is found in the mitochondria and cytosol of the cell.
● The protein is shown to stabilize the mitochondrial transmembrane potential and to enhance the phosphorylation and activation of AKT1 and AKT2 by acting as a cofactor.
● Protein kinase binding and protein serine/threonine kinase activator activity

Function:

There are several close homologs of MTCP1, as the gene is conserved in chimpanzee, Rhesus monkey, dog, and cow. The Rhesus monkey is the best model organism due to extremely close similarity by coverage, identity, and evolution. The Clustal Omega tree using MSA on BLAST alignments is shown below.

Homologs

Carnegie Mellon University in Qatar

Rohith Krishnan Pillai and Valentin Ilyin

Fu, Z. Q., Du Bois, G. C., Song, S. P., Kulikovskaya, I., Virgilio, L., Rothstein, J. L., ... & Harrison, R. W. (1998). Crystal structure of MTCP-1: implications for role of TCL-1 and MTCP-1 in T cell malignancies. Proceedings of the National Academy of Sciences, 95(7), 3413-3418.
Soulier, J., Madani, A., Cacheux, V., Rosenzwajg, M., Sigaux, F., Stern, M.-H. The MTCP-1/c6.1B gene encodes for a cytoplasmic 8 kD protein overexpressed in T cell leukemia bearing a t(X;14) translocation. Oncogene 9: 3565-3570, 1994. [PubMed: 7970717]
Gritti, C., Dastot, H., Soulier, J., Janin, A., Daniel, M. T., Madani, A., ... & Stern, M. H. (1998). Transgenic mice for MTCP1 develop T-cell prolymphocytic leukemia. Blood, 92(2), 368-373.

References

Conclusion & Future work

There is enough evidence to show a link between expression levels of the MTCP1 gene and T-cell Prolymphocytic Leukemia. This link must be studied in more detail using controlled experiments, such as population studies using microarray expression data, as existing datasets are not suitable. Further studies should look into BRCC3, CMC4, PAWR and STAG2, and also create a network model for the T-cell Prolymphocytic Leukemia disease pathway.

Other possible links due to:
● BRCC3: involved in DNA repair and pathogenic translocations of MTCP1
● Overexpression of human p13 MTCP1 in mice resulted in late-onset T-PLL

The MTCP1 p13 protein is involved in the HTLV-1 disease pathway that causes T-cell leukemia. The p13 protein is 2 steps removed from the apoptosis pathway. Overactive MTCP1 can lead to overactive Ras, which can lead to a more active pro-survival phenotype. There are no cell surface receptors upstream of the gene, so it cannot be downregulated in this way. Also important to note is the link to the HIV-1 Tat protein, which specifically associates with the MTCP1 promoter to upregulate MTCP1 expression in T cells.

Links to T-cell Leukemia

To find similarly expressed genes, we used the GEO profile of MTCP1 and the dataset GDS4455, from the experiment called RhoGTP dissociation inhibitor 2 effect on UM-UC-3 bladder cancer cells. Using a hierarchical clustering method we get the following clustering. Similar genes to look for include: PAWR, LOC101060521, KIAA0753, STAG2, INTS6, IFNA14. The closely related neighboring genes BRCC3 and CMC4 are also possibly coregulated.

Similarly Expressed Genes

Computational analysis of the role of MTCP1 in T-cell Leukemia


Mixed initiative system for survivable path planning in cluttered environments

Author

Rohith Krishnan Pillai

Advisor

Gianni Di Caro

Category

Computer Science

Abstract How can we exploit the advanced cognitive abilities of humans to aid robot navigation in cluttered environments? This research aims to create a mixed initiative path planning system that can be used in post-disaster scenarios such as search and rescue after an earthquake. When human-robot teams are deployed for such tasks, it is important to solve the problem of navigation, especially path planning to find the most survivable path between locations. Current research focuses on solely autonomous planning without additional information from an outside source during a mission, which limits robots to the information from their on-board sensors. However, human teammates have superior cognitive ability and context awareness that robot systems can use to find paths that are both efficient and survivable, that is, paths with low danger and a high probability of existence. The information conveyed by the human, however, is inherently high-level and imprecise. In this research we introduce a novel mixed initiative method that uses the imprecise information from humans to navigate cluttered environments, and we compare our system with existing path planning methods in terms of survivability and efficiency.



● We create a mixed initiative system that includes a technique to model obstacles from imprecise spatial information passed to the robot
● We use imprecise probability methods for path planning, as opposed to Bayesian probabilities

Proposed Contributions

How can robots use such imprecise information to adaptively build maps and plan paths that are survivable and efficient?

Survivable path planner finds the most survivable path with the current world model using imprecise probabilities.

Mixed Initiative Approach Overview

● Survivable path planning: planning using Bayesian methods, given precise information [1]

● Human-robot interaction: multi-modal fusion of speech and gesture for precise spatial information [2]

Prior Work

Challenges:
● Information input is sporadic
● Human input is inherently imprecise / ambiguous
● Robot sensors alone might be insufficient to detect hazards

[Figure: mixed initiative approach overview, steps 1 to 5. Recoverable labels: SLAM using octomaps; sensor map; sensor robot map; human advisory input (imprecise information); modeling human advisory information (spatial specifications, numeric ranges, projected point clouds, convex shape, obstacle model); iterative survivable path planner (start, occupancy grid map, octomap occupancy grid, belief map, plausibility map); update probabilities and merge maps; new robot map.]

Example human advisory input: Object: glass on ground. Range: close. Direction: front. Message: careful!

With human input: No general, reliable model exists for calculating probabilities of detection

Imprecise Probabilities
● Uniformly distributed uncertainty
● Smaller range signifies higher confidence

Bayesian Probabilities
Requirements for a Bayesian model:
● Prior probabilities
● Way to update posterior probabilities

Imprecise vs Bayesian Probabilities
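The idea of planning with imprecise probabilities can be illustrated with a minimal Python sketch. It assumes that each map cell carries an interval (lower, upper) for the probability that the cell is hazardous, with a narrower interval meaning higher confidence, and that hazards in different cells are independent. The function names, the example paths and the cautious rule of maximizing the worst case are illustrative assumptions, not the authors' planner.

```python
from math import prod

# Each cell's hazard probability is an interval (lower, upper).
# A narrow interval signals high confidence; (0.0, 1.0) means "unknown".

def survivability_bounds(path_cells):
    """Return (worst_case, best_case) probability of surviving a path.

    Assumes hazards in different cells are independent (illustrative only).
    """
    worst = prod(1.0 - upper for _, upper in path_cells)
    best = prod(1.0 - lower for lower, _ in path_cells)
    return worst, best

def most_survivable(paths):
    """Pick the path name whose worst-case survivability is highest."""
    return max(paths, key=lambda name: survivability_bounds(paths[name])[0])

# Hypothetical candidate paths: a short one crossing a vaguely reported
# hazard, and a longer one through well-known cells.
paths = {
    "corridor": [(0.00, 0.05), (0.10, 0.60), (0.00, 0.05)],
    "detour": [(0.05, 0.10)] * 5,
}
choice = most_survivable(paths)
```

In this toy case the wide interval on the corridor's middle cell lowers its worst-case survivability, so the cautious rule prefers the longer detour. A point-valued Bayesian model would first need a prior and an update rule for that cell.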

Methods

Goal: Safe robot navigation in post-disaster scenarios that are potentially unknown, cluttered and hazardous environments, in the context of human-robot teams.

Idea: Humans can take the initiative and help robot agents if and when needed by providing information for navigation. Robots can also take the initiative and ask for information if and when needed.

Motivation

Carnegie Mellon University in Qatar

Rohith Krishnan Pillai and Gianni Di Caro


References

1. Beom-Seok Cho, Se-Hong Park, and Min-Cheol Lee. Visibility and survivability map based path planning and its simulation. In Ubiquitous Robots and Ambient Intelligence (URAI), 2015 12th International Conference on, pages 482–484. IEEE, 2015.

2. Marjorie Skubic, Dennis Perzanowski, Samuel Blisard, Alan Schultz, William Adams, Magda Bugajska, and Derek Brock. Spatial language for human-robot dialogs. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 34(2):154–167, 2004.

Through simulations and tests, we aim to show that imprecise information can be integrated into an existing world model, using imprecise probabilities, to capture missing or undetected hazards.

Work in Progress

We evaluate the planning model using these experiments. The planning time, the length of the planned path, and the number of inputs provided are recorded. The robot is placed in multiple scenarios with varying:
● density/size of obstacles in the map (both modeled and requiring human input)
● start, goal, and risk threshold targets
● human advisory information given

Survivable Path Planner Experiments

Preliminary results show a positive correlation between “close” distance and both the velocity of the robot and the obstacle size.

Hypothesis: The “closeness” distance is affected by both the velocity of the robot and the size of the obstacle it is moving towards at the time.

● We need a way to map the spatial specification of distance to a numerical value (what is “close” as a numerical value?)
● We define “close” to be the least distance to an obstacle at which the robot can still be on a collision-free path, given a constant velocity.
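One way to make this definition concrete is a purely geometric sketch. It assumes the robot drives at constant speed and avoids the obstacle by turning along its tightest circular arc, limited by a maximum yaw rate. The yaw-rate limit, the clearance parameter and the function name are hypothetical; this is not the model used in the proximity experiments.

```python
import math

def close_distance(speed, max_yaw_rate, clearance):
    """Least forward distance at which a turn still clears the obstacle.

    speed        -- constant forward speed (m/s)
    max_yaw_rate -- assumed turning limit (rad/s), hypothetical parameter
    clearance    -- lateral offset needed to pass the obstacle (m),
                    e.g. obstacle half-width plus robot radius
    """
    radius = speed / max_yaw_rate  # tightest circular arc at this speed
    if clearance >= radius:
        # After a quarter turn the robot is moving sideways,
        # having advanced by exactly one radius.
        return radius
    # On the arc: (radius - clearance)^2 + d^2 = radius^2
    return math.sqrt(2.0 * radius * clearance - clearance ** 2)

d_slow = close_distance(speed=1.0, max_yaw_rate=1.0, clearance=0.5)
d_fast = close_distance(speed=2.0, max_yaw_rate=1.0, clearance=0.5)
```

Under these assumptions the distance grows with both speed and required clearance, which is at least consistent with the direction of the stated hypothesis.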

Proximity Experiments

Results

Mixed Initiative System for Survivable Path Planning in Cluttered Environments


Relating children’s automatically detected facial expressions to their behavior in RoboTutor

Authors

Mayank Saxena, Delhi Technological University

Rohith Krishnan Pillai

Advisor

Jack Mostow, Robotics Institute/Human Computer Interaction Institute, Carnegie Mellon University

Category

Computer Science

Abstract Can student behavior be anticipated in real time so that an intelligent tutor system can adapt its content to keep the student engaged? Current methods detect students’ affective states at the end of a learning session to determine their engagement levels, but this provides only a one-dimensional input for intervention policies and tutor responses. If students’ imminent behavioral actions could instead be anticipated from their affective states in real time, the tutor could apply much more complex intervention policies. This in turn would help keep the student engaged in an activity, thereby increasing tutor efficiency as well as student engagement levels. In this paper we first explore whether any links exist between a student’s affective states and his/her imminent behavioral action in RoboTutor, an intelligent tutor system for children to learn math, reading and writing. We then exploit our findings to develop a real-time student behavior prediction module.



Approach:

[Figure: analysis pipeline, steps 1 to 8. Recoverable labels follow.]

Screen record video, processed with OpenFace. Provides:
● Timestamp
● Action Units
● Educationally relevant emotions

Log files from student’s session (RoboTutor DB). Provides:
● Timestamp
● Student performance (Bubble Pop)
● Use of the back button
● Successful completion

Sample log file entry:

{"RT_log_version":"1.0.1","RT_log_data":[{"class":"VERBOSE","tag":"RTag","type":"TimeStamp","datetime":"07/04/2017 16:50:21","time":"1499176221447","data":{"RoboTutor":"SessionStart "}}...

W-value

p-value

Wilcoxon rank sum test (95% confidence): W-value 29136000

p-value

Affective State Surprise is a good indicator for a student clicking the back button during activity

Correctness in Bubble Pop activity is positively correlated only with Affective State Flow

4

Join AU files to log files by Timestamp.
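A join of this kind could look like the following minimal sketch. It assumes both sources have already been converted to epoch milliseconds (as in the "time" field of the sample log entry) and that each log event should receive the most recent action-unit frame within a tolerance. The record layout, the "BackButton" event name, the AU values and the 500 ms tolerance are hypothetical, not taken from the project.

```python
from bisect import bisect_right

def join_by_timestamp(au_frames, log_events, max_gap_ms=500):
    """Attach to each log event the latest AU frame at or before it.

    au_frames  -- list of (epoch_ms, {au_name: intensity}), sorted by time
    log_events -- list of (epoch_ms, action_name)
    max_gap_ms -- hypothetical tolerance; older frames are not matched
    """
    frame_times = [t for t, _ in au_frames]
    joined = []
    for t, action in log_events:
        i = bisect_right(frame_times, t) - 1  # last frame with time <= t
        if i >= 0 and t - frame_times[i] <= max_gap_ms:
            joined.append((t, action, au_frames[i][1]))
        else:
            joined.append((t, action, None))  # no frame close enough
    return joined

# Illustrative data only.
au_frames = [
    (1499176221400, {"AU12": 0.2}),
    (1499176221900, {"AU12": 1.4}),
]
log_events = [
    (1499176221447, "SessionStart"),
    (1499176225000, "BackButton"),
]
rows = join_by_timestamp(au_frames, log_events)
```

Matching only frames at or before the event avoids using facial data recorded after the action, which matters if the joined rows are later used to predict that action.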

8 Data visualization

7

Map AUs to Affective States: Delight, Surprise, Frustration, Confusion, Boredom, Neutral

Note:
● Sample size (N) = 17 sessions
● Assumes each session is a different student
● AU to emotion mapping is done using correlations from existing literature.

1079900000 2.20E-16

Integrate existing affective state data analysis pipeline to Statistical Probe of Tutoring system (SPOT) for continued analysis of incoming data.

Thanks to Dr. Amy Ogan and Dr. Jeni Lazlo for their valuable input on improvements to our research. Thanks to Dr. Tadas Baltrušaitis for his clarifications and suggestions on using OpenFace. Thanks to Rhea Jain for her help with the analysis, AU to emotion conversions and plotting. We also thank Yugandhar Pavan Devarapalli for helping us visualize the data.

Acknowledgements

Developing a background service to communicate the next possible action or behavior of the student to RoboTutor. This is currently in development.

● A correlation exists between the students’ in-app behavioral actions and the affective states they exhibit.
● Using many such correlations, we can build a real-time prediction model for the students’ in-app behavioral actions.

Conclusions and Future Work:

0.03796

● Flow (neutral) is a good indicator of the learner being engaged while interacting with the tutor system.
● It can be considered as one of the many predictors for ‘good’ performance in Bubble Pop activities.

● Although boredom is more frequent, only neutral, surprise and delight were statistically significant.
● Significance values shown above are for affective state Surprise.

● Statistically significant positive correlation exists between the state delight and successful completions.
● Affective state delight is a possible predictor for the successful completion of activity.

Wilcoxon rank sum test (95% confidence)
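For readers unfamiliar with the test, the following is a minimal sketch of a two-sided Wilcoxon rank sum test using the normal approximation with a tie correction and no continuity correction. The statistic returned is the rank sum of the first sample minus n1(n1+1)/2, which is one common convention for W; whether the poster's W-values follow it is an assumption. The sample data are illustrative and unrelated to the study.

```python
import math

def rank_sum_test(x, y):
    """Return (W, two-sided p-value) for samples x and y."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(combined)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average rank shared by tied values
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    w = rank_sum - n1 * (n1 + 1) / 2
    mean_w = n1 * n2 / 2
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return w, p

# Illustrative values only.
w, p = rank_sum_test([1, 2, 3], [4, 5, 6])
```

The normal approximation is only reasonable for larger samples; for small samples such as this toy example an exact test would normally be preferred.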

Successful completions are more likely when students are in Affective State Delight

Results:

Facial action units (AU)

3

● Can we apply a novel approach of affective state estimation to children belonging to different demographics using existing open source tools?
● Given the additional information of the students’ affective state, what can we infer about the student, the different activities and the subject categories?
● Is it possible to predict the students’ next action based on their affective states? For example, can we make a real-time system to detect when the student is about to tap the back button in the middle of an activity?

Research questions included:

Goal: To study whether there are any interesting links between student behavior and affective states in an intelligent tutor system, and whether these affective states can be used to predict a student’s behavior in real time.

Mayank Saxena, Rohith Krishnan Pillai, Jack Mostow

Relating Children’s Automatically Detected Facial Expressions to their Behavior in RoboTutor


Deep learning and pattern analysis for crack detection

Author

Fatma Tlili

Advisor

Gianni Di Caro

Category

Computer Science

Abstract To maintain infrastructure such as buildings, tunnels, and bridges over time, efficient and accurate crack detection is a very important inspection step. Traditional methods require experienced workers to manually examine these structures, which is challenging and time-consuming because of their large surface area. Recent progress in Computer Vision has led to multiple approaches to defect detection. These range from image processing techniques (IPTs) such as Canny edges and Sobel filters to Deep Learning (DL) techniques such as Multilayer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs). Nevertheless, varying real-world conditions can be very challenging for IPTs, and DL techniques require large amounts of data for decent performance. To overcome these limitations, we propose a crack detection system that combines image processing with CNNs to produce accurate results that generalize to real-life situations with limited data. The main steps of our system are image preprocessing, decomposition into patches, producing a probability map of the likelihood of a crack in each patch, and stitching the patches to reconstruct a crack path.



Developed a semi-automatic annotation system that divides the image into crack and non-crack patches:
•  1856 crack
•  2000 non-crack
•  Dimensions: 60 x 60
•  Non-overlapping

Annotation:

A web crawler was used to gather 200 images of cracks from the internet

Data Collection

Dataset

Maintaining infrastructure such as buildings, tunnels, and bridges over time requires regular inspection for cracks and defects to prevent damage. While different approaches have tackled this problem, the challenge remains to generalize the solutions to real-life conditions while using the limited available data. Our system aims to overcome these limitations with a hybrid model that uses Statistical Pattern Recognition as well as Deep Learning techniques to automate the inspection of concrete surfaces for cracks and defects.

Introduction

Fatma Tlili, Gianni Di Caro|Carnegie Mellon University

Convolution 1 (Kernel: 5x5, Maps: 20), Max Pooling (Kernel: 2x2), Convolution 2 (Kernel: 3x3, Maps: 30), Max Pooling (Kernel: 2x2), Fully Connected Layer
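As a worked check of these layer sizes, the sketch below traces a 60 x 60 patch through the listed layers. It assumes the 5x5 kernel with 20 maps belongs to the first convolution and the 3x3 kernel with 30 maps to the second, that convolutions use stride 1 without padding, and that pooling uses stride 2. None of these settings is stated on the poster, so the resulting numbers are illustrative.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width of a convolution or pooling layer on a square input."""
    return (size + 2 * padding - kernel) // stride + 1

size = 60                           # input patch is 60 x 60
size = conv_out(size, 5)            # Convolution 1, 5x5 kernel -> 56
size = conv_out(size, 2, stride=2)  # Max Pooling, 2x2 kernel   -> 28
size = conv_out(size, 3)            # Convolution 2, 3x3 kernel -> 26
size = conv_out(size, 2, stride=2)  # Max Pooling, 2x2 kernel   -> 13
flattened = size * size * 30        # 30 maps feed the fully connected layer
```

Under these assumptions the fully connected layer would receive 13 x 13 x 30 = 5070 inputs per patch.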

Original Image

Threshold Approach

Max Path Approach

Experimental classification study
To decide on the neural network architecture we performed an experimental study. We trained 9 Convolutional Neural Network (CNN) models on the raw data as well as on Histogram of Oriented Gradients (HoG) features extracted from the data, that is, the magnitude and orientation of the change of intensity at each pixel.

Architectures and results
•  HoG + CNN: Accuracy: 56%
•  CNN on the raw patches: Accuracy: 82.26%

Crack Reconstruction Using Spatial Correlation

Input Image

The preprocessed patches are then passed through a multi-layer neural network, which classifies each patch as crack or non-crack and thereby builds a probability map.
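The decomposition and probability-map steps can be sketched in a few lines of Python. The sketch assumes a grayscale image stored as a list of rows and drops partial patches at the edges. The classifier is a deliberately crude placeholder (share of dark pixels) standing in for the trained network, and all function names are hypothetical.

```python
def split_into_patches(image, size=60):
    """Yield (row, col, patch) for non-overlapping size x size patches.

    Partial patches at the right and bottom edges are dropped.
    """
    rows, cols = len(image), len(image[0])
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            patch = [row[c:c + size] for row in image[r:r + size]]
            yield r // size, c // size, patch

def probability_map(image, classify, size=60):
    """Build a grid holding one crack probability per patch."""
    n_rows, n_cols = len(image) // size, len(image[0]) // size
    grid = [[0.0] * n_cols for _ in range(n_rows)]
    for i, j, patch in split_into_patches(image, size):
        grid[i][j] = classify(patch)
    return grid

def dark_fraction(patch, threshold=80):
    """Placeholder classifier: share of dark pixels (NOT a trained CNN)."""
    pixels = [p for row in patch for p in row]
    return sum(p < threshold for p in pixels) / len(pixels)

# Synthetic 120 x 180 light surface with one fully dark patch.
image = [[200] * 180 for _ in range(120)]
for r in range(60):
    for c in range(60, 120):
        image[r][c] = 30
grid = probability_map(image, dark_fraction)
```

Passing the classifier in as a function keeps the patch logic independent of the model, so a real network's prediction call could be substituted without changing the rest.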

Approach

t=2

t=3

Our approach combines the two main state-of-the-art approaches:
•  Statistical Pattern Recognition
•  Convolutional Neural Networks

This solution is expected to:
•  Generalize to real-life situations
•  Minimize the requirement for gathering a large dataset
•  Combine spatial and temporal information

Research contribution

Exploits the temporal dimension as the frames get closer to the surface being examined. This will be a validation stage that assesses the image over multiple frames, which will improve the accuracy of our detector.

t=1

Implementing and demonstrating path reconstruction using temporal correlation in video.

Future work

Our Approach: Choose the highest-probability patches while also taking into account the direction of each patch and its proximity to the clear, high-probability crack patches.
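One possible reading of a max-path reconstruction is a dynamic program over the patch probability grid. The sketch below assumes the crack runs roughly left to right and may shift by at most one patch row per column, which stands in for the proximity and direction constraints. This is an illustrative simplification, not the authors' algorithm, and the grid values are made up.

```python
def max_probability_path(grid):
    """Return one row index per column tracing the highest-scoring path.

    Moves go left to right and may shift by at most one row per column.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    score = [[grid[r][0]] + [0.0] * (n_cols - 1) for r in range(n_rows)]
    back = [[0] * n_cols for _ in range(n_rows)]
    for c in range(1, n_cols):
        for r in range(n_rows):
            prev_rows = range(max(0, r - 1), min(n_rows, r + 2))
            best = max(prev_rows, key=lambda p: score[p][c - 1])
            score[r][c] = grid[r][c] + score[best][c - 1]
            back[r][c] = best
    # Backtrack from the best-scoring cell in the last column.
    r = max(range(n_rows), key=lambda p: score[p][n_cols - 1])
    path = [r]
    for c in range(n_cols - 1, 0, -1):
        r = back[r][c]
        path.append(r)
    return path[::-1]

# Illustrative probability grid (rows x columns of patches).
grid = [
    [0.9, 0.2, 0.1, 0.1],
    [0.1, 0.8, 0.3, 0.2],
    [0.1, 0.1, 0.4, 0.9],
]
path = max_probability_path(grid)
```

Compared with a plain threshold, a path search of this kind can carry the crack through patches whose individual probability is modest (such as 0.4 here) when their neighbors are strong.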

Deep Learning and Pattern Analysis for Crack Detection


Doctor-patient communication in Qatar

Author

Ali Abbas

Advisor

Selma Limam Mansar

Category

Information Systems

Abstract How can technology help bridge the communication gap between health care providers and health care receivers in Qatar? The biggest problem is that the majority of doctors in Qatar, and in the Middle East generally, speak only Arabic or English, while the majority of patients speak only Hindi, Urdu or Nepali (a large share of Qatar’s population being migrant workers). The primary objective is to find a technological solution that helps patients communicate and convey their symptoms to doctors. This research looks at past solutions to communication problems in several contexts, especially in health care, and at the extent to which they could be implemented successfully in Qatar. It further builds on a previous project on this problem, with the solution this time being a web app that uses only visuals, such as graphic symbols and cartoon illustrations, as the medium of communication for patients. The solution is then tested in hospital scenario settings with migrant workers and doctors/medical students as volunteers, and assessed on its ease of use and effectiveness in such scenarios.



DOCTOR-PATIENT COMMUNICATION IN QATAR Ali Abbas, Selma Limam Mansar

Problem Statement MIGRANT WORKERS IN QATAR STRUGGLE TO COMMUNICATE THEIR SYMPTOMS TO DOCTORS WITHOUT ANY ASSISTANCE. Qatar is an extremely high density multicultural setting.

Top 3 languages spoken by patients in Qatar: 1. Hindi 2. Urdu 3. Nepali

Patients: • Can’t communicate medical symptoms • Afraid of medical procedures they don’t understand

Doctors: • Only speak English or Arabic • Can’t treat a patient they can’t understand • Worried about misdiagnosis

Current Solutions

Incidental Interpreters
Description: Friends, family or staff available who can speak the same language as the patient
Used by: Hamad Hospital
Flaw: Patient privacy at risk, errors in interpretation, etc.

Medical Interpreters
Description: Volunteer citizens who can speak the language and are willing to help. If not physically available, contacted by video call
Used by: Hamad Hospital
Flaw: Patient privacy at risk, not always available

Medical Visual Language Translator
Description: Paper leaflet containing graphic symbols and cartoon illustrations of medical symptoms
Used by: Medical students in Weill-Cornell Medical College
Flaw: Inconvenient, time consuming, made for doctors’ use only

Cartoon illustrations
Description: Graphic symbols & cartoons being used to display safety messages to workers
Used by: Construction companies
Flaw: Not applicable for symptoms

Resources for the deaf community
Description: Building a voluntary interpreter network and publishing advisory pamphlets
Used by: Japanese government
Flaw: Inconvenient or not always available

Medical text translation applications
Description: Predetermined list of commonly used phrases in medical contexts, available to use from patients’ phones. One is being developed by QCRI as a research project
Used by: Users all around the Western world
Flaw: None available for Hindi, Urdu, Nepali and Arabic

Research Question:

HOW CAN TECHNOLOGY HELP BRIDGE THE COMMUNICATION GAP BETWEEN DOCTORS & PATIENTS?

Solution:

Develop a web application that uses only graphic symbols and cartoon illustrations as the medium of communication for patients. Nurses can provide tablets to patients in waiting rooms so that they can familiarize themselves with the app and start creating a visual report of symptoms to show to the doctor.

Research Design & Methodology:

1) Patient-centered design: We adopted the principles of user-centered design (UCD) and worked closely with faculty and doctors from Weill-Cornell Medical College Qatar (WCMC-Q) to understand the medical interview process, so that the experience lets patients easily provide the information required for a successful diagnosis.

2) Interaction design: We conducted tests with test users to make the app as intuitive as possible by designing it to fit patients’ mental model. A dashboard design gives simple access to options, which also work as hints for the patient. The result is simpler than existing solutions such as MediBabble & Canopy Speak, which have confusing application structures.

3) Technical design: Developed as a web application accessible through hospital- and clinic-owned tablet devices. Tablets allow patients to view all possible options on one screen, and the virtual environment allows us to control and simplify their experience with universal features understood by anyone proficient with technology. For example, a calendar applet allows patients to select how long they have been experiencing their symptoms.

This is the initial wireframe of the landing page of the proposed application.

Content of app is populated with cartoon illustrations from the Medical Visual Language Translator (shown right) which has graphics tried and tested with the Qatari patient population and used by WCMC-Q doctors and medical students.

Research Experiment: The solution needs to be tested and validated by an in-field test of the application. This test requires two participant groups: doctors (played by medical students from Weill-Cornell) and patients (played by cleaning workers from Al Mukhtar contractors). The test is a simulation of a doctor’s visit, re-enacted by members of the two participant groups, with variations in the resource the patient is given to use and in the scenario the patient is required to re-enact.

Scenario formation: There are four different scenarios of a patient’s interaction with a doctor, complete with all the details a patient can provide about his condition and all the possible diagnoses that the doctor can conclude from them. The scenarios were extracted from GeekyMedics, a community-run website for health care providers with several sample scenarios for practice purposes, and then slightly modified for the Qatari context.

Assessment criteria: Three sets of criteria will be measured to judge the effectiveness of the solution:
1- Successful diagnosis (yes/no)
2- Time taken
3- Symptoms communicated (%)

Hypothesis testing: There are three variations of the test:
a) No resource (control experiment)
b) Visual verbal medical translator booklet
c) Technological solution – web app on iPad

Two hypotheses:
1- Tech solution is better than visual booklet
2- Tech solution is better than no resource

Distribution: The participants have been distributed to provide the fairest and most accurate results. Each doctor will see 4 patients: at least one patient with each variation of the test (no resource, booklet, iPad) and a fourth patient with one of the variations repeated. Each patient will present a different scenario to the doctor. Each patient will see only one doctor and will be assigned only one scenario and only one form of communication. Each scenario will be tested 3 times with each of the 3 methods of communication (app, pamphlet and no resource).
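The arithmetic behind this distribution can be checked with a short sketch: 4 scenarios x 3 methods x 3 repetitions gives 36 patient sessions, which matches 9 doctors seeing 4 patients each. The rotation rule used below to assign methods is only one possible balanced assignment and is an assumption, as are the placeholder scenario names.

```python
from collections import Counter

SCENARIOS = ["scenario 1", "scenario 2", "scenario 3", "scenario 4"]
METHODS = ["no resource", "booklet", "web app"]
N_DOCTORS = 9

def build_schedule():
    """Return a list of (doctor, scenario, method), one entry per patient."""
    schedule = []
    for doctor in range(N_DOCTORS):
        for s, scenario in enumerate(SCENARIOS):
            # Rotate methods so every doctor uses all three at least once.
            method = METHODS[(doctor + s) % len(METHODS)]
            schedule.append((doctor, scenario, method))
    return schedule

schedule = build_schedule()
pair_counts = Counter((scenario, method) for _, scenario, method in schedule)
```

With this rotation each doctor sees four different scenarios, every method appears at least once per doctor, and each scenario and method pair occurs exactly three times.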

Expected Results:

Due to limitations on time and resources, and to the availability and scheduling conflicts of both participant groups, the research test could not be done as extensively as planned, with 9 doctors and 36 patients, as the initial test of the application.

However, it is expected that the web app is the best solution compared to the other variations, for the following reasons:
1. The visual booklet is slow to use, not designed for patients and rarely used by doctors
2. Having no resource or depending on gestures leads to slower and less accurate diagnoses
3. The web app solution is fast, easy to navigate and more accurate than the other options

Future Work:

• Continue testing of the web app solution, working with Hamad Medical Corporation (HMC)
• Develop as a mobile app and run a pilot study in a labor city clinic
• Release the mobile app on the Apple App Store and Google Play Store
• Have the mobile app pre-installed on government-owned tablets in clinics and hospitals for large-scale implementation and use of the solution, working towards the Qatar 2030 vision for health care accessibility
• Develop integration of speech-to-text translation systems for non-medical translations to provide a complete, seamless translated experience, working with Qatar Computing Research Institute



Trust in commerce through Instagram in Qatar

Author

Maryam Al-Naemi

Advisor

Daniel Phelps

Category

Information Systems

Abstract Instagram is an internet-based application that allows people to share their pictures or videos, either publicly or privately. The application has had a great impact on various fields, depending on how it has been used. One of the biggest areas that has been positively affected is business. Every successful business has a platform where it shares its views and reaches its customers. This helps it retain customers and market share, as customers are always updated on new products. Countries such as Qatar have also invested in social media, and these platforms are used for a large part of the business in the country. Organizations use the platform to advertise their products as well as to carry out online payment transactions. Online shopping and payment transactions come with their own challenges. One of them is trust, especially where electronic payments are used for transactions between sellers and buyers. Before customers adopt a new technology, they must be assured of its usefulness, as explained in the technology acceptance model. International online shopping stores do not promote the diversity that the Qatari people have in terms of culture. As such, people are still skeptical about embracing it fully, although they use social media to interact with others.



Conclusion


Data analysis- The data collected and stored in the Carnegie Mellon Qatar (CMQ) database was then statistically analyzed so as to establish possible correlations between the two variables.

Procedure- survey questionnaires were sent via email to Carnegie Mellon Qatar (CMQ) students.

Sampling- a sample of Qatar Instagram users was used to quantitatively represent the larger population while collecting data for analysis.

Ethics- ethical standards were taken into consideration, given that the study involved the quantitative collection and analysis of responses from various Qatar Instagram users.

Instrument- a survey was conducted online with the aid of questionnaires.

Thus, in order to promote the use of Instagram, online vendors should consider building trust with potential and existing customers by providing accurate information about their products.

A significant number of customers considered learning and developing buying skills on Instagram stressful. Therefore, in order to improve its use, online vendors should consider building trust with potential and existing customers by providing accurate information about their products.

Although some variables, such as familiarity, showed a direct effect on other variables, they did not show a significant influence on its use. Specifically, the intended use and perceived ease of use are most important in determining the use of Instagram in business processes in Qatar.


Design-A broader outlook of Instagram i.e., its application as well as associated purchase intentions.

Methodology Data was first collected, recorded, and then later on statistically analyzed before a conclusion was finally made. The essential subsections covered under the methodology included:

Introduction

Analysis

[Figure: structural model. Recoverable labels: structural assurance; familiarity with e-vendor; situational analysis; perceived usefulness (PU); perceived ease of use; intended use; trust.]

The bootstrapping analysis showed more interrelationships between the behaviors. It showed the four relationships that had a significant effect on one another:

The SmartPLS algorithm was used to show the relationships between the variables. For instance, with a coefficient of 0.348, it was clear that the intended use of Instagram had a direct effect on the perceived ease of use of the medium.
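To give a feel for what a bootstrapping analysis does with a coefficient like this, here is a minimal sketch. It makes a strong simplification: with a single predictor, a standardized path coefficient reduces to a Pearson correlation, so the sketch bootstraps a correlation rather than running a PLS path model. The survey scores are made up for illustration, and the resample count and seed are arbitrary.

```python
import random
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation, or None if a sample has no variance."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5

def bootstrap_ci(xs, ys, n_resamples=2000, seed=0):
    """Return (estimate, (lower, upper)) with a 95% percentile interval."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # respondents, with replacement
        r = pearson([xs[i] for i in idx], [ys[i] for i in idx])
        if r is not None:
            estimates.append(r)
    estimates.sort()
    lower = estimates[int(0.025 * len(estimates))]
    upper = estimates[int(0.975 * len(estimates)) - 1]
    return pearson(xs, ys), (lower, upper)

# Hypothetical 5-point survey scores, not data from the study.
intended_use = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 2, 4]
ease_of_use = [2, 1, 3, 3, 2, 4, 3, 5, 4, 5, 3, 3]
estimate, (lower, upper) = bootstrap_ci(intended_use, ease_of_use)
```

A relationship is usually read as significant when the resampled interval excludes zero. Tools such as SmartPLS report this through bootstrap t-values or confidence intervals for each path.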

Perceived ease of use (PEOU)

Intended use

The structural behaviors were used to determine their impact on the use of Instagram in Qatar. The structural behaviors included:

Advisor: Daniel Phelps

Social media is gaining noticeable traction as far as business is concerned. Based on available statistical data, 94% of successful businesses with a well-established marketing department have used social media as a platform for marketing at some point, and 60% of global marketers dedicate a good amount of their time to developing their digital social media marketing platform. With regard to age, people aged 20-29 (43% of the population) usually spend at least 10 hours a week on social media. The success of social media has been attributed to its capability to allow people from diverse cultures to interact and share ideas on business.

Principal investigator: Maryam Al-Naemi

How much does a user’s trust in Instagram as an electronic vendor affect their intention to buy, for English-speaking respondents in Qatar?

Trust in commerce through Instagram in Qatar


Communicate through your eyes: A study of natural interactions with a digital cultural artifact

Author

Latifa Khalid Al-Thani

Advisor

Divakaran Liginlal

Category

Information Systems

Abstract In the context of digital heritage, the purpose of this study is to investigate how we can recreate, in virtual space, the natural interaction experience of visitors with artifacts in a museum. Specifically, the research explores eye-gaze interactions and effective solutions to the related Midas Touch problem. In order to do this, we conducted observation studies of visitors interacting with paintings and photos in museums in Qatar. Inspired by the results, we created a digital cultural artifact, Al-Lulwa, that contains a collection of photos, songs, and videos about the history of pearl diving in Qatar. Experiments with Al-Lulwa confirm that eye-gaze feels more natural to users than mouse interaction. The data gathered from questionnaires and interviews provide insights into the design of Al-Lulwa and ways of controlling dwelling time and fixation to address the Midas Touch problem.



COMMUNICATE THROUGH YOUR EYES A Study of Natural Interactions with a Digital Cultural Artifact Author: Latifa Khalid Al-Thani

Advisor: Divakaran Liginlal

Natural Interactions in Virtual Space

In the context of digital heritage, the purpose of this study is to investigate how we can recreate, in virtual space, the natural interaction experience of visitors with artifacts in a museum. Specifically, the research explores eye-gaze interactions and effective solutions to the related Midas Touch problem. In order to do this, we conducted observation studies of visitors interacting with paintings and photos in museums in Qatar. Inspired by the results, we created a digital cultural artifact, Al-Lulwa, that contains a collection of photos, songs, and videos about the history of pearl diving in Qatar. Experiments with Al-Lulwa confirm that eye-gaze feels more natural to users than mouse interaction. The data gathered from questionnaires and interviews provide insights into the design of Al-Lulwa and ways of controlling dwelling time and fixation to address the Midas Touch problem.

The Al-Lulwa Experience

Al-Lulwa was designed to recreate the natural interaction experience of visitors in a museum through the use of eye-gaze interaction. Users experience a virtual gallery consisting of photos, songs and audio files, which unfolds the journey of pearl diving in Qatar. Al-Lulwa executes commands based on the user’s eye-gaze interactions. For example, if a user wants to move through the gallery, they look at the left or right edge of the gallery walls. If a user wants to listen to the audio caption to learn more about a photo, they focus their attention on the headphone icon, and an audio file plays. The artifact also contains an underwater scene where users explore hidden pearls on the sea bed, triggering videos of the story behind each pearl.
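Dwell time is a common way to address the Midas Touch problem: an element is activated only after the gaze has rested on it for a minimum duration, so a passing glance triggers nothing. The sketch below illustrates the idea on a stream of gaze samples. The 800 ms threshold, the sample format and the target names are hypothetical and are not taken from Al-Lulwa.

```python
def dwell_activations(samples, dwell_ms=800):
    """Return (time_ms, target) activations from a stream of gaze samples.

    samples  -- iterable of (time_ms, target), where target is the element
                under the gaze, or None when nothing is being looked at
    dwell_ms -- hypothetical dwell threshold; tune per interface
    """
    activations = []
    current, since, fired = None, None, False
    for t, target in samples:
        if target != current:
            # Gaze moved to a different element: restart the dwell timer.
            current, since, fired = target, t, False
        elif target is not None and not fired and t - since >= dwell_ms:
            activations.append((t, target))
            fired = True  # fire only once per continuous fixation
    return activations

# Illustrative gaze trace: a glance at a photo, then a long look at the icon.
samples = [
    (0, "photo"), (200, "photo"),
    (400, "headphone"), (800, "headphone"),
    (1200, "headphone"), (1600, "headphone"),
    (1800, None), (2000, "right_edge"), (2400, "photo"),
]
events = dwell_activations(samples)
```

Here only the sustained look at the headphone icon produces an activation. A longer threshold reduces accidental triggers at the cost of slower interaction, which is the trade-off that controlling dwell time has to balance.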

Research Methodology

Phase 1: Users explore the virtual gallery in Al-Lulwa
Phase 2: Users explore the underwater scene in Al-Lulwa

Group 1 interacts using eye gaze only and group 2 interacts using mouse click only. Interaction is reversed in the second phase.

Data on gaze interaction patterns and click stream patterns are collected during both phases.

A PrEmo questionnaire measuring the participant’s emotional response to the experience is distributed once a phase is completed.

Participants are administered a questionnaire that yields insights into user preference between the interaction modes (mouse click vs eye gaze) and are interviewed about their overall experience with the digital cultural artifact.

Results

[Charts of questionnaire responses: "Overall, how easy was it to navigate Al-Lulwa?"; "Which of the interactions did you like the most?"; "How natural was your interactive experience with Al-Lulwa?"]

Conclusion The results, further validated by the analysis of the emotional response data, confirm that users prefer eye-gaze over mouse interaction as eye-gaze feels more natural. Participants found the experience engaging and felt that eye-gaze was easy to use. We hope to expand our research and explore other methods of recreating the natural interaction experience of visitors with artifacts in museums.


Parents of children with autism and technology use by the children

Author

Layan Yousef Azem

Advisor

Daniel Phelps

Category

Information Systems

Abstract This study aims to measure whether different technologies and the level of their use by a child with autism affect the perceived stress level of the child’s caregivers. A survey was designed and developed, then sent to participants through autism or child development schools and centers. Of the 32 responses we received, 97% listed general applications (e.g. YouTube or Talking Tom). It is speculated that such applications were listed because the children were placed at the relatively less severe end of the autism spectrum. This resulted from the schools and centers distributing the instrument only to those parents they predicted would complete it, a prediction that was correlated with the severity of the child’s location on the autism spectrum. As such, the technology studied in this research is general rather than assistive. The conclusion from this sample was that technology has not been an indicator of stress; however, this is most likely due to the limitations of the survey responses: many parents were not willing to participate, mostly mothers participated, and only parents of children with less severe cases responded. According to past research, technology is significant for parents, especially when severe autism cases are involved, as communication is important and technology can facilitate it.



Parents of Children with Autism & Technology Use by the Children

Layan Yousef Azem, Information Systems ‘18, Advised by Professor Daniel Phelps

Help Them, Help Us

‘I want people to know that my son doesn’t cry in the mall because I did not raise him well, but because he is autistic.’

Autism

A developmental disorder of variable severity (spectrum) that is characterized by difficulty in social interaction & speech, nonverbal communication, & by restricted or repetitive patterns of thought and behavior. There are different types of autism, caused by different combinations of genetic and environmental influences. Common symptoms include a learning disability, delayed speech development, & not responding to their name or avoiding eye contact.

Stress: ‘a state of mental or emotional strain or tension resulting from adverse or demanding circumstances’ (Oxford Dictionary, 2018)

There has been a long debate on whether technology enhances communication or creates a “digital bubble” for autistic children. The general conclusion has been that technology has a positive effect on the child and his or her family. It does not replace face-to-face communication but rather facilitates it (Parsons, S., Yuill, N., Brosnan, M., & Good, J. 2015). Mothers of children with autism experience higher levels of stress than mothers of typically developing children or of children with other disabilities (Baker et al., 2002; Dumas et al., 1991...).

Does a higher level of technology use by a child with autism decrease the perceived stress level of the parent or caregiver?

Method & Survey Results

A survey instrument was designed & developed to examine the types of technology used by children with autism, how much technology the child uses, and the caregiver’s perceived stress level, measured with the Perceived Stress Scale, in order to assess whether technology affects the family (mainly its stress level). 97% of respondents listed general applications (e.g. YouTube or Talking Tom). We speculate that such applications were listed because the respondents’ children are placed toward the less severe end of the autism spectrum. This resulted from the schools and centers distributing the instrument only to those parents they predicted would complete it, a prediction that was correlated with the severity of the child’s placement on the autism spectrum. As such, the technology studied in this research is general rather than assistive.

In this study:
Low Technology Use = 0 or 1 technologies
High Technology Use = 2 or more technologies

Age group of sample (children):
0 to 3: 16%
4 to 7: 28%
8 to 11: 41%
12 to 15: 12%
16+: 3%

Less than 5% of the parents of children with autism responded to the survey

Further Study Results

Bayesian analysis was applied against two groups with the prior defined by the means and standard deviations for groups of individuals who are parents of children with autism of high and low technology use.

Ultimately, there was no significant difference found between the two groups (mean = -2.1, 95% HDI: 78.5% < 0 < 21.5%). In this study, the technology has not been an indicator for stress. This may be the result of the limitations and resulting small effect size.

Spread Awareness

If we are not willing to participate in research, where will we get data about this region from, & eventually help ourselves?
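The grouping rule stated on the poster (low use = 0 or 1 technologies, high use = 2 or more) can be illustrated with a minimal sketch. It only splits respondents into the two groups and compares mean Perceived Stress Scale scores; it does not reproduce the Bayesian analysis. The function names and the sample responses are hypothetical and invented for illustration.

```python
from statistics import mean


def technology_group(n_technologies: int) -> str:
    """Poster's definition: low = 0 or 1 technologies, high = 2 or more."""
    return "low" if n_technologies <= 1 else "high"


def mean_stress_by_group(responses):
    """responses: iterable of (n_technologies, pss_score) pairs."""
    groups = {"low": [], "high": []}
    for n_tech, pss in responses:
        groups[technology_group(n_tech)].append(pss)
    # Skip empty groups so mean() is never called on an empty list.
    return {name: mean(scores) for name, scores in groups.items() if scores}


# Hypothetical responses (not data from the study).
sample = [(0, 22), (1, 18), (2, 20), (3, 16), (5, 21)]
means = mean_stress_by_group(sample)
difference = means["high"] - means["low"]
print(means, difference)
```

A negative difference would mean lower average perceived stress in the high-use group; whether such a difference is credible is what the Bayesian comparison in the study addresses.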


RAES: Road accidents and emergency services in the United States

Authors

Muhammad Ali Bashir
Umair Qazi

Advisor

Chadi Aoun

Category

Information Systems

Abstract Imagine a scenario in which somebody close to you is involved in a car crash while driving on the highway. What will you do if you have no access to emergency medical services (EMS) nearby, or if these services take too long to take you to the nearest hospital? Globally, 1.3 million people die each year from road accidents and an additional 20-50 million are injured or disabled as a result (Annual Global Road Crash Statistics, 2017). According to the US National Highway Traffic Safety Administration, 37,461 people were killed in 34,463 road accidents, averaging 102 deaths per day (NHTSA, 2016). The study has enabled us to draw conclusions that will help governments at both the city and state level make informed decisions about the location of EMS services in the United States of America.



Road Accidents & Emergency Services in the United States

RAES

Muhammad Ali Bashir and Umair Qazi - Chadi Aoun

Introduction

Imagine a scenario in which somebody close to you is involved in a car crash while driving on the highway. What will you do if you have no access to emergency medical services (EMS) nearby, or if these services take too long to take you to the nearest hospital? Globally, 1.3 million people die each year from road accidents and an additional 20-50 million are injured or disabled as a result (Annual Global Road Crash Statistics, 2017). According to the US National Highway Traffic Safety Administration, 37,461 people were killed in 34,463 road accidents, averaging 102 deaths per day (NHTSA, 2016).

This is primarily because few or no emergency services are available on highways, or because people in general are not aware of such services. Also, people in suburban areas are not aware of the closest EMS services in case of a car accident. This is an alarming issue which should be addressed by the Government and other City Planning Authorities to provide better infrastructure for road networks and healthcare to all those who require it.

Methodology

Data sets/source: Emergency medical services play an important role in saving lives by providing early intervention at the scene of an accident. Consequently, the location of the EMS in proximity to accident sites is of paramount importance.

Units of Analysis: Location & EMS: We will analyze data about road accidents in the US along with the accident location. This will allow us to compare the states which have the highest and lowest number of accidents as well as EMS services.

Scope of Analysis: State & City based comparative study: For emergency care in the case of road accidents, location is of utmost importance in the short term. We analyze accident locations from the FARS Database in the US. The data is analyzed at a state and city level.

Relevance & Impact: Policy development & Emergency Response: The study has enabled us to draw some conclusions that will help governments at both the city and state level make informed decisions about the location of EMS services in the United States of America.

Sources: National Highway Traffic Safety Administration (2014), United States Census Bureau (2015), Homeland Infrastructure Foundation-Level Data (2010), United States Geographic Names Information System Hospitals (2015)

Results

[Map figures; legend: Accidents Location, Roads, EMS; panels for Dangerous and Safest cities]

Conclusion

1. EMS should be placed within 5km of historical accident locations to make cities safer.
2. Safest cities are very well covered by EMS. All accidents in safe cities are within 5km of EMS.
3. Dangerous cities are not well covered by EMS, and many accidents are more than 10km from EMS.
4. Hospitals do not play an important role in avoiding accidents.
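The 5km coverage criterion in the conclusions can be illustrated with a short sketch. It assumes accident and EMS locations are available as (latitude, longitude) pairs and uses the haversine great-circle distance; the function names and the coordinates are hypothetical, and this is not necessarily the method the authors used in their analysis.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def coverage_share(accidents, ems_stations, radius_km=5.0):
    """Share of accident locations with at least one EMS station within radius_km."""
    covered = sum(
        1
        for acc in accidents
        if any(haversine_km(*acc, *station) <= radius_km for station in ems_stations)
    )
    return covered / len(accidents)


# Hypothetical coordinates, for illustration only.
accidents = [(40.7128, -74.0060), (40.7306, -73.9866), (41.2000, -74.0060)]
ems = [(40.7200, -74.0000)]
share = coverage_share(accidents, ems)
print(f"{share:.0%} of accidents lie within 5 km of an EMS station")
```

Under this reading, a "safe" city in the sense of conclusion 2 is one where the share equals 1.0 at a 5km radius.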


A study on the use of educational tools amongst university students

Author

Manisha Dareddy

Advisor

Daniel Phelps

Category

Information Systems

Abstract This research, conducted for a senior honors thesis, analyzes the role of cultural background in the acceptance of technology used by university students. The study will provide a better understanding of how university students use the various tools in the classroom and find ways to make that experience better. It uses the existing Unified Theory of Acceptance and Use of Technology (UTAUT) model developed by Venkatesh et al. (2003) to examine the model’s relevance in a different cultural and organizational context. In the current model, culture is not used as a factor that determines technology acceptance. The aim of the study is twofold: 1) To test whether the UTAUT model is applicable in the Qatari context, specifically in higher education. 2) To evaluate how the model reacts to adding culture as a moderating factor, in order to determine how user behavior changes.



A STUDY ON THE USE OF EDUCATIONAL TOOLS AMONGST UNIVERSITY STUDENTS

AUTHOR: MANISHA DAREDDY ADVISOR: DR. DANIEL PHELPS

MOTIVATION

Students are constantly exposed to off-the-shelf educational tools in university. They are forced to use them in classes, but cultural nuances are rarely addressed.

RESEARCH QUESTION

Do a university student’s cultural background and values play a role in the way they use educational technology tools in their classrooms?

LITERATURE REVIEW

Venkatesh et al. (2003): introduced the unified UTAUT model
Göğüş & Nistor (2012) and Al-Gahtani et al. (2007): applied UTAUT in different cultures
Park (2009): used TAM in testing university students’ behavior

GAP

Limited research has been done that combines all three factors: culture, university students, and the UTAUT model.

RESEARCH MODEL

The UTAUT model, developed by Venkatesh et al. (2003), is a unified and comprehensive model that combines 8 different technology acceptance models. This research tests the model by:
1. Examining its validity in a new cultural and organizational context
2. Evaluating how the model reacts to adding culture as a new moderating effect

[Model diagram. Original model: PE, EE, SI, FC, BI, UB, with Gender, Age, Experience and VoU. What this research tests: Culture.]

METHODOLOGY

Student recruitment from all majors and years
Anonymous survey sent to ask opinion about Canvas LMS
SmartPLS used for designing the model and running the values

RESULTS

78 responses, 35 usable

[Charts: respondents by class year (Freshmen, Sophomore, Junior, Senior; reported values 15%, 47%, 18%, 20%) and by culture (Collectivist, Individualist; reported values 71%, 29%)]

The presence of the PE, EE and SI latent variables is validated. The three latent variables did not have a significant effect on Behavioral Intention. Order of significance of effect: Social Influence > Performance Expectancy > Effort Expectancy. Culture as a moderator was not significant on its own.

CONCLUSION, FUTURE RESEARCH

Culture does not directly affect Behavioral Intention. Limitations of the study included sample size, cultural diversity and time span. Future studies should test culture along with existing moderators.


RISE: Real-time information system for emergency detection

Author

Umair Qazi

Advisor

Dan Phelps

Category

Information Systems

Abstract Document classification can be done on the basis of genre, sentiment, or author, either intelligently or manually. We consider a subset of this problem which is novel and has not been explored before: how do we extract phrases which express sounds? Consider the sentence “Veli is a male”. ‘Veli’ is a proper noun, and ‘male’ is a common noun. This sentence describes a fact. But consider the sentence “The alarm is ringing.” ‘Ringing’ is a verb, and this is an example of a sentence which expresses sound. We would like to explore the descriptors of such sentences and build a classifier which identifies such sentences intelligently, based on Machine Learning (ML) techniques. Acoustic Analysis of Text (AAT) is a solution to the above research question that has applications in artificial intelligence, machine learning and textual analysis. However, our project goes beyond simple text interpretation and applies this solution in the context of emergency detection in real time.



RISE

(Real Time Information System for Emergency Detection) Umair Waheed Qazi

Document classification can be done on the basis of genre, sentiment, or author, either intelligently or manually. We consider a subset of this problem which is novel and has not been explored before: how do we extract phrases which express sounds? Consider the sentence “Veli is a male”. ‘Veli’ is a proper noun, and ‘male’ is a common noun. This sentence describes a fact. But consider the sentence “The alarm is ringing.” ‘Ringing’ is a verb, and this is an example of a sentence which expresses sound. We would like to explore the descriptors of such sentences and build a classifier which identifies such sentences intelligently, based on Machine Learning (ML) techniques. Acoustic Analysis of Text (AAT) is a solution to the above research question that has applications in artificial intelligence, machine learning and textual analysis.

However, our project goes beyond simple text interpretation. For artificially intelligent agents to effectively comprehend and interact with the world, they must also be able to interpret sound. Sound differs fundamentally from other forms of information: it does not exist by itself. Instead, it results from the actions or interactions of objects. Therefore, it is much more challenging to identify a phrase that denotes a sound than it is to identify more direct factual statements such as those extracted by NELL (Mitchell & Fredkin, 2014) and NEIL (Chen, Shrivastava, & Gupta, 2013). AAT is a new domain that fits into this larger goal of enabling a machine to better comprehend the world around us through the interpretation of sound. In order to comprehend the world completely, the machine must be able to describe sound in ways that permit additional inferences. Hence the first step is to determine what kinds of descriptors allude to sounds in text. This is the problem we are trying to solve.

Since our research question deals with finding a good way to build a program that can detect a sound descriptor (SD), the procedures followed were an essential part of the research.

Since we had to use a supervised machine learning technique, and since the classification was binary (positive/negative example of an SD), we decided that an SVM would be the best technique for this situation. The training dataset contained 1001 positive instances, at a positive-to-negative instance ratio of 0.1783:1. The test data, on the other hand, contained 193 positive instances, with a positive-to-negative ratio of 0.199:1. Every experiment used the scaling technique, as it either helps to increase the accuracy or makes no difference at all.

The methods and techniques used in the research gave good results in terms of the accuracy of classifying an SD versus a non-SD. However, the methodology also has limitations that have affected the accuracy of the overall classifier, such as small datasets. Another limitation is that we decided to use only three parts of speech when converting the labeled data instances into numerical fields. Overfitting, caused by the very small ratio of positive to negative instances, is a problem we encountered during the project. One phenomenon we observed was that the accuracy would be high, at around 81%, even though the predictions were almost all negative. The small ratio works in favor of a higher accuracy: a program predicting only negative results would still show high accuracy. We hypothesize that the very small ratio of positive to negative examples leads to overfitting by the SVM, which in turn could be producing only negative predictions.
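The effect of the class ratio on accuracy can be made concrete with a short calculation. The sketch below takes the positive counts and positive-to-negative ratios reported for the two datasets and computes the accuracy of a hypothetical classifier that always predicts "negative"; the function name is an assumption for illustration, and negative counts are rounded estimates derived from the ratios.

```python
def all_negative_baseline(n_pos: int, pos_to_neg_ratio: float) -> dict:
    """Accuracy and recall of a classifier that labels every instance negative."""
    n_neg = round(n_pos / pos_to_neg_ratio)  # estimated from the reported ratio
    total = n_pos + n_neg
    return {
        "n_neg": n_neg,
        "accuracy": n_neg / total,  # every negative is correct, every positive is missed
        "recall": 0.0,              # no sound descriptor is ever detected
    }


smaller_set = all_negative_baseline(193, 0.199)
larger_set = all_negative_baseline(1001, 0.1783)
print(f"smaller set: accuracy {smaller_set['accuracy']:.3f}, recall {smaller_set['recall']}")
print(f"larger set:  accuracy {larger_set['accuracy']:.3f}, recall {larger_set['recall']}")
```

On the 193-positive set this trivial baseline reaches about 83% accuracy with zero recall, which is in the same range as the 81% reported above. This is why precision and recall on the positive class are more informative than accuracy for this task.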

Since this research is one of the first in the domain of acoustic analysis of text, it could be explored in greater depth through other research projects. Future work could look at other machine learning techniques, such as K-nearest neighbors or decision trees, to design AAT systems. Another possible direction is to look strictly for grammatical/syntactic patterns through part-of-speech tagging to find SDs. This would be similar to the use of Hearst patterns in ontology population. Rule learners are also a possible alternative for discovering such patterns in text. AAT is a new domain in the AI field, and it requires much more in-depth study and research to reach the prominence and commonality that text analysis has.

Through our research we were able to shed light on a novel way to acoustically analyze texts from Wikipedia. We did this by using a supervised machine learning technique, SVM, to create a classifier able to predict whether a given noun phrase or verb phrase contains a sound descriptor. As preliminary research, this work also sheds light on the many challenges and limitations of AAT. Further work is continuing to increase the accuracy, precision and recall of the current system. Further improvements would mean that we are several steps closer to creating a Never-Ending Learner for sounds, much like its counterparts NELL and NEIL.

Chen, X., Shrivastava, A., & Gupta, A. (2013). NEIL: Extracting visual knowledge from web data. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1409-1416).
Love, T. (2011). Analysing Sound. Retrieved from http://www2.eng.cam.ac.uk/~tpl/asp
Mitchell, T., & Fredkin, E. (2014, October). Never Ending Language Learning. In Big Data (Big Data), 2014 IEEE International Conference on (pp. 1-1). IEEE.


NEOS: Saving receipts electronically

Authors

Hassan Marafih
Umair Qazi

Advisor

Chadi Aoun

Category

Information Systems

Abstract Paper receipts are currently used all over the world, in almost all businesses and in almost every transaction, every single day. Think about all the transactions you are involved in every day, and how many receipts you receive. For every cup of coffee, you receive a paper receipt. For every grocery trip, you receive a long receipt. For every meal you have ever had at a restaurant, you received a paper receipt. That is a lot of receipts. The problem is that most of the time people do not really need their receipts and throw them away as soon as they get them. Paper receipts have environmental, social, economic and technological effects on the world. The technology used to print receipts has been the same for a long time. Some innovation has been implemented to make it more efficient and user friendly, but the core functionality has stayed the same. We present a solution (NEOS) to digitize this entire process.



Saving Receipts Electronically

Hassan Marafih & Umair Qazi - Chadi Aoun

Challenge

Environmental

About 12 million acres of forests are destroyed every year for the paper production industry. This industry consumes a huge amount of water, up to 60 cubic meters per ton. This water is also contaminated with toxic chemicals, bleaches, and heavy metals, which end up polluting water resources. The production of one ton of paper consumes the equivalent of 253 gallons.

Social

According to Qatar’s Consumer Protection Law, the consumer has “The right to obtain a dated invoice for the product purchased”. Paper receipts are sometimes seen as an inconvenience. Customers don’t want them, but must receive them by law. In most cases, receipts are immediately discarded, for example when buying a cup of coffee or a meal from a restaurant.

Economic

Receipt printing machines generally cost up to 400 US dollars, and rolls of thermal paper are usually sold for about one dollar per roll. In the United Kingdom alone, 145 million US dollars are spent each year on printing receipts.

Technological

The process of printing receipts can lead to many problems because the technology is inefficient and the hardware can malfunction.

Stakeholders Research

A good solution for any problem is grounded in stakeholder research based on customer/vendor needs. To help us get a good grip on our problem, we conducted surveys with buyers and vendors based on the following criteria:

- Purpose of the receipts
- Usability of the receipts
- Sustainability aspect

Qualitative Feedback

“It’s inconvenient due to the fact that you end up sticking the receipt in your pocket, where it stays for a while. It’s not easy to store them.”

“Physical receipts gets wrinkly and messy which makes me decides to throw it because I cannot keep it (in its messy state).”

Solution

Impact

Environmental

The main problem in the environmental dimension was initially the use of paper and its devastating impact on the environment. Our solution completely cuts out the use of paper. Therefore, all the energy used to produce paper for receipts is saved.

Social

Our app has been designed to be fully accessible and convenient for both vendors and customers. Customers will no longer need to inconvenience themselves with throwing out their paper receipts or trying to save them in case they need to exchange or refund an item.

Economic

All the hardware used by our application is currently available and already in use by vendors. All it takes to run our application is a barcode scanner, a computer, and internet access.

Technological

Our application brings receipt technology up to the current technological norm. Every object is electronic and every action can be done electronically. There is no need to print any paper or to save and keep track of paper copies.

Stakeholders Validation

To test our application we used the Think Out Loud Protocol with different types of vendors and customers to validate our application and get useful feedback to improve our overall product. For the vendor testing, we chose Khulood Pharmacy and Jazz Cafe.

For the customer testing, we chose to test with CMU students. For the tests we gave the participants our working prototypes and guided them through using them, explaining each functionality and its purpose.

Once the tests were completed, we asked the participants a series of questions about what they liked and what could be improved, and recorded their responses.

NEOS Prototype Validation

Things they like

- User Interface: Layout is simple
- Functionality: The process is really simple and easy to follow; it is faster than the current system
- Usability: Really good for big franchises

Things that should be improved

- User Interface: Colors should be changed
- Functionality: Integrate payment method; make product prices flexible and modifiable
- Usability: Make it more fitted for small businesses


Measuring corporate transparency in sustainability reporting: A study of the energy sector

Author

Mohammed Zakaria

Advisors

Divakaran Liginlal
Chadi Aoun

Category

Information Systems

Abstract The key objective of this research is to develop an appropriate measure of corporate transparency in sustainability reporting and apply it to examine the sustainability reporting practices of the energy sector. A comprehensive set of documents rich in sustainability content was compiled from various sources including books, research papers, social media sites, and NGO websites. These documents were then analyzed using text analytics software to build a dictionary of keywords on the topic of sustainability. The dictionary was then refined with the help of an expert in sustainability to yield a trimmed version which was then evaluated with the help of three other topic experts. The aggregated prior and conditional probabilities of occurrence of the words in the dictionary were computed based on their frequency of occurrence in the source documents. The proposed measure of corporate transparency is built upon these probabilities. Annual reports of companies in the energy sector were then scored based on this scoring criterion and transparency of OPEC and non-OPEC energy companies was studied. The results demonstrate significant differences between these two entities. The measure and methodology developed in this research may be extended to study corporate transparency in other sectors such as healthcare and other areas such as privacy, and to study corporate accounting practices.
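A dictionary-based score of this general kind can be sketched as follows. The abstract does not give the actual formula, so everything here is an assumption made for illustration: the tiny dictionary, the sample texts, the function names, and the scoring rule itself (keyword priors estimated from relative frequency in the source documents, with each occurrence in a report weighted by the negative log of its prior and normalized by report length).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_priors(source_docs, dictionary):
    """Relative frequency of each dictionary word across the source documents."""
    counts = Counter(t for doc in source_docs for t in tokenize(doc) if t in dictionary)
    total = sum(counts.values())
    return {w: counts[w] / total for w in dictionary if counts[w] > 0}


def transparency_score(report, priors):
    """Hypothetical rule: rarer dictionary words weigh more; normalized by length."""
    tokens = tokenize(report)
    if not tokens:
        return 0.0
    hits = Counter(t for t in tokens if t in priors)
    return sum(n * -math.log(priors[w]) for w, n in hits.items()) / len(tokens)


# Invented dictionary and texts, for illustration only.
dictionary = {"emissions", "renewable", "biodiversity"}
sources = ["Emissions and renewable energy. Emissions targets.", "Biodiversity and emissions."]
priors = keyword_priors(sources, dictionary)
score_a = transparency_score("We cut emissions and invest in renewable power", priors)
score_b = transparency_score("Our profits grew this year", priors)
print(round(score_a, 3), score_b)
```

A report that never uses dictionary terms scores zero under this rule, while one that discusses sustainability topics scores higher; the measure proposed in the study may weight and aggregate the probabilities differently.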





Postgraduate Posters


Delay tolerant computing

Authors

Mohammad Aazam
Khaled A. Harras
Ali E. Elgazar

Category

Postgraduate Poster

Abstract The Internet of Things (IoT) devices in use today, such as smartphones and smartwatches, are getting more powerful in terms of features and capabilities, but they are still incapable of executing smart, autonomous, and intelligent tasks such as those often required for smart healthcare, ambient assisted living (AAL), virtual reality, augmented reality, intelligent vehicular communication, and many services related to smart cities, the Tactile Internet, the Internet of Vehicles (IoV), and so on. For many of these applications, tasks (computational or data storage) cannot be performed entirely locally, and we need another entity to execute tasks on behalf of the user’s device and return the results, a technique often called offloading. Many of the applications that require offloading can compromise on the delay in the execution of the tasks, but not on the incompleteness of the tasks. Delay tolerant applications pave the way for a new paradigm called delay tolerant computing (DTC). Fog and edge computing are often tied to time-sensitive applications. However, fog/edge can be very useful for DTC, especially when it comes to saving cost and the core network’s bandwidth. This poster presents a realistic scenario related to DTC, the technical details of DTC through its architecture, and experimental results based on monetary cost, computational requirements, and delay.



Delay Tolerant Computing
Mohammad Aazam, Khaled A. Harras, Ali E. Elgazar
Carnegie Mellon University - Qatar

I. INTRODUCTION & BACKGROUND

• Several tasks require intensive computation or data storage (e.g. healthcare analytics, image/video processing, augmented reality, HD video storage)
• Devices such as smartphones, smartwatches, smartglasses, sensors, and small-scale IoT devices cannot perform such tasks standalone
• Offloading tasks to traditional cloud is more expensive and bandwidth consuming, as well as less privacy-aware
• Tasks can be offloaded to other entities (femtocloud, fog/edge, cloud)
• Many of the tasks can be delay tolerant, paving the way for delay tolerant computing (DTC)
• DTC hierarchy has femto-cloud/cloudlet/mobile device cloud at the bottom, fog and edge in the middle, and classic cloud at the top
• Higher in the hierarchy means higher computational reliability, latency, cost, and security but lower privacy

[Figure: From classic computing to DTC.]

II. POTENTIALS OF DTC

• Scenario 1:
- Cop takes photos/video of a mob/demonstrators/crowd, offloads to fog, then moves to another location
- Fog performs analysis with the help of cloud, informs the cop as soon as the cop reappears
• Scenario 2:
- Cop receives the suspect’s photo from the central server; the cop’s Google Glass takes photos of people/crowd, offloads to the fog and keeps on moving to new locations
- Fog performs trimming and sends to the cloud
- Cloud informs the cop if the suspect is detected, as soon as the cop is reachable again
• DTC examples:
- Smartglasses based security: smartglasses offloading to edge (smartphone) and fog, to perform image and video processing on captured data
- Procuring similar content from the cloud
- Analytics on health data log

[Figure: DTC-enabled offloading from smartglasses.]

III. DTC ARCHITECTURE

• DTC entities: offloader, offloadee, offload as a service (OaaS) provider, offload manager (OM)

[Figure: Architecture and components of DTC.]

IV. DTC CASE STUDIES

• DTC case studies:
- (i) Computational offloading on fog vs cloud
- (ii) Data offloading delay to fog vs cloud
• Evaluated using Java-based ONE simulator

[Figure: Monetary cost to extract depth maps from 2D images.]

[Figure: Time cost of sending data to fog versus cloud.]

V. CONCLUSION & FUTURE WORK

Conclusion:
• Investigated delay tolerant tasks (computation and data) by offloading them to fog and to cloud
• Converted 2D video to 3D depth maps
• Offloaded HD data to fog and to cloud and measured delay
• Results endorse DTC

Future Work: We plan to implement DTC and build a prototype based on various DTC use cases. We will implement different scenarios, including femto-cloud, edge/fog nodes, and cloud.

Bibliography

[1] CALAGARI, K., ELGHARIB, M., DIDYK, P., KASPAR, A., MATUSIK, W., AND HEFEEDA, M. Gradient-based 2d-to-3d conversion for soccer videos. In Proceedings of the 23rd ACM international conference on Multimedia (2015), ACM, pp. 331–340.
[2] FRICKER, C., GUILLEMIN, F., ROBERT, P., AND THOMPSON, G. Analysis of an offloading scheme for data centers in the framework of fog computing. ACM Transactions on Modeling and Performance Evaluation of Computing Systems (TOMPECS) 1, 4 (2016), 16.

Acknowledgement

This work was made possible by NPRP grant # 4-1330-1-213 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.


The MADAR Arabic dialect corpus and lexicon

Authors

Houda Bouamor
Nizar Habash, New York University Abu Dhabi
Mohammad Salameh
Wajdi Zaghouani, Hamad Bin Khalifa University
Kemal Oflazer

Category

Postgraduate

Abstract Dialectal Arabic (DA) is emerging nowadays as the primary written language of informal communication online in the Arab World: in emails, blogs, discussion forums, chats, SMS, etc. There has been a rising interest in research on computational models of Arabic dialects in the last decade. There have been several efforts on creating different resources to allow building models for several Natural Language Processing (NLP) applications. However, these efforts have been disjointed from each other, and most of them have focused on a small number of dialects that represent vast regions of the Arab World. In this work, we present two resources we created as part of the Multi Arabic Dialect Applications and Resources (MADAR) project. The goal of MADAR is to create, for a large number of dialects, a unified framework with common annotation guidelines and decisions, and targeting applications of Dialect Identification (DID) and Machine Translation (MT). The first resource is a large parallel corpus of 25 Arabic city dialects, in addition to the pre-existing parallel set for English, French, and Modern Standard Arabic (MSA). The second resource is a 25-way lexicon of 1,045 entries in each city’s dialect along with MSA, French, and English. These resources are the first of their kind in terms of the breadth of their coverage and their fine granularity. The kind of resources we present in this paper are useful not only for building computational systems but also for studying Arabic dialects from a linguistics perspective (e.g., computational dialectology).



The MADAR Arabic Dialect Corpus and Lexicon
Houda Bouamor (1), Nizar Habash (2), Mohammad Salameh (1), Wajdi Zaghouani (3) and Kemal Oflazer (1)
(1) Carnegie Mellon University in Qatar, (2) New York University Abu Dhabi, (3) Hamad Bin Khalifa University
http://nlp.qatar.cmu.edu/madar

Overview

• We present two novel resources created as part of the Multi Arabic Dialect Applications and Resources (MADAR) project:
1. A large corpus covering 25 Arabic city dialects in addition to Modern Standard Arabic (MSA), English and French
2. A lexicon of 1,045 concepts with an average of 45 words from 25 cities per concept
• The goal of MADAR is to create a unified framework for Dialectal Arabic Processing
• Different region, sub-region, and city dialects considered in building the MADAR resources:

Region | Sub-region | Cities
Maghreb | Morocco | Rabat (RAB), Fes (FES)
Maghreb | Algeria | Algiers (ALG)
Maghreb | Tunisia | Tunis (TUN), Sfax (SFX)
Maghreb | Libya | Tripoli (TRI), Benghazi (BEN)
Nile Basin | Egypt/Sudan | Cairo (CAI), Alexandria (ALX), Aswan (ASW), Khartoum (KHA)
Levant | South Levant | Jerusalem (JER), Amman (AMM), Salt (SAL)
Levant | North Levant | Beirut (BEI), Damascus (DAM), Aleppo (ALE)
Gulf | Iraq | Mosul (MOS), Baghdad (BAG), Basra (BAS)
Gulf | Gulf | Doha (DOH), Muscat (MUS), Riyadh (RIY), Jeddah (JED)
Yemen | Yemen | Sana’a (SAN)

Arabic and its Dialects

• Arabic language today is a collection of variants: Dialects and Modern Standard Arabic (MSA)
• MSA is the shared official language and is not the language of any native speaker

Differences between dialects:

• Phonology: the letter Qaf (ق /q) is pronounced /q/ in Tunisian, /’/ in Egyptian and /g/ in Gulf
• Orthography: No standard orthography, use of roman script, etc.
• Morphology: The MSA future marker /sa+ or /sawfa appears as /Ha or /raH in Levantine, /ha+ in Egyptian and /baš in Tunisian
• Lexicon: Lexical differences among dialects are significant.
• Syntax: Negation is different: mA, mish, muw, lA, lam

Computational processing of dialects is challenging, due to the lack of resources.

The MADAR Corpus

What?

• Translate 12,000 sentences from English or French into Dialectal Arabic.
• The sentences are extracted from the Basic Travel Expression Corpus (BTEC) [Takezawa et al., 2007].
• Corpus-25: 2,000 sentences translated into all 25 city dialects (each of these sentences has 25 corresponding parallel translations).
• Corpus-5: 10,000 additional sentences translated into the dialects of five selected cities: Beirut, Cairo, Doha, Tunis and Rabat.
• The corpus is available at: http://nlp.qatar.cmu.edu/madar/

How?

• The translation was handled by Ramitechs (http://www.ramitechs.com/).
• Native speakers from each of the 25 cities translate these sentences to their dialects.

Translation Guidelines:

• Read sentences carefully and translate without adding information.
• Use Arabic script, avoid any code-switching, and be internally consistent in spelling words.
• The translation of idioms should not be literal but reflect the meaning of the idioms instead.
• Foreign words borrowed from English or French should be transliterated.
• Numbers written in letters should be translated into letters.

The MADAR Lexicon

• Several concept keys: triplets of words from English, French, and MSA are translated into the dialects of 25 cities.
• Each word in a concept is defined in terms of its CODA orthography, CAPHI phonetic representation, and cities where it is used.
• The MADAR lexicon contains 1,045 concepts (covering 88.0%, 86.4% and 85.5% of the lemma tokens in the English, French and MSA BTEC corpora).

Lexicon building steps:

• Automatic concept key extraction: (English, French, Arabic) tuples are extracted automatically from the BTEC parallel corpus. Tuples are then clustered based on their semantic similarity, such that each cluster represents a concept.
• Manual validation of concept keys: carefully checking all the extracted concepts, correcting some cases and adding some missing entries.
• Automatic lexicon population: extract entries from existing resources (the Karmous dictionary for Tunisian Arabic, the Moroccan Arabic Dialect textbook, the Tharwa lexicon for Egyptian, Levantine and MSA, the Iraqi dictionary from LDC) and link them to concepts.
• Manual lexicon population: a large annotation effort by 13 linguists. The linguists were provided with detailed guidelines.

Entry for concept (very, très, ):

CODA | CAPHI | City Dialect
bršA | b a r sh a | Tunis, Sfax
bzAf | b e z z aa f | Rabat, Fez, Algiers
bkl | bikkil | Benghazi
jdA | giddan | Cairo, Alexandria
jdA | jiddan | Jeddah, Khartoum, Riyadh
xAlS | kh aa l i s. | Cairo, Alexandria, Aswan
šdyd | sh a d ii d | Khartoum
qwy | 2 a w i | Cairo, Alexandria
qwy | g a w i | Aswan, Sana’a
kθyr | k i t ii r | Alexandria, Cairo
kθyr | k t ii r | Beirut, Jerusalem, Damascus, Aleppo, Amman, Fez, Rabat
kθyr | k th ii r | Amman, Salt
kθyr | k th ii gh | Mosul
kθyr | k a t ii r | Jeddah, Aswan, Khartoum
kθyr | k i th ii r | Riyadh, Muscat
klš | k u l l i sh | Basra, Baghdad
klš | k e l l i sh | Mosul, Doha
mr | marra | Jeddah
hlb | halba | Tripoli
ςwm | 3 oo m | Muscat
hwAyA | h w aa y a | Basra, Baghdad
wAjd | w aa y i d | Basra, Baghdad, Doha
wAjd | w aa j i d | Benghazi, Tripoli, Doha
wAjd | w aa g i d | Muscat

Excerpt from the MADAR Lexicon Guidelines (shown on the poster):

MADAR Lexicon Guidelines
Nizar Habash, Houda Bouamor, Salam Khalifa, Mohammad Salameh, Wajdi Zaghouani, Fadhl Eryani, Alexander Erdmann, Dana Abdulrahim
Version 0.5 - September 25, 2017

This document specifies the guidelines for populating the Lexicon of the Multi-Arabic Dialect Applications and Resources (MADAR) project. We present the lexicon format, followed by guidelines for lexical choice.

1. Lexicon Format
For the purpose of lexicon editors, the lexicon is presented in a Google Sheet where every concept and its associated dialectal word forms are listed as shown below.

Concept_ID | Category | English | French | Standard Arabic | En-POS | Fr-POS | Ar-POS
139 | Concept | car | voiture | سيارة | NOUN | NOUN | NOUN

Concept_ID | Category | Dialect | Arabic CODA | CAPHI | Comments/Questions
139 | AUTO | EGY | عربية | 3 a r a b i y y a |
139 | AUTO | LEV, IRQ, YEM | سيارة | s a y y aa r a |
139 | AUTO | TUN | كرهبة | k r h b a |
139 | ADD | | | |

There are two sections for every concept: The first section (marked in yellow cells above) specifies the concept definition. The second section (marked in green cells above) specifies the various dialect words.

1.a Concept Definition (Yellow Section above)
The concept definition consists of five columns:
1. Concept_ID is a unique identifier of the concept
2. Category is an identifier of the content of the row. This allows an automatic script reader to identify the row without keeping track of the files structure. For concept definitions, the Category is always “Concept”.
3. English word
4. French word
5. Standard Arabic word
6. En-POS Part-Of-Speech tag of the English word
7. Fr-POS Part-Of-Speech tag of the French word
8. Ar-POS Part-Of-Speech tag of the Arabic word
(3-5) are the lemma triples that disambiguate the concept.
Note: Lexicon editors should not change any Concept definition, although they should report any

Acknowledgments

• This work was made possible by grant NPRP 7-290-1-047 from the Qatar National Research Fund (a member of Qatar Foundation).
• We would like to thank our dedicated linguists who contributed in building the MADAR lexicon.



Guidelines and annotation framework for Arabic author profiling

Authors

Anis Charfi
Wajdi Zaghouani, Hamad Bin Khalifa University
Syed Mehdi
Esraa Mohamed

Category

Postgraduate Poster

Abstract In this work, we present Arap-Tweet, which is a large-scale multi-dialectal corpus that we built in the context of the ARAP project. The corpus has a size of over 2.4 million words and it includes Tweets from 16 countries in the Arab world representing 11 Arabic dialectal varieties. To build this corpus, we collected public data from Twitter and we provided a team of experienced annotators with clear guidelines that they used to annotate the corpus for age categories, gender, and dialectal variety. During the data collection effort, we based our search on distinctive seed words and expressions that are specific to the different Arabic dialects and we used Twitter API to retrieve up to 3200 tweets per user. After retrieving the tweets for each user, we filtered out short tweets, non-Arabic tweets, quotations and retweets. In this poster, we report on the corpus data collection and annotation efforts. The provided corpus will enrich the limited set of available language resources for Arabic and will be a key enabler for developing author profiling tools and NLP tools for Arabic.



Guidelines and Annotation Framework for Arabic Author Profiling

Motivation

Arabic is a widely spoken language with multiple dialects, which are frequently used in social media. Research on author profiling for Arabic has always been constrained by the limited availability of fine-grained language resources annotated w.r.t. different characteristics of authors such as dialect, age, and gender.

Objective

We aim to build a dialectal Arabic corpus (ARAP-Tweet) that covers 11 dialectal regions and has an approximate size of 2.4 million words.

User Identification

We collected dialect-specific seed words and expressions for each region. Then, we retrieved users from the 11 dialectal regions using the Twitter API by searching for users with tweets containing these seed words. For example, the seed word /karhba/ ‘car’ is specific to Tunisian Arabic and the seed word /zo:l/ ‘man’ is specific to Sudanese Arabic. Furthermore, the annotators were trained to manually identify accounts for the different dialects. We tried to get at least 200 users per region, equally divided by gender.

Annotation

For each user, the annotators manually identified the gender, dialect, and age group. We provided clear guidelines to help the annotators with this task in order to make the annotation consistent. We had three age groups: under 25, 25 to 34, and 35 and up. The annotators had to follow specific steps to determine the age, including searching for the users on the Web and in other social networks, reading through their timeline, and using a web-based tool for age identification from profile photos.

Tweet Retrieval

Once we had the list of users for each region, with at least 200 users, an equal gender divide and a good age distribution, we retrieved the users’ timelines using the TweePy Python library. We had a few limitations, since the Twitter API allows retrieving only the 3200 most recent tweets of any given user.

Filtering

After retrieving the tweets for each user, we filtered out short tweets, non-Arabic tweets and retweets. If a user had fewer than 500 original tweets remaining, that user was replaced with a new user with the same attributes (age group, gender, dialect). As part of the filtering process, we also removed quotations such as verses from the Quran.
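The filtering step described on this poster can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the project's code: the token cut-off for a "short" tweet (`MIN_TOKENS`), the `RT @` retweet test and the 50% Arabic-letter rule are all hypothetical choices, and quotation removal is not covered. Only the 500-tweet threshold comes from the poster.

```python
import re

ARABIC_CHAR = re.compile(r"[\u0600-\u06FF]")  # basic Arabic Unicode block
MIN_TOKENS = 3               # assumed cut-off for a "short" tweet
MIN_TWEETS_PER_USER = 500    # threshold stated on the poster


def is_retweet(text):
    # Assumption: retweets are recognised by the conventional "RT @" prefix.
    return text.startswith("RT @")


def is_mostly_arabic(text):
    # Assumption: a tweet counts as Arabic if at least half its letters are Arabic.
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    arabic = sum(1 for ch in letters if ARABIC_CHAR.match(ch))
    return arabic / len(letters) >= 0.5


def filter_tweets(tweets):
    """Drop retweets, short tweets and non-Arabic tweets."""
    kept = []
    for text in tweets:
        if is_retweet(text):
            continue
        if len(text.split()) < MIN_TOKENS:
            continue
        if not is_mostly_arabic(text):
            continue
        kept.append(text)
    return kept


def keep_user(tweets):
    """A user stays in the corpus only with enough remaining original tweets."""
    return len(filter_tweets(tweets)) >= MIN_TWEETS_PER_USER


sample = [
    "RT @someone مرحبا بكم جميعا",
    "hello world today",
    "مرحبا",
    "انا احب هذه المدينة",
]
kept = filter_tweets(sample)
```

In this toy sample only the last tweet survives: the first is a retweet, the second is not Arabic, and the third is too short.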

Authors: Anis Charfi, Wajdi Zaghouani, Syed Mehdi, Esraa Mohamed
Acknowledgment: This publication was made possible by NPRP grant 9-175-1-033 from QNRF

Carnegie Mellon University Qatar


Teams of aquatic and aerial robots for marine environmental monitoring

Authors

Gianni Di Caro
Filippo Arrichiello, Integrated Systems for Marine Environment, Genova, Italy
Enrico Simetti, Integrated Systems for Marine Environment, Genova, Italy

Category

Postgraduate Poster

Abstract This is a starting NPRP10 project that aims to integrate multiple aerial and water surface autonomous robots (UAVs, USVs) for cooperative missions in marine environments. The high-level goal of the project is to enable the robot team to perform time-extended (days/weeks) autonomous monitoring missions. To this end, a number of scientific and technological challenges will be tackled, including: distributed planning and coordination exploiting complementary sensory-motor skills; integration of network control with mission-based decision-making; resilience to failures and hostile conditions (e.g., by self-organized assembly); use of surface robots as carriers of aerial robots, to overcome the energy limitations of UAVs / multi-copters; and dynamic scheduling of meeting points and of takeoff and landing between UAVs and USVs. A specific monitoring use case will be selected from an established collaboration with the Department of Biological and Environmental Sciences at Qatar University.





Offloading mobile storage to underutilized edge devices

Authors

Ali Elgazar
Khaled A. Harras
Mohammad Aazam
Abderrahman Mtibaa, New Mexico State University

Category

Postgraduate Poster

Abstract With rising media technologies, increasing online presence, and newer high-quality file types, files have on average increased dramatically in size. However, to consumers’ dissatisfaction, storage capacities on devices have not scaled well at acceptable prices. Moreover, privacy remains a concern with regard to personal data, where compromises in online storage services can be catastrophic. Manufacturers offer higher storage capacities in exchange for higher prices, and users who opt for a cheaper, low-storage device are cornered into a limited number of options when their storage runs out: 1) using cloud storage, which has recently raised a myriad of privacy issues; 2) manually moving files from the device to storage disks; or 3) deleting some files to free up space for new content. These options are non-autonomous and require a good deal of user intervention. We propose the Edge Storage Offloading Platform (ESOP), a mobile device middleware that capitalizes on the availability of unused devices within a user’s household. ESOP addresses the aforementioned privacy and cost concerns by identifying unused files on a user’s device and offloading them to nearby user-owned, underutilized, and trusted devices.
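The core idea, picking unused files and placing them on underutilized devices, can be illustrated with a small sketch. It is not ESOP's actual algorithm: using the last access time as the popularity signal and "most free space" as the device ranking are assumptions made for illustration, and every name below (`FileInfo`, `plan_offload`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    size: int           # bytes
    last_access: float  # timestamp; a smaller value means accessed longer ago


def plan_offload(files, free_space, bytes_to_free):
    """Return (file name, device) pairs, least recently accessed files first.

    free_space maps a device name to its free bytes. Stops once the
    requested amount of storage would be freed on the phone.
    """
    plan = []
    remaining = dict(free_space)  # work on a copy, leave the caller's dict alone
    freed = 0
    if not remaining:
        return plan
    for f in sorted(files, key=lambda f: f.last_access):
        if freed >= bytes_to_free:
            break
        # Stand-in for device "favorability": the device with the most free space.
        device = max(remaining, key=remaining.get)
        if remaining[device] < f.size:
            continue  # no device can hold this file; try a smaller one
        remaining[device] -= f.size
        freed += f.size
        plan.append((f.name, device))
    return plan


files = [
    FileInfo("a.mp4", 400, 10.0),
    FileInfo("b.jpg", 100, 50.0),
    FileInfo("c.pdf", 300, 5.0),
]
plan = plan_offload(files, {"tablet": 800, "old_phone": 350}, bytes_to_free=600)
```

A real ranking would presumably also weigh link quality and device availability; the greedy loop only shows where such a score would plug in.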



Offloading Mobile Storage To Underutilized Edge Devices
Ali Elgazar+, Khaled A. Harras+, Mohammad Aazam+, and Abderrahman Mtibaa*
+ Carnegie Mellon University and * New Mexico State University

I. INTRODUCTION & MOTIVATION

Shortage Of Storage On Our Mobile Devices

Storage space in the most recent decade has become an outstanding issue for many people around the world:
● Rising social media has led to a rise in User Generated Content (UGC), which quickly occupies available storage on our mobile devices.
● High storage mobile devices cost much more, and manufacturers have opted to remove mini-storage cards from mobile devices.
● Cloud Storage Services (CSSes) have suffered from malicious attacks, compromising users’ data and leading to some mistrust in CSSes.

Fig. 2 Cost of higher storage devices (USD)

II. PROPOSED SOLUTION

Utilizing Devices At The Edge

We can utilize our older devices that we keep at home. Statistics show that on average, a household has at least 3 Internet connected devices at all times that have underutilized storage [1][2][3].

Edge Storage Offloading Platform (ESOP)

We’ve developed an easy to deploy offloading platform that enables your mobile device to offload unpopular files. ESOP consists of two main components:
● Centralized registration authority.
● ESOP middleware deployed on all devices.
The registration authority is simply responsible for connecting devices and allowing the establishment of P2P connections. The ESOP middleware is responsible for identifying unused files, ranking devices by their favorability, and offloading the unused files to said devices.

Fig. 3 ESOP Architecture

III. SIMULATED EVALUATION

Simulation Environment

In order to test our solution under a variety of different possible combinations, we implemented a simulation of ESOP using the ONE simulator [4].

Delay Results With 4G Enabled

Our results showed that with 4G enabled, we can offload up to 90% of 24GB storage, and still have roughly 80% of file accesses incurring a delay of less than a second.

Delay With Only WiFi

Our simulations showed that when we offload 90% of 24GB storage, roughly 50% of file accesses incur a delay of less than a second.

Fig. 4 CDF of delays on accessing files

Storage And Battery

Our simulations showed that there is a direct correlation between the amount of storage requested by the user, and battery consumption. The more the user requests as free storage, the more available storage becomes volatile, leading to much more energy consumption.

Fig. 5 Storage and battery over time

IV. REAL LIFE TEST CASE

Offloading From a Mobile Device To I.O.T. Devices

We implemented ESOP utilizing 3 Intel Edison devices as storage caches, each carrying up to 8GB of available storage, and an offloading mobile phone with 32GB of occupied storage.

The ESOP Middleware was equipped with several functionalities in this test case:
● The user interface (UI) on the mobile device allowed the user to pick an amount of storage on his device which he would want cleared.
● An intuitive file browser with the purpose of gathering the user’s access patterns to help predict file popularity.
● The user was able to see all the devices registered under his account and could remove, add, or modify any device’s details.

This experiment showcased the ability of ESOP to function across networks and NAT boxes/Firewalls, as well as the ability to handle large offloadable storage quantities.

V. CONCLUSION & FUTURE WORK

Conclusion

● Storage shortage is becoming a major problem with rising media technology.
● We created a platform to automatically offload user files to underutilized devices.

Future Work

● We are developing further on the method we utilize to pick files to offload in order to provide higher accuracy.
● We intend to examine our platform under different social environments, in order to provide a solution that works across different scenarios.

References

[1] https://www.recode.net/2014/11/18/11632960/more-than-90-percent-of-u-s-households-have-three-or-more-devices
[2] http://blog.globalwebindex.net/chart-of-the-day/digital-consumers-own-3-64-connected-devices
[3] Gabay, Yarom, Francisco Jose Assis Rosa, and Ran Gilboa. "System and method for identifying underutilized storage capacity." U.S. Patent No. 9,043,184. 26 May 2015.
[4] Keränen, Ari, Jörg Ott, and Teemu Kärkkäinen. "The ONE simulator for DTN protocol evaluation." Proceedings of the 2nd international conference on simulation tools and techniques. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), 2009.

Acknowledgements

This work was made possible by NPRP grant # 8-1645-1-289 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.


Extending the range via ad-hoc communication for cooperative robotic watercraft

Authors

Ahmed Emam
Abderrahmen Mtibaa, Texas A&M University at Qatar
Khaled A. Harras
Nathan Brooks, Carnegie Mellon University
Paul Scerri, Carnegie Mellon University

Category

Postgraduate Poster

Abstract In this project, we outline a low-cost, easy-to-deploy, mobile, autonomous marina sensor that can support a broad set of applications including water pH sensing, water-depth sensing, oil spill detection, etc. By working cooperatively, fleets of boats (sensor nodes) can cover large areas that would be impractical otherwise. We start by describing the boat’s design, hardware and communication architecture. Then, we zoom in on the communication architecture and discuss its limited coverage range; this limitation stops the boats from going farther in their exploration quest. We propose a solution to the current limitation via device-2-device ad-hoc communication methodologies: multi-hop and DTN. We discuss our evaluation of single-hop device-2-device WiFi ad-hoc communication on water with regard to horizontal and vertical distances. Finally, we discuss experiments performed on land to validate and evaluate our multi-hop and DTN ad-hoc prototypes. In the future, we plan to evaluate our prototypes on water using the boats, and to use appropriate routing algorithms to support real-world topologies.
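The poster for this project characterises DTN as boats that store data and forward it opportunistically, with no delivery guarantee. The sketch below illustrates that store-and-forward idea with three nodes named as in the poster's scenario (sender, proxy, sink). It is a toy model, not the prototype: the classes, the `contact` method and the sample reading are all hypothetical.

```python
from collections import deque


class Boat:
    def __init__(self, name):
        self.name = name
        self.buffer = deque()  # messages held until another node is in range

    def sense(self, reading):
        """Store a locally produced reading, tagged with its origin."""
        self.buffer.append((self.name, reading))

    def receive(self, message):
        # An intermediate boat simply stores the message for a later contact.
        self.buffer.append(message)

    def contact(self, other):
        """Opportunistic contact: hand every stored message to the other node."""
        while self.buffer:
            other.receive(self.buffer.popleft())


class CommandCenter(Boat):
    def __init__(self, name):
        super().__init__(name)
        self.delivered = []

    def receive(self, message):
        # The sink consumes messages instead of buffering them.
        self.delivered.append(message)


sender, proxy, sink = Boat("sender"), Boat("proxy"), CommandCenter("sink")
sender.sense({"depth_m": 3.2})
sender.contact(proxy)  # sender and proxy come within range of each other
proxy.contact(sink)    # later, the proxy comes within range of the sink
```

If the second contact never happens, the reading stays in the proxy's buffer, which is the "no guaranteed data delivery" property. A multi-hop design instead needs the whole route to exist at the moment of sending.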



Extending The Range Via Ad-hoc Communication For Cooperative Robotic Watercraft
Ahmed Emam+, Abderrahmen Mtibaa*, Khaled A. Harras+, Nathan Brooks+ and Paul Scerri+
+ Carnegie Mellon University and * Texas A&M University Qatar

I. INTRODUCTION & BACKGROUND

A) Application
Need a low-cost, easy-to-deploy, mobile, autonomous marina sensor network

B) Marina sensor node design
Current design of Lutra Prop

C) Architecture
• Hardware
• Communication
Figure labels: On-board Navigator & Communicator; Command Center

II. PROBLEM & PROPOSED SOLUTION

Problem

• Boat 1 achieves 2-way communication: Command Center → Boat and Boat → Command Center
• Boat 2 achieves one-way communication (Command Center → Boat), with no live feedback from the boats
• No communication coverage in green-shaded area

Solution

• Extend communication coverage via Boat-2-Boat communication

Method 1: Multi-hop
- A route has to exist from end-to-end
- Data is sent in real-time from one end to other via intermediate nodes
- Appropriate routing algorithms are needed

Method 2: DTN
- Boats opportunistically communicate
- Each boat stores and forwards data
- Data must be delay tolerant
- No guaranteed data delivery

Exploration and coverage

III. EXPERIMENTS ON-WATER

Single-hop

• Building block for any Boat-2-Boat communication
• Investigate communication properties on water surface (prior works are on-land or under-water)

Distance d: Horizontal distance between two phones
Altitude h: Distance from phone to water surface

Figure panels: h = 5 cm, h = 20 cm, h = 50 cm

IV. EXPERIMENTS ON-LAND

DTN Scenario
- Sender sends data to Sink opportunistically via Proxy
- Proxy moves to blue circles, adjacent to Sender’s movement to red circles

Time elapsing
Effective Throughput

Multi-hop Scenario
- Sender sends data to Sink via Proxy
- Proxy moves to blue circles, parallel to Sender’s movement to red circles

Effective Throughput

V. CONCLUSION & FUTURE WORK

Conclusion

• Investigated single-hop ad-hoc Wi-Fi communication on-water
• Built prototype for multi-hop and DTN Boat-2-Boat communication
• Better understanding of Wi-Fi performance at sea-level
• Fast degradation in quality of communication because of reflection and refraction of EM waves caused by sea waves

Future Work

• Perform multi-hop and DTN experiment on-water using boats
• Investigate trade-offs between DTN and multi-hop in terms of explored-area size, time and number of boats used
• Use appropriate routing algorithm

References

[1] Valada, Abhinav, et al. "Development of a low cost multi-robot autonomous marine surface platform." Field and Service Robotics. Springer Berlin Heidelberg, 2014.
[2] Scerri, Paul, et al. "Real-world testing of a multi-robot team." Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems-Volume 3. International Foundation for Autonomous Agents and Multiagent Systems, 2012.

Acknowledgements

This work was made possible by NPRP grant # 4-1330-1-213 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.


RAMOS: A resource-aware multi-objective system for edge computing

Authors

Hend Gedawy, Hamad Bin Khalifa University
Karim Habak, Georgia Institute of Technology
Khaled A. Harras
Mounir Hamdi, Hamad Bin Khalifa University

Category

Postgraduate Poster

Abstract Mobile and IoT devices are becoming increasingly capable computing platforms that are often underutilized. In this paper, we propose RAMOS, a system that leverages the idle compute cycles in a group of heterogeneous mobile and IoT devices that can be clustered to form an edge micro-cloud. At the heart of this system, we formulate a multi-objective, resource-aware task assignment and scheduling problem. The scheduler runs in two main modes: Latency Minimization and Energy Efficiency. Under the Latency Minimization mode, it strives to maximize the computational throughput of the constructed micro-cloud while keeping the energy consumption below an operator-specified threshold. Under the Energy Efficiency mode, it minimizes the total energy consumed in the micro-cloud while meeting the defined task deadlines. Because this scheduling problem is NP-complete, we design a set of heuristics to solve it. We implement a prototype of our system and use it to evaluate its performance and assess its efficiency. Our results demonstrate the system’s ability to meet the different scheduling objectives while adhering to pre-specified time and energy constraints. Compared to other schedulers, our scheduler achieves a 10% to 40% improvement in terms of latency minimization, and up to a 30% improvement in terms of computational throughput. It is also able to optimize for energy while meeting the specified deadline with an average of 86%.
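To make the Latency Minimization mode concrete, the sketch below shows one simple greedy heuristic: tasks are placed longest-first on the device that finishes them earliest, and a placement is rejected if it would exceed an energy budget. This is a generic list-scheduling illustration under assumed inputs, not one of the RAMOS heuristics. The function name, the device model (speed and energy per work unit) and the numbers are all hypothetical.

```python
def greedy_schedule(tasks, devices, energy_budget):
    """Assign tasks to devices, earliest finish time first, under an energy cap.

    tasks:   {task name: work units}
    devices: {device name: (speed in work units per second,
                            energy in joules per work unit)}
    Returns (assignment, makespan, energy_used); a task that cannot be
    placed without exceeding the budget maps to None.
    """
    finish = {d: 0.0 for d in devices}  # time at which each device becomes free
    assignment = {}
    energy_used = 0.0
    # Longest-processing-time-first ordering, a common greedy choice.
    for task, work in sorted(tasks.items(), key=lambda kv: -kv[1]):
        best = None
        for d, (speed, joules_per_unit) in devices.items():
            if energy_used + work * joules_per_unit > energy_budget:
                continue  # this placement would break the energy threshold
            done = finish[d] + work / speed
            if best is None or done < best[1]:
                best = (d, done)
        if best is None:
            assignment[task] = None
            continue
        d, done = best
        finish[d] = done
        energy_used += work * devices[d][1]
        assignment[task] = d
    return assignment, max(finish.values()), energy_used


tasks = {"t1": 8, "t2": 4, "t3": 4}
devices = {"phone": (4.0, 1.0), "pi": (2.0, 0.5)}
assignment, makespan, energy = greedy_schedule(tasks, devices, energy_budget=100)
```

An Energy Efficiency mode would flip the roles: pick the lowest-energy device for each task and reject placements whose finish time misses the task's deadline.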



- High and predictable mobility; overlooking energy constraint
- Relatively stable scenarios; energy consumption and capacity constraints


MADARi: A web interface for joint Arabic morphological annotation and spelling correction

Authors

Ossama Obeid
Salam Khalifa
Nizar Habash, New York University Abu Dhabi
Houda Bouamor
Wajdi Zaghouani, Hamad Bin Khalifa University
Kemal Oflazer

Category

Postgraduate Poster

Abstract In this paper, we introduce MADARi, a joint morphological annotation and spelling correction system for texts in Standard and Dialectal Arabic. The MADARi framework provides intuitive interfaces for annotating text and managing the annotation process of a large number of sizable documents. Morphological annotation includes indicating, for a word in context, its baseword, clitics, part-of-speech, lemma, gloss, and dialect identification. MADARi has a suite of utilities to help with annotator productivity. For example, annotators are provided with pre-computed analyses to assist them in their task and reduce the amount of work needed to complete it. MADARi also allows annotators to query a morphological analyzer for a list of possible analyses in multiple dialects or look up previously submitted analyses. The MADARi management interface enables a lead annotator to easily manage and organize the whole annotation process remotely and concurrently. We describe the motivation, design, and implementation of this interface, and we present details from a user study working with this system.



MADARi: A Web Interface for Joint Arabic Morphological Annotation and Spelling Correction

Ossama Obeid, Salam Khalifa, Nizar Habash, Houda Bouamor†, Wajdi Zaghouani‡ and Kemal Oflazer†
Computational Approaches to Modeling Language Lab, NYU Abu Dhabi
† Carnegie Mellon University in Qatar
‡ College of Humanities and Social Science, HBKU

Task Description:

• The task of human manual annotation is difficult and tedious.
• Several annotation interface tools have been created to assist in such effort.
• Goal: Build an annotation tool for joint:
o morphological annotation of Arabic: standard and dialectal.
o spelling correction of Arabic texts.
• Morphological annotation includes indicating, for a word: its baseword, clitics, part-of-speech, lemma, gloss, and dialect identification.

Desiderata:

• Minimal requirements and setup for annotators.
• Allow annotators to work remotely over the web.
• Allow lead annotator to manage annotation process remotely.
• Allow annotators to switch between spelling correction and morphological annotation quickly.
• Provide utilities to speed up annotation process.

MADARi Annotation Interface:

Figure 1: The Task Overview screen. (a) Task information. (b) Sentence filter bar. (c) Sentences in document (filtered).

Figure 2: The Edit Sentence screen.

• Task overview screen to get a quick glance of the current document and to filter sentences by a given keyword.
• Text edit screen to edit text to conform to CODA.
• Morphological analysis screen for annotating word morphology.
• Utilities to speed up annotation:
o Undo and redo buttons.
o Apply annotation to multiple contexts of a word in the current document.
o Look up previously saved annotations or query analyses from MADAMIRA.

MADARi Management Interface:

The Annotation Management Interface enables the lead annotator to:
1. Manage annotator accounts.
2. Upload and manage documents.
3. Assign and monitor tasks.
4. Export annotations.

Design and Implementation:

• Client-server web app inspired by QALB, MANDIAC, and DIWAN annotation tools.
• Utilized hybrid SQL/JSON storage system used by MANDIAC.
• Documents are stored one sentence per row to improve performance when annotating larger files.
• Use MADAMIRA to pre-annotate analyses to decrease annotation effort.
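The storage design mentioned above, one sentence per row with structured annotations alongside, can be sketched with Python's built-in `sqlite3` and `json` modules. The table layout, column names and the sample analysis are assumptions for illustration only; the poster does not give MADARi's actual schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE sentences (
           doc_id      INTEGER,
           sent_index  INTEGER,
           text        TEXT,
           annotations TEXT,   -- JSON: one analysis per token
           PRIMARY KEY (doc_id, sent_index)
       )"""
)


def save_sentence(doc_id, sent_index, text, annotations):
    """Insert or overwrite a single sentence row."""
    conn.execute(
        "INSERT OR REPLACE INTO sentences VALUES (?, ?, ?, ?)",
        (doc_id, sent_index, text, json.dumps(annotations, ensure_ascii=False)),
    )


def load_sentence(doc_id, sent_index):
    """Fetch one sentence without touching the rest of the document."""
    row = conn.execute(
        "SELECT text, annotations FROM sentences "
        "WHERE doc_id = ? AND sent_index = ?",
        (doc_id, sent_index),
    ).fetchone()
    return None if row is None else (row[0], json.loads(row[1]))


# Hypothetical analysis using the fields the poster lists for a word.
analysis = [{
    "token": "wbyktb",
    "baseword": "yktb",
    "clitics": ["w", "b"],
    "pos": "verb",
    "lemma": "katab",
    "gloss": "write",
    "dialect": "EGY",
}]
save_sentence(1, 0, "wbyktb", analysis)
loaded = load_sentence(1, 0)
```

Because each sentence is its own row, saving an edit rewrites only that row, which is consistent with the performance motivation given on the poster.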

Figure 3: The Morphological Analysis screen. (a) Original sentence. (b) Edited tokens. (c) Morphological analysis panel. (d) Analysis search panel. (e) Navigation buttons. (f) POS tag language select. (g) Document status panel. (h) Undo and redo buttons.

Acknowledgment: This work was made possible by grant NPRP 7-290-1-047 from the Qatar National Research Fund (a member of Qatar Foundation)

Figure 4: Contexts panel.


Event coreference resolution using neural network classifiers

Authors

Arun Pandian
Lamana Mulaffer
Amna AlZeyara, Qatar University
Kemal Oflazer

Category

Postgraduate Poster

Abstract This paper presents a neural network classifier approach to detecting both within- and cross-document event coreference effectively using only event mention based features. Our approach does not (yet) rely on any event argument features such as semantic roles or spatiotemporal arguments. Experimental results on the ECB+ dataset show that our F1 scores significantly outperform the state-of-the-art methods for both within-document and cross-document event coreference resolution when we use the B3 and CEAFe evaluation measures, but that we obtain a worse F1 score with the MUC measure. However, when we use the CoNLL measure, which is the average of these three scores, our approach has a slightly better F1 for within-document event coreference resolution and a significantly better F1 for cross-document event coreference resolution.
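Since the CoNLL measure is described here as the average of the MUC, B3 and CEAFe F1 scores, the relationship is easy to check. The short sketch below does so with one set of scores from the project's poster (57.73, 83.50 and 74.73, reported together with a CoNLL F1 of 71.99). The function name is only illustrative.

```python
def conll_f1(muc_f1, b_cubed_f1, ceaf_e_f1):
    """CoNLL F1: the unweighted mean of the three coreference F1 scores."""
    return (muc_f1 + b_cubed_f1 + ceaf_e_f1) / 3


score = round(conll_f1(57.73, 83.50, 74.73), 2)
```

The same average explains how a system can trail on MUC and still lead on the CoNLL measure, provided its B3 and CEAFe gains are large enough.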



Event Coreference Resolution using Neural Network Classifiers
Arun Pandian, Lamana Mulaffer, Kemal Oflazer (Carnegie Mellon University in Qatar)
Amna Alzeyara (Qatar University)

Event Coreference Resolution

• Event coreference resolution is the task of finding spans of text that refer to events, and clustering them into groups, resulting in one group per unique event.
• Event coreference resolution is used in text summarization, topic detection, question generation and question answering, among others.
• Within-Document (WD) event coreference resolution: groups contain events from the same document.
• Cross-Document (CD) event coreference resolution: groups contain events across documents of the same topic.

Example:
S1: Trial date set for man accused of double murder in Millom.
S2: A 23 year old man has been charged with murders of his mother and sister.
S3: John Jenkin is due in court over the murders of Alice McMeekin, 58, and Kathryn Jenkin, 20.

Event mentions: (S1, accused), (S1, double murder), (S2, charged), (S2, murders), (S3, murders)

Problem

Can we use pairwise neural network classifiers to detect WD and CD coreference resolution using only event mention-based features?
• Motivation: extracting event-mention based features is easier and less error-prone than extracting event arguments.
• Previous work: previous models use event arguments and more complex neural networks and clustering mechanisms.

Solution

1. Detect event mentions in the ECB+ corpus using a CRF-based semi-Markov model.
2. Calculate coreference scores for WD (CD) pairs using the WD (CD) neural network.
3. Construct WD and CD mention pair graphs. Event mentions are vertices. Coreference scores are edges.
4. Filter edges based on WD (CD) threshold; find connected components.
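The last two steps, building a mention-pair graph, dropping edges below a threshold and reading off connected components, can be sketched with a small union-find. The mentions are the ones from the poster's example. The pairwise scores and the 0.5 threshold are invented for illustration, and the function name is hypothetical.

```python
def cluster_mentions(mentions, scores, threshold):
    """Group mentions into the connected components of the thresholded graph.

    scores maps a pair (mention_a, mention_b) to a coreference score.
    Edges scoring below the threshold are ignored.
    """
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent links to the component root, compressing the path.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    clusters = {}
    for m in mentions:
        clusters.setdefault(find(m), []).append(m)
    return sorted(sorted(c) for c in clusters.values())


mentions = [
    ("S1", "accused"), ("S1", "double murder"),
    ("S2", "charged"), ("S2", "murders"), ("S3", "murders"),
]
# Hypothetical classifier outputs for a few mention pairs.
scores = {
    (("S1", "accused"), ("S2", "charged")): 0.90,
    (("S1", "double murder"), ("S2", "murders")): 0.80,
    (("S2", "murders"), ("S3", "murders")): 0.95,
    (("S1", "accused"), ("S3", "murders")): 0.10,
}
clusters = cluster_mentions(mentions, scores, threshold=0.5)
```

Connected components make coreference transitive: two mentions end up in the same group through a chain of confident links even when their own pairwise score is low.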

Training Phase:

[Figure: labelled input pairs, e.g. (S1, accused), (S2, murdered); (S2, killed), (S3, dead); (S4, murder), (S5, taken), each labelled + or -, are turned into feature vectors of the event pairs and their contexts, which are used to train the neural network.]

Evaluation Phase:

[Figure: an input event mention pair with its context windows, S1 "Trial date set for man accused of double murder in Millom" and S2 "A 23-year old man has been charged with murders of his mother and sister", is turned into a feature vector (S1 contextual features, S1 & S2 relational features, S2 contextual features), which the neural network maps to a coreference score.]

Contextual features:
• Pre-trained word embeddings of mention words
• Pre-trained embeddings of context words
• POS tag of event mention words

Relational features:
• WordNet path similarity of mention word pair
• WordNet path similarity of hypernyms of word pair
• Cosine similarity of mention word embeddings
• Sentence distance between mention word pair (WD only)
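Two of the relational features listed above, the cosine similarity of the mention word embeddings and the sentence distance (WD only), can be computed as in the sketch below. The function names and the toy two-dimensional embeddings are assumptions made for illustration; the WordNet features are left out because they need an external lexical database.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def relational_features(emb_a, emb_b, sent_a, sent_b, within_document):
    """Relational part of the pair feature vector (hypothetical layout)."""
    feats = [cosine_similarity(emb_a, emb_b)]
    if within_document:
        # Sentence distance is only meaningful inside one document.
        feats.append(float(abs(sent_a - sent_b)))
    return feats


# Toy embeddings and sentence indices, for illustration only.
wd_feats = relational_features([3.0, 4.0], [3.0, 4.0], 1, 2, within_document=True)
cd_feats = relational_features([1.0, 0.0], [0.0, 1.0], 1, 2, within_document=False)
print(wd_feats, cd_feats)
```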

Results

• WD and CD systems were evaluated using the official CoNLL scorer against a standard baseline as well as results from previous work.

System                       MUC F1   BCUBED F1   CEAFe F1   CoNLL F1
WD (our results)             57.73    83.50       74.73      71.99
Baseline 1: Lemma            48.80    66.70       65.10      60.20
Baseline 2: HDDCRP [1]       53.40    75.40       71.70      66.83
Baseline 3: Iterative [2]    62.60    72.40       71.80      68.93

System                       MUC F1   BCUBED F1   CEAFe F1   CoNLL F1
CD (our results)             57.95    76.41       73.74      69.37
Baseline 1: Lemma            66.70    51.40       46.20      54.80
Baseline 2: HDDCRP [1]       73.10    53.50       49.50      58.70
Baseline 3: Iterative [2]    73.40    61.0        56.50      63.63
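As a reading aid for the results: the CoNLL F1 is conventionally the unweighted mean of the MUC, B-cubed and CEAFe F1 scores. The short check below applies that definition to score triples that appear in the results; the function name is an assumption.

```python
def conll_f1(muc_f1, b_cubed_f1, ceaf_e_f1):
    """CoNLL F1: unweighted mean of the three metric F1 scores.

    The mean is symmetric, so the order of the arguments does not matter.
    """
    return (muc_f1 + b_cubed_f1 + ceaf_e_f1) / 3.0


# Score triples taken from the results above.
print(round(conll_f1(57.73, 83.50, 74.73), 2))
print(round(conll_f1(62.60, 72.40, 71.80), 2))
```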

• Future work includes incorporating event arguments, joint entity and event coreference resolution and testing with other datasets.

References

[1] Bishan Yang, Claire Cardie, and Peter Frazier. 2015. A hierarchical distance-dependent Bayesian model for event coreference resolution. Transactions of the ACL, 3:517-528.
[2] Prafulla Kumar Choubey and Ruihong Huang. 2017. Event coreference resolution by iteratively unfolding inter-dependencies among events. In Proceedings of EMNLP, pages 2117-2123.

This research was made possible by the NPRP grant 8-1337-1-243 from the Qatar National Research Fund.


Fine-grained Arabic dialect identification

Authors

Mohammad Salameh
Houda Bouamor
Nizar Habash, New York University Abu Dhabi

Category

Postgraduate Poster

Abstract

Previous work on the problem of Arabic Dialect Identification typically targeted coarse-grained five dialect classes plus Standard Arabic (6-way classification). This paper presents the first results on a fine-grained dialect classification task covering 25 specific cities from across the Arab World, in addition to Standard Arabic, a very challenging task. We build several classification systems and explore a large space of features. Our results show that we can identify the exact city of a speaker at an accuracy of 67.9% on a blind test (a 9% error reduction over the state-of-the-art technique for Arabic dialect identification). We also report on additional insights from a data analysis of similarity and difference across Arabic dialects.



Fine-Grained Arabic Dialect Identification

Mohammad Salameh (1), Houda Bouamor (1), Nizar Habash (2)
(1) Carnegie Mellon University Qatar
(2) New York University Abu Dhabi

Main Contributions:

• We extend the problem of Dialect Identification (DID) to predict 25 fine-grained city-level dialects.
• We leverage the relatively rich resources from a small number of city dialects to help with the fine-grained DID task for 25 city dialects.
• We present a detailed analysis of dialect similarity and confusability and redraw the geographical map for Arabic DID beyond the traditional map presented in the literature.

MSA:

هذه الغرفة صغيرة جداً

English: This is a small room.

Beirut (Lebanon): هالأوضة كتير زغيرة
Damascus (Syria): هالغرفة صغيرة كتير
Aleppo (Syria): هالغرفة كتير صغيرة
Jerusalem (Palestine): هاي الغرفة كتير صغيرة
Salt (Jordan): هاي الغرفة كثير صغيرة
Amman (Jordan): هاي الغرفة كتير صغيرة
Baghdad (Iraq): هاي الغرفة كولش صغيره
Basra (Iraq): هاي الغرفة كلش صغيرة
Mosul (Iraq): الغرفه كلش صغيغي
Muscat (Oman): هالحجره وايد صغيرة
Doha (Qatar): هالغرفة واجد صغيرة
Riyadh (KSA): الغرفة صغيرة جدا
Jeddah (KSA): الغرفة دي مرا صغيرة
Cairo (Egypt): الأوضة دي صغيرة أوي
Aswan (Egypt): الأوضة دي صغيرة خالص
Alexandria (Egypt): الأوضة دى صغيرة جدا
Khartoum (Sudan): الغرفة دي صغيرة شديد
Sana'a (Yemen): الغرفة صغيرة قوي
Fes (Morocco): هاد الغرفة صغيرة بزاف
Rabat (Morocco): هاد الغرفة صغيرة بزاف
Tunis (Tunisia): لبيت هذي صغيرة
Sfax (Tunisia): البيت هذه ياسر صغيرة
Algeria (Algeria): هاذ الغرفة صغيرة بزاف
Benghazi (Libya): الدار صغيرة بكل
Tripoli (Libya): الدار هادي صغيره هلبه

Dialect Identification, Challenges and Datasets

• Dialect Identification is the task of automatically identifying the dialect class of a particular segment of speech or text of any size.
• Challenges: Dialectal Arabic differs from MSA, and the dialects differ from each other, on all levels of linguistic representation: morphology, syntax, phonology and lexicon.
• Datasets: we use two datasets for DID:
  • Corpus-26: created by translating 2,000 sentences from the Basic Traveling Expression Corpus (BTEC) into 25 Arabic city dialects + MSA.
  • Corpus-6: created by translating 10,000 additional sentences into the dialects of five selected cities: Beirut, Cairo, Doha, Tunis, Rabat.

Dialects Similarity

We build a similarity matrix representing the lexical similarity between dialects and apply a hierarchical agglomerative clustering algorithm to it using single linkage.

[Figure: hierarchical clustering of the dialects; axis: Token Dissimilarity (0.25 to 0.65).]
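The single-linkage agglomerative clustering described above can be sketched in a few lines. This is an illustration under stated assumptions: the poster does not say how the lexical similarity matrix is computed, so the sketch takes a ready-made dissimilarity table, and the four dialect codes and their values are hypothetical.

```python
def single_linkage(labels, dist):
    """Agglomerative clustering with single linkage.

    labels: list of item names.
    dist:   dict mapping frozenset({a, b}) -> dissimilarity between a and b.
    Returns the list of merges as (cluster_1, cluster_2, distance).
    """
    clusters = [frozenset([label]) for label in labels]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(dist[frozenset((a, b))]
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical dissimilarities between four dialects, for illustration only.
labels = ["CAI", "ALX", "RAB", "FES"]
dist = {
    frozenset(("CAI", "ALX")): 0.2,
    frozenset(("RAB", "FES")): 0.3,
    frozenset(("CAI", "RAB")): 0.6,
    frozenset(("CAI", "FES")): 0.65,
    frozenset(("ALX", "RAB")): 0.55,
    frozenset(("ALX", "FES")): 0.6,
}
merges = single_linkage(labels, dist)
for left, right, d in merges:
    print(left, right, d)
```

The sequence of merges and their distances is what a dendrogram of the dialects would display.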

Dialect Confusability and Identifiability

• The colors in the columns refer to the probability of assigning a specific MODEL-6 label from the six dialects we consider as anchors.
• There is a general anchor-dialect diffusion pattern.
• These confusability patterns correlate with geography independently of any pre-design of the data sets.

Acknowledgment: This work was made possible by grant NPRP 7-290-1-047 from the Qatar National Research Fund (a member of Qatar Foundation).

Models and Features

Models
• 5-gram Character Language Model (Baseline)
• Multinomial Naïve Bayes
• Convolutional Neural Networks
• Bidirectional Long Short-Term Memory (BiLSTM)

Features
• Word n-grams
• Character n-grams
• 5-gram Word and Character Language Model Scores
• Language Models are trained on the corpus of each dialect.
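To make the Multinomial Naïve Bayes model with word and character n-gram features concrete, here is a minimal pure-Python sketch. It is not the authors' system: the class and function names, the add-one smoothing and the four-sentence training set (taken from the example sentences on the poster) are assumptions, and the language model score features are omitted.

```python
import math
from collections import Counter, defaultdict


def extract_features(sentence, max_char_n=3):
    """Word unigrams plus character 1- to max_char_n-grams."""
    feats = ["w:" + w for w in sentence.split()]
    padded = " " + sentence + " "
    for n in range(1, max_char_n + 1):
        feats += ["c%d:%s" % (n, padded[i:i + n])
                  for i in range(len(padded) - n + 1)]
    return feats


class MultinomialNB:
    """Multinomial Naive Bayes with additive smoothing (assumed alpha=1)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, sentences, labels):
        self.counts = defaultdict(Counter)   # label -> feature counts
        self.doc_counts = Counter(labels)    # label -> number of sentences
        self.vocab = set()
        for sentence, label in zip(sentences, labels):
            feats = extract_features(sentence)
            self.counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, sentence):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            denom = sum(self.counts[label].values()) + self.alpha * len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for f in extract_features(sentence):
                if f in self.vocab:  # ignore features never seen in training
                    score += math.log((self.counts[label][f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Tiny illustrative training set built from the poster's example sentences.
train = [
    ("الأوضة دي صغيرة أوي", "CAI"),
    ("هالغرفة واجد صغيرة", "DOH"),
    ("هاد الغرفة صغيرة بزاف", "RAB"),
    ("هالأوضة كتير زغيرة", "BEI"),
]
model = MultinomialNB().fit([s for s, _ in train], [y for _, y in train])
predicted = model.predict("الغرفة دي صغيرة أوي")
print(predicted)
```

With realistic data, each class would be trained on the full corpus of its dialect rather than on a single sentence.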

Accuracy (%) on Corpus-6 and Corpus-26:

Model          Features                                                                   Corpus-6   Corpus-26
a. Baseline    Char 5-gram LM                                                             92.7       64.7
b. MNB         Character 1+2+3+4+5-grams                                                  89.3       59.7
c. MNB         Word unigrams                                                              91.1       63
d. MNB         Word unigrams & Character 1+2+3-grams                                      91.1       63.6
e. MNB         (d.) + Word 5-gram LM                                                      91.9       62.8
f. MNB         (d.) + Char 5-gram LM                                                      93.2       66.4
g. MNB         (d.) + Char/Word 5-gram LM                                                 93.6       67.5
h. MNB         (g.) + Corpus-6 Classifier Probability Scores                              -          67.9
i. CNN         30-dimensional character embedding and 300-dimensional word embedding     89.4       -
j. BiLSTM      30-dimensional character embedding                                         88.0       -

[Figure: for each Corpus-26 dialect (RAB, FES, ALG, TUN, SFX, TRI, BEN, ALX, CAI, ASW, KHA, JED, SAN, DOH, MUS, RIY, BAS, BAG, MOS, AMM, SAL, JER, ALE, DAM, BEI, MSA), the probability (0% to 100%) of assigning each MODEL-6 label (RAB, TUN, CAI, DOH, BEI, MSA).]


Formalization of financial trading systems in a concurrent logical framework (CLF)

Authors

Dragisa Zunic
Sharjeel Khan
Giselle Reis

Category

Postgraduate Poster

Abstract

Financial exchanges are a cornerstone of the market economy worldwide. Market logic is captured by the rules defining how an automated trading system (ATS) operates, namely how it performs matching of buy and sell orders. At the same time, the trading system must comply with (operate in accordance with) the rules set by the financial regulatory bodies. However, both these sets of rules are written in English, and it is therefore difficult to formally check whether a given ATS operates as intended (implementation adequacy), and moreover whether it complies at all times with the regulatory requirements (regulatory compliance). What makes this task challenging is the fact that we are dealing with a complex infinite state space system. Therefore the conventional methods used in industry, such as testing, cannot prove with absolute confidence that the system always operates in a desirable way. This is evidenced by the major financial institutions that have recently been fined by the US Securities and Exchange Commission (SEC) for violating federal laws when operating their trading venues (e.g. dark pools). Our approach to addressing this problem is based on the observation that its root is the lack of a formal (logical) specification of the system and its requirements in a language where reasoning can be done systematically. We propose to specify the rules of an ATS in a logical framework, while the requirements (fairness, regulation compliance, etc.) are in effect properties that must be proved for the given specification. The current specification and proofs (by hand) are done in Celf, a tool that embodies reasoning in a concurrent logical framework.



Formalization of Financial Trading Systems in a Concurrent Logical Framework (CLF)

Dragisa Zunic, Sharjeel Khan, Giselle Reis

• Alternative Trading System (ATS) form is the document with the rules for the trading system.
• Financial Trading System matches buy and sell orders according to the rules.
• An order book contains a list of buy and sell orders.

Problem

[Diagram labels: Implementation Inadequacy; Incorrect Transactions; Violations based on Regulations; Fines by SEC.]
[Figure: SEC Fines to Companies; axes: Companies, Millions of Dollars ($).]

Overview

[Diagram labels: Rules in the ATS Form; Logic Formulae (26 rules); Financial Trading System Properties (No crossed/locked market, order price-time priority); Mathematical Logic Theorems; Formal Proofs and Simulations in CLF.]

Solution

Exchange a newly arrived limit-price A={buy, sell} order at price P or better, by matching it against the best opposite resident order. The order’s id, timestamp and quantity are ID, T and N respectively.

orderQueue(front((limit,A,P,ID,N,T),Q)): The queue of buy and sell orders
dual(A, ): Opposite actions
activePrices( ,L): A list of all prices with resident orders
exchange(A,L,P,Y): Checks if the order can be exchanged for any resident order
priceQ( ,Y,cons((ID’,N,T’),L’)): The list of resident orders for price Y and action
{priceQ( ,Y,L’) activePrices( ,L) orderQueue(Q)}: The new facts produced into the state space
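The exchange rule described above (match a newly arrived limit order against the best opposite resident order) can be paraphrased operationally as in the sketch below. This is an informal Python reading, not the Celf specification: the data layout, the function names and the reading of "best opposite price" (lowest sell price for a buyer, highest buy price for a seller) are assumptions, and only the equal-quantity case shown in the rule is handled.

```python
from collections import deque

DUAL = {"buy": "sell", "sell": "buy"}  # opposite actions


def best_opposite_price(action, prices, limit):
    """Return the best opposite price Y that satisfies the limit, or None."""
    if not prices:
        return None
    # Assumption: a buyer wants the lowest sell price, a seller the highest buy price.
    best = min(prices) if action == "buy" else max(prices)
    ok = best <= limit if action == "buy" else best >= limit
    return best if ok else None


def exchange_step(order_queue, price_q, active_prices):
    """Apply the equal-quantity exchange to the order at the front of the queue.

    order_queue:   deque of (kind, action, price, order_id, quantity, time).
    price_q:       dict (action, price) -> deque of resident (order_id, quantity, time).
    active_prices: dict action -> list of prices with resident orders.
    Returns (incoming_id, resident_id, price) or None if the rule does not apply.
    """
    if not order_queue:
        return None
    kind, action, price, order_id, qty, _time = order_queue[0]
    if kind != "limit":
        return None
    opposite = DUAL[action]
    y = best_opposite_price(action, active_prices[opposite], price)
    if y is None:
        return None
    residents = price_q.get((opposite, y))
    if not residents:
        return None
    resident_id, resident_qty, _resident_time = residents[0]  # oldest order first
    if resident_qty != qty:
        return None  # other quantity cases would need separate rules
    order_queue.popleft()   # the incoming order is consumed
    residents.popleft()     # the matched resident order is consumed
    return (order_id, resident_id, y)


# Hypothetical state, for illustration only.
order_queue = deque([("limit", "buy", 101, "b1", 10, 5)])
price_q = {
    ("sell", 100): deque([("s1", 10, 1), ("s2", 10, 2)]),
    ("sell", 102): deque([("s3", 10, 3)]),
}
active_prices = {"buy": [], "sell": [100, 102]}
result = exchange_step(order_queue, price_q, active_prices)
print(result)
```

Taking the first element of the resident queue at the best price is what gives price-time priority in this sketch; in the logical specification such properties are stated as theorems and proved rather than tested.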


About Carnegie Mellon University in Qatar

For more than a century, Carnegie Mellon University has challenged the curious and passionate to imagine and deliver work that matters. A private, top-ranked and global university, Carnegie Mellon sets its own course with programs that inspire creativity and collaboration. In 2004, Carnegie Mellon and Qatar Foundation began a partnership to deliver select programs that will contribute to the long-term development of Qatar. Today, Carnegie Mellon Qatar offers undergraduate programs in biological sciences, business administration, computational biology, computer science, and information systems. Nearly 400 students from 35 countries call Carnegie Mellon Qatar home. Graduates from CMU-Q are highly sought-after. Most choose careers in top organizations in Qatar and around the world, and many have pursued graduate studies. With ten graduating classes, the total number of alumni is nearly 700.

To learn more, visit www.qatar.cmu.edu and follow us on:
Twitter: @CarnegieMellonQ
Instagram: @carnegiemellonq
Facebook: CarnegieMellonQ
YouTube: CarnegieMellonQatar
LinkedIn: Carnegie Mellon Qatar

Leadership

Michael Trick, Dean
John O’Brien, Associate Dean
Selma Limam Mansar, Associate Dean, Education
Kemal Oflazer, Associate Dean, Research

Contact

Dean’s Office: deans-office@qatar.cmu.edu
Research Office: cmuq-research@qatar.cmu.edu
Admission Office: ug-admission@qatar.cmu.edu
Media Inquiries: mpr@qatar.cmu.edu



P.O. Box 24866 | Education City, Doha, Qatar | Ph: +974 4454 8400
www.qatar.cmu.edu/meeting-of-the-minds

