CLEAR Journal June 2017



CLEAR Journal (Computational Linguistics in Engineering And Research)
M.Tech Computational Linguistics, Dept. of Computer Science and Engineering, Govt. Engineering College, Sreekrishnapuram, Palakkad - 678633
www.simplegroups.in | simplequest.in@gmail.com

Chief Editor: Dr. Ajeesh Ramanujan, Assistant Professor, Dept. of Computer Science and Engineering, Govt. Engineering College, Sreekrishnapuram, Palakkad - 678633
Editors: Ayishathahira C H, Manjusha P D, Rahul M, Sreelakshmi K
Cover page and Layout: Rahul M

Contents
Editorial
News & Updates
Text to Image Synthesis by Using GAN - Ayishathahira C H
Metaphor Processing - Sreelakshmi K
Malayalam Word Sense Disambiguation Using Naive Bayes Classifier - Manjusha P D
Zero-Shot Translation by Google's Multilingual Neural Machine Translation System - Sandhini S, Varsha E
CLEAR September 2017 Invitation
Last Word



Editorial

Dear Readers,

Greetings! This edition of the CLEAR Journal contains articles on some interesting topics: Text to Image Synthesis by Using GAN, Metaphor Processing, Malayalam Word Sense Disambiguation Using Naive Bayes Classifier, and Zero-Shot Translation by Google's Multilingual Neural Machine Translation System. Our last edition focused primarily on research and work related to trending topics such as Sentiment Analysis from Amazon Data, Cyber Bullying Detection, Analysing Human Activity from Mobile Phone Call Detail Records, and Analysing Sentiments of Visual Contents. Our readers are a group of individuals who have demonstrated a passionate interest in natural language engineering and related fields. They have ceaselessly encouraged and critiqued all our endeavours, and this has served as a motive force for the entire CLEAR team. On this optimistic note, I proudly present this edition of CLEAR to our faithful readers and look forward to your opinions and criticism.

Best Regards,
Dr. Ajeesh Ramanujan
(Chief Editor)



Presentation on the R Language

A three-day presentation on the R language was held at GEC Sreekrishnapuram from 14th to 16th June 2017. Students of the M.Tech 2016-18 batch of the Computer Science department gave sessions on R basics, data interfaces, charts and graphs, and statistics. Faculty members from various departments attended the sessions.



Text to Image Synthesis by Using GAN
Ayishathahira C H
M.Tech Computational Linguistics, Government Engineering College, Sreekrishnapuram
ayishathahira007@gmail.com

Automatic synthesis of realistic images from text would be both interesting and useful. In recent years, generic and powerful recurrent neural network architectures have been developed to learn discriminative text feature representations, and deep convolutional generative adversarial networks (GANs) have begun to generate images of specific categories.

A generative adversarial network (GAN) is a framework for estimating generative models via an adversarial process in which two models are trained simultaneously: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than from G. The training procedure for G is to maximize the probability of D making a mistake, so the framework corresponds to a minimax two-player game. The game reaches its equilibrium when the distribution learned by the generator matches the data distribution seen by the discriminator.

Before training, text encoding is performed with deep convolutional and recurrent text encoders to obtain a visually discriminative vector representation of the text description. The encoded text is concatenated with a random noise vector z, and a synthetic image is generated by passing this concatenated input through the generator network. The synthetic image is then passed to the discriminator network. The text description is also encoded with the same function and compressed to a smaller dimension, and the discriminator checks whether the image comes from the real training data or from the generator. The discriminator uses spatial batch normalization in its convolutional layers.
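To make the data flow just described concrete, here is a minimal PyTorch sketch of a conditional generator and discriminator in this style. It is not the authors' implementation; the layer sizes, the 1,024-dimensional text embedding, and the 128-dimensional projection are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps (noise z, compressed text embedding) to a 64x64 RGB image."""
    def __init__(self, z_dim=100, txt_dim=1024, txt_proj=128):
        super().__init__()
        # Compress the sentence embedding to a smaller dimension (128 here).
        self.project_txt = nn.Sequential(nn.Linear(txt_dim, txt_proj), nn.LeakyReLU(0.2))
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim + txt_proj, 512, 4, 1, 0), nn.BatchNorm2d(512), nn.ReLU(True),
            nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),   # 64x64 synthetic image
        )

    def forward(self, z, txt_emb):
        cond = self.project_txt(txt_emb)                                  # (B, 128)
        x = torch.cat([z, cond], dim=1).unsqueeze(-1).unsqueeze(-1)       # (B, z+128, 1, 1)
        return self.net(x)

class Discriminator(nn.Module):
    """Scores an (image, text) pair as real or fake."""
    def __init__(self, txt_dim=1024, txt_proj=128):
        super().__init__()
        self.project_txt = nn.Sequential(nn.Linear(txt_dim, txt_proj), nn.LeakyReLU(0.2))
        self.img_net = nn.Sequential(                                     # 64x64 -> 4x4 feature map
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2),
            nn.Conv2d(256, 512, 4, 2, 1), nn.BatchNorm2d(512), nn.LeakyReLU(0.2),
        )
        self.judge = nn.Conv2d(512 + txt_proj, 1, 4, 1, 0)                # fuse image and text features

    def forward(self, img, txt_emb):
        feat = self.img_net(img)                                          # (B, 512, 4, 4)
        cond = self.project_txt(txt_emb).view(-1, 128, 1, 1).expand(-1, -1, 4, 4)
        score = self.judge(torch.cat([feat, cond], dim=1))                # (B, 1, 1, 1)
        return torch.sigmoid(score.view(-1))                              # probability "real"
```

The generator consumes the concatenation of noise and the projected text embedding, while the discriminator fuses the same projected embedding with its image features before scoring the pair.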

Training is carried out with the GAN-CLS algorithm, which uses mini-batch stochastic gradient descent and is applied to both the discriminator and the generator. A (text, image) pair is used for training: the text description is encoded, a random noise vector is sampled, and the two are forwarded through the generator to synthesize an image. The discriminator is then trained on three kinds of input: a real image with its matching text, which should be scored as real, and two inputs that should be scored as fake, namely a real image with mismatched text and a synthesized image with the matching text. The discriminator and the generator are then updated in turn.

The CUB dataset of bird images is used for training. It contains 11,788 images of birds belonging to 200 categories, split into 150 training and validation classes and 50 test classes. During mini-batch selection for CLS training, an image is taken at random together with one of its four captions. For the text features, a pretrained CUB text encoder is used, a hybrid of a character-level ConvNet with a recurrent neural network (char-CNN-RNN). The gradients for the generator and the discriminator are computed, and the two networks are updated alternately as prescribed by the GAN-CLS training algorithm. The dimension of the text features is set to 128 and a single GPU is used. The noise is sampled from a standard Gaussian distribution, the real label is set to 1 and the fake label to 0, the base learning rate is 0.0002, and the ADAM solver is used with momentum 0.5. The mini-batch size is 64 and the model is trained for 600 epochs, with cuDNN (the NVIDIA CUDA Deep Neural Network library) enabled for high performance. After training, all results are saved to a checkpoint directory.

This work developed a simple and effective model for generating images from detailed visual descriptions, one that can synthesize many plausible visual interpretations of a given text caption. Future work could scale the model up to higher-resolution images and to more types of text.
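One GAN-CLS update as described above, with the three kinds of discriminator input, might be sketched as follows. It reuses the Generator and Discriminator classes from the previous listing, mirrors the hyperparameters mentioned in the article, and is only an illustrative approximation of the published algorithm, not the authors' code.

```python
import torch
import torch.nn.functional as F

def gan_cls_step(G, D, opt_g, opt_d, real_imgs, txt_emb, mismatched_txt_emb, z_dim=100):
    """One mini-batch GAN-CLS update (a sketch under assumed shapes and labels)."""
    b = real_imgs.size(0)
    real, fake = torch.ones(b), torch.zeros(b)          # real label = 1, fake label = 0
    z = torch.randn(b, z_dim)                            # noise from a standard Gaussian

    # Discriminator update: real image + right text is real; the other two pairs are fake.
    fake_imgs = G(z, txt_emb).detach()
    d_loss = (F.binary_cross_entropy(D(real_imgs, txt_emb), real)                      # real image, right text
              + 0.5 * F.binary_cross_entropy(D(real_imgs, mismatched_txt_emb), fake)   # real image, wrong text
              + 0.5 * F.binary_cross_entropy(D(fake_imgs, txt_emb), fake))             # fake image, right text
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: try to make the (fake image, right text) pair look real.
    g_loss = F.binary_cross_entropy(D(G(z, txt_emb), txt_emb), real)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# Hyperparameters from the article: ADAM, learning rate 0.0002, momentum 0.5, batch size 64, 600 epochs.
# G, D = Generator(), Discriminator()
# opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
# opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
```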

Lifespan Prediction using Artificial Intelligence

Predicting a patient's lifespan simply by looking at images of their organs with a computer is becoming a reality. This new research, led by the University of Adelaide, is the first study of its kind to use medical images and artificial intelligence in this way. "Predicting the future of a patient is useful because it may enable doctors to tailor treatments to the individual," says lead author Dr Luke Oakden-Rayner, a radiologist and PhD student with the University of Adelaide's School of Public Health. "Instead of focusing on diagnosing diseases, automated systems can predict medical outcomes in a way that doctors are not trained to do, by using large volumes of data and detecting subtle patterns," Dr Oakden-Rayner says. The researchers hope to apply the same techniques to predict other important medical conditions, such as the onset of heart attacks.



Metaphor Processing
Sreelakshmi K
M.Tech Computational Linguistics, Government Engineering College, Sreekrishnapuram
sreelakshmiknarayanan@gmail.com

Metaphor is a figure of speech in which a word or phrase is applied to an object or action to which it is not literally applicable. It brings vividness, distinction, and clarity to our ideas and communication. At the same time, it plays an important structural role in our cognition, helping us to organize and project knowledge and guide our reasoning and thinking. Metaphors arise from systematic associations between distinct, and seemingly unrelated, concepts. For instance, when we talk about "the turning wheels of a political regime", "rebuilding the campaign machinery" or "mending foreign policy", we view politics and political systems in terms of mechanisms: they can function, break, be mended, have wheels, and so forth. Manifestations of metaphor are pervasive in language and reasoning, making its computational processing an imperative task within Natural Language Processing (NLP). Accounting for up to 20% of all word meanings according to corpus studies, metaphor is currently a bottleneck, particularly in semantic tasks. An accurate and scalable metaphor processing system would therefore become an important component of many practical NLP applications. These include, for instance, machine translation (MT): a large number of metaphorical expressions are culture specific and therefore represent a considerable challenge in translation. Because the metaphors we use are also known to be indicative of our underlying viewpoints, metaphor processing is likely to be fruitful in determining political affiliation from text or pinning down cross-cultural and cross-population differences, and could thus become a useful tool in data mining. In social science, metaphor is extensively studied as a way to frame cultural and moral models and to predict social choice. Metaphor is also widely viewed as a creative tool: its knowledge projection mechanisms help us to grasp new concepts and generate innovative ideas.

Distributional clustering techniques can be used to investigate how metaphorical cross-domain mappings partition the semantic space in different languages, for example English, Russian, and Spanish. In a distributional semantic space, each word is represented as a vector of the contexts in which it occurs in a text corpus. Because of the high frequency and systematicity with which metaphor is used in language, it is naturally and systematically reflected in the distributional space. As a result of metaphorical cross-domain mappings, a word's context vector tends to be non-homogeneous in structure and to contain vocabulary from different domains. For instance, the context vector for the noun idea would contain a set of literally used terms (e.g., understand [an idea]) and sets of metaphorically used terms describing ideas as PHYSICAL OBJECTS (e.g., grasp [an idea], throw [an idea]), LIQUIDS (e.g., [ideas] flow), or FOOD (e.g., digest [an idea]), and so on. Similarly, the context vector for politics would contain MECHANISM terms (e.g., operate or refuel [politics]), GAME terms (e.g., play or dominate [politics]), and SPACE terms (e.g., enter or leave [politics]), as well as the literally used terms (e.g., explain or understand [politics]).

[Figure 1: Context vectors for GAME and POLITICS]

Figure 1 demonstrates how metaphorical usages, abundant in the data, structure the distributional space. As a result, the context vectors of different concepts contain a certain degree of cross-domain overlap, thus implicitly encoding cross-domain mappings; the figure shows such a term overlap in the direct-object vectors for the concepts of GAME and POLITICS. We can exploit this composition of the context vectors to induce information about metaphorical mappings directly from the words' distributional behaviour in an unsupervised or a minimally supervised way, and then use this information to identify metaphorical language.

Since this approach involves distributional learning from large collections of text, the choice of an appropriate text corpus plays an important role in the experiments and in the interpretation of the results, so comparably large, wide-coverage corpora in the three languages are needed to train the systems. The corpora are parsed with a dependency parser, and VERB-SUBJECT, VERB-DIRECT OBJECT, and VERB-INDIRECT OBJECT relations are extracted from the parser output. These grammatical relations (GRs) are used as features for clustering. The features for noun clustering are the verb lemmas occurring in VERB-SUBJECT, VERB-DIRECT OBJECT, and VERB-INDIRECT OBJECT relations with the nouns in the data set, indexed by relation type; the features for verb clustering are the noun lemmas occurring in the same GRs with the verbs in the data set, also indexed by relation type. The feature values are the relative frequencies of the features.
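As an illustration of this feature scheme, the short sketch below builds relation-indexed feature vectors with relative-frequency values from (relation, verb, noun) triples of the kind a dependency parser would output; the triples themselves are invented for the example.

```python
from collections import Counter, defaultdict

# Hypothetical dependency-parser output: (grammatical relation, verb lemma, noun lemma) triples.
TRIPLES = [
    ("dobj", "play", "game"), ("dobj", "play", "politics"),
    ("dobj", "dominate", "game"), ("dobj", "dominate", "politics"),
    ("dobj", "explain", "politics"), ("subj", "flow", "idea"),
]

def noun_features(triples):
    """Map each noun to the relative frequencies of its (relation, verb) features,
    indexed by relation type, as described in the text."""
    raw = defaultdict(Counter)
    for rel, verb, noun in triples:
        raw[noun][f"{rel}:{verb}"] += 1
    return {noun: {feat: count / sum(counts.values()) for feat, count in counts.items()}
            for noun, counts in raw.items()}

vectors = noun_features(TRIPLES)
print(vectors["politics"])  # shares the dobj:play / dobj:dominate features with "game"
```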


Both unsupervised and semi-supervised techniques can be used for metaphor processing. As a semi-supervised approach, a flat clustering solution can be used, in which metaphorical patterns are learned by means of hard clustering of verbs and nouns at one level of generality; this approach to metaphor identification is based on the hypothesis of clustering by association. Verb and noun clustering can be performed with the spectral clustering algorithm, which has proven effective in lexical acquisition tasks and is suitable for high-dimensional data. Clustering methods model modularity in the structure of the semantic space and thus naturally provide a suitable framework for capturing metaphorical information. The metaphorical cross-domain structure of the distributional space has not yet been explicitly exploited in wider NLP; instead, most NLP approaches tend to treat all types of distributional features as identical, possibly losing important conceptual information that is naturally encoded in the distributional semantic space.
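Given such feature vectors, the verb and noun clustering could be run with an off-the-shelf spectral clustering implementation, for instance scikit-learn's, roughly as sketched below; the cosine-similarity affinity and the number of clusters are assumptions made for illustration, not the exact settings of the cited work.

```python
from sklearn.cluster import SpectralClustering
from sklearn.feature_extraction import DictVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def cluster_words(word_to_features, n_clusters=2):
    """Hard-cluster words from their grammatical-relation feature dictionaries (a sketch)."""
    words = sorted(word_to_features)
    X = DictVectorizer(sparse=False).fit_transform([word_to_features[w] for w in words])
    affinity = cosine_similarity(X)  # pairwise similarity matrix used as the clustering affinity
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed").fit_predict(affinity)
    return dict(zip(words, labels))

# e.g. cluster_words(vectors) from the previous sketch would group "game" and "politics" together.
```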



References:

[1] Ekaterina Shutova, Lin Sun, Patricia Lichtenstein, Elkin Darío Gutiérrez and Srini Narayanan, "Multilingual Metaphor Processing: Experiments with Semi-Supervised and Unsupervised Learning", In Proceedings of the Association for Computational Linguistics, 2016.

[2] Ekaterina Shutova, Simone Teufel and Anna Korhonen, "Statistical Metaphor Processing", In Proceedings of the Association for Computational Linguistics, 2011.

Optical Deep Learning

"Deep learning" computer systems, based on artificial neural networks that mimic the way the brain learns from an accumulation of examples, have become a hot topic in computer science. In addition to enabling technologies such as face- and voice-recognition software, these systems could scour vast amounts of medical data to find patterns that could be useful diagnostically, or scan chemical formulas for possible new pharmaceuticals. A team of researchers at MIT and elsewhere has developed a new approach to such computations, using light instead of electricity, which they say could vastly improve the speed and efficiency of certain deep learning computations. Their results appear in the journal Nature Photonics in a paper by MIT postdoc Yichen Shen, graduate student Nicholas Harris, professors Marin Soljacic and Dirk Englund, and eight others. To demonstrate the concept, the team set the programmable nanophotonic processor to implement a neural network that recognizes four basic vowel sounds. Even with this rudimentary system, they were able to achieve a 77 percent accuracy level, compared to about 90 percent for conventional systems. There are "no substantial obstacles" to scaling up the system for greater accuracy, Soljacic says. Once the system is scaled up and fully functioning, it could find many use cases, such as in data centers or security systems.



Malayalam Word Sense Disambiguation Using Naive Bayes Classifier
Manjusha P D
M.Tech Computational Linguistics, Government Engineering College, Sreekrishnapuram
manjushapda@gmail.com

Word Sense Disambiguation (WSD) is the task of determining the correct sense of an ambiguous word in a particular context, selected from a set of different senses. Polysemic words are words with several senses or meanings, and natural languages contain many of them; a WSD system identifies the exact sense intended. It is an important and challenging problem in natural language processing (NLP). The main applications of WSD are machine translation (MT), semantic mapping (SM), semantic annotation (SA), ontology learning (OL), information retrieval (IR), information extraction (IE), and speech recognition (SR). Words with more than one sense are called ambiguous words, and the process of determining the exact sense among them in a given context is called Word Sense Disambiguation. By removing such ambiguities, the prediction rate of many NLP tasks can be improved.

Malayalam is a Dravidian language used by around 36 million people in the state of Kerala, and a Malayalam WSD system disambiguates the polysemic words in a Malayalam sentence. In machine learning approaches, the system is trained to perform word sense disambiguation; in supervised learning, the training set contains feature-encoded inputs along with their appropriate category, or label. Here, a corpus-based approach to WSD using a machine learning technique, the Naïve Bayes classifier, is adopted. The framework mainly uses two corpora, a sense corpus and an ambiguous corpus: the ambiguous corpus includes all the possible ambiguous words available in the Malayalam language, and the sense corpus holds the synsets and synonyms of those words.

Word sense disambiguation is a mechanism for automatically determining the correct sense of a word in context. A particular word may have different meanings in different contexts, and identifying the accurate sense in such situations is a tedious task with an important role today. In many natural language processing tasks, such as machine translation and information retrieval, WSD plays an important role in improving the quality of systems. Disambiguation requires two strict inputs: a dictionary to specify the senses to be disambiguated, and a corpus of language data to be disambiguated. Word sense disambiguation can be done by supervised, unsupervised, or dictionary-based approaches. Supervised approaches use machine learning techniques on manually created sense-annotated data: the training set consists of examples related to the target word, each occurrence of the ambiguous word is annotated with a semantic label, and the main task is to build a classifier that correctly classifies new cases based on their context of use. The Naive Bayes classifier is one such supervised approach to word sense disambiguation.

Naïve Bayes Classifier

Naïve Bayes classifiers are a family of classifiers based on Bayes' theorem and the concept of simple conditional probability. For the purpose of disambiguation, all the features used for classification are assumed to be independent of each other. The probability of each feature is calculated individually for a class (sense), and finally their product is taken; this product represents the probability of the target word occurring in that sense.

System Design

The supervised machine learning approach to word sense disambiguation based on the Naïve Bayes classifier is divided into different modules. An ambiguous corpus and a sense corpus are used for disambiguation: the ambiguous corpus contains all the ambiguous words in the Malayalam language, and the sense corpus is a dataset for calculating the conditional probability of the different senses of an ambiguous word with respect to the given context. The system outputs the correct sense of an ambiguous word in a context by comparing the senses obtained from the ambiguous corpus and the sense corpus.

Pre-processing is an essential step in natural language processing tasks; the pre-processing module includes tokenization, stop word removal, and stemming. In the tokenization phase, the different lexemes are separated out: a token is usually a word, taken as a continuous string of characters separated by a space, line break, or punctuation character. In the stop word removal phase, commonly occurring words such as some verbs, adverbs, and adjectives are treated as stop words and removed in order to get more significant results; this also reduces the size of the document. A list of Malayalam stop words was identified, and these are removed from the text. In the stemming phase, the roots of the words occurring in the input sentence are found; the stemmer takes from the corpus the root words appropriate for that sentence.

When an input sentence is given for disambiguation, it is first pre-processed. An ambiguity checker then uses the ambiguous corpus to identify any ambiguous word present in the sentence, and a sense lookup is performed against the sense corpus to retrieve the different senses of that word. A conditional probability checker measures the probability of each sense in the context with the help of the Naive Bayes classifier and selects the highest-probability sense as the output. Disambiguation is performed on the basis of a bag-of-words approach: the conditional probability of every sense of the ambiguous word is calculated with respect to the nearby words in the sentence and the synonyms of both, using the Naive Bayes classifier, and the sense with the highest probability is assigned. If the sentence does not contain any ambiguous word, "no ambiguous word has been found" is displayed as the output.
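In symbols, the bag-of-words Naive Bayes decision described above can be written as follows, where w is the ambiguous word, s ranges over its candidate senses, and f_1, ..., f_n are the context features (nearby root words and their synonyms); the add-one smoothing shown is a common choice assumed here rather than something specified in the article.

```latex
\hat{s} \;=\; \arg\max_{s \in \mathrm{Senses}(w)} \; P(s)\,\prod_{i=1}^{n} P(f_i \mid s),
\qquad
P(f_i \mid s) \;=\; \frac{\mathrm{count}(f_i, s) + 1}{\sum_{f'} \mathrm{count}(f', s) + |V|}
```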



Algorithm

Input: Sentence to be disambiguated, S.
Output: Accurate sense for the ambiguous word in the sentence.

1. Tokenize S into lexemes L1, L2, ..., Ln.
2. For each lexeme Li, if Li is a stop word, remove it.
3. Find the root words of the remaining lexemes and use them as feature vectors.
4. Check whether any ambiguous word is present among the root words.
5. If no ambiguous word is present, report that no ambiguous word has been found and stop.
6. Extract the different senses of the ambiguous word.
7. For each sense, compute the conditional probability of the sense given the feature vectors.
8. Choose the sense with the highest probability as the output sense.
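A minimal Python sketch of this algorithm is shown below. The stop-word list, the stemmer, and the tiny ambiguous and sense corpora are English stand-ins invented for illustration; a real system would plug in the Malayalam resources described above.

```python
import math
from collections import defaultdict

# Hypothetical stand-in resources; a real system would load the Malayalam
# stop-word list, stemmer, ambiguous corpus, and sense corpus described above.
STOP_WORDS = {"a", "the", "is", "on"}
AMBIGUOUS_WORDS = {"bank"}
SENSES = {"bank": ["bank/river", "bank/finance"]}
SENSE_CORPUS = {                       # sense -> context words observed with that sense
    "bank/river": ["water", "shore", "fish"],
    "bank/finance": ["money", "loan", "account"],
}

def stem(word):
    return word.lower()                # placeholder stemmer

def disambiguate(sentence):
    # Steps 1-2: tokenize and remove stop words.
    tokens = [t for t in sentence.split() if t.lower() not in STOP_WORDS]
    # Step 3: use root words as the feature vector.
    features = [stem(t) for t in tokens]
    # Steps 4-5: look for an ambiguous word.
    target = next((f for f in features if f in AMBIGUOUS_WORDS), None)
    if target is None:
        return "no ambiguous word has been found"
    context = [f for f in features if f != target]
    vocab_size = len({w for words in SENSE_CORPUS.values() for w in words})
    # Steps 6-8: score each sense with Naive Bayes (uniform prior, add-one smoothing, log space).
    best_sense, best_score = None, float("-inf")
    for sense in SENSES[target]:
        counts = defaultdict(int)
        for w in SENSE_CORPUS[sense]:
            counts[w] += 1
        total = sum(counts.values())
        score = sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in context)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

print(disambiguate("He sat on the bank and caught a fish"))  # -> bank/river
```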


Word Sense Disambiguation is a very important task in natural language processing, and a new machine learning approach for a Malayalam word sense disambiguation system has been implemented. Compared with other approaches, it is easy to identify the sense of each ambiguous word in a sentence using the Naive Bayes classifier. A corpus-based approach has been adopted, with a separate corpus maintained for the whole work. There is still scope for a better corpus: including a larger number of words would improve the efficiency of the system to a greater extent, since the quality of a WSD system is directly proportional to the quality of the corpora used. As future work, the ambiguities introduced at the semantic and discourse levels can be handled by incorporating the necessary modules into the existing system.

Robot uses deep learning and big data to write and play its own music

A marimba-playing robot with four arms and eight sticks is writing and playing its own compositions in a lab at the Georgia Institute of Technology. The pieces are generated using artificial intelligence and deep learning. Ph.D. student Mason Bretan is the man behind the machine. He has worked with the robot, named Shimon, for seven years, enabling it to "listen" to music played by humans and improvise over pre-composed chord progressions. Shimon is now coming up with higher-level musical semantics: rather than thinking note by note, it has a larger idea of what it wants to play as a whole.



Zero-Shot Translation by Google's Multilingual Neural Machine Translation System
Sandhini S, Varsha E
M.Tech Computational Linguistics, Government Engineering College, Sreekrishnapuram
sandinisukumar@gmail.com, varshaedakkat23@gmail.com

An elegant solution for translating between multiple languages is to use a single Neural Machine Translation (NMT) model. The approach requires no change to the base model architecture; instead, an additional artificial token is added at the beginning of the source sentence to specify the target language. The encoder, decoder, and attention of the base system remain unchanged and are shared between all languages. Using a shared wordpiece vocabulary, the approach enables multilingual NMT with a single model and no increase in parameters, which is considerably simpler than earlier proposals for multilingual NMT. In addition to improving the translation quality of the language pairs the model was trained on, the model can also learn to perform implicit bridging between language pairs never seen explicitly during training, showing that transfer learning and zero-shot translation are possible for neural translation.

Neural Machine Translation (NMT) is an end-to-end machine translation technique, and Multilingual GNMT (Google's Multilingual Neural Machine Translation) is an extension of NMT with several advantages:

Simplicity: Google supports around 100 languages as source and target, so pairwise translation would naively require on the order of 100² separate models, which degrades performance. Since no changes are made to the design of the model, scaling to multiple languages is trivial; if the target language changes, only the artificial token prepended to the input needs to change.

Low-resource language improvements: when language pairs with little available data and language pairs with abundant data are mixed into a single model, translation quality on the low-resource language pair improves significantly.

Zero-shot translation: the model implicitly learns to translate between language pairs it has never seen. For example, a multilingual NMT system trained on Portuguese → English and English → Spanish can reasonably translate Portuguese → Spanish even though it has never seen data for that language pair.

System Architecture for Multilingual Translation

Google's Multilingual Neural Machine Translation System differs from earlier multilingual model architectures by an additional direct connection between the encoder and decoder layers. To be able to make use of multilingual data within a single system, one simple modification is made to the input data: an artificial token is introduced at the beginning of the input sentence to indicate the target language the model should translate to. For instance, consider the following English → Spanish pair of sentences:

Hello, how are you? -> ¿Hola como estás?

It is modified to:

<2es> Hello, how are you? -> ¿Hola como estás?

to indicate that Spanish is the target language. The model learns the source language automatically. Not indicating the source language has the potential disadvantage that words with the same spelling but different meanings in different source languages can be ambiguous to translate, but the advantage is that it is simpler and input with code-switching can be handled. The model is trained on all the multilingual data, consisting of multiple language pairs at once, after adding the token to the input data and possibly over- or under-sampling some of the data to adjust for the relative amounts of language data available. The architecture of Google's Multilingual Neural Machine Translation System is described in [2].
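The token trick amounts to a one-line preprocessing step on the source side of each training pair, roughly as sketched below (the second training pair is an invented example, not data from the paper).

```python
def add_target_token(source_sentence, target_lang):
    """Prepend the artificial target-language token, e.g. '<2es>' for Spanish."""
    return f"<2{target_lang}> {source_sentence}"

# Training pairs from several language pairs can then be mixed in a single model.
training_pairs = [
    (add_target_token("Hello, how are you?", "es"), "¿Hola como estás?"),  # English -> Spanish (from the text)
    (add_target_token("How are you?", "pt"), "Como vai você?"),            # English -> Portuguese (invented example)
]
print(training_pairs[0][0])  # "<2es> Hello, how are you?"
```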

Zero-Shot Translation

Zero-shot translation is translation between language pairs for which no explicit training data has been seen. To study it, two multilingual models are used: a model trained with examples from two different language pairs, Portuguese → English and English → Spanish (Model 1), and a model trained with examples from four different language pairs, English ↔ Portuguese and English ↔ Spanish (Model 2). Both models can generate reasonably good quality Portuguese → Spanish translations without ever having seen Portuguese → Spanish data during training. To explicitly improve zero-shot translation quality, two different ways of adding available parallel data are explored, and small additional amounts turn out to be sufficient to reach satisfactory results. In the largest experiment, 12 language pairs are merged into a single model of the same size as each single-language-pair model, and the translation quality is only slightly lower than that of the single-language-pair baselines despite the drastically reduced modelling capacity per language in the multilingual model. Visual interpretation of the results shows that these models learn a form of interlingua representation shared between all the involved language pairs. The simple architecture also makes it possible to mix languages on the source or target side, yielding some interesting translation examples. The approach has been shown to work reliably in a Google-scale production setting and makes it possible to scale to a large number of languages quickly.

References

[1] Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, "Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation", 2016.

[2] Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Łukasz Kaiser, Gouws, S., Kato, Y., Kudo, T., Kazawa, H., Stevens, K., Kurian, G., Patil, N., Wang, W., Young, C., Smith, J., Riesa, J., Rudnick, A., Vinyals, O., Corrado, G., Hughes, M., and Dean, J. "Google's neural machine translation system: Bridging the gap between human and machine translation." arXiv preprint arXiv:1609.08144 (2016).

[3] Gillick, D., Brunk, C., Vinyals, O., and Subramanya, A. "Multilingual language processing from bytes." CoRR abs/1512.00103 (2015).

[4] Firat, O., Sankaran, B., Al-Onaizan, Y., Vural, F. T. Y., and Cho, K. "Zero-resource translation with multi-lingual neural machine translation." arXiv preprint arXiv:1606.04164 (2016).

Technology edits voices like text

New technology may do for audio recordings of the human voice what word processing software did for the written word. The software, named VoCo, provides an easy means to add or replace a word in an audio recording of a human voice by editing a transcript of the recording. New words are automatically synthesized in the speaker's voice even if they don't appear anywhere else in the recording. On a computer screen, VoCo's user interface looks similar to other audio editing software such as the popular podcast editing program Audacity or Apple's music editing program GarageBand. It offers visualization of the waveform of the audio track and a set of cut, copy and paste tools for editing.



M.Tech Computational Linguistics
Dept. of Computer Science and Engg, Govt. Engg. College, Sreekrishnapuram, Palakkad
www.simplegroups.in | simplequest.in@gmail.com

SIMPLE Groups: Students Innovations in Morphology, Phonology and Language Engineering

Article Invitation for CLEAR September 2017

We are inviting thought-provoking articles, interesting dialogues and healthy debates on the multifaceted aspects of Computational Linguistics for the forthcoming issue of the CLEAR (Computational Linguistics in Engineering And Research) Journal, to be published in September 2017. The suggested areas of discussion are:

The articles may be sent to the Editor on or before 10th September 2017 through the email simplequest.in@gmail.com. For more details visit: www.simplegroups.in

Editor, CLEAR Journal
Representative, SIMPLE Groups



Last Word

Hello world,

Automatic synthesis of realistic images from text is an interesting and useful concept, and over the past few years generic and powerful recurrent neural network architectures have been proposed for text-to-image conversion. Natural language processing systems face major challenges when they have to deal with linguistic phenomena like metaphor and irony, so an efficient system for metaphor processing can improve the performance of NLP systems. Word sense disambiguation is a classical problem which has received a lot of research attention; even so, applying WSD techniques to a Dravidian language like Malayalam is still a difficult task. Machine translation is an evergreen field which has attracted attention from all over the scientific world, and performing zero-shot translation with a neural machine translation system is an elegant solution. This issue of the CLEAR Journal contains articles on these interesting topics: Text to Image Synthesis by Using GAN, Metaphor Processing, Malayalam Word Sense Disambiguation Using Naive Bayes Classifier, and Zero-Shot Translation by Google's Multilingual Neural Machine Translation System. The articles are written with the hope of shedding some light on the various trending fields related to computational linguistics. CLEAR is thankful to all who have given their time and effort to share their valuable ideas, and SIMPLE Groups invites more strivers in this field. Wish you all success in your future endeavours!

Sreelakshmi K


