Contents
Editorial
Faculty Talk: Dr. Vijay S. Iyengar, Visiting Professor, CSE
Digital Mind: Ingredients of ARP Poisoning, Gunjan Bansal, 2nd Year, B. Tech. CSE
Natural Selection: Blondie24: Playing at the Edge of AI, Pranav Kumar, 3rd Year, B. Tech. CSE
Cover Story: Cloud Computing: What Does the Future Hold? Gautam Sewani, 4th Year, B. Tech. CSE
Bonjour
Good Times: Manish Goyal, Mukund R
Geek Corner: Bash History Tips and Tricks, Sanmukh, 3rd Year, B. Tech. CSE
Digital Mind: Intelligent Drug Discovery, Om Prasad Patri, 3rd Year, B. Tech. CSE
Nostalgia: Krishna Kishore
Android
The Turing Test
Winners of the Turing Test (Node 2)
Editorial

The third node of Linked List is here. It strategically follows the official launch of Windows 7, Ubuntu 9.10 and Google Wave. Courtesy of these much-awaited products, I have something to write about in this section of Linked List (which most of you conveniently prefer to skim through).

Redmond's newcomer defies Darwin's theory of evolution, in the sense that it does not inherit the dilly-dallying of its predecessor. Be it performance, stability, security or user interface, Win7 improves on all of them. So those of you who have been waiting for something out of the box from Microsoft, do try the institute's evaluation copy of Win7; there will be no going back to Vista!

On the other hand, the new desktop edition of the popular open source distribution is also available now. Ubuntu 9.10 features a redesigned, faster boot and login experience, a revamped audio framework and improved 3G broadband connectivity, all of which contribute to a first-class user experience. Based on Linux 2.6.31, Karmic Koala offers GNOME 2.28 and Ext4 as defaults, and adds "cloud" features and an improved installation. Linux in this user-friendly avtaar should make you think twice before you empty your pockets on a licensed copy of a paid operating system.

Next on my list is the product from the marketing genius: Google Wave. It evokes nostalgia for the days when I was seeking a Gmail invite, though this time I got an invite fairly easily. For those who have been missing out till now, Google Wave is an awesome real-time service for sharing docs, sending emails and much more. In fact, it is the most anticipated product of the year, and people are already desperate for an invite. One of the strongest premises on which Google Wave has been built is to integrate and aggregate the online user's social media/network needs. I am sure many of you are crazy to ride this WAVE that is Google!

Those of you with the most correct entries for the node 3 Turing Test (or those willing to offer me some change from the routine mess cuisine) may get lucky! Do not be misled into thinking that an elegant publicity of certain geeky products is all this node has. This page is one place where I enjoy complete freedom to blabber without any sort of intervention from my dear group of editors. Indeed, most of my jabbering may not feature at all in the magazine! What does feature is left for you to explore!

Om Prasad Patri
Publication Secretary, CSEA
Linked List
03
Computer Science and Engineering Association, IIT Guwahati
Faculty Talk
with Dr. Vijay S. Iyengar

Dr. Vijay S. Iyengar worked as a Research Staff Member at IBM's T. J. Watson Research Center, Yorktown Heights, NY, USA for 25 years, where he held various technical leadership and management positions. He is currently at IIT Guwahati as a visiting professor in the CSE Department. He shares with us his experiences and interests.

How are you finding your stint at IIT Guwahati? Do you find any major change in how IITs used to be in your undergrad times?

I am teaching a class after a long time and I find it exciting to try and leverage my industrial research experience. The Data Mining course had to be developed from scratch, and this was challenging because of the varied academic backgrounds of my students. My research discussions with faculty members and students are great, but we are constrained by my short stay here. Aside from academics, I am enjoying the beautiful campus with its rich bird life (and the elusive leopard!). Participating in the cross-country race was pretty cool too.

I was at IIT Madras from 1973 to 1978, which was a different era! Apart from the obvious technological changes, students today are much more "worldly aware" (notice I did not say "worldly wise"). I am delighted to see that some students are tackling advanced, real-world problems even in a B.Tech. project. But I find the level of student enthusiasm and effort in my class to be much lower than I had expected.

What differences did you find in the academic culture of the US and India?

I am used to students having a much higher sense of seriousness and purpose. Maybe this is because so many students in the US work hard to put themselves through college. On the research side, many US universities are successful in tackling projects that address the hard problems the industry is facing. I would love to see an increase in such projects here. I firmly believe that this will not only lead to new and relevant theoretical concepts but also produce postgraduate students who are better problem solvers and more rounded.

When and how did you develop an interest in Data Mining?

The IBM Deep Blue team was looking at computationally intensive data analysis applications after their victory over Kasparov. They gave me the opportunity to join in that effort. After 15 years in various areas of CAD, I was eager to try something completely different. Switching areas was really hard. It was a learning experience on the job, and the learning for me continues even after ten years.
Which project related to the application of Data Mining did you find to be the most exciting?

The project to find fraud and abuse in corporate travel and business entertainment expenses is my favorite for various reasons. I had a great partner from the "product" group with deep domain experience. We had access to huge amounts of data and to the end users (auditors and business controls personnel). We were able to go from problem definition to an offering in the marketplace within a year. To top it off, the results from a real large-scale application at a client indicated improvements in precision by a factor of 5 compared to their earlier approaches. The technology used was an adaptation of Spatial Scan Statistics tuned to the characteristics of this problem.

Do you think AI will ever reach a level where computers can replace humans in most activities?

You are asking the wrong guy. I consider myself a problem solver in the engineering sense and not a visionary. I can imagine the combination of learning technologies and computational engines taking on more and more tasks. Embedded intelligence is going to be more pervasive. A good example today is robot-assisted surgery, where surgeons are able to improve their performance and the quality of care thanks to intelligent robotic systems.

Take your pick: Industrial Research Lab or University Academia? Why?

The choice was a no-brainer for me: the Industrial Research Lab where I spent 25 years of my life. The following factors were important for me.

I am more excited by the real-life usage of the technologies I helped develop than by writing papers. The industry gave me access to real-life problems, oodles of data, domain experts, application platforms and clients.

I also have wanderlust in terms of areas of research. This was possible for me to satisfy in an industrial research lab. Ideally, industrial labs should drive some of the market disruptions. At the least, they must be nimble and responsive to market shifts, and so the problems keep changing.

I was also able to easily collaborate with and learn from people who were experts in other domains. In my opinion, this is the biggest weakness in Academia. How many university projects do you see that are interdisciplinary in nature? I believe they could have a huge impact.

A lot of students in IITs are opting for an MBA/managerial career rather than sticking to their core technical areas. Your opinion?

I can't comment on any individual's goals and aspirations. I hope they are passionate about their career choice and strive to be the best in it. But I doubt that IITs can fulfill their purpose if their graduates predominantly pursue MBA/management positions. Clearly, we need top quality engineers in all the disciplines to develop, manufacture and service in the global marketplace. If not in India, the global industry will find the engineers it needs elsewhere. Also, it is a fallacy to assume that personal financial goals cannot be attained in the technical workplace. Top technical people can demand and get salaries that are higher than many in the management ladder. They are also given technical freedom, project and strategic responsibility, and they command respect in the organization (all factors for likely job satisfaction). But to succeed in the technical ladder you need to pursue postgraduate studies and really know your stuff. Mediocrity does not get rewarded (for long) in either the technical or the management ladder.

What's next? Future plans?

We will be splitting our time between the US and India. I will be taking on industry/academic positions that fit this constraint. Bioinformatics is a new area of interest for me. I am also eager to pursue some of my other interests like animal welfare, conservation and cartooning. I dream of getting some of my cartoons published.

[Interviewed by] Abhishek Anand and Gautam Sewani, for CSEA
Digital Mind
Ingredients of ARP Poisoning: Man in the Middle Attacks
By Gunjan Bansal, 2nd Year, B. Tech. CSE
Terminology Used
NIC: Network Interface Card, or simply the Ethernet card in our case.
MAC address: Media Access Control address, the physical address of the NIC. It was meant to be globally unique but is easily spoofed. It is used for communication within a LAN.
IP address: Internet Protocol address (of course, all are familiar with this), used while communicating across networks, i.e. where the destination MAC can't be learned.
FTP: File Transfer Protocol (protocol used to transfer files between 2 nodes).
DoS: Denial of Service attack.
MITM: Man In The Middle attack.
LAN: Local Area Network.
VLAN: Virtual Local Area Network. This can be compared with a subnetted network, but the difference here is that the switch (a layer 2 switch) can be made to handle data from different subnets, i.e. different ports can be virtually on different subnetworks. Communication between different subnetworks is still done by a router; layer 3 switches provide many more features here.
ARP: Address Resolution Protocol, used for mapping IP->MAC, i.e. finding the MAC address corresponding to a particular IP address by sending ARP request packets on the LAN (Reverse ARP is used for the opposite).
OSI: Open Systems Interconnection (ISO standard), a model used to standardize communication.
Sniffing: Reading packets meant for other nodes.
SSL connection: SSL is short for Secure Sockets Layer, a protocol for transmitting private documents via the Internet. SSL uses a cryptographic system that has two keys to encrypt data: a public key known to everyone and a private or secret key known only to the recipient of the message.
User Agent: The generic term used to describe any client that might access a web page (web browsers, search engines, handheld mobile phones, etc.).

[Figure: Node cache showing the IP->MAC (physical address) mapping]
[Figure: The IP->MAC mapping after ARP poisoning has been done]

Basics: Devices in a Network
Hub: A multiport device used for communication within a single network. It is very slow and causes a lot of network overhead, as it broadcasts all frames to all connected nodes all the time. It is usually replaced by a switch in a large network.
Switch: A multiport device, but much more intelligent than a hub. It is used for communication within a network (layer 3 switches are an exception and are beyond the scope of this article). It makes forwarding decisions based on the MAC address of the destination and reduces network overload to a great extent. Broadcasting in general is replaced by multicasting and unicasting.
Router: A multiport device which provides communication between different networks (including the Internet) as well as communication between different VLANs.
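To make the difference between a hub and a switch concrete, here is a minimal Python sketch of a learning switch. It is a toy model built on our own assumptions, not how any particular switch firmware works. The `Frame` and `LearningSwitch` names are hypothetical, ports are numbered from 0, and MAC table entries never age out. Frames for unknown destinations are flooded as a hub would do, while frames for learned destinations leave through a single port.

```python
from dataclasses import dataclass, field

# Hypothetical toy model of a layer-2 learning switch (no real networking).
BROADCAST = "ff:ff:ff:ff:ff:ff"

@dataclass
class Frame:
    src_mac: str
    dst_mac: str
    payload: str = ""

@dataclass
class LearningSwitch:
    """Toy switch that learns which port each MAC address lives on."""
    num_ports: int
    mac_table: dict = field(default_factory=dict)  # MAC address -> port number

    def receive(self, frame: Frame, in_port: int) -> list:
        """Return the ports the frame is sent out on."""
        # Learn: the sender is reachable through the port the frame came in on.
        self.mac_table[frame.src_mac] = in_port
        out_port = self.mac_table.get(frame.dst_mac)
        if frame.dst_mac == BROADCAST or out_port is None:
            # Broadcast or unknown destination: flood every other port, like a hub.
            return [p for p in range(self.num_ports) if p != in_port]
        if out_port == in_port:
            return []  # destination sits on the same port; nothing to forward
        return [out_port]  # known destination: unicast out of a single port

sw = LearningSwitch(num_ports=4)
print(sw.receive(Frame("aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02"), in_port=0))  # [1, 2, 3]
print(sw.receive(Frame("aa:aa:aa:aa:aa:02", "aa:aa:aa:aa:aa:01"), in_port=1))  # [0]
print(sw.receive(Frame("aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02"), in_port=0))  # [1]
```

The first frame is flooded because the switch has never seen `...:02`. Once `...:02` replies, both addresses are in the table and traffic between them is unicast.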
What is ARP Poisoning?
ARP Poisoning is one of the few techniques employed for sniffing, man-in-the-middle and DoS attacks. Others include MAC flooding (flooding the switch with MAC addresses at a rapid rate, which can force some switches into a fail-open mode where they broadcast every frame, i.e. act just like a hub), MAC spoofing/cloning, and attacks by admins themselves. This article explains only ARP Poisoning, and only its basics; it is not a step-by-step guide.

This technique can be used to sniff data packets (once ARP Poisoning has been done), modify them and forward them, just drop them, or do anything else the attacker wants with them. The soul of the attack lies in a loophole of the Address Resolution Protocol (ARP), a stateless protocol devoid of any kind of authentication. The system doesn't check whether a request for an ARP mapping is authentic or faked. Nor does it check that an ARP reply it receives is actually a reply to one of its own queries: it accepts all replies, even if it didn't make a query. (Some OSes, like SunOS, prevent this, and this may also be incorporated in upcoming patches or OSes. However, it can be circumvented by performing a DoS attack first, which is beyond this article.)

The principle of ARP spoofing is to send fake, or "spoofed", ARP messages on an Ethernet LAN. Generally, the aim is to associate the attacker's MAC address with the IP address of another node (such as the default gateway, or any other node in the network). Any traffic meant for the faked IP (the IP address whose MAC mapping has been changed in the ARP cache of the victim's PC) is then sent (ONE WAY) to the attacker. This may lead to a DoS attack if the attacker doesn't enable IP forwarding on his machine, since all the data packets will end at the attacker's node. For two-way capture, the attacker simply poisons both nodes, changing their IP->MAC mappings as described above. He can then monitor the traffic flowing between them by capturing the packets and extracting the data from them. This is usually employed against Telnet/FTP sessions, which send passwords in clear text, or for session stealing.
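The stateless behaviour described above can be modelled in a few lines. The sketch below is a pure in-memory simulation; no packets are sent. The `Host` and `ArpReply` classes are hypothetical, and so is the `accept_unsolicited` flag, which stands in for OSes that, as mentioned above, refuse replies to queries they never made.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory model of an ARP cache; nothing here touches a network.

@dataclass
class ArpReply:
    sender_ip: str   # IP address the reply talks about
    sender_mac: str  # MAC address it claims owns that IP

@dataclass
class Host:
    ip: str
    mac: str
    accept_unsolicited: bool = True                # classic, stateless ARP behaviour
    arp_cache: dict = field(default_factory=dict)  # IP -> MAC
    pending: set = field(default_factory=set)      # IPs we have asked about

    def send_request(self, target_ip: str) -> None:
        # Remember the question so that a matching reply counts as "solicited".
        self.pending.add(target_ip)

    def handle_reply(self, reply: ArpReply) -> bool:
        """Update the ARP cache from a reply; return True if the mapping changed."""
        solicited = reply.sender_ip in self.pending
        if not solicited and not self.accept_unsolicited:
            return False  # ignore answers to questions we never asked
        self.pending.discard(reply.sender_ip)
        changed = self.arp_cache.get(reply.sender_ip) != reply.sender_mac
        # No authentication: whatever the reply claims is simply trusted.
        self.arp_cache[reply.sender_ip] = reply.sender_mac
        return changed

    def next_hop_mac(self, dst_ip: str):
        """Destination MAC this host would write into a frame for dst_ip."""
        return self.arp_cache.get(dst_ip)

GATEWAY_IP, GATEWAY_MAC = "192.168.1.1", "00:00:00:00:00:01"
ATTACKER_MAC = "00:00:00:00:00:66"

victim = Host(ip="192.168.1.10", mac="00:00:00:00:00:0a")
victim.send_request(GATEWAY_IP)
victim.handle_reply(ArpReply(GATEWAY_IP, GATEWAY_MAC))   # genuine answer
victim.handle_reply(ArpReply(GATEWAY_IP, ATTACKER_MAC))  # unsolicited, forged
print(victim.next_hop_mac(GATEWAY_IP))  # 00:00:00:00:00:66 (gateway traffic now goes to the attacker)
```

In this model, two-way poisoning just means sending a similar forged reply to the gateway as well. A host that ignores unsolicited replies can still be fooled if the forged reply arrives while its own request is pending, which is why that behaviour alone is not a complete defence.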
SSL and SSH Connections
Now the question might arise: what about SSL connections? The data sniffed above must be in clear text for us to interpret, so what about encrypted data? Well, there is a catch here too. One can generate a fake certificate which, "if accepted by the victim manually" (which he generally does while browsing websites), can enable a good man-in-the-middle attack in which sniffing can be done. Admittedly, this is often not possible when the connection is already established and an encrypted key is used to store session IDs/cookies, so session stealing can't be performed. But it is still of good use against many poorly constructed sites. The same applies to SSH connections. To some extent, brute-forcing or decryption may also help, but this must be the last resort. There is much more complexity here, which is beyond our topic.

How exactly does it happen?
A switch floods a frame out of all its ports only when it does not yet know where the destination MAC address is (for example, right after it is turned on). It populates its MAC table (a table recording which MAC address is reachable on which port) by storing the source MAC address of every frame it receives. For example, it records the address of the computer trying to send data to another computer and that of the computer that replies to it. After that, this table is used to forward frames, and no flooding takes place for destinations with a known MAC->port mapping.

A similar thing is done by PCs, or nodes. They keep an ARP cache in memory which stores IP->MAC mappings. The frames (formed at layer 2, i.e. the Data Link Layer of the OSI model) contain the MAC addresses of the source and the destination, whereas IP addresses are added at layer 3 (the Network Layer). When a node knows only the IP address of the destination, it broadcasts an ARP request to learn the corresponding MAC address and stores the answer in its ARP cache. New frames then carry the destination MAC address directly. The switch does not deal with IP->MAC mappings at all: it just forwards each frame to the port of the destination MAC, because communication on the LAN takes place based on MAC addresses, not IP addresses. This is the security lapse. If we send a fake ARP reply to a node that maps a given IP address to our MAC address, the packets will be forwarded to us and we can do whatever we want with them. This is generally employed between the gateway and a node, so that all data meant for the Internet is intercepted.

Solutions??
Till now there is no foolproof method to stop this attack on a large LAN, but the attack is easily noticed, and the admin can easily catch hold of the culprit. One solution for a small LAN is STATIC ARP entries (in the switch as well as in the nodes) for all the connected devices. Others include forming VLANs, which require a bit more knowledge to break into, and ARP inspection on switches.

To some extent, one can also change the browser's user agent to evade popular, readily available software. Most such tools use the user agent to decide which packets to search: by default, they dig into a packet only if a particular user agent string is found in it. This way, only session hijacking and password sniffing from the browser can be done. This setting is mostly set to look for Mozilla. Setting our user agent to an arbitrary string (which may cause some browser-based sites to malfunction) will help us evade this attack to some extent. But since, in reality, all packets are still sent to the attacker's PC, he might manually override the default settings or read the packets himself. Think twice before proceeding when your browser complains of a fake certificate. Static ARP entries are of great utility, but their use is restricted to a large extent.

In the end, The Lord of the (Token) Ring (the fellowship of the packet): "One Ring to link them all, One Ring to ping them, One Ring to bring them all and in the darkness sniff them."
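Since the attack leaves tell-tale traces, a simple watcher can flag them. The function below is a hypothetical sketch in the spirit of ARP-monitoring tools such as arpwatch; it does not reproduce their actual logic. It takes a list of (IP, MAC) pairs observed on the LAN and checks them against optional static entries. It reports when an IP's MAC changes, or when one MAC address claims several IPs, as the attacker's MAC does once it poses as the gateway.

```python
from collections import defaultdict

# Hypothetical detector sketch; observations would come from ARP replies seen on the LAN.

def find_arp_anomalies(observations, static_entries=None):
    """Flag suspicious IP->MAC bindings in a sequence of (ip, mac) observations.

    static_entries: optional dict of trusted IP -> MAC bindings (like static ARP entries).
    Returns human-readable alert strings in the order they were detected.
    """
    static_entries = static_entries or {}
    first_seen = {}                 # IP -> first MAC observed for it
    ips_per_mac = defaultdict(set)  # MAC -> set of IPs it has claimed
    alerts = []
    for ip, mac in observations:
        trusted = static_entries.get(ip)
        if trusted is not None and mac != trusted:
            alerts.append(f"{ip}: {mac} contradicts static entry {trusted}")
        elif ip in first_seen and first_seen[ip] != mac:
            alerts.append(f"{ip}: changed from {first_seen[ip]} to {mac}")
        first_seen.setdefault(ip, mac)
        ips_per_mac[mac].add(ip)
        if len(ips_per_mac[mac]) == 2:  # report once, when a second IP appears
            alerts.append(f"{mac} claims several IPs: {sorted(ips_per_mac[mac])}")
    return alerts

GW, ATT = "00:00:00:00:00:01", "00:00:00:00:00:66"
seen = [("192.168.1.1", GW), ("192.168.1.20", ATT), ("192.168.1.1", ATT)]
for alert in find_arp_anomalies(seen):
    print(alert)
```

A legitimate NIC replacement also triggers the "changed" alert. The alerts are therefore leads for the admin to check, not proof of an attack.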
Natural Selection
Blondie24: Playing at the Edge of AI
By Pranav Kumar, 3rd Year, B. Tech. CSE

"Meet Blondie. A 24-year old graduate student of mathematics at the University of California at San Diego. She skis and surfs and is an ace at math, but her real claim to fame is her ability to play checkers. She's not good enough to defeat a grand master (yet), but she did earn a spot in the top 500 of an international checkers tournament. Not bad when you consider that Blondie taught herself how to play without reading books, taking classes or getting tips from experienced players. Even better when you realize that Blondie is a computer program and the rest of her persona is a product of my imagination."
Excerpt from "Blondie24: Playing at the Edge of AI" by David B. Fogel
Yes folks, Blondie24 is an artificial intelligence checkers-playing program developed by David B. Fogel and Kumar Chellapilla in 1999. It is not the first AI program for checkers, but it is significantly different from the others. But before I describe how smart it is, let me tell you how it was made.

The search algorithm the program uses is a very simple one: minimax. For the uninitiated, minimax is a strategy widely used in two-player games to maximize your advantage while assuming that your opponent will always try to minimize it. At each step, the program looks ahead n plies (a ply being a single move by one side) from the current board position and evaluates the resulting positions using an evaluation function. These scores are then backed up the game tree, assuming that the program always picks its best option and the opponent its worst. The move with the best backed-up score is chosen. This process is known as an n-ply search. For example, a program which looks 4 plies ahead is said to execute a 4-ply search; correspondingly, there can be 6-ply searches, 8-ply searches, and so on. Basically, every checkers program implements this algorithm in one way or another. But this is not the cool part of Blondie24.

Now, let us come to the evaluation function. Conventional "artificial intelligence" programs (such as the checkers World Champion, Chinook) rely on features that were chosen using human expertise and weighted by hand tuning. Their "intelligence" is pre-programmed into them. The secret to Chinook's success lies in high-speed parallel architectures which can calculate billions of possible board positions per second, plus an endgame database that allows it to make perfect moves once eight or fewer pieces remain on the board. Blondie24, however, does not need such luxurious machines to run on. It uses an artificial neural network as its evaluation function. Neural nets can be trained to perform tasks without the required knowledge being programmed into them; that is, they can formulate their own tactics. The neural net receives as input a vector representation of the checkerboard position and returns a single value, which is passed on to the minimax algorithm. Thus, in a 4-ply search, Blondie24 considers every possible sequence of moves over the next 4 plies, evaluates each resulting board position using the neural net, and finally decides which move to make based on the scores of the different board positions. But this is not the cool part either.

The cool part is the training algorithm used for the neural net. It was trained with an Evolutionary Algorithm. Why is it called that? Because it is derived from nature's process of evolution, the process which made us humans what we are today.
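The search-plus-evaluation loop described above can be sketched in Python. This is a generic depth-limited minimax built on hypothetical `moves` and `evaluate` callbacks. The toy tree and the tiny `tiny_net` evaluator are illustrative only, and the network is far smaller than Blondie24's actual one. The example shows why minimax is not the same as chasing the highest-scoring position. Branch `b` contains the best leaf, but the opponent's reply makes `b` the worse choice.

```python
import math

def minimax(state, depth, maximizing, moves, evaluate):
    """Depth-limited minimax: looks `depth` plies (single moves) ahead.

    moves(state)    -> list of successor states (empty if the game is over)
    evaluate(state) -> score from the maximizing player's point of view
    """
    children = moves(state)
    if depth == 0 or not children:
        return evaluate(state)  # leaf of the search: ask the evaluation function
    scores = [minimax(c, depth - 1, not maximizing, moves, evaluate) for c in children]
    # Assume we pick our best option and the opponent picks our worst one.
    return max(scores) if maximizing else min(scores)

def best_move(state, plies, moves, evaluate):
    """Choose the successor with the highest backed-up value in a `plies`-ply search."""
    return max(moves(state),
               key=lambda child: minimax(child, plies - 1, False, moves, evaluate))

def tiny_net(hidden_weights, output_weights):
    """Hypothetical one-hidden-layer tanh network usable as an evaluation function."""
    def evaluate(board_vector):
        hidden = [math.tanh(sum(w * x for w, x in zip(row, board_vector)))
                  for row in hidden_weights]
        return math.tanh(sum(w * h for w, h in zip(output_weights, hidden)))
    return evaluate

# Toy 2-ply game tree: we move to "a" or "b", then the opponent replies.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
SCORES = {"a1": 3, "a2": 5, "b1": 10, "b2": 1}
choice = best_move("root", 2, lambda s: TREE.get(s, []), lambda s: SCORES[s])
print(choice)  # a  (b holds the best leaf, 10, but the opponent would answer with b2, 1)
```

In Blondie24's case the leaves would be board vectors and `evaluate` would be the evolved network. In this sketch that role is played by the hypothetical `tiny_net`.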
The evolutionary algorithm mirrors the process of natural selection. The neural nets were evolved through hundreds of generations till Blondie became an expert-level checkers player. This is how it was done:
Each checkerboard was represented by a vector of length 32, with each component corresponding to an available position on the board. Components in the vector were elements from {−K, −1, 0, +1, +K}, where 0 corresponded to an empty square, 1 was the value of a regular checker, and K was the number assigned to a king. A common heuristic has been to set this value at 1.5 times the worth of a checker, but such expertise was eschewed in these experiments. Instead, the value of K was evolved by the algorithm. The sign of the value indicated whether the piece belonged to the player (positive) or the opponent (negative).

The evolutionary algorithm began with a randomly created population of 15 artificial neural networks (also described as strategies), $P_i$, $i = 1, \dots, 15$, defined by the weights and biases of each network and the associated value of K. Weights and biases were sampled uniformly over [−0.2, 0.2], simply to provide a small range of initial variability, with the value of K set initially at 2.0. Each strategy had an associated self-adaptive parameter vector $\sigma_i$, $i = 1, \dots, 15$, where each component corresponded to a weight or bias and served to control the step size of the search for new mutated parameters of the neural network. The self-adaptive parameters were initialized at 0.05 for consistency with the range of initial weight and bias terms.

Each "parent" generated an offspring strategy by varying all of the associated weights and biases, and possibly the K value as well. Specifically, for each parent $P_i$, $i = 1, \dots, 15$, an offspring $P_i'$ was created by:

$$\sigma_i'(j) = \sigma_i(j)\,\exp\big(\tau N_j(0,1)\big), \qquad j = 1, \dots, N_w$$
$$w_i'(j) = w_i(j) + \sigma_i'(j)\,N_j(0,1), \qquad j = 1, \dots, N_w$$

where $N_w$ is the total number of weights and bias terms in the neural network (here, 5046), $\tau = 1/\sqrt{2\sqrt{N_w}} = 0.0839$, and $N_j(0,1)$ is a standard Gaussian random variable resampled for every $j$. The offspring king value $K_i'$ was obtained by:

$$K_i' = K_i + \delta$$

where $\delta$ was chosen uniformly at random from {−0.1, 0, 0.1}. For convenience, $K_i'$ was constrained to the range [1.0, 3.0].

The 15 parents and their 15 offspring then played a set of games with each other and received points for winning (+1 point), losing (−2 points) or drawing (0 points). After 150 games, the 15 players with the highest scores were retained as parents for the new generation.

So you see, the program was learning to play checkers without any help from anyone. No other strategies were programmed, just the basic rules implemented by the ply-search engine, such as that each checker moves diagonally forward one square at a time and becomes a king on reaching the last row. OK, so now I can go on boasting about how awesome Blondie24 was. After training for 840 generations (which took about 6 months using the computer technology of the 90s), the best player was used to play against human opponents on the website http://www.zone.com. The username Fogel and Kumar used was, yes, you guessed it, Blondie24. They chose the name so they could attract other players easily. After all, hardly anyone wants to play with someone having the username chellapilla24!! And the program gained its popular name from this experiment. The site used the rating standard of the United States Chess Federation. New players started with a score of 1600, and the score was adjusted with the outcome of each game and the rating of the opponent. As Blondie was put to the test, it recorded an impressive win-draw-loss record of 94-32-39 in 165 games. The best win came against a human player rated 2173 (just 27 points short of the master level), who was ranked 98th out of the 80,000 people registered at zone.com. The final rating of Blondie24 according to calculations was 2045.85, with a standard deviation of 0.48. This was an expert-level rating and placed Blondie above 99.61% of the players registered at the site.

However, the real achievement was yet to come. The current world-champion checkers program is called Chinook, rated at 2814. Chinook relies on features that were chosen using human expertise and weighted by hand tuning. It also includes a look-up table of transcribed games from previous grandmasters and a complete endgame database for all cases with up to eight pieces on the board (440 billion possible states). Chinook does not use self-learning techniques to improve its play, relying instead on opening books, perfect information in the endgame, and high-speed computation to look ahead as many plies as possible. Blondie24 certainly cannot compete with Chinook at its best, or with players at the lesser-ranked master level. Yet the evolutionary program exhibits a flexibility that cannot be achieved with Chinook or other similar approaches: it can invent new and unorthodox tactics.
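The training procedure described above can be sketched in Python. The sketch follows the stated board encoding, the self-adaptive mutation rules, the king-value update clipped to [1.0, 3.0], the +1/0/−2 scoring and the selection of the best 15. Several pieces are hypothetical stand-ins, because actually playing checkers is out of scope here: the board codes, the tuple layout `(weights, sigmas, king)` and the `play_games` callback.

```python
import math
import random

# Hypothetical sketch of the training loop; real games are supplied by `play_games`.
POINTS = {"win": 1, "draw": 0, "loss": -2}

def encode(board, king_value):
    """Map 32 hypothetical square codes to the {-K, -1, 0, +1, +K} input vector.

    Codes: 0 empty, 1 own checker, 2 own king, -1 opponent checker, -2 opponent king.
    """
    value = {0: 0.0, 1: 1.0, 2: king_value, -1: -1.0, -2: -king_value}
    return [value[code] for code in board]

def tau_for(n_weights):
    """Step-size learning rate tau = 1 / sqrt(2 * sqrt(N_w))."""
    return 1.0 / math.sqrt(2.0 * math.sqrt(n_weights))

def make_offspring(weights, sigmas, king, rng):
    """Self-adaptive Gaussian mutation of one parent (weights, sigmas, king)."""
    tau = tau_for(len(weights))
    # Mutate each step size first, then use the new step size for its weight;
    # a fresh N(0,1) sample is drawn for every component.
    new_sigmas = [s * math.exp(tau * rng.gauss(0.0, 1.0)) for s in sigmas]
    new_weights = [w + s * rng.gauss(0.0, 1.0) for w, s in zip(weights, new_sigmas)]
    # King value: add -0.1, 0 or +0.1, then keep it inside [1.0, 3.0].
    new_king = min(3.0, max(1.0, king + rng.choice([-0.1, 0.0, 0.1])))
    return new_weights, new_sigmas, new_king

def select_parents(players, scores, n_parents):
    """Keep the n_parents players with the highest tournament scores."""
    ranked = sorted(range(len(players)), key=lambda i: scores[i], reverse=True)
    return [players[i] for i in ranked[:n_parents]]

def next_generation(parents, play_games, rng):
    """Parents plus one offspring each play; the best len(parents) survive."""
    offspring = [make_offspring(w, s, k, rng) for (w, s, k) in parents]
    players = parents + offspring
    results = play_games(players)  # one list of "win"/"draw"/"loss" per player
    scores = [sum(POINTS[r] for r in res) for res in results]
    return select_parents(players, scores, n_parents=len(parents))

# Tiny usage: 15 parents with 5 weights each (the real networks had 5046).
rng = random.Random(0)
parents = [([rng.uniform(-0.2, 0.2) for _ in range(5)], [0.05] * 5, 2.0)
           for _ in range(15)]

def dummy_games(players):
    """Hypothetical stand-in for real games: players with K >= 2.0 'win'."""
    return [["win"] if king >= 2.0 else ["loss"] for (_, _, king) in players]

survivors = next_generation(parents, dummy_games, rng)
print(len(survivors))  # 15
```

`tau_for(5046)` reproduces the 0.0839 quoted in the text. The `dummy_games` rule only exercises the bookkeeping and says nothing about checkers strength.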
The real achievement came when Blondie succeeded in defeating Chinook at the novice setting. The novice setting of Chinook is equivalent to a high-level expert-rated player. This was a great feat and marked the rise of true "artificial intelligence" over conventional programming.

And now, the team at Natural Selection Inc. under Dr. Fogel has gone a step further with Blondie25, a chess-playing program. The relevant paper can be viewed at: http://65.44.200.132/Library/2006/CIG2006.pdf

Other useful links:
Fogel and Kumar's paper introducing Blondie24: http://65.44.200.132/Library/2000/IntellCheckersPaper.pdf
World Champion Chinook's website: http://www.cs.ualberta.ca/~chinook/
Also in the reading list: "Blondie24: Playing at the Edge of AI" by David B. Fogel.

Call for Articles
Since its inception, the CSEA has conducted many workshops, lectures and programming contests to increase awareness of new technologies and the science behind them. Linked List is yet another attempt to get closer to our fellow IITians. Linked List requires articles that maintain the originality and the quality of the magazine. Send in your articles to csea@iitg.ernet.in or contact the editorial team for any clarifications. Looking forward to an overwhelming response.
Cover Story
Cloud Computing Gautam Sewani, 4th year B.Tech student from the So ware as a Service department of Computer Science and Enginnering explores what the buzz of Cloud Compu ng is all about. Now, if you weren’t so shameless, and had spent a few bucks buying, say a Windows 7 license (instead of asking Jal to download it for you), you would have the o ware Piracy is the most glorified criminal ac vity following ques ons troubling you: in India. Paying for so ware is a sureshot way of • When you have paid for the so ware, why is it that making yourself a subject of ridicule amongst your you can use it only on one par cular computer? peers. Indeed, you will be hard-pressed to find many It’s totally baffling - when you bought an audio individuals who have ever ‘bought’ so ware in their case e in the grand old days, you could use it on lives.(Excluding, of course, the amount we pay to any case e player without paying anything extra. Microso every me we buy a computer, but then most of us aren’t aware of it and what we don’t know • You have a friend who plays Age of Empires II (he’s doesn’t hurt us :P). a purist and frowns upon stuff like DOTA) the whole day on his computer. You, on the other hand, are an outdoor person and use your computer just
S
Cover Story
for a couple of hours daily. Why then, do you have to pay as much for the OS as your friend? •
what’s more, it’s perfectly scalable – if your site traffic increases, more resources will be automa cally allocated, so that you do not have to plan ahead, and your site won’t face any outage.
Why is so ware intalla on such a pain? Why do you have to beg the geek next door to come and fix my “thisso waresucks.dll not found” error every me you install a so ware?
Well, it turns out that some wiseguy heard your rants and came up with what's known as Software as a Service (SaaS). In a nutshell, it's software which you can use "any time, anywhere", without installation, paying an amount proportional to the time for which you use it. Obviously, to achieve all this, the web is the preferred medium of delivery, and most of the products in this paradigm are browser-native.

Utility Computing
If I said, "To set up a factory, you need to set up a power plant", you'd call me insane, and rightly so. You can buy electricity from the government, vary your usage and pay according to the amount you use. But in the field of IT, the statement "To set up a half-decent search engine, you need to set up a $100 million data center" used to be a truism. That is, before the advent of Utility Computing.

Selling computing resources to the public, just like electricity, water, LPG and other public utilities, is called utility computing. It has obvious advantages. Let us return to the example of the search engine you want to set up. You think your algorithm is better than Sergey Brin and Larry Page's, but you are not sure if others will think the same way. You want to try it out anyway. Without utility computing, you would have to run after venture capitalists, raise a significant amount of capital and set up a data center. After all this, if the public rejects your hot-shot algorithms, your investments turn to dust and you are doomed.

With utility computing, in contrast, you rent computing resources. You are charged only for the resources you use (which depends on the kind of traffic your search engine gets).

To summarize, utility computing provides the following benefits:
1. The elimination of a huge up-front payment: you can start small and grow as required.
2. Automatic scalability, so that you don't have to plan ahead with regard to computing resources.
3. The ability to release resources when you are not using them; for example, if your site traffic decreases, your computing resource bills decrease too.

Cloud Computing
Cloud computing has become a buzzword, and as with all buzzwords, it has a gazillion definitions floating around. Here, we define cloud computing to be the sum of SaaS and Utility Computing. From the perspective of end-users, Cloud Computing is SaaS: Google Docs and Acrobat.com can be cited as examples, where we use statements like "Our documents are on the cloud". From the perspective of a SaaS provider, Cloud Computing is Utility Computing.
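To make the pay-per-use argument behind the three benefits above concrete, here is a deliberately simplified Python sketch. The hourly price and the traffic figures are invented for illustration, and owning hardware is modelled crudely as paying the same hourly rate for peak capacity in every hour; the function names are hypothetical.

```python
"""Pay-per-use billing versus paying for peak capacity (illustrative only)."""

# Hypothetical rental price in cents per server per hour; not a real tariff.
PRICE_CENTS_PER_SERVER_HOUR = 10


def utility_bill(servers_per_hour: list[int],
                 price: int = PRICE_CENTS_PER_SERVER_HOUR) -> int:
    """Utility computing: pay only for the server-hours actually used."""
    return sum(servers_per_hour) * price


def peak_capacity_cost(servers_per_hour: list[int],
                       price: int = PRICE_CENTS_PER_SERVER_HOUR) -> int:
    """Fixed capacity: enough servers for the busiest hour, paid for every hour."""
    return max(servers_per_hour) * len(servers_per_hour) * price


if __name__ == "__main__":
    # Invented hourly demand for the new search engine: quiet, a spike, a dip.
    demand = [2, 2, 10, 4]
    print(utility_bill(demand))        # 180 (18 server-hours x 10 cents)
    print(peak_capacity_cost(demand))  # 400 (10 servers x 4 hours x 10 cents)
```

Benefit 3 shows up directly in the sketch: when demand falls back to 2 servers, the utility bill falls with it, while the fixed-capacity cost does not.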
A cloud is an entity containing computing resources. Internally, it can be a grid, a supercomputer or another such system. If these computing resources are rented out, it is known as a public cloud. The organization owning the cloud is called a cloud computing provider. If the organization does not rent out the cloud and uses it for its own internal purposes, it's called a private cloud.

Why Now, Not Then?
It should be clear by now that Cloud Computing is a godsend for SaaS providers. But is it economically viable from the perspective of a cloud computing provider? Indeed, this lack of economic viability was the missing link which had held back the emergence of computing as a public utility for a long time. The newfound interest in the cloud is a result of significant changes in the landscape of the Internet and the Web, which have made owning and renting out a cloud an attractive business proposition.

Building a public cloud requires investment to the tune of hundreds of millions of dollars. But due to the tremendous growth of web services, companies like Google, Microsoft and Amazon were already building such systems to satisfy the computing needs of their own services. They also invested in creating software infrastructure like MapReduce and Google File System to use the cloud in a convenient way. This, coupled with the fact that a large part of their resources remained unutilized, made renting them out the obvious next step. Note that the economic viability increases as the size of the cloud increases. This is illustrated by the following table (from Internet-scale service efficiency by J. Hamilton), which compares the network, storage and administration costs for medium-sized (1000 servers) and large-sized (50000 servers) data centers.

Table: Comparison of medium-sized and large-sized data centers

Another compelling reason was the fact that cloud (or SaaS) versions of a lot of enterprise applications are being created. Google, for example, has offered Gmail for enterprises at a fixed monthly cost wherein all the data (mails etc.) and software is on Google's cloud, freeing the enterprises of storage, maintenance and installation costs. This is a direct attack on Microsoft Exchange, the predominant enterprise communication system. Hence, to defend these franchises, Microsoft is forced to offer cloud-based versions of its enterprise applications. Therefore, it had to create its own cloud infrastructure, which led to the birth of Azure.

What's under my Control?
When you buy a computer, you get total control over it. Is that also true of the computing resources bought on a cloud? The answer is, it depends. There is a whole spectrum available when it comes to the level of flexibility offered. On one hand, we have Amazon EC2, which looks pretty much like physical hardware. It offers API calls to request and configure hardware (obviously virtualized). The user has complete control over the entire software stack. However, such a high level of control has its flipsides too. It makes it very difficult for Amazon to offer automatic scalability, because its semantics depend to a very high degree on the software stack and the applications used.

At the other extreme, we have the Google AppEngine,
which only supports traditional Web-based applications with a request-reply model. It provides impressive scalability but cannot be used for general-purpose computing. Microsoft Azure is somewhere between Amazon EC2 and Google AppEngine. Unlike AppEngine, it is not restricted to a specific type of application. It lets users write code in any programming language; however, that code is compiled to .NET CIL (Common Intermediate Language) and executed in the CLR (Common Language Runtime). The user thus cannot change the runtime or the OS.

Challenges and Opportunities
Despite the fact that Cloud Computing has a lot going for it, a few key challenges, related to both the technology available and the legal policies adopted, need to be overcome for it to realize its full potential. I will discuss a few of them here.
As mentioned earlier, a big attraction of Cloud Computing for SaaS providers is automatic scalability. The current implementations of this feature leave a lot to be desired. For example, Amazon will charge you by the number of 'instances' you occupy, without taking into account the computational cycles being used by those instances. Using Machine Learning to offer automatic scalability is an area of active research, with many research labs working on it.

Virtualization is a key technology for Cloud Computing. While the benefits it provides are significant, it also comes with a performance penalty. Analysis has shown that though VMs (Virtual Machines) are excellent at sharing the CPU and main memory, they cause a sharp dip in performance when it comes to I/O. A key challenge, then, is to build I/O architectures which work well with a large number of VMs.

Cloud Computing throws up some important legal questions. What happens if a country requires its enterprises to keep customer data within its national boundary? And wouldn't enterprises be nervous about storing data in countries where laws exist to make this data available to the government in the interest of national security (the US Patriot Act, for instance)? How secure is the data stored in clouds, and what kind of encryption should one adopt to ensure total secrecy?

A word about open-source
MapReduce is a cloud computing programming framework developed by Google. It can be termed a programming paradigm for cloud computing. It allows programmers to specify programs in terms of two operations: Map and Reduce. The terminology is borrowed from functional programming. For more details, refer to the article on MapReduce in the book Beautiful Code. Hadoop is an open-source implementation of MapReduce, primarily backed by Yahoo. Eucalyptus is an open-source infrastructure for implementing clouds on clusters (provided with the latest Ubuntu distros).

Conclusion
We are lucky to live in a time when all the economic and technological factors have conspired to make cloud computing viable. Based on the changes we have witnessed, speculation on future trends continues unabated. Certain changes are inevitable. Software will have to change to run on clouds instead of stand-alone hardware. Virtualization will witness rapid development to allow cloud providers to offer a single physical machine to as many customers as possible. Just as the availability of water as a public utility rendered wells useless, cloud computing may make thick clients like powerful desktops superfluous. Or Cloud Computing might run into a brick wall and be dismissed as a fad of our times. Whatever the case may be, one thing is for sure (as Dylan said): The Times They Are a-Changin'!
[Written by] Gautam Sewani, 4th Year, B.Tech., Dept. of Computer Science & Engineering.
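The Map and Reduce operations described in the article's open-source sidebar can be illustrated with the classic word-count example. What follows is a single-process Python sketch of the programming model only; it is not Google's MapReduce or Hadoop's API, and the function names are hypothetical. A real framework runs many mappers and reducers on different machines and performs the grouping ("shuffle") step over the network.

```python
"""Single-process sketch of the MapReduce programming model: word count."""
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word."""
    for word in document.lower().split():
        yield word, 1


def reduce_phase(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce: combine all the values emitted for one key."""
    return word, sum(counts)


def run_mapreduce(documents: Iterable[str]) -> dict[str, int]:
    # Shuffle: group intermediate values by key. A real framework does this
    # across machines; here it is just an in-memory dictionary.
    groups: dict[str, list[int]] = defaultdict(list)
    for doc in documents:
        for key, value in map_phase(doc):
            groups[key].append(value)
    # Each key is reduced independently, which is what lets a framework
    # spread the reduce work over many machines.
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    docs = ["the cloud is here", "The cloud is big"]
    print(run_mapreduce(docs))
    # {'the': 2, 'cloud': 2, 'is': 2, 'here': 1, 'big': 1}
```

Because the mapper only looks at one document and the reducer only at one key, neither needs to know about the rest of the data, which is the property that makes the model easy to parallelize.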
Bonjour
Manish Goyal
Manish, M.Tech. 2nd year CSE, recounts his summer internship experiences at Verimag Research Lab, Grenoble, France.
On 27th March, I received an offer from Verimag Research Lab, a leading research centre on theoretical and technical aspects of modeling, developing and formally verifying real-time systems. Model checking and verification being my area of interest, it was like a dream come true. Application, confirmation, excitement, a last-minute visa and there I was in the Land of Fashion. The happiness of stepping on foreign land was enhanced manifold on the train journey when a Dutch girl seated beside me started the conversation with a "Bonjour!" I wished the train would run a bit slower! (Unfortunately, it was the TGV.) At times like these, one realizes the importance of Indian Rails! The next morning, it was wonderful to see myself surrounded by the snow-capped peaks of the Chartreuse, Belledonne and the Alps. What a scenic start to a new day in a new country with an awesome climate! My advisor, Dr. Oded Maler (a pioneer in my research field), made me comfortable with the project, lab routine, access policies etc. No one was allowed to be in the lab after midnight. In Dr. Maler's words: "You are supposed to sleep during the night." (Any chance he might be aware of IITians' schedules?) It was good to know that people were flexible and open to my ideas too. Food (followed by language) was the major problem, as even non-veggies found it difficult to survive. We (Rohith, Goverdhan and I) ended up cooking on our own, with me as the lead and the only chef!
Almost settled, it was time for the "Euro Trip", and the first destination was Switzerland. We procured eatables which were supposed to be unaffordable in the Swiss terrains. Train journeys through green valleys, snow-covered mountains, dark tunnels and waterfalls cast a perfect image in our minds. We visited Mount Titlis, Interlaken, Geneva and Zurich. Shockingly, we met Indians at every place, the climax being when we were traveling to Mount Titlis and an Uncle in an approaching train shouted: "Beta, kahan se aa rahe ho?" ("Son, where are you coming from?") Kudos to our population! Night-outs spent roaming around added another feather to the Swiss "cap". We also travelled to Cannes (the city of hotels) and Nice (the city of "beautiful" beaches), and watched the fabulous fireworks on Bastille Day. After 2.5 months and a collection of over 1500 pics (digicams having revolutionized our world with thoughts like "10 mein se 1 to acchi niklegi", i.e. "at least 1 in 10 will turn out well"), it was time to come out of the dream. We had reserved 2 days for Paris and visited the Louvre, the Eiffel Tower, the Arc de Triomphe and Sacré-Cœur.

Till now, I have portrayed only the lighter side. You would wonder if I ever worked! Indeed, I worked hard the rest of the time (with a sense of responsibility, too, to keep up the legacy of the IIT brand). I was involved in the implementation of one of their projects using Matlab/Simulink. Who would be happier than CS guys when they get to code! I realized how actual research is done, in the true sense. Their passion towards their work and research is commendable, with a clear demarcation between their personal and professional lives. They believe in enjoying every moment, be it at the workplace or outside. I thanked Dr. Maler and Alex (my co-guide) after discussing future aspects and research opportunities. Eventually, I waved adieu to France on 24th July. My vote of thanks to Dr. Purandar Bhaduri for his support and encouragement that got me started, and to my friends Prabhat and Vallabh for playing "first critics" to this piece. Wish everyone a bright future!
Good Times
Mukund R
Mukund, 4th Year B. Tech. CSE, tells us about his extra-internship activities during the summer at Microsoft Bing, Redmond, USA.
After three long interviews, and another short one with the consular officer for my visa, it was confirmed that I would be going to Microsoft for my internship. The apartments in which we would stay (two of us from Guwahati, three from Kgp, and seven from Kanpur) were booked, and so were the flight tickets. Microsoft wanted us to stay for 12 weeks, but our vacation was only 11.5 weeks long, so it was cramped: we were to leave the day after our endsems ended.
Everyone's third year internship is memorable, and so was ours. Except that our memories started a little earlier: at Guwahati airport, to be precise. I had not slept the night before because stuff needed to be packed and the hostel room cleared out. What's more, nobody told me that we wouldn't really need woollens there. So I had a more-than-full suitcase that refused to shut, and I was just waiting to fall asleep. We reached the airport 1.5 hours before the flight was due, and I took my time freshening up and getting my suitcase to close properly. We approached the check-in counter half an hour before the flight was scheduled, and the lady there, very politely and still smiling, refused to let us get on the plane. Lesson learned: "Don't waste time at the airport."

We finally reached Redmond on time, and then our home. Whatever you may say, jet-lag does occur, and my earlier experiments with sleep put me out of action for the next 18 hours. The next day was our first day at work. My work was on a Microsoft-internal programming language named Scope, in which many of the queries to Bing's huge store of data were made. It's similar to SQL, except that it is optimized for the resources that Microsoft uses in Bing. I had to improve its integration with Visual Studio, so that programmers elsewhere within Bing could have an easier time with the language. It was more at a proof-of-concept level, since my manager himself said that this was actually a one-year operation, and they wanted to see how possible/easy it was. Work sure was memorable, and so were my manager's constant words of encouragement. But I'm supposed to be talking about extra-curricular activities here, so let's move on.

One thing that most Indians will notice there is how polite people are (although our stereotypical American isn't quite so). The first few days, I spent training myself to respond courteously to greetings of "How do you do?" with "Good, and how about you?", and to wish others "Have a great day!" The first time someone greeted me like that, I blushed, caught my tongue, and didn't quite know what to say.

America has a policy called the Uniform Monday Holiday Act: with this, several federal holidays are observed on a fixed Monday of some month (Memorial Day, for instance, is the last Monday of May). You always get long weekends, and never miss a holiday because it falls on a weekend. I argue here that this system should come to India as well; at least we'll have a much better appreciation of how many holidays we have in a semester. The first such long weekend was Memorial Day, and we decided to go over to Los Angeles. Several memorable "kand"s (fiascos) happened in LA; I'll only describe the least of them here.

The average Indian has to worry about his finances, and the twelve of us were not that much above average. The area in which we chose to stay in LA was probably a bad choice, and everyone was scared for their lives,
especially after we noticed that the local Pizza Hut had thicker glass windows than the visa office, and a prominent sign read, "Cash registers operated by time lock. Management cannot open them, even if demanded to do so." We had heard that LA has a good public transport system. Again, in the interests of saving money, we opted to use public transport. Later in the night, at about 1 am, we found ourselves about 20 miles from the hotel, and nobody knew how the bus service worked. Finally, we had to call for 3 cabs to get a ride back to the hotel. Lesson learned: "Only New York City has public transport."
Most others wanted to return home in one piece, so when we heard of a local skydiving center, only three of us volunteered. The other two were visibly scared; I wasn't (I'm very modest). I had considerable difficulty in not thinking of the actual moment when I would jump off the plane, but if you're not thinking of it, then there's nothing to be scared about. A few minutes of preparation and some $400 of payment later, we found ourselves in the back of a small aeroplane, me wearing this strange harness with a pink cap, one jumpmaster strapped to my back, and one lady sitting next to me filming my reactions. At 13000 ft, I was still the only fearless man, and so approached the open door confidently. I looked outside, and then back inside. I saw the chap from Kanpur looking at me, waiting for me to jump. I didn't want to appear hesitant, and so when my jumpmaster counted to 3, I blindly released hold of the railing and jumped. We could think during the 6 minute journey down. But for the one minute you're in free fall, there's nothing much you can do: wave to the camera, talk to your jumpmaster, look around, think, and look at the ground. The last 2 things you should never do together. I thought, "Ok. So if the parachute opens, you're probably safe. If it doesn't, then you'll definitely die. Did you have to do this?" Well, the chute opened, and I landed safely. But no, I was never scared.

Speaking of near-death experiences, we went to a place called Six Flags in New Jersey. It's a few hours by bus from New York City, and it's extremely fun. But if you've ever sat on a roller coaster, never sat on one, or think that Indiana Jones in Disneyland is scary, then this place is for you. It's home to several of the world's scariest roller coasters: Kingda Ka, El Toro and the Great American Scream Machine. Kingda Ka was closed for the day, but we screamed during the ride on the Great American Scream Machine, and were too scared to even scream while on El Toro. I wasn't as petrified during the skydive as I was on El Toro. That machine is dangerous. Lesson learned: "Never sit on a roller coaster for fun." Of course, Disneyland roller coasters are still fun. Btw, one chap from Kgp had this to say just before the ride climaxed: "Bhagwan bacha le!" ("God, save me!")

Back in Redmond, we each had 12 days of car rental coupons, and probably the best roads we had ever seen (P.S. Germany, I've heard, has much better roads). Much of what we did was possible only because of this; otherwise we were all too lazy to go anywhere. We discovered a small city called Medina, home to one man called Bill Gates, which offers scenic views of the Seattle skyline. On many nights, after work, we used to drive down there and see Seattle's reflection in Lake Washington.

This was just a small list of the fun things I did during my internship. I'm sure everyone who goes abroad during the summer will have similar stories to tell.

Editor's Note: For a similar description of "How Good My Intern Is", the extra-internship activities of Gautam Sewani, 4th Year, B. Tech. CSE, during his summer internship at Microsoft IDC, Hyderabad, India, visit his blog at http://kholublogs.blogspot.com/
Geek Corner
Bash History Tips & Tricks
K Sanmukh Rao, 3rd Year, B.Tech. CSE
Inspired by spsneo's blog (http://www.spsneo.com/blog), Sanmukh, a 3rd year B. Tech. student from the Department of Computer Science and Engineering, lists some handy tricks for retrieving previously used commands from the Linux terminal.
To all you lazy Linux command-line coders and scripters: bash has a rich feature to pamper your laziness, the Bash History. Most of you probably know that pressing the up arrow brings the previous command onto the command prompt, and pressing it some more gives you the less recent ones. But it doesn't end here; actually, you've just started. Let's explore the rich features the bash history can provide us. The first thing to note is that bash stores your command history in a file named .bash_history in your home folder. Just open the file and you can see up to the last 500 commands you typed (500 is the default limit). Delete a command if you wish it to be deleted, modify it or do whatever you like. (I trust you'd figure out the reasons for doing so.)
Then you have the history command in your Linux box, which shows you the list of all commands along with their ids. If you need a finer result, you could use the powerful grep command which Linux provides. An illustrative example:
$ history | grep ssh
165 ssh -Y 172.16.25.98
166 ssh -Y 172.16.25.98
167 ssh -Y guest@172.16.25.98
169 ssh -Y guest@172.16.25.98
170 ssh -Y 172.16.25.98
309 pintosssh
310 jatingassh
311 labssh
317 ssh $uname@202.141.81.145
319 ssh $uname@$pintos
321 ssh $uname@$pintos

Bash also allows for incremental search on the history list. Press Ctrl+R and type any part of the command; the most recent command from the history list that contains your string will be displayed. Press Ctrl+R again to find a matching command further back. Now just press Enter to execute the command, or press any of the arrow keys to bring the command onto the prompt to edit and execute it.
(reverse-i-search)`vec': g++ vectortest.cpp

But by far the most powerful feature is bash history expansion. History expansions are introduced by the history expansion character '!'. The line selected from the history list is called the 'event', and the portions of that command that are selected are called 'words'.

So there are basically three parts to a history expansion, all of which are optional and separated by a colon ':'.

1. Event Designators:
• !n - It refers to the nth command in the history.
• !-n - It refers to the nth command in the history, counting back from the end.
• !! is an alias for !-1.
• !string - It refers to the most recently used command in the history starting with "string". It is again a useful expansion when you don't remember the arguments to a command which you have executed earlier.
• !?string[?] - It refers to the most recent command containing "string". The trailing ? may be omitted if "string" is immediately followed by a newline.

2. Word Designators:
• n - The nth word, counting from 0. The 0th word normally refers to the command. Example:
$ sudo cat /etc/resolv.conf
Suppose you now want to edit the resolv.conf file instead:
$ sudo vi !!:2   # expands to sudo vi /etc/resolv.conf
• ^ - This refers to the first word and is equivalent to :1 as described above. The only advantage is that you can omit the : (colon) when you use ^. Example (equivalent to vi !!:1):
$ cat ~/.bashrc
$ vi !!^   # expands to vi ~/.bashrc
• $ - This refers to the last word.
• x-y - This refers to a range of words; '-y' is equivalent to '0-y'.
• * - This refers to all the words except the 0th one. This is helpful when you have to execute a command with all the arguments passed to the last command.
• x* - This is an alias for x-$.
Note: If a word designator is used without an event specification, the last command in the history is used as the event. Example (equivalent to vi !!:1):
$ cat ~/.bashrc
$ vi !:1   # expands to vi ~/.bashrc

3. Modifiers:
• h - This removes the trailing file name component, leaving the head. Example:
$ cat /home/spsneo/.bashrc
$ ls !!:1:h   # expands to ls /home/spsneo
Explanation: !! refers to the last command, :1 refers to the 1st word of the last command, and :h removes the trailing file name component, i.e. .bashrc. Hence the expansion.
• t - This removes all leading file name components, leaving the tail.
• r - This removes the trailing suffix of the form .xxx, leaving the basename.
• p - Print the new command but do not execute it.
• s/old/new/ - This substitutes the first occurrence of "old" with "new". The final / may be omitted if it is the last character on the line. Example:
$ cat ~/.bashrc
$ !!:s/rc/_history/   # expands to cat ~/.bash_history
• g - This is used in conjunction with the ':s' modifier. It causes changes to be applied over the entire event line rather than just the first occurrence. Example:
$ cat test.cpp test.h
$ !!:gs/test/source/   # expands to cat source.cpp source.h

Adopt these features to save yourself from a lot of repetitive typing and enjoy the terminal :)
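For readers who prefer to see these rules as an algorithm, here is a toy Python model of how a few event designators, word designators, modifiers and Ctrl+R search resolve against a small, invented history list. It is not bash's actual implementation: words are split on whitespace rather than by shell quoting rules, edge cases are ignored, and all function names are hypothetical.

```python
"""Toy model of a few bash history-expansion rules.

Simplified on purpose: words are split on whitespace (bash uses shell
quoting rules), and only the designators discussed above are handled.
"""
import posixpath

# Hypothetical session history: history number -> command line.
HISTORY: dict[int, str] = {
    1: "cat /home/spsneo/.bashrc",
    2: "sudo cat /etc/resolv.conf",
    3: "g++ vectortest.cpp",
    4: "cat test.cpp test.h",
}


def newest_first(history: dict[int, str]) -> list[str]:
    return [history[n] for n in sorted(history, reverse=True)]


def event(spec: str, history: dict[int, str] = HISTORY) -> str:
    """Resolve '!!', '!n', '!-n', '!string' or '!?string[?]' to a command line."""
    body = spec[1:]                      # drop the leading '!'
    current = max(history) + 1           # number of the line being typed
    if body == "!":                      # !! is !-1
        body = "-1"
    if body.startswith("-") and body[1:].isdigit():
        return history[current - int(body[1:])]   # !-n: n lines back
    if body.isdigit():
        return history[int(body)]                 # !n: line number n
    if body.startswith("?"):                      # !?string[?]: contains
        needle = body[1:].removesuffix("?")
        return next(c for c in newest_first(history) if needle in c)
    return next(c for c in newest_first(history) if c.startswith(body))


def words(line: str, designator: str) -> str:
    """Apply a word designator: 'n', '^', '$', '*', 'x-y' (or '-y') or 'x*'."""
    w = line.split()
    last = len(w) - 1
    if designator == "^":
        return w[1]
    if designator == "$":
        return w[last]
    if designator == "*":
        return " ".join(w[1:])
    if designator.endswith("*"):                  # x* is x-$
        return " ".join(w[int(designator[:-1]):])
    if "-" in designator:
        start, end = designator.split("-")
        lo = int(start) if start else 0           # '-y' means '0-y'
        hi = last if end == "$" else int(end)
        return " ".join(w[lo:hi + 1])
    return w[int(designator)]


def modify(word: str, modifier: str) -> str:
    """Apply :h (head), :t (tail) or :r (drop a .xxx suffix) to one word."""
    if modifier == "h":
        return posixpath.dirname(word)
    if modifier == "t":
        return posixpath.basename(word)
    if modifier == "r":
        return posixpath.splitext(word)[0]
    raise ValueError(f"unsupported modifier: {modifier}")


def substitute(line: str, old: str, new: str, everywhere: bool = False) -> str:
    """:s/old/new/ replaces the first match; :gs/old/new/ replaces all of them."""
    return line.replace(old, new) if everywhere else line.replace(old, new, 1)


def reverse_search(text: str, presses: int = 1,
                   history: dict[int, str] = HISTORY) -> str:
    """Ctrl+R: the newest line containing `text`; each extra press goes further back."""
    matches = [c for c in newest_first(history) if text in c]
    return matches[presses - 1]


if __name__ == "__main__":
    print(event("!!"))                            # cat test.cpp test.h
    print(words(event("!2"), "2"))                # /etc/resolv.conf
    print(modify(words(event("!1"), "1"), "h"))   # /home/spsneo
    print(substitute(event("!!"), "test", "source", everywhere=True))
    print(reverse_search("vec"))                  # g++ vectortest.cpp
```

The model mirrors the three-part structure described above: `event` picks the line, `words` picks parts of it, and `modify`/`substitute` transform the result.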
Digital Mind
Intelligent Drug Discovery
Predicting Structure-Activity Relationships
Om Prasad Patri, 3rd Year, B.Tech. CSE
This article will serve as an appetizer for the exciting field of computational methods in bio- and chemoinformatics. It illustrates the importance of this growing area and the materials and methods involved. It goes on to mention the steps involved in the making of a QSAR model and compares two popular techniques employed in this field: artificial neural networks and decision trees.
Ever wondered what you can find in common between images, text, the shape of clouds, the activity of chemical compounds and cricket match scores? In one word, the answer could be "Patterns". For images, this can be the pixel representation of the images; for text, the frequencies of certain letters; for clouds, the fractal patterns in their shapes; for cricket matches, it might be the runs scored by a batsman in the last 10 matches. For chemical compounds and their biological activities, well, let us delve a little deeper.

From a certain point of view, any pattern can be seen as matrices and vectors, and in the case of the input data, we refer to this as the input feature vector. When representing images, the feature values might correspond to the pixels of an image; when representing texts, perhaps to the occurrence frequencies of letters. For a set of chemical compounds (which is our input dataset), the feature vector will correspond to a certain number of molecular descriptors for each of the chemical compounds. These molecular descriptors are various structural features of the compounds, like molecular weight, electronegativity, diameter, or the number of rotatable bonds or hydrogen atoms in the molecule. The vector space associated with these vectors is often called the feature space. To enable us to "visualize" or see this dataset, we might have to employ some dimensionality reduction method (like principal component analysis (PCA), or its nonlinear variants) and reduce the original feature space to a 2D or 3D space.

Figure: Structure of HEPT derivatives (some of which exhibit anti-HIV activity). This structure can lead to millions of possible compounds by various combinations of R1, R2 and R3!

This is the generic structure of a set of compounds, 1-[(2-hydroxyethoxy)methyl]-6-(phenylthio)thymine, or simply, HEPT derivatives. What makes this class of compounds noteworthy is that some of them have been shown to exhibit anti-HIV activity and inhibit the HIV retrovirus. This makes them exciting objects of research for the drug discovery industry. Now look at the structure closely. Here, R1, R2 and R3 are substituent groups and X is either oxygen or sulphur. Each of R1, R2 or R3 can be a hydrogen atom, a methyl group, an ethyl group, an isopropyl group, a haloalkyl group and so on. The list goes on expanding as we increase the number of carbons and branching.

Let us say we have 1000 groups which can act as R1, R2 or R3. Considering all possible combinations, we have 1000 × 1000 × 1000 = 10^9 (one billion) possible compounds. We don't want to miss out on any compounds which can be potential HIV inhibitors. Now, it is not feasible to individually manufacture all these 10^9 compounds and test their anti-HIV activity!

However, our task would be much easier if we knew something about "patterns" or relations between the chemical structure and anti-HIV activity. For example, it could be a set of IF-THEN rules like: IF dipole moment of R1 is high AND molecular weight of R2 is low, THEN anti-HIV activity is high. Such relations between structural features of a chemical molecule and its biological activity are termed quantitative structure-activity relationships (QSARs). QSARs can be seen as a method to encapsulate the chemical and biological information about a compound such that some conclusions can be drawn about the relationships between chemical structure and biological activity. In this article, we will consider two major areas where prediction of QSARs is of prime importance: drug discovery (specifically, designing compounds with high anti-HIV activity) and predictive toxicology (specifically, identifying compounds with carcinogenic potential). The drug discovery application will be treated as a regression exercise (predicting an anti-HIV-activity index of the chemical compounds) and the toxicology application as a classification task (predicting whether a compound has carcinogenic potential or not).

Based on a database of some known compounds and their biological activities, intelligent techniques can be used to determine QSARs for them. The QSARs can then be used to predict properties of a large number of compounds which have not yet been manufactured, which is termed in-silico drug design. Only selected compounds which are expected to have desirable biological activities then need to be prepared in the industry, increasing efficiency and thus saving time, money and resources. Computational methods can greatly help in the drug design process by predicting activities of compounds before they are actually manufactured. For the toxicology application too, proper prediction of carcinogenic potential can ease the tasks of various drug regulation authorities as well as reduce rampant animal testing in the area. These techniques include approaches based on statistical and machine learning, pattern recognition, clustering and similarity-based methods, as well as biologically motivated approaches such as neural networks, evolutionary approaches or fuzzy modeling, collectively described as Computational Intelligence. Applications of artificial intelligence (AI) methods involve selection of relevant information, data visualization, classification and regression, optimization and prediction.

A generic step-by-step process for modeling any pattern recognition problem is given in the figure. Since QSARs are also essentially patterns, designing a QSAR model follows essentially the same steps.

Figure: A step-by-step model for finding 'patterns'

Our aim will be to predict the values of the final 'target' activity by using the data in the input feature vector. This will involve extracting complicated relationships between various variables in the input and the target data. We will have to construct a learning system (for regression/classification), 'train' the system with data from previously observed 'target' values (referred to as supervised learning), and then use the trained
system to predict the values of the anti-HIV activity index (regression) or predict a 'true' or 'false' value for carcinogenic potential (classification). This system then has to be tested on data which were not in the training set, so as to achieve sufficient generalization. Finally, we have to interpret the results in terms of their biochemical significance, thus leading to relations between chemical structures and biological activities of compounds, or simply, QSARs.

Summarizing, the major steps for the implementation of a QSAR model (as shown in the sketch) would be:
• Dataset Preprocessing: Selection and preprocessing of a dataset with a well-known endpoint, e.g. an index of anti-HIV activity, carcinogenicity or toxicity
• Chemical Representation: Identification and calculation of relevant features (descriptors)
• Construct classification and regression models: Detection of relationships (QSAR models) between these features and the concerned endpoint
• Evaluation of the predicted model: The QSAR model and learned relationships have to be validated and their performance evaluated
• Interpretation of the relationships in terms of the defined endpoint

Artificial neural networks (ANNs), with a layered architecture, can be used to model such a complicated relationship to provide any desired mapping. The rise in use of ANNs for pattern recognition problems is because of their ability to generalize input-output relationships from a limited set of training data. However exciting ANNs may seem, they alone are not deemed good enough by today's standards and have given way to more advanced methods.

Decision trees are another type of pattern classifier, which arrive at a decision through a sequence or hierarchy of stages, choosing one branch of the tree at each intermediate stage. A decision tree is basically like any other tree data structure, but here the decisions taken while traversing the tree (which branch to choose) are decided by a certain condition or question asked in the parent node.
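As a sketch of how the question at each node picks a branch, the IF-THEN rule quoted earlier in the article (IF dipole moment of R1 is high AND molecular weight of R2 is low, THEN anti-HIV activity is high) can be written as a two-level decision tree. The descriptor names, the numeric cut-offs for "high" and "low", and the choice to predict "low" whenever the rule does not fire are all hypothetical.

```python
"""A hand-built decision tree for the IF-THEN rule quoted in the article."""
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    prediction: str


@dataclass
class Node:
    feature: str       # descriptor tested at this node
    threshold: float   # take the left branch if value <= threshold
    left: Tree
    right: Tree


Tree = Union[Node, Leaf]


def predict(tree: Tree, descriptors: dict[str, float]) -> str:
    """Start at the root; each node's question picks the branch to follow."""
    while isinstance(tree, Node):
        value = descriptors[tree.feature]
        tree = tree.left if value <= tree.threshold else tree.right
    return tree.prediction


# Hypothetical cut-offs: the article does not say what "high" or "low" means.
DIPOLE_HIGH_ABOVE = 2.0
WEIGHT_LOW_UP_TO = 100.0

# IF dipole moment of R1 is high AND molecular weight of R2 is low,
# THEN anti-HIV activity is high (otherwise, by assumption, predict low).
RULE_TREE = Node(
    feature="dipole_moment_R1", threshold=DIPOLE_HIGH_ABOVE,
    left=Leaf("low"),
    right=Node(
        feature="mol_weight_R2", threshold=WEIGHT_LOW_UP_TO,
        left=Leaf("high"),
        right=Leaf("low"),
    ),
)

if __name__ == "__main__":
    print(predict(RULE_TREE, {"dipole_moment_R1": 3.1, "mol_weight_R2": 45.0}))  # high
    print(predict(RULE_TREE, {"dipole_moment_R1": 1.2, "mol_weight_R2": 45.0}))  # low
```

Each internal node holds exactly one structural feature, which is why such a tree can be read back as a chemical rule.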
Decision tree classifiers are sequential, compared to the massive parallelism of ANNs, and ANNs often have better generalization capabilities. However, for our purpose decision trees can be more beneficial, as a structural feature of a chemical compound can be directly represented as a node in the decision tree instead of being buried in the elusive hidden layers of a neural network. Training a decision tree is usually faster than training a neural network, because in a decision tree all the training examples are considered simultaneously to make each decision, rather than over many iterative passes. Further, decision trees do not impose any restrictions on the distribution of the input dataset, unlike many other methods. If we could make a 'hybrid' structure that combines the best features of neural networks and decision trees, we could probably get the best of both worlds.
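The idea that a tree looks at all training examples at once when making each decision can be illustrated with the split-selection step of ID3-style learning. The article does not name a specific algorithm; ID3's information-gain criterion is used here only as a familiar example. At a node, every candidate binary feature is scored over the whole training set, and the best one becomes that node's question. The toy dataset and the feature names `aromatic_ring` and `halogen` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting ALL examples on one binary feature."""
    n = len(labels)
    gain = entropy(labels)
    for value in (True, False):
        subset = [y for x, y in zip(rows, labels) if x[feature] == value]
        if subset:
            gain -= len(subset) / n * entropy(subset)
    return gain


def best_feature(rows, labels, features):
    """Choose the feature whose split gives the highest information gain."""
    return max(features, key=lambda f: information_gain(rows, labels, f))


# Hypothetical training set: binary structural descriptors -> active (True) / inactive (False).
rows = [
    {"aromatic_ring": True, "halogen": False},
    {"aromatic_ring": True, "halogen": True},
    {"aromatic_ring": False, "halogen": True},
    {"aromatic_ring": False, "halogen": False},
]
labels = [True, True, False, False]

print(best_feature(rows, labels, ["aromatic_ring", "halogen"]))  # aromatic_ring
```

A full learner would repeat this choice recursively on each subset until the leaves are pure or a stopping rule applies; each such choice is a single pass over the examples at that node, not an iterative weight update.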
Linked List
Nostalgia
Krishna Kishore
Krishna Kishore Annapureddy (http://linkedin.com/in/akkishore/) is an alumnus from the batch of 2004-2008. He was the General Secretary of CSEA during his final year at IIT Guwahati and was popularly known as 'KK'. Unsatisfied with his job as a Software Engineer at Google, he went on to work for 'Knowlarity Communications' (http://www.knowlarity.com), a startup formed by a group of IITians, which provides automated communication solutions. Komal Jalan, Batch Representative of 3rd year B. Tech. CSE, interviews him for CSEA.
Recount some parts of your life at IIT Guwahati.
Now that I am out of IITG, I can say it's one of the best parts of my life. Friends, parties, labs, quizzes, exams, Alcher, Manthan, Techniche, achievements, failures, girlfriends, breakups, makeups, and finally graduation: it's a wonderful mix of everything. To the 4th year guys, I should say this is your last shot at it. 8th sem is your best. You will have the least responsibilities. Enjoy to the fullest and become the laziest ;)
Was Google a dream job for you as for most IITians?
It was nothing like a dream job. I was actually confused in my 7th semester about whether to go for an MS, a PhD or a job. I was not able to decide and ended up taking the first job that I got.
Why did you leave Google?
I didn't like the work that I was doing. There was not much of a challenge. As a fresher, I was mostly doing JS and HTML templates, etc. I am more interested in building systems. Google's organisation of teams is such that there are separate teams for infrastructure/systems development, and as a fresher you cannot move to those teams.
The main motivation behind the start-up plan?
Just to clarify, I am just an employee in the startup that I am currently working in. As an early engineer in a startup, you get to do the most exciting work: build systems from scratch, and that's far more fulfilling and rewarding.
What are your further plans?
Nothing specific for now. Given that I own some part of the systems developed at the startup, I am not planning on moving. I will try and make them better. But in the long term I have plans of starting my own venture.
Any advice to the students of IITG?
The community that you have around you, your peers, your professors, and the infrastructure that you have at your disposal are the best things that the institute has to offer you. Make the best use of them. Also, I would like to propel the culture of startups among you guys. I am also learning and I will not give any advice here. But I would like to give you examples of some startups of IITG alumni:
• Drishti-Soft (http://www.drishti-soft.com/) by Sachin Ba a 1999-01
• Muziboo (http://www.muziboo.com/) by Prateek Dayal 2001-05
• ViVu (http://www.vivu.tv) by Siva Kiran 2002-06
• Cash UR Drive (http://www.cashurdrive.com/) by Raghu Khanna 2004-08
Check them out. If you guys have ideas, talk to your peers, seniors, professors. Make use of EDC at the institute and let it come to life. Be ready to take risks in your life: "if you are not risking, you are risking it all". All the very best.
Bird’s Eye View
Android
ANDROID is a fairly recent mobile operating system that delivers a complete set of software for mobile devices: an operating system, middleware and key mobile applications. It was initially developed by Google, and later by the Open Handset Alliance (OHA), a multinational alliance of 48 technology and mobile industry leaders. The Android mobile platform has enabled wireless operators and manufacturers to give their customers better, more personal and more flexible mobile experiences. Most of Android's source code is available under the Apache open source license; the Linux kernel it builds on remains under the GPL.
Android runs on top of the Linux kernel and allows developers to write applications in Java, using a set of Java libraries bundled with the Android platform. Further, it uses a custom virtual machine (Dalvik), designed to optimize memory and hardware resources in a mobile environment.
Highlights of Android
• Open: Android was built from the ground up to enable developers to create mobile applications that take full advantage of everything a handset has to offer. For example, an application can call upon any of the phone's core functionalities, such as making calls, sending text messages or using the camera, allowing developers to create richer and more cohesive experiences for users.
• All applications are created equal: Android does not differentiate between the phone's core applications and third-party applications. They can all be built to have equal access to a phone's capabilities. With devices built on the Android Platform, you are able to fully tailor the phone to your interests. You can swap out the phone's homescreen, the style of the dialer, or any of the applications! You can even instruct your phone to use your favourite application to view photos.
• Breaking down application boundaries: With Android, a developer can combine information from the web with data on an individual's mobile phone, such as the user's contacts, calendar or geographic location, to provide a more relevant user experience. For instance, you can view the location of your friends and be alerted when they are in the vicinity.
• Fast and easy application development: Android provides access to a wide range of useful libraries and tools that can be used to build rich applications. For example, Android enables developers to obtain the location of the device, and allows devices to communicate with one another, enabling rich peer-to-peer social applications. In addition, Android includes a full set of tools that have been built from the ground up alongside the platform, providing developers with high productivity and deep insight into their applications.
Market Share
According to Q2 2009 market share data from Canalys, the shares of the various mobile OSes in the worldwide smartphone market, in order, are Symbian (50.3%), RIM Blackberry (20.9%), Apple iPhone (13.7%), Windows Mobile (9%) and then Android (only 2.8%). However, Gartner Inc. has predicted that by 2012, Android will hold a 14% share of the global smartphone market, ahead of iPhone, Windows Mobile and Blackberry smartphones, ranking 2nd globally behind the Symbian OS (39% by that time).
By providing developers a new level of openness that enables them to work more collaboratively, Android has accelerated the pace at which new and compelling mobile services are made available to consumers.
Puneet Jindal, 4th year, B. Tech. CSE
The Turing Test
As always, Google is your friend. But you will need some intelligent manipulation before seeking its help. Best of Luck!
Q1. Who was the first HOD of IITG's CSE Department?
Q2. Connect the pictures below.
Q3. [This question appeared in a competition held at Stanford in 1985.] It is widely known that syntactic and semantic correctness are distinct: a grammatically correct sentence might be meaningless. Noam Chomsky gave a famous example: "Colourless green ideas sleep furiously." Compose a passage in which this statement becomes meaningful. The shorter your passage, the better.
Q4. Connect the pictures on the right.
Q5. Find a connection between the pictures above.
Q6. In 1963, Harvard linguist Susumu Kuno asked a computerized parser to process the sentence "Time flies like an arrow." The computer gave 7 different interpretations of the same sentence, which you'd find on Wikipedia. Consider the phrase: "Outside of a dog, a book is man's best friend." Give as many interpretations of this as you can possibly come up with.
Q7. An arcsecond is 771.6*10^-9 of a circle. The wavelength of green light is 550*10^-9 m. Light takes about 3.3*10^-9 s to travel 1 meter. The radius of the Hydrogen atom is 2.5*10^-11 m. The charge on a proton is 1.6*10^-19 C. There is no doubt that "scientific notation" has greatly expanded our capacity to state numbers, both large and small. Yet, we could create smaller numbers still, say 10^(-10^10^10^10^(4.829*10^183230)). You know well that there is no least positive real number. Thus, if you played a game with a friend in which you took turns to write a smaller number, no clear winner would ever emerge. But now we challenge you: you have only one chance, and only one sheet of A4 paper. The game becomes interesting.
Name a single positive real, using mathematical notation and/or language that any modern mathematician could understand. Your objective is to write a number smaller than that written by everyone else.
Some acceptable answers include 4, pi, 1/pi, "the reciprocal of that number obtained by adding pi to itself 16 times", or numbers in scientific notation, say 6.626*10^-34. You might also use functions: "the reciprocal of the exponential function e^x, evaluated at x=500", say. You could define new functions, like f_500(1000) where f_0(x) = e^-x and f_i(x) = 0.5*f_(i-1)(x).
Answers which are not acceptable are things like "half of the smallest number written by the other contestants", for you cannot assume their existence, and "my age, divided by the age of the universe", since a mathematician could not evaluate such a number. The number has to be greater than zero, and we are talking about standard real analysis. Modern mathematics has produced such marvelous ideas as Internal Set Theory, which talks about numbers smaller than any known real, but these we have neither the background nor the time to evaluate.
Submission Instructions
- Mail your answers to csea@iitg.ernet.in with the subject "Turing Test"
- Only one entry per email id will be accepted
- Deadline for submissions is 6 PM, 9th November (Monday)
- Members of CSEA and all 'related' are not entitled to participate
- Names and photographs of the lucky winners will be published in the next issue.
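As an aside on the example function family in the Q7 rules (f_0(x) = e^-x, f_i(x) = 0.5*f_(i-1)(x)): each step halves the previous value, so the recursion unrolls to f_i(x) = 2^-i * e^-x, and f_500(1000) = 2^-500 * e^-1000. The sketch below is our own illustration, not part of the contest, and the helper name `log10_f` is hypothetical. It works in log space because the value lies far below the smallest positive double-precision number (roughly 4.9*10^-324), so evaluating `math.exp(-1000)` directly just underflows to 0.0.

```python
import math


def log10_f(i: int, x: float) -> float:
    """log10 of f_i(x), where f_0(x) = e^-x and f_i(x) = 0.5 * f_(i-1)(x).

    Unrolling the recursion gives f_i(x) = 2^-i * e^-x, hence
    log10 f_i(x) = -i * log10(2) - x * log10(e).
    """
    return -i * math.log10(2) - x * math.log10(math.e)


print(math.exp(-1000))  # 0.0: direct evaluation underflows

# Split the logarithm into an integer exponent and a mantissa in [1, 10).
exponent = log10_f(500, 1000)
power = math.floor(exponent)
mantissa = 10 ** (exponent - power)
print(f"f_500(1000) ~ {mantissa:.2f}e{power}")
```

The same trick (carrying the logarithm instead of the number) is what makes towers like 10^(-10^10^...) expressible at all: they can only be compared symbolically, never stored as floating-point values.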
Answers of the Turing Test (Node 2)
1. Google Copernicus
2. X - RocketMail, Y - YahooMail
3. Rediff
4. Padmasree Warrior
5. "You": in 2006, Time magazine chose "You" (the people) as its Person of the Year; in 2005 it was Bono, Bill and Melinda Gates; in 2007 it was Putin.
6. Alice and Bob
7. Wikimedia Foundation; the picture signifies the critics of its Wikipedia project
8. Alexa Internet, Inc. (U.S.-based subsidiary company of Amazon.com)
9. Reddit and Digg
10. Longhorn and Blackcomb projects of Microsoft, leading to Vista
Winners:
First Prize: P V Ravi Kiran Sastry, Department of CSE.
Second Prize: Sameer Agarwal, Department of CSE.
Linked List
Brought to you by: Computer Science and Engineering Association, Department of Computer Science and Engineering, Indian Institute of Technology Guwahati
Email: csea@iitg.ernet.in
Website: http://csea.iitg.ernet.in
The Editorial Team
• Om Prasad Patri (Editor)
• Abhishek Anand
• Nitin Dua
• Karthik R
• Siddharth Prakash Singh
• Vinay Kumar (Design)
Save Trees. Do not waste paper.
Mail in your suggestions to csea@iitg.ernet.in. Visit http://csea.iitg.ernet.in for more.