CSEA Linked List Node 4


Cover Story

Faculty Talk Prof.

Inside:

4th Year Special..


CONTENT

Editorial

Faculty Talk
Dr. S. B. Nair, Associate Professor, CSE

DIGITAL MIND
Internet 3.0: A New View of Connectivity
Neminath Hubballi, Research Scholar, CSE

COVER STORY
Memcached: Scaling your Web Application
Siddharth Prakash Singh, 4th Year B. Tech. CSE

GOOD TIMES
Internship: Do’s and Don’ts
Anurag Nilesh / Ashish Thakur, 4th Year B. Tech. CSE

GEEK CORNER
Ajax: Pitfalls and Solutions
Shirish Surti, 2nd Year M. Tech. CSE

DIGITAL MIND
CSP for Verification of Security Protocols
Niteesh Kumar, 2nd Year B. Tech. CSE

NOSTALGIA
Abhishek Gupta

You@YourDomain with Google Mail
Karthik R, 2nd Year M. Tech. CSE

4th Year Special: BTP Mania

4th Year Batch Photograph


Editorial

Recently attending a workshop on “Science Journalism and Communication”, I realized how grave the problem of communication in science and technology is. It is one thing to understand what is happening (scientifically or technologically), but it is a different ball game altogether to present it to a layman and make him understand. Especially in India, where the percentage of media coverage and journalism for science is much below the recommended international standards, the issue is quite pertinent. Often, the temptation to sensationalize scientific news (or just plain ignorance on the reporter’s part) leads to miscommunication. Just think of how difficult it is for most of us to understand a scientific paper in any journal or magazine. And we are on a (supposedly) technologically aware campus.

One brilliant solution I happened to learn of (at this workshop) is that of ‘Scientoons’ (www.scientoon.com), originated by Pradeep K. Srivastava of the Central Drug Research Institute (CDRI). A scientoon is a science cartoon which not only makes you laugh and smile but also teaches you a scientific concept in a novel manner. And so in this issue, instead of the usual comic strips, you will find a select few scientoons here and there. Go, search for them!

Through Linked List, we have constantly endeavored to learn how to bring computer science to you and make you understand it whilst retaining your interest in the topic. This fourth node of Linked List is yet another addition to that effort. For all the people on campus excited about going on their summer internships, we have some special tips. And to preserve the memories of our beloved 4th yearites, the to-be alumni, we choose a few final year people and their BTPs and take a look at the work they have been doing.

Without labelling the issue as ‘futuristic’, I would like to say that it covers a few technologies which will start (or have already started) to dominate the web as we see it today. Take the instance of memcached, which is currently used by a great many web applications; among the vast universe of its users, you might have heard of a certain Facebook or Twitter or Digg (or LiveJournal or YouTube). Be it emotional robots, very-large-scale web applications or the new Internet 3.0, this node has something on each of them. So wait no longer, and delve deeper!

With this node comes the end of my tenure as Editor of Linked List. I hope that you have enjoyed reading the previous issues as much as we have enjoyed creating them. I would like to take this opportunity to thank the entire Editorial Team for the brilliant support they have provided. My special regards to V. Krishna Brahmam, 2009 alumnus, CSE IITG, for the design templates. We wish the new team all success and hope that the succeeding Nodes of this Linked List will continue the tradition of excellence!

Om Prasad Patri
Publication Secretary, CSEA
Editor, Linked List

Linked List


Computer Science and Engineering Association, IIT Guwahati


Faculty Talk with Prof. S. B. Nair

Dr. S. B. Nair, Associate Professor in the Department of Computer Science and Engineering, shares his interests with us.

“How would it be if a robot welcomes you with the same gusto as when you meet a long lost friend? Or imagine your car purring on a long, fast ride and then exclaiming ‘Ouch!’ when it goes over a bump, after which it tends to move cautiously.”

Please tell us something about your academic background.

I was born, brought up and educated in a (then) small city named Amravati in Maharashtra. I graduated in Science from Nagpur University and then did a Master’s in Applied Electronics followed by a Master’s in Electronics Engineering from Amravati University. It was during this phase that I came across AI, and I then pursued my Ph.D. in an allied area at the same University.

You worked as a Senior Lecturer before completing your Ph.D. When and how did you decide to pursue your Ph.D.?

Yes, I worked as a faculty member at the Post Graduate Dept. of Applied Electronics at Amravati University for more than 12 years before I joined IITG. I always wanted to pursue higher education. During my initial years at the University, I was selected as a regular candidate for a Ph.D. at the CSE Dept. at IITM. But the University refused to sponsor me. My would-have-been supervisor at IITM and several others advised me not to quit a University job. Instead, they asked me to pursue the Ph.D. at my own University, with my would-have-been supervisor acting as a co-guide. Of course, all this never really worked, thanks to the red tape and the very many rules and regulations. After slogging for about 6 years, while also working as a Senior Lecturer and carrying out most of my research work at the Central Electronics Engineering Research Institute (a CSIR lab) at Pilani and at the Centre for AI & Robotics (a DRDO lab) at Bangalore, I obtained my Ph.D. in 1998.

What are your fields of interest?

I am currently into Bio-inspired Robotics. This is all about bio-inspired AI paradigms acting as controllers for real robots. These paradigms try to cope with the mess that the earlier AI researchers thought was the best. I am also interested in developing software and hardware for the physically challenged and deploying them for free.

How did you develop an interest in them?

This is a long story. I’d rather cut it real short. It began with my teaching AI for the first time way back in 1987. I was using the same book which I later co-authored. But I found all the material in it to be a bit theoretical and wondered whether it could ever be realized. Finally I concluded that the challenge lies not in the software alone but in the hardware too. Well, that’s how robots, which form the ideal test-beds for intelligence, came within my purview of interest.


You have co-authored a book on ‘Artificial Intelligence’. Tell us something about this book.

Authoring a book was something I always dreamed of but never got around to. The McGraw Hill guys had been pursuing me right from the days when I was a faculty member at the University. It finally happened in 2009. The book is one of the oldest ones in AI and the current version happens to be its 3rd edition. The major revision can be seen in the last few chapters, which reflect more of the bio-inspired paradigms. The book also has its online learning centre pages on the web which, among other things, host code for some of the relevant programs.

You have recently done some work related to ‘Emotional Robots’. Could you explain what ‘Emotional Robots’ means?

Emotions have always been easy to express but hard to actually define and generate. Their causes and mechanisms in biological beings are yet to be uncovered. Emotional robots provide, to some extent, a means to remove the monotony which otherwise exists in their manner of working and presenting themselves. How would it be if a robot welcomes you with the same gusto as when you meet a long lost friend? Or imagine your car purring when you take it for a long and fast ride on those under-ideal-conditions-like roads and then exclaiming “Ouch!” when it goes over a bump, after which it tends to move cautiously. Well, all this renders a certain amount of human-like feeling, forging an imaginary bond between man and robot. Of course emotions are not embedded just for the heck of it. Emotional robots can in many ways act as constant companions and soothe the mentally and physically challenged, in some way or the other. Such robots have been known to have a profound effect on children who have suffered from psychological problems or trauma. Many researchers have in some way or the other tried their hand at making such emotional robots. At the moment this field is all set to take off in spite of several debates on the manner in which emotions are recognized, comprehended and generated. At the CSE Dept. a few of the students are presently working on recognition as well as generation of such emotions.

Please mention a few interesting projects that you have been associated with in the past and the present.

After I joined IITG, I was involved, along with other faculty members, in a project funded by the Ministry of Information and Communication Technology that aimed at setting up a Resource Centre for Indian Language Technologies. The centre aimed at developing technologies for languages of the North-East. The main objectives were to map these languages and provide relevant fonts, editors, multi-lingual dictionaries, language corpora, OCRs and speech and language translation systems.

Around the same time Microsoft Academic Alliance also provided some modest funding in dollars for two projects: one involving a network of robots and the other an intelligent desktop agent. Though these projects have been completed, a small group of students is still working under my guidance to enhance the former. Later, based on some wonderfully useful work on software aids for the visually challenged carried out by some of the CSE students, we received modest funding from the Government to develop and enhance such technologies. At the moment a group of three B.Tech. students is actively engaged in the design of an autonomous robotic guide for the visually challenged. Christened Nayan, the system, comprising both a hardware and a software component, will guide the person both indoors and outdoors. In 2005, I had the opportunity to mentor a great team of four B.Tech. students (Gautam Das, Rahul Singh, Suvesh Malhotra and Archit Gupta) who realized a project codenamed SapienNet, a very useful people-to-people network. The students made it to the finals of the Windows Embedded Systems Challenge held that year at the Microsoft Research campus in Redmond, USA.

Recently, the Robotics Lab at IITG has been enriched with state-of-the-art technology. What are the objectives of this lab?

The term state-of-the-art is a bit of an exaggeration! We have recently received funding to set up labs for Robotics and Embedded Systems from the Department of Science and Technology, under their FIST programme. This year we have procured the new generation Lego Mindstorms NXT robots and software such as Matlab and LPA Prolog. Just keep “all” your fingers crossed! If things go well we will soon see the lab being equipped with more robots such as e-pucks and Boe-Bots, complemented by sensor nodes. We will also be procuring speech corpora and related software, and robot simulators like Webots. The biggest problem at the moment is space. While moving around in the lab these robots will thus need to be deft at obstacle avoidance, not to mention acquire the ability to safeguard themselves from being stomped on! As for the objectives, we hope all this will attract more students to actually dirty their hands in the lab and come up with novel and useful ideas. The lab will of course aim at research in the area and provide the much needed practical test-bed for the algorithms being developed.

Do you agree that present students are more inclined to doing innovative work in software than hardware? What are the reasons behind this trend?

We seem to take pride in the fact that many countries view us as a massive human resource of software engineers or programmers. On one side this seems to be good, but on the other we seem to be writing all the goodies for the hardware on the cheap, while the others manufacture, embed and sell the hardware to us with software partly or fully written by us. This is analogous to putting in a lot of effort to publish your work in a (foreign) international journal and then subscribing to the same journal by paying hefty amounts. They seem to have the machinery (hardware) to make the hard copies while we supply the intelligent material (software) to be printed (loaded/embedded) free of cost! Both ways they seem to get the lion’s share, which is possibly why they are developed countries! Imagine if we stressed hardware a bit more and co-developed it with the software: where would the others be? We may then as well swap the term “developing country” with theirs, viz. “developed”. Of course this will take some investment and a paradigm shift away from our current mindset of earning a fast buck.

You spent a year at Hanbat National University, South Korea. How do you compare the academic environment there with that at IIT Guwahati?

The academic environment there is great. It’s more of practice than theory. So a proper blend seems missing, which of course is true for us too. I remember seeing a student who required a zinc battery of certain specifications. In a few days he showed me the same, all assembled, electrode by electrode, neatly packed and of course working to specifications! The same was the case with robots. Things are mostly custom-made and assembled, and there is usually no concept of buying and using ready-made stuff. This way one learns to be more practical while solving a problem and knows for sure whether what one thinks can actually be implemented. The University also runs workshops throughout the year educating school children on how to assemble and use robots and other things. This is an important service they do for their prospective students.

Master’s projects are many a time industry-related problems. The student carries out a project that is finally used in some way or the other by the concerned industry. One interesting thing is that most of the Dept. offices and labs are handled by ex-students. So virtually everyone is a technical hand and appreciates the problems we encounter in an administrative set-up. Everything is kept spick and span: while the Professors sweep and clean their respective rooms, the labs are cleaned and tidied by the students. Since everyone there has to undergo mandatory military, navy or air force training for 2-3 years, they are very disciplined and controlled in their behaviour.

Being a developed country, procuring things is also fast and efficient, with minimum red tape. The Government funding is liberal but not without results. If you get funds, then both the Professor and his students have to slog to justify the same. It is not just a matter of submitting a report and defending it at the end of the day. They need to show something that actually works and is of use to either the industry or the public. One cannot get away with leaving the application of the technology vague. Funding is given mostly for practical research, which is why the country is known for its hardware and economy. Language posed a big barrier. English is spoken only by a select few, which did make things difficult.

What are the things that one must consider before devoting his/her life to research?

Apart from ethics, one of the most important aspects is that one should have a passion for research. If you are doing research merely to acquire a degree, or to ensure someone gets a degree, or to earn more, then such research in my opinion is of no real use to anyone but oneself. Further, one needs to muster courage and build the enthusiasm to venture and tread on untrodden paths. Another dominating aspect is whether the research will have some impact on society at large.

Take the case of Prof. Norman Borlaug, Nobel Laureate, winner of the World Food Prize and creator of a life-saving wheat strain. But for his research, we may have seen the developing world (including India) ravaged by war and famine. Now this is what I consider real research. Wonder why all this reminds me of the Irish proverb: “You’ll never plough the field by turning it over in your mind.”

Any message that you would like to convey to the students through this magazine?

There are several good things that can be practiced with a wee bit of effort, and if all of us who make up this society practice them, we will find one another more amiable. A few such things that come to my mind at the moment are humility, patience, discipline and perseverance. I would also advise you all to read, comprehend and practice the message portrayed in some of the classic short stories written by Leo Tolstoy: How Much Land Does a Man Need?, Elias, What Men Live By, …. I used to send these over as a parting gift to the older batches. Of course, if you have read them, maybe it’s time to read them again!

Let me quote Gandhiji’s talisman which, as per him, could be used when you are in doubt: “Recall the face of the poorest and the weakest man you have ever seen and ask yourself if the step you contemplate is going to be of any use to him. Will he gain anything by it? Will it restore him to a control over his life and destiny? In other words will it lead to Swaraj of the hungry and the spiritually starving millions? Then you will find your doubts and your self melting away.”

[Interviewed By]
Anurag Kumar Nilesh, B. Tech. 4th Year CSE, for CSEA

Robotics Lab @ IITG

Recently, a Robotics Lab has been officially set up in our department. The mentors for this lab are Dr. S. B. Nair and Dr. P. K. Das. The website of the lab is http://www.iitg.ernet.in/sbnair/RoboticsLab/index.html

The objectives of this lab are to:
1) Design robots which can assist humankind
2) Put theoretical concepts into practice
3) Support innovative ideas

A group of three students participated in the Microsoft Imagine Cup 2010 in the Accessibility for Local Innovation Award category. The project idea, named Nayan, was to design a robot that can guide visually challenged persons in outdoor as well as indoor locations.

People who are interested in robotics, please join the Google group created for the Robotics Lab at IITG: http://groups.google.co.in/group/roboticslab-iitg


Digital Mind

Internet 3.0: A New View of Connectivity
By Neminath Hubballi, Research Scholar, Dept. of CSE

The history of networking dates back to ARPANET in 1969, a project of the United States Department of Defense. It is exactly from here that the world of computer networking sprang. It was Licklider (the head of the ARPANET team) who found an analogy between human interaction and computer interaction. The success of the first computer-to-computer interaction created enormous interest and drove research in communications for a prolonged period. In the process, a lot of communication architectures were implemented. Most of these architectures were proprietary: Digital’s DECnet, IBM’s SNA (Systems Network Architecture) and Novell’s NetWare are the prominent ones. Researchers then noticed the need for a common communication standard, so that nobody would depend on a single vendor for updated solutions and new features. This initiated the standardization of the overall communication architecture and its associated protocols. After a long debate and design process, the industry came up with a standard popularly known as the OSI reference model in 1983. Later another standard, the TCP/IP architecture, came into existence. It was the TCP/IP architecture that gained wide acceptance in the industry and is in use even today. This era, which emphasized the notion of connectivity and the networking of computers, is considered the first generation, or Internet 1.0.

The next generation of communications was centered on the actual creation of devices such as routers and switches, which work with these standardized protocols and made the actual connectivity of the World Wide Web possible. It was in 1993, when the Mosaic web browser was first released, that people connected to networks could do much more than they were doing before. This era, which emphasized device and software development, is generally known as Internet 2.0. In a sense it is Internet 2.0 that represents the real meaning of the Internet.

Because of the wide acceptance of the TCP/IP architecture and packet-based routing, the industry has spent a considerable amount of time fixing problems with the TCP/IP architecture and porting these protocols to new communication media such as wireless. Even today we live in the era of TCP/IP and are still fixing problems in it! Issues such as security, routing, scalability and the shortage of address space have been important subjects of research. Security components like firewalls, Intrusion Detection Systems and Intrusion Prevention Systems were developed to fix the security issues. Open Shortest Path First (OSPF) and the Border Gateway Protocol (BGP) were designed to address the scalability issues of the Internet. Link-state and distance-vector routing were developed to handle routing issues, and concepts like Network Address Translation (NAT), IPv6 and private addressing were devised to mitigate the global shortfall of IP addresses. In addition, quality of service (QoS) on the Internet was also of prime importance. The issue with the current Internet architecture is that none of the above problems is fixed permanently: the changing demands of users need additional features in the basic solutions, and those features introduce additional flaws into the systems.

The notion and idea of the current Internet is quite old (although the designers of ARPANET never thought so!), and over the years we have learnt a lot about networking and its associated risks. Now the industry understands what TCP/IP and packet-based switching technology can and cannot give. Only very recently has thinking begun on a new design of the Internet architecture, one that uses the best that packet switching can offer and adds the things that are essential in the new era. According to Prof. Raj Jain of Washington University (one of the pioneers in the thought process of Internet 3.0), it will be fundamentally different from its predecessors and will be designed as if the entire Internet were being built from zero. It meets the demands of commerce and allows governments and firms to enforce policy decisions and to control and track who is coming into their network and doing what. It permits users to control when, where and how they get what they actually want.

In a project called Global Environment for Network Innovations (GENI), two questions were raised: “Is this the way we would design the Internet if we were starting from scratch?” and “What are the requirements of today’s business?” These questions are perhaps valid, as the first generation of the Internet was designed by researchers for their research interests and it has now grown to the extent that it is part of everyday life. The requirements of the modern-day Internet can be enumerated as below:

1. Energy efficient communication: Current communication methods require both sender and receiver to be awake for communication to happen. In mobile communications, base stations have a limited capacity to store messages while the receiver is offline. This needs to be extended to other communication methods too.

2. Identity management: The point of connectivity and one’s identity become vital in communications. Currently if a system moves to a different location its IP address changes; hence there is a need to change the identification mechanism and get rid of dynamic IP addresses.

3. Location awareness: Finding the location where the receiver or sender is situated becomes vital from the security point of view. Either party can decide where they want to go and what they want to exchange if the location they are in is known. Hence the communicating end points need location awareness.

4. Support for explicit communication: With the distributed-service and client-server nature of communication, implicit communication is an unnecessary mess. Instead, clients need to be allowed to identify the nearest server and establish a communication channel with it.

5. Person to person communication: The network was designed for desktop to desktop communication, but today’s requirement is person to person communication. A person may be using any device, such as a desktop, cell phone or palmtop, and should be reachable on it. The network needs to identify the best way of reaching the person rather than the device. This can be achieved if addresses are given to human beings rather than to the devices they are using.

6. Security: This is the biggest concern of today’s Internet. The next generation Internet has to be secure, allowing the end parties to enforce rules on what is permitted and who is permitted. Governments need to protect their citizens’ data and exchanges the way they protect the nation with defense forces, enforcing policy decisions and maintaining the integrity of communication.

7. Separation of control and data planes: Currently the Internet uses a single channel for both control and data planes, and this is a significant security threat. For example, the TCP/IP connection setup and piggybacking mechanisms give away enough information to do malicious things to the ongoing communication. Hence connection management and data transmission need to be separated, in a similar way to cell phone networks.

8. Asymmetric protocols: The protocols used in the current Internet are designed for systems with identical capabilities. But with person to person communication enabled, one system may be significantly resource constrained compared to the other. The network must be able to adjust the communication when the devices are asymmetric.

9. QoS guarantees: In the present Internet, IP offers only unreliable, best-effort delivery, so ensuring QoS for end-user flows becomes difficult. The next generation Internet should address this fact and should be designed to achieve the desired quality of service.

None of the above requirements is easy to meet, and any solution that pops up needs to be debated and evaluated. Industry, academia and research groups have to collaborate on the design in much the same way as when TCP/IP was standardized years back.
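Requirements 2 and 5 (a stable identity that survives a change of address, and reaching the person rather than the device) can be pictured with a small lookup table. The sketch below is only an illustration of the idea and not a mechanism from any Internet 3.0 proposal: the class `IdentityRegistry`, its methods, the device names and the example addresses are all made up for this purpose.

```python
class IdentityRegistry:
    """Hypothetical table mapping a stable identity to its current locators."""

    def __init__(self):
        # identity -> {device name: current network address}
        self._locators = {}

    def attach(self, identity, device, address):
        """Record (or update) where one of the identity's devices currently is."""
        self._locators.setdefault(identity, {})[device] = address

    def detach(self, identity, device):
        """Forget a device that has gone offline."""
        self._locators.get(identity, {}).pop(device, None)

    def resolve(self, identity, preference):
        """Return (device, address) for the first preferred device that is
        currently attached, or None if the person is unreachable."""
        devices = self._locators.get(identity, {})
        for device in preference:
            if device in devices:
                return device, devices[device]
        return None


if __name__ == "__main__":
    registry = IdentityRegistry()
    registry.attach("alice", "desktop", "192.0.2.10")
    registry.attach("alice", "phone", "198.51.100.7")
    print(registry.resolve("alice", ["phone", "desktop"]))

    # The desktop moves to another network and the phone goes offline:
    # the identity "alice" stays the same, only the locator changes.
    registry.attach("alice", "desktop", "203.0.113.42")
    registry.detach("alice", "phone")
    print(registry.resolve("alice", ["phone", "desktop"]))
```

Senders address "alice" throughout; only the registry learns that her address changed, which is the separation of identity from location that the list asks for.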


Cover Story

Memcached

Siddharth Prakash Singh, 4th year B.Tech student from the Department of Computer Science and Engineering, tells us all about building modern scalable web applications using memcached.

Memcached has become a buzzword these days for deploying scalable web applications. Despite this, a surprisingly large number of people do not know “what exactly is memcached?”. Let’s learn it by dissecting the word memcached into cache and mem. A ‘cache’ is a component that stores data transparently to improve performance by serving frequently requested data faster. Readers will be aware of caches in processors (L1 cache and L2 cache), the browser cache, etc. Let’s take the browser cache as an example.

Whenever you visit a new website for the first time, the browser downloads the HTML page, JavaScript files, CSS files, media files etc. and saves them in the cache (on your PC). The next time you open the same website, it is loaded and displayed much more quickly, as the browser can serve the locally saved resources. Caches are almost always constrained by size. In our example, there is no limit to the number of distinct websites we visit, so the browser cannot save data for each of them. When the cache is full, a cache replacement algorithm is used to replace some files with new ones. The replacement algorithm tries to ensure that, after a replacement, the information the cache holds is the information most likely to be needed again.
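The article does not say which replacement algorithm a browser uses. One common choice is least-recently-used (LRU) eviction, which assumes that data touched recently is the most likely to be needed again. The following is a minimal sketch of that policy; the class name `LRUCache` and the example keys are assumptions made for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        """Return the cached value, or None on a miss."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # a hit makes the entry "recent" again
        return self._items[key]

    def put(self, key, value):
        """Store a value, evicting the least recently used entry if needed."""
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry


if __name__ == "__main__":
    browser_cache = LRUCache(capacity=2)
    browser_cache.put("site-a/index.html", "<html>A</html>")
    browser_cache.put("site-b/index.html", "<html>B</html>")
    browser_cache.get("site-a/index.html")  # A becomes the most recent entry
    browser_cache.put("site-c/index.html", "<html>C</html>")  # B is evicted
    print(browser_cache.get("site-b/index.html"))
```

With room for only two pages, fetching a third one pushes out whichever page was used longest ago, so the lookup for site B at the end is a miss.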


Now let us come to the next part of the word: “mem”. As you might have guessed, mem refers to memory. Hence, memcached is precisely the use of memory as a cache. But it is not just that. It is a distributed memory caching system. This means that if you have a memcached deployment, you don’t have to worry about which machine a data object is cached on. You just give the command “Get the object named foo” and memcached knows where to get foo from. The cache can span as many machines as you need.

Memcached is fast. It is a huge in-memory hash table which can be distributed among several machines. The time complexity of fetching a cached result is hence O(1), the order we love the most. It utilizes highly efficient, non-blocking networking libraries to ensure that memcached stays fast even under heavy load. Hence, in circumstances when your database might be failing under heavy load, memcached won’t be. In fact, memcached was designed to alleviate database load, which is the bottleneck and the main risk to scalability for the majority of high-load web apps.

How to use memcached?

Memcached by itself is a distributed memory caching server/daemon. It can be used in a variety of ways. It can be coupled with a database or it can be directly used as a caching layer between client and database. A couple of use cases are discussed below.

Memcached as a caching layer

Now that we have a basic understanding of what memcached is, let’s dive into its usage. From a layman’s perspective, here is the basic sequence of steps:

a. Every time you need to query a database for any read operation, check whether the particular data is stored in the cache. If the data is found in memcached, use it instead of querying the database for it.

b. If the information is not found in memcached, query the database. Once you get the result of the query, send it to the client, and don't forget to put it in the cache as well. Subsequent calls to fetch this information then don't need to touch the database at all.

c. If there is an update operation on any data, and that data is found in the cache, delete it from the cache. This keeps the cache consistent.

For all those who couldn't picture the execution of the above three steps, I have some PHP code. Suppose the data produced by the query

    SELECT * FROM hugetable WHERE timestamp > lastweek ORDER BY timestamp ASC LIMIT 50000;

is required every time somebody loads the homepage of a web app. For a highly loaded web app, this database query may make the app slow. Let's put the data in memcached.

    // Assumes $memcache is a connected Memcache instance
    // ($memcache = new Memcache(); followed by a connect call)
    // and $mysql_connection is an open MySQL connection.
    $huge_data_for_front_page = $memcache->get("huge_data_for_front_page");
    if ($huge_data_for_front_page === false) {
        // Cache miss: fall back to the database.
        $huge_data_for_front_page = array();
        $sql = "SELECT * FROM hugetable WHERE timestamp > lastweek ORDER BY timestamp ASC LIMIT 50000";
        $res = mysql_query($sql, $mysql_connection);
        while ($rec = mysql_fetch_assoc($res)) {
            $huge_data_for_front_page[] = $rec;
        }
        // Cache for 10 minutes (third argument is the flag, fourth the expiry in seconds).
        $memcache->set("huge_data_for_front_page", $huge_data_for_front_page, 0, 600);
    }
    // use $huge_data_for_front_page how you please

Isn't it simple and easy?

Memcached as a rate limiter

Back in January 2009, some high-profile celebrity Twitter accounts were hacked. The hacker ran a dictionary attack on a Twitter engineering team member, discovered his password, and then used Twitter's privileged admin tool to reset the passwords of the accounts they wanted to hack.

This incident led some developers (not Twitter developers) to rethink access rate limiting. Rate limiting could be implemented by incrementing a counter stored in a file or a database, but writing to disk is costly and difficult to scale.

Memcached comes to the rescue. The incr command atomically increments an already existing counter, simply by specifying its key. The add command can be used to create that counter; it fails without giving any error if the specified key already exists (so beware). Now let's say we want to limit a user to 10 hits every minute. A naive implementation could create a counter based on the user's IP address (this approach might not work in a proxy environment, where a large number of users are behind the same proxy server), with a key something like numberofhits_202.141.81.2_2010-04-10-12:37. Increment this counter for every hit from the IP address 202.141.81.2 and block the request if it exceeds 10. We can set the counter to expire automatically after one minute when creating it. This naive solution works without writing to disk. This is just an example; in the real world one needs to do more homework on this approach.

Choosing a key

You might have noticed that we retrieved the data from memcached using the key "huge_data_for_front_page" in the above example. The cache can be thought of as a big associative array in which each item is stored as a key-value pair, with the key being an arbitrary string. Therefore, to store and retrieve data in the cache you need to define a key. A key uniquely identifies data stored in the cache, and is used when storing, retrieving and removing data from the cache. Technically speaking, a key can be any arbitrary string. But you should define a pattern for naming keys, to avoid conflicts and to ease management. The key naming pattern is also important for security, as mentioned in the section on security concerns.

Data distribution among memcached hosts

Data distribution among memcached hosts is an important concern. Let's say we have n memcached hosts and we want to distribute the load among all n of them. The most common way to do this is to use the mod operator. For simplicity, let us assume for the moment that we are saving data in memcached using numerical keys. For data d1 the key is an integer k1. We save the data for the key k1 on the server with id k1 mod n. This is a nice way to distribute the load among all n servers. But the problem arises when you want to add or remove a memcached host. Say the number of users of your app has increased dramatically and you need to add more caching machines to increase the number of cache hits. Or, on the contrary, say one of the caching machines has crashed and you temporarily need to remove it from the list of memcached hosts. In both cases n changes.

What's the big deal when n changes? When n changes to n', almost every key now maps to a different server id, k mod n'. This can be devastating. It is as if the whole cache has suddenly disappeared: almost 100% cache misses from adding or removing a single caching machine! How does memcached work then?
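To make the cost of changing n concrete, here is a minimal sketch (not taken from any memcached client) that counts how many integer keys land on a different server when the server count changes under "key mod n" placement. The function name movedFraction and the key range are assumptions made for illustration only.

```javascript
// Fraction of integer keys 0..keyCount-1 whose server id changes when the
// number of servers goes from n to m under "key mod n" placement.
function movedFraction(keyCount, n, m) {
  let moved = 0;
  for (let k = 0; k < keyCount; k++) {
    if (k % n !== k % m) moved++;
  }
  return moved / keyCount;
}

// Growing from 4 to 5 servers relocates four keys out of every five.
console.log(movedFraction(10000, 4, 5)); // 0.8
```

The more servers there are, the closer this fraction gets to 1, which is why adding a single machine feels like losing the whole cache.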
The solution is consistent hashing! The basic idea behind the consistent hashing algorithm is to hash both the key and the server id using the same hash function. The following example illustrates this.

The hash function maps keys and server ids to a number range. Imagine mapping this range onto a circle so that the values wrap around. Here's a picture of the circle with a number of keys (1, 2, 3, 4) and server ids (A, B, C) marked at the points they hash to (based on a diagram from "Web Caching with Consistent Hashing" by David Karger et al.):

To find which server a cache object with key k goes in, we move clockwise round the circle until we find a caching server. So in the diagram above, we see that objects 1 and 4 belong in cache A, object 2 belongs in cache B and object 3 belongs in cache C. Consider what happens if cache C is removed: object 3 now belongs in cache A, and all the other object mappings are unchanged.
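The clockwise lookup can be sketched in a few lines of JavaScript. This is an illustrative toy, not the code of any real memcached client: the HashRing class, its method names, and the choice of the FNV-1a hash are all assumptions, and production clients usually also place each server at many points on the ring.

```javascript
// FNV-1a, a small 32-bit string hash, used here only for illustration
// (it treats each character code as one input unit, fine for ASCII keys).
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

class HashRing {
  constructor(servers) {
    this.points = []; // server positions on the circle, kept sorted
    for (const s of servers) this.add(s);
  }
  add(server) {
    this.points.push({ pos: fnv1a(server), server });
    this.points.sort((a, b) => a.pos - b.pos);
  }
  remove(server) {
    this.points = this.points.filter((p) => p.server !== server);
  }
  // Move clockwise from the key's position to the first server point,
  // wrapping around to the start of the circle if none lies ahead.
  lookup(key) {
    if (this.points.length === 0) throw new Error("no servers on the ring");
    const pos = fnv1a(key);
    for (const p of this.points) {
      if (p.pos >= pos) return p.server;
    }
    return this.points[0].server;
  }
}

// Remove server C and count how many keys change owner.
const ring = new HashRing(["A", "B", "C"]);
const keys = Array.from({ length: 1000 }, (_, i) => "key" + i);
const before = keys.map((k) => ring.lookup(k));
ring.remove("C");
const after = keys.map((k) => ring.lookup(k));
const moved = keys.filter((_, i) => before[i] !== after[i]).length;
const wasOnC = before.filter((s) => s === "C").length;
console.log(moved === wasOnC); // true: only keys that lived on C moved
```

Keys that were not on the removed server keep their owner, so the cache misses are limited to the share of keys that server held.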

If another cache D is then added in the position marked, it will take objects 3 and 4, leaving only object 1 belonging to A. This scheme works pretty well. Whenever a caching server is added or removed, very few cache misses occur.

Security concerns

Memcached access is not authenticated by username/password, so there is no native access control for memcached. A few simple things can be done to secure your instance of memcached:
• Prevent external access: Deploy memcached behind a firewall and allow only machines from within a specific network to access the cache.
• Choose obscure keys: There is no straightforward way for a user to query memcached for the list of keys. Hence, only somebody who knows the key can access the data. So obvious keys are vulnerable. Add obscurity to the key by adding some number to it, like "foobar:12321". Or, better, apply some hash function to the key.

Some other issues

• By default, memcached can store a data item of up to 1 MB only.
• A memcached key can have a length of up to 250 bytes.

Client Libraries

The client/server interface to memcached is simple and lightweight. Client libraries are now available in almost all major programming languages. For a list of available libraries take a look at this page: http://code.google.com/p/memcached/wiki/Clients

Call for Articles

Since its inception, the CSEA has conducted many workshops, lectures and programming contests to increase awareness of new technologies and the science behind them. Linked List is yet another attempt to get closer to fellow IITians. Linked List needs articles that maintain the originality and the quality of the magazine. Send in your articles to csea@iitg.ernet.in or contact the editorial team for any clarifications. Looking forward to an overwhelming response.

One can measure the popularity of memcached by taking a look at its users: Facebook, Wikipedia, Flickr, Twitter, YouTube, Digg, WordPress, LiveJournal, FarmVille, Amazon.com, and the list goes on. In this section, we will look at some of the use cases of memcached.

For more information on memcached check out: http://www.memcached.org

[Written by] Siddharth Prakash Singh (SPS), 4th Year, B.Tech., Dept. of Computer Science & Engineering. www.spsneo.com/blog



Good Times


Internship Do's & Don'ts
Anurag Kumar Nilesh and Ashish Thakur, 4th Year B. Tech. CSE

"Internship": where did that word come from, and what does it really mean? Sometimes dissecting a word helps, but this one didn't. Try dissecting the word to get some meaningful words out of it which could possibly explain its meaning. One possible combination is Intern + Ship, and another is Interns + Hip. Since the second one is creepy enough to get much of this article censored, we had better stick to the first and try to decipher something meaningful out of its components, Intern and Ship. An intern is someone who works in a temporary position with an emphasis on on-the-job training rather than merely employment. And a ship, well, everyone knows what a ship is. So if we join these two words and try to get something meaningful out of them, we end up confusing ourselves. If we go with the second combination, it does end up meaning something.

Whatever the meaning might be, according to us (your seniors) it is a time when you learn a lot of things, like working on a research topic and team work, and yes, it is the part where you have a hell of a lot of fun. These few months are among the most memorable times of your life. To make them memorable, you have got to be aware of some do's and don'ts of internship. So here we go…

What to DO: 8 Tips for Internship
(By Anurag Nilesh)

I am sure that by now you have started counting down the days left before your internship begins. Some of you may have already started making travel plans, such as visiting the Eiffel Tower, Disneyland, etc. Some of you may be looking forward to it as an opportunity to get thrilled by adventure sports such as mountain biking, paragliding, bungee jumping, skydiving and so on. I hope that your plans work out and you all have a memorable time during the internship period. However, this article is not here to suggest tourist spots or adventure sports but to offer a few practical tips.

Some of you may be wondering about the title: why 8 and not 10? Has the author run out of tips to make it a perfect ten? Well, I can give you three reasons for this peculiar title. One, choosing this peculiar title gives me a chance to lengthen this article by trying to explain the title, without brainstorming for two more tips to make it a perfect ten. Two, since I am the author of this article, I hope you will agree with me that I am entitled to create any title.

1. Learn cooking. If you know cooking, it will stand you in good stead. It will help you keep a check on your food budget and hence leave more money in the travel budget. If you don't know cooking very well, try to learn how to make rice, omelette, aloo-ki-sabji etc. at home, or learn it there itself from internet sites or from your friends. You don't need to be an expert in cooking. Believe me, you will even love your own half-cooked meals.

2. Plan your trips. You should try to plan your trips so that you make maximum use of weekends. Look out for holidays which extend your weekend, and try to use such weekends for long-distance trips, such as trips to neighbouring countries/states.

3. Money matters. I suggest that you carry some cash beyond your planned budget while making trips. Carrying an international credit/debit card may come in handy.

4. Experience foreign culture. You will get a chance to interact with people of different origins and cultures, and a chance to visit historical and famous places. Don't miss this opportunity to learn about their work culture and social culture.

5. Maintain a travel journal. I suggest that you maintain either a travel blog or a personal travel journal. Believe me, documenting your travel experiences will not be a waste of time.

6. Remember the purpose of your visit. Don't forget the purpose of your internship. Often, people consider travelling/adventure to be the purpose of their internship, but that's not true. The purpose of your internship is to work on some project. It's a chance to showcase your talent and prove that you truly deserved this internship offer.

7. For those in India. If you are doing your internship in India, don't get disappointed. I suggest that you start writing a blog and document in it something like how delicious the food was last evening or what a great game of foosball you had last weekend. Such descriptions will make your counterparts surviving on their half-cooked meals envious.

8. Play it safe. Finally, a piece of advice to my virile tigers: play it safe.

What NOT to do during your Internship!
(By Ashish Thakur)

Read "8 Tips for Internship" by Anurag and put a (!) operator in front; if you think you are done, well, I guess you are wrong. So in case you want to know what NOT to do during your internship, read on...

• Never lose the keys of your apartment: If you lose your key at IITG, all you have to do is break the lock and get a new one, and it costs around 50 INR. But in case you lose your key in Europe and your landlord has only one key to that lock, fellas, you are so screwed. It costs around 40-50 euros to break the lock, and that is a hell of a lot of money! So be careful with the keys of your apartment.

• Never travel without tickets: Three guys of the CSE Dept had to pay around 40 euros each to the cops because they were travelling without a ticket on the bus. So whenever you use public transport, make sure to buy a ticket; if you don't, you will end up paying a hefty fine.

• Don't be an Indian: Don't take me wrong over here. We Indians litter, spit, talk and laugh very loud, never follow other necessary etiquette, and ogle whenever we see some hot chick. But remember the saying "When in Rome, do as the Romans do". I am not asking you to be on your best behaviour, but at least try to follow some basic etiquette during your internship.

• Don't get fooled by conmen: You see some hot chick trying to provoke visual contact with you and smiling at you. Remember the case of Saif Ali Khan in the movie Dil Chahta Hai… you might get trapped in that beautiful smile and end up losing all of your stuff.



Geek Corner


AJAX

Pitfalls & Solutions
Shirish Surti, 2nd Year, M.Tech. CSE

Ajax (Asynchronous JavaScript And XML) is a group of web development techniques used to create interactive web applications. Clients can retrieve data asynchronously using Ajax, which allows parts of a web page to be refreshed and updated. Ever since Google made successful use of Ajax in Gmail and Google Maps, Ajax has gained considerable recognition from web developers. This article explains a few pitfalls which you could face while developing your Ajax-based website and gives solutions to a few of the problems.

A very tempting use of Ajax would be to design a website with navigation menus which use Ajax to load tab contents into an HTML div, as shown in Figure 1. When the user clicks on a link in the navigation menu, the page content for the content div is fetched from the server asynchronously. Often, familiar GIF images are used for animation, like a rotating pair of arrows or dots or some other fancy image indicating to the user that the div contents are being loaded. Let us study what obstacles such a design could create for a website.

1. Bookmarking
A user visiting your website is often interested in only a particular section, the URL of which he/she bookmarks to revisit in future. Using Ajax as shown in Figure 1 causes the browser URL to remain the same. So if a user clicks on "link1" the browser URL points to http://your-website-link.com, and when the user clicks on "link2" the browser still points to http://your-website-link.com.

2. Javascript Ads
A good source of revenue from the content on your website is content-based contextual ads. Google AdSense dominates the ad market. However, to your surprise you discover that Google ads don't show up when you add them to the contents of the content div shown above. To understand why the ads don't show up, we need to understand how Google ads, or any other Javascript ads, work on a webpage. To put ads on your webpage you sign up with Google AdSense. Google provides you with a Javascript code snippet which runs when your page loads, as shown in Figure 2. When the tab content is fetched from the server as part of the content div in Figure 1, Ajax sets

    contentDiv.innerHTML = fetchedHtml

This statement does change the content of the content div, but the Javascript snippets in the fetched HTML are not evaluated by the browser. As a result, step 1 in Figure 2 fails. The AdSense server uses keywords gathered from your page to generate ads in step 3 of Figure 2. The Google AdSense bot visits different pages on your website to index pages and gather keywords. If Ajax is used, the AdSense bot can gather keywords only from the index.html page at http://your-website-link.com. Thus even if you do manage to evaluate the Javascript in the content div's HTML code, the ads displayed in the content div will be irrelevant to its context.

3. Search Engine Indexing
An important metric for judging the popularity of a website is its Google page rank. The Google bot visits your web pages and ranks your website based on the content in your web pages. If your website is developed using Ajax as shown in Figure 1, every time the Google bot visits your website only the page contents of index.html are returned. This does not allow all the pages of your website to be indexed, leaving your website with a poor page rank.

4. Non-Javascript support
If a user with a text browser like lynx, which does not support Javascript, visits your website, it is not possible for him/her to visit pages other than index.html.

In order to avoid problem 1, it is necessary to ensure that your navigation menu is not completely ajaxified. A few parts of the content could be loaded using Ajax, but the entire navigation menu should not be using Ajax. A few people have proposed workarounds for problem 2, like the one at http://www.jguru.com/forums/view.jsp?EID=1305379, but it violates Google AdSense program policies. Right now there is no support from Google for websites which use Ajax for navigation like the one in Figure 1.

The good news is that problems 3 and 4 do have a solution. The solution is to provide both href links (for non-Javascript users) and an onClick handler that makes the Ajax call. Consider the snippet below:

    <a href="link1.html" onClick="ajaxCall('link1'); return false;">link1</a>

Now when a user with a browser not supporting Javascript visits the page and clicks on link1, the page at link1.html opens. If Javascript is supported by the browser, the onClick handler executes, fetching the data using Ajax. The hrefs direct the Google search engine bots to the linked web pages, causing them to be indexed as well. A very good approach which solves problems 3 and 4 is Hijax, which treats Ajax as an enhancement.

Hijax
The Hijax approach is a very simple idea:
1. First, build an old-fashioned website that uses hyperlinks and forms to pass information to the server. The server returns whole new pages with each request.
2. Now, use JavaScript to intercept those links and form submissions and pass the information via XMLHttpRequest instead. You can then select which parts of the page need to be updated instead of updating the whole page.

Hijax example:

    window.onload = doPopups;

    function doPopups() {
      if (document.getElementsByTagName) {
        var links = document.getElementsByTagName("a");
        for (var i = 0; i < links.length; i++) {
          if (links[i].className.match("help")) {
            links[i].onclick = function() {
              // this can be replaced by an Ajax call
              window.open(this.getAttribute("href"));
              return false;
            };
          }
        }
      }
    }

    <a href="help.html" class="help">contextual help</a>

For more information on Hijax please visit http://domscripting.com/presentations/xtech2006/
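The comment in the Hijax example says the window.open call can be replaced by an Ajax call. A minimal sketch of that replacement follows. The names hijaxLinks and loadIntoContentDiv and the element id "content" are assumptions for illustration, and the sketch assumes the server can return a page fragment for each href.

```javascript
// Attach Ajax behaviour to ordinary links. Each href still works when
// JavaScript is unavailable; with JavaScript the click is intercepted.
function hijaxLinks(links, loadFragment) {
  for (var i = 0; i < links.length; i++) {
    links[i].onclick = function () {
      loadFragment(this.getAttribute("href"));
      return false; // cancel the normal page navigation
    };
  }
}

// Browser-only helper (hypothetical): fetch a URL with XMLHttpRequest and
// place the response in the element whose id is "content".
function loadIntoContentDiv(url) {
  var xhr = new XMLHttpRequest();
  xhr.open("GET", url, true);
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
      document.getElementById("content").innerHTML = xhr.responseText;
    }
  };
  xhr.send(null);
}

// In a browser, after the page has loaded:
// hijaxLinks(document.getElementsByTagName("a"), loadIntoContentDiv);
```

Passing the loader in as a parameter keeps the link-hijacking logic separate from the transport, so the same handler can be wired to any fetching routine.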



Digital Mind


CSP for the Verification of Security Protocols
Niteesh Kumar, 2nd Year, B.Tech. CSE

CSP is an abstract language designed specifically for describing the communication patterns of concurrent system components that interact through message passing. It is underpinned by a theory which supports analysis of systems described in CSP. It is therefore well suited to the description and analysis of network protocols. Protocols can be described within CSP, as can the relevant aspects of the network. Their interactions can be investigated, and certain aspects of their behaviour can be verified through use of the theory.

Formalisms based on Hoare's Communicating Sequential Processes (CSP) and Milner's Calculus of Communicating Systems (CCS) for verifying protocols are currently being used by the International Standards Organisation (ISO). However, these models need to be extended if protocol performance specification and verification is to be done, as neither model has timing information (other than sequencing) or a way of specifying controlled loss of information. A CSP description of a protocol has precisely defined semantics, so whether the protocol meets a given property is a precise mathematical question.

One of the strengths of CSP is the ease with which specialised theories can be constructed on top of the semantic model. This allows particular specification statements to be defined in terms of the standard semantics, and new proof rules appropriate to these specifications to be provided. This approach is taken where we specify and reason about authentication properties and also about an agent's inability to generate particular messages. Although standard proof rules would support the verification, since they are sound and complete, it is preferable to develop a specialised theory, since it provides an appropriate level of abstraction for supporting the kind of reasoning we require. The authentication property we consider states that if some events R in the system are restricted, then other events T should not occur. We establish this by defining a suitable rank function on messages, which shows that only messages above a particular rank can circulate in the restricted system and hence that messages from T are not possible.

A network provides a means for users, such as people or application programs, to communicate by sending and receiving messages. This situation may be modelled at a high level of abstraction in CSP as a process NET which provides each user with two ways of interacting with it: sending messages to other parties and receiving messages from other parties.

There are two views from which security properties can be considered. One is the viewpoint of the users of the network, who do not know which other parties are to be trusted. Properties expressed from this viewpoint will generally include assumptions, implicit or explicit, that a user's communication partner will not act contrary to the aims of the protocol, for example that any shared secrets should not be disclosed to third parties. The other is a high-level, God's-eye view, which identifies those nodes which follow their protocols faithfully and also identifies those which are engaging in more general activity, perhaps in attempting to attack a protocol. If this view is taken, then care should be taken to ensure that this privileged information is not accidentally used in the protocol description. The responses of a node should not depend on information which is available only at the high-level view. In some circumstances a node may not have knowledge concerning its communication partner; in other cases, a protocol may be invoked only when communicating with particular known and trusted users. How this knowledge and trust is obtained is outside the scope of this article.

Security protocols are designed to provide properties such as authentication, key exchange, key distribution, non-repudiation, proof of origin, integrity, confidentiality and anonymity for users who wish to exchange messages over a medium over which they have little control. These properties are often difficult to characterize formally, or even informally. The protocols themselves often contain a great deal of combinatorial complexity, making their verification extremely difficult and prone to error. Process algebra can provide a single framework both for modelling protocols and for capturing security properties, facilitating verification and debugging.

Security properties such as confidentiality and authenticity may be considered in terms of the flow of messages within a network. The use of a process algebra such as Communicating Sequential Processes (CSP) seems appropriate to describe and analyze them. Security properties can be described as CSP specifications, security mechanisms can be captured, and particular protocols designed to provide these properties can be analyzed, all within the CSP framework. It has been argued that security properties should be considered as properties concerning the flow of messages within a network, to the extent that this characterization is justified.

For analysis purposes, we will consider the system from the God's-eye view. Confidentiality will be captured as a specification requiring that any message output to a user must actually have been sent to that user. We restrict attention to the message set M, those messages which are intended to remain confidential. We also assume that they cannot be generated by users, which would be true, for example, of signed messages, though this is a simplifying assumption that is not justified in all circumstances. Other messages, such as encrypted messages or control messages, will in general be available to eavesdroppers, but confidentiality is not concerned with protecting these messages.

Security properties are generally properties requiring that something bad should not occur, though they are not exclusively of this form. These tend to be considered as safety properties. But there is a distinction to be drawn between the security requirements implemented by such a protocol and its liveness requirements, which are important for communication but which are generally independent of security. It is possible that some security properties can be expressed only as liveness properties; for the safety properties considered here, the traces model for CSP is adequate for our present needs.



Nostalgia


Abhishek Gupta

Abhishek Gupta (http://www.linkedin.com/in/abhishek85gupta) is a CSE alumnus from the batch of 2004-2008. He has recently completed his MS in CS at Stanford with a specialization in Artificial Intelligence (AI). He worked as an intern with Apple Inc in 2009 and is currently working as a Software Engineer with the Search Team at LinkedIn. He shares with us some really bright insights and suggestions for the academic system at IIT Guwahati. Nitin Dua and Siddharth Prakash Singh interview him for CSEA.

Tell us about your life as a student at IITG? Any specific moments which you will cherish for life?

I am a typical CS student. I used to love the Physics and Math classes in the first year. In the second year we were given a computer in a lab with AC! From that point onwards, I spent almost all my time in the CS lab. I used to really enjoy the programming assignments. Not only was the CS lab a great place to work, it was also a great place to chit-chat with my fellow classmates. I cherish the late-night coding sessions for the assignments, the 4-Bit CPU all-nighters and the beautiful PINTOS. I also used to enjoy my discussions about life and IITG with Singh, Nangia, Aggarwala and Aditya Raj. IITG was a great learning experience for me, not just professionally but personally as well. I cannot thank IITG enough for all that it has given me.

How would you compare student life in India to that in the US? What additions and changes would you like to see in the IITG education system?

This is a great question. I have been thinking about this for a long time now. I feel the student life is pretty much the same. But the amount of learning that you do per unit of time spent is much higher in the US. One of the reasons is that every course devotes a lot of resources in terms of Teaching Assistants' (TAs') office hours, starter code for assignments, and discussion sections led by class TAs. This greatly smooths the initial learning curve and gives the students confidence and encouragement that there is a team of professionals there only to help them understand the subject better. Finally, having starter code helps you learn the core ideas of the course without having to deal with other orthogonal issues. All this results in a lower barrier to entry for exploring something new. To put things in perspective, almost all the courses are like the Operating Systems (PINTOS) course at IITG but with much better TA support!

Another important reason is that over here one has more options in terms of courses. As a result, one ends up doing something that one really likes, and hence the student has much higher motivation to learn. Furthermore, learning more ideas per unit of time spent encourages students to explore more. This in turn increases their breadth of knowledge and helps them make a more objective decision as to what it is that they do or do not like.

Even at places like the IITs, a large fraction of students say that they don't like their courses and that they would have been better off doing something else. There is nothing wrong with this statement per se. But the truth is that a vast majority of these students give up at the start itself because of the initial (un)smooth learning curve. Had the initial curve been smoother, the students would not have given up and would have actually learned the subject. Having learnt the subject, students might have been in a better position to make an objective and informed assessment of their interests.

The following changes might be helpful:

a. Problem: It's hard for students to figure out interesting things in CS, especially the interesting things happening in IITG itself.
Solution: Every semester there should be a seminar course (OPTIONAL TO ATTEND) where every Professor gives a 30-minute talk on what research he has been up to, what the interesting areas related to his field of study are and why they are exciting, and which courses might be relevant for students if they want to work with him. This would help 1st year, 2nd year and 3rd year students alike in gaining a holistic understanding of the available opportunities at IITG and would help them plan their degree.

b. Problem: Most of the coding assignments are graded by demoing them to the TA, and there is no plagiarism detection. This unfortunately acts as an incentive for students to copy code from others. This hampers the student's understanding significantly, because in CS a large part of the learning happens when you actually sit down and code.
Solution: Coding assignments should be checked by code and not by manually demoing them to TAs. Code plagiarism should be detected by MOSS. Students should be told beforehand of the potential consequences of copying stuff.

c. Problem: IITG lacks role models. This results in lower motivation and lower self-confidence amongst the students. Most students are clueless about what they want to do in their lives. A sneak peek at what their seniors/alumni are up to might be helpful.
Solution: Increase the visibility of what alumni/seniors are up to within the current batch of students by linking alumni's web pages from the IITG CSE students' homepage. Time and again, interview a few alumni and 3rd/4th year students about what they have been up to in the past 6 months or so and post it on a Google Group.

d. Problem: It is hard for people to gauge the end-goal of courses. By the time people realise it, it's already the end of the semester. Having more information about why the material covered in the course is useful, what is interesting in the course, what the pain-points are, etc. would be really helpful. In summary, there is no easy way to transfer the wisdom of seniors to juniors.
Solution: It would be nice to have a webpage for every course where students who took the course last year can post their views on it.

e. Problem: Bad learning curve for students and non-standardized assignments.
Solution: Collaborate with universities like Stanford and MIT. Use their assignments and lecture notes, if possible. This would help provide students with a more state-of-the-art learning experience, e.g. PINTOS!

Can you briefly discuss your present works and your future plans?

I started my MS in CS at Stanford University immediately after I graduated from IITG. I recently graduated from Stanford with a specialization in Artificial Intelligence (AI). I am currently working as a Software Engineer with the Search Team at LinkedIn. My goal for this year is to build a Recommendation Engine for LinkedIn to recommend jobs, people and news. Eventually I intend to start something of my own. I am still figuring out the remaining details!

What has been the motivation for your strong academic and research orientation?

I am in love with the idea that in CS, one person can fundamentally change how people live their lives! Furthermore, as I took more CS courses at IITG, my belief in this hypothesis and my liking for CS only grew stronger.

Few words of advice for your junior batches?

The goal of college is to help you figure out what you want from your life and what you like. I would like to quote Steve Jobs here: "Your work is going to fill a large part of your life, and the only way to be truly satisfied is to do what you believe is great work. And the only way to do great work is to love what you do. If you haven't found it yet, keep looking. Don't settle. Your time is limited, so don't waste it living someone else's life. Don't let the noise of others' opinions drown out your own inner voice."

To this effect, college is a great place to go out of your comfort zone and explore stuff. The only way to find what you truly love is by having an open mind and exploring what seems interesting to you with passion and perseverance. Ultimately, this is the only way you can make an informed and objective decision about your life based on what you like, and not based on what others like. Don't be afraid of failures. As a guiding principle, always think of the day when you will graduate. Do you really want regrets and 'WHAT IF?'s after you graduate? Try to find like-minded colleagues and make discussion groups for things that you like. It is great to have smart people around you, people with whom you can debate various guiding principles of life and design choices in CS. Leverage their grey matter and try to learn together with them. Finally, remember that IITG is just the beginning.


Computer Science and Engineering Association, IIT Guwahati


TidBits


You@YourDomain with Google Mail
Karthik R, 2nd Year, M.Tech. CSE

Ever wondered about checking your own domain’s email through a GMail-like interface? Karthik R, 2nd Year M.Tech. CSE guides us through a step-by-step tutorial for setting up mail for your domain using Google Mail. This system is already in use in various institutions like IT-BHU and VIT.

Google Mail is software offered as a service to support mail for any domain, so that one can access mail for their own site through the Google Mail interface. This is achieved by directing all mail destined for our domain to Google’s mail exchange server. To do this, one must set the domain's mail exchange server to Google's. All mail destined for our domain is then sent to Google. One can use the Google Mail application to access it, just like accessing any mail from Gmail.com. In fact, Gmail is also implemented using the same model.

Follow the steps given below to set up Google Mail for your domain:

Step 1: Set up the Mail Exchange (MX) records to ASPMX.L.GOOGLE.COM
Step 1.1: Login to CPanel of your domain
Step 1.2: Open MX Records section under ‘Mail’
Step 1.3: Select your domain and set the value of MX to ASPMX.L.GOOGLE.COM

Step 2: Create Google App Engine Account
Step 2.1: Visit http://code.google.com/intl/en/appengine/ and create an App Engine account

Step 3: Add Domain name to Google App Account
Step 3.1: Visit google.com/a/ and select Standard Edition
Step 3.2: Use ‘Get Started’ section and add your domain name as Administrator
Step 3.3: Specify Account Details for your account
Step 3.4: Create an Administrator account for your domain

Step 4: Confirm ownership of domain
Step 4.1: Select Upload a HTML file mode to confirm ownership of domain
Step 4.2: Create a file with given name and contents and upload it to root directory
Step 4.3: Now use the Confirm ownership link and finish the procedure

Step 5: Accessing mail
Step 5.1: Now, use the link mail.google.com/a/domainname to check mail. After ownership is confirmed you can add Chat, Contacts, Calendar, Documents, Sites and Mobile to your domain by selecting them from the Google Apps Dashboard.
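
Once Step 1 is done, it can be handy to double-check that the MX record really points at Google. The sketch below is only an illustration and is not part of the tutorial: it assumes you have copied the MX lines of your domain into a string in the usual zone-file layout (name, TTL, IN, MX, priority, host). The function names and the example.com records are hypothetical.

```python
def parse_mx_records(zone_text):
    """Parse zone-file style MX lines into a sorted list of (priority, host)."""
    records = []
    for line in zone_text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop zone-file comments
        fields = line.split()
        if "MX" not in fields:
            continue
        idx = fields.index("MX")
        if idx + 2 >= len(fields):
            continue  # malformed line: priority or host is missing
        priority = int(fields[idx + 1])
        host = fields[idx + 2].rstrip(".").lower()
        records.append((priority, host))
    return sorted(records)


def primary_mx_is_google(zone_text, expected="aspmx.l.google.com"):
    """True if the lowest-numbered (most preferred) MX record is the expected host."""
    records = parse_mx_records(zone_text)
    return bool(records) and records[0][1] == expected


# Hypothetical records for an example domain.
zone = """
example.com. 3600 IN MX 10 ASPMX.L.GOOGLE.COM.
example.com. 3600 IN MX 20 mail.example.com. ; old server kept as backup
"""
print(primary_mx_is_google(zone))
```

Mail servers try the MX record with the lowest priority number first, which is why only the first entry of the sorted list is compared.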


4th Year Special


With the end of the BTP season approaching fast, let us have a look at a few of the BTP topics on which our beloved 4th Yearites have been working!

Nitin Kumar Gupta : PIVO - Improving Web Browser History Tools

Most modern web browsers provide history tools that allow people to select and revisit pages that they have viewed before. However, these tools tend to take ad-hoc approaches that do not appear to take advantage of past research. You will be surprised to know that in a recent study as many as 41% of the participants were unaware of a history list available in their web browser. How many times have you tried to search for something in your browser history and then settled for a Google search? Isn’t it sometimes frustrating when you are looking for something which you visited days before, and now you cannot find it in your browser history (although it is there, you are just unable to recognize it!)? Our goal, in this project, is to develop a web browser history tool which will provide support for recurrent behavior and query reformulation, and will provide visual cues in order to facilitate recognition. We apply association data mining to find the related pages in the history for a webpage and present the results in an easily browsable ‘hubs and spokes’ architecture providing annotated trails that users can follow to reach some desired page in the history. We also index web pages from history and past search queries for major search engines (Google, Yahoo and Bing) to facilitate local searches and better results.
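
To give a flavour of what association mining over a browsing history could look like, here is a minimal sketch. It is not the PIVO implementation: it assumes the history has already been split into sessions (lists of URLs), and it ranks pages by the confidence of the rule "page visited, so other page visited in the same session". The function name, the threshold and the sample data are all made up for illustration.

```python
from collections import Counter
from itertools import combinations


def related_pages(sessions, page, min_confidence=0.5):
    """Return (other_page, confidence) pairs for pages that co-occur with `page`."""
    page_count = Counter()  # number of sessions containing each page
    pair_count = Counter()  # number of sessions containing each pair of pages
    for session in sessions:
        unique = set(session)
        page_count.update(unique)
        for a, b in combinations(sorted(unique), 2):
            pair_count[(a, b)] += 1

    if page_count[page] == 0:
        return []

    results = []
    for other in page_count:
        if other == page:
            continue
        key = tuple(sorted((page, other)))
        # confidence(page -> other) = sessions with both / sessions with page
        confidence = pair_count[key] / page_count[page]
        if confidence >= min_confidence:
            results.append((other, confidence))
    results.sort(key=lambda item: (-item[1], item[0]))
    return results


# Hypothetical history: four sessions over four pages.
sessions = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["c", "d"]]
print(related_pages(sessions, "a"))
```

In a 'hubs and spokes' view, the queried page would be the hub and the returned pages its spokes, strongest association first.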

Siddharth Prakash Singh : Byzantine Fault Tolerant, Scalable Database System Architecture

Database management systems are now turning into sophisticated, complex software having millions of lines of code. These software systems are built to reliably implement the ACID (Atomicity, Consistency, Isolation, Durability) semantics while achieving high transactional throughput. With increasing complexity of the software, bugs become inevitable, in spite of the best efforts put in by the vendors and developers. Bugs in the software system can lead to faults which may immediately crash the system. Database systems are designed to recover from these crash faults by using the write-ahead log. Crash faults can lead to downtime during recovery, which has earlier been taken care of by using replicated systems. However, bugs may also cause another class of faults - Byzantine faults. These are arbitrary faults which can lead to incorrect execution of a query, and hence to returning wrong results to the client or inserting wrong data in the database. In fact, even if a bug eventually led to a crash, the system might have exhibited Byzantine behavior, producing erroneous results before crashing. Byzantine faults are hard to detect and hence even harder to prevent. They can be tolerated by replication using the solution to the famous Byzantine Generals' agreement problem, but this solution leads to a very low-performance system. Prevalent database systems are not capable of tolerating Byzantine faults. In this project, I have proposed a middleware-based replicated Byzantine fault tolerant architecture for database systems without losing much on performance. Normally, for any application, the number of read operations is much larger than that of write operations. Hence, to achieve read operation scalability, I am trying to integrate a memcached-based caching facility with the middleware.
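
One common building block in replicated Byzantine fault tolerant designs is result voting: the same query is sent to several replicas and an answer is accepted only if enough of them agree. The sketch below illustrates that idea only; it is an assumption about how a middleware might vote, not the architecture proposed in the project. It accepts a result once at least f + 1 replicas report it, since with at most f faulty replicas at least one of those must be correct. The function name and sample rows are hypothetical.

```python
from collections import Counter


def vote(replies, max_faulty):
    """Return the result reported by at least max_faulty + 1 replicas, else None.

    `replies` is a list of hashable query results, one per replica.
    """
    if not replies:
        return None
    result, votes = Counter(replies).most_common(1)[0]
    return result if votes >= max_faulty + 1 else None


# Three replicas, at most one of them faulty: the odd one out is outvoted.
rows = [("alice", 30), ("alice", 30), ("alice", 99)]
print(vote(rows, max_faulty=1))
```

Returning None when no answer has enough votes leaves it to the caller to retry or report a failure instead of passing on a possibly wrong result.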



Nipun Sehrawat : Scalable Load Balancing for the Cloud

Cloud computing is a distributed computing paradigm, marked by dynamic provisioning of computing resources, such as processing power and storage, from a “cloud” of such resources. The advent of cloud computing is said to have started a transition in the IT industry, from companies having their own data centers to using computing resources from a cloud, analogous to the shift from using private generators for electricity production to depending on power grids. Scalability is one of the prominent features offered by cloud computing for services such as web hosting. Typically a website is hosted on multiple virtual servers in the cloud, depending on the overall amount of traffic being experienced by the website. With such an architecture comes the problem of load balancing among the various servers that are collectively hosting a single given service. Most of the current solutions are hardware-based proprietary solutions, which offer limited scalability and fault-tolerance. In this work, we implement a software-based distributed load balancing solution, which has better scalability and fault-tolerance. This solution works in conjunction with Eucalyptus, which is an open source cloud computing implementation. In a cloud computing environment, where one has to pay according to the amount of computing resources being used, automatic scaling-up and scaling-down of Load-Balancers becomes an important issue. This is addressed by running Load-Balancers in pre-configured Virtual Machines, which can be easily deployed, suspended and resumed. The work involved (1) modifying a kernel-based load balancer (KTCPVS), (2) modifying a DNS software (Unbound) and (3) Java-RMI based distributed programming.

Abhishek Anand : Machine Learning for Efficient Garbage Collection in Flash Filesystems

Unlike disk drives, in which there is mechanical movement of disks and the head to select a particular block of the disk for I/O, flash drives have no moving parts. Flash memory consists of blocks of semiconductor devices which are selected electronically. This has many advantages: no seek time (the time required in disk drives for the disk and head to move to the desired location), the ability to endure extreme shock, high altitude, vibration and extremes of temperature, silent operation and often less power consumption. The downside is that, unlike disks, a block has to be erased before it can be written again (modified). Moreover, to modify even a bit, you have to erase a whole erase unit (typically of size 512KB). Therefore, when a file is modified in a flash filesystem, the corresponding part/page is written at some other location instead of erasing the previous location. The previous location now contains stale data. Over time, a large fraction of the device can contain stale data and hence a process called Garbage Collection is required. It recovers those stale locations by moving the non-stale data in their erase units to other units and then erasing the units. Naturally, we would like the unit we want to erase to have only stale data so that no copying of non-stale data is required. This calls for grouping files which are overwritten together into the same units. In the past, many people have grouped files with similar overwrite frequency together. However, the overwrite frequencies of newly created files are not available. In my BTP, I’m using machine learning to predict the overwrite frequency of a file when it is created so that it can be grouped properly. Preliminary results have shown that various attributes of a file, like the path (folder) in which it is created, its owner and the application which created it, can predict its overwrite frequency with great accuracy.
Moreover, once the overwrite frequencies are available, we still have to answer questions like how many groups to form and what the ranges of overwrite frequencies of those groups should be. To answer these, I formulated a mathematical model which approximates a flash filesystem and used that model to find the optimal grouping. The final step is to do simulations to prove that my techniques indeed reduce the garbage collection costs.
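
The two ideas above, grouping files by predicted overwrite frequency and keeping the copying cost of garbage collection low, can be sketched in a few lines. This is an illustration under stated assumptions, not the model used in the project: the group boundaries are hypothetical numbers, and the victim policy (erase the unit with the fewest live pages) is a simple greedy rule chosen for the example.

```python
from bisect import bisect_right


def assign_group(overwrite_freq, boundaries):
    """Map a predicted overwrite frequency to a group index.

    `boundaries` is a sorted list of cut-off frequencies; a file with
    frequency f lands in group i where i is the number of boundaries <= f.
    """
    return bisect_right(boundaries, overwrite_freq)


def pick_victim(units):
    """Pick the erase unit that is cheapest to garbage-collect.

    `units` is a non-empty list of erase units, each a list of booleans
    (True = page still holds live data). Returns (unit_index, pages_to_copy).
    """
    costs = [sum(unit) for unit in units]  # live pages must be copied out first
    victim = min(range(len(units)), key=costs.__getitem__)
    return victim, costs[victim]


# Hypothetical boundaries: below 1 overwrite/day is "cold", 10 or more is "hot".
print(assign_group(0.5, [1, 10]), assign_group(50, [1, 10]))

# Unit 1 holds only stale pages, so erasing it needs no copying at all.
print(pick_victim([[True, False], [False, False], [True, True]]))
```

If files in a unit are overwritten at about the same rate, whole units tend to go stale together, which is exactly the case where `pick_victim` reports zero pages to copy.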



Mukund R : On State Reachability in Counter Automata

Imagine you are writing software for a microwave oven. After you complete the software, and the device passes initial tests, your manager asks you a simple question, “Are you sure that the microwave emitter (he would probably use the more technical term, magnetron) will not be on if the door is open?” For a simple program of a few lines, convincing your manager of this fact may be simple. But look at how difficult it is to analyse even a small quicksort implementation - for most programs that are of any practical significance, it is difficult to be sure. And still, there are thousands of programs with millions of lines of code each - airplane autopilots, operating systems, web servers, software in nuclear power plants etc. - where we need to be absolutely sure that there are no mistakes. That’s where formal verification comes in. Across its different flavours and variants, the common goal is typically to make a computer automatically verify the correctness of some program, system or model. Program verification is closely related to the problem of automatic theorem proving. It was once a goal of mathematicians to have a systematic procedure by which they could prove theorems (you see, everybody is, at heart, of a particularly lazy breed). And come to think of it, what we want to do in program verification is just what you did in algorithms class - prove that some algorithm is correct (the only difference being that we want it done automatically). But alas, the results of Turing and Gödel have shown that it is an elusive goal - it just isn’t possible, theoretically, to mechanically show that some program obeys some “non-trivial” property. Ideal program verification may not be possible, but that doesn’t mean that cases of “practical” significance cannot be dealt with. What we are interested in, then, is specific subclasses. My BTP is on one such subclass - the class of counter automata.
The term automaton may be familiar to those who have gone through a course on automata theory, but for others (who have attended digital design), it is an idealized version of the finite-state machine you might be familiar with. Now imagine that these have access to a finite number of integer-valued counters - and you have a simple counter automaton. The problem is now to decide whether, given a counter automaton and an initial configuration, a final configuration is reachable. “Will the airplane ever do a nosedive if the altitude is below 10000?” - this might be a typical question we wish to answer. About 10 years ago, it was shown that if the automaton is “flat” then we can say this (rather, write a program that can say this). The specific result was that if a counter automaton is flat, then its reachability relation is effectively Presburger-expressible (you might want to read the Wikipedia article on Presburger arithmetic). In my BTP, I am looking at what happens if a counter automaton is not flat, and why, although the reachability relation is Presburger-expressible, it cannot be mechanically computed.
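
To make the reachability question concrete, here is a small sketch of a counter automaton and a bounded search over its configurations. It is only an illustration, not the technique studied in the BTP: the encoding of transitions as (guard, counter deltas, next state) triples is an assumption made for the example, and because the search stops after `max_steps` transitions, a False answer only means "not reachable within the bound".

```python
from collections import deque


def reachable(transitions, start, target, max_steps):
    """Breadth-first search over configurations (state, counters).

    `transitions` maps a state to a list of (guard, deltas, next_state):
    guard is a function of the counter tuple, deltas is added to the counters.
    Returns True if `target` is reachable from `start` in at most max_steps moves.
    """
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        config, depth = frontier.popleft()
        if config == target:
            return True
        if depth == max_steps:
            continue  # bound reached: do not expand this configuration
        state, counters = config
        for guard, deltas, next_state in transitions.get(state, []):
            if not guard(counters):
                continue
            new_counters = tuple(c + d for c, d in zip(counters, deltas))
            new_config = (next_state, new_counters)
            if new_config not in seen:
                seen.add(new_config)
                frontier.append((new_config, depth + 1))
    return False


# Hypothetical automaton with one counter: q0 keeps incrementing it,
# and may move to q1 once the counter is at least 3.
automaton = {
    "q0": [
        (lambda c: True, (1,), "q0"),
        (lambda c: c[0] >= 3, (0,), "q1"),
    ],
}
print(reachable(automaton, ("q0", (0,)), ("q1", (3,)), max_steps=4))
```

The bound is what keeps this search finite: the counters can grow forever, so an unbounded search might never terminate, which hints at why the general problem is hard.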


Linked List
Brought to you by: Computer Science and Engineering Association, Department of Computer Science and Engineering, Indian Institute of Technology Guwahati
Email: csea@iitg.ernet.in
Website: http://csea.iitg.ernet.in

The Editorial Team

Om Prasad Patri (Editor)

Abhishek Anand

Nitin Dua

Siddharth Prakash Singh

Vinay Rajput (Design)

Save Trees. Do not waste paper.

Mail in your suggestions to csea@iitg.ernet.in. Visit http://csea.iitg.ernet.in for more.

