BitStream CSE Newsletter 1.2
Ask Me Anything
For the Better Society
Informatics Lab
Cheerful Clan at CSE Feast
Hey, we are excited to bring you edition 1.2 of the CSE Newsletter, BitStream. We hope that you enjoy reading it as much as we enjoyed making it. We are looking for constructive feedback, article ideas and any other suggestions. Feel free to contact us. Also, if you are interested in writing articles, you are most welcome to join the team!
Sai Sandeep and Raghava Sarma
bitstream.cse@gmail.com
Team: Bhavesh Singh, Harshal Mahajan, Shachi Deshpande, Shubham Jain, Yashwanth Reddy
Special Thanks: Aman Gour, Akshay Bapat, Pushkar Dongare
Design: Raghava Sarma
Ask Me Anything

Siddhartha Chaudhuri is an Assistant Professor in the Department of Computer Science and Engineering at IIT Bombay, where he currently holds an Institute Chair Assistant Professorship. Previously, he spent a year as a lecturer in the Computer Science department at Cornell University. Earlier, he was a postdoc at Princeton University. He received his Ph.D. from Stanford University in 2011, supervised by Vladlen Koltun.

Can you tell us about your research work at IIT Bombay?
My research at IIT Bombay is a continuation of my work before I came to the department. I work in an area of computer graphics called Shape Analysis, which deals with the geometry of everyday things. If I see a table and a chair, as a human being, I find it easy to tell them apart. But it’s very difficult for a computer to solve this problem. It’s partly a problem in computer vision, but we’re also interested in generative questions: what makes a good chair? If I’m designing a chair, how do I make one that’s right for you, or that has certain aesthetic or functional properties? So my goal is to teach computers to reason about shapes in a meaningful way, and the way we learn this is to analyze large amounts of data. By looking at many, many chairs, tables, airplanes etc., we try to learn the structural, semantic and cognitive principles that go into the shapes of objects. Once we build statistical models of these principles, we can incorporate them in design tools. So if we’re designing, say, an airplane and we give the tool very high-level directions, e.g. “I want the plane to fly fast”, then the tool with its learnt domain knowledge should be able to help us with that task.

How was your Ph.D. experience at Stanford?
It was fun. The best thing about a Ph.D. is that you spend 5-6 years in sustained engagement with a branch of knowledge. You really feel that you’re trying to make a dent in some small way on the frontiers of what human beings know, and you’re having a lot of fun doing it. At the same time, research can be frustrating. In the middle years of my Ph.D., I wasn’t sure where I was going. What I ended up doing my Ph.D. in was not what I entered grad school thinking I would do. I went from being a theoretician proving theorems to being an applied geometer who codes a lot. Over 6 years you have ups and downs, but it was a very fun experience.

What do you think is the difference between academic culture at IIT Bombay and Stanford?
Stanford is a fantastic school for doing graduate research. It’s not just Stanford; this applies to any top research university. Historically, IITs have focused on undergraduate education and gained their reputations primarily as teaching institutions. Their students go on to other research universities where they do a lot of stuff and become well known. The research presence in IITs is not as strong as in Stanford or Berkeley or MIT, even though the undergraduate programs are pretty comparable, in strength at least. One nice thing about Stanford, which is a great model to follow, is that it’s not a technical institute. It’s an all-round university. It has several departments in the humanities, social sciences and every other field of study that you can think of. This really adds to the richness of the place. I think in IIT we have a rather narrow view of what it means to have a rounded education, or what constitutes a broad educational ecosystem.

What are the reasons why you chose teaching rather than going to the industry after Ph.D.?
There are a bunch of reasons. Partly, you could say that I was institutionalised. Lots of people in my family teach, so I came to it from a position of privilege: I knew what the teaching life was like and I liked it. I like the fact that I can constantly engage with people. I like the fact that there’s a lot of freedom. Not just the freedom to make your own courses, to teach the way you want, or to learn from your students, but also that, in a place that combines teaching and research, you’re really free to set your own agenda. In industry, the things you do and the projects you take up are driven by company strategy and the market. This is unavoidable. A research university gives you all the nice things that come with teaching -- students are great people to work with: they’re young, motivated, ask the hard questions no one else asks -- and at the same time you can set your agenda in terms of time, in terms of what you focus on, in terms of what questions to ask that don’t necessarily have a payout in one or two years but have a longer term goal.

Is it a good idea for students to take up a job in industry and then do Ph.D.?
Well, I’m nobody to tell students what they should or shouldn’t be doing. Students are adults, they know what their life choices are. That said, I feel happy with my decision to have gone to grad school straight out of my B.Tech. What you gain from that is that you don’t lose the academic momentum: the B.Tech. prepared you for something, and then you apply it in a specific research area. You become a much more focused individual, so what you gained in breadth as an undergraduate, you now concentrate in depth. This seemed like a natural transition for me. I think that the fraction of people who make this choice has gone down across IITs. In my batch, not that long ago, 20% opted for Ph.D.s. Now, I hear it’s in the low single digits.

How do we bridge the gap between students and faculty?
Well, there are activities which students and faculty can do together, for example, that trek that we went on. Activities like hackathons and robotics competitions, where you’re actually doing something together, are fun. Doing research or R&D with a faculty member is a one-on-one way of getting to know people. But I’d like to see more crossover activities bringing the campus community together. For example, I’d like to see a faculty plus students music fest, or outdoor trips.

What are 3 things that you love about students at IIT-B?
I love the fact that you’re very enthusiastic about doing lots of things. People are engaged with academics, extracurriculars, start-ups, research, and many more things. I like that so many folks are open-minded and are willing to have conversations about all sorts of things. And there seems to be a strong sense of community, the feeling that we’re all in this together and we can help each other out.

3 things that you would like to see improved?
I think I’d address that question more in terms of how we set up these systems for interaction and growth. I’d like to see more interaction between faculty and students, which is a responsibility also shared by faculty. I’d like to see less respect for faculty. Treat them as peers and collaborators. No one should call faculty “sir” or “ma’am”. This colonial-era practice needs to go away. We should address each other by first names.

What do you enjoy the most apart from teaching/research?
Well, things I’ve engaged with fairly deeply over many, many years include: going to the mountains, football (I play on the IITB staff team) and Hindustani classical music (primarily as a listener, not a performer).

Do professors take a bath every day?
Yes, sometimes more than once, depending on how hot and sweaty it is.

Did you take a bath regularly when you were in college?
No, especially during the Kanpur winter. I wasn’t as bad as some, though.

Which movies did you enjoy the most?
Among recent movies, I really liked Aligarh and Masaan.

What is the next big thing after internet that will change our lives dramatically?
Well, I don’t think any person is qualified to give this answer. I’d suggest global warming. I don’t have a tech-related answer, and I don’t think the answer necessarily has anything to do with technology.

What is your advice to current students?
Engage in a sustained way with something. By this I don’t mean go to grad school, though that’s one way of doing it. Take up something and engage with it in a meaningful way, gain depth and not just breadth. Keep an open mind about things, and keep a broad mind. The world is not just about tech. Happiness is not just about making money. We need to figure out what gives us happiness and go after that in a sustained way. We need to be good human beings and maintain good relations with people we care about and people we don’t care about. What can I say beyond: be a decent, engaged, nice human being!

Video link: tinyurl.com/ama-sid
For the Better Society
Work Done by our Professors for Social Causes

Just as we excel in academics, our Department is well known for its social efforts. We have many faculty members striving hard for the betterment of society. Our department has Prof. Milind Sohoni as core faculty at CTARA, and Prof. Purushottam Kulkarni and Prof. Om Damani as associate faculty. Our professors have done many projects, ranging from environment protection to rural education.

In the water sector, CTARA has done numerous studies in groundwater modelling, groundwater regulation, analysis of water supply schemes, gram panchayat level reporting, simulations, optimization and feasibility studies. One of the major projects the professors are involved in is on drinking water. It was undertaken by Prof. Milind Sohoni, Prof. Purushottam Kulkarni and Prof. Om Damani, who worked for the betterment of Karjat Taluka in Raigad district. Along with the groundwater analysis, they completed the geospatial analysis of not only Thane but also Raigad district, which led to reforms in the water supply schemes in these villages.

For better internet connectivity in rural areas, Professors Bhaskaran Raman, Kameswari Chebrolu and Purushottam Kulkarni introduced FRACTEL (wiFi-based Rural data ACcess and TELephony), which seeks to use a combination of long-distance links and local access links to enhance the network connectivity of rural villages. Besides FRACTEL, these professors have worked on many other projects to improve rural connectivity, such as Lo3 (Low cost, Low power, Local Voice and Messaging), which enables push-to-talk and messaging applications within a village.

The whole purpose of education is to turn mirrors into windows.

Other than rural development, the CS department also focuses on educating the less fortunate, free of cost. Prof. Deepak Phatak is one of the eminent professors in the Computer Science Department. He founded the Affordable Solutions Lab (ASL) in 2000. Since then, many projects have been undertaken through ASL for the benefit of society as a whole. One of the projects running under ASL is ‘Train 1000 Teachers’. About 350 colleges in the country have been identified as Remote Centres, and workshops for teachers are organised regularly at these places. The lectures in the workshops are delivered by professors at IIT Bombay, including some from the Computer Science Department: Professors Deepak Phatak, Kameswari Chebrolu, S Sudarshan, Supratik Chakraborty and Soumen Chakrabarti, to name a few. The lectures are transmitted live, and people at the Remote Centres can interact with the professors delivering them. Coordinators at each Remote Centre liaise between the IIT Bombay faculty and the workshop participants. These workshops are free of cost and can be attended by any teacher of any Engineering/Polytechnic/Postgraduate Science Institute with knowledge of the specified subject. The programme has benefitted more than 10000 teachers so far, and the expansion of this work continues.

The Computer Science Department has also participated in Massive Open Online Courses (MOOCs) on the edX platform. The basic course CS101 has been offered in 2 parts so far. Each part spans 6 months, and students anywhere on the globe can access the contents for free. They can also get a verified certificate for the course at a minimal cost.

Prof. Phatak travelled to many different colleges in the country and concluded that there was talent in many small places too. But due to a lack of proper guidance and motivation, such students were not able to bring their ideas to reality. So he initiated the project ‘Eklavya’. On the condition that they release their work as open source, students could get good mentors for their projects and good web-based support. This is especially important for final year students in engineering colleges across the country where good guidance is not available. Eklavya was launched in 2004. The project has helped many students turn their ideas into reality, and these ideas have been released as open source for the benefit of society. Eklavya is not running right now, but its core principles continue to be implemented through various activities undertaken by the ASL.

In addition to this, Prof. Phatak has worked on many other projects aimed at improving the quality of education. He developed low-cost Clicker devices, which provide an easy way to conduct and evaluate small quizzes: students submit their answers with just a few ‘clicks’. This makes the process of continuous evaluation quick and accurate. He was also given the responsibility of executing the well-known Aakash project of the National Mission.
AlphaGo, using ML to master the game Go

AlphaGo is a computer program developed by Google DeepMind in London to play the board game Go. In October 2015, it became the first computer Go program to beat a professional human Go player without handicaps on a full-sized 19×19 board. In March 2016, it beat Lee Sedol in a five-game match, the first time a computer Go program has beaten a top professional without handicaps. Although it lost to Lee Sedol in the fourth game, Lee resigned the final game, giving a final score of 4 games to 1 in favour of AlphaGo.

The game of Go originated in China more than 2,500 years ago. It is played by more than 40 million people worldwide, and its rules are simple: players take turns to place black or white stones on a board, trying to capture the opponent’s stones or surround empty space to make points of territory. The game is played primarily through intuition and feel, and because of its beauty, subtlety and intellectual depth it has captured the human imagination for centuries. There are more possible positions in Go than there are atoms in the universe. That makes Go a googol times more complex than chess. This complexity is what makes Go hard for computers to play, and, therefore, an irresistible challenge to artificial intelligence (AI) researchers, who use games as a testing ground to invent smart, flexible algorithms that can tackle problems, sometimes in ways similar to humans.

The first game mastered by a computer was tic-tac-toe in 1952. Then fell checkers in 1994. In 1997, Deep Blue famously beat Garry Kasparov at chess. It’s not limited to board games either: IBM’s Watson bested two champions at Jeopardy in 2011.

Traditional AI methods, which construct a search tree over all possible positions, don’t have a chance in Go. AlphaGo combines an advanced tree search with deep neural networks. These neural networks take a description of the Go board as an input and process it through 12 different network layers containing millions of neuron-like connections. One neural network, the “policy network,” selects the next move to play. The other neural network, the “value network,” predicts the winner of the game. The neural networks were trained on 30 million moves from games played by human experts, until the program could predict the human move 57 percent of the time (the previous record before AlphaGo was 44 percent). But the goal is to beat the best human players, not just mimic them. To do this, AlphaGo learned to discover new strategies for itself, by playing thousands of games between its neural networks and adjusting the connections using a trial-and-error process known as reinforcement learning. Of course, all of this requires a huge amount of computing power, so the team made extensive use of Google Cloud Platform.
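The article's claim that Go has more possible positions than there are atoms in the universe can be sanity-checked with a rough upper bound. The sketch below counts every way of marking each of the 361 board points as empty, black or white, which overcounts because many such boards are not legal positions. The figure of 10^80 atoms is an assumption here: it is a commonly quoted order-of-magnitude estimate, not a number taken from the article.

```python
# Rough upper bound on Go board configurations versus an atom-count estimate.
# Each of the 19 x 19 = 361 points is empty, black, or white, so there are
# at most 3**361 configurations (many of these are not legal positions).
BOARD_POINTS = 19 * 19
upper_bound_positions = 3 ** BOARD_POINTS

# Assumption: commonly quoted order-of-magnitude estimate for the number of
# atoms in the observable universe.
ATOMS_ESTIMATE = 10 ** 80


def order_of_magnitude(n: int) -> int:
    """Return k such that 10**k <= n < 10**(k + 1), for a positive integer n."""
    return len(str(n)) - 1


print("upper bound is about 10 **", order_of_magnitude(upper_bound_positions))
print("exceeds atom estimate:", upper_bound_positions > ATOMS_ESTIMATE)
```

Since this is only an upper bound, it shows the scale of the search space rather than the exact number of legal positions, which is smaller.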
Know your Alumni, Ravi Kannan

Ravi Kannan is one of the top researchers in the world on the computational complexity aspects of linear algebra. One of his best results is on computing the approximate volume of a convex body in high dimensions, and his most “magical” result is on the power of random sampling for linear algebraic computations. He graduated from IIT Bombay in 1974. He was awarded the Fulkerson Prize in 1991, with Martin Dyer and Alan Frieze, for their work on the volume problem, and the Knuth Prize in 2011.

Can you briefly describe your current work at Microsoft? What are the broad implications of your current research?
For the past 10 years or so, I have been working on algorithms for linear algebra and on sampling from large matrices, where you sample from a large matrix to reduce its size and then work on the smaller sample.

What is the motive behind your shift from university teaching to Microsoft Research?
I wanted to give more time to pure research, hence the shift. Also, though I am working at Microsoft, it is not about production or development. I am more involved in publishing papers and working on hard problems.

What inspired your initial interest and work in algorithm-based research? Was there any one scientist/philosopher/academic/writer who ignited your interest?
The undergraduate thesis was the point at which I started to think about it. My mentor H Narayanan, who was a postgraduate student, inspired me. He was also in his last year, and being in the same hostel we spent a lot of time in discussion. I also had discussions with other professors and postgraduate students, which helped me in this decision.

What is the difference between Ph.D. during your time and our time?
During our time, Ph.D. students had to take three semesters of courses to go deeper into the subjects they wanted to pursue, and only later did they start research. Today, you start working on problems from day 1 of your Ph.D., especially in fields such as systems or databases.
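The sampling idea described in the interview above (sample from a large matrix, then work on the smaller sample) can be illustrated with a small sketch. The interview does not say which algorithms are meant, so this uses one standard technique from randomized linear algebra, length-squared row sampling, to estimate the product A^T A. The function names and the toy matrix are hypothetical, chosen only for illustration.

```python
import random


def gram(rows):
    """Exact A^T A for a matrix A given as a list of equal-length rows."""
    d = len(rows[0])
    out = [[0.0] * d for _ in range(d)]
    for r in rows:
        for i in range(d):
            for j in range(d):
                out[i][j] += r[i] * r[j]
    return out


def sampled_gram(rows, s, rng):
    """Estimate A^T A from s rows drawn with probability proportional to
    each row's squared length (length-squared sampling)."""
    weights = [sum(x * x for x in r) for r in rows]
    total = sum(weights)
    d = len(rows[0])
    out = [[0.0] * d for _ in range(d)]
    for idx in rng.choices(range(len(rows)), weights=weights, k=s):
        p = weights[idx] / total
        scale = 1.0 / (s * p)  # rescaling keeps the estimate unbiased
        r = rows[idx]
        for i in range(d):
            for j in range(d):
                out[i][j] += scale * r[i] * r[j]
    return out


# Toy usage: a 1000 x 3 matrix summarised from only 50 sampled rows.
rng = random.Random(0)
matrix = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(1000)]
estimate = sampled_gram(matrix, 50, rng)
exact = gram(matrix)
```

Rows with larger norm contribute more to A^T A, so sampling them more often and rescaling by the inverse probability keeps the estimate unbiased while touching only a fraction of the rows.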
Which classes of algorithms will get more attention in future?
More of optimization techniques, and ideas involving numerical methods. Optimization plays a key role in machine learning. Ten years ago, machine learning was not that popular, but currently it is a hot topic. Even ten years ago, I insisted that machine learning cannot be ignored. It has some interesting maths to it as well. But the core focus is on heuristics. There is not much importance given to proving time complexity and error bounds. But more algorithms people are entering machine learning and trying to prove things formally there.

How was IIT Bombay back then, and how difficult or easy was it to get into IITs then?
IITs were not known to too many people back then, so it was easier to get in, as compared to now. I have heard that it is too difficult to crack the JEE without coaching classes, but I would say spending too much time in coaching classes is not worth it.

What were you involved in at IIT Bombay apart from academics?
I was from H2. We used to play volleyball regularly. And we used to go to movies occasionally. The first TV came into our hostel in 1974, the year in which I graduated.

I have seen a lot of my friends dropping out from their BTPs after the 1st semester, citing that the problem wasn’t well-defined enough or the advisor wasn’t that good. What is your take on this?
For an undergraduate thesis, you must read about different research areas, read papers, talk to professors and pick a problem that excites you. There are two kinds of advisors: one who leaves the student on his own to explore and gives complete freedom, and the other who suggests ideas and just encourages the student. During my days at IIT Bombay and Cornell, my advisors were of the latter type. And I work the same way. I listen to the ideas my students have, and encourage them to pursue them.

We have seen some graphs plotting the enthusiasm of Ph.D. students initially increasing, but then decreasing with time, and some people drop out after reaching a critical stage. What do you think about it?
Plotting such curves is the job of management students. These are plotted based on social activity and need not always be true. Individually, for me, a Ph.D. is about sustainability and trying out ideas to solve one big problem.

So when do you think, during a Ph.D., one must stop trying really hard problems?
When you start your Ph.D., try to come up with good ideas for the problem, and try seeing if they work. You can go on for up to two years. When you are in the third year and still stuck, search for a month for good ideas for your problem. I mean good ideas, not a solution. If you are out of good ideas, it is time to think twice about the Ph.D.

How did studying at IIT Bombay help in later life?
IIT Bombay helped me get to know more people and socialize. Also, being from an EE background, doing courses on graph theory helped me later in my career. Don’t consider first year courses useless; that is not what education is about. They make your fundamentals strong in Physics, Chemistry and Mathematics.

So as you mention PCM, we are taught this in our first year at IIT Bombay and a lot of us feel that these are not useful. What is your take on it?
This attitude of students is completely wrong. Ten years ago, when linear algebra was taught, everyone felt that it was not useful for computer scientists. But now, it is used everywhere in theoretical computer science, including machine learning. For example, the first step of the PageRank algorithm involves finding eigenvectors of a huge matrix. This must be taken as part of the makeup of your intelligence and learning, and questions such as whether I will be using it in my job or not are irrelevant.

P = NP or P != NP?
It would be harsh to say P != NP. Over the years, people who are trying to prove that things are possible have been more successful than the ones who are trying to prove that some things cannot be done. As a person who works on algorithms, I would say we don’t have enough evidence to believe that P != NP.

Would you rather fight 100 duck-sized horses or one horse-sized duck?
In the first few years of research, it’s better to try a deep hard problem. Even if we don’t solve it, we will get good insight into the area. If we cannot make good progress, then it’s better to shift to easier problems.

Should the undergraduate thesis be made mandatory?
Not necessarily. People who want to pursue research should go on and do their thesis, but not everyone will want to or be interested in it. So courses instead of a thesis are fair enough.

How important are soft skills for students?
IITs usually do not emphasize non-technical skills, which I feel are really important. Presentation skills and writing skills are important for someone who is getting into research. Writing skills can be improved by practice. The undergraduate thesis is a good starting point. It’s important to go out and socialise. It’s ok to spend some time on Facebook, but we should not spend too much time on it. I call it facelessbook because we cannot see the other person’s face.

How have things changed in Computer Science since your times?
I have a son who is a computer scientist and is currently working at Facebook. These days, a lot of information is readily available and we are aware of all the things that are going on around the world, which was not the case during our time. We had to read from magazines. The amount of hard work people put in has decreased from earlier. We used to spend a lot of time talking with each other face to face.

What is your advice to the current students?
Whatever you are learning now are the foundations. Education is not about teaching you exactly what you will be doing later; it teaches you the fundamentals.
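The interview's remark that PageRank involves finding eigenvectors of a huge matrix can be made concrete with a toy example. The sketch below uses power iteration, one standard way to approximate the dominant eigenvector of the link matrix, on a three-page graph. The graph, the function name and the damping factor of 0.85 are assumptions for illustration (0.85 is simply a commonly used value), and every linked-to page is assumed to appear as a key in the dictionary.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank by power iteration.

    `links` maps each page to the list of pages it links to; every page
    that is linked to must also appear as a key.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Each pass of the loop is one multiplication of the current rank vector by the (damped) link matrix, and repeating it makes the vector converge towards the eigenvector with eigenvalue 1, which is the connection to first-year linear algebra that the answer points at.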
A Guide to UG Minors

A Minor is an additional credential that a student earns by completing a minimum of 30 credits of additional learning in a discipline other than his/her major discipline. Most of the academic units in the Institute offer minors in their disciplines, and prescribe a specific set of courses and/or other activities, like projects, which are necessary for earning a minor in that discipline. Note that courses equal in content to any of these specified courses from the same department can be taken as a minor course with the approval of the concerned HOD. Minor courses are allocated to students through a pre-registration process before the start of every semester. Since seats are limited in every minor course, the allocation is done on the basis of the student’s CPI. If you miss out on the allocation of a minor course due to CPI constraints, you can take a position on the wait list for the course, so that if some student drops the course you can take up the seat (wait list allocation is first come, first served, though).

Why a Minor?
To become a successful engineer or scientist in this competitive world, you need supplements other than your own core courses, which is what your minor courses are. Minor courses enable you to learn something that you are passionate about and something out of the box. A minor degree adds value to your major degree and will enable you to get opportunities in the field in which you have completed your minor, or even help you shift to that field in future. Your minor degree will give you sufficient knowledge to do research in an interdisciplinary field and even pursue higher studies in the same, abroad at elite institutions.

How to select a Minor?
You should select your minor in such a way that it either suits your major degree in a research-oriented, interdisciplinary aspect or is useful in a generic way to any engineer or scientist. Before you select a minor, try to find what you are truly interested in. If you are looking for a research topic of interest, browse through the web pages of each department and search the internet to learn what each field is about; during this process you are likely to come across your actual interest. Once you are confident about your interest, look for the minor that best fuels it and go ahead with it. Once you are done with this, you will enjoy learning and be better placed to become what you aim for.

You can select any minor course from any department, but to draw the complete benefit out of it you should look for overlap between the minor and your discipline, so that it supplements your learning. However, there is no harm in learning something different and new, and you may always try out something different. Students who have diverging interests should not bind themselves to only one minor programme; instead, they can take various courses from different disciplines and not worry much about losing the minor degree.

Some of the relevant minors for our department are -

Electrical
The aim of the minor courses offered by the Electrical Engineering department is to give an overview of the basic subjects in the field: (1) Communication and Signal Processing (2) Control and Computing (3) Analog and Digital Circuit design (4) Device Physics. The idea of memory elements such as ROM and PLA is required, as these are the basic building blocks of storage in many computational devices. In this age where processing is autonomous, an elementary knowledge of finite state machines is useful for a logical approach to programming. To a large extent the minor runs in parallel with certain courses in our department: Signals and Systems with DIP, CV and the like, and Digital Devices and Circuits with Digital Logic Design.

Maths
The minor programme in Mathematics is designed to allow students to pursue a more rigorous education in mathematics. A student completing the minor will achieve a better understanding of the mathematical techniques used in the sciences and engineering disciplines and will also be well equipped for further advanced mathematical education.

Coming to the relevance of the maths minor for our department’s majors: courses in algebra are helpful in areas like cryptography, and basic math courses are a foundation for a lot of theory in CS, but many a time you will gain a deeper understanding that may not be helpful, depending on your goals. In general, we can pick up the math needed in our courses when the need arises, and opting for a math minor as a foundation in CS isn’t really helpful. But students with an interest in math will like the minor courses regardless of the profs and the presentation.

Statistics
Statistical data analysis, modelling and inference are required in almost all areas of the natural and social sciences, technology and industrial research. Although the content might not have direct application, the programme gives a general intuition for statistics, which makes it helpful. A minor in stats will be an advantage for students venturing into quant jobs, where they are required to draw statistical inferences from large amounts of data. Though the stats minor is deemed beneficial for students interested in machine learning, the statistics classes don’t cover any of the topics/algorithms/techniques covered in a fundamental machine learning course, except for very basic regression or at best multiple regression.

FUSS
The Faculty Unplugged Seminar Series (FUSS), managed by Prof. Shivaram Kalyanakrishnan, hosts talks by departmental faculty that are intended for a broad audience. Students are especially encouraged to attend FUSS to learn about recent developments in computer science. The main goal of the seminar series is to learn about the research done by the faculty and their research groups in our department. By attending the seminars, students will be able to make an educated choice regarding their research area and research guides. The hope is also that faculty members and Ph.D. students will be able to strike up collaborations across research groups. The abstracts of the talks and further information can be found at https://www.cse.iitb.ac.in/~fuss/
The Fault In Our Pages

Level-of-indirection isn’t limited to computer systems,
We observe it every year, each semester,
Every professor thinks that the space of time we possess
Is solely owned by them,
They don’t pay attention to submission clashes,
Because they don’t know how hard it is
To relocate our time while managing the mapping
Of assignments to their deadlines
But it’s time, professors need to know,
That whatever they see is just an illusion,
That the space in which their assignments reside is virtual.
They must know, that operating the system becomes more and more difficult at times
When the semester is coming to an end,
Too many processes,
Too many active pages,
Too many page faults,
We are afraid of thrashing.
An Epigram By Frustrated Student
InfoLab, where Data meets Intelligence

This is the first part of a two-part series of articles on the Informatics Lab.

Data is perhaps the most valuable asset in the age of the internet, and databases are an indispensable tool for managing it. Data mining, in turn, helps make sense of this data. Research in the area of databases at IIT Bombay started in the early 1980s, with Professors D B Phatak, N L Sarda, S Sudarshan, S Seshadri and later Krithi Ramamritham spearheading it. Later, the group expanded into a data management group when Professors Soumen Chakrabarti and Sunita Sarawagi joined. The Informatics Laboratory, or InfoLab as it is now called, includes Professors S Sudarshan, Soumen Chakrabarti, Sunita Sarawagi and Ganesh Ramakrishnan. The major areas of research include databases, data mining and information retrieval. Some of the interesting projects that the lab has worked on in the area of databases are described below.

XData

How would you check whether the complex SQL queries you wrote for your application are correct? The standard approach is to run each query on some small fixed dataset(s) and see whether the output matches the expected output. Since the datasets are fixed, subtle errors specific to your SQL query might not get caught. The XData system (the name is inspired by X-Men), developed at the Informatics Lab, solves this problem by automatically generating datasets specific to each query, taking into account the query and the database tables. The datasets are designed such that common errors in the query (called mutations) can be caught.

An automated query grading tool has also been developed. It uses the datasets generated from the instructor's query to check whether the results of a student's query match those of the instructor's. The tool has made life much simpler for the TAs who grade SQL queries. Students can get feedback on why their query was marked incorrect by looking at the datasets on which it failed. The grading tool is available for download, and many academics from around the world have expressed interest in using it.
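To make the idea of catching a mutation concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema (student, takes), the two queries and the hand-made datasets are assumptions chosen for illustration; they are not XData's output, and XData generates such datasets automatically rather than by hand. The intended query uses an inner join and the mutant uses a left outer join. A dataset in which every student takes a course cannot tell the two apart, while a dataset containing a student with no course can.

```python
import sqlite3

# Hypothetical schema and queries, chosen only for illustration.
INTENDED = """
    SELECT s.name, t.course
    FROM student s JOIN takes t ON s.id = t.student_id
    ORDER BY s.name, t.course
"""
MUTANT = """
    SELECT s.name, t.course
    FROM student s LEFT OUTER JOIN takes t ON s.id = t.student_id
    ORDER BY s.name, t.course
"""

def run(query, students, takes):
    """Load the given rows into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE takes (student_id INTEGER, course TEXT)")
    conn.executemany("INSERT INTO student VALUES (?, ?)", students)
    conn.executemany("INSERT INTO takes VALUES (?, ?)", takes)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

def kills_mutant(students, takes):
    """A dataset 'kills' the mutant if the two queries return different results."""
    return run(INTENDED, students, takes) != run(MUTANT, students, takes)

# Fixed dataset: every student takes a course, so the join type does not matter.
fixed_students = [(1, "Asha"), (2, "Ravi")]
fixed_takes = [(1, "CS101"), (2, "CS317")]

# Targeted dataset: student 3 takes no course, which exposes the outer join.
targeted_students = fixed_students + [(3, "Meera")]
targeted_takes = fixed_takes

if __name__ == "__main__":
    print(kills_mutant(fixed_students, fixed_takes))        # mutant survives
    print(kills_mutant(targeted_students, targeted_takes))  # mutant is caught
```

The same comparison is what a grader needs: if a student's query and the instructor's query disagree on any generated dataset, that dataset doubles as feedback on what went wrong.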
Holistic Optimization

Most enterprise, mobile, or web applications use a database to store user and application data. These applications are usually written in an imperative language like Java or C++, and access data located in a database using SQL queries (or other interfaces such as Hibernate, which also use SQL internally). These queries are sent to the database server for execution (usually, over a network), and the results are returned for further use inside the application. Typically, query invocations from within an application can be costly, as they block program execution until the results are available.

Traditionally, SQL queries embedded inside imperative programs were treated as black boxes, so optimization of these applications happened on two separate fronts: (i) optimization of imperative parts of the program by the programming language compiler, and (ii) optimization of SQL queries by the database query optimizer. However, optimizing individual components does not necessarily ensure optimal performance of the application as a whole. Many opportunities for optimization arising due to the interaction between the application program and the database are missed, because both the compiler and the query optimizer are working in isolation, unaware of the context in which a query is fired.
Over the past few years, the Holistic Optimization group at InfoLab has been looking at the use of static program analysis to identify such opportunities to optimize interactions between the application and the database. The group developed a system called DBridge that rewrites programs containing embedded SQL, performing multiple optimizations to improve application performance and to reduce both the number of queries and the amount of data transferred over the network. These optimizations include replacing iterative invocation of queries inside loops with a single query that fetches all relevant data in "bulk" (called batching), asynchronous submission of queries (to overlap execution of multiple queries and program statements), prefetching query results prior to actual query invocation, and rewriting parts of imperative code to SQL.
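The batching rewrite described above can be sketched by hand as follows. This is only an illustration in Python with sqlite3, not DBridge's output: the orders table and the function names are hypothetical, and DBridge performs the equivalent transformation automatically on the application's own code.

```python
import sqlite3

def make_db():
    """Create a small in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        [(1, 10.0), (1, 5.0), (2, 7.5), (3, 2.0), (3, 3.0)],
    )
    return conn

def totals_iterative(conn, customer_ids):
    """Original form: one query per loop iteration, so one round trip per customer."""
    totals = {}
    for cid in customer_ids:
        row = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer_id = ?",
            (cid,),
        ).fetchone()
        totals[cid] = row[0]
    return totals

def totals_batched(conn, customer_ids):
    """Batched form: a single set-oriented query fetches all totals at once."""
    ids = list(customer_ids)
    totals = {cid: 0 for cid in ids}  # customers without orders keep a total of 0
    if not ids:
        return totals
    placeholders = ",".join("?" for _ in ids)
    query = (
        "SELECT customer_id, SUM(amount) FROM orders "
        f"WHERE customer_id IN ({placeholders}) GROUP BY customer_id"
    )
    for cid, total in conn.execute(query, ids):
        totals[cid] = total
    return totals

if __name__ == "__main__":
    conn = make_db()
    print(totals_iterative(conn, [1, 2, 3, 4]))
    print(totals_batched(conn, [1, 2, 3, 4]))
```

Both functions return the same totals, but the batched version issues one query regardless of how many customers are requested, which is where the savings come from when each query costs a network round trip.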
Figure: Asynchronous Submission
The process of rewriting is completely automated. The rewritten program is readable, contains minimal changes from the original program, and preserves program semantics. Sounds like a good deal for developers? If you are interested in the underlying techniques, people, and publications by the group, head over to the DBridge website at http://www.cse.iitb.ac.in/infolab/dbridge/.
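Asynchronous submission, one of the optimizations listed earlier, can be illustrated with a small hand-written sketch. It assumes a stand-in function slow_query whose sleep simulates network latency, and it uses Python's standard thread pool; none of these names come from DBridge, which applies the rewrite automatically.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_query(key, delay=0.05):
    """Stand-in for a blocking database call; the sleep simulates network latency."""
    time.sleep(delay)
    return key * 2

def process(value):
    """Stand-in for application logic that consumes a query result."""
    return value + 1

def run_blocking(keys):
    """Original form: each query blocks the program until its result arrives."""
    return [process(slow_query(k)) for k in keys]

def run_async(keys):
    """Rewritten form: submit all queries first, then consume results in order."""
    with ThreadPoolExecutor(max_workers=max(1, len(keys))) as pool:
        futures = [pool.submit(slow_query, k) for k in keys]
        # result() blocks only if that particular query has not finished yet.
        return [process(f.result()) for f in futures]

if __name__ == "__main__":
    keys = [1, 2, 3, 4]
    start = time.perf_counter()
    blocking = run_blocking(keys)
    middle = time.perf_counter()
    overlapped = run_async(keys)
    end = time.perf_counter()
    print(blocking == overlapped)
    print(f"blocking: {middle - start:.2f}s, async: {end - middle:.2f}s")
```

Both versions compute the same results in the same order. The asynchronous one overlaps the waiting time of the queries, so its elapsed time should be closer to a single delay than to the sum of all delays.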
Query Optimization

The initial focus of this project was multi-query optimization and parametric query optimization. It aimed at reducing query processing time and other relevant parameters for a bunch of queries processed together (multi-query) and for different parameter values (parametric). InfoLab has many publications in this area. Over the last half decade, the optimization effort has been directed towards highly parallel systems hosting large volumes of data (so-called big data). InfoLab has built the Volcano/Cascades-based Pyro/PyroJ optimizer, which uses Hyracks as its execution engine but can be integrated with other engines with very little effort. The latest work in this area targets the response time of a query. This has significant applications in cloud-centric architectures, where a user is charged for the time for which resources are used. The PyroJ optimizer is being equipped with parallelization extensions and with pruning strategies that reduce the number of plan alternatives. Extending the optimizer to support other widely used execution engines, such as Apache Spark, is also being considered.
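The article does not spell out how queries processed together are made cheaper. One widely used idea in multi-query optimization is to evaluate a subexpression shared by several queries only once, and the toy sketch below illustrates just that idea. The table, the filter and the scan counter are hypothetical, and this is not how the Pyro/PyroJ optimizer is implemented.

```python
def scan(table, predicate, counter):
    """Filtered full scan; counter['scans'] records how often the table is read."""
    counter["scans"] += 1
    return [row for row in table if predicate(row)]

def is_recent(row):
    """Shared selection used by both queries (hypothetical filter)."""
    return row["year"] >= 2015

def run_separately(table, counter):
    """Each query is optimized on its own, so the shared filter runs twice."""
    total = sum(r["amount"] for r in scan(table, is_recent, counter))
    largest = max((r["amount"] for r in scan(table, is_recent, counter)), default=None)
    return total, largest

def run_shared(table, counter):
    """The queries are optimized together, so the shared filter runs once."""
    recent = scan(table, is_recent, counter)  # common subexpression, reused below
    total = sum(r["amount"] for r in recent)
    largest = max((r["amount"] for r in recent), default=None)
    return total, largest

SALES = [
    {"year": 2013, "amount": 40},
    {"year": 2015, "amount": 25},
    {"year": 2016, "amount": 60},
]

if __name__ == "__main__":
    separate_counter = {"scans": 0}
    shared_counter = {"scans": 0}
    print(run_separately(SALES, separate_counter), separate_counter)
    print(run_shared(SALES, shared_counter), shared_counter)
```

Both strategies return the same answers, but the shared plan reads the table once instead of twice. A real optimizer has to decide when materializing a shared result is actually cheaper, which this sketch does not attempt.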
Figure: Batching
Apart from the projects mentioned above, several other interesting projects are under way at InfoLab, including BANKS: Keyword Searching and Browsing of Databases; World Wide Tables; DDD: Dynamic Data Dissemination; and ALIAS: An Active Learning Led Interactive Deduplication System. The next part of this article explores them in detail.