IT stuff


2014 IT stuff

KTU 2014


Contents

The "Just In Time" Theory of User Behavior
The Rule of Three
Oculus gives its first standing-up virtual reality demo with the new Rift Crescent Bay headset
  Hands on at Oculus Connect 2014
  Down with the Bay
  The demos
  Early verdict
  Hands on at Comic-Con and GDC 2014
  Hands on CES 2014
  Get low, low, low, low
  Wizard Chess
  Early Verdict
In 5 Years, We Could All Have 'Digital Twins' That Make Decisions For Us
  Conversational Interfaces (CI)
  The ‘Valuecosm’
  The Future Of Privacy
The Dawn of the Age of Artificial Intelligence
  Thinking Machines, Available now
  Billions of Innovators, Coming Soon
  'Infinite Computing' and Beyond
3-D printing takes shape
  Accelerated product-development cycles
  New manufacturing strategies and footprints
  Shifting sources of profit
  New capabilities
  Disruptive competitors
Will Gaming Save Education, or Just Waste Time?
  "Failing Up"
  Data Rich
  The Sweet Spot for Gaming
  Adapting to the Classroom
  The Gaming World vs. the Real World
  A Gaming Generation Grows Up
  Minecraft in Schools
  The Research on Gaming
  Gaming vs. Gamification
The Anatomy of a Large-Scale Hypertextual Web Search Engine
  Abstract
  1. Introduction
    1.1 Web Search Engines -- Scaling Up: 1994 - 2000
    1.2. Google: Scaling with the Web
    1.3 Design Goals
  2. System Features
    2.1 PageRank: Bringing Order to the Web
    2.2 Anchor Text
    2.3 Other Features
  3 Related Work
    3.1 Information Retrieval
    3.2 Differences Between the Web and Well Controlled Collections
  4 System Anatomy
    4.1 Google Architecture Overview
    4.2 Major Data Structures
    4.3 Crawling the Web
    4.4 Indexing the Web
    4.5 Searching
  5 Results and Performance
    5.1 Storage Requirements
    5.2 System Performance
    5.3 Search Performance
  6 Conclusions
    6.1 Future Work
    6.2 High Quality Search
    6.3 Scalable Architecture
    6.4 A Research Tool
  7 Acknowledgments
Word list


The "Just In Time" Theory of User Behavior

I've long believed that the design of your software has a profound impact on how users behave within your software. But there are two sides to this story:

Encouraging the "right" things by making those things intentionally easy to do.

Discouraging the "wrong" things by making those things intentionally difficult, complex, and awkward to do.

Whether the software is doing this intentionally, or completely accidentally, it's a fact of life: the path of least resistance is everyone's best friend. Learn to master this path, or others will master it for you. For proof, consider Dan Ariely's new and amazing book, The (Honest) Truth About Dishonesty: How We Lie to Everyone – Especially Ourselves.



Indeed, let's be honest: we all lie, all the time. Not because we're bad people, mind you, but because we have to regularly lie to ourselves as a survival mechanism. You think we should be completely honest all the time? Yeah. Good luck with that. But these healthy little white lies we learn to tell ourselves have a darker side. Have you ever heard this old adage?

One day, Peter locked himself out of his house. After a spell, the locksmith pulled up in his truck and picked the lock in about a minute. "I was amazed at how quickly and easily this guy was able to open the door," Peter said. The locksmith told him that locks are on doors only to keep honest people honest. One percent of people will always be honest and never steal. Another 1% will always be dishonest and always try to pick your lock and steal your television; locks won't do much to protect you from the hardened thieves, who can get into your house if they really want to. The purpose of locks, the locksmith said, is to protect you from the 98% of mostly honest people who might be tempted to try your door if it had no lock.

I had heard this expressed less optimistically before as 10% of people will never steal, 10% of people will always steal, and for everyone else … it depends. The "it depends" part is crucial to understanding human nature, and that's what Ariely spends most of the book examining in various tests. If for most people, honesty depends, what exactly does it depend on? The experiments Ariely conducts prove again and again that most people will consistently and reliably cheat "just a little", to the extent that they can still consider themselves honest people. The gating factor isn't laws, penalties, or ethics. Surprisingly, that stuff has virtually no effect on behavior. What does, though, is whether they can personally still feel like they are honest people. This is because they don't even consider it cheating – they're just taking a little extra, giving themselves a tiny break, enjoying a minor boost, because well, haven't they been working extra specially hard lately and earned it? Don't they of all people deserve something nice once in a while, and who would even miss this tiny amount? There's so much!

These little white lies are the path of least resistance. They are everywhere. If laws don't work, if ethics classes don't work, if severe penalties don't work, how do you encourage people to behave in a way that "feels" honest that is actually, you know, honest? Feelings are some pretty squishy stuff. It's easier than you think.

My colleagues and I ran an experiment at the University of California, Los Angeles. We took a group of 450 participants, split them into two groups and set them loose on our usual matrix task. We asked half of them to recall the Ten Commandments and the other half to recall 10 books that they had read in high school. Among the group who recalled the 10 books, we saw the typical widespread but moderate cheating. But in the group that was asked to recall the Ten Commandments, we observed no cheating whatsoever. We reran the experiment, reminding students of their schools' honor codes instead of the Ten Commandments, and we got the same result.



We even reran the experiment on a group of self-declared atheists, asking them to swear on a Bible, and got the same no-cheating results yet again. That's the good news: a simple reminder at the time of the temptation is usually all it takes for people to suddenly "remember" their honesty. The bad news is Clippy was right.

In my experience, nobody reads manuals, nobody reads FAQs, and nobody reads tutorials. I am exaggerating a little here for effect, of course. Some A+ students will go out of their way to read these things. That's how they became A+ students, by naturally going the extra mile, and generally being the kind of users who teach themselves perfectly well without needing special resources to get there. When I say "nobody" I mean the vast overwhelming massive majority of people you would really, really want to read things like that. People who don't have the time or inclination to expend any effort at all other than the absolute minimum required, people who are most definitely not going to go the extra mile. In other words, the whole world. So how do you help people who, like us, just never seem to have the time to figure this stuff out because they're, like, suuuuper busy and stuff? You do it by showing them

the minimum helpful reminder




at exactly the right time

This is what I've called the "Just In Time" theory of user behavior for years. Sure, FAQs and tutorials and help centers are great and all, but who has the time for that? We're all perpetual intermediates here, at best. The closer you can get your software to practical, useful "Just In Time" reminders, the better you can help the users who are most in need. Not the A+ students who already read the FAQ, and studied the help center intently, but those users who never read anything. And now, thanks to Dan Ariely, I have the science to back this up. Even something as simple as putting your name on the top of a form to report auto insurance mileage, rather than the bottom, resulted in a mysterious 10% increase in average miles reported. Having that little reminder right at the start that hey, your name is here on this form, inspired additional honesty. It works.

Did we use this technique on Stack Overflow and Stack Exchange? Indeed we did. Do I use this technique on Discourse? You bet, in even more places, because this is social discussion, not technical Q&A. We are rather big on civility, so we like to remind people when they post on Discourse they aren't talking to a computer or a robot, but a real person, a lot like you. When's the natural time to remind someone of this? Not when they sign up, not when they're reading, but at the very moment they begin typing their first words in their first post. This is the moment of temptation when you might be super mega convinced that someone is Wrong on the Internet. So we put up a gentle little reminder Just In Time, right above where they are typing:

Then hopefully, as Dan Ariely showed us with honesty, this little reminder will tap into people's natural reserves of friendliness and civility, so cooler heads will prevail – and a few people are inspired to get along a little better than they did yesterday.


Just because you're on the Internet doesn't mean you need to be yelling at folks 24/7. We use this same technique in a bunch of other places: if you are posting a lot but haven't set an avatar, if you are adding a new post to a particularly old conversation, if you are replying a bunch of times in the same topic, and so forth. Wherever we feel a gentle nudge might help, at the exact time the behavior is occurring.

It's important to understand that we use these reminders in Discourse not because we believe people are dumb; quite the contrary, we use them because we believe people are smart, civil, and interesting. Turns out everyone just needs to be reminded of that once in a while for it to continue to be true.
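To make the pattern concrete, here is a rough sketch of what a "Just In Time" nudge could look like in code. It is purely illustrative: the rules and wording are hypothetical, loosely modeled on the Discourse examples above, and not Discourse's actual implementation.

# Illustrative "Just In Time" nudge: pick a gentle reminder to show at the exact
# moment the behavior is happening (e.g. the instant a user starts typing).
# The rules and copy below are hypothetical, not Discourse's real logic.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ComposerContext:
    is_first_post: bool            # user is typing their very first post
    has_avatar: bool               # user has set a profile picture
    topic_age_days: int            # age of the topic being replied to
    replies_by_user_in_topic: int  # how many times they have already replied here

def just_in_time_reminder(ctx: ComposerContext) -> Optional[str]:
    """Return a reminder to display above the composer, or None for no nudge."""
    if ctx.is_first_post:
        return "Welcome! Remember, you're talking to a real person, a lot like you."
    if ctx.topic_age_days > 365:
        return "This conversation is over a year old. Sure you want to revive it?"
    if ctx.replies_by_user_in_topic >= 3:
        return "You've replied here several times. Consider editing an earlier post?"
    if not ctx.has_avatar:
        return "People recognize each other by their avatars. Want to add one?"
    return None

# The nudge appears the moment a brand-new user begins their first post.
print(just_in_time_reminder(ComposerContext(True, False, 0, 0)))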



The Rule of Three

Every programmer ever born thinks whatever idea just popped out of their head into their editor is the most generalized, most flexible, most one-size-fits-all solution that has ever been conceived. We think we've built software that is a general purpose solution to some set of problems, but we are almost always wrong. We have the delusion of reuse. Don't feel bad. It's an endemic disease among software developers. An occupational hazard, really. If I have learned anything in my programming career, it is this: building reusable software, truly reusable software, is an incredibly hard problem – right up there with naming things and cache invalidation. My ideas on this crystallized in 2004 when I read Facts and Fallacies of Software Engineering for the first time. It's kind of a hit-or-miss book overall, but there are a few gems in it, like fact #18: There are two "rules of three" in [software] reuse:

It is three times as difficult to build reusable components as single use components.

A reusable component should be tried out in three different applications before it will be sufficiently general to accept into a reuse library.

Yes, this is merely a craftsman's rule of thumb, but the Rule of Three is an incredibly powerful and effective rule of thumb that I have come to believe deeply in. It's similar to the admonition to have at least one other person review your code, another rule of thumb that is proven to work. To build something truly reusable, you must convince three different audiences to use it thoroughly first. OK, so you built a solution that scratches your itch … but does anyone else care? How many other people have the problem that your software or website addresses? How many other competing solutions are there to choose from? Outside of your personal patient zero case, can you convince anyone to willingly, or even enthusiastically, adopt your solution? That's your first hurdle. Can you even get to number one?

How deeply do I believe in the Rule of Three? So deeply that I built two whole companies around the concept. With Stack Overflow, we didn't set out to build a general purpose Q&A engine. We only wanted to solve the problem of programmers looking for fast, solid technical answers to their programming problems, instead of the endless pages of opinions and arguments they usually got. Oh yeah, and also to deal with that hyphenated site. One of the greatest pleasures of my life is meeting programmers who have never heard of this hyphenated site now. I hope you can forgive me, but I mentally superimpose a giant Dubya-style "Mission Accomplished" banner over their heads when they say this. I grin a mile wide every time.

We launched Stack Overflow to the public in August 2008. It was such a runaway early hit that I started to get curious whether it actually would work for different audiences, even though that was never the original idea.


But we decided to play the six degrees of Kevin Bacon game and take some baby steps to find out. Less than a year later we had Stack Overflow for programmers, Server Fault for system administrators, and Super User for computer power users – the full trilogy. Three sites with three distinct audiences, all humming right along. One customer or user or audience might be a fluke. Two gives you confidence that maybe, just maybe, you aren't getting lucky this time. And three? Well, three is a magic number. Yes it is.

Once we proved that the Stack Overflow engine could scale to these three distinct communities, I was comfortable pursuing Stack Exchange, which is now a network of over 100 community-driven Q&A sites. The programming-audience-derived assumptions that the engine was originally designed around mean it can never scale to all communities – but for communities based on topics that can be understood via questions about science, facts, and data, there is no finer engine in the world. Not that I'm biased or anything, but it's stone cold truth. Don't believe me? Ask Google.

When we launched Discourse in February, I had zero illusions that we had actually built workable general purpose forum software, even after eight months of hard work. That's why the "buy it" page still has this text at the top:

Unfortunately, you can't [buy Discourse] … yet. Our immediate plan is to find three great partners willing to live on the bleeding beta edge and run forums with us, so that we can be confident we've built a discussion platform that works for a variety of different communities. We promise to do everything we can to host your forum and make it awesome for two years. In return, you promise to work with us on ironing out all the rough edges in Discourse and making sure it scales successfully – both socially and technologically – to those three very different audiences.

Hey, there's that magic number again! Even now, months later, we're not even pretending that we have open source discussion software that works for most communities. Hell, the FAQ literally tells you not to use Discourse. Instead, we're spending all our effort slowly, methodically herding the software through these three select partners, one by one, tweaking it and adapting it for each community along the way, making sure that each of our partners is not just happy with our discussion software but ecstatically happy, before we proceed to even tentatively recommend Discourse as any kind of general purpose discussion solution. Because I worship at the altar of the Rule of Three, it's pretty much been my full time job to say "no" to people every day for the last 6 months:

Hey, Discourse looks great, can you host an instance for us?

Sorry, not yet. Probably in 2014!


We desperately need great forum software for our community! Can you help us set up Discourse?

Sorry, I can't. We're focused on building the software. It is 100% open source, and we do have a pretty good install guide if you want to grab the code and set it up!

We'll pay you to host Discourse for us! Shut up and take my money!

Sorry, I wish I could. It's not complete enough yet, and the last person I want to disappoint is a paying customer, and we don't even have a billing system! We plan to get to hosting in early-ish 2014.

So yeah, I won't lie to you – I'm basically a total bummer. But I'm a total bummer with a plan.

The solution we constructed in Discourse was a decent start, but woefully incomplete – even wrong in some areas. The only way we can figure this out is by slowly running the solution through its paces with our three partners, to live in the same house of software they do as roommates, to walk alongside them as they grow their discussion communities and do everything we can to help build it into a community we enjoy as much as everyone else does. And when there was only one set of footsteps in the sand, well … that's because we were carrying you.

We haven't made it all the way through this process yet. We're only on partner #2; it takes the time it takes. But thanks to the Rule of Three, I'm confident that by the time we finish with partner #3, we will finally have a truly reusable bit of general purpose open source discussion software to share with the world – one that I can recommend unhesitatingly to (almost) anyone, because it'll probably work for their community, too. So the next time you think "I've built a reusable thing!", stop, and think "how can I find three users, customers, or audiences, to prove that I've built something reusable?" instead.



Oculus gives its first standing-up virtual reality demo with the new Rift Crescent Bay headset

Hands on at Oculus Connect 2014

Update September 22, 2014: Oculus has revealed its latest Rift prototype, codenamed Crescent Bay, and it's undoubtedly the most impressive virtual reality device yet. Oculus held its first ever Oculus Connect virtual reality conference in Hollywood on September 20, and the growing company used the opportunity to show off its newest Oculus Rift prototype: Crescent Bay. The lighter, more comfortable Crescent Bay Rift prototype has beefed-up specs and, for the first time, integrated headphones designed by the engineers at Oculus VR. But unlike with past prototypes like DK2 or "Crystal Cove," Oculus is being less than upfront about Crescent Bay's specifications. They bumped the last headset up to 1080p, and Crescent Bay certainly appears to have an even higher resolution, but the company won't confirm as much. That's because they want to focus on the Oculus Rift as a full package rather than as a simple amalgamation of its various components, all of which will no doubt change by the time the consumer version Rift - CV1, as the company refers to it - is finally ready.

Oculus Rift has never looked better

"It's the combination of the resolution with the optics, with the mechanical engineering and industrial design of this thing, that allow for it to look like it's a higher resolution, even though it may or may not be," Oculus Vice President of Product Nate Mitchell told TechRadar. "The synergy of all the components together is what takes it up a notch."


What Oculus instead focused on with the Crescent Bay demos it showed off at Oculus Connect was the level of "presence" the Rift can make users feel under optimal conditions and with content designed specifically to be as immersive as possible.

Down with the Bay

Whereas every past official Oculus Rift demo took place with users seated, this time the company had journalists and other Oculus Connect attendees standing up and walking around with the headset strapped to their faces. In interviews afterward, Mitchell and Oculus VR founder Palmer Luckey emphasized that the stand-up Rift experience is not the experience that they're stressing for consumers, but was simply meant in this case to crank up the immersion as high as possible. Mitchell called this demo "conceptual," and Luckey said "the Oculus Rift is a seated experience. It's very dangerous to stand up."

Crescent Bay is lighter, despite the added headphones

As true as that may be - you probably shouldn't try walking blindly around your home while the Oculus Rift is tricking your brain into thinking you're on a different planet or in a submarine - the stand-up experience demonstrated with Crescent Bay at Oculus Connect was undoubtedly the most immersive and impressive virtual reality demo ever. The experience consisted of about a dozen demos developed by Oculus's new internal content team. Luckey said these demos are the cream of the crop as far as what Oculus has developed, and many more experiences were scrapped or sidelined. Over several minutes they showed off a variety of potential Rift applications, eliciting a number of very different responses.

The demos

The Crescent Bay demos took place in a highly controlled environment: a small, empty room with four plain, grey walls. A camera - larger than the one used with Crystal Cove - was mounted on the wall, tracking users' positions as they walked around a small, black mat on the ground.


By tracking the Crescent Bay prototype's white-studded surface (these nubs are now located all around the headset, including on the back of the strap) this camera can accurately understand your position in the room, allowing you to walk around freely in virtual space. Not to get too dramatic, but it really is a mind-blowing experience. The demos themselves consisted of several non-interactive environments, from a creaking submarine chamber to a sunny museum in which a life-sized (looked that way at least) T-Rex sniffs around and ultimately steps directly over you.

There was no detectable latency during these demos

These short experiences lasted less than a minute each. One highlight took place at the top of a skyscraper in a steampunk, BioShock-inspired city. Standing up in that grey room, you could walk to the edge of the virtual roof and look down hundreds of feet to the traffic below. And as with the T-Rex's roar, the Crescent Bay Rift's attached headphones - technically stereo, but with simulated surround sound - made the experience seem all the more real with traffic noises, hissing wind and more. That demo called to mind the Game of Thrones "Ascend the Wall" Oculus Rift experience designed by visual effects firm Framestore. Used by HBO at promotional events like the premiere of Game of Thrones' fourth season, Ascend the Wall put users inside an actual metal cage - replicating the elevator from the series - that rumbled and blew cold air at them as they virtually ascended to the top of the show's fictional 800-foot-high Wall. The more points of feedback these demos are able to simulate, the more "presence" users feel, Oculus contends. These feedback points range from that feeling of cold air being blown in your face - which is not very practical - to ambient sound, which is practical - to something as simple as standing up, which is not ideal for every situation but nevertheless ramps things up considerably.


"You stand up, and suddenly your balance kicks in, and you're like, 'woah!' and you feel your weight shift subconsciously," Mitchell explained to us after the demo. "When you stand up suddenly [your subconscious] is totally engaged."

Another angle on Crescent Bay

All of these demos showed off the ways that standing up can enhance virtual reality. For example, within environments that appear small, like a tiny cartoon city or a sci-fi terrain map that could be used for a strategy game, walking around makes you feel like you're playing an Ender's Game-like simulation. But one of the most fun demos involved simply standing and facing a curious alien on a distant planet. As the user bends down and moves around to better examine the alien, it does the same to the user, clucking in a strange tongue. You actually get the sense that it's talking to you, and it's easy to see how this type of interaction could be used to make video games better. Yet another demo had you staring into a mirror, with your head represented by a floating mask. No matter how hard I tried or how fast I moved, I couldn't detect a shred of latency as the mask in the mirror reflected my every movement. Again, the grey room in which this took place was a more controlled environment than most people's homes, but it was nevertheless impressive.

Early verdict

The final experience - and the most game-like - showed off exactly how cool an Unreal Engine 4 Oculus Rift game might be. Futuristic soldiers shot at a hulking robot as it fired right back, explosions sending cars flying in slow motion as the point of view crept slowly down the street toward the machine. It felt natural to physically dance around, dodging incoming bullets and ducking under flipping vehicles, no matter how ridiculous I might have looked to onlookers who couldn't see what I was seeing.


This could legitimately be the future of gaming - if Oculus can figure out the input problem. Although many Oculus Rift demos have used an Xbox 360 controller, there's still no standard input device for Rift games. Like Crescent Bay's integrated audio, though, this is a problem Oculus is actively working on. "There's a very real possibility that we would have come to the conclusion that audio is something we were going to leave to third parties," Luckey told us at the conference. "We came to the conclusion that we had to do it ourselves, and we had to do a good job, because it was so important to get right. I think input is in that camp." That's just one of the problems Oculus needs to solve before the Rift is ready for consumers, and given that Crescent Bay is just the latest of many prototypes it's unclear when it will be. But when Oculus Rift CV1 is ready, it has the potential to change entertainment forever.

Hands on at Comic-Con and GDC 2014

Update: Oculus Rift Dev Kit 2 is on its way to game makers and it's being used by movie studios. We revised our hands-on review and added facts about its Galaxy Note 3 screen and Mac support. Hands on impressions by Matt Swider and Alex Roth.

As Oculus Rift Dev Kit 2 starts shipping to pre-order customers, we got more face time with the virtual reality headset at PAX Prime and Comic-Con. Codenamed Crystal Cove, the updated Oculus Rift DK2 costs $350 (about £207, AU$373). That's $50 (about £30, AU$53) more than the first-generation developer kit. However, the improved specs make it well worth the price bump if you're a developer with a passion for cutting-edge technology and the patience for beta hardware. The face-worn display outfits developers with an HD screen that's 1080p, or 960 x 1080 per eye. It finally meets our next-generation gaming needs. Believe it or not, the Oculus Rift DK2 display actually uses the 5.7-inch Super AMOLED panel from the Samsung Galaxy Note 3. Behind its rubber casing is the same exact front panel, "Samsung" logo and all. This makes sense. Oculus was rumored to be working with Samsung on the South Korean electronics giant's own virtual reality headset. Whether or not that pans out remains to be seen. Despite both the physical and theorized Samsung ties, Mac compatibility has been added to the Oculus Rift DK2, making good on the start-up company's promise to support Apple machines. All five OS X game developers are rejoicing right now. Oculus Rift DK2 drops the first iteration's control box in favor of integrating the guts into the headset itself. Only a single cable - HDMI and USB woven together - dangles from your face.



The new kit also comes with a motion-tracking camera, which allows for greater movement within the world of the Rift. It looks a bit like a webcam, and a lot like a PlayStation Eye camera from the PS3 days. It features a blue "on" light and an Oculus logo, but its true power isn't visible to the naked eye. It uses forty infrared LEDs on the headset to track your head movements and integrate them into the game. These LEDs were visible on the version we tried at CES 2014, but not anymore.

The new Crystal Cove, looking less spotty than before

In the demos we saw at GDC 2014, this meant players could lean in for a closer look at in-game objects and characters. These were the same demos we saw at CES, with the exception of a new one by Epic Games, which integrated the player into the game in a unique way. The game was a one-on-one battle between two sword and shield wielding avatars. It takes place in a living room, where players can see representations of themselves seated in the room, controller in hand. To keep an eye on the fight we had to swivel our heads and crane our necks.



Still no support for in-game limbs

The Rift was a surreal experience as always; when our opponent turned his head or leaned forward it gave his neck a stretched, snake-like appearance. And when one of the battling avatars leapt up onto your lap, you half expect to feel his little feet on your legs.

The camera is key to the Rift's head tracking

If you've used the previous Rift, know that Crystal Cove is a night and day difference. The higher resolution makes all the difference in the world; it's like going from Skyrim on a four-year-old PC to one from last year.



Finally, a wearable that tracks more than your heart rate

Note that we say last year; the Oculus Rift still isn't sporting visuals that you could call next gen. There are still jaggedly rendered objects, but the immersive nature of the experience trumps graphics any day, and is one you need to see to believe.

Step into the super mind of the X-Men's leader



Comic-Con 2014 provided a different sort of experience - with entertainment at the forefront, and maybe one we can expect more of now that Facebook owns Oculus VR. Both Twentieth Century Fox and Warner Bros. were backing new Oculus Rift Dev Kit 2 units at the cosplay-filled San Diego convention with demos for their X-Men and Into the Storm films. The X-Men Cerebro Experience provided the more surreal experience as attendees slipped into the wheelchair and saw through the eyes of mutant leader Professor Charles Xavier. He, fittingly, donned the just-as-snug brain-amplifying mutant detector Cerebro on his own head. The concept involved seeking the shapeshifting mutant Mystique by looking 360 degrees in any direction. She was hiding in a Comic-Con crowd that was fictitious and barren - it would have been cooler if it used augmented reality here. The actual hunt was automated and fairly boring, but Professor X's replica wheelchair at the Fox booth provided developers with the opportunity to predict the location of our limbs and torso. It accurately overlaid his body onto our own. Obviously, this demo didn't call for much movement and that worked to the movie studio's advantage. It could easily trick your mind into thinking that the Professor's subtle finger tap on the armrest was your own with a "Wait, I didn't just do that!"

Into the Storm upped the energy level with simulated tornado winds inside a small glass booth built by Warner Bros. Through the first-person perspective, we saw three characters hunker down behind a gated sewer entrance, truck-sized debris smash against its ironclad bars and pipes burst with gushing water.


It didn't have the advantage of a stationary wheelchair-bound character to map our bodies and there was no interaction whatsoever, but Warner Bros did aptly demo its new disaster movie with this terrifying scene recreation. It also messed up our hair. Both X-Men Cerebro Experience and Into the Storm also gave us insight into how big-name movie studios intend to use Oculus Rift to invent new ways of enjoying theatrical experiences. Video games were just the beginning.

Hands on CES 2014

Oculus Rift gets more impressive every time we see it, and the futuristic virtual reality headset's appearance at CES 2014 was definitely no exception. Since E3 2013 Oculus VR has gained impressive talent and raised an extra $75 million in funding, and the result is the Oculus Rift Crystal Cove prototype (named for a state park in southern California). It's significantly easier on the eyes than older versions of the headset and, by extension, closer than ever to the Rift's final, fully functional, consumer-facing form. The two game demos Oculus co-founder Nate Mitchell showed us in a private meeting room at CES were designed to showcase two new features: positional head-tracking and low persistence, both of which help make the virtual reality experience more immersive and address some users' complaints with the headset, including motion blur-induced nausea.

The head-tracking is the most obvious improvement. The new white studs on the Oculus Crystal Cove prototype's face are indicators that communicate your head's position to a new external camera, mounted near your monitor. As a result the full movements of your upper body, not just the sideways and up/down movements of your head, are detected and translated to the game world.


That means you can lean forward while playing CCP Games' extremely impressive 3D space-shooting game EVE: Valkyrie, bringing your in-game face closer to your space ship's various monitors and switches so you can better read their warnings and instructions. Since the very first demo Oculus Rift has inserted players into virtual worlds, and with this addition it's a more immersive experience than ever.
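For the curious, here is a minimal sketch (assumed for illustration, not Oculus SDK code) of what positional tracking adds on the rendering side: the camera pose now combines the rotation the headset already reported with the translation the external camera measures, so leaning forward physically moves the in-game viewpoint even when your head's orientation does not change.

# Minimal sketch of positional head tracking feeding the renderer.
# Not Oculus SDK code; the pose format here is an assumption for illustration.
import numpy as np

def view_matrix(head_position, head_rotation):
    """Build a 4x4 world-to-view matrix from a tracked head pose.

    head_position: (x, y, z) in metres, reported by the external tracking camera.
    head_rotation: 3x3 rotation matrix from the headset's orientation sensors.
    """
    pose = np.eye(4)
    pose[:3, :3] = head_rotation   # which way the head is pointing
    pose[:3, 3] = head_position    # where the head is in the room
    return np.linalg.inv(pose)     # world space -> eye space

# Leaning 20 cm toward the cockpit instruments shifts the view,
# even though the orientation (rotation) stays exactly the same.
upright = view_matrix((0.0, 1.7, 0.0), np.eye(3))
leaning = view_matrix((0.0, 1.7, -0.2), np.eye(3))
print((leaning - upright)[:3, 3])  # only the translation component differs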

Get low, low, low, low

Second and more subtle is the low persistence, which makes the Oculus Rift's somewhat notorious motion blur a thing of the past. Now the graphics remain more clear and sharp even when you move your head around rapidly. There's still a tiny amount of blurring, but it's a massive improvement over the previous version of Oculus Rift. To prove it Mitchell turned low persistence off and then on as we moved around, and although the image became darker with it on, it almost totally alleviated what was previously one of the Rift's biggest issues.

The tech behind the low persistence is somewhat complex, but Mitchell explained the gist of it. Essentially the new "Crystal Cove" Oculus Rift's OLED display has zero latency, so it takes the pixels no time at all to change color. Even then, Mitchell said, there was some blurring, but Oculus alleviated it even further by programming the pixels to consistently but imperceptibly flicker on and off, only turning on when they have "good" data to display. That new OLED display is also full HD 1080p, just like the prototype Oculus showed off behind closed doors at E3 2013. That of course helps as well.
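A quick back-of-the-envelope calculation shows why flashing each frame only briefly helps. All the numbers below are assumptions for illustration, not official Oculus specifications: the perceived smear is roughly how far the image sweeps across the panel while a frame stays lit.

# Rough illustration of why low persistence reduces smear.
# Every number here is an assumed, illustrative value, not a measured spec.

def blur_pixels(head_speed_deg_s, persistence_s, pixels_per_degree):
    # smear = angular speed x time the frame stays visible x pixel density
    return head_speed_deg_s * persistence_s * pixels_per_degree

HEAD_TURN = 120.0    # deg/s, a moderate head turn (assumed)
PX_PER_DEG = 10.0    # rough pixel density of a 1080p-class HMD panel (assumed)

full_persistence = 1 / 75   # frame lit for the whole ~13 ms refresh (assumed 75 Hz)
low_persistence = 0.002     # frame flashed for ~2 ms, dark otherwise (assumed)

print(blur_pixels(HEAD_TURN, full_persistence, PX_PER_DEG))  # -> 16.0 px of smear
print(blur_pixels(HEAD_TURN, low_persistence, PX_PER_DEG))   # -> 2.4 px of smear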

Wizard Chess

We played EVE: Valkyrie at E3 2013 as well, though on the older, lower-resolution Oculus Rift. In 1080p, and with minimal motion blur and the new positional head-tracking, it was even more immersive now than it was back then - and that's saying something, because even that first time it was totally mind-blowing.


Piloting a space ship with an Xbox 360 controller while you look around the cockpit and target enemies with the motions of your head is one of the most impressive gaming experiences ever created. It feels like the first time you played Super Mario 64, or Halo, or Wolfenstein - completely fresh and like it has the potential to change the world of gaming. And right now it's only a demo.

The other software Oculus had at CES was a very basic defense game built by Epic Games in Unreal Engine 4. It's an evolution of one of the original Oculus Rift demos Oculus showed around - the one where users simply walked or floated around several beautiful but interaction-light Unreal Engine 4 environments, including a snowy mountain and the lava-filled lair of a scary-looking demon lord. Now, that demon sits on his throne across from you, the player, he being your apparent opponent. Around you is his cavernous, fiery lair, and before you is something like a 3D board game with moving pieces. He sends tiny dwarves marching inexorably toward your goal, and you press buttons on the Xbox 360 controller to fire arrows, cannonballs and flamethrowers at them.



There are two views: one overhead and one from closer to the game's level, almost like you're leaning down toward it to put on your thinking cap. And thanks to that positional head-tracking you can actually lean forward to peer into the game and examine the little dwarves up close. You can look into their faces as they're pinned with arrows and crisped with fire. The experience of playing a game inside a game world is not unique to Oculus Rift. This little game, though still very basic, could conceivably be a mini-game within some epic, sprawling RPG. But like with everything else, playing it on Oculus Rift makes you feel like you're really there.

Early Verdict

Mitchell said the camera that enables the positional tracking may be only a temporary solution. But whatever Oculus settles on to make sure the final version of Oculus Rift features full six-point head-tracking will be included with the unit, whether that means bundling a camera in or something else. There's still no projected release date or final pricing for the consumer product that the Oculus Rift Crystal Cove prototype will eventually become, despite rumors of a Christmas 2014 goal that Mitchell would neither confirm nor deny. And the conspicuous indicator lights on the Crystal Cove's front aren't final either, Mitchell revealed, even if they do look kind of cool. Mitchell and his colleagues at Oculus VR seem to think the Rift still has a long way to go. That may very well be true, but the fact is the Oculus Rift is the coolest product in the world right now, and it gets better every time we see it.



In 5 Years, We Could All Have 'Digital Twins' That Make Decisions For Us

"When you and I die, our kids aren't going to go to our tombstones, they're going to fire up our digital twins and talk to them," John Smart tells Business Insider. As a futurist and founder of the Acceleration Studies Foundation, Smart uses many names for the technology he predicts — digital twin, cyberself, personal agent — but the concept stays the same: a computer-based version of you.

Microsoft's Cortana is an early version of this technology

Using various strategies for gathering and organising your data, digital twins will mirror people's interests and values. They will "input user writings and archived email, realtime wearable smartphones (lifelogs), and verbal feedback to allow increasingly intelligent and productive guidance of the user's purchases, learning, communication, feedback, and even voting activities," Smart writes. They will displace much of today's information overload from regular people to their cyber-selves. And one day, Smart theorizes, these digital twins will hold conversations and have faces that mimic human emotion. "They will become increasingly like us and extensions of us," Smart says. The concept might sound far-fetched. But consider that people often turn to a deceased friend or family member's Facebook wall to grieve. People already form relationships with each other's online presences. As computer science advances, the connection will only improve and strengthen — even with identities that aren't real people. "Where we're headed is creating this world in which you feel you have this thing out there looking after your values," Smart says. For digital twins to reach their full potential, however, they require two important developments: "good conversational interfaces and semantic maps," Smart explains.

Conversational Interfaces (CI)

Ron Kaplan, a data scientist in Silicon Valley, already chronicled the necessity of CI for Wired last year. In his words, simply scheduling a flight could require 18 different clicks or taps on 10 different screens. "What we need to do now is be able to talk with our devices," he wrote.



Smart couldn't agree more. "With technology, we want things that enable us to use as much of our brains as possible at one time," he adds. For example, with a single, spoken sentence, you could tell your personal agent you feel sick. It could reference your calendar or emails to determine when to make a doctor's appointment. And when you arrive, you might not even need to fill out forms. Your personal agent would have looked at your hospital records and healthcare information for you — and then later relayed the outcome of any tests taken during your visit.

While no company boasts such comprehensive abilities yet, many have started to implement similar technologies. Right now, Apple has Siri. Microsoft has Cortana. And in the summer of 2014, a program named "Eugene Goostman," imitating a Ukrainian teen, passed the Turing Test (with some healthy scepticism). Smart, however, places great emphasis on an earlier cognitive machine: IBM's Watson, which the company claims "literally gets smarter." Watson's performance on Jeopardy against champion Ken Jennings convinced many sceptics of the emergence and optimization of CI.

Vocal technologies like Siri, Cortana, and Watson already rely on semantic maps, tools that represent relationships in data, especially language. And companies constantly improve them. For example, a late 2013 Google update brought pronouns to the table — and Smart's wife, for one, quickly noticed a difference. Walking in downtown Mountain View, his wife pulled out her phone, and as a test, asked Google, "Who is the President of the United States?" Naturally, her phone responded: "Barack Obama." Next, Smart's wife inquired: "Who is his wife?" Phone: "Michelle Obama." Smart's wife: "Where was she born?" Phone: "Chicago, Illinois." Not only did Smart's wife engage in conversation with her phone, it understood words like "he" and "she" — pronouns that refer to an antecedent earlier in the conversation. "Now, you don't have to specify every little detail," Smart explains. "Because the computer has some memory of previous exchanges and uses that as context."

Once we create "decent maps of human emotion," Smart adds, digital twins will even have faces to help them communicate. They will smile or furrow their brows to show whether they understand or not. "But the next step is something I call a 'valuecosm,'" Smart explains.
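As a toy illustration of the context memory Smart describes, the sketch below remembers the last entity mentioned so that a later "his" or "she" can be resolved. The hard-coded facts and matching rules are invented for the example; real assistants rely on far richer semantic maps.

# Toy context memory: remember the last entity mentioned so later pronouns
# ("his", "she") can be resolved. Facts and rules are hypothetical examples.

KNOWLEDGE = {
    ("Barack Obama", "wife"): "Michelle Obama",
    ("Michelle Obama", "born"): "Chicago, Illinois",
}

class TinyAssistant:
    def __init__(self):
        self.last_entity = None  # the antecedent that later pronouns refer to

    def ask(self, question: str) -> str:
        q = question.lower()
        if "president of the united states" in q:
            self.last_entity = "Barack Obama"
        # A pronoun means: keep using the remembered entity as context.
        if "wife" in q:
            self.last_entity = KNOWLEDGE[(self.last_entity, "wife")]
            return self.last_entity
        if "born" in q:
            return KNOWLEDGE[(self.last_entity, "born")]
        return self.last_entity or "I don't know."

bot = TinyAssistant()
print(bot.ask("Who is the President of the United States?"))  # Barack Obama
print(bot.ask("Who is his wife?"))                            # Michelle Obama
print(bot.ask("Where was she born?"))                         # Chicago, Illinois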

The ‘Valuecosm’

A valuecosm doesn't just, for example, analyse all your emails and formulate a record of your interests and values. It allows a personal agent to interact in your stead based on this information.


“You’re reaching for a can of tuna at a grocery store in 2030,” Smart envisions. “And your bracelet gives a green arrow to move your hand a few inches to the left, from Bumble Bee to Chicken of the Sea or whatever.”

Your digital twin can help you choose products in line with your values

You’d previously told your personal agent to watch for foods with high mercury levels or companies that over-fish the oceans. So this wearable piece of technology, imprinted with a digital version of your values, informed you which product to choose based on that.
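A hypothetical sketch of that check might look something like this: the agent simply scores each item on the shelf against the values you have told it to care about. The product data and thresholds below are invented for illustration.

# Hypothetical "valuecosm" check: does a product fit the user's stated values?
# All product data and thresholds are invented for illustration.
from dataclasses import dataclass

@dataclass
class Product:
    name: str
    mercury_ppm: float         # measured mercury level
    sustainable_fishery: bool  # whether the brand avoids over-fishing

USER_VALUES = {
    "max_mercury_ppm": 0.3,       # the user's tolerance for mercury
    "require_sustainable": True,  # the user cares about over-fishing
}

def agent_recommends(product: Product, values: dict) -> bool:
    """Return True if the product is consistent with the user's stated values."""
    if product.mercury_ppm > values["max_mercury_ppm"]:
        return False
    if values["require_sustainable"] and not product.sustainable_fishery:
        return False
    return True

shelf = [
    Product("Brand A tuna", mercury_ppm=0.45, sustainable_fishery=False),
    Product("Brand B tuna", mercury_ppm=0.12, sustainable_fishery=True),
]

# The wearable would highlight whichever cans pass the check.
for p in shelf:
    print(p.name, "->", "green arrow" if agent_recommends(p, USER_VALUES) else "skip")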

"And then, back in your car, your digital twin directs you to the gas station that's most in line with your environmental values," Smart adds. A valuecosm not only uses information in a human way, it's flexible, too. You can review your settings and change them manually. "You'll be having a conversation with your [personal] agent, and you say, 'I want more of this or this plus something else,'" Smart explains. "You know, I care more about social justice so make that area bigger." To make this technology the most usable and effective though, your digital twin will have to pull your information from various places, with your permission — not push its functions onto you. "People who have started using Google alerts, they have moved themselves toward a more pull-based view of the internet," Smart says.

In truth, the concept started as a way to improve advertising. For example, internet cookies monitor your online activity, allowing companies to match their advertisements to your interests. But instead of a company "pushing" their products or ideas onto you and trying to create demand, with pull-based marketing, you give permission for access to your information, and the advertising follows. "Instead of a filter, it's more like a magnet," Smart explains. That idea, however, could lead to even less online privacy.

The Future Of Privacy

The uncertain status of online privacy already bothers the general public. People criticise companies like Google and Amazon that only pull their information from what's available. But with digital twins, we'll have to give full permission for companies to access our online identities to optimise our use of the technology.


“You know, I’d like to have control of my healthcare or financial information in my own little internet locker,” Smart admits. “But that kind of thinking is first generation. You can’t accomplish much by having control of your own data.” Big-name companies using algorithms and predictive analytics can probably best host our personal agents. As long as people feel they have strong control over the technology, privacy will come secondary, in Smart’s opinion.

Talking to your digital twin could one day be like looking in the mirror.

“People who are thinking that you can control your own identity aren’t thinking about the problem right,” he says. “The future of personal control isn’t control of data. The future that we care about is control of an algorithmic interface of your identity.”

For comparison, Smart mentions domestication. Humanity didn’t engineer the brains of cats and dogs. We simply chose the ones more amenable to us and bred them. “We’ll do the same to our advanced AIs, whose brains we won’t be designing, but rather teaching, like a small child,” Smart explains. And as Smart predicts, all these technologies, required to make fully functional personal agents possible, are only about five years away.



The Dawn of the Age of Artificial Intelligence

The advances we've seen in the past few years—cars that drive themselves, useful humanoid robots, speech recognition and synthesis systems, 3D printers, Jeopardy!-champion computers—are not the crowning achievements of the computer era. They're the warm-up acts. As we move deeper into the second machine age we'll see more and more such wonders, and they'll become more and more impressive. How can we be so sure? Because the exponential, digital, and recombinant powers of the second machine age have made it possible for humanity to create two of the most important one-time events in our history: the emergence of real, useful artificial intelligence (AI) and the connection of most of the people on the planet via a common digital network. Either of these advances alone would fundamentally change our growth prospects. When combined, they're more important than anything since the Industrial Revolution, which forever transformed how physical work was done.

Thinking Machines, Available now

Digital machines have escaped their narrow confines and started to demonstrate broad abilities in pattern recognition, complex communication, and other domains that used to be exclusively human. We've recently seen great progress in natural language processing, machine learning (the ability of a computer to automatically refine its methods and improve its results as it gets more data), computer vision, simultaneous localization and mapping, and many other areas. We're going to see artificial intelligence do more and more, and as this happens costs will go down, outcomes will improve, and our lives will get better. Soon countless pieces of AI will be working on our behalf, often in the background. They'll help us in areas ranging from trivial to substantive to life-changing.


Trivial uses of AI include recognizing our friends' faces in photos and recommending products. More substantive ones include automatically driving cars on the road, guiding robots in warehouses, and better matching jobs and job seekers. But these remarkable advances pale against the life-changing potential of artificial intelligence. To take just one recent example, innovators at the Israeli company OrCam have combined a small but powerful computer, digital sensors, and excellent algorithms to give key aspects of sight to the visually impaired (a population numbering more than twenty million in the United States alone). A user of the OrCam system, which was introduced in 2013, clips onto her glasses a combination of a tiny digital camera and speaker that works by conducting sound waves through the bones of the head. If she points her finger at a source of text such as a billboard, package of food, or newspaper article, the computer immediately analyzes the images the camera sends to it, then reads the text to her via the speaker. Reading text 'in the wild'—in a variety of fonts, sizes, surfaces, and lighting conditions—has historically been yet another area where humans outpaced even the most advanced hardware and software. OrCam and similar innovations show that this is no longer the case, and that here again technology is racing ahead. As it does, it will help millions of people lead fuller lives. The OrCam costs about $2,500—the price of a good hearing aid—and is certain to become cheaper over time. Digital technologies are also restoring hearing to the deaf via cochlear implants and will probably bring sight back to the fully blind; the FDA recently approved a first-generation retinal implant. AI's benefits extend even to quadriplegics, since wheelchairs can now be controlled by thoughts. Considered objectively, these advances are something close to miracles—and they're still in their infancy.
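The OrCam description above is essentially a camera-to-OCR-to-speech pipeline. Here is a rough, assumed sketch of that idea using off-the-shelf open-source pieces (the Tesseract OCR engine via pytesseract, and the pyttsx3 text-to-speech library). It is not OrCam's software, just an illustration of the flow.

# Assumed sketch of a camera -> OCR -> speech pipeline, loosely inspired by the
# OrCam description above. Requires Pillow, pytesseract (plus Tesseract) and pyttsx3.
from PIL import Image
import pytesseract   # wrapper around the open-source Tesseract OCR engine
import pyttsx3       # offline text-to-speech

def read_text_aloud(image_path: str) -> str:
    frame = Image.open(image_path)             # e.g. a photo of a cereal box
    text = pytesseract.image_to_string(frame)  # "reading text in the wild"
    speaker = pyttsx3.init()
    speaker.say(text)
    speaker.runAndWait()                       # speak the recognized text aloud
    return text

if __name__ == "__main__":
    print(read_text_aloud("billboard.jpg"))    # hypothetical example image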

Billions of Innovators, Coming Soon In addition to powerful and useful AI, the other recent development that promises to further accelerate the second machine age is the digital interconnection of the planet’s people. There is no better resource for improving the world and bettering the state of humanity than the world’s humans—all 7.1 billion of us. Our good ideas and innovations will address the challenges that arise, improve the quality of our lives, allow us to live more lightly on the planet, and help us take better care of one another. It is a remarkable and unmistakable fact that, with the exception of climate change, virtually all environmental, social, and individual indicators of health have improved over time, even as human population has increased. This improvement is not a lucky coincidence; it is cause and effect. Things have gotten better because there are more people, who in total have more good ideas that improve our overall lot. The economist Julian Simon was one of the first to make this optimistic argument, and he advanced it repeatedly and forcefully throughout his career. He wrote, “It is your mind that matters economically, as much or more than your mouth or hands. In the long run, the most important economic effect of population size and growth is the contribution of additional people to our stock of useful knowledge. And this contribution is large enough in the long run to overcome all the costs of population growth.”


We do have one quibble with Simon, however. He wrote that, “The main fuel to speed the world’s progress is our stock of knowledge, and the brake is our lack of imagination.” We agree about the fuel but disagree about the brake. The main impediment to progress has been that, until quite recently, a sizable portion of the world’s people had no effective way to access the world’s stock of knowledge or to add to it. In the industrialized West we have long been accustomed to having libraries, telephones, and computers at our disposal, but these have been unimaginable luxuries to the people of the developing world. That situation is rapidly changing. In 2000, for example, there were approximately seven hundred million mobile phone subscriptions in the world, fewer than 30 percent of which were in developing countries. By 2012 there were more than six billion subscriptions, over 75 percent of which were in the developing world. The World Bank estimates that three-quarters of the people on the planet now have access to a mobile phone, and that in some countries mobile telephony is more widespread than electricity or clean water. The first mobile phones bought and sold in the developing world were capable of little more than voice calls and text messages, yet even these simple devices could make a significant difference. Between 1997 and 2001 the economist Robert Jensen studied a set of coastal villages in Kerala, India, where fishing was the main industry. Jensen gathered data both before and after mobile phone service was introduced, and the changes he documented are remarkable. Fish prices stabilized immediately after phones were introduced, and even though these prices dropped on average, fishermen’s profits actually increased because they were able to eliminate the waste that occurred when they took their fish to markets that already had enough supply for the day. The overall economic well-being of both buyers and sellers improved, and Jensen was able to tie these gains directly to the phones themselves. Now, of course, even the most basic phones sold in the developing world are more powerful than the ones used by Kerala’s fishermen over a decade ago. And cheap mobile devices keep improving. Technology analysis firm IDC forecasts that smartphones will outsell feature phones in the near future, and will make up about two-thirds of all sales by 2017. This shift is due to continued simultaneous performance improvements and cost declines in both mobile phone devices and networks, and it has an important consequence: it will bring billions of people into the community of potential knowledge creators, problem solvers, and innovators.

'Infinite Computing' and Beyond Today, people with connected smartphones or tablets anywhere in the world have access to many (if not most) of the same communication resources and information that we do while sitting in our offices at MIT. They can search the Web and browse Wikipedia. They can follow online courses, some of them taught by the best in the academic world. They can share their insights on blogs, Facebook, Twitter, and many other services, most of which are free. They can even conduct sophisticated data analyses using cloud resources such as Amazon Web Services and R, an open source application for statistics. In short,


they can be full contributors in the work of innovation and knowledge creation, taking advantage of what Autodesk CEO Carl Bass calls “infinite computing.” Until quite recently rapid communication, information acquisition, and knowledge sharing, especially over long distances, were essentially limited to the planet’s elite. Now they’re much more democratic and egalitarian, and getting more so all the time. The journalist A. J. Liebling famously remarked that, “Freedom of the press is limited to those who own one.” It is no exaggeration to say that billions of people will soon have a printing press, reference library, school, and computer all at their fingertips. We believe that this development will boost human progress. We can’t predict exactly what new insights, products, and solutions will arrive in the coming years, but we are fully confident that they’ll be impressive. The second machine age will be characterized by countless instances of machine intelligence and billions of interconnected brains working together to better understand and improve our world. It will make a mockery of all that came before.



3-D printing takes shape

3-D printing, or additive manufacturing, has come a long way from its roots in the production of simple plastic prototypes. Today, 3-D printers can not only handle materials ranging from titanium to human cartilage but also produce fully functional components, including complex mechanisms, batteries, transistors, and LEDs. The capabilities of 3-D printing hardware are evolving rapidly, too. Printers can now build larger components and achieve greater precision and finer resolution at higher speeds and lower costs. Together, these advances have brought the technology to a tipping point—it appears ready to emerge from its niche status and become a viable alternative to conventional manufacturing processes in an increasing number of applications. Should this happen, the technology would transform manufacturing flexibility—for example, by allowing companies to slash development time, eliminate tooling costs, and simplify production runs—while making it possible to create complex shapes and structures that weren’t feasible before. Moreover, additive manufacturing would help companies improve the productivity of materials by eliminating the waste that accrues in traditional (subtractive) manufacturing and would thus spur the formation of a beneficial circular economy (for more, see “Remaking the industrial economy”). The economic implications of 3-D printing are significant: McKinsey Global Institute research suggests that it could have an impact of up to $550 billion a year by 2025. The advantages of 3-D printing over other manufacturing technologies could lead to profound changes in the way many things are designed, developed, produced, and


supported. Here are five 3-D printing disruptions that senior executives should begin preparing for.

Accelerated product-development cycles Reducing time in product development was a key benefit of the first 3-D printing machines, which were designed to speed the creation of product prototypes (and in some cases helped reduce turnaround times to a matter of hours, from days or weeks). Now many industries are poised for a second wave of acceleration as the line between additive and conventional manufacturing blurs. For example, additive manufacturing is already being used to get prototypes into the hands of customers faster, for quicker and more detailed feedback. (This is happening thanks to advances in printer resolution, higher-definition coloration, and the broader use of materials, such as elastomers, that help customers envision the final product.) The ability to make prototypes without tooling lets companies quickly test multiple configurations to determine customer preferences, thus reducing product-launch risk and time to market. Companies could even go into production using 3-D printed parts and start selling products while the traditional production tools were still being manufactured or before the decision to produce them had been made. When companies did order those tools, they could use additive-manufacturing techniques to make them, saving even more time and money. We expect that the use of such techniques will contribute to significant reductions in product-development cycle times over the next decade. (For example, 3-D printing makes some aspects of day-to-day R&D work, such as producing simple lab apparatus, vastly more productive.) Over time, 3-D printing will begin to affect how companies think about R&D more broadly, given how the technology enhances the ability to crowdsource ideas through remote cooperation. For some companies, that crowdsourced brainpower might one day begin supplanting R&D activities, making its management a new priority.

New manufacturing strategies and footprints As of 2011, only about 25 percent of the additive-manufacturing market involved the direct manufacture of end products. With a 60 percent annual growth rate, however, that is the industry’s fastest-growing segment. As costs continue to fall and the capabilities of 3-D printers increase, the range of parts that can be economically manufactured using additive techniques will broaden dramatically. Boeing, for example, already uses printers to make some 200 part numbers for ten different types of aircraft, and medical-products companies are using them to create offerings such as hip replacements. Nonetheless, not every component will be a candidate for the technology and reap its benefits (cost reductions, performance improvements, or both). Companies should understand the characteristics that help determine which ones are. These include components with a high labor-cost element (such as time-consuming assembly and secondary machining processes), complex tooling requirements or relatively low volumes (and thus high tooling costs), or high obsolescence or scrap rates. Forward-looking manufacturers are already investigating ways of triaging their existing parts inventories to determine which hold the most potential.
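
One simple way to start that triage is to score each part against the characteristics listed above. The short Python sketch below is a purely hypothetical illustration; the part attributes, weights, and example values are assumptions made up for demonstration, not a published methodology.

    # Hypothetical triage heuristic: rank parts by 3-D printing potential.
    # Field names, weights, and example values are illustrative assumptions only.
    def printability_score(part):
        score = 0.0
        score += 2.0 if part["labor_cost_share"] > 0.4 else 0.0   # high labor-cost element
        score += 2.0 if part["tooling_cost"] > 50_000 else 0.0    # complex or expensive tooling
        score += 1.5 if part["annual_volume"] < 1_000 else 0.0    # relatively low volumes
        score += 1.0 if part["scrap_rate"] > 0.1 else 0.0         # high obsolescence or scrap
        return score

    parts = [
        {"id": "bracket-17", "labor_cost_share": 0.55, "tooling_cost": 80_000,
         "annual_volume": 400, "scrap_rate": 0.12},
        {"id": "housing-02", "labor_cost_share": 0.10, "tooling_cost": 5_000,
         "annual_volume": 250_000, "scrap_rate": 0.02},
    ]
    for part in sorted(parts, key=printability_score, reverse=True):
        print(part["id"], round(printability_score(part), 1))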


Additive-manufacturing techniques also have implications for manufacturing-footprint decisions. While there is still a meaningful labor component to 3-D printed parts, the fact that it is lower than that of conventionally manufactured ones might, for example, tip the balance toward production closer to end customers. Alternatively, companies could find that the fully digital nature of 3-D printing makes it possible to produce complex parts in remote countries with lower input costs for electricity and labor. A related area that executives should watch with interest is the development of the market for printing materials. The cost of future materials is uncertain, as today many printers use proprietary ones owned or licensed by the manufacturer of the printing equipment. Should this change and more universal standards develop—thus lowering prices—the implications for executives devising manufacturing strategies and making footprint decisions would become very significant very quickly.

Shifting sources of profit Additive-manufacturing technologies could alter the way companies add value to their products and services. The outsourcing of conventional manufacturing helped spur companies such as Nike to rely more on their design skills. Likewise, 3-D printing techniques could reduce the cost and complexity of other kinds of production and force companies to differentiate their products in other ways. These could include everything from making products more easily reparable (and thus longer lived) to creating personalized designs. Indeed, reducing the reliance on hard tooling (which facilitates the manufacture of thousands of identical items) creates an opportunity to offer customized or bespoke designs at lower cost—and to a far broader range of customers. The additive manufacture of individualized orthodontic braces is just one example of the potential of these technologies. As more such offerings become technically viable, companies will have to determine which are sufficiently appealing and commercially worthwhile. The combination of mass customization and new design possibilities will up the ante for many companies and could prove very disruptive to traditional players in some segments. In certain parts of the value chain, the application of additive manufacturing will be less visible to customers, although its impact may be just as profound. A key challenge in traditional aftermarket supply chains, for example, is managing appropriate inventories of spare parts, particularly for older, legacy products. The ability to manufacture replacement parts on demand using 3-D printers could transform the economics of aftermarket service and the structure of industries. Relatively small facilities with on-site additive manufacturing capabilities could replace large regional warehouses. The supply of service parts might even be outsourced: small fabricators (or fabs) located, for example, at airports, hospitals, or major manufacturing venues could make these parts for much of the equipment used on site, with data supplied directly by the manufacturers. Of course, retailers too could someday use fabs—for example, to let customers tailor products such as toys or building materials to suit their needs. That business model could represent a value-chain play for manufacturers if, for instance, they owned the machines, core designs, or both.



New capabilities Design is inherently linked to methods of fabrication. Architects can’t design houses without considering construction techniques, and engineers can’t design machines without considering the benefits and limitations of casting, forging, milling, turning, and welding. While there is a wealth of knowledge around design for manufacturing, much less is available on design for printing. Our conversations with executives at manufacturing companies suggest that many are aware of this gap and scrambling to catalog their design know-how. Getting the most out of additive-manufacturing techniques also involves technical challenges, which include setting environmental parameters to prevent shape distortion, optimizing the speed of printing, and adjusting the properties of novel materials. Indeed, tuning materials is quite a challenge. While plastics are relatively straightforward to work with, metals are more difficult. Slurries and gels (for example, living tissue or the material for printed zinc–air batteries) are extremely difficult. The most successful players will understand these challenges. Some are already creating centers of excellence and hiring engineers with strong experience in additive manufacturing.

Disruptive competitors Many benefits of 3-D printing could cut the cost of market entry for new players: for example, the use of the technology to lower tooling costs makes it cheaper to begin manufacturing, even at low volumes, or to serve niche segments. The direct manufacturing of end products greatly simplifies and reduces the work of a designer who would only have to take products from the computer screen to commercial viability. New businesses are already popping up to offer highly customized or collaboratively designed products. Others act as platforms for the manufacture and distribution of products designed and sold online by their customers. These businesses are gaining insights into consumer tastes and building relationships that established companies could struggle to match. Initially, these new competitors will be niche players, operating where consumers are willing to pay a premium for a bespoke design, complex geometry, or rapid delivery. Over the longer term, however, they could transform industries in unexpected ways, moving the source of competitive advantage away from the ability to manufacture in high volumes at low cost and toward other areas of the value chain, such as design or even the ownership of customer networks. Moreover, the availability of open-source designs for 3-D printed firearms shows how such technologies have the potential to create ethical and regulatory dilemmas and to disrupt industries.



Will Gaming Save Education, or Just Waste Time? By Dian Schaffhauser

Today's sophisticated digital games are engaging students and conveying hard-to-teach concepts like failure and perspective. So why aren't more classrooms playing along?

If the use of technology in education is about meeting students where they are, it seems like gaming would be a good place to start. After all, as far back as 2008, the Pew Internet & American Life Project reported that 97 percent of kids ages 12 to 17 were playing some kind of digital game every week; about half played daily.

And why not? When Neil Postman wrote his classic, Amusing Ourselves to Death, about the shift from a typographically focused society to one that was ruled by television, his title could just as easily have been foretelling the increasing use of gamelike activities in all aspects of life. Consumers spent about $21 billion on the game industry last year. Half of all American households have dedicated game consoles; many have two. We don't fly without getting our miles. We can't shop without handing over our rewards card. We seem to be a species well-suited for seeking "pleasure and reward," notes Janna Quitney Anderson, director of Pew and Elon University's Imagining the Internet Center. Gaming and its trappings play right into our appetites.

Proponents say gaming provides a compelling way to engage students and make educational efforts more effective. Others believe it simply provides a merry diversion from what should truly be happening in the classroom. Where do the golden tokens reside? Let's click for a roll of the die and find out.


"Failing Up" Games and playing have been part of classrooms "for a long time," declares Katie Salen, well-known game designer, DePaul University professor, and one of the masterminds behind Quest to Learn, a public school in New York designed around game playing. "Play is the way that human beings learn about the world‌. That's how we discover how things work."

Besides providing an environment in which to "learn by doing," games offer several other educational benefits, according to Salen. For one, games structure problem-solving in a way that helps the player to "fail up," as Salen puts it. "Every game designer expects every player to be successful in their game," she notes. The best games are designed to make the problem to be solved both hard and fun so that students will "want to continue to persist on that problem."

Sometimes, the learner will have to try hundreds of times to find success. "You know you're getting better at that particular skill every time you try, and you're learning something about how to solve that particular problem," Salen says. The failure is productive. Rather than hitting some kind of wall and just stopping--which is "what kids tend to experience in the classroom"--failure becomes "a fantastic thing because it's about iteration and discovery. When I try it again, I'm going to apply that particular piece of learning."

Data Rich One of Salen's arguments for the use of games in education should sit well with most policymakers: Games are rich with data. "They're filled with information for players around how they're doing, where they need to go, how they need to get better," she notes. That data can be used by teachers as well as students, she adds. "Games open up assessment, so it's really kid-facing, and that can be incredibly powerful." Kristen DiCerbo, a senior research scientist at Pearson, agrees with Salen and offers an example from the world of SimCity. A collaboration between Electronic Arts and GlassLab has produced an education-oriented update of the decades-old city-building simulation game built specifically for the classroom. Called SimCityEDU, the game challenges learners to solve problems such as combating pollution, and a multitude of data can be captured to show what students do first, what actions they take, and where they linger. DiCerbo says, "By starting to look at how long people spend looking at information--all those little things that we could never gather from a paper and pencil path--we hope to be able to get a finer and better measurement of what kids can do."
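
As a rough picture of how such play data could be turned into a measurement, the sketch below totals hypothetical event logs into time spent on information screens per student. The event schema and field names are invented for illustration and are not SimCityEDU's actual telemetry.

    # Aggregate hypothetical game-telemetry events into time-on-information per student.
    # The event schema ("open_info" / "close_info" with timestamps in seconds) is assumed.
    from collections import defaultdict

    events = [
        {"student": "s1", "action": "open_info",  "t": 10.0},
        {"student": "s1", "action": "close_info", "t": 42.5},
        {"student": "s2", "action": "open_info",  "t": 5.0},
        {"student": "s2", "action": "close_info", "t": 9.0},
    ]

    time_on_info = defaultdict(float)
    opened = {}
    for event in sorted(events, key=lambda e: e["t"]):
        if event["action"] == "open_info":
            opened[event["student"]] = event["t"]
        elif event["action"] == "close_info" and event["student"] in opened:
            time_on_info[event["student"]] += event["t"] - opened.pop(event["student"])

    print(dict(time_on_info))   # e.g. {'s1': 32.5, 's2': 4.0}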



The Sweet Spot for Gaming For Jessica Millstone, education fellow at the Joan Ganz Cooney Center and adjunct professor at Bank Street College of Education, the sweet spot for gaming in education is helping students grasp concepts that are tough to learn out of a book.

She cites the example of Mission US, an online PBS game that immerses players in "missions" about US history. Backdrops include colonial America and the Underground Railroad, and players become characters who might have lived during those times. Millstone says, "These are very abstract for a third- or fourth-grader to consider or to think about what that part of the country looked like 400 years ago…and even play out some scenarios that are not historically accurate but that involve you making decisions and problem-solving and thinking as somebody would during that time period."

Yes, Millstone acknowledges, those same lessons could be emulated in real life, through dress-up "pioneer day"-type activities. But the online versions provide a number of benefits, such as "a huge amount of support for the teachers. They have all the curriculum materials; all the structure is there." On top of that, the kids like it. "They are so fluent in using these kinds of online tools," she adds. Plus, gaming generates learning data "that is not just anecdotal--not just what the teacher is observing, but actually providing some real information about what the kids are learning or not learning as they go through this content material. That's something [teachers] can feed back into their teaching."

Adapting to the Classroom While game playing is knit into the fabric of schools such as Quest to Learn and GameDesk Institute's Playmaker School, based in Los Angeles, most public school classrooms' digital gaming hasn't gotten much past the experimental stage. Pew's Quitney Anderson quotes futurist John Smart, a respondent to a Pew/Elon survey on gaming, who told researchers, "We simply don't have the artificial intelligence necessary to build really good versions of this yet, and educational software remains pitifully poor at creating games that improve rather than distract from learning."

One component that current games lack is "adaptivity," says DiCerbo. "Right now, most games respond to player behavior in basic ways: Once you get five of these problems right, you move to the next level, or if you pursue this path, you'll be taking this action; if you follow that path, you'll take that action." Those are fairly mechanistic responses.

The real goal is understanding not just what the behavior was but also the player's "underlying" skill. "That may be based on the combination of lots of different actions we've seen them do," she notes. "Once we know that skill, being able to really get them to the content that's going to hit that magic spot where they're able to learn and they're


challenged, but they're not getting frustrated." In learning theory, she explains, that's called the "zone of proximal development," which differentiates what a student is able to do independently from what he or she can do with adult help. Another challenge is that most game designers aren't focusing on content areas. "When they're making a commercial game that is adaptive to skills, they're thinking more about how good of a shooter you are," says DiCerbo. "In the [education] game, we want to know: How good are you at understanding that things can have two different causes or understanding how to gather evidence? It's taking different kinds of skills and adapting to those and making the game content adapt as well--as opposed to the game-play skills."
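
One minimal way to picture that kind of adaptivity is an Elo-style estimate of a hidden skill that then selects content just above the current estimate, which is the "zone of proximal development" idea in code form. The sketch below is illustrative only; the update rule, parameters, and difficulty values are assumptions, not how any particular game or Pearson product works.

    # Elo-style sketch: estimate a hidden skill from many observed attempts, then pick
    # the next item slightly above the estimate. K and the target offset are assumptions.
    import math

    def expected_success(skill, difficulty):
        return 1.0 / (1.0 + math.exp(difficulty - skill))

    def update_skill(skill, difficulty, solved, k=0.3):
        return skill + k * ((1.0 if solved else 0.0) - expected_success(skill, difficulty))

    def next_item(skill, difficulties, target_offset=0.5):
        # choose the item whose difficulty sits just above the current skill estimate
        return min(difficulties, key=lambda d: abs(d - (skill + target_offset)))

    skill = 0.0
    difficulties = [-1.0, 0.0, 0.5, 1.0, 2.0]
    for difficulty, solved in [(0.0, True), (0.5, True), (1.0, False), (0.5, True)]:
        skill = update_skill(skill, difficulty, solved)
    print(round(skill, 2), "-> next difficulty:", next_item(skill, difficulties))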



The Gaming World vs. the Real World Yet another lingering challenge to wide adoption of gaming in the classroom is the jarring difference between the traditionally structured schools and classrooms and the playing itself. When learning games are relegated to an hour a week in the computer lab, for example, students don't have time to really consolidate what they've learned, argues Ron Smith, an art and media teacher at Helen Bernstein High School in the Los Angeles Unified School District.

He does believe that games have a place in learning. "But you can't do it piecemeal," he insists. "You have to be all in or all out." Given a situation where every student had his or her own device, "I'd be the first in line to try gaming as a platform," he says, because a 1-to-1 environment allows students to "take what we did and keep going on it after they walked out of the classroom."

Lucas Gillispie, a gamer and instructional technology coordinator at Pender County Schools (NC), would prefer that all educational activity take place in the world concocted in the game. If students need to write something, for example, they shouldn't have to get out of the game's simulated environment and get into a separate application for word processing. "They'd take out their scroll and quill and write," he says. Then what they've composed "would come out of the system somehow and back to me as a teacher so I could assess it and provide feedback. I would never have to pull the kid out of that space to give me that content. That's my ideal world."

The problem as he sees it is that if a student is having a "really compelling experience" in the game and then has to come out of that experience and log into a system to write about it or do something else that "smells like school," it sets up a dividing line in the student's mind "that play is fun and school and learning are boring."

Gillispie, who resides in the real world, is doing everything he can to help teachers introduce game playing into what they do. Besides his day-job duties, he runs edurealms, a site dedicated to integrating gaming into education. There, he and associate Craig Lawson offer a free download that provides curriculum and guidance for teachers who want to use World of Warcraft or its subscription-free competitor Guild Wars 2 in the classroom. He also advises those who would want to implement the program through the WoWinSchool Project.

Right now, though, introducing online gaming into schools is about as intense as staging a raid on fel orcs in Hellfire Citadel, because it requires the commitment of individual teachers who have a passion for gaming. In Lawson's own district, out of a staff of 750 or


800 people, Gillispie can count on two hands the number of teachers who are exploring gaming. In the wider world, his project has drawn active participation by 11 schools.

The ones who do sign up--at Pender County and other places--are teachers "who have had their own experiences in that space, and they realize the value of it and what it's meant to them personally," he says. "So helping them make the connection between that and classroom instruction is really simple. It's not a hard sell." But it is a tough hurdle to convince teachers who are not gamers themselves to "devote the extra time and energy and resources above and beyond what they're normally called to do on a day-to-day basis in the classroom."

A Gaming Generation Grows Up That hurdle may just dissipate as the latest generation of teachers moves into the classroom. Why? Familiarity. "They're all gamers," says Millstone. "They also know that their students are all gamers, whether it's casual games you might play on your iPhone or tablet, or video games you'd use a console to play, or multiplayer games. People have a lot of fluency in the language of games."

"I think we're only at the very beginning of how games are going to be used," Millstone observes. "We're going to see more games built into standardized tests, for example, and more use of data that comes out of games in sophisticated ways to improve student learning in all kinds of ways. And we'll actually let students get some of that data too, so they can see how they're doing and can get feedback and start to craft their own learning trajectories in the classroom."

In the meantime, there's a game out there for everybody, declares DiCerbo. "It's not one game for everybody; different kids will like different games. But eventually we could find games that engage and teach kids throughout their K-12--and into higher ed--experiences."

Minecraft in Schools Anyone who knows a middle-school-age child likely also knows about Minecraft, an interactive building construction environment that takes the idea of "immersive" to serious depths. Kids start out building structures that protect them from monsters, but end up spending hours, weeks, or months creating imaginative objects and environments. Think Legos and Erector Sets meet Madeleine L'Engle and J.K. Rowling.

Not surprisingly, Minecraft has been discovered by educators. Go to YouTube and search for "Minecraft in schools" and you'll get more than 800,000 results. Minecraft in education


has its own wiki. TeacherGaming created MinecraftEdu.com, which provides a suite of tools to use this immersive environment in the classroom. Custom versions of the game include features that support classroom use, such as server software that simplifies the process of getting multiple players up and running; functions to integrate curriculum; and a free library of worlds, levels, and activities to teach multiple subjects.

The Research on Gaming Research is showing that playing games can improve student achievement. Both Pearson's Kristen DiCerbo and the Joan Ganz Cooney Center's Jessica Millstone point to a meta-analysis study done by SRI with Gates Foundation funding. "Digital Games for Learning: A Systematic Review and Meta-Analysis" looked at a compilation of all studies in the literature published between 2000 and 2012, sorting out the effect across all of them.

Preliminary results of the study found that "when digital games were compared to other instruction conditions without digital games there was a moderate to strong effect in favor of digital games. Students at the median in the control group (no game) could have been raised 12 percent in cognitive learning outcomes if they had received a digital game."

Research led by DiCerbo herself with 11 pairs of students playing a simulation game found some evidence that game play encourages players to use particular cognitive processes, access prior knowledge, and apply skills as intended.

A research project out of Florida State University in Tallahassee examined the use of "stealth assessment" (in which assessment was built into the game itself) to measure the impact of gaming on learning physics, and found significant gains between tests given before and after students began playing.

Gaming vs. Gamification Any discussion of gaming in education has to address game playing versus gamification--the latter a term that has picked up a lot of attention the last few years. In the educational realm, "gaming" refers to digital games that have learning objectives. Gamification involves introducing the mechanics of games--the use of points, rewards, and leaderboards--into activities that don't have to be digital and don't necessarily have anything to do with playing a game; the "game factors" are intended to inspire engagement like real games do.

But what are the benefits of gamification without game playing itself? According to Lucas Gillispie, a gamer and instructional technology coordinator at Pender County Schools


(NC), "We're really missing out on some of the elements that make real games compelling when we just boil it down to the leaderboard and achievements levels and badges. If you don't have the fun factor in there, I'm not sure that it really qualifies as a game." Gamification, he proposes, is "really just getting back to the old gold star on the chart. 'If you do this, you get a gold star.'" What gamification lacks is novelty: "You need unexpected things, you need hard, difficult choices--all the elements that make people want to play our most compelling games out in the commercial world."

Kristen DiCerbo, a senior research scientist at Pearson, agrees with Gillispie. Gamification advocates are "really missing some of the point," she insists. "When we talk about gamification, people like it because they think it's like a game. But the big things that make games great are the narrative, telling a story that sucks people in, the continuous feedback, giving kids a challenge at just the right level, where it's tough, but not frustrating. When people say 'gamification,' they aren't talking about those things."

DiCerbo--who works with simulations, games, and large-scale assessments--suggests that game-rewards structures are really "behaviorism in a disguised form." She says, "We've been doing [game rewards] for years," referring to the use of tokens, points, and other artifacts that make up the backbone of the gamification movement. For some things--like very rote tasks--and for students who aren't motivated, "yes, that does increase kids' general interest in the short term," she says. The problem is that for learners who are already intrinsically interested in something, "Sometimes giving them rewards makes them think, 'Oh, I'm just doing this to get the reward.'" The result: a decrease in motivation. In the long term, she believes that games are more likely than gamification to succeed in the classroom.



The Anatomy of a Large-Scale Hypertextual Web Search Engine Abstract In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/ To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want. Keywords: World Wide Web, Search Engines, Information Retrieval, PageRank, Google

1. Introduction (Note: There are two versions of this paper -- a longer full version and a shorter printed version. The full version is available on the web and the conference CD-ROM.) The web creates new challenges for information retrieval. The amount of information on the web is growing rapidly, as well as the number of new users inexperienced in the art of web research. People are likely to surf the web using its link graph, often starting with high quality human maintained indices such as Yahoo! or with search engines. Human maintained lists cover popular topics effectively but are subjective, expensive to build and maintain, slow to improve, and cannot cover all esoteric topics. Automated search engines that rely on keyword matching usually return too many low quality matches. To make matters worse, some advertisers attempt to gain people's attention by taking measures meant to mislead automated search engines. We have built a large-scale search engine which addresses many of the problems of existing systems. It makes especially heavy use of the additional structure present in hypertext to provide much higher quality search results. We chose our system name, Google, because it is a common spelling of googol, or 10^100, and fits well with our goal of building very large-scale search engines.

1.1 Web Search Engines -- Scaling Up: 1994 - 2000 Search engine technology has had to scale dramatically to keep up with the growth of the web. In 1994, one of the first web search engines, the World Wide Web Worm (WWWW) [McBryan 94] had an index of 110,000 web pages and web accessible documents. As of November, 1997, the top search engines claim to index from 2 million (WebCrawler) to 100 million web documents (from Search Engine Watch). It is foreseeable that by the year 2000, a


comprehensive index of the Web will contain over a billion documents. At the same time, the number of queries search engines handle has grown incredibly too. In March and April 1994, the World Wide Web Worm received an average of about 1500 queries per day. In November 1997, Altavista claimed it handled roughly 20 million queries per day. With the increasing number of users on the web, and automated systems which query search engines, it is likely that top search engines will handle hundreds of millions of queries per day by the year 2000. The goal of our system is to address many of the problems, both in quality and scalability, introduced by scaling search engine technology to such extraordinary numbers.

1.2. Google: Scaling with the Web Creating a search engine which scales even to today's web presents many challenges. Fast crawling technology is needed to gather the web documents and keep them up to date. Storage space must be used efficiently to store indices and, optionally, the documents themselves. The indexing system must process hundreds of gigabytes of data efficiently. Queries must be handled quickly, at a rate of hundreds to thousands per second.

These tasks are becoming increasingly difficult as the Web grows. However, hardware performance and cost have improved dramatically to partially offset the difficulty. There are, however, several notable exceptions to this progress such as disk seek time and operating system robustness. In designing Google, we have considered both the rate of growth of the Web and technological changes. Google is designed to scale well to extremely large data sets. It makes efficient use of storage space to store the index. Its data structures are optimized for fast and efficient access (see section 4.2). Further, we expect that the cost to index and store text or HTML will eventually decline relative to the amount that will be available (see Appendix B). This will result in favorable scaling properties for centralized systems like Google.

1.3 Design Goals 1.3.1 Improved Search Quality Our main goal is to improve the quality of web search engines. In 1994, some people believed that a complete search index would make it possible to find anything easily. According to Best of the Web 1994 -- Navigators, "The best navigation service should make it easy to find almost anything on the Web (once all the data is entered)." However, the Web of 1997 is quite different. Anyone who has used a search engine recently can readily testify that the completeness of the index is not the only factor in the quality of search results. "Junk results" often wash out any results that a user is interested in. In fact, as of November 1997, only one of the top four commercial search engines finds itself (returns its own search page in response to its name in the top ten results). One of the main causes of this problem is that the number of documents in the indices has been increasing by many orders of magnitude, but the user's ability to look at documents has not. People are still only willing to look at the first few tens of results. Because of this, as the collection size grows, we need tools that have very high precision (number of relevant documents returned, say in the top tens of results). Indeed, we want our notion of "relevant" to only include the very best documents since there may be tens of thousands of slightly relevant documents. This very high precision is important even at the expense of recall (the total number of relevant documents the system is able to return). There is quite a bit of recent optimism that the use of more hypertextual information can help improve search and other applications [Marchiori 97] [Spertus 97] [Weiss 96] [Kleinberg 98]. In particular, link structure [Page 98] and link text provide a lot of information for making relevance judgments and quality filtering. Google makes use of both link structure and anchor text (see Sections 2.1 and 2.2).


1.3.2 Academic Search Engine Research Aside from tremendous growth, the Web has also become increasingly commercial over time. In 1993, 1.5% of web servers were on .com domains. This number grew to over 60% in 1997. At the same time, search engines have migrated from the academic domain to the commercial. Up until now most search engine development has gone on at companies with little publication of technical details. This causes search engine technology to remain largely a black art and to be advertising oriented (see Appendix A). With Google, we have a strong goal to push more development and understanding into the academic realm.

Another important design goal was to build systems that reasonable numbers of people can actually use. Usage was important to us because we think some of the most interesting research will involve leveraging the vast amount of usage data that is available from modern web systems. For example, there are many tens of millions of searches performed every day. However, it is very difficult to get this data, mainly because it is considered commercially valuable. Our final design goal was to build an architecture that can support novel research activities on large-scale web data. To support novel research uses, Google stores all of the actual documents it crawls in compressed form. One of our main goals in designing Google was to set up an environment where other researchers can come in quickly, process large chunks of the web, and produce interesting results that would have been very difficult to produce otherwise. In the short time the system has been up, there have already been several papers using databases generated by Google, and many others are underway. Another goal we have is to set up a Spacelab-like environment where researchers or even students can propose and do interesting experiments on our large-scale web data.

2. System Features The Google search engine has two important features that help it produce high precision results. First, it makes use of the link structure of the Web to calculate a quality ranking for each web page. This ranking is called PageRank and is described in detail in [Page 98]. Second, Google utilizes link text to improve search results.

2.1 PageRank: Bringing Order to the Web The citation (link) graph of the web is an important resource that has largely gone unused in existing web search engines. We have created maps containing as many as 518 million of these hyperlinks, a significant sample of the total. These maps allow rapid calculation of a web page's "PageRank", an objective measure of its citation importance that corresponds well with people's subjective idea of importance. Because of this correspondence, PageRank is an excellent way to prioritize the results of web keyword searches. For most popular subjects, a simple text matching search that is restricted to web page titles performs admirably when PageRank prioritizes the results (demo available at google.stanford.edu). For the type of full text searches in the main Google system, PageRank also helps a great deal.

2.1.1 Description of PageRank Calculation Academic citation literature has been applied to the web, largely by counting citations or backlinks to a given page. This gives some approximation of a page's importance or quality. PageRank extends this idea by not counting links from all pages equally, and by normalizing by the number of links on a page. PageRank is defined as follows:



We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)) Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages' PageRanks will be one. PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web. Also, a PageRank for 26 million web pages can be computed in a few hours on a medium size workstation. There are many other details which are beyond the scope of this paper.
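
The iterative calculation can be sketched in a few lines of Python. The tiny four-page link graph below is invented for illustration, and this toy version keeps everything in memory rather than using the link database described later in the paper.

    # Iterative PageRank as defined above: PR(A) = (1-d) + d * sum(PR(T)/C(T)).
    # The four-page link graph is a made-up example.
    def pagerank(links, d=0.85, iterations=50):
        pr = {page: 1.0 for page in links}            # start from uniform ranks
        for _ in range(iterations):
            new_pr = {}
            for page in links:
                incoming = [p for p in links if page in links[p]]
                new_pr[page] = (1 - d) + d * sum(pr[p] / len(links[p]) for p in incoming)
            pr = new_pr
        return pr

    links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, rank in sorted(pagerank(links).items()):
        print(page, round(rank, 3))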

2.1.2 Intuitive Justification PageRank can be thought of as a model of user behavior. We assume there is a "random surfer" who is given a web page at random and keeps clicking on links, never hitting "back" but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank. And, the d damping factor is the probability at each page the "random surfer" will get bored and request another random page. One important variation is to only add the damping factor d to a single page, or a group of pages. This allows for personalization and can make it nearly impossible to deliberately mislead the system in order to get a higher ranking. We have several other extensions to PageRank.

Another intuitive justification is that a page can have a high PageRank if there are many pages that point to it, or if there are some pages that point to it and have a high PageRank. Intuitively, pages that are well cited from many places around the web are worth looking at. Also, pages that have perhaps only one citation from something like the Yahoo! homepage are also generally worth looking at. If a page was not high quality, or was a broken link, it is quite likely that Yahoo's homepage would not link to it. PageRank handles both these cases and everything in between by recursively propagating weights through the link structure of the web.

2.2 Anchor Text The text of links is treated in a special way in our search engine. Most search engines associate the text of a link with the page that the link is on. In addition, we associate it with the page the link points to. This has several advantages. First, anchors often provide more accurate descriptions of web pages than the pages themselves. Second, anchors may exist for documents which cannot be indexed by a text-based search engine, such as images, programs, and databases. This makes it possible to return web pages which have not actually been crawled. Note that pages that have not been crawled can cause problems, since they are never checked for validity before being returned to the user. In this case, the search engine can even return a page that never actually existed, but had hyperlinks pointing to it. However, it is possible to sort the results, so that this particular problem rarely happens.
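
A minimal sketch of this idea in Python: each link's text is indexed under the docID of the page it points to, not just the page it appears on. The data layout here is an assumption for illustration, not the format of the actual anchors file.

    # Propagate anchor text to the target page: index each link's text under the
    # docID of the page it points TO. Data structures here are illustrative only.
    from collections import defaultdict

    # (source_doc, target_doc, anchor_text) triples, as a crawler might emit them
    anchors = [
        (1, 2, "cheap flights"),
        (3, 2, "best airfare search"),
        (1, 4, "company logo"),       # target 4 might be an image we never crawl
    ]

    anchor_index = defaultdict(list)  # target docID -> list of anchor words
    for source, target, text in anchors:
        anchor_index[target].extend(text.lower().split())

    # Document 2 can now be retrieved for "airfare" even if its own text never says it.
    print(anchor_index[2])
    print(anchor_index[4])            # an uncrawled image is still searchable by its anchors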

This idea of propagating anchor text to the page it refers to was implemented in the World Wide Web Worm [McBryan 94] especially because it helps search non-text information, and expands the search coverage with fewer downloaded documents. We use anchor propagation mostly because anchor text can help provide better quality results. Using anchor text efficiently is technically


difficult because of the large amounts of data which must be processed. In our current crawl of 24 million pages, we had over 259 million anchors which we indexed.

2.3 Other Features Aside from PageRank and the use of anchor text, Google has several other features. First, it has location information for all hits and so it makes extensive use of proximity in search. Second, Google keeps track of some visual presentation details such as font size of words. Words in a larger or bolder font are weighted higher than other words. Third, full raw HTML of pages is available in a repository.

3 Related Work Search research on the web has a short and concise history. The World Wide Web Worm (WWWW) [McBryan 94] was one of the first web search engines. It was subsequently followed by several other academic search engines, many of which are now public companies. Compared to the growth of the Web and the importance of search engines there are precious few documents about recent search engines [Pinkerton 94]. According to Michael Mauldin (chief scientist, Lycos Inc) [Mauldin], "the various services (including Lycos) closely guard the details of these databases". However, there has been a fair amount of work on specific features of search engines. Especially well represented is work which can get results by post-processing the results of existing commercial search engines, or produce small scale "individualized" search engines. Finally, there has been a lot of research on information retrieval systems, especially on well controlled collections. In the next two sections, we discuss some areas where this research needs to be extended to work better on the web.

3.1 Information Retrieval Work in information retrieval systems goes back many years and is well developed [Witten 94]. However, most of the research on information retrieval systems is on small well controlled homogeneous collections such as collections of scientific papers or news stories on a related topic. Indeed, the primary benchmark for information retrieval, the Text Retrieval Conference [TREC 96], uses a fairly small, well controlled collection for their benchmarks. The "Very Large Corpus" benchmark is only 20GB compared to the 147GB from our crawl of 24 million web pages. Things that work well on TREC often do not produce good results on the web. For example, the standard vector space model tries to return the document that most closely approximates the query, given that both query and document are vectors defined by their word occurrence. On the web, this strategy often returns very short documents that are the query plus a few words. For example, we have seen a major search engine return a page containing only "Bill Clinton Sucks" and a picture from a "Bill Clinton" query. Some argue that on the web, users should specify more accurately what they want and add more words to their query. We disagree vehemently with this position. If a user issues a query like "Bill Clinton" they should get reasonable results since there is an enormous amount of high quality information available on this topic. Given examples like these, we believe that the standard information retrieval work needs to be extended to deal effectively with the web.

3.2 Differences Between the Web and Well Controlled Collections The web is a vast collection of completely uncontrolled heterogeneous documents. Documents on the web have extreme variation internal to the documents, and also in the external meta information that might be available. For example, documents differ internally in their language (both human and programming), vocabulary (email addresses, links, zip codes,


phone numbers, product numbers), type or format (text, HTML, PDF, images, sounds), and may even be machine generated (log files or output from a database). On the other hand, we define external meta information as information that can be inferred about a document, but is not contained within it. Examples of external meta information include things like reputation of the source, update frequency, quality, popularity or usage, and citations. Not only are the possible sources of external meta information varied, but the things that are being measured vary many orders of magnitude as well. For example, compare the usage information from a major homepage, like Yahoo's which currently receives millions of page views every day with an obscure historical article which might receive one view every ten years. Clearly, these two items must be treated very differently by a search engine.

Another big difference between the web and traditional well controlled collections is that there is virtually no control over what people can put on the web. Couple this flexibility to publish anything with the enormous influence of search engines to route traffic and companies that deliberately manipulate search engines for profit become a serious problem. This problem has not been addressed in traditional closed information retrieval systems. Also, it is interesting to note that metadata efforts have largely failed with web search engines, because any text on the page which is not directly represented to the user is abused to manipulate search engines. There are even numerous companies which specialize in manipulating search engines for profit.

4 System Anatomy First, we will provide a high level discussion of the architecture. Then, there is some in-depth descriptions of important data structures. Finally, the major applications: crawling, indexing, and searching will be examined in depth.

4.1 Google Architecture Overview In this section, we will give a high level overview of how the whole system works as pictured in Figure 1. Further sections will discuss the applications and data structures not mentioned in this section. Most of Google is implemented in C or C++ for efficiency and can run in either Solaris or Linux.

Figure 1. High Level Google Architecture

In Google, the web crawling (downloading of web pages) is done by several distributed crawlers. There is a URLserver that sends lists of URLs to be fetched to the crawlers. The web pages that are fetched are then sent to the storeserver. The storeserver then compresses and stores the web pages into a repository. Every web page has an associated ID number called a docID which is assigned whenever a new URL is parsed out of a web page. The indexing function is performed by the indexer and the sorter. The indexer performs a number of functions. It reads the repository, uncompresses the documents, and parses them. Each document is converted into a set of word occurrences called hits. The hits record the word, position in document, an approximation of font


size, and capitalization. The indexer distributes these hits into a set of "barrels", creating a partially sorted forward index. The indexer performs another important function. It parses out all the links in every web page and stores important information about them in an anchors file. This file contains enough information to determine where each link points from and to, and the text of the link. The URLresolver reads the anchors file and converts relative URLs into absolute URLs and in turn into docIDs. It puts the anchor text into the forward index, associated with the docID that the anchor points to. It also generates a database of links which are pairs of docIDs. The links database is used to compute PageRanks for all the documents. The sorter takes the barrels, which are sorted by docID (this is a simplification, see Section 4.2.5), and resorts them by wordID to generate the inverted index. This is done in place so that little temporary space is needed for this operation. The sorter also produces a list of wordIDs and offsets into the inverted index. A program called DumpLexicon takes this list together with the lexicon produced by the indexer and generates a new lexicon to be used by the searcher. The searcher is run by a web server and uses the lexicon built by DumpLexicon together with the inverted index and the PageRanks to answer queries.
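
The indexer/sorter split described above can be pictured with a toy example in Python: a forward index keyed by docID is resorted into an inverted index keyed by wordID. Real barrels, wordIDs, and hit encodings are far more involved, and the documents here are made up.

    # Toy forward index (docID -> word occurrences) resorted into an inverted index
    # (wordID -> postings), mirroring the indexer/sorter split described above.
    from collections import defaultdict

    docs = {1: "the quick brown fox", 2: "the lazy dog", 3: "quick brown dogs"}

    lexicon, forward = {}, defaultdict(list)             # wordID assignment + forward "barrels"
    for doc_id, text in docs.items():
        for position, word in enumerate(text.split()):
            word_id = lexicon.setdefault(word, len(lexicon))
            forward[doc_id].append((word_id, position))  # a simplified "hit": word and position

    inverted = defaultdict(list)                         # wordID -> [(docID, position), ...]
    for doc_id, hits in forward.items():
        for word_id, position in hits:
            inverted[word_id].append((doc_id, position))

    print(sorted(inverted[lexicon["quick"]]))            # docs 1 and 3 contain "quick"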

4.2 Major Data Structures Google's data structures are optimized so that a large document collection can be crawled, indexed, and searched with little cost. Although CPUs and bulk input output rates have improved dramatically over the years, a disk seek still requires about 10 ms to complete. Google is designed to avoid disk seeks whenever possible, and this has had a considerable influence on the design of the data structures.

4.2.1 BigFiles BigFiles are virtual files spanning multiple file systems and are addressable by 64 bit integers. The allocation among multiple file systems is handled automatically. The BigFiles package also handles allocation and deallocation of file descriptors, since the operating systems do not provide enough for our needs. BigFiles also support rudimentary compression options.
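
A minimal sketch of the idea behind BigFiles, assuming 1 GB underlying chunks and an illustrative naming scheme; the real package is written in C++ and also manages file descriptors and compression, which are omitted here:

import os

CHUNK = 2 ** 30                     # assumption: each underlying file holds 1 GB

class BigFile:
    def __init__(self, prefix):
        self.prefix = prefix
        self.handles = {}           # chunk index -> open file object

    def _handle(self, index):
        if index not in self.handles:
            path = "%s.%04d" % (self.prefix, index)
            if not os.path.exists(path):
                open(path, "wb").close()            # create the chunk on first use
            self.handles[index] = open(path, "r+b")
        return self.handles[index]

    def write_at(self, offset, data):
        # split the write across chunk boundaries; `offset` is the 64-bit address
        while data:
            f, local = self._handle(offset // CHUNK), offset % CHUNK
            n = min(len(data), CHUNK - local)
            f.seek(local)
            f.write(data[:n])
            data, offset = data[n:], offset + n

    def read_at(self, offset, size):
        out = b""
        while size > 0:
            f, local = self._handle(offset // CHUNK), offset % CHUNK
            f.seek(local)
            chunk = f.read(min(size, CHUNK - local))
            if not chunk:
                break                               # reading past what has been written
            out += chunk
            offset, size = offset + len(chunk), size - len(chunk)
        return out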

4.2.2 Repository The repository contains the full HTML of every web page. Each page is compressed using zlib (see RFC1950). The choice of compression technique is a tradeoff between speed and compression ratio. We chose zlib's speed over a significant improvement in compression offered by bzip. The compression rate of bzip was approximately 4 to 1 on the repository as compared to zlib's 3 to 1 compression. In the repository, the documents are stored one after the other and are prefixed by docID, length, and URL as can be seen in Figure 2. The repository requires no other data structures to be used in order to access it. This helps with data consistency and makes development much easier; we can rebuild all the other data structures from only the repository and a file which lists crawler errors.

Figure 2. Repository Data Structure
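
The record layout below is a minimal sketch of the repository in Figure 2, assuming little-endian fixed-width headers; the paper specifies only that each document is prefixed by docID, length, and URL and compressed with zlib:

import struct, zlib

def append_document(repo, doc_id, url, html):
    # one record: docID, compressed length, URL length, URL bytes, zlib-compressed HTML
    payload = zlib.compress(html.encode("utf-8"))
    url_bytes = url.encode("utf-8")
    header = struct.pack("<QII", doc_id, len(payload), len(url_bytes))
    repo.write(header + url_bytes + payload)

def scan_repository(repo):
    """Yield (docID, URL, HTML) by reading the records one after the other."""
    while True:
        header = repo.read(16)
        if len(header) < 16:
            return
        doc_id, clen, ulen = struct.unpack("<QII", header)
        url = repo.read(ulen).decode("utf-8")
        html = zlib.decompress(repo.read(clen)).decode("utf-8")
        yield doc_id, url, html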

4.2.3 Document Index The document index keeps information about each document. It is a fixed width ISAM (Index sequential access mode) index, ordered by docID. The information stored in each entry includes the current document status, a pointer into the repository, a document checksum, and various statistics. If the document has been crawled, it also contains a pointer into a variable width file called docinfo which contains its URL and title. Otherwise the pointer points into the URLlist which contains just the URL. This design decision was driven by the desire to have a reasonably compact data structure, and the ability to fetch a record in one disk seek during a search.

Additionally, there is a file which is used to convert URLs into docIDs. It is a list of URL checksums with their corresponding docIDs and is sorted by checksum. In order to find the docID of a particular URL, the URL's checksum is computed and a binary search is performed on the checksums file to find its docID. URLs may be converted into docIDs in batch by doing a merge with this file. This is the technique the URLresolver uses to turn URLs into docIDs. This batch mode of update is crucial because otherwise we must perform one seek for every link, which, assuming one disk, would take more than a month for our 322 million link dataset.

4.2.4 Lexicon The lexicon has several different forms. One important change from earlier systems is that the lexicon can fit in memory for a reasonable price. In the current implementation we can keep the lexicon in memory on a machine with 256 MB of main memory. The current lexicon contains 14 million words (though some rare words were not added to the lexicon). It is implemented in two parts -- a list of the words (concatenated together but separated by nulls) and a hash table of pointers. For various functions, the list of words has some auxiliary information which is beyond the scope of this paper to explain fully.
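
Returning to the URL-to-docID file described just before the lexicon, here is a minimal sketch of the checksum lookup and the batch merge the URLresolver uses. The checksum function (MD5 truncated to 64 bits) and the in-memory representation are assumptions; the real file lives on disk:

import bisect, hashlib

def url_checksum(url):
    # assumption: any well-distributed 64-bit checksum works; the paper does not name one
    return int.from_bytes(hashlib.md5(url.encode("utf-8")).digest()[:8], "big")

class UrlIndex:
    def __init__(self, pairs):                 # pairs: (checksum, docID), any order
        self.pairs = sorted(pairs)
        self.keys = [c for c, _ in self.pairs]

    def doc_id(self, url):
        """One random lookup: binary search on the sorted checksums."""
        csum = url_checksum(url)
        i = bisect.bisect_left(self.keys, csum)
        if i < len(self.keys) and self.keys[i] == csum:
            return self.pairs[i][1]
        return None

    def doc_ids_batch(self, urls):
        """Batch conversion: sort the queries and merge against the sorted file,
        avoiding one seek per link."""
        queries = sorted((url_checksum(u), u) for u in urls)
        out, i = {}, 0
        for csum, u in queries:
            while i < len(self.pairs) and self.pairs[i][0] < csum:
                i += 1
            if i < len(self.pairs) and self.pairs[i][0] == csum:
                out[u] = self.pairs[i][1]
        return out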

4.2.5 Hit Lists A hit list corresponds to a list of occurrences of a particular word in a particular document including position, font, and capitalization information. Hit lists account for most of the space used in both the forward and the inverted indices. Because of this, it is important to represent them as efficiently as possible. We considered several alternatives for encoding position, font, and capitalization -- simple encoding (a triple of integers), a compact encoding (a hand optimized allocation of bits), and Huffman coding. In the end we chose a hand optimized compact encoding since it required far less space than the simple encoding and far less bit manipulation than Huffman coding. The details of the hits are shown in Figure 3.

Our compact encoding uses two bytes for every hit. There are two types of hits: fancy hits and plain hits. Fancy hits include hits occurring in a URL, title, anchor text, or meta tag. Plain hits include everything else. A plain hit consists of a capitalization bit, font size, and 12 bits of word position in a document (all positions higher than 4095 are labeled 4096). Font size is represented relative to the rest of the document using three bits (only 7 values are actually used because 111 is the flag that signals a fancy hit). A fancy hit consists of a capitalization bit, the font size set to 7 to indicate it is a fancy hit, 4 bits to encode the type of fancy hit, and 8 bits of position. For anchor hits, the 8 bits of position are split into 4 bits for position in anchor and 4 bits for a hash of the docID the anchor occurs in. This gives us some limited phrase searching as long as there are not that many anchors for a particular word. We expect to update the way that anchor hits are stored to allow for greater resolution in the position and docIDhash fields. We use font size relative to the rest of the document because when searching, you do not want to rank otherwise identical documents differently just because one of the documents is in a larger font.
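
A minimal sketch of the two-byte hit encoding, assuming one particular bit order; the paper fixes only the field widths (one capitalization bit, three font bits, and twelve position bits for plain hits, with font value 7 flagging a fancy hit):

FANCY_FONT = 0b111                  # font value 7 signals a fancy hit

def encode_plain_hit(cap, font, position):
    assert 0 <= font <= 6
    position = min(position, 4095)  # positions above 4095 all share the top value
    return (cap & 1) << 15 | font << 12 | position

def encode_fancy_hit(cap, kind, position):
    # kind: 4-bit code for URL / title / anchor / meta hits (assignment assumed)
    return (cap & 1) << 15 | FANCY_FONT << 12 | (kind & 0xF) << 8 | (position & 0xFF)

def encode_anchor_hit(cap, kind, pos_in_anchor, docid_hash):
    # anchor hits split the 8 position bits into 4 bits of position and 4 bits of docID hash
    packed = (pos_in_anchor & 0xF) << 4 | (docid_hash & 0xF)
    return encode_fancy_hit(cap, kind, packed)

def decode_hit(hit):
    cap, font = hit >> 15, (hit >> 12) & 0b111
    if font == FANCY_FONT:
        return {"cap": cap, "fancy": True, "kind": (hit >> 8) & 0xF, "pos": hit & 0xFF}
    return {"cap": cap, "fancy": False, "font": font, "pos": hit & 0xFFF}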



The length of a hit list is stored before the hits themselves. To save space, the length of the hit list is combined with the wordID in the forward index and the docID in the inverted index. This limits it to 8 and 5 bits respectively (there are some tricks which allow 8 bits to be borrowed from the wordID). If the length is longer than would fit in that many bits, an escape code is used in those bits, and the next two bytes contain the actual length.

Figure 3. Forward and Reverse Indexes and the Lexicon

4.2.6 Forward Index The forward index is actually already partially sorted. It is stored in a number of barrels (we used 64). Each barrel holds a range of wordID's. If a document contains words that fall into a particular barrel, the docID is recorded into the barrel, followed by a list of wordID's with hitlists which correspond to those words. This scheme requires slightly more storage because of duplicated docIDs but the difference is very small for a reasonable number of buckets and saves considerable time and coding complexity in the final indexing phase done by the sorter. Furthermore, instead of storing actual wordID's, we store each wordID as a relative difference from the minimum wordID that falls into the barrel the wordID is in. This way, we can use just 24 bits for the wordID's in the unsorted barrels, leaving 8 bits for the hit list length.
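
A minimal sketch of writing one forward-barrel entry with delta-encoded wordIDs (24 bits) and an 8-bit hit count; the fixed-width layout and the omission of the escape-code path for long hit lists are simplifying assumptions:

import struct

def write_forward_entry(barrel, base_wordid, doc_id, word_hits):
    """word_hits: {wordID: [16-bit encoded hits]} for the words that fall in this barrel."""
    barrel.write(struct.pack("<I", doc_id))
    barrel.write(struct.pack("<I", len(word_hits)))      # number of words that follow
    for word_id, hits in sorted(word_hits.items()):
        delta = word_id - base_wordid                    # relative to the barrel's minimum wordID
        assert delta < (1 << 24) and len(hits) < (1 << 8)
        barrel.write(struct.pack("<I", delta << 8 | len(hits)))
        barrel.write(struct.pack("<%dH" % len(hits), *hits))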

4.2.7 Inverted Index The inverted index consists of the same barrels as the forward index, except that they have been processed by the sorter. For every valid wordID, the lexicon contains a pointer into the barrel that wordID falls into. It points to a doclist of docID's together with their corresponding hit lists. This doclist represents all the occurrences of that word in all documents.

An important issue is in what order the docID's should appear in the doclist. One simple solution is to store them sorted by docID. This allows for quick merging of different doclists for multiple word queries. Another option is to store them sorted by a ranking of the occurrence of the word in each document. This makes answering one word queries trivial and makes it likely that the answers to multiple word queries are near the start. However, merging is much more difficult. Also, this makes development much more difficult in that a change to the ranking function requires a rebuild of the index. We chose a compromise between these options, keeping two sets of inverted barrels -- one set for hit lists which include title or anchor hits and another set for all hit lists. This way, we check the first set of barrels first and if there are not enough matches within those barrels we check the larger ones.
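
A two-pointer merge shows why docID-ordered doclists combine cheaply for multi-word queries; this minimal sketch works on in-memory lists rather than the on-disk barrels:

def intersect_doclists(a, b):
    """a, b: lists of (docID, hits) sorted by docID; yields docIDs present in both."""
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i][0] == b[j][0]:
            yield a[i][0], a[i][1], b[j][1]
            i += 1
            j += 1
        elif a[i][0] < b[j][0]:
            i += 1
        else:
            j += 1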

4.3 Crawling the Web Running a web crawler is a challenging task. There are tricky performance and reliability issues and even more importantly, there are social issues. Crawling is the most fragile application since it involves interacting with hundreds of thousands of web servers and various name servers which are all beyond the control of the system.



In order to scale to hundreds of millions of web pages, Google has a fast distributed crawling system. A single URLserver serves lists of URLs to a number of crawlers (we typically ran about 3). Both the URLserver and the crawlers are implemented in Python. Each crawler keeps roughly 300 connections open at once. This is necessary to retrieve web pages at a fast enough pace. At peak speeds, the system can crawl over 100 web pages per second using four crawlers. This amounts to roughly 600K per second of data. A major performance stress is DNS lookup. Each crawler maintains its own DNS cache so it does not need to do a DNS lookup before crawling each document. Each of the hundreds of connections can be in a number of different states: looking up DNS, connecting to host, sending request, and receiving response. These factors make the crawler a complex component of the system. It uses asynchronous IO to manage events, and a number of queues to move page fetches from state to state.

It turns out that running a crawler which connects to more than half a million servers, and generates tens of millions of log entries, generates a fair amount of email and phone calls. Because of the vast number of people coming on line, there are always those who do not know what a crawler is, because this is the first one they have seen. Almost daily, we receive an email something like, "Wow, you looked at a lot of pages from my web site. How did you like it?" There are also some people who do not know about the robots exclusion protocol, and think their page should be protected from indexing by a statement like, "This page is copyrighted and should not be indexed", which needless to say is difficult for web crawlers to understand. Also, because of the huge amount of data involved, unexpected things will happen. For example, our system tried to crawl an online game. This resulted in lots of garbage messages in the middle of their game! It turns out this was an easy problem to fix. But this problem had not come up until we had downloaded tens of millions of pages. Because of the immense variation in web pages and servers, it is virtually impossible to test a crawler without running it on a large part of the Internet. Invariably, there are hundreds of obscure problems which may only occur on one page out of the whole web and cause the crawler to crash, or worse, cause unpredictable or incorrect behavior. Systems which access large parts of the Internet need to be designed to be very robust and carefully tested. Since large complex systems such as crawlers will invariably cause problems, there needs to be significant resources devoted to reading the email and solving these problems as they come up.
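
The crawlers themselves were written in Python with asynchronous IO; the sketch below only approximates that design with a thread pool, a shared DNS cache, and a raw HTTP/1.0 fetch that reuses the cached address. It handles plain HTTP only and returns the raw response (headers included); names and limits are illustrative assumptions:

import socket
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

dns_cache = {}

def resolve(host):
    # each host is looked up at most once and the answer is reused for later fetches
    if host not in dns_cache:
        dns_cache[host] = socket.gethostbyname(host)
    return dns_cache[host]

def fetch(url, timeout=10):
    parts = urlparse(url)
    host, port = parts.hostname, parts.port or 80
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    ip = resolve(host)                                   # reuse the cached address
    request = ("GET %s HTTP/1.0\r\nHost: %s\r\n"
               "User-Agent: sketch-crawler\r\n\r\n" % (path, host))
    with socket.create_connection((ip, port), timeout=timeout) as s:
        s.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = s.recv(65536)
            if not data:
                break
            chunks.append(data)
    return url, b"".join(chunks)                         # raw response, headers included

def safe_fetch(url):
    try:
        return fetch(url)
    except Exception:
        return url, None                                 # a real crawler would log the failure

def crawl(urls, connections=300):
    # keep roughly `connections` requests in flight, like one crawler process
    with ThreadPoolExecutor(max_workers=connections) as pool:
        for url, body in pool.map(safe_fetch, urls):
            if body is not None:
                yield url, body                          # hand the page to the storeserver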

4.4 Indexing the Web 



Parsing -- Any parser which is designed to run on the entire Web must handle a huge array of possible errors. These range from typos in HTML tags to kilobytes of zeros in the middle of a tag, non-ASCII characters, HTML tags nested hundreds deep, and a great variety of other errors that challenge anyone's imagination to come up with equally creative ones. For maximum speed, instead of using YACC to generate a CFG parser, we use flex to generate a lexical analyzer which we outfit with its own stack. Developing this parser which runs at a reasonable speed and is very robust involved a fair amount of work.

Indexing Documents into Barrels -- After each document is parsed, it is encoded into a number of barrels. Every word is converted into a wordID by using an in-memory hash table -- the lexicon. New additions to the lexicon hash table are logged to a file. Once the words are converted into wordID's, their occurrences in the current document are translated into hit lists and are written into the forward barrels. The main difficulty with parallelization of the indexing phase is that the lexicon needs to be shared. Instead of sharing the lexicon, we took the approach of writing a log of all the extra words that were not in a base lexicon, which we fixed at 14 million words. That way multiple indexers can run in parallel and then the small log file of extra words can be processed by one final indexer.
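
A minimal sketch of the barrel-writing step just described, with a fixed base lexicon, a log of unseen words for the final indexing pass, and barrels chosen by wordID range. Tokenization, the wordID space, and the number of barrels are assumptions:

import re
from collections import defaultdict

NUM_BARRELS = 64
WORDID_SPACE = 1 << 24          # assumption: wordIDs fit in 24 bits

class Indexer:
    def __init__(self, base_lexicon):
        self.lexicon = dict(base_lexicon)   # word -> wordID, fixed base lexicon
        self.extra_words_log = []           # words left for the one final indexer
        self.barrels = defaultdict(list)    # barrel number -> forward entries

    def barrel_for(self, word_id):
        # each barrel owns a contiguous slice of the wordID range
        return word_id * NUM_BARRELS // WORDID_SPACE

    def index_document(self, doc_id, text):
        hits = defaultdict(list)
        for position, word in enumerate(re.findall(r"\w+", text.lower())):
            if word not in self.lexicon:
                self.extra_words_log.append(word)   # merged in during the final pass
                continue
            hits[self.lexicon[word]].append(position)
        for word_id, positions in hits.items():
            self.barrels[self.barrel_for(word_id)].append((doc_id, word_id, positions))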




Sorting -- In order to generate the inverted index, the sorter takes each of the forward barrels and sorts it by wordID to produce an inverted barrel for title and anchor hits and a full text inverted barrel. This process happens one barrel at a time, thus requiring little temporary storage. Also, we parallelize the sorting phase to use as many machines as we have simply by running multiple sorters, which can process different buckets at the same time. Since the barrels don't fit into main memory, the sorter further subdivides them into baskets which do fit into memory based on wordID and docID. Then the sorter loads each basket into memory, sorts it and writes its contents into the short inverted barrel and the full inverted barrel.
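
A minimal sketch of the sorter's basket trick: partition a forward barrel by wordID range so each basket fits in memory, then sort each basket and append it to the inverted barrel. Here everything stays in memory and the basket width is an illustrative assumption:

from collections import defaultdict

def sort_barrel(forward_entries, basket_width=1 << 16):
    """forward_entries: iterable of (docID, wordID, hits) from one forward barrel."""
    baskets = defaultdict(list)
    for doc_id, word_id, hits in forward_entries:        # pass 1: subdivide by wordID range
        baskets[word_id // basket_width].append((word_id, doc_id, hits))
    inverted = []
    for key in sorted(baskets):                          # pass 2: sort each basket in memory
        inverted.extend(sorted(baskets[key]))
    return inverted                                      # wordID-ordered doclists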

4.5 Searching The goal of searching is to provide quality search results efficiently. Many of the large commercial search engines seemed to have made great progress in terms of efficiency. Therefore, we have focused more on quality of search in our research, although we believe our solutions are scalable to commercial volumes with a bit more effort. The Google query evaluation process is shown in Figure 4.

To put a limit on response time, once a certain number (currently 40,000) of matching documents are found, the searcher automatically goes to step 8 in Figure 4. This means that it is possible that sub-optimal results would be returned. We are currently investigating other ways to solve this problem. In the past, we sorted the hits according to PageRank, which seemed to improve the situation.

1. Parse the query.
2. Convert words into wordIDs.
3. Seek to the start of the doclist in the short barrel for every word.
4. Scan through the doclists until there is a document that matches all the search terms.
5. Compute the rank of that document for the query.
6. If we are in the short barrels and at the end of any doclist, seek to the start of the doclist in the full barrel for every word and go to step 4.
7. If we are not at the end of any doclist go to step 4.
8. Sort the documents that have matched by rank and return the top k.

Figure 4. Google Query Evaluation
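
A minimal sketch of the scan loop in Figure 4, including the cutoff mentioned above: docID-sorted doclists are walked with a multi-pointer scan, first over the short (title/anchor) barrels and then, if too few documents matched, over the full barrels. For brevity the barrels are plain dicts keyed by word rather than wordID, and `rank` stands in for the ranking function of the next subsection:

MAX_MATCHES = 40_000            # cutoff after which the searcher jumps to the final sort

def scan(query_words, barrels, rank, budget, seen):
    """Two-pointer scan over docID-sorted doclists; returns (score, docID) pairs."""
    doclists = [barrels.get(w, []) for w in query_words]
    if not doclists or not all(doclists):
        return []
    pos, scored = [0] * len(doclists), []
    while all(p < len(d) for p, d in zip(pos, doclists)) and len(scored) < budget:
        ids = [d[p][0] for p, d in zip(pos, doclists)]
        if len(set(ids)) == 1:                           # document matches every term
            if ids[0] not in seen:
                seen.add(ids[0])
                hits = [d[p][1] for p, d in zip(pos, doclists)]
                scored.append((rank(ids[0], hits), ids[0]))
            pos = [p + 1 for p in pos]
        else:
            pos[ids.index(min(ids))] += 1                # advance the lagging doclist
    return scored

def evaluate(query_words, short_barrels, full_barrels, rank):
    seen = set()
    scored = scan(query_words, short_barrels, rank, MAX_MATCHES, seen)
    if len(scored) < MAX_MATCHES:                        # step 6: fall back to the full barrels
        scored += scan(query_words, full_barrels, rank, MAX_MATCHES - len(scored), seen)
    return sorted(scored, reverse=True)                  # step 8: sort by rank, return the top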

4.5.1 The Ranking System Google maintains much more information about web documents than typical search engines. Every hitlist includes position, font, and capitalization information. Additionally, we factor in hits from anchor text and the PageRank of the document. Combining all of this information into a rank is difficult. We designed our ranking function so that no particular factor can have too much influence. First, consider the simplest case -- a single word query. In order to rank a document with a single word query, Google looks at that document's hit list for that word. Google considers each hit to be one of several different types (title, anchor, URL, plain text large font, plain text small font, ...), each of which has its own type-weight. The type-weights make up a vector indexed by type. Google counts the number of hits of each type in the hit list. Then every count is converted into a count-weight. Count-weights increase linearly with counts at first but quickly taper off so that more than a certain count will not help. We take the dot product of the vector of count-weights with the vector of type-weights to compute an IR score for the document. Finally, the IR score is combined with PageRank to give a final rank to the document.
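
A minimal sketch of the single-word ranking just described. The hit types, the type-weight values, the taper point of the count-weights, and the way the IR score is combined with PageRank are all illustrative assumptions; the paper gives the shape of the computation but none of the constants:

import math

HIT_TYPES = ("title", "anchor", "url", "large_font", "small_font")
TYPE_WEIGHTS = {"title": 8.0, "anchor": 6.0, "url": 4.0, "large_font": 2.0, "small_font": 1.0}
COUNT_TAPER = 8                     # assumed point beyond which extra hits stop helping

def count_weight(count):
    # increases linearly at first, then stops growing
    return float(min(count, COUNT_TAPER))

def ir_score(hitlist):
    """hitlist: list of hit-type names for one word in one document."""
    counts = {t: sum(1 for h in hitlist if h == t) for t in HIT_TYPES}
    # dot product of the count-weight vector with the type-weight vector
    return sum(TYPE_WEIGHTS[t] * count_weight(c) for t, c in counts.items())

def final_rank(hitlist, pagerank, ir_mix=0.5):
    # how the IR score and PageRank are combined is not specified in the paper;
    # a weighted sum with a log-damped PageRank is assumed here
    return ir_mix * ir_score(hitlist) + (1.0 - ir_mix) * math.log(pagerank + 1e-9)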

For a multi-word search, the situation is more complicated. Now multiple hit lists must be scanned through at once so that hits occurring close together in a document are weighted higher than hits occurring far apart. The hits from the multiple hit lists are matched up so that nearby hits are matched together. For every matched set of hits, a proximity is computed. The proximity is based on how far apart the hits are in the document (or anchor) but is classified into 10 different value "bins" ranging from a phrase match to "not even close". Counts are computed not only for every type of hit but for every type and proximity. Every type and proximity pair has a type-prox-weight. The counts are converted into count-weights and we take the dot product of the count-weights and the type-prox-weights to compute an IR score. All of these numbers and matrices can be displayed with the search results using a special debug mode. These displays have been very helpful in developing the ranking system.

4.5.2 Feedback The ranking function has many parameters like the type-weights and the type-prox-weights. Figuring out the right values for these parameters is something of a black art. In order to do this, we have a user feedback mechanism in the search engine. A trusted user may optionally evaluate all of the results that are returned. This feedback is saved. Then when we modify the ranking function, we can see the impact of this change on all previous searches which were ranked. Although far from perfect, this gives us some idea of how a change in the ranking function affects the search results.

5 Results and Performance

Query: bill clinton
http://www.whitehouse.gov/ 100.00% (no date) (0K) http://www.whitehouse.gov/
Office of the President 99.67% (Dec 23 1996) (2K) http://www.whitehouse.gov/WH/EOP/OP/html/OP_Home.html
Welcome To The White House 99.98% (Nov 09 1997) (5K) http://www.whitehouse.gov/WH/Welcome.html
Send Electronic Mail to the President 99.86% (Jul 14 1997) (5K) http://www.whitehouse.gov/WH/Mail/html/Mail_President.html
mailto:president@whitehouse.gov 99.98%
mailto:President@whitehouse.gov 99.27%
The "Unofficial" Bill Clinton 94.06% (Nov 11 1997) (14K) http://zpub.com/un/un-bc.html
Bill Clinton Meets The Shrinks 86.27% (Jun 29 1997) (63K) http://zpub.com/un/un-bc9.html
President Bill Clinton - The Dark Side 97.27% (Nov 10 1997) (15K) http://www.realchange.org/clinton.htm
$3 Bill Clinton 94.73% (no date) (4K) http://www.gatewy.net/~tjohnson/clinton1.html

Figure 4. Sample Results from Google


The most important measure of a search engine is the quality of its search results. While a complete user evaluation is beyond the scope of this paper, our own experience with Google has shown it to produce better results than the major commercial search engines for most searches. As an example which illustrates the use of PageRank, anchor text, and proximity, Figure 4 shows Google's results for a search on "bill clinton". These results demonstrate some of Google's features. The results are clustered by server. This helps considerably when sifting through result sets. A number of results are from the whitehouse.gov domain which is what one may reasonably expect from such a search. Currently, most major commercial search engines do not return any results from whitehouse.gov, much less the right ones. Notice that there is no title for the first result. This is because it was not crawled. Instead, Google relied on anchor text to determine this was a good answer to the query. Similarly, the fifth result is an email address which, of course, is not crawlable. It is also a result of anchor text.

All of the results are reasonably high quality pages and, at last check, none were broken links. This is largely because they all have high PageRank. The PageRanks are the percentages in red along with bar graphs. Finally, there are no results about a Bill other than Clinton or about a Clinton other than Bill. This is because we place heavy importance on the proximity of word occurrences. Of course a true test of the quality of a search engine would involve an extensive user study or results analysis which we do not have room for here. Instead, we invite the reader to try Google for themselves at http://google.stanford.edu.

5.1 Storage Requirements Aside from search quality, Google is designed to scale cost effectively to the size of the Web as it grows. One aspect of this is to use storage efficiently. Table 1 has a breakdown of some statistics and storage requirements of Google. Due to compression the total size of the repository is about 53 GB, just over one third of the total data it stores. At current disk prices this makes the repository a relatively cheap source of useful data. More importantly, the total of all the data used by the search engine requires a comparable amount of storage, about 55 GB. Furthermore, most queries can be answered using just the short inverted index. With better encoding and compression of the Document Index, a high quality web search engine may fit onto a 7GB drive of a new PC.

Storage Statistics
Total Size of Fetched Pages                   147.8 GB
Compressed Repository                          53.5 GB
Short Inverted Index                            4.1 GB
Full Inverted Index                            37.2 GB
Lexicon                                         293 MB
Temporary Anchor Data (not in total)            6.6 GB
Document Index Incl. Variable Width Data        9.7 GB
Links Database                                  3.9 GB
Total Without Repository                       55.2 GB
Total With Repository                         108.7 GB

Web Page Statistics
Number of Web Pages Fetched                  24 million
Number of URLs Seen                        76.5 million
Number of Email Addresses                   1.7 million
Number of 404's                             1.6 million

Table 1. Statistics


5.2 System Performance It is important for a search engine to crawl and index efficiently. This way information can be kept up to date and major changes to the system can be tested relatively quickly. For Google, the major operations are Crawling, Indexing, and Sorting. It is difficult to measure how long crawling took overall because disks filled up, name servers crashed, or any number of other problems stopped the system. In total it took roughly 9 days to download the 26 million pages (including errors). However, once the system was running smoothly, it ran much faster, downloading the last 11 million pages in just 63 hours, averaging just over 4 million pages per day or 48.5 pages per second. We ran the indexer and the crawler simultaneously. The indexer ran just faster than the crawlers. This is largely because we spent just enough time optimizing the indexer so that it would not be a bottleneck. These optimizations included bulk updates to the document index and placement of critical data structures on the local disk. The indexer runs at roughly 54 pages per second. The sorters can be run completely in parallel; using four machines, the whole process of sorting takes about 24 hours.


5.3 Search Performance Improving the performance of search was not the major focus of our research up to this point. The current version of Google answers most queries in between 1 and 10 seconds. This time is mostly dominated by disk IO over NFS (since disks are spread over a number of machines). Furthermore, Google does not have any optimizations such as query caching, subindices on common terms, and other common optimizations. We intend to speed up Google considerably through distribution and hardware, software, and algorithmic improvements. Our target is to be able to handle several hundred queries per second. Table 2 has some sample query times from the current version of Google. They are repeated to show the speedups resulting from cached IO.

Query             Initial Query                  Same Query Repeated (IO mostly cached)
                  CPU Time(s)   Total Time(s)    CPU Time(s)   Total Time(s)
al gore           0.09          2.13             0.06          0.06
vice president    1.77          3.84             1.66          1.80
hard disks        0.25          4.86             0.20          0.24
search engines    1.31          9.63             1.16          1.16

Table 2. Search Times

6 Conclusions

Google is designed to be a scalable search engine. The primary goal is to provide high quality search results over a rapidly growing World Wide Web. Google employs a number of techniques to improve search quality including page rank, anchor text, and proximity information. Furthermore, Google is a complete architecture for gathering web pages, indexing them, and performing search queries over them.

6.1 Future Work A large-scale web search engine is a complex system and much remains to be done. Our immediate goals are to improve search efficiency and to scale to approximately 100 million web pages. Some simple improvements to efficiency include query caching, smart disk allocation, and subindices. Another area which requires much research is updates. We must have smart algorithms to decide what old web pages should be recrawled and what new ones should be crawled. Work toward this goal has been done in [Cho 98]. One promising area of research is using proxy caches to build search databases, since they are demand driven. We are planning to add simple features supported by commercial search engines like boolean operators, negation, and stemming. However, other features are just starting to be explored such as relevance feedback and clustering (Google currently supports a simple hostname based clustering). We also plan to support user context (like the user's location), and result summarization. We are also working to extend the use of link structure and link text. Simple experiments indicate PageRank can be personalized by increasing the weight of a user's home page or bookmarks. As for link text, we are experimenting with using text surrounding links in addition to the link text itself. A Web search engine is a very rich environment for research ideas. We have far too many to list here so we do not expect this Future Work section to become much shorter in the near future.

6.2 High Quality Search The biggest problem facing users of web search engines today is the quality of the results they get back. While the results are often amusing and expand users' horizons, they are often frustrating and consume precious time. For example, the top result for a search for "Bill Clinton" on one of the most popular commercial search engines was the Bill Clinton Joke of the Day: April 14, 1997. Google is designed to provide higher quality search so as the Web continues to grow rapidly, information can be found easily. In order to accomplish this Google makes heavy use of hypertextual information consisting of link structure and link (anchor) text. Google also uses proximity and font information. While evaluation of a search engine is difficult, we have subjectively found that Google returns higher quality search results than current commercial search engines. The analysis of link structure via PageRank allows Google to evaluate the quality of web pages. The use of link text as a description of what the link points to helps the search engine return relevant (and to some degree high quality) results. Finally, the use of proximity information helps increase relevance a great deal for many queries.



6.3 Scalable Architecture Aside from the quality of search, Google is designed to scale. It must be efficient in both space and time, and constant factors are very important when dealing with the entire Web. In implementing Google, we have seen bottlenecks in CPU, memory access, memory capacity, disk seeks, disk throughput, disk capacity, and network IO. Google has evolved to overcome a number of these bottlenecks during various operations. Google's major data structures make efficient use of available storage space. Furthermore, the crawling, indexing, and sorting operations are efficient enough to be able to build an index of a substantial portion of the web -- 24 million pages, in less than one week. We expect to be able to build an index of 100 million pages in less than a month.

6.4 A Research Tool In addition to being a high quality search engine, Google is a research tool. The data Google has collected has already resulted in many other papers submitted to conferences and many more on the way. Recent research such as [Abiteboul 97] has shown a number of limitations to queries about the Web that may be answered without having the Web available locally. This means that Google (or a similar system) is not only a valuable research tool but a necessary one for a wide range of applications. We hope Google will be a resource for searchers and researchers all around the world and will spark the next generation of search engine technology.

7 Acknowledgments Scott Hassan and Alan Steremberg have been critical to the development of Google. Their talented contributions are irreplaceable, and the authors owe them much gratitude. We would also like to thank Hector Garcia-Molina, Rajeev Motwani, Jeff Ullman, and Terry Winograd and the whole WebBase group for their support and insightful discussions. Finally we would like to recognize the generous support of our equipment donors IBM, Intel, and Sun and our funders. The research described here was conducted as part of the Stanford Integrated Digital Library Project, supported by the National Science Foundation under Cooperative Agreement IRI-9411306. Funding for this cooperative agreement is also provided by DARPA and NASA, and by Interval Research, and the industrial partners of the Stanford Digital Libraries Project. 



Word list

Coloration - nudažymas; the appearance of something with regard to colour („Some bacterial structures take on a purple coloration.“)
Elastomer - elastinga medžiaga; a natural or synthetic polymer having elastic properties, e.g. rubber („More recently, surrounds have been made from polymer foam, synthetic elastomers, or polymer composites.“)
Crowdsource - žmonių apklausa tam tikru klausimu interneto pagalba; obtain (information or input into a particular task or project) by enlisting the services of a number of people, either paid or unpaid, typically via the Internet („She crowdsourced advice on album art and even posted an early version of the song so fans could vote for their favorite chorus.“)
Obsolescence – susidevėjimas; the state, process, or condition of being or becoming obsolete („Perhaps all intelligent life has a built-in obsolescence.“)
Triage – veiksmų pirmumo parinkimas; the assigning of priority order („Triage categories to determine their priority.“)
Spur companies – paskatinti kompanijas; give an incentive or encouragement to companies („Weak pension plans spur companies to take out life insurance on their employees.“)
Bespeak - užsakyti iš anksto; order or reserve (something) in advance („The defendant’s insurers took steps to bespeak his medical records.“)
Orthodontic - dantų breketai; devices used in orthodontics that align and straighten teeth and help to position them („Orthodontist will put your orthodontic braces on.“)
Distortion – iškraipymas; the action of distorting or the state of being distorted („There is no curvature, thus distortion is eliminated.“)
Slurry – suspensija; a thin mixture of an insoluble substance, as cement, clay, or coal, with a liquid, as water or oil („Some 3d printers are using slurry for printing.“)
Sophisticated – complicated, elegant, refined.
Iteration – the act of repeating a process with the aim of approaching a desired goal.
Grasp – to get hold of mentally, comprehend, understand.
Backdrop – the background of an event, setting.
Curriculum materials – materials for the aggregate of courses of study given in a school, college, university, etc.
Fabric – framework, structure.
Underlying – discoverable only by close scrutiny or analysis.
Consolidate – to bring together (separate parts) into a single or unified whole, unite, combine.
Hurdle – a difficult problem to be overcome, obstacle.
Leaderboard – a board displaying the names and current scores of the leading competitors.
Achievements – act of achieving, attainment or accomplishment.



Hypertext - dokumentas, sudarytas iš tekstų ir piešinių, turi nuorodas arba hipersaitus į kitus dokumentus ar kitas sritis tame pačiame dokumente; data, as text, graphics, or sound, stored in a computer so that a user can move nonsequentially through a link from one object or document to another.
World Wide Web – pasaulinis tinklas; a system of interlinked hypertext documents that are accessed via the Internet.
Search Engine – paieškos sistema; a software system that is designed to search for information on the World Wide Web.
PageRank - kiekvieno individualaus puslapio populiarumo įvertinimas; an algorithm used by Google Search to rank websites in their search engine results.
Anchor Text – matoma nuorodos dalis; the visible, clickable text in a hyperlink.
Crawling – hiperteksto puslapių nuskaitymas; the process of systematically browsing the web, typically for purposes of web indexing.
Web indexing – indeksavimas; refers to various methods for indexing the contents of a website or of the Internet as a whole.
CPU – procesorius; the central processing unit (CPU) is the hardware within a computer that carries out the instructions of a computer program by performing the basic arithmetical, logical, and input/output operations of the system.
Information Retrieval – informacijos gavimas; the activity of obtaining information resources relevant to an information need from a collection of information resources.
Web server – web serveris; a computer system that processes requests via HTTP, the basic network protocol used to distribute information on the World Wide Web.

WORD/ TERM/ COLLOCATION – LITHUANIAN EQUIVALENT – ENGLISH DEFINITION – CONTEXT/ COLLOCATIONS

1. amalgamation (p. 88) – susivienijimas – the process of combining or uniting multiple entities into one form – "That's because they want to focus on the Oculus Rift as a full package rather than as a simple amalgamation of its various components."

2. synergy – sinergija – the creation of a whole that is greater than the simple sum of its parts – "The synergy of all the components together is what takes it up a notch."

3. Oculus – Oculus – an oculus, plural oculi, from Latin oculus: eye, denotes a circular opening in the centre of a dome or in a wall; in context used as a product name – "What Oculus instead focused on with the Crescent Bay demos it showed off at Oculus Connect was the level of "presence" the Rift can make users feel under optimal conditions and with content designed specifically to be as immersive as possible."

4. immersive – svaiginančius – noting or pertaining to digital technology or images that deeply involve one's senses and may create an altered mental state: immersive media – "...the Rift can make users feel under optimal conditions and with content designed specifically to be as immersive as possible."

5. undoubtedly – neabejotinai – accepted as beyond question; undisputed – "...with Crescent Bay at Oculus Connect was undoubtedly the most immersive and impressive virtual reality demo ever."

6. the cream of the crop – geriausia iš geriausių – the best of the best; being the cream of the crop is being the best of something, from sports to school to just being awesome – "Luckey said these demos are the cream of the crop as far as what Oculus has developed."

7. plain – paprastas – someone or something that may be normal or boring – "empty room with four plain, grey walls"

8. Crystal Cove – Crystal Cove – a brand name – "A camera - larger than the one used with Crystal Cove - was mounted on the wall, tracking users' positions as they walked around a small, black mat on the ground."

9. hissing – šnypštimas – to hiss: an expression of disapproval, contempt, or dissatisfaction conveyed by use of this sound – "And as with the T-Rex's roar, the Crescent Bay Rift's attached headphones - technically stereo, but with simulated surround sound - made the experience seem all the more real with traffic noises, hissing wind and more."

10. ascended – pakilti – to ascend: to move, climb, or go upward; mount; rise – "...that rumbled and blew cold air at them as they virtually ascended to the top of the show's fictional 800-foot-high Wall."

11. presence – buvimas – the state or fact of being present; current existence or occurrence – "The more points of feedback these demos are able to simulate, the more "presence" users feel, Oculus contends."

12. clucking – kudakavimas – the characteristic sound made by a hen when brooding or calling its chicks – "As the user bends down and moves around to better examine the alien, it does the same to the user, clucking in a strange tongue."


Exponential, Dawn, Impediment, Insight, Boost, Precision, Implication, Failure, Far-Fetched, Compel, Proliferation, Proximity, Bottleneck, Crawl, Invariably, Retrieval, Obsolescence, Immersive


