Searching the World Wide Web
Brought to you by your librarian Ms. B.
What we call the internet is really an enormous number of computers networked throughout the world via data lines or wireless routers. Every day, someone adds new computers and web sites. But unlike your local library, where an organizational system exists to document and catalog every item, the World Wide Web has no such structure. The internet is also always growing and changing, which makes it difficult to navigate or search in its entirety. Search engines are not the only way to find material on the Web, but knowing a few search strategies can make finding clear and accurate information easier. In this guide, we present some basic information about how to search the Web, along with a brief explanation of search engine mechanics.
What’s a search engine and how does it work?
When you look for information on a specific topic, service or product, you use an internet search engine. Today there are a number of search engines (like Google, Bing and Ask), and while they work differently, they all use webcrawlers (also called bots). These crawlers work in much the same way as you might if you were indexing a book: they index pages on the Web by finding specific words. That automatic indexing is what allows you to enter a word or combination of words in a search box and then find web sites containing the information. Some search engines look only through page titles and headers. Others, like Google, also look through electronic documents such as PDFs and even media files (like .WAV, .AVI, etc.).
Think of search engines as large databases of information that store and retrieve relevant web site results based on keywords. No two search engines list the same sites in the same order, and none returns all the possible sites on the internet. Furthermore, the ranking of a site within a search engine (i.e., how high on the results list it appears) does not always reflect the quality of the site’s information. Many factors determine search engine rankings, including the:
amount of information on the site,
number of other sites that link to it,
number of people who select a particular link,
length of time the site has been listed in the search engine database, and
way the site is coded.
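If you are curious how "indexing pages by finding specific words" might look in practice, here is a minimal sketch in Python. It is only an illustration, not how any real search engine is built: the three page addresses and their texts are made up, and the function names build_index and lookup are our own.

```python
import re
from collections import defaultdict

# Hypothetical mini "web": page address -> page text (invented for illustration).
PAGES = {
    "page1.example": "The ozone layer protects life on Earth",
    "page2.example": "Global warming and the ozone layer",
    "page3.example": "Shakespeare wrote plays and sonnets",
}

def build_index(pages):
    """Map each lowercase word to the set of pages that contain it."""
    index = defaultdict(set)
    for address, text in pages.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(address)
    return index

def lookup(index, word):
    """Return the pages that contain the word, sorted by address."""
    return sorted(index.get(word.lower(), set()))

index = build_index(PAGES)
print(lookup(index, "ozone"))     # pages 1 and 2 mention "ozone"
print(lookup(index, "sonnets"))   # only page 3 mentions "sonnets"
```

The index is built once, ahead of time, which is why a search can answer quickly: the engine looks the word up in its index instead of rereading every page.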
More and more search engines are increasing the number of web sites they index. For example, in 1998, Google indexed approximately 26 million pages. As of 2004, that number was up to 8 billion! [1] But search engines index only a fraction of what is available on the Web, and not all of it is up-to-date. Search engines may crawl (or visit) a site only once a month, so they won’t “see” any changes made after a crawl until the next one. Because web pages change constantly, some information appears on sites and then disappears. Try several search engines (e.g., Google, Bing, or Ask), and you’ll notice that you get different results from each. In addition, search engines don’t always search an entire page. If a page is larger than 500 KB, many search engines will index only the first 100 to 500 KB of the page, so you could be missing valuable information.
You may have also noticed that search engines like Google and Yahoo provide “sponsored links,” which are links that appear on the first few pages of your search results. Advertisers pay for these links. That doesn’t make them useless, but proceed with caution; you might click something that is irrelevant to your search.
Web Searching – Updated 2011
What does this mean for you?
Understanding the nature of the internet, how it is organized, and how to navigate it can help you separate quality information and sites from those that are irrelevant or of questionable quality.
Web Directories
Another way to search is with a web directory. Web directories use lists composed by human editors (rather than automated indexing performed by bots) in an attempt to organize the best of existing web sites into categories and subcategories. Today most search engines offer complementary search-related products such as shopping search, news and other services that go beyond the basic keyword search function. Web directories are good for broad searches of established sites. For example, the University of California hosts a Web directory called Infomine (http://infomine.ucr.edu/). It divides information into 8 different categories, and while you might not need to use this resource until high school or college, it’s worth investigating.
Let’s say you are looking for information on global warming but are not sure how to phrase a potential topic on holes in the ozone. You could try browsing through another directory called Open Directory Project (http://www.dmoz.org). The Open Directory Project gives you 16 categories. One of them is called Science. Click Science to see a number of subcategories, including one called Environment. Environment has over twenty subcategories listed. One of those subcategories is Global Change, which includes the Ozone Layer category. The Ozone Layer category has over twenty-five references, including a FAQ site. These references can help you figure out what terms to use for a more focused search.
[1] The Writing Lab & The OWL at Purdue and Purdue University. (1995-2010). Searching the World Wide Web: Overview. Retrieved from http://owl.english.purdue.edu/owl/resource/558/1/
Web Searching – Updated 9/2011
Metasearch engines
Metasearch engines search several search engines at once and often include smaller, less well-known engines and specialized sites. Metasearch engines are good for doing large searches when you first start your research and are trying to find out as much as you can about a topic. Some well-known metasearch engines include Dogpile, Mamma, and Metacrawler. There are some disadvantages to using these engines. First, most metasearch engines only permit basic search terms, so you can’t really refine your search. Second, many metasearch engines pull from pay-per-click advertisers, so the results you get may be mostly paid advertising and not the most valid results on the Web. In other words, use these engines with caution.
Searching
Search engines are good for finding sources for well-defined topics. If you enter a general term such as “education” or “Shakespeare,” you get far too many results. But if you narrow your terms, you can get the kind (and amount) of information that you need. For example, try this:
1. Point your browser to Google.
2. Enter Education as your search term.
3. Yikes! You get approximately 2,200,000,000 results. Let’s try narrowing that search by adding modifiers. What you really want to know about is education in Massachusetts, particularly in urban areas.
4. Enter urban education Massachusetts. Now you get 4,010,000 results.
5. Try being even more specific. Put quotes around your search: “urban education Massachusetts”. Now you get 13,600 results. This is because Google now looks for that exact phrase on every page it searches.
6. Keep adjusting your search based on the number of responses you receive. If you get too few hits, enter more general terms; if you get too many, keep narrowing your search. For example, you could look for urban education Massachusetts middle schools.
Notice that we don’t use words like “in” or “at.” That’s because search engines ignore them. For more information, see “Select your terms carefully” later in this handout.
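The difference between an unquoted search and a quoted search can be sketched in a few lines of Python. This is a simplified illustration, not how Google actually works: the three page texts are invented, and the function names match_all_words and match_phrase are hypothetical.

```python
# Hypothetical page texts, invented for illustration.
PAGES = {
    "a.example": "urban education massachusetts report for middle schools",
    "b.example": "massachusetts funds rural and urban programs for education",
    "c.example": "education in texas",
}

def words_of(text):
    """Split text into lowercase words."""
    return text.lower().split()

def match_all_words(pages, query):
    """Unquoted search: pages containing every query word, in any order."""
    wanted = words_of(query)
    return sorted(
        address for address, text in pages.items()
        if all(word in words_of(text) for word in wanted)
    )

def match_phrase(pages, query):
    """Quoted search: pages containing the query words side by side, in order."""
    phrase = " " + " ".join(words_of(query)) + " "
    return sorted(
        address for address, text in pages.items()
        if phrase in " " + " ".join(words_of(text)) + " "
    )

print(match_all_words(PAGES, "education"))                      # all three pages
print(match_all_words(PAGES, "urban education Massachusetts"))  # pages a and b
print(match_phrase(PAGES, "urban education Massachusetts"))     # page a only
```

Each step keeps fewer pages than the one before it, which mirrors the shrinking result counts in the numbered steps above.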
Know your engine
Each search engine has different advantages. Google is one of the largest search engines, followed closely by MSN and Yahoo!. These three search engines cover a large portion of the World Wide Web. The Lycos and Altavista search engines let you search for news. Ask allows you to phrase your search in the form of a question. And almost all engines have an advanced search function. Get in the habit of using Advanced Search and more than one search engine to find the best information. Because each search engine is slightly different, you should also read any instructions or FAQs. For example, you can find information about Google searches from the main Google search page by clicking About Google → Web Search Features and looking at the pages in the Web Search Help Center. To try one feature:
1. Point your browser to http://www.google.com/insidesearch/features.html
2. Scroll down to the Research section.
3. Find a search feature that interests you; e.g., Search for Similar Terms.
4. Use the default (Christmas ~dessert recipes) or enter a term and follow it through to a specific topic that you can research by following the listed links or by using a phrase in a keyword search.
Select your terms carefully
Using inexact terms or terms that are too general will cause problems. If your terms are too broad or general, the search engine may not process them. Search engines are programmed with lists of words called stop words. They are called stop words because the search engine doesn’t “stop” for them when it finds them in its index (if they are even indexed at all). Why? Because stop words are too common to generate meaningful results, or are parts of speech like adverbs, conjunctions, prepositions, or forms of the verb “to be” that mean nothing unless they’re part of a phrase with more “important” nouns and verbs. If you use a stop word in a query, you may get wildly irrelevant results. For example, the phrase “searching the web” contains two stop words: the and web. Though not a particularly common word in everyday writing, web is used so frequently on the internet that it’s just about worthless for searching. When you strip the stop words out of that phrase, you’re searching for the word “searching,” which will lead to results describing everything from criminal manhunts to quests for enlightenment and maybe, after you look through many pages of results, something about searching the web.
How can you identify stop words? Some engines let you know when they are ignoring a term; others (such as Altavista) automatically rewrite your query to include a stop word as part of a quoted phrase with other search terms. If you’re interested in finding out more, check “The 300 Most Common Words” (http://www.tooter4kids.com/classroom/Most_Common_Words.htm). Many of these are stop words.
If your early searches return too many hits, try using more specific terms. To make your terms more precise, try checking our online library catalog (or any online library catalog). For example, suppose you’re working on a paper about the American Revolution. What you want to know in particular is the role women played in supporting the emerging nation.
You could use the search term “American Revolution” in our collection and quickly locate books. But what about a larger collection?
1. Point your browser to http://destiny (or if you are not at school http://mbl.meadowbrook-ma.org)
2. Click to the Search screen.
3. Enter American Revolution as a keyword search. Your search should return 69 results.
4. Scroll down to the 9th entry, “Patriots in petticoats: heroines of the American Revolution,” and click the title.
5. In the Explore! section, click United States -- History -- Revolution, 1775-1783 -- Women. Destiny returns a list of 2 other books that are cataloged under the same subject heading.
Use this subject heading on a bigger collection.
6. Copy the subject heading (United States -- History -- Revolution, 1775-1783 -- Women).
7. Point your browser to http://library.minlib.net/search, paste in the subject heading you copied from Destiny, and change your search to Subject Heading.
8. Analyze the results.
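To see what stripping stop words from a query (described earlier in this section) might look like, here is a small Python sketch. The stop-word list below is a tiny made-up sample, and the function name strip_stop_words is our own; every real search engine keeps its own list and its own rules.

```python
# A tiny, made-up stop-word list; real engines keep their own (much longer) lists.
STOP_WORDS = {"a", "an", "and", "at", "in", "of", "the", "to", "web"}

def strip_stop_words(query):
    """Return the query words that remain after stop words are removed."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]

print(strip_stop_words("searching the web"))
print(strip_stop_words("urban education in Massachusetts"))
```

With this sample list, “searching the web” shrinks to the single word “searching,” which is why that query returns such scattered results. Choosing specific nouns gives the engine more to work with.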
Know Boolean operators
Most search engines allow you to combine terms with words (called Boolean operators) such as AND, OR, and NOT. Knowing how to use these operators is very important for a successful search. Most search engines let you apply Boolean operators through their advanced search option. However, the operators only look easy to use. According to Randolph Hock, author of “The Extreme Searcher's Internet Handbook” (CyberAge Books, 2010), search engines implement Boolean features in different ways. For example, while some accept NOT, others require AND NOT for the same effect. Additionally, some engines require that Boolean operators be capitalized, while others do not.
OR
Use OR to allow any of your search terms to be present on the web pages listed in results. OR should give you a large set of results. For example:
ireland OR eire
returns all web pages containing the word “Ireland” or “Eire.”
AND
Sometimes called a match all search, AND indicates that all your search terms must be present on the web pages listed in results. AND should give you a smaller set of results than OR. For example:
ireland AND eire
returns all the web pages that contain both the words “Ireland” and “Eire.” Some search engines require a plus sign (+) rather than the word AND. Furthermore, for many engines, AND is the default.
NOT
Use NOT to require that a particular search term not be present on web pages listed in results. You may also see NOT referred to as an exclude search. For example:
ireland NOT eire
returns all web pages containing the word “Ireland” but not the word “Eire.” Some search engines require a minus sign (-) rather than the word NOT.
Not all search engines support OR and NOT. Check the About page of your search engine for specifics. For information about how three popular search engines stack up, check “Recommended Search Engines” from the University of California Berkeley (http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/SearchEngines.html).
If you want to try Boolean operators, take the time to really learn how to use them. You can find a good beginning tutorial on YouTube at http://www.youtube.com/watch?v=xsSZps3NH-M and at Penn State University (http://www.sgps.psu.edu/foweb/lib/boolean_search/refine.html).
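One way to picture OR, AND, and NOT is as operations on sets of pages. The Python sketch below assumes a tiny made-up index in which four hypothetical pages mention “ireland,” “eire,” or both. It illustrates the logic only; it does not show how any particular search engine handles these operators.

```python
# Hypothetical index: search term -> set of pages containing it (made up).
INDEX = {
    "ireland": {"p1.example", "p2.example", "p3.example"},
    "eire": {"p2.example", "p4.example"},
}

def pages_for(term):
    """Return the set of pages containing the term (empty if none)."""
    return INDEX.get(term.lower(), set())

ireland = pages_for("ireland")
eire = pages_for("eire")

either = sorted(ireland | eire)         # ireland OR eire: union, the largest set
both = sorted(ireland & eire)           # ireland AND eire: intersection, smaller
ireland_only = sorted(ireland - eire)   # ireland NOT eire: difference

print("OR: ", either)
print("AND:", both)
print("NOT:", ireland_only)
```

Here OR returns four pages, AND returns one, and NOT returns two, which matches the rule of thumb that OR widens a search while AND and NOT narrow it.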
Searching with a Web directory
There are two main types of directories: hierarchical (i.e., they lead from a general topic to a more specific one) and those that list sources in some sort of order (most commonly alphabetical). The first type of directory often contains a broad range of topics, while the second usually contains sources designed to address a particular topic or concern. The Open Directory Project has a hierarchical directory. This can be helpful if you want to get a feel for a topic.
1. Point your browser to http://www.dmoz.org/
2. Find a topic in the list that interests you; for example, Science.
3. Click on the next set of links; e.g., Science and Society.
4. Follow the next set of links until you arrive at a specific web page.
5. Try the search with another engine using a keyword or phrase similar to the one that got you to the specific page.
Useful sites for finding information
If you’re stuck, try one of these sites:
Librarians’ Internet Index (http://www.ipl.org/) Librarian-reviewed web sites and material on a host of different topics. While this site is not exhaustive, you’ll get quality information on a large variety of topics.
About (http://www.about.com) Practical information on a large variety of topics written by trained professionals.
Refdesk (http://www.refdesk.com) Reviews and a search feature for free reference materials online.
Mashpedia (http://www.mashpedia.com/) Online encyclopedia of “LiveDocs,” which are dynamic web documents displaying blocks of content related to the given topic, retrieved from multiple sources across the Internet in real-time.
RefSeek (http://www.RefSeek.com) A search engine for students and researchers that aims to make academic information easily accessible to everyone. RefSeek searches more than one billion documents, including web pages, books, encyclopedias, journals, and newspapers.
Wayback Machine (http://www.archive.org/web/web.php) Search the Web the way it was, including texts, audio, moving images, and software, as well as archived web pages.
Other strategies
Expand your searching beyond search engines. Be aware that approximately 99% of content on the Internet doesn’t show up on typical search engines, so think about other ways of searching. Be creative and think about what you have access to that might have the information you are looking for. For example, if you’re looking for:
Information likely to be discussed on newsgroups or blogs, check sites like Delicious (http://www.delicious.com/), Technorati (http://technorati.com/), and Digg (www.digg.com).
Information about current topics, try using the advanced search with the engine of your choice. Most search engines allow for news and article searches.
Data that might be on a government site, try starting with the Library of Congress (www.loc.gov) or The White House (http://www.whitehouse.gov/). If the data you’re looking for concerns a state or foreign country, try looking for a specific web site for that political entity.
Some other strategies you can try:
Search for databases. Using any search engine, enter your keyword alongside "database" to find any searchable databases (for example, "running database" or "woodworking database").
Get a library card. Many public libraries offer access to research databases for users with an active library card.
Stay informed. Regularly reading blogs or other current guides about Internet searching will keep you up to date on the latest search techniques.
Practice. Just like with other types of research, the more you practice searching the web, the better you will become at it.
Don’t give up. Most researchers will tell you that there’s great information on the Web; you just have to have the skills and patience to find it.
In summary
The World Wide Web is a great resource, but it doesn’t contain all the information that you can find at the library or through library online resources. It’s good practice to widen your searches to include print resources and library databases. Be sure to try more than one search engine, and use the clearest terms you can to frame your search.