Hacking with Search Engines – Google and Beyond Joshua Davis CISSP, CISA, CISM Joshuad at qualcomm dot com
Outline Acknowledgements Introduction Google search basics Google operators Real world examples What can you do to protect your interests?
Acknowledgements The best source for Google-based hacking is Johnny Long, who runs http://johnny.ihackstuff.com and wrote a book about it called “Google Hacking”. This presentation uses techniques and examples found in Johnny’s book and George Kurtz’s RSA 2006 presentation.
Introduction Search engines are designed to collect and index data in a way that makes it easy for users to locate. They use various methods for prioritizing search results. Possible questionable uses for search engines: invasion of privacy; competitive intelligence; trolling sites for information that can be used to attack information systems.
NOTE that these slides are very busy because there’s so much information to convey in a small amount of time. Ask me if you have questions about something.
Random but related points Learning this stuff actually makes you a much more effective user of search engines Google is by far the best engine to focus on because it provides the most functionality
Google search basics Web search News search Image search Translation engine Multi-language interface including “Hacker” and “Elmer Fudd” ☺ Turn off SafeSearch Search for results in any language
Google search basics cont. Queries are not case sensitive. Google wildcards != regex wildcards: (*) = a single word in a search phrase; (.) = a single character in a search phrase. Using * within a word will not alter output versus the word itself.
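To make the contrast with regular expressions concrete, here is a minimal Python sketch that emulates the single-word (*) rule from this slide with a regex. The helper name phrase_to_regex is made up for illustration, and this only approximates the behavior described above; it is not how Google implements matching.

```python
import re

def phrase_to_regex(phrase: str) -> re.Pattern:
    """Hypothetical helper: treat a lone * as 'exactly one word'."""
    parts = []
    for token in phrase.split():
        if token == "*":
            parts.append(r"\S+")            # one run of non-space characters = one word
        else:
            parts.append(re.escape(token))  # every other token is matched literally
    # Words are separated by whitespace; queries are not case sensitive.
    return re.compile(r"\s+".join(parts), re.IGNORECASE)

pattern = phrase_to_regex("three * mice")
print(bool(pattern.search("Three blind mice")))       # True
print(bool(pattern.search("three very blind mice")))  # False
```

In a real regex, * is a quantifier that repeats the preceding element, which is why the slide warns that the two wildcard notations are not interchangeable.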
Google automatically stems diet=dietary=dietician
Google ignores certain words. Common ones are who, where, what, the, an, in, a. Ignored words are still searched for inside quoted phrases. Add (+) before a word to force it to be searched. Example: “+and”. Add (-) before a word to exclude results containing it.
Don’t expect first search results to be what you want
Google search basics cont. Ten word limit. (*)s not counted as words “+the * *” produces 15.3 billion results. #1 result is The Onion.
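The word-limit rule can be sketched as a small check. This is a rough illustration only: the function names are hypothetical, and splitting on whitespace is an assumption about how words are counted, not a documented Google behavior.

```python
def counted_words(query: str) -> int:
    """Count whitespace-separated tokens, skipping lone * wildcards."""
    return sum(1 for token in query.split() if token != "*")

def within_limit(query: str, limit: int = 10) -> bool:
    """True if the query stays within the slide's ten-word limit."""
    return counted_words(query) <= limit

print(counted_words("+the * *"))  # 1
print(within_limit("a b c d e f g h i j k"))  # False
```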
Boolean operators: AND is useless, because Google already requires every term by default. NOT = (-) with no space before the word. Example: hacking -phlegm. OR = (|) between two or more words.
Google allows submission of hex-encoded URLs: http://www.google.com/search?hl=en&lr=&q=%22joshua+davis%22+san+diego+issa&btnG=Search
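A URL of this shape can be produced with Python's standard library. The sketch below is illustrative: build_search_url is a hypothetical helper, the base URL and the hl and q parameter names are copied from the slide's example, and the other parameters in that example are left out for brevity.

```python
from urllib.parse import urlencode

def build_search_url(query: str, base: str = "http://www.google.com/search") -> str:
    """Percent-encode a query string into a search URL."""
    # urlencode turns the double quote into %22 and each space into +.
    params = {"hl": "en", "q": query}
    return f"{base}?{urlencode(params)}"

url = build_search_url('"joshua davis" san diego issa')
print(url)
# http://www.google.com/search?hl=en&q=%22joshua+davis%22+san+diego+issa
```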
In general, you must start with a base search then use result reduction techniques
Google Operators Google offers operators to help you perform detailed queries! There are many of them. The ones we’ll focus on today are: filetype (and -ext), site, intitle, allintitle, inurl, allinurl, allintext, phonebook
Format is operator:search_term, with no spaces around the colon. Some operators must be paired with search terms. Some search operators may have Boolean operators applied. Example search: “confidential site:qualcomm.com -site:www.qualcomm.com”
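The operator:search_term format can be composed programmatically. The following is a minimal sketch, assuming a hypothetical helper named operator; quoting multi-word terms is an illustrative choice, and the query built here is the example from the slide.

```python
def operator(name: str, term: str, exclude: bool = False) -> str:
    """Format one operator as name:term, with no space around the colon."""
    if " " in term:
        term = f'"{term}"'            # keep multi-word terms together as a phrase
    prefix = "-" if exclude else ""   # a leading minus applies NOT to the operator
    return f"{prefix}{name}:{term}"

query = " ".join([
    "confidential",
    operator("site", "qualcomm.com"),
    operator("site", "www.qualcomm.com", exclude=True),
])
print(query)  # confidential site:qualcomm.com -site:www.qualcomm.com
```

The result keeps pages under the domain while dropping the main www host, which is the result-reduction approach the slides describe.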
Google Operators cont. Intitle, allintitle: Search within the HTML title of a page
Google Operators cont. Inurl, allinurl: Find text within a URL
Google Operators cont. Site: Limit search to specific domains
Google Operators cont. Filetype: Search for files of a specific type Think txt, pdf, ps, rtf, xls, doc, mdb, csv, asp, php, cgi, jsp, cfm, shtml, html, etc.
Google Operators cont. Allintext: Search for a string within the text of a page. Think “find this string anywhere except the page title, URL, or links”.
Google Operators cont. Phonebook: Search for a phone listing Limit results using rphonebook and bphonebook
Other Google operators of note Link: Search for links to a page Search for servers that shouldn’t be linked to.
Inanchor: Locate text within link text. Searches the text shown for a link, not the actual URL it points to.
Cache: Show the cached version of a page Leverage for anonymity
Numrange: Search for a number Ignores currency and commas
Other Google operators of note For newsgroups Author: Search for particular author Group: Search group titles Insubject: Search group subject lines
Hacking with Google You are rarely going to find an obvious issue, but you will find valuable nuggets Be creative Look for servers with known vulnerabilities Locate install of certain applications Look for remote access portals to grind against Look for system configuration files and file types Look for error messages Search for email addresses
Hacking with Google cont. Map all the web servers in a domain Map internal and proxy IP addresses by searching news posts Prioritize your attack techniques based on number of search results Target web-enabled network devices Locate targets by source code Look for web-cams, printers, and VOIP phones
Top 10 searches you should perform on your own domains
1. intitle:index.of
2. error | warning
3. login | logon
4. username | userid | employee.ID
5. password | passphrase | passcode | "your password is"
6. admin | administrator
7. -ext:html -ext:htm -ext:shtml -ext:asp -ext:php
8. inurl:temp | inurl:tmp | inurl:backup | inurl:bak | inurl:log
9. intranet | helpdesk
10. confidential | restricted | "not for distribution"
Don’t forget to search in multiple languages if applicable!
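One way to run these against your own domain is to prefix each search with a site: restriction. The sketch below only generates the query strings; audit_queries is a hypothetical helper and example.com is a placeholder for a domain you control.

```python
# The ten searches from the slide, written with plain hyphens and quotes.
TOP_SEARCHES = [
    "intitle:index.of",
    "error | warning",
    "login | logon",
    "username | userid | employee.ID",
    'password | passphrase | passcode | "your password is"',
    "admin | administrator",
    "-ext:html -ext:htm -ext:shtml -ext:asp -ext:php",
    "inurl:temp | inurl:tmp | inurl:backup | inurl:bak | inurl:log",
    "intranet | helpdesk",
    'confidential | restricted | "not for distribution"',
]

def audit_queries(domain: str) -> list[str]:
    """Scope each of the ten searches to a single domain."""
    return [f"site:{domain} {search}" for search in TOP_SEARCHES]

for q in audit_queries("example.com"):  # placeholder domain
    print(q)
```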
Automated search tools Gooscan Linux-based tool that enables bulk searches. Requires signing up for a Google account, which allows a limited number of API-based searches.
Athena Doesn’t use the APIs, so it violates the Google TOS http://snakeoillabs.com
SiteDigger http://www.foundstone.com/resources/proddesc/sitedigger.htm Similar to Gooscan, but runs on Windows.
Wikto Get help from Google at http://services.google.com:8882/urlconsole/controller?cmd=reload&lastcmd=login
Beyond Google Maps Google Phone look-ups Desktop search and across-computer searches Revamped MSN search Revamped Yahoo! Search Dedicated hacker engines
Conclusion If you can’t 100% control your web content, scout your own domains before someone else does Take steps to protect privacy