November 1, 2014
Power Reporting Resources For Journalists

Web searching: a tutorial on search strategy and syntax



1. Search strategy
2. Rules of the road
3. Boolean logic
4. Search syntax
5. Who's to blame?
6. Learn more



"Trying to use the Internet is like driving a car down a narrow road in a snow storm, a car in which the windshield wipers and headlights don't work. All of the signs along the highway are backwards and upside down and of no help at all. Finally when you see someone along the side of the road and stop for directions, they can only speak to you stuttering in Albanian."

Mike Royko, One More Time: The Best of Mike Royko



Intro:

What's the best search engine? The one you learn to use well.

Search engines can help you find information on the World Wide Web, but you'll get more chaff than wheat unless you learn general search strategies and the particular search syntax for your favorite search engine.

Many people think that the value of a search engine is this: Put in a little information, and you get a lot in return.

But the best searching is based on this principle: Put in a lot, and you might just get back what you want.

The following tips use AltaVista Advanced as an example, because it has a great deal of flexibility in phrasing searches. You can now do many, but not all, of these tricks at Google, especially if you use its Google Advanced page to employ and learn its advanced syntax. To learn the lingo for your favorite search tool, look for "Help" or "search tips" on its main page.











Strategy:


Ten strategies for better Web searching:

  • 1. Be aware of what you're missing: Often more authoritative information is found off the Web, in books or journals or newspapers. On the Web, you may find more useful information in commercial services or government sites, such as those listed at Power Reporting. And much of what's on the Web is not found by search engines, as is made plain by Gary Price's Direct Search. Instead of searching, you may do better to guess the Internet domain name you want -- harvard.edu or insinkerator.com or whitehouse.gov or army.mil or redcross.org -- if you know what organization is likely to have what you want, and you think you'll find it by going through the front door. Your news librarian or researcher is the best guide to finding the research tool with the right balance of accuracy, cost and timeliness.

  • 2. Try a directory: Yahoo is not a search engine. It's a directory. Directories list Web sites, as opposed to pages, sorted by category. Directories include only the titles and descriptions of Web sites, not the text of their Web pages. So they have less info, but may be a better place to start. Examples include Yahoo and Google's implementation of the free, public Open Directory. Have you noticed that link at the top of Google to "Directory"? That's it. Directories are made by people, not compiled by computers, so they're sometimes quicker to follow the news, and you get less chaff. They are organized in categories, like a library, encouraging the serendipity of finding something great next to the one you were looking for. (There is no "next to" in a search engine, there being no shelves.) So use a directory if you're in the journalistic situation of wanting basic, introductory information about a common subject, and you're likely to find that information by going in the front door of the site. But use a search engine if you have already formulated a question, and want to find Web pages using the words in your question or the words that you anticipate will be in the answer. Notice that even Yahoo has an Advanced Search page to look through its half million listed sites. But if you want to find a more obscure bit of information, or any references to a word, then use a search engine, which looks at the text, word by word, of millions of Web pages.

  • 3. Envision the result: Forget keywords when using a search engine. Remember that you're searching the complete text. Don't search for words about the subject. Search for words that will be on the page. Ask yourself, "If I were making the perfect page, just what I'm looking for, what words would I have to use to do it?" In a directory you'd search for words about the subject, because that's all that a directory knows. But in a search engine, search for words that must be in the text of the page, because that's all that a search engine knows. If you want a list of children who have lived in the White House, don't think presidential progeny. Think chelsea and clinton and amy and carter and tricia and nixon and "white house." (We'll get to quotation marks below.) If you want a list of Super Bowl sites for the next three years, don't think "super bowl locations." Think 2001 and 2002 and 2003 and "super bowl." Under the theory that any good listing of future sites will also show where the games have been, you might add and 1983 and 1972 and pasadena and "new orleans". Try it! It helps to write down, or at least to imagine writing, the perfect page.

  • 4. Think of it as a zoom lens: The best possible first search will yield no results. Why? Because it's easier to back out from zero than to zoom in from 187,000. What's the first thing you do with a zoom lens on a camera? You zoom in all the way, then pull back until the subject fills the frame. Of course, the most successful search is ultimately one that finds what you're looking for. But first go for broke, then remove one restriction at a time to pull back.

  • 5. Search for more to get less: Add more restrictions at first. You'll get less, because few items will meet all of those conditions. Less is better. For example, if you want a list of Paul Newman movies, "paul newman" and movies will bring in a lot of stuff you don't need. But "paul newman" and sting and "butch cassidy" and "cool hand luke" and hudsucker and hustler will zoom in. The more arcane an item you can think of, the less you'll get. So use an old one, and the newest one you know, and the most arcane, if you're trying to find a complete list.

  • 6. Tinker: Expect to search several times. This is the first thing you'd notice by watching over the shoulder of professional news librarians: They don't search once, perfectly. They search over and over, circling the prey. Remove one restriction at a time. (That's the essence of troubleshooting.) Try being more restrictive here, and less there. Then try it the other way around. There is no perfect search, but often there is a pretty good combination of searches. For example, title:"jesse ventura" would insist that "jesse ventura" be in the title of the page. That's a good first step. But if you get nothing in return, you merely have to use your mouse to delete title: and then "search" again to try it without that restriction. Now you still want the governor, but anywhere on the page, not just in the title.

  • 7. Don't scroll: There's no crying in baseball, and there's no scrolling on the Web. If you turn up hundreds or thousands of hits, scrolling through them is unlikely to find the one you want. Better to refine your search. And when you choose a page to look at, remember that you can use your browser's "Find" command to quickly find any instance of your search terms on that one page, just as in a word processor. (Often it's under the "Edit" menu.)

  • 8. Use only what you're sure of: If you want to find a list of states, showing whether each state has the death penalty, you can't assume that the maker of the page will use the words "death penalty." It could be "capital punishment." So use only the words you are sure of, or hedge your bets with or: try ("death penalty" or "capital punishment") and texas and missouri and georgia.

  • 9. Use anything you know: You can assume that every state name is on the list of states with or without the death penalty, or it's not the page you're looking for. So think ahead. Include in your search not only states with the death penalty, but states without. Try ("death penalty" or "capital punishment") and texas and minnesota and "new york" and "north dakota." It may seem onerous to type in so many state names, but it's more reliable, and quicker, than scrolling through too many hits.

  • 10. Use what you learn: As you search, whenever you see a fact in the results list or on a page, use it. Go back to the search engine and throw that fact onto the search. The more arcane the item, the more you'll narrow to pages that are just on that subject. For example, if you want a list of Robert Penn Warren's works, and all you know is the most famous one, start with "robert penn warren" and "all the king's men." You'll learn that he also wrote "Blackberry Winter" and "Band of Angels." Instead of reading more Web pages about "All the King's Men," go back: Add and "blackberry winter" and "band of angels" to the search, getting closer to only those pages that list all of his works.
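The zoom-lens idea in strategies 4 through 6 can be sketched in a few lines of Python. This is only an illustration, not anything a search engine provides: the function names build_query and zoom_out are made up for this example, and the list of terms is assumed to run from the one you're surest of to the most arcane, so the most arcane restriction is the first to go.

```python
def build_query(terms):
    """Join every term with 'and', as in the examples above."""
    return " and ".join(terms)


def zoom_out(terms):
    """Yield the strictest query first, then drop one restriction at a time.

    Assumes terms are ordered from most certain to most arcane, so each
    step removes the last (most arcane) remaining term.
    """
    for count in range(len(terms), 0, -1):
        yield build_query(terms[:count])


# Multi-word phrases are already in double quotes, per the rules below.
newman = ['"paul newman"', "sting", '"butch cassidy"', '"cool hand luke"']
queries = list(zoom_out(newman))
for query in queries:
    print(query)
```

Paste the first line into the search box. If it comes back empty, try the next line, and so on, until the subject fills the frame.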


Rules of the road:


For advanced searching on AltaVista, we'll follow these rules of the road, which are similar to those used by other search engines:

  • Use advanced search: First, make sure you're using the AltaVista Advanced search form. You can always get there from altavista.com -- just choose "advanced."

  • Put phrases or sentences in double quotes: Most search engines require this. As in "capital punishment" or "chicago tribune." Why quotes? Because otherwise AltaVista is expecting an and or an or between the words. You don't need to put every item you search for in quotes, just those items that are more than one word. Without the quotes, AltaVista will just laugh at you. With the quotes, you can find documents or songs or speeches if you know literal phrases from their text. To find the words to Joni Mitchell's "A Case of You," search for mitchell and "I could drink a case of you." This type of search is literal, so you have to get the punctuation just right (being careful of commas, hyphens, etc.) to find just what you want. If you're not sure of the punctuation, just include shorter phrases in quotes, separated by and; or else try both variants separated by or. Don't be afraid to go for broke with a long phrase or sentence.

  • Use lowercase letters: Typing in all lowercase letters will find any instance of the words or phrases, regardless of capitalization. Typing in caps and lower case, or all caps, will look only for those instances.

  • Type in the big box: At AltaVista Advanced, we'll type our commands in the "Boolean expression" box -- the large box. Why? The advanced page lets you use Boolean terms: and, or, and not, near. The advanced page knows the advanced search syntax: title, link, etc. And the advanced page lets you search only pages with certain dates.
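The quoting and lowercase rules above can be modeled in a short Python sketch. The helper names format_term and case_rule_matches are invented for this illustration, and the second function is only a model of the capitalization rule as described here, not AltaVista's actual code.

```python
def format_term(term):
    """Lowercase a term, and put it in double quotes if it is more than one word."""
    term = term.strip().lower()
    return f'"{term}"' if len(term.split()) > 1 else term


def case_rule_matches(query_word, page_word):
    """Model the capitalization rule described above.

    An all-lowercase query word matches any capitalization on the page;
    a query word containing capitals matches only that exact form.
    """
    if query_word == query_word.lower():
        return query_word == page_word.lower()
    return query_word == page_word


print(format_term("Chicago Tribune"))           # "chicago tribune"
print(format_term("Texas"))                     # texas
print(case_rule_matches("turkey", "Turkey"))    # True
print(case_rule_matches("Turkey", "turkey"))    # False
```

The last two lines show why lowercase is the safer habit: turkey finds the bird and the country, while Turkey finds only the capitalized form.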











Boolean logic:


There are four Boolean operators, or connectors: and, or, not, and near.

These four help you include or exclude pages from your search. And they're required when you type into AltaVista Advanced's Boolean search box. You can't just type army bases georgia to find pages on bases in Georgia. You have to type army and bases and georgia. Some search engines, but not AltaVista, require the connectors to be typed in ALL CAPS.





Imagine two sets of Web pages: those that refer to Mark McGwire and those that refer to Sammy Sosa. Of course, the pages would overlap, but not entirely. By the way, why is the McGwire circle bigger?

mcgwire, sosa



And narrows your search to fewer items by insisting that more conditions be met. So you retrieve only those documents containing both words or phrases. This yields the intersection of the two circles:

mcgwire and sosa






Or is the most dangerous word in searching. It widens your search by allowing either item (or one of several possibilities) to be in the text. Both words might be in the text, but not necessarily. So you use or when you're not sure how a word or phrase might be expressed. Or is also good for hedging your bet when you don't know the spelling of a word (feiger or fieger), or when you don't know if a word will be abbreviated (mississippi or miss.) Using or retrieves all of both circles:

mcgwire or sosa

And not narrows the search by leaving out a subset of items, such as mcgwire and not sosa. (Note the tricky part: Some search engines will let you use not, but on AltaVista it's and not. Think of it this way: mcgwire is an item, and not sosa is an item, so if you want both you connect them with an and. Yes, this one makes my head hurt.)

mcgwire and not sosa

sosa and not mcgwire
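If the circles are hard to picture, Python's built-in sets behave the same way. In this sketch the page numbers are invented purely for illustration; each set stands for the pages that mention one player.

```python
# Hypothetical page numbers, made up for this example.
mcgwire = {1, 2, 3, 4, 5}   # pages that mention McGwire
sosa = {4, 5, 6}            # pages that mention Sosa

both = mcgwire & sosa            # mcgwire and sosa: the overlap
either = mcgwire | sosa          # mcgwire or sosa: all of both circles
mcgwire_only = mcgwire - sosa    # mcgwire and not sosa
sosa_only = sosa - mcgwire       # sosa and not mcgwire

print(sorted(both))           # [4, 5]
print(sorted(either))         # [1, 2, 3, 4, 5, 6]
print(sorted(mcgwire_only))   # [1, 2, 3]
print(sorted(sosa_only))      # [6]
```

Notice that and gives the smallest result and or the largest, which is why or is the dangerous one.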

Parentheses, as you've noticed, can help you and AltaVista keep the Boolean logic straight. AltaVista evaluates the operators in this order: near, not, and, or. Now look at the example mcgwire or sosa and maris. See the problem that arises when the computer looks for an "and" before it looks for an "or"? AltaVista reads the search as mcgwire or (sosa and maris). It's not "sosa and maris" that you want emphasized; instead, "mcgwire or sosa" is the first decision you want made. To put it another way, you aren't saying that you want "sosa and maris," but if that can't be found you'll just take any reference to mcgwire, right? You want either of the current ballplayers ("mcgwire or sosa") on the same page with the old player (maris). So sort it out with parentheses: (mcgwire or sosa) and maris.

(mcgwire or sosa) and maris
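Python's set operators happen to have the same quirk, which makes for a handy demonstration: & (and) is evaluated before | (or), just as AltaVista evaluates and before or. The page numbers below are invented for illustration.

```python
# Hypothetical page numbers, made up for this example.
mcgwire = {1, 2, 3}
sosa = {3, 4, 5}
maris = {2, 5, 6}

# Without parentheses, & is applied first: mcgwire | (sosa & maris).
without_parens = mcgwire | sosa & maris

# With parentheses, the "or" decision is made first.
with_parens = (mcgwire | sosa) & maris

print(sorted(without_parens))   # [1, 2, 3, 5]
print(sorted(with_parens))      # [2, 5]
```

Without parentheses you get every McGwire page, whether or not Maris is on it. With them, every page in the result mentions Maris.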

Near means within 10 words in either direction. (Most search engines don't have this ability.) Near provides a middle ground between finding hillary and clinton on the one hand, which would give you every junior high school graduation list in America, and "hillary clinton," which would be too restrictive and miss references to "Hillary Rodham Clinton" or "President Clinton and his wife, Hillary." So try hillary near clinton to find those words within 10 words of each other, in either direction. Of course, you'll get some false positives, but not too many to look through. Near is also the trick to use when you want results with or without a middle initial in a name. It's also good for narrowing your search by subject; for example, if you need a restaurant by the Spanish Steps in Rome, try something like "spanish steps" near (restaurant or lunch).
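Here is a rough Python model of what near does, assuming the 10-word window described above. The function name near and its way of splitting text into words are choices made for this sketch; how AltaVista actually counts words is not something this page documents.

```python
import re


def near(text, first, second, window=10):
    """Return True if the two words appear within `window` words of each
    other, in either order. Matching ignores capitalization."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    first_at = [i for i, word in enumerate(words) if word == first]
    second_at = [i for i, word in enumerate(words) if word == second]
    return any(abs(i - j) <= window for i in first_at for j in second_at)


print(near("Hillary Rodham Clinton spoke today", "hillary", "clinton"))
print(near("President Clinton and his wife, Hillary", "hillary", "clinton"))

# Fifteen filler words between the names puts them outside the window.
far_apart = "hillary " + "word " * 15 + "clinton"
print(near(far_apart, "hillary", "clinton"))
```

The first two lines print True, catching the middle name and the reversed order that the quoted phrase "hillary clinton" would miss. The last prints False.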






Search syntax:


The power of AltaVista comes out when you use these commands to zoom in or out. Note that the AltaVista syntax requires the command word and a colon and the item you're searching for, with NO SPACE just after the colon. As in, title:"jesse ventura."

  • title: This is the quickest way to narrow your search. You can use title: to search for Web pages by the words that the page designer used to name that page. The title is not the same as the headline, though it may contain the same words. Look up right now in the title bar of your Web browser -- the title of this page is "Web searching ... a Power Reporting tutorial." The lingo is title:"james thurber." Note that you need to repeat the title: if you want more than one item in the title. So, for example, it's title:draft and title:"declaration of independence" to find Jefferson's rough draft.

  • date: This isn't a command you type out, but you can set dates on the AltaVista Advanced search screen; look for a box that lets you fill in a start date and end date to limit the results. Note, however, that the date is when the page was updated in any way, not necessarily when the content changed.

  • asterisk: The *, known as a wildcard character, is useful in any search to allow for different endings of a word. Widen your search slightly by allowing for variations in spelling or word form. The asterisk stands for any number of characters, from 0 to as many as it happens to find. As in spous* to find spouse, spouses or spousal. Or if you can't remember how to spell a name: gorb* will find "Gorbachev." Notice: no space before the asterisk. This also works with phrases, so "spou* abuse" returns "spouse abuse" or "spousal abuse." Without the asterisk, AltaVista will not automatically search for plurals of any word, so you might use it on nearly every word in a search, to catch the plurals. The search gods call the use of an asterisk "truncation."

  • domain: You can use domain: to limit your search to certain corners of the Internet, such as only U.S. government sites (domain:gov) or U.S. military (domain:mil) or higher ed (domain:edu) or a country (domain:fr for France, etc.). And you can leave out a section of the Web by adding to your search and not domain:edu etc.

  • host: Even more specific than domain, you can use the host: command to include or exclude pages on a certain Web site. For example, to find pages about Tom Brokaw on the NBC site, search for brokaw and host:nbc.com. To find pages on Tom Brokaw that aren't on the NBC site, try brokaw and not host:nbc.com. You can use this command to point to pages on only one subsection of a Web site. Pages on the Medill School of Journalism site at Northwestern University are found by host:medill.nwu.edu.

  • image: Handy for finding photos of a place or person. The image: command looks in the file names of the images on the Web page. Try image:nader and "ralph nader" to find only pages that have his name on the page, and his photo as well. (Or at least images that are called nader.gif or nader.jpg or ralphnader.gif, etc.)

  • link: An MCI friends or enemies list. The link: command finds pages with hypertext links that contain the word or phrase. After the shootings in the Midwestern U.S. of people of color, by a man connected to the World Church of the Creator, we wanted to find Web sites that link to the church's site. Use link:creator.org to find other groups or individuals who linked their Web pages to creator.org, no matter what the link might be called on the page. (This tool is a favorite of Webmasters, who use it to find out who is linking to their sites.) To narrow further to such sites that are at colleges and universities, where the group recruits, try link:creator.org and domain:edu.

  • url: An arcane but sometimes useful bit of syntax, url: finds only Web pages with a certain word or string of characters in their Web address, or Uniform Resource Locator. For example, as a last ditch effort to find Web pages referring to the Montana Freemen, try url:freemen. That would find pages called freemen.html or pages in a subdirectory called freemen, etc.

  • anchor: Also arcane, anchor: searches for a word or phrase that appears on a page as the label for a hypertext link. It doesn't look at the Web address -- for that you use link: -- but in the label. So anchor:"power reporting" would find this page, because that phrase appears in this link: go here for Power Reporting.

  • text: Still more arcane -- and sort of the reverse of "link:" -- text: insists that the word or phrase appear in the visible text of a Web page, not in any link or keyword or image file name, or other bit of HTML lingo.

  • like: Finds pages with similar content, based on the words on the pages. So like:powerreporting.com finds other sites for journalists -- maybe.
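Two of the rules above are easy to get wrong by hand: no space after the colon, and what the asterisk will and won't match. This Python sketch models both. The helper names field and truncation_pattern are invented for this illustration, and the pattern is only an approximation of truncation as described above (the asterisk standing for zero or more characters), not AltaVista's own matching.

```python
import re


def field(command, item):
    """Format a field search as command:item, with no space after the colon.
    Items of more than one word go in double quotes."""
    item = item.lower()
    if " " in item:
        item = f'"{item}"'
    return f"{command}:{item}"


def truncation_pattern(term):
    """Turn a truncated term such as spous* into a whole-word pattern,
    where * stands for zero or more word characters."""
    escaped = re.escape(term.lower()).replace(r"\*", r"\w*")
    return re.compile(rf"\b{escaped}\b")


# title: must be repeated for each item you want in the title.
rough_draft = " and ".join(
    [field("title", "draft"), field("title", "Declaration of Independence")]
)
print(rough_draft)

spous = truncation_pattern("spous*")
for word in ["spouse", "spouses", "spousal", "sprout"]:
    print(word, bool(spous.search(word)))
```

The first line printed is title:draft and title:"declaration of independence". The loop shows spous* catching spouse, spouses and spousal, but not sprout.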




Who's to blame for all this?


Boolean logic comes from George Boole, a British mathematician, who laid out the rules in 1854 on his own Web page, "An Investigation of the Laws of Thought, on Which are Founded the Mathematical Theories of Logic and Probabilities." Boolean logic merged logic and algebra, laying the foundation for the digital revolution.

George had this to say, "No matter how correct a mathematical theorem may appear to be, one ought never to be satisfied that there was not something imperfect about it until it also gives the impression of being beautiful."

Today's quiz: Find a photo on the Web of George Boole. How about "george boole" and "boolean logic" and image:boole. You'll find this:

George Boole, 1815-1864


Learn more:


  • This Search Tools Chart compares the various search engines and directories. If you want a handy exercise, try the same search at each search tool.

  • Take the Web searching treasure hunt from Power Reporting.


You can reach Bill Dedman by e-mail at Bill@PowerReporting.com.



COPYRIGHT 1997-2005 Bill Dedman