In this document, you can learn what types of information are available on the Internet, how that information has been made searchable through web databases, the role AskScott plays in helping you select the proper database, and how you can get better results from your searches.

Section 1 - Information accessible through the Web

The World Wide Web is the single most extensive information source today, with millions of authors from almost every country. Over the past few years, the WWW has brought together easy access to web pages and to other information sources such as Gopher, FTP, Usenet news, and Telnet. The interesting thing is that, for the most part, this information is free once you have access to the Internet.

Why is it all free? Where did it come from?

People and organizations fund and publish this information for different reasons. There are five main types of web pages, grouped by who publishes them.

  1. Educational institutions - Universities and other educational institutions place information on the web to "publish" it. They may be trying to attract people to their institution, but more often the research was funded through a grant, and the web is one more way to encourage the sharing of ideas.
  2. Non-profit organizations - These groups may be trying to raise awareness of their cause and carry out their mission through WWW publishing. Many such groups publish and distribute information, and the web offers a very inexpensive way of getting it to many people.
  3. Governmental organizations - The U.S. Government, for example, has a considerable amount of information on-line. This information, which used to be published on paper, is now put online to save money and reach more people. In addition, libraries fall into this category, providing information for others to use.
  4. Personal Home Pages - These pages are put up for a number of reasons. One is to "publish" your thoughts and ideas without risking large financial losses. Some people have pages up in order to meet others with similar interests. A large portion of personal pages present information on a topic the author really enjoys and wants to share expertise about (whether that be a TV show, a game, or web databases). The WWW has been compared to the printing press: it opens up publishing to a new portion of the population.
  5. Commercial pages - These pages are on the web to make money. Most of the time, they provide a service of some type. Most of the web databases discussed here are run by companies that make money from the advertisements that appear on the screen. Other businesses use the web as an advertising medium, simply providing information about their products and services.

Can I trust the information?

Not always. As you can see, there is no filter between the author and the published work. The editor, who provides a level of quality control in print publishing, does not exist for most WWW pages. It is important to learn who created the information so that you can judge how reliable it is. (Of course, printed information has the same problem. However, the lower cost of publishing on the web invites more problems in this area.)

This is not something to worry about, but it is something to keep in mind.

Section 2 - Web databases

The Internet was designed to survive a war by having no centralized computer. As a result, there is no centralized catalog like the one in a library. The need for an organized tool to search this mass of changing information is met by several "web databases".

How are they created?

The majority of these databases are created and maintained by computer programs called "robots" or "spiders". These programs search the web for new pages and record information about each page. Some databases save every word on a page; these are known as "full-text" databases. Others remove common words (words on a "stop list") in order to save disk space. Still other spiders examine a page only for its "most important" words (known as keywords).
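
To make the difference between a full-text database and one that uses a stop list concrete, here is a minimal Python sketch of an inverted index, the word-to-pages lookup table a spider might build. The page contents, URLs, and the short stop list are invented for illustration, and real web databases store far more (word positions, titles, link data); this only shows which words get kept.

```python
import re

# A tiny illustrative stop list (an assumption); real engines used much longer lists.
STOP_LIST = {"the", "a", "an", "and", "of", "to", "is", "in"}


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages, stop_list=frozenset()):
    """Map each word to the set of page URLs that contain it.

    With an empty stop list this behaves like a full-text index; passing
    a stop list drops common words to save space.
    """
    index = {}
    for url, text in pages.items():
        for word in tokenize(text):
            if word in stop_list:
                continue  # skip words on the stop list
            index.setdefault(word, set()).add(url)
    return index


# Hypothetical pages a spider might have fetched.
pages = {
    "page1.html": "The history of the American West",
    "page2.html": "A recipe for eggplant stew",
}

full_text = build_index(pages)
stopped = build_index(pages, STOP_LIST)
print(len(full_text), len(stopped))  # 10 7
```

Both indexes can answer a query for "eggplant", but only the full-text index can answer one for "the". That saving in space is the trade-off a stop list makes.
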
Some of the databases are created by humans, however. While a spider may find new pages, a person actually examines the page and writes a review, selects words from a list (known as a "controlled vocabulary") and assigns them to a document, and might select important keywords from the document.

How are these databases accessed?

Again, it varies from database to database. Some databases have a "search engine" attached. A search engine asks the user to type in a term or terms and then searches the database for entries that match the request. Engines vary in sophistication, from those that search only for the words entered to those that also look for synonyms of those words. What all search engines have in common is that the user types in the desired terms.
The other main access method is a subject tree. If words from a controlled vocabulary (a list of predetermined terms) have been assigned to a document (usually by a human), the user can browse the tree to find works on a topic. If you haven't seen a subject tree, take a look at Yahoo.
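
A subject tree can be pictured as nested categories, with documents filed under the headings a human indexer chose for them. The sketch below models that with nested Python dictionaries; the category names and page names are hypothetical and do not reflect Yahoo's actual hierarchy.

```python
# A hypothetical subject tree: categories nest inside categories, and each
# leaf lists the documents an indexer filed under that heading.
SUBJECT_TREE = {
    "Arts": {
        "History": {
            "American History": ["civil-war-timeline.html", "colonial-life.html"],
            "European History": ["roman-empire.html"],
        },
    },
    "Recreation": {
        "Cooking": {
            "Vegetables": ["eggplant-stew.html"],
        },
    },
}


def browse(tree, path):
    """Follow a list of headings down the tree and return what is filed there.

    Raises KeyError if a heading does not exist at that level.
    """
    node = tree
    for heading in path:
        node = node[heading]
    return node


print(sorted(browse(SUBJECT_TREE, ["Arts", "History"])))
# ['American History', 'European History']
print(browse(SUBJECT_TREE, ["Arts", "History", "American History"]))
# ['civil-war-timeline.html', 'colonial-life.html']
```

Browsing never requires guessing the words an author used; the user only picks among headings that already exist.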

Which kind of web database is the best?

That's a complicated question. Different methods of database creation and searching suit different search topics. Studies in library and information science have shown that search engines and subject trees each have their benefits. One web database claims to be better because it records every word on a page. However, if you wanted to find pages about AMERICAN HISTORY, it would be much easier to use a database that picks out only the most important words, or a subject tree, because a full-text database also returns every page that merely mentions those two words in passing.

Thus, the best web database depends upon your search topic.

Section 3 - AskScott's role in finding information

This is where AskScott comes into play. If you come with a search request, AskScott will lead you through a question and answer session to the database that is the best starting point for your search.

Why does it matter what web database I use?

There are two main reasons:

  1. If the pages you're looking for have not been indexed in the database you are using, you won't find the information.
  2. If the search engine/subject tree approach is not the proper one for your search topic, you won't find the information.

AskScott helps you find the database you need, and gives you advice about its operation.

Once I've found a database I like, why should I come back?

AskScott will be updated as databases grow and change. New databases will be examined to see where in the current hierarchy they fall, and old databases will be reclassified if need be. What you are using today may change tomorrow, and AskScott will help you use the best tool for your search need.

What if I don't find any relevant information?

If you are having trouble with your search, there are two things you can do. You can move to a different database and try your search there, or you can examine your search strategy.

Section 4 - Search strategies

The strategy you use in a search engine is the key to finding the information you want. On each page where a search engine is recommended, hints remind you of the ways that engine differs from other engines.

What are some general strategies in using search engines?

Almost every web database lists results in "relevance-ranked" order. This means that the program lists the web pages that best match your request near the top. As a result, you should enter as many relevant "search terms" as possible. The more search terms you can come up with, the better your search will be.
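
The sketch below assumes the simplest possible relevance score: the number of query terms a page contains. Real engines weight terms in more sophisticated ways, so treat this as an illustration of why extra search terms help, not as how any particular engine ranks results. The pages and URLs are made up.

```python
import re


def tokenize(text):
    """Return the set of lower-case words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank(pages, query_terms):
    """Order pages by how many query terms each contains (a toy relevance score)."""
    terms = [t.lower() for t in query_terms]
    scored = []
    for url, text in pages.items():
        words = tokenize(text)
        score = sum(1 for t in terms if t in words)
        if score > 0:
            scored.append((score, url))
    # Highest score first; ties are broken alphabetically so the order is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]


pages = {
    "a.html": "Eggplant stew with tomatoes and garlic",
    "b.html": "Beef stew for winter",
    "c.html": "Growing eggplant in the garden",
}

print(rank(pages, ["stew"]))                           # ['a.html', 'b.html']
print(rank(pages, ["eggplant", "stew", "tomatoes"]))   # ['a.html', 'b.html', 'c.html']
```

With only "stew", the eggplant stew page and the beef stew page tie. Adding "eggplant" and "tomatoes" lifts the page that matches all three clearly to the top.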

Each search engine is different, and the advice on the individual pages will help you, but an important operator to notice is AND (or the + sign, depending on the database). If I enter eggplant AND stew, the program looks for pages that contain both eggplant and stew, anywhere on the page. However, if I enter a "phrase", such as 'eggplant stew', the program looks for pages that have those two words next to each other, in that order (again, check the individual engine to see how to enter a phrase). So, if I wanted recipes for stews that use eggplant as an ingredient, I would enter "eggplant AND stew". If I wanted recipes for eggplant stew specifically, I'd enter the phrase "eggplant stew". Some engines also offer related operators for finding words close together, such as NEAR, ADJ (for adjacent), or W (for with).
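
Here is a minimal sketch of the difference between AND and a phrase search, assuming pages are simply split into lower-case words. The function names and sample pages are hypothetical; each engine has its own syntax and matching rules.

```python
import re


def tokenize(text):
    """Split text into a list of lower-case words, keeping their order."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_and(text, terms):
    """True if every term appears somewhere in the text (Boolean AND)."""
    words = set(tokenize(text))
    return all(t.lower() in words for t in terms)


def matches_phrase(text, phrase):
    """True if the words of the phrase appear next to each other, in order."""
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


page1 = "Eggplant stew with chickpeas"
page2 = "A hearty stew of lamb, served with roasted eggplant"

print(matches_and(page1, ["eggplant", "stew"]), matches_and(page2, ["eggplant", "stew"]))  # True True
print(matches_phrase(page1, "eggplant stew"), matches_phrase(page2, "eggplant stew"))      # True False
```

Both pages satisfy eggplant AND stew, but only the first contains the phrase "eggplant stew".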

Another useful operator to learn is NOT (written as - in some engines). Since the word POLISH can refer to something you do to shoes or to something from Poland, NOT lets you exclude the meaning you don't want. If I want to learn about things from Poland, I would enter "polish NOT shoe".
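
Using the same whole-word model, NOT simply removes any page that contains the excluded word. The helper and sample pages below are illustrative assumptions, not any engine's actual behavior.

```python
import re


def tokenize(text):
    """Return the set of lower-case words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(text, include, exclude=()):
    """Require every `include` term (AND), then reject any `exclude` term (NOT)."""
    words = tokenize(text)
    return (all(t.lower() in words for t in include)
            and not any(t.lower() in words for t in exclude))


pages = {
    "shine.html": "Use a soft cloth to polish each shoe",
    "food.html": "Traditional Polish recipes from Krakow",
}

# The query polish NOT shoe
hits = [url for url, text in pages.items() if matches(text, ["polish"], exclude=["shoe"])]
print(hits)  # ['food.html']
```

Because this sketch matches whole words only, a page that says "shoes" rather than "shoe" would slip through; truncation, described next, is one way to catch both forms.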

Truncation is another good thing to know when searching. Truncation symbols let you find different forms of the same word. For example, suppose you were looking for information on discipline. If you searched for "discipline", you'd find only works containing that exact term. However, if you searched for "disciplin$" (assuming that $ is the truncation symbol in this database), you'd find discipline, disciplinary, disciplinarian, and any other words that start with "disciplin".
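
One way to picture truncation is as a pattern that matches the stem followed by any letters. The sketch translates a term ending in $ (the symbol assumed in the example above) into a Python regular expression; the function name is hypothetical, and search engines implement truncation in their own ways.

```python
import re


def truncation_to_regex(term, symbol="$"):
    """Turn 'disciplin$' into a pattern for any word beginning with 'disciplin'.

    A term without the truncation symbol matches only itself.
    """
    if term.endswith(symbol):
        stem = term[: -len(symbol)]
        return re.compile(re.escape(stem) + r"\w*", re.IGNORECASE)
    return re.compile(re.escape(term), re.IGNORECASE)


words = ["discipline", "disciplinary", "disciplinarian", "disciples", "disciple"]

exact = truncation_to_regex("discipline")
truncated = truncation_to_regex("disciplin$")

print([w for w in words if exact.fullmatch(w)])      # ['discipline']
print([w for w in words if truncated.fullmatch(w)])  # ['discipline', 'disciplinary', 'disciplinarian']
```

Note that "disciple" and "disciples" are not matched: they share only "discipl", not the full stem "disciplin".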

A wildcard is a tool similar to truncation. It identifies other forms of a word in a different way: the symbol can stand in the middle of a word rather than only at the end. If I were looking for works on females, I might search on the term "wom*n" (assuming that * is the wildcard symbol). That would find woman, women, womyn, and any other words that start with "wom" and end with "n".
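
A wildcard can be sketched the same way, except that the symbol may appear inside the word. This example assumes * stands for any run of letters, as in the "wom*n" example, and translates the term into a regular expression; the helper name is made up, and individual engines use different symbols and rules.

```python
import re


def wildcard_to_regex(term, symbol="*"):
    """Turn 'wom*n' into a pattern where each * stands for zero or more letters."""
    parts = [re.escape(p) for p in term.split(symbol)]
    return re.compile(r"\w*".join(parts), re.IGNORECASE)


words = ["woman", "women", "womyn", "wombat", "worn"]
pattern = wildcard_to_regex("wom*n")

print([w for w in words if pattern.fullmatch(w)])  # ['woman', 'women', 'womyn']
```

Using fullmatch ensures the word must both start with "wom" and end with "n"; "wombat" fails the ending and "worn" fails the beginning.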

How come there is so much junk and so few useful sites?

This usually happens when you have entered too few search terms or terms with multiple meanings. Either come up with synonyms for the terms you are searching or use the NOT operator to make your search more precise. In general, the more search terms you can come up with, the faster you'll find relevant pages.

I've got too much information. How can I narrow it down?

If this happens, you need to limit your search. Think about what you are REALLY looking for, and enter terms describing its more unusual aspects. You can also try using a "phrase" instead of an AND for your search (see the above section for details).

Section 5 - AskScott's creation

Why is it called AskScott?

While I could try to come up with some goofy acronym (like Search-engine, Client-Oriented Tool Thingy), I won't. :) It's called AskScott because it simulates a Q&A session with a person and captures a little bit of my thinking (not to mention that it satisfies my ego). ;) So, if you want to know a little bit of my thoughts about the search engines, just Ask Scott!

Who is behind AskScott?

Dr. Scott Nicholson, Assistant Professor and Bibliominer (bibliomining is data mining for libraries) at your service. I'm working with a team of Syracuse University students and alumni to keep AskScott up to date. It's hosted at the Information Institute of Syracuse.

Where does the information in AskScott originate?

Most of it comes from the Help and FAQ screens of the web databases. Some comes from literature, some from original research, and some just comes from personal experience with the web databases. AskScott is a collection point for this scattered information.

