PubSub

To match newly discovered documents to a user's subscriptions, the PubSub Matching Engine stores a search query (also known in some circles as a selector) for each subscription. This page explains how to customize your search query to give you exactly the results you want.

What is a Query

A query is a logical sentence that consists of elements joined by special symbols called Boolean operators. An "element" can be a keyword, a phrase, or several of these grouped together with parentheses. There are also special elements called attributes that offer additional ways to filter subscription results. All of this will be explained below.

What is the difference between keywords and phrases?

A "keyword" is a single word, with a blank space on either side. A "phrase" is any text placed between double quotes.

For example, the expression:

dogs great danes

is three keywords, whereas

dogs "great danes"

is the keyword dogs followed by the phrase great danes. This is an important distinction, as we will see in a moment.

The Matching Engine ignores the case of keywords and phrases. In other words, the keyword Dachshund is exactly equivalent to the keyword dachshund.
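
As an illustration of the keyword/phrase distinction, here is a minimal Python sketch that splits a query into its elements. It is not the Matching Engine's actual code; the function name tokenize and the way elements are labeled are assumptions made for this example.

```python
import re

def tokenize(query):
    """Split a query into ('phrase', text) and ('keyword', text) elements.

    Hypothetical helper for illustration only. Text is lowercased because
    the Matching Engine ignores case.
    """
    elements = []
    # A double-quoted run is a phrase; any other run of non-space
    # characters is a keyword.
    for match in re.finditer(r'"([^"]*)"|(\S+)', query):
        phrase, keyword = match.group(1), match.group(2)
        if phrase is not None:
            elements.append(("phrase", phrase.lower()))
        else:
            elements.append(("keyword", keyword.lower()))
    return elements

print(tokenize('dogs great danes'))
# [('keyword', 'dogs'), ('keyword', 'great'), ('keyword', 'danes')]
print(tokenize('dogs "great danes"'))
# [('keyword', 'dogs'), ('phrase', 'great danes')]
```

Because everything is lowercased, tokenize('Dachshund') and tokenize('dachshund') give the same result.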

Boolean Operators

Boolean operators tell the Matching Engine how to combine the query's elements to determine whether a particular document matches your subscription. There are only three Boolean operators: AND, OR, and NOT. The Matching Engine uses them like this:

OR

If two or more elements are joined by the operator OR, the Matching Engine will return all documents that contain any of the elements. For example, the query

dogs OR "great danes"

will return any items that contain either the word dogs, the phrase great danes, or both. Just to be clear: OR does not mean "only a or only b", but rather "a or b or both".

'OR' is the Default Operator!

One important thing to keep in mind is that if no operator is used to join two elements, the Matching Engine assumes that they are joined by the operator OR. In other words, the query

dogs cats rabbits

will produce exactly the same results as the query

dogs OR cats OR rabbits

The OR operator can also be represented by the "pipe" symbol (|).
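
The default-operator rule can be pictured as a rewriting step that makes every OR explicit. The Python sketch below is only an illustration under stated assumptions: the function name make_or_explicit is hypothetical, operators are assumed to be written in upper case, and only flat queries (no parentheses) are handled.

```python
import re

# Assumption: operators are recognized in upper case or as symbols.
OPERATORS = {"AND", "OR", "NOT", "&", "|", "!"}

def make_or_explicit(query):
    """Insert OR between adjacent elements that have no operator between them.

    Hypothetical helper; handles flat queries without parentheses only.
    """
    # Keep quoted phrases together; everything else splits on whitespace.
    tokens = re.findall(r'"[^"]*"|\S+', query)
    out = []
    for token in tokens:
        if token == "|":
            token = "OR"  # the pipe is another spelling of OR
        # Two elements in a row, with no operator between them, are joined by OR.
        if out and out[-1] not in OPERATORS and token not in OPERATORS:
            out.append("OR")
        out.append(token)
    return " ".join(out)

print(make_or_explicit('dogs cats rabbits'))     # dogs OR cats OR rabbits
print(make_or_explicit('dogs | "great danes"'))  # dogs OR "great danes"
```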

AND

If two or more elements are joined by the operator AND, the Matching Engine will return only documents that contain all of the elements. For example, the query

dogs AND "great danes" AND collies

will return only items that contain the word dogs, the phrase great danes, and the word collies. Any document that does not contain all three of these elements will not be chosen.

The AND operator can also be represented by the "ampersand" symbol (&).

NOT

The operator NOT before an element tells the Matching Engine to reject any documents that contain the element. For example, the query

dogs NOT "great danes"

will return only those documents that contain the word dogs and that do not contain the phrase great danes.
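
The three operators map directly onto Python's any, all, and not. The sketch below is an illustration, not the Matching Engine's implementation; the helper contains is hypothetical, and it assumes that elements are matched as whole words, ignoring case and punctuation.

```python
import re

def contains(text, element):
    """True if `element` (a keyword or multi-word phrase) occurs in `text`.

    Assumption: matching is on whole words, ignoring case and punctuation.
    """
    words = re.findall(r"\w+", text.lower())
    target = re.findall(r"\w+", element.lower())
    n = len(target)
    # Slide a window of n words across the text and compare.
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

doc = "Collies and other dogs at the park"

# dogs OR "great danes": at least one element must be present
print(any(contains(doc, e) for e in ["dogs", "great danes"]))             # True

# dogs AND "great danes" AND collies: every element must be present
print(all(contains(doc, e) for e in ["dogs", "great danes", "collies"]))  # False

# dogs NOT "great danes": the first is present and the second is absent
print(contains(doc, "dogs") and not contains(doc, "great danes"))         # True
```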

Combining operators with parentheses

To make precise queries using multiple elements, you can (and should) group and nest elements using parentheses. For example:

(dogs OR collies) AND (NOT "great danes")

will return only those pages that contain either the word dogs or the word collies (or both), but that also do not contain the phrase great danes.

This grouping and nesting can be as complex as you like. To give another, more involved example:

(dogs OR collies) AND (NOT (mastiffs OR chihuahuas))

will return only those pages that contain either the word dogs or the word collies (or both), but that also contain neither the word mastiffs nor the word chihuahuas.

The NOT operator can also be represented by the "exclamation point" (!) symbol.

Further, the words AND, OR, and NOT are not treated as Boolean operators when they appear inside a phrase (within quotes). For example:

"dogs and cats"

will return only those pages that contain the phrase "dogs and cats".
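
Nested parentheses amount to a tree of operators, which can be evaluated recursively. In the Python sketch below the query is written by hand as nested tuples rather than parsed from text, so no assumption about operator precedence is needed. The names contains and evaluate, and the whole-word matching rule, are assumptions made for this example.

```python
import re

def contains(text, element):
    """True if `element` occurs in `text` as whole words, ignoring case (assumed rule)."""
    words = re.findall(r"\w+", text.lower())
    target = re.findall(r"\w+", element.lower())
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

def evaluate(node, text):
    """Evaluate a query tree against `text`.

    A node is either a string (a keyword or phrase) or a tuple:
    ("AND", a, b, ...), ("OR", a, b, ...), or ("NOT", a).
    """
    if isinstance(node, str):
        return contains(text, node)
    op, *args = node
    if op == "AND":
        return all(evaluate(a, text) for a in args)
    if op == "OR":
        return any(evaluate(a, text) for a in args)
    if op == "NOT":
        return not evaluate(args[0], text)
    raise ValueError(f"unknown operator: {op!r}")

# (dogs OR collies) AND (NOT (mastiffs OR chihuahuas))
query = ("AND",
         ("OR", "dogs", "collies"),
         ("NOT", ("OR", "mastiffs", "chihuahuas")))

print(evaluate(query, "Collies are herding dogs"))       # True
print(evaluate(query, "Dogs such as mastiffs are big"))  # False
```

A quoted phrase such as "dogs and cats" is simply a string leaf in this tree, so the word "and" inside it is never treated as an operator.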

WITHIN

The operator /within between two elements tells the Matching Engine to match two words that appear within a certain number of words of each other. For example:

"teen /within 4 smoking"

will return only those documents that contain the word teen within 4 words of smoking.

Note: Remember to put quotation marks around the whole query. The operator only works with a single keyword on either side. Using a phrase on either side, as in "teen smoking /within 4 Marlboro", will cause false results to appear in your subscription.
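
A proximity test of this kind can be sketched in Python as a comparison of word positions. This is an illustration only: the function name within is hypothetical, and it assumes that distance is the difference between word positions and that the two words may appear in either order, neither of which is specified above.

```python
import re

def within(text, first, second, distance):
    """True if `first` and `second` occur within `distance` words of each other.

    Assumptions: whole-word, case-insensitive matching; either order counts;
    adjacent words are at distance 1.
    """
    words = re.findall(r"\w+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= distance
               for i in first_positions
               for j in second_positions)

# "teen /within 4 smoking"
print(within("Teen smoking is declining", "teen", "smoking", 4))  # True
print(within("The teen said that he had never once tried smoking",
             "teen", "smoking", 4))                               # False
```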

Attributes

To allow for even more tightly focused subscriptions, search queries may also contain another type of element, called an attribute. Attributes allow you to filter your subscription according to the content of particular parts or properties of a document, such as its title or where it came from. Here's how they work:

The TITLE, CHANNEL, and GROUP Attributes:

To understand the difference between these attributes it is useful to think of weblogs and newsgroups as online magazines. Each story published in a magazine will have its own title, and of course the magazine itself has a name as well. Using the TITLE, CHANNEL, and GROUP attributes in your query allows you to restrict your results according to these properties of the online documents.

If you want to match only those documents that have a particular keyword or phrase in the individual story's title, use the TITLE attribute. For example, if your query includes

TITLE:microsoft

the system will only return results from weblog and newsgroup postings that have the word microsoft in the story's title. You can use phrases too, e.g.

TITLE:"montreal canadiens"

If you are interested in monitoring weblogs or newsgroups with particular titles, you can use the GROUP (newsgroups) or CHANNEL (weblogs) attribute. For example, if you entered the query

CHANNEL:"new york"

the Matching Engine will provide results only from weblogs that have new york in their names. Similarly, if you enter the query

GROUP:existentialist

the Matching Engine will return results only from newsgroups whose names include the word existentialist.
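
One way to picture these attributes is as a match against a single named field of a document instead of its full text. The Python sketch below is an illustration only; the document layout (a dict with title, channel, group, and body keys) and the function names are assumptions, not the Matching Engine's actual data model.

```python
import re

def contains(text, element):
    """True if `element` occurs in `text` as whole words, ignoring case (assumed rule)."""
    words = re.findall(r"\w+", text.lower())
    target = re.findall(r"\w+", element.lower())
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

def attribute_matches(document, attribute, value):
    """Check `value` against one named field of `document` only.

    A missing field (for example GROUP on a weblog post) never matches.
    """
    field = document.get(attribute.lower(), "")
    return contains(field, value)

# Hypothetical weblog post used only for this example.
post = {
    "title": "Montreal Canadiens win in overtime",
    "channel": "New York Hockey Blog",
    "body": "Microsoft sponsored the broadcast.",
}

print(attribute_matches(post, "TITLE", "montreal canadiens"))  # True
print(attribute_matches(post, "TITLE", "microsoft"))           # False (body only)
print(attribute_matches(post, "CHANNEL", "new york"))          # True
print(attribute_matches(post, "GROUP", "existentialist"))      # False (no group field)
```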

The SOURCE attribute:

The SOURCE attribute is used to limit your results to only those that come from a particular Internet "domain". For example, the query

SOURCE:nytimes.com AND iraq

will match only items that come from the domain nytimes.com and that also contain the keyword iraq.
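
A domain filter like this can be sketched in Python by comparing the host part of an item's address with the requested domain. This is an illustration only: the function name source_matches is hypothetical, and treating subdomains such as www.nytimes.com as part of nytimes.com is an assumption that is not confirmed above.

```python
from urllib.parse import urlparse

def source_matches(item_url, domain):
    """True if `item_url` is hosted on `domain` or on one of its subdomains.

    Assumption: subdomains count as the same source.
    """
    host = (urlparse(item_url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)

# Hypothetical URLs used only for this example.
print(source_matches("http://www.nytimes.com/some/article", "nytimes.com"))  # True
print(source_matches("http://notnytimes.com/some/article", "nytimes.com"))   # False
```

Comparing whole host labels, instead of searching for the text anywhere in the address, keeps a look-alike host such as notnytimes.com from matching.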

The URI attribute:

Web pages can contain links to other pages. The URI attribute allows you to filter your results based on what links a page contains. For example, if your query includes

URI:amazon.com

the Matching Engine will only return pages that hold a hyperlink to amazon.com. This attribute can also be helpful when following a discussion in the blogosphere. For example, if you want to find pages that have linked to an article that appeared on MSNBC entitled "Blog Nice, Everyone" (http://www.msnbc.msn.com/id/6844492/), you can enter

URI:msnbc.msn.com/id/6844492

This works for all file formats -- images, movies, and any other type of media. For example, if you enter

(URI:jpg URI:jpeg URI:gif) AND madonna

any returned pages will not only have a linked image on the page, but will also contain an occurrence of the word madonna. Remember that the three URI attributes inside the parentheses are assumed, as explained above, to be joined by the implicit OR operator, so the Matching Engine looks to see if any of the links on the page contain the text jpg or jpeg or gif. If one of these conditions is met, the engine then looks to see if the page contains the word madonna. Only if all of these conditions are met will the page be chosen as a match. Be aware, though, that the picture will not necessarily be a picture of Madonna!
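
The URI check described here is a text search over a page's links. The Python sketch below illustrates the image example under stated assumptions: the page layout (a dict with links and body keys), the example.com link, and the function name uri_matches are hypothetical.

```python
def uri_matches(links, fragment):
    """True if any link on the page contains the text `fragment`, ignoring case."""
    fragment = fragment.lower()
    return any(fragment in link.lower() for link in links)

# Hypothetical page used only for this example.
page = {
    "links": [
        "http://example.com/photos/tour.jpg",
        "http://www.msnbc.msn.com/id/6844492/",
    ],
    "body": "Madonna announced new tour dates.",
}

# (URI:jpg URI:jpeg URI:gif) AND madonna
has_image_link = any(uri_matches(page["links"], ext) for ext in ("jpg", "jpeg", "gif"))
mentions_madonna = "madonna" in page["body"].lower().split()
print(has_image_link and mentions_madonna)                      # True

# URI:msnbc.msn.com/id/6844492
print(uri_matches(page["links"], "msnbc.msn.com/id/6844492"))   # True
```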

Combining Elements, Operators, and Attributes

By using parentheses to make combinations of the tools provided, you can make subscriptions that are as selective or inclusive as you like. To give two final examples, the simple query

football

will return any documents, from any source, that contain the word football, whereas the query

((TITLE:football) AND (SOURCE:nytimes.com)) AND ("denver broncos" OR "miami dolphins")

will return all pages that come from the New York Times, have the word football in the story's title, and contain either the phrase denver broncos, the phrase miami dolphins, or both.

Copyright © 2006 PubSub Concepts, Inc. All rights reserved.