diff options
Diffstat (limited to 'debian/htdig/htdig-3.2.0b6/htdoc/hts_method.html')
-rw-r--r-- | debian/htdig/htdig-3.2.0b6/htdoc/hts_method.html | 102 |
1 files changed, 102 insertions, 0 deletions
diff --git a/debian/htdig/htdig-3.2.0b6/htdoc/hts_method.html b/debian/htdig/htdig-3.2.0b6/htdoc/hts_method.html new file mode 100644 index 00000000..d4a7c676 --- /dev/null +++ b/debian/htdig/htdig-3.2.0b6/htdoc/hts_method.html @@ -0,0 +1,102 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> +<html> + <head> + <title> + ht://Dig: htsearch + </title> + </head> + <body bgcolor="#eef7ff"> + <h1> + htsearch + </h1> + <p> + ht://Dig Copyright © 1995-2004 <a href="THANKS.html">The ht://Dig Group</a><br> + Please see the file <a href="COPYING">COPYING</a> for + license information. + </p> + <hr size="4" noshade> + <h2> + Search Method Used + </h2> + <p> + The way htsearch performs it search and applies its ranking + rules are fairly complicated. This is an attempt at explaining + in global terms what goes on when htsearch searches. + </p> + <p> + htsearch gets a list of (case insensitive) words from the HTML + form that invoked + it. If htsearch was invoked with boolean expression parsing + enabled, it will do a quick syntax check on the input words. + If there are syntax errors, it will display the syntax error + file that is specified with the + <a href="attrs.html#syntax_error_file">syntax_error_file</a> + attribute. + </p> + <p> + If the boolean parser was not enabled, the list of words is + converted into a boolean expression by putting either "and"s + or "or"s between the words. (This depends on the search + type.) Phrases within double quotes (") specify that the words + must occur sequentially within the document. + </p> + <p> + If a word is immediately preceeded by a field specifer + (title:, heading:, author:, keyword:, descr:, link:, url:) + then it will only match documents in which the word occurred + within field. For example, descr:foo only matches documents + containing <meta value="description" value="... foo ...">. + The link: field refers to the text in the hyperlinks to a document, + rather than text within the document itself. Similarly url: + (will eventually) refer to the actual URL of the document, not any + of its contents. + The prefixes exact: and hidden: are also accepted. + The former (will) cause the + <a href="attrs.html#search_algorithm">fuzzy search algorithm</a> + not to be applied to this word, while the latter causes the word + not to be displayed in the query string of the results page. + </p> + <p> + Each of the words in the list (but not within a phrase) is now + expanded using the search algorithms that were specified in the + <a href="attrs.html#search_algorithm">search_algorithm</a> + attribute. For example, the endings algorithm will convert a + word like "person" into "person or persons". In this fashion, + all the specified algorithms are used on each of the words + and the result is a new boolean expression. + </p> + <p> + The next step is to perform database lookups on the words in + the expression. The result of these lookups are then passed + to the boolean expression parser. + </p> + <p> + The boolean expression parser is a simple recursive descent + parser with an operand stack. It knows how to deal with + "not", "and", "or" and parenthesis. The result of the parser + will be one set of matches.<br> + Note that the operator "not" is used as the word 'without' and + is binary: You can not write "cat and not dog" or just "not + dog" but you can write "cat not dog". + </p> + <p> + At this point, the matches are ranked. The rank of a match is + determined by the weight of the words that caused the match + and the weight of the algorithm that generated the word. Word + weights are generally determined by the importance of the + word in a document. For example, words in the title of a + document have a much higher weight than words at the bottom + of the document. + </p> + <p> + Finally, when the document ranks have been determined and the + documents sorted, the resulting matches are displayed. If + paged output is required, only a subset of all the matches + will be displayed. + </p> + <hr size="4" noshade> + + Last modified: $Date: 2004/05/28 13:15:18 $ + + </body> +</html> |