1 files changed, 392 insertions, 0 deletions
diff --git a/debian/htdig/htdig-3.2.0b6/htdoc/require.html b/debian/htdig/htdig-3.2.0b6/htdoc/require.html
new file mode 100644
index 00000000..d1975701
--- /dev/null
+++ b/debian/htdig/htdig-3.2.0b6/htdoc/require.html
@@ -0,0 +1,392 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
+<html>
+  <head>
+	<title>
+	  ht://Dig: Features and System requirements
+	</title>
+  </head>
+  <body bgcolor="#eef7ff">
+	<h1>
+	  Features and System requirements
+	</h1>
+	<p>
+	  ht://Dig Copyright &copy; 1995-2004 <a href="THANKS.html">The ht://Dig Group</a><br>
+	  Please see the file <a href="COPYING">COPYING</a> for
+	  license information.
+	</p>
+	<hr noshade>
+	<h2>
+	  Features
+	</h2>
+	<p>
+	  Here are some of the major features of ht://Dig. They are in
+	  no particular order.
+	</p>
+	<blockquote>
+	<dl>
+	  <dt>
+		<strong><img src="bdot.gif" width=9 height=9 alt="*">
+		Intranet searching</strong>
+	  </dt>
+	  <dd>
+		ht://Dig has the ability to search through many servers
+		on a network by acting as a WWW browser.
+	  </dd>
+	  <dt>
+		<strong><img src="bdot.gif" width=9 height=9 alt="*">
+		It is free</strong>
+	  </dt>
+	  <dd>
+		The whole system is released under the
+		<a href="COPYING">GNU Library General Public License (LGPL)</a>
+	  </dd>
+	  <dt>
+		<strong><img src="bdot.gif" width=9 height=9 alt="*">
+		Robot exclusion is supported</strong>
+	  </dt>
+	  <dd>
+		The <a href="http://www.robotstxt.org/wc/norobots.html">
+		Standard for Robot Exclusion</a> is
+		<a href="meta.html#robots">supported by ht://Dig.</a>
+	  </dd>
+	  <dt>
+		<strong><img src="bdot.gif" width=9 height=9 alt="*">
+		Boolean expression searching</strong>
+	  </dt>
+	  <dd>
+		Searches can be arbitrarily complex using boolean
+		expressions.
+	  </dd>
+	  <dt>
+		<strong><img src="bdot.gif" width=9 height=9 alt="*">
+		Phrase searching</strong>
+	  </dt>
+	  <dd>
+		A phrase can be searched for by enclosing it in quotes.
+		Phrase searches can be combined with word searches, as in
+		<code>Linux and "high quality"</code>.
+	  </dd>
+	  <dt>
+		<strong><img src="bdot.gif" width=9 height=9 alt="*">
+		Configurable search results</strong>
+	  </dt>
+	  <dd>
+		The output of a search can easily be tailored to your
+		needs by means of providing HTML templates.
+	  </dd>
+	  <dt>
+		<strong><img src="bdot.gif" width=9 height=9 alt="*">
+		Fuzzy searching</strong>
+	  </dt>
+	  <dd>
+		Searches can be performed using various
+		<a href="attrs.html#search_algorithm">configurable algorithms</a>.
+		Currently the following algorithms are
+		supported (in any combination):
+		<ul>
+		  <li>
+			exact
+		  </li>
+		  <li>
+			soundex
+		  </li>
+		  <li>
+			metaphone
+		  </li>
+		  <li>
+			common word endings
+		  </li>
+		  <li>
+			synonyms
+		  </li>
+		  <li>
+			accent stripping
+		  </li>
+		  <li>
+			substring and prefix
+		  </li>
+		  <li>
+			regular expressions
+		  </li>
+		  <li>
+			simple spelling corrections
+		  </li>
+		</ul>
+	  </dd>
+	  <dt>
+		<strong><img src="bdot.gif" width=9 height=9 alt="*">
+		Searching of many file formats</strong>
+	  </dt>
+	  <dd>
+		Both HTML documents and plain text files can be
+		searched directly by ht://Dig itself.  There is also a
+		<a href="attrs.html#external_parsers">mechanism
+		to allow external programs ("external parsers")</a> to be used
+		while building the database so that arbitrary file formats
+		can be searched. <br>
+	  </dd>
+	  <dt>
+		<strong><img src="bdot.gif" width=9 height=9 alt="*">
+		Document retrieval using many transport services</strong>
+	  </dt>
+	  <dd>
+		Several transport services can be handled by ht://Dig,
+		including http://, ftp:// and file:///.
+		There is also a
+		<a href="attrs.html#external_protocols">mechanism
+		to allow external programs ("external protocols")</a> to be used
+		while building the database so that arbitrary transport
+		services can be used. <br>
+	  </dd>
+	  <dt>
+		<strong><img src="bdot.gif" width=9 height=9 alt="*">
+		Keywords can be added to HTML documents</strong>
+	   </dt>
+	  <dd>
+		Any number of <a href="meta.html">keywords</a>
+		can be added to HTML documents
+		which will not show up when the document is viewed.
+		This is used to make a document more likely to be found
+		and also to make it appear higher in the list of
+		matches.
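+		<p>
+		As a minimal sketch, keywords are typically supplied with a
+		<code>META</code> tag in the <code>HEAD</code> of the document.
+		The keyword values below are placeholders; the
+		<a href="meta.html">META tag documentation</a> lists the tag
+		names that ht://Dig actually recognizes.
+		</p>
+<pre>
+&lt;meta name="keywords" content="search engine, indexing, intranet"&gt;
+</pre>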
+	  </dd>
+	  <dt>
+		<strong><img src="bdot.gif" width=9 height=9 alt="*">
+		Email notification of expired documents</strong>
+	  </dt>
+	  <dd>
+		Special meta information can be added to HTML documents
+		which can be used to
+		<a href="notification.html">notify the maintainer</a> of those
+		documents at a certain time. This is handy, for example,
+		as a reminder to remove the "New" images from a
+		page.
+	  </dd>
+	  <dt>
+		<strong><img src="bdot.gif" width=9 height=9 alt="*">
+		A protected server can be indexed</strong>
+	  </dt>
+	  <dd>
+		ht://Dig can be told to use a specific
+		<a href="attrs.html#authorization">username and password</a>
+		when it retrieves documents. This can be used
+		to index a server or parts of a server that are
+		protected by a username and password.
+	  </dd>
+	  <dt>
+		<strong><img src="bdot.gif" width=9 height=9 alt="*">
+		Searches on subsections of the database</strong>
+	  </dt>
+	  <dd>
+		It is easy to set up a search which only returns
+		documents whose
+		<a href="hts_form.html#restrict">URL matches a certain pattern.</a>
+		This becomes very useful for people who want to make their
+		own data searchable without having to use a separate
+		search engine or database.
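+		<p>
+		A minimal sketch of such a search form follows. It assumes
+		that htsearch is installed as <code>/cgi-bin/htsearch</code>
+		and uses a placeholder URL pattern; both are illustrative and
+		should be adjusted to match your own site.
+		</p>
+<pre>
+&lt;form method="get" action="/cgi-bin/htsearch"&gt;
+  &lt;input type="hidden" name="restrict" value="http://www.example.com/manual/"&gt;
+  &lt;input type="text" name="words"&gt;
+  &lt;input type="submit" value="Search"&gt;
+&lt;/form&gt;
+</pre>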
+	  </dd>
+	  <dt>
+		<strong><img src="bdot.gif" width=9 height=9 alt="*">
+		Full source code included</strong>
+	  </dt>
+	  <dd>
+		The search engine comes with full source code. The
+		whole system is released under the terms and conditions
+		of the <a href="COPYING">GNU Library General Public License (LGPL) version
+		2.0</a>.
+	  </dd>
+	  <dt>
+		<strong><img src="bdot.gif" width=9 height=9 alt="*">
+		The depth of the search can be limited</strong>
+	  </dt>
+	  <dd>
+		Instead of limiting the search to a set of machines, it
+		can also be restricted to documents that are a certain
+		number of <a href="attrs.html#max_hop_count">"mouse-clicks"</a>
+		away from the start document.
+	  </dd>
+	  <dt>
+		<strong><img src="bdot.gif" width=9 height=9 alt="*">
+		Full support for the ISO-Latin-1 character set</strong>
+	  </dt>
+	  <dd>
+		Both SGML entities like '&amp;agrave;' and ISO-Latin-1
+		characters can be indexed and searched.
+	  </dd>
+	</dl>
+	</blockquote>
+	<hr size="4" noshade>
+	<h1>
+	  Requirements to build ht://Dig
+	</h1>
+	<p>
+	  ht://Dig was developed under Unix using C++.
+	</p>
+	<p>
+	  For this reason, you will need a Unix machine, a C compiler
+	  and a C++ compiler. (The C compiler is needed to compile some
+	  of the GNU libraries.)
+	</p>
+	<p>
+	  Unfortunately, we only have access to a couple of different
+	  Unix machines. ht://Dig has been tested on these machines:
+	</p>
+	<ul>
+<!--
+	  <li>
+		Sun Solaris 2.5 SPARC (using gcc/g++ 2.7.2)
+	  </li>
+	  <li>
+		Sun SunOS 4.1.4 SPARC (using gcc/gcc 2.7.0)
+	  </li>
+	  <li>
+		HP/UX A.09.01 (using gcc/g++ 2.6.0)
+	  </li>
+	  <li>
+		IRIX 5.3 (SGI C++ compiler. Don't know the version)
+	  </li>
+	  <li>
+		Debian Linux 2.0 (using egcs 1.1b)
+	  </li>
+-->
+	  <li>
+		FreeBSD 4.6 (using gcc 2.95.3) <!-- lha -->
+	  </li>
+	  <li>
+	        Mandrake Linux 8.2 (using gcc 3.2) <!-- lha -->
+	  </li>
+	  <li>
+		Debian, 2.2.19 kernel (using gcc 2.95.4) <!-- lha -->
+	  </li>
+	  <li>
+	        Debian on an Alpha <!-- lha -->
+	  </li>
+	  <li>
+	        RedHat 7.3, 8.0 <!-- Jim Cole -->
+	  </li>
+	  <li>
+	        Sun Solaris 2.8 = SunOS 5.8 (using gcc 3.1) <!-- lha -->
+	  </li>
+	  <li>
+	        Sun Solaris 2.8 = SunOS 5.8 (using Sun's cc / g++ 3.1) <!-- lha -->
+	  </li>
+	  <li>
+	        Mac OS X 10.2 (using gcc) <!-- Jim Cole -->
+	  </li>
+
+ 	</ul>
+	There are reports of ht://Dig working on a number of other platforms.
+	<h3>
+	  libstdc++
+	</h3>
+	<p>
+	  If you plan on using g++ to compile ht://Dig, you have to make
+	  sure that libstdc++ has been installed. Unfortunately, libstdc++ is a
+	  separate package from gcc/g++. You can get libstdc++ from the
+	  <a href="ftp://ftp.gnu.org/pub/gnu/">GNU software archive</a>.
+	</p>
+
+<!--		The current  Makefiles  don't use include...
+	<h3>
+	  Berkeley 'make'
+	</h3>
+	<p>
+	  The building relies heavily on the make program. The problem
+	  with this is that not all make programs are the same. The
+	  requirement for the make program is that it understands the
+	  'include' statement as in
+	</p>
+	<blockquote>
+	  <code>include somefile otherfile</code>
+	</blockquote>
+	<p>
+	  The Berkeley 4.4 make program doesn't use this syntax, instead
+	  it wants
+	</p>
+	<blockquote>
+	  <code>.include "somefile"</code><br>
+	  <code>.include "otherfile"</code>
+	</blockquote>
+	<p>
+	  and hence it cannot be used to build ht://Dig.
+	</p>
+	<p>
+	  If your make program doesn't understand the right 'include'
+	  syntax, it is best if you get and install
+	  <a href="ftp://ftp.gnu.org/pub/gnu/">gnumake</a> before you try
+	  to compile everything. The alternative is to change all the
+	  Makefiles.
+	</p>
+-->
+	<hr noshade>
+	<h1>
+	  Disk space requirements
+	</h1>
+	<p>
+	  The search engine will require lots of disk space to store
+	  its databases. Unfortunately, there is no exact formula to
+	  compute the space requirements. It depends on the number of
+	  documents you are going to index but also on the various
+	  options you use.
+	  </p>
+	  <p>As a temporary measure, 3.2 betas use a very inefficient
+	  database structure to enable phrase searching.  This will be
+	  fixed before the release of 3.2.0.  Currently, indexing a site of
+	  around 10,000 documents gives a database of around 400MB using the
+	  default setting for
+	  <a href="attrs.html#max_doc_size">maximum document size</a> and storing the
+	  <a href="attrs.html#max_head_length">first 50,000 bytes of each document</a>
+	  to enable context to be displayed.
+	  <!-- To give you an idea of the space
+	  requirements, here is what I have deduced from our own
+	  database size at San Diego State University.
+	</p>
+	<p>
+	  If you keep around the wordlist database (for update digging
+	  instead of initial digging) I found that multiplying the
+	  number of documents covered by 12,000 will come pretty close
+	  to the space required.
+	</p>
+	<p>
+	  We have about 13,000 documents:
+	</p>
+<pre>
+         13,000
+         12,000 x
+    ===========
+    156,000,000
+</pre>
+	or about 150 MB.
+	<p>
+	  Without the wordlist database, the factor drops down to about
+	  7500:
+	</p>
+<pre>
+         13,000
+          7,500 x
+     ===========
+     97,500,000
+</pre>
+	or about 93 MB.
+-->
+	<p>
+	  Keep in mind that we keep at most 50,000 bytes of each
+	  document. This may seem like a lot, but most documents aren't very
+	  big and it gives us a big enough chunk to almost always show
+	  an excerpt of the matches.
+	</p>
+	<p>
+	  You may find that if you store most of each document, the
+	  databases are almost the same size as, or even larger than, the
+	  documents themselves! Remember that if you're storing a
+	  significant portion of each document (say 50,000 bytes as
+	  above), you have that requirement, plus the size of the word
+	  database and all the additional information about each document
+	  (size, URL, date, etc.) required for searching.
+	</p>
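+	<p>
+	  As a rough sanity check only, the 3.2 beta figure quoted above
+	  (around 400MB for around 10,000 documents) works out to roughly
+	  40KB of database per document. Assuming, purely for illustration,
+	  that the size grows linearly with the number of documents, a
+	  hypothetical site of 25,000 documents would need about:
+	</p>
+<pre>
+         25,000 documents
+             40 KB per document x
+    ===========
+      1,000,000 KB
+</pre>
+	<p>
+	  or about 1GB. Since there is no exact formula, treat this as an
+	  order-of-magnitude estimate; the real size depends on your
+	  documents and the options you use.
+	</p>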
+	<hr size="4" noshade>
+
+	Last modified: $Date: 2004/05/28 13:15:19 $
+
+  </body>
+</html>