1 files changed, 2590 insertions, 0 deletions
diff --git a/debian/htdig/htdig-3.2.0b6/htdoc/FAQ.html b/debian/htdig/htdig-3.2.0b6/htdoc/FAQ.html
new file mode 100644
index 00000000..9f2db468
--- /dev/null
+++ b/debian/htdig/htdig-3.2.0b6/htdoc/FAQ.html
@@ -0,0 +1,2590 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
+<html>
+  <head>
+	<title>ht://Dig Frequently Asked Questions</title>
+        <link rel="stylesheet" href="css/htdig.css">
+  </head>
+  <body bgcolor="#eef7ff">
+	<h1>Frequently Asked Questions</h1>
+	<p>
+	  ht://Dig Copyright &copy; 1995-2004 <a href="THANKS.html">The ht://Dig Group</a><br>
+	  Please see the file <a href="COPYING">COPYING</a> for
+	  license information.
+	</p>
+	  <hr noshade size=4>
+	  <p class="main">This FAQ is compiled by the ht://Dig developers and the
+	  most recent version is available at &lt;<a
+	  href="http://www.htdig.org/FAQ.html">http://www.htdig.org/FAQ.html</a>&gt;.
+	  Questions (and answers!) are greatly appreciated.
+	  Please send questions and/or answers to the ht://Dig user
+	  mailing list at: &lt;<a href="mailto:htdig-general@lists.sourceforge.net">htdig-general@lists.sourceforge.net</a>&gt;.
+	  </p>
+	  <h2>Questions</h2>
+
+	  <h3>1. General</h3>
+	  1.1. <a href="#q1.1">Can I search the internet with ht://Dig?</a><br>
+	  1.2. <a href="#q1.2">Can I index the internet with ht://Dig?</a><br>
+	  1.3. <a href="#q1.3">What's the difference between htdig and
+	  ht://Dig?</a><br>
+	  1.4. <a href="#q1.4">I sent mail to Andrew or Geoff or
+	  Gilles, but I never got a response!</a><br>
+	  1.5. <a href="#q1.5">I sent a question to the mailing list but I
+	  never got a response!</a><br>
+	  1.6. <a href="#q1.6">I have a great idea/patch for ht://Dig!</a><br>
+	  1.7. <a href="#q1.7">Is ht://Dig Y2K compliant?</a><br>
+	  1.8. <a href="#q1.8">I think I found a bug. What should I do?</a><br>
+	  1.9. <a href="#q1.9">Does ht://Dig support phrase or near
+	  matching?</a><br>
+	  1.10. <a href="#q1.10">What are the practical and/or theoretical
+	  limits of ht://Dig?</a><br>
+	  1.11. <a href="#q1.11">Do any ISPs offer ht://Dig as part of
+	  their web hosting services?</a><br>
+	  1.12. <a href="#q1.12">Can I use ht://Dig on a commercial website?</a><br>
+	  1.13. <a href="#q1.13">Why do you use a non-free product to
+	  index PDF files?</a><br>
+	  1.14. <a href="#q1.14">Why do you have all those SourceForge
+	  logos on your website?</a><br>
+	  1.15. <a href="#q1.15">My question isn't answered here. Where should I
+	  go for help?</a><br>
+	  1.16. <a href="#q1.16">Why do the developers get annoyed when
+	  I e-mail questions directly to them rather than the mailing list?</a><br>
+	  1.17. <a href="#q1.17">Why do replies to messages on the
+	  mailing list only go to the sender and not to the list?</a><br>
+	  1.18. <a href="#q1.18">Can I use ht://Dig to index and search
+	  an SQL database?</a><br>
+
+	  <hr noshade size=2>
+
+	  <h3>2. Getting ht://Dig</h3>
+	  2.1. <a href="#q2.1">What's the latest version of ht://Dig?</a><br>
+	  2.2. <a href="#q2.2">Are there binary distributions of ht://Dig?</a><br>
+	  2.3. <a href="#q2.3">Are there mirror sites for ht://Dig?</a><br>
+	  2.4. <a href="#q2.4">Is ht://Dig available by ftp?</a><br>
+	  2.5. <a href="#q2.5">Are patches around to upgrade between
+	  versions?</a><br>
+	  2.6. <a href="#q2.6">Is there a Windows 95/98/2000/NT
+	  version of ht://Dig?</a><br>
+	  2.7. <a href="#q2.7">Where can I find the documentation for my
+	  version of ht://Dig?</a><br>
+
+	  <hr noshade size=2>
+
+	  <h3>3. Compiling</h3>
+	  3.1. <a href="#q3.1">When I compile ht://Dig I get an error
+	  about libht.a.</a><br>
+	  3.2. <a href="#q3.2">I get an error about -lg</a><br>
+	  3.3. <a href="#q3.3">I'm compiling on Digital Unix and I get
+	  messages about "unresolved" and "db_open."</a><br>
+	  3.4. <a href="#q3.4">I'm compiling on FreeBSD and I get lots
+	  of messages about '___error' being unresolved.</a><br>
+	  3.5. <a href="#q3.5">I'm compiling on HP/UX and I get a complaint about
+	  "Large Files not supported."</a><br>
+	  3.6. <a href="#q3.6">I'm compiling on Solaris and when I run the 
+	  programs I get complaints about not finding libstdc++.</a><br>
+	  3.7. <a href="#q3.7">I'm compiling on IRIX and I'm having
+	  database problems when I run the program.</a><br>
+	  3.8. <a href="#q3.8">I'm compiling with gcc 3.2 and getting
+	  all sorts of warnings/errors about ostream and such.</a><br>
+
+	  <hr noshade size=2>
+
+	  <h3>4. Configuration</h3>
+	  4.1. <a href="#q4.1">How come I can't index my site?</a><br>
+	  4.2. <a href="#q4.2">How can I change the output format of
+	  htsearch?</a><br>
+	  4.3. <a href="#q4.3">How do I index pages that start with '~'?</a><br>
+	  4.4. <a href="#q4.4">Can I use multiple databases?</a><br>
+	  4.5. <a href="#q4.5">OK, I can use multiple databases. Can I
+	  merge them into one?</a><br>
+	  4.6. <a href="#q4.6">Wow, ht://Dig eats up a lot of disk
+	  space. How can I cut down?</a><br>
+	  4.7. <a href="#q4.7">Can I use SSI or other CGIs in my
+	  htsearch results?</a><br>
+	  4.8. <a href="#q4.8">How do I index Word, Excel, PowerPoint
+	  or PostScript documents?</a><br>
+	  4.9. <a href="#q4.9">How do I index PDF files?</a><br>
+	  4.10. <a href="#q4.10">How do I index documents in other
+	  languages?</a><br>
+	  4.11. <a href="#q4.11">How do I get rotating banner ads in
+	  search results?</a><br>
+	  4.12. <a href="#q4.12">How do I index numbers in documents?</a><br>
+	  4.13. <a href="#q4.13">How can I call htsearch from a hypertext
+	  link, rather than from a search form?</a><br>
+	  4.14. <a href="#q4.14">How do I restrict a search to only meta
+	  keywords entries in documents?</a><br>
+	  4.15. <a href="#q4.15">Can I use meta tags to prevent htdig from
+	  indexing certain files?</a><br>
+	  4.16. <a href="#q4.16">How do I get htsearch to use the star image
+	  in a different directory than the default /htdig?</a><br>
+	  4.17. <a href="#q4.17">How do I get htdig or htsearch to rewrite
+	  URLs in the search results?</a><br>
+	  4.18. <a href="#q4.18">What are all the options in
+	  htdig.conf, and are there others?</a><br>
+	  4.19. <a href="#q4.19">How do I get more than 10 pages of
+	  10 search results from htsearch?</a><br>
+	  4.20. <a href="#q4.20">How do I restrict a search to only
+	  certain subdirectories or documents?</a><br>
+	  4.21. <a href="#q4.21">How can I allow people to search
+	  while the index is updating?</a><br>
+	  4.22. <a href="#q4.22">How can I get htdig to ignore the
+	  robots.txt file or meta robots tags?</a><br>
+	  4.23. <a href="#q4.23">How can I get htdig not to index
+	  some directories, but still follow links?</a><br>
+	  4.24. <a href="#q4.24">How can I get rid of duplicates in
+	  search results?</a><br>
+	  4.25. <a href="#q4.25">How can I change the scores in
+	  search results, and what are the defaults?</a><br>
+	  4.26. <a href="#q4.26">How can I get htdig not to index
+	  JavaScript code or CSS?</a><br>
+
+	  <hr noshade size=2>
+
+	  <h3>5. Troubleshooting</h3>
+	  5.1. <a href="#q5.1">I can't seem to index more than X documents
+	  in a directory.</a><br>
+	  5.2. <a href="#q5.2">I can't index PDF files.</a><br>
+	  5.3. <a href="#q5.3">When I run "rundig," I get a message about
+	  "DATABASE_DIR" not being found.</a><br>
+	  5.4. <a href="#q5.4">When I run htmerge, it stops with an "out
+	  of diskspace" message.</a><br>
+	  5.5. <a href="#q5.5">I have problems running rundig from cron
+	  under Linux.</a><br>
+	  5.6. <a href="#q5.6">When I run htmerge, it stops with an
+	  "Unexpected file type" message.</a><br>
+	  5.7. <a href="#q5.7">When I run htsearch, I get lots of Internal
+	  Server Errors (#500).</a><br>
+	  5.8. <a href="#q5.8">I'm having problems with indexing words
+	  with accented characters.</a><br>
+	  5.9. <a href="#q5.9">When I run htmerge, it stops with a
+	  "Word sort failed" message.</a><br>
+	  5.10. <a href="#q5.10">When htsearch has a lot of matches, it runs
+	  extremely slowly.</a><br>
+	  5.11. <a href="#q5.11">When I run htsearch, it gives me a count of
+	  matches, but doesn't list the matching documents.</a><br>
+	  5.12. <a href="#q5.12">I can't seem to index documents with names
+	  like left_index.html with htdig.</a><br>
+	  5.13. <a href="#q5.13">I get Premature End of Script Headers errors
+	  when running htsearch.</a><br>
+	  5.14. <a href="#q5.14">I get Segmentation faults when running
+	  htdig, htsearch or htfuzzy.</a><br>
+	  5.15. <a href="#q5.15">Why does htdig 3.1.3 mangle URL parameters
+	  that contain bare "&amp;" characters?</a><br>
+	  5.16. <a href="#q5.16">When I run htmerge, it stops with an
+	  "Unable to open word list file '.../db.wordlist'" message.</a><br>
+	  5.17. <a href="#q5.17">When using Netscape, htsearch always returns the
+	  "No match" page.</a><br>
+	  5.18. <a href="#q5.18">Why doesn't htdig follow links to other
+	  pages in JavaScript code?</a><br>
+	  5.19. <a href="#q5.19">When I run htsearch from the web server,
+	  it returns a bunch of binary data.</a><br>
+	  5.20. <a href="#q5.20">Why are the betas of 3.2 so slow at indexing?</a><br>
+	  5.21. <a href="#q5.21">Why does htsearch use ";" instead of
+	  "&amp;" to separate URL parameters for the page buttons?</a><br>
+	  5.22. <a href="#q5.22">Why does htsearch show the
+	  "&amp;" character as "&amp;amp;" in search results?</a><br>
+	  5.23. <a href="#q5.23">I get Internal Server or Unrecognized
+	  character errors when running htsearch.</a><br>
+	  5.24. <a href="#q5.24">I took some settings out of
+	  my htdig.conf but they're still set.</a><br>
+	  5.25. <a href="#q5.25">When I run htdig on my site,
+	  it misses entire directories.</a><br>
+	  5.26. <a href="#q5.26">What do all the numbers and symbols
+	  in the htdig -v output mean?</a><br>
+	  5.27. <a href="#q5.27">Why is htdig rejecting some of the
+	  links in my documents?</a><br>
+	  5.28. <a href="#q5.28">When I run htdig or htmerge, I get a
+	  "DB2 problem...: missing or empty key value specified" message.</a><br>
+	  5.29. <a href="#q5.29">When I run htdig on my site,
+	  it seems to go on and on without ending.</a><br>
+	  5.30. <a href="#q5.30">Why does htsearch no longer recognize
+	  the -c option when run from the web server?</a><br>
+	  5.31. <a href="#q5.31">I've set a config attribute exactly
+	  as documented but it seems to have no effect.</a><br>
+	  5.32. <a href="#q5.32">When I run htsearch, it gives a page
+	  with an "Unable to read configuration file" message.</a><br>
+	  5.33. <a href="#q5.33">How can I find out which version
+	  of ht://Dig I have installed?</a><br>
+	  5.34. <a href="#q5.34">When running htdig, I get "Error (0):
+	  PDF file is damaged - attempting to reconstruct xref table..."</a><br>
+	  5.35. <a href="#q5.35">When running htdig on Mandrake Linux,
+	  I get "host not found" and "no server running" errors.</a><br>
+	  5.36. <a href="#q5.36">When I run htsearch, it gives me the
+	  list of matching documents, but no header or footer.</a><br>
+	  5.37. <a href="#q5.37">When I index files with doc2html.pl,
+	  it fails with the "UNABLE to convert" error.</a><br>
+	  5.38. <a href="#q5.38">Why do my searches find search terms
+	  in pathnames, or how do I prevent matching filenames?</a><br>
+	  5.39. <a href="#q5.39">I set up an external parser but I still
+	  can't index Word/Excel/PowerPoint/PDF documents.</a><br>
+
+	  <hr noshade size=4>
+	  <h2>Answers</h2>
+
+	  <h3>1. General</h3>
+	  <strong>1.1. <a name="q1.1">Can I search the internet with
+	  ht://Dig?</a></strong><br>
+	  <p>No, ht://Dig is a system for indexing and searching a
+	  finite (not necessarily small) set of sites or an intranet. It
+	  is not meant to replace any of the many internet-wide search
+	  engines.</p>
+
+	  <strong>1.2. <a name="q1.2">Can I index the internet with
+	  ht://Dig?</a></strong><br>
+	  <p>No, as above, ht://Dig is not meant as an
+	  internet-wide search engine. While there is
+	  <em>theoretically</em> nothing to stop you from indexing as
+	  much as you wish, practical considerations (e.g. time, disk
+	  space, memory, etc.) will limit this.</p>
+
+	  <strong>1.3. <a name="q1.3">What's the difference between htdig and
+	  ht://Dig?</a></strong><br>
+	  <p>The complete ht://Dig package consists of several programs, one of
+	  which is called "htdig." This program performs the "digging" or
+	  indexing of the web pages. Of course an index doesn't do you much good
+	  without a program to sort it, search through it, etc.</p>
+
+	  <strong>1.4. <a name="q1.4">I sent mail to Andrew or Geoff
+	  or Gilles, but I never got a response!</a></strong><br>
+	  <p>Andrew no longer does much work on ht://Dig. He has started a
+	  company, called <a href="http://www.contigo.com/">Contigo
+	  Software</a> and is quite busy with that. To contact any of the
+	  current developers, send mail to &lt;<a
+	  href="mailto:htdig-dev@lists.sourceforge.net">htdig-dev</a>&gt;.
+	  This list is intended primarily for the discussion of current
+	  and future development of the software.</p>
+
+	  <p>Geoff and Gilles are currently the maintainers of
+	  ht://Dig, but they are both volunteers. So while they do
+	  read all the e-mail they receive, they may not respond
+	  immediately. Questions about ht://Dig in general, and especially
+	  questions or requests for help in configuring the software,
+	  should be posted to the &lt;<a
+	  href="mailto:htdig-general@lists.sourceforge.net">htdig-general</a>&gt;
+	  mailing list. When posting a followup to a message on the
+	  list, you should use the "reply to all" or "group reply"
+	  feature of your mail program, to make sure the mailing list
+	  address is included in the reply, rather than replying only
+	  to the author of the message.
+	  See also question <a href="#q1.16">1.16</a> and the
+	  <a href="http://www.htdig.org/mailarchive.html">mailing list</a>
+	  page.</p>
+
+	  <strong>1.5. <a name="q1.5">I sent a question to the mailing list but I
+	  never got a response!</a></strong><br>
+	  <p>Development of ht://Dig is done by volunteers. Since we all
+	  have other jobs, it may take a while before someone gets back
+	  to you. Please be patient and don't hound the volunteers with
+	  direct or repeated requests. If you don't get a response after
+	  3 or 4 days, then a reminder may help.
+	  See also question <a href="#q1.16">1.16</a>.</p>
+
+	  <strong>1.6. <a name="q1.6">I have a great idea/patch for
+	  ht://Dig!</a></strong><br>
+	  <p>Great! Development of ht://Dig continues through suggestions
+	  and improvements from users. If you have an idea (or even better,
+	  a patch), please send it to the ht://Dig mailing list so others
+	  can use it. For suggestions on how to submit patches, please check
+	  the <a href="dev/patches.html">Guidelines for
+	  Patch Submissions</a>. If you'd like to make a feature request,
+	  you can do so through the <a href="bugs.html">ht://Dig bug
+	  database</a>.</p>
+
+	  <strong>1.7. <a name="q1.7">Is ht://Dig Y2K compliant?</a></strong><br>
+	  <p>
+	  ht://Dig should be Y2K compliant since it never <em>stores</em> dates as
+	  two-digit years. Under ht://Dig's copyright (GPL), there is no warranty
+	  whatsoever as permitted by law. If you would like an iron-clad,
+	  legally-binding guarantee, feel free to check the source code
+	  itself. Versions prior to 3.1.2 did have a problem with the parsing
+	  of the Last-Modified header returned by the HTTP server, which will
+	  cause incorrect dates to be stored for documents modified after
+	  February 28, 2000 (yes, it didn't recognize 2000 as a leap year).
+	  Versions prior to 3.1.5 didn't correctly handle servers that return
+	  two digit years in the Last-Modified header, for years after 99.
+	  These problems are fixed in the current release.
+	  If you discover something else, please let us know!
+	  </p>
+
+	  <strong>1.8. <a name="q1.8">I think I found a bug. What should I
+	  do?</a></strong><br>
+	  <p>Well, there are probably bugs out there. You have two options
+	  for bug-reporting. You can either mail the ht://Dig mailing list
+	  at &lt;<a href="mailto:htdig-general@lists.sourceforge.net">htdig-general@lists.sourceforge.net</a>&gt; or
+	  better yet, report it to the <a href="bugs.html">bug
+	  database</a>, which ensures it won't
+	  become lost amongst all of the other mail on the list.
+	  Please try to include as much information as possible, including
+	  the version of ht://Dig (see question <a href="#q5.33">5.33</a>),
+	  the OS, and anything else that might be helpful.
+	  Often, running the programs with one "-v" or more
+	  (e.g. "-vvv") gives useful debugging information.
+	  If you are unsure whether the problem is a bug or a configuration
+	  problem, you should discuss the problem on
+	  &lt;<a href="mailto:htdig-general@lists.sourceforge.net">htdig-general</a>&gt;
+	  (after carefully reading the FAQ and searching the
+	  <a href="http://www.htdig.org/mailarchive.html">mail archive</a>
+	  and <a href="#q2.5">patch archive</a>,
+	  of course)
+	  to sort out what it is. The mailing list has a wider audience, so
+	  you're more likely to get help with configuration problems there
+	  than by reporting them to the bug database.
+	  </p>
+
+	  <p>Whether reporting problems to the bug database or mailing
+	  list, we cannot stress enough the importance of
+	  <strong>always</strong> indicating <strong>which version of
+	  ht://Dig you are running</strong>.
+	  See question <a href="#q5.33">5.33</a>. There
+	  are still a lot of users, ISPs and software distributors using
+	  older versions, and there have been a lot of bug fixes and
+	  new features added in recent versions.  Knowing which version
+	  you're running is absolutely essential in helping to find a
+	  solution. If you're unsure if your version is current, or what
+	  fixes and features have been added in more recent versions,
+	  please see the <a href="RELEASE.html">
+	  release notes</a>. See also question <a href="#q2.1">2.1</a>.</p>
+
+	  <strong>1.9. <a name="q1.9">Does ht://Dig support phrase or near
+	  matching?</a></strong><br>
+	  <p>Phrase searching has been added for the 3.2 release,
+	  which is currently in the beta phase
+	 (<a href="http://www.htdig.org/files/htdig-3.2.0b6.tar.gz">3.2.0b6</a>
+	  as of this writing). Near or proximity matching will probably be added
+	  in a future beta.
+	  </p>
+
+	  <strong>1.10. <a name="q1.10">What are the practical and/or theoretical
+	  limits of ht://Dig?</a></strong><br>
+	  <p>The code itself doesn't put any real limit on the number of
+	  pages. There are several sites in the hundreds of thousands
+	  of pages. As for practical limits, it depends a lot on how
+	  many pages you plan on indexing. Some operating systems limit
+	  files to 2 GB in size, which can become a problem with a large
+	  database. There are also slightly different limits to each of
+	  the programs. Right now htmerge performs a sort on the words
+	  indexed. Most sort programs use a fair amount of RAM and
+	  temporary disk space as they assemble the sorted list. The
+	  htdig program stores a fair amount of information about the
+	  URLs it visits, in part to only index a page once. This takes
+	  a fair amount of RAM. With cheap RAM, it never hurts to throw
+	  more memory at indexing larger sites. In a pinch, swap will
+	  work, but it obviously really slows things down.</p>
+
+	  <p>The 3.2 development code helps with many of these
+	  limitations. In particular, it generates the databases on the
+	  fly, which means you don't have to sort them before
+	  searching. Additionally, the new databases are compressed
+	  significantly, making them usually around 50% the size of
+	  those in previous versions.</p>
+
+	  <strong>1.11. <a name="q1.11">Do any ISPs offer ht://Dig as part of
+	  their web hosting services?</a></strong><br>
+	  <p>Yes. A list of such ISPs is <a href="isp.html">available
+	  here</a>.
+	  </p>
+
+	  <strong>1.12. <a name="q1.12">Can I use ht://Dig on a
+	  commercial website?</a></strong><br>
+	  <p>Sure! The <a href="COPYING">GNU Library General Public License (LGPL)</a> has no
+	  restrictions on use. So you are free to use ht://Dig however you
+	  want on your website, personal files, etc. The license only
+	  restricts distribution. So if you're planning on a
+	  commercial software product that includes ht://Dig, you will
+	  have to provide source code including any modifications upon
+	  request.
+	  </p>
+
+	  <strong>1.13. <a name="q1.13">Why do you use a non-free
+	  product to index PDF files?</a></strong><br>
+	  <p>
+	  We don't. You <em>can</em> use the &quot;acroread&quot;
+	  program to index PDF files, but this is no longer
+	  recommended. Initially this program was the only reliable
+	  way to extract data from PDF files. However, the <a
+	  href="http://www.foolabs.com/xpdf/">xpdf package</a> is a
+	  reliable, free software package for indexing and viewing PDF
+	  files. See question <a href="#q4.9">4.9</a> for details on
+	  using xpdf to index PDF files. We do not advocate using
+	  acroread any longer because it is a proprietary product.
+	  Additionally it is no longer reliable at extracting data.
+	  </p>
+
+	  <strong>1.14. <a name="q1.14">Why do you have all those SourceForge
+	  logos on your website?</a></strong><br>
+	  <p><a href="http://sourceforge.net/">SourceForge</a> is a
+	  new service for open source software. You can host your
+	  project on SourceForge servers and use many of their
+	  services like bug-tracking and the like. The ht://Dig
+	  project currently uses SourceForge for a mirror of the main
+	  website at <a
+	  href="http://htdig.sourceforge.net/">htdig.sourceforge.net</a>
+	  as well as a mirror of ht://Dig releases and contributed
+	  work.
+	  </p>
+	  
+	  <strong>1.15. <a name="q1.15">My question isn't answered here. 
+	  Where should I go for help?</a></strong><br>
+	  <p>
+	  Before you go anywhere else, think of other ways of phrasing your 
+	  question. Many times people have questions that are very similar to 
+	  other FAQ entries, and while we try to phrase the queries in the FAQ closely to 
+	  the most common questions, we obviously can't get them all! The next 
+	  place to check is the documentation itself. In particular, take a 
+	  look at the list of configuration attributes, particularly the list <a 
+	  href="cf_byname.html">by name</a> and <a 
+	  href="cf_byprog.html">by program</a>. There are a 
+	  lot of them, but chances are there's something that might fit your needs.
+	  You should also take a close look at all of
+	  <a href="htsearch.html">htsearch</a>'s
+	  documentation, especially the section "HTML form" which describes
+	  all the CGI input parameters available for controlling the search,
+	  including limiting the search to certain subdirectories.
+	  You can find the answer yourself to almost all "how can I..."
+	  questions by exploring what the various configuration attributes
+	  and search form input parameters can do.
+	  Also have a look at our collection of
+	  <a href="http://www.htdig.org/contrib/guides.html">Contributed Guides</a>
+	  for help on things like
+	  <a href="http://www.htdig.org/files/contrib/guides/htmlhelp.html">HTML
+	  forms</a> and CGI, tutorials on installing, configuring, using, and
+	  internationalizing ht://Dig, as well as using PHP with htsearch.
+	  </p>
+	  <p>
+	  Finally, if you've exhausted all the online documentation, there's the 
+	  <a href="mailto:htdig-general@lists.sourceforge.net">htdig-general</a> mailing list. 
+	  There are hundreds of users subscribed and chances are good that someone 
+	  has had a similar problem before or can suggest a solution.
+	  </p>
+
+	  <strong>1.16. <a name="q1.16">Why do the developers get annoyed when
+	  I e-mail questions directly to them rather than the mailing list?</a></strong><br>
+	  <p>The <a href="mailto:htdig-general@lists.sourceforge.net">htdig-general</a>
+	  mailing list exists for dealing with questions about the
+	  software, its installation, configuration, and problems with
+	  it. E-mailing the developers directly circumvents this forum
+	  and its benefits. Most annoyingly, it puts the onus on an
+	  individual to answer, even if that individual is not the best or
+	  most qualified person to answer. This is not a one-man show. It
+	  also circumvents the <a href="http://www.htdig.org/mailarchive.html">archiving
+	  mechanism</a> of the mailing list,
+	  so not only do subscribers not see these private messages
+	  and replies, but future users who may run into the exact same
+	  problems won't see them. Remember that the developers are all
+	  volunteers, and they don't work for free for your benefit alone.
+	  They volunteer for the benefit of the whole ht://Dig user
+	  community, so don't expect extra support from them outside of
+	  that community. See also questions <a href="#q1.4">1.4</a>
+	  and <a href="#q1.5">1.5</a>.</p>
+
+	  <p>Note also that when you reply to a message on the list, you
+	  should make sure the reply gets on the list as well, provided your
+	  reply is still on-topic.  See question <a href="#q1.17">1.17</a>
+	  below.</p>
+
+	  <strong>1.17. <a name="q1.17">Why do replies to messages on the
+	  mailing list only go to the sender and not to the list?</a></strong><br>
+	  <p>The simple answer is that, unlike some mailing lists, the
+	  lists on SourceForge don't force replies back on the list. This
+	  is actually a good thing, because you can reply to the sender
+	  directly if you want to, or you can use your mail program's
+	  "reply to all" capability (sometimes called "group reply")
+	  to reply to the mailing list as well. It does mean you have to
+	  think before you post a reply, but some would argue that this
+	  is a good thing too. There are some compelling reasons to try to
+	  keep on-topic discussions on the list, though (see questions
+	  <a href="#q1.16">1.16</a> and <a href="#q1.4">1.4</a> above).</p>
+
+	  <p>The technical answer is
+	  <a href="http://sourceforge.net/docman/display_doc.php?docid=6693&amp;group_id=1">
+	  SourceForge's policy on Reply-To: munging</a>, where you'll
+	  find all the gory details about the pros and cons of the two
+	  common ways of setting up a mailing list, and why SourceForge
+	  turns off Reply-To munging. It so happens that the ht://Dig
+	  maintainers agree with SourceForge's policy on this, and would
+	  keep it that way even if we had a say in the matter. So, counterarguments to this
+	  policy are rather moot, and it would be better not to waste
+	  any more mailing list bandwidth debating them. (We've heard
+	  all the arguments anyway.)</p>
+
+	  <strong>1.18. <a name="q1.18">Can I use ht://Dig to index and search
+	  an SQL database?</a></strong><br>
+	  <p>You can if your database has a web-based front end that can
+	  be "spidered" by ht://Dig. The requirement is that every search
+	  result must resolve to a unique URL which can be accessed via
+	  HTTP. The htdig program uses these URLs, which you feed it via
+	  the <a href="attrs.html#start_url">start_url</a> attribute, to
+	  fetch and index each page of information. The search results
+	  will then give a list of URLs for all pages that match the
+	  search terms. If you don't have such a front end to your
+	  database, or the search results must be given as something
+	  other than URLs, then ht://Dig is probably not the best way of
+	  dealing with this problem: you may be better off using an SQL
+	  query engine that works directly on your own database, rather
+	  than building a separate ht://Dig database for searching.</p>
+
+	  <p>Ted Stresen-Reuter had the following tips: "In my case,
+	  because I like htdig's ability to rank results (and that
+	  ranking can be modified), I created an index page that simply
+	  walks through each record and indexes each record (with
+	  <em>next</em> and <em>previous</em> links so the spider can
+	  read all the records). And then I do one other thing: I make
+	  the <code>&lt;title&gt;</code> tag start with the unique ID
+	  of each record. Then, when I'm parsing the search results, I
+	  do a lookup on the database using the title tag as the key."</p>
+
+	  <hr noshade size=2>
+
+	  <h3>2. Getting ht://Dig</h3>
+	  <strong>2.1. <a name="q2.1">What's the latest version of ht://Dig?</a></strong><br>
+	  <p>The latest version is 3.1.6 as of this writing. A beta
+	  version of the 3.2 code,
+	 <a href="http://www.htdig.org/files/htdig-3.2.0b6.tar.gz">3.2.0b6</a>,
+	  is also available, for those who wish to test it.
+	  You can find out about the latest version by reading the
+	  <a href="RELEASE.html">release
+	  notes</a>.</p>
+
+	  <p><strong>Note</strong> that if you're running any version
+	  older than 3.1.5 (including 3.2.0b1) on a public web site,
+	  you should upgrade immediately, as older versions have a
+	  rather serious security hole which is explained in detail in
+	  this <a
+	  href="http://www.htdig.org/htdig-dev/2000/02/0272.html">advisory</a>
+	  which was sent to the Bugtraq mailing list.
+	  Another slightly less serious, but still troubling security hole
+	  exists in 3.1.5 and older (including 3.2.0b3 and older), so you
+	  should upgrade if you're running one of these. You can view details
+	  on this vulnerability from the
+	  <a href="http://www.securityfocus.com/bid/3410">Bugtraq mailing list</a>.
+	  If you're unsure of which version you're running, see question
+	  <a href="#q5.33">5.33</a>.</p>
+
+	  <strong>2.2. <a name="q2.2">Are there binary distributions of
+	  ht://Dig?</a></strong><br>
+	  <p>We're trying to get consistent binary distributions for
+	  popular platforms. Contributed binary releases will go in <a
+	  href="http://www.htdig.org/files/contrib/binaries/">
+	  the contributed binaries section</a>
+	  and contributions should be mentioned to the <a
+	  href="mailto:htdig-general@lists.sourceforge.net">htdig-general</a>
+	  mailing list.</p>
+
+	  <p>Anyone who would like to make consistent binary
+	  distributions of ht://Dig should at least sign up for the <a
+	  href="mailing.html">htdig-announce mailing list</a>.</p>
+
+	  <strong>2.3. <a name="q2.3">Are there mirror sites for ht://Dig?</a></strong><br>
+	  <p>Yes, see our <a href="mirrors.html">mirrors
+	  listing</a>. If you'd like to mirror the site, please see
+	  the <a href="howto-mirror.html">mirroring guide</a>.</p>
+
+	  <strong>2.4. <a name="q2.4">Is ht://Dig available by ftp?</a></strong><br>
+	  <p>Yes. You can find the current versions and several older
+	  versions at various &lt;<a
+	  href="mirrors.html">mirror sites</a>&gt;
+	  as well as the other locations mentioned in the <a
+	  href="where.html">download page</a>.</p>
+
+	  <strong>2.5. <a name="q2.5">Are patches around to upgrade between
+	  versions?</a></strong><br>
+	  <p>Most versions are also distributed as a patch to the previous
+	  version's source code. The most recent exception to this was
+	  version 3.1.0b1. Since this version switched from the GDBM
+	  database to DB2, the new database package needed to be shipped
+	  with the distribution. This made the potential patch almost as large
+	  as the regular distribution. Update patches resumed with version
+	  3.1.0b2. You can also find archives of patches submitted to
+	  the htdig mailing lists, to fix specific bugs or add features,
+	  at Joe Jah's <a href="ftp://ftp.ccsf.org/htdig-patches/">
+	  htdig-patches ftp site</a>.</p>
+
+	  <strong>2.6. <a name="q2.6">Is there a Windows 95/98/2000/NT
+	  version of ht://Dig?</a></strong><br>
+	  <p>The ht://Dig package can be built on the Win32 platform when
+	  using the Cygwin package. For details, see the contributed guide,
+	  <a href="http://www.htdig.org/files/contrib/guides/Installing_on_Win32.html">
+	  <em>Idiot's Guide to Installing ht://Dig on Win32</em></a>.
+	  </p>
+	  <p>
+	  As of the <a href="http://www.htdig.org/files/htdig-3.2.0b5.tar.gz">3.2.0b5</a>
+	  beta release, there is also native Win32 support, thanks to
+	  Neal Richter.  (Installation docs will be written soon...)
+	  </p>
+
+	  <strong>2.7. <a name="q2.7">Where can I find the documentation for my
+	  version of ht://Dig?</a></strong><br>
+	  <p>The documentation for the most recent stable release is always
+	  posted at <a href="http://www.htdig.org/">www.htdig.org</a>.
+	  The documentation for the latest beta release can be found at
+	  <a href="http://www.htdig.org/dev/htdig-3.2/">http://www.htdig.org/dev/htdig-3.2/</a>.
+	  In all releases, the documentation is included in the
+	  <strong>htdoc</strong> subdirectory of the source distribution, so
+	  you always have access to the documentation for your current version.
+	  </p>
+
+	  <hr noshade size=2>
+
+	  <h3>3. Compiling</h3>
+	  <strong>3.1. <a name="q3.1">When I compile ht://Dig I get an error about
+	  libht.a</a></strong><br>
+	  <p>This usually indicates that either libstdc++ is not installed or
+	  is installed incorrectly. To get libstdc++ or any other GNU tool,
+	  check
+	  <a
+	  href="ftp://ftp.gnu.org/gnu/">ftp://ftp.gnu.org/gnu/</a>.
+	  Note that the most recent versions of gcc come with
+	  libstdc++ included and are available from <a
+	  href="http://gcc.gnu.org/">http://gcc.gnu.org/</a>.</p>
+
+	  <strong>3.2. <a name="q3.2">I get an error about -lg</a></strong><br>
+	  <p>This is due to a bug in the Makefile.config.in of version
+	  3.1.0b1. Remove all flags "-ggdb" in Makefile.config.in. Then
+	  type "./config.status" to rebuild the Makefiles and
+	  recompile. This bug is fixed in version 3.1.0b2.</p>
+
+	  <strong>3.3. <a name="q3.3">I'm compiling on Digital Unix and I get
+	  messages about "unresolved" and "db_open."</a></strong><br>
+	  <p>Answer contributed by George Adams
+	  &lt;learningapache@my-dejanews.com&gt;</p>
+
+	  <p>What you're seeing are problems related to the Berkeley DB
+	  library.  htdig needs a fairly modern version of db, which is
+	  why it ships with one that works. (see that -L../db-2.4.14/dist
+	  line?  That's where htdig's db library is).<br>
+
+	  The solution is to modify the c++ command so it explicitly
+	  references the correct libdb.a.  You can do this by replacing
+	  the "-ldb" directive in the c++ command with
+	  "../db-2.4.14/dist/libdb.a". This problem has been resolved as of
+	  version 3.1.0.</p>
+
+	  <strong>3.4. <a name="q3.4">I'm compiling on FreeBSD and I get lots
+          of messages about '___error' being unresolved.</a></strong><br>
+	  <p>Answer contributed by Laura Wingerd &lt;laura@perforce.com&gt;<br>
+	  I got a clean build of htdig-3.1.2 on FreeBSD 2.2.8 by taking
+	  -D_THREAD_SAFE out of CPPFLAGS, and setting LIBS to null, in
+	  db/dist/configure.</p>
+
+	  <strong>3.5. <a name="q3.5">I'm compiling on HP/UX and I get a complaint about
+	  "Large Files not supported."</a></strong><br>
+	  <p>The db/ package included with ht://Dig seems to be unable to complete
+	  on HP/UX 10.20 in particular. After running the top-level configure 
+	  script, cd into db/dist and type:</p>
+	  <code>./configure --disable-bigfile</code>
+	  <p>Then continue with the normal compilation.</p>
+	  
+	  <strong>3.6. <a name="q3.6">I'm compiling on Solaris and when I run the 
+	  programs I get complaints about not finding libstdc++.</a></strong><br>
+	  <p>Answer contributed by Adam Rice &lt;adam@newsquest.co.uk&gt;</p>
+	  <p>The problem is that the Solaris loader can't find the library. The 
+	  best thing to do is set the LD_RUN_PATH environment variable <em>during compile</em>
+	  to the directory where libstdc++.so.2.8.1.1 lives. This tells the linker 
+	  to search that directory at runtime.
+	  </p>
+
+	  <p>Note that LD_RUN_PATH is not to be confused with LD_LIBRARY_PATH.
+	  The latter is parsed at run-time, while LD_RUN_PATH essentially
+	  compiles in a library path into the executable, so that it doesn't
+	  need a LD_LIBRARY_PATH setting to find its libraries. This allows
+	  you to avoid all the complexities of setting an environment
+	  variable for a CGI program run from the server. If all else fails,
+	  you can always run your programs from wrapper shell scripts that
+	  set the LD_LIBRARY_PATH environment variable appropriately.</p>
+
+	  <p>Note also that while this answer is specific to Solaris, it may
+	  work for other OSes too, so you may want to give it a try. However,
+	  not all versions of the <code>ld</code> program on all OSes support
+	  the LD_RUN_PATH environment variable, even if these systems support
+	  shared libraries. Try "<code>man&nbsp;ld</code>" on your system to
+	  find out the best way of setting the runtime search path for shared
+	  libraries. If <code>ld</code> doesn't support LD_RUN_PATH, but does
+	  support the <code>-R</code> option, you can add one or more of these
+	  options to LIBDIRS in Makefile.config before running make on a 3.1.x
+	  release. (For a 3.2 beta release, you can add these options to the
+	  LDFLAGS environment variable before you run ./configure.)</p>
+	  
+	  <strong>3.7. <a name="q3.7">I'm compiling on IRIX and I'm having 
+	  database problems when I run the program.</a></strong><br>
+	  <p>
+	  It is not entirely clear why these problems occur, though
+	  they seem to only happen when older compilers are
+	  used. Several people have reported that the problems go away
+	  when using the latest version of <a href="http://gcc.gnu.org/">gcc</a>.
+	  </p>
+	  
+	  <strong>3.8. <a name="q3.8">I'm compiling with gcc 3.2 and getting
+	  all sorts of warnings/errors about ostream and such.</a></strong><br>
+	  <p>
+	  With versions before 3.2.0b5,
+	  you should use the following command to configure the ht://Dig
+	  package so it can be built with gcc 3.2:
+<pre>
+CXXFLAGS=-Wno-deprecated CPPFLAGS=-Wno-deprecated ./configure
+</pre>
+	  </p>
+
+	  <hr noshade size=2>
+
+	  <h3>4. Configuration</h3>
+	  <strong>4.1. <a name="q4.1">How come I can't index my site?</a></strong><br>
+	  <p>There are a variety of reasons ht://Dig won't index a
+	  site. To get to the bottom of things, it's advisable to turn on
+	  some debugging output from the htdig program. When running from
+	  the command-line, try "-vvv"  in addition to any other
+	  flags. This will add debugging output, including the responses
+	  from the server.</p>
+	  <p>See also questions <a href="#q5.25">5.25</a>,
+	  <a href="#q5.27">5.27</a>, <a href="#q5.16">5.16</a> and
+	  <a href="#q5.18">5.18</a>.</p>
+
+	  <strong>4.2. <a name="q4.2">How can I change the output format of htsearch?</a></strong><br>
+<p>Answer contributed by: Malki Cymbalista &lt;Malki.Cymbalista@weizmann.ac.il&gt;</p>
+
+<p>You can change the output format of htsearch by creating different
+header, footer and result files that specify how you want the output
+to look. You then create a configuration file that specifies which
+files to use. In the html document that links to the search, you
+specify which configuration file to use.</p>
+
+<p>So the configuration file would have the lines:</p>
+<pre>
+search_results_header: ${common_dir}/ccheader.html
+search_results_footer: ${common_dir}/ccfooter.html
+template_map:  Long long builtin-long \
+               Short short builtin-short \
+               Default default ${common_dir}/ccresult.html
+template_name: Default
+</pre>
+<p>You would also put into the configuration file any other lines from the
+default configuration file that apply to htsearch.</p>
+
+<p>The files ${common_dir}/ccheader.html and
+${common_dir}/ccfooter.html and ${common_dir}/ccresult.html would be
+tailored to give the output in the desired format.</p>
+
+<p>Assuming your configuration file is called cc.conf, the html file that
+links to the search has to set the config parameter equal to cc. The
+following line would do it:<br>
+<code>&lt;input type="hidden" name="config" value="cc"&gt;</code></p>
+
+	  <p><strong>Note:</strong> Don't just add the line above to your
+	  <a href="hts_form.html">search form</a>
+	  without checking if there isn't already a similar
+	  line giving the config attribute a different value. The sample
+	  search.html form that comes with the package includes a line
+	  like this already, giving "config" the default value of "htdig".
+	  If it's there, modify it instead of adding another definition.
+	  The config input parameter doesn't need to be hidden either, and
+	  you may want to define it as a pull-down list to select different
+	  databases (see question <a href="#q4.4">4.4</a>).</p>
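+
+	  <p>As a rough sketch of the pull-down approach, the hidden input
+	  can be replaced by a select list. The option values below
+	  ("htdig" and "cc") and their labels are only examples; each value
+	  must match the name of one of your own configuration files,
+	  without the .conf suffix:</p>
+<pre>
+&lt;select name="config"&gt;
+  &lt;option value="htdig" selected&gt;Default database&lt;/option&gt;
+  &lt;option value="cc"&gt;Second database&lt;/option&gt;
+&lt;/select&gt;
+</pre>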
+
+	  <strong>4.3. <a name="q4.3">How do I index pages that start with '~'?</a></strong><br>
+	  <p>
+	  ht://Dig should retrieve and index pages starting with '~' just as a
+	  web browser would. If you are having problems with this, check your server
+	  log files to see what file the server is attempting to return.
+	  </p>
+
+	  <strong>4.4. <a name="q4.4">Can I use multiple databases?</a></strong><br>
+	  <p>Yes, though you may find it easier to have one larger
+	  database and use restrict or exclude fields on searches. To use
+	  multiple databases, you will need a config file for each
+	  database. Then each file will set the
+	  <a href="attrs.html#database_dir">database_dir</a> or
+	  <a href="attrs.html#database_base">database_base</a> attribute to
+	  change the name of the databases. The config file is selected
+	  by the <strong>config</strong> input field in the search form.
+	  <br>See also questions <a href="#q4.2">4.2</a> and
+	  <a href="#q4.20">4.20</a>.</p>
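+
+	  <p>A minimal sketch, assuming two hypothetical configuration files
+	  named site1.conf and site2.conf. The directory paths and URLs are
+	  placeholders, not defaults, and each file would also need whatever
+	  other attributes your setup requires:</p>
+<pre>
+# site1.conf (hypothetical)
+database_dir: /var/lib/htdig/site1
+start_url:    http://www.example.com/site1/
+
+# site2.conf (hypothetical)
+database_dir: /var/lib/htdig/site2
+start_url:    http://www.example.com/site2/
+</pre>
+	  <p>With files like these, the search form would pass "site1" or
+	  "site2" as the value of the config input field.</p>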
+
+	  <strong>4.5. <a name="q4.5">OK, I can use multiple databases. Can I
+	  merge them into one?</a></strong><br>
+	  <p>As of version 3.1.0, you can do this with the -m option to
+	  <a href="htmerge.html">htmerge</a>.</p>
+
+	  <strong>4.6. <a name="q4.6">Wow, ht://Dig eats up a lot of disk
+	  space. How can I cut down?</a></strong><br>
+	  <p>There are several ways to cut down on disk space. One is
+	  not to use the "-a" option, which creates work copies of the
+	  databases. Naturally this essentially doubles the disk
+	  usage. If you don't need to index and search at the same time, you can
+	  ignore this flag.</p>
+
+	  <p>If you are running 3.2.0b5 or higher and don't have
+	  <a href="dev/htdig-3.2/attrs.html#wordlist_compress_zlib">compression</a>
+	  turned on, then turning that on will also save considerable space.</p>
+
+	  <p>Changing configuration variables can also help cut
+	  down on disk usage. Decreasing
+	  <a href="attrs.html#max_head_length">max_head_length</a> and
+	  <a href="attrs.html#max_meta_description_length">max_meta_description_length</a>
+	  will cut down on the size of the excerpts stored (in fact, if you
+	  don't have
+	  <a href="attrs.html#use_meta_description">use_meta_description</a>
+	  set, you can set
+	  max_meta_description_length to 0!).</p>
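+
+	  <p>For instance, the following settings shorten the stored excerpts
+	  and drop stored meta descriptions. The values are illustrative
+	  only, not recommendations, and the second line assumes you do not
+	  use use_meta_description:</p>
+<pre>
+max_head_length:             200
+max_meta_description_length: 0
+</pre>
+	  <p>These attributes are applied while indexing, so an existing
+	  database should only shrink once the documents have been
+	  reindexed.</p>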
+	  
+	  <p>If you are running 3.2.0b6 or higher, you can turn off
+	  <a href="dev/htdig-3.2/attrs.html#store_phrases">store_phrases</a>.  This cuts the
+	  database size by about 60%, at the expense of severely limiting
+	  the effectiveness of phrase searches.  It also reduces digging time
+	  slightly.</p>
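+
+	  <p>In a 3.2 beta configuration file, the two space-saving settings
+	  mentioned above might look like this. This is only a sketch; check
+	  the attribute documentation for your release to confirm the
+	  defaults before changing them:</p>
+<pre>
+# 3.2.0b5 or higher: compress the word database
+wordlist_compress_zlib: true
+# 3.2.0b6 or higher: smaller database, but weaker phrase searches
+store_phrases:          false
+</pre>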
+
+	  <p>Other techniques include removing the db.wordlist file and adding
+	  more words to the <a href="attrs.html#bad_words">bad_words</a>
+	  file.</p>
+
+	  <p>The University of Leipzig has published
+	  <a href="http://wortschatz.uni-leipzig.de/html/wliste.html">
+	  word lists</a> containing the 100, 1000 and 10000 most often used
+	  words in English, German, French and Dutch. No copyrights or
+	  restrictions seem to be applied to the downloadable files. These
+	  can be very handy when putting together a bad_words file. Thanks
+	  to Peter Asemann for this tip.</p>
+
+	  <strong>4.7. <a name="q4.7">Can I use SSI or other CGIs in my
+	  htsearch results?</a></strong><br>
+	  <p>Not really. Apache will not parse CGI output for SSI
+	  statements (See the <a
+	  href="http://www.apache.org/docs/misc/FAQ.html#ssi-part-iii">Apache
+	  FAQ</a>). Thus, the htsearch CGI does not understand SSI
+	  markup and cannot include other
+	  CGIs. However, it is possible to do it the other way round:
+	  you can have the htsearch results included in your dynamic
+	  page.
+	  </p>
+	  <p>
+	  The Apache project has mentioned that this will be a
+	  feature added to the Apache 2.0 version, currently in development.
+	  </p>
+
+	  <p>The easiest approach in the meantime is using SSI with
+	  the help of the <a
+	  href="attrs.html#script_name">script_name</a> configuration
+	  file attribute. See the <code>contrib/scriptname</code>
+	  directory for a small example using SSI.</p>
+
+	  <p>For CGI and PHP, you need a &quot;wrapper&quot; script to
+	  do that. For perl script examples, see the files in
+	  <code>contrib/ewswrap</code>. The PHP guide (see <a
+	  href="http://www.htdig.org/contrib/guides.html">contributed
+	  guides</a>) not only describes a wrapper script for PHP, but
+	  also offers a step by step tutorial to the basics of
+	  ht://dig and is well worth reading.
+	  For other alternatives, see question <a href="#q4.11">4.11</a>.
+	  </p>
+
+	  <strong>4.8. <a name="q4.8">How do I index Word, Excel, PowerPoint
+	  or PostScript documents?</a></strong><br>
+	  <p>This must be done with an
+	  <a href="attrs.html#external_parsers">external parser or converter</a>.
+	  A sample of such an external converter is the
+	  contrib/doc2html/doc2html.pl Perl script.
+	  It will parse Word, PostScript, PDF and other documents, when used
+	  with the appropriate document to text converters. It uses catdoc to
+	  parse Word documents, and ps2ascii to parse PostScript files. The
+	  comments in the Perl script and accompanying documentation
+	  indicate where you can obtain these converters.</p>
+
+	  <p>Versions of htdig before 3.1.4 don't support external converters,
+	  so you have to use an external parser script such as
+	  contrib/parse_doc.pl (or better yet, upgrade htdig if you can).
+	  External converter scripts are simpler to write and maintain than a
+	  full external parser, as they just convert input documents to
+	  text/plain or text/html, and pass that back to htdig to be parsed.
+	  Parsing is more consistent across document types with external
+	  converters, because the final work is done by htdig's internal
+	  parsers.  External parser scripts tend to be hacks that don't
+	  recognize a lot of the parsing attributes in your htdig.conf, so
+	  they have to be hacked some more when you change your attributes.</p>
+
+	  <p>The most recent versions of parse_doc.pl, conv_doc.pl and
+	  the doc2html package are available on our <a
+	  href="http://www.htdig.org/files/contrib/parsers/">web site</a>.<br>
+	   See below for an example of doc2html.pl, or see the comments in
+	  conv_doc.pl and parse_doc.pl, or the documentation for doc2html
+	  for examples of their usage.
+	  For help with troubleshooting, see questions
+	  <a href="#q5.37">5.37</a> and <a href="#q5.39">5.39</a>.</p>
+
+	  <strong>4.9. <a name="q4.9">How do I index PDF files?</a></strong><br>
+	  <p>This too can be done with an
+	  <a href="attrs.html#external_parsers">external parser or converter</a>,
+	  in combination with the pdftotext program that is part of the
+	  <a href="http://www.foolabs.com/xpdf/">xpdf</a> 0.90 package. A
+	  sample of such a converter is the doc2html.pl Perl
+	  script. It uses pdftotext to parse PDF documents, then processes
+	  the text into external parser records.
+	  The most recent version of doc2html.pl is available on our <a
+	  href="http://www.htdig.org/files/contrib/parsers/">web
+	  site</a>.</p>
+
+	  <p>For example, you could put this in your configuration file:</p>
+<pre>
+<a href="attrs.html#external_parsers">external_parsers</a>: application/msword-&gt;text/html /usr/local/bin/doc2html.pl \
+                  application/postscript-&gt;text/html /usr/local/bin/doc2html.pl \
+                  application/pdf-&gt;text/html /usr/local/bin/doc2html.pl
+</pre>
+	  <p>You would also need to configure the script to indicate where all
+	  of the document to text converters are installed. See the DETAILS
+	  file that comes with doc2html for more information.</p>
+
+	  <p>Versions of htdig before 3.1.4 don't support external converters,
+	  so you have to use an external parser script such as
+	  contrib/parse_doc.pl (or better yet, upgrade htdig if you can).
+	  See question <a href="#q4.8">4.8</a> above.</p>
+
+	  <p>Whether you use this external parser or converter, or acroread
+	  with the <a href="attrs.html#pdf_parser">pdf_parser</a> attribute,
+	  to successfully index PDF files be sure to set the <a
+	  href="attrs.html#max_doc_size">max_doc_size</a> attribute to
+	  a value larger than the size of your largest PDF file. PDF
+	  documents cannot be parsed if they are truncated.</p>
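+
+	  <p>As an illustration, if your largest PDF file were a little
+	  under 5 MB, a setting such as the following would be large enough.
+	  The value is in bytes and is an arbitrary example; choose one that
+	  fits your own documents:</p>
+<pre>
+max_doc_size: 5000000
+</pre>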
+
+	  <p>This also raises the questions of why two different
+	  methods of indexing PDFs are supported, and which method
+	  is preferred.  The built-in PDF support, which uses acroread
+	  to convert the PDF to PostScript, was the first method which
+	  was provided. It had a few problems with it: acroread is not
+	  open source, it is not supported on all systems on which
+	  ht://Dig can run, and for some PDFs, the PostScript that
+	  acroread generated was very difficult to parse into indexable
+	  text. Also, the built-in PDF support expected PDF documents to
+	  use the same character encoding as is defined in your current
+	  <a href="attrs.html#locale">locale</a>, which isn't always the
+	  case. The external converters, which use pdftotext, were developed
+	  to overcome these problems. xpdf 0.90 is free software, and its
+	  pdftotext utility works very well as an indexing tool.
+	  It also converts various PDF encodings to the Latin 1 set.
+	  It is the opinion of the developers that this is the
+	  preferred method. However, some users still prefer to stick
+	  with acroread, as it works well for them, and is a little
+	  easier to set up if you've already installed Acrobat.</p>
+
+	  <p>Also, pdftotext still has some difficulty handling text in
+	  landscape orientation, even with its new -raw option in 0.90,
+	  so if you need to index such text in PDFs, you may still get
+	  better results with acroread. The pdf_parser attribute has been
+	  removed from the 3.2 beta releases of htdig, so to use acroread
+	  with htdig 3.2.0b5 or other 3.2 betas, use the acroconv.pl
+	  external converter script from our <a
+	  href="http://www.htdig.org/files/contrib/parsers/">web site</a>.</p>
+
+	  <p>See also question <a href="#q5.2">5.2</a> below and
+	  question <a href="#q1.13">1.13</a> above.
+	  See questions <a href="#q5.37">5.37</a> and <a href="#q5.39">5.39</a>
+	  for troubleshooting tips.</p>
+
+	  <strong>4.10. <a name="q4.10">How do I index documents in other
+	  languages?</a></strong><br>
+	  <p>The first and most important thing you must do,
+	  to allow ht://Dig to properly support international
+	  characters, is to define the correct locale for the
+	  language and country you wish to support.  This is done
+	  by setting the <a href="attrs.html#locale">locale</a>
+	  attribute (see question <a href="#q5.8">5.8</a>). The
+	  next step is to configure ht://Dig to use dictionary and
+	  affix files for the language of your choice. These can
+	  be the same dictionary and affix files as are used by the
+	  ispell software.  A collection of these is available from
+	  Geoff Kuenning's
+	  <a href="http://fmg-www.cs.ucla.edu/geoff/ispell-dictionaries.html">
+	  International Ispell Dictionaries page</a>, and we're slowly
+	  building a collection of word lists on our <a
+	  href="http://www.htdig.org/files/contrib/wordlists/">web site</a>.</p>
+	  <p>For example, if you install German dictionaries in common/german,
+	  you could use these lines in your configuration file:</p>
+<pre>
+<a href="attrs.html#locale">locale</a>:               de_DE
+lang_dir:             ${<a href="attrs.html#common_dir">common_dir</a>}/german
+<a href="attrs.html#bad_word_list">bad_word_list</a>:        ${lang_dir}/bad_words
+<a href="attrs.html#endings_affix_file">endings_affix_file</a>:   ${lang_dir}/german.aff
+<a href="attrs.html#endings_dictionary">endings_dictionary</a>:   ${lang_dir}/german.0
+<a href="attrs.html#endings_root2word_db">endings_root2word_db</a>: ${lang_dir}/root2word.db
+<a href="attrs.html#endings_word2root_db">endings_word2root_db</a>: ${lang_dir}/word2root.db
+</pre>
+	  <p>
+	  You can build the endings database with <code>htfuzzy endings</code>.
+	  (This command may actually take days to complete, for
+	  releases older than 3.1.2. Current releases use faster regular
+	  expression matching, which will speed this up by a few orders
+	  of magnitude.) Note that the "*.0" files are not part of
+	  the ispell dictionary distributions, but are easily made by
+	  concatenating the partial dictionaries and sorting to remove
+	  duplicates (e.g.: "<code>cat * | sort | uniq &gt; lang.0</code>"
+	  in most cases). You will also need to redefine the synonyms
+	  file if you wish to use the synonyms search algorithm. This
+	  file is not included with most of the dictionaries, nor is the
+	  <a href="attrs.html#bad_words">bad_words</a> file.</p>
+
+	  <p>If you put all the language-specific
+	  dictionaries and configuration files in separate directories,
+	  and set all the attribute definitions accordingly in each
+	  search config file to access the appropriate files, you can
+	  have a multilingual setup where the user selects the language
+	  by selecting the "config" input parameter value. In addition
+	  to the attributes given in the example above, you may also
+	  want custom settings for these language-specific attributes:
+	  <a href="attrs.html#date_format">date_format</a>,
+	  <a href="attrs.html#iso_8601">iso_8601</a>,
+	  <a href="attrs.html#method_names">method_names</a>,
+	  <a href="attrs.html#no_excerpt_text">no_excerpt_text</a>,
+	  <a href="attrs.html#no_next_page_text">no_next_page_text</a>,
+	  <a href="attrs.html#no_prev_page_text">no_prev_page_text</a>,
+	  <a href="attrs.html#nothing_found_file">nothing_found_file</a>,
+	  <a href="attrs.html#page_list_header">page_list_header</a>,
+	  <a href="attrs.html#prev_page_text">prev_page_text</a>,
+	  <a href="attrs.html#search_results_wrapper">search_results_wrapper</a>
+	  (or <a href="attrs.html#search_results_header">search_results_header</a>
+	  and <a href="attrs.html#search_results_footer">search_results_footer</a>),
+	  <a href="attrs.html#sort_names">sort_names</a>,
+	  <a href="attrs.html#synonym_db">synonym_db</a>,
+	  <a href="attrs.html#synonym_dictionary">synonym_dictionary</a>,
+	  <a href="attrs.html#syntax_error_file">syntax_error_file</a>,
+	  <a href="attrs.html#template_map">template_map</a>, and of course
+	  <a href="attrs.html#database_dir">database_dir</a> or
+	  <a href="attrs.html#database_base">database_base</a> if you
+	  maintain multiple databases for sites of different languages.
+	  You could also change the definition of
+	  <a href="attrs.html#common_dir">common_dir</a>, rather than
+	  making up a lang_dir attribute as above, as many language-specific
+	  files are defined relative to the common_dir setting.</p>
+
+	  <p>If you're running version 3.1.6 of ht://Dig, you may also
+	  be interested in the <strong>accents</strong> fuzzy match
+	  algorithm in the
+	  <a href="attrs.html#search_algorithm">search_algorithm</a>
+	  attribute, which lets you treat accented and unaccented letters
+	  as equivalent in words. Note that if you use the accents algorithm,
+	  you need to rebuild the accents database each time you update your
+	  word database, using <code>"htfuzzy accents"</code>. This command
+	  isn't in the default rundig script, so you may want to add it there.
+	  The accents fuzzy match algorithm is also in the 3.2 beta releases.
+	  There are also the
+	  <a href="attrs.html#boolean_keywords">boolean_keywords</a> and
+	  <a href="attrs.html#boolean_syntax_errors">boolean_syntax_errors</a>
+	  attributes in 3.1.6 for changing other language-specific messages
+	  in htsearch.</p>
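+
+	  <p>A sketch of a search_algorithm setting that enables the accents
+	  algorithm alongside exact and endings matching. The weights shown
+	  are arbitrary examples, not tuned values:</p>
+<pre>
+search_algorithm: exact:1 accents:0.7 endings:0.1
+</pre>
+	  <p>Remember that the accents database has to exist before htsearch
+	  can use it, as described above.</p>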
+
+	  <p>Current versions of ht://Dig only support 8-bit
+	  characters, so languages such as Chinese and Japanese, which
+	  require 16-bit characters, are not currently supported.</p>
+
+	  <p>Didier Lebrun has written a guide for configuring htdig to
+	  support French, entitled
+	  <a href="http://www.quartier-rural.org/dl/elucu/htdig-vf/lisezmoi.html">
+	  Comment installer et configurer HtDig pour la langue fran&ccedil;aise</a>.
+	  His "kit de francisation" is also available on
+	  <a
+	  href="http://www.htdig.org/files/contrib/wordlists/">our
+	  web site</a>.</p>
+
+	  <p>See also question <a href="#q4.2">4.2</a> for tips on customizing
+	  htsearch, and question <a href="#q4.6">4.6</a> for tips on where to find
+	  bad_words files.</p>
+
+	  <strong>4.11. <a name="q4.11">How do I get rotating banner ads in
+	  search results?</a></strong><br>
+	  <p>While htsearch doesn't currently provide a means of doing
+	  SSI on its output, or calling other CGI scripts, it does have
+	  the capability of using environment variables in templates.</p>
+
+	  <p>The easiest way to get rotating banners in htsearch is
+	  to replace htsearch with a wrapper script that sets an
+	  environment variable to the banner content, or whatever
+	  dynamically generated content you want. Your script can then
+	  call the real htsearch to do the work. The wrapper script can be
+	  written as a shell script, or in Perl, C, C++, or whatever you
+	  like. You'd then need to reference that environment variable
+	  in header.html (or wrapper.html if that's what you're using),
+	  to indicate where the dynamic content should be placed.</p>
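+
+	  <p>As a sketch of the template side, assume the wrapper script
+	  exports a hypothetical environment variable named BANNER_HTML, and
+	  that environment variables are referenced with the same $(NAME)
+	  syntax as other template variables. A line like the following in
+	  header.html or wrapper.html would then mark where the banner
+	  goes:</p>
+<pre>
+&lt;!-- BANNER_HTML is a hypothetical variable set by the wrapper script --&gt;
+&lt;div align="center"&gt;$(BANNER_HTML)&lt;/div&gt;
+</pre>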
+
+	  <p>If the dynamic content is generated by a CGI script, your new
+	  wrapper script which calls this CGI would then have to strip out
+	  the parts that you don't want embedded in the output (headers,
+	  some tags) so that only the relevant content gets put into the
+	  environment variable you want.  You'd also have to make sure
+	  this CGI script doesn't grab the POST data or get confused by
+	  the QUERY_STRING contents intended for htsearch. Your script
+	  should not take anything out of, or add anything to, the
+	  QUERY_STRING environment variable.</p>
+
+	  <p>An alternative approach is to have a cron job that periodically
+	  regenerates a different header.html or wrapper.html with the
+	  new banner ad, or changes a link to a different pre-generated
+	  header.html or wrapper.html file. For other alternatives, see
+	  question <a href="#q4.7">4.7</a>.</p>
+
+	  <strong>4.12. <a name="q4.12">How do I index numbers in documents?</a></strong><br>
+	  <p>By default, htdig doesn't treat numbers without letters
+	  as words, so it doesn't index them.
+	  To change this behavior, you must set the
+	  <a href="attrs.html#allow_numbers">allow_numbers</a>
+	  attribute to true, and rebuild your index from scratch using
+	  rundig or htdig with the -i option, so that bare numbers get
+	  added to the index.</p>
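+
+	  <p>The corresponding configuration file line is simply:</p>
+<pre>
+allow_numbers: true
+</pre>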
+
+	  <strong>4.13. <a name="q4.13">How can I call htsearch from a hypertext
+	  link, rather than from a search form?</a></strong><br>
+	  <p>If you change the search.html form to use the GET method
+	  rather than POST, you can see the URLs complete with all the
+	  arguments that htsearch needs for a query. Here is an example:<br>
+<code>
+http://www.grommetsRus.com/cgi-bin/htsearch?config=htdig&amp;restrict=&amp;exclude=&amp;method=and&amp;format=builtin-long&amp;words=grapple+grommets
+</code>
+	  which can actually be simplified to:<br>
+<code>
+http://www.grommetsRus.com/cgi-bin/htsearch?method=and&amp;words=grapple+grommets
+</code>
+	  with the current defaults. The "&amp;" character acts as a
+	  separator for the input parameters, while the "+" character
+	  acts as a space character within an input parameter.
+	  In versions 3.1.5 or 3.2.0b2, or later, you can use a semicolon
+	  character ";" as a parameter separator, rather than "&amp;", for
+	  HTML 4.0 compliance.
+	  Most non-alphanumeric characters should be hex-encoded following
+	  the convention for URL encoding (e.g. "%" becomes "%25", "+"
+	  becomes "%2B", etc). Any htsearch input parameter that you'd
+	  use in a search form can be added to the URL in this way.
+	  This can be embedded into an &lt;a href="..."&gt; tag.
+	   <br>See also question <a href="#q5.21">5.21</a>.</p>
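+
+	  <p>For example, the simplified URL above could be embedded as
+	  follows. This sketch assumes htsearch is installed as
+	  /cgi-bin/htsearch on the same server, and it uses the ";"
+	  separator, which requires version 3.1.5, 3.2.0b2 or later. With
+	  older versions, write each separator as "&amp;amp;" inside the
+	  href value instead:</p>
+<pre>
+&lt;a href="/cgi-bin/htsearch?method=and;words=grapple+grommets"&gt;Search
+for grapple grommets&lt;/a&gt;
+</pre>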
+
+	  <strong>4.14. <a name="q4.14">How do I restrict a search to only meta
+	  keywords entries in documents?</a></strong><br>
+	  <p>First of all, you do <strong>not</strong> do this by using the
+	  "keywords" field in the search form. This seems to be a
+	  frequent cause of confusion.  The "keywords" input parameter
+	  to htsearch has absolutely nothing to do with searching meta
+	  keywords fields.  It actually predates the addition of meta
+	  keyword support in 3.1.x.  A better choice of name for the
+	  parameter would have been "requiredwords", because that's what
+	  it really means - a list of words that are all required to be
+	  found somewhere in the document, in addition to the words the
+	  user specifies in the search form.</p>
+
+ 	  <p>As of 3.2.0b5, the most direct way to search for a particular
+ 	  meta keyword is to specify the word as "keyword:&lt;word&gt;".
+ 	  Similarly, "title:", "heading:", and "author:" restrict searches
+ 	  to the respective fields.  To search for words in the body of the
+ 	  text, use "text:".</p>
+ 
+ 	  <p>To restrict all search terms to meta keywords only, you can set all
+ 	  <a href="attrs.html#heading_factor">factors</a> other than
+ 	  keywords_factor to 0, and for 3.1.x, you
+ 	  must then reindex your documents.  In the 3.2 betas, you can
+	  change factors at search time without needing to reindex.
+	  As of 3.2.0b5, it is possible to restrict
+	  the search in the query itself.  Note that changing the scoring
+	  factors in this way will only alter the scoring of search results,
+	  and shift the low or zero scores to the end of the results when
+	  sorting by score (as is done by default). For versions before
+	  3.2.0b5, the results with scores
+	  of zero aren't actually removed from the search results.</p>
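+
+	  <p>A sketch of such a setup for a 3.2 beta configuration file
+	  follows. The value given to keywords_factor is arbitrary, and the
+	  list of factors may not be complete for your version (3.1.x, for
+	  example, uses several numbered heading factors rather than a single
+	  heading_factor), so check the attribute documentation for every
+	  attribute whose name ends in "_factor":</p>
+<pre>
+keywords_factor:         100
+text_factor:             0
+title_factor:            0
+heading_factor:          0
+meta_description_factor: 0
+description_factor:      0
+</pre>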
+
+	  <strong>4.15. <a name="q4.15">Can I use meta tags to prevent htdig from
+	  indexing certain files?</a></strong><br>
+	  <p>Yes, in each HTML file you want to exclude, add the following
+	  between the &lt;HEAD&gt; and &lt;/HEAD&gt; tags:</p>
+		<blockquote>
+		   &lt;META NAME="robots" CONTENT="noindex, follow"&gt;
+		</blockquote>
+	  <p>Doing so will allow htdig to still follow links to other documents,
+	  but will prevent this document from being put into the index itself.
+	  You can also use "nofollow" to prevent following of links. See
+	  the section on <a href="meta.html">Recognized META information</a>
+	  for more details. For documents produced automatically by MhonArc,
+	  you can have that line inserted automatically by putting it in the
+	  MhonArc resource file, in the sections IDXPGBEGIN and TIDXPGBEGIN.</p>
+
+	  <p>You can also use the
+	  <a href="attrs.html#noindex_start">noindex_start</a> and
+	  <a href="attrs.html#noindex_end">noindex_end</a> attributes to
+	  define one set of tags which will mark sections to be stripped out
+	  of documents, so they don't get indexed, or you can mark sections
+	  with the non-DTD &lt;noindex&gt; and &lt;/noindex&gt; tags.
+	  The noindex_start and noindex_end attributes can also be used to
+	  suppress in-line JavaScript code that wasn't properly enclosed in
+	  HTML comment tags (see question <a href="#q4.26">4.26</a>).
+	  In 3.1.6, you can also put a section between &lt;noindex follow&gt;
+	  and &lt;/noindex&gt; tags to turn off indexing of text but still
+	  allow htdig to follow links.</p>
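+
+	  <p>As a rough sketch of the JavaScript case, a configuration along
+	  these lines would strip everything between an opening and closing
+	  script tag. The tag strings shown are an assumption for illustration;
+	  check the attribute documentation for how your version matches them
+	  before relying on this:</p>
+<pre>
+# Illustrative only: strip in-line script sections before indexing
+noindex_start:  &lt;SCRIPT
+noindex_end:    &lt;/SCRIPT&gt;
+</pre>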
+
+	  <p>If you require much more elaborate schemes for avoiding indexing
+	  certain parts of your HTML files, especially if you don't have
+	  control over these files and can't add tags to them, you can
+	  set up htdig's
+	  <a href="attrs.html#external_parsers">external_parsers</a> attribute
+	  with an external converter that will preprocess the HTML before
+	  it's parsed and indexed by htdig. Examples of this are the
+	  unhypermail.sh script in our
+	  <a href="http://www.htdig.org/files/contrib/parsers/">contributed parsers</a>
+	  and the ungeoify.sh script in our
+	  <a href="http://www.htdig.org/files/contrib/scripts/">contributed scripts</a>.
+	  By preprocessing the HTML, you can strip out parts you don't want, or
+	  you can add or change tags wherever they're needed, if you're willing
+	  to put in the effort to learn awk/sed/perl enough to do the job.</p>
+
+	  <strong>4.16. <a name="q4.16">How do I get htsearch to use the star image
+	  in a different directory than the default /htdig?</a></strong><br>
+	  <p>You must set either the
+	  <a href="attrs.html#image_url_prefix">image_url_prefix</a> attribute,
+	  or both <a href="attrs.html#star_blank">star_blank</a> and
+	  <a href="attrs.html#star_image">star_image</a> in your
+	  htdig.conf, to refer to the URL path for these files. You should
+	  also set this URL path similarly in common/header.html and
+	  common/wrapper.html, as they also refer to the star.gif file.
+	  If you want to relocate other graphics, such as the buttons or
+	  the ht://Dig logo, you should change all references to these
+	  in htdig.conf and common/*.html.</p>
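+
+	  <p>For example, assuming the images were moved to a hypothetical
+	  /images/htdig URL path, either of the following htdig.conf sketches
+	  should do. The file name star_blank.gif is assumed to match the
+	  image shipped with the package:</p>
+<pre>
+# Option 1: change the common prefix used for all the images
+image_url_prefix: /images/htdig
+
+# Option 2: point only the two star images at the new location
+star_image:       /images/htdig/star.gif
+star_blank:       /images/htdig/star_blank.gif
+</pre>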
+
+	  <strong>4.17. <a name="q4.17">How do I get htdig or htsearch to rewrite
+	  URLs in the search results?</a></strong><br>
+	  <p>This can be done by using the <a
+	  href="attrs.html#url_part_aliases">url_part_aliases</a>
+	  configuration file attribute. You have to set up different
+	  configuration files for htdig and htsearch, to define a
+	  different setting of this attribute for each one.</p>
+
+	  <p>A large number of users insist on ignoring that last point
+	  and try to make do with just one definition, either for htdig
+	  or htsearch, or sometimes for both. This seems to stem from
+	  a fundamental misunderstanding of how this attribute works,
+	  so perhaps a clarification is needed. The url_part_aliases
+	  attribute uses a two stage process. In the first stage, htdig
+	  encodes the URLs as they go into the database, by using the
+	  pairs in url_part_aliases going from left to right. In the
+	  second stage, htsearch decodes the encoded URLs taken from the
+	  database, by using the pairs in url_part_aliases going from
+	  right to left. If you have the same value for url_part_aliases
+	  in htdig and htsearch, you end up with the same URLs in the
+	  end. If you modify the first string (the from string) in
+	  the pairs listed in url_part_aliases for htsearch, then when
+	  htsearch decodes the URLs it ends up rewriting part of them.</p>
+
+	  <p>While you might think that if you don't use url_part_aliases
+	  in htdig, then you can use it in htsearch to alter unencoded
+	  URLs, the reality is that if you don't encode parts of URLs
+	  using url_part_aliases, they still get encoded automatically
+	  by the <a href="attrs.html#common_url_parts">common_url_parts</a>
+	  attribute. This helps to reduce the size of your databases. So,
+	  trying to use url_part_aliases only in htsearch doesn't work
+	  because there are no unencoded URLs in the database, so the
+	  right hand strings in the pairs you define won't match anything.</p>
+
+	  <p>You also can't put two different definitions of the
+	  url_part_aliases attribute in a single configuration file, as
+	  some users have attempted. When you define an attribute twice,
+	  the second definition merely overrides the first. Pay close
+	  attention to the description and examples for
+	  <a href="attrs.html#url_part_aliases">url_part_aliases</a>.
+	  You must put one definition of this attribute in your
+	  configuration file for htdig, htmerge (or htpurge) and htnotify,
+	  and a different definition of it in your configuration file
+	  for htsearch.</p>
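+
+	  <p>As a minimal sketch, suppose documents are indexed through a
+	  hypothetical internal host name but should appear in search results
+	  under the public one. The host names and the <em>*site</em>
+	  to-string below are assumptions for illustration. The two
+	  definitions go in two separate configuration files, with the same
+	  to-string and different from-strings:</p>
+<pre>
+# In the configuration file used by htdig, htmerge (or htpurge) and htnotify:
+url_part_aliases: http://internal.example.com/docs/ *site
+
+# In the separate configuration file used by htsearch:
+url_part_aliases: http://www.example.com/docs/ *site
+</pre>
+	  <p>With these settings, htdig encodes the internal prefix as *site
+	  in the database, and htsearch decodes *site to the public prefix
+	  when it displays results.</p>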
+
+	  <strong>4.18. <a name="q4.18">What are all the options in
+	  htdig.conf, and are there others?</a></strong><br>
+	  <p>In ht://Dig's terminology, the settings in its configuration
+	  files are called <a href="attrs.html">configuration attributes</a>,
+	  to distinguish them from <a href="htdig.html">command line
+	  options</a>, <a href="hts_form.html">CGI input parameters</a>
+	  and <a href="hts_templates.html">template variables</a>. There are
+	  many, many attributes that can be set to control almost all
+	  aspects of indexing, searching, customization of output and
+	  internationalization. All attributes have a built-in default
+	  setting, and only a subset of these appear in the sample htdig.conf
+	  file. See the documentation for all default values for attributes
+	  not overridden in the configuration file, and for help on using
+	  any of them.
+	  See also question <a href="#q1.15">1.15</a>.</p>
+
+	  <strong>4.19. <a name="q4.19">How do I get more than 10 pages of
+	  10 search results from htsearch?</a></strong><br>
+	  <p>There are two attributes that control the number of matches per
+	  page and the total number of pages. The number of matches per page
+	  can be set in your configuration file, using the
+	  <a href="attrs.html#matches_per_page">matches_per_page</a> attribute,
+	  or in your <a href="hts_form.html">search form</a>, using the
+	  <strong>matchesperpage</strong> input parameter.</p>
+
+	  <p>The number of pages is controlled by the
+	  <a href="attrs.html#maximum_pages">maximum_pages</a> attribute in
+	  your search configuration file.
+	  The current default for maximum_pages is 10 because the ht://Dig
+	  package comes with 10 images, with numbers 1 through 10, for
+	  use as page list buttons. If we increased the limit, we'd have
+	  to field a whole lot more questions from users irate because
+	  only the first 10 buttons are graphics, and the rest are text.
+	  If you want more than 10 pages of results, change maximum_pages,
+	  but you may also want to set the
+	  <a href="attrs.html#page_number_text">page_number_text</a> and
+	  <a href="attrs.html#no_page_number_text">no_page_number_text</a>
+	  attributes in your search configuration file to nothing, or
+	  remove them, to use text rather than images for the links to
+	  other pages.</p>
+
+	  <p>In versions of htsearch before 3.1.4, maximum_pages
+	  limited only the number of page list buttons, and not the
+	  actual number of pages. This was changed because there was no
+	  means of limiting the total number of pages, but this ended up
+	  frustrating users who wanted the ability to have more pages than
+	  buttons. In 3.2.0b3 and 3.1.6 we introduced a
+	  <a href="attrs.html#maximum_page_buttons">maximum_page_buttons</a>
+	  attribute for this purpose.</p>
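+
+	  <p>Putting these together, a search configuration along the
+	  following lines would allow up to 30 pages of 20 matches each,
+	  using text links instead of the 10 button images. The numbers
+	  are arbitrary examples:</p>
+<pre>
+matches_per_page:     20
+maximum_pages:        30
+# Leave these empty (or remove them) to get text page links
+page_number_text:
+no_page_number_text:
+# 3.1.6 and 3.2.0b3 or later only: limit the page links shown at once
+maximum_page_buttons: 10
+</pre>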
+
+	  <strong>4.20. <a name="q4.20">How do I restrict a search to only
+	  certain subdirectories or documents?</a></strong><br>
+	  <p>That depends on whether you want to protect certain parts of
+	  your site from prying eyes, or just limit the scope of search
+	  results to certain relevant areas. For the latter, you just need
+	  to set the <strong>restrict</strong> or <strong>exclude</strong>
+	  input parameter in the <a href="hts_form.html">search form</a>.
+	  This can be done using hidden input fields containing preset
+	  values, text input fields, select lists, radio buttons or
+	  checkboxes, as you see fit. If you use select lists, you can
+	  propagate the choices to select lists in the follow-up search
+	  forms using the
+	  <a href="attrs.html#build_select_lists">build_select_lists</a>
+	  configuration attribute.
+	  The University at Albany has a good description of how to use
+	  the <strong>restrict</strong> or <strong>exclude</strong> input
+	  parameters: <a href="http://www.albany.edu/its/web/search/">
+	  Constructing a local search using ht://Dig Search forms</a>.
+	  <br>To include a hex encoded character (such as a %20 for a space)
+	  in a restrict or exclude string, the '%' must again be encoded.
+	  For example, to match a filename containing a space, the URL must
+	  contain %20, and so the CGI parameter passed to htsearch must
+	  contain %2520. The %25 encodes the '%'. (Note that this is only
+	  necessary for CGI input parameters, not for the corresponding
+	  configuration attributes in your htdig.conf file, as attributes
+	  aren't subjected to the same hex decoding step as parameters are.)
+	  <br>See also question <a href="#q4.4">4.4</a>.</p>
+
+	  <p>If you wish to keep secure and non-secure areas on
+	  your site separate, and avoid having unauthorized users
+	  seeing documents from secure areas in their search results,
+	  that takes a bit more effort. You certainly can't rely on
+	  the <strong>restrict</strong> and <strong>exclude</strong>
+	  parameters, or even the <strong>config</strong> parameter,
+	  as any parameter in a search form can also be overridden
+	  by the user in a URL with CGI parameters. The safest
+	  option would be to host the secure and non-secure areas on
+	  separate servers with independent installations of htsearch,
+	  each with its own ht://Dig database, but that is often too
+	  costly or impractical an option. The next best thing is to
+	  host them on the same site, but make sure that everything
+	  is very clearly separated to prevent any leakage of secure
+	  data. You should maintain separate databases for the secure
+	  and public areas of your site, by setting up different htdig
+	  configuration files for each area. Use different settings
+	  of the <a href="attrs.html#start_url">start_url</a>,
+	  <a href="attrs.html#limit_urls_to">limit_urls_to</a>
+	  and <a href="attrs.html#database_dir">database_dir</a>
+	  configuration attributes, and possibly even different
+	  <a href="attrs.html#common_dir">common_dir</a> settings as well.
+	  Make sure your database_dir, and even your common_dir, are not
+	  in any directories accessible from the web server. Run htdig
+	  and htmerge (or rundig) with each separate configuration file,
+	  to build your two databases.</p>
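+
+	  <p>A minimal sketch of the two configuration files might look like
+	  this. The URLs and directory names are hypothetical, and any other
+	  attributes you normally set would need to appear in both files:</p>
+<pre>
+# public.conf: index everything except the secure area
+start_url:      http://www.example.com/
+limit_urls_to:  http://www.example.com/
+exclude_urls:   /cgi-bin/ .cgi /secure/
+database_dir:   /var/lib/htdig/public
+
+# secure.conf: index only the secure area, into its own database
+start_url:      http://www.example.com/secure/
+limit_urls_to:  http://www.example.com/secure/
+database_dir:   /var/lib/htdig/secure
+</pre>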
+
+	  <p>The tricky part is to make sure your htsearch program is
+	  secure. You don't want to use the same htsearch for the secure
+	  and public sites, because otherwise the public site could
+	  access the configuration for the secure database, making its
+	  data publicly accessible. You must either compile two separate
+	  versions of htsearch, with different settings of the CONFIG_DIR
+	  <em>make</em> variable, or you must make a simple wrapper
+	  script for htsearch that overrides the compiled-in CONFIG_DIR
+	  setting by a different setting of the CONFIG_DIR environment
+	  variable. Make sure the CONFIG_DIR for the secure area is
+	  not a subdirectory of the CONFIG_DIR for the public area.
+	  In this way, you can maintain separate directories of config
+	  files for the public and secure sites, so that the secure
+	  config files are not accessible from the public htsearch.</p>
+
+	  <p>Put the htsearch binary or wrapper script for the secure site
+	  in a different ScriptAlias'ed cgi-bin directory than the public
+	  one, and protect the secure cgi-bin with a .htaccess file or
+	  in your server configuration. Alternatively, you can put the
+	  secure program, let's call it htssearch, in the same cgi-bin,
+	  but protect that one CGI program in your server configuration,
+	  e.g.:</p>
+<pre>
+&lt;Location /cgi-bin/htssearch&gt;
+AuthType Basic
+AuthName ....
+AuthUserFile ...
+AuthGroupFile ...
+&lt;Limit GET POST&gt;
+require group foo
+&lt;/Limit&gt;
+&lt;/Location&gt;
+</pre>
+	  <p>This describes the setup for an Apache server. You'd need to
+	  work out an equivalent configuration for your server if you're
+	  not running Apache.</p>
+
+	  <strong>4.21. <a name="q4.21">How can I allow people to search
+	  while the index is updating?</a></strong><br>
+	  <p>Answer contributed by Avi Rappoport &lt;avirr@searchtools.com&gt;</p>
+	  <p>If you have enough disk space for two copies of the index
+	  database, use -a with the htdig and htmerge processes. This will
+	  make use of a copy of the index database with the extension
+	  ".work", and update the copy instead of the originals.
+	  This way, htsearch can use those originals while the update is
+	  going on. When it's done, you can move the .work versions to
+	  replace the originals, and htsearch will use them. The current
+	  rundig script will do this for you if you supply the -a flag
+	  to it. However, rundig builds the database from scratch each
+	  time you run it. If you want to update an alternate copy of
+	  the database, see the
+	  <a href="http://www.htdig.org/files/contrib/scripts/rundig.sh">contributed
+	  rundig.sh script</a>.</p>
+
+	  <strong>4.22. <a name="q4.22">How can I get htdig to ignore the
+	  robots.txt file or meta robots tags?</a></strong><br>
+	  <p>You can't, and you shouldn't. The
+	  <a href="http://www.robotstxt.org/wc/norobots.html">
+	  Standard for Robot Exclusion</a> exists for a very good reason,
+	  and any well behaved indexing engine or spider should conform to it.
+	  If you have a problem with a robots.txt file, you really should
+	  take it up with the site's webmaster. If they don't have a problem
+	  with you indexing their site, they shouldn't mind setting up a
+	  User-agent entry in their robots.txt file with a name you both
+	  agree on. The user agent setting that htdig uses for matching
+	  entries in robots.txt can be changed via the
+	  <a href="attrs.html#robotstxt_name">robotstxt_name</a> attribute in
+	  your config file.</p>
+
+	  <p>If you have a problem with a robots meta tag in a document
+	  (see question <a href="#q4.15">4.15</a>) you should take it up
+	  with the author or maintainer of that page. These tags are an
+	  all or nothing deal, as they can't be set up to allow some engines
+	  and disallow others. If htdig encounters them, it has to give the
+	  page's creator the benefit of the doubt and honour them. If
+	  exceptions to the rule are wanted, this should be done with a
+	  robots.txt file rather than a meta tag.</p>
+
+	  <strong>4.23. <a name="q4.23">How can I get htdig not to index
+	  some directories, but still follow links?</a></strong><br>
+	  <p>You can simply add the directory name to your robots.txt file
+	  or to the <a href="attrs.html#exclude_urls">exclude_urls</a>
+	  attribute in your configuration, but that will exclude all files
+	  under that directory. If you want the files in that directory to
+	  be indexed, you have a couple of options. You can add an index.html
+	  file to the directory, that will include a robots meta tag
+	  (see question <a href="#q4.15">4.15</a>) to prevent indexing,
+	  and will contain links to all your files in this directory.
+	  The drawback of this is that you must maintain the index.html
+	  file yourself, as it won't be automatically updated as new
+	  files are added to the directory.</p>
+
+	  <p>The other technique you can use, if you want the directory
+	  index to be made by the web server, is to get the server to
+	  insert the robots meta tag into the index page it generates.
+	  In Apache, this is done using the
+	  <a href="http://httpd.apache.org/docs/mod/mod_autoindex.html#headername">HeaderName</a>
+	  and <a href="http://httpd.apache.org/docs/mod/mod_autoindex.html#indexoptions">IndexOptions</a>
+	  directives in the directory's <strong>.htaccess</strong> file.
+	  For example:</p>
+<pre>   HeaderName .htrobots 
+   IndexOptions FancyIndexing SuppressHTMLPreamble
+</pre>
+	  <p>and in the .htrobots file:</p>
+<pre>&lt;HTML&gt;&lt;head&gt;
+&lt;META NAME="robots" CONTENT="noindex, follow"&gt;
+&lt;title&gt;Index of /this/dir&lt;/title&gt;
+&lt;/head&gt;
+</pre>
+
+	  <p>If you don't mind getting just one copy of each directory,
+	  but want to suppress the multiple copies generated by Apache's
+	  FancyIndexing option, you can either turn off FancyIndexing or
+	  you can add "?D=A ?D=D ?M=A ?M=D ?N=A ?N=D ?S=A ?S=D" to
+          the <a href="attrs.html#bad_querystr">bad_querystr</a> attribute
+	  (without the quotes) to suppress the alternately sorted views of
+	  the directory. For Apache 2.x, you'd use "C=D C=M C=N C=S O=A O=D"
+	  instead in your bad_querystr setting.</p>
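+
+	  <p>Written out as configuration file entries, the two settings
+	  described above would look like this. Use only the one that
+	  matches your Apache version, and append the patterns to any
+	  bad_querystr value you already have:</p>
+<pre>
+# Apache 1.3 FancyIndexing sort links
+bad_querystr: ?D=A ?D=D ?M=A ?M=D ?N=A ?N=D ?S=A ?S=D
+
+# Apache 2.x FancyIndexing sort links
+bad_querystr: C=D C=M C=N C=S O=A O=D
+</pre>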
+
+	  <strong>4.24. <a name="q4.24">How can I get rid of duplicates in
+	  search results?</a></strong><br>
+	  <p>This depends on the cause of the duplicate documents. htdig
+	  does keep track of the URLs it visits, so it never puts the
+	  same URL more than once in the database. So, if you have
+	  duplicate documents in your search results, it's because the
+	  same document appears under different URLs. Sometimes the
+	  URLs vary only slightly, and in subtle ways, so you may have
+	  to look hard to find out what the variation is. Here are some
+	  common reasons, each requiring a different solution.</p>
+
+	  <ul>
+	  <li>You're indexing a case insensitive web
+	  server (e.g. an NT based server), but the
+          <a href="attrs.html#case_sensitive">case_sensitive</a> attribute is
+	  still set to true. In this case, if htdig encounters two URLs
+	  pointing to the same document, but the case of the letters in
+	  one is different than the other (even if it's only 1 letter),
+	  it will not treat them as the same URL. Setting case_sensitive to
+	  false makes htdig treat such URLs as identical.<br><br>
+	  <li>You have symbolic links (or hard links) to some of
+	  these documents, so they can be reached by several URLs.
+	  The solution here is to build an exclude list of URLs that
+	  are actually symbolic links, and put these in
+	  <a href="attrs.html#exclude_urls">exclude_urls</a>
+	  (or in your robots.txt file). You can automate this using a
+	  technique similar to the find command in question
+	  <a href="#q5.25">5.25</a> which builds the start_url list, but
+	  adding a -type l to find symbolic links.<br><br>
+	  <li>You have copies of the same documents in different
+	  locations. This is similar to the symbolic link problem above,
+	  but harder to fix automatically.<br><br>
+	  <li>The duplicate URLs result from CGI, SSI or other dynamic pages
+	  that give the same content even though there may be variations in
+	  the query string or other parts of the URL. The approach to
+	  fix this is similar to the fix above, but may be less easy
+	  to automate, depending on what the variations are. You can
+	  add patterns to exclude_urls or bad_querystr to get rid of
+	  unwanted variations. These are especially important to bring
+	  under control, because in some cases, if left unchecked, they
+	  can result in an <em>infinite virtual hierarchy</em> which htdig
+	  will never be able to finish indexing. For example, in a CGI-based
+	  calendar, htdig could go on following next month or next
+	  year links to infinity, but this can be stopped by adding a
+	  stop year to <a href="attrs.html#bad_querystr">bad_querystr</a>.
+	  <br><br>Another common example happens when htdig hits a link
+	  to an SSI page and the URL has an extra trailing slash. This
+	  can happen with either .shtml pages or .html pages that use
+	  the XBitHack. The trailing slash causes the URL to be misinterpreted
+	  as a directory URL, and any relative URLs in the document are added
+	  to the URL, creating longer and longer URLs that still lead to the
+	  same SSI document. There are two things you can do:<ol>
+		<li>hunt down the pages with the incorrect links, i.e.
+		search for ".shtml/" or ".html/" in URLs in your documents,
+		and fix these links; or
+		<li>add .shtml/ and .html/ to your
+		<a href="attrs.html#exclude_urls">exclude_urls</a>
+		setting to get htdig to ignore these defective links.
+	  </ol>The second option is easier, but you run the risk that htdig
+	  will miss some SSI pages if the only links to them have the trailing
+	  slash, so you may want to try hunting down the links anyway.
+	  <br><br>See also question <a href="#q5.29">5.29</a>.<br><br>
+	  <li>The duplicates result from session IDs in PHP or other dynamic
+	  pages that give the same content even though the ID changes during
+	  the indexing process. This can lead not only to duplicates, but
+	  also to URLs that become unusable because of expired session IDs.
+	  Session IDs are the bane of search engines, and you should avoid
+	  using them if at all possible. If getting rid of them altogether
+	  isn't an option, then you can at least remove them while indexing,
+	  using the <a href="attrs.html#url_rewrite_rules">url_rewrite_rules</a>
+	  attribute. This will only work if htdig can access the documents
+	  without a session ID, as htdig rewrites the URL before fetching the
+	  document, and htsearch presents the rewritten URL (without session
+	  ID) in search results.
+          </ul>
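+
+	  <p>As an illustration, two of the fixes above could be expressed
+	  in your configuration file roughly as follows. The first two
+	  patterns in exclude_urls are assumed to be the value you already
+	  have; keep whatever your configuration currently uses and add the
+	  new patterns to it:</p>
+<pre>
+# Case insensitive server: URLs differing only in case are the same document
+case_sensitive: false
+
+# Ignore defective SSI links that have an extra trailing slash
+exclude_urls:   /cgi-bin/ .cgi .shtml/ .html/
+</pre>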
+
+	  <strong>4.25. <a name="q4.25">How can I change the scores in
+	  search results, and what are the defaults?</a></strong><br>
+	  <p>The scores are calculated mostly by htdig at indexing time,
+	  with some tweaking done by htsearch at search time. There are
+	  a number of <a href="attrs.html">configuration attributes</a>,
+	  all called <em>&lt;something&gt;</em><strong>_factor</strong>,
+	  which can control the scoring calculations. In addition, the
+	  location of words within the document has an effect on score,
+	  as word scores are also multiplied by a varying location
+	  factor somewhere in between 1000 for words near the start
+	  and 1 for words near the end of the document. As of yet,
+	  there is no way to change this factor. For any of the scoring
+	  factors you can configure, and which are used by htdig, you
+	  will have to reindex your documents so the new factors take
+	  effect. The default values for these scoring factors, as well as
+	  information about whether they're used by htdig or htsearch,
+	  are all listed in the <a href="attrs.html">configuration
+	  attributes documentation</a>. Malcolm Austen has written some
+	  <a href="http://wwwsearch.ox.ac.uk/scores.html">notes on page
+	  scores</a> for 3.1.x which you may find helpful.</p>
+
+	  <p>Note that the above applies to the 3.1.x releases, while
+	  in the 3.2 beta releases, all scores are calculated at search
+	  time with no weight being put on the location of words within
+	  the document.</p>
+
+	  <strong>4.26. <a name="q4.26">How can I get htdig not to index
+	  JavaScript code or CSS?</a></strong><br>
+	  <p>The HTML parser in htdig recognizes and parses only HTML,
+	  which is all there should be within an HTML file. If your HTML
+	  files contain in-line JavaScript code or Cascading Style Sheets
+	  (CSS), these in-line codes, which are clearly not HTML, should
+	  be enclosed within an HTML comment tag so they are hidden
+	  from view from the HTML parser, or for that matter from any
+	  web client that is not JavaScript-aware or CSS-aware. See
+	  <a href="http://www.mcli.dist.maricopa.edu/show/interact/js_b.html">
+	  Behind the Scenes with JavaScript</a> for a description of the
+	  technique, which applies equally well to in-line style sheets.
+	  If fixing up all non-HTML compliant JavaScript or CSS code in
+	  your HTML files is not an option, then see question
+	  <a href="#q4.15">4.15</a> for an alternate technique.</p>
+
+	  <p>The HTML parser in htdig 3.1.6 tries skipping over bare
+	  in-line JavaScript code in HTML, unlike previous versions,
+	  but a small bug in the parser causes it to be thrown off by a
+	  "&lt;" sign in the JavaScript, and it may then miss the closing
+	  &lt;/script&gt; tag. This can be fixed by applying this
+	  <a href="ftp://ftp.ccsf.org/htdig-patches/3.1.6/JavaScript.0">
+	  patch</a>.</p>
+
+	  <hr noshade size=2>
+
+	  <h3>5. Troubleshooting</h3>
+	  <strong>5.1. <a name="q5.1">I can't seem to index more than X documents
+	  in a directory.</a></strong><br>
+	  <p>This usually has to do with the default document size
+	  limit. If you set <a href="attrs.html#max_doc_size">
+	  max_doc_size</a> in your config file to
+	  something large enough to read in the directory index (try 100000 for
+	  100K) this should fix this problem. Of course this will require
+	  more memory to read the larger file. Don't set it to a value
+	  larger than the amount of memory you have, and never more than
+	  about 2 billion, the maximum value of a 32-bit integer.
+	  If htdig is missing entire directories, see question
+	  <a href="#q5.25">5.25</a>.</p>
+
+	  <strong>5.2. <a name="q5.2">I can't index PDF files.</a></strong><br>
+	  <p>As above, this usually has to do with the default document
+	  size. What happens is ht://Dig will read in part of a PDF file
+	  and try to index it. This usually fails. Try setting
+	  <a href="attrs.html#max_doc_size">max_doc_size</a>
+	  in your config file to a larger value than the
+	  size of your largest PDF file. Don't go overboard, though, as
+	  you don't want to overflow a 32-bit integer (about 2 billion),
+	  and you don't want to allocate much more memory than you need
+	  to store the largest document.</p>
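+
+	  <p>For instance, if your largest PDF file were a little under
+	  5 MB (an assumed figure; check your own files), a setting like
+	  this would leave some headroom without being excessive:</p>
+<pre>
+# Value is in bytes; size it to your largest document
+max_doc_size: 5000000
+</pre>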
+
+	  <p>There is a bug in Adobe Acrobat Reader version 4, in its
+	  handling of the -pairs option, which causes a segmentation
+	  violation when using it with htdig 3.1.2 or earlier. There is
+	  a workaround for this as of version 3.1.3 - you must remove
+	  the -pairs option from your pdf_parser definition, if it's
+	  there.  However, acroread version 4 is still very unstable (on
+	  Linux, anyway) so it is not recommended as a PDF parser. An
+	  alternative is to use an external converter with the xpdf 0.90
+	  package installed on your system, as described in question <a
+	  href="#q4.9">4.9</a> above.</p>
+
+	  <strong>5.3. <a name="q5.3">When I run "rundig," I get a message about
+	  "DATABASE_DIR" not being found.</a></strong><br>
+	  <p>This is due to a bug in the Makefile.in file in version
+	  3.1.0b1. The easiest fix is to edit the rundig file and change
+	  the line "TMPDIR=@DATABASE_DIR@" to set TMPDIR to a directory
+	  with a large amount of temporary disk space for htmerge. This
+	  bug is fixed in version 3.1.0b2.</p>
+
+	  <strong>5.4. <a name="q5.4">When I run htmerge, it stops with an "out
+	  of diskspace" message.</a></strong><br>
+	  <p>This means that htmerge has run out of temporary disk space
+	  for sorting. Either in your "rundig" script (if you run htmerge
+	  through that) or before you run htmerge, set the variable TMPDIR
+	  to a temp directory with lots of space.</p>
+
+	  <strong>5.5. <a name="q5.5">I have problems running rundig from cron
+	  under Linux.</a></strong><br>
+	  <p>This problem commonly occurs on Red Hat Linux 5.0 and 5.1,
+	  because of a bug in vixie-cron. It causes htmerge to fail with a
+	  "Word sort failed" error. It's fixed in Red Hat 5.2.
+	  You can install vixie-cron-3.0.1-26.{arch}.rpm from a 5.2
+	  distribution to fix the problem on 5.0 or 5.1. A quick fix for
+	  the problem is to change the first line of rundig to "#!/bin/ash"
+	  which will run the script through the ash shell, but this doesn't
+	  solve the underlying problem.</p>
+
+	  <strong>5.6. <a name="q5.6">When I run htmerge, it stops with an
+	  "Unexpected file type" message.</a></strong><br>
+	  <p>Often this is because the databases are corrupt. Try removing
+	  them and rebuilding. If this doesn't work, some have found that
+	  the solution for question <a href="#q3.2">3.2</a> works for this
+	  as well. This should be fixed in versions from 3.1.x onward.</p>
+
+	  <strong>5.7. <a name="q5.7">When I run htsearch, I get lots of Internal
+	  Server Errors (#500).</a></strong><br>
+	  <p>If you are running under Solaris, see <a href="#q3.6">3.6</a>.
+	  The solution for Solaris may also work for other OSes that use shared
+	  libraries in non-standard locations, so refer to question 3.6 if
+	  you suspect a shared library problem. In any case, check your web
+	  server error logs to see the cause of the internal server errors.
+	  If it's not a problem with shared libraries, there's a good chance
+	  that the error logs will still contain useful error messages that
+	  will help you figure out what the problem is.
+	  <br>See also questions <a href="#q5.13">5.13</a> and
+	  <a href="#q5.23">5.23</a>.</p>
+
+	  <strong>5.8. <a name="q5.8">I'm having problems with indexing words
+	  with accented characters.</a></strong><br>
+	  <p>
+	  Most of the time, this is caused by either not setting or
+	  incorrectly setting the <a
+	  href="attrs.html#locale">locale</a> attribute. The default locale
+	  for most systems is the "portable" locale, which strips
+	  everything down to standard ASCII. Most systems expect
+	  something like <code>locale: en_US</code> or
+	  <code>locale: fr_FR</code>. Locale files are often found in
+	  <code>/usr/share/locale</code> or the <tt>$LANGUAGE</tt>
+	  environment variable. See also question <a href="#q4.10">4.10</a>.
+	  </p>
+
+	  <p>Setting the locale correctly seems to be a frequent source of
+	  frustration for ht://Dig users, so here are a few pointers which
+	  some have found useful. First of all, if you don't have any luck
+	  with the settings of the <a href="attrs.html#locale">locale</a>
+	  attribute that you try, make sure you use a locale that is
+	  defined on your system. As mentioned above, these are usually
+	  installed in <code>/usr/share/locale</code>, so look there
+	  for a directory named for the locale you want to use. If
+	  you don't find it, but find something close, try that locale
+	  name. Note that the locale may not have to be specific to the
+	  language you're indexing, as long as it uses the same character
+	  set. E.g. most western European languages use the ISO-8859-1
+	  Latin 1 character set, so on most systems the locales for
+	  all these languages define the same character types table
+	  and can be used interchangeably. Some systems, however,
+	  define only the accented letters used for a given language,
+	  so "your mileage may vary." The important thing is that the
+	  directory for your locale definition <strong>must</strong>
+	  have a file named <code>LC_CTYPE</code> in it. For example,
+	  on many Linux distributions, a language-specific locale like
+	  <code>fr</code> won't contain this file, but country-specific
+	  locales like <code>fr_FR</code> or <code>fr_CA</code> will. If
+	  you don't find any appropriate locales installed on your system,
+	  try obtaining and installing the locale definition files from
+	  your OS distribution. Also, once you've set your locale, you need
+	  to reindex all your documents in order for the locale to take
+	  effect in the word database. This means rerunning the "rundig"
+	  script, or running "htdig -i" and htmerge (or htpurge in the 3.2
+	  betas).</p>
+
+	  <p>Note also that some UNIX systems and libc5-based Linux
+	  systems just don't have a working implementation of locales,
+	  so you may not be able to get locales working at all on certain
+	  systems. The
+	  <a href="http://www.htdig.org/files/contrib/other/testlocale.c">testlocale.c</a>
+	  program on our web site can let you see the LC_CTYPE tables
+	  for any locale, to aid in finding one that works. Carefully
+	  follow the directions in the program's comments to know how to
+	  use it and what to look for in its output.</p>
+
+	  <strong>5.9. <a name="q5.9">When I run htmerge, it stops with a
+	  "Word sort failed" message.</a></strong><br>
+	  <p>There are three common causes of this. First of all, the sort
+	  program may be running out of temporary file space. Fix this
+	  by freeing up some space where sort puts its temporary files,
+	  or change the setting of the TMPDIR environment variable to a
+	  directory on a volume with more space. A second common problem
+	  is on systems with a BSD version of the sort program (such as
+	  FreeBSD or NetBSD). This program uses the -T option as a record
+	  separator rather than an alternate temporary directory. On these
+	  systems, you must remove the TMPDIR environment variable from
+	  rundig, or change the code in htmerge/words.cc not to use the
+	  -T option. A third cause is the cron program on Red Hat Linux
+	  5.0 or 5.1. (See question <a href="#q5.5">5.5</a> above.)</p>
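+	  <p>As a rough sketch of the first fix, assuming you run htdig
+	  and htmerge by hand from a Bourne-style shell, you might point
+	  TMPDIR at a roomier volume like this. The directory name is only
+	  a placeholder, and if you use rundig instead, check whether the
+	  script sets TMPDIR itself and change it there:</p>
+<pre>
+# /bigvolume/tmp is a hypothetical directory with plenty of free space
+TMPDIR=/bigvolume/tmp
+export TMPDIR
+df -k "$TMPDIR"     # check the free space before starting
+htdig -i
+htmerge
+</pre>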
+
+	  <strong>5.10. <a name="q5.10">When htsearch has a lot of matches, it runs
+	  extremely slowly.</a></strong><br>
+	  <p>When you run htsearch with no customization, on a
+	  large database, and it gets a lot of hits, it tends to
+	  take a long time to process those hits. Some users with
+	  large databases have reported much higher performance,
+	  for searches that yield lots of hits, by setting the <a
+	  href="attrs.html#backlink_factor">backlink_factor</a> attribute
+	  in htdig.conf to 0, and sorting by score. The scores calculated
+	  this way aren't quite as good, but htsearch can process hits
+	  much faster when it doesn't need to look up the db.docdb record
+	  for each hit, just to get the backlink count, date or title,
+	  either for scoring or for sorting. This affects versions
+	  3.1.0b3 and up. In version 3.2, currently under development,
+	  the databases will be structured differently, so it should
+	  perform searches more quickly.</p>
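+	  <p>For instance, the relevant line in htdig.conf might simply
+	  read as follows (a minimal sketch; it assumes your results are
+	  sorted by score and the rest of your configuration stays as
+	  it is):</p>
+<pre>
+backlink_factor: 0
+</pre>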
+
+	  <p>In version 3.1.6, the date range selection code also slows
+	  down htsearch for the same reason. Unfortunately, a small bug
+	  crept into the code so that even if you don't set any of the
+	  date range input parameters (startyear, endyear, etc.), and
+	  you set backlink_factor and date_factor to 0, htsearch still
+	  looks at the date in the db.docdb record for each hit. You can
+	  avoid this either by setting startyear to 1969 and endyear to
+	  2038 in your config file, or by applying this
+	  <a href="ftp://ftp.ccsf.org/htdig-patches/3.1.6/timet_enddate.1">
+	  patch</a>.</p>
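+	  <p>The config file workaround would look something like this
+	  in htdig.conf:</p>
+<pre>
+startyear: 1969
+endyear: 2038
+</pre>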
+
+	  <strong>5.11. <a name="q5.11">When I run htsearch, it gives me a count of
+	  matches, but doesn't list the matching documents.</a></strong><br>
+	  <p>This most commonly happens when you run htsearch while the
+	  database is currently being rebuilt or updated by htdig.
+	  If htdig and htmerge have run to completion, and the problem still
+	  occurs, this is usually an indication of a corrupted database. If
+	  it's finding matches, it's because it found the matching
+	  words in db.words.db.  However, it isn't finding the document
+	  records themselves in db.docdb, which would suggest that either
+	  db.docdb, or db.docs.index (which maps document IDs used in
+	  db.words.db to URLs used to look up records in db.docdb), is
+	  incomplete or messed up.  You'll likely need to rebuild your
+	  database from scratch if it's corrupted. Older versions of
+	  ht://Dig were susceptible to database corruption of this
+	  sort. Versions 3.1.2 and later are much more stable.</p>
+
+	  <p>Another possible cause of this problem is unreadable result
+	  template files. If you define external template files via the
+	  <a href="attrs.html#template_map">template_map</a> attribute,
+	  rather than using the builtin-short or builtin-long templates,
+	  and the file names are incorrect or the files do not have
+	  read permission for the user ID under which htsearch runs,
+	  then htsearch won't be able to display the results. Also,
+	  all directories leading up to these template files must be
+	  searchable (i.e. executable) by htsearch, or it won't be able
+	  to open the files. This is the opposite problem of that described
+	  in question <a href="#q5.36">5.36</a>. If htsearch displays
+	  nothing at all, you may have both problems.</p>
+
+	  <strong>5.12. <a name="q5.12">I can't seem to index documents with names
+	  like left_index.html with htdig.</a></strong><br>
+	  <p>There is a bug in the implementation of the <a
+	  href="attrs.html#remove_default_doc">remove_default_doc</a>
+	  attribute in htdig versions 3.1.0, 3.1.1 and 3.1.2, which causes
+	  it to match more than it should. The default value for this
+	  attribute is "index.html", so any URL in which the filename ends
+	  with this string (rather than matches it entirely) will have
+	  the filename stripped off. This is fixed in version 3.1.3.</p>
+
+	  <strong>5.13. <a name="q5.13">I get Premature End of Script Headers errors
+	  when running htsearch.</a></strong><br>
+	  <p>This happens when htsearch dies before putting out a
+	  "Content-Type" header. If you are running Apache under Solaris,
+	  or another system that may be using shared libraries in non-standard
+	  locations,
+	  first try the solution described in question <a href="#q3.6">3.6</a>.
+	  If that doesn't work, or you're running on another system, try
+	  running "htsearch -vvv" directly from the command line to see where
+	  and why it's failing. It should prompt you for the search words,
+	  as well as the format.
+	  <br>If it works from the command line, but not from the web
+	  server, it's almost certainly a web server configuration problem.
+	  Check your web server's error log for any information related to
+	  htsearch's failure. One increasingly common problem is Apache
+	  configurations which expect all CGI scripts to be Perl,
+	  rather than binary executables or other scripts, so they use
+	  "perl-handler" rather than "cgi-handler".
+	  <br>See also questions <a href="#q5.7">5.7</a>,
+	  <a href="#q5.14">5.14</a> and <a href="#q5.23">5.23</a>.</p>
+
+	  <strong>5.14. <a name="q5.14">I get Segmentation faults when running
+	  htdig, htsearch or htfuzzy.</a></strong><br>
+	  <p>Despite a great deal of debugging of these programs, we haven't
+	  been able to completely eliminate all such problems on all platforms.
+	  If you're running htsearch or htfuzzy on a BSDI system, a common
+	  cause of core dumps is a conflict between the GNU regex
+	  code bundled in htdig 3.1.2 and later, and the BSD C or C++ library.
+	  The solution is to use the BSD library's own rx code instead,
+	  using version 3.1.6 or newer as summarized by Joe Jah:</p>
+		<ul>
+		<li> ./configure --with-rx
+		<li> make
+		</ul>
+	  <p>This solution may work on some other platforms as well (we haven't
+	  heard one way or the other), but will definitely not work on some
+	  platforms. For instance, on libc5-based Linux systems, the bundled
+	  regex code works fine by default, but using libc5's regex code
+	  causes core dumps.</p>
+
+	  <p>Users of Cobalt Raq or Qube servers have complained of
+	  segmentation faults in htdig. Apparently this is due to problems
+	  in their C++ libraries, which are fixed in their experimental
+	  compiler and libraries. The following commands should install
+	  the packages you need:</p>
+		<blockquote>
+		 rpm -Uvh ftp://ftp.cobaltnet.com/pub/experimental/binutils-2.8.1-3C1.mips.rpm<br>
+		 rpm -Uvh ftp://ftp.cobaltnet.com/pub/experimental/egcs-1.0.2-9.mips.rpm<br>
+		 rpm -Uvh ftp://ftp.cobaltnet.com/pub/experimental/egcs-c++-1.0.2-9.mips.rpm<br>
+		 rpm -Uvh ftp://ftp.cobaltnet.com/pub/experimental/egcs-g77-1.0.2-9.mips.rpm<br>
+		 rpm -Uvh ftp://ftp.cobaltnet.com/pub/experimental/egcs-objc-1.0.2-9.mips.rpm<br>
+		 rpm -Uvh ftp://ftp.cobaltnet.com/pub/experimental/libstdc++-2.8.0-9.mips.rpm<br>
+		 rpm -Uvh ftp://ftp.cobaltnet.com/pub/experimental/libstdc++-devel-2.8.0-9.mips.rpm<br>
+		 rpm -Uvh --force ftp://ftp.cobaltnet.com/pub/products/current/RPMS/gcc-2.7.2-C2.mips.rpm
+		</blockquote>
+	  <p>You may have to remove the libg++ package, if you have it installed,
+	  before installing libstdc++, because of conflicts between these packages.
+	  Be sure to do a "make clean" before a "make", to remove any object
+	  files compiled with the old compiler and headers.</p>
+
+	  <p>For other causes of segmentation faults, or in other programs,
+	  getting a stack backtrace after the fault can be useful in narrowing
+	  down the problem. E.g.: try "gdb /path/to/htsearch /path/to/core",
+	  then enter the command "bt". You can also try running the program
+	  directly under the debugger, rather than attempting a post-mortem
+	  analysis of the core dump. Options to the program can be given on
+	  gdb's "run" command, and after the program is suspended on fault,
+	  you can use the "bt" command. This may give you enough information
+	  to find and fix the problem yourself, or at least it may help others
+	  on the htdig mailing list to point out what to do next.</p>
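+	  <p>A post-mortem session might look roughly like this. The paths
+	  are placeholders, and the backtrace itself will depend entirely on
+	  your build:</p>
+<pre>
+$ gdb /path/to/htsearch /path/to/core
+(gdb) bt
+</pre>
+	  <p>To run the program under the debugger instead, give its options
+	  on the "run" command (the -vvv here is just an example):</p>
+<pre>
+$ gdb /path/to/htsearch
+(gdb) run -vvv
+(gdb) bt
+</pre>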
+
+	  <strong>5.15. <a name="q5.15">Why does htdig 3.1.3 mangle URL parameters
+	  that contain bare "&amp;" characters?</a></strong><br>
+	  <p>This is a known bug in 3.1.3, and is fixed with this
+	  <a href="ftp://ftp.ccsf.org/htdig-patches/3.1.3/HTML.cc.0">
+	  patch</a>. You can apply the patch by entering into the main
+	  source directory for htdig-3.1.3, and using the command
+	  "patch -p0 &lt; /path/to/HTML.cc.0". This is
+	  also fixed as of version 3.1.4.</p>
+
+	  <strong>5.16. <a name="q5.16">When I run htmerge, it stops with an
+	  "Unable to open word list file '.../db.wordlist'" message.</a></strong><br>
+	  <p>The most common cause of this error is that htdig did not
+	  manage to index any documents, and so it did not create a word
+	  list. You should repeat the htdig or rundig command with the
+	  -vvv option to see where and why it is failing.
+	  See question <a href="#q4.1">4.1</a>.</p>
+
+	  <strong>5.17. <a name="q5.17">When using Netscape, htsearch always returns the
+	  "No match" page.</a></strong><br>
+	  <p>Check your search form. Chances are there is a hidden input 
+	  field with no value defined. For example, one user had<br>
+	  <code>&lt;input type=hidden name=restrict&gt;</code>
+
+	  in his search form, instead of<br>
+
+         <code>&lt;input type=hidden name=restrict value=""&gt;</code>
+
+	 The problem is that Netscape sets the missing value to a default of "  "
+	 (two spaces), rather than an empty string. For the restrict parameter,
+	 this is a problem, because htsearch won't likely find any URLs with two
+	 spaces in them. Other input parameters may similarly pose a problem.
+	  </p>
+
+	  <p>Another possibility, if you're running 3.2.0b1 or 3.2.0b2, is
+	  that you need to make the db.words.db_weakcmpr file writeable by
+	  the user ID under which the web server runs. This is a bug, and
+	  is fixed in the 3.2.0b5 beta.</p>
+
+
+	  <strong>5.18. <a name="q5.18">Why doesn't htdig follow links to other
+	  pages in JavaScript code?</a></strong><br>
+	  <p>There probably isn't any indexing tool in existence
+	  that follows JavaScript links, because such tools don't know how
+	  to initiate JavaScript events. Realistically, it would take a
+	  full JavaScript parser in order to be able to figure out all the
+	  possible URLs that the code could generate, something that's way
+	  beyond the means of any search engine. You have a few options:</p>
+	  <ul>
+	  <li>Add "backup" links using plain HTML &lt;a href=...&gt; tags to
+	  all the pages that could be accessed through JavaScript,
+	  <li>Add &lt;link&gt; tags to point to all these pages (see
+	  <a href="http://www.w3.org/TR/html4/struct/links.html#h-12.3.3">Links
+	  and search engines</a> in W3C's HTML 4.0 Specification - requires
+	  htdig 3.1.3 or greater, but then <em>everyone</em> should be running
+	  3.1.6 or greater anyway),
+	  <li>Compose a list of all the unreachable documents, or write
+	  a program to do so, and feed that list as part of htdig's
+	  <a href="attrs.html#start_url">start_url</a> attribute.
+	  See also question <a href="#q5.25">5.25</a>.
+	  </ul>
+
+	  <strong>5.19. <a name="q5.19">When I run htsearch from the web server,
+	  it returns a bunch of binary data.</a></strong><br>
+	  <p>Your server is returning the contents of the htsearch binary.
+	  Common causes of this are:</p>
+	  <ul>
+	  <li>no execute permission on the htsearch binary,
+	  <li>the binary won't run on this system (it may be compiled
+	  for the wrong system type), or
+	  <li>the web server doesn't recognize the file as a CGI
+	  (for Apache, you must have a ScriptAlias directive for the
+	  program or the directory in which it's installed, or define
+	  a cgi-script handler for some suffix, e.g. .cgi, and add that
+	  suffix to the program file name).
+	  </ul>
+	  <p>By default, Apache is usually configured with one cgi-bin
+	  directory as ScriptAlias, so all your CGI programs must go in
+	  there, or have a .cgi suffix on them. Your configuration may
+	  differ, however.</p>
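+	  <p>As a hedged illustration, either of the following fragments
+	  in httpd.conf would cover the last case. The paths are
+	  placeholders, and your server's existing configuration may
+	  already contain something similar:</p>
+<pre>
+# Treat everything in this directory as a CGI program
+ScriptAlias /cgi-bin/ "/path/to/your/cgi-bin/"
+
+# Or: treat files ending in .cgi as CGI programs in this directory
+&lt;Directory "/path/to/your/document/root/search"&gt;
+    Options +ExecCGI
+    AddHandler cgi-script .cgi
+&lt;/Directory&gt;
+</pre>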
+
+	  <strong>5.20. <a name="q5.20">Why are the betas of 3.2 so
+	  slow at indexing?</a></strong><br>
+	  <p>
+	  As the release notes for these versions suggest, they are
+	  somewhat unoptimized and are made available for testing.
+	  Since the 3.2 code indexes all locations of words to support
+	  phrase searching and other advanced methods, this additional
+	  data slows down the indexer. To compensate, the code has a
+	  cache configured by the
+	  <a href="dev/htdig-3.2/attrs.html#wordlist_cache_size">wordlist_cache_size</a>
+	  attribute.
+	  As of this writing, the word database code will slow down
+	  considerably when the cache fills up. Setting the cache as
+	  large as possible provides considerable performance
+	  improvement. Development is in progress to improve cache
+	  performance.
+	  For 3.2.0b6 and higher, see also the
+	  <a href="dev/htdig-3.2/attrs.html#store_phrases">store_phrases</a> attribute,
+	  which can turn off support for phrase searches, improving the speed.
+	  </p>
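+	  <p>For example, a 3.2 beta configuration might contain lines
+	  like these. The cache size shown is only an illustrative figure,
+	  not a recommendation; check the attribute documentation for its
+	  units and default, and size it to the memory you can spare:</p>
+<pre>
+wordlist_cache_size: 50000000
+# 3.2.0b6 and higher only, if you can do without phrase searches:
+store_phrases: false
+</pre>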
+
+	  <strong>5.21. <a name="q5.21">Why does htsearch use ";" instead of
+	  "&amp;" to separate URL parameters for the page buttons?</a></strong><br>
+	  <p>In versions 3.1.5 and 3.2.0b2, and later, htsearch was
+	  changed to use a semicolon character ";" as a parameter
+	  separator for page button URLs, rather than "&amp;", for HTML
+	  4.0 compliance. It now allows both the "&amp;" and the ";" as
+	  separators for input parameters, because the CGI specification
+	  still uses the "&amp;". This change may cause some PHP or CGI
+	  wrapper scripts to stop working, but these scripts should be
+	  similarly changed to recognize both separator characters.
+	  For the definitive reference on this issue, please refer to
+	  section B.2.2 of W3C's HTML 4.0 Specification,
+	  <a href="http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.2">
+	  Ampersands in URI attribute values</a>. We're all a little
+	  tired of arguing about it. If you don't like the standard, you
+	  can change the Display::createURL() code yourself to ignore it.
+	   <br>See also question <a href="#q4.13">4.13</a>.</p>
+
+	  <p>If you want to try working within the new standard, you may
+	  find it helpful to know that recent versions of CGI.pm will
+	  allow either the ampersand or semicolon as a parameter separator,
+	  which should fix any Perl scripts that use this library.
+	  In PHP, you can simply set the following in your php.ini file
+	  to allow either separator:</p>
+<pre>arg_separator.input = ";&amp;"
+</pre>
+
+	  <strong>5.22. <a name="q5.22">Why does htsearch show the
+	  "&amp;" character as "&amp;amp;" in search results?</a></strong><br>
+	  <p>In version 3.1.5, htsearch was fixed to properly
+	  re-encode the characters &amp;, &lt;, &gt;, and &quot;
+	  into SGML entities.  However, the default value for the
+	  <a href="attrs.html#translate_amp">translate_amp</a>,
+	  <a href="attrs.html#translate_lt_gt">translate_lt_gt</a>
+	  and <a href="attrs.html#translate_quot">translate_quot</a>
+	  attributes is still false, so these entities don't get converted
+	  by htdig. If you set these three attributes to true in your
+	  htdig.conf and reindex, the problem will go away.</p>
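+	  <p>In a 3.1.x htdig.conf, that would be:</p>
+<pre>
+translate_amp: true
+translate_lt_gt: true
+translate_quot: true
+</pre>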
+
+	  <p>In the 3.2 betas there was a bug in the HTML parser that
+	  caused it to fail when attempting to translate the "&amp;amp;"
+	  entity. This has been fixed in 3.2.0b3. The translate_* attributes
+	  are gone as of 3.2.0b2.</p>
+
+	  <strong>5.23. <a name="q5.23">I get Internal Server or Unrecognized
+	  character errors when running htsearch.</a></strong><br>
+	  <p>An increasingly common problem is Apache configurations
+	  which expect all CGI scripts to be Perl, rather than binary
+	  executables or other scripts, so they use "perl-handler"
+	  rather than "cgi-handler". The fix is to create a separate
+	  directory for non-Perl CGI scripts, and define it as such in
+	  your httpd.conf file. You should define it the same way as your
+	  existing cgi-bin directory, but use "cgi-handler" instead of
+	  "perl-handler". In any case, you should check your web server's
+	  error log for any information related to htsearch's failure.
+	  <br>See also questions <a href="#q5.7">5.7</a>,
+	  <a href="#q5.14">5.14</a> and <a href="#q5.13">5.13</a>.</p>
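+	  <p>A sketch of such a setup in httpd.conf, with placeholder
+	  paths, might look like the following. Note that Apache's own name
+	  for its standard CGI handler is "cgi-script"; the exact directives
+	  in your existing cgi-bin section may differ:</p>
+<pre>
+ScriptAlias /cgi-bin2/ "/path/to/your/cgi-bin2/"
+&lt;Directory "/path/to/your/cgi-bin2"&gt;
+    SetHandler cgi-script
+    Options +ExecCGI
+&lt;/Directory&gt;
+</pre>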
+
+	  <strong>5.24. <a name="q5.24">I took some settings out of
+	  my htdig.conf but they're still set.</a></strong><br>
+	  <p>All configuration file attributes have compiled-in, default
+	  values. Taking an attribute out of the file is not the same
+	  thing as setting it to an empty string, a 0, or a value of
+	  false. See question <a href="#q4.18">4.18</a>.</p>
+
+	  <strong>5.25. <a name="q5.25">When I run htdig on my site,
+	  it misses entire directories.</a></strong><br>
+	  <p>First of all, htdig doesn't look at directories itself. It
+	  is a spider, and it follows hypertext links in HTML documents.
+	  If htdig seems to be missing some documents or entire directory
+	  sub-trees of your site, it is most likely because there are
+	  no HTML links to these documents or directories. (See also
+	  question <a href="#q5.18">5.18</a>.) If htdig does
+	  not come across at least one hypertext link to a document
+	  or directory, and it's not explicitly listed in the
+	  <a href="attrs.html#start_url">start_url</a> attribute, then
+	  this document or directory is essentially hidden from view
+	  to htdig, or to any web browser or spider for that matter.
+	  You can only get htdig to index directories, without providing
+	  your own files with links to the contents of these directories,
+	  by using your web server's automatic index generation feature.
+	  In Apache, this is done with the mod_autoindex module, which
+	  is usually compiled-in by default, and is enabled with the
+	  "Indexes" option for a given directory hierarchy. For example,
+	  you can put these directives in your Apache configuration:</p>
+<pre>
+&lt;Directory "/path/to/your/document/root"&gt;
+    Options Indexes FollowSymLinks Includes ExecCGI
+&lt;/Directory&gt;
+</pre>
+	  <p>This will cause Apache to automatically generate an index
+	  for any directory that does not have an index.html or other
+	  "DirectoryIndex" file in it. Other web servers will have
+	  similar features, which you should look for in your server
+	  documentation.</p>
+
+	  <p>As an alternative to relying on the web server's autoindex
+	  feature, you can compose a list of all the unreachable
+	  documents, or write a program to do so, and feed that list as
+	  part of htdig's <a href="attrs.html#start_url">start_url</a>
+	  attribute. Here is an example of a simple shell script to make
+	  a file of URLs you can use with a configuration entry like
+	  <code>start_url: `/path/to/your/file`</code>:</p>
+<pre>
+find /path/to/your/document/root -type f -name \*.html -print | \
+    sed -e 's|/path/to/your/document/root/|http://www.yourdomain.com/|' > \
+        /path/to/your/file
+</pre>
+	  <p>Other reasons why htdig might be missing portions of your
+	  site might be that they fall out of the bounds specified
+	  by the <a href="attrs.html#limit_urls_to">limit_urls_to</a>
+	  attribute (which takes on the value of start_url by default),
+	  they are explicitly excluded using the
+	  <a href="attrs.html#exclude_urls">exclude_urls</a> attribute,
+	  or they are disallowed by a robots.txt file (see the
+	  <a href="htdig.html">htdig</a> documentation for notes about
+	  robot exclusion) or by a robots meta tag (see question
+	  <a href="#q4.15">4.15</a>). If htdig seems to be missing the
+	  last part of a large directory or document, see question
+	  <a href="#q5.1">5.1</a>. For reasons why htdig may be rejecting
+	  some links to parts of your site, see question
+	  <a href="#q5.27">5.27</a>.</p>
+
+	  <strong>5.26. <a name="q5.26">What do all the numbers and symbols
+	  in the htdig -v output mean?</a></strong><br>
+	  <p>Output from htdig -v typically looks like this:</p>
+<pre>
+23000:35506:2:http://xxx.yyy.zz/index.html: ***-+****--++***+ size = 4056
+</pre>
+	  <p>The first number is the number of documents parsed so far,
+	  the second is the DocID for this document, and the third is
+	  the hop count of the document (number of hops from one of the
+	  start_url documents). After the URL, it shows a "*" for a link
+	  in the document that it already visited (or at least queued
+	  for retrieval), a "+" for a new link it just queued, and a
+	  "-" for a link it rejected for any of a number of reasons.
+	  To find out what those reasons are, you need to run htdig
+	  with at least 3 "v" options, i.e. -vvv. If there are no "*",
+	  "+" or "-" symbols after the URL, it doesn't mean the document
+	  was not parsed or was empty, but only that no links to other
+	  documents were found within it.</p>
+
+	  <strong>5.27. <a name="q5.27">Why is htdig rejecting some of the
+	  links in my documents?</a></strong><br>
+	  <p>When htdig parses documents and finds hypertext links to
+	  other documents (hrefs), it may reject them for any of several
+	  reasons. To find out what those reasons are, you need to run
+	  htdig with at least 3 "v" options, i.e. -vvv. Here are the
+	  meanings of some of the messages you might see at this verbosity
+	  level.</p>
+	  <dl>
+	   <dt>Not an http or relative link!</dt>
+	   <dd>In versions 3.1.5 and earlier, only "http://" URLs, or
+		URLs relative to those, are allowed.</dd>
+	   <dt>Item in the exclude list: item # <em>n</em></dt>
+	   <dd>A substring of the URL matches one of the items in the
+		<a href="attrs.html#exclude_urls">exclude_urls</a>
+		attribute. The given item number will indicate which
+		pattern matched, starting at 1. The 3.2.0 betas do not
+		give the item number.</dd>
+	   <dt>Extension is invalid!</dt>
+	   <dd>The file name extension or suffix matches one of those
+		listed in the
+		<a href="attrs.html#bad_extensions">bad_extensions</a>
+		attribute.</dd>
+	   <dt>Extension is not valid!</dt>
+	   <dd>The file name extension or suffix does not match one of those
+		listed in the
+		<a href="attrs.html#valid_extensions">valid_extensions</a>
+		attribute, if any are specified.</dd>
+	   <dt>Invalid Querystring! <em>or</em><br>item in bad query list</dt>
+	   <dd>The URL contains a query string which matches one of those
+		listed in the
+		<a href="attrs.html#bad_querystr">bad_querystr</a>
+		attribute.</dd>
+	   <dt>URL not in the limits!</dt>
+	   <dd>No substring of the URL entirely matches one of the items in the
+		<a href="attrs.html#limit_urls_to">limit_urls_to</a>
+		attribute. The purpose of this attribute is to keep htdig
+		from attempting to index the entire World Wide Web.</dd>
+	   <dt>forbidden by server robots.txt!</dt>
+	   <dd>A substring of the URL matches one of the items disallowed
+		in the server's robots.txt file. See
+		<a href="http://www.robotstxt.org/wc/norobots.html">
+		A Standard for Robot Exclusion</a>. This message exists
+		only in the 3.2.0 betas. In 3.1.5 and earlier, this condition
+		is only caught later, resulting in the message
+		"robots.txt: discarding '<em>URL</em>'" from htdig, and a
+		later "Deleted: no excerpt" message from htmerge.</dd>
+	   <dt>url rejected: (level 2)</dt>
+	   <dd>No substring of the URL entirely matches one of the items in the
+		<a href="attrs.html#limit_normalized">limit_normalized</a>
+		attribute. All the other rejections above will be indicated
+		as level 1. The 3.2.0 betas give the much more meaningful
+		message 'not in "limit_normalized" list!'</dd>
+	  </dl>
+
+	  <p>Another possibility, if none of the error messages above appear
+	  for some of the links you think htdig should be accepting, is that
+	  htdig isn't even finding the links at all. First, make sure you're
+	  not making false assumptions about how htdig finds these. It only
+	  reads links in HTML code, and not JavaScript, and it doesn't read
+	  directories unless the HTTP server is feeding it directory listings.
+	  You will need to take a close look at the htdig -vvv (or -vvvv)
+	  output to see what htdig is finding, in and around the areas where
+	  the desired links are supposed to be found in your HTML code, to see
+	  if it's actually finding them.
+	  See also question <a href="#q5.25">5.25</a>.</p>
+
+	  <strong>5.28. <a name="q5.28">When I run htdig or htmerge, I get a
+	  "DB2 problem...: missing or empty key value specified" message.</a></strong><br>
+	  <p>The most common cause of this error is that htdig or
+	  htmerge rejected all of the documents that had been put in the
+	  database, leaving an empty database. You need to find out the
+	  reasons for the rejection of these documents. See questions
+	  <a href="#q4.1">4.1</a>, <a href="#q5.25">5.25</a> and
+	  <a href="#q5.27">5.27</a>.</p>
+
+	  <strong>5.29. <a name="q5.29">When I run htdig on my site,
+	  it seems to go on and on without ending.</a></strong><br>
+	  <p>There are some things that can cause htdig to run on without
+	  ending, especially when indexing dynamic content (ASP, PHP,
+	  SSI or CGI pages). This usually involves htdig getting caught
+	  in an <em>infinite virtual hierarchy</em>. A sure sign of
+	  this is if the current size of your database is much larger
+	  than the total size of the site you are indexing, or if in the
+	  verbose output of htdig (see question <a href="#q4.1">4.1</a>)
+	  you see the same URLs come up again and again with only subtle
+	  variations. In any case, you must figure out the reason htdig
+	  keeps revisiting the same documents using different URLs, as
+	  explained in question <a href="#q4.24">4.24</a>, and set your
+	  <a href="attrs.html#exclude_urls">exclude_urls</a> and
+	  <a href="attrs.html#bad_querystr">bad_querystr</a> attributes
+	  appropriately to stop htdig from going down those paths.
+	  </p>
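+	  <p>For example, if the repeats came from a calendar script and a
+	  session ID parameter, the fix might look something like this. The
+	  patterns are purely hypothetical; use whatever your own htdig
+	  output points to:</p>
+<pre>
+exclude_urls: /cgi-bin/ .cgi /calendar/
+bad_querystr: PHPSESSID=
+</pre>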
+
+	  <strong>5.30. <a name="q5.30">Why does htsearch no longer recognize
+	  the -c option when run from the web server?</a></strong><br>
+	  <p>This was a security hole in 3.1.5 and older, and 3.2.0b3 and
+	  older releases of ht://Dig. (See question <a href="#q2.1">2.1</a>.)
+	  There's a compile-time macro you can set in htsearch.cc to disable
+	  this security fix, but that's a bad idea because it reopens the hole.
+	  This should only be done as a last resort, when all other avenues
+	  fail. The -c option was only intended for testing htsearch from the
+	  command line, and not for use when calling htsearch on the web server.
+	  Unfortunately, far too many users have needlessly latched onto this
+	  option for CGI scripts. The preferred ways of specifying the config
+	  file are as follows, in order of preference:</p>
+	  <ol>
+	  <li>use the "config" input parameter in your
+	  <a href="hts_form.html">search form</a>
+	  (see question <a href="#q4.2">4.2</a>).
+	  <li>if you need to get at files outside the default CONFIG_DIR, use a
+	  wrapper script that redefines the CONFIG_DIR environment variable,
+	  then use the config input parameter as above
+	  (see question <a href="#q4.20">4.20</a>).
+	  <li>use a wrapper script to force htsearch to use a specific config
+	  file using the -c option. This is especially for cases where you
+	  want to prevent the user from selecting other config files in your
+	  CONFIG_DIR using the config input parameter. This should
+	  be done by using the GET method to call the wrapper script, and in
+	  this script you must unset the REQUEST_METHOD environment variable
+	  and pass "$QUERY_STRING" as a single argument to htsearch.
+	  (This safely gets around htsearch's test which disables -c.)
+	  <li>configure and compile different htsearch binaries with different
+	  compile-time definitions of CONFIG_DIR, so you can avoid wrapper
+	  scripts altogether.
+	  <li>define ALLOW_INSECURE_CGI_CONFIG in htsearch.cc and recompile
+	  htsearch if all other approaches above fail for you.
+	  </ol>
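+	  <p>The third approach above can be sketched as a small shell
+	  wrapper, installed as a CGI in place of htsearch and called with
+	  the GET method. All paths and the config file name are
+	  placeholders:</p>
+<pre>
+#!/bin/sh
+# Force one specific config file; adjust the paths to your installation.
+unset REQUEST_METHOD
+exec /path/to/htsearch -c /path/to/conf/mysite.conf "$QUERY_STRING"
+</pre>
+	  <p>For the second approach, a wrapper along these lines would
+	  redefine CONFIG_DIR and leave the choice of file to the config
+	  input parameter:</p>
+<pre>
+#!/bin/sh
+# /path/to/other/conf is a hypothetical directory of config files
+CONFIG_DIR=/path/to/other/conf
+export CONFIG_DIR
+exec /path/to/htsearch
+</pre>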
+
+	  <strong>5.31. <a name="q5.31">I've set a config attribute exactly
+	  as documented but it seems to have no effect.</a></strong><br>
+	  <p>There are a few fairly common reasons why this might happen:</p>
+	  <ol>
+	  <li>You may have a typo. Spelling matters, so make sure the attribute
+	  name is spelled exactly as it is in the
+	  <a href="attrs.html">documentation</a>. Misspelled attribute
+	  definitions are silently ignored. This is because you're allowed
+	  to make up your own attribute definitions for use by other attribute
+	  definitions, as <strong>${myownattribute}</strong>. Also remember
+	  to put the colon ("<strong>:</strong>") separator between the
+	  attribute name and value in your definition.
+	  <li>The attribute isn't supported in your version of the software.
+	  The <a href="attrs.html">documented configuration attributes</a>
+	  on the www.htdig.org web site are for the most recent
+	  <strong>stable</strong> release. See questions
+	  <a href="#q2.1">2.1</a> and <a href="#q2.7">2.7</a> for details.
+	  If you're running an older version, or even a more recent beta
+	  release, you may not have the same set of attributes to work with.
+	  Consult the appropriate documentation, or upgrade to the current
+	  release.
+	  <li>You're not modifying the right configuration file. The default
+	  configuration file is specified when you first configure ht://Dig
+	  before compiling, but other configuration files can be specified
+	  at run time, using the -c command-line option for most programs,
+	  or the <strong>config</strong> input parameter for htsearch
+	  (see question <a href="#q4.2">4.2</a>).
+	  <li>You've got more than one definition of the attribute. Only the
+	  last occurrence of an attribute in the configuration file is the
+	  definition that's used for that attribute, overriding earlier
+	  definitions. This also applies for nested configuration files that
+	  are loaded in via the <a href="attrs.html#include">include</a>
+	  directive, so check for other definitions in all included files.
+	  Similarly for htsearch, look out for multiple definitions of input
+	  parameters in your search forms, as mentioned in question
+	  <a href="#q4.2">4.2</a> - these don't override each other but they
+	  get combined with a Ctrl-A as separator, which may not be what you
+	  want either.
+	  <li>Your attribute definition is being "swallowed up" by an
+	  incomplete multi-line definition above it. Remember that when a line
+	  of an attribute definition ends with a single backslash
+	  ("<strong>\</strong>") before the end of the line (without any
+	  space after the backslash), then the following line is appended to
+	  it as a continuation of the same attribute definition. For an
+	  attribute definition that spans several lines, all lines but the
+	  last must end with a backslash. If you want a backslash to go into
+	  the attribute definition literally, it must be doubled-up, as
+	  <strong>\\</strong>.
+	  <li>On a similar note, make sure your attribute definitions are all
+	  terminated by a newline character. Beware of text editors that do
+	  word wrapping. It may look like two separate lines on the screen,
+	  when in fact you've got two attribute definitions on the same long
+	  line, so the second is swallowed up as part of the first.
+	  <li>Your attribute definition is being overridden by an htsearch
+	  <a href="hts_form.html">CGI input parameter</a>. For example,
+	  <a href="attrs.html#template_name">template_name</a> is ignored
+	  if the <strong>format</strong> input parameter is defined. The
+	  <a href="attrs.html#allow_in_form">allow_in_form</a> attribute
+	  can define any number of new CGI input parameters that override
+	  the attributes of the same name in your config file.
+	  <li>Your attribute definition is being ignored or overridden
+	  by a related attribute.  Watch out for unexpected interactions
+	  between different attributes.  For instance, characters in
+	  <a href="attrs.html#valid_punctuation">valid_punctuation</a>
+	  are stripped out of words, so those characters may
+	  not have the effect you want if you've added them to
+	  <a href="attrs.html#extra_word_characters">extra_word_characters</a>
+	  or
+	  <a href="attrs.html#prefix_match_character">prefix_match_character</a>.
+	  Also,
+	  <a href="attrs.html#search_results_wrapper">search_results_wrapper</a>
+	  will override
+	  <a href="attrs.html#search_results_header">search_results_header</a>
+	  and
+	  <a href="attrs.html#search_results_footer">search_results_footer</a>,
+	  but only if you've set up the wrapper file correctly.
+	  <li>Watch out for possible "latent effects" of some attributes. For
+	  example, when you change attributes used by htdig, they won't have
+	  an immediate effect on entries already in the database, so you would
+	  have to reindex your site before they take effect. Similarly,
+	  attributes that affect how htfuzzy builds some of its databases
+	  don't take effect until those databases are rebuilt. Another, more
+	  subtle latent effect occurs with releases 3.1.6 and 3.2 betas:
+	  when you interrupt htdig (i.e. with Control-C or a kill command),
+	  it stores the list of currently queued URLs in db.log, in your
+	  database directory, so that the next time you invoke htdig it can
+	  resume the interrupted dig. A side-effect of this file is that if
+	  you change some attributes like limit_urls_to or exclude_urls before
+	  restarting, the URLs in the file are still taken as-is, having been
+	  checked against the old settings of limit_urls_to or exclude_urls
+	  before being queued. This might explain one reason htdig seems to
+	  ignore your new settings of these.
+	  </ol>
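+	  <p>Several of the checks above (duplicate definitions, a continuation
+	  backslash followed by a stray space, and a definition swallowed by
+	  the line above it) can be automated. The following Python script is
+	  only a rough sketch and is not part of ht://Dig. The function name
+	  <code>lint_config</code> is made up for illustration. It assumes that
+	  attributes are written as "name: value" with white space after the
+	  colon, and that an odd number of backslashes at the end of a line
+	  means a continuation. It does not follow
+	  <a href="attrs.html#include">include</a> directives.</p>

```python
import re
from collections import defaultdict

# Assumption: an attribute definition starts with "name:" followed by white
# space or end of line. This keeps URLs such as "http://..." on continuation
# lines from being mistaken for attribute names.
ATTR_RE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*:(?:\s|$)")


def trailing_backslashes(text):
    """Count the backslashes at the very end of text."""
    return len(text) - len(text.rstrip("\\"))


def lint_config(text):
    """Return a sorted list of (line_number, message) warnings."""
    warnings = []
    seen = defaultdict(list)   # attribute name: lines where it is defined
    continued = False          # True if the previous line ended in a lone "\"
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.rstrip()
        match = ATTR_RE.match(line)
        if continued:
            if match:
                warnings.append((number, "'%s' is swallowed by the continued "
                                 "definition above it" % match.group(1)))
        elif match:
            seen[match.group(1)].append(number)
        # An odd number of backslashes followed by white space looks like a
        # continuation but is not one.
        if stripped != line and trailing_backslashes(stripped) % 2 == 1:
            warnings.append((number, "backslash is followed by white space, "
                             "so the line is not continued"))
        # Assumption: only an odd number of backslashes at the very end
        # continues the line; a doubled backslash is a literal one.
        continued = trailing_backslashes(line) % 2 == 1
    for name, lines in seen.items():
        # Several "include" lines are normal, so they are not reported.
        if name == "include" or len(lines) == 1:
            continue
        earlier = ", ".join(str(n) for n in lines[:-1])
        warnings.append((lines[-1], "'%s' is also defined on line(s) %s; only "
                         "the last definition is used" % (name, earlier)))
    return sorted(warnings)
```

+	  <p>Pass it the text of your configuration file, for instance
+	  <code>lint_config(open("htdig.conf").read())</code>, and review each
+	  (line number, message) pair it returns. Treat the results as hints
+	  only, as the real parser in ht://Dig may differ in its details.</p>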
+
+	  <strong>5.32. <a name="q5.32">When I run htsearch, it gives a page
+	  with an "Unable to read configuration file" message.</a></strong><br>
+	  <p>The most common causes of this error are:</p>
+	  <ul>
+	  <li>Your configuration file name is misspelled in the "config"
+	  input parameter of your search form, or you have two definitions
+	  of this parameter (see question <a href="#q4.2">4.2</a>).
+	  <li>You didn't install your configuration file in the directory
+	  defined by the CONFIG_DIR compile-time Makefile variable
+	  (see also question <a href="#q4.20">4.20</a>). This is where
+	  htsearch will look for the configuration file specified by the
+	  "config" input parameter.
+	  <li>The configuration file is not readable by the user ID under
+	  which your web server, and thus htsearch, runs. Similarly,
+	  if the directories from CONFIG_DIR up to the root directory
+	  are not executable by this same user ID, htsearch won't be
+	  able to access the configuration files.
+	  </ul>
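+	  <p>The permission checks in the last item can be scripted. The
+	  Python sketch below is illustrative only, and
+	  <code>check_access</code> is a made-up helper, not part of ht://Dig.
+	  It relies on Python's <code>os.access</code>, which tests permissions
+	  for the user running the script, so the result is only meaningful
+	  when you run it as the user ID under which your web server runs.</p>

```python
import os


def check_access(config_path):
    """Return a list of problems that would stop the current user from
    reading config_path; an empty list means no problem was found."""
    problems = []
    path = os.path.abspath(config_path)
    if not os.path.isfile(path):
        problems.append("%s: not a regular file (or not reachable)" % path)
    elif not os.access(path, os.R_OK):
        problems.append("%s: not readable" % path)
    # Walk from the directory holding the file up to the root directory.
    directory = os.path.dirname(path)
    while True:
        if not os.access(directory, os.X_OK):
            problems.append("%s: not searchable (no execute permission)"
                            % directory)
        parent = os.path.dirname(directory)
        if parent == directory:   # reached the root directory
            break
        directory = parent
    return problems
```

+	  <p>For example, call <code>check_access</code> with the full path of
+	  the configuration file in your own CONFIG_DIR. An empty list means
+	  the file is readable and every directory above it is searchable.</p>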
+
+	  <strong>5.33. <a name="q5.33">How can I find out which version
+	  of ht://Dig I have installed?</a></strong><br>
+	  <p>You should always check which version of ht://Dig you're
+	  running, before you report any problems, or even if you
+	  suspect a problem. You can find out the version number of an
+	  installed ht://Dig package by running the command:</p>
+	  <blockquote>
+		<code>htdig -\? | head</code>
+	  </blockquote>
+	  <p>(or use "more" if you don't have a "head" command). The
+	  full version number appears on the third line of output,
+	  after "This program is part of ht://Dig", and it should also
+	  include the snapshot date if you're running a pre-release
+	  snapshot. Always include this full version number with any
+	  bug report or problem report on a mailing list. You can save
+	  yourself and others a lot of grief by being certain of which
+	  version you're running, especially if you've installed more than
+	  one. If you're running ht://Dig from an RPM package, you should
+	  also report the package version and release number, which you
+	  can determine with the command "<code>rpm -q htdig</code>",
+	  and mention where you obtained the package. This will alert
+	  us to the idiosyncrasies and/or patches in a particular RPM
+	  package. Also, if you've applied any patches yourself (see
+	  question <a href="#q2.5">2.5</a>) please mention which ones.
+	  See also question <a href="#q1.8">1.8</a>, on reporting bugs
+	  or configuration problems.</p>
+
+	  <strong>5.34. <a name="q5.34">When running htdig, I get "Error (0):
+	  PDF file is damaged - attempting to reconstruct xref table..."</a></strong><br>
+	  <p>This message comes from the pdftotext utility, when a PDF file
+	  has been truncated. Find the largest PDF file on the site you're
+	  indexing, and set max_doc_size to at least that size (see question
+	  <a href="#q5.2">5.2</a>). If you need to track down which PDF is
+	  causing the error, try running "htdig -i -v &gt; log.txt 2&gt;&amp;1" so you
+	  can see which URL is being indexed when the error occurs. The output
+	  redirects in that command combine stdout (where htdig's output goes)
+	  and stderr (where pdftotext's error messages go) into one output
+	  stream. If you're using acroread to index PDF files, the error
+	  message for a truncated PDF file is simply "Could not repair file."
+	  It's also possible to get errors like this from PDF files that are
+	  smaller than max_doc_size, if they're already truncated or corrupted
+	  on the server.</p>
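+	  <p>If the documents you're indexing are also available on a local
+	  filesystem (an assumption that won't hold for remote sites), a short
+	  script can find the largest PDF file for you. This Python sketch uses
+	  a made-up helper name, <code>largest_pdf</code>, and only looks at
+	  files whose names end in ".pdf".</p>

```python
import os


def largest_pdf(root):
    """Return (size_in_bytes, path) for the largest .pdf file under root,
    or None if no PDF file is found."""
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".pdf"):
                continue
            path = os.path.join(dirpath, name)
            try:
                candidates.append((os.path.getsize(path), path))
            except OSError:
                continue   # unreadable or vanished file: skip it
    return max(candidates) if candidates else None
```

+	  <p>Set max_doc_size to at least the size it reports.</p>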
+
+	  <strong>5.35. <a name="q5.35">When running htdig on Mandrake Linux,
+	  I get "host not found" and "no server running" errors.</a></strong><br>
+	  <p>The default htdig.conf configuration in Mandrake's RPM package
+	  of htdig very stupidly enables the
+	  <a href="attrs.html#local_urls_only">local_urls_only</a> attribute
+	  by default, which means you can only index a limited set of files
+	  on the local server. Anything else, where htdig would normally fall
+	  back to using HTTP, will fail. To make matters worse, they put a very
+	  misleading comment above that attribute setting, which throws users
+	  off track. This attribute is useful in certain circumstances where
+	  you never want htdig to fall back to HTTP, but enabling it by default
+	  was a very bad judgement call on Mandrake's part.</p>
+
+	  <strong>5.36. <a name="q5.36">When I run htsearch, it gives me the
+	  list of matching documents, but no header or footer.</a></strong><br>
+	  <p>The header and footer typically contain the followup search
+	  form, an indication of the total number of matches, and buttons
+	  to other pages of matches if the results don't fit on one
+	  page. If these don't show up, it could be that in attempting
+	  to customize these (see question <a href="#q4.2">4.2</a>),
+	  you removed them or rendered them unusable. Even if you didn't
+	  customize them, make sure you installed the
+	  <a href="attrs.html#search_results_header">search_results_header</a>
+	  and
+	  <a href="attrs.html#search_results_footer">search_results_footer</a>
+	  files (or the
+	  <a href="attrs.html#search_results_wrapper">search_results_wrapper</a>
+	  file) in the correct location (where you told ht://Dig they'd be
+	  when you configured prior to compiling). Also make sure they
+	  have read permission for the user ID under which htsearch runs,
+	  and all directories leading up to these template files are
+	  searchable (i.e. executable) by htsearch, or it won't be able
+	  to open the files.</p>
+
+	  <p>This is the opposite of the problem described in question
+	  <a href="#q5.11">5.11</a>. If htsearch displays nothing at
+	  all, you may have both problems or you may have no matches or
+	  a boolean query syntax error and the
+	  <a href="attrs.html#nothing_found_file">nothing_found_file</a>
+	  or <a href="attrs.html#syntax_error_file">syntax_error_file</a>
+	  is missing or unreadable.</p>
+
+	  <strong>5.37. <a name="q5.37">When I index files with doc2html.pl,
+	  it fails with the "UNABLE to convert" error.</a></strong><br>
+	  <p>This is an indication that doc2html.pl wasn't configured
+	  properly. Carefully follow all the directions for installation
+	  in the DETAILS file that comes with the script. In addition to
+	  installing doc2html.pl, you must:</p>
+	  <ul>
+	  <li>Install xpdf and check that pdftotext and pdfinfo work from
+	   the command line,
+	  <li>Configure pdf2html.pl to use pdftotext and pdfinfo and check
+	   that it works from the command line,
+	  <li>Configure doc2html.pl to use pdf2html.pl and check that it
+	   works from the command line:
+<pre>doc2html.pl /full/path/to/sample/filename.pdf "application/pdf" url</pre>
+	  </ul>
+	  <p>You should repeat a similar set of steps to configure and test
+	  doc2html.pl for other document types, such as Word, RTF and Excel.
+	  See also questions <a href="#q4.8">4.8</a>,
+	  <a href="#q4.9">4.9</a> and <a href="#q5.39">5.39</a>.</p>
+	  
+	  <strong>5.38. <a name="q5.38">Why do my searches find search terms
+	  in pathnames, or how do I prevent matching filenames?</a></strong><br>
+	  <p>htdig doesn't normally add the URL components to the index
+	  itself, but when you index a directory where the filenames are
+	  used as link description text (such as an automatic DirectoryIndex
+	  created by Apache's mod_autoindex) then these link descriptions
+	  get indexed, carrying the weight assigned to them by the
+	  <a href="attrs.html#description_factor">description_factor</a>
+	  attribute. Thus, a search for a filename will match this link
+	  description, and the file will show up in search results.
+	  To avoid that, make sure your DirectoryIndexes don't get indexed,
+	  as detailed in question <a href="#q4.23">4.23</a>.</p>
+
+	  <p>Conversely, there is no way to force htdig to index URL
+	  components so that a search for a file name will yield a match
+	  on that file, unless you index an HTML file (or several) containing
+	  links to all the files you want, where the link description text
+	  does contain the full URL or the pathname components you want.</p>
+	  
+	  <strong>5.39. <a name="q5.39">I set up an external parser but I still
+	  can't index Word/Excel/PowerPoint/PDF documents.</a></strong><br>
+	  <p>You probably need to carefully re-read and follow questions
+	  <a href="#q4.8">4.8</a>, <a href="#q4.9">4.9</a>,
+	  <a href="#q5.25">5.25</a> and <a href="#q5.27">5.27</a>.
+	  When you can't index documents with an external parser or converter,
+	  there are three main stages, or points of failure, that you need
+	  to check. You need to figure out at which of the three stages the
+	  process is failing, and focus on that stage to get to the bottom of
+	  why it's not working. You need to run htdig with
+	  anywhere from 1 to 4 -v options, to get the debugging output you
+	  need to see where it's failing and why. This may be an iterative
+	  process, if htdig is failing at more than one stage: you might fix
+	  one problem only to run into another.</p>
+
+	  <ol>
+	  <li>Is htdig actually finding links to the PDF, Word, etc. documents
+	    you want to index? Make sure you're not making false assumptions
+	    about how htdig finds these (questions <a href="#q5.25">5.25</a>
+	    and <a href="#q5.18">5.18</a>), and then find out how htdig is
+	    looking at the links in your HTML files to see if it's ignoring
+	    or rejecting links to your externally parsed documents (questions
+	    <a href="#q4.1">4.1</a> and <a href="#q5.27">5.27</a>).<br><br>
+	  <li>If it is finding and accepting the links to these documents, is
+	    it correctly fetching them and passing them on to the appropriate
+	    external converter to be able to index them? Look at htdig -vvv
+	    output, around the time it tries to fetch one of these, and see
+	    what it does next. Does the file size look right? Are there any
+	    error messages around there? If the external converter isn't even
+	    being called, take a close look at your
+	    <a href="attrs.html#external_parsers">external_parsers</a>
+	    attribute setting to make sure it's correct (see question
+	    <a href="#q5.31">5.31</a>).<br><br>
+	  <li>If it is attempting to convert them, is the external converter
+	    doing what it should, to feed some indexable text back into htdig's
+	    parser? You can also try htdig -vvvv (4 -v options) to see if it's
+	    actually parsing individual words from any of these. If this is
+	    too much output to wade through, try setting
+	    <a href="attrs.html#start_url">start_url</a> to the URL
+	    of a single document that you want to test, so you can look in
+	    detail at what htdig does with it. You can also try running the
+	    external converter manually on one of these documents to see
+	    what it spits out. See question <a href="#q5.37">5.37</a>.
+	    Make sure your documents actually contain indexable text. Some
+	    PDFs are nothing but scanned images of pages, so it looks like
+	    text but it's just images with no computer-readable text.
+	  </ol>
+
+	  <br>
+
+	  <hr noshade size=4>
+	  	Last modified: $Date: 2004/05/28 13:15:16 $
+<br> 
+    <a href="http://sourceforge.net/"> 
+          <img src="http://sourceforge.net/sflogo.php?group_id=4593&amp;type=1" width="88" height="31" border="0" alt="SourceForge Logo"></a>
+  </body>
+</html>