Chapter 16 Using Search
The iPlanet Web Server search function allows you to search the contents and attributes of documents on the server. As the server administrator, you can create a customized text search interface tailored to your user community.
Note: The Search function is not available on Linux platforms.
About Search
Configuring Text Search
Indexing Your Documents
Performing a Search: The Basics
Using the Query Operators
Customizing the Search Interface
define URL mappings for the document directories to be indexed
establish access control for files and directories
Controlling Search Access
Mapping URLs
Deciding Which Words Not to Search
Turning Search On or Off
Configuring the Search Parameters
Configuring Your Pattern Files
Configuring Manually
NameTrans fn="redirect" from="/foobar" url-prefix="index.html" escape="no"
/: the primary document directory (sometimes called the document root), which initially maps to server_root/docs
/help: the directory for most of the help files
/search-ui: the directory for most of the search interface files
/webpub-ui: the directory for most of the Web Publisher interface files
/publisher: the directory for most of the Web Publisher files
From the Server Manager, choose Content Management.
Click the Additional Document Directories link.
The web server displays the Additional Document Directories window.
Type in a nickname that maps the URL to the additional document directory you want to define.
For example, type in the word "plans".
Type the absolute physical path of the directory you want the URL mapping to map to.
For example,
C:/Netscape/server4/docs/marketing/bizplans
If you want to apply a style to the directory, select the style in the Apply Style drop-down list.
For more information about styles, see Working With Configuration Styles.
Click OK to create the additional document directory.
Deciding Which Words Not to Search
You can specify words the search engine should not index or search against. These words are sometimes referred to as stop words or drop words and typically include articles, conjunctions, and prepositions such as at, and, be, for, and the.
........................................+ at and be [0-9a-zA-Z] [0-9][0-9][0-9][0-9]+
From the Server Manager, choose Search.
Click the Search Configuration link.
The web server displays the Search Configuration window.
Type the default maximum number of search result items displayed to users at a time.
This number cannot be larger than the value for the largest possible result set size, as defined in Step 4. The default is 20.
Type the maximum number of items in a result set.
The default is 5000. For example, if you type 250 as the value, and there were 1000 documents that match the search criteria, users would only be able to see the first 250 or the 250 top-ranked documents (for searches that rank their results).
Type the format of the date/time string in POSIX format.
This is how the search results are displayed to users in the search results page. For example, the format %b-%d-%y %H:%M produces Oct-1-97 14:24. You can use the symbols listed in Table 16.1.
Type a default title for the document that is to be used if the document's author has not included a title as part of the document, tagged with the HTML Title tag.
The typical HTML default is (Untitled), which appears in the search results page for HTML files.
If you want the user's access permission to be checked on a collection before displaying the search results, click Yes under the label Check access permissions on collection root before doing a search?
If you click Yes, the server checks the user's access privileges for each collection before displaying the documents found as a result of the search. Only the documents in a collection that the user has permission to view are displayed.
Click OK to set your new search configuration.
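To see how a date/time format string of this kind behaves, the following minimal sketch applies the format from the example above using Python's standard strftime. Python is used here only for illustration; the server does its own formatting, and the function name format_timestamp is hypothetical. Note that standard strftime zero-pads %d, so the day is rendered as 01 in this sketch.

```python
from datetime import datetime

# Format string taken from the example in the date/time step above.
DATE_FORMAT = "%b-%d-%y %H:%M"


def format_timestamp(moment: datetime, fmt: str = DATE_FORMAT) -> str:
    """Render a timestamp using a POSIX-style strftime format string."""
    return moment.strftime(fmt)


# October 1, 1997 at 14:24. The month abbreviation depends on the locale,
# and the day is zero-padded by standard strftime.
print(format_timestamp(datetime(1997, 10, 1, 14, 24)))
```

Trying a format string this way before typing it into the Search Configuration window can help catch a mistyped symbol.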
Click the Search Pattern Files link.
The web server displays the Search Pattern Files window.
Type the absolute path for the directory where you store your pattern files.
The default start (header), end (footer), and query page pattern files are located in this directory.
Type in the relative pathname for the default pattern file you want to use for the top of the search results page when a collection has no defined header file or when more than one collection is being searched.
Specify the path relative to the pattern file directory, as defined in Step 3.
Type in the relative pathname for the default pattern file you want to use for the footer of the search results page when a collection has no defined footer file or when more than one collection is being searched.
Type in the relative pathname for the pattern file you want to use for the search query page that appears when you start up the search function.
Click OK to configure your search pattern files.
This section includes the following topics:
The Configuration Files
Adjusting the Maximum Number of Attributes
Restricting Memory for Indexing
Restricting Your Index File Size
Removing Access to the Web Publishing Collection
webpub.conf: This system configuration file contains system settings and file paths. In your server's obj.conf file, the search system initialization is mapped to the webpub.conf file. When you use the Search Configuration and Search Pattern Files windows, the data you input is reflected in the webpub.conf file. You can customize your server's search configuration by changing some of the settings in the webpub.conf file, but in general, you can make the changes you need through the iPlanet Web Server's windows.
userdefs.ini: This user definitions file defines the user-defined pattern variables. In the webpub.conf file, this is mapped to the userdefs.ini file for your language (English, German, Japanese, and so on).
You can customize a search interface by creating and defining your own pattern variables in the userdefs.ini file that can be used throughout your pattern files. For more information, see User-defined Pattern Variables.
dblist.ini: This collection contents file describes collection-specific information. When you create and maintain collections, the dblist.ini file is updated for you with information about your collections.
Text (a maximum of 30, including all META-tagged attributes)
Numeric (a maximum of 5)
Date (a maximum of 5)
NS-max-text-attr = 50
NS-max-numeric-attr = 10
NS-max-date-attr = 10
Restricting Memory for Indexing
You can set a limit on the amount of RAM available for indexing operations. To do this, you need to manually edit the [NS-loader] section of the webpub.conf file to add a line defining a maximum memory amount. For example:
NS-max-memory = 32000000
The server is installed on a machine that has less than the suggested minimum RAM requirement.
The server runs on Windows NT and requires a great deal of indexing, but you want to set aside some memory for other server operations.
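After a manual edit, it can help to confirm that the memory limit is readable where you expect it. The following is a minimal sketch that assumes webpub.conf can be read as INI-style text; the sample fragment and the function name read_max_memory are hypothetical, and only the [NS-loader] section name and the NS-max-memory key come from the description above.

```python
import configparser

# Hypothetical fragment for illustration. Only the [NS-loader] section name
# and the NS-max-memory key are taken from the text above.
SAMPLE = """
[NS-loader]
NS-max-memory = 32000000
"""


def read_max_memory(text: str) -> int:
    """Return the NS-max-memory value found in INI-style configuration text."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(text)
    return parser.getint("NS-loader", "NS-max-memory")


print(read_max_memory(SAMPLE))
```

If the real file contains constructs that an INI parser does not accept, this approach would need adjusting; treat it as a quick check, not as a description of how the server reads the file.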
NS-max-idx-file-size = 1500000
In the "[web_htm]" section, change "NS-display-select=YES" to "NS-display-select=NO".
Restart the server.
About Collections
About Collection Attributes
Creating a New Collection
Configuring a Collection
Updating a Collection
Maintaining a Collection
Scheduling Regular Maintenance
Unscheduling Collection Maintenance
<META NAME="Writer" CONTENT="R. Hunter">
<META NAME="Song" CONTENT="Stella Blue">
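To illustrate what extracting META-tagged attributes amounts to, the following sketch collects NAME/CONTENT pairs from META tags such as the two above, using Python's standard html.parser module. This is not the server's indexer; the class and function names are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser


class MetaAttributeParser(HTMLParser):
    """Collect NAME/CONTENT pairs from META tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.attributes: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # The parser reports tag and attribute names in lowercase.
        if tag != "meta":
            return
        fields = dict(attrs)
        name, content = fields.get("name"), fields.get("content")
        if name and content is not None:
            self.attributes[name] = content


def extract_meta(html_text: str) -> dict[str, str]:
    """Return a mapping of META attribute names to their values."""
    parser = MetaAttributeParser()
    parser.feed(html_text)
    parser.close()
    return parser.attributes


page = '''<META NAME="Writer" CONTENT="R. Hunter">
<META NAME="Song" CONTENT="Stella Blue">'''
print(extract_meta(page))
```

Each name found this way corresponds to an attribute that users could search against once it has been indexed.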
Creating a New Collection
You can create a collection that indexes the content of all or some of the files in a directory. You can define collections that contain only one kind of file or you can create a collection of documents in various formats that are automatically converted to HTML during indexing. When you define a multiple format collection (with the auto-convert option), the indexer first converts the documents into HTML and then indexes the contents of the HTML documents. The converted HTML documents are put into the html_doc directory in the server's search collections folder.
To create a new collection, perform the following steps:
From the Server Manager, choose Search.
Click the New Collection link.
The web server displays the Create a Collection window. The Directory to Index field displays the currently defined document directory and provides a drop-down list of all the additional document directories defined for the server. For more information about additional document directories, see Mapping URLs.
You can select any of the items in the drop-down list as a starting point for finding the directory you want to index.
If you want to index a different subdirectory, click the View button to see a list of resources.
You can index any directory that is listed or you can view the subdirectories in a listed directory and index one of those instead. Once you click the index link for a directory, you return to the Create Collection window and the directory name appears in the Directory to Index field.
You can index all HTML files in the chosen directory by leaving the default *.html pattern in the Documents matching field or you can define your own wildcard expression to restrict indexing to documents that match that pattern.
For example, you could enter *.html to only index the content in documents with the .html extension, or you could use either of these patterns (complete with parentheses) to index all HTML documents:
(*.htm|*.html) or *(.htm|.html)
You can define multiple wildcards in an expression. For details of the syntax for wildcard patterns, see Using Wildcards.
You cannot index a file that includes a semi-colon (;) in its name. You must rename such files before you can index them.
To index the subdirectories within the specified directory, click Include Subdirectories.
Type a name for your collection in the Collection Name field.
The collection name is used for collection maintenance. This is the physical file name for the collection, so follow the standard directory-naming conventions for your operating system. You can use any characters up to a maximum of 128 characters. Spaces are converted to underscores.
Do not use accented characters in the collection name. If you need accented characters, exclude the accents from the collection name, but use accented characters in the label. The label is what is displayed to the user from the search interface.
Type a user-defined name for your collection in the optional Collection Label field.
This name is what users see when they use the text search interface. Make your collection's label as descriptive and relevant as possible. You can use any characters except single or double quotation marks, up to a maximum of 128 characters.
Type a description for your collection (up to a maximum of 1024 characters) in the optional Description field.
This description is displayed in the collection contents page.
Select the type of files the collection is to contain: ASCII, HTML, news, email, or PDF.
The kind of file format you choose indicates which default attributes are used in the collection and which, if any, automatic HTML conversion of the content is done as part of indexing. For information about the attributes for each format, see Table 16.2 and About Collection Attributes.
If you choose HTML as the file type and also try to index non-HTML files, the server creates the collection with the HTML set of default attributes and does not attempt to convert any non-HTML file it indexes. If you index HTML files into an ASCII collection, even the HTML markup tags are indexed as part of the file's contents and when you display the files, the contents are displayed as raw text. Regardless of the file type chosen, the content of the file is always indexed.
Complex PDF files, such as those that are password protected or that contain graphical navigation elements, cannot be correctly converted when they are indexed as part of a multi-format collection. The file data converts correctly when they are part of a PDF-only collection. Graphic elements are not converted.
Select whether or not to extract META-tagged attributes from HTML files during indexing.
If you extract these attributes, you can search on their values. You can index on a maximum of thirty (30) different user-defined META tags in a document. You can only use this option for HTML collections.
Select the collection's language from the drop-down list.
The default is English, labeled "English (ISO-8859-1)." For more information on character sets, see Managing Server Content.
Click OK to create a new collection.
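Before you index a directory, you may want to preview which file names the Documents matching pattern would pick up. The sketch below approximates the pattern *(.htm|.html) with a regular expression and also rejects names containing a semicolon, as described in the steps above. The regular expression is an assumption made for illustration, not the server's own wildcard matcher, and the function name is hypothetical.

```python
import re

# Regex approximation of the wildcard pattern *(.htm|.html). This is an
# assumption for illustration, not the server's pattern matcher.
HTML_NAME = re.compile(r".*\.(htm|html)")


def is_indexable_html(filename: str) -> bool:
    """True if the name ends in .htm or .html and contains no semicolon."""
    if ";" in filename:
        # File names that include a semicolon cannot be indexed.
        return False
    return HTML_NAME.fullmatch(filename) is not None


for name in ["index.html", "notes.htm", "draft;1.html", "readme.txt"]:
    print(name, is_indexable_html(name))
```

For the actual wildcard syntax, including multiple wildcards in one expression, refer to Using Wildcards.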
Configuring a Collection
After you have initially created a collection, you can modify some of the initial settings for the collection. This data resides in the collection information file, dblist.ini, and when you reconfigure a collection, the dblist.ini file is updated to reflect your changes. For more information about the configuration files, see Configuring Manually. You can revise the description, change its label, define a different URL for its documents, and define how to indicate highlighting in displayed documents, which pattern files to use, and how to format dates.
Click the Configure Collection link.
The web server displays the Configure Collection window.
In the optional Description field, you can type a description for your collection up to a maximum of 1024 characters.
In the optional Collection Label field, you can type a user-defined name for your collection.
This is what users see when they use the text search interface. Make your collection's label as descriptive and relevant as possible. You can use any characters except single or double quotation marks, up to a maximum of 128 characters.
In the URL for Documents field, you can type in the new URL mapping for the collection's documents if that has changed.
That is, if you originally indexed the directory of files that corresponded to those defined by the URL mapping /publisher/help, and you have changed that mapping to the simpler /helpFiles, you would replace the URL of /publisher/help with the /helpFiles in this field. For more information about additional document directories, see Mapping URLs.
In the Highlight Begin and Highlight End fields, you can type in the HTML tagging you want the server to use when highlighting a search query word or phrase in a displayed document.
The default is to use bold, with the <b> and </b> tags, but you can add to this or change it. For example, you could add <blink><FONT COLOR="#FF0000"> and the corresponding </FONT></blink> to highlight with blinking bold red text.
You can define different default pattern files for displaying the search results: how the search result's header, footer, and list entry line are formatted, respectively.
Initially, the pattern files are in the server_root\plugins\search\ui\text directory.
In the Result Pattern File field, you can enter the name of the pattern file you want to use when displaying a single highlighted document from the list of search results.
In the Date Format field, you can specify how you want input dates to be interpreted when using this collection: MM/DD/YY, DD/MM/YY, or YY/MM/DD.
Click OK to change the collection configuration.
To update a collection, perform the following steps:
Click the Update Collection link.
The web server displays the Update Collection window.
Select the collection you want to update from the drop-down list.
The list of documents in the center of the form shows you what documents have index entries in the currently selected collection. The list holds 100 records, and the Prev and Next buttons get the previous (or next) set of 100 files for collections that have more than 100 files in them.
In the Documents Matching field, you can type in a single filename or you can use wildcards to specify the type of files you want added to or removed from the collection.
If you enter a wildcard such as *.html, only files with this extension are affected. You can indicate files within a subdirectory by typing in the pathname as it appears in the list of files. For example, you could delete all the HTML files in the /frenchDocs directory by typing in (no slash before the directory name): frenchDocs/*.html
Note: Be careful how you construct wildcard expressions. For example, if you type in index.html, you can add or remove the index file from the current collection. If instead you type in the expression */index.html, you can add or remove all index.html files in the collection.
Select whether to index and add all matching documents from the subdirectories of the document directory that was originally defined for the collection.
That is, if the collection originally indexed the /publisher directory, this option looks for documents matching the new pattern within all the subdirectories within /publisher. This does not apply for removing documents.
Click AddDocs to add the indicated files and subdirectories.
Click RemoveDocs to remove the indicated files.
Optimize collections: You can optimize a collection to improve performance if you frequently add, delete, or update documents or directories in your collections. An analogy is defragmenting your hard drive. Optimizing is not done automatically, so you must manually optimize after you reindex or update a collection. One situation when you might want to optimize a collection is just before publishing it to another site or before putting it onto a read-only CD-ROM.
Reindex: You can reindex a collection, which locates each file that already has an entry in the collection and reindexes its attributes and contents, extracting the META-tagged attributes if that option was selected when the files were originally indexed into the collection. This does not return to the original criteria for creating the collection, say *.html, and add any new documents that fit the original criteria. This option also removes collection entries when the source documents have been deleted and can no longer be found.
Remove: You can remove a collection. This only removes the collection, not the original source documents.
Click the Schedule Collection Maintenance link.
The web server displays the Schedule Collection Maintenance window.
Choose a collection from the drop-down list.
This lists all the collections that you have created.
Choose an action from the drop-down list: Reindex, Optimize, or Update.
You can set up different schedules for different operations on the same collection.
If you choose to update your collection, two extra fields are displayed for entering the document matching criteria and for including documents found in subdirectories that match your criteria.
In the Schedule Time field, type in the time of day when you want the scheduled maintenance to take place.
Use 24-hour (military) format, HH:MM. HH must be less than 24 and MM must be less than 60. You must enter a time.
In the section labeled Schedule Day(s) of the Week, check one or more of the day checkboxes.
You can select all days. You must select at least one day.
Click OK to schedule the maintenance.
From the Administration Server, choose Global Settings.
Click the Cron Control link.
If ns-cron is already on, click Restart to restart it. If ns-cron is not on, click Start to start it up.
In either case, your regularly scheduled maintenance will now be able to take place.
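The rules for the Schedule Time field (HH:MM, with HH less than 24 and MM less than 60) can be checked with a few lines of code. The sketch below is illustrative only: the function name is hypothetical, and requiring exactly two digits for the hour is an assumption, since the text does not say whether a single-digit hour is accepted.

```python
import re

# Assumption: exactly two digits for hours and two for minutes (HH:MM).
TIME_PATTERN = re.compile(r"(\d{2}):(\d{2})")


def is_valid_schedule_time(text: str) -> bool:
    """Check that text is HH:MM with HH less than 24 and MM less than 60."""
    match = TIME_PATTERN.fullmatch(text)
    if match is None:
        return False
    hours, minutes = int(match.group(1)), int(match.group(2))
    return hours < 24 and minutes < 60


for candidate in ["02:30", "23:59", "24:00", "12:60"]:
    print(candidate, is_valid_schedule_time(candidate))
```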
Click the Remove Scheduled Collection Maintenance link.
The web server displays the Remove Scheduled Collection Maintenance window.
Choose a collection from the drop-down list for Choose Collection.
This lists all your collections for which you have set up regular maintenance.
Choose an action from the drop-down list: Reindex or Optimize.
In the lower part of the frame, you can see the time and days of the week when the scheduled maintenance is currently scheduled to take place.
Click OK to remove the scheduled maintenance.
From the Administration Server, choose Global Settings.
In either case, your regularly scheduled maintenance will no longer take place.
making a query: you enter your search criteria.
displaying search results: the server displays a list of the documents that match your criteria.
viewing a document: you can view a specific highlighted document from the search results list.
viewing the contents of a collection: you can look at the information that is maintained for each of your collections.
Search Home Page
A Search Query
Guided Search
Advanced Search
The Search Results
Displaying Collection Contents
Type the following URL in the location field in your web browser:
http://serverid:port/search
In the search query page that appears, choose the collection you want to search through from the drop-down list in the Search In field.
Enter the word or phrase for your search query in the For field. You can create complex queries by combining operators. For details about the search operators, see Using the Query Operators.
Click the Search button to execute your query.
Click the Guided Search link on the home page.
Go to the standard search query page by typing the following URL in the location field in your web browser:
Click Guided Search on the standard search page and the guided Java-based query page is displayed.
Choose the collection you want to search through from the drop-down list in the Search In field.
Use the For drop-down list to select the type of element you wish to search for. In this example, choose Words.
In the blank text field, type in the word you want to search for. For details about the search operator, see "Using the Query Operators".
Click Add Line to add the first part of the query. The word appears in the large text display box at the bottom of the form.
To add to your query, choose another element from the drop-down list. In this example, choose Attribute.
A new drop-down list appears on the right side of the form, listing all attributes that are available for the chosen collection. Choose the attribute you want to search against.
From the drop-down list above the text input field, choose a query operator (Contains, Starts, Ends, Matches, Has a substring) or logical operator (=, <, >, <=, >=) for your query.
In the blank text field, type in the attribute value you want to search for.
Click Add Line to add another line for your query. You can click Undo Line to remove the last line you added or Clear to remove the entire query.
Click the Search button to execute the search.
Click the Advanced HTML Search link on the home page.
Disable Java for your browser. To do this, use the Languages option of the Preferences menu command.
Click Guided Search on the standard search page and the web server displays the advanced HTML query page.
In the For field, type in the word or phrase you want to search for. You can create complex queries by combining operators. For details about the search operators, see Using the Query Operators.
You can type in one or more attributes to sort the results by. The default is an ascending sort order, but you can indicate a descending sort order with a minus. For more information about sorting, see Sorting the Results.
Depending on how many fields are listed for each document in the search results page or how many you want to see at a time, you can expand or limit the number of matching documents you want the search to return at a time. The Prev and Next buttons allow you access to additional pages of documents if there are too many to fit on a page at once.
Use the drop-down list in the Search In field to choose the collection you want to search through. You can select more than one collection by holding down the Ctrl key as you click on another collection. All collections in a query must be in the same language, and the web publishing collection cannot be used in a multi-collection search.
When a user clicks the icon displayed for a document in the search results to display the highlighted version.
When searching on a collection other than Web Publishing that has the option NS-collection-acl-check set to yes. (NS-collection-acl-check is set in the webpub.conf file and applies to all collections. When it is set, ACLs that are set on URIs matching the primary document directory defined for the collections (in dblist.ini) will be honored by not allowing search to be done on those collections.)
Whenever a user searches on the Web Publishing collection.
collection name, label, and description
collection format
number of attributes in the collection and a list of their names
number of documents in the collection
collection size and status
language and character set
input and output date formats
http://serverid:port/search?NS-search-page=c
Default Assumptions
Search Rules
Determining Which Operators To Use
Using Wildcards
<STEM>: Search finds all documents that contain any stemmed variant of the search word or phrase. The search engine looks at the meaning of the word, not just its spelling. For example, if you want to search on plan, the results would include documents that contain planning and plans, but not those that contain plane or planet.
<MANY>: Search considers how often the search word or phrase appears in the found documents and ranks the results for frequency (or relevancy).
<PHRASE>: Search considers words separated by spaces to be part of a phrase. For example, Monterey otter is interpreted as a phrase and both must be present and together to be found. Such a search would not find documents containing sea otter or Monterey Bay.
Note that in any case where it's not clear that two words are to be considered as a phrase, you can use parentheses for clarity. For example, <PHRASE> (rise "and" fall).
OR: Search considers each word or phrase in the query separated by a comma to be optional, although at least one must be present. In effect, this is an implicit OR operation. For example, Monterey, otter is interpreted as find documents that contain either Monterey or otter. Note that angle brackets are not required for OR.
Monterey AND Bay NOT <CONTAINS> Aquarium
Monterey Bay Aquarium AND otter AND NOT shark
<CONTAINS> ebb "and" flow
"plan"
Title NOT <CONTAINS> theme park
Finds documents that contain plan, plane, and planet as well as any word that begins with plan, such as planned, plans, and planetopolis. See the next section for more details and examples.
You must enclose the entire string in back quotes and you cannot have any embedded spaces.
<WILDCARD> Zine\\*\\*\\*
comma ,
<WILDCARD> `a{b`
<WILDCARD> `c``t`
Dynamically Generated Headers and Footers
HTML Pattern Files
Search Function Syntax
Using Pattern Variables
Service fn="add-headers" path="/export2/docs/header.html"
Service fn="add-footer" path="/export2/docs/footer.html"
uri="/cgi-bin/header.cgi"
NS-query.pat displays the standard and advanced query pages. Contains HTML calling the Web Search (the box called "Search the Web") as part of the search query page.
tocstart.pat displays the header across the top of the search results page.
tocrec.pat displays each document listed on the search results page.
tocend.pat displays the footer across the bottom of the search results page.
record.pat displays a single highlighted document from the search results page (for more information, see Displaying a Highlighted Document).
descriptions.pat displays the collection contents.
user defined, in the userdefs.ini file, with a $$ prefix (see User-defined Pattern Variables).
defined in the configuration files, webpub.conf and dblist.ini files, with a $$NS- prefix (for more information, see Configuration File Variables).
search macros and variables generated by a pattern file, with a $$NS- prefix (for more information, see Macros and Generated Pattern Variables).
<input type="hidden" name="NS-max-records" value="$$NS-max-records">
<td align=left colspan=2>$$logo</td>
<td align=right><h3>$$sitename</h3></td>
<td align=right><b>$$queryLabel</b></td>
<td align=left><input name="NS-query" size=40 value="$$NS-display-query"></td>
NS-max-records: Defined in the webpub.conf file. Because this field is hidden, users cannot change this value, which defines how many matching documents to return at a time. In the advanced HTML query pattern file, NS-advquery.pat, this is a user-modifiable input field.
$$NS-max-records: The search generates a variable from this field that can be used in subsequent searches to calculate how many result records to display at a time. Because this field is not modifiable here, the value is set to that in webpub.conf file. In the advanced query, this value could vary for each query.
$$logo: Defined in the userdefs.ini file. This could be any image or text the user wanted to display on the form.
$$sitename: Defined in the userdefs.ini file as the server's host name that is provided by the $$NS-host search macro.
$$queryLabel: Defined in the userdefs.ini file as a text label for the query input field. In this case, the label on the form is the word "For:"
NS-query: Defined in this pattern file as the name of the input field.
$$NS-display-query: Defined in the userdefs.ini file. The search generates a variable from this field that can be used in subsequent searches to determine which word or phrase to highlight when an entire matching document is displayed.
http://serverid/search?name=value[&name=value][&name=value]
<td colspan=6><font size=+2><b>$$collectionLabel</b>
<a href=$$NS-server-url/search?NS-collection=$$NS-collection>$$NS-collection-alias</a>
</font></td>
$$NS-server-url: A search macro that determines the user's server URL.
/search: The search command itself.
?: The query string indicator. Everything after the ? is information used by the search function.
NS-collection=$$NS-collection: This uses the search macro $$NS-collection to define the collection's filename.
variableName[conditionalized output]
$$Title[<P>Title: <B>$$Title</B>]
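One way to picture the conditional form is as a small template expander. The sketch below assumes that the bracketed output is emitted only when the named variable has a value, and that variable references inside the brackets are then replaced by their values. The expand function and its regular expressions are hypothetical illustrations, not the server's pattern-file processor.

```python
import re

# $$Name optionally followed by [conditional output].
TOKEN = re.compile(r"\$\$([A-Za-z][\w-]*)(?:\[([^\]]*)\])?")
# A plain $$Name reference, used inside the conditional output.
VARIABLE = re.compile(r"\$\$([A-Za-z][\w-]*)")


def expand(pattern: str, values: dict[str, str]) -> str:
    """Expand $$Name and $$Name[...] references using the given values."""

    def plain(match: re.Match) -> str:
        return values.get(match.group(1), "")

    def token(match: re.Match) -> str:
        name, body = match.group(1), match.group(2)
        if body is None:
            return values.get(name, "")  # ordinary variable reference
        if not values.get(name):
            return ""  # assumption: no value means no output at all
        return VARIABLE.sub(plain, body)

    return TOKEN.sub(token, pattern)


line = "$$Title[<P>Title: <B>$$Title</B>]"
print(expand(line, {"Title": "Plans"}))
print(repr(expand(line, {})))
```

With a Title value, the whole bracketed fragment appears in the output; without one, nothing is emitted, so the page does not show an empty "Title:" label.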
search query (the word, phrase, or attribute you want to search on)
collection (can specify more than once for multiple-collection searches)
NS-search-page=results (or r, in upper- or lowercase)
collection (can be specified more than once for multiple-collection searches)
search query
NS-search-page=document (or d, in upper- or lowercase)
document path
collection (can be specified only once)
search query (necessary if you want to highlight the query data)
NS-search-page=contents (or c, in upper- or lowercase)
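The name=value arguments listed above can be assembled into a search URL programmatically. The following sketch uses Python's standard urllib.parse.urlencode; the helper name build_search_url, the collection name Plans, and the query text are hypothetical, while NS-search-page, NS-collection, and NS-query are the names that appear in this chapter. The encoding of spaces as plus signs is standard form encoding and is assumed, not confirmed, to be what the server expects.

```python
from urllib.parse import urlencode


def build_search_url(server: str, params: list[tuple[str, str]]) -> str:
    """Build http://server/search?name=value&name=value from ordered pairs."""
    return f"http://{server}/search?{urlencode(params)}"


# Collection contents page, as in the URL shown earlier in this chapter.
print(build_search_url("serverid:port", [("NS-search-page", "c")]))

# Results page for a hypothetical collection and query. A list of pairs is
# used so that NS-collection could be repeated for multi-collection searches.
print(build_search_url("serverid:port", [
    ("NS-search-page", "results"),
    ("NS-collection", "Plans"),
    ("NS-query", "jet engines"),
]))
```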
variables defined in the userdefs.ini file, to which a $$ prefix is added in decorated URLs and pattern files. For example, uidir, logo, and title become $$uidir, $$logo, and $$title.
variables defined in the configuration files, webpub.conf and dblist.ini files, which have a NS- prefix where they are defined in the configuration file and which have a $$NS- prefix when they are used in decorated URLs and pattern files. For example, NS-max-records, NS-doc-root, and NS-date-time become $$NS-max-records, $$NS-doc-root, and $$NS-date-time.
search macros and variables generated by a pattern file, which always have a $$NS- prefix. For example, $$NS-host, $$NS-get-next, and $$NS-sort-by.
advSearchNote = To search, choose collections, then enter words and phrases, separated by commas<br>(e.g., search, jet engines, basketball).<p>Sorting is done on any defined attributes. Use '-' to specify descending order sort<br>(e.g., Title,-Author,+Date)
queryLabel = For:
queryLabelSJIS = $$queryLabel
queryLabelEUC = $$queryLabel
queryLabelJIS7 = $$queryLabel
collectionLabel = Search in:
booleanLabel = Boolean
sortByLabel = Sort by:
sortByLabelSJIS = $sortByLabel
sortByLabelEUC = $sortByLabel
sortByLabelJIS7 = $sortByLabel
freetextLabel = Freetext (unavailable)
maxDocumentsLabel = Documents to return:
maxDocumentsLabelSJIS = $$maxDocumentsLabel
maxDocumentsLabelEUC = $$maxDocumentsLabel
maxDocumentsLabelJIS7 = $$maxDocumentsLabel
copyright = Copyright © 1997 Netscape Communications Corporation. All Rights Reserved.
advancedButtonLabel = Advanced Button Label
helpButtonLabel = Help Button Label
The file also includes references to search macros, such as $$NS-server-url, and can also refer to other user-defined variables, as in the following lines:
uidir = $$NS-server-url/search-ui
icondir = $$uidir/icons
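The way one user-defined variable builds on another can be sketched as a recursive lookup. In the example below, the resolve function, the macro table, and the value given for NS-server-url are all hypothetical; only the uidir and icondir definitions come from the lines above. It is a rough model of the substitution, not the server's implementation.

```python
import re

REFERENCE = re.compile(r"\$\$([A-Za-z][\w-]*)")


def resolve(name: str, definitions: dict[str, str],
            macros: dict[str, str], _seen: tuple[str, ...] = ()) -> str:
    """Resolve a variable, expanding $$references to macros or other variables."""
    if name in macros:
        return macros[name]
    if name in _seen:
        raise ValueError(f"circular reference involving {name}")
    raw = definitions[name]  # raises KeyError for an undefined variable
    return REFERENCE.sub(
        lambda m: resolve(m.group(1), definitions, macros, _seen + (name,)),
        raw,
    )


definitions = {
    "uidir": "$$NS-server-url/search-ui",
    "icondir": "$$uidir/icons",
}
# Hypothetical value standing in for the $$NS-server-url search macro.
macros = {"NS-server-url": "http://serverid:port"}

print(resolve("icondir", definitions, macros))
```

Here icondir expands through uidir, which in turn expands through the search macro, so a change to uidir carries over to every variable defined in terms of it.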
NS-max-records = 20
NS-query-pat = /text/NS-query.pat
NS-ms-tocstart = /text/HTML-tocstart.pat
NS-ms-tocend = /text/HTML-tocend.pat
NS-default-html-title = (Untitled)
NS-HTML-descriptions-pat = /text/HTML-descriptions.pat
NS-date-time = %b-%d-%y %H:%M
NS-collection-alias = Web Publishing
NS-doc-root = C:/Netscape/server4/docs
NS-url-base = /
NS-display-select = YES