Showing posts with label ESP. Show all posts

Tuesday, May 19, 2009

FAST ESP : QRServer timed out

You may get a QRServer timeout error when the timeout values configured in the system are incorrect (typically too low).

Error:


The exception is: no.fast.ds.search.SearchEngineException: Timed out while waiting for query result.

Resolution:

To resolve this issue, increase the timeout values in sources.xml, fdispatch.addon and fsearch.addon.

1. Update FASTSEARCH/etc/fdispatch.addon with the following values:
maxdocsumwait = 80
maxsearchwait = 70
maxsocksilent = 120

2. Update FASTSEARCH/etc/fsearch.addon with the following value:
maxsocksilent = 120

3. Update FASTSEARCH/etc/qrserver/sources.xml with the following value:
timeout query="60" docsum="70" (the timeout tag)

4. Restart rtsearch/qrserver.

Handling Query Errors (Java Search API)

Query errors surface as an exception thrown by the search() method of the ISearchView interface. Instead of printing the full Java exception, you can catch the specific exception and read its error code and error message.

Example:

try {
    IQueryResult result = engine.search(query);
    ...
    ...
} catch (SearchEngineException e) {
    // Print the error code and message instead of the full stack trace
    System.err.println("Error " + e.getErrorCode() + ": " + e.getMessage());
}

A nonzero error code indicates a query-related error.

FAST ESP Adminserver error codes and messages


Severity:

ERROR

Log message

com.fastsearch.esp.admin.
engine.corba.JacORBNameserviceService
[Nameservice@null:0 Local@null:
/] could not activate ORB
(org.jacorb.orb.ORB@
) : org.omg.CORBA.
INITIALIZE: Could not create server socket
vmcid: 0x0 minor code: 0 completed: No

Cause(s) :

The TCP port that Adminserver's ORB tries to bind to is unavailable.
This is most likely due to another instance of Adminserver running or not properly shut down.

Action(s):

Check if Adminserver is already running. If it is, terminate it and restart Adminserver. Check if some other process has bound to either of the two TCP ports with 'netstat -an'. Make sure no other processes are running in the FAST ESP port range before starting FAST ESP.
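
As a complement to 'netstat -an', here is a minimal Java sketch that checks whether a TCP port can still be bound on the local host. The class name PortCheck and the default port number in main are purely illustrative assumptions; use the ports of your own FAST ESP port range.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical helper, not part of FAST ESP. */
final class PortCheck {

    /** Returns true if a server socket can be bound to the port on this host right now. */
    static boolean isFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;   // bind succeeded; the socket is closed again on exit
        } catch (IOException e) {
            return false;  // bind failed, most likely "Address already in use"
        }
    }

    public static void main(String[] args) {
        // Illustrative default only; pass the port you want to check as the first argument.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 16099;
        System.out.println("Port " + port + (isFree(port) ? " is free" : " is in use"));
    }
}
```

The check only reflects the moment it runs, so another process can still grab the port before Adminserver starts.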

FAST ESP Error Messages

The following result format is returned if an error occurs during execution of a query:

XML template:
<ERROR>Could not open channel to server.</ERROR>


Text template:

#SEG NAM _SEGNAME_
####
#ERC 1102
#ERT Could not open channel to server.
####


Note:

Error messages in the 10xx range originate from the Search Engine.
Error messages above that originate from the Query & Result Server.




Code Description :

1001 General error. Unexpected internal error message. Contact FAST Technical Support.
1002 Error parsing query. Check the query syntax. Note: When using the FQL query language, a query syntax error is normally reported by error message 1201.
1003 All partitions are down (Search Engine). Verify installation (System Management) and network interfaces.
1004 No such dataset. A query parameter `dataset' exists but is not supported in this version of the product. Contact FAST Technical Support if this error message appears.
1005 System is overloaded. This will happen when queries are refused due to QPS license limitations.
1006 The requested functionality is not implemented. Check the query syntax and query parameters.
1007 Query not allowed to run (due to internal resource problem). Normally a temporary resource problem - Resubmit query.
1008 Lost connection to one or more Search Engine sub-nodes. Verify installation (System Management) and network interfaces. The search client should normally resubmit the query, refer to Errors related to evaluation of complex queries.
1009 Multiple errors occurred from different search partitions. The search client should normally resubmit the query, refer to section Errors related to evaluation of complex queries.
1010 Query evaluation error (Internal Search Engine error conditions). Contact FAST Technical Support.
1011 Query timeout: One or more Search Engine nodes did not respond within the query timeout limit. The search client should normally resubmit the query, refer to Errors related to evaluation of complex queries.
1012 Not enough resources, query not possible to resolve. Analyze the query that caused the error message. Refer to Errors related to evaluation of complex queries.
1013 Not enough resources, temporary problem within the Search Engine. Search Front End may re-submit. Refer to Errors related to evaluation of complex queries if the error condition always or frequently occurs for specific types of queries.
1014 Not supported - for queries that are not supported. Check query syntax. Contact FAST Technical Support if you are not able to detect errors in the query syntax.
1015 License checkout problem
1016 Requested generation no longer available. A version of the index can no longer be reached. Contact FAST Technical Support.
1020 Document summary internal error. May be a temporary resource problem, Search Front End may try to resubmit query.
1021 Document summary internal error. May also be related to a connectivity problem or search nodes out of operation. Verify installation (System Management) and network interfaces.
1022 Document summary internal error. May also be related to a connectivity problem or search nodes out of operation. Verify installation (System Management) and network interfaces.
1101 No query state supplied, nothing to search for. Contact FAST Technical Support.
1102 Could not open channel to server (No connection to search dispatcher). Verify installation (System Management) and network interfaces.
1103 No query in the query state. Contact FAST Technical Support.
1104 Failed to send query packet. Verify installation (System Management) and network interfaces.
1105 Search timed out. The search client should normally resubmit the query, refer to Errors related to evaluation of complex queries.
1106 Unknown response for query. Contact FAST Technical Support, please include information from query log.
1107 Connection failed while searching (connection to search dispatcher failed in the query phase). Verify installation (System Management) and network interfaces.
1108 Failed to send docsum request packet. Verify installation (System Management) and network interfaces.
1109 Docsum fetching timed out (Timed out waiting for docsums from search engine). Refer to Errors related to evaluation of complex queries.
1110 Connection failed while fetching docsums. Verify installation (System Management) and network interfaces.
1111 Unknown response while fetching docsums. Contact FAST Technical Support, please include information from query log.
1112 Failed to store hit information. Contact FAST Technical Support.
1113 Failed to allocate memory for query. Contact FAST Technical Support.
1114 Partial Result. Not possible to retrieve results from all columns (partitions). This may be caused by a connectivity error or an error with a specific search partition. Verify installation (System Management), network interfaces and the status for the search partitions.
1201 FAST Query Language (FQL) query parsing error. Refer to the error text for error details.
1202 Result processor failure. Check query parameters.
1998 Requesting a result template that is not supported. May occur when using customized result template formats.
1999 Query & Result Server error. Contact FAST Technical Support.
2000 Failed to write data to client. (Will only be present in the query logs.) This error is caused by the client closing the connection prematurely.
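
A search client can use the table above to decide how to react to an error code. The following is a minimal Java sketch of that idea. The class EspErrorCodes and its methods are hypothetical (they are not part of the FAST API), and the set of "resubmit" codes is simply the codes whose descriptions above mention resubmitting the query.

```java
import java.util.Set;

/** Hypothetical helper; the class and method names are not part of any FAST ESP API. */
final class EspErrorCodes {

    // Codes whose descriptions in the table say the client may or should resubmit the query.
    private static final Set<Integer> RESUBMIT =
            Set.of(1007, 1008, 1009, 1011, 1013, 1020, 1105);

    /** Maps a code to the component it originates from, following the note above the table. */
    static String origin(int code) {
        if (code == 0) {
            return "no error";               // only nonzero codes indicate an error
        }
        if (code >= 1000 && code <= 1099) {
            return "Search Engine";          // 10xx range
        }
        if (code >= 1100) {
            return "Query & Result Server";  // everything above the 10xx range
        }
        return "unknown";
    }

    /** True if the table suggests that resubmitting the query is a reasonable reaction. */
    static boolean shouldResubmit(int code) {
        return RESUBMIT.contains(code);
    }

    public static void main(String[] args) {
        for (int code : new int[] {1002, 1011, 1105, 1201}) {
            System.out.println(code + " -> " + origin(code)
                    + ", resubmit=" + shouldResubmit(code));
        }
    }
}
```

In a catch block for SearchEngineException, the value of e.getErrorCode() could be passed to shouldResubmit() to decide between a retry and reporting the error to the user.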

Wednesday, May 13, 2009

Indexing Issue : excel macro files

PROBLEM
-----------

Indexing fails on an Excel file containing macros, with a "Password protected"
error message. The document itself is not password protected, but certain
parts of the Excel document/worksheet are protected and non-editable.

RESOLUTION
---------------

Password protected Excel files are not supported, as the Stellent converter will fail
to process them. In this event, it is normal to receive an error telling you there is
a problem with the document being password protected or encrypted. In addition,
you will also receive the error if certain parts of the Excel document/worksheet are
non-editable. The explanation from Stellent is that their software cannot
differentiate between the various kinds of protection that Microsoft has for Excel
sheets. That is, when a worksheet is being processed, all the converter sees is that
the worksheet has a password protection of some sort. How a worksheet is protected
and what parts of it are protected are not known, because this information is
unavailable to Stellent.

Have a site that recently changed IP, but the crawler is still using an old IP

When the IP is refreshed depends on the Time To Live (TTL) received from the DNS server. However, in crawler versions 6.4.16 and below this behavior was incorrect and the DNS cache was not updated. The issue is fixed in 6.4.17+, but you can also refresh the DNS cache manually as follows:

Stop the crawler. Then, on the ubermaster and the master nodes, remove the following file (consider taking a backup first):

$FASTSEARCH/data/crawler/config/dnscache.hashdb

Now start the crawler.

The DNS cache should now be reset.

Wednesday, November 19, 2008

RankLog in FAST ESP

The SBC (Search Business Center) provides a simple way of defining how document summaries are rendered, but only allows the
returned fields to be used (when the rank log is turned on).

To Enable RankLog:

1. Go to Search Profile Settings > Query Handling in SBC.
2. Add the static query parameter ranklog=true and save.
3. Publish the Search Profile by going to Publishing and clicking Publish
Search Profile.

Thursday, October 23, 2008

FAST ESP - Enable GEO Search

With Geo Search you can control the sorting based on geographical distance from a given start position/geographical location.

FAST ESP supports geographical coordinates associated with documents, and lets you sort and filter results based on a radius or a rectangular geographical area. When using the filter option you can still use regular sorting or ranking. Sorting based on distance cannot be combined with regular ranking.
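
To illustrate what sorting and filtering by distance means, here is a small Java sketch. It is not how FAST ESP implements Geo Search (that is not described here); it assumes the standard haversine great-circle formula with a mean Earth radius of 6371 km, and the class GeoSort and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of distance sorting and radius filtering. */
final class GeoSort {

    static final double EARTH_RADIUS_KM = 6371.0; // assumed mean Earth radius

    /** Great-circle distance in km between two lat/lon points given in degrees (haversine). */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Each hit is {lat, lon}. Returns a new list sorted by distance from the start position. */
    static List<double[]> sortByDistance(List<double[]> hits, double lat, double lon) {
        List<double[]> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble(h -> distanceKm(lat, lon, h[0], h[1])));
        return sorted;
    }

    /** Keeps only the hits that lie within radiusKm of the start position. */
    static List<double[]> withinRadius(List<double[]> hits, double lat, double lon, double radiusKm) {
        List<double[]> kept = new ArrayList<>();
        for (double[] h : hits) {
            if (distanceKm(lat, lon, h[0], h[1]) <= radiusKm) {
                kept.add(h);
            }
        }
        return kept;
    }
}
```

A radius filter only removes hits, so the remaining hits can still be ordered by regular rank. A distance sort replaces the ordering itself, which matches the restriction above that it cannot be combined with regular ranking.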

To search with Geo sort/filter :

1. GEO search must be enabled in the back-end.
2. GEO data must be fed into FAST ESP.


To Enable Geo in FAST SFE :

1. Open $FASTSEARCH/adminserver/webapps/sfe/WEB-INF/classes/com/fastsearch/espimpl/sfeapi/searchservice/SearchServiceImpl.properties
2. Add com.fastsearch.espimpl.sfeapi.searchservice.search.geo.LatLonGeoSearchImpl to custom_search_inputs=
3. Add com.fastsearch.espimpl.sfeapi.searchservice.result.geo.GeoGraphImpl to custom_result_aspects=
4. Restart ESP using the nctrl restart command.

Now you can see the Geo features in SFE under the Advanced Search tab.

Sunday, October 12, 2008

ESP : Error 1005

This error occurs when the QPS (queries per second) exceeds the license limit.

Error :

Error 1005 Query Term Refuse

Solution :

Check the QPS license limitations via the Admin GUI. If the query rate exceeds the limit, get a new license and restart the QRServer.

ESP : Error 28 No space left on device

This error is logged in the file below. It occurs when the configserver cannot save its config file.

$FASTSEARCH/var/log/configserver.scrap

This is a FATAL error. The main cause is that there is no space left on the partition to store the config file.

Error :

Error saving main configuration
file: IOError: [Errno 28] No space left on device

Solution :

Clear some space to allow the configserver to save configuration.

Note :
Stopping the configserver during these conditions may cause information to be lost.
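
To spot this condition before the configserver hits it, you can check the free space on the partition. Below is a minimal Java sketch using the standard java.io.File.getUsableSpace() call. The class name DiskSpaceCheck and the 100 MB threshold are illustrative assumptions, not FAST ESP requirements.

```java
import java.io.File;

/** Hypothetical helper, not part of FAST ESP. */
final class DiskSpaceCheck {

    /** True if the partition holding dir has at least requiredBytes usable by this process. */
    static boolean hasAtLeast(File dir, long requiredBytes) {
        return dir.getUsableSpace() >= requiredBytes;
    }

    public static void main(String[] args) {
        // Pass the directory to check, e.g. the FAST ESP installation directory.
        File dir = new File(args.length > 0 ? args[0] : ".");
        long minBytes = 100L * 1024 * 1024; // illustrative threshold of 100 MB
        System.out.println(dir.getAbsolutePath() + ": "
                + dir.getUsableSpace() / (1024 * 1024) + " MB usable, "
                + (hasAtLeast(dir, minBytes) ? "OK" : "LOW"));
    }
}
```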

ESP : Error Code 226

This error is logged in the file below. It occurs when there is a mistake in the config file.

$FASTSEARCH/var/log/configserver.scrap

This is a FATAL error. The main cause is that some other program/application is using the port that ESP uses.

Error :

Failed to start ConfigServer:
error: (226, 'Address already in use')

Solution :

Start the configserver on another port (edit the Port element in the config file), or shut down the program that is using the port you are trying to use.

ESP : FATAL Error 128

This error is logged in the file below. It occurs when there is a mistake in the config file.

$FASTSEARCH/var/log/configserver.scrap

This is a FATAL error. The main cause is that FAST ESP was not able to handle the character encoding while loading the config file.

Error :

Error loading config file: UnicodeError: ASCII encoding
error: ordinal not in range (128)

Solution :

Edit the configuration file and remove the non-ASCII characters (those with an ordinal of 128 or above).
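
Finding the offending characters by eye can be tedious. Here is a minimal Java sketch that lists every character outside the 7-bit ASCII range. The class NonAsciiFinder is a hypothetical helper; the file is read as ISO-8859-1 so that every byte maps to exactly one character, which means the reported column is a byte position within the line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of FAST ESP. */
final class NonAsciiFinder {

    /** Returns one "line:column U+XXXX" entry per character with an ordinal above 127. */
    static List<String> find(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            for (int j = 0; j < line.length(); j++) {
                char c = line.charAt(j);
                if (c > 127) {
                    // Line and column numbers are 1-based
                    hits.add((i + 1) + ":" + (j + 1) + " U+" + String.format("%04X", (int) c));
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path of the configuration file to scan
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        find(lines).forEach(System.out::println);
    }
}
```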

ESP : Indexing

The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query. Without an index, the search engine would scan every document in the corpus, which would require considerable time and computing power. For example, while an index of 10,000 documents can be queried within milliseconds, a sequential scan of every word in 10,000 large documents could take hours. The additional computer storage required to store the index, as well as the considerable increase in the time required for an update to take place, are traded off for the time saved during information retrieval.

Index Design Factors
Major factors in designing a search engine's architecture include:

Merge factors

How data enters the index, or how words or subject features are added to the index during text corpus traversal, and whether multiple indexers can work asynchronously. The indexer must first check whether it is updating old content or adding new content. Traversal typically correlates to the data collection policy. Search engine index merging is similar in concept to the SQL Merge command and other merge algorithms.

Storage techniques
How to store the index data, that is, whether information should be data compressed or filtered.

Index size
How much computer storage is required to support the index.

Lookup speed
How quickly a word can be found in the inverted index. The speed of finding an entry in a data structure, compared with how quickly it can be updated or removed, is a central focus of computer science.

Maintenance
How the index is maintained over time.

Fault tolerance
How important it is for the service to be reliable. Issues include dealing with index corruption, determining whether bad data can be treated in isolation, dealing with bad hardware, partitioning, and schemes such as hash-based or composite partitioning, as well as replication.

Index Data Structures
Search engine architectures vary in the way indexing is performed and in methods of index storage to meet the various design factors. Types of indices include:

Suffix tree

Figuratively structured like a tree, supports linear time lookup. Built by storing the suffixes of words. The suffix tree is a type of trie. Tries support extendable hashing, which is important for search engine indexing.[8] Used for searching for patterns in DNA sequences and clustering. A major drawback is that the storage of a word in the tree may require more storage than storing the word itself. An alternate representation is a suffix array, which is considered to require less virtual memory and supports data compression such as the BWT algorithm.

Trie
An ordered tree data structure that is used to store an associative array where the keys are strings. Regarded as faster than a hash table but less space-efficient.

Inverted index
Stores a list of occurrences of each atomic search criterion[10], typically in the form of a hash table or binary tree.

Citation index
Stores citations or hyperlinks between documents to support citation analysis, a subject of Bibliometrics.

Ngram index
Stores sequences of length n of data to support other types of retrieval or text mining.

Term document matrix
Used in latent semantic analysis, stores the occurrences of words in documents in a two-dimensional sparse matrix.
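
To make the inverted index entry above more concrete, here is a minimal Java sketch that maps each term to the sorted set of document IDs containing it. It is a generic illustration, not FAST ESP's index format; the class name and the simple lowercase/split tokenization are assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical toy index: term -> sorted set of document IDs. */
final class InvertedIndex {

    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    /** Tokenizes the text on non-word characters and records docId for every term. */
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Returns the IDs of all documents containing the term, or an empty set. */
    SortedSet<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT),
                Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.add(1, "FAST search engine");
        index.add(2, "Search index design");
        System.out.println("search -> " + index.lookup("search"));
    }
}
```

A lookup is a single hash table access regardless of how many documents were added, which is the speed advantage over scanning every document that the section describes.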

Friday, October 10, 2008

ESP - Partial Update

Partial updates can be sent through the Content API (feeder.updateDocument()) while the Indexer is doing incremental indexing.

During the document processing stage, ESP decides whether to perform a partial update or a full update (Add Document). For this to work, the Partial Update option must be enabled in the custom processor/pipeline:


1. $FASTHOME\etc\processors\ProcessorServer.xml

2. Add the Partial Update element in our custom processors.

3. Change the XMLMapper.xml to enable the Partial Update.



The same can also be done via the Web Analyzer tool.

Partial Update considerations :

The update methods provide a means for partial document updates and have certain limitations.
As a general rule, these methods should only be used to update metadata or numeric elements that do not require any document processing. Datetime elements can also be updated.
This means that if you need to update the actual content of an HTML page, PDF document or XML file, the add methods must be used.
It is possible to implement custom document processing that supports partial update.

Monday, September 15, 2008

FAST : Integrate the File Traverser

A user-friendly interface to the File Traverser can be integrated into the FAST ESP administrator interface by means of the connector controller module.

To integrate the connector controller, complete the following procedure on each node that you want to make available for file traversing via the administrator interface.

Note :

If it seems from the logs that the file traverser does not start, check the
connectorcontroller scrap file $FASTSEARCH/var/log/connectorcontroller/connectorcontroller.scrap and the
filetraverser scrap file in $FASTSEARCH/var/log/FileTraverser_.scrap

1. Add the following entries to the $FASTSEARCH/etc/NodeConf.xml file.
a) Add the following to the element:



b) Add the following to the list of processes:



2. Execute command: $FASTSEARCH/bin/nctrl reloadcfg
3. Execute command: $FASTSEARCH/bin/nctrl start connectorcontroller

The File Traverser should now appear as a Data Source in the administrator interface.

4. Test a normal usage scenario
a) Add collections with the file traverser as a data source.
b) Start and stop the data source.
c) Delete the collection.

FAST : Disable User Authentication

We can disable user authentication in the FAST ESP administrator interface by completing the steps in this procedure.

1. Open $FASTHOME/etc/guiConfig.php
2. Set the following parameter:
$ADMINGUI_PHP_AUTH_DISABLED=True;

Thursday, September 4, 2008

FAST ESP : Check the DocCount for Collection

Question

Is there a way to determine whether the index for a collection is completely
empty and deleted, i.e. after running adminclient -d AND deleting the collection in the GUI?
How can we know that everything is really gone?

Answer:

On large systems deleting all documents in a collection may take quite
some time. You should verify that all documents in the collection are gone
by issuing doccount-commands to all columns by using the rtsinfo tool.

Usage:

rtsinfo nameserver nameserverport clustername columnid rowid


For a system with three columns, one row and standard port range, run
these three commands on the admin node.

rtsinfo adminhost 16099 webcluster 0 0 doccount collectionname
rtsinfo adminhost 16099 webcluster 1 0 doccount collectionname
rtsinfo adminhost 16099 webcluster 2 0 doccount collectionname
(replace adminhost and collectionname with the entries valid for your system)
Typical output from each of these commands:

There are 1750 docs in the collection collectionname.
SUCCESS.

When "0 docs" is reported from all columns, the collection is clean.
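
If you collect the output of these commands in a script, the check can be automated. The following is a minimal Java sketch that parses output of the form shown above ("There are N docs in the collection ..."). The class DocCountParser is a hypothetical helper, and it assumes the output wording is exactly as in the example.

```java
import java.util.List;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for parsing "rtsinfo ... doccount" output as shown above. */
final class DocCountParser {

    private static final Pattern DOC_COUNT =
            Pattern.compile("There are (\\d+) docs in the collection");

    /** Extracts the document count from one command's output, if the expected line is present. */
    static OptionalLong parse(String output) {
        Matcher m = DOC_COUNT.matcher(output);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }

    /** True only if there is at least one output and every column reports exactly 0 docs. */
    static boolean allColumnsEmpty(List<String> outputs) {
        if (outputs.isEmpty()) {
            return false;
        }
        for (String output : outputs) {
            OptionalLong count = parse(output);
            if (!count.isPresent() || count.getAsLong() != 0) {
                return false; // unparsable output is treated as "not clean"
            }
        }
        return true;
    }
}
```

Treating unparsable output as "not clean" is a deliberate choice here: a failed or timed-out command should never be mistaken for an empty column.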

FAST ESP 4.3.x : Delete Indexed Documents

QUESTION:

I have several collections that I would like to re-crawl from scratch, but I don't want to have to reconfigure all the settings for each. In FDS 3.x, is there a way to delete all crawled data without losing the collection configurations?


ANSWER:

Here are the steps required for deleting all crawled data and the index from a 3.2 installation without removing the crawler configuration:

IMPORTANT - This will cause complete loss of all indexed documents,
therefore, search will be unavailable for some time until the crawler has begun re-populating the collections. We strongly recommend initiating this procedure during a system maintenance window.

1. Stop FDS from the Admin GUI or using the command 'net stop FASTDSService'

2. Ensure all FAST processes have had time to stop completely and manually kill any remaining processes with the Task Manager

3. Delete all files and directories within the %FASTSEARCH%\data\ directory, EXCEPT %FASTSEARCH%\data\crawler\run\domainspec (this file contains the crawler collection configurations)

4. Start FDS with the command 'net start FASTDSService'

5. Once all FDS processes are active in the System Management page, open up the collection configuration for each collection, verify that the settings are still correct and then click 'submit' on each to refresh the collection information.


NOTES:


- You may see temporary OSErrors for the PostProcessor trying to locate the collections directory (which will be in the process of being rebuilt).

- You may also see temporary errors from the QRServer, such as 'All partitions down', because the index is still being rebuilt.

- Some collections may start immediately crawling, while others may be idle for a short time before they start crawling.

FAST ESP : Term Descriptions

Question :

**********

Do you have a quick reference sheet for the terms associated with indexing and related concepts such as: Search Clusters, Search Columns, Search Rows

ANSWER
=======

This reference is found in the FAST Data Search 3.2 Configuration Guide.

A Data Search installation may consist of a number of Search Engines. A Search Engine provides indexing and search features towards a given partition of the total searchable content. The Search Engines are grouped in Search Clusters, Search Columns and Search Rows.

A Search Cluster is a group of Search Engines that share the same Index Profile (schema). This means that the collections assigned to this cluster may be mapped to the same index layout. One Search Cluster may for instance contain web pages and documents, while another Search Cluster may contain items from a content database.

The cluster may include multiple Search Rows (query rate scaling) and Search Columns (data volume scaling) that share the same index configuration. The matrix in the figure above indicates this.

Each Search Cluster will have a number of Collections assigned to it, which provide a logical grouping of content. Note that the collection concept represents a logical grouping of the content within the Search Cluster (one collection resides inside one Search Cluster, but may be spread across multiple Search Columns).

The Document Processing is performed prior to indexing. Within the document processing each document is represented by a set of Elements, which can be further processed and later mapped to searchable Fields via the Index Profile. Elements and Fields may represent content parts and attributes related to the document (body,
title, heading, URI, author, category).

The Index Profile defines the layout/schema of the searchable index, and defines how fields are to be treated by query and result processing. Each Search Cluster has an associated Index Profile.

The Index Profile also includes one or more Result Views that define alternative ways for a query front-end to view the index with respect to queries.

FAST ESP : Duplicate items when searching

Question :
**********

I'm getting a lot of identical hits for the same item. What have I
done wrong?

ANSWER:
*******

There are several possible causes for this, but the most common cause
is that the document ID is not present in the document summary.
Unless you've explicitly disabled incremental indexing, the first
entry in the first document summary class MUST be the document ID. If
not, incremental indexing will not work, and you will get lots of
duplicate items.