Tuesday, May 19, 2009
FAST ESP : QRServer and fdispatch not connected
Resolution:
Restart the fdispatch and QRServer processes and then complete the following general debugging procedure:
1. Turn on fnet debugging:
http://:15100/control?debug.fnet=1
2. Issue some queries.
3. Look in qrserver.scrap for output similar to the following:
[2004-01-21 09:33:07] INFO : qrserver->fnet: events[/s][loop/int/io][967.3/0.0/1.0] packets[/s][r/w][1.0/1.0] data[kB/s][r/w][0.03/0.01]
If a packets[/s][r/w][0.0/1.0] message is displayed, the problem is most likely present on your system (zero packets read per second).
4. Enable debug logging for RTS and searchctrl. To investigate this further, it is useful to have debug logs from two components. On all search nodes:
etc/searchrc-1.xml
set debuglog="true"
On the configuration server node:
etc/config_data/RTSearch/webcluster/rtsearchrc.xml
set debugLog="true"
5. Edit %FASTSEARCH%/etc/searchrc-dispatch.xml on search nodes with topfdispatch.
set debuglog="true"
By default the option is set to false.
6. Edit the files above accordingly. Shut down the system and restart after verifying that all the processes have terminated successfully (frtsobj.exe, fsearchctrl.exe).
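The packets[/s][r/w] check described above can be automated. The following Python sketch scans log text for the fnet statistics and reports entries where packets are written but none are read. The regular expression is an assumption inferred only from the sample line shown above, and the function name is hypothetical.

```python
import re

# Matches the "packets[/s][r/w][<read>/<write>]" part of an fnet statistics line.
# The pattern is inferred from the sample output only.
PACKETS_RE = re.compile(r"packets\[/s\]\[r/w\]\[([0-9.]+)/([0-9.]+)\]")

def find_zero_read_entries(log_text):
    """Return (read, write) pairs where packets are written but none are read."""
    suspicious = []
    for match in PACKETS_RE.finditer(log_text):
        read_rate = float(match.group(1))
        write_rate = float(match.group(2))
        if read_rate == 0.0 and write_rate > 0.0:
            suspicious.append((read_rate, write_rate))
    return suspicious

sample = (
    "[2004-01-21 09:33:07] INFO : qrserver->fnet: events[/s][loop/int/io]"
    "[967.3/0.0/1.0] packets[/s][r/w][0.0/1.0] data[kB/s][r/w][0.03/0.01]"
)
print(find_zero_read_entries(sample))
```

In practice you would pass the contents of qrserver.scrap to find_zero_read_entries instead of the sample string.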
FAST ESP : QRServer timed out
Error:
The exception is: no.fast.ds.search.SearchEngineException: Timed out while waiting for query result.
Resolution:
To resolve this issue, increase the timeout values in sources.xml, fdispatch.addon and fsearch.addon.
1. Update FASTSEARCH/etc/fdispatch.addon with the following values:
maxdocsumwait = 80
maxsearchwait = 70
maxsocksilent = 120
2. Update FASTSEARCH/etc/fsearch.addon with the following value:
maxsocksilent = 120
3. Update FASTSEARCH/etc/qrserver/sources.xml with the following value:
timeout query="60" docsum="70" (the timeout tag)
4. Restart rtsearch/qrserver.
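Before restarting, it can help to confirm that the addon files really contain the values listed above. This is a minimal Python sketch, assuming the addon files consist of plain "key = value" lines as shown in the steps; the function names are hypothetical and the sketch only reads text, it does not edit the files.

```python
# Recommended values taken from the steps above.
RECOMMENDED = {
    "fdispatch.addon": {"maxdocsumwait": 80, "maxsearchwait": 70, "maxsocksilent": 120},
    "fsearch.addon": {"maxsocksilent": 120},
}

def parse_addon(text):
    """Parse 'key = value' lines into a dict; other lines are ignored."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and value.strip().isdigit():
            values[key.strip()] = int(value.strip())
    return values

def settings_below_recommended(file_name, text):
    """Return settings that are missing or lower than the recommended value."""
    current = parse_addon(text)
    return {
        key: (current.get(key), wanted)
        for key, wanted in RECOMMENDED[file_name].items()
        if current.get(key) is None or current[key] < wanted
    }

example = "maxdocsumwait = 80\nmaxsearchwait = 30\n"
print(settings_below_recommended("fdispatch.addon", example))
```

Each entry in the result maps a setting name to a (current, recommended) pair, so an empty result means the file already meets the values above.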
Handling Query Errors (Java Search API)
Example:
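try {
    // Run the query; "engine" and "query" are created elsewhere.
    IQueryResult result = engine.search(query);
    ...
    ...
} catch (SearchEngineException e) {
    // Print the error message together with its numeric error code.
    System.err.println("Error " + e.getMessage() + ": " + e.getErrorCode());
}
A nonzero error code indicates a query-related error.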
FAST ESP Adminserver error codes and messages
Severity:
ERROR
Log message
com.fastsearch.esp.admin.engine.corba.JacORBNameserviceService [Nameservice@null:0 Local@null:/] could not activate ORB (org.jacorb.orb.ORB@) : org.omg.CORBA.INITIALIZE: Could not create server socket vmcid: 0x0 minor code: 0 completed: No
Cause(s) :
The TCP port that Adminserver's ORB tries to bind to is unavailable.
This is most likely due to another instance of Adminserver running or not properly shut down.
Action(s):
Check if Adminserver is already running. If it is, terminate it and restart Adminserver. Check if some other process has bound to either of the two TCP ports with 'netstat -an'. Make sure no other processes are running in the FAST ESP port range before starting FAST ESP.
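As a complement to 'netstat -an', a port can be probed by trying to bind to it. The Python sketch below does this with the standard socket module. The port number used in the example is hypothetical; substitute the ports from your own FAST ESP port range.

```python
import socket

def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "Address already in use": another process holds the port.
            return False
    return True

EXAMPLE_PORT = 16000  # hypothetical value, not taken from the ESP documentation
print(EXAMPLE_PORT, "free" if port_is_free(EXAMPLE_PORT) else "in use")
```

A result of "in use" only tells you that something holds the port; netstat or a similar tool is still needed to identify the process.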
FAST ESP Error Messages
XML template:
Text template:
#SEG NAM _SEGNAME_
####
#ERC 1102
#ERT Could not open channel to server.
####
Note:
Error messages in the 10xx range originate from the Search Engine.
Error messages above that range originate from the Query & Result Server.
Code Description :
1001 General error. Unexpected internal error message. Contact FAST Technical Support.
1002 Error parsing query. Check the query syntax. Note: When using the FQL query language, a query syntax error is normally reported by error message 1201.
1003 All partitions are down (Search Engine). Verify installation (System Management) and network interfaces.
1004 No such dataset. A query parameter `dataset' exists but is not supported in this version of the product. Contact FAST Technical Support if this error message appears.
1005 System is overloaded. This will happen when queries are refused due to QPS license limitations.
1006 The requested functionality is not implemented. Check the query syntax and query parameters.
1007 Query not allowed to run (due to internal resource problem). Normally a temporary resource problem - Resubmit query.
1008 Lost connection to one or more Search Engine sub-nodes. Verify installation (System Management) and network interfaces. The search client should normally resubmit the query, refer to Errors related to evaluation of complex queries.
1009 Multiple errors occurred from different search partitions. The search client should normally resubmit the query, refer to section Errors related to evaluation of complex queries.
1010 Query evaluation error (Internal Search Engine error conditions). Contact FAST Technical Support.
1011 Query timeout: One or more Search Engine nodes did not respond within the query timeout limit. The search client should normally resubmit the query, refer to Errors related to evaluation of complex queries.
1012 Not enough resources, query not possible to resolve. Analyze the query that caused the error message. Refer to Errors related to evaluation of complex queries.
1013 Not enough resources, temporary problem within the Search Engine. Search Front End may re-submit. Refer to Errors related to evaluation of complex queries if the error condition always or frequently occurs for specific types of queries.
1014 Not supported - for queries that are not supported. Check query syntax. Contact FAST Technical Support if you are not able to detect errors in the query syntax.
1015 License checkout problem
1016 Requested generation no longer available. A version of the index can no longer be reached. Contact FAST Technical Support.
1020 Document summary internal error. May be a temporary resource problem, Search Front End may try to resubmit query.
1021 Document summary internal error. May also be related to a connectivity problem or search nodes out of operation. Verify installation (System Management) and network interfaces.
1022 Document summary internal error. May also be related to a connectivity problem or search nodes out of operation. Verify installation (System Management) and network interfaces.
1101 No query state supplied, nothing to search for. Contact FAST Technical Support.
1102 Could not open channel to server (No connection to search dispatcher). Verify installation (System Management) and network interfaces.
1103 No query in the query state. Contact FAST Technical Support.
1104 Failed to send query packet. Verify installation (System Management) and network interfaces.
1105 Search timed out. The search client should normally resubmit the query, refer to Errors related to evaluation of complex queries.
1106 Unknown response for query. Contact FAST Technical Support, please include information from query log.
1107 Connection failed while searching (Connection to search dispatcher failed in the query phase). Verify installation (System Management) and network interfaces.
1108 Failed to send docsum request packet. Verify installation (System Management) and network interfaces.
1109 Docsum fetching timed out (Timed out waiting for docsums from search engine). Refer to Errors related to evaluation of complex queries.
1110 Connection failed while fetching docsums. Verify installation (System Management) and network interfaces.
1111 Unknown response while fetching docsums. Contact FAST Technical Support, please include information from query log.
1112 Failed to store hit information. Contact FAST Technical Support.
1113 Failed to allocate memory for query. Contact FAST Technical Support.
1114 Partial Result. Not possible to retrieve results from all columns (partitions). This may be caused by a connectivity error or an error with a specific search partition. Verify installation (System Management), network interfaces and the status for the search partitions.
1201 FAST Query Language (FQL) query parsing error. Refer to the error text for error details.
1202 Result processor failure. Check query parameters.
1998 Requesting a result template that is not supported. May occur when using customized result template formats.
1999 Query & Result Server error. Contact FAST Technical Support.
2000 Failed to write data to client. (Will only be present in the query logs.) This error is caused by the client closing the connection prematurely.
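A search client can use the table above to decide how to react to an error code. The Python sketch below classifies a code by its origin, following the note that 10xx codes come from the Search Engine and higher codes from the Query & Result Server. The set of resubmittable codes is collected from the descriptions that mention resubmitting; the function names are hypothetical.

```python
# Codes whose description above says the client may or should resubmit the query.
RESUBMIT_CODES = {1007, 1008, 1009, 1011, 1013, 1020, 1105}

def error_origin(code):
    """Classify an error code by the component that reports it."""
    if 1000 <= code <= 1099:
        return "Search Engine"
    if code >= 1100:
        return "Query & Result Server"
    return "unknown"

def should_resubmit(code):
    """Return True if the table suggests resubmitting the query for this code."""
    return code in RESUBMIT_CODES

for code in (1003, 1011, 1102):
    print(code, error_origin(code), should_resubmit(code))
```

A real client would normally cap the number of retries, since a code such as 1011 can recur if the underlying query is too complex.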
Wednesday, May 13, 2009
Indexing Issue : Excel macro files
-----------
You found that indexing failed on an Excel file with macros, with a "Password protected"
error message. While the document is not password protected, we found that certain
parts of the Excel document/worksheet are protected and non-editable.
RESOLUTION
---------------
Password protected Excel files are not supported, as the Stellent converter will fail
to process them. In this event, it is normal to receive an error telling you there is
a problem with the documents being password protected or encrypted. In addition,
you will also receive the error if certain parts of the Excel document/worksheet are
non-editable. The explanation from Stellent is that their software cannot
differentiate between the various kinds of protection that Microsoft provides for Excel
sheets. That is, when a worksheet is being processed, all the converter sees is that
the worksheet has a password protection of some sort. How a worksheet is protected
and what parts of it are protected are not known, because this information is
unavailable to Stellent.
Crawler : a site recently changed IP, but the crawler is still using the old IP
Stop the crawler. Then, on the ubermaster and the master nodes,
remove the following file (consider taking a backup first):
$FASTSEARCH/data/crawler/config/dnscache.hashdb
Now start the crawler.
The DNS cache should now be reset.
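The backup-and-remove step can be scripted. This is a minimal Python sketch, assuming the crawler is already stopped; the ".bak" backup name and the function name are arbitrary choices, and the demo runs against a throwaway directory rather than a real installation.

```python
import os
import shutil
import tempfile

def reset_dns_cache(fastsearch_home):
    """Move the crawler DNS cache file aside. Run only while the crawler is stopped."""
    cache = os.path.join(fastsearch_home, "data", "crawler", "config", "dnscache.hashdb")
    if not os.path.exists(cache):
        return None
    backup = cache + ".bak"  # backup name is an arbitrary choice
    shutil.move(cache, backup)
    return backup

# Demo against a temporary directory tree, not a real $FASTSEARCH.
demo_home = tempfile.mkdtemp()
config_dir = os.path.join(demo_home, "data", "crawler", "config")
os.makedirs(config_dir)
open(os.path.join(config_dir, "dnscache.hashdb"), "w").close()
backup_path = reset_dns_cache(demo_home)
print(backup_path)
```

On a real node you would call reset_dns_cache with the value of $FASTSEARCH, once on the ubermaster and once on each master node.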
Friday, April 17, 2009
FAST ESP Relevancy Ranking
Relevancy is the measure of how well a set of documents (results) answers or addresses the intent of a given query.
When there are many query matches, the search engine must rank the results by relevance score, sorting the result listing so that the pages most likely to be useful appear first. Varying algorithms are used to define relevancy. Relevancy definition and tuning is one of the core differentiators of the FAST ESP platform. This blog post is about the relevance framework and related concepts and features in FAST ESP.
FAST ESP Search Relevance Framework
FAST ESP applies search relevancy through the following key steps:
- Data mining – A document processing framework can be used to perform real-time content refinement. This includes embedded relevancy tools and integration points for 3rd party modules. An Entity Extraction framework enables extraction of named entities and key concepts from documents that may be used for result navigation
- Linguistic normalization – Handles grammatical variations and automatic spell corrections
- Query Processing – A query processing framework applies built-in or custom query transformations based on application specific rules
- Ranking based on the FAST InPerspective model provides a multi-faceted measurement of the quality of the match between the query and a candidate result document
- Query Context Analysis indicates the ability to present the information from the query results in context of the query. FAST ESP supports dynamic document summaries that display the segments of the matching document that provide the most relevant match with the query
- Data Driven Navigation provides dynamic drill-down into the query result or related areas.
The relevancy of a document with respect to a query is represented by a ranking value. The following section lists the different elements used to calculate the rank value.
Elements of Rank Value
Element | Description |
Freshness | Age of a document compared to the time when the query is issued |
Authority | Importance of a document determined by the links to it from other documents |
Quality | Assigned importance of a document, independent of the query |
Geo | Importance of geographical distance between a document’s associated latitude/longitude and a target location specified in a query |
Context | Importance of matching a query in a given document field |
Proximity | For multi-term queries: the shorter the distance between query terms in a document, the higher the document’s rank value |
Position | The earlier a query term occurs in a field, the higher the document’s rank value |
Frequency | The more frequently a query term occurs in a document, the higher the document’s rank value |
Completeness | The greater the number of query terms present in the same field of a matching document, the higher the document’s rank value |
Number | For multi-term queries: the more query terms matched in a document, the higher the document’s rank value |
Relevant Sorting of Query Results
FAST ESP provides three main methods for sorting the results of a query:
- Sorting by rank (relevancy score) - FAST ESP computes a rank value based on a set of parameters as described below. These parameters can be tuned in order to provide the best possible perceived relevancy for the end-user. It is possible to define multiple rank profiles that can be selected on a per query basis
- Sorting by field values - You may also sort query results by value of any searchable field, such as product name, product code, price or date. FAST ESP supports numeric and full-text sorting, single and multi-level sorting, ascending and descending sorting direction and national sorting rules
- Sorting by geographic location - The Geo Search feature provides capabilities for sorting and filtering query results based on geographic location
Rank Profile
The Rank Profile concept enables full control of the relative weight of each rank component for a given query. For example, how important an article’s title is relative to the main text, or how important proximity is versus freshness. This enables individual relevance tuning of different query applications using a FAST ESP installation.
In FAST ESP, the Rank Profile is a configuration element within the Index Profile and defines relative weight for the different components of the dynamic rank. Multiple Rank Profiles can be specified in the Index Profile.
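To illustrate what "relative weight" means, here is a toy Python sketch that assumes the rank is a simple weighted sum of component scores. This is an assumption made for illustration only and not FAST ESP's actual ranking formula; the component names come from the table above, while the scores, weights and profile names are hypothetical.

```python
def combined_rank(component_scores, weights):
    """Weighted sum of component scores; components without a weight count as zero."""
    return sum(score * weights.get(name, 0) for name, score in component_scores.items())

# Hypothetical per-component scores for one document (0-100).
doc = {"freshness": 20, "proximity": 90, "context": 50}

# Two hypothetical rank profiles that weight the same components differently.
news_profile = {"freshness": 70, "proximity": 20, "context": 10}
catalog_profile = {"freshness": 5, "proximity": 45, "context": 50}

print(combined_rank(doc, news_profile))
print(combined_rank(doc, catalog_profile))
```

The same document scores differently under each profile, which is the point of selecting a rank profile per query application.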
Tuning the Ranking and Sorting of Query Results
The ranking and sorting of query results can be tuned in three main ways:
- Multiple Rank Profiles can be specified in the Index Profile. A Rank Profile defines relative weight for the different components of the dynamic rank
- Sorting attributes can be specified for individual fields of the documents
- Result sorting can be controlled on a per query basis. By default the result is sorted by rank as defined in the default Rank Profile. Query parameters enable you to specify an alternative rank profile for the query, or a set of fields that the result shall be sorted by
Relevance support in the Query Language
FAST ESP includes a highly expressive query language that also includes advanced proximity operators:
- Different relevance weight may be applied to different terms or phrases in a query
- Explicit proximity (ordered/unordered NEAR) operators enable precise matching in semi-structured content without the need for a phrase match
- Boundary match operators enable exact matching with extracted entities or entire document elements such as a product name
- Wildcard query support
Dynamic Client Side Ranking
Dynamic client side ranking can be done by using the XRANK operator which is a part of the FAST Query Language (FQL). The boost value is specified with the parameter boost=n, where n is some signed integer value. Negative boost is supported, but if the result of boosting with a negative value is negative then the result will be set to 0.
It's a concept unique to FAST and I will cover it in detail in another post.
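The rule for negative boosts described above can be written as a one-line function. This Python sketch only models that clamping rule (a negative total becomes 0); it does not reproduce how FAST ESP computes the base rank, and the function name is hypothetical.

```python
def apply_boost(rank, boost):
    """Add a signed integer boost to a rank value; negative totals are set to 0."""
    return max(0, rank + boost)

print(apply_boost(100, 50))    # positive boost
print(apply_boost(100, -30))   # negative boost, result still positive
print(apply_boost(100, -500))  # negative boost larger than the rank
```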
Rank Modification Tools
FAST ESP provides tools to modify rank for individual documents. These tools enable you to perform Absolute Query Boost, Relative Query Boost or Relative Document Boost for given documents in the FAST ESP index. An example could be a product database where it may be desired to boost products with highest profit margins, boost products related to campaigns, etc.
Two main tools exist for this purpose:
1) Search Business Center (SBC) - This is an optional, GUI based tool which enables query-oriented rank tuning. The SBC also includes a powerful query reporting module that may be used to assist in the rank tuning. Using the SBC you can change the ranking for each query using three different methods:
- Top Ten - to position the document in one of ten reserved places that will be returned at the top of the results list
- Add boost points - to add a value to a document to increase its relevancy relative to the other documents returned in the search results. You can also add negative boost points to a document.
- Block from query - to prevent the blocked document from appearing in the search results for the query.
2) Rank Tuning Bulk Loader - This is a standard FAST ESP tool that enables you to perform the same rank tuning as the SBC, using an XML file as input. The XML file contains a specification of the rank modifications to be performed
How SharePoint does Relevancy?
Wednesday, November 19, 2008
RankLog in FAST ESP
fields returned to be used(When the rank log turned on).
To Enable RankLog:
1. Go to Search Profile Settings > Query Handling in SBC.
2. Add the static query parameter ranklog=true and save.
3. Publish the Search Profile by going to Publishing and clicking Publish Search Profile.
Thursday, October 23, 2008
FAST ESP - Enable GEO Search
FAST ESP supports geographical coordinates associated with documents, and lets you sort and filter results based on a radius or a rectangular geographical area. When using the filter option, you can still use regular sorting or ranking. Sorting based on distance cannot be combined with regular ranking.
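To make the idea of a radius filter concrete, the Python sketch below keeps only the documents within a given distance of a target location and orders them nearest first. It uses the standard haversine formula and is not FAST ESP's implementation; the document tuples and city coordinates are approximate example data.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_radius(docs, center, radius_km):
    """Keep documents (name, lat, lon) within radius_km of center, nearest first."""
    hits = [(haversine_km(center[0], center[1], lat, lon), name) for name, lat, lon in docs]
    return [name for dist, name in sorted(hits) if dist <= radius_km]

docs = [
    ("Stockholm", 59.33, 18.07),
    ("Bergen", 60.39, 5.32),
    ("Oslo", 59.91, 10.75),
]
print(within_radius(docs, (59.91, 10.75), 350))
```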
To search with Geo sort/filter :
1. GEO search must be enabled in the back-end.
2. GEO data must be fed into FAST ESP.
To Enable Geo in Fast SFE :
1. Open $FASTSEARCH/adminserver/webapps/sfe/WEB-INF/classes/com/fastsearch/espimpl/sfeapi/searchservice/SearchServiceImpl.properties
2. Add com.fastsearch.espimpl.sfeapi.searchservice.search.geo.LatLonGeoSearchImpl to custom_search_inputs=
3. Add com.fastsearch.espimpl.sfeapi.searchservice.result.geo.GeoGraphImpl to custom_result_aspects=
4. Restart ESP using the nctrl restart command.
Now you can see the Geo features in SFE under the Advanced search Tab.
Sunday, October 12, 2008
ESP : Error 1005
Error :
Error 1005 Query Term Refuse
Solution :
Check the QPS license limitations via the Admin GUI. If the query rate exceeds the limit, obtain a new license and restart the QR Server.
ESP : Error 28 No space left on device
$FASTSEARCH/var/log/configserver.scrap
This is a FATAL error. The main cause is that there is no space available on the partition to store the config file.
Error :
Error saving main configuration
file: IOError: [Errno 28] No space left on device
Solution :
Clear some space to allow the configserver to save configuration.
Note :
Stopping the configserver during these conditions may cause information to be lost.
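Free space on the partition can be checked before it becomes a problem. The Python sketch below uses shutil.disk_usage from the standard library; the 100 MB threshold and the function names are arbitrary example choices, and the path should be replaced with the directory that holds your configserver data.

```python
import shutil

def free_megabytes(path):
    """Return the free space, in MB, on the partition that holds path."""
    return shutil.disk_usage(path).free // (1024 * 1024)

def has_room(path, minimum_mb=100):  # threshold is an arbitrary example value
    """Return True if at least minimum_mb of free space is available."""
    return free_megabytes(path) >= minimum_mb

print(free_megabytes("."), "MB free;", "ok" if has_room(".") else "low on space")
```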
ESP : Error Code 226
$FASTSEARCH/var/log/configserver.scrap
This is a FATAL error. The main cause is that another program or application is using the port that ESP uses.
Error :
Failed to start ConfigServer:
error: (226, 'Address already in use')
Solution :
Start the configserver on another port, or shut down the program that is using the port you are trying to use. (Edit the Port element in the config file.)
ESP : FATAL Error 128
$FASTSEARCH/var/log/configserver.scrap
This is a FATAL error. The main cause is that FAST ESP could not handle the character encoding while loading the config file.
Error :
Error loading config file: UnicodeError: ASCII encoding
error: ordinal not in range (128)
Solution :
Edit the configuration file and remove the non-ASCII characters.
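Finding the offending characters by eye can be tedious. This Python sketch lists every character with an ordinal of 128 or above together with its line and column, which matches the "ordinal not in range (128)" condition in the error. The sample text and function name are hypothetical.

```python
def find_non_ascii(text):
    """Return (line_number, column, character) for every non-ASCII character."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, char in enumerate(line, start=1):
            if ord(char) >= 128:
                findings.append((line_no, col, char))
    return findings

sample = 'name = "webcluster"\ndescription = "caf\u00e9 index"\n'
print(find_non_ascii(sample))
```

To check a real file, read it as text with an encoding that can represent the content (for example UTF-8 or Latin-1) and pass the result to find_non_ascii.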
ESP : Indexing
Index Design Factors
Major factors in designing a search engine's architecture include:
Merge factors
How data enters the index, or how words or subject features are added to the index during text corpus traversal, and whether multiple indexers can work asynchronously. The indexer must first check whether it is updating old content or adding new content. Traversal typically correlates to the data collection policy. Search engine index merging is similar in concept to the SQL Merge command and other merge algorithms.
Storage techniques
How to store the index data, that is, whether information should be data compressed or filtered.
Index size
How much computer storage is required to support the index.
Lookup speed
How quickly a word can be found in the inverted index. The speed of finding an entry in a data structure, compared with how quickly it can be updated or removed, is a central focus of computer science.
Maintenance
How the index is maintained over time.
Fault tolerance
How important it is for the service to be reliable. Issues include dealing with index corruption, determining whether bad data can be treated in isolation, dealing with bad hardware, partitioning, and schemes such as hash-based or composite partitioning, as well as replication.
Index Data Structures
Search engine architectures vary in the way indexing is performed and in methods of index storage to meet the various design factors. Types of indices include:
Suffix tree
Figuratively structured like a tree, supports linear time lookup. Built by storing the suffixes of words. The suffix tree is a type of trie. Tries support extendable hashing, which is important for search engine indexing.[8] Used for searching for patterns in DNA sequences and clustering. A major drawback is that the storage of a word in the tree may require more storage than storing the word itself. An alternate representation is a suffix array, which is considered to require less virtual memory and supports data compression such as the BWT algorithm.
Trie
An ordered tree data structure that is used to store an associative array where the keys are strings. Regarded as faster than a hash table but less space-efficient.
Inverted index
Stores a list of occurrences of each atomic search criterion[10], typically in the form of a hash table or binary tree.
Citation index
Stores citations or hyperlinks between documents to support citation analysis, a subject of Bibliometrics.
Ngram index
Stores sequences of length n of data to support other types of retrieval or text mining.
Term document matrix
Used in latent semantic analysis, stores the occurrences of words in documents in a two-dimensional sparse matrix.
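Of the structures listed above, the inverted index is the one most closely tied to keyword search. The Python sketch below builds a tiny in-memory inverted index using a hash table (a dict) and answers AND queries. It is a teaching example with naive whitespace tokenization, not a description of how FAST ESP stores its index.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each lowercase word to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

def lookup(index, *words):
    """Return ids of documents that contain all the given words (AND semantics)."""
    postings = [set(index.get(word.lower(), [])) for word in words]
    return sorted(set.intersection(*postings)) if postings else []

documents = {1: "fast search engine", 2: "fast index merge", 3: "search index"}
index = build_inverted_index(documents)
print(index["fast"])
print(lookup(index, "search", "index"))
```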
Friday, October 10, 2008
ESP - Partial Update
During the document processing stage, ESP decides whether to perform a partial update or a full update (Add Document). For this to work, the Partial Update option must be enabled in the custom processor/pipeline.
1. $FASTHOME\etc\processors\ProcessorServer.xml
2. Add the Partial Update element in our custom processors.
3. Change the XMLMapper.xml to enable the Partial Update.
We can also do the same via the web Analyzer tool.
Partial Update considerations :
The update methods provide a means for partial document updates and have certain limitations.
As a general rule, these methods should only be used to update metadata or numeric elements that do not require any document processing. Elements of type datetime can also be updated.
This means that if you need to update the actual content of an HTML page, PDF document or XML file, the add methods must be used.
It is possible to implement custom document processing that supports partial update.
Monday, September 15, 2008
FAST : Integrate the File Traverser
To integrate the connector controller, complete the following procedure on each node that you want to make available for file traversing via the administrator interface.
Note :
If it seems from the logs that the file traverser does not start, check the
connectorcontroller scrap file $FASTSEARCH/var/log/connectorcontroller/connectorcontroller.scrap and the
filetraverser scrap file in $FASTSEARCH/var/log/FileTraverser_
1. Add the following entries to the $FASTSEARCH/etc/NodeConf.xml file.
a) Add the following to the
b) Add the following to the list of processes:
2. Execute command: $FASTSEARCH/bin/nctrl reloadcfg
3. Execute command: $FASTSEARCH/bin/nctrl start connectorcontroller
The File Traverser should now appear as a Data Source in the administrator interface.
4. Test a normal situation scenario
a) Add collections with the file traverser as a data source.
b) Start and stop the data source.
c) Delete the collection.
FAST : Disable User Authentication
1. Open $FASTHOME/etc/guiConfig.php
2. Set the following parameter:
$ADMINGUI_PHP_AUTH_DISABLED=True;
Thursday, September 4, 2008
FAST ESP : MARSHAL_MessageSizeExceedLimitOnClient
---------------
Error: MARSHAL_MessageSizeExceedLimitOnClient. What can be the reason for it?
Answer:
--------------
The MARSHAL_MessageSizeExceedLimitOnClient error usually occurs when trying to extract
records or attachments beyond a specified size limit. Make sure that the
OMNIORB_CONFIG environment variable is set to point to the omniorb.cfg file. In this file, look for the property giopMaxMsgSize = 209715200 # 200 MBytes.
I believe the default limit is 200 MB.
Hanging can occur when this is misconfigured.
FAST ESP : Check the DocCount for Collection
Is there a way to determine whether the index for a collection is completely
empty and deleted, i.e. after running adminclient -d AND deleting the collection in the GUI?
How can we know that everything is really gone?
Answer:
On large systems deleting all documents in a collection may take quite
some time. You should verify that all documents in the collection are gone
by issuing doccount-commands to all columns by using the rtsinfo tool.
Usage:
rtsinfo nameserver nameserverport clustername columnid rowid
For a system with three columns, one row and standard port range, run
these three commands on the admin node.
rtsinfo adminhost 16099 webcluster 0 0 doccount collectionname
rtsinfo adminhost 16099 webcluster 1 0 doccount collectionname
rtsinfo adminhost 16099 webcluster 2 0 doccount collectionname
(replace adminhost and collectionname with the entries valid for your system)
Typical output from each of these commands:
There are 1750 docs in the collection collectionname.
SUCCESS.
When "0 docs" is reported from all columns, the collection is clean.
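With many columns, typing one rtsinfo command per column gets repetitive. The Python sketch below builds the argument list for each column and parses the "There are N docs" line from the output. The argument order and the default port, cluster name and row are copied from the example commands above; the function names are hypothetical, and running the commands (for example with subprocess.run) is left to the caller.

```python
import re

def doccount_commands(admin_host, collection, columns, port=16099, cluster="webcluster", row=0):
    """Build one rtsinfo doccount command (as an argument list) per column."""
    return [
        ["rtsinfo", admin_host, str(port), cluster, str(col), str(row), "doccount", collection]
        for col in range(columns)
    ]

# Pattern inferred from the typical output shown above.
DOCS_RE = re.compile(r"There are (\d+) docs in the collection")

def parse_doccount(output):
    """Extract the document count from rtsinfo output, or None if it is not found."""
    match = DOCS_RE.search(output)
    return int(match.group(1)) if match else None

def collection_is_clean(outputs):
    """Return True when every column reports 0 docs."""
    return all(parse_doccount(text) == 0 for text in outputs)

for command in doccount_commands("adminhost", "collectionname", 3):
    print(" ".join(command))
```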