Thursday, September 4, 2008

FAST ESP : MARSHAL_MessageSizeExceedLimitOnClient

Question:
---------------

Error: MARSHAL_MessageSizeExceedLimitOnClient What can be the reason for it?

Answer:
--------------

The error MARSHAL_MessageSizeExceedLimitOnClient usually occurs when trying to extract
records or attachments larger than a configured limit. Make sure that you have the
OMNIORB_CONFIG environment variable set to point to the omniorb.cfg file. In this file you can look for the property giopMaxMsgSize = 209715200 # 200 MBytes.

The default limit is believed to be 200 MB.

Hangs can occur when this value is misconfigured.
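As a sketch of what to look for (the file location varies per installation; the path below is only an example):

```
# omniorb.cfg -- the file the OMNIORB_CONFIG environment variable should point to,
# e.g. OMNIORB_CONFIG=$FASTSEARCH/etc/omniorb.cfg (example path)
giopMaxMsgSize = 209715200   # 200 MBytes; raise this if extractions exceed the limit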

FAST ESP : Check the DocCount for Collection

Question

Is there a way to determine if the index for a collection is completely
empty and deleted, i.e. after running adminclient -d AND deleting the collection in the GUI?
How can we know that everything is really gone?

Answer:

On large systems, deleting all documents in a collection may take quite
some time. You should verify that all documents in the collection are gone
by issuing doccount commands to all columns using the rtsinfo tool.

Usage:

rtsinfo nameserver nameserverport clustername columnid rowid


For a system with three columns, one row and standard port range, run
these three commands on the admin node.

rtsinfo adminhost 16099 webcluster 0 0 doccount collectionname
rtsinfo adminhost 16099 webcluster 1 0 doccount collectionname
rtsinfo adminhost 16099 webcluster 2 0 doccount collectionname
(replace adminhost and collectionname with the entries valid for your system)
Typical output from each of these commands:

There are 1750 docs in the collection collectionname.
SUCCESS.

When "0 docs" is reported from all columns, the collection is clean.
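The per-column checks above can be scripted. A minimal sketch (assuming three columns, one row, and the same placeholder adminhost/webcluster/collectionname names as in the example):

```shell
# Extract the count from rtsinfo's "There are N docs in the collection ..." line.
doc_count() {
  sed -n 's/^There are \([0-9][0-9]*\) docs.*/\1/p'
}

# Query each column; the collection is clean when every column reports 0 docs.
if command -v rtsinfo >/dev/null 2>&1; then
  for col in 0 1 2; do
    count=$(rtsinfo adminhost 16099 webcluster "$col" 0 doccount collectionname | doc_count)
    echo "column $col: ${count:-unknown} docs"
  done
fi
```

Replace the host, cluster and collection names with the entries valid for your system.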

FAST ESP 4.3.x : Delete Indexed Documents

QUESTION:

I have several collections that I would like to re-crawl from scratch, but I don't want to have to reconfigure all the settings for each. In FDS 3.x, is there a way to delete all crawled data without losing the collection configurations?


ANSWER:

Here are the steps required for deleting all crawled data and the index from a 3.2 installation without removing the crawler configuration:

IMPORTANT - This will cause complete loss of all indexed documents;
therefore, search will be unavailable for some time until the crawler has begun re-populating the collections. We strongly recommend initiating this procedure during a system maintenance window.

1. Stop FDS from the Admin GUI or using the command 'net stop FASTDSService'

2. Ensure all FAST processes have had time to stop completely and manually kill any remaining processes with the Task Manager

3. Delete all files and directories within the %FASTSEARCH%\data\ directory, EXCEPT %FASTSEARCH%\data\crawler\run\domainspec (this file contains the crawler collection configurations)

4. Start FDS with the command 'net start FASTDSService'

5. Once all FDS processes are active in the System Management page, open up the collection configuration for each collection, verify that the settings are still correct and then click 'submit' on each to refresh the collection information.
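Step 3 above can be sketched as a small script. This is shown Unix-style for readability; the same idea applies to %FASTSEARCH%\data on Windows, and the helper name is ours, not a FAST tool:

```shell
# Sketch of step 3: wipe everything under the data directory while preserving
# crawler/run/domainspec (the crawler collection configurations).
wipe_data_keep_domainspec() {
  data="$1"
  keep="crawler/run/domainspec"
  tmp=$(mktemp -d)
  # Stash the file to keep, remove everything, then restore it.
  [ -e "$data/$keep" ] && mv "$data/$keep" "$tmp/domainspec"
  rm -rf "$data"/*
  if [ -e "$tmp/domainspec" ]; then
    mkdir -p "$data/crawler/run"
    mv "$tmp/domainspec" "$data/$keep"
  fi
  rmdir "$tmp" 2>/dev/null
}
```

Only run something like this after FDS has been stopped completely, as in steps 1 and 2.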


NOTES:


- You may see temporary OSErrors from the PostProcessor trying to locate the collections directory (which will be in the process of being rebuilt).

- You may also see temporary errors from the QRServer, such as 'All partitions down', because the index is still being rebuilt.

- Some collections may start immediately crawling, while others may be idle for a short time before they start crawling.

FAST ESP : Term Descriptions

Question :

**********

Do you have a quick reference sheet for the terms associated with indexing and related concepts, such as Search Clusters, Search Columns and Search Rows?

ANSWER
=======

This reference is found in the FAST Data Search 3.2 Configuration Guide.

A Data Search installation may consist of a number of Search Engines. A Search Engine provides indexing and search features towards a given partition of the total searchable content. The Search Engines are grouped in Search Clusters, Search Columns and Search Rows.

A Search Cluster is a group of Search Engines that share the same Index Profile (schema). This means that the collections assigned to this cluster may be mapped to the same index layout. One Search Cluster may for instance contain web pages and documents, while another Search Cluster may contain items from a content database.

The cluster may include multiple Search Rows (query rate scaling) and Search Columns (data volume scaling) that share the same index configuration. The matrix figure in the Configuration Guide illustrates this.
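As an illustration (ours, not from the guide): a hypothetical Search Cluster with three Search Columns and two Search Rows can be pictured like this, where each cell is one Search Engine, each column holds a partition of the content, and each row holds a complete copy of the index:

```
             Column 0       Column 1       Column 2    <-- data volume scaling
  Row 0   [ Engine 0/0 ] [ Engine 1/0 ] [ Engine 2/0 ]
  Row 1   [ Engine 0/1 ] [ Engine 1/1 ] [ Engine 2/1 ]
    ^
    query rate scaling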

Each Search Cluster will have a number of Collections assigned to it, which provide a logical grouping of content. Note that the collection concept represents a logical grouping of the content within the Search Cluster (one collection resides inside one Search Cluster, but may be spread across multiple Search Columns).

The Document Processing is performed prior to indexing. Within the document processing each document is represented by a set of Elements, which can be further processed and later mapped to searchable Fields via the Index Profile. Elements and Fields may represent content parts and attributes related to the document (body,
title, heading, URI, author, category).

The Index Profile defines the layout/schema of the searchable index, and defines how fields are to be treated by query and result processing. Each Search Cluster has an associated Index Profile.

The Index Profile also includes one or more Result Views that define alternative ways for a query front-end to view the index with respect to queries.

FAST ESP : Duplicate items when searching

Question :
**********

I'm getting a lot of identical hits for the same item. What have I
done wrong?

ANSWER:
*******

There are several possible causes for this, but the most common cause
is that the document ID is not present in the document summary.
Unless you've explicitly disabled incremental indexing, the first
entry in the first document summary class MUST be the document ID. If
not, incremental indexing will not work, and you will get lots of
duplicate items.

FAST ESP - Error code 1102: "Could not open channel to server."

Error code 1102: "Could not open channel to server." in the var/log/qrserver.scrap file.

Description:
--------------

Error code 1102, "Could not open channel to server.", means that the topfdispatch process the qrserver has been configured to use is not listening on the transport port.

In such cases the topfdispatch is most likely down, so all queries issued during that period will receive the 1102 error code; in addition you may see the transition error codes listed below.

Transition errors may appear when fdispatch goes down and the qrserver loses the
connection.

Typical transition error codes are:

1107: "Connection failed while waiting for query result."
1110: "Connection failed while waiting for document summaries."


Solutions:
---------
Restart the topfdispatchers, which can be done from the Admin GUI --> System
Management.

Such a down/up transition can be caused by a slow system (i.e. a ping timeout).

You could try to increase the "pingioctimeout" option by updating the file
$FASTSEARCH/etc/config_data/QRServer/webcluster/etc/qrserver/qrserverrc for
instance with:

pingioctimeout = 30000 # 30 seconds

and restarting the qrserver (nctrl stop/start qrserver) process on all nodes that are running qrserver.

To check if a server is running "qrserver" or any process, use the command
"nctrl sysstatus".
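The qrserverrc edit above can be sketched as a small helper. The function is illustrative, not a FAST tool; it sets (or appends) the key in a "key = value" style file such as qrserverrc:

```shell
# Set pingioctimeout in a qrserverrc-style "key = value" file,
# replacing an existing line or appending a new one.
set_pingioctimeout() {
  rc="$1"; val="$2"
  if grep -q '^pingioctimeout' "$rc" 2>/dev/null; then
    sed "s/^pingioctimeout[[:space:]]*=.*/pingioctimeout = $val/" "$rc" > "$rc.tmp" \
      && mv "$rc.tmp" "$rc"
  else
    echo "pingioctimeout = $val" >> "$rc"
  fi
}

# Usage (path from the answer above), followed by the qrserver restart:
#   set_pingioctimeout "$FASTSEARCH/etc/config_data/QRServer/webcluster/etc/qrserver/qrserverrc" 30000
#   nctrl stop qrserver && nctrl start qrserver
```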

FAST ESP : 'ConfigServerExceptions.CollectionError'

Question:
==========
I have a collection I am trying to delete through the Admin GUI. When I click on the trashcan, it says the collection was fully deleted and gives me a success message. But when I go in and try to create a new collection with the same name, I get the following:

FaultCode: 1.

Reason 'ConfigServerExceptions.CollectionError: The Collection
aehcatalog1 already exists (in d:\e\win2ksp3-i686\datasearch-3.1.0.10-
filter-flexlm-000
\common\datasearch\src\configserver\ConfigServerConfig.py:CreateCollec
tion line 794)'

What am I doing wrong?

Solution:
===========
The collection isn't actually deleted immediately when you perform the delete action. When you delete it, the collection is "scheduled for deletion": all documents associated with the collection are blacklisted in the search index and will be removed as the deletes are pushed through the system (this happens automatically).

However, if you try to add a collection back with the same name right away, you will not be able to, because the old one hasn't been fully deleted yet. You will be able to add it back eventually, but it might take a few hours before the system is ready to accept a collection with the same name again.

A suggestion is to create a collection with a different name. If you want the original name back, you'll have to wait for the system to digest your request to delete it. Working under a different name at least lets you work with the collection and pipeline until you have it set up exactly the way you want; then you can add the collection back under the
original name.

FAST ESP : Delete the Indexed Document

Sometimes documents remain in the index even after we have deleted the collection. We can delete these remaining documents from the index without deleting documents from other collections.

Please do the following to delete the Indexed Documents from the Collection :

1. Run %FASTSEARCH%\bin\rtsinfo > allids.txt

2. Run sed.exe "s/ -.*//g" < allids.txt > killthis.txt

3. Run %FASTSEARCH%\bin\rtsadmin rdocs killthis.txt
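The sed in step 2 can be sketched as follows. The assumed allids.txt format is "&lt;docid&gt; - &lt;details&gt;" per line; the substitution strips " -" and everything after it, leaving only the document IDs for rtsadmin:

```shell
# Strip " -" and everything after it, leaving one document ID per line.
extract_ids() {
  sed 's/ -.*//g' "$1"
}
```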


Note: this procedure is only supported on FAST ESP 4.0.x and later.