The World of J2EE Technology: 4/12/09

Friday, April 17, 2009

FAST ESP Relevancy Ranking

Relevancy is the measure of how well a set of documents (results) answers or addresses the intent of a given query.

When there are many query matches, a search engine must rank the results by relevance score, sorting the result list so that the pages most likely to be useful appear first. Different algorithms are used to define relevancy. Relevancy definition and tuning is one of the core differentiators of the FAST ESP platform. This blog post covers the relevance framework and the related concepts and features in FAST ESP.

FAST ESP Search Relevance Framework

FAST ESP applies search relevancy through the following key steps:

Data mining – A document processing framework can be used to perform real-time content refinement. This includes embedded relevancy tools and integration points for third-party modules. An Entity Extraction framework enables extraction of named entities and key concepts from documents that may be used for result navigation.

Linguistic normalization – Handles grammatical variations and automatic spelling correction.

Query Processing – A query processing framework applies built-in or custom query transformations based on application-specific rules.

Ranking – Ranking based on the FAST InPerspective model provides a multi-faceted measurement of the quality of the match between the query and a candidate result document.

Query Context Analysis – The ability to present information from the query results in the context of the query. FAST ESP supports dynamic document summaries that display the segments of the matching document that best match the query.

Data Driven Navigation – Provides dynamic drill-down into the query result or related areas.

The relevancy of a document with respect to a query is represented by a rank value. The following section lists the elements used to calculate the rank value.

Elements of Rank Value

| Element | Description |
|---|---|
| Freshness | Age of a document compared to the time when the query is issued |
| Authority | Importance of a document, determined by the links to it from other documents |
| Quality | Assigned importance of a document, independent of the query |
| Geo | Importance of the geographical distance between a document’s associated latitude/longitude and a target location specified in the query |
| Context | Importance of matching the query in a given document field |
| Proximity | For multi-term queries: the shorter the distance between query terms in a document, the higher the document’s rank value |
| Position | The earlier a query term occurs in a field, the higher the document’s rank value |
| Frequency | The more frequently a query term occurs in a document, the higher the document’s rank value |
| Completeness | The greater the number of query terms present in the same field of a matching document, the higher the document’s rank value |
| Number | For multi-term queries: the more query terms matched in a document, the higher the document’s rank value |
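To make the Proximity element more concrete, here is a minimal Java sketch that finds the smallest distance, in token positions, between two query terms in a piece of text. The class and method names and the naive whitespace tokenization are assumptions for illustration; this is not FAST ESP's actual proximity formula.

```java
import java.util.ArrayList;
import java.util.List;

class ProximitySketch {
    /**
     * Smallest number of token positions between termA and termB in text,
     * or -1 if either term is missing. Tokenization is a naive lowercase
     * whitespace split (an assumption made for this illustration).
     */
    static int minDistance(String text, String termA, String termB) {
        String[] tokens = text.toLowerCase().split("\\s+");
        String a = termA.toLowerCase();
        String b = termB.toLowerCase();
        List<Integer> posA = new ArrayList<>();
        List<Integer> posB = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(a)) posA.add(i);
            if (tokens[i].equals(b)) posB.add(i);
        }
        if (posA.isEmpty() || posB.isEmpty()) {
            return -1;
        }
        // Compare every occurrence pair and keep the closest one
        int best = Integer.MAX_VALUE;
        for (int pa : posA) {
            for (int pb : posB) {
                best = Math.min(best, Math.abs(pa - pb));
            }
        }
        return best;
    }
}
```

A ranking function could then map a smaller distance to a larger proximity contribution, for example 1 / (1 + distance); the exact mapping FAST ESP uses is not described here.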

Relevant Sorting of Query Results

FAST ESP provides three main methods for sorting the results of a query:

Sorting by rank (relevancy score) - FAST ESP computes a rank value based on the set of parameters described above. These parameters can be tuned to provide the best possible perceived relevancy for the end user. Multiple rank profiles can be defined and selected on a per-query basis.

Sorting by field values - You may also sort query results by the value of any searchable field, such as product name, product code, price or date. FAST ESP supports numeric and full-text sorting, single and multi-level sorting, ascending and descending sort directions, and national sorting rules.

Sorting by geographic location - The Geo Search feature provides capabilities for sorting and filtering query results based on geographic location.
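Multi-level sorting by field values behaves like a chained comparator: ties on the first field are broken by the next one. The sketch below sorts a hypothetical `Product` list by price ascending, then by name descending. `Product`, `FieldSortSketch` and the field names are illustrative assumptions, not part of the FAST ESP API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical result record with two sortable fields. */
class Product {
    final String name;
    final double price;

    Product(String name, double price) {
        this.name = name;
        this.price = price;
    }
}

class FieldSortSketch {
    /** Sorts by price (ascending), breaking ties by name (descending). */
    static List<Product> sortByPriceThenName(List<Product> results) {
        List<Product> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparingDouble((Product p) -> p.price)
                .thenComparing((Product p) -> p.name, Comparator.reverseOrder()));
        return sorted;
    }
}
```

In FAST ESP itself the sort fields and directions are chosen through query parameters rather than code; the comparator only mirrors the semantics.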

Rank Profile

The Rank Profile concept gives full control over the relative weight of each rank component for a given query: for example, how important an article’s title is relative to the main text, or how important proximity is relative to freshness. This allows different query applications that share a FAST ESP installation to be tuned individually.

In FAST ESP, the Rank Profile is a configuration element within the Index Profile and defines relative weight for the different components of the dynamic rank. Multiple Rank Profiles can be specified in the Index Profile.
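Conceptually, a rank profile is a named set of weights applied to the rank components listed in the table above. The following sketch models that idea as a simple weighted sum so the effect of different profiles is easy to see. The `RankProfile` class, the component names and the linear combination are assumptions; FAST ESP's real rank computation and Index Profile syntax are not shown here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a rank profile: one weight per rank component. */
class RankProfile {
    private final String name;
    private final Map<String, Double> weights = new HashMap<>();

    RankProfile(String name) {
        this.name = name;
    }

    /** Sets the weight for a component and returns this profile for chaining. */
    RankProfile weight(String component, double w) {
        weights.put(component, w);
        return this;
    }

    String name() {
        return name;
    }

    /** Weighted sum of component scores; components without a weight count as 0. */
    double rank(Map<String, Double> componentScores) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : componentScores.entrySet()) {
            total += weights.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        return total;
    }
}
```

For example, a hypothetical "news" profile might weight freshness heavily while a "catalog" profile favors quality, so the same document scores differently depending on which profile the query selects.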

Tuning the Ranking and Sorting of Query Results

The ranking and sorting of query results can be tuned in three main ways:

Multiple Rank Profiles can be specified in the Index Profile. A Rank Profile defines the relative weight of the different components of the dynamic rank.

Sorting attributes can be specified for individual fields of the documents.

Result sorting can be controlled on a per-query basis. By default, results are sorted by rank as defined in the default Rank Profile. Query parameters let you specify an alternative rank profile for the query, or a set of fields to sort the results by.

Relevance support in the Query Language

FAST ESP includes a highly expressive query language that also includes advanced proximity operators:

Different relevance weights may be applied to different terms or phrases in a query

Explicit proximity (ordered/unordered NEAR) operators enable precise matching in semi-structured content without the need for a phrase match

Boundary match operators enable exact matching of extracted entities or entire document elements, such as a product name

Wildcard query support

Dynamic Client Side Ranking

Dynamic client-side ranking can be done using the XRANK operator, which is part of the FAST Query Language (FQL). The boost value is specified with the parameter boost=n, where n is a signed integer. Negative boosts are supported, but if boosting with a negative value would make the result negative, the result is set to 0.

It's a concept unique to FAST, and I will cover it in detail in another post.
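The boost arithmetic described for XRANK (add a signed boost, but never let the result drop below 0) fits in one line. This is only a sketch of that rule; the class and method names are hypothetical, and it is not FQL syntax.

```java
class BoostSketch {
    /** Adds a signed boost to a rank value; a negative result is clamped to 0. */
    static int applyBoost(int rank, int boost) {
        return Math.max(0, rank + boost);
    }
}
```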

Rank Modification Tools

FAST ESP provides tools to modify the rank of individual documents. These tools enable you to perform Absolute Query Boost, Relative Query Boost or Relative Document Boost for given documents in the FAST ESP index. For example, in a product database you may want to boost the products with the highest profit margins, or products related to campaigns.

Two main tools exist for this purpose:

1) Search Business Center (SBC) - This is an optional, GUI-based tool that enables query-oriented rank tuning. The SBC also includes a powerful query reporting module that can assist in rank tuning. Using the SBC, you can change the ranking for each query in three different ways:

Top Ten - to position the document in one of ten reserved places that will be returned at the top of the results list

Add boost points - to add a value to a document to increase its relevancy relative to the other documents returned in the search results. You can also add negative boost points to a document.

Block from query - to prevent the blocked document from appearing in the search results for the query.

2) Rank Tuning Bulk Loader - This is a standard FAST ESP tool that enables you to perform the same rank tuning as the SBC, using an XML file as input. The XML file contains a specification of the rank modifications to be performed.
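The three SBC adjustments can be pictured as post-processing over a ranked result list for one query. The sketch below is a hypothetical model: the `Hit` class, the `tune` method and the order in which the rules are applied are assumptions made for illustration, not how SBC or the Rank Tuning Bulk Loader actually works.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical search hit: a document id and its rank value. */
class Hit {
    final String docId;
    final int rank;

    Hit(String docId, int rank) {
        this.docId = docId;
        this.rank = rank;
    }
}

class RankTuningSketch {
    static final int TOP_SLOTS = 10; // the "ten reserved places" of Top Ten

    /**
     * Drops blocked documents, adds boost points (possibly negative),
     * re-sorts by rank, then moves pinned "top ten" documents to the front.
     */
    static List<String> tune(List<Hit> hits, Set<String> blocked,
                             Map<String, Integer> boostPoints, List<String> topTen) {
        List<Hit> kept = new ArrayList<>();
        for (Hit h : hits) {
            if (!blocked.contains(h.docId)) {
                kept.add(new Hit(h.docId, h.rank + boostPoints.getOrDefault(h.docId, 0)));
            }
        }
        kept.sort(Comparator.comparingInt((Hit h) -> h.rank).reversed());

        List<String> result = new ArrayList<>();
        for (String id : topTen) {
            if (result.size() == TOP_SLOTS) {
                break;
            }
            for (Hit h : kept) {
                if (h.docId.equals(id) && !result.contains(id)) {
                    result.add(id);
                    break;
                }
            }
        }
        // Remaining documents follow in boosted rank order
        for (Hit h : kept) {
            if (!result.contains(h.docId)) {
                result.add(h.docId);
            }
        }
        return result;
    }
}
```

With hits d1 (100), d2 (90), d3 (80) and d4 (70), blocking d2, adding 50 boost points to d4 and pinning d3 to the top gives the order d3, d4, d1.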

Object Persistence

One of the most critical tasks that applications have to perform is
to save and restore data. Whether it be a word processing application
that saves documents to disk, a utility that remembers its configuration
for next time, or a game that sets aside world domination for the night,
the ability to store data and later retrieve it is a vital one. Without
it, software would be little more effective than a typewriter - users
would have to re-type their data to make further changes once the
application exits.

Writing the code for saving data, however, can become boring, repetitive
work. First, the programmer must create a specification document for the
proposed file structure. Next, the programmer must implement save and
restore functions that convert object data to and from
primitive data types, and test them with sample data.

If the application later requires new data to be stored, the file
specification must be modified, as well as the save and restore methods. Take it from someone who's been there -
creating save & restore functions is not a fun task.
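For contrast, here is roughly what hand-written save and restore code looks like for a small class, using `DataOutputStream` and `DataInputStream`. The `Settings` class and its fields are hypothetical. Note that the fields must be read back in exactly the order they were written, and both methods have to change whenever a field is added.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical settings object saved and restored by hand. */
class Settings {
    String userName = "";
    int windowWidth;
    boolean soundOn;

    /** Writes each field as a primitive, in a fixed order. */
    byte[] save() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(userName);
            out.writeInt(windowWidth);
            out.writeBoolean(soundOn);
        }
        return bytes.toByteArray();
    }

    /** Reads the fields back in exactly the order save() wrote them. */
    static Settings restore(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            Settings s = new Settings();
            s.userName = in.readUTF();
            s.windowWidth = in.readInt();
            s.soundOn = in.readBoolean();
            return s;
        }
    }
}
```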

The solution to this is object serialization.
Object serialization takes an object's state, and converts it to a
stream of data for you. With object serialization, it's an easy task to
take any object, and make it persistent, without writing custom code to
save object member variables. The object can be restored at a later
time, and even in a different location. With persistence, we can move an object
from one computer to another, and have it maintain its state. This very
cool feature, in Java, also happens to be very easy to use.

Serializing objects

Java makes it easy to serialize objects. Any object
whose class implements the java.io.Serializable interface can be made
persistent with only a few lines of code. The interface declares no
methods, so nothing extra needs to be implemented; its purpose is to
identify at run time which classes can be safely serialized and
which cannot. As a programmer, you need only add an implements clause
to your class declaration to mark your classes as serializable.

```
public class UserData implements java.io.Serializable {
    private String name; // saved automatically when the object is serialized
}
```

Now, once a class is serializable, we can write the object to any
OutputStream, such as to disk or a socket connection. To achieve this,
we must first create an instance of java.io.ObjectOutputStream, and pass
the constructor an existing OutputStream instance.

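```
import java.io.FileOutputStream;
import java.io.ObjectOutputStream;

// Write to disk with FileOutputStream, wrapped in an ObjectOutputStream.
// try-with-resources (Java 7+) closes both streams; the caller handles IOException.
try (FileOutputStream f_out = new FileOutputStream("myobject.data");
     ObjectOutputStream obj_out = new ObjectOutputStream(f_out)) {
    // Write object out to disk; myObject is any Serializable instance
    obj_out.writeObject(myObject);
}
```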

Note that any Java object that implements the Serializable interface
can be written to an output stream this way - including those that are
part of the Java API. Furthermore, any objects that are referenced by a serialized
object will also be stored, provided they are serializable too. This means that arrays, vectors, lists, and
collections of objects can be saved in the same fashion - without the
need to manually save each one. This can lead to significant time and
code savings.
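The following self-contained sketch serializes a list to an in-memory byte array instead of a file, so it can run anywhere. It illustrates that objects referenced by the list are stored along with it, and that two references to the same object still point to a single shared object after restoring. `Point` and `GraphRoundTrip` are hypothetical names used for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical serializable value type. */
class Point implements Serializable {
    int x, y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

class GraphRoundTrip {
    /** Serializes an object, and everything it references, to bytes. */
    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds the object graph from bytes produced by toBytes. */
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```

Adding the same `Point` to a list twice and round-tripping the list through `toBytes` and `fromBytes` yields a new list whose two entries are the same restored `Point` instance.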

Restoring objects from a serialized state

Reading objects back is almost as easy. The one catch is that at
runtime, you can never be completely sure what type of data to expect. A
data stream containing serialized objects may contain a mixture of
different object classes, so you need to explicitly cast an object to a
particular class. If you've never cast an object before, the procedure
is relatively straightforward. First check the object's class, using the
instanceof operator. Then cast to the correct class.

```
import java.io.FileInputStream;
import java.io.ObjectInputStream;
import java.util.Vector;

// Read from disk using FileInputStream, wrapped in an ObjectInputStream
// (readObject may throw IOException or ClassNotFoundException)
try (FileInputStream f_in = new FileInputStream("myobject.data");
     ObjectInputStream obj_in = new ObjectInputStream(f_in)) {

    // Read an object
    Object obj = obj_in.readObject();

    if (obj instanceof Vector) {
        // Cast object to a Vector
        Vector<?> vec = (Vector<?>) obj;
        // Do something with vector....
    }
}
```

Further issues with serialization

As you can see, it's relatively easy to serialize an object. Whenever
new fields are added to an object, they will be saved automatically,
without requiring modification to your save and restore code. However,
there are some cases where this behavior is not desirable. For example,
a password member variable might not be safe to transmit to third
parties over a network connection, and might need to be left blank. In
this case, the transient keyword can be used. The
transient keyword indicates that a particular member variable should not
be saved. Though not used often, it's an important keyword to remember.

```
public class UserSession implements java.io.Serializable
{
   String username;
   transient String password; // not saved when the object is serialized
}
```
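A quick round trip shows the effect of `transient`: after restoring, `username` comes back but `password` holds its default value, `null`. This sketch repeats the two `UserSession` fields from the example above and adds a hypothetical constructor and `roundTrip` helper for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class UserSession implements Serializable {
    String username;
    transient String password; // skipped by serialization

    UserSession(String username, String password) {
        this.username = username;
        this.password = password;
    }

    /** Serializes the session to bytes and immediately reads it back. */
    static UserSession roundTrip(UserSession session)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(session);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (UserSession) in.readObject();
        }
    }
}
```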

Summary

Java's support for object serialization makes implementing persistent
objects extremely easy. In contrast, writing code to save and restore
every field of an object by hand is complex, repetitive work. While it is certainly possible to write your own
serialization mechanism, the simplicity of the one Java provides would
be hard to beat.

Serialization benefits programmers by:

Reducing time taken to write code for save and restoration of
object or application state

Eliminating complexity of save and restore operations, and
avoiding the need for creating a new file format

Making it easier for objects to travel over a network connection.

With relatively little effort, you can apply serialization to a
variety of tasks, and not only in applications. Rather than specifying a long list of parameters, or
performing time-consuming initialization and parsing, an applet can
simply reload a configuration object whose member variables contain all
the information it needs to execute. With a little imagination, serialization may just have a
place in your next project.

Difference between C++ and Java

As a C++ programmer, you already have the basic idea of object-oriented programming, and the syntax of Java no doubt looks familiar to you. This makes sense since Java was derived from C++.
However, there are a surprising number of differences between C++ and Java.

These differences are intended to be significant improvements, and if you understand the differences you'll see why Java is such a beneficial programming language. This
article takes you through the important features that distinguish Java from C++.

The biggest potential stumbling block is speed: when this comparison was
written, interpreted Java ran in the range of 20 times slower than C.
Nothing prevents the Java language from being compiled, and just-in-time
compilers appearing at that time offered significant
speed-ups. It is not inconceivable that full native compilers will
appear for the more popular platforms, but without those there are
classes of problems that will be insoluble with Java because of
the speed issue.

Java has both kinds of comments that C++ has: // line comments and /* */ block comments.

Everything must be in a class. There are no global functions or
global data. If you want the equivalent of globals, make static
methods and static data within a class. There are no
structs or unions, only classes. (The original language had no
enumerations either; Java 5 added enum types.)

All methods are defined in the body of the class.
Thus, in C++ it would look like all the functions are inlined, but
they’re not (inlines are noted later).

Class definitions are roughly the same form in Java as in C++,
but there’s no closing semicolon. There are no class
declarations of the form class foo, only class definitions.
```
class aType {
    void aMethod( ) { /* method body */ }
}
```

There’s no scope resolution operator :: in Java. Java
uses the dot for everything, but can get away with it since you can
define elements only within a class. Even the method definitions
must always occur within a class, so there is no need for scope
resolution there either. One place where you’ll notice the
difference is in the calling of static methods: you say ClassName.methodName( );.
In addition, package names are established using the dot, and
to perform a kind of C++ #include you use the import
keyword. For example: import java.awt.*;. (#include
does not directly map to import, but it has a similar feel to
it).

Java, like C++, has primitive types for efficient access. In Java,
these are boolean, char, byte, short, int,
long, float, and double. All the primitive
types have specified sizes that are machine independent for
portability. (This must have some impact on performance, varying
with the machine.) Type-checking and type requirements are much
tighter in Java. For example:

1. Conditional expressions can be only boolean, not integral.

2. The result of an expression like X + Y must be used; you can’t
just say "X + Y" for the side effect.
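Both rules can be seen in a short sketch. The class and method names are hypothetical; the commented-out lines are the C++-style forms that the Java compiler rejects.

```java
class StrictConditions {
    static String describe(int count) {
        // if (count) { ... }   // accepted by C++; in Java it does not compile (int is not boolean)
        if (count != 0) {       // Java requires an explicit boolean expression
            return "non-empty";
        }
        return "empty";
    }

    static int sum(int x, int y) {
        // x + y;               // does not compile in Java: "not a statement"
        return x + y;           // the result must be used, e.g. returned or assigned
    }
}
```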

The char type uses the international 16-bit Unicode
character set, so it can automatically represent most national
characters.

Static quoted strings are automatically converted into String
objects. There is no independent static character array string like
there is in C and C++.

Java adds the triple right shift >>> to act as a
"logical" right shift by inserting zeroes at the top end;
the >> inserts the sign bit as it shifts (an
"arithmetic" shift).
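The difference only shows up for negative numbers. In this sketch (hypothetical class name), shifting -8 (0xFFFFFFF8) right by one with `>>` gives -4, while `>>>` gives 2147483644 (0x7FFFFFFC) because a zero is shifted into the sign bit.

```java
class ShiftDemo {
    /** Arithmetic shift: copies the sign bit into the vacated high bits. */
    static int arithmetic(int value, int bits) {
        return value >> bits;
    }

    /** Logical shift: fills the vacated high bits with zeroes. */
    static int logical(int value, int bits) {
        return value >>> bits;
    }
}
```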

Although they look similar, arrays have a very different structure
and behavior in Java than they do in C++. There’s a read-only length
member that tells you how big the array is, and run-time checking
throws an exception if you go out of bounds. All arrays are created
on the heap, and you can assign one array to another (the array
handle is simply copied). The array identifier is a first-class
object, with all of the methods commonly available to all other
objects.
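A short sketch of those array properties: `length`, bounds checking, and assignment copying only the handle. The class and method names are hypothetical.

```java
class ArrayDemo {
    /** True if reading index i throws ArrayIndexOutOfBoundsException. */
    static boolean isOutOfBounds(int[] array, int i) {
        try {
            int ignored = array[i]; // run-time bounds check happens here
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    /** Assigning one array to another copies the handle, so both names share storage. */
    static boolean assignmentSharesStorage() {
        int[] first = {1, 2, 3};
        int[] second = first;
        second[0] = 99;
        return first[0] == 99;
    }
}
```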

All objects of non-primitive types can be created only via new.
There’s no equivalent to creating non-primitive objects "on
the stack" as in C++. All primitive types can be created only
on the stack, without new. There are wrapper classes for all
primitive types so that you can create equivalent heap-based
objects via new. (Arrays of primitives are a special case:
they can be allocated via aggregate initialization as in C++, or by
using new.)

No forward declarations are necessary in Java. If you want to use
a class or a method before it is defined, you simply use it – the
compiler ensures that the appropriate definition exists. Thus you
don’t have any of the forward referencing issues that you do in
C++.

Java has no preprocessor. If you want to use classes in another
library, you say import and the name of the library. There
are no preprocessor-like macros.

Java uses packages in place of namespaces. The name issue is taken
care of by putting everything into a class and by using a facility
called "packages" that performs the equivalent namespace
breakup for class names. Packages also collect library components
under a single library name. You simply import a package and
the compiler takes care of the rest.

Object handles defined as class members are automatically
initialized to null. Initialization of primitive class data
members is guaranteed in Java; if you don’t explicitly initialize
them they get a default value (a zero or equivalent). You can
initialize them explicitly, either when you define them in the class
or in the constructor. The syntax makes more sense than that for
C++, and is consistent for static and non-static
members alike. You don’t need to externally define storage for static
members like you do in C++.
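The defaults can be checked with a small class. The class and field names are hypothetical. Note that this guarantee applies to fields only: local variables get no default value, and the compiler rejects any use of one before it is assigned.

```java
class Defaults {
    int count;             // defaults to 0
    boolean ready;         // defaults to false
    double ratio;          // defaults to 0.0
    String label;          // object handle: defaults to null
    int limit = 10;        // explicit initialization at the point of definition
    static int instances;  // static field: no external storage definition needed

    Defaults() {
        instances++;       // constructors can also initialize or update fields
    }
}
```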

There are no Java pointers in the sense of C and C++. When you
create an object with new, you get back a reference (also called
a handle). For example:

```
String s = new String("howdy");
```

However, unlike C++ references that must be initialized when created
and cannot be rebound to a different location, Java references don’t
have to be bound at the point of creation. They can also be rebound at
will, which eliminates part of the need for pointers. The other reason
for pointers in C and C++ is to be able to point at any place in
memory whatsoever (which makes them unsafe, which is why Java doesn’t
support them). Pointers are often seen as an efficient way to move
through an array of primitive variables; Java arrays allow you to do
that in a safer fashion. The ultimate solution for pointer problems is
native methods. Passing pointers to methods
isn’t a problem since there are no global functions, only classes,
and you can pass references to objects.

The Java language promoters initially said "No pointers!",
but when many programmers questioned how you can work without
pointers, the promoters began saying "Restricted pointers."
You can make up your mind whether it’s "really" a pointer
or not. In any event, there’s no pointer arithmetic.

Java has constructors that are similar to constructors in C++. You
get a default constructor if you don’t define one, and if you
define a non-default constructor, there’s no automatic default
constructor defined for you, just like in C++. There are no copy
constructors, since objects are always passed via handles (the handle
itself is copied) rather than copied automatically.

There are no destructors in Java. There is no "scope" of
a variable per se, to indicate when the object’s lifetime is ended
– the lifetime of an object is determined instead by the garbage
collector. There is a finalize( ) method that’s a
member of each class, something like a C++ destructor, but finalize( )
is called by the garbage collector and is supposed to be responsible
only for releasing "resources" (such as open files,
sockets, ports, URLs, etc). If you need something done at a specific
point, you must create a special method and call it, not rely upon finalize( ).
Put another way, all objects in C++ will be (or rather, should be)
destroyed, but not all objects in Java are garbage collected.
Because Java doesn’t support destructors, you must be careful to
create a cleanup method if it’s necessary and to explicitly call
all the cleanup methods for the base class and member objects in
your class.
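A sketch of the explicit-cleanup approach: each class provides a `cleanup()` method, and the derived class cleans up itself, then its member objects, then calls `super.cleanup()`, which is the reverse of the order in which they were created. The class names and the log list are hypothetical.

```java
import java.util.List;

class Shape {
    protected final List<String> log;

    Shape(List<String> log) {
        this.log = log;
    }

    void cleanup() {
        log.add("Shape cleanup");
    }
}

class Border {
    private final List<String> log;

    Border(List<String> log) {
        this.log = log;
    }

    void cleanup() {
        log.add("Border cleanup");
    }
}

class Panel extends Shape {
    private final Border border;

    Panel(List<String> log) {
        super(log);
        border = new Border(log);
    }

    @Override
    void cleanup() {
        log.add("Panel cleanup"); // this class first
        border.cleanup();         // then member objects
        super.cleanup();          // base class last, called explicitly
    }
}
```

In real code the call usually goes in a finally clause, for example `try { /* use p */ } finally { p.cleanup(); }`, so it runs even if an exception is thrown.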

Java has method overloading that works virtually identically to
C++ function overloading.

Java does not support default arguments.
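The usual substitute for a default argument is an overload that forwards to the full version. This is a minimal sketch with hypothetical names, where the one-argument method stands in for a C++ default of `"Hello"`.

```java
class Greeter {
    /** Full version: every parameter is explicit. */
    static String greet(String name, String greeting) {
        return greeting + ", " + name + "!";
    }

    /** Overload playing the role of a C++ default argument (greeting = "Hello"). */
    static String greet(String name) {
        return greet(name, "Hello");
    }
}
```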

There’s no goto in Java. The one unconditional jump
mechanism is the break label or continue label,
which is used to jump out of the middle of multiply-nested loops.
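A minimal sketch of a labeled break leaving two nested loops at once; the class, method and label names are hypothetical.

```java
class LabeledBreak {
    /** Returns {row, col} of the first occurrence of target, or null if absent. */
    static int[] find(int[][] grid, int target) {
        int[] found = null;
        search:
        for (int row = 0; row < grid.length; row++) {
            for (int col = 0; col < grid[row].length; col++) {
                if (grid[row][col] == target) {
                    found = new int[] {row, col};
                    break search; // exits both loops, not just the inner one
                }
            }
        }
        return found;
    }
}
```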

Java uses a singly-rooted hierarchy, so all objects are ultimately
inherited from the root class Object. In C++, you can start a
new inheritance tree anywhere, so you end up with a forest of trees.
In Java you get a single ultimate hierarchy. This can seem
restrictive, but it gives a great deal of power since you know that
every object is guaranteed to have at least the Object
interface. C++ appears to be the only OO language that does not
impose a singly rooted hierarchy.

Java has no templates. Originally it had no implementation of parameterized
types at all; Java 5 later added generics, which are checked at compile time
and implemented by type erasure rather than by generating code for each type. There is a set of collections: Vector, Stack,
and Hashtable that hold Object references, and through
which you can satisfy your collection needs, but these collections
are not designed for efficiency like the C++ Standard Template
Library (STL). The new collections in Java 1.2 are more complete,
but still don’t have the same kind of efficiency as template implementations would allow.

Garbage collection means memory leaks are much harder to cause in
Java, but not impossible. (If you make native method calls that
allocate storage, these are typically not tracked by the garbage
collector.) However, many memory leaks and resource leaks can be
tracked to a badly written finalize( ) or to not
releasing a resource at the end of the block where it is allocated
(a place where a destructor would certainly come in handy). The
garbage collector is a huge improvement over C++, and makes a lot of
programming problems simply vanish. It might make Java unsuitable
for solving a small subset of problems that cannot tolerate a
garbage collector, but the advantage of a garbage collector seems to
greatly outweigh this potential drawback.

Java has built-in multithreading support. There’s a Thread
class that you inherit to create a new thread (you override the run( )
method). Mutual exclusion occurs at the level of objects using the synchronized
keyword as a type qualifier for methods. Only one thread may use a synchronized
method of a particular object at any one time. Put another way, when
a synchronized method is entered, it first "locks"
the object against any other synchronized method using that
object and "unlocks" the object only upon exiting the
method. There are no explicit lock calls; locking happens automatically. You’re
still responsible for implementing more sophisticated
synchronization between threads by creating your own
"monitor" class. Recursive synchronized methods
work correctly. Time slicing is not guaranteed between equal
priority threads.
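A minimal sketch of both ideas: a thread created by extending `Thread` and overriding `run()`, and a counter whose `synchronized` methods keep concurrent increments from being lost. The class names are hypothetical.

```java
class Counter {
    private int count;

    /** Only one thread at a time may run a synchronized method on this object. */
    synchronized void increment() {
        count++;
    }

    synchronized int get() {
        return count;
    }
}

class Worker extends Thread {
    private final Counter counter;
    private final int times;

    Worker(Counter counter, int times) {
        this.counter = counter;
        this.times = times;
    }

    @Override
    public void run() { // the thread's work
        for (int i = 0; i < times; i++) {
            counter.increment();
        }
    }
}
```

Starting two `Worker` threads of 10,000 increments each on the same `Counter` and joining them leaves the count at exactly 20,000; without `synchronized`, updates could be lost.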

Instead of controlling blocks of declarations like C++ does, the
access specifiers (public, private, and protected)
are placed on each definition for each member of a class. Without an
explicit access specifier, an element defaults to
"friendly," which means that it is accessible to other
elements in the same package (equivalent to them all being C++ friends)
but inaccessible outside the package. The class, and each method
within the class, has an access specifier to determine whether it’s
visible outside the file. Sometimes the private keyword is
used less in Java because "friendly" access is often more
useful than excluding access from other classes in the same package.
(However, with multithreading the proper use of private is
essential.) The Java protected keyword means "accessible
to inheritors and to others in this package." There is
no equivalent to the C++ protected keyword that means
"accessible to inheritors only" (private
protected used to do this, but the use of that keyword pair was
removed).

Nested classes. In C++, nesting a class is an aid to name hiding
and code organization (but C++ namespaces eliminate the need for
name hiding). Java packaging provides the equivalent of namespaces,
so that isn’t an issue. Java 1.1 has inner classes that
look just like nested classes. However, an object of an inner class
secretly keeps a handle to the object of the outer class that was
involved in the creation of the inner class object. This means that
the inner class object may access members of the outer class object
without qualification, as if those members belonged directly to the
inner class object. This provides a much more elegant solution to
the problem of callbacks, solved with pointers to members in C++.
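A small sketch of the callback idea: the inner class implements a listener interface and updates a field of its enclosing object without any qualification. `Listener`, `EventLog` and `Recorder` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback interface. */
interface Listener {
    void onEvent(String event);
}

class EventLog {
    private final List<String> entries = new ArrayList<>(); // outer-class member

    /** Inner class: each instance keeps a hidden handle to its EventLog. */
    private class Recorder implements Listener {
        @Override
        public void onEvent(String event) {
            entries.add(event); // outer field used without qualification
        }
    }

    Listener newListener() {
        return new Recorder();
    }

    List<String> entries() {
        return entries;
    }
}
```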

Because of inner classes described in the previous point, there
are no pointers to members in Java.

No inline methods. The Java compiler might decide on its
own to inline a method, but you don’t have much control over this.
You can suggest inlining in Java by using the final keyword
for a method. However, inline functions are only suggestions
to the C++ compiler as well.

Inheritance in Java has the same effect as in C++, but the syntax
is different. Java uses the extends keyword to indicate
inheritance from a base class and the super keyword to
specify methods to be called in the base class that have the same
name as the method you’re in. (However, the super keyword
in Java allows you to access methods only in the parent class, one
level up in the hierarchy.) Base-class scoping in C++ allows you to
access methods that are deeper in the hierarchy. The base-class
constructor is also called using the super keyword. As
mentioned before, all classes are ultimately automatically inherited
from Object. There’s no explicit constructor
initializer list like in C++, but the compiler forces you to perform
all base-class initialization at the beginning of the constructor
body and it won’t let you perform these later in the body. Member
initialization is guaranteed through a combination of automatic
initialization and exceptions for uninitialized object handles.
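```
class Bar { // minimal base class so the example compiles
   public Bar(String msg) { System.out.println(msg); }
   public void baz(int i) { System.out.println("Bar.baz(" + i + ")"); }
}

public class Foo extends Bar
{
   public Foo(String msg) {
      super(msg); // Calls base constructor; must be the first statement
   }

   @Override
   public void baz(int i) { // Override
      super.baz(i); // Calls base method
   }
}
```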

Inheritance in Java doesn’t change the protection level of the
members in the base class. You cannot specify public, private,
or protected inheritance in Java, as you can in C++. Also,
overridden methods in a derived class cannot reduce the access of
the method in the base class. For example, if a method is public
in the base class and you override it, your overridden method must
also be public (the compiler checks for this).

Java provides the interface keyword, which creates the
equivalent of an abstract base class filled with abstract methods
and with no data members. This makes a clear distinction between
something designed to be just an interface and an extension of
existing functionality via the extends keyword. It’s worth
noting that the abstract keyword produces a similar effect in
that you can’t create an object of that class. An abstract
class may contain abstract methods (although it isn’t
required to contain any), but it is also able to contain
implementations, so it is restricted to single inheritance. Together
with interfaces, this scheme prevents the need for some mechanism
like virtual base classes in C++.

To create a version of the interface that can be
instantiated, use the implements keyword, whose syntax looks
like inheritance:
```
public interface Face {
   public void smile();
}

public class Baz extends Bar implements Face {
   public void smile( ) {
      System.out.println("a warm smile");
   }
}
```

There’s no virtual keyword in Java because all non-static
methods always use dynamic binding. In Java, the programmer doesn’t
have to decide whether to use dynamic binding. The reason virtual
exists in C++ is so you can leave it off for a slight increase in
efficiency when you’re tuning for performance (or, put another
way, "If you don’t use it, you don’t pay for it"),
which often results in confusion and unpleasant surprises. The final
keyword provides some latitude for efficiency tuning – it tells
the compiler that this method cannot be overridden, and thus that it
may be statically bound (and made inline, thus using the equivalent
of a C++ non-virtual call). These optimizations are up to the
compiler.
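The sketch below (hypothetical class names) shows dynamic binding without any `virtual` keyword: a call through an `Animal` handle runs the `Dog` override. The `final` method cannot be overridden, which is what lets a compiler bind it statically.

```java
class Animal {
    String sound() {
        return "...";
    }

    /** final: cannot be overridden, so calls to it may be statically bound. */
    final String describe() {
        return "This animal says " + sound(); // sound() is still dynamically bound
    }
}

class Dog extends Animal {
    @Override
    String sound() {
        return "Woof";
    }
}
```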

Java doesn’t provide multiple inheritance (MI), at least not in
the same sense that C++ does. Like protected, MI seems like a
good idea but you know you need it only when you are face to face
with a certain design problem. Since Java uses a singly-rooted
hierarchy, you’ll probably run into fewer situations in which MI
is necessary. The interface keyword takes care of combining
multiple interfaces.

Run-time type identification functionality is quite similar to
that in C++. To get information about handle X, you can say,
for example:

X.getClass().getName();

To perform a type-safe downcast you say:

derived d = (derived)base;

just like an old-style C
cast. The compiler automatically invokes the dynamic casting mechanism
without requiring extra syntax. Although this doesn’t have the
benefit of easy location of casts as in C++ "new casts,"
Java checks the cast at run time and throws an exception if it is invalid,
so, unlike C++, it never lets a bad cast through.
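A short sketch of run-time type identification and checked casts, using hypothetical method names: `getClass().getName()` reports the actual class, `instanceof` guards a downcast, and a bad cast throws `ClassCastException` instead of silently succeeding.

```java
class CastDemo {
    /** Name of the object's run-time class, e.g. "java.lang.String". */
    static String typeName(Object obj) {
        return obj.getClass().getName();
    }

    /** Downcasts only after checking the run-time type with instanceof. */
    static String safeUpper(Object obj) {
        if (obj instanceof String) {
            String s = (String) obj; // guaranteed to succeed here
            return s.toUpperCase();
        }
        return null;
    }

    /** True if casting obj to Integer fails at run time. */
    static boolean castFails(Object obj) {
        try {
            Integer unused = (Integer) obj;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```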

Exception handling in Java is different because there are no
destructors. A finally clause can be added to force execution
of statements that perform necessary cleanup. All exceptions in Java
are inherited from the base class Throwable, so you’re
guaranteed a common interface.

            
```
// Obj, MyResource and MyException are placeholder types
public void f(Obj b) throws IOException {
   MyResource mr = b.createResource();
   try {
      mr.useResource();
   } catch (MyException e) {
      // handle my exception
   } catch (Throwable e) {
      // handle all other exceptions
   } finally {
      mr.dispose(); // special cleanup
   }
}
```
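Since the types in the example above are placeholders, here is a self-contained sketch of the same pattern. `TrackedResource` and `FinallyDemo` are hypothetical; the point is that the finally clause runs whether or not `use()` throws.

```java
/** Hypothetical resource that records whether it has been released. */
class TrackedResource {
    boolean disposed;

    void use(boolean fail) {
        if (fail) {
            throw new IllegalStateException("simulated failure");
        }
    }

    void dispose() {
        disposed = true;
    }
}

class FinallyDemo {
    /** Uses the resource; finally disposes of it on both the normal and the failure path. */
    static TrackedResource run(boolean fail) {
        TrackedResource r = new TrackedResource();
        try {
            r.use(fail);
        } catch (IllegalStateException e) {
            // handle the failure; cleanup still happens in finally
        } finally {
            r.dispose();
        }
        return r;
    }
}
```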

Exception specifications in Java are vastly superior to those in
C++. Instead of the C++ approach of calling a function at run-time
when the wrong exception is thrown, Java exception specifications
are checked and enforced at compile-time. In addition, overridden
methods must conform to the exception specification of the
base-class version of that method: they can throw the specified
exceptions or exceptions derived from those. This provides much more
robust exception-handling code.

Java has method overloading, but no operator overloading. The String
class does use the + and += operators to concatenate
strings and String expressions use automatic type conversion,
but that’s a special built-in case.
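A short sketch of the built-in String special case. The class name `ConcatDemo` is hypothetical. It shows `+` and `+=` converting non-String operands automatically, and how left-to-right evaluation decides whether `+` means addition or concatenation.

```java
class ConcatDemo {
    static String build() {
        int count = 3;
        double price = 2.5;
        // + with a String operand converts the other operand to its string form
        String s = "count=" + count;
        // += on a String also concatenates
        s += ", price=" + price;
        return s; // "count=3, price=2.5"
    }

    // Evaluation is left to right: numbers are added until a String appears
    static String leftToRight() {
        return (1 + 2 + "x") + "|" + ("x" + 1 + 2); // "3x|x12"
    }
}
```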

The const issues in C++ are avoided in
Java by convention. You pass only handles to objects and local
copies are never made for you automatically. If you want the
equivalent of C++’s pass-by-value, you
call clone( ) to produce a local copy of the argument (although
the clone( ) mechanism is somewhat poorly designed –
see Chapter 12). There’s no copy-constructor that’s
automatically called.
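Here is a minimal sketch of getting a pass-by-value effect with `clone()`. `MutablePoint` and `CopyDemo` are hypothetical classes. Passing the handle lets the method modify the caller's object, while cloning first gives the method its own copy. Note that `Object.clone()` makes a shallow copy, so fields that are themselves object handles would still be shared.

```java
// MutablePoint is hypothetical; implementing Cloneable lets Object.clone() copy it
class MutablePoint implements Cloneable {
    int x;
    int y;

    MutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public MutablePoint clone() {
        try {
            return (MutablePoint) super.clone(); // shallow, field-by-field copy
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // unreachable: this class implements Cloneable
        }
    }
}

class CopyDemo {
    // Only the handle is copied, not the object, so this changes the caller's point
    static void moveInPlace(MutablePoint p) {
        p.x += 10;
    }

    // Cloning first gives the method its own local copy, like C++ pass-by-value
    static int movedCopyX(MutablePoint p) {
        MutablePoint local = p.clone();
        local.x += 10;
        return local.x;
    }
}
```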

To create a compile-time constant value, you say, for example:

static final int SIZE = 255;
static final int BSIZE = 8 * SIZE;

Because of security issues, programming an "application"
is quite different from programming an "applet." A
significant issue is that an applet won’t let you write to disk,
because that would allow a program downloaded from an unknown
machine to trash your disk. This changes somewhat with Java 1.1
digital signing, which lets you know unequivocally who wrote each
program that has special access to your system (one of those
programs might still trash your disk; you would still have to figure
out which one and what to do about it). Java 1.2 also promises more
power for applets.

Since Java can be too restrictive in some cases, you could be
prevented from doing important tasks such as directly accessing
hardware. Java solves this with native methods that allow you
to call a function written in another language (currently only C and
C++ are supported). Thus, you can always solve a platform-specific
problem (in a relatively non-portable fashion, but then that code is
isolated). Applets cannot call native methods, only applications.
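On the Java side, a native method is just a declaration with the `native` modifier and no body; in current JDKs the C or C++ implementation is usually connected through JNI (the Java Native Interface) and loaded with `System.loadLibrary`. In this sketch, the class `PortAccess`, the method `readPort()` and the library name `portio` are all hypothetical, and no native implementation is supplied. Declaring the method is safe; only calling it without a loaded library would fail.

```java
// Sketch of a native method declaration. PortAccess, readPort() and the
// library name "portio" are hypothetical; the C/C++ side is built separately.
class PortAccess {
    // Declared in Java, implemented in C or C++; note there is no Java body
    static native int readPort(int port);

    // Tries to load the native library; returns false if it is not installed
    static boolean loadLibrary() {
        try {
            System.loadLibrary("portio"); // e.g. libportio.so on Linux, portio.dll on Windows
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }
}
```

Keeping every `native` declaration in one small class like this is one way to keep the non-portable code isolated, as the paragraph above suggests.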

Java has built-in support for comment documentation, so the source
code file can also contain its own documentation, which is stripped
out and reformatted into HTML via a separate program. This is a boon
for documentation maintenance and use.
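A documentation comment is an ordinary `/** ... */` block placed just before a class or member, with tags such as `@param` and `@return` that the `javadoc` tool understands. The class `TemperatureConverter` below is a hypothetical example.

```java
/**
 * Hypothetical utility class used to show documentation comment syntax.
 * The javadoc tool turns comments like this one into HTML pages.
 */
class TemperatureConverter {
    /**
     * Converts a temperature from Celsius to Fahrenheit.
     *
     * @param celsius the temperature in degrees Celsius
     * @return the same temperature in degrees Fahrenheit
     */
    static double toFahrenheit(double celsius) {
        return celsius * 9.0 / 5.0 + 32.0;
    }
}
```

Because this class is package-private, you would generate its pages with something like `javadoc -package -d docs TemperatureConverter.java`. By default javadoc documents only public and protected members.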

Java contains standard libraries for solving specific tasks. C++
relies on non-standard third-party libraries. These tasks include
(or will soon include):
- Networking
- Database Connection (via JDBC)
- Multithreading
- Distributed Objects (via RMI and CORBA)
- Compression
- Commerce
The availability and standard nature of these libraries allow for
more rapid application development.
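To illustrate the "Compression" item with nothing but the standard library, here is a sketch that round-trips a string through GZIP using `java.util.zip`. The class name `GzipDemo` is hypothetical. The same pattern of wrapping one stream in another applies to the networking and I/O libraries.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper that compresses and decompresses text with java.util.zip
class GzipDemo {
    static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } // closing the GZIP stream writes the trailer
        return bytes.toByteArray();
    }

    static String decompress(byte[] data) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Wraps the checked IOException so callers can use it in simple expressions
    static String roundTrip(String text) {
        try {
            return decompress(compress(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```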

Java 1.1 includes the Java Beans standard, which is a way to
create components that can be used in visual programming
environments. This promotes visual components that can be used under
all vendors’ development environments. Since you aren’t tied to
a particular vendor’s design for visual components, this should
result in greater selection and availability of components. In
addition, the design for Java Beans is simpler for programmers to
understand; vendor-specific component frameworks tend to involve a
steeper learning curve.
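A bean is an ordinary class that follows a few naming conventions: a public no-argument constructor, private state exposed through `getX`/`setX` accessors, and optionally change events that builder tools can listen to. Here is a minimal sketch using the standard `java.beans.PropertyChangeSupport` helper. The class `ThermostatBean` and its `temperature` property are hypothetical.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.io.Serializable;

// Hypothetical bean following the JavaBeans conventions: public no-arg
// constructor, private state, getX/setX accessors, change notifications
class ThermostatBean implements Serializable {
    private final PropertyChangeSupport changes = new PropertyChangeSupport(this);
    private double temperature;

    public ThermostatBean() { }

    public double getTemperature() {
        return temperature;
    }

    public void setTemperature(double value) {
        double old = temperature;
        temperature = value;
        // Notifies listeners; no event is fired if the value did not change
        changes.firePropertyChange("temperature", old, value);
    }

    public void addPropertyChangeListener(PropertyChangeListener listener) {
        changes.addPropertyChangeListener(listener);
    }

    public void removePropertyChangeListener(PropertyChangeListener listener) {
        changes.removePropertyChangeListener(listener);
    }
}
```

A visual tool can discover the `temperature` property purely from the `getTemperature`/`setTemperature` names, which is what lets beans from different vendors work in the same environment.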

If you use a Java handle that is null, a NullPointerException is thrown. This
test doesn’t have to occur right before the use of a handle; the
Java specification just says that the exception must somehow be
thrown. Many C++ runtime systems can also throw exceptions for bad
pointers.
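This sketch shows the failed-handle case in practice. The class name `NullHandleDemo` is hypothetical. Calling a method through a `null` handle raises a `NullPointerException` that the program can catch, instead of dereferencing an invalid address as an unchecked C++ pointer might.

```java
class NullHandleDemo {
    // Using a handle that refers to no object throws NullPointerException
    static boolean throwsOnNull(String handle) {
        try {
            handle.length();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```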

Generally, Java is more robust, via:
- Object handles in fields initialized to null (a keyword)
- Handles are always checked and exceptions are thrown for
  failures
- All array accesses are checked for bounds violations
- Automatic garbage collection prevents most memory leaks
- Clean, relatively fool-proof exception handling
- Simple language support for multithreading
- Bytecode verification of network applets
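Two of the points above, default-null handles and bounds-checked arrays, can be seen in a few lines. The class `SafetyDemo` is hypothetical. Only fields get the automatic `null`; local variables must be assigned before use, and the compiler enforces that.

```java
class SafetyDemo {
    // Fields are initialized automatically; object handles start out null
    static String defaultHandle;

    // Every array access is bounds-checked at run time
    static String readIndex(int[] values, int index) {
        try {
            return "value " + values[index];
        } catch (ArrayIndexOutOfBoundsException e) {
            return "out of bounds";
        }
    }
}
```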

Singleton Object

For those who haven't heard of design patterns before, or who are familiar with the term but not its meaning, a design pattern is a template for software development. The purpose of the template is to define a particular behavior or technique that can be used as a building block for the construction of software - to solve universal problems that developers commonly face. Think of a design pattern as a way of passing on some nifty piece of advice, just like your mother used to give. "Never wear your socks for more than one day" might be an old family adage, passed down from generation to generation. Such adages are common-sense solutions passed on to others. Consider a design pattern as a useful piece of advice for designing software.

Design patterns out of the way, let's look at the singleton. By now, you're probably wondering what a singleton is - isn't jargon terrible? A singleton is an object that cannot be instantiated directly by other classes. At first, that might seem counterintuitive - after all, we need an instance of an object before we can use it. Well, yes, a singleton can be created, but developers can't instantiate it with new - meaning that the singleton class has control over how it is created. The restriction on the singleton is that there can be only one instance of a singleton created by the Java Virtual Machine (JVM); by preventing direct instantiation, we can ensure that developers don't create a second copy.

So why would this be useful? Often in designing a system, we want to control how an object is used, and prevent others (ourselves included) from making copies of it or creating new instances. For example, a central configuration object that stores setup information should have one and only one instance - a global copy accessible from any part of the application, including any threads that are running. Creating a new configuration object and using it would be fairly useless, as other parts of the application might be looking at the old configuration object, and changes to application settings wouldn't always be acted upon. I'm sure you can think of other situations where a singleton would be useful - perhaps you've even used one before without giving it a name. It's a common enough design criterion (not used every day, but you'll come across it from time to time). The singleton pattern can be applied in any language, but since we're all Java programmers here (if you're not, shame!) let's look at how to implement the pattern using Java.

Preventing direct instantiation

We all know how objects are instantiated, right? Maybe not everyone. Let's go through a quick refresher.

Objects are instantiated by using the new keyword. The new keyword allows you to create a new instance of an object, and to specify parameters to the class's constructor. You can specify no parameters, in which case the no-argument constructor (often called the default constructor) is invoked. Constructors can have access modifiers, like public and private, which allow you to control which classes have access to a constructor. If a class declares no constructor at all, the compiler supplies a default constructor that is as accessible as the class itself. So to prevent direct instantiation, we explicitly declare a private no-argument constructor, so that other classes can't create a new instance.

We'll start with the class definition for a SingletonObject class. Next, we provide a no-argument constructor that is marked as private. No actual code needs to be written, but you're free to add some initialization code if you'd like.

public class SingletonObject
{
   private SingletonObject()
   {
      // no code req'd
   }
}

So far so good. But unless we add some further code, there'll be absolutely no way to use the class. We want to prevent direct instantiation, but we still need to allow a way to get a reference to an instance of the singleton object.

Getting an instance of the singleton

We need to provide an accessor method that returns an instance of the SingletonObject class but doesn't allow more than one copy to be accessed. We can manually instantiate an object, but we need to keep a reference to the singleton so that subsequent calls to the accessor method can return the singleton (rather than creating a new one). To do this, provide a public static method called getSingletonObject(), and store a reference to the singleton in a private static member variable.

public class SingletonObject
{
   private SingletonObject()
   {
      // no code req'd
   }

   public static SingletonObject getSingletonObject()
   {
      if (ref == null) {
         // it's ok, we can call this constructor
         ref = new SingletonObject();
      }
      return ref;
   }

   private static SingletonObject ref;
}

So far, so good. When first called, the getSingletonObject() method creates a singleton instance, assigns it to a member variable, and returns the singleton. Subsequent calls will return the same singleton, and all is well with the world. You could extend the functionality of the singleton object by adding new methods to perform the types of tasks your singleton needs. So the singleton is done, right? Well, almost.

Preventing thread problems with your singleton

We need to make sure that threads calling the getSingletonObject() method don't cause problems, so it's advisable to mark the method as synchronized. This prevents two threads from executing the getSingletonObject() method at the same time. Without it, if one thread entered the method just after the other, both could see ref == null before either had assigned it, and you could end up calling the SingletonObject constructor twice and returning different objects. To change the method, just add the synchronized keyword to the method declaration as follows:

public static synchronized
SingletonObject getSingletonObject()
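Here is a self-contained sketch of the synchronized accessor together with a small harness that calls it from many threads released at the same moment. `SafeSingleton` and `SingletonRace` are hypothetical names, chosen so this example does not clash with `SingletonObject`. A race is timing-dependent, so a passing run shows the accessor behaving as intended; it is not a proof that an unsynchronized version would fail.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical copy of the singleton with the synchronized accessor
class SafeSingleton {
    private static SafeSingleton ref;

    private SafeSingleton() { }

    // synchronized: only one thread at a time runs the null check and the
    // constructor call, so two threads cannot both see ref == null
    public static synchronized SafeSingleton getSingletonObject() {
        if (ref == null) {
            ref = new SafeSingleton();
        }
        return ref;
    }
}

class SingletonRace {
    // Starts the threads together and counts how many distinct instances they saw
    static int distinctInstances(int threads) {
        Set<SafeSingleton> seen = ConcurrentHashMap.newKeySet(); // identity-based: no equals override
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // wait for the common start signal
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                seen.add(SafeSingleton.getSingletonObject());
            });
            workers[i].start();
        }
        start.countDown(); // release all threads at once
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.size();
    }
}
```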

Are we finished yet?

There, finished. A singleton object that guarantees one instance of the class, and never more than one. Right? Well.... not quite. Where there's a will, there's a way - it is still possible to evade all our defensive programming and create more than one instance of the singleton class defined above. Here's where most articles on singletons fall down, because they forget about cloning. Examine the following code snippet, which clones a singleton object.

public class Clone
{
   public static void main(String args[])
      throws Exception
   {
      // Get a singleton
      SingletonObject obj =
         SingletonObject.getSingletonObject();

      // Buahahaha. Let's clone the object
      SingletonObject clone =
         (SingletonObject) obj.clone();
   }
}

Okay, we're cheating a little here. There isn't a clone() method defined in SingletonObject, but there is one in the java.lang.Object class from which it inherits. By default, the clone() method is protected (so the snippet above won't actually compile as written), and Object.clone() throws CloneNotSupportedException unless the class implements Cloneable. But if your SingletonObject extends another class that does support cloning, it is possible to violate the design principles of the singleton. So, to be absolutely positively 100% certain that a singleton really is a singleton, we must add a clone() method of our own and throw a CloneNotSupportedException if anyone dares try!
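Here is a sketch of the loophole just described, a singleton that inherits a public clone() from a cloneable superclass, alongside the defensive override. `CloneableBase`, `LeakySingleton`, `GuardedSingleton` and `CloneCheck` are all hypothetical names. For brevity the singletons use eager initialization instead of the lazy accessor used in the article.

```java
// Hypothetical superclass that supports cloning through a public clone()
class CloneableBase implements Cloneable {
    @Override
    public Object clone() throws CloneNotSupportedException {
        return super.clone(); // Object.clone() copies without calling any constructor
    }
}

// A singleton that inherits the public clone() and can therefore be copied
class LeakySingleton extends CloneableBase {
    private static final LeakySingleton REF = new LeakySingleton();

    private LeakySingleton() { }

    static LeakySingleton get() { return REF; }
}

// The same singleton with a clone() override that refuses to copy
class GuardedSingleton extends CloneableBase {
    private static final GuardedSingleton REF = new GuardedSingleton();

    private GuardedSingleton() { }

    static GuardedSingleton get() { return REF; }

    @Override
    public Object clone() throws CloneNotSupportedException {
        throw new CloneNotSupportedException();
    }
}

class CloneCheck {
    // Returns true if cloning produced a second, distinct instance
    static boolean canDuplicate(CloneableBase singleton) {
        try {
            return singleton.clone() != singleton;
        } catch (CloneNotSupportedException e) {
            return false;
        }
    }
}
```

The private constructor does not help in the leaky case, because `Object.clone()` creates the copy without running any constructor.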

Here's the final source code for a SingletonObject, which you can use as a template for your own singletons.

public class SingletonObject
{
   private SingletonObject()
   {
      // no code req'd
   }

   public static synchronized SingletonObject getSingletonObject()
   {
      if (ref == null) {
         // it's ok, we can call this constructor
         ref = new SingletonObject();
      }
      return ref;
   }

   public Object clone()
      throws CloneNotSupportedException
   {
      // that'll teach 'em
      throw new CloneNotSupportedException();
   }

   private static SingletonObject ref;
}

Summary

A singleton is a class that can be instantiated once, and only once. This is an unusual property, but useful in a wide range of object designs. Creating an implementation of the singleton pattern is fairly straightforward - simply block off access to all constructors, provide a static synchronized method for getting an instance of the singleton, and prevent cloning.
