Friday, April 17, 2009

FAST ESP Relevancy Ranking

Relevancy is the measure of how well a set of documents (results) answers or addresses the intent of a given query.


When a query has many matches, a search engine must rank the results by relevance score, sorting the result listing so that the pages most likely to be useful appear first. Different algorithms are used to define relevancy. Relevancy definition and tuning is one of the core differentiators of the FAST ESP platform. This blog post is about the relevance framework and related concepts and features in FAST ESP.


FAST ESP Search Relevance Framework


FAST ESP applies search relevancy through the following key steps:



  • Data mining – A document processing framework can be used to perform real-time content refinement. This includes embedded relevancy tools and integration points for 3rd party modules. An Entity Extraction framework enables extraction of named entities and key concepts from documents that may be used for result navigation

  • Linguistic normalization – Handles grammatical variations and automatic spell corrections

  • Query Processing – A query processing framework applies built-in or custom query transformations based on application specific rules

  • Ranking based on the FAST InPerspective model provides a multi-faceted measurement of the quality of the match between the query and a candidate result document

  • Query Context Analysis – Presents the information from the query results in the context of the query. FAST ESP supports dynamic document summaries that display the segments of the matching document that best match the query

  • Data Driven Navigation provides dynamic drill-down into the query result or related areas.

The relevancy of a document with respect to a query is represented by a ranking value. The following section lists the different elements used to calculate the rank value.


Elements of Rank Value


Element – Description

  • Freshness – Age of a document compared to the time when the query is issued

  • Authority – Importance of a document determined by the links to it from other documents

  • Quality – Assigned importance of a document, independent of the query

  • Geo – Importance of geographical distance between a document’s associated latitude/longitude and a target location specified in a query

  • Context – Importance of matching a query in a given document field

  • Proximity – For multi-term queries: the shorter the distance between query terms in a document, the higher the document’s rank value

  • Position – The earlier a query term occurs in a field, the higher the document’s rank value

  • Frequency – The more frequently a query term occurs in a document, the higher the document’s rank value

  • Completeness – The greater the number of query terms present in the same field of a matching document, the higher the document’s rank value

  • Number – For multi-term queries: the more query terms matched in a document, the higher the document’s rank value


Relevant Sorting of Query Results


FAST ESP provides three main methods for sorting the results of a query:



  1. Sorting by rank (relevancy score) - FAST ESP computes a rank value based on a set of parameters as described below. These parameters can be tuned in order to provide the best possible perceived relevancy for the end-user. It is possible to define multiple rank profiles that can be selected on a per query basis

  2. Sorting by field values - You may also sort query results by value of any searchable field, such as product name, product code, price or date. FAST ESP supports numeric and full-text sorting, single and multi-level sorting, ascending and descending sorting direction and national sorting rules

  3. Sorting by geographic location - The Geo Search feature provides capabilities for sorting and filtering query results based on geographic location

Rank Profile


The Rank Profile concept gives full control over the relative weight of each rank component for a given query: for example, how important an article’s title is relative to the main text, or how important proximity is relative to freshness. This enables individual relevance tuning for different query applications that share a FAST ESP installation.


In FAST ESP, the Rank Profile is a configuration element within the Index Profile and defines relative weight for the different components of the dynamic rank. Multiple Rank Profiles can be specified in the Index Profile.
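
As a rough illustration of what "relative weight" means here, the sketch below combines per-component scores into a single rank value with a weighted sum. The class name RankProfileSketch, the component names used as map keys, and the weighted-sum formula itself are assumptions made for illustration only; this is not FAST ESP code, and the real rank computation is configured in the Index Profile.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: not FAST ESP code or configuration.
class RankProfileSketch {
    // Relative weight per rank component, e.g. "freshness" or "proximity".
    private final Map<String, Double> weights = new LinkedHashMap<>();

    // Assigns a relative weight to a rank component and returns this profile.
    RankProfileSketch weight(String component, double w) {
        weights.put(component, w);
        return this;
    }

    // Weighted sum of the component scores; a component with no score contributes 0.
    double rank(Map<String, Double> componentScores) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            total += e.getValue() * componentScores.getOrDefault(e.getKey(), 0.0);
        }
        return total;
    }
}
```

Two such profiles with different weights (say, one favouring freshness for a news search and one favouring proximity for a product search) would rank the same set of documents differently, which is the point of selecting a profile per query.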


Tuning the Ranking and Sorting of Query Results


The ranking and sorting of query results can be tuned in three main ways:



  • Multiple Rank Profiles can be specified in the Index Profile. A Rank Profile defines relative weight for the different components of the dynamic rank

  • Sorting attributes can be specified for individual fields of the documents

  • Result sorting can be controlled on a per query basis. By default the result is sorted by rank as defined in the default Rank Profile. Query parameters enable you to specify an alternative rank profile for the query, or a set of fields that the result shall be sorted by

Relevance support in the Query Language


FAST ESP includes a highly expressive query language that also includes advanced proximity operators:



  • Different relevance weight may be applied to different terms or phrases in a query

  • Explicit proximity (ordered/unordered NEAR) operators enable precise matching in semi-structured content without the need for a phrase match

  • Boundary match operators enable exact matching against extracted entities or entire document elements such as a product name

  • Wildcard query support

Dynamic Client Side Ranking


Dynamic client side ranking can be done by using the XRANK operator which is a part of the FAST Query Language (FQL). The boost value is specified with the parameter boost=n, where n is some signed integer value. Negative boost is supported, but if the result of boosting with a negative value is negative then the result will be set to 0.
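
The clamping rule for negative boosts can be sketched in a few lines. This is only an illustration of the arithmetic described above, with a hypothetical class and method name; it is not how XRANK is invoked, and it says nothing about how FAST ESP computes the base rank.

```java
// Hypothetical illustration of the clamping rule; not FAST ESP code.
class BoostSketch {
    // Adds a signed boost to a rank value; a negative result is set to 0.
    static long applyBoost(long rank, int boost) {
        return Math.max(0L, rank + boost);
    }
}
```

With a rank of 100, a boost of -500 gives 0 rather than -400, while a boost of 50 gives 150.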


It's a concept unique to FAST, and I will cover it in detail in another post.


Rank Modification Tools


FAST ESP provides tools to modify rank for individual documents. These tools enable you to perform Absolute Query Boost, Relative Query Boost or Relative Document Boost for given documents in the FAST ESP index. An example could be a product database where it may be desired to boost products with highest profit margins, boost products related to campaigns, etc.


Two main tools exist for this purpose:


1) Search Business Center (SBC) - This is an optional, GUI-based tool that enables query-oriented rank tuning. The SBC also includes a powerful query reporting module that may be used to assist in the rank tuning. Using the SBC you can change the ranking for each query using three different methods:



  • Top Ten - to position the document in one of ten reserved places that will be returned at the top of the results list

  • Add boost points - to add a value to a document to increase its relevancy relative to the other documents returned in the search results. You can also add negative boost points to a document.

  • Block from query - to prevent the blocked document from appearing in the search results for the query.

2) Rank Tuning Bulk Loader - This is a standard FAST ESP tool that enables you to perform the same rank tuning as the SBC, using an XML file as input. The XML file contains a specification of the rank modifications to be performed
How SharePoint does Relevancy?

Object Persistence

One of the most critical tasks that applications have to perform is
to save and restore data. Whether it be a word processing application
that saves documents to disk, a utility that remembers its configuration
for next time, or a game that sets aside world domination for the night,
the ability to store data and later retrieve it is a vital one. Without
it, software would be little more effective than a typewriter - users
would have to re-type the data to make further modifications once the
application exits.





Writing the code for saving data, however, can become boring, repetitive
work. First, the programmer must create a specification document for the
proposed file structure. Next, the programmer must implement save and
restore functions that convert object data to and from
primitive data types, and test them with sample data.


If the application later requires new data to be stored, the file
specification must be modified, as well as the save and restore methods. Take it from someone who's been there -
creating save and restore functions is not a fun task.

The solution to this is object serialization.
Object serialization takes an object's state, and converts it to a
stream of data for you. With object serialization, it's an easy task to
take any object, and make it persistent, without writing custom code to
save object member variables. The object can be restored at a later
time, and even a later location. With persistence, we can move an object
from one computer to another, and have it maintain its state. This very
cool feature, in Java, also happens to be very easy to use.

Serializing objects


Java makes it easy to serialize objects. Any object
whose class implements the java.io.Serializable interface can be made
persistent with only a few lines of code. No extra methods need to be
added to implement the interface, however - the purpose of the interface
is to identify at run-time which classes can be safely serialized, and
which cannot. You, as a programmer, need only add the implements keyword
to your class declaration, to identify your classes as serializable.

public class UserData implements java.io.Serializable


Now, once a class is serializable, we can write the object to any
OutputStream, such as to disk or a socket connection. To achieve this,
we must first create an instance of java.io.ObjectOutputStream, and pass
the constructor an existing OutputStream instance.

// Write to disk with FileOutputStream
FileOutputStream f_out = new FileOutputStream("myobject.data");

// Write object with ObjectOutputStream
ObjectOutputStream obj_out = new ObjectOutputStream(f_out);

// Write object out to disk
obj_out.writeObject(myObject);

// Flush and close the stream (this also closes f_out)
obj_out.close();


Note that any Java object that implements the Serializable interface
can be written to an output stream this way - including those that are
part of the Java API. Furthermore, any objects that are referenced by a serialized
object will also be stored, provided they are serializable themselves
(otherwise writeObject throws a NotSerializableException). This means
that arrays, vectors, lists, and collections of objects can be saved in
the same fashion - without the need to manually save each one. This can
lead to significant time and code savings.

Restoring objects from a serialized state


Reading objects back is almost as easy. The one catch is that at
runtime, you can never be completely sure what type of data to expect. A
data stream containing serialized objects may contain a mixture of
different object classes, so you need to explicitly cast an object to a
particular class. If you've never cast an object before, the procedure
is relatively straightforward. First check the object's class, using the
instanceof operator. Then cast to the correct class.

// Read from disk using FileInputStream
FileInputStream f_in = new FileInputStream("myobject.data");

// Read object using ObjectInputStream
ObjectInputStream obj_in = new ObjectInputStream(f_in);

// Read an object
Object obj = obj_in.readObject();

if (obj instanceof Vector)
{
// Cast object to a Vector
Vector vec = (Vector) obj;

// Do something with vector....
}
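
For a self-contained way to try the write and read steps together, here is a minimal sketch that round-trips an object through an in-memory byte array instead of a file. The class name RoundTripSketch and its two helper methods are made up for this example, and wrapping the checked exceptions in unchecked ones is just a convenience of the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helper class for illustration.
class RoundTripSketch {
    // Serializes any Serializable object into a byte array.
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // try-with-resources flushes and closes the stream for us
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Restores an object from a byte array produced by toBytes.
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            // The class of the stored object is not on the classpath
            throw new IllegalStateException(e);
        }
    }
}
```

The caller still has to check the type with instanceof and cast, exactly as in the Vector example above, because fromBytes can only promise an Object.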

Further issues with serialization


As you can see, it's relatively easy to serialize an object. Whenever
new fields are added to an object, they will be saved automatically,
without requiring modification to your save and restore code. However,
there are some cases where this behavior is not desirable. For example,
a password member variable might not be safe to transmit to third
parties over a network connection, and might need to be left blank. In
this case, the transient keyword can be used. Marking a field as
transient indicates that the member variable should not
be saved. Though not used often, it's an important keyword to remember.


public class UserSession implements java.io.Serializable
{
String username;
transient String password;
}
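
To see the effect, the sketch below serializes a session object to memory and reads it back. The names TransientSketch, Session and roundTrip are invented for the example. After the round trip the transient field comes back as null, the default value for an object reference, while the ordinary field keeps its value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demonstration class for illustration.
class TransientSketch {
    static class Session implements Serializable {
        private static final long serialVersionUID = 1L;
        String username;
        transient String password; // skipped by serialization

        Session(String username, String password) {
            this.username = username;
            this.password = password;
        }
    }

    // Writes the session to an in-memory stream and reads a copy back.
    static Session roundTrip(Session session) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(session);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Session) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```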

Summary


Java's support for object serialization makes the implementation of persistent
objects extremely easy. In contrast, hand-writing the code to save
and restore every field of an object is complex and repetitive work. While it is certainly possible to write your own
serialization mechanism, the simplicity of that provided by Java would
be hard to beat.

Serialization benefits programmers by


  • Reducing time taken to write code for save and restoration of
    object or application state

  • Eliminating complexity of save and restore operations, and
    avoiding the need for creating a new file format

  • Making it easier for objects to travel over a network connection.


With relatively little effort, you can apply serialization to a
variety of tasks. Not only do applications benefit from serialization,
but also applets. Rather than specifying a long list of parameters, or
performing time-consuming initialization and parsing, an applet can
simply reload a configuration object whose member variables contain all
the information needed to execute. With a little imagination,
serialization may just have a place in your next project.

Difference between C++ and Java

As a C++ programmer, you already have the basic idea of object-oriented programming, and the syntax of Java no doubt looks familiar to you. This makes sense since Java was derived from C++.
However, there are a surprising number of differences between C++ and Java.


These differences are intended to be significant improvements, and if you understand the differences you'll see why Java is such a beneficial programming language. This
article takes you through the important features that distinguish Java from C++.




  1. The biggest potential stumbling block is speed: interpreted Java
    runs in the range of 20 times slower than C. Nothing prevents the
    Java language from being compiled and there are just-in-time
    compilers appearing at this writing that offer significant
    speed-ups. It is not inconceivable that full native compilers will
    appear for the more popular platforms, but without those there are
    classes of problems that will be insoluble with Java because of
    the speed issue.


  2. Java has both kinds of comments like C++ does.


  3. Everything must be in a class. There are no global functions or
    global data. If you want the equivalent of globals, make static
    methods and static data within a class. There are no
    structs or unions, only classes (enum types, added in Java 5,
    are themselves a special kind of class).


  4. All method definitions are defined in the body of the class.
    Thus, in C++ it would look like all the functions are inlined, but
    they’re not (inlines are noted later).


  5. Class definitions are roughly the same form in Java as in C++,
    but there’s no closing semicolon. There are no class
    declarations of the form class foo, only class definitions.
    class aType {
        void aMethod( ) { /* method body */ }
    }



  6. There’s no scope resolution operator :: in Java. Java
    uses the dot for everything, but can get away with it since you can
    define elements only within a class. Even the method definitions
    must always occur within a class, so there is no need for scope
    resolution there either. One place where you’ll notice the
    difference is in the calling of static methods: you say ClassName.methodName( );.
    In addition, package names are established using the dot, and
    to perform a kind of C++ #include you use the import
    keyword. For example: import java.awt.*;. (#include
    does not directly map to import, but it has a similar feel to
    it).


  7. Java, like C++, has primitive types for efficient access. In Java,
    these are boolean, char, byte, short, int,
    long, float, and double. All the primitive
    types have specified sizes that are machine independent for
    portability. (This must have some impact on performance, varying
    with the machine.) Type-checking and type requirements are much
    tighter in Java. For example:



    1. Conditional expressions can be only boolean, not integral.



    2. The result of an expression like X + Y must be used; you can’t
    just say "X + Y" for the side effect.


  8. The char type uses the international 16-bit Unicode
    character set, so it can automatically represent most national
    characters.


  9. Static quoted strings are automatically converted into String
    objects. There is no independent static character array string like
    there is in C and C++.


  10. Java adds the triple right shift >>> to act as a
    "logical" right shift by inserting zeroes at the top end;
    the >> inserts the sign bit as it shifts (an
    "arithmetic" shift).
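
    The difference only shows up for negative values, as this small sketch illustrates (the class and method names are made up for the example):

```java
// Minimal sketch contrasting the two right-shift operators.
class ShiftSketch {
    // Arithmetic shift: copies the sign bit into the vacated high bits.
    static int arithmetic(int value, int bits) {
        return value >> bits;
    }

    // Logical shift: fills the vacated high bits with zeroes.
    static int logical(int value, int bits) {
        return value >>> bits;
    }
}
```

    For -8 shifted by one bit, the arithmetic form gives -4, while the logical form gives the large positive number 2147483644 because a zero replaces the sign bit. For non-negative values the two operators agree.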


  11. Although they look similar, arrays have a very different structure
    and behavior in Java than they do in C++. There’s a read-only length
    member that tells you how big the array is, and run-time checking
    throws an exception if you go out of bounds. All arrays are created
    on the heap, and you can assign one array to another (the array
    handle is simply copied). The array identifier is a first-class
    object, with all of the methods commonly available to all other
    objects.
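
    A short sketch of the bounds check and of handle copying, using invented class and method names:

```java
// Hypothetical demonstration of Java array behavior.
class ArraySketch {
    // Returns true if reading data[index] trips Java's run-time bounds check.
    static boolean isOutOfBounds(int[] data, int index) {
        try {
            int unused = data[index];
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    // Assigning one array variable to another copies the handle, not the elements.
    static boolean sharesStorage() {
        int[] a = {1, 2, 3};
        int[] b = a;   // b now refers to the same array object as a
        b[0] = 99;     // so this write is visible through a as well
        return a[0] == 99 && a.length == 3;
    }
}
```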


  12. All objects of non-primitive types can be created only via new.
    There’s no equivalent to creating non-primitive objects "on
    the stack" as in C++. All primitive types can be created only
    on the stack, without new. There are wrapper classes for all
    primitive types so that you can create equivalent heap-based
    objects via new. (Arrays of primitives are a special case:
    they can be allocated via aggregate initialization as in C++, or by
    using new.)



  13. No forward declarations are necessary in Java. If you want to use
    a class or a method before it is defined, you simply use it – the
    compiler ensures that the appropriate definition exists. Thus you
    don’t have any of the forward referencing issues that you do in
    C++.


  14. Java has no preprocessor. If you want to use classes in another
    library, you say import and the name of the library. There
    are no preprocessor-like macros.


  15. Java uses packages in place of namespaces. The name issue is taken
    care of by putting everything into a class and by using a facility
    called "packages" that performs the equivalent namespace
    breakup for class names. Packages also collect library components
    under a single library name. You simply import a package and
    the compiler takes care of the rest.


  16. Object handles defined as class members are automatically
    initialized to null. Initialization of primitive class data
    members is guaranteed in Java; if you don’t explicitly initialize
    them they get a default value (a zero or equivalent). You can
    initialize them explicitly, either when you define them in the class
    or in the constructor. The syntax makes more sense than that for
    C++, and is consistent for static and non-static
    members alike. You don’t need to externally define storage for static
    members like you do in C++.


  17. There are no Java pointers in the sense of C and C++. When you
    create an object with new, you get back a reference (which I’ve
    been calling a handle in this book). For example:

    String s = new String("howdy");


    However, unlike C++ references that must be initialized when created
    and cannot be rebound to a different location, Java references don’t
    have to be bound at the point of creation. They can also be rebound at
    will, which eliminates part of the need for pointers. The other reason
    for pointers in C and C++ is to be able to point at any place in
    memory whatsoever (which makes them unsafe, which is why Java doesn’t
    support them). Pointers are often seen as an efficient way to move
    through an array of primitive variables; Java arrays allow you to do
    that in a safer fashion. The ultimate solution for pointer problems is
    native methods (discussed in Appendix A). Passing pointers to methods
    isn’t a problem since there are no global functions, only classes,
    and you can pass references to objects.

    The Java language promoters initially said "No pointers!",
    but when many programmers questioned how you can work without
    pointers, the promoters began saying "Restricted pointers."
    You can make up your mind whether it’s "really" a pointer
    or not. In any event, there’s no pointer arithmetic.


  18. Java has constructors that are similar to constructors in C++. You
    get a default constructor if you don’t define one, and if you
    define a non-default constructor, there’s no automatic default
    constructor defined for you, just like in C++. There are no copy
    constructors, since all arguments are passed by reference.


  19. There are no destructors in Java. There is no "scope" of
    a variable per se, to indicate when the object’s lifetime is ended
    – the lifetime of an object is determined instead by the garbage
    collector. There is a finalize( ) method that’s a
    member of each class, something like a C++ destructor, but finalize( )
    is called by the garbage collector and is supposed to be responsible
    only for releasing "resources" (such as open files,
    sockets, ports, URLs, etc). If you need something done at a specific
    point, you must create a special method and call it, not rely upon finalize( ).
    Put another way, all objects in C++ will be (or rather, should be)
    destroyed, but not all objects in Java are garbage collected.
    Because Java doesn’t support destructors, you must be careful to
    create a cleanup method if it’s necessary and to explicitly call
    all the cleanup methods for the base class and member objects in
    your class.


  20. Java has method overloading that works virtually identically to
    C++ function overloading.


  21. Java does not support default arguments.


  22. There’s no goto in Java. The one unconditional jump
    mechanism is the break label or continue label,
    which is used to jump out of the middle of multiply-nested loops.
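
    As a small sketch of the labeled break (the class name, method name and label name are chosen just for this example), the search below leaves both loops as soon as it finds a match:

```java
// Hypothetical demonstration of a labeled break.
class LabeledBreakSketch {
    // Returns {row, column} of the first cell equal to target, or null if absent.
    static int[] find(int[][] grid, int target) {
        int[] found = null;
        outer:
        for (int r = 0; r < grid.length; r++) {
            for (int c = 0; c < grid[r].length; c++) {
                if (grid[r][c] == target) {
                    found = new int[] {r, c};
                    break outer; // leaves both loops at once
                }
            }
        }
        return found;
    }
}
```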


  23. Java uses a singly-rooted hierarchy, so all objects are ultimately
    inherited from the root class Object. In C++, you can start a
    new inheritance tree anywhere, so you end up with a forest of trees.
    In Java you get a single ultimate hierarchy. This can seem
    restrictive, but it gives a great deal of power since you know that
    every object is guaranteed to have at least the Object
    interface. C++ appears to be the only OO language that does not
    impose a singly rooted hierarchy.


  24. Java has no templates. Early versions had no parameterized
    types at all; there is a set of collections: Vector, Stack,
    and Hashtable that hold Object references, and through
    which you can satisfy your collection needs, but these collections
    are not designed for efficiency like the C++ Standard Template
    Library (STL). The new collections in Java 1.2 are more complete,
    but still don’t have the same kind of efficiency as template implementations would allow.
    Generics, added in Java 5, provide compile-time type checking for
    collections, but they are implemented by type erasure rather than
    by template instantiation.


  25. Garbage collection means memory leaks are much harder to cause in
    Java, but not impossible. (If you make native method calls that
    allocate storage, these are typically not tracked by the garbage
    collector.) However, many memory leaks and resource leaks can be
    tracked to a badly written finalize( ) or to not
    releasing a resource at the end of the block where it is allocated
    (a place where a destructor would certainly come in handy). The
    garbage collector is a huge improvement over C++, and makes a lot of
    programming problems simply vanish. It might make Java unsuitable
    for solving a small subset of problems that cannot tolerate a
    garbage collector, but the advantage of a garbage collector seems to
    greatly outweigh this potential drawback.


  26. Java has built-in multithreading support. There’s a Thread
    class that you inherit from to create a new thread (you override the run( )
    method). Mutual exclusion occurs at the level of objects using the synchronized
    keyword as a type qualifier for methods. Only one thread may use a synchronized
    method of a particular object at any one time. Put another way, when
    a synchronized method is entered, it first "locks"
    the object against any other synchronized method using that
    object and "unlocks" the object only upon exiting the
    method. There are no explicit locks; they happen automatically. You’re
    still responsible for implementing more sophisticated
    synchronization between threads by creating your own
    "monitor" class. Recursive synchronized methods
    work correctly. Time slicing is not guaranteed between equal
    priority threads.
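
    A minimal sketch of both ideas, a Thread subclass that overrides run( ) and a synchronized method guarding shared state, might look like this (all names are invented for the example):

```java
// Hypothetical demonstration of Thread subclassing and synchronized methods.
class SynchronizedSketch {
    private int count = 0;

    // Only one thread at a time may run a synchronized method on this object.
    synchronized void increment() { count++; }

    synchronized int get() { return count; }

    // A thread created by inheriting from Thread and overriding run().
    static class Worker extends Thread {
        private final SynchronizedSketch counter;
        private final int times;

        Worker(SynchronizedSketch counter, int times) {
            this.counter = counter;
            this.times = times;
        }

        @Override
        public void run() {
            for (int i = 0; i < times; i++) {
                counter.increment();
            }
        }
    }

    // Runs two workers against one counter and returns the final count.
    static int runTwoThreads(int perThread) {
        SynchronizedSketch counter = new SynchronizedSketch();
        Worker a = new Worker(counter, perThread);
        Worker b = new Worker(counter, perThread);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }
}
```

    Because increment( ) is synchronized, no updates are lost and the final count is always twice perThread. Without the keyword, the two threads could interleave their read-modify-write steps and the total could come up short.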


  27. Instead of controlling blocks of declarations like C++ does, the
    access specifiers (public, private, and protected)
    are placed on each definition for each member of a class. Without an
    explicit access specifier, an element defaults to
    "friendly," which means that it is accessible to other
    elements in the same package (equivalent to them all being C++ friends)
    but inaccessible outside the package. The class, and each method
    within the class, has an access specifier to determine whether it’s
    visible outside the file. Sometimes the private keyword is
    used less in Java because "friendly" access is often more
    useful than excluding access from other classes in the same package.
    (However, with multithreading the proper use of private is
    essential.) The Java protected keyword means "accessible
    to inheritors and to others in this package." There is
    no equivalent to the C++ protected keyword that means
    "accessible to inheritors only" (private protected used to do
    this, but the use of that keyword pair was removed).


  28. Nested classes. In C++, nesting a class is an aid to name hiding
    and code organization (but C++ namespaces eliminate the need for
    name hiding). Java packaging provides the equivalence of namespaces,
    so that isn’t an issue. Java 1.1 has inner classes that
    look just like nested classes. However, an object of an inner class
    secretly keeps a handle to the object of the outer class that was
    involved in the creation of the inner class object. This means that
    the inner class object may access members of the outer class object
    without qualification, as if those members belonged directly to the
    inner class object. This provides a much more elegant solution to
    the problem of callbacks, solved with pointers to members in C++.


  29. Because of inner classes described in the previous point, there
    are no pointers to members in Java.


  30. No inline methods. The Java compiler might decide on its
    own to inline a method, but you don’t have much control over this.
    You can suggest inlining in Java by using the final keyword
    for a method. However, inline functions are only suggestions
    to the C++ compiler as well.


  31. Inheritance in Java has the same effect as in C++, but the syntax
    is different. Java uses the extends keyword to indicate
    inheritance from a base class and the super keyword to
    specify methods to be called in the base class that have the same
    name as the method you’re in. (However, the super keyword
    in Java allows you to access methods only in the parent class, one
    level up in the hierarchy. Base-class scoping in C++ allows you to
    access methods that are deeper in the hierarchy.) The base-class
    constructor is also called using the super keyword. As
    mentioned before, all classes are ultimately automatically inherited
    from Object. There’s no explicit constructor
    initializer list like in C++, but the compiler forces you to perform
    all base-class initialization at the beginning of the constructor
    body and it won’t let you perform these later in the body. Member
    initialization is guaranteed through a combination of automatic
    initialization and exceptions for uninitialized object handles.

    public class Foo extends Bar
    {
        public Foo(String msg) {
            super(msg); // Calls base constructor
        }

        public void baz(int i) { // Override
            super.baz(i); // Calls base method
        }
    }





  32. Inheritance in Java doesn’t change the protection level of the
    members in the base class. You cannot specify public, private,
    or protected inheritance in Java, as you can in C++. Also,
    overridden methods in a derived class cannot reduce the access of
    the method in the base class. For example, if a method is public
    in the base class and you override it, your overridden method must
    also be public (the compiler checks for this).


  33. Java provides the interface keyword, which creates the
    equivalent of an abstract base class filled with abstract methods
    and with no data members. This makes a clear distinction between
    something designed to be just an interface and an extension of
    existing functionality via the extends keyword. It’s worth
    noting that the abstract keyword produces a similar effect in
    that you can’t create an object of that class. An abstract
    class may contain abstract methods (although it isn’t
    required to contain any), but it is also able to contain
    implementations, so it is restricted to single inheritance. Together
    with interfaces, this scheme prevents the need for some mechanism
    like virtual base classes in C++.



    To create a version of the interface that can be
    instantiated, use the implements keyword, whose syntax looks
    like inheritance:

    public interface Face {
        public void smile();
    }

    public class Baz extends Bar implements Face {
        public void smile( ) {
            System.out.println("a warm smile");
        }
    }


  34. There’s no virtual keyword in Java because all non-static
    methods always use dynamic binding. In Java, the programmer doesn’t
    have to decide whether to use dynamic binding. The reason virtual
    exists in C++ is so you can leave it off for a slight increase in
    efficiency when you’re tuning for performance (or, put another
    way, "If you don’t use it, you don’t pay for it"),
    which often results in confusion and unpleasant surprises. The final
    keyword provides some latitude for efficiency tuning – it tells
    the compiler that this method cannot be overridden, and thus that it
    may be statically bound (and made inline, thus using the equivalent
    of a C++ non-virtual call). These optimizations are up to the
    compiler.


  35. Java doesn’t provide multiple inheritance (MI), at least not in
    the same sense that C++ does. Like protected, MI seems like a
    good idea but you know you need it only when you are face to face
    with a certain design problem. Since Java uses a singly-rooted
    hierarchy, you’ll probably run into fewer situations in which MI
    is necessary. The interface keyword takes care of combining
    multiple interfaces.


  36. Run-time type identification functionality is quite similar to
    that in C++. To get information about handle X, you can say,
    for example:

    X.getClass().getName();





    To perform a type-safe downcast you say:

    derived d = (derived)base;



    just like an old-style C cast. The compiler automatically invokes the dynamic casting mechanism
    without requiring extra syntax. Although this doesn’t have the
    benefit of easy location of casts as in C++ "new casts,"
    Java checks usage and throws exceptions so it won’t allow bad casts
    like C++ does.


  37. Exception handling in Java is different because there are no
    destructors. A finally clause can be added to force execution
    of statements that perform necessary cleanup. All exceptions in Java
    are inherited from the base class Throwable, so you’re
    guaranteed a common interface.

                
    public void f(Obj b) throws IOException {
        myresource mr = b.createResource();
        try {
            mr.UseResource();
        } catch (MyException e) {
            // handle my exception
        } catch (Throwable e) {
            // handle all other exceptions
        } finally {
            mr.dispose(); // special cleanup
        }
    }


  38. Exception specifications in Java are vastly superior to those in
    C++. Instead of the C++ approach of calling a function at run-time
    when the wrong exception is thrown, Java exception specifications
    are checked and enforced at compile-time. In addition, overridden
    methods must conform to the exception specification of the
    base-class version of that method: they can throw the specified
    exceptions or exceptions derived from those. This provides much more
    robust exception-handling code.


  39. Java has method overloading, but no operator overloading. The String
    class does use the + and += operators to concatenate
    strings and String expressions use automatic type conversion,
    but that’s a special built-in case.


  40. The const issues in C++ are avoided in
    Java by convention. You pass only handles to objects and local
    copies are never made for you automatically. If you want the
    equivalent of C++’s pass-by-value, you
    call clone( ) to produce a local copy of the argument
    (although
    the clone( ) mechanism is somewhat poorly designed –
    see Chapter 12). There’s no copy-constructor that’s
    automatically called.



    To create a compile-time constant value, you say, for example:



    static final int SIZE = 255;

    static final int BSIZE = 8 * SIZE;




  41. Because of security issues, programming an "application"
    is quite different from programming an "applet." A
    significant issue is that an applet won’t let you write to disk,
    because that would allow a program downloaded from an unknown
    machine to trash your disk. This changes somewhat with Java 1.1
    digital signing, which allows you to unequivocally know
    everyone that wrote all the programs that have special access to
    your system (one of which might have trashed your disk; you still
    have to figure out which one and what to do about it). Java 1.2
    also promises more power for applets.


  42. Since Java can be too restrictive in some cases, you could be
    prevented from doing important tasks such as directly accessing
    hardware. Java solves this with native methods that allow you
    to call a function written in another language (currently only C and
    C++ are supported). Thus, you can always solve a platform-specific
    problem (in a relatively non-portable fashion, but then that code is
    isolated). Applets cannot call native methods, only applications.


  43. Java has built-in support for comment documentation, so the source
    code file can also contain its own documentation, which is stripped
    out and reformatted into HTML via a separate program. This is a boon
    for documentation maintenance and use.


  44. Java contains standard libraries for solving specific tasks. C++
    relies on non-standard third-party libraries. These tasks include
    (or will soon include):



    • Networking

    • Database Connection (via JDBC)

    • Multithreading

    • Distributed Objects (via RMI and CORBA)

    • Compression

    • Commerce


    The availability and standard nature of these libraries allow for
    more rapid application development.


  45. Java 1.1 includes the Java Beans standard, which is a way to
    create components that can be used in visual programming
    environments. This promotes visual components that can be used under
    all vendors’ development environments. Since you aren’t tied to
    a particular vendor’s design for visual components, this should
    result in greater selection and availability of components. In
    addition, the design for Java Beans is simpler for programmers to
    understand; vendor-specific component frameworks tend to involve a
    steeper learning curve.


  46. If the access to a Java handle fails, an exception is thrown. This
    test doesn’t have to occur right before the use of a handle; the
    Java specification just says that the exception must somehow be
    thrown. Many C++ runtime systems can also throw exceptions for bad
    pointers.


  47. Generally, Java is more robust, via:


    • Object handles initialized to null (a keyword)
    • Handles are always checked and exceptions are thrown for
      failures
    • All array accesses are checked for bounds violations
    • Automatic garbage collection prevents memory leaks
    • Clean, relatively fool-proof exception handling
    • Simple language support for multithreading
    • Bytecode verification of network applets
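
The run-time checks listed under item 47 can be seen in a small sketch. The class and method names below (RobustnessDemo, thrownBy and so on) are made up for illustration, and the lambda syntax assumes Java 8 or later, which is newer than the Java 1.1 and 1.2 releases discussed above.

```java
// Hypothetical demo; none of these names come from the text above.
class RobustnessDemo {
    // Runs the action and reports which runtime exception (if any) it threw.
    static String thrownBy(Runnable action) {
        try {
            action.run();
            return "none";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    // Using a null handle is caught by the runtime rather than corrupting memory.
    static String nullHandle() {
        String s = null;
        return thrownBy(() -> s.length());
    }

    // Every array access is checked against the array bounds.
    static String badIndex() {
        int[] a = new int[3];
        return thrownBy(() -> a[3] = 1);
    }

    // A bad downcast is rejected at run time.
    static String badCast() {
        Object o = "text";
        return thrownBy(() -> { Integer i = (Integer) o; });
    }

    public static void main(String[] args) {
        System.out.println(nullHandle()); // NullPointerException
        System.out.println(badIndex());   // ArrayIndexOutOfBoundsException
        System.out.println(badCast());    // ClassCastException
    }
}
```

In each case the failure surfaces as an exception that can be caught and handled, which is the point of the list above.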

Singleton Object

For those who haven't heard of design patterns before, or who are familiar with the term but not its meaning, a design pattern is a template for software development. The purpose of the template is to define a particular behavior or technique that can be used as a building block for the construction of software - to solve universal problems that commonly face developers. Think of a design pattern as a way of passing on some nifty piece of advice, just like your mother used to give. "Never wear your socks for more than one day" might be an old family adage, passed down from generation to generation. These are common-sense solutions that are passed on to others. Consider a design pattern as a useful piece of advice for designing software.

Design patterns out of the way, let's look at the singleton. By now, you're probably wondering what a singleton is - isn't jargon terrible? A singleton is an object that cannot be instantiated. At first, that might seem counterintuitive - after all, we need an instance of an object before we can use it. Well, yes, a singleton can be created, but it can't be instantiated directly by developers - meaning that the singleton class has control over how it is created. The restriction on the singleton is that there can be only one instance of it in the Java Virtual Machine (JVM) - by preventing direct instantiation we can ensure that developers don't create a second copy.

So why would this be useful? Often in designing a system, we want to control how an object is used, and prevent others (ourselves included) from making copies of it or creating new instances. For example, a central configuration object that stores setup information should have one and only one instance - a global copy accessible from any part of the application, including any threads that are running. Creating a new configuration object and using it would be fairly useless, as other parts of the application might be looking at the old configuration object, and changes to application settings wouldn't always be acted upon. I'm sure you can think of other situations where a singleton would be useful - perhaps you've even used one before without giving it a name. It's a common enough design criterion (not used every day, but you'll come across it from time to time). The singleton pattern can be applied in any language, but since we're all Java programmers here (if you're not, shame!) let's look at how to implement the pattern using Java.

Preventing direct instantiation

We all know how objects are instantiated, right? Maybe not everyone? Let's go through a quick refresher.

Objects are instantiated by using the new keyword. The new keyword allows you to create a new instance of a class, and to pass parameters to the class's constructor. You can pass no parameters, in which case the no-argument constructor (often loosely called the default constructor) is invoked. Constructors can have access modifiers, like public and private, which allow you to control which classes have access to a constructor. So to prevent direct instantiation, we create a private no-argument constructor, so that other classes can't create a new instance.

We'll start with the class definition, for a SingletonObject class. Next, we provide a default constructor that is marked as private. No actual code needs to be written, but you're free to add some initialization code if you'd like.

public class SingletonObject
{
    private SingletonObject()
    {
        // no code req'd
    }
}
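
One way to confirm that a class really has closed off direct instantiation is to inspect its constructors through reflection. This is only a sketch: Locked, Open and PrivateCtorDemo are hypothetical names, and the check is a test-time convenience rather than part of the pattern itself.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;

// Hypothetical class with its only constructor marked private.
class Locked {
    private Locked() {
        // no code req'd
    }
}

// Hypothetical class that keeps the compiler-supplied constructor.
class Open {
}

class PrivateCtorDemo {
    // Returns true only if every declared constructor of the class is private.
    static boolean onlyPrivateConstructors(Class<?> type) {
        for (Constructor<?> c : type.getDeclaredConstructors()) {
            if (!Modifier.isPrivate(c.getModifiers())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Writing "new Locked()" here would be rejected by the compiler.
        System.out.println(onlyPrivateConstructors(Locked.class)); // true
        System.out.println(onlyPrivateConstructors(Open.class));   // false
    }
}
```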


So far so good. But unless we add some further code, there'll be absolutely no way to use the class. We want to prevent direct instantiation, but we still need to allow a way to get a reference to an instance of the singleton object.

Getting an instance of the singleton

We need to provide an accessor method that returns an instance of the SingletonObject class but doesn't allow more than one copy to be accessed. We can manually instantiate an object, but we need to keep a reference to the singleton so that subsequent calls to the accessor method can return the singleton (rather than creating a new one). To do this, provide a public static method called getSingletonObject(), and store a reference to the singleton in a private static member variable.

public class SingletonObject
{
    private SingletonObject()
    {
        // no code req'd
    }

    public static SingletonObject getSingletonObject()
    {
        if (ref == null)
            // it's ok, we can call this constructor
            ref = new SingletonObject();
        return ref;
    }

    private static SingletonObject ref;
}

So far, so good. When first called, the getSingletonObject() method creates a singleton instance, assigns it to a member variable, and returns the singleton. Subsequent calls will return the same singleton, and all is well with the world. You could extend the functionality of the singleton object by adding new methods, to perform the types of tasks your singleton needs. So the singleton is done, right? Well almost.....
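
To make "the same singleton" concrete, here is a minimal single-threaded sketch built around the configuration example mentioned earlier. AppConfig, getInstance(), set() and get() are hypothetical names chosen for illustration; the structure simply follows the accessor approach described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical configuration singleton; single-threaded use is assumed here.
class AppConfig {
    private static AppConfig ref;
    private final Map<String, String> settings = new HashMap<>();

    private AppConfig() {
        // no code req'd
    }

    static AppConfig getInstance() {
        if (ref == null) {
            ref = new AppConfig(); // first call creates the one instance
        }
        return ref;                // later calls hand back the same one
    }

    void set(String key, String value) {
        settings.put(key, value);
    }

    String get(String key) {
        return settings.get(key);
    }
}

class AppConfigDemo {
    public static void main(String[] args) {
        AppConfig a = AppConfig.getInstance();
        AppConfig b = AppConfig.getInstance();
        a.set("mode", "debug");
        System.out.println(a == b);        // true: both handles refer to one object
        System.out.println(b.get("mode")); // debug: a change made via a is seen via b
    }
}
```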

Preventing thread problems with your singleton

We need to make sure that threads calling the getSingletonObject() method don't cause problems, so it's advisable to mark the method as synchronized. This prevents two threads from executing the getSingletonObject() method at the same time. Without it, if one thread entered the method just after another, both could see ref as null, and you could end up calling the SingletonObject constructor twice and returning different instances. To change the method, just add the synchronized keyword to the method declaration, as follows:

public static synchronized
SingletonObject getSingletonObject()
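
A quick way to exercise the synchronized accessor is to have several threads request the singleton and count how many distinct objects they receive. The sketch below uses hypothetical names (Registry, SyncDemo, distinctInstances) and assumes Java 8 or later for the lambda. It is a smoke test of the idea, not a proof of thread safety.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical singleton with a synchronized accessor.
class Registry {
    private static Registry ref;

    private Registry() {
        // no code req'd
    }

    static synchronized Registry getInstance() {
        if (ref == null) {
            ref = new Registry();
        }
        return ref;
    }
}

class SyncDemo {
    // Starts the given number of threads, each of which asks for the singleton,
    // and returns how many distinct objects were handed out.
    static int distinctInstances(int threadCount) {
        // Identity-based set, so objects are compared with == rather than equals().
        Set<Registry> seen = Collections.synchronizedSet(
                Collections.newSetFromMap(new IdentityHashMap<Registry, Boolean>()));
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> seen.add(Registry.getInstance()));
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(16)); // 1
    }
}
```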

Are we finished yet?

There, finished. A singleton object that guarantees one instance of the class, and never more than one. Right? Well.... not quite. Where there's a will, there's a way - it is still possible to evade all our defensive programming and create more than one instance of the singleton class defined above. Here's where most articles on singletons fall down, because they forget about cloning. Examine the following code snippet, which clones a singleton object.

public class Clone
{
    public static void main(String args[])
        throws Exception
    {
        // Get a singleton
        SingletonObject obj =
            SingletonObject.getSingletonObject();

        // Buahahaha. Let's clone the object
        SingletonObject clone =
            (SingletonObject) obj.clone();
    }
}

Okay, we're cheating a little here. There isn't a clone() method defined in SingletonObject, but there is one in the java.lang.Object class from which it inherits. By default, the clone() method is marked as protected, so the snippet above won't compile as written. But if your SingletonObject extends another class that does support cloning, it is possible to violate the design principles of the singleton. So, to be absolutely positively 100% certain that a singleton really is a singleton, we must add a clone() method of our own, and throw a CloneNotSupportedException if anyone dares try!
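
The risk is easier to see with a superclass that exposes cloning publicly. In this sketch, CloneableBase, LeakySingleton, GuardedSingleton and CloneDemo are all hypothetical names. The first singleton inherits a working public clone(), while the second overrides it to refuse.

```java
// Hypothetical base class that makes clone() public and functional.
class CloneableBase implements Cloneable {
    @Override
    public Object clone() throws CloneNotSupportedException {
        return super.clone();
    }
}

// Inherits the public clone(), so a second instance can be produced.
class LeakySingleton extends CloneableBase {
    private static LeakySingleton ref;

    private LeakySingleton() { }

    static synchronized LeakySingleton getInstance() {
        if (ref == null) {
            ref = new LeakySingleton();
        }
        return ref;
    }
}

// Overrides clone() to refuse, which closes the hole.
class GuardedSingleton extends CloneableBase {
    private static GuardedSingleton ref;

    private GuardedSingleton() { }

    static synchronized GuardedSingleton getInstance() {
        if (ref == null) {
            ref = new GuardedSingleton();
        }
        return ref;
    }

    @Override
    public Object clone() throws CloneNotSupportedException {
        throw new CloneNotSupportedException();
    }
}

class CloneDemo {
    // True if cloning the leaky singleton yields a second, distinct object.
    static boolean leakyCloneIsDistinct() {
        try {
            Object copy = LeakySingleton.getInstance().clone();
            return copy != LeakySingleton.getInstance();
        } catch (CloneNotSupportedException e) {
            return false;
        }
    }

    // True if the guarded singleton rejects the attempt to clone it.
    static boolean guardedCloneRejected() {
        try {
            GuardedSingleton.getInstance().clone();
            return false;
        } catch (CloneNotSupportedException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(leakyCloneIsDistinct());  // true
        System.out.println(guardedCloneRejected());  // true
    }
}
```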

Here's the final source code for a SingletonObject, which you can use as a template for your own singletons.

public class SingletonObject
{
    private SingletonObject()
    {
        // no code req'd
    }

    // synchronized, so two threads can't both create an instance
    public static synchronized SingletonObject getSingletonObject()
    {
        if (ref == null)
            // it's ok, we can call this constructor
            ref = new SingletonObject();
        return ref;
    }

    public Object clone()
        throws CloneNotSupportedException
    {
        throw new CloneNotSupportedException();
        // that'll teach 'em
    }

    private static SingletonObject ref;
}

Summary

A singleton is a class that can be instantiated once, and only once. This is a fairly unusual property, but useful in a wide range of object designs. Creating an implementation of the singleton pattern is fairly straightforward - simply block off access to all constructors, provide a synchronized static method for getting an instance of the singleton, and prevent cloning.