Grid Dynamics

Gigapult project

Kirill Ishanov 
Updated: I've finally moved the Gigapult project to OpenSpaces, where it can now be found.

GigaSpaces products (such as the GigaSpaces eXtreme Application Platform) ship with a set of scripts for running them on different platforms. These scripts are good enough for running containers and services with the default configuration, but if you need finer control over the bootstrapping process, you'll have to spend some time setting various configuration parameters. These parameters can be part of the space URL, special environment variables, Java VM options, and so on.

The standard way to override these parameters or to supply additional ones is to write extra scripts that set everything you need. For example, the following chunk of code was taken from the replicated data grid example:

Code

# Initializing the common environment for GigaSpaces
JSHOMEDIR=`dirname $0`/../../../../..; export JSHOMEDIR
. ${JSHOMEDIR}/bin/setenv.sh

# Setting JARS
JARS="${JSHOMEDIR}${CPS}${JSHOMEDIR}/lib/JSpaces.jar${CPS}../classes"; export JARS

SCHEMA=default; export SCHEMA
CLUSTER_SCHEMA=async_replicated; export CLUSTER_SCHEMA
# CLUSTER_SCHEMA=sync_replicated; export CLUSTER_SCHEMA
TOTAL_MEMBERS=2; export TOTAL_MEMBERS

# Definition of spaceURL without local cache
spaceFinderURL="jini://*/*/rep_cache?groups=${LOOKUPGROUPS}"; export spaceFinderURL
usedAPI=map; export usedAPI


Let's look at the disadvantages of this approach.

The bin directory in the GigaSpaces distribution contains a set of system-specific scripts (shell scripts for Unix and batch scripts for Windows). These scripts provide the same functionality, so if you create an application that has to support both operating systems, you'll have to write two versions of every script: one for Unix and one for Windows. As a result, once the configuration logic becomes more complex, supporting and maintaining both versions becomes hell, and the administrator of the end system feels like a guy from the Eiffel Tower maintenance squad.

Another problem concerns shell scripts only. The shell scripting language is pretty old, and different flavors of Unix ship different dialects of it: most Linux distributions come with bash pre-installed, Sun Solaris comes with ksh, and so on. To work properly with all of these dialects, scripts have to be written in plain old shell, and that is a pain, because it lacks the language tools needed for more advanced tasks. The phantom menace here is that some tricky shell functions behave differently on different platforms. For example, the toNative function from gs.sh (which is written for bash) produces different results under bash on Cygwin, bash on Linux and ksh on Solaris. So, to check that these scripts are compatible, you have to test them manually on every platform. Of course, each shell has an option for checking the syntax of a script, but that doesn't validate the semantics. The result is incompatible scripts, and because of that incompatibility it is hard to move the same configuration to a different platform without running into surprising bugs.

The third major problem with the GigaSpaces configuration process is that there are too many different ways to configure things. "Where should I put this value?" "Which XML file holds the attribute I need to change?" There is no unified way to configure all of this. Yet all of the configuration information can be represented as a set of key-value pairs, so why do I need to modify it in so many different places? It looks like a configuration zoo, where animals from different ecosystems live together in one small area.

Yet another problem is that there are no configuration validation tools that can tell you whether all properties were set up correctly: that there are no typos and no values of the wrong type (for example, a string where the property expects a decimal number). Sure, there are schema validation tools for XML files, but what about the space URL? The only way to get an error message is to run the configuration and look through the log files full of Java stack traces to find out where the error appeared. The scripts are silent and cannot report a mistake before it causes problems.
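To show what such a check could look like, here is a minimal Java sketch. It is not Gigapult code: the class name, the rule table and the error messages are all made up for illustration, and a real validator would need a rule for every supported property. The idea is only that typos and wrongly typed values are reported before anything is started.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: check key-value settings before anything is launched.
final class ConfigValidatorSketch {

    // Value types this sketch knows how to check.
    enum Kind { INTEGER, BOOLEAN, TEXT }

    // Hypothetical rule table; the keys are borrowed from the examples in this post.
    static Map<String, Kind> rules() {
        Map<String, Kind> rules = new LinkedHashMap<>();
        rules.put("total_members", Kind.INTEGER);
        rules.put("cluster_schema", Kind.TEXT);
        rules.put("com.gs.start-embedded-lus", Kind.BOOLEAN);
        return rules;
    }

    // Returns one message per problem; an empty list means the settings passed.
    static List<String> validate(Map<String, String> settings, Map<String, Kind> rules) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, String> e : settings.entrySet()) {
            Kind kind = rules.get(e.getKey());
            String value = e.getValue();
            if (kind == null) {
                errors.add("unknown property: " + e.getKey());
            } else if (kind == Kind.INTEGER && !value.matches("-?\\d+")) {
                errors.add(e.getKey() + " must be an integer, got '" + value + "'");
            } else if (kind == Kind.BOOLEAN && !value.equals("true") && !value.equals("false")) {
                errors.add(e.getKey() + " must be true or false, got '" + value + "'");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("total_members", "two");          // wrong type
        settings.put("cluster_shema", "partitioned");  // typo in the key
        for (String error : validate(settings, rules())) {
            System.err.println(error);
        }
    }
}
```

Both mistakes in main are reported immediately, without starting a container and digging through stack traces.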

The last problem may sound like a caprice, but shell and batch scripts are almost unreadable. As a developer, I just want to configure my cluster and concentrate on the domain, not do reverse engineering. When I saw the shell scripts for the first time, I had the feeling that I was reading hieroglyphs.

That's why the Gigapult project appeared.

Its main goal is to simplify the GigaSpaces configuration and bootstrapping process and to make configuration files more maintainable and readable.

To do that, we should use the Force: the JDK Force. GigaSpaces is written in Java, so the first requirement for running it is an installed JDK. The JDK provides a great cross-platform API for building applications, so we can delegate system-specific tasks to it, such as choosing the correct classpath separator character. That makes the JDK our Force, and as a result we don't need to implement two versions of the scripts. But Java itself is a compiled language, so it is not very handy for scripting purposes. As real Jedi, we need something interpreted. Fortunately, there is a nice solution: there are plenty of dynamic, interpreted languages written in 100% pure Java. Examples include Jython (an implementation of the Python programming language), JRuby (an implementation of the Ruby programming language) and Groovy, a very young but ambitious programming language with some interesting features.
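To make the separator point concrete, here is a small Java sketch (again, an illustration rather than Gigapult code; the class and method names are invented). It rebuilds the JARS value from the shell example above without ever spelling out ':' or ';', because java.io.File.pathSeparator already holds the right character for the platform the JVM runs on.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Sketch: build a classpath string without hard-coding ':' or ';'.
final class ClasspathSketch {

    // Joins entries with an explicit separator, so the logic can be checked on any OS.
    static String join(List<String> entries, String separator) {
        return String.join(separator, entries);
    }

    // Joins entries with the separator of the platform the JVM is running on.
    static String joinForThisPlatform(List<String> entries) {
        return join(entries, File.pathSeparator);
    }

    public static void main(String[] args) {
        // JSHOMEDIR is the variable used by the stock scripts; "." is just a placeholder fallback.
        String home = System.getenv().getOrDefault("JSHOMEDIR", ".");
        List<String> entries = Arrays.asList(
                home,
                new File(home, "lib/JSpaces.jar").getPath(),
                new File("..", "classes").getPath());
        System.out.println(joinForThisPlatform(entries));
    }
}
```

The same class file gives a ':'-separated list on Unix and a ';'-separated one on Windows, which is exactly the job the CPS variable does in the shell and batch scripts.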

OK, we could simply rewrite the existing scripts, but what about simplicity, readability and intelligibility? Of course, Python or Ruby code is more readable and clear, but it is still code written in a general-purpose programming language with its own syntax and semantics. When we export a LOOKUPGROUPS variable, we think about assigning a value to a VM option, not about configuring the cluster. But we don't want to override the value of a variable; we want to configure the container and the space. The difference is almost imperceptible to programmers, but to the fellow from support it matters.

The Sapir-Whorf hypothesis describes this idea: the language we use shapes the way we think about a problem. Following this hypothesis, we need to create a language for describing the configuration. Such a language is called a domain-specific language (DSL), and it has been a hot topic in software development in recent years. There is a great presentation on this topic from ThoughtWorks and their chief scientist, Martin Fowler.

So a sample configuration file could look like this:

Code

configure {
  java_options {
    jvm_memory 512
  }

  vm_options {
    com.gs.jini_lus.groups 'myGroups'
    com.gs.start-embedded-lus true
    com.gs.start-embedded-mahalo false
  }

  space_url {
    space_name 'mySpace'
    cluster_schema 'partitioned'
    total_members 2
  }

  node {
    id 1
  }

  node {
    id 2
  }
}
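
To give an idea of what a tool could do with such a file, here is a rough Java sketch that turns the values above into JVM arguments and one space URL per node. It is not how Gigapult actually works. It assumes that jvm_memory means the maximum heap size in megabytes, that every vm_options entry becomes a -D system property, that space_url entries and the node id map one-to-one to URL query parameters, and that the URL uses the same jini://*/*/ prefix as the shell example; the class and method names are invented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: render the sample configuration as JVM arguments and space URLs.
final class LaunchPlanSketch {

    // Assumption: jvm_memory is the max heap in MB and vm_options are -D system properties.
    static List<String> jvmArgs(int heapMb, Map<String, String> vmOptions) {
        List<String> args = new ArrayList<>();
        args.add("-Xmx" + heapMb + "m");
        for (Map.Entry<String, String> e : vmOptions.entrySet()) {
            args.add("-D" + e.getKey() + "=" + e.getValue());
        }
        return args;
    }

    // Assumption: space_url entries and the node id become query parameters as-is.
    static String spaceUrl(String prefix, String spaceName, Map<String, String> params, int nodeId) {
        List<String> pairs = new ArrayList<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            pairs.add(e.getKey() + "=" + e.getValue());
        }
        pairs.add("id=" + nodeId);
        return prefix + spaceName + "?" + String.join("&", pairs);
    }

    public static void main(String[] args) {
        Map<String, String> vmOptions = new LinkedHashMap<>();
        vmOptions.put("com.gs.jini_lus.groups", "myGroups");
        vmOptions.put("com.gs.start-embedded-lus", "true");
        vmOptions.put("com.gs.start-embedded-mahalo", "false");

        Map<String, String> urlParams = new LinkedHashMap<>();
        urlParams.put("cluster_schema", "partitioned");
        urlParams.put("total_members", "2");

        System.out.println(jvmArgs(512, vmOptions));
        for (int nodeId = 1; nodeId <= 2; nodeId++) {
            System.out.println(spaceUrl("jini://*/*/", "mySpace", urlParams, nodeId));
        }
    }
}
```

The person editing the configuration file never sees any of this: they describe the space and the nodes, and the translation into options and URLs happens in one place.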


The project is now in an early beta phase and will be available as an open-source project distributed under the Apache 2.0 License.

Category:  GigaSpaces


© 2008 Grid Dynamics Consulting Services, Inc. All rights reserved.