This is the third post in my series on MongoDB. This post talks about how the working set affects the performance of MongoDB. The idea for this experiment came to me after I read a very good blog post by Colin Howe on the MongoDB working set. The tests performed here are along the same lines as the ones Colin Howe describes, but performed with Java and MongoDB version 2.0.1. If you have worked with MongoDB or read about it, you might have heard of the term working set. The working set is the amount of data (including indexes) that your application actively uses. If this data fits in RAM, application performance will be great; otherwise it will degrade drastically, because when the data can't fit in RAM MongoDB has to hit disk, which hurts performance. I recommend reading the blog post by Adrian Hills on the importance of the working set. To help you understand the working set better, I am citing the example from Adrian's blog:
Suppose you have 1 year’s worth of data. For simplicity, each month relates to 1GB of data giving 12GB in total, and to cover each month’s worth of data you have 1GB worth of indexes again totalling 12GB for the year.
If you are always accessing the last 12 month’s worth of data, then your working set is: 12GB (data) + 12GB (indexes) = 24GB.
However, if you actually only access the last 3 month’s worth of data, then your working set is: 3GB (data) + 3GB (indexes) = 6GB.
From the example above, if your machine has more than 6GB of RAM then your application will perform great; otherwise it will be slow. The important thing to know about the working set is that MongoDB uses an LRU strategy to decide which documents stay in RAM, and you can't tell MongoDB to keep a particular document or collection in RAM. Now that you know what the working set is and how important it is, let's start the experiment.
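If you want a rough upper bound on your own working set, you can add up the data and index sizes that MongoDB reports for a collection and compare the total with the RAM on the box. Below is a minimal sketch that reuses the mongoTemplate, logger, and User class from the test shown later in this post; the size and totalIndexSize fields come from MongoDB's collStats output. Keep in mind this is only an upper bound, because the real working set is just the portion of the data and indexes that you actually touch.

    // Rough upper bound on the working set: total data size plus total index size
    // of the collection, compared against the RAM available on the machine.
    String collectionName = mongoTemplate.getCollectionName(User.class);
    CommandResult stats = mongoTemplate.getCollection(collectionName).getStats();
    double dataSizeInMB = stats.getDouble("size") / (1024 * 1024);
    double indexSizeInMB = stats.getDouble("totalIndexSize") / (1024 * 1024);
    logger.info(String.format("Upper bound on working set : %.2f MB", dataSizeInMB + indexSizeInMB));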
Setup
Dell Vostro box with Ubuntu 11.04, 4 GB RAM, and a 300 GB hard disk
Java 6
MongoDB 2.0.1
Spring Data MongoDB 1.0.0.M5, which internally uses version 2.6.5 of the MongoDB Java driver
Document
The documents I am storing in MongoDB look like the one shown below. The average document size is 2400 bytes. Please note that the _id field also has an index. The index that I will be creating is on the index field.
{
  "_id" : ObjectId("4ed89c140cf2e821d503a523"),
  "name" : "Shekhar Gulati",
  "someId1" : NumberLong(1000006),
  "str1" : "U",
  "date1" : ISODate("1997-04-10T18:30:00Z"),
  "index" : 1,
  "bio" : "I am a Java Developer. I am a Java Developer. I am a Java Developer. I am a Java Developer. ..."
}
(The bio field repeats the same sentence over and over until the document reaches roughly 2400 bytes.)
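The test code below maps this document to a User class. The original class is not shown in this post, so here is a minimal sketch of what it might look like, with field names mirroring the sample document above:

    import java.util.Date;
    import org.bson.types.ObjectId;
    import org.springframework.data.annotation.Id;

    // Minimal sketch of the User document class assumed by the test below
    // (the real class is not shown in this post).
    public class User {

        @Id
        private ObjectId id;
        private String name;
        private Long someId1;
        private String str1;
        private Date date1;
        private int index; // auto-incrementing field that the queries filter on
        private String bio;

        public void setIndex(int index) {
            this.index = index;
        }

        // remaining getters and setters omitted for brevity
    }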
Test Case
The test case runs 6 times, with 10k, 100k, 1 million, 2 million, 3 million, and 10 million records. The document used is the one shown under the Document heading; it is the same as the one used in the first post, with one extra field, index (a simple int field that auto-increments by one). Before the records are inserted, an index is created on the index field, and the records are then inserted in batches of 100. Finally, 30000 queries are performed against a selected part of the collection. The selection varies from 1% of the data in the collection to 100% of the collection. The queries are performed 3 times on the selected dataset to give MongoDB a chance to pull the selected dataset into RAM. The JUnit test case is shown below.
@Test
public void workingSetTests() throws Exception {
    benchmark(10000);
    cleanMongoDB();
    benchmark(100000);
    cleanMongoDB();
    benchmark(1000000);
    cleanMongoDB();
    benchmark(2000000);
    cleanMongoDB();
    benchmark(3000000);
    cleanMongoDB();
    benchmark(10000000);
    cleanMongoDB();
}
private void benchmark(int totalNumberOfKeys) throws Exception {
    // create the index on the index field before inserting any records
    IndexDefinition indexDefinition = new Index("index", Order.ASCENDING).named("index_1");
    mongoTemplate.ensureIndex(indexDefinition, User.class);
    int batchSize = 100;
    int i = 0;
    long startTime = System.currentTimeMillis();
    LineIterator iterator = FileUtils.lineIterator(new File(FILE_NAME));
    // insert the records in batches of 100
    while (i < totalNumberOfKeys && iterator.hasNext()) {
        List<User> users = new ArrayList<User>();
        for (int j = 0; j < batchSize; j++) {
            String line = iterator.nextLine();
            User user = convertLineToObject(line);
            user.setIndex(i);
            users.add(user);
            i += 1;
        }
        mongoTemplate.insert(users, User.class);
    }
    long endTime = System.currentTimeMillis();
    logger.info(String.format("%d documents inserted took %d milliseconds", totalNumberOfKeys, (endTime - startTime)));
    // run the 30k queries three times each against 1%, 10%, and 100% of the data
    performQueries(totalNumberOfKeys, 1);
    performQueries(totalNumberOfKeys, 1);
    performQueries(totalNumberOfKeys, 1);
    performQueries(totalNumberOfKeys, 10);
    performQueries(totalNumberOfKeys, 10);
    performQueries(totalNumberOfKeys, 10);
    performQueries(totalNumberOfKeys, 100);
    performQueries(totalNumberOfKeys, 100);
    performQueries(totalNumberOfKeys, 100);
    String collectionName = mongoTemplate.getCollectionName(User.class);
    CommandResult stats = mongoTemplate.getCollection(collectionName).getStats();
    logger.info("Stats : " + stats);
    double size = stats.getDouble("storageSize");
    logger.info(String.format("Storage Size : %.2f M", size / (1024 * 1024)));
}
private void performQueries(int totalNumberOfKeys, int focus) {
    int gets = 30000;
    long startTime = System.currentTimeMillis();
    for (int index = 0; index < gets; index++) {
        Random random = new Random();
        // 99% of the gets target the bottom 'focus' percent of the keys,
        // the remaining 1% can hit any key in the collection
        boolean focussedGet = random.nextInt(100) != 0;
        int key = 0;
        if (focussedGet) {
            key = random.nextInt((totalNumberOfKeys * focus) / 100);
        } else {
            key = random.nextInt(totalNumberOfKeys);
        }
        mongoTemplate.findOne(Query.query(Criteria.where("index").is(key)), User.class);
    }
    long endTime = System.currentTimeMillis();
    logger.info(String.format("%d gets (focussed on bottom %d%%) took %d milliseconds", gets, focus, (endTime - startTime)));
}
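The cleanMongoDB() method called between benchmark runs is not shown above; a minimal version could simply drop the collection so that every run starts against an empty database:

    // Hypothetical helper: drop the User collection so the next benchmark run
    // starts from scratch (the original implementation is not shown in the post).
    private void cleanMongoDB() {
        mongoTemplate.dropCollection(User.class);
    }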
Results

In the table above, the number of records is in millions and the time taken to execute the 30k queries is in seconds.
One thing this data clearly shows is that if your working set fits in RAM, performance stays almost the same regardless of the total number of documents in MongoDB. This is easy to see by comparing the three runs on the 1% dataset across all dataset sizes: the 30k queries on 1% of the data take very similar times for 3 million records and for 10 million records.
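If you want to see for yourself when MongoDB starts hitting disk on your own setup, one option is to watch the page fault counter while the queries run. The sketch below reuses the same mongoTemplate and logger as the test above and assumes a Linux machine, where serverStatus reports the counter under extra_info.page_faults; a value that keeps climbing during the query phase is a good sign the working set no longer fits in RAM.

    // Check MongoDB's page fault counter (reported on Linux); a steadily increasing
    // value while queries run means MongoDB is reading from disk instead of RAM.
    CommandResult serverStatus = mongoTemplate.executeCommand("{ serverStatus : 1 }");
    DBObject extraInfo = (DBObject) serverStatus.get("extra_info");
    logger.info("Page faults so far : " + extraInfo.get("page_faults"));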