Monday, June 24, 2013
We have already discovered the goodness of Jackson for vending out JSON data in a JAX-RS setup. In many cases, though, our domain objects hold references to objects that are not under our control (e.g. classes from third-party JARs). We can easily add annotations to our own domain objects to indicate which fields to ignore, but what do we do for the third-party objects? One could write custom bean serializers, but that is onerous and harder to maintain, since the serialization logic lives away from the original class. This is where Jackson mixins come to the rescue.
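As a minimal sketch of the idea (the third-party class and field names here are hypothetical, defined inline only for the demo), a mixin is just an abstract class carrying the annotations we wish we could put on the class we don't own, registered on the ObjectMapper:

import com.fasterxml.jackson.annotation.JsonIgnore;
import com.fasterxml.jackson.databind.ObjectMapper;

// Stand-in for a third-party class that we cannot annotate directly
class ThirdPartyUser {
    public String name;
    public String internalToken; // should not leak into the JSON output
}

// Mixin carrying the annotations we would have liked to put on ThirdPartyUser
abstract class ThirdPartyUserMixin {
    @JsonIgnore
    public String internalToken;
}

public class MixinDemo {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        // Apply the mixin's annotations to the target class
        // (older Jackson 2.x versions call this addMixInAnnotations)
        mapper.addMixIn(ThirdPartyUser.class, ThirdPartyUserMixin.class);

        ThirdPartyUser user = new ThirdPartyUser();
        user.name = "kilo";
        user.internalToken = "secret";
        System.out.println(mapper.writeValueAsString(user)); // {"name":"kilo"}
    }
}

The same mapper, with mixins registered, can then be handed to whatever JSON provider the JAX-RS stack is using.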
Wednesday, June 12, 2013
Know your disk latency numbers
I was looking around for a tool that would help quantify the latency difference between an NFS-mounted location and a local disk, and came across ioping. As the documentation indicates, it tries to report disk latency in pretty much the same way the ping command shows network latency to a desired host.
I had to build it from source for our linux installation, but the process went off without a hitch. I first tried it on my NFS-mounted home drive, where the checked-out code, eclipse workspace and maven repository reside - the hotspot for IO activity during development. Using a sample invocation from the documentation site, I tried it with a chunk size of 1MB, ten times.
$> ioping -c 10 -s 1M /home/kilo

--- /home/kilo (nfs fs1.kilo.com:/vol/home/kilo) ioping statistics ---
10 requests completed in 9101.5 ms, 102 iops, 102.1 mb/s
min/avg/max/mdev = 9.5/9.8/10.0/0.2 ms
Next was to test out the performance of our local partition with the same parameters:
--- /local/kilo (ext4 /dev/ssda) ioping statistics ---
10 requests completed in 9052.4 ms, 201 iops, 200.8 mb/s
min/avg/max/mdev = 4.8/5.0/5.6/0.2 ms
The result: the local disk was roughly twice as fast (201 iops vs 102 iops)!
For kicks, I tried it out on a /tmp location as well with the same parameters:
--- /tmp (tmpfs none) ioping statistics ---
10 requests completed in 9004.1 ms, 5219 iops, 5219.2 mb/s
min/avg/max/mdev = 0.1/0.2/0.3/0.0 ms
That was roughly 26 times faster than the local disk - but hold on, what is this tmpfs being mentioned? Digging further, it turns out tmpfs is a special filesystem that doesn't reside on a physical disk but is instead stored in physical memory (and is therefore volatile) - so the speed-up in access is to be expected. This also means we should be extra careful about what goes into /tmp in such a setup: keeping a lot of garbage there and not cleaning up will come back to bite us at unpredictable times. I guess many people already know about this.
Coming back to the utility itself, there are options to change the working set size, the offset within the file to seek to, etc. One particularly interesting feature is the option to use write IO, but that option was not available in the 0.6 release that I downloaded. An open issue indicates that this is a yet-to-be-released feature; it will be interesting to see it in action.
I think this will be a good utility to have in our linux installations by default. If there are other utilities that do similar things but come out of the box with a stock RHEL installation, please let me know. Hope this helps!
Monday, May 27, 2013
Let's go uber!
How do you run your non-webapp, maven-based java program from the command line? One might use the exec:java route and specify the main class. The only sticking point there is that the classpath will be filled with references to the user's local repository (usually NFS mounted). I would be much more comfortable if the full set of jars I depend on were packaged WAR-style and ready at my disposal. In that sense, the application becomes much more self-contained, and it becomes that much easier for someone to test-drive it. Hence the concept of an "uber" jar - a unified jar that houses all classes/resources of the project along with the classes/resources of its transitive dependencies.
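One common way to build such an uber jar is the maven-shade-plugin. The snippet below is only a minimal sketch - the plugin version and main class are placeholders, and other routes (such as the maven-assembly-plugin) work too:

<!-- pom.xml build/plugins section; version and mainClass are placeholders -->
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.1</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <!-- Makes the uber jar runnable via java -jar by setting Main-Class -->
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                        <mainClass>com.kilo.Main</mainClass>
                    </transformer>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>

After mvn package, the shaded jar under target/ can be run with a plain java -jar and carries no references back to the local repository.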
Thursday, March 14, 2013
Blocked persistence nuance with GET*DATE()
I noticed an interesting (and desired) behavior when a write that persists timestamps gets blocked by a competing transaction, which might be useful for others as well. For unitemporal and bitemporal tables, we frequently populate the knowledge_date column with GETDATE() or GETUTCDATE(), but this does not guarantee that the wall-clock time at which the record actually got persisted to the DB is the same as the value written to the knowledge_date column. Consider a simple table:
CREATE TABLE special_table (
    special_key INT PRIMARY KEY,
    special_value INT,
    knowledge_date DATETIME
)
And we insert a few values into it:
INSERT INTO special_table VALUES(1, 100, GETUTCDATE())
INSERT INTO special_table VALUES(2, 100, GETUTCDATE())
INSERT INTO special_table VALUES(100, 10000, GETUTCDATE())
Now, let's say we've got two competing transactions: a read and a write, with the read preceding the write and the read taking much longer to finish (simulated with a WAITFOR DELAY).
Read Transaction (at isolation level repeatable read):
BEGIN TRANSACTION
SELECT GETUTCDATE() --ts1
SELECT * FROM special_table WHERE special_key = 2
WAITFOR DELAY '00:00:10'
SELECT * FROM special_table WHERE special_key = 2
SELECT GETUTCDATE() --ts2
COMMIT
Write Transaction (at isolation level read committed):
BEGIN TRANSACTION
SELECT GETUTCDATE() --ts3
UPDATE special_table
SET special_value = special_value + 1,
    knowledge_date = GETUTCDATE()
WHERE special_key = 2
SELECT GETUTCDATE() --ts4
SELECT * FROM special_table WHERE special_key = 2
COMMIT
Execute these two batches in two windows of SSMS, starting the read before the write. Since the read starts first and experiences no blocking, ts2 ~= ts1 + 10s (the WAITFOR delay). The write is kicked off a short interval d after the read, hence ts3 ~= ts1 + d.
Question: will the knowledge_date updated be closer to ts3 or ts4?
One might think that the knowledge_date value would be close to ts4, when the write transaction actually gets unblocked; however, this is not the case. For SQL Server to figure out whether the statement needs to be blocked at all (because of the locks it has to acquire), it starts evaluating the query, and hence the value to be assigned to knowledge_date gets evaluated at a time close to ts3. So while the row carries a timestamp close to ts3, it is actually persisted to the DB at a wall-clock time closer to ts4.
This can be verified from the output of the read and write batches, where there is a marked delay between the persisted knowledge_date and ts4. It becomes even more interesting when you have multiple updates in the same write transaction, some of which can proceed until one gets blocked by the read: you can then notice varying gaps between the knowledge_date values and the times the records actually landed, even though they were all kicked off in the same transaction.
BEGIN TRANSACTION
SELECT GETUTCDATE() --ts1
UPDATE special_table
SET special_value = special_value + 1,
    knowledge_date = GETUTCDATE() --ts
WHERE special_key = 100
SELECT GETUTCDATE() --ts2
UPDATE special_table
SET special_value = special_value + 1,
    knowledge_date = GETUTCDATE() --tss
WHERE special_key = 2
SELECT GETUTCDATE() --ts3
SELECT * FROM special_table WHERE special_key = 2
COMMIT
Here, the knowledge_date for key 100 ends up close to ts1, since that update is not blocked by the read, while the knowledge_date for key 2 ends up close to ts2 (when its UPDATE statement started) even though that row only gets persisted at a time closer to ts3, since it was blocked by the read.
BTW, this should not haunt the trigger-based td_bl temporal tables, as the trigger only fires after the base table is updated and so effectively captures the timestamp of when the base table was changed (though that may still not be the exact time the temporal record got persisted to the DB, for the same blocking reasons).
Hope this helps!
Saturday, March 9, 2013
Quick headless JAX-RS servers with CXF
If one needs to vend out JSON data in a JAX-RS compatible way with minimal setup fuss, CXF + Spring provide a good out-of-the-box solution.
- Write your service class (interface and impl preferably)
- Annotate your service impl methods with
- @Path annotation indicating the URI on which it will serve the resource
- @GET/@POST indicating the HTTP method it serves
- @Produces("application/json") indicating that the output format is JSON
- Define a jaxrs:server directive in your spring context file indicating the address and resource path on which the service is hosted
- Add maven dependencies on javax.ws.rs-api (for the annotations), cxf-rt-core (for stubbing RS communication over an HTTP conduit) and cxf-rt-transports-http-jetty (for the embedded jetty) - a sample dependency snippet is sketched below
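For reference, the dependency section might look something like this (the versions are placeholders to be aligned with your CXF release; depending on the CXF version, the JAX-RS frontend module cxf-rt-frontend-jaxrs may also need to be declared explicitly):

<dependency>
    <groupId>javax.ws.rs</groupId>
    <artifactId>javax.ws.rs-api</artifactId>
    <version>2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.cxf</groupId>
    <artifactId>cxf-rt-core</artifactId>
    <version>2.7.5</version>
</dependency>
<dependency>
    <groupId>org.apache.cxf</groupId>
    <artifactId>cxf-rt-transports-http-jetty</artifactId>
    <version>2.7.5</version>
</dependency>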
and voila you are done.
Concretely:
public interface SpecialService {
    String getSomeText();
}
public class SpecialServiceImpl implements SpecialService {

    @GET
    @Produces("application/json")
    @Path("/someText/")
    @Override
    public String getSomeText() {
        return "kilo";
    }
}
<bean id="specialService" class="com.kilo.SpecialServiceImpl"/> <bean id="inetAddress" class="java.net.InetAddress" factory-method="getLocalHost" /> <jaxrs:server id="specialServiceRS" address="http://#{inetAddress.hostName}:${com.kilo.restful.port}/specialServiceRS"> <jaxrs:serviceBeans> <ref bean="specialService" /> </jaxrs:serviceBeans> </jaxrs:server>
And now hit http://yourhostname:yourportnum/specialServiceRS/someText to get the response "kilo". If you examine the response in your browser's developer tools, you will see that the content type is application/json.
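For a quick programmatic smoke test of the endpoint, CXF's WebClient can be used. This is just a sketch - the host and port are placeholders that should match the jaxrs:server address above, and it needs the CXF JAX-RS client classes on the classpath:

import org.apache.cxf.jaxrs.client.WebClient;

public class SpecialServiceSmokeTest {
    public static void main(String[] args) {
        // Placeholder address - align with the jaxrs:server configuration
        String address = "http://yourhostname:yourportnum/specialServiceRS";
        String text = WebClient.create(address)
                .path("/someText/")
                .accept("application/json")
                .get(String.class);
        System.out.println(text); // expect "kilo"
    }
}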
CXF JAX-RS uses an embedded Jetty as the HTTP container, so we don't really need a Tomcat to set this up. This might bring up the question of Tomcat vs Jetty overall, and here are my thoughts:
Tomcat
- Lightweight
- Servlet 3 style async threading in the works
- Known beast in terms of configuration
Jetty
- Even more lightweight
- Implements servlet 3 style async thread allocation (like node) and hence more responsive and efficient
- Easy to have an embedded server with cxf (embeddability is synonymous with jetty)
- Ability to have multiple simple java processes that act as headless servers quickly
Overall, I believe we should give Jetty a chance and see how it performs. If it ever lets us down, it is easy to take the process and house it in a Tomcat container.
We will try to cover some more involved use cases of passing in inputs via JAX-RS, dealing with complex objects, CORS and GZIP in subsequent posts (the samples already have them explained).
Friday, March 8, 2013
Staging paradigms universe in MS SQL Server
How do you pass bulk data from the application to a staging table in the database layer in a typical 3-tier java application?
Options:
- Use multiple INSERT clauses (with iBATIS/spring-jdbc/jdbc)
- Use INSERT clause with multiple values clauses [SQL Server 2008 onwards] (with iBATIS/spring-jdbc/jdbc)
- Use batching of INSERT clause with varying batch sizes (with iBATIS/spring-jdbc/jdbc)
- Create a bcp file and use the bcp executable
- Create a bcp file and use the BULK INSERT T-SQL command (with iBATIS/spring-jdbc/jdbc)
- Create a bcp file and use the OPENROWSET BULK T-SQL command (with iBATIS/spring-jdbc/jdbc)
- XML shredding (with iBATIS/spring-jdbc/jdbc)
We will analyze each of these options and try to establish benchmarks by staging two data sets of 1k and 100k records, with columns spanning a motley of data types, and then derive best practices as to where each approach should be used.
The staging candidate chosen looks something like this:
CREATE TABLE motley (
    date DATETIME,
    name VARCHAR(50),
    id INT,
    price NUMERIC(18, 4),
    amount NUMERIC(18, 4),
    fx_rate NUMERIC(18, 7),
    is_valid TINYINT,
    knowledge_time DATETIME
)
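As a concrete illustration of the batching option from the list above, here is a minimal spring-jdbc sketch (the JdbcTemplate wiring and the sample values are assumptions, not part of the actual benchmark harness):

import java.math.BigDecimal;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

public class MotleyStager {

    // jdbcTemplate is assumed to be wired against the staging database
    public void stage(JdbcTemplate jdbcTemplate, List<Object[]> rows) {
        String sql = "INSERT INTO motley (date, name, id, price, amount, fx_rate, is_valid, knowledge_time) "
                + "VALUES (?, ?, ?, ?, ?, ?, ?, ?)";
        // Sends the inserts as a JDBC batch instead of one round-trip per row
        jdbcTemplate.batchUpdate(sql, rows);
    }

    // Example rows matching the motley columns (values are made up)
    public static List<Object[]> sampleRows() {
        List<Object[]> rows = new ArrayList<Object[]>();
        Timestamp now = new Timestamp(System.currentTimeMillis());
        rows.add(new Object[] { now, "alpha", 1, new BigDecimal("100.1234"),
                new BigDecimal("5000.0000"), new BigDecimal("1.0000000"), (byte) 1, now });
        rows.add(new Object[] { now, "beta", 2, new BigDecimal("200.5678"),
                new BigDecimal("7500.0000"), new BigDecimal("0.8912345"), (byte) 1, now });
        return rows;
    }
}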
The stats were as follows:
As you may imagine, there are pros and cons with each approach which I have tried to outline here.
Feel free to suggest ways to further fine-tune each of these approaches or suggest brand new ones or plainly comment on the comparison metrics or factors.
Hope this helps!
Monday, February 4, 2013
Experimenting with MyBatis
iBATIS has been the workhorse for our data-heavy applications for quite some time, and we were largely happy with it - till MyBatis took it to the next level. While iBATIS was decommissioned circa mid-2010 (and now lives in the Apache Attic), the real trigger to see what MyBatis offered was Spring's decision to deprecate support for iBATIS in Spring 3.2.0.RELEASE, which meant all of my DAO classes would show those pesky warnings in eclipse. Some time ago, we had a training on the nuances of iBATIS in which we outlined a few examples. I thought that converting the same setup to MyBatis would be a good exercise to see the features in action and to find out where the two are similar and where they differ.
To start off, I fired up the migration tool, viz. ibatis2mybatis, on the existing XML files from the previous setup. The tool is pretty decent and gets us almost 90% of the way there. The major changes that still needed to be done by hand were:
- Add namespacing to be consistent with mapper package name
- Change primary key attributes of a result map from the <result> tag to the <id> tag (needed for group-by operations and performance optimizations)
- Order the elements in a resultMap strictly so that <id/> is followed by <result/>, then <association/>, then <collection/> (not sure why the DTD has been defined thus)
- Replace intermediate resultMaps created for collections by providing the type information to the <collection/> tag itself. For an example see the userWithLoginLocationsAndUniqueSites resultMap (a generic sketch also follows after this list).
- Cache settings are not migrated and need to be defined by hand again. But that is extremely simple now: add <cache/> and you have a cache local to the mapper!
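As a small sketch of what the migrated shape looks like (the map, type and column names here are made up, not taken from the actual training project):

<resultMap id="userWithLoginLocations" type="com.kilo.domain.User">
    <!-- primary key moves from <result/> to <id/> -->
    <id property="id" column="user_id" />
    <result property="name" column="user_name" />
    <!-- strict order: id, result, association, collection -->
    <collection property="loginLocations" ofType="com.kilo.domain.LoginLocation">
        <id property="id" column="login_location_id" />
        <result property="city" column="city" />
    </collection>
</resultMap>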
Beyond these mapper changes, the only other change was to the config file, to disable the default lazyLoading. We prefer the objects to be plain POJOs, which also helps with reflective comparison of objects for hashCode and equals. Since we use the mybatis-spring integration (provided by MyBatis), we don't need to deal with the SqlSessionFactory ourselves (and hence don't need to worry about closing it cleanly).
<bean id="sqlSessionFactory" class="org.mybatis.spring.SqlSessionFactoryBean"> <property name="dataSource" ref="testerDataSource" /> <property name="configLocation" value="classpath:com/kilo/dao/sqlmap-config.xml" /> <property name="mapperLocations" value="classpath:com/kilo/dao/mapper/*.xml" /> </bean> <bean class="org.mybatis.spring.mapper.MapperScannerConfigurer"> <property name="basePackage" value="com.kilo.dao.mapper" /> </bean>
The mappers just need to be interfaces whose method names match the ids of the SQL fragments that need to be fired. Hence, no need for any DAOImpls whatsoever! However, this has its downside, as you may have already imagined: you can no longer craft fancy params on the fly to pass down to your MyBatis SQL fragment as parameters.
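As a minimal sketch (the interface, domain class and SQL id here are hypothetical, but the shape matches the mapper-scanner setup above):

package com.kilo.dao.mapper;

import java.util.List;

import com.kilo.domain.User; // hypothetical domain POJO

// No DAOImpl needed: mybatis-spring generates a proxy for this interface.
// Each method name must match the id of a SQL fragment in the mapper XML
// whose namespace is this interface's fully qualified name.
public interface UserMapper {

    // corresponds to a <select id="selectUsersWithLoginLocations" .../> fragment
    List<User> selectUsersWithLoginLocations();
}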