Thursday, March 14, 2013

Blocked persistence nuance with GET*DATE()


I noticed an interesting (and desirable) behavior when a transaction persisting timestamps gets blocked by a competing transaction, which might be useful for others as well. For unitemporal and bitemporal tables, we frequently set the knowledge_date to either GETDATE() or GETUTCDATE(), but this does not guarantee that the wall-clock time at which the record was persisted to the DB matches the value recorded in the knowledge_date column.
To illustrate this, let's say we have a table such as:
CREATE TABLE special_table (
special_key INT PRIMARY KEY,
special_value INT,
knowledge_date DATETIME)
And we insert a few values into it:
INSERT INTO special_table VALUES(1, 100, GETUTCDATE())
INSERT INTO special_table VALUES(2, 100, GETUTCDATE())
INSERT INTO special_table VALUES(100, 10000, GETUTCDATE())
Now, let's say we've got two competing transactions, one a read and the other a write, with the read preceding the write and the read transaction taking much longer to finish (simulated with a WAITFOR DELAY).
Read Transaction (at isolation level repeatable read):
BEGIN TRANSACTION
SELECT GETUTCDATE() --ts1
SELECT * FROM special_table WHERE special_key = 2
WAITFOR DELAY '00:00:10'
SELECT * FROM special_table WHERE special_key = 2
SELECT GETUTCDATE() --ts2
COMMIT
Write Transaction (at isolation level read committed):
BEGIN TRANSACTION
SELECT GETUTCDATE() --ts3
UPDATE special_table
   SET special_value = special_value + 1, knowledge_date=GETUTCDATE()
 WHERE special_key = 2
SELECT GETUTCDATE() --ts4
SELECT * FROM special_table WHERE special_key = 2
COMMIT
Execute these two batches in two SSMS windows, with the read starting before the write. Since the read started first, it experiences no blocking, so ts2 ~= ts1 + 10 seconds. The write was kicked off a short interval d after the read, so ts3 ~= ts1 + d.
Question: will the knowledge_date updated be closer to ts3 or ts4?
One might think that the knowledge_date value would be closer to ts4, when the write transaction actually gets unblocked; however, this is not the case. For SQL Server to figure out whether or not the transaction needs to be blocked (because of the locking scheme in play), the query has to be evaluated, and hence the value assigned to knowledge_date is evaluated at a time closer to ts3. So the row is persisted to the DB at a wall-clock time closer to ts4, even though the DB records the timestamp as closer to ts3.
This can be verified from the output of the read and write batches, where there is a marked gap between the persisted knowledge_date and ts4. It gets even more interesting when there are multiple updates in the same write transaction, some of which can proceed until one gets blocked by the read: you can then see varying knowledge_date values across records even though they were all issued in the same transaction.
BEGIN TRANSACTION
SELECT GETUTCDATE() --ts1
UPDATE special_table
   SET special_value = special_value + 1, knowledge_date=GETUTCDATE() --ts
 WHERE special_key = 100
SELECT GETUTCDATE() --ts2
UPDATE special_table
   SET special_value = special_value + 1, knowledge_date=GETUTCDATE() --tss
 WHERE special_key = 2
SELECT GETUTCDATE() --ts3
SELECT * FROM special_table WHERE special_key = 2
COMMIT
Here, the knowledge_date for key 100 will be closer to ts1, since it is not blocked by the read, while the knowledge_date for key 2 will be closer to ts3 and away from ts2, since it was blocked by the read.
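A quick way to eyeball this after both batches have committed (run it in a third window) is to pull the two rows back and see how far apart their knowledge_date values landed, even though both updates ran in the same write transaction:
SELECT special_key, knowledge_date
  FROM special_table
 WHERE special_key IN (2, 100)

SELECT DATEDIFF(MILLISECOND, MIN(knowledge_date), MAX(knowledge_date)) AS knowledge_date_spread_ms
  FROM special_table
 WHERE special_key IN (2, 100)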
BTW, this should not haunt trigger-based td_bl temporal tables, since the trigger fires only after the base table is updated and so effectively captures the timestamp of when the base table was changed (though that may still not be the exact time the temporal record got persisted to the DB, for the same blocking reasons).
Hope this helps!

Saturday, March 9, 2013

Quick headless JAX-RS servers with CXF


If one needs to vend out JSON data in a JAX-RS-compatible way with minimal setup fuss, CXF + Spring provides a good out-of-the-box solution.
The steps would be:
  1. Write your service class (interface and impl preferably)
  2. Annotate your service impl methods with
    1. @Path annotation indicating the URI on which it will serve the resource
    2. @GET/@POST indicating the HTTP method it serves
    3. @Produces("application/json") indicating that the output format is JSON
  3. Define a jaxrs:server directive in your spring context file indicating the address and resource path on which the service is hosted
  4. Add Maven dependencies for javax.ws.rs-api (for the annotations), cxf-rt-core (for stubbing RS communication over an HTTP conduit) and cxf-rt-transports-http-jetty (for the embedded Jetty); a sample pom snippet is shown after the Spring config below
and voila you are done.
Concretely:
public interface SpecialService {
    String getSomeText();
}
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;

public class SpecialServiceImpl implements SpecialService {

    @GET
    @Produces("application/json")
    @Path("/someText/")
    @Override
    public String getSomeText() {
        return "kilo";
    }
}
    <bean id="specialService" class="com.kilo.SpecialServiceImpl"/>

    <bean id="inetAddress" class="java.net.InetAddress" factory-method="getLocalHost" />

    <jaxrs:server id="specialServiceRS"
        address="http://#{inetAddress.hostName}:${com.kilo.restful.port}/specialServiceRS">
        <jaxrs:serviceBeans>
            <ref bean="specialService" />
        </jaxrs:serviceBeans>
    </jaxrs:server>
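For step 4, the Maven side would look roughly like the following. The group/artifact IDs are the ones named above; the version properties are placeholders for whatever release you standardize on, and depending on the CXF version the jaxrs:server namespace support may additionally require cxf-rt-frontend-jaxrs:
    <dependency>
        <groupId>javax.ws.rs</groupId>
        <artifactId>javax.ws.rs-api</artifactId>
        <version>${javax.ws.rs-api.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.cxf</groupId>
        <artifactId>cxf-rt-core</artifactId>
        <version>${cxf.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.cxf</groupId>
        <artifactId>cxf-rt-transports-http-jetty</artifactId>
        <version>${cxf.version}</version>
    </dependency>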
And now hit http://yourhostname:yourportnum/specialServiceRS/someText to get the response as "kilo". If you examine the response via your browser's developer tools, you will see that the content type is application/json.
CXF JAX-RS uses an embedded Jetty as the HTTP container, so we don't really need a Tomcat for this setup. This might bring up the broader question of Tomcat vs. Jetty, and here are my thoughts:
Tomcat
  • Lightweight
  • Servlet 3 style async threading in the works
  • Known beast in terms of configuration
Jetty
  • Even more lightweight
  • Implements servlet 3 style async thread allocation (like node) and hence more responsive and efficient
  • Easy to have an embedded server with cxf (embeddability is synonymous with jetty)
  • Ability to have multiple simple java processes that act as headless servers quickly
Overall, I believe we should give Jetty a chance and see how it performs. If it ever lets us down, it is easy to take the process and house it in a Tomcat container.
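For completeness, since these are meant to run headless, a tiny launcher is all it takes: loading the Spring context starts the embedded Jetty endpoint, and Jetty's non-daemon threads keep the process alive. The context file location here is just an example:
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class SpecialServiceMain {
    public static void main(String[] args) {
        // Loading the context brings up the jaxrs:server endpoint on embedded Jetty.
        ClassPathXmlApplicationContext context =
                new ClassPathXmlApplicationContext("com/kilo/restful-context.xml");
        // Close the context (and the endpoint) cleanly on JVM shutdown.
        context.registerShutdownHook();
    }
}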

We will try to cover some more involved use cases in subsequent posts: passing inputs via JAX-RS, dealing with complex objects, CORS and GZIP (the samples already have them explained).


Friday, March 8, 2013

Staging paradigms universe in MS SQL Server


How do you pass bulk data from the application to a staging table in the database layer in a typical 3-tier java application?
Options:
  1. Use multiple INSERT statements (with iBATIS/spring-jdbc/jdbc)
  2. Use a single INSERT statement with multiple VALUES clauses [SQL Server 2008 onwards] (with iBATIS/spring-jdbc/jdbc)
  3. Use batching of INSERT statements with varying batch sizes (with iBATIS/spring-jdbc/jdbc; see the sketch after the table definition below)
  4. Create a bcp file and use the bcp executable
  5. Create a bcp file and use the BULK INSERT T-SQL command (with iBATIS/spring-jdbc/jdbc; also sketched below)
  6. Create a bcp file and use the OPENROWSET BULK T-SQL command (with iBATIS/spring-jdbc/jdbc)
  7. XML shredding (with iBATIS/spring-jdbc/jdbc)
We will analyze each of these options, benchmark them by staging two data sets of 1k and 100k records with columns spanning a motley of data types, and try to establish best practices on where each approach should be used.
The staging candidate chosen looks something like this:
CREATE TABLE motley
  (
     date           DATETIME,
     name           VARCHAR(50),
     id             INT,
     price          NUMERIC(18, 4),
     amount         NUMERIC(18, 4),
     fx_rate        NUMERIC(18, 7),
     is_valid       TINYINT,
     knowledge_time DATETIME
  )
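To make options 3 and 5 concrete, here are rough sketches of each against the motley table. The spring-jdbc version assumes a hypothetical MotleyRecord holder class, and the BULK INSERT file path and terminators are purely illustrative (the file has to be readable from the SQL Server host); treat both as starting points rather than tuned implementations.
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.List;

import org.springframework.jdbc.core.JdbcTemplate;

public class MotleyStagingDao {

    private static final String INSERT_SQL =
            "INSERT INTO motley ([date], name, id, price, amount, fx_rate, is_valid, knowledge_time) "
                    + "VALUES (?, ?, ?, ?, ?, ?, ?, ?)";

    private final JdbcTemplate jdbcTemplate;

    public MotleyStagingDao(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public void stage(List<MotleyRecord> records) {
        // One JDBC batch per call; chunk the incoming list yourself to experiment with batch sizes.
        List<Object[]> batchArgs = new ArrayList<Object[]>();
        for (MotleyRecord r : records) {
            batchArgs.add(new Object[] { r.getDate(), r.getName(), r.getId(),
                    r.getPrice(), r.getAmount(), r.getFxRate(), r.getIsValid(),
                    new Timestamp(System.currentTimeMillis()) });
        }
        jdbcTemplate.batchUpdate(INSERT_SQL, batchArgs);
    }
}
And option 5 boils down to a single statement once the flat file is in place:
BULK INSERT motley
FROM 'D:\staging\motley.dat'
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', TABLOCK)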
The stats were as follows:



As you may imagine, there are pros and cons to each approach, which I have tried to outline here.

Feel free to suggest ways to further fine-tune each of these approaches, propose brand new ones, or simply comment on the comparison metrics or factors.
Hope this helps!