Thursday, October 23, 2014

SPARQL Integration tests with jena-nosql

In a previous post I illustrated how to set up a working environment with jena-nosql using either Apache Solr or Apache Cassandra. Now it's time to write some integration tests. 

The goal of the jena-nosql project is to have Apache Jena, one of the most popular RDF frameworks, bound with your favourite NoSQL database.

Among the many things Jena can do, SPARQL definitely plays an important role, so in my project I want to make sure the data model of the underlying pluggable storages can efficiently support all the query language features.

As a first step, I need an integration test that runs verifiable SPARQL examples. To do that, I will set up two Jena Models in the @Before method, the first one coming from jena-nosql:

final StorageLayerFactory factory = StorageLayerFactory.getFactory();
final Dataset dataset = DatasetFactory.create(factory.getDatasetGraph());   
final Model jenaNoSqlModel = dataset.getDefaultModel();   

and a second using the default in-memory Jena Model:

final Model inMemoryModel = DatasetFactory.createMem().getDefaultModel();   

Now what I need is a set of verifiable scenarios, each one consisting of
  • one or more datasets to load
  • a query
  • the corresponding query results
I would need this "triplet" for each scenario... and as you can imagine, that's a huge amount of work!

Fortunately, some time ago I bought a cool book, "Learning SPARQL", which has a lot of downloadable examples. After taking another quick look, I realized that was exactly what I needed :)

Each example in the book is associated with three files:
  • a file containing a short dataset
  • a file containing a query
  • a file containing results in a human readable way
Perfect! I don't need the third file because the verification can be done by comparing the results of the load / query sequence on both the jena-nosql model and the in-memory model (assuming the Jena in-memory model works correctly).
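Since the third file can be dropped, a scenario boils down to one or more dataset files plus a query file. A minimal sketch of how that pair could be represented follows; the Scenario class, its accessors and the file names are hypothetical and are not part of jena-nosql or of the book's examples:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical value type (not part of jena-nosql): one test scenario
// is a query file plus the dataset files that must be loaded first.
final class Scenario {
    private final String queryFile;
    private final List<String> datasetFiles;

    Scenario(final String queryFile, final String... datasetFiles) {
        this.queryFile = Objects.requireNonNull(queryFile, "queryFile");
        if (datasetFiles.length == 0) {
            // A scenario needs at least one dataset to load.
            throw new IllegalArgumentException("At least one dataset file is required");
        }
        // Defensive copy so later changes to the caller's array have no effect.
        this.datasetFiles = Collections.unmodifiableList(Arrays.asList(datasetFiles.clone()));
    }

    String queryFile() {
        return queryFile;
    }

    List<String> datasetFiles() {
        return datasetFiles;
    }
}
```

With something like this, a test could iterate over a list of scenarios, for example new Scenario("query.rq", "data.ttl"), where both file names are placeholders.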

So before running each scenario, both models are loaded with the example dataset:

, ...);
, ...);

// Make sure data has been loaded and graphs are isomorphic

Once that's done, it's time to execute the query associated with the example and then verify the results on both models:

final Query query = QueryFactory.create(readQueryFromFile(queryFile));
assertTrue(
    ResultSetCompare.isomorphic(
        QueryExecutionFactory.create(query, jenaNoSqlModel).execSelect(),
        QueryExecutionFactory.create(query, inMemoryModel).execSelect()));

For simplicity, I'm not handling Jena resources such as QueryExecution properly here (you should close them in a finally block), and I didn't write any exception-handling code.
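The readQueryFromFile helper used in the snippet above is not shown in this post. A minimal sketch of what it might look like, assuming the query file is identified by a String path and encoded as UTF-8 (both are assumptions, as is the QueryFiles holder class):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical holder class; only the method name comes from the test snippet.
final class QueryFiles {
    private QueryFiles() {
        // Static helper only, no instances.
    }

    // Reads the whole file into a String, assuming UTF-8 content.
    // I/O failures are rethrown unchecked so the test simply fails.
    static String readQueryFromFile(final String queryFile) {
        try {
            return new String(Files.readAllBytes(Paths.get(queryFile)), StandardCharsets.UTF_8);
        } catch (final IOException exception) {
            throw new UncheckedIOException(exception);
        }
    }
}
```

The returned String can be passed directly to QueryFactory.create, as in the snippet above.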

I'm still working on that, but if you want to have a quick look here's the code. As explained in previous posts you can run this test against one of the available storages (Solr or Cassandra):

   > mvn clean install -P cassandra-2x 
   > mvn clean install -P solr-4x