Tuesday, February 10, 2015

SPARQL Integration tests with SolRDF

Last year I got the chance to contribute to a wonderful project, CumulusRDF, an RDF store on a cloud-based architecture. The integration test suite was one of the most interesting tasks I worked on.
 
There, I used JUnit to run some examples coming from Learning SPARQL by Bob DuCharme (O'Reilly, 2013). Both O'Reilly and the author (BTW, thanks a lot) gave me permission to use them in the project.

So, when I set up the first prototype of SolRDF, I wondered how I could create a complete (integration) test suite to do more or less the same thing...and I came to the obvious conclusion that some of that work could be reused.

Something had to be changed, mainly because CumulusRDF uses Sesame as its underlying RDF framework, while SolRDF uses Jena...but in the end it was a minor change...they are both valid, easy and powerful.

So, for my LearningSparql_ITCase I needed:
  • A setup method for loading the example data;
  • A teardown method for cleaning up the store; 
The example data is provided on the Learning SPARQL website in several files. Each file contains a small dataset, a query, or an expected result (in tabular format). So, returning to my tests, the flow is: load the small dataset X, run the query Y and verify the results Z.
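Just to give an idea of the overall structure, here's a minimal sketch of how such a test class could be laid out. It's only a sketch: class, method and field names below are illustrative (assuming JUnit 4 and the Jena 2.x packages of that time), not the actual SolRDF test code.

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import com.hp.hpl.jena.query.Dataset;
import com.hp.hpl.jena.query.DatasetAccessor;

public class LearningSparqlIntegrationTest {

    // Local in-memory dataset, acting as the reference implementation
    Dataset memoryDataset;

    // Client-side proxy of the remote (SolRDF) dataset
    DatasetAccessor remoteDataset;

    @Before
    public void setUp() {
        // load the example file both in the memory dataset and in SolRDF
        // (see the loading snippet below)
    }

    @Test
    public void queryWithPrefixes() {
        // run the same query against both datasets and compare the result sets
        // (see the query snippet below)
    }

    @After
    public void tearDown() {
        // clear the remote store, so that each test starts from scratch
        // (see the clean-up snippet at the end of this post)
    }
}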

Although this post illustrates how to load a sample dataset into SolRDF, that is something you would do from the command line, not within a JUnit test. In my tests, instead, I load the data into SolRDF using Jena, with these few lines:

// DatasetAccessor provides access to remote datasets
// using the SPARQL 1.1 Graph Store HTTP Protocol
// (the argument is the Graph Store endpoint URI of the SolRDF instance)
DatasetAccessor dataset = DatasetAccessorFactory.createHTTP(GRAPH_STORE_ENDPOINT_URI);

// Load the example data in a local, in-memory model
Dataset memoryDataset = DatasetFactory.createMem();
Model memoryModel = memoryDataset.getDefaultModel();
memoryModel.read(dataURL, ...);

// Add the content of the memory model to the remote dataset
dataset.add(memoryModel);

Ok, the data has been loaded! In another post I will explain what I did in SolRDF to support the SPARQL 1.1 Graph Store HTTP Protocol. Keep in mind that the protocol is not fully covered at the moment.
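Just to give an idea of what happens on the wire, the dataset.add(memoryModel) call above roughly translates into a Graph Store Protocol request that POSTs the serialized model to the default graph. Something like the following (the host, the payload and the content type are purely illustrative; /rdf-graph-store is the path you will also see in the logs at the end of this post):

POST /rdf-graph-store?default HTTP/1.1
Host: example.com
Content-Type: text/turtle

...the triples of the memory model, serialized in Turtle...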

Now it's time to run a query and check the results. As you can see, I execute the same query twice: first against the memory model, then against SolRDF. In this way, assuming Jena's in-memory model works perfectly, I can check and compare the results coming from the remote dataset (i.e. coming from SolRDF):

final Query query = QueryFactory.create(readQueryFromFile(...));
QueryExecution execution = null;
QueryExecution memExecution = null;
try {
    execution = QueryExecutionFactory.sparqlService(SOLRDF_URL, query);
    memExecution = QueryExecutionFactory.create(query, memoryDataset);

    ResultSet rs = execution.execSelect();
    ResultSet mrs = memExecution.execSelect();

    // The two result sets must be isomorphic
    assertTrue(ResultSetCompare.isomorphic(rs, mrs));
} catch (final Exception exception) {
    ...
} finally {
    // Close both query executions
    if (execution != null) execution.close();
    if (memExecution != null) memExecution.close();
}

After that, the RDF store needs to be cleared. Although the Graph Store protocol would come to our aid here, it cannot be implemented in Solr, because some HTTP methods (i.e. PUT and DELETE) cannot be used in RequestHandlers. SolrRequestParsers, which is the first handler of incoming requests, allows those methods only for /schema and /config requests. So, while a clean-up could easily be done with something like this:

dataset.deleteDefault();

Or, in HTTP:

DELETE /rdf-graph-store?default HTTP/1.1
Host: example.com

I cannot implement such behaviour in Solr. After checking the SolrConfig and SolrRequestParsers classes, I believe the simplest (although non-RDF) way to clean up the store is:

SolrServer solr = new HttpSolrServer(SOLRDF_URI);
solr.deleteByQuery("*:*");
solr.commit();

I know, that has nothing to do with RDF or with the Graph Store protocol, but I don't want to change the Solr core, and at the moment it represents a good compromise. After all, that code lives only in my unit tests.
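For completeness, wrapped in the test teardown that could look more or less like this (again, just a sketch: the method name and the SOLRDF_URI constant are placeholders):

@After
public void clearRemoteStore() throws Exception {
    // Wipe out the whole Solr index between tests: nothing RDF-ish here,
    // but it keeps each test isolated from the others.
    SolrServer solr = new HttpSolrServer(SOLRDF_URI);
    solr.deleteByQuery("*:*");
    solr.commit();
    solr.shutdown();
}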

That's all! I just merged all that stuff into master, so feel free to have a look. If you want to run the integration test suite you can do that from the command line:

# cd $SOLRDF_HOME
# mvn clean install

or in Eclipse, using the predefined Maven launch configuration solrdf/src/dev/eclipse/run-integration-test-suite.launch. Just right-click on that file and choose "Run as...".

Regardless of the way you choose, you will see messages like these:

(build section)

[INFO] -----------------------------------------------------------------
[INFO] Building Solr RDF plugin 1.0
[INFO] -----------------------------------------------------------------


(unit tests section)

-------------------------------------------------------
 T E S T S
-------------------------------------------------------

...
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.691 sec
Tests run: 15, Failures: 0, Errors: 0, Skipped: 0

(cargo section. It starts the embedded Jetty)

[INFO] [beddedLocalContainer] Jetty 7.6.15.v20140411 Embedded starting...
...
[INFO] [beddedLocalContainer] Jetty 7.6.15.v20140411 Embedded started on port [8080]

(integration tests section)

------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.gazzax.labs.solrdf.integration.LearningSparql_ITCase
[INFO]  Running Query with prefixes test...
[INFO]  [store] webapp=/solr path=/rdf-graph-store params={default=} status=0 QTime=712
...

[DEBUG] : Query type 222, incoming Accept header... 

(end)

[INFO]  [store] Closing main searcher on request.
...

[INFO] [beddedLocalContainer] Jetty 7.6.15.v20140411 Embedded is stopped
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 42.302s
[INFO] Finished at: Tue Feb 10 18:19:21 CET 2015
[INFO] Final Memory: 39M/313M
[INFO] --------------------------------------------------


Best,
Andrea 
