DSpace REST API testing

While funded by JISC via Mimas, Hedtek has recently been providing assistance to Jorum, the UK’s Open Educational Resources repository; this ranges from architectural to development assistance, all aimed at transforming the Jorum user experience.

Jorum is built on the DSpace repository, and part of our work involves building a new front end to Jorum. For this we need an API to DSpace. Conveniently, a DSpace REST API [1] module is available, but it has not been used with the version of DSpace that Jorum uses and, until recently, it had no automated tests. Given the centrality of this API for Jorum’s future development, we have started developing a suite of automated tests for it. This post discusses progress and says where our tests can be found on GitHub.

Because the first front-end work we intend to perform is for OER resource discovery and download, so far we have only focussed on the ‘read’ functionality of the API. For this we have tested endpoints (URLs) that focus on read capabilities for:

  • communities
  • collections
  • items
  • bitstreams
  • searching
  • metadata harvest

For all of these, we wrote integration tests that exercise the API endpoints running on DSpace, and then re-ran the tests on a full Jorum build.

The test development process

To implement the tests, we had to create a framework that would run the DSpace REST API using Jetty and control this programmatically.
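The general shape of such a harness can be sketched as follows. This is only an illustration of the pattern (start a server programmatically, issue requests, shut it down), written with Python's standard-library HTTP server as a stand-in for embedded Jetty; it is not our framework. The `StubApiHandler` and `ServerUnderTest` names and the canned reply are hypothetical.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class StubApiHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the REST API: echoes the requested path as JSON."""

    def do_GET(self):
        body = json.dumps({"path": self.path}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the test output quiet


class ServerUnderTest:
    """Starts the server before a test and stops it afterwards."""

    def __enter__(self):
        # Port 0 asks the OS for any free port, so concurrent runs do not clash.
        self.server = ThreadingHTTPServer(("127.0.0.1", 0), StubApiHandler)
        self.port = self.server.server_address[1]
        self.thread = threading.Thread(target=self.server.serve_forever, daemon=True)
        self.thread.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.server.shutdown()      # stop the serve_forever() loop
        self.server.server_close()  # release the listening socket
        self.thread.join()

    def get_json(self, path):
        """Issue a GET request and return (status code, parsed JSON body)."""
        conn = HTTPConnection("127.0.0.1", self.port, timeout=5)
        try:
            conn.request("GET", path)
            response = conn.getresponse()
            return response.status, json.loads(response.read().decode("utf-8"))
        finally:
            conn.close()
```

A test would wrap its requests in `with ServerUnderTest() as api:` and call `api.get_json("/communities")`, so the server is always shut down, even when an assertion fails.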

We also needed to load database fixtures in order to have a known DSpace state to test against. When we started writing tests, loading SQL fixture files was problematic. We went through several iterations in which a given means of loading our fixtures would work for all our existing tests and then, frustratingly, fail for the very next test we tried. Eventually we discovered that the appropriate way of loading a fixture is via the DatabaseManager class in DSpace.

We also needed to trace a bug in the DSpace REST API code where database connections weren’t being closed. This bug caused intermittent freezes during our test runs, which in turn made a stable test framework impossible while the bug existed.

Later, fixture creation became quite intricate when we needed to test that file downloads were correct, as this involved finding and editing download directories in the data. This was solved neatly by creating a command line script that generates a database dump from a DSpace database, processes it, and replaces all the download paths with a Maven token, which is in turn replaced with the actual file path location when the tests are run.
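The substitution step of such a script might look roughly like the sketch below. It covers only the path rewriting, not the dump itself, and it is an illustration rather than our actual script: the token name `${test.assetstore.dir}` and both function names are made up for the example.

```python
import re

# Hypothetical Maven-style token; the real token name may differ.
ASSETSTORE_TOKEN = "${test.assetstore.dir}"


def tokenise_paths(dump_sql: str, install_dir: str, token: str = ASSETSTORE_TOKEN) -> str:
    """Replace every occurrence of a machine-specific directory with a token."""
    # Strip a trailing slash so "/opt/dspace/" and "/opt/dspace" behave alike.
    prefix = install_dir.rstrip("/")
    # Only match the prefix where a path component ends, so "/opt/dspace"
    # does not also rewrite "/opt/dspace-old".
    pattern = re.compile(re.escape(prefix) + r"(?![\w.-])")
    # A function replacement avoids any escape processing of the token text.
    return pattern.sub(lambda _match: token, dump_sql)


def resolve_token(fixture_sql: str, actual_dir: str, token: str = ASSETSTORE_TOKEN) -> str:
    """Mimic what build-time filtering does: put a real directory back."""
    return fixture_sql.replace(token, actual_dir.rstrip("/"))
```

Keeping the token in the committed fixture means the same file works on any machine, because the real path is only filled in at build time.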

To test search, we also needed to create and load a Lucene index for DSpace. This was primarily done by using DSpace to construct a more intricate test fixture for the search actions and then taking a copy of the Lucene index from the DSpace installation. The index is then copied into place for use by the embedded Jetty server during the test run.

The tests themselves do a lot of validation of JSON structure. This is done with the aid of a simple JSON library which processes the JSON into JSONObject and JSONArray instances. This was a good starting point and was used to write the first few tests. As more tests were written, there was a lot of messy test fixture matching going on. To reduce this, we refactored our tests and started building up a set of matchers and matcher methods that allowed us to write tests much more quickly. The end result of this process was the beginning of a DSpace-specific test matcher library. This process could be taken further, but the stage we got to was enough to enable us to write tests quickly and with a minimum of effort.
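To give a flavour of what composable matchers buy you, here is a small sketch in Python. It is not our Java matcher library; the matcher names are invented, and the community shape (an object with `id`, `name` and a list of `collections`) is an assumption made for the example.

```python
import json


def has_keys(*keys):
    """Matcher: the value is a JSON object containing all the given keys."""
    def match(value):
        if not isinstance(value, dict):
            return f"expected an object, got {type(value).__name__}"
        missing = [k for k in keys if k not in value]
        return f"missing keys: {missing}" if missing else None
    return match


def list_of(item_matcher):
    """Matcher: the value is a JSON array whose items all satisfy item_matcher."""
    def match(value):
        if not isinstance(value, list):
            return f"expected an array, got {type(value).__name__}"
        for index, item in enumerate(value):
            problem = item_matcher(item)
            if problem:
                return f"item {index}: {problem}"
        return None
    return match


def field(name, matcher):
    """Matcher: the object has field `name` and its value satisfies matcher."""
    def match(value):
        if not isinstance(value, dict) or name not in value:
            return f"missing field {name!r}"
        problem = matcher(value[name])
        return f"{name}: {problem}" if problem else None
    return match


def all_of(*matchers):
    """Matcher: the value satisfies every matcher; reports the first failure."""
    def match(value):
        for matcher in matchers:
            problem = matcher(value)
            if problem:
                return problem
        return None
    return match


# Assumed shape of a community, for illustration only.
is_community = all_of(
    has_keys("id", "name", "collections"),
    field("collections", list_of(has_keys("id", "name"))),
)


def assert_matches(json_text, matcher):
    """Parse a response body and fail with a readable message on a mismatch."""
    problem = matcher(json.loads(json_text))
    assert problem is None, problem
```

Each matcher returns `None` on success or a description of the first mismatch, so a failing test says where in the structure the problem lies instead of just reporting that two large JSON documents differ.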

Running the tests

The tests are built and run using Maven, so if you want to run the tests yourself (e.g. on a new DSpace release), you just need to get a copy of the REST API with our tests added, available on GitHub.

Once you have the code, you will need to create a PostgreSQL database for the test run. The script ‘create_integration_test_db’ helps with this. Once the database is set up, you can run the tests with the command ‘mvn test’.

[Screenshot: the end of a successful test run]

There is a lot of ‘noise’ in the output of a test run. Unfortunately, this is caused by various parts of the REST module logging directly to the console rather than using a logging framework. Hopefully this will be fixed in future development on the module.

[1] We would prefer to call the API ‘RESTful’ rather than a REST API, since it does not follow the HATEOAS (Hypermedia as the Engine of Application State) principle. For further details see any good reference on REST.



  1. Posted December 2, 2011 at 12:13 pm | Permalink

    Notes on test results
    1. Didn’t test sorting in the list returned in index queries [ie list of communities, list of collections]

    2. Request parameter “detail” doesn’t seem to be doing anything [in get and list communities, the only place this was tested]. Results appear the same for the three specified values “extended”, “standard” and “minimum”.

    3. “Failure processing entity request” error in many places in the API due to a badly handled context object. FIXED in many places in the DSpace REST API in our branch on GitHub.

    4. Copyright and Intro text sections returned null in the API for /collections request. The issue was FIXED in our branch.

    5. /collections/id/invalid_element should return BAD REQUEST error code but returns a response with NULL in the data attribute.

    6. When fetching items, the field “collection” has a field “countItems” that is set to 0. This should be set to number of items in collections. The issue was FIXED in our branch.

    7. In retrieving a specific item, the attribute “communities” should have the listed item in the “Recent Submissions” array within the community. But it does not have this.

    8. When you retrieve a particular element for any entity, and the data for the element is of a simple data type (string/number/boolean), it is returned under the key “data” in the response object. However, if the data to be returned is another entity (in the form of another JSON object), then the keys of that object are returned as part of the response JSON object itself, instead of appearing as a JSON object value for the “data” field. For example, if we request the “anchester” element on a specific community with a request such as GET community/1/anchester, we get the following structure:

    “name”: ….,
    “id”: …,
    “collections”: …,

    “entityURL”: …,
    “entity”: …
    Whereas the expected behaviour should be:

    “data”: {
    “id”: …,
    “collections”: …,

    “entityURL”: …,
    “entity”: …
    }
    NOTE: This behaviour is not exhibited when the requested element is a list of entities. (eg. collections).

    9. Elements with “null” in data return with a “Not Found Exception” and a 404 status code rather than response JSON object with “data” key with null value. We are not sure whether this is the expected behaviour or a bug.

    10. Handle for bitstreams is null. Is there supposed to be one?

    11. Result count in search results returns “actual_count – 1”. So if the search returns 10 items then the item count will be 9.

    12. Harvest request with “withdrawn=true” parameter returns items with withdrawn as false as well.
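Notes 8, 11 and 12 lend themselves to small, reusable checks. The following Python sketch shows one possible form; the function names are invented for illustration, and the `withdrawn` key on harvested items is an assumption about the response shape rather than something confirmed above.

```python
def unwrap_data(response):
    """Note 8: return the requested element whether or not it is wrapped in "data"."""
    if isinstance(response, dict) and "data" in response:
        return response["data"]
    # Observed behaviour for entity-valued elements: the entity's keys are inlined.
    return response


def count_is_consistent(reported_count, results):
    """Note 11: the reported count should equal the number of results returned."""
    return reported_count == len(results)


def unexpected_harvest_items(items, withdrawn=True):
    """Note 12: items whose (assumed) "withdrawn" flag differs from the requested value."""
    return [item for item in items if item.get("withdrawn") != withdrawn]
```

A harvest test for “withdrawn=true” would then assert that `unexpected_harvest_items(items)` is empty, and a search test that `count_is_consistent(count, results)` holds.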

  2. Posted February 22, 2012 at 11:15 pm | Permalink

    Why both a SWORD and a REST API for DSpace? If something is missing from SWORD, shouldn’t it be added there?

  3. Posted April 12, 2012 at 8:09 pm | Permalink

    Hi Christophe, we were concerned with building a front-end service for DSpace. SWORD is very specifically a protocol for deposit only, so we use the DSpace REST API to retrieve resource data from DSpace.
