Linked Open Data and Digital Humanities: more messiness, but more possibilities, too

Posted: March 12, 2015 | Author: Paige Morgan
Last week, I wrote about a few of the things that make linked open data (aka RDF) so attractive for digital humanities projects, and some of the reasons that RDF (and its more complex sibling, OWL) is a challenging platform for researchers to work with.
Today, I want to address one more of the challenges, but also say a little more about DPLA’s Heidrun, and how it might make things better (as I understand what it’s trying to do).
For a particular computer function (or set of functions) to become widely used in DH, it needs to be supported by a range of different resource types:
- detailed documentation of what to do and how to troubleshoot;
- simpler, user-friendly tutorials that walk people through the process of getting started;
- tools that simplify the function(s) enough for people to mess around and experiment, and easily show other people what they've been doing;
- tools that allow the construction of a graphical user interface (GUI) so that people other than the creator(s) can play with the tool.
It’s the third of these that linked open data especially lacks right now. Protégé and WebProtégé (produced by Stanford) make it pretty easy to start adding classes and properties, drawing on some of the more prevalent ontologies (skos, foaf, dc, etc.). Franz’s AllegroGraph also makes this process easy (though personally, I’ve found it a bit buggy to get working, along with its browser, Gruff). Jena and Marmotta (both Apache projects) are large-scale triplestores (servers where you can store the triples you’ve created). I have yet to successfully get Jena going, though I did get Marmotta up and running without too much difficulty last weekend. There are other up-and-coming tools, too: Dydra and K-Infinity are both trying to make working with RDF easy for newbies.
Unfortunately, structuring your data and getting it into a triplestore is only part of the challenge. To query it (which is really the point of working with RDF, and which you need to do in order to make sure that your data structure works), you need to know SPARQL. But a SPARQL query will return a page of URIs (uniform resource identifiers, which often take the form of web addresses). To get data out of your triplestore in a more user-friendly and readable format, you need to write a script in something like Python or Ruby. And that still isn’t any sort of graphical user interface for users who aren’t especially tech-savvy.
In short: understanding the theory of RDF and linked open data isn’t too difficult. Understanding all the moving parts involved in implementation is much hairier.
And: as smarter people than I have said, DH isn’t just about tech skill & knowledge. Part of the field’s vitality comes from humanists asking questions and wanting to do things, without the fetters of expertise structuring their idea of what is possible.
Even writing this post and the previous two, I’m aware of the possibility that what I really ought to do is go back, and start writing more simplified commentary on RDF and linked open data that helps digital humanists/digital scholars make more sense of all the implementation details: what I’ve learned so far, and what more I learn, as I learn it. I’m also aware that such commentary isn’t a tool — so it doesn’t really let people get their hands dirty and play around.
Anyways: on to DPLA’s Heidrun and Krikri. They’re named after goats (curious animals that will try to eat anything) because their job is to ingest metadata and integrate what they consume into DPLA’s linked open data structure. They’re intended to grab data from metadata hubs, like HathiTrust and the National Archives. There’s a good article in D-Lib Magazine titled “On Being a Hub” that explains more about the work involved; or you can read the DPLA’s guidelines for becoming a hub.
I have to admit — when I saw the announcement about Heidrun, I took “try to eat anything” too literally, and thought that the DPLA was working to ingest metadata from more than just official hubs. I was wrong, and even if I hadn’t been, Visible Prices is a long way from being ready to become a hub. However: I’m still very excited about Heidrun’s existence, because it looks like the DPLA is working on finding good ways to harvest and integrate rich and complex metadata from all sorts of cultural/heritage organizations — so, not just bibliographic metadata. Working towards that harvest and integration should raise awareness of existing ontologies that are constructed for or well-suited to humanities data — and quite possibly encourage the development of new ontologies, when appropriate.
And: the work involved in making Heidrun a success will, I think, be applicable/useful in developing the tools that digital humanists need to really start exploring the potential of linked open data. It would certainly be to DPLA’s advantage if more humanities and heritage professionals were able to develop confidence and competence with it, so that the DPLA would have more to ingest, and so that it would be more likely that the metadata being ingested was well-formed.
This is a tiny thing to be hopeful about, but I think it’s worth documenting: both to make the stakes/challenges of working with linked open data more transparent, and because I’m fascinated by the way that platform development often see-saws between massive organizational (enterprise-level) users and individual users. I’m not yet ready to write eloquently about that, but perhaps in 5-10 years I will be, and I want this post to be a record of my thoughts at this point in time.