Summer 2014 update

Last summer, at DHOXSS, I took John Pybus and Kevin Page’s Linked Open Data course, and knew that I’d finally found the platform that would work for Visible Prices. However, a week-long course does not a semantic web programmer make. I was ready to tackle the ontological thinking (the relationships between different pieces of data) — but I didn’t have a sandbox to do it in.

The natural platform to use would be Apache Jena and Fuseki — they’re free and open-source, after all. However, I consistently ran into errors while trying to get them installed on my Mac, and all the how-to guides and StackOverflow pages got me nowhere. I’m not surprised: I’m not a professional dev, and back-end, server-side work isn’t my strong point, mainly for lack of training.

After several frustrating sessions, I remembered that when I’d been looking around at different graph database platforms, I’d signed up for a beta account with Dydra, and went back to the site to explore. It turns out that it’s exactly what I’m looking for as a sandbox — they handle the system administration and maintenance, and I can simply start uploading triples and road-testing the structure. I can also download what I put in — I still own my data.

Dydra isn’t open-source, so it may not be a long-term solution — but I’ll deal with that when I’ve made more progress. For now, I’m utterly thrilled to have found a set-up that allows me to move forward.

Progress Update: Winter 2014

The biggest news is that Visible Prices has been awarded a small grant from the European Association for Digital Humanities, which will fund consultations with an ontologist so that I can design the structure more effectively. Those consultations are currently underway. I’m also working on finding a good host for the Visible Prices triplestore (a key aspect of its migration into RDF).

The most recent demo of Visible Prices is in Scalar, the free platform developed for digital humanities projects by USC:

Visible Prices in Scalar

Activity on this site will be a little bit slow in the next few months as I am making final revisions to my dissertation, which I will defend in June 2014.

Visible Prices: Technical Statement

(cross-posted from Visible Prices in Scalar)

Visible Prices is, in many ways, a simple project — so simple that when I first had the idea, about one-third of the people I spoke with said that surely, it had been done already. Another third counseled me not to tell anyone about it, lest Google or Microsoft steal my idea, and do it themselves. After all, it was just a database, making connections between things that cost the same amount of money. It shouldn’t be that hard. Surely, people suggested, there were existing software programs that could do it.
In fact, there weren’t software programs that would work, though I’ve certainly made an effort to find them. Part of the issue was that any platform would need to be equipped to deal with pre-decimal British currency values. However, there was, and is, a larger problem: most of the existing database and cataloging platforms for humanities projects are intended to work with bibliographic metadata. Visible Prices, by contrast, is intended to work with the contents of the texts themselves. Those contents don’t conform neatly to the existing metadata parameters. Thus, the collection has been, and continues to be, a project that needs to be built from scratch.

One of the major challenges of working with economic data is its heterogeneity. Some prices will be fixed, and others variable. Some wages will include non-monetary supplements: 6 shillings a day, plus beer and potatoes. XML-based encoding languages (which have been highly popular for digital humanities and scholarly editing projects) assume that the things you’re encoding (plays by Shakespeare, poems by late Victorians) will look relatively similar, so that the same tags can be used to describe them.

MySQL, and other relational databases, which use tables rather than markup language, assume that if you have a table with 5 columns describing certain qualities, most of the entries in the table should have data in each column. Otherwise, the structure becomes rickety: queries become harder to construct, and more likely to return errors or incomplete results.

Choosing a platform has been a long process, because learning enough to evaluate how well a particular tool will work with the data is slow work. It seems probable that if I had been willing to limit my scope, say to prices related to governesses’ salaries or to prices in a particular author’s body of work, MySQL or TEI might have worked more effectively. But a smaller project, while more immediately gratifying, wouldn’t have taught me nearly as much as I’ve learned in the past few years.

In July 2013, I attended the Digital Humanities Oxford Summer School, and took Kevin Page and John Pybus’ course in semantic web programming, focusing on two closely related specifications: RDF (Resource Description Framework) and OWL (Web Ontology Language). Both RDF and OWL are intended to model complex data. You’ve encountered them before — they provide the structure behind resources like DBpedia (the linked-data counterpart of Wikipedia), and the databases of music metadata that services like iTunes use to identify your CDs. Semantic web description attempts to capture as much detail as it can, and make it searchable.

The basic unit of RDF is called a triple: it contains a subject, a predicate, and an object. For example:

Subject: Jane Eyre
Predicate: hasAuthor
Object: Charlotte Bronte

Subject: Charlotte Bronte
Predicate: hasBirthdate
Object: April 21, 1816

Triples are linked together to form what semantic web programmers call a graph. (To outsiders, it looks more like a cluster). For example, the graph for Charlotte Bronte would involve both of the above triples (as well as several others, including triples that would tell you that Bronte has two sisters, used the pseudonym Currer Bell, etc.).
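As a minimal sketch of the idea, here is a tiny graph modeled in Python, with triples as plain (subject, predicate, object) tuples. The names and facts are the illustrative Bronte examples above, not data from Visible Prices.

```python
# A tiny graph of (subject, predicate, object) triples.
graph = [
    ("Jane Eyre", "hasAuthor", "Charlotte Bronte"),
    ("Charlotte Bronte", "hasBirthdate", "April 21, 1816"),
    ("Charlotte Bronte", "hasPseudonym", "Currer Bell"),
    ("Charlotte Bronte", "hasSister", "Emily Bronte"),
    ("Charlotte Bronte", "hasSister", "Anne Bronte"),
]

# Everything the graph says about one subject — the "cluster"
# of facts that radiates out from Charlotte Bronte.
facts = [(p, o) for (s, p, o) in graph if s == "Charlotte Bronte"]
for predicate, obj in facts:
    print(predicate, "->", obj)
```

In a real triplestore, the subjects, predicates, and objects would be URIs drawn from published vocabularies rather than bare strings, but the shape of the data is the same.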

Triples are queried using SPARQL (SPARQL Protocol and RDF Query Language, pronounced “sparkle”): a language that matches patterns against the subject, predicate, or object of a triple (any one, any two, or all three) and returns the information that matches.

So, you might write a SPARQL query that asks for all the novels that match the “hasAuthor” predicate with “Charlotte Bronte.” Alternately, you might write a query that asks for all the novels written by authors with pseudonyms; and you might specify that the pseudonym include the name “Bell.” This would return the Bronte sisters’ works – and the works of any other authors whose pseudonym included “Bell.”
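To make the pattern-matching idea concrete, here is a toy version of SPARQL-style triple matching in plain Python, where `None` plays the role of a SPARQL variable. The data is illustrative, not drawn from the project.

```python
graph = [
    ("Jane Eyre", "hasAuthor", "Charlotte Bronte"),
    ("Shirley", "hasAuthor", "Charlotte Bronte"),
    ("Wuthering Heights", "hasAuthor", "Emily Bronte"),
    ("Charlotte Bronte", "hasPseudonym", "Currer Bell"),
    ("Emily Bronte", "hasPseudonym", "Ellis Bell"),
]

def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None matches anything."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "All novels with hasAuthor = Charlotte Bronte":
novels = [t[0] for t in match(graph, p="hasAuthor", o="Charlotte Bronte")]

# Two patterns joined, like a two-pattern SPARQL query: novels whose
# author has a pseudonym containing "Bell".
by_bells = [t[0] for t in match(graph, p="hasAuthor")
            if any("Bell" in pseud[2]
                   for pseud in match(graph, s=t[2], p="hasPseudonym"))]
```

The first query returns Jane Eyre and Shirley; the second returns all three novels, because both sisters’ pseudonyms include “Bell.”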

The advantage of OWL, and of other semantic web specifications, is that they can handle my highly heterogeneous data without breaking. They balance structure with flexibility, so they can model data from sources with significantly different types of metadata, and represent detailed features of the source texts without sacrificing expressivity. This makes OWL and semantic web programming a good fit for Visible Prices.

Like TEI, which encourages users to customize its markup for their needs and to develop new terms and categories, OWL allows users to develop new vocabularies for particular subjects, and to share them, making them available for other, similar projects. This is why semantic web programming is often referred to as “linked open data.” It’s meant to be open and shareable: if another scholar developed a digital humanities project focusing only on Charlotte Bronte, they could use my data on the prices that appear in Bronte’s novels.

The vocabularies that semantic web programmers develop are called ontologies, because they define concepts and relationships within a specific area. If you’ve worked with metadata, then you may have made use of the Dublin Core ontology.

A semantic web database can also make an awkward set of data usable. One of the difficulties of building Visible Prices has been the non-decimal currency values of British money before 1971. No existing database indexes these currency values, from one farthing upwards. As a result, the values you might see in texts (1 shilling, or 5 pounds, or 20 guineas) are arguably data, but they’re not good data, because they’re much harder to work with. Making such an index would be somewhat tedious (though much of the process could be automated), but once created, it would transform currency amounts from unwieldy strings into usable objects.
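The core of such an index is a normalization step: converting every pre-decimal amount to a single smallest unit so that heterogeneous values become directly comparable. A minimal sketch in Python, using the standard pre-1971 relationships (4 farthings = 1 penny, 12 pence = 1 shilling, 20 shillings = 1 pound, 1 guinea = 21 shillings); the function and table names are my own, not part of the project:

```python
# Farthings per unit of pre-decimal British currency.
FARTHINGS_PER = {
    "farthing": 1,
    "penny": 4,
    "shilling": 12 * 4,    # 48
    "pound": 20 * 12 * 4,  # 960
    "guinea": 21 * 12 * 4, # 1008
}

def to_farthings(amount, unit):
    """Normalize an amount to farthings, the smallest common unit."""
    return amount * FARTHINGS_PER[unit]

# "6 shillings" and "72 pence" index to the same underlying value:
assert to_farthings(6, "shilling") == to_farthings(72, "penny")
```

Once every price in the collection carries a normalized value like this, queries for “everything that costs the same as X” become a simple equality match.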

Creating the dataset for pre-1971 currency is part of my continuing work on the Visible Prices project. But I’ll also be working to create my own ontology for encoding prices into my database. Parts of the database will make use of existing vocabularies, like Dublin Core. Other parts will require me to develop my own terms, for things like non-monetary wage supplements such as alcohol. Because my own semantic web programming knowledge is still relatively new, I’ll be consulting with a professional ontologist this spring as I work out the structure. This work will be supported by a Small Project Grant from the European Association for Digital Humanities.

Developing my own ontology is a significant step forward for Visible Prices. It will allow me to populate the database, and set up an interface through which users can query my data. Once that database is set up, Visible Prices will be ready to grow at a much faster rate. At that point, I’ll be ready to seek large-scale grants for its ongoing expansion and support.


Progress Update: Autumn 2013

I’m pleased to announce that I’ll be showing one, and possibly two prototypes of Visible Prices at MLA 2014, on the panel DH From the Ground Up, where I’ll also be discussing the role that this project played in helping me finish my dissertation.

You can see my latest prototype, which is built in ANVC’s Scalar platform, here.

Working with Scalar has been really exciting for several reasons: it allows me to draw on some of the semantic web knowledge that I picked up this summer at the Digital Humanities Oxford Summer School, and it’s by far the most sophisticated ready-made platform that allows me to create visualizations that users can navigate.

In addition, while Scalar does have a learning curve, and requires a mental adjustment to think effectively about how you’ll use it, it requires very little technical programming knowledge. What you need most in order to work with it is the ability to think clearly and flexibly about how you organize your data for users.

Between DHOXSS and working with Scalar, I realized something that should have been clear to me earlier. In the past, my long-term goal has been to make VP a fully realized project, holding thousands of offers and prices, with a way of collecting data automatically from digitized texts and records.

That’s still a good long-term goal, but there’s a better short-term goal that I should be focusing on. That short-term goal is making a miniature prototype that displays the potential of the project, and prepares me for writing grants that will fund its larger development through future iterations. This means I need to think in terms of objectives that help me identify the specific tasks and problems involved in building the collection; and identify the ways that I want users to be able to interact with it.

With that goal in mind, here is my task list for the next four months:

  • Add between 200 and 250 prices from the VP master sheet into Scalar:
    • Historical branch: Approximately 100-125 of these prices will be drawn from the period 1785-1794, thereby encompassing the French Revolution and the economic anxieties that accompanied it within England. They will include a mixture of prices drawn from literary and historical texts, but the majority will most likely be historical.
      • For the historical branch collection, create a master page listing the different types of objects included (foodstuffs, live goods, textiles, entertainment, etc.), which may be of particular interest to historians who work in those areas.
    • Literary branch: Approximately 100-125 of these prices will be drawn from 18th and 19th century English novelists (Richardson, Fielding, Burney, Austen, Eliot etc.).
      • For the literary branch collection, create a visualization page of the authors included. I anticipate that this page will be of interest to literature scholars who may or may not be highly interested in the historical side of the project; and potentially of interest to non-academics who stop by purely out of curiosity and/or interest in a particular author, e.g., Jane Austen.
  • Create two major visualization paths that traverse each set of prices in Scalar.
    • I want users to be able to see just how many prices occur in literary texts that go unnoticed. The literary branch collection is meant to highlight this point.
    • The historical branch collection has two primary purposes. It should provide users with a view of different goods, services, and experiences that have the same price — e.g., 1 gallon of wheat is equivalent to 1 day’s wages for a general day labourer. It should also continue providing me with greater knowledge about the forms in which these prices are recorded, and the challenges involved in including them in the collection. For example, certain prices are recorded only as a range, rather than as a specific amount (e.g. pork sold for between 2.5 and 4.5 pence), which makes it difficult to present such offers in a way that connects them with equivalent prices.
  • Produce a document identifying the specific challenges involved in including prices in the database. (This will be based on previously acquired notes and work, and on the work listed above.)
  • Produce a document identifying the specific challenges involved in representing Visible Prices in Scalar. Identify the most important challenges (1-2), and write to ANVC Scalar inquiring about potential customizations, or whether any of these challenges are areas in which they have plans for future iterations of the platform.
  • Bonus goal #1: create a Visible Prices twitter feed, which will tweet 2 prices per day, with links back to their page in the Scalar database. Program the feed with multiple prices, scheduled in advance.
  • Bonus goal #2: create a collection of unusual prices, and a path to navigate them.
    • Both bonus goals are primarily for the purpose of building a greater audience for Visible Prices.

TEI vs. MySQL: Data Encoding Decisions

I was fortunate enough to attend the Digital Humanities @ Oxford Summer School last month, and even luckier in that I was able to present a poster about Visible Prices during the event.

That poster is here, in a size that is relatively easy to view. (Just click to make it large enough to read!) It’s my first attempt at poster making, and I found the task of making an effective poster — one that would be visually interesting, as well as provide useful information to start conversations — to be an entertaining challenge.

Briefly, here’s what I learned about academic poster creation:

1) With posters, conversation trumps argument: It might be possible to put a traditional academic argument (the sort you’d give at a conference in 15-20 minutes) onto a poster, if that argument had plenty of visuals to draw on. Maybe. But that poster would be a work of artistic and rhetorical sophistication, and would require a great deal of knowledge about how audiences respond to various layouts. I do enough graphic design on the side to know that I don’t have that level of experience. Far better, then, to aim for a presentation that sets up a wide range of conversations. The point of a poster session isn’t to dazzle people with my stunning knowledge; it’s to create a space where we can have interesting and widely-varied conversations that grow out of where our backgrounds meet. That leads me to my second point:

2) Subtlety is wasted in this format: in a room full of posters, it simply makes you blend in, rather than stand out. I didn’t fully grasp this until I went looking for inspiration, and found the circus-themed poster featured here, on the Oxford IT Services blog. From there, it wasn’t too hard to realize that a banknote would be the most blatant background possible for this project. Failing an obvious connection, though, an unrelated graphic theme might still be better than a simple Powerpoint textured background.

Does this mean that you have to become a Photoshop/InDesign wiz, on top of everything else, and able to draw as well? Maybe — at least, you need to have a little bit of proficiency with them. Check your local academic library to see whether they offer introductory workshops — many do. On the other hand, there are plenty of visual creations on the Internet with rough-hewn, rather than polished, artistic quality. My favorite is Hyperbole and a Half, which is not what most people would call visually slick, but which works brilliantly. Academia doesn’t provide all that many opportunities for being over-the-top and playful – but poster sessions allow precisely this sort of experimenting. Don’t miss your chance!

Do let me know if you have feedback, or questions.

DHOxSS Visible Prices Poster


Slides and Rough Notes from The Permissive Archive Conference – 9 November 2012

I’m having a fabulous time at the Permissive Archive conference in London, so the least I can do is offer you a version of my presentation.

Here are the slides:

 

And the rough notes I used as guide are here.


VP rough demo

ETA: the link is now fixed!

For those of you who are visiting this site from DHSI, the demo I showed this morning is here. It’s rough, and not without flaws, but it gives, I think, a good idea of what the project will eventually do.

Comments, questions, and suggestions are always welcome — find me this week, or via Twitter or email — and thanks, very much, for taking the time to look around!

Visible Prices is coming to the DHSI Colloquium in June!

I am delighted to announce that I’ll be speaking about Visible Prices at the DHSI Colloquium at the University of Victoria this June (exact date and time TBA). I’ll be speaking about the challenges that arise from working with TEI on a project at the intersection of literary history and economics; the thrills of coaxing price information out of massive databases using APIs; not to mention just showing what VP might do, and how it might enrich classroom experience in several different ways.

I’ve been fairly quiet on most of my usual web fronts. The last part of the dissertation has demanded a lot of attention, as have a couple of new teaching methods that I’ve been implementing — but look for more in this space shortly, including the first official VP twitter feed.


Update, as of January 2012

Since the Simpson Center DRSI concluded, I’ve continued work on the database — and with the start of the school year, returned to teaching and dissertating.

I’m currently learning to work with the APIs for Google Books, and the Hathi Trust Repository; and to effectively search the collection of titles released to the public by Eighteenth Century Collections Online. This is my first experience working with APIs, but it’s important — once I can effectively use APIs to gather data, I’ll be much closer to being able to present Visible Prices as a tool that can make existing archives more useful. (Editing and correcting the data, however, will be another big step.)
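As a small example of the kind of API work involved, here is a sketch of building a search request against the Google Books API volumes endpoint. The endpoint is part of Google’s public Books API, but the exact parameters and response format should be checked against their documentation; the function name and phrase here are my own illustrations, not part of the project.

```python
from urllib.parse import urlencode

# Base URL of the public Google Books volumes-search endpoint.
BASE = "https://www.googleapis.com/books/v1/volumes"

def books_query_url(phrase, max_results=10):
    """Build a full-text search URL for an exact phrase,
    e.g. a price expression found in a period text."""
    params = {"q": f'"{phrase}"', "maxResults": max_results}
    return BASE + "?" + urlencode(params)

url = books_query_url("five shillings a week")
# The URL can then be fetched (e.g. with urllib.request.urlopen) and
# the JSON response parsed for matching titles and snippets.
```

Automating queries like this one is what would let Visible Prices harvest candidate price expressions from digitized collections at scale, with editing and correction as a separate, later step.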

Much of the project development is available for anyone to see — but not all of it. If there’s something you’re curious about, or want to see more of, just write me — and I’ll be happy to tell you more, or show you what I’m doing in greater detail.

About Visible Prices

Visible Prices is an ongoing digital humanities project. It began as a side project when I was finishing my Ph.D. dissertation, and led to my specializing in digital humanities and digital scholarship. I’m continuing to develop it while I work as a postdoctoral researcher at the Sherman Centre for Digital Scholarship at McMaster University. This site is the primary repository for information about the project’s history, future, and its current status. To learn more about my other work and projects, visit my primary site.

You can navigate the pages above to learn more about the rationale behind the project, and the various people and institutions with whom I’ve worked in developing it; or check the most recent blog entry below to find out more about what I’ve been working on recently.