Internship Blog 3 – Superusers, Standardised Reports, and Customisation of Google Analytics

In online crowdsourcing sites, ‘superusers’ are users who use the site at a vastly higher rate than the average user. Digital Collections is not a crowdsourcing site and provides no incentive for users to come back over and over again, so finding superusers there is interesting.
These superusers show up in the “Frequency and Recency” report in Google Analytics (Audience > Behaviour > Frequency & Recency), but Google doesn’t provide the number of visitors, only the number of sessions those visitors generated. This metric is awkward because the more often someone returns, the more sessions they account for, and Google avoids giving out data on individuals like the plague, as it promises not to in its terms of service. Still, with a total of 3,230 sessions from visitors who have each clocked up more than 201 sessions, we can tell there are at most 16 computers out there (3,230 ÷ 201 ≈ 16) that each account for over 201 separate sessions on the site, which is massive. To put this in perspective, DRIS staff have only reached about 150 sessions at the highest, and they work on the site every day.

By applying the Irish and non-Irish segments to the frequency and recency view, we get some interesting information. You would expect that visitors who return 150, 200, or more times would be more likely to be researching something in the archives, but the picture is more complicated: the proportion of Irish visitors tends to increase as the return count grows, yet beyond 201 return visits this trend reverses and visitors from abroad dominate the numbers even more.

Finding out who these superusers are is key to growing from a user-base to a community, so to get around Google Analytics protecting the identities of these users, we turn to StatCounter. If you have the free version of StatCounter, you can download the logs for the site for the previous 24 hours, which are really handy. They come in CSV format, so you can do some pretty advanced stuff with them in R when you get a big data set, but as we’re just looking for the few visitors with over 201 visits, we can simply open them up in Excel and sort by descending number of sessions. As I could only look back over 24 hours at any point, I only found one superuser, whose location is surprising: the tiny historical walled town of Castell’Arquato. Because of the type of town this is, and because they are looking at the Book of Kells every time the page view data is available, Tim and Gerald in DRIS got the impression that this is a municipal institution or museum with a computer dedicated to viewing artefacts. The fact that they never visit on Sunday or Monday, and the large screen resolution, back this up. This is why it’s really good to mix your analytics packages – Google Analytics is pretty unbeatable for functionality, but StatCounter can give you gems like this in the logs.
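
Since the StatCounter export is just a flat CSV of individual visits, the Excel sort described above can also be sketched in a few lines of standard-library Python. This is a minimal sketch, not StatCounter’s actual schema: the “IP Address” column name is an assumption standing in for whatever visitor identifier the export actually contains.

```python
import csv
from collections import Counter

def top_visitors(csv_path, key="IP Address", top_n=10):
    """Count log rows per visitor and return the most frequent.

    NOTE: "IP Address" is a placeholder column name; StatCounter's
    free-plan CSV export may label the visitor column differently.
    """
    counts = Counter()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row[key]] += 1
    return counts.most_common(top_n)
```

The same approach scales to the full logs: once the rows are counted, anyone appearing hundreds of times floats straight to the top without any spreadsheet work.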

Custom Dashboards and Automatic Reporting

One of the great things about Google Analytics is that you can set up custom dashboards and automatic reports to make accessing your key metrics easier. So as a deliverable for this internship I put together a dashboard that shows the important metrics for all of the key segments, visitors from academic institutions, and Dublin versus total Irish visitors. The key thing to remember in putting together a report is that it just contains data points; it’s the people reading it who draw conclusions from them. So you need to put together a set of metrics that not only reflect the institution’s goals, but also allow the reader to cross reference the data points to avoid drawing conclusions that are quick but incomplete. The best way to do this is generally to strike a balance between acquisition, behaviour, and outcomes. This ‘end-to-end’ view helps prevent you jumping to conclusions based on a noticeable finding from a single data point. The dashboard shows where these visitors are coming from (referrals, direct, and social) and the average time spent on the site. Specific to the data discussed in the previous blog, which analysed how DRIS could catalyse more research, a great way to put this in place would be to compare the general, Irish, and Dublin use over time.

Standardised Reporting Across Libraries

A big factor in what I saw in the data is that online digital resources, particularly in Irish universities, are possibly more dependent on each other than is apparent at first. As I mentioned in the second blog, university users of digital resources are probably very siloed, and the glimpse here of individual users suggests that heritage institutions and facilities can use these resources over and over again across institutions and really get value out of the digitised archive. But the only way to actually measure the prevalence of siloed research behaviour is by comparing data across institutions: online archives would swap usage figures so they can see how they can meet shortcomings in each other’s collections. This is done already on a small scale – if an archive has a specific collection (such as the Samuels ephemera or, of course, the Book of Kells), other institutions will guide people to it. But once we can see what people are looking for and what content they are finding (or not finding), it will be easier to provide a more seamless crossover between the collections, putting a wider range of materials within researchers’ reach.

This isn’t that far of a leap to make. Archivists care about their collections, and anything that helps people who are interested in the artefacts to see these collections stands a good chance of being considered. If you look at the Facebook profiles of online archives, they tend to like each other, so at least some connection is already happening organically – the will to connect to each other’s archives is there.


Internship Blog 2 – Introducing the Data

In this blog, I’ll list a couple of things I found in the data using Google Analytics and Webmasters.

The first thing you notice about the TCD library is that it has an odd setup. It is a functioning university library, a tourist attraction, and a UK and Ireland copyright repository all at once. This is because TCD has been a university since 1592 and has been evolving all the time.

You can see this immediately when you look at the basic analytics usage statistics: it’s no big surprise that the vast majority of organic traffic comes from people searching for the Book of Kells, and that the biggest visitor countries are Ireland, the US, and the UK. Look at the top 10 cities that had the most visitors over a couple of recent weeks: these are all cities known for having large Irish populations. So we can make a pretty obvious assumption that the Irish diaspora are driving the segments in the UK, US, and Australia. The big usage spikes around St. Patrick’s Day back this up.

The most fun part of using Google Analytics is often looking into the outliers to bigger trends, and the first obvious outlier in the popular cities list is Mission Viejo, in Orange County. A quick bit of detective work shows that this small town is home to Saddleback College, which runs a trip to Dublin and the West of Ireland every year, for which the application deadline has just passed. Also, there are two Columbuses: one is Columbus, Ohio (pretty Irish), but the other is Columbus, Georgia, which is home to Fort Benning, which in turn is home to a University… that also visits Ireland.

Identifying Academic Users

So, to pick up from the first blog, the challenge here is to distinguish academic users from more casual users and see how the Repository is catalysing research, without being able to see what the users are actually looking at or having rich data about their journey through the site. As most colleges and universities have their own internet networks for students and staff to log in to, we can isolate the Internet Service Providers that have words like “university,” “college,” or “.edu” in their name by going to Audience > Technology > Network. When we segment all of these ISPs, we can see that they account for between 8% and 10% of sessions, right through the year.
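
The same segmentation can be reproduced offline once the network report is exported. A minimal sketch, assuming the export has been reduced to (ISP name, session count) pairs:

```python
# Keywords used above to flag academic-looking networks.
ACADEMIC_HINTS = ("university", "college", ".edu")

def academic_share(networks):
    """Estimate the share of sessions coming from academic-looking ISPs.

    networks: iterable of (isp_name, session_count) pairs, e.g. reduced
    from an Audience > Technology > Network export.
    """
    total = academic = 0
    for name, sessions in networks:
        total += sessions
        if any(hint in name.lower() for hint in ACADEMIC_HINTS):
            academic += sessions
    return academic / total if total else 0.0
```

Keyword matching on ISP names is crude – it misses campus networks named after the institution alone – so the 8–10% figure should be read as a lower bound.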

We can also look at academic behaviour over time: when we isolate the Trinity ISP specifically (below), we get a clear pattern that’s different from the site’s overall usage graph: the weekday median session count is 15.5 times the weekend median session count (I think the median is more appropriate than the mean here because we get a better sense of typical usage when we focus less on the big outliers).
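
That weekday-to-weekend comparison boils down to two medians. A minimal sketch of the calculation, assuming the daily session counts have been exported as a date-to-count mapping:

```python
from statistics import median

def weekday_weekend_ratio(daily_sessions):
    """Median weekday session count divided by median weekend count.

    daily_sessions: dict mapping datetime.date -> sessions on that day.
    The median is used rather than the mean so that one-off spikes
    don't dominate the picture.
    """
    weekday = [n for d, n in daily_sessions.items() if d.weekday() < 5]
    weekend = [n for d, n in daily_sessions.items() if d.weekday() >= 5]
    return median(weekday) / median(weekend)
```
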

TCD ISP

And we can cross reference the Trinity usage with that of all ISPs with “university” in the name to make sure it’s not specific to Trinity (which could be skewed by the DRIS staff working during the week and looking through the collection). You can see the pattern holds up, as you can pick out the weekends as troughs except for one at the end of March:

University ISP

By putting this in a custom dashboard and automatic report, DRIS can monitor how their academic audience is growing over time. We can also back this up by looking at the average duration of sessions, and we can see that the session duration is the same – a minute and a half – which isn’t great, as we’d like to see them stay longer.

Key Segments

Out of curiosity, I compared this data against the location data that Google Analytics provides in Audience>Geo>City. I segmented off visitors from Ireland and visitors from Dublin, and found something pretty interesting: Dublin usually accounts for about 90% of the total Irish daily site visits.

Dublin v Irish users

This is a great nugget of information, because from it we can make two interesting observations, one definite and the other a hunch: first, it is clear that there are universities across Ireland outside Dublin with great humanities departments (UCC, NUIG, and UL in particular) where the artefacts in the Repository aren’t being seen very much. Second, it also suggests that people in these universities are sticking with the digital resources produced by their own libraries, even though the collections are much smaller. These are research silos, and they are a problem for research.

Trinity backlinks are the most important to Digital Collections

You can also see a possible cause of this research silo problem in the referral traffic that the site gets: when we take a look at the backlink logs in Google Webmaster Tools, we can see that the 19 links from Trinity’s own domain account for a massive portion of the traffic. That means that Trinity’s 0.4% of links account for 42% of referral traffic. A lot of this could be down to the popularity of the library’s blog, with a massive 40,000 referrals coming from a blog post titled ‘The Book of Kells is Now Free Online’ – Webmasters showed that ‘book of kells free online’ was one of the biggest search terms. To counteract people’s natural inclination to stick with what they are familiar with, you really would have to link directly from other universities’ resources, and in turn link to theirs. It would be very interesting to see these two data points on other digital collections.
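
As a back-of-envelope check, those two percentages pin down the overall numbers fairly tightly. A quick sketch using only the figures quoted above:

```python
# Figures quoted above: 19 Trinity links are 0.4% of all backlinks,
# yet they drive 42% of referral traffic.
trinity_links = 19
link_share = 0.004      # 0.4% of all backlinks
traffic_share = 0.42    # 42% of referral traffic

total_links = trinity_links / link_share       # roughly 4,750 backlinks overall
concentration = traffic_share / link_share     # a Trinity link pulls ~105x its weight
```

So the site has on the order of 4,750 backlinks in total, and the average Trinity link delivers about 105 times the referral traffic of the average link from elsewhere.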

To find out what the trend is here (whether it’s getting better or worse), I also put the Webmasters logs through a pivot chart to get a look at the daily backlink creation count for the past two years. By comparing this with a screengrab of the referral traffic, you get a nice insight into two types of referrals. There is a big spike in referrals in September and October 2014, when Trinity’s first MOOC linked to resources on the website. This generated huge usage spikes, but not many people were writing about it in blogs, which in the long run won’t push the site’s PageRank higher – and it’s PageRank that bumps up organic search referrals. What is interesting is that the Harry Clarke Symposium in February 2015 not only gave the biggest referral spike in visitor numbers the site has seen, it also generated the most backlinks, so it really got people talking as well as clicking. This kind of user behaviour is how the archive will really grow as an online resource – being the reference used by people discussing archives online. It’s not only great because people are talking about it of their own accord, but also because these referrals will push the archive’s pages higher in PageRank, bringing in more organic search traffic.
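
The pivot-chart step can also be done in plain Python rather than Excel. A minimal sketch, in which the “First discovered” column name is an assumption – Webmaster Tools link exports have varied over time, so check the actual header first:

```python
import csv
from collections import Counter

def daily_backlink_counts(csv_path, date_col="First discovered"):
    """Count new backlinks per day from a links CSV export.

    NOTE: "First discovered" is a placeholder column name; substitute
    whatever the Webmaster Tools export actually calls its date column.
    """
    counts = Counter()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row[date_col][:10]] += 1  # keep only the YYYY-MM-DD part
    return dict(sorted(counts.items()))
```

Plotting the returned date-to-count mapping gives the same daily backlink creation chart as the pivot table, with the September/October and February spikes standing out.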

Webmasters Pivot Table
Referrals over time

Internship Blog 1 – Using Google Analytics to Understand Users of a Digital Archive

This is the first of three blogs about a work placement in the Digital Resources Imaging Services (DRIS) in the Trinity College Library. This blog will introduce the project and cover some digital humanities theory and background, while the second will be an introduction to the data, and the third blog will go a bit further into the data and summarise some findings.

Digital Collections is a free, online archive of TCD’s digitised resources, which since early 2013 has grown to about 12,000 digitised subcollections, containing over 100,000 ultra-high resolution images of items including manuscripts, letters, and photographs. After this first phase of growth, DRIS proposed the internship to analyse the archive’s user base, its key segments, and possible areas for growth in the future. In the first placement meeting, DRIS staff were particularly interested in finding out more about academic users, and how the library is catalysing research. So in this internship I explored the data with analytics packages to find out who is using the site and where they are coming from.

There is a lack of research on digital resource management

In digital humanities there is a lot of awareness of the need to digitise archives, but not a lot of research on what to do with these digitised resources after their creation. While the need for digitisation has spurred on entire books, there isn’t much research on how these resources can be managed to catalyse research in the humanities, a problem that other researchers have come up against. Over the past couple of decades the attention has (rightfully) been on enhancing access to collections, but the development of a collective vision of how these digitised collections should be managed has fallen by the wayside. The research I have found is collected in this bibliography.

Up to now, a couple of the DRIS staff have been using Google Analytics and StatCounter to monitor site usage: StatCounter for the visualisation of overall usage, and Google to keep an eye on visitor spikes and occasionally to drill down into the data.

UX web design vs traditional archival design

A great thing about the site is that it has an image viewer that allows you to search and navigate images. This means that you can look through books and collections of photographs without the webpage needing to change or reload, which makes for a seamless and natural user experience. I think this experience makes a big contribution to the high average session duration – over a minute and a half.
DRIS home screenshot
DRIS screenshot

The use of the JavaScript viewer is a good example of a conflict that digital humanities has sparked. Something everyone in class has noticed in looking through online digitised resources is that a lot of them have a 1990s feel to them. This isn’t to do with a lack of tech-savvy DHers, but more to do with the siloed nature of archives. Archives are by their nature some of the most siloed collections of knowledge on the planet. They have to be, in order to provide a structure to that knowledge, and archivists can be very protective of this. So when it comes to putting a wireframe on the digitised collections, artefacts tend to end up as HTML pages accessed via HTML pages of the collection they are in, mirroring the archive ontology. But the overwhelming majority of web users are conditioned by their usage of Facebook, Instagram, and Twitter, so navigating in and out of HTML pages can frustrate people. That’s why it’s great that, using the Digital Collections site, you get the sense that you’re looking at the actual artefacts, not archives.

JavaScript viewer makes it harder to record data within Google Analytics

The JavaScript viewer is a bit of a double-edged sword when it comes to the analytics end of the website, because it means that every session is designed to stay on one or possibly two HTML pages. The searching is conducted in a search bar that uses JavaScript to autocomplete a search term, then hands this search to the PHP back end, which grabs the result from the database and puts it in the viewer, all while the visitor stays on the same webpage. The different results are navigated via a ‘#’ fragment in the URL, which doesn’t count as a new HTML page.

So for Google Analytics, this means that some of the most important metrics are unusable for the time I need to use them: the only pages it records are the homepage and the ones where people basically ‘fall off’ the viewer (like when you click on a link to a single photo on Instagram instead of a profile, you end up on an actual HTML page of that photo, whereas you usually view photos through the viewer on a profile). The bounce rate in this case is useless because it will only measure people who ‘fall off’ the JavaScript viewer. Similarly, the only data we have on pageviews comes from people in this situation – we can see what content people look at when they land on the HTML page for that content. But there’s no way of finding out how representative that is, especially for a site that is ‘long tail’ – that is, a few items will be massively popular, while a massive volume of items will rarely be searched for (like the books on Amazon, for example). The Centre for High Performance Computing has developed a JavaScript analytics tool that will collect full information on what is accessed and searched, but that will only be completed and implemented towards the end of the internship, so I won’t be blogging about it.

But for now, this makes the internship a bit more fun: all of the analysis of the site usage has to be based on really drilling down into the data by cross referencing everything with other data. Usually you would use Google Analytics to look at the usage of a website with a lot of HTML pages in the wireframe and a clearly defined goal (leaving an email address to join a mailing list or ask a question). So it’s going to be fun digging deeper into the data to see what a single HTML page can tell you about a ‘long tail’ kind of site like the Digital Collections.

Class Blog #3: Ngram & ‘Culturomics’

In the winter of 2010/2011, a group of mathematicians and statisticians added fuel to a fire already kindling in literature departments. The development of algorithmic criticism was already a prominent debate in the humanities: Critical Inquiry had published a debate between Franco Moretti and Katie Trumpener about the merits of quantitative and qualitative approaches. Moretti outlined the possibilities of extrapolating hypotheses from quantitative data on titles, while Trumpener argued that it was dangerous to allow students to back up their arguments with this type of data taken out of context.

However, with the publication of Michel et al.’s ‘Quantitative Analysis of Culture Using Millions of Books’ and the release of the Ngram Viewer, anyone could instantly carry out quantitative analysis of Google Books, and a brief ‘redditification’ of the debate occurred. God’s public popularity was bandied about and statistics were used to back up moderately bitter retorts (by academic debate standards, anyway). Who would have thought the internet would help a debate get angrier?

At a time when computing power is increasing exponentially, it seems almost foolish to suggest that quantitative algorithmic methods will not play a central role in cultural analytics in the future. They will also be available to anyone with a computer and internet access. Below I’ll outline how Ngram works and its basic limitations, in the hope of making the point that humanities scholars need to engage with this data to put a stop to the idea of ‘Culturomics’ – the idea that human culture can be mapped as accurately as the genome.

Ngram analyses a corpus of millions of books on Google Books by counting the number of times that a given word or string of up to five words appears in the corpus (‘Ngram’ roughly translates as ‘the number of that which is written’: n is the mathematical term denoting ‘number,’ ‘-gram’ is the Greek suffix for ‘that which is drawn/written’). The results are given as a graph displaying the frequency of the words as a percentage of the total words published each year.
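
The underlying calculation is simple to sketch. Assuming a tokenised corpus for a single year, the frequency of a phrase is its count divided by the total number of n-grams of that length:

```python
def ngram_frequency(tokens, phrase):
    """Frequency of a phrase among all n-grams of the same length.

    tokens: list of lower-cased words from one year's worth of text.
    Mirrors how Ngram reports a term as a share of the yearly corpus
    (multiply by 100 for a percentage).
    """
    words = tuple(phrase.lower().split())
    n = len(words)
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams.count(words) / len(grams) if grams else 0.0
```

Run over each year of a corpus in turn, this produces exactly the kind of frequency-over-time curve the Ngram graphs display.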

Vietnam & Vietnam War

A good example of Ngram’s application is searching for the appearance of the term ‘Vietnam.’ As the chart shows, the highest frequency of the term occurred in 1970, at 0.00353% (which is also Ireland’s international dialling code – let the conspiracy theories begin). This means that of every legible word appearing in the books published in 1970 that were scanned by Google and selected for Ngram (a corpus of about 5,000,000 books), ‘Vietnam’ accounted for 0.00353% (which can also be written as 3.53 × 10⁻³ per cent, or a proportion of 3.53 × 10⁻⁵). We can make an obvious assumption by looking at the graph: Vietnam was talked about a lot in the late sixties and early seventies because it was at war with a large, English-speaking nation. But we can also notice the extent to which the Vietnam War was referred to simply as ‘Vietnam.’

Ngram Joyce Beckett

The reason I selected ‘Vietnam’ as a basic example is that the impact of the war on the word’s appearance in the English language is undeniable. A clear issue in the use of Ngram is that it analyses words’ appearance, not their usage, which has been ignored in the debate. Using the graphs to determine a word’s usage can be a massive jump in logic that only becomes really obvious when you compare terms like ‘Joyce’ and ‘Beckett.’ The overwhelming majority of ‘Beckett’ hits will be Samuel Beckett, but Joyce is a common Irish surname and a very common first name. The obvious solution is to search ‘James Joyce’ and ‘Samuel Beckett,’ but doing this unavoidably loses every reference to Joyce as simply ‘Joyce.’ The jump in logic from the appearance of ‘Vietnam’ to its usage pertaining to the war is small and backed up by the data, but as we move to words that are used to mean multiple things (like Joyce), we need further methodological steps that can then be refined into an algorithm to sharpen Ngram’s results.

It is most likely that as the n-range expands (currently only five words at a time), algorithms will be written to crunch the words and guess the context, which will take us a step away from appearance and towards usage (this will get really interesting when the semantic web becomes more advanced). Perhaps by contributing to the development of these algorithms, humanities scholars can not only develop a tool that can create new knowledge in their field, but also impact the sciences by applying schools of thought such as structuralism to the tools’ development.

The potential payoff for the humanities is huge: in Ngram’s first research paper (Michel et al.), in plotting the appearance of the names of victims of Nazi suppression, the researchers discovered other authors and artists whose curves matched those of known victims of suppression. This has led to the study of these names to determine whether they were suppressed by the Nazis and then lost to historians.

The use of ‘usage’ instead of ‘appearance’ in the literature about Ngram shows the lack of humanists stepping up to shape the field. Nunberg pointed out the implicit critique of culture made by statisticians and mathematicians in naming their project ‘Culturomics’: the implication that human culture can be mapped just like the genome.

Class Blog 2: Avatars in Cultural Heritage Visualisations

My classmate and fellow digital humanist-at-large Eoin over at Long Pig Blog recently analysed the use of avatars in an interesting blog post, talking about how they are used to mediate between the virtual world and the user. The use of avatars has a big influence on how we experience 3D cultural heritage visualisations, which are an emerging method of connecting more people with sites like the Ancient Roman Forum or Skellig Michael – sites that are inaccessible to many people for various reasons.

In some digital constructions (not reconstructions, but that’s for another time) of cultural heritage sites the viewer is a floating viewpoint that can take any position on the x-, y-, and z-axes, while others place the viewer as an avatar, or even populate the world with other avatars.

The use of avatars (or lack thereof) fits well into a major debate that has been going on in the archaeological community for years. This is a good example of the fact that although digital technology is shifting paradigms, many of the same debates will probably rage on. This particular debate is about how archaeologists can best understand sites, and to what extent phenomenology has a place in the understanding of the past.

Descartes vs Husserl

Everybody is familiar with Descartes’ cogito to varying extents. The important aspect here is that it establishes a subject/object divide – I am thinking about this – an approach best summed up by the practices of Descartes’ coordinate system and map making: the idea that you can isolate a part of the world for singular reference. This divide came under attack with the emergence of structuralism, which emphasised how much our concepts of things are linked to other things; our idea of a ship is essentially that it is a big boat, so our concept ‘ship’ is dependent on our concept ‘boat’. Phenomenology (led by Husserl) likewise emerged as a school of thought opposing the subject/object divide. This philosophy emphasised that actual ‘being in the world’ is central to our understanding of it, and so the everyday routines and common experiences we take for granted are taken as key elements of the world that should be explored.

In archaeology, the embrace of phenomenology meant analysing sites not just by creating maps of them, but also by experiencing them, as much as possible, in the way the community who created a site would have experienced it. For instance, Christopher Tilley’s seminal A Phenomenology of Landscape emphasises using the body as a tool for understanding sites, for example noting how hard a site is to access, or what else is in the field of vision from certain parts of the site (e.g. monuments, important sites). Right away, critics pointed out that his experience was inherently subjective – that of a modern, able-bodied male – limiting the effectiveness of his argument.

Descartes vs Husserl: Online

So, in the spirit of a blog post that touches on philosophy, what does it all mean (for avatars in 3d cultural heritage visualisations)?

First of all, there is an obvious divide that falls roughly along the same lines as the Cartesian/phenomenology debate: viewing the site as an avatar means that the experience of the site becomes more subjective, as we see and travel around the site while bound to the same laws of physics that applied to the Ancient Romans in the forum or the Irish monks on Skellig Michael. This conforms to Tilley’s view of using the body as a tool for interpretation (as much as is possible in a digital world).

On the other hand, you have visualisations such as those from the Discovery Programme, which are extremely high resolution, accurate, and impressive models of Irish cultural heritage sites. These models are free from their surroundings and can be lifted, rotated, flipped, and zoomed into, so they are a great digital example of the Cartesian objectification of the world. In terms of the above debate, although they are technically very impressive, it is hard to appreciate how people interacted with these sites in the context we are trying to put them in.

Below is a video showing a visualisation with avatars (Rome Reborn), and a video showing a model produced by the Discovery Programme.

Phenomenology in Motion

I was recently lucky enough to get to use an Oculus Rift headset to view a digital visualisation of a peak sanctuary in Crete (Petsofas) created by Frank Lynam. What is immediately striking is just how immersive the headset is, cancelling out all view of the real world, to such an extent that when I was hurtled across the map I genuinely felt a bit nauseous and dizzy (cue laughter from the next generation’s children). After this wave of Luddite terror passed, I began to appreciate the imposing nature of the building, on top of a hill. If I were to experience this as an avatar, having to climb up the hill, I would have a heightened appreciation of the sanctuary’s relation to its surroundings – and of how imposing it is. This is central to the ‘being in the world’ espoused by phenomenology. On the other hand, if the building were presented separate from everything else, I would appreciate its technical aspects more.

Frank Lynam Petsofas Screenshot
Petsofas Visualisation (computer-generated avatar for scale)

Though the digital phenomenological approach might seem outlandish to some now, constrained by relatively primitive graphics and usability, the speed at which technology is evolving will make it central to cultural heritage in the future. Graphics that were impressive a few years ago now seem like relics themselves. With the proliferation of ever more advanced graphics and technologies like the Oculus Rift, common practices will change, and we will both experience heritage visualisations as artefacts separated from their context, as though they were items in a lab, and be immersed in virtual buildings and cities.

Crowdsourcing and Digital Humanities: Gamification and its Discontents

I. Introduction

There is something inconsistent about crowdsourcing in the humanities. Crowdsourcing in the sector is employed mostly for the transcription or classification of data – tasks which are largely devoid of interpretation. However, humanities students in most disciplines are trained to do exactly the opposite: to interpret and communicate information and opinion. Despite this, there have been some great successes, with the designers of humanities projects doing fantastic work in setting up websites making intelligent use of gamification, a much-maligned design technique (see, for example, the article on the class reading list with the concise title ‘Gamification is Bullshit’).

Put simply, gamification is the application of game design theory and techniques in non-game settings to engage users. The term ‘gamification’ emerged in 2008 to describe the increasingly widespread online use of game elements like leaderboards and badges, was quickly billed as an easy way to engage customers and increase sales, and was then just as quickly deemed a useless trend that should be consigned to the past.

The problem with the extensive recent coverage of gamification is that it largely ignores the fact that gamification was not ‘invented’ in 2008 – that is just when the term was coined. Game design principles have been employed in other contexts for years. Take reward cards as an example (buy a certain amount of something to get one free), which appeared in the late 1800s. Zichermann and Cunningham point out that the rewards rarely make enough economic sense on their own for the consumer to change their behaviour; the cards succeed because customers have a challenge, a reward, and visible progress. Likewise, frequent flyer programmes don’t just reward airline customers with free trips, they also let them ‘level up’ with gold and platinum cards. So gamification as a concept is quite catchall – game design is quite broad – and it is likely that ideas currently classed as ‘gamification’ will be absorbed into good user-focused design.

At the moment, gamification seems to be somewhere around the nadir of the post-hype comedown that most innovations suffer. Because of this, digital humanists are reluctant to acknowledge their use of the design theory in publications. Below I’ll briefly introduce gamification and give an example of its appearance in digital humanities crowdsourcing.

[Image: Gartner Technology Hype Cycle]

II. Gamification Online

In recent years, as internet usage has pervaded almost every aspect of daily life, many organisations, from online stores to social media, have found themselves dependent on keeping people engaged with their websites. It was quickly realised that the gaming industry had been relying on exactly this for decades – keeping people engaged, sitting in front of a computer. For this, game designers developed the theory of ‘flow’: the state produced in a player by the correct balance of challenge and reward. If something is too hard, a player will get frustrated and quit; if something is too easy, they will quickly tire of it (this is why, for example, games are split into levels that get progressively harder).

[Image: Flow diagram]

Another, less mentioned element that is key to gamification is building on the existing motivation of the user. Yelp, a crowdsourced review site, lets people gain status in the site’s community by becoming top commenters with greater authority on the site. This works well because it builds upon the motivation behind sharing reviews – the desire to have one’s opinions listened to and considered important. Nike+ is a fitness app that organises and simplifies users’ fitness goals and tracks their progress, awarding badges along the way like ‘Streak Week’ (earned by running every day for a week). This works well because it reinforces the long-term motivation of improving fitness with short-term, easily reachable goals like levelling up to a new badge, keeping the user in a state of flow.
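To make the mechanic concrete, a streak badge of the ‘Streak Week’ kind boils down to a simple check over a user’s activity log. The sketch below is purely illustrative (the function name and the badge criterion of seven consecutive days are my assumptions, not Nike’s actual implementation):

```python
from datetime import date, timedelta

def has_streak(run_dates, length=7):
    """Return True if run_dates contains `length` consecutive days of activity."""
    days = sorted(set(run_dates))  # dedupe multiple runs on the same day
    streak = 1
    for prev, curr in zip(days, days[1:]):
        if curr - prev == timedelta(days=1):
            streak += 1
            if streak >= length:
                return True
        else:
            streak = 1  # a missed day resets the streak
    return streak >= length

# A full week of daily runs earns the badge; six days does not.
week = [date(2014, 10, d) for d in range(1, 8)]
print(has_streak(week))       # True
print(has_streak(week[:6]))   # False
```

The design point is that the goal is short-term and always visibly within reach – the user can see exactly how close they are to the reward, which is what keeps them in flow.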

III. Gamification in Digital Humanities

So, returning to the opening inconsistency, how do good humanities crowdsourcing projects tailor their design around existing motivations and keep their users engaged?

The best example of gamification being used well in a crowdsourced humanities project is University College London’s Transcribe Bentham project. The challenge is huge: to transcribe 40,000 unpublished folios handwritten by Jeremy Bentham, many of them in increasingly bad handwriting and chaotic order as the philosopher aged and lost his sight. However, the designers of Transcribe Bentham have built a platform that has created a community of frequent contributors who, by logging in and transcribing in their spare time, have completed around 40% of the task as of October 2014.

Transcribe Bentham incorporates leaderboards and badges, allowing transcribers to move up through the ranks from ‘apprentice’ to ‘scribe’ and beyond. As with Yelp, the gamification builds on the users’ pre-existing motivation (to explore previously unavailable writings by Bentham himself), and as with Nike+, the users have frequent, achievable goals and rewards (levelling up), keeping them in a state of flow. Although Stuart Dunn has rightly pointed out that gamification has the potential to trivialise input, the application of its techniques can clearly work in this transcription context.

Bibliography & Further Reading

Bogost, Ian. ‘Gamification is Bullshit: My Position Statement at the Wharton Gamification Symposium,’ blog post, 2011.
Causer, Tim; Terras, Melissa. ‘“Many hands make light work. Many hands together make merry work”: Transcribe Bentham and Crowdsourcing Manuscript Collections,’ in Ridge, Mia (ed.), Crowdsourcing our Cultural Heritage, Ashgate, 2014. Available as a pdf file.
Clancy, Heather. ‘Looks Like That Whole Gamification Thing is Over,’ in Fortune, June 6, 2014.
Deterding, Sebastian; Dan Dixon; Rilla Khaled; Lennart Nacke. ‘From Game Design Elements to Gamefulness: Defining “Gamification,”’ in Proceedings of the 15th International Academic MindTrek Conference: Envisioning Future Media Environments. ACM, 2011, 9-15.

Dunn, Stuart. Crowdsourcing Scoping Report, London, 2012.

Holley, Rose. ‘Crowdsourcing: How and Why Should Libraries Do It?’ in D-Lib Magazine 16.3/4 (2010).

Zichermann, Gabe; Cunningham, Christopher. Gamification by Design. O’Reilly Media, 2011, 5-13. Available as a free ebook.

Image Sources:

Gartner Hype Cycle of Innovation:

Flow diagram: