An example of place-name authority usage within JDB1745.
In last week’s post, we set out to introduce the value of a historical database by thinking critically about historiographical and biographical data related to the Forfarshire Jacobite regiment led by David Ogilvy in 1745-6. While this may seem like a straightforward prerequisite, a comprehensive survey of the primary and secondary sources that address the constituency of this regiment reveals a labyrinthine paper trail, one that requires us to scrutinize carefully the information recorded to date. Getting a firm grasp of this ‘lineage’ of data is essential to upholding the accuracy of what is finally entered into our database.
As we suggested last week, simply copying biographical information from published secondary- and tertiary-source name books or muster rolls is not enough to ensure that the data is accurate or even relevant. In short, this practice is ‘bad history’ and opens the analysis up to errors, inconsistencies, and others’ subjective interpretations of primary-source material. To combat this, we need a methodology that maintains the integrity of the original sources as much as possible while still allowing us to convert them into machine-readable (digital) format. Part 2 of this technical case study will demonstrate one possible method of doing this.
When we use the term ‘clean data’, we are referring to information that is transcribed into digital format with as little subjectivity as possible. This means misspellings and known errors from primary sources are left intact, conflicting evidence from disparate documents is retained, and essentially no liberties are taken by the modern historian or data entry specialist to interpret, blend, or otherwise ‘smooth out’ information upon entry. Though it might seem unwieldy to work with raw data containing so many chaotic variables, doing otherwise would fundamentally distort the results. As long as we take the time to set up an effective taxonomy for transcribing (now) and analyzing (later) our data, the results will be well worth the extra care.
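To make the idea concrete, here is a minimal sketch in Python of what such a transcription taxonomy might look like. The field names, sources, and spellings below are invented for illustration and are not JDB1745’s actual schema; the point is only that each attested value keeps its original spelling and a pointer to the document it came from, so reconciliation happens at analysis time rather than at entry time.

```python
# Hypothetical record structure (NOT JDB1745's real schema): every attested
# value is stored verbatim, alongside the source document that attests it.
records = [
    {
        # Conflicting spellings from two documents are both retained, unaltered.
        "surname_attested": [
            {"value": "Ogilvie", "source": "muster roll (illustrative)"},
            {"value": "Ogilvy", "source": "prisoner list (illustrative)"},
        ],
        "regiment": "Lord Ogilvy's (Forfarshire)",
    },
]


def attested_spellings(record):
    """Return every spelling exactly as recorded, paired with its source."""
    return [(a["value"], a["source"]) for a in record["surname_attested"]]


# Any interpretation (e.g. deciding the two spellings are the same man)
# is deferred to the analysis stage, keeping the stored data 'clean'.
for value, source in attested_spellings(records[0]):
    print(f"{value!r} per {source}")
```

Because nothing is smoothed out on entry, a later researcher can always recover exactly what each document said and apply their own matching criteria.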
Just a handful of men from Lord Ogilvy’s Forfarshire regiment in spreadsheet form…
If you enjoy bewilderingly complex historiographies and you’re wondering what purpose the creation of a historical database like JDB1745 serves, this post is for you. What follows is a use case involving a limited analysis of the Forfarshire Jacobite regiment under David Ogilvy, 6th Earl of Airlie, and how a tool like JDB1745 can help us collect and define detailed information across a number of disparate primary sources. This method of analysis, called prosopography, sits at the intersection of historical sociology and data-based biography, and it has risen to prominence as our ability to collate and process big data has matured. By comparing and contrasting large amounts of discrete characteristics about historical personae, we can better understand the context of their lives and make more confident assertions about their roles and characteristics in the historical timeline.
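The prosopographical move described above can be illustrated with a toy example. The soldiers, occupations, and parishes below are invented, not drawn from JDB1745; the sketch simply shows how aggregating discrete attributes across many individuals surfaces group-level patterns that no single biography reveals.

```python
# Toy illustration of prosopography: count discrete attributes across a
# group of individuals to expose collective patterns. All data is invented.
from collections import Counter

soldiers = [
    {"name": "John (invented)", "occupation": "weaver", "parish": "Forfar"},
    {"name": "James (invented)", "occupation": "farmer", "parish": "Glamis"},
    {"name": "David (invented)", "occupation": "weaver", "parish": "Forfar"},
    {"name": "Thomas (invented)", "occupation": "servant", "parish": "Forfar"},
]

# Tally each attribute independently across the whole group.
occupations = Counter(s["occupation"] for s in soldiers)
parishes = Counter(s["parish"] for s in soldiers)

print(occupations.most_common())
print(parishes.most_common())
```

Even in this four-man sample, the tallies hint at questions (why so many weavers? why this parish?) that only emerge when individual records are read collectively.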
Perhaps no period more deserves this disciplinary application than the ever-popular Jacobite era, which has long suffered from misinterpretation, mythistoire, and insufficient data. Though we are currently enjoying a popular resurgence of interest in the subject during the lead-up to the 275th anniversary of the Battle of Culloden, scholarly exploration of plebeian Jacobite demographics is extremely limited and many primary sources remain generally out of easy public reach. These, at their core, are the reasons that we created The Jacobite Database of 1745.
To demonstrate its value, we present a short step-by-step example of how the database can be used as a tool for data analysis that professional and armchair historians alike will be able to use for their own research. We chose Lord Ogilvy’s regiment because it was significant throughout the entire Jacobite campaign of 1745-6 and because it is a unit for which we have a good number of distinct sources to draw upon. This post also aims to illustrate the importance of thinking about the lineage of data in order to keep it as raw (objective) as possible, and of organizing it in a way that eases analysis rather than hinders it.