Using Math to Decode History
Statistician Dr. David Holmes is a modern-day Sherlock, but with a twist. Channeling his namesake’s gift for catching culprits, this word-sleuth applies the techniques of stylometry (the statistical analysis of literary style) to uncover the authorship of anonymous works of literature.
Originally from England, Holmes received his PhD in statistics from King’s College, University of London, and taught at university level in his native country for twenty-eight years. During this time he was asked to develop a computing class for humanities students, which let him apply his professional background in statistics to his interest in literature. What began as a course proposal eventually sparked a lasting passion. “Stylometry is a branch of statistics that has grown a great deal in the past twenty years and in which I have undertaken many successful projects, but I always need to work with specialists in the field of application, be they classicists, historians, or literary scholars,” said Holmes. “In the simplest terms, I use multivariate statistics to investigate the authorship of disputed literary works.”
After a year abroad as a visiting professor at California State University in Sacramento, Holmes was so enchanted with the United States and its school system that he vowed to make a permanent move across the pond. In the fall of 1997, he took a position at The College of New Jersey, where he is now in his thirteenth year as head of the statistics program. “The best aspect of TCNJ is the students, no question about it,” Holmes explained. “I love the fact we have a statistics major and that I can use my own research in working with students.”
During his time at TCNJ, Holmes has involved students in stylometry projects through independent study and research teams. Among these was the “Pickett Letters” investigation into the authorship of Civil War-era correspondence supposedly written by Confederate General George Pickett to his fiancée. The letters were published after his death, as Holmes put it, “to show the world what a gallant soldier he was,” but historians have long doubted that Pickett wrote them, citing anachronisms and the constraints of writing on the battlefield. Working with a team of statistics majors, Holmes collected the disputed letters along with, as controls, letters that other soldiers and generals wrote to their families during the Civil War. The team processed these texts with specialist computer software and applied statistical analysis, and the results supported the historians’ suspicions: Pickett’s wife, LaSalle Corbell Pickett, and not Pickett himself, wrote the published letters.
“Stylometry is all about the way we subconsciously use common non-contextual function words,” explained Holmes. “If you collect the rates of occurrence of seventy or so of these words (e.g. by, from, to, with…), you have what is called a ‘wordprint,’ very much like a fingerprint, that can be used to identify an author.”
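To make the “wordprint” idea concrete, here is a minimal sketch in Python. It assumes a short, hypothetical list of function words (the article does not give Holmes’s actual list of about seventy) and expresses each word’s rate per 1,000 words so that texts of different lengths can be compared. The `distance` function is a simple Euclidean stand-in for illustration only; the multivariate methods Holmes uses in practice are not shown here.

```python
import re
from collections import Counter

# Hypothetical, illustrative subset of function words; Holmes's actual
# list of roughly seventy words is not given in the article.
FUNCTION_WORDS = ["by", "from", "to", "with", "the", "and", "of", "in", "upon", "but"]


def wordprint(text, function_words=FUNCTION_WORDS, per=1000):
    """Return the rate of each function word per `per` words of `text`."""
    # Lowercase the text and split it into alphabetic tokens,
    # keeping internal apostrophes (e.g. "don't").
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        raise ValueError("text contains no words")
    counts = Counter(tokens)
    total = len(tokens)
    # Normalizing by text length makes long and short texts comparable.
    return {w: counts[w] * per / total for w in function_words}


def distance(print_a, print_b):
    """Euclidean distance between two wordprints built from the same word list."""
    return sum((print_a[w] - print_b[w]) ** 2 for w in print_a) ** 0.5


if __name__ == "__main__":
    letter = "I wrote to you from the camp by the river, with love and hope."
    print(wordprint(letter))
```

In a real attribution study, a disputed text’s wordprint would be compared against wordprints built from much longer samples of each candidate author; a single sentence like the one above is far too short to be meaningful.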
In another project, which has been accepted for publication this year, Holmes teamed up with TCNJ history professor Daniel Crofts to determine the author of “The Diary of a Public Man.” Published anonymously in 1879, the diary offers behind-the-scenes accounts of the inner workings of Abraham Lincoln’s cabinet just before the outbreak of the Civil War. Drawing on his expertise as a historian, Crofts narrowed the field to about six candidates. After exhaustive statistical and literary comparisons with contemporary diaries, Holmes and Crofts concluded that the author was newspaperman William Henry Hurlbert, chief editorial writer for the New York Times.
With over forty years of teaching under his belt, Holmes took his first sabbatical in spring 2009 and traveled to South Africa to learn more about his most recent project, the Chard Report. “Deep down, I am a very keen historian,” said Holmes. The report was written and sent to Queen Victoria after the military action at Rorke’s Drift during the Anglo-Zulu War of 1879. Although Lieutenant Chard signed it, he is believed not to have written it himself. It was most likely written by a staff officer, since it contains details with grave implications in the murky world of Victorian politics, namely who should take the blame for the worst defeat in England’s colonial history, at Isandlwana that same day. Working with a statistics student, Holmes will collect letters written by soldiers and staff officers in the war, establish wordprints, and eventually compare these with the Chard Report itself. Holmes is also working on an independent research project with a biology major who took his statistics classes and became enthused with the subject. They are investigating how the size of word-block samples affects authorship attribution.
Indulging his interest in history, Holmes has ventured outside the statistics department this semester to teach TCNJ history classes on the Anglo-Zulu War. “It may be a rather unusual thing for a statistician to do,” he said, “but I very much enjoy history.”
Holmes is a member of the American History Forum, which tours battlefields. He is also a keen sportsman who plays tennis, skis, and goes orienteering with the Delaware Valley Orienteering Association.
“I love working in stylometry,” Holmes asserted. “I have managed to publish my work extensively and have given numerous talks on both sides of the Atlantic. Plus, I know I’ll never be short of work!”
“But one thing is for certain,” he said with a chuckle. “I don’t do Shakespeare. Wouldn’t touch it with a bargepole!”