Friday, February 14, 2014

Stylometric Analysis of Mormon Scripture

About 15 years ago John Hilton spoke to my senior religion seminar for science majors about his statistical analysis of Book of Mormon authorship. It was exciting to see the non-contextual word methods explained (non-contextual words are common words like "the," "and," and "of," whose frequency depends little on subject matter) and to watch the graphs go up showing that Joseph Smith, Oliver Cowdery, Sidney Rigdon, and others had stylistic signatures different from those of Nephi and Alma. Even better, this could be shown with objective methods that didn't rely on the highly subjective kinds of authorship analysis typically used to sort out questions of biblical authorship. He showed how the same methods had been used to identify the authors of the Federalist Papers, and how convincing the results were compared with those of other statistical authorship attribution methods. In particular, he criticized the vocabulary richness methods David Holmes used to attribute the Book of Mormon entirely to Joseph Smith. He showed that Holmes's methods were unable to distinguish among known authors in cases like the Federalist Papers, and mentioned that Holmes had since moved on, in other projects, to non-contextual word analysis like that used by Hilton and his colleagues. Hilton also showed us how non-contextual word methods could distinguish between an author's own writings and that same author's translations of other works. We saw convincing multidimensional graphs showing that the Doctrine and Covenants had a different signal from Joseph Smith's and Oliver Cowdery's personal writings, suggesting a distinct revelatory voice for Joseph Smith (one that was still different from the Book of Mormon).
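Since vocabulary richness comes up repeatedly in these debates, here is a minimal Python sketch of what such a measure looks like. It computes two common ones, the type-token ratio and Yule's K. I am not claiming these are the exact statistics Holmes used; the simple tokenizer (lowercased runs of letters and apostrophes) and the function names are my own illustrative assumptions.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens (an assumed, crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    """Number of distinct words (types) divided by total words (tokens)."""
    return len(set(tokens)) / len(tokens)


def yules_k(tokens: list[str]) -> float:
    """Yule's K = 10^4 * (sum over m of m^2 * V_m - N) / N^2.

    N is the number of tokens and V_m is the number of word types that
    occur exactly m times. A lower K means a richer, less repetitive vocabulary.
    """
    n = len(tokens)
    word_counts = Counter(tokens)                   # word -> how many times it occurs
    freq_spectrum = Counter(word_counts.values())   # m -> V_m
    s2 = sum(m * m * v_m for m, v_m in freq_spectrum.items())
    return 1e4 * (s2 - n) / (n * n)
```

For example, `yules_k(tokenize("the cat and the dog and the bird"))` gives 1250.0, and the type-token ratio of the same sentence is 0.625. The type-token ratio falls as a text gets longer, which makes raw ratios hard to compare across texts of different sizes; Yule's K was designed to be less sensitive to text length.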

Hilton also explained a number of pitfalls in stylometric studies (statistical analyses of word use). Apparently it is well documented that a change in genre can drastically change word use. For example, we use a smaller vocabulary when we speak than when we write, and we also use non-contextual words at significantly different rates. So it is very important to compare similar genres when doing stylometric analyses.
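To make "rates of non-contextual words" concrete, here is a minimal Python sketch that turns a text into a profile of how often each of a few function words appears per 1,000 words. The word list and function name are hypothetical examples, not the list or procedure used by Hilton or any of the studies discussed here, and real studies use much longer texts and many more words.

```python
import re
from collections import Counter

# Hypothetical list of non-contextual (function) words, for illustration only.
FUNCTION_WORDS = ("the", "and", "of", "to", "that", "in", "it", "unto")


def rates_per_thousand(text: str, words=FUNCTION_WORDS) -> dict[str, float]:
    """Return how often each listed word occurs per 1,000 tokens of the text."""
    tokens = re.findall(r"[a-z']+", text.lower())  # crude, assumed tokenizer
    counts = Counter(tokens)
    total = len(tokens)
    return {w: 1000 * counts[w] / total for w in words}


if __name__ == "__main__":
    profile = rates_per_thousand("and it came to pass that the people of nephi")
    for word, rate in profile.items():
        print(f"{word:>5}: {rate:6.1f} per 1,000 words")
```

Comparing the profile of a transcribed sermon with that of a letter by the same person is exactly where the genre caveat applies: differences in these rates could reflect speech versus writing rather than different authors.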

I basically fell in love with Hilton's work and came away with a disdain for Holmes's study. In 2008 another stylometric analysis of the Book of Mormon came out, though I only found out about it roughly a year ago. It wasn't easily accessible, but a couple of reviews, along with some explanations of new stylometric analyses, were published in the Journal of the Book of Mormon and Restoration Scripture, so I read those. Those articles pointed out what seemed like obvious, fatal flaws stemming from the hypothesis put forward in the 2008 study, so I never got very interested in reading the original. Then I got more involved with internet Mormonism, and a couple of things happened. I discovered that a number of thoughtful Mormons are automatically suspicious of anything that comes out of the Neal A. Maxwell Institute. I also had a chance encounter with Craig Criddle, the principal investigator on the 2008 study. He and I didn't hit it off (he was intent on pushing a Spaulding/Rigdon hypothesis for Book of Mormon authorship, however tenuous the data may be, and I am fully confident that the Book of Mormon has, at least primarily, ancient origins), but I was able to listen a bit and gain a better perspective on the work he was involved in.

Now I come to why I'm writing this series of posts. Few of my internet Mormon friends find Hilton's work as convincing as I do. I think an interpretation of Hilton's work limited to his strongest conclusions is quite compelling. The minimal message is that Nephi is not Alma, and neither of them is anybody modern who, according to any historical evidence (even weak evidence), was involved with the Book of Mormon. His data don't make any claims about who Nephi and Alma were. They don't make any claims about when Nephi and Alma lived. They don't make any claims about the moral authority of Joseph Smith or the Book of Mormon. Yet for me this is the most objective evidence available regarding the origins of the Book of Mormon. It is reproducible. It uses methods exactly as they have been applied to answer equivalent questions in peer-reviewed literature. It explains its controls and limitations. It doesn't go beyond the best data. I know this because Hilton talked with us about some other, more tentative results. The method suggests that several more authors exist in the Book of Mormon, but Hilton was less confident in those results, either because of shifts in genre or because they did not quite reach the 95% confidence level conventionally used by statisticians as a cutoff. Because of this I have felt no qualms about claiming that the Book of Mormon contains at least two authors who were not any of the proposed 19th-century authors. I've stated that any critic of Book of Mormon antiquity needs to deal with this objective fact, and I don't think any critic has.
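The 95% cutoff corresponds to the conventional significance threshold of p < 0.05. As a generic illustration only (I am not claiming this is Hilton's actual test), here is a Pearson chi-square test on a 2×2 table asking whether one word occurs at the same rate in two texts. The function names and the choice of this particular test are my own assumptions.

```python
import math


def chi_square_2x2(a_word: int, a_total: int, b_word: int, b_total: int) -> tuple[float, float]:
    """Pearson chi-square test (1 degree of freedom, no continuity correction)
    of whether a word occurs at the same rate in text A and text B.

    a_word, b_word: occurrences of the word; a_total, b_total: total tokens.
    Returns (statistic, p_value).
    """
    table = [[a_word, a_total - a_word],
             [b_word, b_total - b_word]]
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    grand = row_sums[0] + row_sums[1]
    if 0 in row_sums or 0 in col_sums:
        # Degenerate table (empty text, or the word never/always occurs): nothing to test.
        return 0.0, 1.0
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / grand
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the statistic is a squared standard normal,
    # so the upper-tail p-value is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def differs_at_95(a_word: int, a_total: int, b_word: int, b_total: int, alpha: float = 0.05) -> bool:
    """True if the two rates differ at the conventional 95% confidence level."""
    return chi_square_2x2(a_word, a_total, b_word, b_total)[1] < alpha
```

For example, a word appearing 80 times in 1,000 words of one text and 40 times in 1,000 words of another gives a statistic of about 14.2 and a p-value well below 0.05. One caution: if you test many words at the 0.05 level, about 1 in 20 will cross the threshold by chance alone, so a single "significant" word proves little.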

Recently, a couple of people have asked me to help them understand the basis for that claim. I decided to get all of the original papers and try to understand them myself, as I would a chemistry paper. Very often a chemist does not have the specific expertise to critique all of the methods and assertions in a paper. Instead we rely on a history of expertise, a logical presentation of the material, thorough citation of relevant papers on the subject, and our own skills in data analysis. We look at the authors' professional credentials and at earlier uses of the methods employed. Thorough citations show that the authors have a command of the subject and have given due consideration to previous work. Then I ask whether I know how to read the graphs and interpret what they show. In the posts that follow, I will show how a chemist trained in data analysis interprets the work of linguists and statisticians. Hopefully, by being up front about my known biases and by showing you the data presented in the original papers, I can help you become more comfortable with what stylometric analyses of the Book of Mormon have and have not demonstrated. Matthew Roper, Paul Fields, and Bruce Schaalje have written what I think is an excellent analysis of the various studies, but I am going to take a different approach.

I intend to look only for the clearest, strongest results from each of the stylometric studies, and to see whether these results can be integrated into a coherent, non-contradictory whole. Where that is not possible, I hope to explain my reasons for choosing one result over another. I will also add some personal analyses and questions as I go. If you are interested, be prepared to look at a lot of graphs and numbers. I'm not worrying about length, but I will try to summarize my conclusions at the beginning and end of each post. I will not be discussing historical evidence of Book of Mormon authorship. Nearly all (and maybe all) of the firsthand and secondhand contemporary evidence indicates that Joseph Smith dictated most of the book to Oliver Cowdery, without reference to any other texts, over a period of a couple of months. Everything else is, to my mind, speculation and invention. That doesn't mean the speculations are false, only that they are highly subjective. My conclusions will not rely on any claims about how the words got into Joseph Smith's head before coming out of his mouth.

Here is a list of the papers I'll be working through, not in any particular order: 

A Stylometric Analysis of Mormon Scripture and Related Texts
D. I. Holmes

Stylometric Analyses of the Book of Mormon: A Short History
Matthew Roper, Paul J. Fields, and G. Bruce Schaalje

Examining a Misapplication of Nearest Shrunken Centroid Classification to Investigate Book of Mormon Authorship
Reviewed by Paul J. Fields, G. Bruce Schaalje, and Matthew Roper

On Verifying Wordprint Studies: Book of Mormon Authorship
John L. Hilton

Who Wrote the Book of Mormon? An Analysis of Wordprints
Wayne A. Larsen and Alvin C. Rencher (Preprints available)

Testing Authorship in the Personal Writings of Joseph Smith Using NSC Classification
Matthew L. Jockers, Literary and Linguistic Computing 28.3 (2013): 371–381

Reassessing Authorship of the Book of Mormon Using Delta and Nearest Shrunken Centroid Classification
Matthew L. Jockers, Daniela M. Witten, and Craig S. Criddle, Literary and Linguistic Computing 23.4 (2008): 465–492

The following are not freely available online. I have (or am acquiring) personal copies, which I may be able to share for personal use. You may also be able to access them through a university library.

Extended nearest shrunken centroid classification: A new method for open-set authorship attribution of texts of varying sizes
G. Bruce Schaalje and Paul J. Fields

Open-Set Nearest Shrunken Centroid Classification
G. Bruce Schaalje and Paul J. Fields
