Sunday, March 2, 2014

Prophetic Voice and Holmes Tries New Methods

  • Separation of data points by measures of vocabulary richness is useful for distinguishing different authors. Clustering of data points is uninformative for determining authorship. Thus, a cluster may represent multiple authors.
  • Joanna Southcott's prophetic prose has a vocabulary richness signal that is approximately an average of her personal writings (diary) and her prophetic verse. Thus, by changing genre, an author is able to significantly change her stylometric signal according to vocabulary richness measures. Therefore, genre must be controlled for when using vocabulary richness for author identification.
  • The Book of Revelation is more similar in vocabulary richness to both Southcott's prophetic voice and the Book of Isaiah than Southcott's own different styles are to one another.
  • For Joseph Smith to have written all of his personal writings, the Doctrine and Covenants, the Book of Mormon, and the Book of Abraham, he would have had to employ several times larger variation in vocabulary richness than any single author studied by Holmes--even including Southcott's entire variation.
  • Holmes used methods of noncontextual word analysis to determine authorship of contested Federalist Papers, and found that these methods were more successful and informative than vocabulary richness.
  • Holmes's data continue to support multiple authorship for the Book of Mormon, and strongly suggest that Joseph Smith was not the sole author of Mormon scripture.


Holmes, David I. (1991), "Vocabulary Richness and the Prophetic Voice," Literary and Linguistic Computing, 6(4):259-268.
Holmes, David I. and R. S. Forsyth (1995), "The Federalist Revisited: New Directions in Authorship Attribution," Literary and Linguistic Computing, 10(2):111-127.

The Prophetic Voice

Holmes continued to study statistical methods for authorship attribution for at least several more years after examining authorship of Mormon scripture. The immediate follow-up to his 1990 paper, which I discussed in a previous post, was to look at how varied an author's styles could be. He investigated the vocabulary richness of Joanna Southcott, a moderately prolific author who kept personal diaries, wrote extensive inspired verse, and recorded many prophetic sayings. Holmes made three collections from her writings: her diary (SD), her prophetic prose (SP), and her verse (SV).

I think it's important to note, here, the three very different literary styles: diaries, verse, and prophecy. Let's get right to the main figure now. In fact, this is the only figure, aside from a dendrogram, that Holmes included in this paper, justifying my identification of the equivalent figure in the previous paper as the most important. Here it is:
I've done the same type of normalization as before. I measured the greatest distance between Isaiah samples (I1 and I2) and set that as 1.0. I measured other distances in terms of this unit length. Let's make two comparisons. First, let's take Holmes's position that Southcott (SD, SP, and SV) is representative of how varied an author can be when employing a prophetic voice. Southcott still only manages to change her style by 3.2 units, or 6.1 units if you include prophetic verse. Joseph Smith's writings (J1-3 and D1-3) change by as much as 8.8 units between his personal writings and his revelations in the Doctrine and Covenants. Adding the data from Holmes's previous paper to this one, we can make the following observations regarding the variability in vocabulary richness in the first two principal components:

1st Principal Component Variability
2nd Principal Component Variability
Joanna Southcott
Joseph Smith (personal)
Joseph Smith and Doctrine and Covenants
Joseph Smith and Mormon Scripture

In this analysis I measured the greatest variability in the first and second principal components separately. In the first principal component (PC1), Isaiah varies by 1.0, setting our unit. In the direction of PC2, Isaiah shows almost no variability in signature. Joseph Smith approximately doubles Isaiah's signature in both PC1 and PC2. Each of Southcott's changes in genre is a little larger than the variation within the personal writings of Joseph Smith. Joseph Smith and the Doctrine and Covenants, combined, show a variability almost twice that of Southcott in PC1 (the most significant component of authorial signature), and a still large variation in PC2 (6 times Isaiah, 2 times Smith's personal writings, 2/3 of Southcott's total change, and 1 1/3 times each of Southcott's individual shifts in genre).
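
For anyone who wants to repeat this kind of measurement, here is a minimal sketch of the normalization in Python. The coordinates are invented placeholders, not values read from Holmes's figure, and the names (toy_points, normalized_variability) are my own. The unit is the largest distance between the Isaiah samples; every other spread, overall or along a single component, is divided by that unit.

```python
import math
from itertools import combinations

# Made-up (PC1, PC2) coordinates for illustration only.
# These are NOT Holmes's values; substitute points measured from the figure.
toy_points = {
    "I1": (0.0, 0.0),
    "I2": (2.0, 0.0),
    "SD": (1.0, 4.0),
    "SP": (5.0, 7.0),
}


def max_pairwise_distance(points):
    """Largest Euclidean distance between any two points in the list."""
    return max(math.dist(a, b) for a, b in combinations(points, 2))


def component_range(points, axis):
    """Spread (max minus min) along one component: 0 for PC1, 1 for PC2."""
    values = [p[axis] for p in points]
    return max(values) - min(values)


def normalized_variability(samples, unit_labels, group_labels):
    """Express a group's spread in units of the reference group's spread."""
    unit = max_pairwise_distance([samples[k] for k in unit_labels])
    group = [samples[k] for k in group_labels]
    return {
        "distance": max_pairwise_distance(group) / unit,
        "pc1": component_range(group, 0) / unit,
        "pc2": component_range(group, 1) / unit,
    }


if __name__ == "__main__":
    # Spread of the two placeholder Southcott points, in Isaiah units.
    print(normalized_variability(toy_points, ["I1", "I2"], ["SD", "SP"]))
```

Because everything is a ratio against the same reference spread, the result does not depend on the scale of the original plot, which is why measurements taken from two different figures can be compared at all.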

Holmes's interpretation of Southcott's prophetic voice as an imitation of Revelation is very plausible, both from the perspective of vocabulary richness and history. Southcott was, apparently, trying to imitate biblical style. Joseph Smith, on the other hand, had his own, original prophetic voice unlike the biblical styles which Smith often quoted in Mormon scripture.

Vocabulary Richness in Mormon Scripture Revisited

Adding in the rest of Mormon Scripture from the Holmes 1990 paper (this is necessarily approximate, but justified by the same variability ratios existing for Isaiah, Smith's personal writings, and the Doctrine and Covenants), writings authored and dictated by Joseph Smith vary by 9 units in PC1 and 6 units in PC2. Variation in the Book of Mormon alone is approximately 2.5 in PC1 and 5.5 in PC2. Claiming that the mind of Joseph Smith was alone responsible for all of these authorial signatures requires not just large, but believable, changes in genre as illustrated by Southcott, but changes at least 1 1/2 times that large.

Now let's examine the ability of vocabulary richness measures to distinguish among known authors. The differences between Revelation and Isaiah, and between Revelation and Southcott's prophetic prose, are 1.2 and 1.5 units, respectively, and almost exclusively in PC1. By the discriminatory standards of Holmes's conclusions, Isaiah, Revelation, and some of Southcott's writings are indistinguishable, while J1-3 and D1-3 are clearly distinct. This evidence proves conclusively that Holmes's conclusion of single authorship for all of Mormon scripture is completely unjustified by the data. The most he can claim is that Mormon scripture was written with a very wide range of authorial stylometric signatures by an unknown number of authors.
I'm going to add a few additional observations:
  • Nephi, Alma, and Moroni each show reasonable sizes of variability in authorial signature based on variation in Smith's personal writings and Southcott's prose (personal and prophetic), and each of these is distinguishable from the others.
  • Lehi and Abraham are as believably different from Nephi, Alma, Moroni, and the Doctrine and Covenants as Revelation is from Isaiah and Southcott, especially if you add in the third principal component.
  • This multi-author grouping of Mormon scripture removes N1 and D1 as extreme outliers--a reality unexplained by Holmes in his 1990 paper.
  • The Mormon samples (M2-M5) are believably the work of a single author, but M1 still remains as an outlier. A major genre shift, not even as extreme as the shift from Southcott's diaries to her prophetic verse, could explain this. Unfortunately, in lumping all of Mormon scripture together, Holmes failed to examine or report information that might have helped us understand M1 as an outlier.
  • It is also possible that single outliers of the degree of M1 may just occur randomly, from time to time. However, believing this renders Holmes's analysis meaningless, because he only has single samples for each of Southcott's voices, and for Revelation.
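
Since this whole discussion leans on "vocabulary richness," here is a rough sketch of what such measures compute. I'm not reproducing Holmes's exact set of measures or his preprocessing; the type-token ratio, hapax legomena count, and Yule's K below are standard examples of the family, and the tokenizer is a simplifying assumption of mine.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out alphabetic word tokens (a simplification)."""
    return re.findall(r"[a-z']+", text.lower())


def vocabulary_richness(text):
    """Return a few common vocabulary richness statistics for one text sample."""
    tokens = tokenize(text)
    n = len(tokens)  # N: total number of word tokens
    if n == 0:
        raise ValueError("text contains no word tokens")
    counts = Counter(tokens)  # frequency of each distinct word (type)
    spectrum = Counter(counts.values())  # V_i: number of types occurring i times
    # Yule's K = 10^4 * (sum over i of i^2 * V_i  -  N) / N^2
    yules_k = 1e4 * (sum(i * i * v_i for i, v_i in spectrum.items()) - n) / (n * n)
    return {
        "tokens": n,
        "types": len(counts),
        "type_token_ratio": len(counts) / n,
        "hapax_legomena": spectrum.get(1, 0),  # words used exactly once
        "yules_k": yules_k,
    }


if __name__ == "__main__":
    print(vocabulary_richness("the cat and the dog and the bird"))
```

In practice each text sample yields a handful of numbers like these, and it is a set of such numbers per sample that gets reduced to principal components and plotted. Measures like the type-token ratio also depend on sample length, which is one reason samples of comparable size matter.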

The Federalist Papers

Holmes reported a study of the Federalist Papers in 1995. He applied various authorship attribution methods to Federalist Papers of both known and disputed authorship. The paper seems interesting in its own right, with relevance well beyond the question of Book of Mormon authorship. I, however, will focus only on the one part relevant to that question.

Holmes compared his vocabulary richness measures with noncontextual word use measures. Here they are:
You'll notice in the top left panel that vocabulary richness was only able to distinguish among four known authors with some overlap and a fair amount of gerrymandering. The noncontextual word analysis was able to cleanly distinguish all four known authors (top right). In assigning authorship for the disputed Federalist Papers, vocabulary richness incorrectly assigned one text, and left several ambiguous. Noncontextual words also incorrectly assigned one text, but much more cleanly assigned all of the remaining 11. These data confirm that vocabulary richness is a useful measure for selecting among known authors, that it is a problematic measure for making really clean distinctions, and that noncontextual word analyses are at least sometimes significantly better. It should also be noted that there are no changes in genre for the Federalist Papers. We know from Southcott that changes in genre can adversely affect the ability of vocabulary richness to distinguish among authors. We don't know how they affect noncontextual word use.
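
To make "noncontextual word use" a little more concrete: the idea is to count words whose frequency depends little on subject matter (articles, prepositions, conjunctions) and compare their rates across texts. The sketch below is my own illustration, not Holmes and Forsyth's procedure; the word list is an arbitrary placeholder, and the rate per 1,000 words is just one reasonable way to normalize.

```python
import re
from collections import Counter

# Illustrative placeholder list; not the word set used by Holmes and Forsyth.
FUNCTION_WORDS = ("the", "of", "and", "to", "by", "upon")


def function_word_rates(text, words=FUNCTION_WORDS, per=1000):
    """Rate of each listed word per `per` tokens of running text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    counts = Counter(tokens)
    # Counter returns 0 for words that never appear, so absent words get rate 0.
    return {w: per * counts[w] / len(tokens) for w in words}


if __name__ == "__main__":
    sample = "The power of the people depends upon the consent of the governed"
    for word, rate in function_word_rates(sample).items():
        print(f"{word:>5}: {rate:.1f} per 1000 words")
```

Each text then becomes a vector of rates, and texts by the same author are expected to sit near each other in that space regardless of topic. Whether that holds across changes in genre is exactly the open question.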

Authorship possibilities for Mormon scripture

Holmes's analysis allows for several possibilities to explain the authorship of Mormon scripture. I can imagine these categories of explanations:
  1. Joseph Smith wrote and dictated his personal writings in his 'normal' voice. He dictated Mormon scripture as revealed by God or 'translated' from ancient texts written by multiple authors. This is Joseph's explanation.
  2. As number 1, but rejecting divine or miraculous ancient origins. Possible explanations include: the Devil, hallucinations, psychotic breaks, aliens, mundane translations from ancient texts, individual genius, or whatever you want to imagine that can't be explicitly tested without magic or a time machine.
  3. The genres of Joseph Smith's personal writings, revelations, and other Mormon scripture are different, and this accounts for the vocabulary richness differences.
  4. Some other modern author (or authors) really wrote the Doctrine and Covenants, the Book of Mormon, and the Book of Abraham, rather than their being translated or dictated by Joseph Smith.
  5. Scribal influence changed the vocabulary signal.
  6. Joseph Smith was trying to disguise his signal to appear like different beings were giving the texts when it was really his imagination and invention.
(1) Mixed divine and ancient origins for Mormon scripture are not contradicted by any statistical evidence (yet examined). The very large variability in vocabulary richness found in texts dictated by Joseph Smith supports a multiple authorship hypothesis. Establishing who the authors actually were, in this scenario, will likely never be conclusive due at least to incomplete historical documentation.
(2) Alternative unexplained or supernatural origins are equally possible. These would include explanations claiming that Joseph Smith had access to obscure, detailed, but extant knowledge and documents regarding any number of topics (first century Christianity, ancient Mesoamerica, ancient temple rites, etc.), despite the lack of historical evidence supporting such assertions. They would also include explanations like, Joseph was a genius sponge, soaking up ideas around him that he might have heard about and constructing them into a coherent set of scriptures. Other naturalistic explanations involving mental aberrations in Joseph Smith would fall into this category, as would other, non-divine, supernatural explanations.
(3) It is perhaps possible that changes in genre could completely account for the variety of vocabulary richness measurements in Mormon scripture. Further statistical analysis of the works of single authors who wrote in many different genres could speak to the likelihood of this explanation. The example of Joanna Southcott is insufficient to explain the variety of styles in Mormon scripture. Do any other authors show a change of 8.8 units simply through changes in genre or through imitation of other styles? How many have done it? How successfully? Superficially it seems unlikely to be the sole source of variation, but further testing would be required for any statistical certainty.
(4) I am unaware of any historical evidence claiming that Joseph Smith did not dictate essentially all of Mormon Scripture (of course there are a couple of other-authored and perhaps co-authored sections in the Doctrine and Covenants, but I'm assuming Holmes correctly excluded those from his analysis). I am unaware of any contemporary historical evidence showing Joseph to have used external, modern sources for Mormon scripture. Third parties who were not present at the events of translation have claimed Joseph used the work of other authors. For these claims to fit the data, Joseph would have had to memorize long passages of unknown documents by these other authors. Alternatively, if Joseph didn't simply copy the work of other 19th century authors, you are back at (2), with Joseph's genius creating new content and new styles--even if based on documents extant in 19th century New England. We will have to keep these requirements in mind when we examine various authorship claims, after having looked at all of the stylometry papers.
(5) The effect of scribes on authorial style is worth serious consideration, but at first glance it seems unlikely to explain the vocabulary richness variation in Mormon scripture. Almost everything Joseph Smith authored was dictated to scribes: the Book of Mormon largely to one scribe; his diaries and revelations to many scribes, with a small portion in his own hand; and the Doctrine and Covenants and Book of Abraham to other scribes. Despite this, there appear to be clearly separate signals--even if broad compared to Isaiah--for Joseph Smith's personal writings, for the Doctrine and Covenants, and for much of the Book of Mormon. Further controls might establish this with greater certainty, but it appears that scribal influence does not erase the vocabulary richness signature.
(6) With (1) and (2) only partially testable (i.e. it may be possible to clearly establish multiple authorship without identifying the authors), (3) and (5) unlikely at first glance, and (4) contradicted by the best historical evidence, some form of fraud seems the most statistically and historically tenable alternative to (1) and (2). Joseph Smith could have disguised his vocabulary richness signature multiple times for multiple different authors. Some evidence, although using different metrics for authorship attribution, suggests that authors can intentionally disguise their own stylistic signal, and even copy another stylistic signal, if they are intending to deceive or to imitate (Southcott's prophetic voice is, according to Holmes, an example of imitation). I will look at the above paper in a future post. This topic is worthy of further examination, because it is subject to a degree of objective verification. There are a number of conditions necessary for Joseph Smith to have intentionally and successfully created documents with fraudulent authorship signatures, so it may be possible to assign a statistical probability (or improbability, as the case may be) to this scenario. Various authors throughout history have attempted to create new voices for their narrators within various books. In theory, these are subject to statistical examination, and I will look to see if I can find what is known about these shifts in narrative voice. As with (1) and (2), fraud of this kind can't be proven through statistics, but finding another author who successfully changed his or her vocabulary richness signature to the same degree as found in Mormon scripture would strengthen the statistical case for (6) while weakening it for (1) and (2). Failing to find such an author does the opposite for anyone who has not ruled out a priori the alternatives in (1) and (2).

