Thursday, July 10, 2014

(Nephi != Alma) && != Joseph Smith

We've looked at all of the later studies and found some good things and some bad things about them, but we've consistently seen evidence supportive of multiple authorship for the Book of Mormon. Because the later studies came later, they should have built upon and improved on the earlier studies, giving us new information. Unfortunately, neither Holmes nor Jockers et al. ran enough controls to interpret their data adequately, since both assumed a priori that authorship was 19th century. Schaalje and Fields targeted their studies to a broader academic audience and resisted making any claims about Book of Mormon authorship beyond its not being from the proposed 19th century authors. So if we want to conclude anything about the authors who did write the Book of Mormon, we have to go back to the papers published primarily for an LDS audience. Here is the later of those:

On Verifying Wordprint Studies: Book of Mormon Authorship
John L. Hilton

I'm still working on acquiring the paper that justifies the details of the methods employed here, but having read other books and papers on stylometry I'm quite confident that the methods are solid. So here's the summary:
  1. Use blocks of 5000 words. This is a substantial sample size, but not too large. Many papers refer to sizes of 2,000-6,000 words being sufficient or ideal for authorship identification. Holmes used 10,000 words for vocabulary richness.
  2. Use pairs of non-contextual words because such word pairs are known to be sensitive measures of authorship and very difficult to imitate, even for skilled authors.
  3. Choose texts of the same genre. Genre shifts have the biggest influence on contextual words, but also influence non-contextual word patterns in some cases. Just to be safe Hilton focused on didactic religious writings.
  4. Do lots of controls. Make 325 pairwise comparisons of 26 texts by 9 authors. Determine how many of the stylometric features agree and how many disagree between each pair of texts.
  5. Compare works translated into English all by the same translator, and also compare them with original works written by the translator.
  6. Once you know what your method can and can't tell you, apply it to texts in the Book of Mormon and to texts by the proposed 19th century authors.
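
The block size in step 1 and the comparison count in step 4 can be checked with a short sketch. This is not Hilton's code: the helper `split_into_blocks`, the choice to drop leftover words, and the placeholder text labels are all assumptions made for illustration.

```python
from itertools import combinations

BLOCK_SIZE = 5000  # words per block, as in step 1


def split_into_blocks(text, block_size=BLOCK_SIZE):
    """Split a text into consecutive blocks of exactly block_size words.

    A trailing remainder shorter than block_size is dropped. That is an
    assumption here; the summary does not say how leftovers were handled.
    """
    words = text.split()
    n_blocks = len(words) // block_size
    return [words[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]


# 26 control texts (placeholder labels) give 26 * 25 / 2 = 325 unordered pairs.
control_texts = [f"text_{i}" for i in range(26)]
control_pairs = list(combinations(control_texts, 2))
print(len(control_pairs))  # 325
```

Every unordered pair is compared exactly once, which is where the 325 control comparisons come from.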
The results follow. First is this figure showing the 325 control comparisons:

What this shows is a simple count of how many of the 40-47 stylometric features capable of comparison between each pair of texts don't match. If a stylometric feature doesn't match, it is called a "rejection" and counted. The black bars are comparisons between works by the same author. The grey bars compare works by different authors. First note the overlap between the grey bars and the black bars. Texts by the same author can differ a little in their stylometric features, averaging statistically significant differences in 2-3 features. Also, texts by different authors can be stylometrically similar, with as few as one or two rejections in unusual cases. This means we can only be completely confident that two texts are by the same author if there are zero rejections. But if there are seven or more rejections, it is nearly certain that the two texts were written by different authors. So Hilton's methods aren't all that great at telling us if texts are by the same author (although they probably aren't worse than other methods if there is a closed set of candidates). Fortunately, we can test several Book of Mormon authorship hypotheses by figuring out who didn't write it. Among the control pairs with 7 or more statistically significant differences between stylometric features, there isn't a single case of the two texts having been written by the same author. In fact, if there are 7 or more differences, the chance that the texts were written by the same author is much less than 1%.
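
The counting and the rule of thumb described above can be written down as a minimal sketch. It assumes each comparable feature yields a p-value for the difference between two texts and that "statistically significant" means p < 0.05. Both are assumptions for illustration, not details taken from Hilton's paper, and the function names are hypothetical.

```python
def count_rejections(p_values, alpha=0.05):
    """Count features whose difference between two texts is significant.

    alpha=0.05 is an assumed significance level, not one stated in the post.
    """
    return sum(1 for p in p_values if p < alpha)


def interpret(rejections):
    """Apply the thresholds described in the text to a rejection count."""
    if rejections == 0:
        return "consistent with same author"
    if rejections >= 7:
        return "almost certainly different authors"
    return "inconclusive"


# Hypothetical p-values for four features; two fall below alpha.
example = [0.01, 0.20, 0.04, 0.50]
print(count_rejections(example), interpret(count_rejections(example)))
```

The wide "inconclusive" band between one and six rejections reflects the overlap between the black and grey bars.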

Only two authors in the Book of Mormon wrote 10,000 or more words in the same genre: Nephi and Alma. The Book of Mormon attributes to each approximately 15,000 words of didactic religious writing, allowing for three pairwise comparisons of blocks of text for each author.
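
The comparison counts follow directly from the block arithmetic: roughly 15,000 words per author gives three 5000-word blocks each. A small sketch, with placeholder block labels that are assumptions for illustration:

```python
from itertools import combinations, product

# Three 5000-word blocks per author (placeholder labels).
nephi_blocks = ["Nephi-1", "Nephi-2", "Nephi-3"]
alma_blocks = ["Alma-1", "Alma-2", "Alma-3"]

within_nephi = list(combinations(nephi_blocks, 2))  # same-author pairs
within_alma = list(combinations(alma_blocks, 2))    # same-author pairs
between = list(product(nephi_blocks, alma_blocks))  # cross-author pairs

print(len(within_nephi), len(within_alma), len(between))  # 3 3 9
```

So each author contributes three within-author comparisons, and pairing every Nephi block with every Alma block gives nine between-author comparisons.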

None of the comparisons agreed in all of the stylometric features (black bars), so it's remotely possible that a single author didn't write all the words attributed to Nephi, and that Alma didn't write all of his words, either. However, none of the pairs differ in more than five features, and the distribution parallels the distribution of rejections consistent with single authorship for each set of texts. More interesting are the nine between-author comparisons.
Notice in this graph that four of the nine comparisons show 7 to 10 rejections. This is as good as certainty that the writings of Nephi and Alma were by at least two separate authors. Remember that while Holmes criticized this work, his data also show Nephi and Alma as distinct. Jockers et al. identified a stronger signal for Rigdon in 1st Nephi and a signal for Spalding in Alma, again confirming multiple authorship styles, even if the data don't really support the Rigdon-Spalding hypothesis. That's three independent groups identifying Nephi and Alma as different authors, with Schaalje and Fields refraining from comment.

Unfortunately, as regards the ability of a translator to maintain different styles for the translated authors, we are for the moment left to believe the reports that stylometric features can survive translation. This is supported by a study in stylometric obfuscation and by related studies. Some have argued that certain stylometric features measured by Hilton and others would not have existed in the original reformed Egyptian or Hebrew. It is worth noting that the automated translation in the linked study went from English to German to Japanese and back to English. Despite these changes through grammatically distinct languages, many stylometric features survived the process intact.

This final table summarizes the 19th Century author and Nephi/Alma data.

While there may be a little question as to whether Nephi is like Nephi, it is very clear that Nephi is not Alma, Smith, Cowdery, and especially not Spalding. It's remotely possible that Alma is not like Alma, but it is statistically indisputable that Alma is not Nephi, Smith, Cowdery, or Spalding. It is unfortunate in some respects that Rigdon was not included here, but the work of Schaalje and Fields remedied that oversight, as we saw previously.

Summing it up

Nephi is not Alma is not Joseph Smith is not Oliver Cowdery is not Solomon Spalding. The data of every study we have examined thus far either support these conclusions or do not contradict them. This is true with various statistical methods and three different types of stylometric features: non-contextual words (Jockers et al.), vocabulary richness (Holmes), non-contextual word pairs (Hilton), and all three combined (Schaalje and Fields).

Hilton did continue to look for an author who could change his style to fool stylometry, and we will look at how William Faulkner succeeded and Mark Twain and Robert Heinlein both failed in another post.
