Monday, May 12, 2014

Written by Persons Unknown

  • Open-set nearest shrunken centroid methods show that the Book of Mormon was not written by any of the proposed 19th century authors--at least not without some unexplained (and likely unprecedented) amount of disguising or modification of style.
  • Schaalje and Fields extended the numbers of stylometric features by including not only 74 of the noncontextual words used by Jockers et al. (removing contextual words retained by Jockers et al.), but also two of the vocabulary richness features used by Holmes, and a number of noncontextual word pairings used by Hilton and coworkers. As we learned from the adversarial authorship studies, increasing the number of features studied improves our chances of identifying the correct author.
  • Just like the Holmes and Jockers et al. studies, the data of Schaalje and Fields clearly support multiple authorship for the Book of Mormon. 
In the last post we saw how, if a text is written by an unknown author, the stylometric features usually don't line up with the candidate authors in a study. Schaalje and Fields looked at the Book of Mormon texts several different ways. They simply cut the Book of Mormon text into 2000 word chunks. They divided it by chapter, just like Jockers et al. They compared it with the writings of Rigdon, Cowdery, Spalding, Pratt, and Smith. They pretended that 2 of these 5 authors were unknown and tested those 2 authors and the Book of Mormon texts against the 3 remaining authors. Every single time the Book of Mormon came up as different from the test authors. Here is a summary figure for the comparison with 2000 word chunks of text:

Apparently Jockers et al. had the data to figure out they were mostly measuring false positives right in front of them. Besides using the NSC method, they also used a method called Delta. One of the originators of this method determined empirically that Delta values between 0 and -1.9 were most likely false positives. Schaalje and Fields reproduced the Delta calculations of Jockers et al. and produced this graph:
You will notice that only 16 of the 239 chapters are higher than 1.9 (the graph made everything positive), and 10 of those are quotes from Isaiah and Malachi. That's over 93% likely false positives. Oops. The fact that Delta agreed significantly with their closed-set NSC results did tell them something--just not what they claimed it did.
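For anyone who wants to check the percentage, here is the arithmetic as a tiny Python sketch. The counts are the ones quoted above; the variable names are mine.

```python
total_chapters = 239
above_threshold = 16   # chapters with Delta above 1.9
biblical_above = 10    # of those 16, quotes from Isaiah and Malachi

# Chapters whose Delta falls in the "most likely false positive" range
below_threshold = total_chapters - above_threshold
share_below = below_threshold / total_chapters

print(f"{below_threshold} of {total_chapters} chapters = {share_below:.1%}")
print(f"non-biblical chapters above the threshold: {above_threshold - biblical_above}")
```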

The most interesting of the Schaalje and Fields figures for us Book of Mormon aficionados is this one:

You can see that Isaiah/Malachi significantly overlaps a small portion of the Book of Mormon chapters. Late Sidney Rigdon overlaps a few. You can also see that almost all of the Book of Mormon chapters are different in style from any of the candidate authors--and that's only looking at the first two principal components. Adding additional principal components would only emphasize differences further. In addition, this plot gives another representation of how much each individual author's style varies. When you draw boundaries around each author's samples you get areas that are roughly comparable in size. Some are bigger, some are smaller. None of them are close to the size of variation we see in the Book of Mormon samples. If the Book of Mormon were written only by authors whose styles had zero overlap, these data would indicate at least 4 or 5 different authors, even allowing for style to change with age of the author. If we allow for overlap like what is seen for the five 19th century authors, the Book of Mormon could easily have been written by 20 authors. Schaalje and Fields don't make that claim in either of these peer reviewed papers, since it's not what the method was looking for, but we don't have to be geniuses to see it ourselves.


  • We've looked at studies from three different research groups. 
  • Two groups were critical of LDS claims of Book of Mormon authorship. One group was pro-LDS authorship claims. 
  • The results of all three groups support multiple authorship for the Book of Mormon.
  • None of the groups support viewing Joseph Smith as a primary author of the Book of Mormon (although vocabulary richness gives limited overlap between the Book of Mormon and the Doctrine and Covenants).
  • Vocabulary richness tentatively distinguishes styles for Nephi, Lehi, Abraham, and Alma/Mormon, and maybe others based on the first 3 principal components.
That's a pretty clear picture forming, whatever the rhetoric put forward in the background and conclusions sections of the various papers. It looks like two possibilities remain for rational critics of Joseph Smith's own account of authorship, namely, multiple non-19th century authors. The first comes from adversarial authorship studies. It is possible for some authors to change and disguise their styles according to a wide variety of measures. If critics could show historical evidence of Joseph Smith having attempted to do this that dates to close to 1830, that would be telling. It would be best if they could show that he was trying to imitate specific authors. Maybe comparing styles in the Book of Mormon with styles of specific books in the King James Bible would be a good test, or with some of the books found by the Johnsons to have linguistic similarities with the Book of Mormon. Authors did best at disguising their style if they were imitating someone else, so if different voices in the Book of Mormon matched different, non-candidate voices from the 19th century, that would be telling. If they could show another author who has successfully created a book with such a wide variety of statistical authorship styles as is found in the Book of Mormon, that would be telling, too, even if it weren't proof. If they could even write a book of similar length with as wide a variety of styles, that would be something. If they could find a single-authored book with this variety of styles then at least the Book of Mormon wouldn't be unique. Based on the data so far, single authorship and 19th century candidate authorship are really extraordinary claims in need of evidence.

The second alternative, although somewhat tenuous, was put forth by Jockers et al. Maybe the text is collaborative from the 19th century. Unfortunately for critics (please remember not to include Jockers among them--he really doesn't care), they need to provide solid control data for what collaboration does to style. Then they need to show that collaboration can shift style so that it is unrecognizable as belonging to either of the collaborators, and that it can spread the styles out to look like at least four or five completely distinct authors. Collaboration is still a possible explanation, as is disguised and varied styles created by Joseph Smith. It seems, however, that examples of single or dual author texts with stylometric features like the Book of Mormon are as rare as angels and gold plates--maybe more.

Sunday, May 11, 2014

Mother's Voice

Since my poem submission was not selected for the A Mother Here contest, I can now post it. This isn't the version I submitted, but I now like this sonnet better.

Mother's Voice

Before I was, I heard you call:
Come be My son.
There is no safety here,
no sameness,
no return,
But I will teach you freedom.

Mother, I listened. I came.
I’m listening. I’m coming.
I heard—I hear your voice—
Love and be Free. . .

I followed others who had gathered round
  Into your presence, learning at your feet.
  I came too late to hear my brother speak—
Great Abraham above the hallowed ground—
Today we are engaged in a great war,
  That all who dwell on earth Gods’ joy will prove.
  That this world, under God, shall have new love
A birth of freedom greater than before.

Though lives had passed, You thought that I should keep
  The words he spoke—Our Parents brought forth our
    New World, conceived in Liberty. This World
    From its conception dedicated toward
  This proposition: all Gods' children are
Created equal, born for freedom. Free.

My favorite poem from among the winners and honorable mentions was by Rachel Hunt Steenblick. This is one of my favorite paintings, and here's another. I hoped this contest might give us more poems that would inspire theological connection to Heavenly Mother, and give us material to talk about in Sunday School and on blog posts to heighten our awareness of Heavenly Mother in Mormonism. I still need to read the other 24 selections, but 6 poems in one day is enough to digest. My first impression is a much more emotional attachment and awareness of Heavenly Mother in the poems and art. Maybe this is what we need first for us to seek out revelation of Heavenly Mother and better understand her place in the eternities. Also, it's been a long time since poetry like Eliza R. Snow's has been considered great, new art.

Friday, May 9, 2014

Closed- v. Open-Set Authorship Attribution


  • Bruce Schaalje and Paul Fields have published two papers in trade journals for linguistic computing. From the professional end, they improve on the Nearest Shrunken Centroid methods for authorship attribution first employed by Jockers and coworkers, incorporating tests to determine if a closed-set method is appropriate or if open-set methods are required.
  • Schaalje and Fields show that their new methods work well for a number of test cases where the methods of Jockers et al. fail. Included among these is a test showing that, when the same closed-set of authors is used as was used to determine supposed Book of Mormon authorship, Sidney Rigdon is shown to have written over half of the Federalist Papers penned by Alexander Hamilton.
  • Closed-set methods are misapplied to open-set questions. Next time we will see that the Book of Mormon is an open-set question.
Extended nearest shrunken centroid classification: A new method for open-set authorship attribution of texts of varying sizes
G. Bruce Schaalje and Paul J. Fields (not free)

Open-Set Nearest Shrunken Centroid Classification
G. Bruce Schaalje and Paul J. Fields

Apparently computational linguistics journals work differently from chemistry journals. Studies can take years to get through the queue before publication. These studies, and the 2013 Jockers paper, were apparently completed some years before they appeared. That said, they are the latest word in peer reviewed journals on Book of Mormon authorship.

I'm going to combine my discussion of these two papers and break it into two parts. The first part will illustrate the failure of a closed-set nearest shrunken centroid (NSC) approach to some known problems, and the success of Schaalje and Fields's open-set modification of these methods. The second part will examine what the open-set modification reveals about Book of Mormon authorship.

How does the open-set NSC work? Basically, start the closed-set analysis. To do this you calculate all of your features for your set of candidate authors and make a set of multidimensional vectors--a bunch of arrows pointing off in different directions. An author can be recognized because all of that author's arrows are approximately the same length and point in approximately the same direction. You also calculate the features for the texts you want to classify in the same way. In the closed-set approach, you measure which author has arrows closest in length and direction to the unknown text and you assign the text to that author. In the open-set approach you look at the arrows before you assign an author. If the arrow from the unknown text isn't close to the arrows of any of the candidate authors then you stop. You say that it looks like an unknown author. Of course the arrow being close to a known author doesn't mean that author wrote it, but if it doesn't agree with any, you need to keep looking.
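To make the arrow picture concrete, here is a minimal Python sketch of the closed-set versus open-set decision. It is a plain nearest-centroid classifier, not the shrunken-centroid method of the papers, and the `slack` cutoff is a made-up stand-in for the statistical test Schaalje and Fields actually use. The author names and feature values are invented for illustration.

```python
import math

def centroid(vectors):
    """Average a list of equal-length feature vectors (the 'typical arrow')."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def distance(a, b):
    """Euclidean distance between the tips of two arrows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closed_set(text, training):
    """Always returns the nearest candidate author, however far away."""
    centroids = {author: centroid(vs) for author, vs in training.items()}
    return min(centroids, key=lambda author: distance(text, centroids[author]))

def open_set(text, training, slack=1.5):
    """Returns the nearest author, or 'unknown' when the text lies farther
    from that author's centroid than slack times the author's own farthest
    sample. The slack rule is an assumption made for this sketch."""
    best = closed_set(text, training)
    center = centroid(training[best])
    spread = max(distance(v, center) for v in training[best])
    if distance(text, center) > slack * spread:
        return "unknown"
    return best

# Hypothetical two-feature training set for two candidate authors
training = {
    "author_a": [[1.0, 0.9], [1.1, 1.0], [0.9, 1.1]],
    "author_b": [[5.0, 5.1], [5.2, 4.9], [4.8, 5.0]],
}
far_text = [20.0, -3.0]  # resembles neither candidate

print(closed_set(far_text, training))     # forced onto a candidate anyway
print(open_set(far_text, training))       # flagged as unknown
print(open_set([1.0, 1.05], training))    # close to author_a's samples
```

The closed-set function has no way to say "none of the above", which is the whole problem when the real author is not in the candidate list.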

Arrows represent the stylometric feature vectors of four hypothetical authors
The same thing can be done in classifying tumor cells--in fact, it was first done with tumor cells and not with authorship. What you are looking at in this first figure is a principal components analysis of some tumor cell DNA data. I will probably get some details wrong in my explanation, but I'll try to help you get the essentials. Instead of noncontextual words or vocabulary richness as features, short DNA sequences are used as features. Sequences from four different tumor types were measured. Three tumor types (1-3) were used as the training set. Using closed-set NSC of course assigned all of the fourth tumor type to one of types 1-3 in the training set. Using an open-set NSC method resulted in the graph shown here:

Since we can't show 150 dimensions on a 2-dimensional plot, some math is done in the background to pick the 2 dimensions that capture the most variation among the samples. These are the 1st and 2nd principal components of the arrows I talked about before. The circles represent the tips of the arrows for tumor types 1-3. You can see that tumor type 4, indicated by +'s, overlaps with the others, so the feature differences aren't big. Despite these similarities, the open-set NSC method classified 95% of the fourth tumor type samples as not belonging to types 1-3. Take a note: that's 95% correctly identified as different compared to 0% by closed-set NSC.
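For the curious, here is a rough Python sketch of the kind of background math a principal components plot relies on. It finds the two directions of greatest spread by power iteration with deflation, which is one simple way to do it and almost certainly not the routine used in the papers. The six three-feature samples are invented, and all function names are mine.

```python
import math

def center(rows):
    """Subtract each feature's mean so the cloud of points sits on the origin."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def top_direction(cov, iterations=500):
    """Power iteration: the direction of greatest variance, and that variance."""
    start = [float(i + 1) for i in range(len(cov))]  # arbitrary starting guess
    length = math.sqrt(sum(x * x for x in start))
    v = [x / length for x in start]
    for _ in range(iterations):
        w = mat_vec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    variance = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
    return v, variance

def first_two_components(rows):
    """Project each sample onto the 1st and 2nd principal components."""
    centered = center(rows)
    cov = covariance(centered)
    pc1, var1 = top_direction(cov)
    d = len(cov)
    # Deflation: remove what pc1 explains, then find the next-best direction.
    rest = [[cov[i][j] - var1 * pc1[i] * pc1[j] for j in range(d)]
            for i in range(d)]
    pc2, _ = top_direction(rest)
    points = [(sum(a * b for a, b in zip(r, pc1)),
               sum(a * b for a, b in zip(r, pc2))) for r in centered]
    return points, pc1, pc2

# Hypothetical samples: lots of spread in feature 1, some in 2, little in 3
samples = [[2.0, 1.0, 0.1], [-2.0, -1.0, -0.1], [4.0, -1.0, 0.0],
           [-4.0, 1.0, 0.0], [0.0, 2.0, 0.1], [0.0, -2.0, -0.1]]
points, pc1, pc2 = first_two_components(samples)
for x, y in points:
    print(f"{x:7.3f} {y:7.3f}")  # the 2-D coordinates you would plot
```

Each sample ends up as one (x, y) point, which is what the circles and +'s in the figure are.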

How does open-set NSC do with authorship? A common benchmark for new authorship attribution studies is the Federalist Papers. In my unsystematic foray into authorship attribution, I have found no fewer than 6 new methods tested on the Federalist Papers, and many papers that cite the earliest study on the Federalist Papers by Mosteller and Wallace, and it's not uncommon to find references to a study done by Holmes in the 1990s. The papers by Schaalje and Fields present one of the new methods. Let's see how closed-set and open-set NSC do. I'll summarize. You can find, or I can send you, the papers if you want to see the details.
  • All the arrows for the 12 disputed Federalist Papers fall within the range of arrows for Hamilton and Madison, so a closed-set test is appropriate.
  • All 12 of the disputed papers were assigned to Madison, in agreement with the majority of previous studies by other methods.
  • Pretending that they didn't know who wrote the five papers by Jay, the arrows for those papers don't match up with the arrows for Madison and Hamilton, so the open-set method tells you to go look for another author.
That's the background. This is where it starts to get interesting for the Book of Mormon enthusiast (or critic). Schaalje and Fields took the same set of candidate authors (plus a couple) used by Jockers et al. for testing Book of Mormon authorship: Joseph Smith, early Sidney Rigdon (1831–46), late Sidney Rigdon (1863–73), Solomon Spalding, Oliver Cowdery, and Parley P. Pratt. Rigdon is split into two because it is well known that people's writing styles change over time, and he wrote over a long time period. Schaalje and Fields then tested who wrote the 51 Federalist Papers written by Alexander Hamilton. Here are the results of how likely it is that Rigdon wrote each of the papers:

Oops. Looks like Rigdon wrote at least half of them. Let's look at what would have happened if a visual, principal components analysis of the stylometric features had been performed first:
As you can see, Hamilton is clearly not any of the candidate authors, and embarrassment is averted. Only 2 of the 51 texts were assigned to Rigdon, and 49 were assigned to an unknown author. If you add the first 25 Hamilton papers to the set of candidate authors, then you can see that the signal from the remaining 26 moves from being a cloud by itself (+'s on the left) into the middle of the set of candidate author vectors (+'s on the right graph), and every one of the 26 is assigned to Hamilton.

[Technical note: You may have noticed that the values for the training set (candidate authors) shift in each graph. This is not because the values for the stylometric features change (although they may, somewhat, with NSC analysis since the training set has changed), but because the principal components change. I can discuss this with anyone who wants to understand it better.]

As one last technical test, Schaalje and Fields added a further calculation to the model to account for text length. It improved the results on a stylometrically ambiguous data set, but did not solve the problems resulting from small text lengths. What this means for us is that short chapters in the Book of Mormon are more likely to have false positive assignments than longer ones, even with open-set NSC methods.

The take home message is, closed-set NSC can give 100% false positives if the right checks aren't made. Instead of making these statistical checks, Jockers, Witten, and Criddle relied on subjective historical analysis. Next we will see what they would have learned had they made one simple, statistical check, and see what Schaalje and Fields found using open-set NSC methods on the Book of Mormon.

Wednesday, May 7, 2014

Archaeology of Zion

One sentiment I've heard expressed claims that the value of the Book of Mormon is independent of the truthfulness of its historical narrative. Beyond helping literally minded people like me to maintain belief in Mormonism, it really doesn't serve any purpose to believe in the historicity of the Book of Mormon. In fact, believing in it is harmful, as evidenced, for example, by the pride members of the LDS church express in being the "one true church" and having the "most correct book". I'm going to push back a bit. I believe Mormons and Mormonism would be better served by reading history into the Book of Mormon more rigorously. What would this accomplish?
  • We could teach more strongly that Book of Mormon prophets were real people with real biases. We could explore in Sunday School--or at least in Seminary and Institute--how those biases shaped their messages, and how God interacted with real people instead of with glorified heroes.
  • We could acknowledge that the universal flood and the parting of a huge sea, which Nephi believed in, were stories already exaggerated by the time Nephi learned them, and we wouldn't have to believe every word as literal truth. We could take it as obvious that there were lots of other people on the American continents when the Book of Mormon peoples arrived, and that the whole story is one of two small groups of rulers.
  • We could see how symbol and story and myth have been used throughout history and give greater place for symbolic readings without having to fight about whether every word is literal and inspired direct from the mind of God. We could apply our own modern history of the complexities of prophetic utterances to add nuance to our understanding of the Book of Mormon. If I understand it correctly, this is what many faithful (or would-be faithful) Latter-day Saints want--room to question and explore. Room to believe that not every word is literal truth. Room to read beyond the narrow and hurtful messages propagated by shallow, ethnocentric readings of the Book of Mormon. Recognizing that the Book of Mormon prophets were sometimes shallow and ethnocentric, but still good and inspired, could allow us to do this.
  • We could find greater understanding through recognizing the ancient (rather than 19th century) Biblical roots. For example, here is Daniel C. Peterson teaching about symbols of a Divine Feminine in the Book of Mormon. How many of you knew that was there? Of course you could draw the comparisons without believing Nephi was from the Middle East in 600 BC, but then you are just playing literary games rather than claiming that Nephi--a real prophet--believed in female representations of deity and was willing to talk about them and rhapsodize for his posterity and millions yet to be born over his visions of the female divine. Historicity allows us to read one more female presence into the intensely patriarchal Book of Mormon. That's harder to do if we lean toward 19th century origins, however miraculously inspired we want to make them.
  • Lastly, if the Book of Mormon is about a real, historical time and place, then archaeology and cultural history can teach us better how to understand and interpret the Book of Mormon. Maybe we can even learn something about building a real, future Zion society. I want to take a first pass at this, now.
The Book of Mormon reports a brief period--approximately 3-4 generations--when the Book of Mormon peoples achieved Zion:
. . . the people were all converted unto the Lord, upon all the face of the land, both Nephites and Lamanites, and there were no contentions and disputations among them, and every man did deal justly one with another. And they had all things common among them; therefore there were not rich and poor, bond and free, but they were all made free, and partakers of the heavenly gift. . .
. . .there was no contention in the land, because of the love of God which did dwell in the hearts of the people. And there were no envyings, nor strifes, nor tumults, nor whoredoms, nor lyings, nor murders, nor any manner of lasciviousness; and surely there could not be a happier people among all the people who had been created by the hand of God. There were no robbers, nor murderers, neither were there Lamanites, nor any manner of -ites; but they were in one. . . (4 Nephi 1:2-3,15-17)
There is our goal--to become a Zion people and be one with each other and with God--and we've got a handful of verses describing this most important event in completely generic terms.

How did they do it?!! Has that ever bothered anyone else? I don't think I'm unique in asking this question, but now maybe I can partially formulate an answer. Here are some quotes from Mormon's Codex in which John Sorenson describes some of what is known regarding the cultures in the most likely Book of Mormon lands in the period from 1-200 AD.
Dahlin et al. spoke of the "collapse of Terminal Preclassic civilization" (ca. AD 200-300), an event characterized by "severe population reductions, site abandonments, an increasing balkanization in material culture, and disruption of interregional communications networks." The total impact was catastrophic: "The effects of this collapse were almost as calamitous as those resulting from the Late Classic Maya civilization." Bauer refers to approximately the same period in the Maya area when he reports "profound historical, ideological, demographic and socio-political changes that occurred at the start of the Early Classic." To Braswell the area around Guatemala City exhibited "great disruption: population levels dropped, construction decreased, literacy and [the] carved-stone sculptural tradition disappeared," and so on. (pp. 636-7)
Precise dating on these events is very difficult, but natural disasters close to 50 AD have been proposed as significant causes of the collapse. To illustrate how serious the impact can be from a large volcanic eruption, Sorenson summarizes the effects of a later event on Mayan civilization:
". . . the more potent blow to the entire southern Maya realm" was that the system of trade collapsed. This resulted in "the political destabilization and decentralization" of societies over a wide area, extending even to a "weakened Kaminaljuyu" in the Valley of Guatemala. . ." [a few hundred miles away].
Back to our Zion time period:
Population declines and cultural fluctuations such as those contributing to the sudden decline in level of civilization evident in the Santa Clara period at Kaminaljuyu [likely location of the city of Lehi-Nephi] plausibly could have resulted from one or a combination of natural disasters. (p. 646)
There is significant evidence of large scale volcanic activity in the middle of the 1st century AD--possibly multiple volcanic events. Lots of things changed around 50 AD in southern Mexico and the Guatemala highlands. Among them were the disappearance of evidences of certain cultic practices:
  • figurines (mainly female) of modeled clay
  • three-pronged incense burners (in several base forms)
  • flat-stemmed and roller stamps
  • effigy whistles
  • stelae (both plain basalt columns and sculpted pillars)
  • inscriptions
  • tombs
  • "mushroom stones" and, perhaps, cultic ingestion of mushrooms (p. 650)
While it's not possible to say what this means, it is further evidence of major cultural changes in the mid-1st century.

What else happened after the disasters?
. . . a period of retrenchment was ushered in throughout Mesoamerica. Among the results was that in the second half of the first century AD and throughout the second century a process of cultural and political fragmentation prevailed. Each sizable community became more or less a center of power unto itself. No doubt the reduced population owing to the recent natural disasters forced remaining local leaders to focus more on internal problems than on external relations. The disasters would also have drastically disturbed previous patterns of commerce, rendering old intersociety tensions merely minor concerns. (p. 653)
Each political unit was based on a related-but-separate-and-equal status. (p. 654)
The exceptions to this rule were a few big northern centers that eventually became great foci of power--Teotihuacan, Cholula, and perhaps Monte Alban [all north of the narrow neck of land, and not likely Book of Mormon cities]. But even those places offer little evidence that focused political power was involved in their growth during the period from AD 50 to 200. For example, Cowgill suggested that at Teotihuacan [in Mexico Valley] power may not have been concentrated in a few ruling hands but may have been dispersed in a council style of leadership, such as that which prevailed at Cholula at the time of the Spanish conquest. (p. 654)
This dispersal of power to small units may have been an important factor in the ability of the Book of Mormon peoples to build Zion. In addition,
. . . competition for (and conflict over) agrarian land would have decreased with a smaller population. This brief period shows no archaeological signs of intercommunity aggression. (p. 655)
Instead of constructing palaces and other major civic works,
Energy might instead have focused on rebuilding the population, settling the forms of damaged or modified social norms and institutions, and renewing the agricultural and craft infrastructure. There were apparently few or no upper social ranks demanding surplus energy and wealth for prestige projects. The limited trade, which had always aimed at providing goods of mainly elite concern, would have shrunk the "social overhead" of local societies considerably. Beyond AD 50, the bare bones of everyday social control--through kinship, tribal, and community-level structures--could have taken care of most of the tasks that government under an elite class had formerly carried out. (p. 656)
That's it. That's the total of information I could glean from Sorenson's summary that was relevant to our understanding of Zion. It's hardly a drop, but it at least doubles what we knew before about building Zion. Apparently we need to:
  • Diffuse and decentralize power, ruling through councils and cultural influence rather than through social rank or military or political might
  • Eliminate the social distinctions resulting from wealth
  • Focus on growing agricultural and craft infrastructure
  • Eliminate competition for our means of livelihood
What did it take for this to happen?
  • Natural disasters resulting in severe depopulation and geographical fragmentation of societies
  • Enough resources for growth without competition
  • Enough pressing demands on time and energy that significant social distinctions and wealth were insupportable
I originally liked the idea that archaeology might teach me how to build Zion. I didn't expect this depressing message:

If you want to build Zion, you have to wait until natural disasters depopulate the earth, overthrow governments, and erase social distinctions.

Maybe there is another way. Maybe the Book of Mormon is a warning. Maybe it says, either you can build Zion the way they did in 50 AD in a small region of Mesoamerica, or you can figure out how to get there another way. These are the conditions you need to meet. If you don't want Me to make it happen through natural disasters (echoes of the apocalypse, anyone?) get to work and figure out how to do it on your own. It has to happen. Your choice.

Maybe a mythical and symbolic Book of Mormon can save us. Maybe a historical Book of Mormon can give us clues to save the world.

Monday, May 5, 2014

Looking for 19th Century Authors


  • Jockers, Witten, and Criddle attempted to address the question of whether two 19th century authors combined might have produced the Book of Mormon, thus explaining why earlier studies found that the Book of Mormon styles didn't match single 19th century authors.
  • They wrote a long paper that failed to demonstrate anything. At least 19.8% of unattributed Book of Mormon chapters are objectively mis-attributed. A significant percentage of others are plausibly mis-attributed after examining the results of Jockers's 2013 paper.
  • Without even examining the papers produced by BYU and Maxwell Institute authors, only those committed for subjective reasons to the idea of purely known, 19th century authorship have any grounds for believing a purely 19th century authorship hypothesis, unless . . .
  • . . . Joseph Smith created multiple fraudulent authorship styles. Of course, without evidence of Joseph attempting to make up different styles, this is pure speculation. It could be true, and if it is, Joseph, I can't love you for it--but I can respect your ever growing genius. . . (Preprints available)
Jockers, Matthew L. “Testing Authorship in the Personal Writings of Joseph Smith Using NSC Classification.” Literary and Linguistic Computing 28.3 (2013): 371-381.
Jockers, Matthew L., Daniela M. Witten, and Craig S. Criddle. “Reassessing Authorship of the Book of Mormon Using Delta and Nearest Shrunken Centroid Classification.” Literary and Linguistic Computing 23.4 (2008): 465-492.

The 2008 Paper

If you take the time to read the background presented in any of the stylometry papers, you will find they are all very self-serving. That's normal for academic papers. They want to play up the importance of their contribution as much as possible. You have to be a salesperson if you want to keep getting paid. This paper from Jockers, Witten, and Criddle spends a lot of time pointing out the flaws of earlier studies. Many of the things they point to as flaws are, to my mind, not flaws at all, but are instead asking different questions. They criticize all three of the earlier studies for two things: 1. 'arbitrarily' dividing up the Book of Mormon text according to the authors indicated in the text, and 2. not considering that two 19th century authors might have written the text together, thus giving a stylistic signature that is different from any individual author. I have three reactions to this. First, it is a valid question to ask whether the authors indicated by the text have distinct and self-consistent styles. The earlier studies did this in logical ways, and that is hardly a fault. It admittedly doesn't cover all possibilities for Book of Mormon authorship. Second, what does a combined or collaborative authorship signature look like? Third, the question of dual authorship is a valid question to ask. I won't go into the more detailed criticisms. Many of them are valid but have almost no consequence on the conclusions we can draw. I'll be happy to reexamine and discuss them on an individual basis upon request. Let's take a look at the results.

The authors use two methods, Delta and Nearest Shrunken Centroid. The first is an established method for authorship attribution. The second is borrowed from other statistical comparison problems. Both certainly have potential to illuminate authorship questions, so we will trust that the authors can do the math and programming right and skip straight to how the question is framed.

The samples of unknown authorship were the 239 chapters of the modern Book of Mormon (but with the 1830 text). Ultimately, 110 words (mostly noncontextual) were used to characterize the texts. These 239 texts were compared against 7 known authors: Oliver Cowdery, Parley Pratt, Sidney Rigdon, Solomon Spalding, Isaiah and Malachi (counted together as one author), and Joel Barlow and Henry Wadsworth Longfellow as controls. The test was designed to rank which author's style was closest, 2nd closest, 3rd closest, etc. to each of the 239 unknown samples. Before running the test on the Book of Mormon, the authors did an important control.

There were 217 samples from the 7 known authors. Jockers et al. removed a small number of known samples from the 217-member training set, say 17 of them. They pretended these were unknown and classified them based on the remaining 200. Doing this a number of times, they misclassified only 8.8% of the texts with the NSC method. They got the author right over 90% of the time. Remember this number when we start looking at the Book of Mormon results. Jockers et al. were pretty thorough with these internal controls, and ran their comparisons a number of different ways to see if minor variations in how they formulated the questions resulted in big changes in the results. The bottom line is that they didn't, so I will focus on just the main set of results.
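
As an aside, the hold-out control described above can be sketched in a few lines. This is only an illustration under loud assumptions: the data are synthetic (7 invented "authors" with 31 samples each, 5 invented features), and the classifier is a plain nearest-centroid rule without the shrinkage step that gives NSC its name. It is not the procedure or the data of Jockers et al.

```python
import random

def centroid(rows):
    """Mean of each feature across a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def nearest_author(sample, centroids):
    """Return the author whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda a: sum((x - c) ** 2 for x, c in zip(sample, centroids[a])),
    )

def holdout_error(samples, n_holdout=17, n_rounds=50, seed=0):
    """Repeatedly hide n_holdout labelled samples, train on the rest, and
    return the fraction of hidden samples assigned to the wrong author."""
    rng = random.Random(seed)
    wrong = total = 0
    for _ in range(n_rounds):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        held_out, training = shuffled[:n_holdout], shuffled[n_holdout:]
        by_author = {}
        for author, features in training:
            by_author.setdefault(author, []).append(features)
        centroids = {a: centroid(rows) for a, rows in by_author.items()}
        for author, features in held_out:
            wrong += nearest_author(features, centroids) != author
            total += 1
    return wrong / total

# Synthetic stand-in data (hypothetical): 7 authors x 31 samples = 217 samples.
data_rng = random.Random(1)
samples = []
for i in range(7):
    center = [data_rng.uniform(0, 10) for _ in range(5)]
    for _ in range(31):
        samples.append((f"author_{i}", [data_rng.gauss(m, 1.0) for m in center]))

print(f"hold-out misclassification rate: {holdout_error(samples):.1%}")
```

The number this prints says nothing about the real corpus. The point is only the shape of the control: every hidden sample really does belong to one of the training authors, which is exactly the condition that may not hold for the Book of Mormon chapters.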

# of Book of Mormon Chapters Assigned to Author
Proposed Author | 1st Choice | 2nd Choice
Isaiah & Malachi

This is an excerpt from the first table in the 2008 paper. We see that 38.9% of the chapters were assigned to Rigdon, 26.4% to Isaiah and Malachi, and 21.8% to Spalding. We know Isaiah and Malachi are quoted extensively in the Book of Mormon, and this powerful method (remember it got the authors right over 90% of the time on the test sets) identified Rigdon and Spalding as the authors of 60.7% of the text. That's most of the text that isn't Biblical quotes. With evidence like this, the Rigdon-Spalding theory of authorship looks pretty good, despite the tenuous historical support for the theory. Some problems do start to appear when we look closer, however. Isaiah and Malachi only wrote 36 chapters in the Book of Mormon, yet 26.4% of 239 chapters is about 63 chapters. That means 13.3% of the chapters not written by Isaiah and Malachi are falsely attributed to them. Add to that the 2 assigned to Longfellow and the 9 assigned to Pratt, and 18.7% of the chapters not written by Isaiah and Malachi are known false positives.
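
The arithmetic behind those percentages can be rechecked with a short script. It uses only numbers quoted in this post and makes one assumption that favors the method: that all 36 genuine Isaiah and Malachi chapters were among the chapters assigned to them. The variable names are mine, for illustration only.

```python
total_chapters = 239
isaiah_malachi_chapters = 36  # chapters actually from Isaiah/Malachi, per the text above

# First-choice shares quoted from the excerpted table.
share = {"Rigdon": 0.389, "Isaiah & Malachi": 0.264, "Spalding": 0.218}

# Convert the Isaiah & Malachi share back into a chapter count (about 63).
assigned_im = round(share["Isaiah & Malachi"] * total_chapters)

# Assumption: all 36 genuine chapters are among the chapters assigned to them.
false_im = assigned_im - isaiah_malachi_chapters           # 27
other_chapters = total_chapters - isaiah_malachi_chapters  # 203

# Add the chapters assigned to Longfellow (2) and Pratt (9).
known_false = false_im + 2 + 9                             # 38

print(f"Rigdon + Spalding share:     {share['Rigdon'] + share['Spalding']:.1%}")
print(f"False Isaiah & Malachi rate: {false_im / other_chapters:.1%}")
print(f"Known false-positive rate:   {known_false / other_chapters:.1%}")
```

If some of the 36 genuine chapters were themselves assigned to someone else, the false-positive count would only go up, so these figures are a floor.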

Now let's remember the initial test set and the results of the 2013 study. Only 8.8% of the test texts were assigned incorrectly by NSC. Yet, based solely on chapters we know weren't penned by Isaiah, Malachi, Pratt, or Longfellow, we know that the method is getting it wrong at least 18.7% of the time on the Book of Mormon. From the 2013 study we saw that Rigdon and Spalding showed up as false positives 19.8% of the time, and that was in a data set where the actual author was nominally known. In addition, Cowdery showed up most often when he wasn't even the scribe for the texts--31.3% of the total texts. We also saw that the NSC method got it 'right' only 15.6% of the time. No matter how you spin this, the chance that a large percentage of these Book of Mormon chapters are misattributed is far from zero. Even authorship pairing doesn't help. In the 2013 study, Rigdon and Spalding showed up paired 5 times for texts dictated to 5 different scribes. One of those was for Smith dictating to Cowdery, so maybe by a stretch you could say that the strong Rigdon-Spalding signal is actually evidence of Smith dictating to Cowdery--except no previous group found a match between those dictated diaries and letters and the Book of Mormon. The only thing I can possibly see us learning from finding strong Rigdon, Spalding, and Cowdery signals in the Book of Mormon is that there are at least three distinct authorship styles in the Book of Mormon. And that's just from looking at the NSC results produced by Jockers and his coworkers. If we look at the results from their Delta method, the picture looks even worse. Isaiah and Malachi wrote almost half of the Book of Mormon by that method. Then there's the trump card.

Jockers et al. asked their authorship question with closed-set methods. From the very start their method says: "One of these seven authors wrote each chapter of the Book of Mormon." This fails to answer the question of Book of Mormon authorship on two levels. The first is internal to their framing. Jockers et al. suggested that the Book of Mormon might have been written by two or more authors in collaboration. Besides being unable to demonstrate such collaboration historically, they didn't present a single control experiment illustrating what happens to stylometric signatures when authors collaborate. They only looked to see if chapters could be assigned to one author, not to collaborating authors. Finding authors ranked first and second only tells you that those authors both have similar styles to the passage, not that those authors collaborated to produce it. The second way it fails is best illustrated.

A closed-set method will always give a positive answer. Here's a simplified visual representation of what this closed-set study has definitively shown us:
I've collapsed the many-dimensional stylometric measures of the candidate authors into two dimensions. In reality, they will overlap in different ways in different dimensions, but the concept is easier to visualize in two. Each author has a distinct style with some overlap with one or more candidate authors. The blue numbers represent hypothetical styles of chapters of the Book of Mormon. What a closed-set study tells us is not whether the chapter has the same style as a candidate author, but which candidate author it is closest to. For example, chapter 1 would be assigned to Cowdery, chapter 5 to Pratt, 6 to Spalding, etc., despite none of the chapters matching the styles of the candidate authors. Without some absolute measure of style, we just don't know how good the matches really are.
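
The same point can be made with a toy calculation. The sketch below is a minimal illustration, not the NSC method of Jockers et al. and not the open-set method of Schaalje and Fields: it uses a plain nearest-centroid rule, and the two-dimensional positions, the "spread" radius of each author, and the three sample points are all invented.

```python
import math

# Hypothetical 2-D style centroids and a rough "spread" radius per author.
# Every number here is invented for illustration only.
AUTHORS = {
    "Cowdery":  ((1.0, 1.0), 1.0),
    "Pratt":    ((4.0, 1.5), 1.0),
    "Rigdon":   ((2.5, 3.5), 1.0),
    "Spalding": ((5.0, 4.0), 1.0),
}

def closed_set(sample):
    """Always returns the nearest candidate, however far away it is."""
    return min(AUTHORS, key=lambda a: math.dist(sample, AUTHORS[a][0]))

def open_set(sample):
    """Returns the nearest candidate only if the sample lies within that
    author's spread; otherwise reports that no candidate matches."""
    best = closed_set(sample)
    center, radius = AUTHORS[best]
    return best if math.dist(sample, center) <= radius else "unknown"

# Three hypothetical text samples: one inside Cowdery's range, two far from everyone.
texts = {"A": (1.2, 0.8), "B": (-3.0, -2.0), "C": (9.0, 9.0)}
for name, point in texts.items():
    print(name, "closed-set:", closed_set(point), "| open-set:", open_set(point))
```

Samples B and C sit nowhere near any candidate, yet the closed-set rule still hands them to Cowdery and Spalding. Only the rule with a distance cutoff is able to answer "none of the above."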

There is one important thing we can conclude from the study--Book of Mormon stylometric signatures span the range of at least 4-5 normal authors. This conclusion is 100% consistent with what we saw from Holmes's studies.


Anyone who thinks this study shows anything conclusive about the Book of Mormon, except confirming that it is likely the product of at least four different authors, is not looking at the data critically or objectively. Even if the person is rational in believing that the Book of Mormon is a purely 19th century product, this study gets them no closer to identifying who the 19th century author or authors were. Absolute measures of style, measures of what happens to style when authors collaborate, or an open-set method would be required to give us that knowledge.

Next Time

I'll start looking at what you learn when applying open-set methods to Book of Mormon authorship questions, where you allow the possibility that at least one unknown author might have written the text.

Friday, May 2, 2014

Who Wrote What Joseph Smith Wrote?


  • Joseph Smith appears to have a naturally diffuse or generic style when compared with other 19th century New Englanders and early Mormons as measured by noncontextual word use.
  • Smith's style significantly overlaps Cowdery, Pratt, Rigdon, and Spalding.
  • Having dictated so many of his personal writings through scribes, Smith's already diffuse or multiply overlapping style may have been further diffused through the influence of individual scribal styles.


I've dreaded writing about these papers a little because, frankly, I find the conclusions embarrassing. I'm going to give Matthew Jockers the credit I think he is due up front. When I requested access to these papers from him, and then asked him a follow-up question regarding his results, he explicitly told me some things I had come to believe through reading the 2013 paper. Jockers performed most of the data analysis and computation. He is not emotionally invested in the issue like I am or like many of the other authors of Book of Mormon authorship papers (including one of his coauthors). The conclusions of his 2013 paper are the most reasonable conclusions to be drawn from the data if one is convinced beforehand that the Book of Mormon must have been produced strictly by 19th century authors, and if one does not consider the possibility of unknown authors. His conclusions in the 2013 paper are also much more modest than those put forth in the more ideologically charged 2008 paper. Jockers did not write the historical analysis presented in the 2008 paper, although anyone who puts their name on a paper is taking some responsibility for the entirety of its content. I have no intention of addressing the historical evidence for the Spalding-Rigdon hypothesis.

In my opinion, the internal evidence of these studies alone shows that the methods employed by Jockers and coworkers were embarrassingly misapplied to the questions of Book of Mormon authorship, even before considering the problems due to closed-set attribution, and that is what I'm going to illustrate here.

The 2013 Paper (Preprints available)
Jockers, Matthew L. “Testing Authorship in the Personal Writings of Joseph Smith Using NSC Classification.” Literary and Linguistic Computing, 28.3 (2013): 371 – 381.
Jockers, Matthew L., Daniela M. Witten, and Craig S. Criddle. “Reassessing Authorship of the Book of Mormon Using Delta and Nearest Shrunken Centroid Classification.” Literary and Linguistic Computing, 23.4 (2008): 465 – 492.

Beginning with the 2013 paper, we see the problems inherent in applying the Nearest Shrunken Centroid methodology to the particular data set chosen for study. I emphasize this point because NSC is a proven method--when applied to the right kind of data. I don't really know what those data are, but we will see how ineffective the method choice is for the writings of Joseph Smith.

In the 2013 paper, Jockers tested whether documents supposedly dictated by Joseph Smith to scribes could be identified as having come from Joseph Smith. He collected handwritten examples of Joseph Smith's style into a training set:
[T]he corpus of Smith material available for training contained 25 documents in Smith's handwriting. These works ranged in length from 112 words to 2,300 words with an average length of 527 (13,172 words total). The test corpus contained 96 additional documents attributed to Smith but in the handwriting of one of 23 different scribes. These works ranged in length from 105 words to 10,927 words with an average length of 1,168 words.
Jockers was able to compare 106 noncontextual words rather than the approximately 40 considered in the earliest Book of Mormon stylometry paper. In this regard the measures are potentially more powerful, but the samples in this study are much smaller, on average, and vary in length. Both of these factors reduce the ability to identify authors. As controls, Jockers included texts by Isaiah and Malachi, Henry Wadsworth Longfellow, and Joel Barlow--all authors loosely associated with Joseph Smith (in the Book of Mormon or from the same time period) who wrote about some similar topics but were known to have differing styles. Personal writings from most of the 23 scribes were not included for testing. Now I need to give you Jockers's main conclusion:

Joseph Smith did not have a clear style when he was dictating or writing. This means it was the correct choice to leave Joseph Smith's style out of the tests when they were trying to identify which 19th century authors wrote the Book of Mormon.

I agree that it was a reasonable choice to leave Joseph Smith out of the Book of Mormon analysis--three other studies (Holmes's study is the only one I've written about, so far) had all previously established that Joseph Smith's personal style was measurably different from any of the styles in the Book of Mormon. Those studies would all claim there is no point in including Joseph Smith in Book of Mormon authorship studies because he had a measurably distinct style. Of course, Jockers points out that including the many dictated documents might not have shown Joseph Smith's personal style, but his scribes' styles, or a mix of styles. So let's take a look at what Jockers found to see what claims are best supported.

The following is an excerpt from the first table in the 2013 study. It shows the number of the 96 texts that identified a particular author as the 1st or 2nd most similar style to the text:
Table 1
Identified Author | 1st choice | 2nd choice

Notice that:
  • 13/96 (Barlow, Longfellow, Isaiah/Malachi, and Spalding) were attributed to authors with no connection to the texts.
  • 32/96 were assigned to Cowdery.
  • 24/96 were assigned to Pratt.
  • 12/96 were assigned to Rigdon.
  • 15/96 were assigned to Smith.
On the surface this would seem to say that the method got 13.5% 'wrong', 15.6% 'right', and 70.8% were at least assigned to the scribes. That's not bad, but let's look a little closer. How many of the texts assigned to Cowdery, Pratt, and Rigdon were penned by those scribes?

# texts assigned to scribe | # of those texts for which scribe acted as scribe

In summary, the method does a terrible job of identifying either the dictator or the scribe. In addition, Rigdon and Spalding yielded a total of 19.8% false positives. Taken altogether, the method was able to identify the correct author or scribe as the 1st most likely candidate a total of 18.8% of the time. About the only thing the study consistently got right was assigning low probabilities that most texts were written by Barlow, Longfellow, or Isaiah/Malachi.
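
For anyone who wants to recheck the percentages in this section, here is a short script that works from the first-choice counts in the bulleted list above. The last two lines simply convert the quoted 19.8% and 18.8% back into whole numbers of texts; they are a back-calculation from the percentages, not figures taken from the paper.

```python
# First-choice counts from the bulleted list above (96 dictated texts in all).
first_choice = {
    "no connection (Barlow, Longfellow, Isaiah/Malachi, Spalding)": 13,
    "Cowdery": 32,
    "Pratt": 24,
    "Rigdon": 12,
    "Smith": 15,
}
total = sum(first_choice.values())  # 96

# Texts assigned to one of the three scribes who were tested.
scribes = first_choice["Cowdery"] + first_choice["Pratt"] + first_choice["Rigdon"]

no_connection = first_choice["no connection (Barlow, Longfellow, Isaiah/Malachi, Spalding)"]
print(f"assigned to unconnected authors: {no_connection / total:.1%}")
print(f"assigned to Smith:               {first_choice['Smith'] / total:.1%}")
print(f"assigned to a tested scribe:     {scribes / total:.1%}")

# Back-calculation: how many texts do the quoted percentages correspond to?
print("Rigdon + Spalding false positives:", round(0.198 * total), "texts")
print("correct author or scribe in 1st place:", round(0.188 * total), "texts")
```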

Jockers explored an alternative approach. Considering the possibility that Joseph was collaborating, maybe we should see paired authorship signatures. When looking at 1st and 2nd place assignments together, Smith is paired with Cowdery 32 times, and 7 more times with Rigdon and Pratt--all influential contributors to early Mormon thought and documents. It sounds superficially impressive until you remember that Cowdery only had a hand in 2 of those 32, Pratt in 1 of the 7, and Rigdon in none.

An Alternative Interpretation

I would like to propose an alternative set of conclusions that is consistent with Jockers's data, as reported, and also with the data of Holmes's studies. One problem with these data is that none of them tell us on some absolute scale how much stylometric variation there is among the various texts. All we know are relative similarities. What would the results show if Smith had a moderately ambiguous style? From the adversarial authorship studies, there were hints that some authors might have more generic writing styles than others. If Smith were one of these, his style could well show a broader range of stylometric measures than most authors. What if parts of this style then overlapped the styles of Cowdery, Rigdon, Pratt, and Spalding? Any texts that fell within the region of Cowdery, for example, would be assigned as most likely to have come from Cowdery. This is because Cowdery's signature is much more defined than Smith's, not because the text doesn't match Smith's style or because it wasn't written by Smith. Here's how a graph of the style overlaps might look:
Smith's style is represented by the blue circle. Some of the other authors' styles are represented in different colors. Orange circles represent the unmeasured personal styles of a number of other scribes. Because Smith's style is so diffuse, any texts that fall in the Pratt section of his style will be classified as the more clearly defined Pratt, any that fall in the Cowdery section will be assigned to Cowdery, etc. If scribes influenced Smith's style, it could diffuse his style even further. This is imagined with the dashed blue oval. You can see how, in a scenario like this, many of the texts would be assigned to other authors even if they were all in Smith's style. If Smith's style was further diffused through the influence of scribes (whose personal styles could be anywhere, since most of them weren't tested), then the probability of false assignment increases greatly by suggesting a dual authorship influence. Smith's signal could even expand to give a mistaken match for very different authors like Barlow. While allowing for a dual author influence, this hypothesis avoids claiming that Smith has no style of his own and allows for explanations of why Smith's dictations would be assigned to pairings like Pratt and Spalding without either one having taken part in their creation.
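
A small simulation shows how this could play out. It is a sketch of the hypothesis only, under heavy assumptions: style is reduced to a single invented number, the centroid positions are made up, every simulated text really is "Smith's," and the classifier is a plain closed-set nearest-centroid rule rather than NSC.

```python
import random

# Invented one-dimensional "style" positions; only the relative layout matters.
CENTROIDS = {
    "Smith": 0.0,
    "Cowdery": -1.5,
    "Pratt": 1.5,
    "Rigdon": -3.0,
    "Spalding": 3.0,
}

def share_assigned_to_smith(smith_spread, n_texts=10_000, seed=0):
    """Draw texts from a hypothetical Smith distribution with the given spread
    and return the share that a nearest-centroid rule still assigns to Smith."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_texts):
        text = rng.gauss(CENTROIDS["Smith"], smith_spread)
        nearest = min(CENTROIDS, key=lambda a: abs(text - CENTROIDS[a]))
        hits += nearest == "Smith"
    return hits / n_texts

for spread in (0.5, 1.5, 3.0):
    share = share_assigned_to_smith(spread)
    print(f"Smith spread {spread}: {share:.0%} of his own texts assigned to Smith")
```

As the spread of the simulated Smith style grows, a shrinking share of his own texts lands closest to his own centroid, and the rest are handed to whichever neighbor they happen to fall near. That is the pattern the hypothesis predicts, although a toy like this cannot show that it is what actually happened.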

I wondered if any of the pairings gave consistent results, for instance whether the texts assigned to Pratt and Spalding in 1st and 2nd place line up with documents dictated by Smith and recorded by one or two particular scribes. Jockers provided the data to answer this in an appendix, and the answer is no. Maybe there are some weak correlations, but none show strong consistency. This is unsurprising, given the normal statistical distributions of stylometric signatures.

I can't prove the truth of this hypothesis from these data, but it seems like a reasonable, and likely testable, hypothesis. It is also consistent with conclusions from earlier studies that Smith had a style distinct from LDS scripture. If Smith had no distinction to his style, the data of all the earlier studies would have to be meaningless. In addition, Jockers didn't include vocabulary richness measures or noncontextual word pairings as used by two of the earlier studies. It seems unlikely that Holmes did such a bad job, even if you don't want to credit the BYU studies that I haven't covered, yet.

The Take Away

Joseph Smith's personal style is somewhat generic when compared with the writings of Cowdery, Pratt, Rigdon, and Spalding. Scribal influence may be responsible for further diffusing this style. Furthermore, false positives appear to be the norm for these methods when applied to the writings of Joseph Smith. I don't believe this means the data and methods are useless, just that they don't support the principal interpretations discussed by Jockers.

Next Time

We will take a look at the 2008 paper which supposedly supports the Rigdon-Spalding hypothesis for authorship of the Book of Mormon. Having examined the 2013 paper, we will be in a better position to interpret the results presented in the 2008 paper.