Wednesday, June 18, 2014

Literary Detection: Part 2

Oversimplifying the Complex

Regarding the authorship of the Homeric poems, A. Q. Morton wrote:
“. . . his role is not that of the composer but, at least in part, that of the compiler. The role of the compiler, as of the modern editor, can range from literally using scissors and paste to make a narrative from fragments of other sources without adding one word of original composition, to the other extreme of taking material and completely digesting it so that even when the source is known and exists separately the version included in the compilation cannot be distinguished from an original composition of the redactor. . . . “Between these extremes of altering nothing or transforming everything, there lies a complete spectrum. . . . So before anyone can ask the question, Did Homer create the Iliad and/or the Odyssey? must come a preliminary question--Who do you define Homer to be?” p. 159
Questions of style changes due to author/editor/scribe interactions are immensely relevant to Book of Mormon stylometry. Instead of asking who we define Homer to be, we are faced with defining, Who wrote the Book of Mormon? Joseph Smith alone doesn't match the stylometric evidence. Scribes can't explain differences within the Book of Mormon, but can they explain differences between the Book of Mormon, Joseph's personal writings, and other LDS scripture? If we accept the Book of Mormon's own narrative, how did scribes and abridgements affect the style? The questions for Homer are very relevant, although the answers are quite different.

In discussing authorship of the Iliad and the Odyssey, Morton illustrates how the mode of composition can constrain stylometric features so that two authors appear to be the same in some features simply because they are using the same mode of composition (e.g., iambic pentameter, listing ships and armies, etc.). This observation strengthens the assertion that Book of Mormon language appears 19th century because Joseph was dictating with pseudo-biblical language. It also suggests an explanation for why the war chapters would have so many superficial similarities with a book on the War of 1812 written imitating the same biblical style. Certain features are going to match. Discovering that they do is not a grand revelation, but a confirmation of what would have been predicted from earlier stylometric studies.

But getting back to Morton's conclusion regarding the Homeric poems: Statistically there is no evidence of multiple authorship for the majority of the books in the Homeric poems. There are two exceptions. The first (Iliad Book 2) is a list of ships and armies involved in the war, and is believed to predate Homer. For the second (Iliad Book 11), statistics cannot decide whether it was a later addition or an undigested source.
“It may seem a very negative attitude to take, to say that the problem of authorship and integrity in Homer is unique and so a scientific answer to either question is unlikely to emerge, but this is the case. The poems are composed in a manner quite different from any others which have survived, and without comparable material the differences which are detectable cannot be separated into those due to the mode of composition and those due to a difference in authorship or of origin. The best service which an investigation of this kind may do is to show that many scholars have been too simple-minded in their approach to Homer. The poems contain many more problems than some people have supposed.
“It may well be that one of the oldest arguments, one with some scientific basis, is the best. No one knows what the population of ‘Greece’ in Homer’s day was, but that the country should nourish two geniuses of such stature at much the same time is a coincidence beyond acceptance. Against this can be set the view that the poems have no author, but are the result of generations of accretion and adaptation. Of this hypothesis it can be said that it is a scholar’s dream. It can be endlessly argued but never resolved.” p. 164
From stylometry alone, with minimal reference to content, value, or date of creation, we have a similar choice with the Book of Mormon. Was it translated from the writings of multiple ancient authors with their varied styles preserved, or was it constructed in a matter of months by a genius young man who absorbed and synthesized in some unexplained way ideas from a vast variety of sources and then dictated without revision in several self-consistent authorship styles? If the second hypothesis weren't competing with people's disbelief in Mormonism, there would be no debate between the two.

Jane Austen and the Other Lady 

“. . . the foundations of stylometry are habits shown not to change under the circumstances of the particular problem being investigated.” p. 189

In the 1970s, an unknown writer and great fan of Jane Austen completed a book which Austen had left unfinished. Stylometry was able to clearly distinguish between the portions written by Austen and those written by the "Other Lady". After reading how relatively easily stylometric measurements were fooled through imitation of a different author in the adversarial authorship studies I reviewed, I wondered how stylometry could so confidently distinguish Jane Austen's writing from a skilled writer carefully crafting an imitation. In the adversarial studies, most of the authors were minimally trained (college education, but not typically in writing), and they only had a short time to write the passages. How could they fool stylometry while the "Other Lady" failed? I can identify a few differences:
  • The Austen study made absolute comparisons rather than forcing a choice of the closest match. In other words it was an open-set analysis which asked, "Is this passage like Jane Austen or not?" rather than a closed-set analysis which asked, "Which of these five authors is this passage most like?"
  • The test samples written by the "Other Lady" were probably a few thousand words long, rather than the 1,000 words written by the imitators in the other study. This meant that fewer meaningful stylometric features could be extracted from the adversarial authorship tests than from the Austen texts.
  • Noncontextual word pairings were analyzed in the Austen study. The automated programs tested in the adversarial authorship study did not use word pairings. 
  • The adversarial authorship study selected an author with an unusually distinct prose style from which the untrained imitators extracted obvious traits. The obvious traits included long words and descriptive language (I'm interpreting a little). Word and sentence lengths and vocabulary richness were among the features measured, and all would be affected by such an attempt at imitation. The Austen study did not use such intuitively reproducible features for analysis.
In other words, the adversarial authorship study was in some ways designed to give the imitators the greatest chance of success. They didn’t have to match the target author’s style, they just had to get closer to that style than to their own. They only had to write 1,000 words, so stylometric features that might have given them away in 5,000 words would not have been an issue. The study employed Method 1: word lengths, letter usage, and punctuation; Method 2: number of different words, lexical density, Gunning-Fog readability index, character count without whitespace, average syllables per word, sentence count, average sentence length, and an alternative readability measure; Method 3: a measure of vocabulary richness. None of these methods employed noncontextual function words or word pairings, because such pairings typically require larger text sizes to be statistically significant, and they are not as easily automated. The “Other Lady” did match some characteristic punctuation habits and other features, but missed enough noncontextual word pair frequencies to be clearly identified as different.
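To make the closed-set versus open-set distinction concrete, here is a minimal sketch in Python. It is not the method of either study: the function-word list, the Euclidean distance, and the threshold are all assumptions chosen only for illustration, and the names `profile`, `closed_set`, and `open_set` are hypothetical.

```python
from collections import Counter
import math

# Hypothetical, very short function-word list (an assumption for illustration;
# real studies use larger, carefully chosen lists).
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "but", "with"]


def profile(text):
    """Relative frequency of each listed function word in the text."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def distance(p, q):
    """Euclidean distance between two frequency profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def closed_set(sample, candidates):
    """Closed-set question: which candidate is closest?

    Always names someone, however poor the best match is.
    """
    sample_profile = profile(sample)
    return min(
        candidates,
        key=lambda name: distance(sample_profile, profile(candidates[name])),
    )


def open_set(sample, reference, threshold):
    """Open-set question: is the sample close enough to this one author?

    The threshold is an assumed cutoff; the answer may be "no".
    """
    return distance(profile(sample), profile(reference)) <= threshold
```

With these definitions, a sample that resembles none of the candidates still gets assigned to one of them by `closed_set`, while `open_set` is free to reject it. That is the sense in which an imitator in a closed-set test only has to land nearer the target than anyone else.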

Imitating Sherlock Holmes

Another example where imitation was distinguishable from the original was in the attempt of two writers and Sherlock Holmes fans to write a new Sherlock Holmes story. Many knowledgeable readers agreed that the two imitators had successfully copied the style of Holmes, yet Morton reports the following results of stylometric examination of the imitation:
“. . . when perpetrating a counterfeit which a number of people well qualified in literary criticism regard as being very like Holmes, neither imitator can reproduce the habits of Holmes which are of value to the stylometrist nor can he apparently suppress his own habits.” p. 193 
It was known from the beginning that the text was an imitation, and even that the first part was written by one author and the second by another. However, noncontextual word pairings easily differentiated the imitators from the original, and were also able to differentiate between the two different imitators. The study was further able to assign the portions of the text to the appropriate imitators through comparison with stories published by one of the imitators. As might be expected, Morton touts the value of careful statistical work and criticizes those who are overconfident in their non-statistical literary analysis--even if they are highly skilled at it:
“It is not what we do not know that causes us to remain ignorant, it is what we assume that we know when we do not know. In the resolution of cases of disputed authorship a feeling for literature is not without value but it comes a long way after a few well chosen experiments.” p. 194
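As a rough illustration of what a noncontextual word pairing might look like as a measurement, the sketch below counts how often one common word is immediately followed by another. This is only one simple reading of the idea, assumed here for illustration; Morton's actual tests and word choices are not reproduced, and `pair_rate` is a hypothetical name.

```python
def pair_rate(text, first, second):
    """Of the occurrences of `first` that have a following word,
    return the fraction immediately followed by `second`."""
    words = text.lower().split()
    first_count = 0
    pair_count = 0
    # Walk through adjacent word pairs in order.
    for current, following in zip(words, words[1:]):
        if current == first:
            first_count += 1
            if following == second:
                pair_count += 1
    return pair_count / first_count if first_count else 0.0
```

A habit such as how often "and" is followed by "the" is not something a reader notices or an imitator consciously copies, which is why rates like this, measured over several thousand words and compared between texts with a suitable statistical test, are harder to fake than sentence length or vocabulary.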

The Book of Mormon Fraud Debate

The Book of Mormon, it seems, falls firmly with the Austen and Holmes studies rather than with the adversarial authorship studies in a number of regards. It is an open-set question, and the LDS researchers have treated it as such. Text sizes are large enough to extract features that appear very hard to imitate, including noncontextual word pairings, and LDS researchers have examined these features and thoroughly reported their results. Additional factors further support our confidence in stylometric findings regarding the Book of Mormon. In many ways it is not unique linguistically. We have examples of hundreds of texts written in the early 19th century using pseudo-biblical language, so we have great potential to control for how using pseudo-biblical language affects shifts in stylometric features. Joanna Southcott failed to produce such a wide range of styles, but the potential is there for a critic to show that stylistic imitation could explain the variety of styles evident in the Book of Mormon. I personally think, based solely on objective stylometric measurements and nothing else, that it's generous to give the chances of authorial fraud at one in a million, but show me statistics that disagree and I will reconsider.

Thursday, June 12, 2014

Literary Detection: a Book of Mormon-centric Review

“If the odds for or against some event are prodigious, particularly if they support some conclusions uncongenial to us, then they will only appear to be ludicrous. So there are really two problems and not one. The first is what odds are required to settle the dispute? And the other is what odds will convince the scholar that it has been settled? Rather surprisingly it appears that the first odds may be larger than the second.” A. Q. Morton, Literary Detection: How to prove authorship and fraud in literature and documents. Charles Scribner’s Sons, New York, 1978, p. 154.
In trying to understand stylometry, I followed the footnotes back to an early book on the subject. A. Q. Morton wrote a mid-sized book that is an introduction to and a review of stylometry up to the 1970s. He takes time to not only explain stylometry, but to help the reader sort through the statistical methods employed. It's a little less than a thorough explanation of the statistics, and a lot more than a discussion aimed at experts only. While I found many things in the first 2/3 of the book useful, it is the last 1/3 where he explores specific applications of stylometry which really grabbed my attention.

Near the beginning of the section on applications, Morton wrote the sentences I quoted above. He was bemoaning the same fact that so many encounter when they become 'scientifically enlightened' on a subject. Statistics can show something to be true with 95, or 99, or 99.9999999% certainty, and people will still refuse to believe it if "some conclusions are uncongenial to us". 95% certainty, with all the proper controls performed and all variables accounted for, is enough to convince most experts that the dispute is settled, but 99.9999999% certainty may be insufficient to end the dispute if strong sentiment is attached to related conclusions. This discrepancy comes up most strongly in Morton's discussion of authorship of the Pauline Epistles.
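Since Morton speaks of odds while I have been speaking of percent certainty, a small sketch of the conversion may help. It assumes the quoted certainty can be treated as a simple probability p, so that the odds in favor are p / (1 - p); the function name `odds_in_favor` is hypothetical, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction


def odds_in_favor(certainty):
    """Convert a certainty given as a decimal string (e.g. "0.95")
    into odds in favor, p / (1 - p), as an exact fraction."""
    p = Fraction(certainty)
    if not 0 <= p < 1:
        raise ValueError("certainty must be at least 0 and less than 1")
    return p / (1 - p)


for certainty in ["0.95", "0.99", "0.999999999"]:
    print(certainty, "->", odds_in_favor(certainty), "to 1")
```

On this reading, 95% certainty corresponds to odds of 19 to 1, while 99.9999999% corresponds to odds of nearly a billion to 1, which gives a sense of how large the gap is between what settles a dispute for experts and what may still fail to settle it for the committed.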

Pauline Epistles 

In many ways the Pauline Epistles are good candidates for stylometry. They are composed of ~50,000 words, and they can be compared with epistles on similar topics by other Greek authors of the same time period. As Morton puts it:
“There are numbers of letters from the same period, from earlier and later periods too. The mode of composition and of reproduction is unremarkable. Any claim that the letters might have been based on other sources is stoutly contested.” 
Stylometry studies using these resources conclude that only four (or five) of the epistles share the same stylometric signature. To explain the varieties of style some biblical scholars have followed this tack:
“a special case was made that Paul had used an amanuensis to write down what he wished to say and that this was the cause of the differences. This argument has two fatal defects. The first is the question--If an amanuensis gives us something quite unlike Paul, what right has anyone to call it Paul’s? The second objection is that though amanuenses were commonly employed, only in this case has it to be argued that the use made any difference to the text.” p. 166
The last observation is not quite true today. As we saw from his 2013 study, Jockers has made the claim that scribal influence on Joseph Smith's writings has made it impossible to identify a uniform style for Joseph, but that the writings are still meaningfully from the Prophet. Of course, if Jockers were correct, Biblical scholars would be in a bit of a bind. We would have historical evidence of religious documents all coming from the same author but having multiple distinct styles. Those who continued to claim multiple authorship for the Pauline epistles would have lost their objective, statistical evidence for it and would be stuck with their more methodologically driven evidence. Fortunately for them, Jockers's conclusions are contradicted by the results of Holmes, Schaalje, and the other two groups that have measured Joseph Smith's style, and he does have a consistent enough style in his dictated writings to separate it from many authors. But back to Morton and Paul:
“The Pauline letters are Greek prose and every feature of them can be paralleled in Greek prose writing. There is no reason why they should not be treated as a particular example of a general problem, the authorship of Greek prose. Yet this is just what many scholars cannot bring themselves to accept. Their resentment is clear, but it is seldom clear which object it is directed against; it may be that mere human agency and intelligence presumes to measure the word of God, or that the results of such enquiries do not support their own published views. . . . Whatever the source of the animus, the consequences are the same; hypotheses are ridiculed rather than tested, the normal imperfections of publication, printer’s errors and omissions, are blamed on the author, challenges are thrown down but refused when the gauntlet is picked up. In 1965 the Bishop of London wrote a letter to The Times saying he would believe this kind of science only when it was shown that Henry James had stable habits in his range of writings. Evidence of this having been available for more than two years, the Bishop has not been heard from.” p. 166
I'm proud to view the efforts of LDS scholars in this regard. The stylometric gauntlet has been thrown down several times and they have taken it up each time by publishing more and more rigorously controlled analyses of our own historical texts. As to the interesting conclusions of Morton regarding the Pauline epistles, he concludes that stylometry supports the view that four epistles (Romans, I and II Corinthians, and Galatians) are by a single author. Philemon is too short to say anything about. Other epistles appear to be stylistically different necessitating other explanations of their origins than purely Pauline. Referring not only to the stylometric evidence, but to a larger sum of evidence:
“The weight of evidence accumulated about the authorship of the Pauline epistles is impressive. So far, attempts to counter it have involved special pleading, but each piece of such pleading further complicates the simplicity which alone would commend the traditional position. If you assume all fourteen letters are by Paul and any difference must have an explanation which shows that special circumstances applied to modify the pure Paul, then you are really in the same position as the Ptolemaic astronomers who argued that all the planets followed the sun round the earth. Every discrepancy needed another complication to be added to the theory, until it collapsed under its own weight and it was at last conceded that the sun was the centre of the system and not the earth. . . . The most salutary experience for the advocates of such refinements is to look at the historical records and see what has been attributed to Paul and why. They might well begin with the museum in Tarsus which showed the boots in which Paul walked the Mediterranean world or the Italian village which had on display both skulls of the Apostle, one as a man and the other as a boy. Many traditions last a long time for reasons other than verity.” p 183
So biblical authorship is a complicated mess. Clear origins are lost to history, and available evidence makes clear that they are more complicated than the traditional story. While I enjoy Morton's poke at the superstition and stupidity that can accompany religious belief, I don't find these observations so simple to apply to Book of Mormon origins, and such belittling tends to polarize emotions more than is useful. For example, one set of arguments points out apparent 19th century influences on the Book of Mormon, emphasizes the supernatural elements of Joseph Smith's story, and eventually implies that we historical believers are superstitious fools who haven't faced the hard evidence (even if they refrain from saying as much). I, on the other hand, see apparent ancient elements in the Book of Mormon and take Joseph Smith's story seriously as it applies directly to the text. I thus eliminate the seemingly supernatural from the equation, since no one can test if Joseph Smith saw an angel, and no one can go back and see what Joseph Smith was seeing with his head in a hat as he dictated. What we can examine are his claims that the Book of Mormon is a translation, that it was written by multiple authors, and that it plausibly captures views of ancient life rather than Joseph Smith's (or his supposed 19th century sources') imaginations stemming from 19th century life and understandings.

Another Book of Mormon Challenge

Stylometry has answered the second of these questions. Every study has shown multiple authorship, including the critical ones that weren't trying to. They didn't calculate the probabilities of different numbers of authors because they were committed from the outset to limiting authorship to Joseph Smith or to a few of his contemporaries, but they supported multiple authorship nonetheless. An insufficiently controlled, informal stylometry study has shown that large numbers of unusual, 19th century, pseudo-biblical phrases are in the Book of Mormon. This observation is superficially easily explained by claiming Joseph Smith plagiarized large portions from early 19th century sources. It is also easily explained by viewing the Book of Mormon as a translation into 19th century, pseudo-biblical language. I suppose it's my turn to throw down the gauntlet for stylometrists who are critical of Joseph's story--statistically show the plagiarism, please. You don't have any historical evidence to show that Joseph copied any book (not even the Bible, and that's clearly in the Book of Mormon). No passage in the Book of Mormon is a direct copy of any of the proposed sources, but rather an intricate, coherent pastiche of at least a dozen unrelated and incoherent sources. Since Joseph didn't copy either the exact words (other than short, disjointed phrases) or exact stories (other than short groupings of significantly modified but sequentially similar events), and he used a single scribe for most of it, and all the scribes wrote his dictation word for word, and we have most of the original manuscript, please explain statistically why there are multiple, distinct authorship styles? Whose styles was Joseph copying? Why was he changing it up? We have statistical and objective evidence that he did. Objectively incorporate it into your story. At least show us other documents where these things have been done, in whole or in part. 
If it weren't tied up in religious feeling, like the Pauline epistles, I suspect that the question of multiple authorship for the Book of Mormon would be long since resolved. Joseph didn't write it, and no single person did. It is a translation of something into 19th century, pseudo-biblical English. That is what stylometry shows, and it is only the religious (or anti-religious) fervor that keeps most people from accepting it. We are unable or unwilling on all sides of the debate to separate these objective observations from manifold ideological conclusions.

Next time . . .
When I continue the review, we will examine Morton's evidence for the difficulty of stylometric fraud in authorship and see how it differs from the later, adversarial authorship studies previously discussed.

Wednesday, June 11, 2014

Choose Your Own Adventure: Reality Flow Chart (take 2)


Tuesday, June 10, 2014

I Don't Care About Determinism (Agency as a Law of Nature, take 2)

I don't care about deterministic universes. If I live in one then everything I do is predetermined. If Compatibilists are correct, then I can still meaningfully experience and act on choice, but more than one outcome of my choices was never a possibility. I also couldn't do anything else than what I am doing in my search to understand agency or free will. It's nice that Compatibilists would conclude that I ultimately want to do what I'm doing, but I don't find a deterministic universe meaningful. I know there are people who believe the universe is deterministic and still find meaning, but I (possibly misguidedly) want my life to have meaning that is both an internal experience and shapes the future of the universe in undetermined ways. If my choosing to become a God is not something I can help doing, then I will get there whether I explore the nature of reality or not (or I won't be able to help exploring the nature of reality because it is what I want and the deterministic system I call me enables that path). I care about universes where my becoming a God (or failing to) is not a foregone conclusion. The universe may be deterministic, but if a possible understanding of nature leads me to conclude the universe is deterministic, I discard it--not because it can't be true, but because what I do with it isn't going to change the future at all. I don't care about the kind of free will that might exist in a deterministic universe.

Some people have argued that the alternative to a deterministic universe is one where events are governed by purely random happenings. Narrowing this scope to the events in my brain that I call my choices, if these are reducible to ultimately random occurrences, then I'm not freely choosing. Some random number generator somewhere is doing the choosing for me--at least some of the time. If the system that is me is able to shape outcomes so that random events don't ever determine my choices, then all I've done is made a completely determined system from a random beginning. This may be possible, since it looks a lot like what we learn from physics and psychology--random quantum events scale up to perfectly predictable Newtonian mechanics, or very nearly. Again, I don't want my future to be ultimately controlled by random impulses. I reject this type of universe because I believe I make some choices that are not fundamentally random.

Until recently, I was unwilling to fully reject the mixed random/determined universe for two reasons: 1. it looks most like what physics tells us we have, and 2. I couldn't find any hope for a type of free will I could care about in a purely deterministic universe. The problem was that the type of agency I wanted was being forced into this tiny little box where, instead of a random quantum event determining my thought and making me choose something, a magical, free will thought happened that superficially looked exactly like a random/determined event going on in my brain. But since no one could prove the magical event wasn't happening, I could continue to tentatively believe in it. Unfortunately, I didn't much like this rapidly shrinking kind of Free Will--the Free Will of the Gaps. 

Thanks to John Conway and Simon Kochen's Free Will Theorem, I have seen an alternative. I will summarize the Free Will Theorem: If you or I can choose, in some sense that is neither predetermined nor random, one out of 33 or 40 buttons, then subatomic particles also make limited choices that are not predetermined. This is true if it is impossible to know the outcome of some distant event before that event happens. Basically, it's true if time travel is impossible. The other two criteria for the Free Will Theorem to be true are both experimentally demonstrated results of quantum mechanics. There is one research group which argues that in a stochastic and determined universe the "no time travel" requirement of the Free Will Theorem doesn't behave as Conway and Kochen assert. After rereading the articles several times, I think the critics are wrong, but ultimately it doesn't matter to me. The only possible universes in which they are right are deterministic, and I don't care about those. So there are currently no rigorous objections to the Free Will Theorem that hold within any universe I care to wonder about.

I think (as does John Conway) that it is important to note that the Free Will Theorem does not disprove determinism. If determinism holds, then no experimenter can choose among the buttons in a way that isn't predetermined, so we have no proof of free will among subatomic particles. However, I have already concluded that deterministic universes are uninteresting, so in any universe I care about the Free Will Theorem holds. This means that not only do you and I have the ability to make some choices without predetermined outcomes, but all of matter, and possibly all of energy, have this ability, although the choices available to a photon are not as complex as those available to a human. Instead of agency being something that happens at specific moments that we call decisions, agency is embedded in the very fabric of existence. We exist as we currently do because all the matter in us is choosing to be in these states that are compatible with our being.

It's not too much of a stretch to think this relationship extends to all that is, both within and outside of our universe. The choices available likely change as the qualities of matter, energy, space, and time change, but if we aren't in a deterministic universe, and we aren't in a random universe, then the most likely conclusion is we are in a universe where agency is a Law of Nature.

Thanks to Benjamin Kelsey for helpful discussions on this topic. Hopefully I have removed a number of misconceptions and misrepresentations in my now abbreviated references to various philosophical positions. I also want to thank him specifically for putting into words the idea of agency being a continuous state rather than occasional events.