An Author’s DNA

Hartosh Singh Bal turned from the difficulty of doing mathematics to the ease of writing on politics. Unlike mathematics, all this requires is being less wrong than most others who dwell on the subject.
Physicists in Sweden have come up with a mathematical fingerprint that could identify an author.

Is there a unique fingerprint to the work of an author? Could we find a way of saying which of Shakespeare’s plays were indeed written by him? A paper published in the New Journal of Physics by researchers from Umeå University suggests this may be possible, based on an analysis of the works of Thomas Hardy, Herman Melville and DH Lawrence.

They begin by considering the vocabulary of any author. However learned an author, his vocabulary is finite. Let M be the total number of words in a text, and N the number of different words in it. Counting N means ignoring repetitions: the word ‘the’, for example, is counted only once however often it appears. When M is 1, N is 1, but as the text keeps growing, N will stop growing once the author exhausts his vocabulary. Now consider N/M, the proportion of words in the text that are distinct. It starts off at 1, but as the text grows, M becomes larger and larger while N levels off, so the ratio nears zero. The authors suggest that the rate of this decrease from 1 towards 0 is unique to each author, i.e. the rate at which an author introduces new words as he writes a manuscript is his fingerprint.
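The counting described above can be sketched in a few lines of Python. This is only an illustration of how M and N are tallied, not the researchers' own code: the function name `vocabulary_growth`, the simple rule for splitting text into words, and the sample sentence are all assumptions made for the example.

```python
import re

def vocabulary_growth(text):
    """Return a list of (M, N) pairs, one for each word read so far.

    M is the running count of all words; N is the running count of
    distinct words. Splitting on runs of letters and apostrophes is an
    assumption of this sketch, not a rule taken from the paper.
    """
    words = re.findall(r"[a-z']+", text.lower())
    seen = set()
    curve = []
    for m, word in enumerate(words, start=1):
        seen.add(word)  # a repeated word leaves the set unchanged
        curve.append((m, len(seen)))
    return curve

# Hypothetical sample text, chosen only to show repetition at work.
sample = "the cat sat on the mat and the dog sat on the log"
curve = vocabulary_growth(sample)
for m, n in curve:
    print(m, n, round(n / m, 2))
```

For this 13-word sample the last pair is M = 13 and N = 8, so the ratio N/M has already slipped from 1 to about 0.62.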

In the case of Hardy, Melville and Lawrence, consider the graph of N plotted against M (scaled by some factors, but ignore that). The graph, the physicists suggest, is the unique signature of an author, a kind of universal text for a particular writer’s output, because it does not matter whether the author writes a short story of 1,000 words or a novel of 80,000 words. If you plot N versus M for that short story or novel, it will match the author’s graph determined from his other works. This certainly holds true for these three authors, and the researchers suggest it may be universal. Consider, then, those works of Shakespeare we know to be genuine and plot this graph. The graph for any of the other plays must match it; if it doesn’t, we would know the play is a fake.
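One rough way to picture such a comparison is to measure how far apart two N-versus-M curves sit. The Python sketch below does this with a plain average gap. It is an assumption-laden illustration, not the method of the paper: it skips the scaling factors mentioned above, and the names `distinct_counts` and `mean_curve_gap` are hypothetical.

```python
import re

def distinct_counts(text):
    """Return N (distinct words so far) at every M = 1, 2, 3, ..."""
    words = re.findall(r"[a-z']+", text.lower())
    seen, counts = set(), []
    for word in words:
        seen.add(word)
        counts.append(len(seen))
    return counts

def mean_curve_gap(reference_text, candidate_text):
    """Average absolute difference in N over the values of M both texts reach.

    A small gap means the two curves lie close together. What counts as
    'small' is left open here; the source gives no threshold.
    """
    ref = distinct_counts(reference_text)
    cand = distinct_counts(candidate_text)
    shared = min(len(ref), len(cand))
    if shared == 0:
        raise ValueError("both texts must contain at least one word")
    # zip stops at the shorter curve, so only shared values of M are compared
    return sum(abs(r - c) for r, c in zip(ref, cand)) / shared

# Toy strings standing in for a known work and a disputed one.
print(mean_curve_gap("a a a a", "a b c d"))
```

In practice the reference curve would be built from works known to be genuine and the candidate from the disputed play, with a far longer stretch of text than these toy strings.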