Anatomy of the audiobook

What is an audiobook, and what should one look for in a healthy, robust specimen? As always, the market is keen to find objective characteristics that are the reason, or even among the top three reasons, for the growing popularity of narrated literature. Yet it is a truth at least as old as the Enlightenment that, if you spend all night dissecting something in search of its soul, all you have by morning is a lonely laboratory, a hefty cleaning bill, and a lot of explaining to do. In this article, I take a look at the form, content, and performance of audiobooks, and investigate why the whole is not always a simple sum of the parts.


“Each new form, also, as soon as it has been much improved, will be able to spread over the open and continuous area, and will thus come into competition with many others”

Charles Darwin, On the Origin of Species

Form, or perhaps ‘the audio format’, describes the species in general. It has undergone an astonishingly rapid evolutionary process in recent decades, changing beyond recognition since it first made a tentative appearance at the end of the 19th century. Before we get to differences between individual examples, it’s worth looking briefly at the course of that evolution and its implications for the audiobook’s current form and onward development.

Creating even small samples of recorded speech used to be difficult. Edison just about did it using tinfoil in 1877 and again (better) in 1927. All the expense of production, transportation, and storage of early audiobooks was only justifiable on the practical grounds that there are instances where print just cannot transmit information. The most obvious case is that of the blind or visually impaired, and public sympathy for recently-blinded First World War veterans played an important part in driving Robert Irwin and John Dyer to pioneer speaking books in the 1930s. This humanitarian motivation to bring all the benefits ascribed to literacy and literature to an excluded population helped to overcome the deterrence of cost: the phonograph player would have cost around $100 in 1878, or $2,450 in 2017 money, never mind the 118 phonograph cylinders that comprised, for example, the first audio edition of War and Peace.

As Matthew Rubery emphasises in his excellent history of the audiobook, however, contemporary innovators had wider ambitions for recorded literature even before technology could back them up. After all, visual impairment is not the only reason people might need recorded books; young children can’t read, but their demand for stories very quickly outstrips supply. Dyslexia, physical difficulty in holding or carrying books, or illiteracy can render printed books useless to sighted adults, too, without their capacity or desire to absorb literature being any the less. Then again, people who are both literate and sighted might simply prefer listening to reading. The story of audiobooks’ formal evolution is therefore principally the story of audio technology, and its progress from expensive and cumbersome (vinyl, cassettes and CDs) to cheap and convenient (streamed .mp3).

The current digital format combined with online distribution has created a situation that would seem like science fiction to an economist of the 1980s: a product that requires no physical inputs, that uses only time, talent, and a small amount of power to create, and that can be costlessly replicated, transported and stored in any quantity. This radical evolutionary coup explains a great deal of the format’s ascent up the literary food-chain, as people who always wanted, for example, to cycle to work listening to The Lord of the Rings can now do that without needing a backpack full of AA batteries and 38 cassettes weighing 3kg (6.6 lbs). Still, audiobooks are not yet the apex predator; no matter how cheap and convenient recorded books become, some people just don’t like them. There are all sorts of reasons for this, which appear frequently in the more rushed-out ‘5 reasons why…’-type articles in which publisher-affiliated journalists gloomily foretell the death of reading just as did their forebears following the rise of television over the radio over the newspaper over papyrus scrolls over cave painting etc…

Considerations of the advantages and disadvantages of the audio format in general aren’t without value. Someone thinking about buying a dog might well start by looking at the whole family of Canidae and asking themselves things like ‘Might I be allergic to it?’ and ‘How will I have to change my habits?’. Some people are naturally auditory learners, and enjoy filling otherwise useless time with what they consider to be a beneficial and self-improving activity. Other people need text and notes to ‘anchor’ ideas, and would rather spend their few moments of leisure without earphones in. There’s no accounting for taste. But once we make it past the ‘dog or cat?’ question to ‘dachshund or Rottweiler?’, it’s time to start thinking about what particular features we’re concerned with, and why. It is one of the laws of evolution that success breeds competition and, so, diverse specialisations. What’s nice, particularly for audiobook producers, is that barriers to making a sale are now less likely to be practical, which producers can do little about, and more likely to relate to the quality of the content or performance, which presents an opportunity as well as a challenge.


“A fertile soil alone does not carry agriculture to perfection”.

E. H. Derby, A Defence of Agriculture

‘Grass-fed beef’; ‘corn-fed chicken’; ‘acorn-fed Iberico ham’. The quality of the input is often given a great deal of credit for the quality of the final product, but every schoolchild knows that corn alone does not a chicken make. In the same way, it’s not enough to feed great literature into an audiobook and wait placidly for it to grow into a prize-winner. This isn’t the place to discuss the whole concept of literary merit, though it is quite true that what makes a story compelling to read is very similar to what makes it good to listen to. Rather, there are certain reasons why a book might lend itself particularly to being read aloud, and why it might not.

The most obvious (and literal) blind spots for audiobooks are maps, pictures, diagrams, and formulas. There have been viable work-arounds, such as including downloadable .pdf files, yet these sacrifice the element of mobility that is one of the principal attractions of the format, as we discussed above. The market can support the absence of highly technical audiobooks on, for example, physics and chemistry, but the ‘visual aids’ problem also affects types of content situated firmly in the audiobook heartland: business non-fiction, popular science, and history. In some cases, the diagrams are incidental, are barely dealt with in the text, and the argument does perfectly alright without them. In that case, the ease of metamorphosis into audio form quickly exposes visuals that merely act as page-padding. Other diagrams are central, and useful as a way of showing information ‘at a glance’, yet this very centrality means that the text recounts all the relevant conclusions attainable from the figure. In this latter case, the listening experience is slightly diminished compared to the reading one, though anyone sufficiently interested could always consult the .pdf diagram later.

A category of content that benefits particularly from the immediacy of the audio format is philosophy and psychology. This supports the recent re-emergence of such writing, alongside more traditional ‘self-help’ literature, as an actual tool for life, rather than only as a subject of sober, academic study[1]. For example, being able to have Marie Kondo’s advice on clutter-free living while one is actually tidying up invests both acts with extra significance. Having Marcus Aurelius’ exhortation to “be like the cliff against which the waves continually break, but which stands firm and tames the fury of the water around it” spoken right in your ear just when things are going wrong can be an immediate help. More broadly, the portability of the audio form, and the fact you can be walking around with your hands and eyes free, allows the same ‘fourth-wall-breaking’ sense of nearness between life and art. This, again, approaches the subjective, but it’s still true that content for audio production will have greater potential to resonate individually with listeners, based on the inconceivable variety of other things they might be experiencing in real life while listening to it. I listened to R. L. Stevenson’s Kidnapped on a very long drive through the highlands to Skye, for example, but it might be as simple as listening to Do Androids Dream of Electric Sheep? while walking the neon-lit streets of a midnight city, or A Christmas Carol in front of a coal fire as the December snow falls outside.

There are also certain works for which texts are extant that were nevertheless composed and circulated orally. The Iliad and The Odyssey are the most famous examples, but Neil Gaiman’s no-frills, demotic rendering of Norse Mythology retains the ‘pub/mead-hall chat’ vibe even at so far a historic remove, as does Heaney’s Beowulf. Even if the style of translation helps to dispense with the air of ‘literary canon’ with which such texts, as books, were endowed in the 19th century, the return to a spoken form can also serve to make them far more accessible and comprehensible (just as it’s marginally easier to understand Shakespeare as acted, rather than as read). As we’ll see in the next section, it’s easy to over-romanticise the ‘oral tradition’ angle. While writing has been the dominant medium of composition as well as distribution, there are few authors who have never heard the words they write spoken in their heads, and the vast majority acknowledge that reading aloud is an all-important check for prose ‘naturalness’; naturalness for which the standard is speech. Dialogue is the clearest example, but whole novels, written in the first person, are implicitly statements, even if a reader’s suspension of disbelief can allow for any amount of literary style.

Almost any book or poem is suitable for production as audio, and in all cases, though to differing degrees, audio versions can bring literature to life. That cliché should have real meaning in this case: the listening subject can continually find whole swathes of the in-book world exemplified in the real one, and features of the book’s plot and characters are either borne out by lived experience, or else they challenge and intrigue by jarring with it. Yet all this doesn’t happen as a matter of course by the mere act of recording. The element of performance is all-important.


“Skald | scald, n. An ancient Scandinavian poet. Also sometimes in general use, a poet”

Oxford English Dictionary

“The earliest known skálds prided themselves on their originality and artistic flair, and were keen that their poetry should not only engage their audience’s interest and be remembered (and presumably, re-performed), but also that it should bear their own unique creative stamp whilst keeping within the bounds of poetic tradition”

Anna Millward, Skaldic Slam: Performance Poetry in the Norwegian Royal Court

Performance is what allows the form to make best use of the content and vice versa: those parts of the source material that lend themselves most to the audiobook format are picked out and distinguished. The second quotation above captures what I view as a crucial tension to be found in audiobook narration more clearly than anywhere else: that between meeting expectations and making the performance distinctive. This same conflict exists in theatre and film as well, but the balance there is skewed far more towards distinction, since the best examples of both those artforms tend to defy expectations or establish new ones. There are three main considerations: who the narrator is (and why that might matter), whether they understand (and respect) the material, and how they deliver it (and dramatise the characters).

The reason why it might matter who the narrator is independently of how he or she narrates descends, at least in the West, from the enduring idea of the writer as the same cultural figure as the bard or skald, indicated by the retained meaning “more generally, a poet” in the dictionary definition above, as well as the specific, historical meaning. The perspective that treats authors like shipwrights and books like ships—certainly skilled craftsmen, but the point of interest is the ship itself, which has a life out of its creator’s hands—is modern. The older view focused mainly on the storyteller him/herself as a remarkably (even divinely) gifted person with the power to entertain, flatter, and make or break reputations. In the more extravagant cases, such as among 6th century Welsh bards like Taliesin, this power extended to prophecy, curses, and supernatural meddling in general. A lot of this was probably to do with the patronage system: if one sings for one’s supper, it’s best that the noble audience’s attention is more on the person needing to eat rather than the qualities, good or bad, of the song as a piece of literature[2]. It also doesn’t hurt if they believe you can and will curse them into oblivion if they don’t pay up. This idea of a poet whose person and works were indistinguishable personal statements of individual will was ostensibly revived—as usual, mostly invented[3]—by 18th century Romantics in reaction to the conceptual separation of people from their works that came with limited liability companies, agricultural enclosure, and mass media.

The upshot of this is that there is a vague, tacit consensus that, if the author of a work meets a certain standard of vocal performance, he or she is the better choice to narrate it even if a dedicated professional might be an objectively better performer. It’s a question of that most controversial artistic chimera: authenticity. Even if we don’t believe that creative talent makes such storytellers into magical beings, it’s a performance we can wholeheartedly believe in as faithful to the source material, since authors know and presumably subscribe to what they wrote, and surely they’re the ultimate authority on the ‘right’ way to bring the characters to life. Yet, where the author is dead, or won’t read aloud, there are still those narrators who we feel, instinctively, have a right to read a certain text. Often it’s the association of having been in a screen version of the book: Michael Jayston (Guillam in the 1979 TV adaptation of Tinker, Tailor, Soldier, Spy) reads almost all Le Carré’s works, Rosamund Pike (Jane in Joe Wright’s 2005 adaptation) reads Pride and Prejudice, and Meryl Streep (Rachel Samstat in Mike Nichols’ 1986 version) reads Heartburn. Or an actor’s image may so exactly align with a book’s cultural significance that the choice is made to seem self-evident; Stephen Fry has, or rather is, the quintessential English eccentricity that gives Harry Potter most of its cultural resonance for certain groups of readers, over and above J. K. Rowling’s skill as a worldbuilder. Joe Mantegna (who voices the Italian-American mobster archetype Fat Tony in The Simpsons) reads The Godfather.

I think this insistence that the narrator should have some connection to the story is really just about respect and understanding, without which even the greatest literature is impossible to listen to. This can be shown by an experiment, which demonstrates the same law that forbids any screen portrayal of a character’s creative writing, unless the joke is that it’s ‘terrible’. Take any classic of world literature, and read any passage aloud as though it’s ridiculous; as though you’re playing the role of the football jock who returns to his dorm to find the manuscript of his roommate’s angsty teen novel, and reads it to his friends who collapse with laughter at how overwritten, self-indulgent, and pretentious it is. Anything will sound awful when read this way. It’ll probably make you feel awful, too, like a public book-burner. Yet the happy flip-side of this natural law is that, if a narrator reads something with as much conviction of its value as its author (easier if they’re the same person, but not at all necessary), it will sound good. It’s a lot like being a defence lawyer: no matter the reader’s personal assessment of the material as literature, the performance must have total conviction and present the piece in the best possible light. Books aren’t just innocent until proven guilty; they should be read as though they’re timeless classics until proven otherwise.

If a narrator understands a book (knows what it’s saying) and respects it (believes that what it’s saying should be said), they will read it ‘well’. Good reading is in service of the content, and the content has a hierarchy of needs. First, the recording must be intelligible, and not physically unpleasant to listen to. This is a question of technical skills: recording properly, not over-processing, and minimising mouth noise. Second, the content must be communicated, not just read aloud. This is where understanding comes in, since the rhythm and tonal variation of the delivery should respond to the meaning of the words, not just the words themselves. A narrator who simply reads the words aloud to a rhythm of their own (even if, taken by itself, that rhythm has plenty of modulation and is not technically bad) is shirking their responsibilities as a communicator. The content’s final need is to remain the origin and engine of its artistic value. Narration is certainly performance; it’s even an art, requiring practice to master and having no concrete rules. Yet, unlike theatrical or film acting, it is primarily the servant of the content rather than a medium of original expression, and if the performance is innovative, it should only be so in picking up and enhancing ideas first proposed by the content. To get this right involves both understanding of and respect for the content, enough to decide where a more realistic performance (an authentic historical or regional accent, for example) is in service to the story, and in which cases it only distracts. Similarly, it means knowing when the use of a character archetype (‘the butler’, for example, or ‘the hired gun’) acts as shorthand to help the listener focus on what that character means for the protagonist, and when cliché interpretations undermine the content’s themes.

Audiobook welfare

In a final (terminal) extension of this metaphor, we turn again to livestock farming. Just as the welfare sacrifices of battery meat and egg production are less and less acceptable, so it is no longer enough that an audiobook should only meet the minimum requirement of being a book in audio format, and that there should be as many of them as possible. Production costs are now so trivial that it’d be feasible to give every book ever written its audio version within a few years. As usual, the market tends towards efficiency, more or less: apart from some oversights, those books particularly amenable to audio adaptation are adapted, while those not suitable are not.

Yet responsible farming doesn’t stop at better nutrition. It turns out audiobooks, like animals, require respect and understanding to thrive. This responsibility for an audiobook’s broader welfare falls almost wholly to the narrator, the producer and technicians having made sure that the format possesses the necessary characteristics.

[1] The work of Alain de Botton springs particularly to mind, and I recently finished Derren Brown’s Happy, which says much the same thing.

[2] See for example Deborah McGrady, The Writer’s Gift or the Patron’s Pleasure? The Literary Economy in Late Medieval France,

[3] See a great article on the Bardic Sublime by Naomi Lloyd-Jones of