Ah, well, audio recording is my thing, even if I haven't recorded a book since the speaking books for the blind, on LPs that ran at sixteen and two thirds RPM. I might be doing another this autumn, if the economics work out.
Firstly, for reasons of audio fidelity, the recording has to be done in a recording studio or similarly soundproofed, acoustically treated environment (not the editing, the longest part of the job, but once you've started in optimum conditions it's difficult to go back to simpler ones later). No "the same headphones I use for Skype and in my own living room" stuff. A motorcycle behind a phrase might be perfectly tolerable for a radio play; when it repeats every time the recording is listened to, it becomes intensely irritating, so that bit of the recording must be redone. The studio will have the requisite programs that allow for rapid and efficient editing, loudspeakers on which you can truly judge the quality of the final product, and personnel who know how to get the most out of the above, and all of this must be paid for.
If the book lasts a couple of hours, you can expect the recording to last at least a day, the rough cut and relisten (with the occasional rerecord of a portion) half as long again (and that's assuming a professional speaker, not the author himself or someone he thinks has a "nice voice and delivery" but no mic training or experience in going back to cover occasional errors). Professional speakers need paying; less, perchance, for this than a voiceover for a Rolex film, but that's still a lot of hours. If the author can be there, to check the fidelity of the performance relative to his original concept, the final result will be closer to his ideal, but the process will take at the very least twice as long. A recording engineer with the text, following what is going on, will save hours in the final edit, but costs money (despite what the doctors say, I like to eat).
The "prooflistening", the maintaining of speech rhythms while editing the audio, the level and tone equalisation take much longer than their print equivalents. Speakers can't go on talking hour after hour, until the job is done; they need breaks, and if they spend these listening to what has already been recorded, they tend to grow stale (and find dozens of paragraphs they "could have done better", and want to try again). Time evaporates, and all of this is after all the corrections already done for the words themselves, and their punctuation, chapterisation, POV changes, etc.
Then there's the legal side, and lawyers earn considerably more than sound engineers. The standard contract for books has been refined over generations, that for music over nearly as long (even if technology has been continuously mutating it). All that is required is putting figures in the blank spaces. But for the recorded word? Does the speaker get royalties, or just a lump sum for the job? In the case of a foreign version, does the original interpretation get credited the way the first artist to record a song does, even though he might not be composer or lyricist? Lots of legalistic cash registers going "ding".
Of course, if you sell a million copies (or even a hundred thousand) all of these details – two hundred dollars here, a thousand there – become irrelevant. But most audiobooks won't, and either cash will be siphoned off the successful ones to pay the costs of the others (something that best-selling authors could justifiably take exception to) or audiobooks will stay at the expensive end of the market.
Distribution of audio titles requires considerably more data travelling than text files, or even music. Your average novel fits into a couple of Megs of text (assuming you're not using a profligate program like Word) and, with cover, map and a couple of illustrations probably fits comfortably in ten. Audio, for several hours of material, needs hundreds of Megs, even if quality has taken a poor second place to convenience.
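For a rough sense of the arithmetic behind "hundreds of Megs", here is a minimal sketch in Python. It assumes constant-bitrate compressed audio such as mp3; the function name `audio_size_mb`, the six-hour running time and the three bitrates are illustrative assumptions, not figures from any particular publisher or distributor.

```python
def audio_size_mb(hours: float, kbps: int) -> float:
    """Approximate size, in megabytes (10^6 bytes), of constant-bitrate audio.

    hours: running time of the recording (assumed value, for illustration)
    kbps:  encoder bitrate in kilobits per second (assumed value)
    """
    seconds = hours * 3600
    total_bytes = kbps * 1000 / 8 * seconds  # bits per second -> bytes
    return total_bytes / 1_000_000


# Hypothetical six-hour reading at a low, a middling and a music-grade bitrate.
for kbps in (32, 64, 128):
    print(f"{kbps} kbps, 6 h: {audio_size_mb(6, kbps):.0f} MB")
```

Even at the lowest of those bitrates the result is dozens of times larger than a ten-Meg text edition, and at music-grade bitrates it runs to several hundred Megs.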
For many applications a physical medium is essential; although some cars are equipped with mp3 CD players, few have download capacity from the Web. Standards aren't – although mp3 has a nice solid foot in the door (or, for other sound engineers, in the DAW) it is a long way from being universal, and technically better, newer compression algorithms abound, clamouring for entry. If you're going to listen to your books on your computer, this is not a problem, but you aren't, are you? You're going to take them jogging, or driving, to the gym, the spa, the massage parlour, on transatlantic jets…
The market is not yet big enough that iTunes and Amazon (and probably Google, just to keep things stirred up) are assassinating each other's agents for greater market share, unfortunately for the consumer but not the producer; we've seen what competition does for prices. But it's growing, along with the selection of titles available, and it will never be a "self-publishing" market under the control of authors – too many different people involved. Perhaps that's just as well, as it allows for a modicum of quality control, sadly lacking in some other branches. It opens possibilities of special effects, music and sound effects for children's books, embedded graphics and metadata for iPods in the train, subtitles and footnotes for scholars, and abuse of every one of these for those who only want their bedside stories read to them.
Perhaps I should be convincing my sister to do audio versions of her poetry books, with live versions from readings and festivals to studio recordings…