While I was working on a guest post over at Julie Karen’s book design site recently (she’s awesome, BTW), she asked me the following question:
Once you have compiled your list of search terms manually, do you have a software that at least generates the page numbers for you automatically using your list? I think once an indexer told me that if there were slight changes to the book, she could generate a new index quite quickly, so I figured that at that point, it must be a matter of running a page number update, like InDesign can do for updating a Table of Contents.
Julie Karen
The short answer is no, not really. It would be nice, but computer software isn’t smart enough yet: it has no contextual understanding of how a term is used in a paragraph. And if it did, I’d pretty much be out of a job. 🙂
A Concordance
Technically, though, what Julie asked is entirely possible. There are several software programs I’ve run into over the years that create something called a “concordance” by taking a list of terms and searching in the text for them, then listing next to the term all the places in the document in which it is used. Note the term “all.” This is where the software’s lack of semantic understanding causes trouble. You end up with simple terms (single words or phrases) with long strings of page numbers after them, with no other information that, say, subheadings could provide. How would you like to look up all the places that the following term turned up in a book?
Brazil, 33-34, 38-39, 45-46, 48-50, 51, 54, 59-60, 62-63, 66-67, 68, 70, 79, 82-84, 86, 95, 104, 132n86, 137n21, 165n69
Oh dear! (That was just a draft entry and subheadings were made later, but plenty of indexes have been made that just stay that way.) And for a scholarly book with complex terms and shades of meaning, you’d get all kinds of page numbers that tell you nothing useful about the subject or name in the index.
There are places where a concordance might be enough, if the document is simple and short, and if all the subjects are discrete and only mentioned where substantive discussion happens. But most of the time, people need a real index.
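To make the concordance idea concrete, here is a minimal Python sketch of what such a program does. It is not any particular product; the function names and the page-number-to-text dictionary are assumptions made up for illustration. It searches each page for each term and records every page where the term turns up, with no idea whether the mention matters.

```python
import re
from collections import defaultdict


def build_concordance(pages, terms):
    """Map each term to the sorted page numbers on which it appears.

    pages: dict of page number -> page text (hypothetical input format)
    terms: list of words or phrases to look for
    """
    hits = defaultdict(set)
    for page_number, text in pages.items():
        for term in terms:
            # Whole-word, case-insensitive match; no notion of context.
            pattern = r"\b" + re.escape(term) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                hits[term].add(page_number)
    return {term: sorted(hits[term]) for term in terms}


def format_entry(term, page_numbers):
    """Render one entry as 'term, 1, 2, 3' with no subheadings."""
    return term + ", " + ", ".join(str(n) for n in page_numbers)


# Made-up sample pages: one substantive discussion, two passing mentions.
pages = {
    33: "Brazil adopted the new tariff in that year.",
    34: "The policy was debated at length.",
    38: "A traveler from Brazil is mentioned in passing.",
    45: "Coffee exports from Brazil rose sharply.",
}
concordance = build_concordance(pages, ["Brazil", "tariff"])
print(format_entry("Brazil", concordance["Brazil"]))
print(format_entry("tariff", concordance["tariff"]))
```

The passing mention on page 38 gets exactly the same weight as the real discussion on page 33. That is the "all" problem in miniature.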
Back-of-Printed Book Index
For a back-of-book index for a book going only to print, there’s really no useful way around having an indexer read the text (or at least scan paragraphs for main topics) and understand, to some level, what the author is talking about, what their themes are, and which general and specific topics an index user might care about, whether they have read the book or are browsing it before buying.
That’s not what my indexing software is for; I have to read and make those decisions. I could have some search program go in and look for terms I’d picked out and find every mention of them rather than putting the page numbers in as I go, but that kind of defeats the purpose of the human judgment on what’s indexable. I have tried to start with an author’s preferred term list, but it’s almost never a mechanical job of searching for discrete terms and simply putting page numbers in. Besides, since I need to go through the text and understand it anyway, it’s actually more efficient to put the page numbers (and decide on page ranges) as I go rather than having to go back through the material or have Adobe Acrobat search on terms so I can go read the bit again and put the page number in.
My separate professional indexing software is my index composition buddy. It allows me to choose the terms, decide on subheadings, and put in the page numbers as I go. It saves me a lot of time by knowing how to format the index, and will also keep everything in alphabetical order as I add terms, as well as allowing me to quickly flip a subheading into a main heading. It makes my job faster and saves time on those repetitive mechanical tasks. Yay!
Embedded Index
For an embedded index, I still find it most efficient to build the index in my indexing program as if it were going to be a back-of-book index. The reason is that once you insert terms into the document, they had better all be organized into the final form you want. When you have Word or InDesign generate the index from those embedded tags, any change you want to make means going to the place in the book where the term is embedded and editing that individual tag, which takes a lot of time. You can’t change the generated index and have those changes automatically flow back into the embedded terms. The coders at Microsoft and Adobe either never thought of that or decided it was too much work for the smaller audience it would serve (at least before the rise of the ebook). They were basically thinking in terms of a concordance, assuming you’d just highlight terms and end up with a long list of main headings with lots of page numbers next to them. They also didn’t take into account that if you highlight a given term in the document, one instance might be capitalized and another not. Those differences create two separate index headings because the terms are not exactly the same.
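The capitalization problem is easy to see in a small Python sketch. This is only a simulation of exact-match grouping, not how Word or InDesign work internally; the function name, the (heading, page) tag format, and the normalize option are assumptions for illustration.

```python
from collections import defaultdict


def generate_index(tags, normalize=False):
    """Group (heading, page) tags into index entries.

    With normalize=False, headings are compared exactly, so two
    spellings that differ only in capitalization become two entries.
    With normalize=True, headings are compared case-insensitively and
    the first spelling seen is used for display.
    """
    pages_by_key = defaultdict(set)
    display = {}
    for heading, page in tags:
        key = heading.casefold() if normalize else heading
        display.setdefault(key, heading)  # keep the first spelling seen
        pages_by_key[key].add(page)
    return {display[key]: sorted(pages) for key, pages in pages_by_key.items()}


# Hypothetical embedded tags: the same topic, highlighted three times.
tags = [("Coffee exports", 45), ("coffee exports", 62), ("Coffee exports", 70)]

exact = generate_index(tags)
merged = generate_index(tags, normalize=True)
print(exact)
print(merged)
```

With exact matching, the lowercase tag on page 62 splits off into its own heading. Folding case merges them, but someone still has to decide which spelling is the real heading, which is one more reason to settle the entries before embedding.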
The advantage of the embedded index lies only in its ability to adjust to changes in page flow; you still have to do intelligent term selection and organization of entries prior to the embedding task. That same ability is why it has turned out to be a great tool for making index entries “live” links in ebooks.
It just looks like the artificial intelligence level necessary for intelligent automatic indexing will continue to be too expensive to make my indexing career obsolete. And I’m OK with that. For a quality index, a professional indexer is still a great asset. 🙂