Charlotte Spinner and Christine Rasmussen, AARP
Staff needed to be able to find articles in back issues of AARP: The Magazine, which goes back to 2003. The library decided to take it on. Wanted to use XML. Got approval from management and money from the publications department.
Different versions of the magazine for different age groups. Sometimes tiny variations within an article. Regional variations too. A third of the database turned out to be content variations.
A quarter of the issues not available electronically at all. The rest had missing pages, etc.
Spawned new digitization projects, including Modern Maturity, which was published 1958-2003. Also digitizing the founder’s papers.
Increased the visibility of the library. Contracts with EBSCO and Gale, which will bring in money for the association.
1. It’s always harder than you think.
2. It always takes longer than you think.
3. It always costs more than you think.
Richard Hulser, Natural History Museum of Los Angeles County
Scanning books for the Biodiversity Heritage Library. Old books with odd fonts, smudges, ink bleed-through, and foxing don’t do well with OCR.
Used crowdsourced games to get the general public to fix OCR errors. You can work on a word or phrase at a time. The games are Beanstalk and Smorball.
Lessons learned: didn’t select a game designer in advance; the designer then spent too long on design, which didn’t leave enough time to collect data within the grant period. But they did determine that games are a viable way to improve OCR. The games are open source and could be used by others.
Question about AARP’s XML conversion: Hired a company to do that part.
AARP database: Cuadra Star.
Another blog post about this session: http://www.libconf.com/2016/10/18/digitizing/