Digital archive from scratch, Solomon Blaylock #InternetArchive @SolomonBlaylock

Presentation at http://conferences.infotoday.com/documents/319/A205_Blaylock.pptx

Until recently at Middlebury Institute of International Studies, Monterey

When we started at the school, made a point of asking people, “What are you working on?”

Retired special ops/student is working on a special operations research database going back to World War II.

File types: videos, PDFs, Word docs, images, streaming links. Proof-of-concept site on WordPress.

Challenges:
* Standardization, naming conventions
* Site organization
* Improved searchability
* Scalability

* WordPress update
* Project plan: interviews, task list
* Resources: no time from I.T. staff, had to do most of it himself and use his network of contacts. UCLA intro to digital humanities; The Getty’s intro to metadata; Dublin Core guides.
* Workflow
* Guide, so he could hand it off to the project team. Data input standards. How to upload to Omeka. How to upload videos to YouTube and have it do auto-transcriptions.
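
The session did not spell out the actual naming convention or data input standards. As a rough sketch of what a naming rule might look like in practice, the hypothetical function below (the `year_title.ext` pattern and the name `standard_filename` are my assumptions, not the project's) turns a free-text title into a predictable filename:

```python
import re
import unicodedata


def standard_filename(title: str, year: int, ext: str) -> str:
    """Build a filename such as '1944_operation-overlord-briefing.pdf'.

    The 'YYYY_slug.ext' pattern is an assumed convention for illustration.
    """
    # Strip accents so names stay ASCII-safe across operating systems.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase, then collapse each run of non-alphanumerics into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    return f"{year}_{slug}.{ext.lower().lstrip('.')}"
```

Whatever the real rule is, having it in one function means everyone on the project team produces the same name for the same item.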

Installed Omeka on a server. I.T. wanted to vet any plug-ins they wanted to use.

Documentation at: https://library.woodbury.edu/c.php?g=878987

Crawled & Collected, now what? Access & discovery in web archives #InternetLibrarian @IndustryDocs @StanfordLibs @UCDavisLibrary @archiveitorg

Slides at Google docs

Jillian Lohndorf, Internet Archive

Largest web archive in existence. Web archives aim to collect as much of the content/code as possible so it looks as close as possible to the original experience.

Topical collections.

Web history for a specific institution: records retention, FOIA laws, historical record. (Also national libraries collecting web sites from their countries.)

* Capture: Heritrix
* Storage: WARC is industry standard, redundant storage
* Access: Playback mechanism is necessary. Wayback has its own.
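
To make the storage step a little more concrete, here is a minimal sketch of what a single WARC 1.0 record looks like on disk: a version line, named header fields, a blank line, the content block, and two trailing CRLFs. This is a simplification under my own assumptions. A real crawler such as Heritrix writes "response" records containing the full HTTP exchange, and production code would normally use a dedicated WARC library rather than a hand-rolled function like the hypothetical `build_warc_resource_record` below.

```python
import uuid
from datetime import datetime, timezone


def build_warc_resource_record(target_uri: str, payload: bytes, content_type: str) -> bytes:
    """Serialize one simplified WARC/1.0 'resource' record as bytes."""
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{name}: {value}\r\n" for name, value in headers)
    # A blank line ends the header block; two CRLFs end the record.
    return head.encode("utf-8") + b"\r\n" + payload + b"\r\n\r\n"
```

Records are simply concatenated into one file, which is part of why a playback mechanism is needed to find and render any one capture.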

Additional consideration:
* Search (Archive-It)
* Metadata

Integrations:
* Catalog
* Web site
* WorldCat

Can create derivative files: metadata, visualization

Kris Kasianovitz, Stanford

Archive of state and local government web sites. *ca.gov web space. California Digital Library, State Library, State Archives, U. of California, Stanford.

Using Archive-It.

700+ seed URLs.

Realized they were missing metadata. Collection-level records on WorldCat. Did a “metadata sprint,” call for volunteers. Used Dublin Core fields: coverage, subject, languages, etc.
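
For readers who have not worked with Dublin Core, a seed-level record using the fields mentioned might be serialized roughly as follows. This is only a sketch: the `record` wrapper element and the function name `dublin_core_record` are my assumptions, and Archive-It has its own metadata entry forms.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict[str, list[str]]) -> str:
    """Serialize repeatable Dublin Core fields as a small XML fragment."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, values in fields.items():
        for value in values:
            element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            element.text = value
    return ET.tostring(root, encoding="unicode")
```

Calling it with something like `{"coverage": ["California"], "language": ["en", "es"]}` yields one `dc:` element per value, which reflects that Dublin Core fields are repeatable.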

Agencies go away. Take info from “About” page.

Public libraries could call for suggestions for “seed” pages in their communities.

Kevin Miller, UC Davis

https://archive-it.org/collections/5778

Using Archive-It.

Archives sites related to the campus and Davis community. Including UC Davis individuals, such as prominent faculty. A way to capture their work (such as on blogs) before they retire. Using ORCID to find UC Davis faculty and their publications.

Automated taxonomy pilot for Archive-It. Script determines the “aboutness” of a website.
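
The talk did not go into how the script works. Purely as an illustration of the general idea, and not a description of the UC Davis pilot, a very naive "aboutness" check could count how many terms from each taxonomy heading appear in a page. The taxonomy, the threshold, and the name `suggest_subjects` below are all hypothetical.

```python
import re

# Hypothetical mini-taxonomy: subject heading -> indicative terms.
TAXONOMY = {
    "Agriculture": {"crop", "soil", "farm", "irrigation"},
    "Veterinary Medicine": {"animal", "veterinary", "equine"},
}


def suggest_subjects(page_text: str, min_hits: int = 2) -> list[str]:
    """Suggest headings whose terms appear at least `min_hits` times (distinct terms)."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    scored = [(len(words & terms), subject) for subject, terms in TAXONOMY.items()]
    # Highest-scoring headings first; drop anything below the threshold.
    return [subject for hits, subject in sorted(scored, reverse=True) if hits >= min_hits]
```

A real pilot would presumably use something more robust than exact word matching, but the output is the same kind of thing: suggested subject terms for a human to review.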

Integration with:
* Library catalog
* Finding aids

Rachel Taketa, UCSF

https://idl.ucsf.edu

Industry documents from companies “that negatively impact public health.” Where they have strategies to mislead the public. For example, tobacco, chemicals (e.g., Monsanto), pharmaceuticals.

Collect documents that are produced during lawsuits, documents from whistleblowers.

Tobacco industry site has about 15 million documents.

Coming soon: sugar industry.

E-cigarette advertising. Campaigns against cigarette taxes. (Campaign web sites go down the day after the election.)

Access through Archive-It and on their site.

Using spreadsheet to cross load metadata from Archive-It to IDL.
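
A spreadsheet crosswalk like this usually comes down to renaming columns and tidying values. The sketch below shows the idea with made-up column names on both sides; I don't know the actual Archive-It export headings or the IDL field names, so `COLUMN_MAP` is an assumption.

```python
import csv
import io

# Hypothetical mapping: source spreadsheet column -> target field name.
COLUMN_MAP = {"Title": "title", "Seed URL": "url", "Subject": "topic"}


def crosswalk(source_csv: str) -> list[dict[str, str]]:
    """Rename columns per COLUMN_MAP and trim stray whitespace from values."""
    reader = csv.DictReader(io.StringIO(source_csv))
    return [
        {target: (row.get(source) or "").strip() for source, target in COLUMN_MAP.items()}
        for row in reader
    ]
```

Columns not listed in the mapping are dropped, and a missing column becomes an empty string rather than an error.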

Question re. copyright: Internet Archive leaves it up to their partners. UCSF doesn’t worry about it: commercial and electoral materials. Stanford: state and local government sites can be copyrighted. Depend on robots.txt files to alert them. If they run into that, contact the agency. UC Davis: alert faculty and let them opt-out (in itself an opportunity for outreach as well).
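
For anyone unfamiliar with how a robots.txt file "alerts" a crawler, Python's standard library includes a parser that answers the question "may this user agent fetch this URL?" The sample rules, user agent name, and URLs below are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Invented example of what an agency's robots.txt might contain.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, a URL under `/private/` is refused and anything else is allowed, which is the signal that would prompt a call to the agency.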

Updated to add links.

Digitizing and Archiving, Susie Kopecky #InternetLibrarian

Allan Hancock College, Santa Maria

Have an archive from the Hancock family, whose estate is the site of the college. Also have an archive devoted to the history of the college.

Hancock was a wealthy oilman in Southern California in the early 20th century.

They have approximately 60 large, flat archival boxes, 28 wider boxes, photo boxes, newspaper clippings, correspondence, etc. Previous librarians cleaned the items and moved them to acid-free containers. None trained as archivists.

Sorted:
1. Cleaned and entered into an Access database.
2. Cleaned but not entered into the database.
3. Neither cleaned nor entered into the database.

A storage container suffered water damage.

Metadata: decide what info to enter and how to label items with unique identifiers. Former librarians volunteered.

* Year object created
* Accession no.
* Title
* Author
* Brief description
* Part of another collection?
* Subject
* Cross-listed events and individuals
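
One way to pin down a field list like this before choosing software is to write it out as a record type. The sketch below mirrors the fields above; the `YYYY.NNN` accession number format and the helper `next_accession_no` are my assumptions, since the session did not say how identifiers would be assigned.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ArchiveItem:
    accession_no: str                  # unique identifier for the item
    title: str
    year_created: Optional[int] = None
    author: str = ""
    description: str = ""
    parent_collection: str = ""        # empty if not part of another collection
    subjects: list[str] = field(default_factory=list)
    cross_listed: list[str] = field(default_factory=list)  # events and individuals


def next_accession_no(year: int, existing: set[str]) -> str:
    """Return the next unused identifier of the assumed form 'YYYY.NNN'."""
    number = 1
    while f"{year}.{number:03d}" in existing:
        number += 1
    return f"{year}.{number:03d}"
```

The same structure maps directly onto columns in Access, Airtable, or a CSV file, so it does not lock in a platform.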

Got a scanner.

Some of the first scans: telegrams and evidence of Captain Hancock’s pursuits.

Challenges:

* Scanner went missing
* Having limited time
* Not currently having volunteers
* MS Access is kind of clunky
* Scanning large amounts of info in a timely fashion. (Want to have something ready for 100th anniversary of college in 2020.)
* Want a cloud-based system to share collection

Hosting possibility: Airtable. Currently being used by a performing arts archive.

Software possibility: ArchivesSpace (successor to Archivists’ Toolkit)

Crowdsourcing ideas for the above. (History students as interns, California Digital Library)

Another problem: nitrate films are flammable. May not be able to keep them.

Suggestion: LC site on personal archiving. (Possibly http://www.digitalpreservation.gov/personalarchiving/)

Suggestion: Anything you do yourself takes lots of babying. ContentDM is expensive, but very nice.

Open source: Omeka S, Islandora

Brainstorming a content management program, Jaye Lapachet @JayeLapachet #InternetLibrarian

Jaye Lapachet, J8 Consulting

Slides here: http://www.jayelapachet.com/2018/10/17/internet-librarian-2018/

San Bruno fire, PG&E gas line blew up. PG&E had to go through pallets of documents. (SF Chronicle 3/5/2011)

Companies come to her when they are about to do an IPO and need to find documents for the SEC.

Start somewhere: it can be paper or digital.

* Culture
* People
* Process
* Systems
* Audit & control

Vague, but you can make them work for your organization. It has to work for your organization.

Culture: Try to disrupt ongoing business as little as possible.

People: Listen to ideas. Try to do things upfront that are quick wins. Findability can be one. People need to know that the way they do their work and find their information are being considered. They have to know that they’re being heard.

Process: Identifying silos. Don’t segregate by format. When someone goes to look for information, it’s all in there. Where content is needed. Information governance. Review taxonomies, but allow personal terms that may only show up for an individual user. Taxonomies need “care and feeding” (updating, etc.).

Systems: Not just buying tools. Inventory systems, which could be tools or software, but could be processes. What you have and how it’s being used. Expand those, merge them when possible.

Audit & control: regulations, etc.

Get a champion. Have succession planning in place (for yourself as the content manager).

Team collaboration spaces (e.g., Microsoft Teams, Google documents).

Find people’s hidden talents. You can make a database from that.

Think outside the box. Can blockchain help with content management? It’s good with people who don’t trust each other. Walmart is testing it with food products. Maybe a QR code could keep track of who opened a document and changed it.

Question about products to share documents with specific people: Lucidea has a product called Presto.

Question about Google Drive and privacy: Google is going to know what you put up there. Make sure you use their business products. I don’t think you have any privacy with Google, but read your contract. Dropbox and Box might be better, since they are meant for business.

When you work for a company, any work you do is for the company. But if people are concerned about privacy, you can anonymize things.

You can read the contract. You can ask for changes.

Her web site is: www.jayelapachet.com.

Edited to add links.

Culture in Transit: Digitizing and Democratizing NYC’s Cultural Heritage #InternetLibrarian @AnneKZ

Anne Karle-Zenith

Slides here: http://conferences.infotoday.com/documents/259/E302_Karle-Zenith.pdf

Scanning program for METRO (New York City + Westchester)

Digital Culture of New York

(Everything gets harvested by the NY State digital library, then Digital Public Library of America)

Switched from ContentDM (an OCLC product) to Islandora for database.

Blog: Culture in Transit

Toolkit coming soon.

Institutional scanning:

Had a goal to get to 10-15 institutions — small, but interesting libraries that didn’t have time/money/staff to do it themselves — in a year.  They got to 10 and scanned 1,600 items.  One person would go to the institution with portable scanning gear, spend about two weeks scanning, then another two weeks back at the office doing processing, metadata, etc.

Community Scanning:

Went to Brooklyn and Queens public libraries, scanned people’s materials and talked to them about what they were.  Returned digital copies to donors on a thumb drive.  Not just libraries: also schools, churches, cemeteries, bars.  Three to four staff who knew all phases of the project would go to a site.  Encouraged community groups (for example, Filipinos of Queens).

Computational Text Analysis #InternetLibrarian

Cody Hennessy, UC Berkeley

Slides are available here: http://conferences.infotoday.com/documents/259/A204_Hennesy(1).pptx

Not exactly my line of work, but interesting.

Group of people at UC Berkeley who do or are interested in text analysis/text mining/distant reading (as opposed to close reading). Hennessy attends so he can learn and advise (for example, not to download the whole ProQuest database, because it’s copyrighted and that would be a violation of the university’s license agreement).

The Congressional Record is a favorite source, because it’s in the public domain and includes both spoken and written text.
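
As a very small taste of what "distant reading" means in code, the sketch below counts the most frequent non-trivial words in a text. It is not from the session; the stopword list is a tiny placeholder and the function name `top_terms` is my own.

```python
import re
from collections import Counter

# Tiny placeholder stopword list; real projects use much longer ones.
STOPWORDS = {"the", "of", "and", "to", "a", "in", "that", "is", "i"}


def top_terms(text: str, n: int = 5) -> list[tuple[str, int]]:
    """Return the `n` most frequent words in `text`, ignoring stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOPWORDS)
    return counts.most_common(n)
```

Run over thousands of speeches instead of one sentence, counts like these are the raw material for comparing speakers, parties, or decades.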

Another blog post on this session: http://www.libconf.com/2016/10/19/computational-text-analysis-k-text-mining/

Digitizing #InternetLibrarian @CybrarianViews

Charlotte Spinner and Christine Rasmussen, AARP

Presentation here: http://conferences.infotoday.com/documents/259/A203_Spinner.pptx

Staff needed to be able to find articles in back issues of AARP: The Magazine, which goes back to 2003.  Library decided to take it on.  Wanted to use XML.   Approval from management, money from pubs dept.

Different versions of the magazine for different age groups.  Sometimes tiny variations in article.  Regional variations.  A third of the database turned out to be content variations.
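
The presenters did not say how they identified the variants. One generic way to flag near-duplicate article texts, shown here only as a sketch, is a word-level similarity ratio from Python's `difflib`. The 0.8 threshold and the name `is_variant` are arbitrary choices of mine.

```python
from difflib import SequenceMatcher


def is_variant(text_a: str, text_b: str, threshold: float = 0.8) -> bool:
    """Return True if two texts share at least `threshold` of their words in order."""
    ratio = SequenceMatcher(None, text_a.split(), text_b.split()).ratio()
    return ratio >= threshold
```

Pairs that score high but are not identical are candidates for being edition or regional variants of the same article.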

A quarter of the issues not available electronically at all.  The rest had missing pages, etc.

Spawned new digitization projects, including Modern Maturity, which was published 1958-2003.  Also digitizing the founder’s papers.

Increased the visibility of the library.  Contracts with EBSCO and Gale, which will bring in money for the association.

1. It’s always harder than you think.

2. It always takes longer than you think.

3. It always costs more than you think. (actually under budget)

4. Pave the way.

5. Have solutions ready for the naysayers.

6. Roll up your sleeves.

7. (Gently) push, and push some more.

Richard Hulser, Natural History Museum of Los Angeles County

Scanning books for Biodiversity Heritage Library.  Old books with odd fonts and smudges, ink bleed-through, foxing don’t do well with OCR.

Used a crowdsourced game to get the general public to fix OCR errors.  You can work on a word or phrase at a time.  Beanstalk and Smorball.
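
The session did not explain how the games decide which player's answer to accept. A common approach in crowdsourced transcription generally, and only my assumption here, is to accept a reading once enough independent players agree on it. The function name `consensus` and the two-vote minimum are hypothetical.

```python
from collections import Counter
from typing import Optional


def consensus(transcriptions: list[str], min_votes: int = 2) -> Optional[str]:
    """Return the most common transcription if at least `min_votes` players agree."""
    # Normalize case and whitespace so trivial differences do not split the vote.
    cleaned = [t.strip().lower() for t in transcriptions if t.strip()]
    if not cleaned:
        return None
    word, votes = Counter(cleaned).most_common(1)[0]
    return word if votes >= min_votes else None
```

Note that lowercasing discards capitalization, so a real workflow would need a separate rule for proper names.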

Lessons learned: didn’t select the game designer in advance; the designer then spent too long designing, which didn’t leave enough time to collect data within the grant period.  But did determine that games are a viable way to improve OCR.  Games are open source and could be used by others.

Question about AARP’s XML conversion: Hired a company to do that part.

AARP database: Cuadra Star.

Another blog post about this session: http://www.libconf.com/2016/10/18/digitizing/

Transforming Our View of Roles & Services, part 2 #InternetLibrarian @RebeccaJonesgal @desertlibrarian @stembrarian

Rebecca Jones, manager of branches for a large public library

Has worked in corporate libraries. Skills: project management, training (i.e., adult learning), knowledge management, I.T., consulting.

Important right now: project management, knowledge management, data management.

“Seize whatever you want to do.”

Ruth Kneale, system librarian at Daniel K. Inouye Solar Observatory

Embedded, solo, runs all the databases, web sites, document manager, tech support.

Turned them on to things like Skype and Dropbox

Testing equipment at new observatory under construction.

Engineers still do “red lines” on paper drawings.  She takes pictures of them every three months to create as-built drawings.

Her job ends when construction is done in 3 1/2 years.

As the only librarian, she gets reference requests and does publication tracking (i.e., articles written based on work at the observatory).

Camille Mathieu, JPL

Six librarians, but also “knowledge managers” and “information managers” elsewhere and a large I.T. dept. that builds things in-house.

Does reference and publication tracking.

Shifting focus to internal information management.

Teresa Powell, Raytheon (previously Boeing and Rochester Electronics)

At Boeing, had to integrate collections and databases from companies that they acquired.  Eventually closed satellite libraries, centralized and digitized collections.

At Raytheon, again there are satellite libraries, which report to different manufacturing groups.  Have to justify space.  Wants to do something other than the traditional library.

Rebecca Jones:

Any organization has research and development.  Librarians could be part of that.

Librarians need to think more about ongoing operations and maintenance of service.

Librarians need to use our metadata skills to curate local data/documents.  What is happening with local newspaper, university publications, etc.?

Questioner:

Asking people, “What can we do for you?”

Or, “We can do X.”

Rebecca Jones:

Don’t do the first one.  Know what people’s needs and info seeking behaviors are and tell them how you can help.  Don’t ever ask people what they want.  They don’t have a clue.  Watch what people are doing, listen to what they say, and do interviews: What are your biggest barriers?  How can you expedite that?  Then figure out how you can help.

Transforming Our View of Roles & Services, part 1 #InternetLibrarian

Teresa Powell, Raytheon

Has been there 1 month.  Formerly archive manager at Rochester Electronics, before that at Boeing library.

Slides: http://conferences.infotoday.com/documents/259/C201-202_Powell.pptm

At Rochester, in charge of design documentation. No books, journals, electronic resources.  Materials were in boxes, with spreadsheets listing contents. Powell was hired to organize this in 2013.  Two staff members worked for her.

Drawings on tapes in a “CADD-like format.”

No standards, no authority control, manual checkout, materials scattered.

Got materials physically in the library.  Implemented ILS (Soutron Global).

Lots of abbreviations and non-standard metadata in Excel spreadsheets.

Called their catalog the “Chip Crypt.”

Needed to set up categories:

  • US vs non-US
  • Intellectual property (original manufacturer vs. Rochester)

Did not show location info (box, etc.) to users.

Tracked service requests in ILS.

Built thesauri to track part names and numbers (which could be expressed multiple different ways) and make cross references.  The cross reference thesaurus became useful as a stand-alone database for staff to be able to figure out what chips they could make with existing materials.
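
The notes above don't describe how the thesaurus was implemented in the ILS. As a generic sketch of the idea, a cross-reference lookup can normalize each spelling of a part number and point it at one preferred form. The class `PartThesaurus`, the normalization rule, and the sample part numbers used with it are invented for illustration.

```python
import re
from typing import Optional


def normalize_part(name: str) -> str:
    """Reduce a part name or number to uppercase letters and digits only."""
    return re.sub(r"[^A-Z0-9]", "", name.upper())


class PartThesaurus:
    """Maps variant spellings of a part number to one preferred term."""

    def __init__(self) -> None:
        self._preferred: dict[str, str] = {}

    def add(self, preferred: str, *variants: str) -> None:
        # Register the preferred term and every variant under its normalized key.
        for term in (preferred, *variants):
            self._preferred[normalize_part(term)] = preferred

    def lookup(self, query: str) -> Optional[str]:
        return self._preferred.get(normalize_part(query))
```

After `add("AB-1234", "ab 1234", "AB1234/X")`, a search for `ab_1234` or `ab1234x` resolves to the same preferred term, which is the behavior that makes such a thesaurus useful on its own.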

File submission page: Brief form for users to submit files and add notes.  Brief as possible to encourage people to use it.

Archives expanded to include knowledge management for all manufacturing documentation.

Couldn’t browse ILS.  So they implemented the archive module of the ILS.  Developed hierarchical tree similar to what engineers were used to seeing on a shared drive.

Talked about re-branding from “Archive Services,” but that hadn’t happened while she was at Rochester.

Are you positioned to be effective?  Where are you in the org. chart?  Should you change your library’s name?  Can you get a seat at the table with management?  Does your org. have someone setting info. policy?  Do they know what knowledge management is?  (I.T. people often have a different idea.)  Can you lead the way?

How can you add value?  What are the info. pain points?  Need to learn the business.  (She took a one-week crash course in semiconductor mfg.)  “How can we help?”  Market your capabilities.

Look beyond traditional librarian services for your next opportunity.

Questioner talks about his organization, where I.T. suggested crawling everybody’s e-mail and SharePoint to make one big knowledge management system.  He and Powell agree that SharePoint isn’t much use if there isn’t good metadata.

Ask people which pieces of info. are useful and what they would search by.

Question about retention: how do you get rid of records about obsolete products?  Powell says they deal with products with a very long life.

Mighty morphin’ map rangers #internetlibrarian #il2015

Carol Doyle and Patrick Newell, CSU Fresno

Slides

Map catalogs often give you a lot of individual maps. It’s not easy to find what you want.

Often people want aerial photos for a given place over time.

Idea: a GIS interface. Map and Aerial Locator Tool (MALT)

Can use map or start with address or APN. Get down to image in ContentDM, also metadata. If something is not online, you get info about how to find it.

Footprint digitizing and an Esri geodatabase describe the place.
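
To illustrate what a "footprint" buys you, the sketch below treats each map or aerial photo as a simple bounding box and finds every item covering a given point, sorted by year. MALT itself is built on Esri tools, and real footprints may be arbitrary polygons, so the `Footprint` class, its fields, and `items_covering` are simplifying assumptions of mine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    item_id: str
    year: int
    west: float   # bounding box edges in decimal degrees
    south: float
    east: float
    north: float

    def contains(self, lon: float, lat: float) -> bool:
        """Return True if the point falls inside this bounding box."""
        return self.west <= lon <= self.east and self.south <= lat <= self.north


def items_covering(footprints: list[Footprint], lon: float, lat: float) -> list[Footprint]:
    """Return all items whose footprint covers the point, oldest first."""
    hits = [fp for fp in footprints if fp.contains(lon, lat)]
    return sorted(hits, key=lambda fp: fp.year)
```

This is the "aerial photos for a given place over time" query: one location in, a chronological list of images out.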

Contacted Calif. State Library, which convened a meeting. State, local agencies as well as libraries.

Lots of map collections aren’t well documented, because they weren’t cataloged and the person who knew about them has left.

State government maps are copyrighted, unlike federal ones. Asking state librarian to advocate for public domain.

Recommend state librarian arrange training on best practices for digitization.

Recommend state library set up some common archive.

California Preservation Project!

State agencies sometimes have the only copies of maps, but there’s no access and no preservation.

Require that institutions use standard metadata as a condition of funding.