Computational Text Analysis #InternetLibrarian

Cody Hennessy, UC Berkeley

Slides are available here: http://conferences.infotoday.com/documents/259/A204_Hennesy(1).pptx

Not exactly my line of work, but interesting.

A group of people at UC Berkeley do, or are interested in, text analysis, text mining, and distant reading (as opposed to close reading). Hennessy attends so he can learn and advise (for example, telling people not to download the whole ProQuest database, because it’s copyrighted and doing so would violate the university’s license agreement).

The Congressional Record is a favorite source, because it’s in the public domain and includes both spoken and written text.

Another blog post on this session: http://www.libconf.com/2016/10/19/computational-text-analysis-k-text-mining/

Digitizing #InternetLibrarian @CybrarianViews

Charlotte Spinner and Christine Rasmussen, AARP

Presentation here: http://conferences.infotoday.com/documents/259/A203_Spinner.pptx

Staff needed to be able to find articles in back issues of AARP: The Magazine, which goes back to 2003.  Library decided to take it on.  Wanted to use XML.   Approval from management, money from pubs dept.

Different versions of the magazine for different age groups.  Sometimes tiny variations in article.  Regional variations.  A third of the database turned out to be content variations.

A quarter of the issues not available electronically at all.  The rest had missing pages, etc.

Spawned new digitization projects, including Modern Maturity, which was published 1958-2003.  Also digitizing the founder’s papers.

Increased the visibility of the library.  Contracts with EBSCO and Gale will bring in money for the association.

1. It’s always harder than you think.

2. It always takes longer than you think.

3. It always costs more than you think. (actually under budget)

4. Pave the way.

5. Have solutions ready for the naysayers.

6. Roll up your sleeves.

7. (Gently) push, and push some more.

Richard Hulser, Natural History Museum of Los Angeles County

Scanning books for Biodiversity Heritage Library.  Old books with odd fonts, smudges, ink bleed-through, and foxing don’t do well with OCR.

Used a crowdsourced game to get the general public to fix OCR errors.  You can work on a word or phrase at a time.  Beanstalk and Smorball.

Lessons learned: they didn’t select a game designer in advance, the designer then spent too long on the design, and that didn’t leave enough time to collect data within the grant period.  But they did determine that games are a viable way to improve OCR.  The games are open source and could be used by others.
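The notes don't record how the games reconcile different players' answers for the same word. One common approach in crowdsourced correction is a majority vote, sketched below in Python. The function name and both thresholds are illustrative assumptions, not details from the project.

```python
from collections import Counter


def consensus(transcriptions, min_votes=3, min_share=0.6):
    """Return the agreed transcription, or None if players disagree too much.

    min_votes and min_share are illustrative thresholds, not project values.
    """
    # Collapse whitespace so "Felis  catus" and "Felis catus" count as one answer.
    cleaned = [" ".join(t.split()) for t in transcriptions if t.strip()]
    if len(cleaned) < min_votes:
        return None
    answer, votes = Counter(cleaned).most_common(1)[0]
    return answer if votes / len(cleaned) >= min_share else None


# Three players typed the same OCR'd phrase; two of them agree.
print(consensus(["Felis catus", "Felis  catus", "Fclis catus"]))  # Felis catus
```

Words that never reach agreement could be sent back into the game or to a staff member for review.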

Question about AARP’s XML conversion: Hired a company to do that part.

AARP database: Cuadra Star.

Another blog post about this session: http://www.libconf.com/2016/10/18/digitizing/

Transforming Our View of Roles & Services, part 2 #InternetLibrarian @RebeccaJonesgal @desertlibrarian @stembrarian

Rebecca Jones, manager of branches for a large public library

Has worked in corporate libraries. Skills: project management, training (i.e., adult learning), knowledge management, I.T., consulting.

Important right now: project management, knowledge management, data management.

“Seize whatever you want to do.”

Ruth Kneale, system librarian at Daniel K. Inouye Solar Observatory

Embedded, solo; runs all the databases, web sites, document manager, tech support.

Turned them on to things like Skype and Dropbox

Testing equipment at new observatory under construction.

Engineers still do “red lines” on paper drawings.  She takes pictures of them every three months to create as-built drawings.

Her job ends when construction is done in 3 1/2 years.

As the only librarian, she gets reference requests and does publication tracking (i.e., articles written based on work at the observatory).

Camille Mathieu, JPL

Six librarians, but also “knowledge managers” and “information managers” elsewhere and a large I.T. dept. that builds things in-house.

Does reference and publication tracking.

Shifting focus to internal information management.

Teresa Powell, Raytheon (previously Boeing and Rochester Electronics)

At Boeing, had to integrate collections and databases from companies that they acquired.  Eventually closed satellite libraries, centralized and digitized collections.

At Raytheon, again there are satellite libraries, which report to different manufacturing groups.  Have to justify space.  Wants to do something other than the traditional library.

Rebecca Jones:

Any organization has research and development.  Librarians could be part of that.

Librarians need to think more about ongoing operations and maintenance of service.

Librarians need to use our metadata skills to curate local data/documents.  What is happening with local newspaper, university publications, etc.?

Questioner:

Asking people, “What can we do for you?”

Or, “We can do X.”

Rebecca Jones:

Don’t do the first one.  Know what people’s needs and info-seeking behaviors are and tell them how you can help.  Don’t ever ask people what they want.  They don’t have a clue.  Watch what people are doing, listen to what they say, and do interviews: what are your biggest barriers, and how can we expedite that?  Then figure out how you can help.

Transforming Our View of Roles & Services, part 1 #InternetLibrarian

Teresa Powell, Raytheon

Has been there 1 month.  Formerly archive manager at Rochester Electronics, before that at Boeing library.

Slides: http://conferences.infotoday.com/documents/259/C201-202_Powell.pptm

At Rochester, in charge of design documentation. No books, journals, electronic resources.  In boxes with spreadsheets listing contents. Powell was hired to organize this in 2013.  Two staff members worked for her.

Drawings on tapes in a “CADD-like format.”

No standards, no authority control, manual checkout, materials scattered.

Got materials physically in the library.  Implemented an ILS (Soutron Global).

Lots of abbreviations and non-standard metadata in Excel spreadsheets.

Called their catalog the “Chip Crypt.”

Needed to set up categories:

  • US vs non-US
  • Intellectual property (original manufacturer vs. Rochester)

Did not show location info (box, etc.) to users.

Tracked service requests in ILS.

Built thesauri to track part names and numbers (which could be expressed in multiple ways) and to make cross-references.  The cross-reference thesaurus became useful as a stand-alone database that let staff figure out what chips they could make with existing materials.
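A cross-reference thesaurus like this can be pictured as a lookup from every variant spelling to one preferred form. The Python sketch below is only an illustration: the class, the normalization rule (uppercase, letters and digits only), and the part numbers are all made up, and it says nothing about how Soutron or Rochester actually implemented it.

```python
import re


def normalize(part_number):
    """Reduce a part number to a comparison key: uppercase, letters and digits only."""
    return re.sub(r"[^A-Z0-9]", "", part_number.upper())


class CrossReference:
    """Maps variant spellings of a part number to one preferred form."""

    def __init__(self):
        self._preferred = {}  # normalized key -> preferred form

    def add(self, preferred, *variants):
        """Register a preferred form together with any known variants."""
        for form in (preferred, *variants):
            self._preferred[normalize(form)] = preferred

    def lookup(self, query):
        """Return the preferred form, or None if the part is unknown."""
        return self._preferred.get(normalize(query))


xref = CrossReference()
xref.add("AB-1234/X", "AB1234X", "ab 1234 x", "OEM-77")  # made-up part numbers
print(xref.lookup("ab-1234 x"))  # AB-1234/X
print(xref.lookup("oem77"))      # AB-1234/X
```

Normalizing before lookup means punctuation and case differences never need their own entries; only genuinely different names (such as an original manufacturer's number) have to be added by hand.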

File submission page: a brief form for users to submit files and add notes.  Kept as brief as possible to encourage people to use it.

Archives expanded to include knowledge management for all manufacturing documentation.

Couldn’t browse ILS.  So they implemented the archive module of the ILS.  Developed hierarchical tree similar to what engineers were used to seeing on a shared drive.

Talked about re-branding from “Archive Services,” but that hadn’t happened while she was at Rochester.

Are you positioned to be effective?  Where are you in the org. chart?  Should you change your library’s name?  Can you get a seat at the table with management?  Does your org. have someone setting info. policy?  Do they know what knowledge management is?  (I.T. people often have a different idea.)  Can you lead the way?

How can you add value?  What are the info. pain points?  Need to learn the business.  (She took a one-week crash course in semiconductor mfg.)  “How can we help?”  Market your capabilities.

Look beyond traditional librarian services for your next opportunity.

Questioner talks about his organization, where I.T. suggested crawling everybody’s e-mail and SharePoint to make one big knowledge management system.  He and Powell agree that SharePoint isn’t much use if there isn’t good metadata.

Ask people what pieces of info. are useful, what would you search by?

Question about retention: how do you get rid of records about obsolete products?  Powell says they deal with products with a very long life.

The Story of Telling: Future-Proofing Libraries, Brendan Howley #InternetLibrarian

Brendan Howley, BothAnd

Library asked them to “show me the heart of my community.”

Why are flamingoes pink?  To look stunning, also recursiveness.

Mind-controlling zombifying tapeworms cause crustaceans to turn pink, who get eaten by flamingoes.

It’s a feedback loop (recursiveness)

Tapeworms come out the back end of the flamingoes, where they reproduce.

If you want to grow community networks through storytelling, you have to build stories people want to share. Recursiveness makes stories people want to share.

A virtuous circle of sharing and yet more sharing.

When people share stories, they also share behaviors.

How we share stories tells us who we are.  Shared stories are intelligence tools.

Relevancy: why should I care about your library programming?

Currency: Is this story about your library important to me now?

Intensity: Does this story about your library “have legs”?

[Howley promises to post this online, so I will take less detailed notes.]

“The Story Engine”: the best stories aren’t one-offs, they’re non-linear.  Complex stories interweave with themselves.

Libraries have great brands, more trusted than almost anyone.

Sustainable stories keep telling themselves.

Either infectiously funny or so human, so wise, so moving we can’t help but share them.

Idea: Find the funniest person you can and have them make a video about the library.

Make heroes of your users, your cardholders.  It’s not about you; your storytelling should be about your community.  Use their stories on social media and people keep telling their stories.  Recursive, virtuous circle.

“The library digital relevancy index.”

Most library mobile apps have a bounce rate of 50%, which means people come twice and don’t come back.

People don’t care about you, they care about themselves. Empathy is the goal. (See yesterday’s keynote.)

Every company wants to change the world.  Libraries really do change the world.

Regular content updates.  Good content now is worth more than perfect content on Friday.

Don’t sell them news, sell them a relationship.

Think snack size: what people will see on their phones.

Someone needs to own your social media presence.

Created OpenMediaDesk

Agile process, failing as fast as we can to get to success.  Treat Facebook, Twitter, Instagram posts like tests.  Again, a recursive pattern.

“Nobody knows anything.”

Test, fail, reassemble, re-test.

Unsplash.com: open source photography you can re-use.

Libraries sharing ideas. One library had a comic book giveaway program that drew 12- to 18-year-old boys and shared idea with other libraries.

Friend Brendan Howley or M’lissa Story on Facebook to follow along with what the libraries are doing.

CXI: tool for library data to demonstrate social ROI.

Every expectation and interaction a cardholder has with your library, its services and staff.

The end (flamingo backside)

Another blog post about this session: http://www.libconf.com/2016/10/18/future-proofing-libraries-tuesday-keynote/

Enterprise Search #InternetLibrarian

Camille Mathieu, Jet Propulsion Lab:

Presentation available at http://conferences.infotoday.com/documents/259/A105_Mathieu.pptx

Hard to get money, prove ROI

Architecture: no enterprise is the same as another

“Why can’t it work like Google?”

They use Elasticsearch.

5,000 knowledge workers

150 content silos (that they know of)

Nice display with facets.

At first there were some unacceptable results: failed 50% of the time to provide relevant results in top 5 results.
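A "relevant result in the top 5" figure like this can be computed from a set of judged queries. Here is a minimal Python sketch; the function name, the data layout, and the sample queries and document ids are all hypothetical and not taken from the JPL presentation.

```python
def success_at_k(results, relevant, k=5):
    """Share of queries with at least one relevant document in the top k results.

    results:  {query: ranked list of document ids}
    relevant: {query: set of document ids judged relevant}
    """
    if not results:
        return 0.0
    hits = sum(
        1
        for query, ranked in results.items()
        if set(ranked[:k]) & relevant.get(query, set())
    )
    return hits / len(results)


# Made-up judgments for two queries.
results = {
    "travel policy": ["d9", "d2", "d7", "d1", "d4", "d3"],
    "badge renewal": ["d8", "d6", "d5", "d0", "d11", "d12"],
}
relevant = {"travel policy": {"d1"}, "badge renewal": {"d12"}}
print(success_at_k(results, relevant))  # 0.5
```

In this toy data the second query's only relevant document sits in sixth place, so it counts as a failure, which gives the same 50% failure rate as in the talk.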

Librarians were able to make searches more relevant by looking at top 100 searches and tweaking results.  Also, deleting irrelevant pages.

Poor metadata and missing or generic titles (such as “slide 1”) were fixed.
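Finding those titles is the kind of cleanup that can be partly automated. The Python sketch below flags missing or placeholder titles for human review; the pattern list is an assumption for illustration and would need tuning against real content.

```python
import re

# Titles that say nothing about the content. The list is illustrative only.
GENERIC_TITLE = re.compile(
    r"^\s*(slide\s*\d+|untitled|document\s*\d*|presentation\s*\d*|new\s+page)?\s*$",
    re.IGNORECASE,
)


def needs_title_review(title):
    """True if the title is missing, empty, or a generic placeholder."""
    return title is None or bool(GENERIC_TITLE.match(title))


for title in ["slide 1", "", "Untitled", "Thermal Design Review"]:
    print(repr(title), needs_title_review(title))
```

Only whole-title matches are flagged, so something like "Slide 1: Budget Overview" passes through untouched.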

However, band-aid solutions are not sustainable solutions.  Just looking at the top 100 searches misses all the other searches.

Responsive communication with users (who are often content creators as well) is the greatest asset for improving search.

Respect the link between DAM and search.

Most effective solutions may lie in curating content.

Future work:

Machine learning

User-search interaction: anticipatory design, social tagging

Content or repository tagging: consistent metadata

Enterprise search needs a librarian’s respect for curation and metadata.

Sarah Dahlen and Kathlene Hanson, CSU Monterey Bay

Presentation here: http://conferences.infotoday.com/documents/259/A105_Dahlen.pdf

Wanted to find out whether abstracting and indexing databases provide added value alongside a discovery tool.

Used 50 students and tested 3 search tools:

Social Sciences Abstracts

Default Summon

“Pre-scoped” Summon

Results split evenly, but that means 2/3 preferred one of the discovery layers.

Further areas to explore:

Student use of controlled vocabulary, metadata, search facets.

Library search tools’ use of: subject/discipline scoping, relevance ranking

Question re. missing metadata in documents uploaded to a repository:

JPL: Working on metadata standards.

JPL: Using SharePoint and Docudata (?)