Roy Tennant, OCLC
Catalog data is:
* A collection of text strings
* Taken from the piece itself
* Sometimes enhanced with extra information
* Additional statements (like subject headings)
* Punctuation, which can be inconsistent
* Mostly uncontrolled and loosely connected to anything else
* Designed for description rather than discovery (describing differences, rather than similarity)
* Prone to title identification problems (the “Hamlet” problem)
* Prone to name identification problems (the Wang, Lee problem; is John Rock the scientist or the abolitionist?)
* Subject to data quality problems (e.g., inconsistencies in the format indicator, 245 $h)
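To make the format-indicator inconsistency concrete, here is a minimal sketch of normalizing 245 $h values before comparing them. The sample strings and the function name `normalize_gmd` are made up for illustration and were not part of the talk; real records would need a fuller set of rules.

```python
import re

def normalize_gmd(raw: str) -> str:
    """Reduce a 245 $h value to a lowercase label without brackets or trailing punctuation."""
    text = raw.strip()
    # Drop trailing ISBD-style punctuation such as " /", " :", or "."
    text = re.sub(r"[\s/:;=.,]+$", "", text)
    # Drop the square brackets that conventionally surround the value
    text = text.strip("[]").strip()
    # Collapse runs of whitespace and lowercase the result
    return re.sub(r"\s+", " ", text).lower()

# Hypothetical variants of the kind that accumulate across cataloging sources
samples = ["[sound recording] /", "[Sound Recording]", "sound  recording :", "[videorecording]."]
normalized = [normalize_gmd(s) for s in samples]
print(sorted(set(normalized)))
```

Four differently punctuated strings collapse to two distinct labels, which is the kind of cleanup that has to happen before such a field can be relied on for faceting or matching.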
First, define all the things (entities)
An Author wrote a Title (Work) about a Subject.
Authors can be identified by URIs in Wikidata or VIAF.
Works can be identified in WorldCat Works.
Subjects can be identified in the LCSH authority file.
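The Author / Work / Subject model can be sketched as a handful of triples keyed by URI. Every URI below is a hypothetical `example.org` placeholder, not a real VIAF, WorldCat Works, or LCSH identifier, and the predicate names are likewise assumptions chosen for readability.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

# Hypothetical placeholder URIs; real ones would come from VIAF/Wikidata,
# WorldCat Works, and the LCSH authority file respectively.
AUTHOR = "https://example.org/author/john-rock-1"
OTHER_AUTHOR = "https://example.org/author/john-rock-2"
WORK = "https://example.org/work/work-1"
SUBJECT = "https://example.org/subject/subject-1"

triples = [
    Triple(WORK, "author", AUTHOR),
    Triple(WORK, "about", SUBJECT),
    # Two different people share one name string but have distinct URIs
    Triple(AUTHOR, "name", "John Rock"),
    Triple(OTHER_AUTHOR, "name", "John Rock"),
]

def works_by(author_uri, graph):
    """Return the URIs of works whose author is exactly this URI."""
    return [t.subject for t in graph if t.predicate == "author" and t.obj == author_uri]

print(works_by(AUTHOR, triples))        # the one work linked to the first John Rock
print(works_by(OTHER_AUTHOR, triples))  # empty: same name, different entity
```

Matching on the URI rather than the name string is what separates the two John Rocks, something a text-string comparison cannot do.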
Catalogs could then show works, followed by all of their different manifestations.
War Between the States page.
Fiction Finder (different editions of fictional works) – helps solve the title problem.
Can link together translations of a given work.
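A rough sketch of what a work-centered display implies for the data: once each record carries a work identifier, editions and translations can be grouped under it. The records, the `work_id` values, and the field names here are invented for illustration and do not reflect how Fiction Finder or WorldCat is actually implemented.

```python
from collections import defaultdict

# Hypothetical manifestation records, each tagged with a made-up work identifier
records = [
    {"work_id": "work-hamlet", "title": "Hamlet", "language": "eng", "year": 1992},
    {"work_id": "work-hamlet", "title": "Hamlet", "language": "eng", "year": 2003},
    {"work_id": "work-hamlet", "title": "Hamlet, Prinz von Dänemark", "language": "ger", "year": 1984},
    {"work_id": "work-macbeth", "title": "Macbeth", "language": "eng", "year": 1997},
]

def group_by_work(recs):
    """Group manifestation records under their work identifier."""
    grouped = defaultdict(list)
    for rec in recs:
        grouped[rec["work_id"]].append(rec)
    return dict(grouped)

works = group_by_work(records)
for work_id, manifestations in works.items():
    languages = sorted({m["language"] for m in manifestations})
    print(work_id, len(manifestations), "manifestations in", languages)
```

The German translation lands under the same work as the English editions even though its title string differs, which is the point of identifying the work rather than matching on titles.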
Bringing authority control to a web environment (VIAF identifiers in Wikipedia)
Make your MARC records more consistent and standard, and use URIs, so that they are more linked-data-ready. (One recommendation: minimize use of 500 fields.)
Report: “Library Linked Data in the Cloud.”
Question: can 5xx fields be used to make things more searchable?
Answer: structured fields work better.