Secret Sauce of Search #InternetLibrarian

Marydee Ojala, editor of Online Searcher

Presentation slides

Our work starts where Google ends.
Anybody can Google, but not everybody searches well.
Search does not equal Google.

Other search engines:
Bing
Yandex
Country versions of Google
DuckDuckGo
Peekier (another one devoted to privacy)
Wolfram Alpha
Million Short
Similar Sites
Wayback Machine (archive.org)

Advanced search:
Special syntax, prefixes (site:, filetype:, inurl:) (Bing and Yandex have others)
Phrase searching
Word order, synonyms, language

Non-textual search:
Images, audio, video, datasets
Specific databases at Google, Bing, Yandex
* YouTube, Vimeo
* Flickr, Morguefile
* Zanran, Datahub

Specialized search engines:

Topic specific:
* Biznar
* Millie.northernlight.com (market research)
* PubMed

Academic search engines:
* Google Scholar
* Microsoft Academic
* BASE
* Semantic Scholar
* MetaBus

Academic document delivery:
* ResearchGate
* Academia.edu
* Sci-Hub (a pirate site)

Subscription search engines

Skepticism:
* combating fakes and frauds
* look for additional documentation
* what is the source?
* Not every issue has two sides
* Retractionwatch.com
* Not just fake news
* AllSides – News from left, center, and right political viewpoints

Being ethical:
* Doing the right thing
* Teaching people about copyright
* General Data Protection Regulation (GDPR) – protecting people’s personal info (European regulation, coming in 2018)

Going under the hood:
Knowing how search works
Search technologies
Personalization
Machine learning, AI
Semantic search, contextual
Moving away from keywords
Why did I get this result?
Why are these ads following me around?

Google decides to disregard some of your search terms and puts a note telling you what’s missing.

Secret sauce:
Knowledge about search
Be willing to experiment
Think non-linearly, accept imprecision
Constant updating of our brains
Power of the info pro

Updated to add link to slides.


Practitioner’s Panel: Search Tips and Millennial Searcher Secrets #InternetLibrarian @aainfopro

Amy Affelt
Slides at http://conferences.infotoday.com/documents/293/A103_Affelt.pptx

Fake news. Shared knowing it’s fake, shared without reading page linked to.

Facebook said it was crazy to think it had anything to do with influencing U.S. election. Then said maybe it did. Has some remedies, but they are much too slow.

Google has automatic links to fact-checking sites. But are those correct? Or consistent?

CNN pointed to an IFLA document for spotting fake news. https://www.ifla.org/node/11175

Look at domain names. Read “about us” pages.

But respectable sites aren’t always correct. CBS News said Tom Petty died before it was true. Look for supporting sources. One source may not be enough.

Read about the author.

Check the date. Is it old? Is it April 1?

Check your bias. Does the site have a political slant?

Watch out for provocative headlines.

Watch for the “promoted” label.

Fake health news: Check source, plausibility. “The secret doctors won’t tell you.” Look for peer-reviewed article, human trials. HealthNewsReview.org debunks fake health news. Google it for other reactions.

Melissa Zimdars’ list of dubious news sources.

How to Lie with Statistics. Book from 1950s showing misleading graphics.

Fake videos coming next.

Tom Reamy:

Presentation:
http://conferences.infotoday.com/documents/293/A103_Reamy.ppt

Text analytics, taxonomy, training.

Book, “Deep Text”

There is no single definition of “fake news.”

Fake people, automatic bots (e.g., on Twitter)

Google: can be manipulated so top stories on a topic are fake news.

Two drivers: making money and manipulating people.

People make up news to get clicks, which get ad revenue.

Debunking: a fraction of the people who saw the original post will see the debunking one. No money in it. Effects linger.

Can block ads on a site, but that does nothing for politically motivated fake news.

Technical tools for finding misleading domain names, etc.

Automated systems aren’t smart enough and can be manipulated.

Case study: hybrid analysis of news
Inxight Smart Discovery (now SAP), multiple taxonomies.

Pulled in thousands of news stories, used rules to categorize them.

Faster than human review, smarter than automatic solutions.

Weight if word occurs in title, etc. There is no such thing as unstructured text.
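
A minimal sketch of rule-based categorization in which a hit in the title counts more than a hit in the body. This is not the Inxight system described in the talk; the taxonomy, the weights, and the function names are all hypothetical and chosen only to illustrate the idea.

```python
import re

# Hypothetical taxonomy: category name -> indicator terms.
TAXONOMY = {
    "health": {"vaccine", "doctors", "trial"},
    "politics": {"election", "senate", "vote"},
}
TITLE_WEIGHT = 3  # assumed weight: one title hit counts as three body hits
BODY_WEIGHT = 1


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def categorize(title, body):
    """Score each category; return the best one (or None) and all scores."""
    title_words = tokenize(title)
    body_words = tokenize(body)
    scores = {}
    for category, terms in TAXONOMY.items():
        title_hits = sum(1 for word in title_words if word in terms)
        body_hits = sum(1 for word in body_words if word in terms)
        scores[category] = TITLE_WEIGHT * title_hits + BODY_WEIGHT * body_hits
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores


category, scores = categorize(
    "Senate vote delayed",
    "Doctors testified about the vaccine trial before the vote.",
)
print(category, scores)
```

In this example the body alone leans toward "health" (three hits against one), but the two title hits tip the story into "politics", which is the point of weighting by position.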

Pronoun analysis: “The Secret Life of Pronouns.” All the words that search engines throw out. Analyzed e-mail and could often tell age, gender, power status of writer. Lying and fraud detection: fewer and shorter words, more positive emotion words. C. 76% accuracy.
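
A rough sketch of the kind of features such an analysis might count: function-word rate, first-person pronoun rate, and average word length. This is not the method from the book and it does not detect lying; the word lists are tiny, hypothetical stand-ins for a real lexicon.

```python
import re

# Small illustrative lists; real studies use much larger lexicons.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our"}
FUNCTION_WORDS = FIRST_PERSON | {
    "you", "he", "she", "they", "it",
    "the", "a", "an", "of", "to", "in", "and", "but",
}


def style_features(text):
    """Return simple style statistics for a piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {
            "word_count": 0,
            "avg_word_length": 0.0,
            "function_word_rate": 0.0,
            "first_person_rate": 0.0,
        }
    return {
        "word_count": total,
        "avg_word_length": sum(len(w) for w in words) / total,
        "function_word_rate": sum(w in FUNCTION_WORDS for w in words) / total,
        "first_person_rate": sum(w in FIRST_PERSON for w in words) / total,
    }


features = style_features("I sent the report to my manager")
print(features)
```

Note that these are exactly the words a keyword search engine would normally discard as stop words.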

Deep text solutions:
1. Database of known sites.
2. Deep learning of text/linguistic patterns
3. Flexible categorization rules

Fake news is a serious problem: undermines democracy, communication, civilization.

Multiple factors: multiple solutions

Hybrid human-machine solutions are getting best results. Ultimate solution is better education.

Books:

“Weaponized Lies”
“Post-Truth”
“Don’t Think of an Elephant”

Libraries can keep pushing information about how to spot fake news.  Colleges and universities can teach critical thinking.

Take a minute to check it out.  You don’t want to be “first but wrong.”

 

Maximizing the Use of the Open Web, Gary Price #InternetLibrarian @InfoDocket

Gary Price

Slides at http://bit.ly/netlibrarianC

Open Web, for lack of a better term, is what you find on Google.

Interested in specialty resources, primary sources.  Example, recent report from International Red Cross had lots of great data on disasters.

How to get this info into traditional library resources?  People don’t know about this material.

People don’t know about putting phrases in quotation marks or using the site: operator.

How he searches for new resources on Google:

  1. Search by domain, such as site:senate.gov
  2. Use tools to limit within, say, past year
  3. Turn off relevance and sort by date
  4. Limit to filetype:pdf
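
The four steps above can be sketched as a small URL builder. This is an illustration, not an official API: `build_search_url` is a hypothetical helper, and the `tbs` values (`qdr:y` for the past year, `sbd:1` for sort by date) are undocumented Google URL parameters assumed from observed behavior, so they may change or stop working.

```python
from urllib.parse import urlencode


def build_search_url(domain, filetype="pdf", past_year=True, sort_by_date=True):
    """Build a Google search URL for recent documents on one domain."""
    # Steps 1 and 4: restrict by domain and file type with search operators.
    query = f"site:{domain} filetype:{filetype}"

    # Steps 2 and 3: these "tbs" values are undocumented and assumed
    # from observed behavior; they may stop working without notice.
    tools = []
    if past_year:
        tools.append("qdr:y")  # limit results to the past year
    if sort_by_date:
        tools.append("sbd:1")  # sort by date instead of relevance

    params = {"q": query}
    if tools:
        params["tbs"] = ",".join(tools)
    return "https://www.google.com/search?" + urlencode(params)


url = build_search_url("senate.gov")
print(url)
```

The same query typed by hand is simply `site:senate.gov filetype:pdf`, with the date limit and sort order set from the Tools menu.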

Could send out by e-mail, blog, RSS feed.  You could use IFTTT to convert RSS to e-mail.
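
For those who would rather script the RSS-to-e-mail step than use a service, here is a minimal sketch using only the Python standard library. The feed content, the addresses, and the `rss_to_email` helper are made up for illustration; it only builds the message and does not send it.

```python
import xml.etree.ElementTree as ET
from email.message import EmailMessage

# Made-up sample feed standing in for a real RSS 2.0 document.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>New reports (sample feed)</title>
    <item>
      <title>Disaster data report</title>
      <link>https://example.org/reports/1</link>
    </item>
    <item>
      <title>Mental health sources</title>
      <link>https://example.org/reports/2</link>
    </item>
  </channel>
</rss>"""


def rss_to_email(rss_text, sender, recipient):
    """Turn the items of an RSS 2.0 feed into one plain-text e-mail."""
    channel = ET.fromstring(rss_text).find("channel")
    entries = []
    for item in channel.findall("item"):
        title = item.findtext("title", default="(untitled)")
        link = item.findtext("link", default="")
        entries.append(f"{title}\n{link}")

    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = channel.findtext("title", default="New resources")
    message.set_content("\n\n".join(entries))
    return message


message = rss_to_email(SAMPLE_RSS, "librarian@example.org", "patron@example.org")
print(message["Subject"])
print(message.get_content())
```

Actually sending the message would need a mail server, for example through `smtplib.SMTP(...).send_message(message)`; the server details depend on your own setup.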

Zapier is similar to IFTTT.

Central repository for librarians to put their finds?

Could automatically update LibGuides or open textbooks.

Early attempts for librarians to curate the web: LII, IPL, BUBL.  People thought Google would solve all of these problems and we wouldn’t need curation.  Gary says curation is needed now more than ever, but we need to take it to the next level.

If somebody wants to know about mental health, tell them about WHO’s MindBank, a curated database of international sources on mental health.  Don’t just tell them to go to the WHO site.

Used to be known as selective dissemination of information (SDI).

If you tell them about something when it’s new, you become known as the person who knows about new resources.  It may not always be useful to them, but it reminds them that you’re doing this.

If you tell people to go to your web site, they forget.  If you tweet something once, people may not see it hours or days later.

What if we take the trouble to add these reports to our collection and then the link goes dead?  Maybe we should be archiving them.  You can go to the Wayback Machine and use the “Save Page Now” option.
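
A small sketch of how archiving could be scripted. It only builds URLs and reads a response; it makes no network calls. Assumptions: Save Page Now accepts the target address appended to `https://web.archive.org/save/`, and the Wayback availability API returns JSON shaped like the hand-written sample below. The helper names are hypothetical.

```python
from urllib.parse import urlencode


def save_page_now_url(page_url):
    """URL that asks the Wayback Machine to capture a page.

    Assumption: Save Page Now accepts the target appended to /save/.
    """
    return "https://web.archive.org/save/" + page_url


def availability_url(page_url):
    """URL for checking whether a page already has an archived snapshot."""
    return "https://archive.org/wayback/available?" + urlencode({"url": page_url})


def closest_snapshot(response):
    """Return the archived URL from an availability response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


# Hand-written sample response, not real data.
sample_response = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20170101000000/http://example.org/report.pdf",
            "timestamp": "20170101000000",
            "status": "200",
        }
    }
}

print(save_page_now_url("http://example.org/report.pdf"))
print(availability_url("example.org/report.pdf"))
print(closest_snapshot(sample_response))
```

Storing the archived URL next to the live link in the catalog record or guide gives you a fallback when the original goes dead.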

The UK government has an RSS feed of new government reports.

NY Academy of Medicine grey literature report.

California Research Bureau: “studies in the news” (California State Library)

[My own contribution to a specialty curated collection: http://tinyurl.com/waterdistrictclimate ]

Super Searcher Apps, Sites & Tools, Gary Price #InternetLibrarian @InfoDocket

Gary Price, LJ Infodocket

Presentation at http://bit.ly/netlibrarianB

Uses Zoho, similar to and predating Google Docs.

Webrecorder records your web browsing (HTML, not PDF).  Can view later or download the pages you saved.

AudioSear.ch search for podcasts.  Directory of podcasts; keyword-searchable for some podcasts.

Don’t forget C-Span video search. Create custom clips.

BASE: Bielefeld Academic Search Engine.  100 million articles, many of them open access. 4,600 content sources. Can get RSS feed from any of them.

Semantic Scholar: Currently focused on computer science.

SHARE

CORE: Open access research papers.

Microsoft Academic: Re-launched.

Inoreader: fee-based. Aggregates RSS feeds, Twitter streams, Google News alerts.  All keyword-searchable.  Many ways to share info, including sending e-mail.

Time Travel: searches Wayback Machine and similar archives.

Various data search engines.

http://datasearch.elsevier.com

Notablist: aggregating business e-mail campaigns.

Downie: Best tool for downloading web video. c. $19 one-time charge.

Photomath: Take a picture of an algebra problem and it solves it.

Google Translate app works with pictures (e.g., a French menu)

CamFind: describes pictures, even TV.

NewsLookup: similar to Google News, global in scope.

Newsbrief.eu Also worldwide.  Source list.

Medisys.Newsbrief.eu Medical subset of Newsbrief.

WHO MindBank: News on mental health.

Import.io: Data from web pages

JournalTOCs: Journal tables of contents

Another blog post about this session: http://www.libconf.com/2016/10/17/super-searcher-apps-sites-tools/

Edited to add links.

Power Searcher Techniques & New Trends, Greg Notess #InternetLibrarian @notess

Presentation available at http://conferences.infotoday.com/documents/259/A101_Notess.pptx

Majority of searches coming through mobile.

Mobile emphasis:

  • Fewer search features: no search tools, cache; shorter snippets
  • Additions/advantages: fewer ads
  • Different results
  • Soon: different databases

Within months, Google will divide its index and give mobile users “better” and fresher content.  (Search Engine Land)

Accelerated Mobile Pages (AMP): now live at Google and Bing.  Publishers can optimize their pages for mobile.  In search results, you’ll see a little lightning bolt and/or AMP.

Knowledge Graph: Box with info, answers rather than links, “things not strings.”  Concept match rather than text match.

In the search box, Bing starts giving suggestions.  “Information literacy papers by …” gives author suggestions.

Google: RankBrain

Machine learning, artificial intelligence, predicts which results get chosen.

Machine learning:

Used by 100+ Google Teams.  RankBrain is just one.

Searcher impacts:

Google increasingly seeks to interpret queries.  Straight text matching becoming rare. Ideal is the Star Trek computer.  Some syntax tricks (like the + ) don’t work any more.  More image search, conversational searching.  But, they emphasize, there is still solid growth in desktop and tablet searching.

Google changes:

Gone: location search tool, PageRank scores, separate tablet interface (now uses mobile).

“Right to be forgotten” works with searchers geolocated within Europe.

Medical symptoms search: expanded knowledge cards.  Working with medical professionals to provide reliable info.

Google quality raters: Hiring humans to evaluate search results.  146 pages of guidelines.  Expertise, authoritativeness, trustworthiness (EAT).  Less emphasis on supplementary content, such as “About” pages.

Google Images: saving and tagging, filtering with colored buttons.

Truncation: an asterisk at the end of a word doesn’t do much for you, since Google is doing all kinds of synonym and concept searching.  However it does work for a missing word in a phrase: “a wealth of information creates * attention.”

Number range still works in Google.  5..8.34 or 5-8.34  Even works with a $

Bing operators:

filetype:, intitle: and some odd ones.

Google News:

81 country/language editions

New “local source” tag in some areas.

Google Scholar:

Known item searches now give one result!

Added suggested queries at the bottom.

Legal searches offer “sort by date.”

Save to “my library.”  Can even edit citations.

Titles change: newspapers, preprint vs. published article. Better to search on a phrase within the article.

Some blogs on Googleblog.com rather than Blogspot.

Remember to click on the little triangle for cached and similar pages.

Remember Wayback Machine.

Archive.is

Webcitation.org

Memento feature on Chrome

Diversify search engines: use Bing, DuckDuckGo, Gigablast (has a show metatags button)

Link searching:

Gigablast (link: and sitelink:)

Open Site Explorer (3 a day)

Majestic.com (free registration)

Updated to add links.