Page 1
© Imperial College London
Searching Google and Google Scholar
A Workshop for Library and Information Assistants across the London
Health Libraries Network
Royal Free Hospital Library – 6/12/13
John Nyman, St Mary’s Fleming Library
Imperial College London
Page 2
Learning outcomes – you will
• have reviewed how Google and Google Scholar work
• understand the changing nature of Google
• have tried out some techniques for undertaking more
complex searches using Google
• be aware of some recommended alternatives to Google
Page 3
Agenda
• Presentation (with some online examples)
How Google and Google Scholar work
Problems with using it
Tips for getting the best from it
Alternatives to Google
• Hands-on session
• Conclusion and Questions
Page 4
How does Google Work?
• Googlebot: Google’s web crawling
robot finds and retrieves web pages
• Indexer: receives the full-text of these
webpages from Googlebot, stores them
in the index database, then sorts them
alphabetically by search term.
• Query Processor: matches the terms
you entered to the index and retrieves
webpages in the order of relevance it
thinks you want, using over 200 signals
to define the relevance of each webpage
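The indexer and query processor steps above can be sketched in a few lines. This is only a toy illustration, not Google's implementation: the three pages, their text and the function names are invented for the example, and no relevance ranking is attempted.

```python
from collections import defaultdict

# Hypothetical three-page "web"; the page names and text are invented.
PAGES = {
    "page_a": "measuring disease frequency in populations",
    "page_b": "disease outbreaks in urban populations",
    "page_c": "measuring rainfall frequency",
}

def build_index(pages):
    """Indexer: map each term to the set of pages that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for term in text.lower().split():
            index[term].add(name)
    # Store the index sorted alphabetically by term.
    return dict(sorted(index.items()))

def search_all_terms(index, query):
    """Query processor: return the pages containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(matches)

index = build_index(PAGES)
print(search_all_terms(index, "disease populations"))
```

A real search engine would then order the matching pages by relevance; here they are simply listed alphabetically.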
Page 5
The Google Algorithms
These are the computer processes and formulas that take
your query and turn it into results. There are over 200
signals or clues that are used. Some important ones are:-
PageRank
Counts the number of times a webpage is linked to by other
webpages, and assesses the quality of those pages.
A webpage with a high PageRank will appear before others
in the results displayed as it is deemed more relevant.
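The idea of counting links and weighting them by the quality of the linking page can be illustrated with a simplified PageRank calculation. This is a sketch under stated assumptions, not Google's production algorithm: the four-page link graph is invented, and the damping value of 0.85 is the figure commonly quoted for the published version of the algorithm.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: each page repeatedly shares its score
    among the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: each page lists the pages it links to.
LINKS = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["home", "about"],
    "orphan": ["home"],
}

ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph "home" comes out on top because every other page links to it, while "orphan" scores lowest because nothing links to it.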
Page 6
The Google Algorithms
On-Page (Keyword-Specific) Ranking Factors
Keyword use anywhere in the title tag
On-Page (Non-Keyword) Ranking Factors
Existence of substantive, unique content on the page
Site-Wide (non-link based) Ranking Factors
Site architecture of the domain (structure and hierarchy)
Page 7
Google Supplemental Index
• These are indexed pages that are no longer thought
to be important
• They contain junk, scams and duplicate content
• They have a low PageRank and don’t contain quality
links
• However, you cannot tell whether your results contain
supplemental material, as there is no designation for
this material
Page 8
Using the cache
• Your query retrieves what is in the Google database
• Clicking on the title displays the current version
• Clicking on the green arrow on the second line and
selecting Cache links to a stored version of the page,
which may be older than the current version
• Googlebot is constantly scanning the web for new
material and updates on its indexed webpages
Page 9
Why is the cache useful?
• The cache helps when:-
There is internet congestion
The site is down
The website has been removed
• The cached version will usually still be available, as it is
a stored older version.
Page 10
Searching Google
• Default: usually searches for all the words
Measuring disease frequency in populations – is searched as
Measuring AND disease AND frequency AND populations
• Stopwords
‘in’ ‘on’ ‘at’ ‘and’ will not be searched
• Wild Cards or truncation
Not used. Instead word variations and synonyms are
automatically searched for
Page 11
Command line tips
• Boolean Operators
AND – OR – NOT
• AND is the default – you don’t have to enter it. You must
enter all other operators
• OR or | (pipe)
HIV (rnai OR vaccine OR genetics)
HIV (rnai|vaccine|genetics)
• NOT is - (minus sign, typed directly before the word)
Diabetes insulin -obesity
Obesity high blood pressure -diabetes
Page 12
More command line options
• Phrase Searching “ “ is useful
Obesity (“high blood pressure” OR hypertension) -diabetes
• Exact word = “ “ – replaces the + sign
“the” times or “the” onion
• Synonyms = ~ (tilde)
~tutorial will also find guides, documentation, introduction
Page 13
More command line options
• Any word = *
This is the night * crossing the border (is the missing word mail or train?)
• Changing the word sequence makes a difference –
the natural order retrieves the best results
Pieces of eight
Eight pieces
Elite controllers
Controllers elite
Page 14
Focus the search
• Filetype – to limit to specific document types
Oil exploration Falklands (filetype:ppt OR filetype:pdf OR
filetype:doc)
• Site – to search part of a domain or limit by specific website
Population trends site:gov
Higgs boson site:howstuffworks.com
Hiv site:www.tht.org.uk
malaria site:www.who.int/en
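The operators covered so far (OR, the minus sign, phrase quotes, filetype: and site:) can be combined programmatically. The sketch below is illustrative only: the function names are hypothetical, and the search URL pattern with a q parameter is the one that appears in the browser address bar after a search, not a documented API.

```python
from urllib.parse import urlencode

def quote_if_phrase(term):
    """Wrap multi-word terms in double quotes for phrase searching."""
    return '"' + term + '"' if " " in term else term

def build_query(terms, any_of=(), exclude=(), site=None, filetypes=()):
    """Assemble a Google query string from the operators on these slides."""
    parts = [quote_if_phrase(t) for t in terms]
    if any_of:
        parts.append("(" + " OR ".join(quote_if_phrase(t) for t in any_of) + ")")
    # NOT is a minus sign typed directly before the word.
    parts.extend("-" + word for word in exclude)
    if filetypes:
        parts.append("(" + " OR ".join("filetype:" + f for f in filetypes) + ")")
    if site:
        parts.append("site:" + site)
    return " ".join(parts)

def search_url(query):
    """Encode the query into a search URL (URL pattern is an assumption)."""
    return "https://www.google.com/search?" + urlencode({"q": query})

query = build_query(
    ["obesity"],
    any_of=["high blood pressure", "hypertension"],
    exclude=["diabetes"],
)
print(query)
print(search_url(build_query(["malaria"], site="www.who.int/en")))
```

Building the query in pieces makes it easy to check that the minus sign is a plain hyphen and that each phrase is quoted before the search is run.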
Page 15
Focus the search
• Inurl - the term must appear in the URL
Inurl:media – Inurl:hampstead
• Number range - ..
15..18
• Date Range – Search Tools – Any time
• Typed date ranges use Julian day numbers, as dates held as
integers are easily added and subtracted – 2456098-2456104
• You can type in date range as well as using the widget
• Geographical Location
loc: followed by a country code – loc:de, loc:in, loc:uk
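The integer dates in the example range can be worked out with a short conversion. This is a minimal sketch using Python's standard library; the function names are hypothetical, and the offset constant follows from the Julian day number of 1 January 2000 being 2451545.

```python
from datetime import date

# Difference between a Julian day number and Python's date ordinal
# (date(2000, 1, 1).toordinal() is 730120; its Julian day number is 2451545).
JDN_OFFSET = 1721425

def to_julian_day(d):
    """Convert a calendar date to its Julian day number."""
    return d.toordinal() + JDN_OFFSET

def from_julian_day(number):
    """Convert a Julian day number back to a calendar date."""
    return date.fromordinal(number - JDN_OFFSET)

start = from_julian_day(2456098)
end = from_julian_day(2456104)
print(start.isoformat(), "to", end.isoformat())
print(to_julian_day(date(2013, 12, 6)))
```

The range 2456098-2456104 on the slide therefore covers one week in June 2012.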
Page 16
Searching Aspects of Google
• Google will sometimes omit a word when the search
retrieves few results, so that it can return something
• This soft ANDing is an embedded Google feature, but
you are not told when it happens
• Unexpected results should alert you to think about
changing your search strategy
Page 17
Advanced Google Search
• This option appears in the top right of the screen
• The options are clear e.g.
Terms Appearing – alerts you to restrict to a field
Safe Search – removes images option
Reading Level - allows you to select content level
Filetype – a comprehensive list appears. Alerts you to the
range and code that you can use in the main search. You are
restricted to one option only though.
Usage rights – Alerts you to copyright and sharing rights and
that internet websites should be cited
Page 18
Advanced Google page
Page 19
Verbatim
• Run your search
• Click on Search Tools – All Results – Verbatim
• This will run your search exactly as typed, as “phrase
searching” doesn’t always work
• Otherwise, parole evidence only finds parol evidence –
American jurisdictional spelling overrides other
spellings!
Page 20
Think naturally
• what are the characteristics of…
• how much does…
• the author died on…
• who founded…
• a guide to…
• a checklist of…
• was created by…
• Inspired by Terry Kendrick –Cutting Edge Internet Search Techniques
Page 21
Queries that Google finds difficult
• What would be the best time I could sow seeds in
India given that the monsoon is early this year?
From – The Evolution of Search – YouTube video
http://www.youtube.com/watch?v=mTBShTwCnD4
• Google cannot do this yet
Page 22
Google Scholar
• Simple way to broadly search for scholarly literature
• It searches across many disciplines and sources to
retrieve
• Articles – Theses – Books – Abstracts – Court Opinions
• From
Academic Publishers – Professional Societies –
Online Repositories - Universities
Page 23
Google Scholar
• You can set up Google Scholar to access your
institution’s password controlled e-journals for the full
text
• Go to Settings – Library Links – Institution Name
• Link to the full-text via the SFX link
• Through Settings you can also set up reference
management links
Page 24
Google Scholar
• Metrics – will display the top 100 publications
ordered by the 5-year h-index
• H-index – attempts to measure the productivity and
impact of published work
• An h-index of 20 means that the author has 20
papers each of which has been cited 20+ times
• This is an alternative to the total citation counts seen in
Web of Science (WoS), which may not truly reflect impact as
the author may only have a few highly cited papers
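The h-index definition above translates directly into a short calculation. This is a minimal sketch of the standard definition; the citation counts in the example are invented.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break
    return h

# Hypothetical author with two highly cited papers and two rarely cited ones.
few_big_papers = [100, 90, 2, 1]
print(sum(few_big_papers), h_index(few_big_papers))

# Hypothetical author with steady citations across more papers.
steady_papers = [10, 8, 5, 4, 3]
print(sum(steady_papers), h_index(steady_papers))
```

The first author has far more total citations but the lower h-index, which is the contrast with total citation counts that the slide describes.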
Page 25
Searching Google Scholar
• Author – use the author operator
• Author:smith
• Title – use “ “ for phrase searching
• Use the sidebar to limit to year range
• Advanced Search – use the arrow in the search box
to get advanced search
Page 26
Other resources 1
• Bing www.bing.com
often more up-to-date than Google, according to
Karen Blakeman (personal communication
06/02/12).
Has option to restrict to pages from the UK
• Blekko http://blekko.com
spam free
Both recommended by Karen Blakeman, 06/02/12
Page 27
Other resources 2
• DuckDuckGo https://duckduckgo.com
“less spam and clutter” (quote taken from their site)
• Zuula http://www.zuula.com
See results from 9 sites, each on a separate tab: Google,
Bing, Yahoo, Gigablast, Exalead, Alexa, Entireweb, Mahala
and Moheek
Both recommended by Karen Blakeman, 06/02/12
Page 28
Other resources 3 - Internet Archive
• The WayBack Machine - http://archive.org/web/
• You need to know the URL, e.g.
www.imperial.ac.uk/library
www.rcn.org.uk
• Useful in seeing the development of websites over time
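Because the WayBack Machine is looked up by URL, links to archived copies can be built from the address alone. The sketch below assumes the web.archive.org/web/ address pattern seen when browsing the archive, where an asterisk lists all captures and a date such as 2013 or 20131206 asks for the capture nearest that date; the function name is hypothetical.

```python
def wayback_url(page_url, timestamp="*"):
    """Build a WayBack Machine address for a page.

    timestamp: "*" for the list of all captures, or digits such as
    "2013" or "20131206" for the capture nearest that date.
    """
    return "https://web.archive.org/web/" + timestamp + "/" + page_url

print(wayback_url("www.imperial.ac.uk/library"))
print(wayback_url("www.rcn.org.uk", timestamp="2005"))
```

Pasting either address into a browser is equivalent to typing the URL into the WayBack Machine search box and choosing a date.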
Page 29
Authentic or Hoax?
• Search: how does google work so fast
• Evaluate: Google Technology
Page 30
Authentic or Hoax?
Page 31
Authentic or Hoax?
• Search: biological hazard detector
homeland security
• Evaluate: SKC Civil Defense Tools
Page 32
Authentic or Hoax?
Page 33
Website checklist
• Authority and Accuracy
• Purpose and Content
• Currency
• Design, Organization and Ease of Use
Page 34
Summary
• Google is constantly changing
• Check Google is doing what you told it to do
• Use natural phrases and repeat important words
• Be specific, not general
• Use the different features for more accurate results –
e.g. Maps – News
• Use Advanced Search when needed – e.g. to restrict
to safe search
• Use other resources
Page 35
Further Insight
• Search now and search in the future – David Russell 2012
YouTube - http://www.youtube.com/watch?v=QDBhP7XTfTI
• Google Guide - http://www.googleguide.com/
• Various commoncraft.com videos
• Karen Blakeman’s Blog http://www.rba.co.uk/wordpress/
• Phil Bradley’s website: Making the net easier
http://www.philb.com/