MED7126 Data and Multimedia Journalism
Paul Bradshaw
Getting the data
Advanced search tips
Jul 14, 2015
Imagine the data: text
What text will it contain?
Where will that text be?
What text will it not contain?
Specific references, not general:
Specify a constituency… …a school
…an institution code …an invoice number …a piece of jargon
Quotes: “disclosure log”
Asterisk: “between * and 2014”
Minus: “hate crime” -religion -"publication scheme"
Number ranges: 2000..2014
site:gov.uk site:nhs.uk
site:police.uk site:ac.uk site:org.uk
site:org site:birmingham.gov.uk site:met.police.uk/foi/
disclosure
filetype:xls filetype:xlsx filetype:pdf filetype:csv filetype:ppt filetype:doc
filetype:docx filetype:xml
Combine operators:
“disclosure log” site:gov.uk
allintitle:hate crime report filetype:pdf site:police.uk
art inurl:search.asp -library
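
If you find yourself running many variations of the same search, the combined queries can be assembled with a few lines of code. This is only a sketch: the function name build_query and its parameters are made up for illustration and are not part of any search engine's API. It simply glues the operators from the earlier slides (quotes, minus, number ranges, site:, filetype:) into one string that you can paste into a search box.

```python
def build_query(terms=(), phrases=(), exclude=(), number_range=None,
                site=None, filetype=None):
    """Assemble a search string from the operators covered in the slides.

    All parameter names here are hypothetical, chosen for this example.
    """
    parts = list(terms)
    # Exact phrases go inside straight double quotes.
    parts.extend(f'"{phrase}"' for phrase in phrases)
    for item in exclude:
        # Quote multi-word exclusions so the minus applies to the whole phrase.
        parts.append(f'-"{item}"' if " " in item else f"-{item}")
    if number_range is not None:
        low, high = number_range
        parts.append(f"{low}..{high}")
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


print(build_query(phrases=["disclosure log"], site="gov.uk"))
print(build_query(phrases=["hate crime"],
                  exclude=["religion", "publication scheme"],
                  site="police.uk", filetype="pdf"))
```

The first call prints the "disclosure log" query from the slide; the second combines the minus examples with a site and a file type.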
Sites that aren’t indexed
Some sites use the robots.txt protocol to tell search engines not to index them.
Use DownThemAll to download the site and search it locally.
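
To see whether a page is likely to be missing from search results for this reason, you can read the site's robots.txt file (normally found at /robots.txt on the domain). The sketch below uses Python's standard urllib.robotparser module. The helper name is_blocked, the sample rules, and the example.org addresses are all invented for illustration; in practice you would paste in the real file's contents.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text asks crawlers to skip url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical robots.txt contents, for illustration only.
sample_rules = """User-agent: *
Disallow: /foi/
"""

print(is_blocked(sample_rules, "https://example.org/foi/log.pdf"))  # True
print(is_blocked(sample_rules, "https://example.org/news/"))        # False
```

A result of True means the site has asked crawlers to stay away from that address, so a search engine may not show it even though the page exists.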
Do it now:
Search for a piece of jargon in your field, on a particular type of site
Search for spreadsheets or PDFs mentioning an individual in your field