UK Commission for Employment and Skills LMI for All Developing a Careers LMI Database: Final Report (02/07/15) Career Database Project Team Warwick Institute for Employment Research Jenny Bimrose, Rob Wilson, Sally-Anne Barnes, David Owen, Yuxin Li, Anne Green, Luke Bosworth, Peter Millar, Andy Holden Pontydysgu Graham Attwell, Philipp Rustemeier Raycom Raymond Elferink Rewired State Julia Higginbottom
UK Commission for Employment and Skills LMI for All Developing … · 2017-05-16 · UK Commission for Employment and Skills LMI for All Developing a Careers LMI Database: Final Report
UK Commission for Employment and Skills
LMI for All
Developing a Careers LMI Database:
Final Report (02/07/15)
Career Database Project Team
Warwick Institute for Employment Research
Jenny Bimrose, Rob Wilson, Sally-Anne Barnes, David Owen, Yuxin Li, Anne Green,
Luke Bosworth, Peter Millar, Andy Holden
Pontydysgu
Graham Attwell, Philipp Rustemeier
Raycom
Raymond Elferink
Rewired State
Julia Higginbottom
Contents
Executive summary
Figure C.1 Labour market questions in 2011 Census of Population
Figure C.2 Journey-to-work questions in 2011 Census of Population
Table C.1 Mapping from ISCO08 to SOC2010
Table C.2 Map from ISCO 88 to SOC2010 at 2-digit level
Glossary
API API, an abbreviation of application programming interface, is a set of
routines, protocols, and tools for building software applications.
A good API makes it easier to develop a program by providing
all the building blocks. A programmer then puts the blocks
together.
App An App or application is a computer software application that is
coded in a browser-supported programming language (such as
JavaScript, combined with a browser-rendered mark-up
language like HTML) and reliant on a common web browser to
render the application executable. Apps are accessed by users
over a network.
ASHE The Annual Survey of Hours and Earnings, from the Office for
National Statistics, provides information about the levels,
distribution and make-up of earnings and hours worked for
employees in all industries and occupations.
BRES Business Register and Employment Survey collects data to
update local unit information and business structures on the
Inter-Departmental Business Register (IDBR) and produce
annual employment statistics, which are published via the
NOMIS website. It replaces the Business Register Survey and
the Annual Business Inquiry.
CEN A Chancellor of the Exchequer’s Notice is required to access potentially
disclosive data.
CSS Cascading Style Sheets (CSS) is a style sheet language used for describing the look and formatting of a document written in a mark-up language. It is designed primarily to enable the separation of document content from document presentation, including elements such as the layout, colours, and fonts and can improve accessibility.
Data cube A data cube is commonly used to describe a time series of image
data representing data along some measure of interest. It can
be 2-dimensional, 3-dimensional or higher-dimensional. Each
dimension represents some attribute in the database and the
cells in the data cube represent the measure of interest. Queries
are performed on the cube to retrieve decision support
information.
DLHE Destinations of Leavers from Higher Education is a survey of
qualifiers from higher education (HE) institutions, which is
conducted in two parts. The first stage asks what leavers were
doing six months after they qualified from their HE course. The
second stage or longitudinal survey is a follow-up survey that
looks at the destinations of leavers three and a half years after
they qualified. Managed by the Higher Education Statistics
Agency (HESA).
ESS The Employer Skills Survey conducted by UKCES provides
information on business management, recruitment, skills gaps
and vacancies. The surveys are designed to be representative
of the employer population across geography and sector.
ETLs Extract, Transform and Load processes are for database usage,
including: extracting data from external sources; transforming it
to fit operational needs, which can include quality levels; plus
loading it into the end database.
Hack day Hack days (also known as Hackathons or Appathons) bring
together experts and developers to collaborate or work alone
rapidly prototyping software or hardware, building mobile and
web apps or quick models for new ideas and features.
ILO The International Labour Organization is devoted to promoting
social justice and internationally recognised human and labour
rights. It helps advance the creation of decent work and the
economic and working conditions that give working people and
business people a stake in lasting peace, prosperity and
progress. Its main aims are to promote rights at work, encourage
decent employment opportunities, enhance social protection
and strengthen dialogue on work-related issues.
JACS JACS (Joint Academic Coding of Subjects) is the subject classification system used to describe the subject content of courses at UK Higher Education institutions. JACS3 is used from 2012/13. This was developed jointly by HESA (Higher Education Statistics Agency) and UCAS.
JCP Jobcentre Plus, part of the Department for Work and Pensions
(DWP). It provides services that support people of working age
from welfare into work, and helps employers to fill their
vacancies. Main supplier of vacancy data.
JSON JavaScript Object Notation is a lightweight data-interchange
format. It is a text format that is language independent using
familiar conventions that can be found in the C-family of
languages, including C, C++, C#, Java, JavaScript, Perl, Python
and others.
LFS The Labour Force Survey, conducted by ONS, is a quarterly
sample survey of households living at private addresses in the
UK. Its purpose is to provide information on the UK labour
market.
LMI Labour market information is data, graphs and statistics that
describe the condition of the past and current labour market, as
well as make future projections.
Modding day The modding day follows a hack day. Its aim is to take forward
the developments of the hack day and to produce a more
useable and defined product.
MySQL MySQL is a type of database management system that enables
data to be added, accessed and processed in a database. It is
open source and is developed and supported by Oracle.
NQF The National Qualifications Framework (NQF) is a former credit transfer system developed for qualifications in England, Wales and Northern Ireland. It was replaced in 2010 with the Qualifications and Credit Framework.
NOMIS Web-based database of labour market statistics from ONS,
includes statistical information on the UK labour market (i.e.
Employment, Unemployment, Earnings, Labour Force Survey
and Jobcentre Plus vacancies).
NQF National Qualification Framework sets out the level at which a qualification can be recognised in England, Northern Ireland and Wales. Only qualifications that have been accredited by the three regulators for England, Wales and Northern Ireland can be included in the NQF. This ensures that all qualifications within the framework are of high quality, and meet the needs of learners and employers.
NUTS1 Nomenclature of Units for Territorial Statistics. This is a geocode
standard for referencing the subdivisions of countries for
statistical purposes. The standard is developed and regulated
by the European Union. There are three levels of NUTS defined.
In the UK, NUTS1 represents the regions of England, plus
Wales, Scotland and Northern Ireland.
O*NET The Occupational Information Network is a US program
providing a primary source of occupational information. Central
to the project is the O*NET database, containing information on
standardised and occupation-specific descriptors. Information
from this database forms the heart of O*NET OnLine
http://www.onetonline.org/, an interactive application for
exploring and searching occupations.
ONS The Office for National Statistics is an Executive Office of the
UK Statistics Authority. It is responsible for the collection,
compilation, analysis and dissemination of a range of economic,
social and demographic statistics relating to the UK.
RAS RAS is an iterative procedure where the rows and columns of preliminary estimates of a two dimensional array are iteratively changed using proportions that are based on ‘target’ row and column totals (see Section A.8).
Relational database A relational database is the predominant choice in storing data
that conforms to relational model theory.
Scala and Scalatra Scalatra (using Scala) is a web micro-framework that helps the
developer quickly build high-performance websites and APIs.
SDS The Secure Data Service provides safe and secure remote
access by researchers to data previously deemed too sensitive,
detailed, confidential or potentially disclosive to be made
available under standard licensing and dissemination
arrangements.
SIC The Standard Industrial Classification is used to classify
business establishments and other statistical units by the type
of economic activity in which they are engaged. The latest
version is SIC2007.
SOC The Standard Occupational Classification is a common
classification of occupational information for the UK. Jobs are
classified in terms of their skill level and skill content. The latest
version is SOC2010. SOC 4-digit provides a list of occupations
at a more detailed level.
SPARQL A recursive acronym for SPARQL Protocol and RDF Query
Language. This is an RDF query language, that is, a query
language for databases, able to retrieve and manipulate data
stored in Resource Description Framework format. SPARQL is
a format favoured by linked data proponents as it allows
advanced queries and the ability to query between different
datasets.
SQL server This is a relational database server, developed by Microsoft. It
is a software product designed to store and retrieve data as
Staging A staging site is a website used to assemble, test and review its newer versions before it is moved into production.
Standard server, web container or servlet container This is the component of a web server that interacts with servlets. It is responsible for managing servlets, mapping a URL to a particular servlet and ensuring that the URL requester has the correct access rights.
SSIS This is a platform for data integration and workflow applications. It features a fast and flexible data warehousing tool used for data extraction, transformation, and loading (ETL). The tool may also be used to automate maintenance of SQL Server databases and updates to multidimensional cube data.
TTWA or Travel-To-Work-Area TTWA indicates an area where the population would commute to another area for the purposes of employment.
Ubuntu Linux LTS This is a popular open source operating system for servers and cloud computing.
UKDA The UK Data Archive is curator of the largest collection of digital data in the social sciences and humanities in the UK.
Universal Jobmatch service Universal Jobmatch is the Department for Work and Pensions
(DWP) online service, which is open to all jobseekers,
regardless of whether or not they are claiming a benefit. It works
by matching jobseekers to jobs based on their skills and CV.
Visual Basic (VB) Visual Basic is a third-generation programming language from Microsoft. It enables rapid application development of graphical user interface applications and access to databases.
Working Futures Detailed historical and projected employment estimates produced on behalf of UKCES (for details see: http://www.ukces.org.uk/ourwork/working-futures)
XCRI XCRI stands for eXchanging Course Related Information. It is the UK standard for describing course information.
To identify and investigate which robust sources of LMI can be used to inform the
decisions people make about learning and work; and
To bring these sources together in an automated, single, accessible location (referred
to as the LMI for All database), so that they can be used by developers to create
websites and applications for career guidance purposes.
These were represented in three separate, but inter-related work strands, specified by the UK
Commission for Employment and Skills, identified below together with their related objectives,
all of which have been fully met:
Data development:
To identify the key information that is used in making decisions about learning and
work.
To explore the feasibility of including UK wide data where this is available.
To prepare the data and bring these together with other data sources as part of a single
access point.
Accessibility and open data:
To produce an initial version of the data tool (this refers to the LMI for All database,
platform, web portal and API), based on lessons learned from the pilot feasibility
project.
To develop subsequent iterations of the data tool, in-line with stakeholder feedback, to
be gathered as part of the project process.
Stakeholders and communication:
To test the data tool, through two separate iterations (for the first and second phases
of the project) of hack and modding days.
To consult with stakeholders in the broad community of career guidance practice.
To disseminate findings to a wider audience, through various methods.
1.3. Report structure
This final project report focuses on the Phase 2B activity. It deals with the three different work
strands separately: data development (section 2); accessibility and open data: technical
developments (section 3); and stakeholder and communications (section 4). A summary and
recommendations can be found in section 5, identifying the next steps necessary to secure
LMI for All going forward. Its specific purpose is to document progress of LMI for All during
Phase 2, detailing the data processing required to populate LMI for All, technical development
supporting the LMI for All, current data available and the stakeholder engagement process to
raise the profile of the offer.
1.4. Following up recommendations from Phase 2A
Phase 2A of the project demonstrated the practical feasibility of developing a comprehensive
careers LMI data tool designed to support individuals in making better decisions about learning
and work. LMI for All was, therefore, further developed to meet the LMI needs of these
individuals (as well as other potential users in the longer term). Existing data were used, from
robust and reliable (mainly official) sources. However, a number of gaps in the existing data
were identified, only some of which could be filled within the scope of the current project.
The main indicators used in the LMI for All database in Phase 1 (October 2012 – May 2013)
continued to be at its core in Phases 2A and 2B (June 2013 – March 2015). These include:
Employment and employment forecasts based on Working Futures (these include
information on qualifications and replacement demands);
Unemployment rates (using the International Labour Organization definition of
unemployment2) based on the LFS;
Pay (estimates based on a combination of ASHE and LFS data);
Hours worked (ASHE);
Vacancy estimates (based on ESS and Universal Jobmatch);
Vacancies (based on a fuzzy search from Universal Jobmatch);
Occupational descriptions (ONS).
Phase 2B also considered:
Various refinements to the way these estimates are generated and presented (e.g.
focusing on medians/deciles, rather than means).
Some work outside the LMI for All project (e.g. refining the projections of employment
at the 4-digit occupational level, which required an extension to the then current
Working Futures database).
The full, revised O*NET dataset, including Skills, Abilities, Interests and Knowledge, as
well as a number of other skill related indicators.
Other possible indicators and enhancements considered for inclusion in the LMI for All
database during Phase 2B, included:
Further work to integrate Universal Job Match (UJM) vacancy data into the database
more fully, once mapping to occupational categories has been resolved;
Making greater use of data from higher education, such as HESA information on the
destination of graduates (this required detailed negotiation with data owners);
2 The ILO definition of unemployment covers people who are: out of work, want a job, have
actively sought work in the previous four weeks and are available to start work within the next
fortnight; or out of work and have accepted a job that they are waiting to start in the next
fortnight.
Course information – although a great deal of information is available about courses
of study and links to different career paths, this is not well coordinated or consistent -
work was undertaken to assess the feasibility of bringing this into the database.
The UK Census of Population, especially local labour market information (there is
limited sub-regional information), including some commuting and workplace data;
NOMIS: consideration of using its API to include workforce jobs data at regional level,
the unemployment claimant count and data from the Annual Population Survey (APS);
Use of more information from the Cedefop pan-European employment database – this
is equivalent to the UK Working Futures employment database (but only available at
2-digit occupational level).
It was concluded during Phase 2A that the following should not be included in the database in
Phase 2B:
ONS Vacancy Survey (no occupational detail);
Annual Population Survey (does not add much to LFS);
Jobcentre Plus vacancies (historical data only – series discontinued); and
European Union Labour Force Survey (EULFS, problems with availability and detail).
Early discussions took place in Phase 2B regarding technical priorities and server capacity.
The development and maintenance of a vibrant web portal with support services for users and
developers was undertaken to promote uptake. Consideration was given to the resources this
requires, not only in technical terms, but in design, moderation and intervention to respond to
and support developers and users. Such resources have to be balanced with priorities for
further data and technical development.
Continuous encouragement and support was given to organisations with an interest in using
the early release of the web portal and API, which is part of the approach to testing, evaluating
and improving the pilot tool, as well as demonstrating the benefits to a wider audience.
This included a more strategic use of social media and dissemination at key events throughout
Phase 2B, to ensure the web portal and API were promoted to create demand for the product
and to maintain the momentum of interest.
The successful format of the hack and modding days carried out in Phase 2A was repeated
in Phase 2B. These events were successful in not only helping to prove the viability of the
database, but also ensuring that career stakeholders were able to contribute to the
development process.
Active participation of key stakeholder representatives throughout the project was carefully
designed to ensure engagement and raise awareness of the resource. Throughout Phase 2B,
there was an on-going dialogue with organisations that expressed an interest in using the API
and their feedback was gathered in order to inform further refinements and amendments to
the database and API.
The intention was for communication of the web portal concept to go beyond traditional
dissemination methods (e.g. newsletters, professional publications, presentations at various
events, etc.). Visual representations of potential applications were made available to various
audiences, in response to advice on priority target groups and their career needs collected
from key stakeholders (see section 4).
1.5. Data overview
As of the end of Phase 2B, the LMI for All API contains key data from the following data sets,
available from a single access point:
Employment (historical and projected) and replacement demands from Working
Futures;
Weekly Pay based on the Annual Survey of Hours and Earnings and the Labour Force
Survey;
Weekly Hours based on the Annual Survey of Hours and Earnings;
Occupational descriptions (based on ONS information);
Skills, Abilities, Interests and Knowledge required in different jobs (based on US
O*NET data);
Changes in pay by occupation, 2012-2013;
Unemployment rates based on the Labour Force Survey;
Vacancies (including skills shortage vacancies based on the Employer Skills Survey)
and some information on current vacancies from Universal Job Match (UJM) made
available by Monster/DWP;
Census data (details of geographical location of jobs and travel to work distances); and
First destination of graduates (HESA data).
Sources of these data include: the Working Futures employment database; the Labour Force
Survey; Annual Survey of Hours and Earnings; UKCES Employer Skills Survey; and the
O*NET skills database. Also included in the database are the ONS occupational descriptions.
A detailed overview of the data included is presented in Section 2. Figure 1.5 provides a
summary.
Relevant labour market data have been organised by occupational category using the 2010
Standard Occupational Classification (SOC) at unit group (4-digit) level as a framework. An
index of c.28,000 job titles mapped to SOC provides the basis for the end-user to search, and
gain access to, data of interest and relevance in an intuitive fashion.
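The job-title index described above can be pictured as a lookup from free-text titles to SOC2010 unit-group codes. The following is a minimal sketch only, not the project's implementation: the function name `search_job_titles`, the five-entry index and the matching cutoff are all assumptions made for illustration, and the codes shown should be checked against the ONS SOC2010 index before any use. Approximate matching here relies on Python's standard `difflib` module.

```python
import difflib

# Hypothetical, tiny stand-in for the c.28,000-entry job-title index.
# Codes are illustrative SOC2010 unit groups and should be verified.
JOB_TITLE_INDEX = {
    "chef": "5434",
    "sous chef": "5434",
    "nurse": "2231",
    "staff nurse": "2231",
    "software developer": "2136",
}


def search_job_titles(query, index=JOB_TITLE_INDEX, limit=3, cutoff=0.6):
    """Return up to `limit` (job title, SOC code) pairs, best match first."""
    normalised = query.strip().lower()
    # get_close_matches ranks candidates by similarity and drops any
    # candidate scoring below `cutoff`.
    matches = difflib.get_close_matches(
        normalised, list(index), n=limit, cutoff=cutoff
    )
    return [(title, index[title]) for title in matches]


if __name__ == "__main__":
    for title, soc in search_job_titles("Chef"):
        print(title, soc)
```

Once a title has been resolved to a 4-digit code, that code becomes the key for retrieving pay, hours, employment and the other indicators organised by occupation.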
Figure 1.5 Overview of data and variables in the LMI for All database
LMI for All Database:
Employment (historical time series 2000-12)*
Projected employment (2012-22)*
Future job openings (replacement needs)*
Weekly pay (2013)*
Weekly hours (2013)*
Occupational descriptions
Skills required (based on US O*NET data)
Changes in pay 2012-2013
Unemployment rate
Current vacancies (ESS data and UJM)
Census data (details of geographical location of jobs and travel to work distances)
First destinations of graduates (HESA data)
Data (for core indicators*) by:
SOC2010 4-digit occupations
Employment status
Highest qualification held
Countries and English regions within the UK
Gender
2. Data development
2.1. Approach to providing data
The LMI for All database requires detailed data if it is to be useful for services that support
individuals in making better informed decisions about learning and work. Individuals and those
supporting career transitions have an interest in knowing which jobs are available,
distinguishing sector, occupation and typical qualifications required, as well as the typical pay
and hours associated with those jobs. Ideally, the full set of detail required is as follows:
Occupation (up to the 4-digit level of SOC2010, 369 Categories);3
Sector (up to the 2-digit level of SIC2007, about 80 categories);
Geographical area (12 English regions and constituent countries of the UK);4
Gender and employment status (full-time, part-time employees and self-employed).
The 369 SOC 4-digit occupational categories lie at the heart of the database prepared for LMI
for All. Information at this level of detail is provided everywhere possible, although not all data
are available at that level of detail.
The core data provided comprises detailed information, as described above, for:
Employment (time series of historical and projected levels, plus, for the future only,
projected replacement demands (RDs));
Pay (for a recent year, currently 2013, for employees only);
Hours (for a recent year, currently 2013, for employees only).
In addition, less detailed information is provided on the following:
Occupational descriptions for 4-digit occupations (based on ONS information);
Skills, Abilities, Interests and Knowledge data mapped to 4-digit occupations based on
US O*NET information;
Changes in pay by detailed 4-digit occupation (between 2012 and 2013);
Unemployment rates (based on LFS data);
Hard to fill vacancies (currently limited primarily to data from the UKCES Employer
Skills Survey);
Current job vacancies based on UJM API;
Census (limited information for 2011 on occupational employment at a detailed
geographical level and on travel to work distances); and
3 Some have argued for an even more detailed breakdown to the 5-digit level of SOC, but this is not
feasible given data currently available.
4 Plus in some cases additional information on: age; gender; status; and qualification (highest held).
Occupational destinations of graduates by detailed 4-digit occupation (based on HESA
data).
The initial approach to developing the LMI for All database focussed on using the APIs from
official sources in order to facilitate quick and automatic updates. However, it soon became
apparent that there were a number of problems and pitfalls with this approach. The main
difficulties arise because many of the official data sources that it was intended to tap into were
not designed for the purpose of providing very detailed labour market information to support
career transitions.
The key issue is around the connected matters of:
Disclosure;
Confidentiality; and
Statistical reliability.
Many of the official statistics are collected under the terms of strict legal instruments, which
ensure confidentiality for those providing the data. These guarantee that these data will not be
published in such a manner as to disclose commercially sensitive or other confidential
information about the companies or individuals concerned. The Office for National Statistics
(ONS), which is responsible for collecting and publishing the information, has strict rules in
place to ensure that this is the case. This imposes quite severe limits on the level of detail that
can be placed into the public domain. It should also be noted that key data owners (such as
ONS) do not currently have APIs in place that allow easy access to very detailed data on
indicators such as employment and pay.
The other important consideration is statistical reliability. This is essentially a matter of the
sample size on which the statistics are based. Many of the official sources are based on
samples which, while large in statistical terms, are not large enough to provide robust
information at a very detailed level. This applies to both the Business Register and
Employment Survey (BRES), which is the main source of information on employment by
industry, and the Labour Force Survey (LFS), which is the main source of information on the
structure of employment by occupation, qualification and employment status. Reliance on the
raw survey data would, therefore, severely limit the level of detail that could be provided.
This issue has been addressed previously in the context of developing the Working Futures
(WF) employment database (see Wilson and Homenidou, 2012a, 2012b). The solution
adopted there was to combine the various official sources and to create estimates of
employment at a more detailed level than it is possible to obtain from the official surveys alone.
This has been combined with putting in place checks to ensure that the data generated are
robust (in a general statistical sense) and that they neither breach confidentiality nor are
disclosive. Following detailed discussions with ONS, it was concluded that:
First, the aggregation of information on employment by industry to some 75 industries
(by English region and UK nation) could avoid problems of disclosure;5 and
Second, as long as sources such as the LFS and the Annual Survey of Hours and
Earnings (ASHE) were used to produce estimates for general groups rather than
revealing information on individual cases, this should not breach confidentiality.
Further details of how the official sources have been used to generate detailed estimates of
Employment, Pay and Hours are set out in Annex A. In addition, for pay, supplementary
information is provided showing variation by age, based on a parametric approach.
2.2. Summary of data and indicators included in the portal
2.2.1. Core Indicators
Employment (historical estimates and projections, based on LFS, BRES, etc.)
For reasons discussed above, these are taken from the Working Futures model, which is in
turn based on BRES and LFS data. The raw data from BRES and the LFS do not
provide a suitable source of the kind of detailed data needed to populate the database.
It is important to emphasise that individual observations from these official surveys on
Employment (or Pay or Hours worked) are not required. What is needed for careers purposes
is general information on ‘typical’ pay or general employment opportunities in particular areas
for people with selected characteristics. The official data are a means to this end rather than
being required for their own sake.
The level of detail required in the LMI for All database can be obtained by replacing the official
‘raw’ data by estimates or predictions. For employment, the Working Futures employment
database has been used. The Working Futures database includes historical information on
employment by both Occupations and Qualifications. The latter shows the numbers employed
by highest level of qualification held using the National Qualification Framework (NQF) system
of classifying levels of qualification. The measure of employment used is workforce jobs rather
than a head count of people in employment.
The standard Working Futures employment database only provides information up to the
2-digit level of the Standard Occupational Classification (SOC2010). This has been extended
for the LMI for All database to the 4-digit level by combining the database with additional
information on the patterns of employment at this more detailed level using LFS data. These
historical estimates are constrained to match the main Working Futures database using an
extended version of the algorithm developed to produce the main Working Futures database
(for details, see Wilson and Homenidou, 2012b).
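The glossary describes RAS as an iterative procedure that rescales the rows and columns of a preliminary two-dimensional array until they agree with target row and column totals. Assuming the constraining step is similar in spirit (the actual extended algorithm is documented in Wilson and Homenidou, 2012b, and Annex A, not reproduced here), a minimal sketch of that kind of procedure might look as follows. The function name `ras`, the stopping rule and all figures are hypothetical; rows could stand for 4-digit unit groups and columns for regions.

```python
def ras(matrix, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Rescale `matrix` so its row and column sums match the targets.

    Illustrative iterative proportional fitting; assumes non-negative
    cells and that the row targets and column targets share one total.
    """
    cells = [[float(v) for v in row] for row in matrix]
    for _ in range(max_iter):
        # Step 1: scale each row to its target total.
        for i, target in enumerate(row_targets):
            total = sum(cells[i])
            if total > 0:
                factor = target / total
                cells[i] = [v * factor for v in cells[i]]
        # Step 2: scale each column to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in cells)
            if total > 0:
                factor = target / total
                for row in cells:
                    row[j] *= factor
        # Columns now match exactly; stop once rows match as well.
        if all(
            abs(sum(cells[i]) - t) <= tol * max(1.0, t)
            for i, t in enumerate(row_targets)
        ):
            break
    return cells


if __name__ == "__main__":
    # Hypothetical preliminary estimates (e.g. from survey data).
    preliminary = [[40, 60], [30, 70]]
    fitted = ras(preliminary, row_targets=[120, 80], col_targets=[90, 110])
    for row in fitted:
        print([round(v, 1) for v in row])
```

The appeal of this family of methods is that the detailed pattern comes from the survey data while the totals are pinned to the more robust published figures.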
5 Without the necessity for a Chancellor of the Exchequer’s Notice (CEN), for details, see Annex A. Information
at a more detailed sub-regional level cannot be provided without running into such problems, as well as
concerns about statistical reliability of the estimates.
Although estimates can be generated for the full level of detail shown at the start of this section,
not all of these are reliable and robust. In order to rule out such information, the API censors
results that fall below a certain threshold and flags up cases where the estimates may be less
reliable. These criteria are based on rules developed for the main Working Futures database.
The rules used are based on the practice recommended by ONS for use of LFS data:
1. If the numbers employed in a particular category/cell (defined by the countries/regions,
gender, status, occupation, qualification and industry) are below 1,000, then a query
returns ‘no reliable data available’ and offers to go up a level of aggregation across
one or more of the main dimensions (e.g. UK rather than region, aggregation of
industries rather than the most detailed level, or SOC 2-digit rather than 4-digit).
2. If the numbers employed in a particular category/cell (defined as in (1)) are between
1,000 and 10,000 then a query returns the number but with a flag to say that this
estimate is based on a relatively small sample size and if the user requires more robust
estimates they should go up a level of aggregation across one or more of the main
dimensions (as in 1).
This also applies to estimates of replacement demands as well as employment levels. Full
details are given in Annex A.
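The two rules above amount to a simple threshold check on each cell. The sketch below is illustrative only: the names `apply_reliability_rules` and `Result` are hypothetical, the message wording is invented, and treating the boundaries of "between 1,000 and 10,000" as inclusive is an assumption, since the text does not say how exact boundary values are handled.

```python
from dataclasses import dataclass
from typing import Optional

SUPPRESS_BELOW = 1_000   # rule 1: below this, no estimate is returned
FLAG_UP_TO = 10_000      # rule 2: 1,000 to 10,000 is returned with a flag


@dataclass
class Result:
    value: Optional[float]  # None when the estimate is suppressed
    status: str             # "suppressed", "flagged" or "ok"
    message: str


def apply_reliability_rules(employment: float) -> Result:
    """Classify one employment cell using the thresholds in the text."""
    if employment < SUPPRESS_BELOW:
        return Result(
            None,
            "suppressed",
            "no reliable data available; try a higher level of aggregation",
        )
    if employment <= FLAG_UP_TO:  # inclusive upper bound is an assumption
        return Result(
            employment,
            "flagged",
            "based on a relatively small sample; aggregate for robustness",
        )
    return Result(employment, "ok", "")


if __name__ == "__main__":
    for cell in (450, 3_200, 58_000):
        print(cell, apply_reliability_rules(cell).status)
```

A client receiving a suppressed or flagged result would then retry at a coarser level, for example the UK rather than a region, or SOC 2-digit rather than 4-digit.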
The published Working Futures database also provides projections by occupation at a 2-digit
level of SOC2010. In principle, more detailed projections are feasible but this is limited by the
quality of the available data upon which the analysis is based (primarily the LFS). In the LMI
for All project the possibility of using common growth factors applied to all 4-digit unit groups
within a 2-digit category (i.e. assuming fixed shares) was explored and then taken to an
operational level. As long as these results are clearly presented as projections based on
simple assumptions rather than precise predictions, then it is feasible to generate such
numbers as projections (rather than forecasts). This is the spirit in which even more detailed
occupational projections are made in the US by the Bureau of Labor Statistics (see Wilson,
2010, for more detailed discussion).
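The common growth factor approach can be illustrated with a short sketch. The unit group labels and employment figures below are hypothetical, and the function is a simplified reading of the fixed-shares assumption rather than the procedure actually used in Working Futures.

```python
def project_unit_groups(base_4digit, projected_2digit_total):
    """Apply one common growth factor to every 4-digit unit group.

    base_4digit: base-year employment by 4-digit unit group within one
                 2-digit category (hypothetical figures).
    projected_2digit_total: projected employment for that 2-digit category.
    Because every unit group grows at the same rate, each keeps its
    base-year share of the 2-digit total (the fixed-shares assumption).
    """
    base_total = sum(base_4digit.values())
    if base_total <= 0:
        raise ValueError("base-year employment must be positive")
    growth = projected_2digit_total / base_total
    return {code: level * growth for code, level in base_4digit.items()}


# Hypothetical example: two unit groups in one 2-digit category.
base = {"unit_a": 30_000, "unit_b": 10_000}
projection = project_unit_groups(base, 44_000)
```

In this example the 2-digit category grows by 10 per cent, so both unit groups grow by 10 per cent and their shares (75 and 25 per cent) are unchanged.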
The LFS enables reasonably robust estimates of the current shares of employment in SOC 2-
digit categories that are employed in the 4-digit unit groups they contain at the all industry
level. In principle, trends within the 4-digit occupation can also be considered and used to
develop more realistic projections although only a small number of historical observations are
available. In practice, fixed shares for 4-digit categories within the broader 2-digit categories
were applied. This was implemented in Phase 2b of the LMI for All project.6 The initial attempts
to produce 4-digit projections in Working Futures based on this assumption ran into problems
for some detailed categories such as chefs. This occupation has fairly positive growth
prospects but is part of a larger 2-digit occupational grouping for which employment was
projected to decline across all industries quite sharply. It was not possible to generate
plausible projections for chefs, which are heavily concentrated in the hotel and restaurant
sectors within that total. An amended set of 2-digit occupational projections across all
industries was therefore produced for LMI for All, which differed slightly from the original
published Working Futures estimates.
6 There is also scope for considering variations by industry (although sample sizes in the LFS would
preclude doing this at a more detailed level than the six broad sectors used in Working Futures).
Figure 2.2 Data overview – LMI for All
All the indicators are currently available in the database although the present version of the API does not provide access to all of these. Providing
extended access to these dimensions will require a rewrite of the whole endpoint and we recommend making all this available in an API v2,
whilst noting the need to allow access to the original API for those with currently running applications based on this version.
Notes: * Occupation (SOC2010 4-digit), Industry (SIC2007, 75 industries), Qualification (NQF 0-8), Geography (UK countries and English regions), Gender, Status (full-time or part-time employee and self-employed). ** Geography available for Output Areas, Lower and Middle Super Output Areas and the hierarchy of local government areas from wards to regions and nations # For 2000-2011 SOC2010 2-digit data is only available
Pay (estimates based on a combination of ASHE and LFS)
In the feasibility study (Bimrose et al., 2012) information on pay was extracted from the LFS.
UKCES were keen to make use of data on pay from ASHE as this is thought to be more reliable
(because information is provided by employers, rather than being the subject of individuals’ recall)
and because it is based on a larger sample. However, despite this, ASHE is still not able to deliver
robust information at a very detailed level (i.e. for individuals classified by a combination of
detailed industry, occupation and region). This is partly because of concerns about disclosure,
but also because the limited sample size means that estimates have a high degree of uncertainty.
This issue is exacerbated if information on variations in pay by age is also required. A further
problem is that ASHE does not have any information on pay by qualification.
In order to get around these problems, the LMI for All database is based on a set of
estimates/predictions of pay rather than the raw survey estimates and is based on a combination
of information from both ASHE and the LFS. Analysis of pay using earnings equations is a well-
established way of understanding the key factors that influence pay. In order to ensure that the
predicted pay figures match up with the published official data at a “headline” level, an algorithm
to constrain the data to match agreed ‘targets’ has been developed. This is analogous to the
procedure used to generate the detailed Working Futures employment data, described in the
previous section. This is now done for both part-time and full-time workers.
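The idea of constraining predictions to a headline target can be sketched as follows. The details of the algorithm are not given in this section, so the example assumes the simplest possible case: a single published mean, matched by scaling all predictions by one factor. The function name and all figures are hypothetical, and a procedure with several targets would need to adjust across dimensions in turn.

```python
def constrain_to_target(predicted_pay, employment, target_mean):
    """Scale predicted pay so its employment-weighted mean hits a target.

    predicted_pay: predicted mean pay by cell (hypothetical values).
    employment:    employment by cell, used as weights.
    target_mean:   published headline mean the predictions must match.
    Assumption: one target and one proportional scaling factor.
    """
    total_employment = sum(employment[cell] for cell in predicted_pay)
    implied_mean = sum(
        predicted_pay[cell] * employment[cell] for cell in predicted_pay
    ) / total_employment
    factor = target_mean / implied_mean
    return {cell: pay * factor for cell, pay in predicted_pay.items()}


# Hypothetical example with two cells.
pay = {"cell_a": 20_000.0, "cell_b": 40_000.0}
jobs = {"cell_a": 3_000, "cell_b": 1_000}
adjusted = constrain_to_target(pay, jobs, target_mean=26_000.0)
```

Scaling by a single factor preserves the relative pay differences between cells while bringing the weighted average into line with the published figure.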
Queries to the LMI for All database about Employment and Pay (and Hours) also check the
implied sample sizes to see if the estimates are likely to be unreliable. In the case of Pay (and
Hours) the API interrogates the part of the LMI for All database holding the employment numbers
to do the checks, as in (1) and (2) above, but then reports the corresponding Pay or Hours values
as appropriate. Again, full details are given in Annex A.
Finally, additional analysis has also been included to enable estimates of deciles, including median
pay levels, to be derived from the detailed estimates of mean pay. These estimates are based on
the assumption that pay is log-normally distributed rather than on the statistical properties of the original
sample data in ASHE or the LFS.
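Under a log-normal assumption, deciles follow from the mean once a value for the spread of log pay is fixed. The sketch below shows the arithmetic. The spread parameter sigma is an assumption introduced for illustration, as this section does not say how it is set in the database.

```python
import math
from statistics import NormalDist


def lognormal_deciles(mean_pay, sigma):
    """Derive deciles of pay from mean pay, assuming log-normality.

    If log(pay) ~ Normal(mu, sigma^2), then mean = exp(mu + sigma^2 / 2),
    so mu = log(mean) - sigma^2 / 2, and the p-th percentile is
    exp(mu + sigma * z_p), where z_p is the standard normal quantile.
    sigma (standard deviation of log pay) is an assumed input.
    """
    mu = math.log(mean_pay) - sigma ** 2 / 2
    standard_normal = NormalDist()
    return {
        p: math.exp(mu + sigma * standard_normal.inv_cdf(p / 100))
        for p in range(10, 100, 10)
    }


# Hypothetical example: mean pay of 30,000 with sigma = 0.5.
deciles = lognormal_deciles(30_000.0, 0.5)
median = deciles[50]
```

Because the log-normal distribution is skewed to the right, the median derived in this way always lies below the mean.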
Hours worked (ASHE)
As in the case of Pay, relevant information is available from the LFS or ASHE, but in both cases
very detailed data cannot be extracted because of concerns about disclosure, confidentiality or
statistical reliability. The ASHE data are regarded as the more reliable (for the same reasons as
Pay) and are therefore used here.
This problem has been addressed in a similar way to Pay, by producing predictions for Hours in
place of the raw survey data. In principle, a regression equation could be used to produce these
estimates although there is no direct equivalent to the well-established ‘earnings equation’. This
was explored in Phase 2b of the project. In practice, a non-parametric method has been used
based on the published data. The occupational patterns of weekly hours in the ASHE data set are
assumed to apply for all industries and constrained to the published ASHE hours for Industry and
Occupation. As for Pay, the API checks for reliability and where necessary, suppresses unreliable
data. Again, full details are given in Annex A.
2.2.2. Other Indicators
Occupational descriptions (ONS)
ONS have collated information on detailed job descriptions for SOC2010 4-digit categories. This
is very useful for supporting career transitions, because the description details methods of entry
into an occupation including the qualifications required and a list of tasks involved in the job. It is,
therefore, included in the LMI for All database. Detailed information is provided for each SOC2010
4-digit category.
ONS have prepared a detailed job description for each occupation distinguished in SOC2010.
These go to the 4-digit level. This textual information has been added to the LMI for All database.
The following three text boxes provide examples of the kind of information available for sub-major
group 11 (2-digit level), with information for two of its 4-digit level categories (1115 and
1116, referred to as unit groups here). Similar information is available for all of the 369 unit groups
(4-digit categories).
SUB-MAJOR GROUP 11
CORPORATE MANAGERS AND DIRECTORS
Job holders in this sub-major group formulate government policy; direct the operations of
major organisations, local government, government departments and special interest
organisations; organise and direct production, processing, maintenance and construction
operations in industry; formulate, implement and advise on specialist functional activities
within organisations; direct the operations of branches of financial institutions; organise and
co-ordinate the transportation of passengers, the storage and distribution of freight, and the
sale of goods; direct the operations of the emergency services, revenue and customs, the
prison service and the armed forces; and co-ordinate the provision of health and social
services.
MINOR GROUP 111
CHIEF EXECUTIVES AND SENIOR OFFICIALS
Jobholders in this minor group plan, organise and direct the operations of large companies
and organisations and of special interest organisations; direct government departments
and local authorities; and formulate national and local government policy.
Occupations in this minor group are classified into the following unit groups:
1115 CHIEF EXECUTIVES AND SENIOR OFFICIALS
1116 ELECTED OFFICERS AND REPRESENTATIVES
1115 CHIEF EXECUTIVES AND SENIOR OFFICIALS
This unit group includes those who head large enterprises and organisations. They plan, direct and co-ordinate, with directors and managers, the resources necessary for the various functions and specialist activities of these enterprises and organisations. The chief executives of hospitals will be classified in this unit group. Senior officials in national government direct the operations of government departments. Senior officials in local government participate in the implementation of local government policies and ensure that legal, statutory and other provisions concerning the running of a local authority are observed. Senior officials of special interest organisations ensure that legal, statutory and other regulations concerning the running of trade associations, employers’ associations, learned societies, trades unions, charitable organisations and similar bodies are observed. Chief executives and senior officials also act as representatives of the organisations concerned for the purposes of high level consultation and negotiation.
TYPICAL ENTRY ROUTES AND ASSOCIATED QUALIFICATIONS
Entry may be by appointment or internal promotion, as appropriate, and is usually based on relevant experience although candidates may also require academic qualifications for some posts.
TASKS
analyses economic, social, legal and other data, and plans, formulates and directs at strategic level the operation of a company or organisation;
consults with subordinates to formulate, implement and review company/organisation policy, authorises funding for policy implementation programmes and institutes reporting, auditing and control systems;
prepares, or arranges for the preparation of, reports, budgets, forecasts or other information;
plans and controls the allocation of resources and the selection of senior staff;
evaluates government/local authority departmental activities, discusses problems with government/local authority officials and administrators and formulates departmental policy;
negotiates and monitors contracted out services provided to the local authority by the private sector;
studies and acts upon any legislation that may affect the local authority;
stimulates public interest by providing publicity, giving lectures and interviews and organising appeals for a variety of causes;
directs or undertakes the preparation, publication and dissemination of reports and other information of interest to members and other interested parties.
RELATED JOB TITLES
Chief executive
Chief medical officer
Civil servant (grade 5 & above)
Vice President
O*NET Skills data
The feasibility study (Bimrose et al., 2012) suggested that the US O*NET database could be
exploited in the UK to provide useful information about the skills involved in carrying out different
jobs. The US database has been developed over many years and contains a very rich set of
information classified using the US equivalent to SOC2010. The feasibility study used some
mappings developed in an earlier study to link SOC2010 occupational categories to the US ones.
It showed that this could then be used to exploit information on STEM skills developed in the US
based around two particular areas entitled ‘Abilities’ and ‘Basic Skills’ in the O*NET database.
The present project has reassessed the mappings and also explored the other areas covered by
the O*NET system. This includes a much richer set of skills and related attributes. These add
considerable value from a careers guidance perspective and are therefore included in the full LMI
for All database.
1116 ELECTED OFFICERS AND REPRESENTATIVES
Elected representatives in national government formulate and ratify legislation and government policy, act as elected representatives in Parliament, European Parliament, Regional Parliaments or Assemblies, and as representatives of the government and its executive. Elected officers in local government act as representatives in the local authority and participate in the formulation, ratification and implementation of local government policies.
TYPICAL ENTRY ROUTES AND ASSOCIATED QUALIFICATIONS
Entry is by election.
TASKS
represents constituency within the legislature and advises and assists constituents on a variety of issues;
acts as a Party representative within the constituency;
participates in debates and votes on legislative and other matters;
holds positions on parliamentary or local government committees;
tables questions to ministers and introduces proposals for government action;
recommends or reviews potential policy or legislative change, and offers advice and opinions on current policy;
advises on the interpretation and implementation of policy decisions, acts and regulations;
studies and acts upon any legislation that may affect the local authority.
RELATED JOB TITLES
Councillor (local government)
Member of Parliament
The full set of US O*NET indicators now comprises:
Indicator: Description 7
Abilities: O*NET-SOC codes (occupations) and Ability scores – enduring attributes of the individual that influence performance (e.g. cognitive, physical, psychomotor and sensory)
Skills: O*NET-SOC codes (occupations) and Skill scores – developed capacities that facilitate learning or the more rapid acquisition of knowledge (e.g. basic, complex problem solving, resource management, social, systems and technical skills)
Interests: O*NET-SOC codes (occupations) and Interests scores – preferences for work environments and outcomes (e.g. realistic, investigative, artistic, social, enterprising and conventional)
Content Model Reference: Content Model elements and descriptions
Education, Training, and Experience Categories: Categories associated with the Education, Training, and Experience content area
Education, Training, and Experience: O*NET-SOC codes (occupations) and per cent frequency data associated with Education, Training and Experience
Job Zone Reference: Job Zone data in seven tab delimited fields
Job Zones: O*NET-SOC code (occupations) and its corresponding job zone number
Knowledge: O*NET-SOC codes (occupations) and Knowledge scores – organised set of principles and facts applying in general domains
Level Scale Anchors: Scale anchors associated with the four content areas
Occupation Data: O*NET-SOC codes (occupations), occupational titles and definition/description
Occupation Level Metadata: O*NET-SOC codes (occupations) and the associated Occupation Level Metadata
Scales Reference: Scale information by which the raw values are measured
Task Categories: Categories associated with the Task content area
Work Activities: O*NET-SOC codes (occupations) and the associated Content Model Work Activity data – general types of job behaviours occurring on multiple jobs (e.g. information input, interacting with others, mental processes and work output)
Work Context Categories: Categories associated with the Work Context content area – physical and social factors that influence the nature of work (e.g. interpersonal relationships, physical work conditions and structural job characteristics)
Work Context: O*NET-SOC codes (occupations) and Work Context scores
Work Styles: O*NET-SOC codes (occupations) and the associated Content Model Work Styles data – personal characteristics that can affect how well someone performs a job (e.g. achievement/effort, adaptability/flexibility, analytical thinking, attention to detail, concern for others, cooperation, dependability, independence, initiative, innovation, integrity, leadership, persistence, self-control, social orientation and social tolerance)
Work Values: O*NET-SOC codes (occupations) and the associated Content Model Work Values data – global aspects of work that are important to a person’s satisfaction (e.g. achievement, independence, recognition, relationships, support and working conditions)
Green Occupations: O*NET-SOC codes (occupations) and the associated Green Occupations data
Green Task Statements: O*NET-SOC codes (occupations) and the associated Green Task Statements data
For more detailed information on O*NET indicators and descriptors, see Annex B.
Unemployment (LFS)
The unemployment rate is an important indicator for supporting careers transitions. The
unemployment rate represents the probability of a worker of a given type, or living in a particular
location, being unemployed. The unemployment rate in an occupation is a key indicator, providing
information on the likelihood of securing employment. Various sources provide information on
unemployment by occupation including the Census of Population and the official series on
claimant unemployment made available on NOMIS. However, only one source offers the
possibility of developing a consistent time series on the unemployment rate by detailed occupation
classified using SOC2010.8 This is the LFS, which adopts the standard ILO definition of the
unemployment rate (those unemployed and actively searching for work expressed as a
percentage of the economically active workforce). The data available are only classified on a
SOC2010 basis from 2011 onwards, but data on the old SOC2000 basis are available for earlier
years. In principle, the unemployment rate can also be calculated by age, gender and occupation
for statistical regions from the LFS.
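The ILO definition quoted above reduces to a simple ratio. The sketch below shows the calculation for a single cell. The function name and the figures are hypothetical, and the economically active workforce is taken to be the sum of the employed and the unemployed.

```python
def unemployment_rate(unemployed, employed):
    """ILO unemployment rate as a percentage.

    unemployed: people without work who are actively searching for work.
    employed:   people in employment.
    The economically active workforce is the sum of the two groups.
    """
    economically_active = unemployed + employed
    if economically_active <= 0:
        raise ValueError("economically active workforce must be positive")
    return 100.0 * unemployed / economically_active


# Hypothetical cell: 50 unemployed and 950 employed people.
rate = unemployment_rate(50, 950)
```

People who are economically inactive (neither working nor actively seeking work) do not enter the calculation at all.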
While the LFS microdata can be used to calculate unemployment rates for SOC 4-digit
occupations, the sample sizes involved can be very small (resulting in problems of breaching
confidentiality and statistical reliability of estimates). Estimates of the unemployment rate have
therefore been generated, using the End User Licence version of the LFS microdata. In principle,
these allow detail up to the same level as shown for employment at the start of this section, but
in practice, there are many gaps in the data and the results for many categories are based on
sample sizes too small for the results to be reliable. The same rules of thumb are used to suppress
unreliable estimates as for Employment and Pay.
8 The official claimant series uses SOC2000 and hence cannot be used.
The Census of Population provides an alternative source for the unemployment rate which has
much greater geographical detail, but this is only available for March 2011 (the Census date) and
so is increasingly out of date and irrelevant as an indicator of the current state of the labour
market. It is not therefore used in the LMI for All portal.
Vacancies (UKCES ESS and Monster/DWP)
General considerations
The number of vacancies is another key indicator for supporting individuals in making better
decisions about learning and work. They provide a measure of the number of jobs potentially
available to job-seekers. Historically, the Department for Work and Pensions (DWP) and its
predecessors have generated a set of information on vacancies notified to Jobcentre Plus by
occupation that would ideally form part of the database (this source is discussed in the next
section below). This series was discontinued and has been replaced by information on raw
vacancies generated by DWP/Monster. Unfortunately these data are not coded using SOC, so no
occupational data coded to SOC2010 are currently available from this source.
The ONS Vacancy Survey provides a count of the total number of vacancies in the UK economy.
It provides information by sector but not by occupation. In principle, it could be used to provide
some indication of the general state of the job market. However, given that the main focus of the
LMI for All database is on supporting individuals make better decisions about learning and work
it was recommended NOT to include this source but to wait for the Monster/DWP data to be made
available on a SOC2010 basis.
ESS data on vacancies
At present there is only one statistical source for vacancies that can be used in the LMI for All
database to provide information classified to SOC2010. This is the Employer Skills Survey (ESS),
carried out once every two years since 2001, and now managed by UKCES.
The detailed UKCES Employer Skills Survey (ESS) collects information on skill deficiencies,
including vacancies. It is a sample survey covering some 90,000 establishments. The information
is normally published up to the 2-digit level of SOC2010, but the survey company have made
more detailed information available at a 4-digit level.
The survey is intended to produce estimates of the total number of vacancies, hard-to-fill
vacancies and skill shortage vacancies in the UK from this large sample of establishments. This
is achieved by multiplying the results of a survey by a weight derived from the ratio of the number
of establishments in the survey to the total number of establishments in the UK. The dataset
includes the weighted and unweighted number of establishments upon which each value in the
dataset is based. Vacancy counts from the survey have been multiplied by the survey’s
employment weight in order to provide an estimate of the total number of vacancies of this type
in the UK or region. The most detailed geographical breakdown available is to regions in England
and the other nations of the UK: Wales; Scotland; and Northern Ireland. The time period covered
by the two most recent surveys is 2011 and 2013. The ESS has been conducted on a similar
basis roughly every two years. Results from the 2013 survey are the first ESS to cover the entire
UK and the first to use the SOC2010 classification.
The survey does not cover all vacancies at this level of detail. Information is collected for up to
six occupations per establishment. Unfortunately, the survey does not collect data on the numbers
employed in each occupation. Therefore, the indicators that are possible to generate are limited
to the number of vacancies, hard-to-fill and skill shortage vacancies, plus the percentage of total
vacancies, which are hard-to-fill and skill shortage within each occupation.
The dataset can be queried on the occupation or industry code, and returns a set of the vacancies
for this occupation, and how many of those vacancies are hard to fill or have skills shortages.
The Employer Skills Survey is a sample survey. Because it is based on a sample of around 1 in
20 employers, data from the ESS is subject to statistical uncertainty, which increases as the
number of observations on which an estimate of vacancy numbers is based decreases. Estimates
based on an unweighted cell count of less than 50 should not be reported. The API therefore only
returns vacancy estimates based on 50 or more observations. This means that data is not
available for many smaller occupations (the effect of which is greatest for 4-digit occupations).
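The reporting rule and the indicators described above can be sketched as follows. The function and field names are hypothetical and do not reproduce the API's actual response format. Only the threshold of 50 unweighted observations and the list of indicators are taken from the text.

```python
MIN_UNWEIGHTED_COUNT = 50  # estimates based on fewer observations are withheld


def vacancy_indicators(unweighted_count, vacancies, hard_to_fill, skill_shortage):
    """Return ESS-style vacancy indicators, or None if suppressed.

    unweighted_count: number of survey observations behind the estimate.
    vacancies, hard_to_fill, skill_shortage: weighted vacancy estimates.
    Percentages are shares of total vacancies in the occupation.
    """
    if unweighted_count < MIN_UNWEIGHTED_COUNT:
        return None  # too few observations to report
    if vacancies > 0:
        pct_hard_to_fill = 100.0 * hard_to_fill / vacancies
        pct_skill_shortage = 100.0 * skill_shortage / vacancies
    else:
        pct_hard_to_fill = pct_skill_shortage = None
    return {
        "vacancies": vacancies,
        "hard_to_fill": hard_to_fill,
        "skill_shortage": skill_shortage,
        "pct_hard_to_fill": pct_hard_to_fill,
        "pct_skill_shortage": pct_skill_shortage,
    }
```

Note that the check is applied to the unweighted count of observations, not to the weighted vacancy estimate, so a large weighted figure can still be withheld if it rests on too few survey responses.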
Another limitation of this source for supporting individuals make better decisions about learning
and work is that it does not provide a picture of all jobs currently available – but a measure of the
number of vacancies employers had when the survey was conducted. The latest data relate to
2013. Nor is it comprehensive, focusing on up to six occupations in the sampled firms. However,
until an alternative source, such as the new series produced by DWP/Monster, can be linked in
to the database it provides the best indication of job availability. The ESS data complements the
official ONS count of vacancies by providing an indication of the matching of supply and demand
in particular occupations (showing occupations in which vacancies are hard to fill and subject to
skill shortages).
General Vacancies (Monster/DWP)
In principle, the data on vacancies collected by Monster on behalf of DWP provides a key dataset
for LMI for All. Detailed information on the number of jobs available classified by occupation is a
crucial element for supporting individuals make better decisions about learning and work. Such
information used to be available via DWP as Jobcentre Plus vacancies (see discussion in Annex
C.4).
The Monster contract with DWP includes a specification for LMI, which “needs to be displayed in
an intuitive and logical way so the general public can understand what is happening to the labour
market nationally, regionally and locally”. This includes use of SIC and SOC codes and
geography, though Universal Jobmatch (UJM) does not follow standard statistical definitions at
present. This lack of standardisation has been the subject of debate in the Labour Market
Statistics User Group. The lack of standardisation also applies to other dimensions such as
geography. Regional options in England that Universal Jobmatch offers to employers posting jobs
include ‘Anglia/Home Counties/Midlands/North West/London/South East & Southern/South
West/Tyne-Tees/Yorkshire’. These do not match statistical regions.
As noted above the data on vacancies collected by Monster on behalf of DWP replaced the former
series of vacancy by occupational information, which was based on vacancies notified to
Jobcentre Plus (a subset of unknown size of all vacancies in the economy). In practice, the data
currently available via the DWP/Monster website uses a system of classification based on job
titles that does not match any UK occupational standard. Without a mapping between the
categories used by DWP/Monster and SOC2010 4-digit categories used in the LMI for All
database, this information is therefore of limited value.
Consequently, the LMI for All Technical Team have implemented a “fuzzy matching” based on
reported job titles, which provides a feed of vacancy information from the DWP/Monster website.
This includes details of actual vacancies rather than any attempt to quantify the overall number
of vacancies or estimate a vacancy rate. The information has limited value, as it is not fully
integrated into the main database coded to SOC2010 4-digit occupational categories (although it
does allow the user to explore specific opportunities available in their local area). It is worth noting
that this is one of the most heavily used indicators within LMI for All, reflecting its perceived
importance to both developers and end-users.
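The matching method used by the Technical Team is not described in this section. As a rough illustration of what fuzzy matching of job titles involves, the sketch below uses the string similarity functions in Python's standard difflib module. The small lookup table is built from the related job titles shown in the text boxes above, and the cutoff value is an arbitrary assumption.

```python
import difflib

# Tiny illustrative index of job titles to SOC2010 unit groups,
# taken from the related job titles listed for 1115 and 1116.
SOC_TITLES = {
    "Chief executive": "1115",
    "Chief medical officer": "1115",
    "Councillor (local government)": "1116",
    "Member of Parliament": "1116",
}


def match_job_title(raw_title, cutoff=0.6):
    """Return the SOC2010 code of the closest known title, or None.

    cutoff is the minimum similarity (0 to 1) required for a match;
    0.6 is an assumed value, not one taken from the project.
    """
    lookup = {title.lower(): code for title, code in SOC_TITLES.items()}
    hits = difflib.get_close_matches(
        raw_title.lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else None
```

A production system would need a far larger title index and some handling of abbreviations and word order, which is the kind of problem a dedicated coding tool is designed to address.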
A meeting took place with representatives from Monster in October 2013 to discuss the
requirements for vacancy data for LMI for All database. Problems with using Monster data were
identified and explored, including mapping job titles to UK SOC. Further exploratory meetings and
correspondence took place between Monster representatives and IER (namely Professor Peter
Elias and Professor Rob Wilson). This focused on the adoption of the IER’s CASCOT9 software
package as a possible solution to the mapping problem.
In 2014 IER undertook a separate feasibility study for Monster to assess if it was possible to
recode the Monster/DWP data using a version of CASCOT. This led to a follow-on project and
annual licencing arrangements. In principle, this should allow Monster to make data available
recoded to SOC2010. At present such data are not in the public domain nor included in the LMI
9 CASCOT, Computer Aided System for Coding Occupational Titles, is a computer program designed to
make the coding of text information to standard classifications simpler, quicker and more reliable. The
software is capable of occupational coding and industrial coding to the UK standards developed by the UK
Office for National Statistics. For more information see: http://www.warwick.ac.uk/go/ier/software/cascot
considerable interest to labour market analysts. Annex C provides a comprehensive description
of the various data available from the Census, including the timetable for delivery of results
announced by ONS.
Many of these data are probably of more value to general labour market analysts than those
concerned specifically with supporting individuals make better decisions about learning and work.
Annex C sets out a long list of potentially interesting indicators including:
Labour market and employment data (employment, unemployment, economic activity);
Commuting and workplace data (distance travelled and mode of transport).
The key advantage of the Census is the provision of data for small geographical areas and the
information it provides on the distance workers have to travel to different types of job.
Its main disadvantage from the perspective of supporting individuals make better decisions about
learning and work is that it is not very timely (most results being published more than two years
after the Census is taken) and it refers to just a single point of time (27th March 2011). For further
details see Annex C.
Three sets of variables derived from the 2011 Census of Population have been included in the
LMI for All database. These add some detail to the picture of local employment patterns although
the data are of course increasingly out of date. The focus here is on geographical patterns rather
than detailed occupational structure.
The three data sets are as follows:
Occupational breakdown of residents in employment: This data set presents the number of people
aged 16-74 living in the area and in work during the week before the Census date who were
working in each SOC2010 sub-major group. The data is provided for all 232,297 Output Areas in
the UK. The Output Areas are referred to by their Office for National Statistics codes and by two
types of geographical code: the 1 metre Ordnance Survey grid reference of the geographical
centroid of the Output Area and the latitude and longitude of this point. These geographical
references can be used to calculate the number of workers in a given occupation within a given
distance of a location.
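The distance calculation mentioned above can be sketched using the latitude and longitude of each Output Area centroid. The record layout and the example figures are hypothetical. The great-circle (haversine) formula is one common way to measure distance between two points given in latitude and longitude, and is used here as an assumption, since the text does not specify a method.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius of the Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def workers_within(areas, lat, lon, radius_km, occupation):
    """Sum workers in an occupation across areas within radius_km.

    areas: list of dicts with 'lat', 'lon' (centroid) and 'counts',
           a mapping from occupation code to number of residents in work.
    """
    return sum(
        area["counts"].get(occupation, 0)
        for area in areas
        if haversine_km(lat, lon, area["lat"], area["lon"]) <= radius_km
    )


# Hypothetical Output Areas with counts for sub-major group 11.
areas = [
    {"lat": 52.00, "lon": -1.0, "counts": {"11": 10}},
    {"lat": 52.01, "lon": -1.0, "counts": {"11": 5}},
    {"lat": 53.00, "lon": -1.0, "counts": {"11": 7}},
]
nearby = workers_within(areas, 52.0, -1.0, radius_km=5, occupation="11")
```

Each area is counted in full if its centroid lies within the radius, so results for small radii are approximate near the boundary.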
Occupational breakdown of jobs in a location: This data set presents the number of people aged
16-74 working in one of the 53,579 workplace zones in England and Wales for each 3-digit
SOC2010 occupation in the week before the Census was taken. Workplace Zones are groupings
of Output Areas designed to preserve the confidentiality of employers. They are referred to by
their Office for National Statistics codes and by two types of geographical code: the 1 metre
Ordnance Survey grid reference of the geographical centroid of the Output Area and the latitude
and longitude of this point. These geographical references can be used to calculate the number
of jobs in a given occupation within a given distance of a location.
Mean distance travelled to work in a location: This data set presents the mean distance (in
kilometres) between home and work location for people in work within the week preceding the
Census date. Mean distances are calculated for persons aged 16 to 74, 16 to 24, 25 to 49 and
50 to 74 for all output areas. In England and Wales, mean distances are also calculated for men
and women aged 16-74 and for people aged 25 to 34 and 35 to 49. Data is provided for the
227,760 Output Areas in Great Britain. They are referred to by their Office for National Statistics
codes and by two types of geographical code: the 1 metre Ordnance Survey grid reference of the
geographical centroid of the Output Area and the latitude and longitude of this point.
First Destination of Graduates (HESA data)
HESA data provide a rich source of information on the pathway of individuals through Higher
Education and the first destinations of many graduates. In principle, this data set provides useful
information on the kinds of qualifications held by those entering different occupations by both the
subject/field of study and the level of qualification held.
Data are collected in the HESA graduate destination survey, which contains a SOC classification.
This allows mapping from courses studied to job destination. Currently much of this information
is only made available subject to a fee. Following detailed consultation and negotiation with the
data owners, led by the UKCES, detailed information has been made available for use in LMI for
All. The authors and UKCES acknowledge that these data are made available with the kind
permission of HESA.
The full set of HESA indicators now comprises:
Variable | Description | Details
ACYEAR | Academic year | 2011/2012 and 2012/2013
F_SOCDLHE2010 | Standard occupational classification | SOC2010, 4-digit level
F_LEVEL | Level of qualification obtained | DOC - Doctorate, MAS - Masters, OPG - Other Postgraduate, FID - First degree, OUG - Other undergraduate
F_QUALREQ | Qualification required for job | 11 - Yes: the qualification was a formal requirement, 12 - Yes: while the qualification was not a formal requirement it did give me an advantage, 13 - No: the qualification was not required, 14 - Don't know, Unk - Unknown
F_XJACS201NEW | Subject of study (2012/13) | Principal subject of study
F_XJACS201OLD | Subject of study (2011/12) | Principal subject of study
TOTAL | Number of cases | NB: this includes decimals, since there is an apportionment of courses split between different areas
In principle this data set helps to fill part of the gap between the course of study individuals
undertake and the jobs they end up in. Obviously, it only covers part of the picture. In particular
it is focussed just on those going through the higher education system. It is also restricted to the
jobs that higher education graduates go to soon after graduation (rather than their longer term
destinations). Nevertheless, it provides some useful information of interest to those wishing to
pursue particular careers or wanting to find out which course of study might best qualify
them for those careers. The data can be used to consider which occupations graduates with particular
qualifications typically end up in. It can also be used to work backwards from an occupation to the
types and levels of qualification typically held by those starting out in such jobs.
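Both directions of use can be sketched with a simple aggregation over the HESA variables listed earlier (F_XJACS201NEW, F_SOCDLHE2010, F_LEVEL and TOTAL). This is an illustrative sketch only: the rows below are made up, the subject labels are placeholders, and the helper name `shares` is an assumption rather than part of LMI for All.

```python
from collections import defaultdict

# Made-up rows for illustration; TOTAL may be fractional because courses
# split between subject areas are apportioned.
rows = [
    {"F_XJACS201NEW": "Computer science", "F_SOCDLHE2010": "2136", "F_LEVEL": "FID", "TOTAL": 120.5},
    {"F_XJACS201NEW": "Computer science", "F_SOCDLHE2010": "3131", "F_LEVEL": "FID", "TOTAL": 40.0},
    {"F_XJACS201NEW": "Mathematics", "F_SOCDLHE2010": "2136", "F_LEVEL": "MAS", "TOTAL": 30.5},
    {"F_XJACS201NEW": "Mathematics", "F_SOCDLHE2010": "2425", "F_LEVEL": "FID", "TOTAL": 60.0},
]


def shares(rows, group_key, group_value, target_key):
    """Share of TOTAL by target_key among rows where group_key equals group_value."""
    totals = defaultdict(float)
    for row in rows:
        if row[group_key] == group_value:
            totals[row[target_key]] += row["TOTAL"]
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {key: value / grand_total for key, value in totals.items()}


# Forward: which occupations do graduates of a given subject enter?
destinations = shares(rows, "F_XJACS201NEW", "Computer science", "F_SOCDLHE2010")

# Backward: which qualification levels are held by entrants to a given occupation?
levels = shares(rows, "F_SOCDLHE2010", "2136", "F_LEVEL")

print(destinations)
print(levels)
```

The same function serves both questions because each is a conditional distribution over the same table; only the conditioning variable changes.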
2.3. Data development summary
The data development strand of Phase 2B has identified, through expert knowledge and
stakeholder consultation, the key information used by individuals in making decisions about
learning and work, as well as that used by others supporting those decisions and transitions. This
has included data and information on: employment rates and forecasts; qualifications;
replacement demands; unemployment rates; pay; hours worked; vacancies and vacancy
estimates; occupational descriptions; graduate destinations; geographic location of work; travel
to work areas; plus occupational skills. These data have been processed and offered as part of
the LMI for All service ensuring that issues of quality and disclosiveness have been addressed.
Whilst course data were identified as important in learning and work decisions, no viable set of
data is currently available due to issues of mapping to SOC, comprehensiveness and/or quality
of data. Similar issues with vacancy information have been identified, but could not be resolved
within the timeframe of the project. A range of data from other sources were also examined and
discounted for a number of reasons. UK wide data have been included disaggregated by region
and devolved nation.
Data from a number of sources (namely LFS, ASHE, BRES, Census, Working Futures and ESS)
have been prepared and made available through the purpose built web portal and data Application
programming interface (API) as part of the LMI for All service. The LMI data generally covers the
The development of strong partnership arrangements across a range of different categories of
stakeholders and partners was regarded as essential to the success of the LMI for All project.
The following principles, specified by the UK Commission for Employment and Skills, set the
framework for this activity:
User focussed: Engagement with developers at every stage of development was crucial
to make changes to ensure the data tool is user-friendly.
A ‘work with’ approach: Working with partners and stakeholders to identify where links
could be made to add value to existing products/projects, as well as maximising benefits
of the relationship to the project was crucial.
Focus on the overall objective: The overall objective for LMI for All was to create a data
tool that developers would use to create products to support individuals in making better
decisions about learning and work. Initial work with partners should, therefore, not focus
solely on the development of the data tool, but also form the basis for raising demand and
publicising the completed data tool.
4.1. Testing the database API
Originally, the stakeholder engagement and communication for Phase 2B of the project was
designed to take place through two major sets of activities:
The first related to testing the detail and technical aspects of the data tool with
developers to ensure that the database is accessible and useful. This was undertaken by
organising a third iteration of Hack and Modding days, which mirrored the processes
undertaken for the same purpose in the Prototyping Phase and Phase 2A.
The second was to be through a series of events with stakeholders designed to increase
awareness of the data tool; gain feedback that can inform the final development of the
data tool; and explore the potential for other websites to draw on the data tool using the
API. A conference with not more than 100 participants was to be organised, presenting
work to date and focusing on the usefulness of the data and how it might be used by
stakeholder organisations. In addition, a series of three stakeholder engagement
workshops was planned, involving not more than 30 participants each. These would
present more focused opportunities to gain feedback to inform the final development of
the data tool and explore the potential for linking to the data tool from other websites with
targeted stakeholder groups (for example, career practitioners and their managers).
Whilst the Hack and Modding Days were retained, the second part of the stakeholder engagement
plan was amended by the UK Commission. Instead of a large conference, a number of small-
scale events for target audiences of potential were delivered. Different categories of partners were
identified, as follows:
Potential users: The main intended audience for the data tool is developers of apps and
websites. Since developers are unlikely to use the data spontaneously unless there is a
clear way to profit from it, this group also includes people who commission careers
websites.
Wider interest: This group includes a wide range of different individuals and
organisations that have an interest in the project because LMI is central to their role.
Technical experts: Individuals and organisations that have knowledge and expertise
about data, website development and/or IT development that can help us to better
understand some of the technical issues and the wider agenda around open data.
4.2. Testing the database API
Hack and Modding Days were organised along similar lines to those organised for the pilot phase
of the project and Phase 2A. The general aims of a hack day are to: solve problems; test new
data; test and launch new APIs; come up with new ideas or apps; or to highlight issues and areas
of improvement. The modding day follows a hack day. Its aim is to take forward the developments
of the hack day and to produce a more useable and defined product.
The LMI for All ‘hack day’ in Phase 2B was organised for 23 June 2014. The objectives were to:
Test, further, the functionality of the LMI for All API;
Develop apps that used the LMI for All API to demonstrate the potential; and
Present the apps developed during the day to key stakeholders working in careers to get
feedback for suitability and relevance for practice.
The corresponding ‘modding day’ was held on 10 September 2014 with the aim of taking the
winning application from the hack day through a process of further technical development,
towards becoming a marketable product. To ensure the application was useful, feedback from
the hack day was used by the developers in a further iteration of the application. The overall aim
of the hack and modding days was to produce a marketable development-application. More
detailed information about the selection of developers, the stakeholders who participated in these
days and the applications produced can be found in Appendix E.
Overall, the feedback from the careers stakeholders on all the applications was positive with many
praising the developers for their innovative use and visualisation of the data in the LMI for All
database. The applications raised some issues around the need to ensure that they were targeted
as different information and data would appeal to those of different ages and stages of their
career. Concerns were raised about individuals understanding and being able to recognise their
skills in order to start career exploration through an application or web interface. A career narrative
element to applications was proposed, whereby a user can explore career pathways. The different
approaches were seen to add value at different stages of careers learning and transitions through
the labour market for the end user.
Feedback from the developers was also positive. Suggestions centred on better
documentation of the data and improvements to the LMI for All website.
4.3. Stakeholder engagement and communications
Engaging with the wider stakeholder community (defined as careers organisations, developers,
schools, further education colleges, higher education institutions, recruitment agencies and
jobsites) has been a key element of the project to ensure that the LMI for All data tool could be
used by developers, support the work of careers professionals and career organisations, and
users/customers/clients. The first element, as detailed above, has been the testing of the LMI for All
data portal and API with developers. The second element of this project has been
dissemination and awareness raising activities with a broad range of stakeholders. This has also
been key to gaining feedback to inform the final development of the data tool and explore how it
can be used by careers organisations. The targeting of specific events to raise awareness of LMI
for All has provided focused opportunities to gain feedback from particular groups of users, as well
as the opportunity to explore the appetite to use the data and whether stakeholders see the value
in the data tool as well as the value in linking this to their own work. This was framed against an
assessment of client/customer LMI needs, such as the information needs of different groups, gaps
in information, influences on and the process of career decision making, and understanding of
LMI.
Various methods have been used to disseminate LMI for All to different stakeholder groups. Over
the past 15 months, 851 participants have attended these events to learn about this innovation.
Methods have included:
Presentations at conferences (e.g. CDI and IAEVG), n=4;
Invited presentations to targeted audiences (e.g. Universities UK), n=6;
Invited keynote presentations (e.g. National Symposium, Republic of Ireland), n=10;
Discussions (e.g. Education Services Australia, Association of Colleges; plotr), n=11;
Article in professional journal (Career Matters, CDI Professional Journal);
Hack and Modding Days (i.e. career stakeholders), n = 4.
Details of the stakeholder engagement and communications strategy, together with events and
numbers of participants are presented next.
4.3.1. Stakeholder dissemination and communication strategy
The strategy is set out by stakeholder group under four headings: Objective; What practical steps do we want them to take?; Contribution to KPIs; Dissemination activity.
Schools
Objective: Raise awareness among teachers of LMI for All as a source of intelligence to inform careers practice within schools
Practical steps: Want teachers involved in provision of careers support to access LMI for All via existing websites (iCould etc)
Contribution to KPIs: Increase in unique visitors to API; widen base of end-users
Objective: Schools to promote LMI for All as resource for pupils and their parents
Practical steps: Schools to implement widget on their websites; schools to refer pupils and parents to third-party websites and apps that make use of LMI for All; schools to develop their own apps, on individual basis or as part of consortium
Contribution to KPIs: Increase in number of apps using API; increase in unique visitors to API
Dissemination activity:
Development of a schools strategy paper led by Sir John Holman. Discussion with Sir John Holman (27/08/14), stressing the importance of disseminating to schools. He followed up on 29/08/14, with an undertaking to ‘give some thought’ to the challenge of developing a school strategy paper.
Presentations to ASCL (15/06/15).
Specialist Schools and Academies Trust (SSAT): meeting on 28/11/14 introduced LMI for All – SSAT workshop on 12/02/15 (n=50)
CEIAG Conference (David Andrews) – Keynote on 21/11/14 (n=45)
Inspiring Futures: Regional Directors’ Forum on 15/12/14 – keynote on LMI for All (n=38)
Dissemination and promotion of Careerometer and publicly available websites using LMI for All data
Education Services Australia – Skype discussion focused on LMI for All initiative. Discussions on-going.
Education and Employer Taskforce – seminar presentation on 28/11/14 (n=58)
FE colleges
Objective: Colleges to use LMI for All data to inform curriculum strategy and development
Practical steps: Colleges to access existing websites including RCU data store and Skills Match (forthcoming product from Mime Consulting)
Contribution to KPIs: Increase in unique visitors to API
Objective: Colleges to offer LMI for All data to students to inform learning/careers decisions
Practical steps: Colleges to install widget on their websites; colleges to refer students to third-party websites and apps that make use of LMI for All; colleges to develop their own apps, either individually or as part of consortium
Contribution to KPIs: Increase in unique visitors to API; increase in number of apps using API; widen base of end-users
Dissemination activity:
Presentations to Association of Employment and Learning Providers, Titan Partnership Ltd. on 29/01/15 (n=16)
Association of Colleges, contact with Regional Representative for the West Midlands
Further Education Learning Technology Action Group (workshop and stand) 22/10/14 (n=48)
Dissemination to JISC and City and Guilds
Involvement of Gloucester FE College in the Hack and Modding Days
Universities, HE institutes
Objective: Colleges to offer LMI for All data to students to inform learning/careers decisions
Practical steps: Colleges to install widget on their websites; colleges to develop their own apps, either individually or as part of consortium
Contribution to KPIs: Increase in number of apps using API; increase in unique visitors to API; widen base of end-users
Dissemination activity:
Presentations at AGCAS annual conference (opening address), AGCAS Heads of Service conference on 06/01/15 (n=39)
Dissemination to Universities UK – presentation on 19/02/15 (n=17)
Early development at University of Warwick with Student Services (05/03/15)
Open University – meeting with Head of Careers Service on 19/02/15
Contact with Republic of Ireland, AHECS executive
Recruitment agencies, jobsites
Objective: Jobsites to offer access to LMI for All data as an additional information resource to support customers in exploring careers options
Practical steps: Jobsites to install widget; jobsites to develop their own dedicated apps using LMI for All
Contribution to KPIs: Increase in number of apps using API; increase in unique visitors to API; widen base of end-users
Dissemination activity:
Presentation to The Recruitment & Employment Confederation research steering group on 12/12/13
Careers organisations
Objective: Raise awareness of LMI for All among careers professionals in order to encourage them to use data to inform their careers practice
Practical steps: Careers professionals to access websites that already offer LMI for All
Objective: Careers organisations to draw on LMI for All data as part of their wider IAG offer to clients
Practical steps: Careers organisations to develop their own dedicated apps using LMI for All; careers organisations to install widget on their websites or link to existing resources (e.g. iCould)
Contribution to KPIs: Increase in number of apps using API; increase in unique visitors to API
Dissemination activity:
Article published in ‘Career Matters’ October 2014
Presentations at international IAEVG (2013, n=25; 2014, n=27) conferences
DWP – on-going discussions about app development for both employer engagement teams and training and development work coaches
NCS West Midlands Education & Training Sectors 25/03/15 (n=30)
National Careers Guidance Show, 04/03/15 Opening presentation at Breakfast Reception (n=40)
Republic of Ireland, National Symposium 10/10/15 Keynote presentation (n=70)
CDI South East regional meeting (n=28)
Hack and Modding days (n=22)
Developers
Objective: Raise awareness of LMI for All among developers as a data resource that can be incorporated into their offerings to commercial customers
Practical steps: Use LMI for All as a source for their own app development; review examples of existing apps on LMI for All website; re-use existing code as part of their own development work; promote potential of LMI for All to clients
Contribution to KPIs: Increase in number of apps using API; increase in unique visitors to API; widen base of end-users
Dissemination activity:
Presentations x 2 and stand at Alt-C conference (1-3 September, 2014)
Hack days x 2
Modding days x 2
Plotr website – 2 meetings to explore potential
The team has also engaged with the media and social media to disseminate the LMI for All
web portal. An infographic was produced to illustrate the type of data that are available in the
database, which was used with the media by both IER and the University of Warwick. Twitter
(@WarwickIER and @CareersResearch) has been extensively used by the team to promote
activities and events, as well as report progress with the project. These have been retweeted
by UKCES who manage the LMIforAll twitter account. Engagement with social media has been
successful at promoting the project to a wider audience.
4.4. Future implications
The level of participation, and interest of participants attending dissemination events, has been
consistently high. This has been gratifying and demonstrates a real appetite for the product.
For instance, SSAT (The Schools Network) is a UK-based, independent educational
membership organisation working with primary, secondary, special, free schools, academies
and University Technical Colleges (UTCs). A session on LMI for All was delivered in workshop
format to SSAT membership on 12th February, 2015. SSAT organised and hosted the
workshop, circulating information about the content in advance and inviting expressions of
interest. Over 50 indications of interest were received. After a presentation about the web
portal and demonstrations of applications that could be developed, participants identified
priority target groups for the development of applications which could present customised
labour market information. These included: students from families experiencing
intergenerational unemployment; parents and carers; subject teachers; disengaged young
people (NEET: Not in Education, Employment or Training). The purposes of the applications
designed for these target groups would be to: inform, inspire, motivate and educate. Barriers
to integrating LMI for All in schools included: technology compatibility issues and language
(students for whom English is not the first language). Advantages of harnessing the potential
of the dataset were also identified. For example, access to high quality, reliable data about the
labour market and the potential for an application enhancing students’ e-portfolios.
A number of organisations have requested follow-up meetings, subsequent to initial
presentations, to explore the potential next steps within their organisation. Other organisations
and consortia have indicated an interest in, for example, implementing Careerometer,
exploring organisational requirements and capacity to use LMI for All, reviewing existing app
code and how it could be developed to meet organisational needs and, in one instance,
exploring whether organisational data could be added to the LMI for All database. These have
been progressed, where feasible. The success of the stakeholder and communications
strategy does, however, emphasise the importance of retaining the momentum of this activity,
to ensure that the uptake of LMI for All is fully integrated in organisational practices.
4.5. Stakeholder engagement and communication summary
The LMI for All service was thoroughly and successfully tested through two separate iterations
during the phases 2A and 2B. Hack and modding days were organised during the two phases,
which enabled developers to explore and test the service. During these events a number of
apps, widgets and websites aimed at individuals making learning and work decisions were
designed and developed. Careers stakeholders were able to judge the developments and inform
future iterations. Overall, the events proved that useful services could be developed using the
LMI for All data.
An extensive stakeholder and communications engagement strategy has been pursued
throughout, but with a particular emphasis during the final fifteen months of the pilot project,
to consult and raise awareness in the key target groups. These have comprised: the broad
community of careers and employment guidance practice; developers, technologists; further
education, higher education; and schools. A variety of methods were used, including: keynote
presentations at conferences; workshop presentations at conferences; exhibition stands;
article features in professional journals; discussions with stakeholder interest groups;
presentations to target audiences; and the use of social media. The UK Commission took the
lead on dissemination to the policy audience. A range of promotional materials were also
developed to support dissemination activities.
High levels of attendance at these events testify to the genuine interest in, and demand for,
the LMI for All product. However, there is a real danger that the impetus gained through this
strand of work will be lost quickly, should the potential user community lose confidence in the
longevity of the data portal, not least because investment decisions have to be made regarding
the potential use of the dataset for particular operational contexts. The UK Commission for
Employment and Skills has made a commitment to continue to support the portal into the
longer term, though this commitment currently has no formality or visibility in the public
domain.
5. Future issues and potential resolutions
5.1. Enhancing the database: potential and additional data sources
5.1.1. General considerations
There are many other data sources that could be exploited to enhance and extend the LMI for
All database. These are considered in this section. The discussion is deliberately succinct,
with more detailed information provided in Annex C.
As with a number of the sources discussed in the previous section, there are many technical
problems linked to the fact that these sources were not designed with the particular purpose
of providing data suitable for supporting individuals in making better decisions about learning and
work.
In the longer-term, it would be better if the predicted estimates used for the three key indicators
in the database, employment, pay and hours, could be replaced by “raw” or “real” survey data,
which could be updated automatically as they are published. This raises two questions:
If and when it will ever be possible to replace at least some of the predicted/estimated
values used for some indicators by “real” survey values; and
Checks on the reliability and robustness of some of the more detailed
predictions/estimates.
In principle, it is possible to use “real” survey values where these are statistically robust and
non-disclosive and to only use predicted values to fill in the many gaps. In practice, this would
pose many problems of consistency. There is no obvious methodology for merging “real” and
predicted values in a seamless fashion. This is likely to be a very demanding technical
exercise, which would require detailed consultation with ONS, with no guarantee of reaching
a successful and agreed outcome. This is probably too difficult and would raise too many new
problems to make it worthwhile pursuing. In general, the authors of this report are of the view
that we should use either:
Statistical/survey estimates (where reasonably reliable information is available and the
demand for detail is not that great); or
Econometric (or similar) estimates (where the survey estimates cannot provide the
level of detail required).
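The "in principle" rule discussed above (use a survey value where it is statistically robust and non-disclosive, and a predicted value otherwise) can be sketched as below. The thresholds, field names and the `Cell` structure are hypothetical illustrations, not ONS disclosure rules or the LMI for All implementation. The sketch also makes the consistency problem visible: neighbouring cells can come from different sources.

```python
from dataclasses import dataclass
from typing import Optional

MIN_SAMPLE = 30  # hypothetical minimum sample size for a usable survey cell
MAX_CV = 0.20    # hypothetical ceiling on the coefficient of variation


@dataclass
class Cell:
    survey_value: Optional[float]  # None where the survey provides no estimate
    sample_size: int
    cv: Optional[float]            # coefficient of variation of the survey estimate
    predicted_value: float         # econometric (or similar) estimate


def choose_value(cell):
    """Return (value, source), preferring a robust survey estimate over a prediction."""
    robust = (
        cell.survey_value is not None
        and cell.cv is not None
        and cell.sample_size >= MIN_SAMPLE
        and cell.cv <= MAX_CV
    )
    if robust:
        return cell.survey_value, "survey"
    return cell.predicted_value, "predicted"


# Made-up cells for illustration only.
cells = [
    Cell(survey_value=520.0, sample_size=250, cv=0.05, predicted_value=505.0),
    Cell(survey_value=610.0, sample_size=12, cv=0.35, predicted_value=480.0),
    Cell(survey_value=None, sample_size=0, cv=None, predicted_value=450.0),
]

for cell in cells:
    print(choose_value(cell))
```

In this toy data the first cell is taken from the survey and the other two from predictions, which is exactly the kind of mixed series the authors consider hard to present seamlessly.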
Not all data are classified in a manner suitable for inclusion in the database (the use of
SOC2010 for classifying occupations is especially important). Steps need to be taken to
ensure better harmonisation. This is partly about lobbying data providers to move to a common
standard as soon as it is practicable (recognising that this has cost implications and may take
time). This requires work with data owners to encourage them to improve access to their data
via APIs, with the ultimate aim of increasing automation and providing a more dynamic
resource for data users, increasing commitment to open data principles, while recognising the
practical barriers.
A number of other sources might add information that could be of value to a broader audience
than those concerned with the support of individuals making better decisions about learning
and work. Once the database is fully established, thought should be given as to how it might
be developed and enhanced to meet the needs of groups such as those concerned with local
economic development and other users.
From the perspective of supporting individuals in making better decisions about learning and work,
the two main areas that need to be enhanced in the short-medium term are:
Provision of more detailed data on vacancies, properly coded to SOC2010; and
Addition of more and better information on links between courses of study and job
outcomes i.e. understanding what types of course are relevant to particular
occupations and vice versa; and then providing access to information about specific
course opportunities.
From a more general perspective the database has potential for many other uses, including
local economic analysis and development. This calls for a much greater use of data sources
such as the Census of Population as well as other ONS data, some of which may be available
via NOMIS. Annex C provides a more detailed consideration.
The remainder of this section considers the main possibilities in a bit more detail.
5.1.2. Vacancy data
As noted above, in principle, the data on vacancies collected by Monster on behalf of DWP
provides a key dataset for LMI for All. However, there is a need for vacancy metrics classified
by SOC2010 in order to provide fuller integration with the rest of the LMI for All database.
It should be a priority to make these data available (or an equivalent dataset) for the LMI for
All database.
5.1.3. Course information
Course information is particularly important for learners, as it enables the identification of
learning opportunities that are relevant to a chosen career path. However, two issues have been
encountered in trying to locate and include course information and data in LMI for All. First,
it is difficult to map between occupation (defined by SOC) and subject classification. Second,
it is difficult to collate course information and data and classify it to the subject classification. Information
is variously available, sparse and provided in a range of formats. Consistency and quality are
a concern.
Data on courses and training available across the UK are not held in any one central database.
Discussions were held with various government departments and other relevant organisations
to negotiate access to the information repositories, which are accessed through various search
tools. From this it is evident that compiling a comprehensive list of further and higher education
training and courses is very complex, mainly due to the number and range of courses
available, as well as the variable quality of the data. Accessing the data is complex due to the
way it is recorded and coded, with different coding systems that have been developed and
evolved over time (e.g. JACS and XCRI; see footnotes 10 and 11). In order to include such detailed course data in the
LMI for All database, there would need to be comprehensive mapping of courses to
occupational codes.
Although a central database of course data is not available, various stakeholders compile and
use information from providers for various purposes. For university courses, these can be
found on the UCAS (University and Colleges Admissions Service) website. This covers the
whole of the UK. College-based provision is found on careers websites. Each of the four
constituent countries of the UK has a careers website and these sites have been investigated
for high quality course data.
In England, the Skills Funding Agency (SFA) maintains a Course Directory Provider
Portal, which comprises learning and course provision data. Problems with the
quality of course data continue to be an issue, but quality is improving. The provider portal
enables learning providers to view and update their course directory information. For
learners, the Course Directory can be accessed on the National Careers Service
website at
https://nationalcareersservice.direct.gov.uk/advice/courses/Pages/default.aspx. The
SFA disclosed that there have been problems with the quality of course data collected
in the past, but this has greatly improved. Discussions have also progressed with the
Student Information Services Limited, a charity that runs the ‘best course for me’
website (http://www.bestcourse4me.com). This website provides information on
university courses and possible career paths. Mapping of course codes to SOC has
been undertaken and a range of APIs are available. During the discussions, the
complex nature of coding and mapping was highlighted.
In Scotland, information about learning opportunities and careers is
collected and collated by a specialist service called Gateway Shared Services
(http://www.ceg.org.uk). This organisation collects and collates information about
learning opportunities and careers throughout Scotland to produce a range of online
services. It covers both further and higher education data, which are updated on an
annual basis. This information is currently available through a range of online services
(such as MappIT, MerIT, PlanIT Plus, WorkIT) and reference books. Course data are
not freely available.
In Wales, the Welsh Government and Careers Wales collect and update course
information and vacancy data for Wales. Agreement was secured, in principle, that
access to these data could be provided through an API, but this has yet to be followed up.
In Northern Ireland, there is no central database of course information. The Northern
Ireland Course Directory (also known as NI Learning Opportunities Database) was
developed and maintained by DCA Data Solutions, but is no longer available and there
are no plans to update or maintain this directory. NI Careers recently confirmed that
their advisers and clients currently access information about learning provision through
10 JACS (Joint Academic Coding of Subjects) is the subject classification system used to describe the subject content of courses at UK Higher Education institutions. JACS3 is used from 2012/13.
11 eXchanging Course Related Information, or XCRI, is the UK standard for describing course
In the medium to longer term, it is likely that course data will need to be carefully mapped and
expectations managed as data will not be automatically updated on an annual basis. Manual
input will be required unless it is possible to access external APIs to dynamically update data.
Overall, accessing course data will be complex because of the way it is recorded and coded, with
different coding systems that have been developed and have evolved over time (i.e. JACS12,
XCRI13). A basis for mapping higher education course subjects (JACS) to occupations using
the HESA data is available. However, there is a gap in mapping courses to occupations in the
further education sector. Ideally, a common classification would be used, particularly as
JACS does not take account of subjects that are relevant to lower skilled occupations. In order
to include course data in the LMI for All database, a comprehensive mapping of courses to
occupational codes would be needed and commissioning such work should be considered.
To explore possibilities for mapping XCRI to JACS, and JACS to SOC, discussions are
underway with those who led on the Salami project in Nottingham as they have been
developing a method of coding SOC to JACS through a thesaurus, which they may be willing
to share. This mapping would enable data from the XCRI API feed of higher and further
education courses to be included in the API. However, this would not be a complete directory
of further education courses.
It will be necessary to follow up discussions with Student Information Services Limited to
explore the API.
It seems unlikely that the disparate national sources of course data can be pulled together to
create a complete dataset.
5.1.4. Census of Population data
The Census of Population provides very geographically detailed information on the location of
employment and the characteristics of workers in 2011. The LMI4All database includes a
number of variables from the Census, which can be used as the basis of indicators, which
detail the spatial pattern of labour demand and the geographical distribution of workers.
Future developments that might enhance the database could be focused more on a local
economic development perspective rather than the careers support angle.
From a local economic development perspective, the main value of the Census data is to
provide a detailed geographical breakdown of the availability of workers of different skill levels.
The sort of variables which could be derived include:
Number of workers at a given skill level (defined in terms of SOC major groups) within
certain distance bands of a location of interest;
12 JACS (Joint Academic Coding of Subjects) is the subject classification system used to describe the
subject content of courses at UK Higher Education institutions. JACS3 is used from 2012/13.
13 eXchanging Course Related Information, or XCRI, is the UK standard for describing course
information developed for higher and further education.
The percentage of workers at various skill levels within a locality being considered for
industrial development;
Identification of areas in which employment of particular occupations is concentrated.
Further details of what is available can be found in Annex C.
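The first of the derived variables listed above (workers at a given skill level within distance bands of a location) could be computed along the following lines. This is a minimal sketch only: the area records, their coordinates, the worker counts and the band limits are all hypothetical, and the use of great-circle (haversine) distance between area centroids is an assumption rather than the method used by the project.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def workers_by_distance_band(site, areas, band_limits_km, soc_major_group):
    """Count workers in one SOC major group, assigning each area to the
    first (smallest) distance band that contains it. Areas beyond the
    largest band are ignored."""
    limits = sorted(band_limits_km)
    counts = {limit: 0 for limit in limits}
    for area in areas:
        distance = haversine_km(site[0], site[1], area["lat"], area["lon"])
        workers = area["workers_by_soc_major"].get(soc_major_group, 0)
        for limit in limits:
            if distance <= limit:
                counts[limit] += workers
                break
    return counts


# Hypothetical example data: three areas north of a site of interest.
site = (52.0, -1.5)
areas = [
    {"lat": 52.0, "lon": -1.5, "workers_by_soc_major": {2: 100}},
    {"lat": 52.1, "lon": -1.5, "workers_by_soc_major": {2: 250}},
    {"lat": 53.0, "lon": -1.5, "workers_by_soc_major": {2: 400}},
]
bands = workers_by_distance_band(site, areas, [5, 20, 50], soc_major_group=2)
```

The bands here are non-cumulative (0 to 5 km, 5 to 20 km, 20 to 50 km); a cumulative version would simply omit the `break`.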
5.1.5. European data – the Cedefop database and EU Skills Panorama
Cedefop projections and related data
Over the past 10 years, IER, in collaboration with others, have developed an historical
employment database and projections at a pan European level on behalf of Cedefop. This
replicates many of the same features of the Working Futures employment database. In
principle, the data can be used to generate employment information, including replacement
demands, for each of the 27 EU Member States plus a few additional countries such as
Norway and Switzerland.
In practice, there are a few issues:
The data are currently classified using ISCO 88, which is not directly comparable with
SOC2010 – however, a broad brush mapping can be derived (see below).
The new data to be published in 2014/15 will use ISCO08. This is broadly compatible
with SOC2010. IER and ONS have been working on developing mappings.
The current Cedefop projections are primarily focused on the 2-digit level.
Development of information at a more detailed level is being explored, but data
limitations are problematic. Information at a 4-digit level is unlikely to be available in
the foreseeable future.
On balance, it would be useful to add such information to the database in order to provide a
broad perspective on job opportunities across Europe but it would not be a top priority for LMI
for All, given the lack of occupational detail and the difficulties in making a simple mapping of
occupational categories.
The European data is also being expanded by Cedefop to populate the EU Skills Panorama.
The latter is a new website/portal aiming to provide a comprehensive one stop shop for LMI
at a pan-European level. This is still under development by Cedefop. The current version can
be found at: http://euskillspanorama.cedefop.europa.eu/
Other European sources
A range of other European sources has also been considered for inclusion in LMI for All.
These include the European LFS as well as other regular European surveys (such as the
Eurobarometer surveys, the European Values Survey, European Social Survey and the
European Working Conditions Survey). These can also provide useful contextual information
on issues such as attitudes towards labour migrants in different countries, working conditions,
etc. They are briefly summarised and discussed in Annex C.
In practice, although they all contain some interesting and useful data they are generally not
suitable for inclusion in the LMI for All database because the sample sizes are inadequate to
provide reliable data at a detailed and consistent level by occupation. The information they
provide is also generally not particularly relevant for careers guidance and advice. They would
have more value if the database were to be extended to cover the needs of other users such
as more general labour market analysts.
5.1.6. Stakeholder impact and future viability
A high level of interest has been generated in the product through different dissemination
activities, delivered as part of the stakeholder and communications strand (see section 4.3,
above). Stakeholder activities were designed to target representative bodies to ensure the
effective use of resources and target large numbers of stakeholders. Technical skills and
resourcing have been particular issues arising from the dissemination activities. Developing
innovative ideas on what is useful and could be developed from the LMI for All service has
been unproblematic for stakeholders. The LMI for All service is seen as having the potential
to make significant impact in helping individuals make learning and career decisions.
A frequently asked question at events has related to the future prospects for the data portal.
Investment decisions in app development by different stakeholder organisations clearly hinge,
in many cases, on evidence that the future of the portal is secure. It was not possible to convey
this level of assurance during the final months of phase 2B. At the time, there was concern
that the momentum gained through the intensive stakeholder activities was at risk of being
lost due to the project’s uncertainty. In the final part of phase 2B, the UKCES commissioners
approved the continuation of the LMI for All service for an indefinite period. Throughout the
project, a recurring concern has been the uncertainty about future viability, which could not
be resolved or communicated until the later stages of the project. Organisations were,
understandably, hesitant about using resources to develop an app containing data that might
not be updated.
5.2. Future implications for costing
5.2.1. General considerations
As discussed in the previous sections there are many technical problems linked to the fact
that these sources were not designed with the particular purpose of providing data suitable for
supporting individuals to make better decisions about learning and work. For this reason,
contingency planning for the time required to deal with issues related to each dataset has been
included in the estimates below, based on experience to date. Additionally, the costs of
maintaining and updating the database from a data perspective are therefore much more
significant than if it were possible simply to tap into a relevant API for each of the main data
sources involved.
The LMI for All project has demonstrated that adequate data are available to populate a rich
database. However, this will require regular processing to keep the database up to date. Steps
will need to be taken to maintain this process. This will involve developing a smooth workflow
around processing the various core datasets (making the sources and procedures as efficient
and transparent as possible so that updating the database is automated as much as it can
be).
As noted in the previous section, many sources considered are based on samples too small
to provide useful information at the level of detail desired. Increases in sample sizes could
help to make the data more useful. However, this would imply very significant costs and such
developments are unlikely to happen quickly. In the meantime it is important to make the most
of what is currently available.
5.2.2. Employment
There is a need to update Working Futures:
Employment (historical time series 2000-12);
Projected employment (2012-22);
Future job openings (replacement needs).
Because the data available directly from the official sources are not sufficiently detailed to
provide data for 4-digit occupations cross-classified by other dimensions of interest, it is
necessary to generate estimates using econometric and other methods. This has been
characterised as the Working Futures employment database.
Updating of the employment estimates therefore requires that the full Working Futures
database is updated. This is a major project that has typically been let by competitive tender
once every 3 years or so. The budget required depends on the precise specification set out in
the tender, but is likely to be well into 6 figures (i.e. £100-200K).
This excludes any time required by the Data Team to process the Working Futures data and
by the Technical Team to upload the processed data to the LMI for All portal. Assuming the
specification for any update to Working Futures builds in a requirement to produce data
compatible with LMI for All, this should be quite modest.
Based on the experience in LMI for All Phase 2, this is expected to involve around:
1 day of senior research time (SRT) to manage and supervise the process;
2 days of research support time (RST) to process the results and upload them for the
Technical Team;
4 days of Technical Team time (TTT) to upload the new data for testing;
3 days of Technical Team time (TTT) to move data from testing to production;
2 days of Technical Team time (TTT) to adapt the API, as necessary, for the dataset;
3 days of Technical Team time (TTT) for contingencies – responding to unanticipated
challenges with the data.
5.2.3. Pay and Hours
There is a need to update the econometric and related analysis of LFS and ASHE data, which realistically
could be undertaken on an annual basis:
Mean Weekly Pay;
Medians and deciles;
Estimates by age;
Annual changes in pay;
Weekly Hours.
Because the data available directly from the LFS and ASHE are not sufficiently detailed to
provide data for 4-digit occupations cross-classified by other dimensions of interest, it is
necessary to generate estimates using econometric and other related analysis.
Based on the experience in LMI for All Phase 2, this is expected to involve around:
3 days of senior research time (SRT) to manage and supervise the process;
20 days of research time (RT) to manage and supervise the process and to conduct
the relevant econometric analysis, (some of which needs to be carried out in the
Secure Data System run by ONS);
20 days of research support time (RST) to process the results including updating the
RAS processes and generate the new estimates;
4 days of Technical Team time (TTT) to upload the new data for testing;
3 days of Technical Team time (TTT) to move data from testing to production;
2 days of Technical Team time (TTT) to adapt the API, as necessary, for the dataset;
3 days of Technical Team time (TTT) for contingencies – responding to unanticipated
challenges with the data.
5.2.4. Occupational descriptions and skills
ONS descriptions:
Nothing to be done until SOC is revised (No date for this process has been published
by the ONS, but it is expected to be 2020).
O*NET Skills required (based on US O*NET skills information):
Redo any mapping to US occupations
Identify, collate and make available relevant data files on skills
Based on the experience in LMI for All Phase 2, this is likely to involve around:
2 days of senior research time (SRT) to manage and supervise the process;
20 days of research support time (RST) to update any mapping using CASCOT and
to process the results and upload them for the Technical Team;
4 days of Technical Team time (TTT) to upload the new data for testing;
3 days of Technical Team time (TTT) to move data from testing to production;
2 days of Technical Team time (TTT) to adapt the API, as necessary, for the dataset;
3 days of Technical Team time (TTT) for contingencies – responding to unanticipated
challenges with the data.
5.2.5. Unemployment and Vacancies
LFS analysis of unemployment rates
This involves interrogating the LFS and extracting the relevant data on Unemployment
rates.
Based on the experience in LMI for All Phase 2, this is estimated to involve around:
1 day of senior research time (SRT) to manage and supervise the process;
2 days of research support time (RST) to process the results and update the Wiki;
4 days of Technical Team time (TTT) to upload the new data for testing;
3 days of Technical Team time (TTT) to move data from testing to production;
2 days of Technical Team time (TTT) to adapt the API, as necessary, for the dataset;
3 days of Technical Team time (TTT) for contingencies – responding to unanticipated
challenges with the data.
ESS Vacancies
This involves obtaining the relevant vacancies data from the survey company
responsible for conducting ESS and processing the data for use in the database.
Based on the experience in LMI for All Phase 2, this is estimated to involve around:
3 days of research time to manage and supervise the process;
1 day of research support time to process the results and update the Wiki;
4 days of Technical Team time (TTT) to upload the new data for testing;
3 days of Technical Team time (TTT) to move data from testing to production;
2 days of Technical Team time (TTT) to adapt the API, as necessary, for the dataset;
3 days of Technical Team time (TTT) for contingencies – responding to unanticipated
challenges with the data.
DWP vacancy data (classified by standard occupations and regions)
Assuming DWP/Monster make the data available via an API, this should be a relatively
straightforward task. However, as noted, this does not seem likely without a major new
investment by DWP or some other organisation. Unless this happens, this will not progress.
Currently the Technical Team have implemented a temporary solution based on the data
Monster have made available and fuzzy matching.
Based on the experience in LMI for All Phase 2 for other similar data sets, this is estimated to
involve around:
2 days of research time to manage and supervise the process;
2 days of research support time to process the results and update the Wiki;
4 days of Technical Team time (TTT) to upload the new data for testing;
3 days of Technical Team time (TTT) to move data from testing to production;
2 days of Technical Team time (TTT) to adapt the API, as necessary, for the dataset;
3 days of Technical Team time (TTT) for contingencies – responding to unanticipated
challenges with the data.
5.2.6. Other indicators
Census data
There is nothing to be updated unless new indicators are included. There are some possible
new indicators to be added (see Annex C). If these are to be added then the marginal costs
will depend on precisely what is included.
Assuming a single indicator, based on the experience in LMI for All Phase 2, this is estimated
to involve around:
5 days of research time to manage and supervise the process;
1 day of research support time to process the results and update the Wiki;
4 days of Technical Team time (TTT) to upload the new data for testing;
3 days of Technical Team time (TTT) to move data from testing to production;
2 days of Technical Team time (TTT) to adapt the API, as necessary, for the dataset;
3 days of Technical Team time (TTT) for contingencies – responding to unanticipated
challenges with the data.
HESA course data
First destinations of graduates – the main tasks are:
Obtain updated information from HESA;
Update documentation;
Add to database.
Based on the experience in LMI for All Phase 2, this is likely to involve around:
1 day of research time to manage and supervise the process;
2 days of research support time to process the results and update the Wiki;
4 days of Technical Team time (TTT) to upload the new data for testing;
3 days of Technical Team time (TTT) to move data from testing to production;
2 days of Technical Team time (TTT) to adapt the API, as necessary, for the dataset;
3 days of Technical Team time (TTT) for contingencies – responding to unanticipated
challenges with the data.
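The itemised day estimates in sections 5.2.2 to 5.2.6 can be tallied as a simple check on the overall resource requirement. The sketch below only sums the lists as written above. It assumes that 'research time' quoted without an abbreviation corresponds to RT, and it does not attempt to reproduce the figures in Table 5.2, which may have been compiled on a different basis.

```python
from collections import Counter

# Technical Team time is itemised identically for every dataset:
# upload for testing, move to production, adapt the API, contingencies.
TTT_PER_DATASET = 4 + 3 + 2 + 3

# Research effort in days per dataset, copied from the itemised lists.
# Assumption: unabbreviated "research time" is treated as RT.
EFFORT_DAYS = {
    "Employment (processing Working Futures output)": {"SRT": 1, "RST": 2},
    "Pay and Hours": {"SRT": 3, "RT": 20, "RST": 20},
    "Occupational descriptions and skills": {"SRT": 2, "RST": 20},
    "LFS unemployment rates": {"SRT": 1, "RST": 2},
    "ESS vacancies": {"RT": 3, "RST": 1},
    "DWP vacancy data": {"RT": 2, "RST": 2},
    "Census (single new indicator)": {"RT": 5, "RST": 1},
    "HESA first destinations": {"RT": 1, "RST": 2},
}

totals = Counter()
for days in EFFORT_DAYS.values():
    totals.update(days)               # add SRT / RT / RST days for this dataset
    totals["TTT"] += TTT_PER_DATASET  # same technical effort for each dataset

for role in ("SRT", "RT", "RST", "TTT"):
    print(role, totals[role])
```

Keeping the per-dataset figures in one structure makes it straightforward to re-cost an update cycle if a dataset is dropped or a new indicator is added.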
5.2.7. Technical improvements indicated
Because of the complexity of data, prebuilt packages for loading could not be used for most
datasets. Almost every data update in the final iteration of the project either had its format changed
or was completely new. Consequently, it will be necessary to build and deploy permanent
SSIS packages, with the following technical improvements required:
Database: clean-up and optimization is necessary, with maybe slight architecture
changes (depending on patterns of usage emerging).
Cubes: whilst SSAS is up and running, dimensions/attributes were selected in the
absence of usage data. Changes will have to be applied as patterns become clearer.
For an automatic import solution to be implemented, a format-checking app would be needed so that
a checked file can be loaded directly into the database. A mandatory option for each import could be
whether it is an update or a complete reload. For this, some means of syncing DEVDATA would be
necessary, after checks against PRODATA.
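A format-checking step of the kind described could look something like the sketch below. It is illustrative only: the expected column names, the validation rules and the function names are hypothetical, and the actual loading (for example through SSIS packages) is outside its scope. It shows the two ideas in the paragraph above: reject a file whose layout has changed before it reaches the database, and require the caller to state whether the import is an update or a complete reload.

```python
import csv
import io

# Hypothetical layout for an incoming data file.
EXPECTED_COLUMNS = ["soc", "region", "year", "value"]
IMPORT_MODES = ("update", "reload")


def check_format(text, expected=EXPECTED_COLUMNS):
    """Return a list of human-readable problems found in the CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != expected:
        return ["unexpected header: %r" % (reader.fieldnames,)]
    errors = []
    for line_no, row in enumerate(reader, start=2):
        soc = row["soc"] or ""
        if not (soc.isdigit() and len(soc) == 4):
            errors.append("line %d: SOC code must be 4 digits" % line_no)
        try:
            float(row["value"])
        except (TypeError, ValueError):
            errors.append("line %d: value is not numeric" % line_no)
    return errors


def plan_import(text, mode):
    """Decide whether a file may be loaded; the import mode is mandatory."""
    if mode not in IMPORT_MODES:
        raise ValueError("mode must be 'update' or 'reload'")
    errors = check_format(text)
    return {"load": not errors, "mode": mode, "errors": errors}


good_file = "soc,region,year,value\n1121,London,2013,12.5\n"
bad_file = "occupation,region,year,value\n1121,London,2013,12.5\n"
good_plan = plan_import(good_file, "update")
bad_plan = plan_import(bad_file, "reload")
```

Only files for which `plan_import` returns `load` as true would be passed on to the loading stage; the error list can be returned to whoever supplied the file.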
One issue that has emerged in discussions with employment advisers is the desire for more
local and geocoded data. It is possible to 'mash' data from LMI for All with NOMIS local data.
However, there remains the problem that as data becomes more local, the sample size
becomes smaller and, thus, the analysis we can offer becomes less fine-grained in terms of
occupational and other categorisations. One potential answer may be 'crowdsourcing' to all
employment and careers advisers and other end users to add local 'intelligence' to the LMI
provided at national and regional levels. There is nothing to stop application developers adding
these features themselves. But there may be benefit in including such intelligence within the
national database for scaling purposes. A further possibility would be to develop scrapers to
collect data from, for example, local newspaper websites to add to the 'official' LMI. These
options would require a significant level of development and resourcing whilst highlighting
issues of data quality and up-to-datedness.
5.2.8. Stakeholder dissemination and communications
A number of activities have been initiated during Phase 2B of the project that require follow
up. The nature of the follow up would need to be discussed and negotiated with the UK
Commission. However, stakeholder groups in the schools, higher education and careers
guidance community have demonstrated real commitment to taking this initiative forward in
their own organisational contexts. It is clear that lessons need to be learned regarding the
level of support necessary to enable these stakeholders to grasp the necessary steps needed
to embed practice that integrates the full potential of LMI for All.
Table 5.2 Summary of updating Data Costs14
Data source Indicators in LMI for All database Variables Updating costs (resources required, days of SRT, RT, RST & TTT)
SRT RT RST TTT
Working Futures (combination of LFS and BRES)
Total number of jobs by detailed type (historical time series)
Where possible all data available at SOC2010 4-digit occupations. Also covers: Industry; region; gender; employment status; and highest qualification held.
Census of Population 2011 Location of jobs, workers by occupation, jobs by industry, travel-to-work distances (per new indicator)
5 0 1 16
HESA Graduates first destinations 2 0 5 16
Total All indicators 24 35 76 128
14 Excludes update of the Working Futures database (£100-200K) and development of a replacement for the DWP JCP vacancy series
Annex A: Core data sources included in LMI for All
A.1 Introduction
LMI for All aims to provide detailed data on a range of key labour market indicators to those
interested in careers prospects and progression (Bimrose, 2012). These include Employment,
Pay and Hours, plus a range of other labour market information.
The original design was to access various official datasets directly. However, concerns about
breaching confidentiality and releasing disclosive data into the public domain severely limit the
level of detail that can be published. Therefore, an alternative approach has been proposed
for a number of the core indicators. This uses the official data to generate the detailed
information required, but does not release the original survey data into the public domain
(Bimrose and Wilson, 2013). Further, more technical, details are contained in Li and Wilson
(2015).
The remainder of this Annex is structured as follows:
The remainder of this section sets out the rationale for the general approach and
describes the information placed into the public domain.
Section A.2 summarises the case for making detailed data on Employment, pay and
Hours available as part of the LMI for All database.
Sections A.3 and A.4 then set out in general terms how this has been accomplished,
while at the same time ensuring this is non-disclosive (and not in breach of
confidentiality restrictions recommended by ONS). Section A.3 deals with Pay and
weekly Hours worked and Section A.4 with employment.
Section A.5 goes on to discuss some longer-term issues, including how official survey
estimates might be improved to replace the predicted figures for the key indicators
(employment, Pay and Hours).
Section A.6 describes the Checking Algorithm used to avoid publishing unreliable
estimates of Pay and Hours.
Section A.7 provides technical details of the regression analysis undertaken for pay
predictions.
Section A.8 provides technical details of the algorithms used to ensure that the
predicted estimates for employment, Pay and Hours are consistent with the official
published data.
Section A.9 concludes by providing a summary of the main data on employment, pay
and hours provided in the LMI for All database.
A.2 The case for detailed data in the LMI for All database
The LMI for All database requires detailed data if it is to be useful for careers guidance and
advice. Individuals and their advisers have a personal and professional interest in knowing
which jobs are available, distinguishing sector, occupation and typical qualifications required,
as well as the typical pay associated with those jobs.
Ideally, the full set of detail required is as follows:
Occupation (up to the 4-digit level of SOC2010, 369 Occupational categories);15
Sector (up to the 2-digit level of SIC2007, about 80 industry categories);
Geographical area (12 English regions and constituent countries of the UK);16,17
Gender and employment status (full-time, part-time employees and self-employed).
The main official data sources for such data are:
the Business Register and Employment Survey (BRES);
the Labour Force Survey (LFS); and
the Annual Survey of Hours and Earnings (ASHE).
These sources collect data on individual organisations and individual people, but such detail
cannot be published because of concerns about disclosure and confidentiality.
It is important to emphasise that the specific individual observations on Pay or employment
from these official surveys are not necessarily required. What is needed is general information
on ‘typical’ pay or general employment opportunities in particular areas for people with
selected characteristics. The official data are a means to this end rather than being required
for their own sake.
The level of detail required in the LMI for All database can be obtained by replacing the official
‘raw’ data by estimates or predictions.
For pay – these are based on an earnings function approach;
For employment – the Working Futures employment database has been used.
Estimates of Pay and Employment (by the detailed categories as described above), and based
on these methods, form the core of the LMI for All database.
15 Some have argued for an even more detailed breakdown to the 5-digit level of SOC, but this is not
feasible given data currently available.
16 Plus for some purposes additional information on: Age; Gender; Status; and Qualification (highest
held).
17 It should be noted that to enhance usability for careers professionals there would be merit in
presenting sub-regional data where possible.
A.3 Providing detail without being disclosive – Pay and Hours worked
Pay: In the case of Pay, an earnings function can be estimated using the original detailed
individual data under secure conditions.18 Such a function can then be used to generate
estimates of pay (including confidence intervals) that are not disclosive.
A typical earnings function takes the form:
Ln (E) = a + b*A + c*A^2 + D*X + u
Where:
Ln (E) is the log of earnings or pay;
A is age;
X is a vector of other explanatory variables which will include (inter alia) all the key
dimensions as set out in Annex A.2;
D is a vector of parameters associated with the vector X;
a, b and c are also parameters to be estimated;
u is the standard regression error term.
X includes:
Gender (default is Male (0), a 1 indicates Female);
Region (default is London, 11 other 0/1 dummies one for each other region);
Sector (default is currently Agriculture, plus 78 other 2-digit SIC2007 categories as
used in Working Futures)19;
Occupation (default is Chief executives and senior officials, plus 368 other 4-digit
SOC2010 categories);
Qualification (default is a degree or equivalent and 5 other qualification categories20
(highest held)).
Using the estimated parameters, point estimates of the typical pay of individuals in a range of
different situations and with a range of different characteristics can be generated. In principle,
these estimates can be extended to include other indicators (such as the median or quartiles).
During Phase 2A, the focus was on mean pay only. In Phase 2B this was extended to provide
median and decile estimates, based on the assumption that pay is log-normally
distributed.
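Under a log-normal assumption, the median and deciles follow directly from the mean and standard deviation of log pay. The sketch below illustrates the standard relationships only; how the project obtained the spread of log pay is not described here, so `sigma` and the example values are hypothetical inputs.

```python
import math
from statistics import NormalDist


def lognormal_summary(mu, sigma):
    """Summary statistics of pay when Ln(pay) ~ Normal(mu, sigma).

    mu    : predicted mean of log pay (e.g. from an earnings function)
    sigma : standard deviation of log pay (hypothetical input here)
    """
    standard_normal = NormalDist()
    deciles = {
        d: math.exp(mu + sigma * standard_normal.inv_cdf(d / 10))
        for d in range(1, 10)
    }
    return {
        "mean": math.exp(mu + sigma ** 2 / 2),  # arithmetic mean of pay
        "median": math.exp(mu),                 # 50th percentile of pay
        "deciles": deciles,                     # 10th, 20th, ..., 90th percentiles
    }


# Hypothetical values purely for illustration.
summary = lognormal_summary(mu=6.0, sigma=0.5)
```

Because the log-normal distribution is right-skewed, the mean exceeds the median whenever `sigma` is positive, which is one reason to publish medians and deciles alongside mean pay.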
The parameters have been estimated using the full and most detailed sets of raw individual
data in ASHE or the LFS available (under the secure conditions imposed by the ONS Secure
18 Other estimation methods than a standard earnings function might also be used. These might have
some advantages, but for the present a simple standard earnings function is proposed.
19 The regression using LFS data currently adopts the full set of SIC2007 2-digit categories, but it is
proposed to replace those by the Working Futures 79 industry categories in the final version.
20 Including ‘none’ and ‘don’t know’.
Data Service (SDS)). These parameters are then used to generate the estimates for the
careers database. Table A.1 shows some typical regression results based on the publicly
available LFS dataset.
Note that data on pay could also be disclosive if they were to identify a particular
employer. Queries on pay therefore need to be handled in the same way as queries on employment,
so that potentially disclosive information is not placed into the public domain.
Effectively this requires some censoring (as described in Section A.4 below).
Some ‘common sense’ rules are imposed in dealing with queries to the database so that
nothing unreliable is revealed. These rules are based on general ONS guidelines for dealing
with LFS data. Anything involving fewer than 10,000 observations (grossed up) is
flagged as potentially unreliable. Anything involving fewer than 1,000 observations
(grossed up) results in the query defaulting to a higher level of aggregation and returning a ‘not
available’ message. This avoids generating estimates of pay where there are tiny (or even
zero) numbers of people involved.
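The two thresholds above can be expressed as a small checking routine. This is a sketch of one possible reading of the rules, not the project's Checking Algorithm: the function names, the example counts and the idea of moving to a higher level of aggregation by shortening the SOC code one digit at a time are all assumptions made for illustration.

```python
UNRELIABLE_BELOW = 10_000     # grossed-up observations: flag as potentially unreliable
NOT_AVAILABLE_BELOW = 1_000   # grossed-up observations: do not report at this level


def reliability_status(grossed_up_count):
    """Classify an estimate by the grossed-up number of observations behind it."""
    if grossed_up_count < NOT_AVAILABLE_BELOW:
        return "not available"
    if grossed_up_count < UNRELIABLE_BELOW:
        return "potentially unreliable"
    return "ok"


def answer_query(grossed_up_counts, soc_code):
    """Fall back to broader SOC groups until enough observations are found.

    grossed_up_counts maps SOC codes (as strings, at any level of the
    hierarchy) to grossed-up observation counts.
    """
    code = soc_code
    while code:
        status = reliability_status(grossed_up_counts.get(code, 0))
        if status != "not available":
            note = None if code == soc_code else "not available at requested level"
            return {"requested": soc_code, "returned": code, "status": status, "note": note}
        code = code[:-1]  # assumed aggregation step: drop the last digit
    return {"requested": soc_code, "returned": None, "status": "not available", "note": "not available"}


# Hypothetical grossed-up counts.
counts = {"1123": 600, "112": 45_000, "2136": 8_000}
fallback = answer_query(counts, "1123")
flagged = answer_query(counts, "2136")
missing = answer_query(counts, "9999")
```

In this reading, a query for a sparsely populated 4-digit occupation returns the estimate for its parent group together with a note that the requested level is not available.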
ONS were requested to confirm that the process described is in line with current rules
regarding access to ASHE and LFS data via the SDS. This confirmation was achieved
implicitly by the process of formal application to use the ASHE and LFS data via the SDS, and
the checks imposed on the extraction of the relevant parameters from the SDS.
Hours: Information on weekly hours worked is also required. This has been obtained from
ASHE. There is no obvious analogous approach that can be adopted using a simple earnings
function type, as described above for pay. Due to technical problems of simultaneity, as well
as the need to include external variables relating to economic cycle, etc., estimating an hours
equation is not a straightforward option.
Nevertheless, this possibility has been explored using LFS data. If the focus was on predicting
hours worked at an individual level, these issues would pose more serious concerns, but given
that the focus is on average hours for broad groups it is less of a concern. A regression with
hours worked as the dependent variable, and with the same dimensions and
interaction terms as the earnings equation (other than age) as independent variables, seems
to deliver reasonable results. In any event, variations in hours worked are much less significant
than those for pay across occupations. Therefore, the focus is on providing broad-brush
indicators across occupations and other key dimensions. In the current version of the
database information on hours is not derived from an equation but is extrapolated from
published ASHE data.
Indicators of part-time working can also be based in part on the Working Futures employment
database described in Annex A.4. This provides, for example, information on the percentage
of jobs that are part time.
Table A.1 Typical Earning Function Results
Variable Coefficient
Age (continuous variable) 0.06
Age squared -0.001
(default =male)
female -0.10
(default =London)
North East -0.08
North West -0.10
Yorkshire & Humberside -0.17
East Midlands -0.16
West Midlands -0.16
Eastern -0.19
South East -0.18
South West -0.20
Wales -0.21
Scotland -0.14
Northern Ireland -0.21
(default =degree or equivalent)
Higher education -0.11
GCE A Level or equivalent -0.18
GCSE grades A-C or equivalent -0.24
Other qualifications -0.28
No qualification -0.33
(default=Agriculture, etc.)
02 Coal, oil & gas 0.62
03 Other mining and quarrying 0.13
04 Mining support 0.08
05 Food products 0.01
06 Beverages & tobacco -0.03
07 Textiles 0.06
08-75……………….etc, etc *
(default =Chief executives and senior officials)
1120 ‘Elected officers and representatives’ -1.08
1121 ‘Production managers and directors in manufacturing’ -0.67
1122 ‘Production managers and directors in construction’ -0.43
1123 ‘Production managers and directors in mining and energy’ -0.10
constant 5.86
Notes: LFS 2013 full-time regression results. The highlighted rows are missing from the table.
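To show how coefficients such as those in Table A.1 translate into a point estimate, the sketch below applies the earnings function Ln (E) = a + b*A + c*A^2 + D*X + u to one illustrative individual. Only a handful of the coefficients from the table are copied in, the dictionary layout and function name are hypothetical, and the unit of pay is not stated in the table, so the result should be read as an illustration of the arithmetic rather than as a published estimate.

```python
import math

# A subset of the coefficients in Table A.1. Default (omitted) categories
# have a coefficient of zero by construction.
COEF = {
    "constant": 5.86,
    "age": 0.06,
    "age_squared": -0.001,
    "gender": {"male": 0.0, "female": -0.10},
    "region": {"London": 0.0, "North West": -0.10, "Scotland": -0.14},
    "qualification": {"Degree or equivalent": 0.0, "GCE A Level or equivalent": -0.18},
    "sector": {"01 Agriculture, etc.": 0.0, "05 Food products": 0.01},
    "occupation": {"Chief executives and senior officials": 0.0, "1121": -0.67},
}


def predicted_log_pay(age, gender, region, qualification, sector, occupation):
    """Evaluate a + b*A + c*A^2 + D*X for one set of characteristics."""
    return (
        COEF["constant"]
        + COEF["age"] * age
        + COEF["age_squared"] * age ** 2
        + COEF["gender"][gender]
        + COEF["region"][region]
        + COEF["qualification"][qualification]
        + COEF["sector"][sector]
        + COEF["occupation"][occupation]
    )


# Illustrative individual: a 40-year-old woman in the North West with A levels,
# working as a production manager (SOC 1121) in food products.
log_pay = predicted_log_pay(
    40, "female", "North West", "GCE A Level or equivalent", "05 Food products", "1121"
)
pay = math.exp(log_pay)  # back-transform from logs to the pay scale
```

For this individual the predicted log of pay is about 5.62. Exponentiating a predicted log gives a typical (geometric-mean style) value rather than the arithmetic mean, so any conversion to mean pay needs an additional distributional assumption.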
A.4 Providing detail without being disclosive – Employment
A.4.1 Data sources and the problems of disclosure and confidentiality
There are two main official data sources for time series information on employment. These
are the Business Register and Employment Survey (BRES) and the Labour Force Survey
(LFS). Together with some other data they can be combined to provide a very detailed picture
of employment patterns.
The BRES dataset is based on a survey of employers. It provides detailed information on
employment (employees only) by detailed sector (up to 5 digits) and by detailed geographical
location (down to Local Authority Districts). The key issue is whether or not the data are
disclosive (i.e. can individual companies/units be identified).
In fact the BRES data are collected for workplaces or establishments (units) rather than
companies or enterprises. Nevertheless, the potential for identifying the information as
pertaining to a particular company is obvious. For some sectors where there are only one or
two companies operating, this may be a problem even at a UK level (for example there is only
one manufacturer of Nuclear Submarines). Therefore, if the sectoral level of disaggregation
is detailed enough such a company will inevitably be identifiable. If sector is cross classified
by geographical area, there are many more companies that can potentially be identified (for
example, there is only one company that produces cars in Derbyshire).
The LFS dataset is a survey of households and individuals. It provides information on
occupation and qualification as well as industry and region. In principle, it can be used to
identify individual respondents. Given enough dimensions (age, gender, location of
employment, sector, occupation, qualifications, etc.) it is possible (in principle) to identify the
individual that has responded to the survey. Revealing this information, and any associated
survey data, would breach confidentiality.
Providing detailed estimates for employment analogous to those described for pay is much
more complex. There is no simple analogy to the earnings equation which can be used to
produce econometric estimates of employment as an alternative to publishing the raw survey
estimates. However, there is an alternative set of very detailed employment estimates
available that has been developed by IER on behalf of UKCES. It covers all the main
dimensions needed (although currently only up to the 2-digit level of SOC). It is constructed
using various official datasets, available either in the public domain or through NOMIS (subject
to a Chancellor of the Exchequer’s Notice (CEN)). This is the Working Futures database.
The sectoral aspect (which at its heart is based on BRES data) is potentially problematic
because of concerns about disclosure. Although the data in the Working Futures database
are not the raw BRES numbers,21 for some sectors there may be only a handful of
organisations involved, especially at a sub-UK level, so potentially these cases could be
identified from the Working Futures data. The key question is how to deal with this problem
(of not being disclosive) while providing as much detail as possible?
21 In practice, the Working Futures database does not use the BRES data as such, but makes use of
the various sectoral employment time series ONS publish based on BRES and made available via
NOMIS under the terms of a CEN.
A.4.2 The Working Futures database
The numbers within the Working Futures database are estimates, just as the pay figures from
an earnings function are.22 The Working Futures database is the result of a complex
combination of datasets, models and assumptions (including various iterative procedures).23,24
The Working Futures database does not include any of the original raw survey data upon
which it is based. Given all the adjustments, assumptions, and amendments made to the data,
the final Working Futures estimates of employment numbers are far removed from the original
source data (BRES and LFS).25
Where sector is not involved, there is no danger of disclosure since identification of a company
or unit depends on sector. However, sector is an important aspect from a careers guidance
perspective, so it is not possible to simply remove it from the LMI for All database.
A.4.3 BRES information on number of establishments/units
ONS publish information that can be used to assess the sample size (number of units) on
which the Working Futures employment dataset is based. This enables the risk of disclosure
to be assessed. The data source for this information is the Inter Departmental Business
Register (IDBR), which is the sampling frame for the BRES and ABI surveys (which in turn
underlie the Working Futures employment estimates).
Analysis of these data suggests that only a handful of the industries in the Working Futures
database are problematic. If the smaller industries are further aggregated to make just 75
industries rather than the 79 in the original Working Futures database, then no case (industry
by region cell) would have fewer than 10 units. It has been agreed with ONS that such data
are, therefore, not disclosive. The aggregation of those few industries into the 75 slightly broader
categories means that NONE of the Working Futures data is regarded by ONS as disclosive.
Regarding confidentiality, since the Working Futures estimates are based on publicly
available data, there is no danger of the data breaching confidentiality from an LFS perspective.
22 Effectively the generation of the Working Futures database can be regarded as equivalent to
estimating the probability of employment in a certain category defined by: industry (75 categories);
occupation (25 2-digit SOC categories, extended to the 369 4-digit categories); gender; status (3
groups: full-time employees, part-time employees and self-employment); ‘region’ (12 countries and
English Regions within the UK); and qualifications. These probabilities sum to 1 when added up across
all these dimensions. Applied to an estimate of total UK employment they generate an employment
estimate analogous to the pay estimates from the earnings function.
23 For full details of how the Working Futures database is constructed see Wilson and Homenidou
(2012b).
24 The main iterative procedure used is called RAS. This is a well-established technique for generating
a matrix A which is consistent with target row and column totals (R and S respectively). Assuming
consistent totals, the process involves summing the matrix across rows and columns in turn, comparing
the totals with the targets, and then scaling to meet the targets. Typically, a solution is reached in just
a few iterations. This simple two dimensional technique can be extended to cover multiple dimensions.
25 BRES data are used by ONS to produce their published employment figures. The latter are used to
constrain the Working Futures estimates. ONS revise their published estimates in the light of other
information, so the figures used may gradually diverge from the original BRES estimates as official
data are revised.
The data on individuals are not used directly. There are so many adjustments and processes
involved that none of the original data are in fact released into the public domain.
ONS were requested to confirm these interpretations:
That employment estimates by aggregated sectoral categories by region (by
combining them with other categories) would NOT be disclosive; and
Combining this information with data from the publicly available LFS dataset in order
to generate breaks by occupation and qualification will not breach rules regarding
confidentiality.
A.4.5 Case for ONS to place more detailed data into the public domain
At present, many of the more detailed data used to construct the Working Futures database
are only available via NOMIS.26 It was agreed that it would be helpful in future if ONS could
place most of the information currently collected in order to construct the Working Futures
database via NOMIS into the public domain. That would mean that the Working Futures
database (possibly excluding sub-regional analysis) could be based solely on publicly
available data and would not, therefore, be disclosive.
If the Working Futures database were redesigned to be dependent only upon data in the public
domain this would remove the need to impose any restrictions.
ONS agreed to release data at a more detailed level into the public domain (at the level of the
75 industries aggregated up from 79 as discussed above). This only required a modest
increase in the level of detail made available.
26 These data are therefore obtained subject to possession of a CEN and cannot be passed on
to a third party.
A.5 Longer-term issues relating to employment, pay and hours
In the longer-term, it would be better if the predicted estimates used for the three key indicators
in the database, employment, pay and hours, could be replaced by survey data, which could
be updated automatically as they are published. This raises two questions:
If and when it will ever be possible to replace at least some of the predicted/estimated
values for some indicators by ‘real’ survey values; and
Checks on the reliability and robustness of some of the more detailed
predictions/estimates.
In principle, it is possible to use ‘real’ survey values where these are statistically robust and
non-disclosive and to only use predicted values to fill in the many gaps. In practice, this might
pose some problems, if and when the predicted values and real values show significant
divergence. This is something that can be explored in further development work as and when
such data become available. This will require further detailed consultation with ONS and the
development of an agreed methodology for merging ‘real’ and predicted values in a seamless
fashion.
In the short to medium-term, it is recommended that the database continues to be based on
predicted values throughout.
A.6 Checking Algorithm to avoid publishing unreliable estimates
A checking algorithm is built into the API to avoid ‘publishing’ estimates that might be
regarded as unreliable. This algorithm checks roughly whether or not the employment
numbers concerned would be likely to be regarded as disclosive or not statistically robust.
The use of the slightly more aggregate 75 industry categories avoids the immediate issue of
disclosure, since ONS have agreed that data at that level are not disclosive.
However, some of the numbers could still be unreliable because they are based on small
sample numbers. In the Working Futures database, this is dealt with by adopting some simple
rules of thumb and the same applies in the LMI for All database.
The rules of thumb used are:
1. If the numbers employed in a particular category/cell (defined by the 12 regions,
gender, status, occupation, qualification and industry (75 categories)) are below 1,000
then a query should return ‘no reliable data available’ and offer to go up a level of
aggregation across one or more of the main dimensions (e.g. UK rather than region,
some aggregation of industries rather than the 75 level, or SOC 2-digit rather than 4-
digit).
2. If the numbers employed in a particular category/cell (defined as in 1.) are between
1,000 and 10,000 then a query should return the number, but with a flag to say that
this estimate is based on a relatively small sample size and if the user requires more
robust estimates they should go up a level of aggregation across one or more of the
main dimensions (as in 1).
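The two rules of thumb above can be sketched in Python as follows. This is a minimal illustration, not the code used in the LMI for All API: the names `check_estimate` and `CheckedEstimate` and the message wording are hypothetical, and the treatment of a cell with exactly 10,000 people (returned without a flag here) is an assumption, since the text does not specify the boundary.

```python
from dataclasses import dataclass
from typing import Optional

SUPPRESS_BELOW = 1_000   # rule 1: below this, no estimate is returned
FLAG_BELOW = 10_000      # rule 2: from 1,000 up to this, return with a flag


@dataclass
class CheckedEstimate:
    value: Optional[float]  # None when the estimate is suppressed
    flag: Optional[str]     # caveat shown to the user, or None if reliable


def check_estimate(employment: float, value: float) -> CheckedEstimate:
    """Apply the rules of thumb to one cell.

    `employment` is the number employed in the cell. `value` is the figure
    to report: employment itself, or the corresponding Pay or Hours value.
    """
    if employment < SUPPRESS_BELOW:
        return CheckedEstimate(
            None, "no reliable data available; try a higher level of aggregation"
        )
    if employment < FLAG_BELOW:
        return CheckedEstimate(
            value, "based on a relatively small sample size; aggregate for robustness"
        )
    return CheckedEstimate(value, None)
```

For a Pay query, the check would be called with the cell's employment number and the cell's pay value, so that the decision always rests on employment.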
This is done not only for any queries about Employment (including Replacement Demand
calculations), but also for Pay and Hours.
In the case of Pay and Hours, the API interrogates the part of the database holding the
employment numbers to do the checks, as in points 1 and 2 above, but then reports the
corresponding Pay or Hours values as appropriate.
Currently, data are provided at the most detailed level possible for all three indicators. More
aggregate estimates are obtained by simple summation (for employment) or by creating
weighted averages (using the employment numbers as weights).
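The aggregation step can be illustrated with a short Python sketch. The function name `aggregate` and the figures are hypothetical; the sketch only assumes that each detailed cell carries an employment number and a pay (or hours) value.

```python
def aggregate(cells):
    """Combine detailed cells into one more aggregate cell.

    `cells` is a list of (employment, pay) pairs. Employment is summed;
    pay is averaged using the employment numbers as weights.
    Returns (total_employment, weighted_mean_pay).
    """
    total_employment = sum(employment for employment, _ in cells)
    if total_employment == 0:
        return 0, None  # no weights available, so no average can be formed
    weighted_pay = sum(employment * pay for employment, pay in cells)
    return total_employment, weighted_pay / total_employment


# Two hypothetical cells: 1,000 people at 400 per week, 3,000 at 600 per week.
total, mean_pay = aggregate([(1000, 400.0), (3000, 600.0)])
```

The larger cell dominates the result, which is the intended effect of weighting by employment.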
A.7 Details of the regression analysis for pay predictions
A.7.1 Introduction
This section provides a general description of the issues involved in generating the weekly pay
and hours worked estimates in the LMI for All portal. The data used are taken from the UK
Labour Force Survey (LFS) and the Annual Survey of Hours and Earnings (ASHE). The
analysis adopts the 2010 Standard Occupational Classification (SOC2010). The same
approach has been applied to the LFS and ASHE data wherever possible.
The use of “raw” data from the LFS or ASHE in the LMI for All data portal is limited due to
sample size and concerns about confidentiality. Reliance on the “raw” data would result in
huge gaps in the information available to be presented in the portal. To get around these
limitations the portal uses “predicted pay” estimates, based on an econometric analysis of the
ASHE and LFS data sets.
In order to provide additional details by age, as well as features of the distribution of pay such
as deciles, supplementary equations are used.
The discussion in this section describes the specification and estimation of the earnings
functions. It also describes the data sources, definition of the variables included and methods
used in the estimation. Details about how the estimation results are used to predict wages and
caveats that need to be borne in mind when using and interpreting the outputs are also
provided.
The discussion here does not attempt a detailed explanation of the estimation outcomes, but
aims to provide some notes to help the reader understand how the analysis has been
conducted and when care is needed in using or interpreting some of the results. It is structured
into 6 sub-sections. This first sub-section (A.7.1) provides a brief introduction. Section A.7.2
explains how the LFS and ASHE databases are constructed and introduces the definition of
earnings and other variables used in the analysis. Section A.7.3 discusses the specification of
the earnings functions and how the estimated results are used for predicting pay. Section
A.7.4 explains some supplementary analysis focusing on mean pay which is used to generate
predictions by age ‘on the fly’ in the LMI for All API. Section A.7.5 describes how the median
and deciles of pay are predicted. Section A.7.6 concludes this discussion, including a
comparison of the advantages and limitations of ASHE and the LFS.
A.7.2 Data and definitions
Pooled samples from the UK Labour Force Survey (LFS) and the Annual Survey of Hours and
Earnings (ASHE) are created to derive the pay estimates for the construction of the career
database. LFS and ASHE complement each other in various aspects.
The LFS is a quarterly survey which collects information from households living at private
addresses and is representative of the entire population of the UK. Each quarterly sample is
made up of five waves with approximately the same sizes. Each wave is interviewed in five
successive quarters. The sample is designed in a way that over the period of any four
consecutive quarters, wave one and five will never contain the same households. Thus, for
the construction of an annually representative sample of the population, wave one and wave
five of each quarter in 2012 and 2013 are pooled together to form an aggregated sample of
288,937 different individuals covering two years. For the purpose of this exercise, the pooled
sample is further constrained to employees aged 16 and over, leaving 86,828 full-time
employees and 33,608 part-time employees for the 2013 pay estimation.
ASHE originated from the New Earnings Survey (NES) which was started in 1970 and carried
out each year subsequently. It is the most comprehensive source of information on the
structure and distribution of earnings in the UK. It collects data on level of wages, wage
components, paid hours of work, pension arrangements and other job characteristics from all
employee jobs (self-employed workers are not included in ASHE). It covers all industries and
occupations across the whole of the UK. The samples are designed to select all employees
whose National Insurance Number ends in a particular pair of digits. ASHE currently has a
sample size of around 180,000 employees in the UK. The selected sample covers about one
per cent of the whole working population in the UK.
Compared to the LFS, ASHE has the advantages that it has more reliable pay information
which is provided by employers rather than individuals and it has a larger sample size than
LFS. However, information on individual characteristics is limited in ASHE and it does not have
any information on education or qualification. In order to get around these problems, the LMI
for All database is based on a set of estimates/predictions of pay using data from both ASHE
and LFS. In addition, ASHE data are only available to researchers at Great Britain level (data
for Northern Ireland have not been released by the Department of Enterprise Trade and
Investment Northern Ireland). A pooled sample for pay estimation is constructed by including
the 2012 and 2013 waves of ASHE, constraining to employees aged 16+. It has 237,117 full-
time workers and 110,810 part-time workers in the core research sample.
Gross weekly pay is used in all the pay estimations based on LFS and ASHE. Here the term
Pay is generally used, although following standard conventions the term “earnings equation”
is used to refer to the econometric equation estimated to predict pay. The main earnings
equation follows the well-established tradition pioneered by Jacob Mincer (Mincer,
1974): the econometric analysis adopts the standard “Mincerian” earnings function or
earnings equation. This is the “main” earnings equation as described in Section A.7.3 below.
The pay variable used in the LFS is “GRSSWK”. It is the gross weekly pay before deductions
in an individual’s main job. It applies to employees and those on a government scheme but
not those employed on New Deal, in the voluntary sector, or the environmental task force.
Information on components of gross wage and the contribution of each component is not
available in LFS. The pay variable used in ASHE is “GPAY”. It is the average gross weekly
earnings in the reference period from either the main job or another job. Its main components
are basic gross weekly earnings and allowances. The other components include overtime
payments, incentive/bonus payments that relate to this pay period, and additional premium
payments during the pay period for shift work and night or weekend work not treated as
overtime.
The predicted mean weekly pay estimates in the LMI for All database are generated using the
main earnings equation. These initial predictions are then adjusted using an iterative RAS
procedure to match the published pay figures from ASHE and the LFS across each of the
main dimensions/characteristics (gender, region, industry, occupation and qualification).27
This process is described in more detail in Section A.8.
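As an illustration of the general technique, the following Python sketch applies a two-dimensional RAS adjustment to a small matrix. It is a generic example of iterative row and column scaling, not the procedure implemented for LMI for All; the function name `ras`, the tolerance, the iteration limit and the figures are all assumptions made for the example.

```python
def ras(matrix, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Scale `matrix` until its row and column sums match the targets.

    Assumes non-negative entries and consistent targets, i.e.
    sum(row_targets) == sum(col_targets).
    """
    a = [row[:] for row in matrix]  # work on a copy
    for _ in range(max_iter):
        # Scale each row to meet its target total.
        for i, target in enumerate(row_targets):
            row_sum = sum(a[i])
            if row_sum > 0:
                a[i] = [v * target / row_sum for v in a[i]]
        # Scale each column to meet its target total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in a)
            if col_sum > 0:
                for row in a:
                    row[j] *= target / col_sum
        # Columns now match exactly; stop once the rows match as well.
        if all(abs(sum(a[i]) - t) <= tol for i, t in enumerate(row_targets)):
            break
    return a


# Hypothetical initial estimates and published row and column totals.
balanced = ras([[10.0, 20.0], [30.0, 40.0]], [40.0, 60.0], [45.0, 55.0])
```

The adjusted matrix keeps the broad pattern of the initial estimates while reproducing the target totals in both directions.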
In order to generate predictions of pay by age in the database supplementary age equations
are estimated. These results are then used to predict pay by age based on the mean value for
all ages. This is done “on the fly” in the LMI for All API.28
Similarly, predicted median and decile pay levels are based on parametric methods and the
assumption that pay is log-normally distributed.
Note that there is no pay data available for the occupation 'Armed Forces' in the LMI for All
database.
A.7.3 Earnings function
Again, this section does not attempt a detailed interpretation of the regression results, but
explains what has been included in the earnings equation and how this has been estimated.
A linear earnings function with a quadratic term in age, capturing how the effect of age on
wages changes with age, is estimated using the ordinary least squares method.
The earnings function has been run using the log of gross weekly wage as the dependent
variable. The independent variables included are as far as possible identical in the LFS and
ASHE earnings functions. Their definitions are as follows:
Age: a continuous variable ranging from 16 to 84 in LFS and from 16 to 93 in ASHE;
Age squared: continuous variable;
Gender: male and female, 1 dummy variable for male (base category: female) (same
in LFS and ASHE);
Region: 12 government official regions of England or devolved countries within the UK,
11 dummy variables in the regression (base category: London) (same in LFS and
ASHE);
Highest qualification: 8 QCF levels of qualifications, QCF1-8, and no qualification;
8 dummy variables in the regression (base category: QCF8). Information on highest
qualification is only available in LFS, thus regressions based on ASHE are
estimated without any education measures.
Industry: standard 75 categories as used in Working Futures, 74 dummy variables in
the regression (base category: Agriculture, etc.) (same on LFS and ASHE);
Occupation: 4-digit SOC2010, 369 categories and 368 dummy variables in the
regression (base category: 1115 Chief executives and senior officials) (same on LFS
and ASHE).
27 RAS is an iterative process used to reconcile row and column totals of a two dimensional data array
with some target figures. See McMenamin et al. (1974), Toh (1998), Miller and Blair (2009) and Lahr
and Mesnard (2004) for a general discussion of RAS methods.
28 API stands for Application Programming Interface.
Interactive terms have also been included to detect heterogeneity across different groups
(these are the same in LFS and ASHE):
Gender by occupation: gender is interacted with 4-digit occupation categories to
control wage differences between male and female within each occupation. The base
group is female Chief executives and senior officials.
Industry by time trend: a time trend variable is created for 2012 and 2013. It is interacted
with industries to control time trend differences within each industry. The base groups
are industries in 2012.
Occupation by time trend: the time trend is also interacted with occupations to control
time trend differences within each occupation. The base groups are occupations in
2012.
The estimated coefficients of the independent variables and the constant term can be used to
derive the expected wage for an individual with certain characteristics (as defined by the
variables included). For the earnings function specified in this study, the default reference
group is female workers living in London, with highest qualification QCF8, working in the
Agriculture sector as Chief executives or senior officials in 2012. The log expected wage
for an individual with these default characteristics at a given age can be calculated by adding
the following parts together: the coefficient on age times age; the coefficient on age squared
times age squared; and the constant term. The calculation of log expected wage for
people with other characteristics can simply be made by adding coefficients for relevant
dummy variables and interaction terms to this default log expected wage. For example, for a
male worker with all the other same characteristics as default, his log expected wage is the
default log expected wage plus the estimated coefficient of the male dummy. To obtain the
expected wage, the log value needs to be converted back to a wage level using EXP(log
expected wage).
Given a regression function like this, it still leaves the question of how to
provide the information for individuals whose combination of characteristics is not reflected
in the dataset. The expected wages produced by the regression package are fitted values for
each individual in the estimation sample, so the package cannot produce an expected wage for
a cell that contains no sample observations. The predictions for such cells are, therefore,
calculated from the estimated coefficients outside the Stata regression package used to
estimate the parameters.
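The calculation can be illustrated with a short Python sketch using a few of the rounded coefficients shown in Table A.1. It is illustrative only: the dictionary keys and function names are hypothetical, the sketch follows the coding of Table A.1 (where the gender default is male), and the gender by occupation and time trend interaction terms are omitted for brevity.

```python
import math

# Rounded coefficients copied from Table A.1 (LFS 2013, full-time).
COEF = {
    "constant": 5.86,
    "age": 0.06,
    "age_squared": -0.001,
    "female": -0.10,
    "region:North West": -0.10,
    "qualification:GCE A Level or equivalent": -0.18,
}


def log_expected_wage(age, dummies=()):
    """Constant plus age terms plus the coefficient of each dummy that applies."""
    total = COEF["constant"] + COEF["age"] * age + COEF["age_squared"] * age ** 2
    return total + sum(COEF[name] for name in dummies)


def expected_wage(age, dummies=()):
    """Convert the log prediction back to a weekly wage level."""
    return math.exp(log_expected_wage(age, dummies))


# Default (reference) individual at age 40, then one with two dummies switched on.
baseline = expected_wage(40)
north_west_female = expected_wage(40, ["female", "region:North West"])
```

Because the prediction needs only the coefficients, it can be produced for any combination of characteristics, including cells with no observations in the sample.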
A.7.4 Supplementary age equations
In order to generate predictions of pay by age and provide an indication of how far pay of each
age varies from mean pay for all ages, “supplementary age equations” and ratios between
pay of a particular age category and mean pay of all ages are developed which enable
calculation of variations of pay by age ‘on the fly’.29 These supplementary age equations and
age ratios reflect how age affects the deviation of pay from the mean pay in groups with
different combination of characteristics. Supplementary age equations were performed to
29 This was to avoid too large a data file of predicted pay being used in the API which caused some
problems of access speed, as well as allowing the mean pay predictions to be constrained to match
published pay totals using an iterative process. The latter requires information on the numbers of
people in each category, which was not available for individual age categories.
derive the estimated pay at each age in a particular combination defined by four dimensions
including occupation, gender, full-time or part-time working and the highest level of
qualification. To ensure a reasonably large sample size of each combination, occupation has
been defined for this purpose at the broad 1-digit level (covering 9 categories). To provide
information of all possible aggregates, an extra category for all occupations has also been
included. The occupational categories are as follows:
Managers and senior officials;
Professional occupations;
Associate professional and technical occupations;
Administrative and secretarial occupations;
Skilled trades occupations;
Personal service occupations;
Sales and customer service occupations;
Process, plant and machine operatives;
Elementary occupations;
All occupations.
For the same reason, the highest level of qualification held has been classified into three broad
groups plus an aggregated group for all qualifications:
High: QCF Levels 4-8;
Medium: QCF Levels 1-3;
Low: no qualifications;
All qualifications including “High”, “Medium” and “Low” qualifications.
Gender and full-time or part-time workers both have two categories and an aggregated total.
Across all the four dimensions this gives a total of 360 combinations. Industry is not included
here because the sample size tends to get very small once industry is considered. It is
assumed that patterns by age are common across industry once these other dimensions have
been taken into account.
The main objective of the supplementary age equations is to provide a descriptive summary
of how pay varies by age (all else equal). Thus the factors included in the linear supplementary
equations are much more limited and only include average gross weekly pay of each age
group (the dependent variable), age and age squared to generate the age coefficients that
depict the age curve in each group. (Note: the most common finding in the literature is that the
relationship between age and pay is an inverted U-shape, with pay peaking in middle age and
declining smoothly thereafter.)
The supplementary age equation is estimated using the ordinary least squares method and is
based on the pooled 2013 LFS data. The estimated coefficients of the independent variables
and the constant term are used to derive the expected wage for an individual at a particular
age. The age equation is performed for each of the 360 combinations to derive the age
coefficients and constant term. The expected pay at each age from 20 to 65 is calculated
subsequently by applying the estimated coefficients to the value of age and age squared.
(Note: some ages are missing in some combinations; the estimated coefficients are applied to
those ages to derive their expected pay.)
To provide an indication of how the expected pay at each age between 20 and 65 is distributed
around the mean pay of all ages in each combination, the mean pay of all ages in each of the
360 combinations is calculated from the LFS data. A ratio between the predicted pay of a
particular age and mean pay of all ages in a combination is derived to indicate the distance of
pay of an age from mean pay. The ratios calculated for each age then enable a prediction of
pay by age around the mean pay to be made.
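The steps for one combination can be sketched in Python as below. This is a hedged illustration rather than the estimation code used for the database: the function names and the input figures are hypothetical, and age is centred on its mean before fitting, which is a numerical convenience of the sketch that leaves the fitted curve unchanged.

```python
def fit_age_curve(ages, pay_by_age):
    """Fit pay = a + b*age + c*age**2 by ordinary least squares.

    Returns a function mapping age to fitted pay.
    """
    centre = sum(ages) / len(ages)
    rows = [[1.0, x - centre, (x - centre) ** 2] for x in ages]
    # Normal equations (X'X) beta = X'y, held as an augmented 3x4 matrix.
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, pay_by_age))]
        for i in range(3)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(m[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = (m[i][3] - tail) / m[i][i]
    a, b, c = beta
    return lambda age: a + b * (age - centre) + c * (age - centre) ** 2


def age_ratios(ages, pay_by_age, mean_pay_all_ages, first_age=20, last_age=65):
    """Ratio of fitted pay at each age to mean pay of all ages."""
    curve = fit_age_curve(ages, pay_by_age)
    return {
        age: curve(age) / mean_pay_all_ages
        for age in range(first_age, last_age + 1)
    }


# Hypothetical inputs for one combination; ages 33 and 58 have no observations.
observed_ages = [age for age in range(20, 66) if age not in (33, 58)]
observed_pay = [-200 + 36 * age - 0.4 * age ** 2 for age in observed_ages]
ratios = age_ratios(observed_ages, observed_pay, mean_pay_all_ages=520.0)
```

Multiplying the (constrained) mean pay for a cell by the ratio for a given age then yields the pay prediction for that age, and the fitted curve fills in ages that are missing from the sample.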
A.7.5 Median and deciles
Median and deciles are used to describe the distribution of pay. For a normal distribution, the
median and other deciles can be predicted using the mean and standard deviation. Pay in ASHE
and the LFS is not normally distributed; however, the natural log of pay
tends to follow a normal distribution. Consequently, by converting pay to log pay it is possible
to use the properties of the log-normal distribution to predict the median and deciles of pay.
In order to generate predictions of pay for medians and deciles in LMI for All,
supplementary “distribution equations” are used, based on analysis of LFS data.
The median and deciles analysis shows how median pay (and other deciles) typically varies
around mean pay of a selected dimension (for example, mean pay of an industry, or mean
pay of an occupation, etc.). This assumes that the pay distributions are otherwise the same
across the other main dimensions such as gender, region, etc.
The formula used to compute the median and other deciles of pay follows from the properties of
the log-normal distribution. For a selected dimension, the median or a given decile of log pay
equals the mean of log pay plus the relevant z score times the standard deviation of log pay.
A z score measures how far the decile or median of interest lies from the mean in a normal
distribution (in other words, how many standard deviations it is away from the mean). The z
scores are known for any specified decile or the median (for the median, z is zero) and can be
obtained from the standard normal cumulative probability table. They are fixed values in the
normal distribution and are the same for any selected dimension with a normally distributed log
pay measure. Given the mean and standard deviation of log pay for a selected dimension, and the
z scores, the median and deciles of log pay can be predicted. Exponentiation is then needed to
convert log pay back to pay.
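A minimal Python sketch of this calculation is given below. The function name `pay_deciles` and the input values for the mean and standard deviation of log pay are hypothetical; the z scores are taken from the standard normal distribution in the Python standard library rather than from a printed table.

```python
import math
from statistics import NormalDist


def pay_deciles(mean_log_pay, sd_log_pay, deciles=range(1, 10)):
    """Predict pay at each decile, assuming log pay is normally distributed.

    Decile d of log pay = mean of log pay + z(d/10) * standard deviation;
    exponentiation converts the result back to pay.
    """
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    result = {}
    for d in deciles:
        z = standard_normal.inv_cdf(d / 10)  # z score for this decile
        result[d] = math.exp(mean_log_pay + z * sd_log_pay)
    return result


# Hypothetical values: mean log pay of 6.2 and standard deviation of 0.5.
deciles = pay_deciles(mean_log_pay=6.2, sd_log_pay=0.5)
```

The fifth decile is the median, for which the z score is zero, so predicted median pay is simply the exponential of mean log pay.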
However, the mean of log pay of a selected dimension is normally not available directly. Given
that the median equals the mean in a normal distribution, and that median log pay equals log
median pay in a log-normal distribution, the mean of log pay can be obtained as the log of
median pay. The median level of pay for a category can be estimated by
assuming the ratios of median to mean are common to a small subset of categories chosen
arbitrarily. Using ASHE published figures on median and mean pay for 2013, the ratios of
median to mean are calculated. The ratios are applied to the mean pay of a selected dimension
to generate the median pay of this dimension, assuming the same ratios apply across all other
dimensions of the database. The median log pay is then calculated for the prediction of
log pay at other deciles.
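The chain from mean pay to mean log pay can be shown with a brief Python example. The ratio of 0.85 and the mean pay of 600 are invented for illustration and are not taken from ASHE; the function names are likewise hypothetical.

```python
import math


def median_pay_from_mean(mean_pay, median_to_mean_ratio):
    """Estimate median pay by applying a published median-to-mean ratio."""
    return mean_pay * median_to_mean_ratio


def mean_log_pay_from_median(median_pay):
    """Under log-normality, the mean of log pay equals the log of median pay."""
    return math.log(median_pay)


# Hypothetical figures: mean weekly pay of 600 and a median-to-mean ratio of 0.85.
median_pay = median_pay_from_mean(600.0, 0.85)
mean_log_pay = mean_log_pay_from_median(median_pay)
```

The resulting mean log pay, together with an estimate of the standard deviation of log pay, is what the decile formula requires.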
Ideally estimates of mean and standard deviation are needed for all the main dimensions, but
limitations of sample size in both LFS and ASHE imply this is impossible for all possible
permutations and combinations. Inspection of the data suggests that variations are greatest
by status (FT/PT), industry and occupation. Log mean pay and values of σ have therefore
been estimated across FT/PT, industry and occupation and similar patterns are assumed to
apply across all other dimensions for the purpose of this calculation. Typical values are
assumed, based on variations across the main dimensions of interest (but not all possible
cross dimensions).
A.7.6 Concluding remarks on pay predictions
This section of Annex A has set out various issues that need to be borne in mind when using
the estimated results from the wage functions and supplementary age regressions. Details on
how the research sample has been generated, what variables have been included and how
they are defined are explained. The 2013 results are based on the UK LFS and ASHE. The
same methods and analysis are applied to LFS and ASHE. Although ASHE has a number of
advantages compared to LFS, it does not provide any information on education, so it is not
possible to include the same highest qualification variable as in LFS. As a result, the estimated
coefficients derived for other variables using ASHE are overestimated because they also pick up
education effects (omitted variable bias). The estimates from ASHE therefore are
not fully comparable with those from the LFS. This could be seen as an argument for just
relying upon the LFS for the regression analysis. However, the larger sample size in ASHE,
and the more reliable data from employer records, outweighs such considerations.
A.8 Technical details of the algorithms used to constrain the data to
match official estimates of pay and hours
A.8.1 Introduction
Key elements of the data requirement set out in the original project plan included pay, hours
and employment, broken down into as much detail as possible by:
Occupation (up to the 4-digit level of SOC2010, 369 Categories);
Sector (up to the 2-digit level of SIC2007, 75 categories); and
Geographical area (12 English regions and constituent countries of the UK).
Plus:
Age;
Gender;
Status; and
Qualification (where available).
The original idea was to access these data directly from the original survey sources, but it
soon became clear that this poses various problems of confidentiality and disclosure if
information is to be made available at the levels of detail that would be really useful for a
careers database. These problems are exacerbated when the additional dimensions such as
gender, employment status (full-time, part time, self-employment), age and qualification are
added, or when additional granularity is demanded in key dimensions such as sector or
occupation. The indicators used have therefore been estimated, using data from Working
Futures and using econometric analysis (earning functions, etc, as described in Section A.7).
This section sets out details for the algorithm used to constrain the estimates to match official
“headline” published figures. This is based on the well-established RAS process.30 RAS
procedures have been developed to generate detailed data on Pay, Employment and Hours
consistent with published data from official sources.
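As a rough illustration of the two-dimensional RAS idea referred to above, the sketch below alternately rescales the rows and columns of a preliminary table until its margins match target totals. It is a minimal sketch under stated assumptions rather than the project's implementation: the function name `ras`, the tolerance, the iteration cap and the example figures are all hypothetical, and the row and column targets are assumed to share the same grand total.

```python
def ras(matrix, row_targets, col_targets, max_iter=1000, tol=1e-9):
    """Scale a 2-D table of preliminary estimates to match target margins."""
    est = [row[:] for row in matrix]  # work on a copy
    for _ in range(max_iter):
        # Row step: scale each row to its target total.
        for i, target in enumerate(row_targets):
            total = sum(est[i])
            if total > 0:
                factor = target / total
                est[i] = [v * factor for v in est[i]]
        # Column step: scale each column to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in est)
            if total > 0:
                factor = target / total
                for row in est:
                    row[j] *= factor
        # Columns now match; stop once the rows also match.
        row_gap = max(abs(sum(est[i]) - t)
                      for i, t in enumerate(row_targets))
        if row_gap < tol:
            break
    return est


# Hypothetical preliminary estimates and published "headline" totals.
preliminary = [[10.0, 20.0],
               [30.0, 40.0]]
row_totals = [40.0, 60.0]
col_totals = [45.0, 55.0]

constrained = ras(preliminary, row_totals, col_totals)
```

The constrained table keeps the broad pattern of the preliminary estimates while agreeing with both sets of totals.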
A.8.2 RAS processes
There are three main elements to the database that require RASing to make sure the data
agree with published figures. These relate to employment, pay and hours.
Employment RAS processes
Employment data at the 2-digit level are published in the Working Futures (WF) database (see
Wilson and Homenidou, 2012a, 2012b). This dataset has been expanded from the 25 2-digit
occupations in the WF dataset to 369 4-digit categories for the LMI for All database. In the first
instance, this is done using a simple assumption of fixed and constant shares of employment
of the 369 categories within each of the 25 2-digit ones, based on LFS data. The focus is on 25
sets of shares (each summing to 100 per cent) showing the proportions of employment in 4-digit
categories within each 2-digit category. In principle, this analysis could be extended to
allow these shares to vary by other dimensions, such as industry. In practice, this refinement
was not made.31
30 RAS is an iterative procedure where the rows and columns of preliminary estimates of a
two-dimensional array are iteratively changed using proportions that are based on the ‘target’ row and
column totals. The basic RAS technique relates to a two-dimensional matrix, but can be extended to
n-dimensional arrays. For some references see: McMenamin and Haring (2006); Miller and Blair (2009);
and Toh (1998).
In the longer-term, it is also necessary to think about how these patterns change over time
and how to extend the projections to 2022 and beyond, but for the moment these shares are
constant, based on 2011/2012 LFS data (for further discussion see Annex C.6).
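The fixed-share expansion described above can be sketched as follows. This is a hedged illustration only: the function names and all figures are hypothetical, and it relies on the fact that the first two digits of a 4-digit SOC2010 code identify its 2-digit category.

```python
from collections import defaultdict


def shares_within_parent(counts_4digit):
    """Share of each 4-digit code within its 2-digit parent category."""
    parent_totals = defaultdict(float)
    for code, n in counts_4digit.items():
        parent_totals[code[:2]] += n
    return {code: n / parent_totals[code[:2]]
            for code, n in counts_4digit.items()}


def expand_to_4digit(employment_2digit, shares):
    """Apply constant shares to 2-digit employment totals."""
    return {code: employment_2digit[code[:2]] * share
            for code, share in shares.items()
            if code[:2] in employment_2digit}


# Hypothetical survey counts by 4-digit occupation (illustration only).
lfs_counts = {"2111": 30, "2112": 50, "2113": 20,
              "2211": 60, "2212": 40}
# Hypothetical 2-digit employment totals to be broken down.
wf_employment = {"21": 1000.0, "22": 500.0}

shares = shares_within_parent(lfs_counts)
detailed = expand_to_4digit(wf_employment, shares)
```

Because each set of shares sums to one within its 2-digit category, the 4-digit estimates add back up to the 2-digit totals they were derived from.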
The main steps are as follows:
1. Interrogate the LFS and extract the sets of shares of 4-digit occupations within 2-digit
categories:
a. Across the whole of the UK;
b. Showing variations by ‘region’ (12 countries and English Regions);
c. Variations by Type (FT, PT, SE) and gender;
d. Variations by Sector (Working Futures 6 broad sectors).
There are just two years of LFS data available classified using SOC2010. These have
been combined for this purpose, avoiding double counting of individual cases in the
standard manner.
To begin with the data are extracted in the form of numbers in employment at the most
2. ISCO08 code (assigned by ONS for SOC-only index entries)
3. IER’s suggestion for ISCO08 code change
4. SOC2010 index entries, matched to ISCO08 entries where possible by ONS
5. SOC2010 code (assigned by ONS for ISCO-only index entries)
Table C.2 Map from ISCO 88 to SOC2010 at 2-digit level
| ISCO88 categories as used in Cedefop Projections | 2010 | | SOC2010 categories as used in Working Futures | 2010 |
|---|---|---|---|---|
| 11 Legislators and senior officials | 55 | 1.1 | 11 Corporate managers and directors | 2,015 |
| 12 Corporate managers | 3,764 | 1.1 | | |
| 13 Managers of small enterprises | 1,177 | 1.2 | 12 Other managers and proprietors | 1,000 |
| 21 Physical, mathematical and engineering science | 1,284 | 2.1 | 21 Science, research, engineering and technology professionals | 1,593 |
| 22 Life science and health professionals | 403 | 2.2 | 22 Health professionals | 1,296 |
| 23 Teaching professionals | 1,270 | 2.3 | 23 Teaching and educational professionals | 1,364 |
| 24 Other professionals | 1,496 | 2.4 | 24 Business, media and public service professionals | 1,591 |
| 31 Physical and engineering science associate professionals | 748 | 3.1 | 31 Science, engineering and technology associate professionals | 501 |
| 32 Life science and health associate professionals | 965 | 3.2 | 32 Health and social care associate professionals | 323 |
| 33 Teaching associate professionals | 178 | 3.3-3.5 | 34 Culture, media and sports occupations | 569 |
| 34 Other associate professionals | 2,350 | 3.3-3.5 | 35 Business and public service associate professionals | 2,074 |
| 41 Office clerks | 2,869 | 4.1 | 41 Administrative occupations | 2,738 |
| 42 Customer services clerks | 942 | 4.1 | 42 Secretarial and related occupations | 961 |
| 51 Personal and protective services workers | 3,455 | 6.1 | 33 Protective service occupations | 458 |
| | | | 61 Caring personal service occupations | 2,094 |
| | | | 62 Leisure, travel and related personal service occupations | 625 |
| | | | 72 Customer service occupations | 617 |
| 52 Models, salespersons and demonstrators | 1,683 | 7.1 | 71 Sales occupations | 1,991 |
| 61 Skilled agricultural and fishery workers | 436 | 5.1 | 51 Skilled agricultural and related trades | 399 |
| 71 Extraction and building trades workers | 1,450 | 5.3 | 53 Skilled construction and building trades | 1,152 |
| 72 Metal, machinery and related trades workers | 875 | 5.2 | 52 Skilled metal, electrical and electronic trades | 1,330 |
| 73 Precision, handicraft, craft printing and related trades | 114 | 5.4 | 54 Textiles, printing and other skilled trades | 645 |
| 74 Other craft and related trades workers | 149 | 5.4 | | |
| 81 Stationary plant and related operators | 145 | 8.1 | 81 Process, plant and machine operatives | 822 |
| 82 Machine operators and assemblers | 575 | 8.1 | | |
| 83 Drivers and mobile plant operators | 1,073 | 8.2 | 82 Transport and mobile machine drivers and operatives | 1,128 |
| 91 Sales and services elementary occupations | 2,258 | 9.2 | 92 Elementary administration and service occupations | 2,628 |
| 92 Agricultural, fishery and related labourers | 136 | 9.1 | 91 Elementary trades and related occupations | 544 |
| 93 Labourers in mining, construction, manufacturing and | 1,140 | 9.1 | | |
| All occupations | 31,049 | | All occupations | 30,458 |
C.7 Other European datasets
In principle, there are a number of pan-European datasets that might be useful to add to the LMI
for All database. These include:
1. European Labour Force Survey (ELFS);
2. Other surveys including:
a. Eurofound survey of living and working conditions;
b. Eurobarometer;
c. European Values Survey; and
d. European Social Survey.
These are briefly summarised here.
In practice, although they contain some interesting and useful data they are generally not suitable
for including in the database because the sample sizes are inadequate to provide reliable data at
a detailed and consistent level by occupation.
They would have more value if the database were to be extended to cover the needs of other
users such as more general labour market analysts.
European Labour Force Survey (ELFS)
General description of the dataset
The European Union Labour Force Survey (EU LFS) is conducted in the 27 Member States of
the European Union, three candidate countries and three countries of the European Free Trade
Association (EFTA) in accordance with Council Regulation (EEC) No. 577/98 of 9 March 1998.
At the moment, the LFS microdata for scientific purposes contain data for all 27 Member States
and in addition Iceland, Norway and Switzerland.
The EU LFS is a large household sample survey providing quarterly results on labour participation
of people aged 15 and over as well as on persons outside the labour force. All definitions apply
to persons aged 15 years and over living in private households. Persons carrying out obligatory
military or community service are not included in the target group of the survey, as is also the
case for persons in institutions/collective households.
The national statistical institutes are responsible for selecting the sample, preparing the
questionnaires, conducting the direct interviews among households, and forwarding the results to
Eurostat in accordance with the common coding scheme.
The data collection covers the years from 1983 onwards. In general, data for individual countries
are available depending on their accession date.
The Labour Force Surveys are conducted by the national statistical institutes across Europe and
are centrally processed by Eurostat:
Using the same concepts and definitions;
Following International Labour Organisation guidelines;
Using common classifications (NACE, ISCO, ISCED, NUTS);
Recording the same set of characteristics in each country.
In 2011, the quarterly LFS sample size across the EU was about 1.5 million individuals. The
EU-LFS covers all industries and occupations.
A significant amount of data from the European Labour Force Survey (EU LFS) is also available
in Eurostat's online dissemination database, which is regularly updated and available free of
charge. The EU LFS is the main data source for the domain ‘employment and unemployment’ in
the database. The contents of this domain include tables on population, employment, working
time, permanency of the job, professional status etc. The data is commonly broken down by age,
sex, education level, economic activity and occupation where applicable.
Several elements of indicator sets for policy monitoring are also derived from the EU LFS and
freely available in the online database. The structural indicators on employment include the
employment rate, the employment rate of older workers, the average exit age from the labour
force, the participation in life-long learning and the unemployment rate. The sustainable
development indicators also include employment rates by age and educational attainment as well
as the population living in jobless households and the long-term unemployment rate.
Data made available via Eurostat are anonymised by suppression if necessary.
Microdata from the ELFS are available from Eurostat, but confidentiality concerns mean that access
to the data is tightly controlled, many variables are not available in all countries and limited detail
is made available on sensitive variables. Publicly available data are available in xls format to
download from the Eurostat website. The standardisation of the data means that it could be
integrated in to the Careers LMI database providing a European perspective on employment,
unemployment rates, workforce characteristics, educational attainment and earnings. Because of
concerns about confidentiality and statistical robustness Eurostat only make the data available in
restricted format. These data would, therefore, need to be presented at an aggregated industry,
Please could you supply us with a brief written scenario of the type of questions and information a client/customer/claimant may ask in a typical one-to-one-session. Please also list some ‘real world’ questions.
Currently, what type of labour market information do you most commonly use with your
clients/customers/claimants?
What are the gaps in labour market information you need for your business?
Please specify the particular target group of clients/customers/claimants with whom you would
want to use this application. What would be your priority for an application for this target group
using the LMI for All database?
Annex E: Hack and modding day feedback and developments
The developers
Twelve developers were selected for the second hack day, comprising: five developers who
participated in the first LMI for All hack day; five from the UKCES careerhack competition; and
two further developers who contribute to widening the skill set of the developers. There were
eleven male developers and one female. Developers were selected based on their skill set to
comprise teams in order to progress Phase 2A hacks. Developers are variously involved in:
accessibility and open data; UX front-end and back-end development; product management; iOS
developments; social and mobile apps development; RS development; and API development.
Skills included: HTML5; CSS; Photoshop; wireframing and semantic web technologies; 3D