Using Webcrawlers to Estimate Incremental Measure Costs for the Retail Plug-Load Portfolio (RPP) Program
November 21, 2014
Dec 23, 2015
Webcrawlers and RPP
• Agenda:
  • Review of webcrawling as applied to LED lamps
  • Applying webcrawlers to other new product categories
  • Comparison of the webcrawling methodology to the CPUC’s IMC study
Applications of Webcrawling and Big Data to LED Lamps
• Project Goal: Understand the relationship between price and performance for LED lamps.
• Study Metrics: Light color, light quality, efficacy, lifetime, and dimmability.
• Research Implications:
  • Identify changes in costs over time
  • Look beyond efficiency toward product performance
2012 Analysis Conducted by Energy Solutions for PG&E
• Approach:
  • 700 unique price points were manually collected for over 500 unique lamp models
  • Multi-variable regression model to analyze the dataset
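As an illustration of the multi-variable regression step, the sketch below fits price against a few lamp attributes by ordinary least squares. The data values, the choice of attributes, and the function names (`solve_linear_system`, `fit_price_model`) are assumptions made for illustration; the 2012 analysis may have used different variables and software.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; drop a redundant variable")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_price_model(features, prices):
    """Return [intercept, coef_1, ...] minimizing squared price error."""
    # Prepend a constant 1.0 so the first coefficient is the intercept.
    design = [[1.0] + [float(v) for v in row] for row in features]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * p for r, p in zip(design, prices)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical lamps: [lumens, CRI, dimmable (0/1)] and price in USD.
# These numbers are invented, not drawn from the study dataset.
features = [
    [450, 80, 0],
    [800, 80, 0],
    [800, 90, 1],
    [1100, 82, 1],
    [1600, 90, 1],
    [450, 90, 0],
]
prices = [9.0, 12.5, 16.0, 18.5, 26.0, 10.0]

coefficients = fit_price_model(features, prices)
for name, value in zip(["intercept", "lumens", "CRI", "dimmable"], coefficients):
    print(f"{name}: {value:.4f}")
```

Each coefficient can be read as the estimated price change per unit of that attribute with the others held fixed, which is the quantity an incremental cost estimate would draw on.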
Price Modeling – 2012 Data
[Figure labels: ENERGY STAR?, CRI, CCT, Power Factor, Wattage, Efficacy, Light Output, Bulb Shape, Dimmability, Lifetime]
Note: Results are based on online retailer prices, which we found to be significantly higher on average than in-store prices.
2014 Analysis: Applying Big Data
• Retailer-based web crawler tool:
  • Retailer-provided APIs (Application Programming Interfaces)
  • Screen-scraping methods
• Scope of data collection:
  • 9 online retailers
  • 3,000 unique price points
  • 1,000 unique LED lamp models
  • 50 different manufacturers
  • Data collected weekly
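To make the screen-scraping idea concrete, the sketch below pulls a title, price, and a few specifications out of a product page using only Python's standard `html.parser`. The markup, CSS class names, and the `scrape_product` helper are hypothetical; real retailer pages differ from one another and change over time, and the actual tool's implementation is not described in these slides.

```python
from html.parser import HTMLParser

# Hypothetical product-page markup, invented for illustration.
SAMPLE_PAGE = """
<div class="product">
  <h1 class="product-title">Example 60W-Equivalent A19 LED</h1>
  <span class="price">$12.97</span>
  <li class="spec" data-name="Lumens">800</li>
  <li class="spec" data-name="Dimmable">Yes</li>
</div>
"""


class ProductPageParser(HTMLParser):
    """Collects the text of elements whose class marks a field of interest."""

    def __init__(self):
        super().__init__()
        self.record = {}
        self._field = None  # name of the field the next text node belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class", "")
        if css_class == "product-title":
            self._field = "title"
        elif css_class == "price":
            self._field = "price"
        elif css_class == "spec":
            self._field = attrs.get("data-name")

    def handle_data(self, data):
        text = data.strip()
        if self._field and text:
            self.record[self._field] = text
            self._field = None

    def handle_endtag(self, tag):
        # Never carry a pending field name past the element that opened it.
        self._field = None


def scrape_product(html):
    """Parse one product page into a dict, converting price to a float."""
    parser = ProductPageParser()
    parser.feed(html)
    record = parser.record
    if "price" in record:
        record["price"] = float(record["price"].lstrip("$").replace(",", ""))
    return record


print(scrape_product(SAMPLE_PAGE))
```

Where a retailer offers an API, structured responses would normally be preferable to parsing page markup, since markup-based extraction has to be revisited whenever the page layout changes.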
A Difference in Magnitude: 2012 vs. 2014 Data Collection
[Figure: bar chart of number of products (y-axis, 0 to 2,000) by lamp property (x-axis), comparing 2012 manually collected data with 2014 web-crawler data. Lamp properties: Watts, Lamp Shape / Type, Lumens, Lumen Maintenance, Color Temperature, Efficacy (lpw), Dimmable (Y/N), Energy Star Qual. (Y/N), Warranty, CRI, Power Factor, Beam Angle, Input Voltage, R9, Color Consistency / Change, Candlepower (intensity), Product Weight, Power Type (AC/DC)]
Note: 2014 data is refreshed every week
Some Benefits of Big Data
• More data -> improvements to the regression analysis:
  • Individual models could be created for each lamp type
  • Additional independent variables analyzed
  • Comparable or improved explanatory power for each model
• New data is collected each week with minimal effort
  • Ability to monitor real-time performance and price changes
  • Observe trends in performance and price
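One way weekly snapshots could be turned into a price trend is sketched below: group observations by ISO week, take the median price per week, and compare the first and last weeks. The snapshot tuples, model IDs, and function names are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical weekly snapshots: (collection date, model ID, price in USD).
snapshots = [
    (date(2014, 9, 1), "A19-800", 12.97),
    (date(2014, 9, 1), "A19-450", 9.49),
    (date(2014, 9, 1), "BR30-650", 17.99),
    (date(2014, 9, 8), "A19-800", 11.97),
    (date(2014, 9, 8), "A19-450", 9.49),
    (date(2014, 9, 8), "BR30-650", 16.99),
]


def weekly_median_prices(rows):
    """Map (ISO year, ISO week) to the median observed price in that week."""
    by_week = defaultdict(list)
    for collected, _model, price in rows:
        iso_year, iso_week, _weekday = collected.isocalendar()
        by_week[(iso_year, iso_week)].append(price)
    return {week: median(prices) for week, prices in sorted(by_week.items())}


def percent_change(series):
    """Percent change in the median price from the first week to the last."""
    weeks = sorted(series)
    first, last = series[weeks[0]], series[weeks[-1]]
    return 100.0 * (last - first) / first


trend = weekly_median_prices(snapshots)
for week, price in trend.items():
    print(week, price)
print(f"change: {percent_change(trend):.1f}%")
```

The median is used here because it is less sensitive than the mean to a few unusually expensive or discounted listings; a study could equally track the regression coefficients themselves from week to week.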
Sample Regression Results
Best fit model is based on:
• Lumens
• Brand
• Energy Star Qualified
Metrics not independently impacting price include:
• Dimmable
• Color Temperature
• CRI
• Wattage
• Beam Angle
• Warranty Length
• Diameter
• Efficacy
• Lumen Maintenance
Observed Trends
Implications for IMC over Time
No more IMC for CRI?
• Moving towards a more dynamic understanding of IMC and performance
Developing a Dynamic Understanding of Products
• IMC is one of many issues that can be informed by data analysis
What’s the market’s baseline performance?
How do the best products perform?
How is performance changing over time?
What’s the incremental cost?
Outstanding Questions for this Effort
• Opportunities for Refinement
  • Identifying new ways to use data most effectively
  • Linking to product performance databases
  • Inconsistent retailer info and labeling
  • Developing improved links between in-store and online data
  • Better understanding regional pricing impacts (rebates)
• Remaining Questions
  • Legal issues of web-crawling
Applications of Webcrawlers and Big Data
• Agenda:
  • Review of webcrawlers applied to LED lamps
  • Applying webcrawlers to new product categories
  • Comparison of the webcrawling methodology to the CPUC’s IMC study
Building on Prior Work
• Ongoing Parallel Efforts
  • LED bulb webcrawlers
  • Assessing compliance with CEC Appliance Standards
  • Development of a paper on scaling the webcrawler effort
• Existing Infrastructure
  • Established extraction mechanisms for 9 different online retailers (including Best Buy, Costco, Lowe’s, Home Depot, Walmart, ACE Hardware)
Applying Past Experience to New Product Categories
• For New Products
  • Straightforward for retailers in the previous effort
  • Identify key product features to track
  • Perform initial data requests
  • Refine process over time to improve data quality
• For New Retailers
  • Requires significantly more time to understand the retailer's API and website layout
Sample Product Features: Refrigerators
• Primary Product Features:
  • Brand
  • Model
  • Configuration
  • Defrost Type
  • Through the Door Ice, Water
  • Total Capacity (Freezer, Fresh)
• Energy Data
  • Energy Star Qualified?
  • Annual Electricity Use
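The feature list above could map onto a record structure along the lines of the sketch below. The field names, the units (cubic feet, kWh per year), and the added `retailer` and `price_usd` fields are assumptions for illustration; the slides do not specify a schema.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class RefrigeratorRecord:
    """One scraped refrigerator listing (hypothetical schema)."""
    retailer: str                      # assumed field, not on the slide
    brand: str
    model: str
    configuration: str                 # e.g. "top-freezer", "side-by-side"
    defrost_type: str
    through_door_ice: Optional[bool]
    through_door_water: Optional[bool]
    freezer_capacity_cu_ft: Optional[float]
    fresh_capacity_cu_ft: Optional[float]
    energy_star_qualified: Optional[bool]
    annual_kwh: Optional[float]
    price_usd: Optional[float]         # assumed field, not on the slide

    @property
    def total_capacity_cu_ft(self) -> Optional[float]:
        """Freezer plus fresh-food capacity, if both were captured."""
        if self.freezer_capacity_cu_ft is None or self.fresh_capacity_cu_ft is None:
            return None
        return self.freezer_capacity_cu_ft + self.fresh_capacity_cu_ft

    def missing_fields(self):
        """Names of fields the retailer listing did not provide."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


example = RefrigeratorRecord(
    retailer="ExampleMart",            # invented values throughout
    brand="ExampleBrand",
    model="EX-2500",
    configuration="side-by-side",
    defrost_type="automatic",
    through_door_ice=True,
    through_door_water=True,
    freezer_capacity_cu_ft=7.5,
    fresh_capacity_cu_ft=17.5,
    energy_star_qualified=True,
    annual_kwh=None,
    price_usd=1299.00,
)
print(example.total_capacity_cu_ft, example.missing_fields())
```

Making most fields optional reflects that retailer listings are not always complete, so gaps can be tracked rather than silently dropped.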
Applications of Webcrawlers and Big Data
• Agenda:
  • Review of webcrawlers applied to LED lamps
  • Applying webcrawlers to new product categories
  • Comparison of the webcrawling methodology to the CPUC’s IMC study
IMC Estimation: A Comparison of Methodologies
• Traditional IMC Studies:
  • Given the high cost and time to execute, restricted to high-priority measures
  • Managed by the CPUC
  • Take 2-3 years to complete
  • Conducted every 3-5 years
• Webcrawler Approach:
  • Better suited to rapidly changing markets
  • Uses a similar analytic approach (e.g., hedonic price models)
  • Census of products on all available sites
  • New products can be added as needed
  • Data collected faster and in higher volume
  • Data must be adjusted for differences between brick-and-mortar and online price points
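The last point, adjusting online prices toward in-store prices, could be approached in several ways. The sketch below shows one simple option: estimate an average in-store-to-online price ratio from models observed in both channels, then scale online prices by it. This is an assumed approach for illustration, not the method used in the study, and the function names are hypothetical.

```python
from statistics import mean


def instore_adjustment_factor(matched_pairs):
    """Average in-store/online price ratio.

    matched_pairs: iterable of (online_price, instore_price) for the
    same product model observed in both channels.
    """
    ratios = [instore / online for online, instore in matched_pairs if online > 0]
    if not ratios:
        raise ValueError("need at least one pair with a positive online price")
    return mean(ratios)


def adjust_online_prices(online_prices, factor):
    """Scale online prices to approximate in-store price levels."""
    return [round(price * factor, 2) for price in online_prices]


# Hypothetical matched observations (USD), invented for illustration.
pairs = [(12.97, 10.97), (9.49, 8.99), (17.99, 15.49)]
factor = instore_adjustment_factor(pairs)
print(f"adjustment factor: {factor:.3f}")
print(adjust_online_prices([12.97, 24.99], factor))
```

A single ratio assumes the online premium is proportional across products; if it varies by product type or retailer, separate factors per group would be a natural refinement.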
Next Steps
• Test the approach on 1-2 products, then expand to the remaining products
• Timeline:
  • May take 1-2 months to develop IMC estimates for all product categories for current retailers
  • Additional time is required for additional retailers
Questions?