Automotive Information Research driven by Apache Solr Mario-Leander Reimer Chief Technologist, QAware GmbH [email protected] @LeanderReimer

Automotive Information Research driven by Apache Solr

Apr 16, 2017

Data & Analytics

QAware GmbH
Transcript
Page 1: Automotive Information Research driven by Apache Solr

Automotive Information Research driven by Apache Solr Mario-Leander Reimer

Chief Technologist, QAware GmbH

@LeanderReimer

Page 2: Automotive Information Research driven by Apache Solr

Agenda

Reverse Data Engineering and Exploration with MIR

Aftersales Information Research with AIR

Architecture, Requirements, Challenges

Solutions for the Problem of Combinatorial Explosion

Data Consistency and Timeliness

BOM Explosions and Demand Forecasts with ZEBRA

Page 3: Automotive Information Research driven by Apache Solr
Page 4: Automotive Information Research driven by Apache Solr

Reverse Data Engineering and Exploration with MIR

Page 5: Automotive Information Research driven by Apache Solr

How do we find the originating data silo for the desired data?

System A System B System C System D

Vehicle data Other data

Where to find the vehicle data? 60 potential systems with 5000 entities.

Page 6: Automotive Information Research driven by Apache Solr

How do we find the hidden relations between the systems?

How is the data linked? 400,000 potential relations.

Vehicle data Other data

System A System B System C System D

Parts Documents

Page 7: Automotive Information Research driven by Apache Solr

Reverse Data Engineering and Analysis with MIR and Solr

MIR manages the meta information, data models and record descriptions of all our source systems (RDBMS, XML, SOAP, …)

MIR lets you navigate and search the metadata and drill down into it easily using facets

MIR also manages the target data model and the Solr schema description
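As a rough illustration of the facet drill-down described above, the following sketch builds the query string for a faceted Solr search over metadata. The field names SYSTEM, ENTITY and ATTRIBUTE_NAME and the search term are hypothetical, since the slides do not show the MIR schema. Only standard Solr request parameters (q, fq, rows, facet, facet.field) are used.

```python
from urllib.parse import urlencode, parse_qs

def metadata_facet_query(term, drill_down=None):
    """Build a Solr query string: wildcard search plus facet counts."""
    params = [
        ("q", f"ATTRIBUTE_NAME:*{term}*"),  # hypothetical field, wildcard search
        ("rows", "20"),
        ("facet", "true"),
        ("facet.field", "SYSTEM"),          # hypothetical facet fields
        ("facet.field", "ENTITY"),
    ]
    # Each selected facet value becomes a filter query that narrows the result.
    for field, value in (drill_down or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return urlencode(params)

query_string = metadata_facet_query("chassis", {"SYSTEM": "System A"})
print(query_string)
```

Selecting another facet value simply adds one more fq parameter, so each drill-down step only narrows the previous result.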

Page 8: Automotive Information Research driven by Apache Solr

Search Results

Tree view of systems, tables and attributes

Drill down via facets

Wildcard Search

Found potential synonyms for the chassis number

Page 9: Automotive Information Research driven by Apache Solr

Aftersales Information Research with AIR

Page 10: Automotive Information Research driven by Apache Solr

Find the right information in less than 3 clicks.

The initial situation: Users had to use up to 7 different applications for their daily work.

The systems were poorly integrated.

Finding the correct information was laborious and error-prone.

The project vision: Combine the data into a consistent information network.

Make the information network and its data searchable and navigable.

Replace the existing applications with a single easy-to-use application.

Page 11: Automotive Information Research driven by Apache Solr


Page 12: Automotive Information Research driven by Apache Solr


Page 13: Automotive Information Research driven by Apache Solr

"But Apache Solr is only a full-text search engine. You have to use an Oracle database for your application data."

– Anonymous IT person

Page 14: Automotive Information Research driven by Apache Solr

Solr outperformed Oracle in query time as well as index size.

Query 1
Oracle: SELECT * FROM VEHICLE WHERE VIN LIKE 'V%'
Solr: INFO_TYPE:VEHICLE AND VIN:V*

Query 2
Oracle: SELECT * FROM MEASURE WHERE TEXT='engine'
Solr: INFO_TYPE:MEASURE AND TEXT:engine

Query 3
Oracle: SELECT * FROM VEHICLE WHERE VIN LIKE '%X%'
Solr: INFO_TYPE:VEHICLE AND VIN:*X*

Query times (three measurements per query)

| Query 1 | Solr   | 038 ms | 000 ms | 000 ms |
| Query 1 | Oracle | 383 ms | 384 ms | 383 ms |
| Query 2 | Solr   | 092 ms | 000 ms | 000 ms |
| Query 2 | Oracle | 389 ms | 387 ms | 386 ms |
| Query 3 | Solr   | 039 ms | 000 ms | 000 ms |
| Query 3 | Oracle | 859 ms | 379 ms | 383 ms |

Disk space: 132 MB Solr vs. 385 MB Oracle

Test data set: 150,000 records
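The prefix and infix searches in this comparison map SQL pattern wildcards onto Solr wildcards. A minimal sketch of that mapping, assuming plain patterns without escaped characters (the helper name is made up for illustration):

```python
def like_to_solr_wildcard(field, like_pattern):
    """Translate a SQL LIKE pattern into a Solr wildcard term query.

    SQL '%' (any run of characters) maps to Solr '*',
    SQL '_' (exactly one character) maps to Solr '?'.
    Escaping of special characters is deliberately left out of this sketch.
    """
    translated = like_pattern.replace("%", "*").replace("_", "?")
    return f"{field}:{translated}"

print(like_to_solr_wildcard("VIN", "V%"))   # VIN:V*
print(like_to_solr_wildcard("VIN", "%X%"))  # VIN:*X*
```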

Page 15: Automotive Information Research driven by Apache Solr

The dirt race use case:
• No internet connection
• Low-end devices

Page 16: Automotive Information Research driven by Apache Solr

Solr and AIR on a Raspberry Pi Model B as PoC worked like a charm!

Running Debian Linux + JDK 8

Jetty servlet container with the Solr and AIR web apps deployed

A reduced offline data set with ~1.5 million Solr documents

Model B hardware specs: ARMv6 CPU at 700 MHz, 512 MB RAM, 32 GB SD card

And now try this with Oracle!

Page 17: Automotive Information Research driven by Apache Solr

A careful schema design is crucial for your Solr performance.

Page 18: Automotive Information Research driven by Apache Solr

Naive denormalization quickly leads to combinatorial explosion!

33,071,137 Vehicles

14,830,197 Flat Rate Units

1,678,667 Packages

5,078,411 FRU Groups

18,573 Repair Instructions

648,129 Technical Documents

55,000 Parts

648,129 Measures

41,385 Types

6,180 Fault Indications

Diagram legend: Relationship, Navigation
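To get a feeling for the scale, the following sketch multiplies two of the entity counts from this slide. It assumes, purely for illustration, that every vehicle is related to every flat rate unit. The real relation cardinalities are not given in the slides, so this is an upper bound, not a measured number.

```python
# Entity counts taken from the slide.
vehicles = 33_071_137
flat_rate_units = 14_830_197

# Assumption for illustration only: every vehicle is related to every
# flat rate unit, so a naive flat index needs one document per pair.
worst_case_documents = vehicles * flat_rate_units

print(f"{worst_case_documents:.2e} documents in the worst case")
```

Even a small fraction of that product is far beyond what a single flattened index can reasonably hold, which is why the schema has to keep the entities separate and model the relations explicitly.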

Page 19: Automotive Information Research driven by Apache Solr

Multi-valued fields can efficiently store 1..n relations, but may result in false positives.

{
  "INFO_TYPE": "AWPOS_GROUP",
  "NUMMER": ["1134190", "1235590"],
  "BAUSTAND": ["1969-12-31T23:00:00Z", "1975-12-31T23:00:00Z"],
  "E_SERIES": ["F10", "E30"]
}

Values at the same array position (index 0, index 1) belong together.

Correct match, both values at index 0:
fq=INFO_TYPE:AWPOS_GROUP AND NUMMER:1134190 AND E_SERIES:F10

False positive, values at index 0 and index 1:
fq=INFO_TYPE:AWPOS_GROUP AND NUMMER:1134190 AND E_SERIES:E30

If false positives from Solr are acceptable, post-filter the results in your application.

Alternative: current Solr versions support nested child documents; use those instead.
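The false positive and the application-side post-filter can be sketched as follows. This is a simplified in-memory model of the document from this slide, not Solr code, and the function names are made up for illustration.

```python
doc = {
    "INFO_TYPE": "AWPOS_GROUP",
    "NUMMER": ["1134190", "1235590"],
    "E_SERIES": ["F10", "E30"],
}

def solr_like_match(doc, nummer, e_series):
    """Multi-valued matching: each value may occur anywhere in its field."""
    return nummer in doc["NUMMER"] and e_series in doc["E_SERIES"]

def aligned_match(doc, nummer, e_series):
    """Post-filter check: both values must sit at the same index."""
    return any(n == nummer and e == e_series
               for n, e in zip(doc["NUMMER"], doc["E_SERIES"]))

def post_filter(docs, nummer, e_series):
    """Keep only the documents whose values really belong together."""
    return [d for d in docs if aligned_match(d, nummer, e_series)]

print(solr_like_match(doc, "1134190", "E30"))  # True  (false positive)
print(aligned_match(doc, "1134190", "E30"))    # False (removed by the post-filter)
print(aligned_match(doc, "1134190", "F10"))    # True
```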

Page 20: Automotive Information Research driven by Apache Solr

Technical documents and their validity were expressed and stored in a binary representation.

Validity expressions may have up to 46 characteristics

Validity expressions use 5 different Boolean operators (AND, NOT, …)

Validity expressions can be nested and complex

Some characteristics are dynamic and not even known at index time

The solution: transform the validity expressions into the equivalent ternary JavaScript terms and evaluate these terms at query time using a custom function query filter.

Page 21: Automotive Information Research driven by Apache Solr

Binary validity expression example.

Type(53078923) = 'Brand', Value(53086475) = 'BMW PKW'

Type(53088651) = 'E-Series', Value(53161483) = 'F10'

Type(64555275) = 'Transmission', Value(53161483) = 'MECH'

Page 22: Automotive Information Research driven by Apache Solr

Transformation of the binary validity terms into their JavaScript equivalent at index time.

AND(Brand='BMW PKW', E-Series='F10', Transmission='MECH')

((BRAND=='BMW PKW')&&(E_SERIES=='F10')&&(TRANSMISSION=='MECH'))

{
  "INFO_TYPE": "TECHNISCHES_DOKUMENT",
  "DOKUMENT_TITEL": "Getriebe aus- und einbauen",
  "DOKUMENT_ART": "reparaturanleitung",
  "VALIDITY": "((BRAND=='BMW PKW')&&((E_SERIES=='F10')&&(...))",
  "BRAND": ["BMW PKW"]
}
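A minimal sketch of such an index-time transformation, assuming the binary term has already been decoded into a nested tuple structure. The tuple layout and the function name are hypothetical, and only AND, OR and NOT are handled because the slides do not list all five operators.

```python
def to_js(term):
    """Render a nested validity term as a JavaScript boolean expression.

    A term is either ("EQ", name, value) or (operator, [sub-terms]).
    Values are not escaped here; a real implementation would have to
    handle quotes inside values.
    """
    op = term[0]
    if op == "EQ":
        _, name, value = term
        return f"({name}=='{value}')"
    parts = [to_js(sub) for sub in term[1]]
    if op == "NOT":
        return f"(!{parts[0]})"
    joiner = {"AND": "&&", "OR": "||"}[op]
    return "(" + joiner.join(parts) + ")"

validity = ("AND", [
    ("EQ", "BRAND", "BMW PKW"),
    ("EQ", "E_SERIES", "F10"),
    ("EQ", "TRANSMISSION", "MECH"),
])
print(to_js(validity))
# ((BRAND=='BMW PKW')&&(E_SERIES=='F10')&&(TRANSMISSION=='MECH'))
```

Because the recursion mirrors the nesting of the expression, arbitrarily deep terms produce a single string that can be stored in the VALIDITY field.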

Page 23: Automotive Information Research driven by Apache Solr

The JavaScript validity term is evaluated at query time using a custom function query.

&fq=INFO_TYPE:TECHNISCHES_DOKUMENT
&fq=DOKUMENT_ART:reparaturanleitung
&fq={!frange l=1 u=1 incl=true incu=true cache=false cost=500}
jsTerm(VALIDITY,eyJNT1RPUl9LUkFGVFNUT0ZGQVJUX01PVE9SQVJCRUlUU
1ZFUkZBSFJFTiI6IkIiLCJFX01BU0NISU5FX0tSQUZUU1RPRkZBUlQiOm51bG
wsIlNJQ0hFUkhFSVRTRkFIUlpFVUciOiIwIiwiQU5UUklFQiI6IkFXRCIsIkV
kJBVVJFSUhFIjoiWCcifQ==)

Base64 decode, for example: { "BRAND":"BMW PKW", "E_SERIES":"F10", "TRANSMISSION":"MECH" }

http://qaware.blogspot.de/2014/11/how-to-write-postfilter-for-solr-49.html
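On the client side, the filter parameter could be assembled roughly like this. jsTerm is the custom function named on the slide. The encoding of its second argument (UTF-8 JSON, standard Base64) is an assumption inferred from the slide, and the helper name is made up.

```python
import base64
import json

def validity_filter(context):
    """Build the fq value for the validity filter shown on the slide.

    Returns the filter string and the Base64 payload it contains.
    """
    payload = base64.b64encode(json.dumps(context).encode("utf-8")).decode("ascii")
    local_params = "{!frange l=1 u=1 incl=true incu=true cache=false cost=500}"
    return f"{local_params}jsTerm(VALIDITY,{payload})", payload

context = {"BRAND": "BMW PKW", "E_SERIES": "F10", "TRANSMISSION": "MECH"}
fq, payload = validity_filter(context)
print(fq)

# Round trip: decoding the payload yields the original characteristics.
decoded = json.loads(base64.b64decode(payload))
```

The value still needs URL-encoding when it is sent over HTTP, since Base64 output can contain +, / and =.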

Page 24: Automotive Information Research driven by Apache Solr

Custom ETL combined with Continuous Delivery and DevOps ensures data consistency and timeliness.

Page 25: Automotive Information Research driven by Apache Solr

BOM Explosions and Demand Forecasts with ZEBRA

Page 26: Automotive Information Research driven by Apache Solr

Bills of Materials (BOMs) explained

Page 27: Automotive Information Research driven by Apache Solr

BOMs are required for …

Production planning

Forecasting demand

Scenario-based planning

Simulations

Page 28: Automotive Information Research driven by Apache Solr

The Big Picture of ZEBRA

Parts / abstract demands

Orders / actual demands

Analytics

BOMs / dependent demands

Demand Resolver

Production Planning

Data volumes shown in the diagram: 7 million, 2 million, 21 billion

Page 29: Automotive Information Research driven by Apache Solr

The most essential Solr optimizations in ZEBRA

Computing BOM explosions

Bulk RequestHandler: solve thousands of orders with one request.

Binary DocValue support: speed up access to persisted data dramatically using binary doc values.

Boolean interpreter as post filter: enable Solr, via custom post filters, to filter documents using stored Boolean expressions.

Mass data binary response format: use the standard Solr binary codec with an optimized data model that reduces the amount of data by a factor of 8.

Search components with custom JOIN algorithm: store data efficiently using our own JOIN implementation.

Page 30: Automotive Information Research driven by Apache Solr

Low-level optimizations can yield great boosts in performance

Chart: development of the processing time, Demand Calculation Service PoC. Axis: time to calculate the BOM for one order, October 14 to October 15 (marks at January 15 and May 15). Data labels: 24 ms, 4.9 ms, 0.28 ms, 0.08 ms.

Profiling result and some improvements to reduce the query time:

Scoring (-8%)

Default Query Parser (-25%)

Stat-Cache (-8%)

String DocValues (-28%)

Page 31: Automotive Information Research driven by Apache Solr

Solr has become a powerful tool for building enterprise and data analytics applications. Be creative!

Page 32: Automotive Information Research driven by Apache Solr

Mario-Leander Reimer, Chief Technologist, QAware GmbH

https://www.qaware.de https://slideshare.net/MarioLeanderReimer/ https://speakerdeck.com/lreimer/ https://twitter.com/leanderreimer/