Example and Use Cases: TIBCO Statistica™ Live Score® Server

Document Updated: April 2019


Oct 27, 2019


dariahiddleston

CONFIDENTIALITY

The following information is confidential information of TIBCO Software Inc. Use, duplication, transmission, or republication for any purpose without the prior written consent of TIBCO is expressly prohibited.

DISCLAIMER

This document (including, without limitation, any product roadmap or statement of direction data) illustrates the planned testing, release and availability dates for TIBCO products and services. This document is provided for informational purposes only and its contents are subject to change without notice. TIBCO makes no warranties, express or implied, in or relating to this document or any information in it, including, without limitation, that this document, or any information in it, is error-free or meets any conditions of merchantability or fitness for a particular purpose. This document may not be reproduced or transmitted in any form or by any means without our prior written permission.

The material provided is for informational purposes only, and should not be relied on in making a purchasing decision. The information is not a commitment, promise or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any features or functionality described for our products remains at our sole discretion.

During the course of this presentation TIBCO or its representatives may make forward-looking statements regarding future events, TIBCO’s future results or our future financial performance. These statements are based on management’s current expectations. Although we believe that the expectations reflected in the forward-looking statements contained in this presentation are reasonable, these expectations or any of the forward-looking statements could prove to be incorrect and actual results or financial performance could differ materially from those stated herein. TIBCO does not undertake to update any forward-looking statement that may be made from time to time or on its behalf.


© Copyright 2000-2016 TIBCO Software Inc.

What is it all about

Scoring [in this context]: making predictions / business decisions based on data (analytic model)
• simply: applying a model to new data

Batch scoring: process data [usually a large number of records] on schedule or on demand
• e.g. automatic weekly rating of insurance claims for fraud detection / subrogation opportunities; periodic re-ranking of a bank’s customers for risk management (Value-at-Risk assessment); scanning customer data for marketing campaign selection

Live scoring: process data [usually single cases] in real time, only the relevant records and only when needed (sometimes as they are created)
• e.g. online fraud detection; underwriting support for agents working with prospective customers; customer service / marketing support: credit scorecards, segmentation, up-sell/cross-sell, churn

• In Statistica, the same model can be used for both processing modes
• Batch: can be scheduled through Statistica Server (web) or run on demand (interactively); some customers prefer existing third-party schedulers, so there is a command-line option
• Live: handled by the Statistica scoring engine, Live Score
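
The idea that one model serves both processing modes can be illustrated with a minimal Python sketch. This is not Statistica code: the scoring rule, the field names (duration_months, amount) and the function names are hypothetical stand-ins for a trained model and its inputs.

```python
from typing import Dict, Iterable, List

Record = Dict[str, float]


def score_record(record: Record) -> float:
    """Hypothetical stand-in for a trained model: returns a score in [0, 1]."""
    raw = 0.02 * record["duration_months"] + 0.00005 * record["amount"]
    return max(0.0, min(1.0, raw))


def score_batch(records: Iterable[Record]) -> List[float]:
    """Batch mode: score many records in one scheduled or on-demand run."""
    return [score_record(r) for r in records]


def score_live(record: Record) -> float:
    """Live mode: score a single case at the moment it is needed."""
    return score_record(record)
```

Both entry points call the same score_record function, so a model update changes batch and live results together.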

What is Live Score

• Live Score is a web service within the Statistica platform
• Data is prepared. Models are trained and validated using the Statistica algorithms. The models are then deployed to the Statistica Live Score Server. Live Score provides efficient, scalable and platform-independent real-time scoring of data from line-of-business applications.

The primary purpose of this solution is to incorporate the resulting predictive models into the respective business processes, realizing the real value of analytics.

Benefits of Statistica Live Score

• Flexibility:
  • IT-friendly: dynamic access to all Statistica facilities and third-party components through server-side SVB scripting
  • quick modifications/extensions compared to a compiled/hard-coded implementation
  • customizable input/output structures for web service requests, for easy integration with existing systems
  • rapid pathway to deployment for model updates with all the benefits of Enterprise (versioning, audit logs)
  • same models/workspaces can be used for both live and batch use cases
  • simple model migration between environments (Dev / QA / Prod)
• Performance: 50-500 ms roundtrip, depending on the specific processing logic
• Scalability: stateless, easy to load-balance, near-linear scaling with the server count in the farm
• Reliability: failover/high-availability configurations, service health check, error notifications
• Adherence to industry standards: OLEDB/ODBC; PMML; SOAP/XML, WSDL; COM, VB / R

Overview: Model Lifecycle

This is an interesting and complex topic, and the details are out of scope for this presentation. Below are the high-level steps necessary to produce a model ‘worthy’ of deploying in Live Score.

• Business understanding, buy-in from stakeholders, requirements, resources (analysts / IT / hardware)
• Implementation (high availability/throughput), validation, training, integration with current processes
• Data acquisition and preparation
• Model building: analytic models, business rules
• Analytic/business processes expressed in reusable, parameterizable workflows (Workspaces)
• Deployment to a centralized, shared and access-controlled location (Enterprise database, managed by the Statistica Enterprise Manager application)
• Model versioning, approval, preproduction testing, migration to the Production environment
• Model quality monitoring, then updating models that show poor quality

Overview: Using Statistica Live Score

• Scoring process in a nutshell: which model to use, which data to score, what to do with the results
• Data is in a single analytic record format, representing an entity such as a claim or a customer
• Request/response to the web service over HTTP/SOAP (XML)
• Pass all the input data, some of it, or just an entity identifier [used by Live Score to assemble the record, e.g. through a stored procedure in a database]
• Return the results (in an arbitrary/customized format) in the response or write them to the database
• Additional optional switches control service behavior (e.g. caching) or client-specific customizations
• Integration with multiple data sources through standard data access protocols; the scoring engine can be part of a larger event-processing workflow within the customer’s IT environment
• The service interface is expressed by a standard WSDL definition; validate it by load-testing with a toolset like SoapUI; rapidly implement client-side code using standard development tools, e.g. Microsoft Visual Studio
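
A request that passes only an entity identifier plus an optional switch could look roughly like the envelope built by the Python sketch below. The SOAP 1.1 envelope namespace is standard; the service namespace (urn:example:livescore), the operation name ScoreCase and the element names CaseId and UseCache are assumptions made for illustration, since the real names come from the deployed WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:livescore"  # hypothetical service namespace


def build_score_request(case_id: str, use_cache: bool = True) -> bytes:
    """Build a SOAP request that carries only a case identifier and one switch."""
    ET.register_namespace("soapenv", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # Hypothetical operation and parameter names.
    operation = ET.SubElement(body, f"{{{SERVICE_NS}}}ScoreCase")
    ET.SubElement(operation, f"{{{SERVICE_NS}}}CaseId").text = case_id
    ET.SubElement(operation, f"{{{SERVICE_NS}}}UseCache").text = (
        "true" if use_cache else "false"
    )
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    print(build_score_request("CLAIM-001").decode("utf-8"))
```

The resulting bytes would be sent as the body of an HTTP POST to the service endpoint.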

Interactive Example: Start with Credit Scoring dataset

There are two user interfaces for creating models. You can use a spreadsheet and select menu items (i.e. interactive/quick model building), or you can create a workspace. Typically, models are created with workspaces, which document the model-building steps and make it easier to rebuild (i.e. refresh) the model later. This example uses interactive model building.

• Start Statistica. Select Home menu → Open menu → Open Examples menu.
• Open the Datasets folder and select/open CreditScoring.sta.
• Rename all the variables to use underscores, e.g. Credit_scoring, Duration_of_Credit.

Interactive Example: Create a prediction model and deploy it…

We used a robust modeling method called Boosted Trees, which is used in financial services.

• Select Data Mining menu → Boosted Trees menu
• Select Classification Analysis
• Select Variables as shown in the image below and click OK to watch the model build
• Select Report tab → Code generator pull-down list → Deployment to Statistica Enterprise
• This saves the model in the database (metadata store). You can view the model by starting the Statistica Enterprise Manager application.

You want to save the model as a “deploy new object”.

Workspace Example: Start with Credit Scoring dataset

• Start Statistica. Select Home menu → New menu → New Workspace menu.
• Select the blank workspace template.
• When the Select Data Source dialog opens, click the Files button.
• Browse to Statistica’s executable directory, for example C:\Program Files\Statistica\Statistica 13\Examples\Datasets
• Select CreditScoring.sta, which adds a green node.
• Double-click the CreditScoring node to open the dataset.
• Rename all the variables to use underscores, e.g. Credit_scoring, Duration_of_Credit.

Workspace Example: Create a prediction model

We used a robust modeling method called Boosted Trees, which is used in financial services.

• Click the green node, then select Data Mining menu → Boosted Classification Trees menu
• A new node is added and connected to the data (green node)
• Select Variables as shown in the image below and click Run All to build the model

Workspace Example: Deploy it…

• Double-click the Boosted Classification Trees (2) node
• Select the PMML tab / Report tab
• Select the Deploy to Enterprise button (Code generator pull-down list → Deployment to Statistica Enterprise)
• This saves the model in the database (metadata store). You can view the model by starting the Statistica Enterprise Manager application. You want to save the model as a “deploy new object”.
• Now save “Workspace1” to the metadata store. Use this workspace to rebuild the model when it needs to be refreshed. Select Enterprise menu → Deploy to Enterprise menu → Workspace

Configure WSDL (i.e. the transaction)

• The model is deployed to the Statistica Enterprise database in a standard, portable format: PMML (XML model description)
• Load the WSDL definition from “C:\WebSTATISTICAPub\RepositoryRoot\System\Scripts\Live Score Sample.wsdl” into one of the many development environments that support SOAP web services, e.g. SoapUI NG Pro, and quickly generate a web service client.
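
Before generating a client, it can help to see which operations a WSDL exposes. The Python sketch below lists them with the standard library. The inline WSDL fragment is a made-up, heavily trimmed example (not the contents of Live Score Sample.wsdl), and the operation names in it are hypothetical.

```python
import xml.etree.ElementTree as ET

WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}  # WSDL 1.1 namespace

# Hypothetical, trimmed WSDL used only to exercise the function below.
SAMPLE_WSDL = """<?xml version="1.0"?>
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             name="LiveScoreSample"
             targetNamespace="urn:example:livescore">
  <portType name="LiveScorePort">
    <operation name="ScoreCase"/>
    <operation name="HealthCheck"/>
  </portType>
</definitions>"""


def list_operations(wsdl_text: str) -> list:
    """Return the operation names declared in the WSDL's portType elements."""
    root = ET.fromstring(wsdl_text)
    operations = root.findall("wsdl:portType/wsdl:operation", WSDL_NS)
    return [op.get("name") for op in operations]


if __name__ == "__main__":
    print(list_operations(SAMPLE_WSDL))
```

To inspect a real file, read its text from disk and pass it to list_operations instead of SAMPLE_WSDL.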

Validate and test the service endpoint

Verify the WSDL (service endpoint) using a web service testing tool, e.g. SoapUI
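
Part of such a verification is checking that a response is a well-formed SOAP envelope and not a fault. A minimal Python sketch of that check is shown below, assuming SOAP 1.1 (where faultstring is an unqualified child of the Fault element). The function name and return strings are illustrative choices, not part of any Live Score tooling.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def check_soap_response(xml_text: str) -> str:
    """Return 'ok' for a normal response or 'fault: <message>' for a SOAP fault."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{SOAP_NS}}}Envelope":
        raise ValueError("not a SOAP 1.1 envelope")
    body = root.find(f"{{{SOAP_NS}}}Body")
    if body is None:
        raise ValueError("SOAP Body is missing")
    fault = body.find(f"{{{SOAP_NS}}}Fault")
    if fault is not None:
        # In SOAP 1.1 the faultstring element carries the human-readable message.
        return "fault: " + (fault.findtext("faultstring") or "unknown")
    return "ok"
```

A testing tool performs this kind of assertion for every request in a test suite; the sketch only shows the logic for a single response.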

More complex scenarios

How does a more complex scenario differ from what we just saw?

• Workspaces: business-analyst-friendly, not requiring IT/development implementation. Some loss of performance compared to hard-coded/compiled solutions is more than offset by the flexibility of the system and the control that analysts gain from IT
• The deployment tool does not [yet] support automatic generation of WSDL for workspace models. The feature has been delayed because each customer’s requirements so far have been so unique [mainly due to the desire to fit the scoring component into existing systems without modifying them] that manual adjustments were always required and no generalization was possible; still, this feature is on the roadmap
• What we need to know for now is that request processing is handled by customizable server-side SVB macros that interpret the request parameters, retrieve and set up the workspace models (from the Enterprise database or possibly from cache), execute them and handle the results per the customer’s requirements

Workspaces in Live Score

Workspace complexity may vary across customers, lines of business and specific tasks, and the seemingly simple ones can pack a lot of punch, e.g. conditional logic, per-case selection among multiple models, and Enterprise-linked rulesets contained in a single Rules node.

Almost all workspaces in Live Score follow a variation on this path: get the case from the database [using parameterization] for scoring, apply analytics of arbitrary complexity, write the results back to the database.
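
The get / score / write-back path can be sketched in Python with an in-memory SQLite database standing in for the customer database. The table names (cases, scores), the column names and the scoring rule are all hypothetical; in Live Score these steps are workspace nodes, not Python functions.

```python
import sqlite3


def score(row: dict) -> float:
    """Hypothetical stand-in for the workspace analytics."""
    return round(min(1.0, row["amount"] / 10000.0), 4)


def process_case(conn: sqlite3.Connection, case_id: str) -> float:
    # 1. Get the single case, with the query parameterized by its identifier.
    found = conn.execute(
        "SELECT case_id, amount FROM cases WHERE case_id = ?", (case_id,)
    ).fetchone()
    if found is None:
        raise KeyError(case_id)
    row = {"case_id": found[0], "amount": found[1]}
    # 2. Apply the analytics.
    result = score(row)
    # 3. Write the result back, keyed by the same case identifier.
    conn.execute(
        "INSERT INTO scores (case_id, score) VALUES (?, ?)", (case_id, result)
    )
    conn.commit()
    return result


def demo_connection() -> sqlite3.Connection:
    """Create a throwaway database with one case to score."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cases (case_id TEXT PRIMARY KEY, amount REAL)")
    conn.execute("CREATE TABLE scores (case_id TEXT, score REAL)")
    conn.execute("INSERT INTO cases VALUES ('C-1', 2500.0)")
    conn.commit()
    return conn
```

Calling process_case(demo_connection(), "C-1") walks through all three steps for the sample case.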

Workspace parameterization

There are several ways to parameterize a workspace. The one most used in Live Score is parameterization of a database query / stored procedure (Enterprise data configurations). This lets us collect a single case of data for scoring based on e.g. a case identifier, and also provide other context-specific “switches” to guide data case assembly and analytic processing, e.g. segmentation (model selection).

[Image: from the Statistica Enterprise Manager application, which manages metadata and models.]

• Another option is to parameterize a node by using a global workspace Dictionary. For example, we could provide an override switch in the web service request that turns off the ‘archive to database’ step for QA/load test runs so as not to pollute the results table.
• The most general form of parameterization is the server-side macro handling requests.
• Any request input is a parameter that can control arbitrary VB logic, e.g. choosing to invoke different workspaces altogether.
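
The override-switch idea can be sketched as request parameters merged over defaults, with one switch disabling the archive step and another selecting the model. This Python sketch only mirrors the concept: the switch names (archive_to_database, segment), the model naming and the list standing in for the results table are assumptions, and the actual mechanism is the workspace Dictionary and the SVB macro.

```python
# Hypothetical defaults that a request may override.
DEFAULTS = {"archive_to_database": True, "segment": "default"}


def resolve_parameters(request_params: dict) -> dict:
    """Merge request overrides onto the defaults, rejecting unknown switches."""
    if "case_id" not in request_params:
        raise ValueError("case_id is required")
    unknown = set(request_params) - set(DEFAULTS) - {"case_id"}
    if unknown:
        raise ValueError(f"unknown switches: {sorted(unknown)}")
    return {**DEFAULTS, **request_params}


def run_workspace(params: dict, results_table: list) -> dict:
    """Pick a model by segment and archive the result only if the switch is on."""
    result = {"case_id": params["case_id"], "model": "model_" + params["segment"]}
    if params["archive_to_database"]:
        results_table.append(result)  # stands in for the database write
    return result
```

A QA run would pass archive_to_database=False, so its result is returned but never reaches the results table.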

Server-side SVB macros handle Live Score requests

• The WSDL includes a SOAP operation name (customizable e.g. through the Live Score Deployment tool) that is matched to a particular server-side Statistica Visual Basic (SVB) macro
• Different scripts can be used to implement various applications / scoring tasks
• The macro has access to the request’s SOAP/XML content and has control over the response content, e.g. it can return an XML blob of arbitrary complexity instead of a single-value model score
• Most frequently the input data is not passed in with the request but is referenced through a ‘case ID’ identifying e.g. a customer or a claim record in the database; in this case the macro needs to parameterize the workspace with this ID, which can then be used to parameterize a query or a stored procedure invocation to obtain a single-case input data record for scoring
• Another important step is dealing with the scoring results: some customers prefer to return them in the response [usually in some custom structure], some place them into a known table in a database with the ‘case ID’ for reference, but in all cases there is a requirement to log/archive the result as well as the fact of the scoring request, usually in a database
• For that, there is a Database Writeback node for Workspaces
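
The matching of an operation name to a handler can be sketched as a dispatch table. The Python below is an analogy for the SVB macro lookup, not Live Score's implementation: the operation name ScoreCase, the element names and the placeholder score are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def handle_score_case(operation: ET.Element) -> ET.Element:
    """Hypothetical handler: reads the case ID and returns a custom XML result."""
    response = ET.Element("ScoreCaseResponse")
    ET.SubElement(response, "CaseId").text = operation.findtext("CaseId")
    ET.SubElement(response, "Score").text = "0.5"  # placeholder, not a real score
    return response


# Operation name -> handler, analogous to operation name -> server-side macro.
HANDLERS = {"ScoreCase": handle_score_case}


def dispatch(request_xml: str) -> str:
    """Find the operation element in the SOAP Body and call its handler."""
    root = ET.fromstring(request_xml)
    body = root.find(f"{{{SOAP_NS}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("SOAP Body has no operation element")
    operation = body[0]
    name = operation.tag.rsplit("}", 1)[-1]  # strip a namespace, if present
    handler = HANDLERS.get(name)
    if handler is None:
        raise LookupError(f"no handler registered for operation {name!r}")
    return ET.tostring(handler(operation), encoding="unicode")
```

Because the handler builds the response element itself, it can return a structure of any shape, which is the point made about returning an XML blob in place of a single score.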

Example: how this all fits together

Scoring is initiated by a ‘client’. The client can take very different forms: an interactive thick-client app for customer service agents, a browser-based self-service customer-facing app, or a ‘headless’ piece of code in a larger business workflow. It can look like an intranet web site:

or this Informatica workflow invoking a web service…

Example: client internals

Underneath it all is a piece of code like this one, with the client-server communication encapsulated in a method of an object that is usually auto-generated from the WSDL file; that’s the whole [albeit simplified] client:
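
A client of this kind might look like the following minimal Python sketch. It is not the code from the slide. The class name, the ScoreCase operation and the element names are hypothetical, and the HTTP call is replaced by an injectable transport function so the example runs without a server.

```python
import xml.etree.ElementTree as ET
from typing import Callable
from xml.sax.saxutils import escape

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


class LiveScoreClient:
    """Hypothetical hand-written equivalent of a WSDL-generated proxy object."""

    def __init__(self, transport: Callable[[str], str]):
        # transport sends the request XML and returns the response XML.
        self._transport = transport

    def score(self, case_id: str) -> float:
        request = (
            f'<soapenv:Envelope xmlns:soapenv="{SOAP_NS}"><soapenv:Body>'
            f"<ScoreCase><CaseId>{escape(case_id)}</CaseId></ScoreCase>"
            "</soapenv:Body></soapenv:Envelope>"
        )
        root = ET.fromstring(self._transport(request))
        text = root.findtext(f"{{{SOAP_NS}}}Body/ScoreCaseResponse/Score")
        if text is None:
            raise ValueError("response contains no Score element")
        return float(text)


def fake_transport(request_xml: str) -> str:
    """Stand-in for the HTTP POST; always returns a canned response."""
    return (
        f'<soapenv:Envelope xmlns:soapenv="{SOAP_NS}"><soapenv:Body>'
        "<ScoreCaseResponse><Score>0.42</Score></ScoreCaseResponse>"
        "</soapenv:Body></soapenv:Envelope>"
    )


if __name__ == "__main__":
    print(LiveScoreClient(fake_transport).score("C-9"))
```

From the caller's point of view the whole exchange is one method call, which is what a generated proxy provides.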

Example: this is what goes ‘on the wire’

Client and server speak the common language of Service-Oriented Architecture: SOAP/XML (over HTTP).

Note here that we needed to replace the legacy scoring engine with Live Score with a minimum of modifications to the rest of the infrastructure. Hence we basically take the legacy XML request, put it into a string, unpack it on the server, score the data, then package the response in the legacy XML format and send it back as a string in the SOAP response. The team’s preference for a string over strongly typed XML was due to the simplicity of plugging it into the existing processing pipeline: since XML validation is done by upstream/downstream components, the developers did not want to unpack/pack/validate it again.
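
Carrying legacy XML as a plain string inside a SOAP message can be sketched as follows in Python. The operation name ScoreLegacy and the element name RequestXml are hypothetical; the point is only that the payload travels as escaped text and is parsed again on the other side.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def wrap_legacy_xml(legacy_xml: str) -> str:
    """Put the legacy XML document into a string parameter of a SOAP request."""
    ET.register_namespace("soapenv", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    operation = ET.SubElement(body, "ScoreLegacy")  # hypothetical operation name
    # Assigning to .text makes the serializer escape the markup (< becomes &lt;).
    ET.SubElement(operation, "RequestXml").text = legacy_xml
    return ET.tostring(envelope, encoding="unicode")


def unwrap_legacy_xml(soap_xml: str) -> ET.Element:
    """Receiving side: read the string parameter and parse it as XML again."""
    root = ET.fromstring(soap_xml)
    payload = root.findtext(f"{{{SOAP_NS}}}Body/ScoreLegacy/RequestXml")
    if payload is None:
        raise ValueError("RequestXml parameter is missing")
    return ET.fromstring(payload)
```

The SOAP layer never validates the inner document, which matches the stated reason for choosing a string: validation stays with the upstream and downstream components.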

Example: that’s what happens on the server

Then the response XML is loaded into a string output parameter, and we send a similarly structured SOAP response back to the client.

Logging service activity, debugging problems

• Logging: it is recommended to adjust the logging settings after installation to increase the level of detail collected and to enable some additional logging that can help diagnose rare, unreproducible issues that might occur days or weeks after the service started, e.g. triggered by a specific data input that has never occurred since
• Debugging non-interactive services consists mostly of log inspection (files and Windows Event Log), tracing (either built-in or sometimes custom output) and review of process snapshots (created automatically in case of major failures during request processing)
• Log file locations, what those logs cover
• Windows Event Log: can create notifications (e.g. send an email) based on error events
• Sample trace output, with custom entries
• What are process snapshots
• In rare cases you might need to restart or update the service in Production. Here a load-balanced server farm comes in handy, as one can temporarily take one server out of the farm without affecting the overall behavior of the system.

“Advanced” Configuration 101

• The service works out of the box after installation, with minimal setup, e.g. creating user account(s) and setting up some models for scoring. A preinstalled sample server-side macro lets one score workspace models with minimal customization.
• Most users want to customize Live Score to fit into their existing business processes, which will take some effort.
• Most configuration options available to administrators are at their optimal settings, with the exception of logging, which should be made more detailed, and the number of STATCF processes, which should be set to double the number of processors purchased.
• Some of the useful options…
• Administrative Console
• Precaching: controlled through InitInstance.svb & Registry keys
• The most frequent issues are due to permissions / credentials: at the system/OS level (e.g. the account the service is running under), at the database layer, in the Enterprise environment, etc.
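
The STATCF sizing rule is simple arithmetic, shown here as a small Python helper. The function name is made up for illustration; where the value is actually entered (for example in the Administrative Console) is not covered by this sketch.

```python
def recommended_statcf_processes(purchased_processors: int) -> int:
    """Apply the rule of thumb: twice the number of processors purchased."""
    if purchased_processors < 1:
        raise ValueError("at least one processor is required")
    return 2 * purchased_processors
```

For example, a deployment with 4 purchased processors would be configured with 8 STATCF processes.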


Administrative console