Age Categorical
P-value=0.0000, Chi-square=30.1113, df=1
  Young (< 25); Middle (25-35)
    Cat.    %        n
    Bad     0.00     0
    Good    100.00   7
    Total   (2.17)   7
  Old (> 35)
    Cat.    %        n
    Bad     48.98    24
    Good    51.02    25
    Total   (15.17)  49
Age Categorical
P-value=0.0000, Chi-square=58.7255, df=1
  Young (< 25)
    Cat.    %        n
    Bad     0.92     1
    Good    99.08    108
    Total   (33.75)  109
  Middle (25-35); Old (> 35)
    Cat.    %        n
    Bad     0.00     0
    Good    100.00   8
    Total   (2.48)   8
Social Class
P-value=0.0016, Chi-square=12.0388, df=1
  Management; Clerical
    Cat.    %        n
    Bad     58.54    24
    Good    41.46    17
    Total   (12.69)  41
  Professional
Anomaly Detection
Find emerging trends in claims data. Use data mining to show the emerging patterns in current-year data. Reported results present specific cases that either:
Exhibit a common pattern, or
Exhibit an unusual pattern
Unusual cases are deployed to the field investigators for further analysis.
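As a rough illustration of separating common from unusual cases, the sketch below scores each case on a single numeric measure with a modified z-score (based on the median absolute deviation). This is a stand-in technique chosen for brevity, not the algorithm the tool uses; the case ids, the amounts, and the 3.5 cutoff are all hypothetical.

```python
from statistics import median

def flag_unusual(cases, threshold=3.5):
    """Split case ids into (common, unusual) using a modified z-score.

    cases: dict mapping a case id to one numeric measure (e.g. a claim amount).
    threshold: cutoff on the absolute score; 3.5 is an assumed default.
    """
    values = list(cases.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    common, unusual = [], []
    for case_id, value in cases.items():
        # 0.6745 rescales the MAD so the score is comparable to a z-score.
        score = 0.0 if mad == 0 else 0.6745 * (value - med) / mad
        (unusual if abs(score) > threshold else common).append(case_id)
    return common, unusual

# Hypothetical claim amounts; C5 is far from the rest.
claims = {"C1": 1200, "C2": 1350, "C3": 1100, "C4": 1280, "C5": 9800, "C6": 1230}
common, unusual = flag_unusual(claims)
print("Send to field investigators:", unusual)
```

In practice several measures would be combined per case; a single measure keeps the idea visible.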
Case Study: Audit Selection Goals
Build models to predict different outcomes:
Positive Adjustment (Y/N)
DPH group membership
Actual $$ Adjustment
Historical cases selected for model build:
Cases with prior audit: prior audit and organizational data
All cases: organizational data only
Deployment:
For each outcome, combine predictions for those with and without previous audit data.
For each outcome, predict using organizational data only.
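One possible reading of the first deployment option is sketched below: each case is scored by the model that matches the data available for it, and the results are combined into one set of predictions. The two model functions are hypothetical placeholder rules standing in for trained models, and the field names `prior_adjustment` and `employees` are assumptions made for this example.

```python
def prior_audit_model(case):
    # Hypothetical stand-in for a model built on prior audit plus organizational data.
    return case["prior_adjustment"] > 0 or case["employees"] > 500

def org_only_model(case):
    # Hypothetical stand-in for a model built on organizational data only.
    return case["employees"] > 500

def predict_positive_adjustment(case):
    """Use the prior-audit model when prior audit data exists, else the organizational-only model."""
    if case.get("prior_adjustment") is not None:
        return prior_audit_model(case)
    return org_only_model(case)

# Hypothetical cases: only A has a prior audit on record.
cases = [
    {"id": "A", "employees": 40, "prior_adjustment": 1200.0},
    {"id": "B", "employees": 40, "prior_adjustment": None},
    {"id": "C", "employees": 900, "prior_adjustment": None},
]
predictions = {c["id"]: predict_positive_adjustment(c) for c in cases}
print(predictions)
```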
Clementine Workbench
Case Study: Results
Text Mining and Linguistic Extraction
Text Mining Timeline: Text Extraction
Timeline: 70's, 80's, 90's, Now
Example text: “Mr. Smith aka Mr. Ahmed was seen on the corner of Church St. and Magnolia Ave. on Nov 13th”
Bag of «Words» extraction
Mr., Smith, aka, was, seen, with, Ahmed, on, the, corner, of, Church, etc.
Expressions extraction
Mr. Smith; was seen; Mr. Ahmed; corner; Church St.; Magnolia Ave.; Nov 13th
Named Entities extraction
Mr. Smith -> Person
Mr. Ahmed -> Person
aka -> Alias
was seen -> location
Church St. -> Address
Magnolia Ave. -> Address
Nov 13th -> Date
Events/Sentiment Extraction
Mr. Smith (Person) -> aka (Alias) -> Mr. Ahmed (Person) was seen (location) -> Church and Magnolia (address) -> November 13 (Date)
Combined with structured data
Mr. Ahmed in database wanted for questioning
Suspect -> send agent to this location
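The named-entity stage of the timeline can be approximated with a few regular expressions applied to the example sentence from the slide. This is only a toy sketch: the patterns below are assumptions written for this one sentence, whereas a real linguistic extractor relies on dictionaries and grammar rules.

```python
import re

TEXT = ("Mr. Smith aka Mr. Ahmed was seen on the corner of "
        "Church St. and Magnolia Ave. on Nov 13th")

# Hypothetical patterns, one per entity type shown on the slide.
PATTERNS = {
    "Person": r"Mr\. [A-Z][a-z]+",
    "Alias": r"\baka\b",
    "Address": r"[A-Z][a-z]+ (?:St|Ave)\.",
    "Date": r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}(?:st|nd|rd|th)?",
}

def extract_entities(text):
    """Return (matched text, entity type) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in re.finditer(pattern, text):
            found.append((match.start(), match.group(), label))
    return [(span, label) for _, span, label in sorted(found)]

entities = extract_entities(TEXT)
for span, label in entities:
    print(f"{span} -> {label}")
```

Linking the extracted entities into an event, and matching "Mr. Ahmed" against a database, are the later stages and are not covered by this sketch.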
Text Mining Management
General Dictionaries
Organization, Location, Name, Phone Number, etc
Custom Built Subject Dictionaries
Tax Code, Form Names, Commodity, Business, etc
Interactive Synonym Dictionaries
Exclude Dictionaries
NEW!: Classification algorithms enable you to aggregate concepts from a wide variety of unstructured text data and group them into a small number of categories.
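To show how the dictionary types listed above might interact, here is a minimal sketch: an exclude dictionary drops unwanted terms, a synonym dictionary maps variants to one concept, and a subject dictionary assigns each concept a type. All dictionary entries and the sample sentence are hypothetical, and the lookup is word-by-word for simplicity.

```python
import re

# Hypothetical dictionary contents for illustration only.
TYPE_DICTIONARY = {"wheat": "Commodity", "corn": "Commodity", "restaurant": "Business"}
SYNONYMS = {"maize": "corn", "diner": "restaurant"}
EXCLUDE = {"the", "a", "and", "sold", "to"}

def tag_concepts(text):
    """Return (concept, type) pairs after applying exclude, synonym, and type dictionaries."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    tagged = []
    for token in tokens:
        if token in EXCLUDE:
            continue  # exclude dictionary: drop the term entirely
        concept = SYNONYMS.get(token, token)  # synonym dictionary: normalize variants
        tagged.append((concept, TYPE_DICTIONARY.get(concept, "Unknown")))
    return tagged

tags = tag_concepts("The diner sold maize and wheat")
print(tags)
```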
What’s New
Binary Classifier – Automation of Many Models
Sophisticated users: hundreds of models (scripting)
Binary Classifier Node imitates this, but easily, with a pre-built node
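The scripted approach of building many candidate models and ranking them can be sketched as follows. The "models" here are simple threshold rules, used only so the example stays self-contained; they are not what the node builds internally, and the data, feature names, and thresholds are hypothetical.

```python
from itertools import product

def build_candidates(features, thresholds):
    """Enumerate one threshold rule per (feature, threshold) pair."""
    return list(product(features, thresholds))

def accuracy(rule, rows):
    """Share of rows where 'feature > threshold' agrees with the target flag."""
    feature, threshold = rule
    hits = sum((row[feature] > threshold) == row["target"] for row in rows)
    return hits / len(rows)

def rank_models(rows, features, thresholds):
    """Score every candidate and return (accuracy, rule) pairs, best first."""
    scored = [(accuracy(rule, rows), rule) for rule in build_candidates(features, thresholds)]
    return sorted(scored, key=lambda item: item[0], reverse=True)

# Hypothetical training rows; a real run would score on a held-out sample.
rows = [
    {"income": 20, "deductions": 5, "target": False},
    {"income": 35, "deductions": 12, "target": True},
    {"income": 50, "deductions": 4, "target": False},
    {"income": 65, "deductions": 15, "target": True},
    {"income": 80, "deductions": 11, "target": True},
    {"income": 30, "deductions": 3, "target": False},
]
ranked = rank_models(rows, ["income", "deductions"], [10, 40])
print("Best candidate:", ranked[0])
```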
Time Series Algorithm
ARIMA & Exponential Smoothing
Expert Modeler – finds best model automatically
Forecast Multiple Series at once
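As a small illustration of the exponential smoothing family named above, the sketch below applies simple exponential smoothing to several series in one call. The smoothing weight `alpha` is fixed by hand here, so it does not reproduce automatic model selection; the series names and values are hypothetical.

```python
def exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation (simple exponential smoothing)."""
    level = series[0]
    levels = [level]
    for value in series[1:]:
        # New level is a weighted average of the latest value and the previous level.
        level = alpha * value + (1 - alpha) * level
        levels.append(level)
    return levels

def forecast_all(series_by_name, alpha=0.5):
    """One-step-ahead forecast for each series: its last smoothed level."""
    return {name: exponential_smoothing(values, alpha)[-1]
            for name, values in series_by_name.items()}

# Hypothetical monthly counts for two series.
history = {"returns_filed": [10, 12, 14], "audits": [4, 4, 8]}
forecasts = forecast_all(history)
print(forecasts)
```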
Data Preparation Tools
Optimal Binning
Splitting up numeric data into sub-ranges
New capability to make this optimal for prediction
Existing capability: equal bins
New capability: optimal bins
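The difference between equal bins and bins chosen for prediction can be seen in a small sketch. Equal-width binning ignores the target, while the supervised version below picks the single cut point with the fewest misclassified rows. That criterion is a simple stand-in chosen for illustration, not the product's binning algorithm, and the ages and target flags are hypothetical.

```python
def equal_width_edges(values, n_bins):
    """Interior cut points that split [min, max] into n_bins equal-width ranges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]

def split_errors(values, targets, cut):
    """Misclassified rows if each side of the cut predicts its majority class."""
    errors = 0
    for side in (True, False):
        labels = [t for v, t in zip(values, targets) if (v <= cut) == side]
        positives = sum(labels)
        errors += min(positives, len(labels) - positives)
    return errors

def best_supervised_cut(values, targets):
    """Single cut point with the fewest errors, searched over midpoints of sorted values."""
    points = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return min(candidates, key=lambda cut: split_errors(values, targets, cut))

# Hypothetical data: 1 marks the outcome of interest.
ages = [20, 22, 24, 26, 28, 30, 50, 60]
bad = [0, 0, 0, 1, 1, 1, 1, 1]
equal_cut = equal_width_edges(ages, 2)[0]
optimal_cut = best_supervised_cut(ages, bad)
print(equal_cut, split_errors(ages, bad, equal_cut))
print(optimal_cut, split_errors(ages, bad, optimal_cut))
```

On this data the equal-width cut mixes both classes in the lower bin, while the supervised cut separates them.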
SPSS Reporting
SPSS Statistics and Graphs Within Clementine
Configuration Management
[Diagram: Audit Process, Analytical Data Storage, Data Mining, Audit Selection]
Predictive Enterprise Services (PES) Top Four
Deployment and Integration
Configuration Management
Exporting Data, Models and Streams
Explore and Describe
1. Improve Collaboration
In a single project there is the potential to create a large number of models and model versions: different output variables, different algorithms, different settings, different training samples.
× number of different data sets
× number of different users
× number of different locations
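The multiplication above grows quickly. A minimal arithmetic sketch, with every count below a hypothetical number chosen only to show the effect:

```python
from math import prod

def count_model_versions(dimensions):
    """Multiply the number of options along each dimension."""
    return prod(dimensions.values())

# Hypothetical counts for one project.
project = {
    "output variables": 3,
    "algorithms": 4,
    "settings": 5,
    "training samples": 2,
    "data sets": 3,
    "users": 4,
    "locations": 2,
}
total = count_model_versions(project)
print(total)
```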
2. Improve Transparency
Provide information on which models are run on which data.
For audit standards, track who has made changes to the model and when.
Your analytics team from their desktop can see which models were
most recently run on data, so that they would be able to provide this