Predictive Tax Compliance: Presentation to the IRS
SPSS: Benjamin Chard, Senior Solution Engineer, [email protected]; Sarah Mattingly, IRS Account Executive, [email protected]
SRA: Ted Fischer, Project Manager, [email protected] or [email protected], 301-731-3534
Transcript
Page 1:

Predictive Tax Compliance
Presentation to the IRS

SPSS

Benjamin Chard, Senior Solution Engineer, [email protected]

Sarah Mattingly, IRS Account Executive, [email protected]

SRA

Ted Fischer, Project Manager, [email protected] or [email protected]

Page 2:

Agenda

Introduction to Data Mining

Predictive Tax Compliance

Using Clementine for Audit Selection

What’s New in Clementine Version 11.1

IRS Refund Fraud Detection Project Case Study

Page 3:

Where Does Data Mining Fit?

Operational Setting
• Reporting
• Case Mgt
• Claim Scoring

Build Models
• Data Mining Workbench

Existing Data
• Historical Claims
• Current Claims

Page 4:

‘Data Mining’ vs. ‘Query/Reporting’

Reporting (Tables, Graphics, OLAP)

Provides a very good view of what is happening, but only within a limited view of the data and only in models defined by the user.

[Chart: incident Count by YEAR (1999, 2000, 2001) and offense type: A&B, Assault, B&E, carjacking, Larceny, Murder, MV, Rape, Robbery, other]

Incident Count - by day and shift

Day         00-04  04-08  08-12  12-16  16-20  20-24
Sunday         48     15     43     62     73     68
Monday         25     39    101    131    199    100
Tuesday        21     27    106    179    191    102
Wednesday      29     38    101    177    177    103
Thursday       38     50    105    168    197    107
Friday         33     40     88    147    209    107
Saturday       45     21     52     82    116    112
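
A report like the incident count by day and shift is a user-defined aggregation: someone decides in advance which fields to count by. A minimal sketch in Python, using a handful of made-up incident records (the `incidents` list and the `crosstab` helper are illustrative assumptions, not the slide's data or any SPSS feature):

```python
from collections import Counter

# Hypothetical incident records: (day of week, shift).
incidents = [
    ("Sunday", "00-04"),
    ("Sunday", "00-04"),
    ("Monday", "16-20"),
    ("Monday", "12-16"),
    ("Monday", "16-20"),
]

def crosstab(records):
    """Count records for each (day, shift) cell, as a report would."""
    return Counter(records)

counts = crosstab(incidents)
for (day, shift), n in sorted(counts.items()):
    print(f"{day:<8} {shift}  {n}")
```

The report answers exactly the question it was built for (how many incidents per day and shift) and nothing else, which is the limitation the slide points to.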

Page 5:

‘Statistics’ vs. ‘Data Mining’

Statistics: Hypothesis Testing

Page 6:

What is Data Mining?

Three classes of data mining algorithms:

• Predict (“Relationships”): Predict who is likely to exhibit specific behavior in the future.

• Cluster (“Differences”): Group cases that exhibit similar characteristics.

• Associate (“Patterns”): What events occur together? Given a series of actions, what action is likely to occur next?

Page 7:

Predictive Tax Compliance

Page 8:

Predictive Tax Compliance

Tax Collection
• Risk Models

Audit Selection
• Audit Models

Non-Filer Discovery
• Soft-Matching
• Prioritization Models

Register, Assess, Collect

DATA WAREHOUSE

DATA MINING & PREDICTIVE ANALYTICS TOOLS

Right work to the right resources at the right time

Page 9:

Predictive Modeling

Build a predictive profile of claims that, after investigation, were flagged as improper payments, regardless of amount.

Select positive investigations:
• Maximize the claims with the highest dollar adjustment found per audit hour.
• Minimize the number of no-change audits.
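
The two selection goals can be sketched as a ranking problem. The example below is an illustration only: the `Case` fields, the probability cutoff, and the greedy hour-budget rule are assumptions made for this sketch, not the method used in the project.

```python
from dataclasses import dataclass

@dataclass
class Case:
    case_id: str
    p_adjustment: float      # predicted probability of a positive adjustment
    expected_dollars: float  # predicted adjustment if the audit finds one
    audit_hours: float       # estimated hours to complete the audit

def expected_yield_per_hour(case: Case) -> float:
    """Expected dollars recovered per audit hour."""
    return case.p_adjustment * case.expected_dollars / case.audit_hours

def select_audits(cases, hours_available, min_probability=0.5):
    """Greedy selection: drop likely no-change audits, then take the
    highest expected yield per hour until the hour budget is used up."""
    candidates = [c for c in cases if c.p_adjustment >= min_probability]
    candidates.sort(key=expected_yield_per_hour, reverse=True)
    selected, hours_used = [], 0.0
    for case in candidates:
        if hours_used + case.audit_hours <= hours_available:
            selected.append(case)
            hours_used += case.audit_hours
    return selected

# Hypothetical scored cases.
cases = [
    Case("A", 0.90, 4000.0, 10.0),   # 360 expected dollars per hour
    Case("B", 0.60, 20000.0, 40.0),  # 300 expected dollars per hour
    Case("C", 0.20, 50000.0, 20.0),  # likely no-change, filtered out
    Case("D", 0.80, 1000.0, 5.0),    # 160 expected dollars per hour
]
selected_ids = [c.case_id for c in select_audits(cases, hours_available=50)]
print(selected_ids)
```

With a 50-hour budget, cases A and B fill the budget; C is excluded as a probable no-change audit and D no longer fits.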

[Decision tree: Credit ranking (1=default)]

Root: Bad 52.01% (168), Good 47.99% (155), Total 100.00% (323)
Split on Paid Weekly/Monthly (P-value=0.0000, Chi-square=179.6665, df=1)
• Weekly pay: Bad 86.67% (143), Good 13.33% (22), Total 51.08% (165)
  Split on Age Categorical (P-value=0.0000, Chi-square=30.1113, df=1)
  • Young (< 25); Middle (25-35): Bad 90.51% (143), Good 9.49% (15), Total 48.92% (158)
  • Old (> 35): Bad 0.00% (0), Good 100.00% (7), Total 2.17% (7)
• Monthly salary: Bad 15.82% (25), Good 84.18% (133), Total 48.92% (158)
  Split on Age Categorical (P-value=0.0000, Chi-square=58.7255, df=1)
  • Young (< 25): Bad 48.98% (24), Good 51.02% (25), Total 15.17% (49)
    Split on Social Class (P-value=0.0016, Chi-square=12.0388, df=1)
    • Management; Clerical: Bad 0.00% (0), Good 100.00% (8), Total 2.48% (8)
    • Professional: Bad 58.54% (24), Good 41.46% (17), Total 12.69% (41)
  • Middle (25-35); Old (> 35): Bad 0.92% (1), Good 99.08% (108), Total 33.75% (109)
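
The Chi-square values in the tree can be checked from the node counts. The sketch below assumes the reported statistic is the likelihood-ratio chi-square (the slide does not say which variant is used). For the Paid Weekly/Monthly split, whose two child nodes hold 143 Bad / 22 Good and 25 Bad / 133 Good, the likelihood-ratio value comes out at about 179.67, in line with the 179.6665 on the slide, while the Pearson value is about 162.3.

```python
import math

def chi_square_2x2(table):
    """Return (Pearson, likelihood-ratio) chi-square for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson = 0.0
    likelihood_ratio = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the split were unrelated to the outcome.
            expected = row_totals[i] * col_totals[j] / n
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:
                likelihood_ratio += 2 * observed * math.log(observed / expected)
    return pearson, likelihood_ratio

# Rows: the two child nodes of the Paid Weekly/Monthly split.
# Columns: Bad, Good.
pay_split = [[143, 22], [25, 133]]
pearson, likelihood_ratio = chi_square_2x2(pay_split)
print(round(pearson, 2), round(likelihood_ratio, 2))
```

A 2x2 table has one degree of freedom, which matches the df=1 reported for each split.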

Page 10:

Anomaly Detection

Find emerging trends in claims data.

Use data mining to show the emerging patterns in current year data.

Reported results will present specific cases that either:
• Exhibit a common pattern, or
• Exhibit an unusual pattern.

Unusual cases are deployed to the field investigators for further analysis.
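
One simple way to flag unusual cases is a robust outlier score. The sketch below uses the modified z-score (median and median absolute deviation) with the conventional 3.5 cutoff; this is a stand-in technique chosen for illustration, not the anomaly detection algorithm in Clementine, and the claim amounts are made up.

```python
from statistics import median

def unusual_cases(amounts, threshold=3.5):
    """Flag cases whose modified z-score (based on the median and the
    median absolute deviation) exceeds the threshold."""
    center = median(amounts.values())
    mad = median(abs(v - center) for v in amounts.values())
    if mad == 0:
        return []  # no spread to measure against
    flagged = []
    for case_id, value in amounts.items():
        score = 0.6745 * abs(value - center) / mad
        if score > threshold:
            flagged.append(case_id)
    return flagged

# Hypothetical claim amounts keyed by case id.
claims = {"c1": 1200, "c2": 1350, "c3": 1280, "c4": 1310, "c5": 9800, "c6": 1250}
flagged = unusual_cases(claims)
print(flagged)
```

Only `c5` stands far enough from the rest to be flagged; the other cases follow the common pattern.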

Page 11:

Case Study: Audit Selection Goals

Build models to predict different outcomes:
• Positive Adjustment (Y/N)
• DPH group membership
• Actual $$ Adjustment

Historical cases selected for model build:
• Cases with prior audit: prior audit and organizational data
• All cases: organizational data only

Deployment:
• For each outcome, combine predictions for those with and without previous audit data.
• For each outcome, predict using organizational data only.
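
The first deployment option can be sketched as a routing rule: score a case with the prior-audit model when prior audit data exists, and fall back to the organizational-data-only model otherwise. The two model functions and the record fields below are hypothetical stand-ins, not the project's models.

```python
def score_case(case, model_with_audit, model_org_only):
    """Use the prior-audit model when prior audit data exists,
    otherwise fall back to the organizational-data-only model."""
    if case.get("prior_audit") is not None:
        return model_with_audit(case)
    return model_org_only(case)

# Hypothetical stand-in models returning a probability of positive adjustment.
def model_with_audit(case):
    return 0.8 if case["prior_audit"]["adjusted"] else 0.3

def model_org_only(case):
    return 0.6 if case["org_size"] == "large" else 0.4

cases = [
    {"id": 1, "org_size": "large", "prior_audit": {"adjusted": True}},
    {"id": 2, "org_size": "small", "prior_audit": None},
]
scores = {c["id"]: score_case(c, model_with_audit, model_org_only) for c in cases}
print(scores)
```

Every case receives a score, but cases with audit history benefit from the richer model.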

Page 12:

Clementine Workbench

Page 13:

Case Study: Results

Page 14:

Text Mining and Linguistic Extraction

Page 15:

Text Mining Timeline: Text Extraction

Timeline: 70’s, 80’s, 90’s, Now

Example text: “Mr. Smith aka Mr. Ahmed was seen on the corner of Church St. and Magnolia Ave. on Nov 13th”

Bag of « Words » extraction
Mr. / Smith / aka / was / seen / with / Ahmed / on / the / corner / of / Church / Etc.

Expressions extraction
Mr. Smith / was seen / Mr. Ahmed / corner / Church St. / Magnolia Ave. / Nov 13th

Named Entities extraction
Mr. Smith -> Person
Mr. Ahmed -> Person
aka -> Alias
was seen -> location
Church St. -> Address
Magnolia Ave. -> Address
Nov 13th -> Date

Events/Sentiment Extraction
Mr. Smith (Person) -> aka (Alias) -> Mr. Ahmed (Person) was seen (location) -> Church and Magnolia (address) -> November 13 (Date)

Combined with structured data
Mr. Ahmed in database wanted for questioning
Suspect -> send agent to this location
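
The named-entity step can be imitated on the example sentence with a few regular expressions. This is a toy sketch: the `PATTERNS` list is hand-written for this one sentence, whereas a text mining tool relies on linguistic dictionaries rather than a handful of patterns.

```python
import re

SENTENCE = ("Mr. Smith aka Mr. Ahmed was seen on the corner of "
            "Church St. and Magnolia Ave. on Nov 13th")

# Hypothetical hand-written patterns, one per entity type.
PATTERNS = [
    ("Person", r"Mr\. [A-Z][a-z]+"),
    ("Alias", r"\baka\b"),
    ("Address", r"[A-Z][a-z]+ (?:St|Ave)\."),
    ("Date", r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
             r"\d{1,2}(?:st|nd|rd|th)"),
]

def extract_entities(text):
    """Return (matched text, entity type) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS:
        for match in re.finditer(pattern, text):
            found.append((match.start(), match.group(), label))
    return [(value, label) for _, value, label in sorted(found)]

entities = extract_entities(SENTENCE)
for value, label in entities:
    print(f"{value} -> {label}")
```

Linking the extracted entities into an event (who was seen where and when) and matching them against a database are the later stages of the timeline, and are beyond what simple patterns can do.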

Page 16:

Text Mining Management

General Dictionaries

Organization, Location, Name, Phone Number, etc

Custom Built Subject Dictionaries

Tax Code, Form Names, Commodity, Business, etc

Interactive Synonym Dictionaries

Exclude Dictionaries

NEW!: Classification algorithms enable you to aggregate concepts from a wide variety of unstructured text data and group them into a small number of categories.

Page 17:

What’s New

Page 18:

Binary Classifier – Automation of Many Models

Sophisticated users: hundreds of models (scripting)

Binary Classifier Node imitates this, but easily, with a pre-built node
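
The idea of building many models and comparing them automatically can be sketched in a few lines. The candidates below are simple threshold rules and the data is made up; this illustrates the loop a script would run, not the Binary Classifier node itself. A real comparison would score the models on a hold-out sample rather than on the rows used to build them.

```python
def accuracy(model, rows):
    """Share of rows where the model's prediction matches the label."""
    hits = sum(1 for features, label in rows if model(features) == label)
    return hits / len(rows)

def rank_models(candidates, rows):
    """Evaluate every candidate and return (name, accuracy), best first."""
    results = [(name, accuracy(model, rows)) for name, model in candidates.items()]
    return sorted(results, key=lambda item: item[1], reverse=True)

# Hypothetical candidates: threshold rules on a single field each.
candidates = {
    "income>50": lambda f: f["income"] > 50,
    "income>80": lambda f: f["income"] > 80,
    "deductions>10": lambda f: f["deductions"] > 10,
}

# Hypothetical labeled rows: (features, yes/no flag to predict).
rows = [
    ({"income": 90, "deductions": 12}, True),
    ({"income": 60, "deductions": 4}, False),
    ({"income": 85, "deductions": 3}, True),
    ({"income": 40, "deductions": 15}, False),
]

ranking = rank_models(candidates, rows)
print(ranking)
```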

Page 19:

Time Series Algorithm

ARIMA & Exponential Smoothing

Expert Modeler – finds best model automatically

Forecast Multiple Series at once

Data Preparation Tools
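
As a reminder of what exponential smoothing does, here is a minimal sketch of simple (single) exponential smoothing. The series and the `alpha` value are made up, and the Expert Modeler chooses among richer model forms than this one.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing; returns the smoothed level at each step."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    levels = [level]
    for value in series[1:]:
        # New level = weighted average of the new value and the old level.
        level = alpha * value + (1 - alpha) * level
        levels.append(level)
    return levels

monthly_counts = [100, 110, 105, 120]  # hypothetical series
levels = exponential_smoothing(monthly_counts, alpha=0.5)
forecast = levels[-1]  # the one-step-ahead forecast is the last level
print(levels, forecast)
```

A larger `alpha` makes the forecast react faster to recent values; a smaller one smooths more heavily.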

Page 20:

Optimal Binning

Splitting up numeric data into sub-ranges

New capability to make this optimal for prediction

Existing Capability – Equal bins

New Capability – Optimal bins
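
The difference between equal bins and bins chosen for prediction can be shown on a tiny example. The slide does not describe how Clementine picks optimal bins, so the sketch below swaps in a simple stand-in: a single cut point that minimizes the weighted entropy of a 0/1 target. The data and function names are assumptions made for illustration.

```python
import math

def equal_width_edges(values, bins):
    """Interior cut points that split the range into equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    return [low + width * i for i in range(1, bins)]

def entropy(labels):
    """Shannon entropy (in bits) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def best_cut(values, labels):
    """Cut point that minimizes the weighted entropy of the two bins."""
    pairs = sorted(zip(values, labels))
    best_point, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            best_point = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_point

ages = [19, 22, 24, 31, 38, 45, 52, 79]  # hypothetical numeric field
defaulted = [1, 1, 1, 0, 0, 0, 0, 0]     # hypothetical 0/1 target

equal_edges = equal_width_edges(ages, 2)
supervised_cut = best_cut(ages, defaulted)
print(equal_edges, supervised_cut)
```

The equal-width cut falls at the midpoint of the range regardless of the target, while the supervised cut lands where the target changes.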

Page 21:

SPSS Reporting

SPSS Statistics and Graphs Within Clementine

Page 22:

Configuration Management
• Audit Process
• Analytical Data Storage
• Data Mining
• Audit Selection

Predictive Enterprise Services (PES) Top Four

Page 23:

Deployment and Integration

Configuration Management

Exporting Data, Models and Streams

Explore and Describe

Page 24:

1. Improve Collaboration

In a single project there is the potential to create a large number of models and versions of models:
• different output variables
• different algorithms
• different settings
• different training samples

X # different data sets

X # different users

X # different locations

Page 25:

2. Improve Transparency

Provide information on which models are run on which data.

For audit standards, track who has made changes to the model and when.

From their desktops, your analytics team can see which models were most recently run on which data, and can provide this information for internal audits.

Page 26:

3. Automate Process

Combine Clementine, SPSS, SAS & other processes

Scheduling & notification

Page 27:

4. Centralize and Control Access

Page 28:

Contact information

Project personnel:
Ted Fischer – [email protected] or [email protected], 301-731-3534
Anthony Colyandro – [email protected] or [email protected], 301-731-3524

SRA Director of Business Intelligence:
Dave Vennergrund – [email protected], 703-803-1614

Page 29:

How do I get SPSS software?

IRS:
Cathy J. Allen
Enterprise System Management, Software Management Section
Idea Branch - MS 5850
(304) 264-7279 - voice
(304) 279-5309 - cell
(304) 260-3033 - [email protected]

SPSS Contacts:
Account Executive – Sarah Mattingly
Email: [email protected] – 703-740-2446
C – 703-389-6485

Account Manager – Matt Madden
W - 312 651 3894
