
Dec 17, 2015

MINING REAL ESTATE LISTINGS USING ORACLE DATA WAREHOUSING AND PREDICTIVE REGRESSION

Wuri Wedyawati, Meiliu Lu
Department of Computer Science
California State University
Sacramento, CA 95819-6021

[email protected]@csus.edu

Outline

Introduction
Data Warehousing
– Building a data warehouse
– MasterDW: the data warehouse
Predictive Regression
Real Estate Price Prediction
Conclusion
Future work

Introduction

The objective is to develop a knowledge discovery system that helps prospective real estate sellers and buyers estimate a property's price from local sold listings.
The selling price of a property is predicted with predictive regression.
Building a data warehouse is a prerequisite for efficient mining of large operational data such as the Multiple Listing Service (MLS), the data source for this system.

Data Warehouse

A decision support database that is maintained separately from the organization’s operational database.
Supports decision-making by providing a platform of consolidated, historical data for analysis.
Our data warehouse is based on a multidimensional data model called a star schema, with one large fact table surrounded by a set of dimension tables.

Data Warehousing

Process of building a data warehouse:
1. Extraction
2. Transformation and cleansing
3. Modeling
4. Transport

1. Extraction

Document the sources of data
– Identify the databases and files containing the data of interest
– Analyze and document the business meaning of the data, data relationships, and business rules
Determine the data that needs to be extracted
Extract all or a subset of the data from the source
– Use an unload utility
– Use a data manipulation language statement
Extract the changes made to the source data
– Use a recovery log
– Use a database trigger

2. Transformation and Cleansing

Check the integrity of the source data to verify that it conforms to the business rules and relationships identified in the extraction step.
Check the accuracy of the source data.
Identify the tasks required for data cleansing.
Transform and integrate the cleaned data into the format required by the target system, the data warehouse.

3. Modeling

A star schema shows data as a collection of two types: facts and dimensions.
A fact table is the primary table in a dimensional model. It contains the facts (numerical measures) as well as keys to each of the related dimension tables. Examples of facts: sales, credit card accounts, residential records.
A dimension table describes a specific dimension with a set of attributes. Examples of dimensions: time, students, areas.
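A minimal sketch of the fact/dimension split, using Python's built-in `sqlite3` as a stand-in for Oracle. The table names follow the MasterDW tables named in these slides; every column name, and the reduced column set, is an assumption made for illustration only.

```python
import sqlite3

# Hypothetical, heavily reduced schema: the real MasterDW tables have many
# more columns, and all column names here are illustrative assumptions.
SCHEMA = """
CREATE TABLE areas   (area_number TEXT PRIMARY KEY, area_name TEXT, county TEXT);
CREATE TABLE offices (office_id   TEXT PRIMARY KEY, office_name TEXT);
CREATE TABLE agents  (agent_id    TEXT PRIMARY KEY, agent_name TEXT);
CREATE TABLE residential (
    mls_number        TEXT PRIMARY KEY,
    area_number       TEXT REFERENCES areas(area_number),      -- dimension key
    listing_agent_id  TEXT REFERENCES agents(agent_id),        -- dimension key
    listing_office_id TEXT REFERENCES offices(office_id),      -- dimension key
    sold_price        INTEGER                                  -- numerical measure
);
"""


def build_star_schema():
    """Create the fact table and its three dimension tables in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def average_sold_price_by_area(conn):
    """Join the fact table to one dimension and aggregate the measure."""
    return conn.execute(
        "SELECT a.area_name, AVG(r.sold_price) "
        "FROM residential r JOIN areas a ON a.area_number = r.area_number "
        "GROUP BY a.area_name ORDER BY a.area_name"
    ).fetchall()
```

The fact table holds only keys and measures; descriptive attributes such as the area name live in the dimension table and are reached through a join.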

An Example Star Schema

[Figure: example star schema]

MasterDW Modeling

[Figure: MasterDW star schema with the RESIDENTIAL fact table and the OFFICES, AGENTS, and AREAS dimension tables]

4. Transport

Identify the tools and techniques to be used for loading the data into the target system
– SQL*Loader utility (for flat file data)
– Transportable tablespaces (for Oracle databases)
Evaluate the need for data compression and encryption if captured or transformed data is to be transported across a network

MasterDW Data Warehousing

[Figure: MasterDW data flow diagram. Files: RESI.TXT (data source), RESSOLDLOG.TXT (log file), RES.TXT, OFCSRC.TXT, AGTSRC.TXT, RESIDENTIAL.TXT, OFFICE.TXT, AGENT.TXT, AREA.TXT. Processes: Transformation and Cleansing, Update, Transformation and Cleansing 2, Duplicate Detection, Load. Target tables: OFFICES, AGENTS, RESIDENTIAL, AREA.]

MasterDW Extraction

The operational data source is extracted from the Multiple Listing Service (MLS) database for Sacramento, El Dorado, Placer, and Yolo Counties.
It captures all the residential data in the source system from January 1, 1998 to January 9, 2004.
The source data is a “|”-delimited flat file (“RESI.TXT”) containing 191 fields and 295,787 rows.
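Reading a “|”-delimited flat file of this kind can be sketched in Python with the standard `csv` module. The function name and the `expected_fields` parameter are hypothetical; the slides do not show how RESI.TXT was actually read.

```python
import csv
import io


def read_pipe_delimited(text, expected_fields=None):
    """Yield each row of a "|"-delimited flat file as a list of strings.

    Blank lines are skipped. If expected_fields is given, rows with a
    different number of fields are skipped as malformed (an assumption;
    a real pipeline might log them instead).
    """
    reader = csv.reader(io.StringIO(text), delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue
        if expected_fields is not None and len(row) != expected_fields:
            continue
        yield row


# Usage with a short record in the same format (three fields instead of 191).
sample = "10819|East Sacramento & Vicinity|Sacramento County\n"
rows = list(read_pipe_delimited(sample, expected_fields=3))
```

Empty fields between consecutive delimiters come back as empty strings, which is what the later cleansing checks would need to test for.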

MasterDW Transformation and Cleansing

There are four steps:
1. Transformation and cleansing 1
2. Update process for the result of transformation and cleansing 1
3. Transformation and cleansing 2
4. Duplicate detection for office and agent records

1. Transformation and Cleansing 1

Listing Price Check

    ' Flags a listing price (LP) that is non-positive, absurdly large,
    ' or outside the accepted 10,000 to 50,000,000 range.
    If intLP <= 0 Or intLP > 99999999 Then
        LPCheck = strMLSNo & " : INVALID LP = " & Str(intLP)
    ElseIf intLP < 10000 Or intLP > 50000000 Then
        LPCheck = strMLSNo & " : LP EXCEEDS LIMIT = " & Str(intLP)
    End If

Square Footage Check

    ' A zero square footage is treated as missing; more than 10,000 sq ft
    ' is flagged when the listing price is below 1,000,000.
    If intSQFT = 0 Then
        SQFTCheck = strMLSNo & " : SQFT IS NULL = " & Str(intSQFT)
    ElseIf intSQFT > 10000 And intLP < 1000000 Then
        SQFTCheck = strMLSNo & " : SQFT EXCEEDS LIMIT = " & Str(intSQFT)
    End If
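For readers who want to experiment with these rules outside Visual Basic, the listing price and square footage checks can be sketched in Python. This is a port under assumptions: the function names are hypothetical, each check returns an empty string when the value passes, and the number formatting is simplified relative to VB's `Str()`.

```python
def lp_check(mls_no, lp):
    """Return an error message for a suspicious listing price, else ''."""
    if lp <= 0 or lp > 99_999_999:
        return f"{mls_no} : INVALID LP = {lp}"
    if lp < 10_000 or lp > 50_000_000:
        return f"{mls_no} : LP EXCEEDS LIMIT = {lp}"
    return ""


def sqft_check(mls_no, sqft, lp):
    """Return an error message for a suspicious square footage, else ''."""
    if sqft == 0:
        return f"{mls_no} : SQFT IS NULL = {sqft}"
    # Very large homes are plausible only at high listing prices.
    if sqft > 10_000 and lp < 1_000_000:
        return f"{mls_no} : SQFT EXCEEDS LIMIT = {sqft}"
    return ""
```

Returning a message string rather than raising an error lets every rule run over the whole file and collect all problems in one log.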

1. Transformation and Cleansing 1 (cont.)

Listing Date Check

    ' Listing dates (LD) before 1990-01-01 are out of range,
    ' as in the SD and PD checks.
    If strLD = "0000-00-00" Or Len(strLD) < 8 Then
        LDCheck = strMLSNo & " : INVALID LD = " & strLD
    ElseIf DateValue(strLD) < DateValue("1990-01-01") Then
        LDCheck = strMLSNo & " : LD EXCEEDS LIMIT = " & strLD
    End If

Number of Full Bathrooms and Half Bathrooms Check

    ' A listing must have at least one full bathroom.
    If intFull <= 0 Then
        BathCheck = strMLSNo & " : NO FULL BATHROOM = " & Str(intFull) & " AND " & Str(intHalf)
    End If

1. Transformation and Cleansing 1 (cont.)

Number of Bedrooms Check

    ' A listing must have at least one bedroom.
    If intBed <= 0 Then
        BedCheck = strMLSNo & " : NO BEDROOM = " & Str(intBed)
    End If

Year Built Check

    ' The year built must be present and later than 1900.
    If Len(strYearBlt) = 0 Then
        YearBltCheck = strMLSNo & " : NO YEAR BUILT = " & strYearBlt
    ElseIf Val(strYearBlt) <= 1900 Then
        YearBltCheck = strMLSNo & " : INVALID YEAR BUILT = " & strYearBlt
    End If

1. Transformation and Cleansing 1 (cont.)

Sold Date Check

    ' The sold date (SD) must not precede the listing date (LD) or the
    ' pending date (PD). AndAlso skips DateValue when LD or PD is invalid.
    If strSD = "0000-00-00" Or Len(strSD) < 8 Then
        SDCheck = strMLSNo & " : INVALID SD = " & strSD
    ElseIf Len(LDCheck(strMLSNo, strLD)) = 0 AndAlso DateValue(strSD) < DateValue(strLD) Then
        SDCheck = strMLSNo & " : SD / LD = " & strSD & " / " & strLD
    ElseIf Len(PDCheck(strMLSNo, strPD, strLD)) = 0 AndAlso DateValue(strSD) < DateValue(strPD) Then
        SDCheck = strMLSNo & " : SD / PD = " & strSD & " / " & strPD
    ElseIf DateValue(strSD) < DateValue("1990-01-01") Then
        SDCheck = strMLSNo & " : SD EXCEEDS LIMIT = " & strSD
    End If

Sold Price Check

    ' The sold price (SP) must be in range and within a factor of 1.5
    ' of a valid listing price (LP).
    If intSP <= 0 Or intSP > 99999999 Then
        SPCheck = strMLSNo & " : INVALID SP = " & Str(intSP)
    ElseIf intSP < 10000 Or intSP > 50000000 Then
        SPCheck = strMLSNo & " : SP EXCEEDS LIMIT = " & Str(intSP)
    ElseIf Len(LPCheck(strMLSNo, intLP)) = 0 And (intSP < intLP / 1.5 Or intSP > intLP * 1.5) Then
        SPCheck = strMLSNo & " : LP / SP EXCEEDS NORM = " & intLP & " / " & intSP
    End If

1. Transformation and Cleansing 1 (cont.)

Pending Date Check

    ' The pending date (PD) must not precede a valid listing date (LD).
    If strPD = "0000-00-00" Or Len(strPD) < 8 Then
        PDCheck = strMLSNo & " : INVALID PD = " & strPD
    ElseIf Len(LDCheck(strMLSNo, strLD)) = 0 AndAlso DateValue(strPD) < DateValue(strLD) Then
        PDCheck = strMLSNo & " : PD IS LESS THAN LD => PD = " & strPD & " & LD = " & strLD
    ElseIf DateValue(strPD) < DateValue("1990-01-01") Then
        PDCheck = strMLSNo & " : PD EXCEEDS LIMIT = " & strPD
    End If

Days on Market Check

    ' Days on market (DOM) is counted from the listing date to the pending
    ' date; more than 730 days is flagged. Both dates must already be valid.
    If Len(LDCheck(strMLSNo, strLD)) = 0 AndAlso Len(PDCheck(strMLSNo, strPD, strLD)) = 0 AndAlso _
       DateDiff(DateInterval.Day, DateValue(strLD), DateValue(strPD)) > 730 Then
        DOMCheck = strMLSNo & " : DOM TOO LARGE = " & _
                   DateDiff(DateInterval.Day, DateValue(strLD), DateValue(strPD))
    End If
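The days-on-market rule can be sketched in Python with `datetime.date`. It assumes that days on market means the pending date minus the listing date and that both dates have already passed their own validity checks; the function names and the `limit_days` parameter are hypothetical.

```python
from datetime import date


def days_on_market(listing_date, pending_date):
    """Number of days from listing to pending (negative if out of order)."""
    return (pending_date - listing_date).days


def dom_check(mls_no, listing_date, pending_date, limit_days=730):
    """Return an error message when a listing sat unsold too long, else ''."""
    dom = days_on_market(listing_date, pending_date)
    if dom > limit_days:
        return f"{mls_no} : DOM TOO LARGE = {dom}"
    return ""
```

The order of subtraction matters: with the dates swapped, every valid listing yields a negative count and the 730-day limit can never trigger.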

2. Update Process for the Result of Transformation and Cleansing 1

Logged errors, with the corrected values in parentheses:
132110169 : LP EXCEEDS LIMIT = 132 (132000)
30015346 : SQFT EXCEEDS LIMIT = 12700 (1270)
30015611 : LD EXCEEDS LIMIT = 1920-05-07 (2000-05-07)
30015755 : NO FULL BATHROOM = 0 AND 3 (3 AND 0)
102100090 : INVALID YEAR BUILT = 96 (1996)
30028591 : INVALID YEAR BUILT = 1056 (1956)
102000035 : PD IS LESS THAN LD => PD = 2000-03-30 & LD = 2020-01-26 (2000-01-26)
102000035 : SD / LD = 2000-05-31 / 2020-01-26 (2000-01-26)
122003643 : SD / PD = 2000-11-01 / 2000-11-05 (2000-11-05 / 2000-11-01)
132001727 : SP EXCEEDS LIMIT = 124 (124000)
30016715 : LP / SP EXCEEDS NORM = 226000 / 22600000 (226000)

3. Transformation and Cleansing 2

The agents table (“AGTSRC.TXT”) fields:
Listing agent id, Listing agent name, Listing agent phone 1, Listing agent phone 2, Listing agent phone 3, Listing agent phone type 1, Listing agent phone type 2, Listing agent phone type 3, Listing office id, Listing co-agent id, Listing co-agent name, Listing co-agent phone, Listing co-office id, Selling agent id, Selling agent name, Selling agent phone, Selling office id, Selling co-agent id, Selling co-agent name, Selling co-office id.

Example:
“SREIDMAR|Marjorie Reid|916-485-5124|916-485-5124||1|2||LYON01”

3. Transformation and Cleansing 2 (cont.)

The offices table (“OFCSRC.TXT”) fields:
Listing office id, Listing office name, Listing office phone, Listing office address, Listing office zip, Listing co-office id, Listing co-office name, Listing co-office phone, Selling office id, Selling office name, Selling office phone, Selling co-office id, Selling co-office name.

Example:
“LYON01|Lyon Real Estate|916-481-3840|2580 Fair Oaks Blvd. #20 Sacramento, CA 95825|95825|Sacramento”

3. Transformation and Cleansing 2 (cont.)

The areas table (“AREA.TXT”) fields:
Area number, area name, county.

Example:
“10819|East Sacramento & Vicinity|Sacramento County”

3. Transformation and Cleansing 2 (cont.)

The residential table (“RESIDENTIAL.TXT”) fields:

Example:
“15501835|2367|Glen Ellen|95822|Sacramento|10822||Thomas Bros. (PL,SA)|317 D-5|035-0132-012-0000|Residential|Sold|22-Jan-95|28-Jan-95|20-Apr-99|01-Jan-00|20-May-99|01-Jan-00|1549|1700|764705.9|17.56|2|1|4|130000|159500|SGREENCA||GCNA||130000|SSTANWIL||CLBA20||1959|3|FHA|Sacramento|Sacramento Unified|Sacramento Unified|Sacramento Unified”

4. Duplicate Detection for Agent and Office Records

“AGTSRC.TXT” contains duplicate records.
An agent can be a selling agent, a buyer agent, or both in a listing. An agent can have more than one listing in “RES.TXT”.
Example:
“SAKBARIR|Rouhi N. Akbari|916-484-5456|916-223-7647||1|C||LYON01”

“OFCSRC.TXT” contains duplicate records.
An office can be a selling office, a buyer office, or both in a listing. An office can have more than one listing in “RES.TXT”.
Example:
“LYON01|Lyon Real Estate|916-481-3840|2580 Fair Oaks Blvd. #20 Sacramento, CA 95825|95825|Sacramento”
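One simple way to remove such duplicates is to keep the first record seen for each id. The sketch below assumes the id in the first “|”-delimited field identifies an agent or office; the slides do not say how duplicates were actually detected, and the function name is hypothetical.

```python
def drop_duplicate_records(lines):
    """Keep the first record for each id (the first "|"-delimited field)."""
    seen_ids = set()
    unique_records = []
    for line in lines:
        record = line.strip()
        if not record:
            continue  # ignore blank lines
        record_id = record.split("|", 1)[0]
        if record_id not in seen_ids:
            seen_ids.add(record_id)
            unique_records.append(record)
    return unique_records


# Usage: the same agent appears once per listing in the source file.
agents = [
    "SAKBARIR|Rouhi N. Akbari|916-484-5456|916-223-7647||1|C||LYON01",
    "SREIDMAR|Marjorie Reid|916-485-5124|916-485-5124||1|2||LYON01",
    "SAKBARIR|Rouhi N. Akbari|916-484-5456|916-223-7647||1|C||LYON01",
]
unique_agents = drop_duplicate_records(agents)
```

Keying on the id alone also collapses records that share an id but differ in other fields, so a stricter pipeline might compare whole records and log conflicts.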

MasterDW Modeling: Ready to load the clean data into the 4 tables

[Figure: MasterDW star schema with the RESIDENTIAL fact table and the OFFICES, AGENTS, and AREAS dimension tables]

MasterDW Transport

Load “AREA.TXT” to the AREAS dimension table
    c:\>sqlldr masterdw/masterdw control=area.ctl log=area.log
Load “OFFICE.TXT” to the OFFICES dimension table
    c:\>sqlldr masterdw/masterdw control=office.ctl log=office.log
Load “AGENT.TXT” to the AGENTS dimension table
    c:\>sqlldr masterdw/masterdw control=agent.ctl log=agent.log
Load “RESIDENTIAL.TXT” to the RESIDENTIAL fact table
    c:\>sqlldr masterdw/masterdw control=residential.ctl log=residential.log

Predictive Regression

Predictive regression uses continuous values in the data set to predict unknown or future values of other variables of interest.
The objective of regression analysis is to determine the best model that relates the output variable to the various input variables.

$$\beta = \frac{\sum_{i=1}^{n} (x_i - \mathrm{mean}_x)(y_i - \mathrm{mean}_y)}{\sum_{i=1}^{n} (x_i - \mathrm{mean}_x)^2}$$

$$\alpha = \mathrm{mean}_y - \beta \cdot \mathrm{mean}_x$$

$$y = \alpha + \beta \cdot x$$
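The three formulas translate directly into a few lines of Python. This is a sketch of ordinary least squares for a single input variable; the function names are hypothetical, and the slides do not state which listing attribute plays the role of x.

```python
def fit_line(xs, ys):
    """Return (alpha, beta) for the least-squares line y = alpha + beta * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Denominator of beta: sum of squared deviations of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    # Numerator of beta: sum of products of deviations of x and y.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


def predict(alpha, beta, x):
    """Evaluate the fitted line at x."""
    return alpha + beta * x
```

For the points (1, 3), (2, 5), (3, 7), (4, 9), which lie exactly on y = 1 + 2x, `fit_line` recovers alpha = 1 and beta = 2.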

Regression: input and output

Input data: X, β, and α are determined from the result of a query against MasterDW that is built from the user's request parameters.
Example:
“Select * from Residential where (Status = 'Sold') and (Area_Number = '10835') and (Square_Footage between '2000' and '3000') and (Bedrooms = '4') and (Bathrooms_Full = '2') and (Bathrooms_Half = '0') and (Year_Built = '2001')”
Assumption: bull housing market
Output result: Y, the predicted house price today
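Building the selection query from user parameters is safer with bind variables than with string concatenation. The sketch below uses Python's `sqlite3` as a stand-in for Oracle; the table and column names come from the example query, while the function name, the column types, and the use of SQLite are assumptions.

```python
import sqlite3


def find_comparable_sales(conn, area_number, sqft_min, sqft_max,
                          bedrooms, baths_full, baths_half, year_built):
    """Return sold listings matching the user's request parameters."""
    sql = (
        "SELECT * FROM Residential "
        "WHERE Status = 'Sold' AND Area_Number = ? "
        "AND Square_Footage BETWEEN ? AND ? "
        "AND Bedrooms = ? AND Bathrooms_Full = ? "
        "AND Bathrooms_Half = ? AND Year_Built = ?"
    )
    # Values are passed separately from the SQL text, so user input
    # is never interpreted as part of the statement.
    params = (area_number, sqft_min, sqft_max,
              bedrooms, baths_full, baths_half, year_built)
    return conn.execute(sql, params).fetchall()
```

The rows returned would then supply the (x, y) pairs for the regression; which columns are used for x and y is not stated in the slides.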

Interface

Interface: (user => MasterDW => Predictive Regression => user)
Visual Basic .NET is used to create the user interface.
The communication between Oracle and the .NET Framework is established by adding the Oracle Provider for OLE DB (OraOLEDB) component as a reference.

Conclusion

Understanding the knowledge domain: real estate terms and the transaction process
Technology used:
– Building a data warehouse using Oracle data warehousing tools
– Statistical data analysis (predictive regression method)
– Visual Basic .NET programming
– Oracle Provider for OLE DB (OraOLEDB)

Future Work

Move toward a tightly coupled data mining architecture.
Enhance this project by making it an online service for the public.
Integrate a current market trend factor.
Determine what kinds of home improvement a real estate seller can make to increase a property's value on the market.