JIGSAW ANALYTICS CONTEST (PROPERTY RECOMMENDATION FOR INVESTMENT) Parindsheel S. Dhillon

Jigsaw Mortgage Dex Data Analysis Competition Winner Presentation - Parindsheel Dhillon

May 25, 2015


Jigsaw Academy
Page 1: Jigsaw Mortgage Dex Data Analysis Competition Winner Presentation - Parindsheel Dhillon

JIGSAW ANALYTICS CONTEST (PROPERTY RECOMMENDATION FOR INVESTMENT)

Parindsheel S. Dhillon

Page 2

RECOMMEND PROPERTY FOR INVESTMENT

Project Approach
• Data preprocessing: data transformation & outlier handling based on historical industry data
• Key performance indicator (KPI) selection: rent yield, income-to-rent ratio & population in occupied housing units, chosen to reduce multicollinearity & noise
• Data modeling through unsupervised learning: grouping properties with the k-means clustering technique
• Cluster profiling led to an investment ranking on a scale of 1 to 5 for strategic investment selection

Page 3

SWOT ANALYSIS OF ADOPTED CLUSTERING MODEL

STRENGTH
• Number of clusters optimized (7) by plotting the number of clusters against the within-cluster sum of squared distances
• High ratio of between SS to total SS: 87% & 80%
• Good cluster profiling offering varied investor choices:
a) Large-population places with high yield and good scope for increasing rents
b) Medium-population places with high yield & no scope for increasing rents
c) Medium-population places with relatively ok yield & high scope for increasing rents
d) Small-population places with high yield & some scope for increasing rents

WEAKNESS
• A cluster may contain a few property areas with differing characteristics; further classification may be required before the final investment decision
• Yield may rise because property prices have fallen, which could lead to wrong conclusions
• Yield could be high because a location has a heavy concentration of housing commission homes, where rents are high relative to property prices

OPPORTUNITY
• Scope for adding more variables, e.g. a distressed-area factor, climate risk factors, etc.
• Adding a variable for comparison with historical rates could counter the problem of falling property prices inflating the yield
• A geographical heat map can be used to segment & locate multiple markets within suburbs. Longitude & latitude data is required for this activity.
• Resource optimization can be applied to finalize the investment, e.g. capital investment budget optimization

THREAT
• Historical housing data could provide a measure of an imminent property bubble, but our dataset offers no way to identify such a catastrophe, which investors will certainly want to check before investing.
• Strong dependence on median prices & rent yield could lead to anomalies, especially in areas with many housing markets in a single suburb. This could lead to wrong investment criteria.

Page 4

CLUSTER VISUALIZATION
• 3D cluster snapshot visualization using the RGL package
• Principal component analysis
• Adjusted box plot comparisons of clusters for KPIs

Page 5

RECOMMENDATIONS
• Strategic alignment between investor and property is the most important aspect to consider, rather than building a property portfolio based solely on the scale suggested by the clustering model
• Clustering cannot absorb the uniqueness of each property. It classifies properties into clusters based on property characteristics. Further cluster refinement will add value to investment decisions
• Additional variables such as risks, depressed areas, historical housing prices, longitude & latitude need to be added to the data set to fine-tune the clusters & bring more insights

Page 6

DETAILED SUMMARY OF ANALYSIS

Page 7

SYNOPSIS
• Objective
  – Recommend properties/places to investors
• Property valuation approach
  – Data pre-processing
  – Analytics KPI selection for data modeling
    • Multicollinearity & noise reduction by skipping highly correlated independent variables
  – Data modeling through unsupervised learning: grouping with the k-means clustering technique
  – Recommendation based on cluster characteristics

Page 8

METHODOLOGY
Data Pre-processing
– Excel data pre-processing
  • Zip code observations altered in Excel
  • Separate data sets saved for individual analysis
  • $ and , signs removed from variable values in Excel
– R data pre-processing
  • Scaling/normalizing/standardizing data
    – Various methods tried, such as scaling to mean = 0 and std dev = 1
    – Reducing price and population values by dividing by a standard value (e.g. 10000)
    – Data processing was finalized based on the ratio of between sum of squares to total sum of squares & on the characteristics of the clusters. This ratio varies from 80 to 87%.
    – The best ratio was achieved with data reduced by division. Moreover, this method is easier to interpret than scaling.
  • Inversion of Rent/Income to Income/Rent
    – Data scaling & better data comparison and understanding
  • Data type conversion of State and Place Type to numeric
  • Handling data anomalies
    – Based on historical USA data for rent yield and rent to income, impossible values were imputed (Appendix (i))
    – Multivariate imputation by chained equations (MICE) for values with
      » Rent yield > 20%
      » Rent/Income > 30%
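The R pre-processing steps on this slide can be sketched in a few lines. The deck's analysis was done in R; the sketch below uses Python instead and is only an illustration. The field names (median_value, population, rent_yield, rent_to_income) and the preprocess helper are hypothetical, the sample records are made up, and flagged values are simply set to missing here, whereas the deck then filled such values using MICE.

```python
from typing import Optional

SCALE = 10_000              # divisor taken from the slide ("e.g. 10000")
MAX_RENT_YIELD = 0.20       # rent yield above 20% is treated as implausible
MAX_RENT_TO_INCOME = 0.30   # rent/income above 30% is treated as implausible


def preprocess(row: dict) -> dict:
    """Rescale, invert and flag one record (field names are hypothetical)."""
    rent_yield: Optional[float] = row["rent_yield"]
    rent_to_income: Optional[float] = row["rent_to_income"]

    # Mark implausible values as missing so a later imputation step can fill them.
    if rent_yield is not None and rent_yield > MAX_RENT_YIELD:
        rent_yield = None
    if rent_to_income is not None and not (0 < rent_to_income <= MAX_RENT_TO_INCOME):
        rent_to_income = None

    return {
        # Divide large-valued columns by a constant instead of z-scoring them.
        "value_scaled": row["median_value"] / SCALE,
        "population_scaled": row["population"] / SCALE,
        "rent_yield": rent_yield,
        # Invert Rent/Income to Income/Rent, as described on the slide.
        "income_to_rent": None if rent_to_income is None else 1 / rent_to_income,
    }


# Made-up records for illustration only.
clean = preprocess({"median_value": 250000, "population": 42000,
                    "rent_yield": 0.06, "rent_to_income": 0.25})
suspect = preprocess({"median_value": 90000, "population": 5000,
                      "rent_yield": 0.35, "rent_to_income": 0.45})
print(clean)
print(suspect)
```

Dividing by a constant keeps the values in recognizable units (tens of thousands of dollars or people), which matches the slide's remark that this method is easier to interpret than mean/std-dev scaling.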

Page 9

METHODOLOGY (CONT.)
• Key performance indicator selection for modeling
  – Zip dataset
    • Rent yield
    • Median income to median rent
    • Population in occupied housing units
  – Place dataset
    • The zip dataset variables above, plus
      – Place Type as numeric
      – State as numeric
• Highly correlated variables skipped to reduce multicollinearity and added noise
  – Median rent, median income and median property value (their effect is already covered by yield & income to rent)
  – Total population (its effect is marginally covered by population in occupied housing)
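One way to spot the highly correlated variables mentioned on this slide is a pairwise Pearson correlation check. This is a minimal Python sketch, not the author's R code: the 0.8 threshold, the column names and the toy values are all assumptions made for illustration, and constant columns (zero variance) are not handled.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def highly_correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


# Made-up values: two population columns that move together, one yield column.
toy = {
    "total_population": [1000, 2500, 4000, 8000, 12000],
    "population_occupied": [950, 2400, 3900, 7700, 11500],
    "rent_yield": [0.07, 0.04, 0.09, 0.05, 0.06],
}
flagged = highly_correlated_pairs(toy)
print(flagged)
```

For each flagged pair, one variable can be dropped before clustering, which is the idea behind skipping total population in favor of population in occupied housing.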

Page 10

METHODOLOGY (CONT.)
• Data analysis
  – Grouping of the given unlabeled dataset using k-means clustering
    • K value optimized by graphical analysis against the within sum of squares value
    • Cluster selection based on visualization and characteristics
      – Adjusted box plots of the yield, income/rent and population variables to compare all clusters (Appendix (ii))
      – Comparison of range, mean centers and other statistical characteristics of the clusters
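The two quantities this slide relies on, the within sum of squares used to choose k and the between SS / total SS ratio used to judge the result, can be illustrated with a small k-means sketch. This is plain Lloyd's algorithm with seeded random initialization, written in Python rather than the R used for the deck; the function names are hypothetical and the points are made up.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are final
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centers


def sums_of_squares(points, labels, centers):
    """Return (within SS, between SS, total SS) for a clustering."""
    grand_mean = [sum(c) / len(points) for c in zip(*points)]
    total = sum(sq_dist(p, grand_mean) for p in points)
    within = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return within, total - within, total


# Made-up 2-D points forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]

# Elbow-style scan: watch how the within SS drops as k grows.
for k in range(1, 5):
    within, between, total = sums_of_squares(points, *kmeans(points, k))
    print(k, round(within, 2), round(between / total, 3))
```

In the scan, the within SS falls sharply up to the natural number of groups and flattens afterwards; that bend is what the graphical k selection looks for, while a between/total ratio close to 1 indicates compact, well-separated clusters.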

Page 11

PROJECT SUMMARY
• Properties have been grouped into clusters based on rent yield, income/rent ratio & population
• The ratio of between sum of squares to total sum of squares has been maximized while focusing on cluster characteristics
• Every property is unique; clusters can provide a foundation for selecting property for investment based on property valuation KPIs
• Based on investor requirements, property can be chosen from strategically aligned clusters
• Additionally, refining clusters by sub-setting will help find property with the desired characteristics

Page 12

RECOMMENDATION
• The tables in the next slides show the types of options available for investment.
• For example, clusters with high yield, medium-size population and little scope for rent increases (low income/rent) can attract investors who want property that already gives a good return on investment, although there is no additional scope for increasing rents (however, property price can be used)
• As a second example, a cluster of highly populated areas with high scope for increasing rents and a current yield that is marginally good (if not the best) suits an investor looking for high future returns
• As mentioned above, categories for the various clusters are recommended in the next slides for investment purposes
• Investments have been ranked on a scale of 1 to 5, with 1 the most attractive property to invest in and 5 the least attractive.
• However, there is no strict rule, as the choice depends purely on the investor's strategic decisions.

Page 13

ZIP DATASET CLUSTER ANALYSIS

Investing rank preference (1 to 5): 1 = highly recommended, 5 = least recommended

Type of property | Zip dataset cluster (no. of properties) | Investing rank
High yield, but less scope of increasing rent, medium-size population | Cluster 3 (3218) | 1
High yield, no scope of increasing rent, smallest population | Cluster 2 (1100) | 5
Relatively good yield with some scope of increase in rents, smaller population | Cluster 6 (5982) | 4
Relatively ok yield with little scope of increasing rents, high population | Cluster 1 (2107) | 2
Yield on the lower side, but very high scope of increase in rent, medium-size population | Cluster 5 (7831) | 3

Page 14

PLACE DATASET CLUSTERS ANALYSIS

Investing rank preference (1 to 5): 1 = highly recommended, 5 = least recommended

Type of property | Place dataset cluster (no. of places) | Investing rank
High yield, high scope of increasing rent, medium-size population | Cluster 5 (305) | 1
Good yield, good scope of increasing rent, smaller population | Cluster 4 (219) | 4
High yield, with high scope of increasing rent, smallest population | Cluster 7 (200) | 5
High yield, no scope of increasing rent, medium-size population | Cluster 2 (156) | 2
Relatively ok yield, some scope of increasing rents, large population | Cluster 3 (272) | 3

There is no strict scale rule, as the choice depends purely on the investor's strategic decisions. These clusters can be refined by various means, such as sub-setting, to get a better feel for the properties. Moreover, depressed areas like Flint and Detroit are in cluster 4; such factors need to be re-checked, as no variable was assigned to them. Further analysis can be done on a chosen cluster to find property matching investor requirements.
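Refining a chosen cluster by sub-setting can be as simple as a filter. The sketch below is in Python (the deck used R) and is purely illustrative: the record layout, the refine_cluster helper, the yield floor and the watch list of places to re-check are all hypothetical, and the records are made up.

```python
def refine_cluster(places, cluster, min_yield, exclude=frozenset()):
    """Keep places in one cluster that meet a yield floor and are not excluded."""
    return [
        p for p in places
        if p["cluster"] == cluster
        and p["rent_yield"] >= min_yield
        and p["name"] not in exclude
    ]


# Made-up records; "Place C" stands in for an area flagged for manual re-check.
places = [
    {"name": "Place A", "cluster": 4, "rent_yield": 0.08},
    {"name": "Place B", "cluster": 4, "rent_yield": 0.05},
    {"name": "Place C", "cluster": 4, "rent_yield": 0.09},
    {"name": "Place D", "cluster": 5, "rent_yield": 0.10},
]
shortlist = refine_cluster(places, cluster=4, min_yield=0.06, exclude={"Place C"})
print([p["name"] for p in shortlist])
```

An exclusion list like this is only a stopgap for factors the clustering had no variable for; adding such a variable to the dataset is the more durable fix suggested elsewhere in the deck.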

Page 15

MORE RECOMMENDATION
• Integer optimization in conjunction with clustering to utilize resources efficiently
  – Based on constraints such as investment budget, we can optimize property investments along with investor personalization.
• Geographical heat maps based on zip code could be a good option for property investment recommendation
• Addition of other variables to the dataset
  – A variable for distressed areas can be added to the dataset rather than checking individually after clustering
  – A risk variable to be added in future (e.g. typhoon-prone area)
  – Yield may have increased because property prices fell for some reason. Comparison with historical rates is necessary in that case, by adding another variable to the dataset
• Things to check before finalizing investment
  – Yield is calculated using median rent and median prices. Both variables are highly susceptible to statistical anomalies, especially where there are many housing markets in a single suburb
  – Yield could be high due to a location's heavy concentration of housing commission homes, with high rents relative to property price
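The budget-constrained selection mentioned under integer optimization on this slide is, in its simplest form, a 0/1 knapsack problem. The Python sketch below solves a toy instance by exhaustive search, which is exact but only practical for a handful of candidates; a real application would use an integer programming solver. The best_portfolio helper, the candidate names, prices and rents are all made up for illustration.

```python
from itertools import combinations


def best_portfolio(candidates, budget):
    """Pick the subset with the highest total annual rent that fits the budget."""
    best, best_return = (), 0
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            cost = sum(c["price"] for c in combo)
            total_rent = sum(c["annual_rent"] for c in combo)
            if cost <= budget and total_rent > best_return:
                best, best_return = combo, total_rent
    return [c["name"] for c in best], best_return


# Made-up candidates, e.g. drawn from a strategically aligned cluster.
# Prices and rents are in thousands of dollars.
candidates = [
    {"name": "A", "price": 200, "annual_rent": 14},
    {"name": "B", "price": 150, "annual_rent": 12},
    {"name": "C", "price": 100, "annual_rent": 7},
]
chosen, total = best_portfolio(candidates, budget=300)
print(chosen, total)
```

Here B has the best individual yield, yet the budget is best used by combining A and C, which is the kind of trade-off a per-property ranking alone does not capture.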

Page 16

BOTTOM LINE
• Each property is unique, with unique characteristics
• Clustering can help identify groups for investment; refinement will be advantageous before finalizing a property for investment
• Additional factors such as risk, depressed areas, etc. need to be considered in addition to the risks mentioned in the last slide
• Strategic alignment between investor and property is the most important aspect to consider, rather than the scale assigned to the property.

Page 17

APPENDIX (I)
• Historical values of KPI for outlier removal
  – http://www.realestateanalysisfree.com/blog/real-estate-analysis/price-to-rent-ratio-rental-yield-of-all-us-states
  – http://seattlebubble.com/blog/2013/03/29/top-30-cities-price-to-rent-price-to-income-ratios-2011/

Page 18

CLUSTERS COMPARISON BASED ON KPI BY USING ADJUSTED BOX PLOT FOR ZIP DATA (APPENDIX II)

Page 19

APPENDIX (III)
GRAPHICAL ESTIMATION OF CLUSTERS & ANALYSIS FOR CLUSTERING

Page 20

APPENDIX (IV)
CLUSTERS VISUALIZATION
CORRELATION OF VARIABLES