  • 8/6/2019 BR Assignment Report Full

    1/15

    Housing Price Prediction Model

    Business Research Assignment

    Full-time MBA 2009, Utrecht

    Date of submission: 17th November 2009

    Word count: 900 words (excluding Appendix)

    FTMBA09, UB Number: 09028224


    Business Research Assignment, FTMBA09, UB Number: 09028224

    Table of Contents

    Executive Summary
    Introduction
      Objective
      Data and Methodology
    Data Analysis
    Linear Multiple Regression Analysis
    Conclusion
    Recommendations
    Appendix

    Table of illustrations

    Tables

    Table 1: Descriptive statistics of each variable from district A and B
    Table 2: Correlations table
    Table 3: Multiple Regression Analysis of Price, H_Size, Age, District, H_Dist and Age_Dist

    Charts

    Chart 1: Box plot of price in district A and B
    Chart 2: Histogram of residual value from regression model 3
    Chart 3: Residual value plot against predicted value from regression model 3
    Chart 4: Scatter plot, Y axis = Price, X axis = H_Size, Z axis = Age separated by district

  • 8/6/2019 BR Assignment Report Full

    3/15

    Housing Price Prediction Model November 17, 2009

    Real Estate Association

    Executive Summary

    This report develops a housing price prediction model that forecasts the selling price of properties in Districts A and B using multiple linear regression. The model explains 88.6% of the total variation in price (R² = 0.886) within the relevant range of house size and house age.

    Introduction

    Objective

    To develop a regression model as a tool for predicting the selling price of residential properties in both districts of the city.

    Data and Methodology

    Several real estate agents and property assessors were interviewed to identify the major explanatory variables that might affect property prices. The following independent variables were considered:

    Quantitative variables: H_Size (house size in square feet), L_Size (lot size in acres), Age (house age in years), Attract (an attractiveness rating of the property from 0 to 100; the higher, the better), P_Tax (property tax of the prior year in dollars) and N_Rooms (number of bedrooms in the house).

    Qualitative variable: District (the district in the city: 0 for District A, 1 for District B).

    The data consists of 625 properties sold in the past 3 months. We used multiple linear regression with a dummy variable (District) and two interaction terms (H_Dist = H_Size*District, Age_Dist = Age*District) to find the forecasting model that best describes the relationship between the independent variables and Price (the dependent variable) in each district.
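The dummy variable and the two interaction terms are simple per-row transformations of the raw data. The sketch below shows one way to build a regression row for each sale. The `Sale` record, the `design_row` function and the column order [intercept, H_Size, Age, District, H_Dist, Age_Dist] are hypothetical choices for illustration, not the layout SPSS used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sale:
    """One sold property (hypothetical record mirroring the report's variables)."""
    price: float     # Price in USD: the dependent variable, not part of the row
    h_size: float    # H_Size, house size in square feet
    age: float       # Age, house age in years
    district: str    # "A" or "B"


def design_row(sale: Sale) -> List[float]:
    """Return [1, H_Size, Age, District, H_Dist, Age_Dist] for one sale."""
    district = 1.0 if sale.district == "B" else 0.0  # dummy: 0 = District A, 1 = District B
    h_dist = sale.h_size * district                   # interaction H_Dist = H_Size * District
    age_dist = sale.age * district                    # interaction Age_Dist = Age * District
    return [1.0, sale.h_size, sale.age, district, h_dist, age_dist]


# Two made-up sales (not data from the report):
print(design_row(Sale(250000.0, 2000.0, 10.0, "A")))  # [1.0, 2000.0, 10.0, 0.0, 0.0, 0.0]
print(design_row(Sale(450000.0, 4000.0, 45.0, "B")))  # [1.0, 4000.0, 45.0, 1.0, 4000.0, 45.0]
```

For District A rows both interaction columns are zero, so only the District B rows can shift the size and age slopes.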

    Data Analysis

    Chart 1: Box plot of price in district A and B

    The median property price in District B is higher than that in District A, and there are no outliers. This suggests that the price data contain no extreme values that could distort the regression.
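Box plots conventionally flag a value as an outlier when it lies more than 1.5 × IQR below the first quartile or above the third quartile (Tukey's fences). The following is a minimal sketch of that rule. The function name and the sample prices are hypothetical, and quartile conventions differ between packages, so borderline cases may be classified differently than in SPSS.

```python
import statistics
from typing import List


def tukey_outliers(values: List[float], k: float = 1.5) -> List[float]:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    # statistics.quantiles with n=4 returns the three quartile cut points
    # (default 'exclusive' method; other packages may compute quartiles differently).
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up prices in USD (not from the report's data set):
prices = [200000, 210000, 220000, 230000, 240000, 250000, 260000, 900000]
print(tukey_outliers(prices))  # [900000]
```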

    Table 1: Descriptive statistics of each variable from district A and B (District A: District = 0; District B: District = 1)


    Table 1 shows that the average housing price in District B (USD 453,980.94) is higher than that in District A (USD 226,174.77). Consequently, the average property tax in District B is also higher (USD 5,300.90 in District B and USD 1,655.65 in District A). Additionally, the average house size and lot size in District B are 4,055.05 square feet and 1.4568 acres respectively, larger than those in District A, which are 2,032.47 square feet and 0.6608 acres respectively. The average age of a house in District B is 47.28 years, older than in District A, where it is 12.57 years. Average attractiveness and number of bedrooms are not significantly different between the two districts.

    Table 2: Correlations table

    As shown in Table 2, Attract and N_Rooms have no significant correlation with Price. However, H_Size, L_Size, Age and P_Tax are significantly correlated with one another, which indicates multicollinearity among these independent variables.

    Linear Multiple Regression Analysis

    Table 3: Multiple Regression Analysis of Price, H_Size, Age, District, H_Dist and Age_Dist

    Model 1:

    $$\widehat{\text{Price}} = 50215.092 + 94.322\,\text{H\_Size} - 1241.796\,\text{Age} + 6.994\,\text{H\_Dist} + 1087.526\,\text{Age\_Dist}$$

    p values: intercept 0.000, H_Size 0.000, Age 0.000, H_Dist 0.023, Age_Dist 0.001

    R² = 0.886, Adjusted R² = 0.885

    Std. Error of the Estimate = 46669.901

    Model 2:

    $$\widehat{\text{Price}} = 47888.970 + 95.212\,\text{H\_Size} - 1211.297\,\text{Age} + 4.856\,\text{H\_Dist} + 1029.883\,\text{Age\_Dist} + 8885.618\,\text{District}$$

    p values: intercept 0.000, H_Size 0.000, Age 0.000, H_Dist 0.384, Age_Dist 0.004, District 0.646

    R² = 0.886, Adjusted R² = 0.885

    Std. Error of the Estimate = 46699.619

    Model 3:

    $$\widehat{\text{Price}} = 42025.525 + 98.102\,\text{H\_Size} - 1212.041\,\text{Age} + 1025.601\,\text{Age\_Dist} + 22962.084\,\text{District}$$

    p values: intercept 0.000, H_Size 0.000, Age 0.000, Age_Dist 0.004, District 0.031

    R² = 0.886, Adjusted R² = 0.885

    Std. Error of the Estimate = 46690.584

    All models have the same R² and adjusted R². Model 1 has the smallest Std. Error of the Estimate. However, Model 1 violates the principle of marginality: it includes interaction terms but eliminates the main effect of the dummy variable, and it is not practical to stipulate and fit such a model. Model 2 is dropped because the coefficients of H_Dist and District are not statistically significant (p values of 0.384 and 0.646, both above 0.05), so there is no evidence that these terms have a linear relationship with Price. Therefore, Model 3 is the most appropriate forecasting model. Firstly, the p values of all its coefficients are below 0.05, showing that every regressor has a significant effect on Price. Secondly, even though the VIFs of Age and Age_Dist are more than 10, which indicates collinearity between them, this is acceptable because Age_Dist is the interaction term of Age and District. Lastly, the residual pattern analysis of Model 3 shows no evidence that the normality, constant variance and independence-of-errors assumptions are violated (see Charts 2 and 3).

    Chart 2: Histogram of residual value from regression model 3

    Chart 3: Residual value plot against predicted value from regression model 3
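Substituting the two values of the District dummy into the Model 3 equation shows how District shifts the intercept and how Age_Dist changes the age slope between districts. In the worked derivation below, the minus sign on Age and the plus sign on Age_Dist are an assumption, because the signs were lost in the source text. They are chosen so that the fitted equations reproduce the district mean prices in Table 1.

$$
\begin{aligned}
\text{District A }(\text{District}=0,\ \text{Age\_Dist}=0):\quad
\widehat{\text{Price}} &= 42025.525 + 98.102\,\text{H\_Size} - 1212.041\,\text{Age} \\
\text{District B }(\text{District}=1,\ \text{Age\_Dist}=\text{Age}):\quad
\widehat{\text{Price}} &= (42025.525 + 22962.084) + 98.102\,\text{H\_Size} + (-1212.041 + 1025.601)\,\text{Age} \\
&= 64987.609 + 98.102\,\text{H\_Size} - 186.44\,\text{Age}
\end{aligned}
$$

With house size held constant, each additional year of age lowers the predicted price by about USD 1,212 in District A but only about USD 186 in District B.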


    Appendix

    1 Define objective

    To develop a regression model as a tool for predicting the selling price of residential properties in both districts of the city.

    2 Specify model

    A linear multiple regression model (one dependent variable and several independent variables)

    3 Collect data

    The data consists of 625 properties sold in the past 3 months both in District A and District B.

    3.1 Dependent variable

    Price (House selling price in USD)

    3.2 Initial independent variables

    Quantitative

    H_Size (House size in square feet)

    L_Size (Lot size in acres)

    Age (House age in years)

    Attract (An attractiveness rating of the property ranging from 0 to 100, the higher the better)

    P_Tax (Property tax of the prior year in dollars)

    N_Rooms (Number of bedrooms in the house)

    Qualitative

    District (The district in the city: 0 for district A, 1 for district B)


    4 Descriptive Data Analysis

    Figure 1: Box plot of Price

    There are no outliers in Price. The median price in District B is higher than that in District A.

    Figure 2: Box plot of all quantitative variables (panels: H_Size, L_Size, Age, Attract, P_Tax, N_Rooms)

    There are no outliers in any of the independent variables. H_Size, L_Size, Age and P_Tax show similar box-plot patterns.


    Figure 3: Descriptive statistics of each variable from district A and B (District A: District = 0; District B: District = 1)

    The average housing price in District B (USD 453,980.94) is higher than that in District A (USD 226,174.77). Consequently, the average property tax in District B is also higher (USD 5,300.90 in District B and USD 1,655.65 in District A). Additionally, the average house size and lot size in District B are 4,055.05 square feet and 1.4568 acres respectively, larger than those in District A, which are 2,032.47 square feet and 0.6608 acres respectively. The average age of a house in District B is 47.28 years, older than in District A, where it is 12.57 years. Average attractiveness and number of bedrooms are not significantly different between the two districts.


    Figure 4: Correlation between each of variables in both districts

    Attract and N_Rooms have no significant correlation with Price. However, H_Size, L_Size, Age and P_Tax are significantly correlated with one another, which indicates multicollinearity among these independent variables.
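The pairwise correlations above connect directly to the variance inflation factor (VIF) threshold of 10 used when selecting the model. The VIF of predictor j is 1 / (1 - R_j²), where R_j² comes from regressing predictor j on all the other predictors. The sketch below covers only the special case of exactly two predictors, where R_j² is the squared Pearson correlation. The function names and sample values are hypothetical.

```python
import math
from typing import List


def pearson_r(x: List[float], y: List[float]) -> float:
    """Sample Pearson correlation coefficient between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1: List[float], x2: List[float]) -> float:
    """VIF when a model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)  # undefined (division by zero) if |r| == 1


# Made-up, strongly correlated predictors:
print(round(vif_two_predictors([1, 2, 3, 4, 5], [1, 2, 3, 4, 6]), 6))  # 37.0
```

A correlation of about 0.986 between two predictors already pushes the VIF far above 10. With more predictors, R_j² must come from an auxiliary regression rather than from a single correlation.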


    5 Estimate unknown parameters and evaluate the model

    We have one qualitative independent variable, District. Therefore, we add District to the linear multiple regression model as a dummy variable and create two interaction terms (H_Dist = H_Size*District, Age_Dist = Age*District). Attract and N_Rooms are eliminated because they show no relationship with Price (from the correlation analysis). We decided not to add P_Tax: the price must be known before the tax is paid, so P_Tax is not a suitable predictor in a price prediction model.
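SPSS estimated the unknown coefficients by ordinary least squares. To illustrate what that estimation step computes, the sketch below solves the normal equations (XᵀX)b = Xᵀy with Gauss-Jordan elimination in plain Python. It is suitable only for a handful of predictors and is not a replacement for the SPSS procedure. The `ols` function, the rows and the "true" coefficients used to generate the prices are all hypothetical.

```python
from typing import List


def ols(X: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares coefficients b solving the normal equations (X^T X) b = X^T y."""
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    # The coefficient block is now diagonal, so each b_i is a single division.
    return [a[i][p] / a[i][i] for i in range(p)]


# Made-up rows [1, H_Size, Age, District]; prices generated exactly from the
# hypothetical equation Price = 40000 + 100*H_Size - 1200*Age + 20000*District.
rows = [[1, 1500, 5, 0], [1, 2000, 20, 0], [1, 2500, 10, 0],
        [1, 3000, 40, 1], [1, 3500, 30, 1], [1, 4000, 50, 1]]
prices = [40000 + 100 * s - 1200 * g + 20000 * d for _, s, g, d in rows]
print([round(b, 3) for b in ols(rows, prices)])  # approximately [40000, 100, -1200, 20000]
```

Because the made-up prices contain no noise, the solver should recover the generating coefficients up to floating-point error.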

    Figure 5: Statistical results from SPSS


    R² = 0.886, Adjusted R² = 0.885 and Standard Error of the Estimate = 46733.169

    F test (Overall test)

    $H_0$: all slope coefficients are equal to 0

    F = 799.862, p value = 0.000, which is less than 0.05 (5% significance level).

    We reject the null hypothesis. We are 95% confident that there is a significant linear relationship between the independent variables and the dependent variable.

    T test (Individual test)

    We fail to reject $H_0: \beta_2 = 0$ (t = 0.334, p value = 0.739, which is more than 0.05). At the 5% significance level, there is no evidence of a significant linear relationship between L_Size and Price.

    Now, we eliminate L_Size from the initial model.
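The overall F statistic can be recovered, approximately, from R², the sample size n and the number of regressors k using the standard identity F = (R²/k) / ((1 - R²)/(n - k - 1)). The sketch below applies it with n = 625 and k = 6, 5 and 4. These k values assume the initial model contains H_Size, L_Size, Age, District, H_Dist and Age_Dist, and that one regressor is removed at each step. Because R² is reported to only three decimals, the results (about 800.5, 962.2 and 1204.6) differ slightly from the SPSS values. The p value would also require the F distribution with (k, n - k - 1) degrees of freedom, which the Python standard library does not provide. The function name is hypothetical.

```python
def overall_f(r2: float, n: int, k: int) -> float:
    """Overall regression F statistic from R^2, n observations and k regressors."""
    explained = r2 / k                        # explained share of variation per regressor
    unexplained = (1.0 - r2) / (n - k - 1)    # unexplained share per residual degree of freedom
    return explained / unexplained


# n = 625 properties; k = 6, 5 and 4 regressors (Figures 5, 6 and 7).
for k, reported in [(6, 799.862), (5, 961.192), (4, 1201.765)]:
    print(k, round(overall_f(0.886, 625, k), 1), reported)
```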


    Figure 6: Statistical results from SPSS

    R² = 0.886, Adjusted R² = 0.885 and Standard Error of the Estimate = 46699.169

    F test (Overall test)

    $H_0$: all slope coefficients are equal to 0

    F = 961.192, p value = 0.000, which is less than 0.05 (5% significance level).

    We reject the null hypothesis. We are 95% confident that there is a significant linear relationship between the independent variables and the dependent variable.

    T test (Individual test)

    We fail to reject $H_0: \beta_4 = 0$ (t = 0.872, p value = 0.384, which is more than 0.05). At the 5% significance level, there is no evidence of a significant linear relationship between H_Dist and Price.

    Even though we also fail to reject $H_0: \beta_3 = 0$ (t = 0.460, p value = 0.646, which is more than 0.05), we keep District in our model because it is the dummy variable. It is not practical to stipulate and fit a model that includes interaction terms but eliminates the main effect of the dummy variable.

    Now, we eliminate H_Dist from the model.


    Figure 7: Statistical results from SPSS

    R² = 0.886, Adjusted R² = 0.885 and Standard Error of the Estimate = 46690.584

    F test (Overall test)

    $H_0$: all slope coefficients are equal to 0

    F = 1201.765, p value = 0.000, which is less than 0.05 (5% significance level).

    We reject the null hypothesis. We are 95% confident that there is a significant linear relationship between the independent variables and the dependent variable.


    T test (Individual test)

    We reject all null hypotheses ($H_0: \beta_0 = 0$, $H_0: \beta_1 = 0$, $H_0: \beta_2 = 0$, $H_0: \beta_3 = 0$, $H_0: \beta_4 = 0$) because all p values are less than 0.05. We are 95% confident that there is a significant linear relationship between each independent variable and Price.

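    6 Prediction model

    $$\widehat{\text{Price}} = 42025.525 + 98.102\,\text{H\_Size} - 1212.041\,\text{Age} + 1025.601\,\text{Age\_Dist} + 22962.084\,\text{District}$$

    Prediction equation for District A (District = 0):

    $$\widehat{\text{Price}} = 42025.525 + 98.102\,\text{H\_Size} - 1212.041\,\text{Age}$$

    Prediction equation for District B (District = 1):

    $$\widehat{\text{Price}} = 64987.609 + 98.102\,\text{H\_Size} - 186.44\,\text{Age}$$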
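For routine use, the prediction model can be wrapped in a small function. This is a sketch, not part of the original analysis, and `predict_price` is a hypothetical name. The minus sign on Age and the plus sign on Age_Dist are an assumption because the signs were lost in the source text. With these signs, the combined equation reduces to the District B equation (64987.609 = 42025.525 + 22962.084 and -186.44 = -1212.041 + 1025.601) and reproduces the district mean prices in Figure 3. Check them against the SPSS output before relying on the function.

```python
# Model 3 coefficients from the report; the signs of Age and Age_Dist are an
# assumption (they were lost in the source text).
INTERCEPT = 42025.525
B_H_SIZE = 98.102       # USD per square foot of house size
B_AGE = -1212.041       # USD per year of age in District A
B_AGE_DIST = 1025.601   # additional USD per year of age in District B
B_DISTRICT = 22962.084  # intercept shift for District B


def predict_price(h_size: float, age: float, district: int) -> float:
    """Predicted selling price in USD; district is 0 for A and 1 for B."""
    if district not in (0, 1):
        raise ValueError("district must be 0 (District A) or 1 (District B)")
    age_dist = age * district  # interaction term Age_Dist = Age * District
    return (INTERCEPT + B_H_SIZE * h_size + B_AGE * age
            + B_AGE_DIST * age_dist + B_DISTRICT * district)


# A made-up 2,000 sq ft, 10-year-old house in each district:
print(round(predict_price(2000, 10, 0), 3))  # 226109.115
print(round(predict_price(2000, 10, 1), 3))  # 259327.209
```

As the executive summary notes, predictions are only meaningful within the range of house sizes and ages observed in the 625 sales.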