Application of Statistics in Human Geography
--The influence of geographic position on urban development
C-level essay in Statistics, Fall 2006
Department of Economics and Society, Dalarna University
Authors: Boyuan Zhao, Hao Luo
Supervisor: Johan Bring
Date: January, 2006
Abstract
In this essay, based on data from the “China Urban Statistics Year Book 2005” and on
the longitude and latitude of Chinese cities, we analyze the relationship between
urban social geography and the degree of development of cities. Various statistical
methods are applied, such as descriptive statistics and classification.
According to the results of the analysis, the selected Chinese cities are divided into
10 classes, and their development is studied so that the individual and common
features of each class can be revealed.
After that, an in-depth analysis is made of the first class, with attention paid to three
aspects: population, resources, and environment. For further study, we analyze
urban population, construction, traffic, water supply, and greening.
We use two classification methods in SPSS, the k-means method and the
hierarchical cluster method. We also compare the two results obtained from the
geographical data and the economic data.
Finally, we hope this essay can offer some help and insight for Chinese
urbanization and regionalization.
Keywords:
Urban Development; Descriptive Statistics; Cross Table; Geographic Location;
Urban Management; Cluster Analysis; Human Geography; Correlation Analysis
Part I
Introduction
In 334 BC, Alexander the Great led his army south across the sea and then east towards
the Persian Empire. The geographer Nearchus went along with the troops, gathering
the information needed for a “World Map”. He noticed that along the marching route,
from west to east, the change of seasons and the duration of sunshine were nearly the
same. The geographer made an important contribution: for the first time in history, he
drew a line of latitude on a map. This line started from the Strait of Gibraltar,
ran along the Himalayas and reached the Pacific Ocean. Alexander’s empire
collapsed very soon; however, in the Egyptian city named after Alexander the
Great, a well-known museum was founded. Its old curator Eratosthenes was learned
and had mastered mathematics, astronomy and geography. Through calculation he
estimated the circumference of the earth at 46250 km, and he drew a world map with 6
latitude lines and 7 longitude lines on it. From then on, latitude and longitude were
used to mark locations accurately.
Thereafter, an inseparable relationship was built between geography and statistics
through latitude and longitude lines.
In the 21st century, cities are playing an increasingly important role in the
development of society and the economy. Therefore, the good management and rapid
modernization of cities have become an important issue of general public concern,
both in theory and in practice.
Part II
Description of data
Questions and step-by-step solutions
In this part, the data for the year 2004 selected from the “China Urban Statistics Year
Book” and the geographical locations of the cities are combined by means of
latitude and longitude. All of these are then analyzed using statistical
methods and related software.
I. Data gathering and pretreatment
1. Selection of data
This essay focuses on the influence of latitude and longitude on urban
development, so the existing grouping of cities by province will not be used. Instead,
we re-classify the cities into several regions by latitude and longitude, so accurate
latitude and longitude information for every city is necessary.
In addition, we need indicators that reflect the level of urban development
to some extent, so data on the development of these cities must be gathered.
2. Source of data
To meet the data needs mentioned above, the analysis in this essay relies
on data selected from the “China Urban Statistics Year Book 2005” and on the latitude
and longitude of the cities.
3. Structure of data
The process of data reduction has the following steps:
First, select several cities and their tables from the “China Urban Statistics Year Book
2005”.
Second, establish a table with three columns: the name, latitude and
longitude of each city.
Finally, attach this table to the tables from the “China Urban Statistics Year Book 2005”
for analysis.
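The attaching step can be sketched as a join on the city name. This is only a minimal illustration in Python, not the procedure the authors ran in SPSS; the function name `attach_coordinates`, the field names and the sample rows are all hypothetical.

```python
def attach_coordinates(coordinates, yearbook):
    """Join yearbook rows with coordinate rows on the city name.

    Returns the merged rows plus the names of cities with no coordinates,
    so that incomplete records can be reviewed and excluded.
    """
    by_name = {row["city"]: row for row in coordinates}
    merged, unmatched = [], []
    for row in yearbook:
        coords = by_name.get(row["city"])
        if coords is None:
            unmatched.append(row["city"])
        else:
            merged.append({**row,
                           "latitude": coords["latitude"],
                           "longitude": coords["longitude"]})
    return merged, unmatched


# Made-up rows for illustration only; they are not taken from the yearbook.
coordinates = [
    {"city": "City A", "latitude": 30.0, "longitude": 110.0},
    {"city": "City B", "latitude": 40.0, "longitude": 120.0},
]
yearbook = [
    {"city": "City A", "urban_population": 50.0},
    {"city": "City B", "urban_population": 80.0},
    {"city": "City C", "urban_population": 20.0},
]
merged, unmatched = attach_coordinates(coordinates, yearbook)
```

Returning the unmatched names separately makes it easy to see which cities would have to be dropped for lack of accurate data.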
4. Problems
There are 659 cities in total in the “China Urban Statistics Year Book 2005”; however,
the data for 9 of these cities could not be obtained accurately. Following
the principles of data analysis, we exclude them, which leaves data for 650
cities.
II. Data transformation
Through some basic data transformation, a table containing 53 variables is obtained.
These variables can be classified into three parts: urban names, latitude and longitude,
and urban development indices. (List 1)
Considering the possible influence of correlation between variables, we apply
correlation analysis to eliminate unnecessary variables, leaving only the most useful
ones. 13 variables are selected to represent urban development and regional advantage
from different aspects. They are listed below:
Urban names, latitude, longitude, province codes, population density, urban
population, urban area, per capita fund for urban construction and maintenance, per
capita public green areas, number of public transportation vehicles per 1000000
population, area of paved roads per 10000 persons, density of paved roads, daily
living consumption of water per person.
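One way such a correlation screen could work is sketched below in Python. The essay does not state its selection criterion, so the 0.9 cut-off, the function names and the sample values are assumptions made purely for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_redundant(columns, threshold=0.9):
    """Keep a variable only if |r| with every already kept variable is below threshold.

    The threshold is an assumed value, not one given in the essay.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Made-up values: "urban_area" is almost proportional to "urban_population".
columns = {
    "urban_population": [10.0, 20.0, 30.0, 40.0, 50.0],
    "urban_area": [11.0, 19.0, 32.0, 41.0, 49.0],
    "green_area": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept = drop_redundant(columns)
```

In this toy input the second variable is dropped because it carries almost the same information as the first, while the weakly correlated third variable is kept.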
Part III
Methods
The main methods used in this essay are cluster methods, namely the K-means method
and the hierarchical cluster method. Clustering is the classification of similar objects
into groups, or more precisely, the partitioning of a data set into subsets (clusters)
so that the data in each subset (ideally) share some common trait.
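To make the K-means idea concrete, here is a minimal sketch of the standard assignment and update iteration on two-dimensional points such as (longitude, latitude) pairs. It is not the SPSS implementation; the starting centres, the plain Euclidean distance on degrees and the sample points are assumptions for illustration.

```python
def kmeans(points, centroids, iterations=100):
    """Basic K-means on 2-D points; `centroids` are the starting centres."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre
        # (squared Euclidean distance).
        labels = [
            min(range(len(centroids)),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                + (p[1] - centroids[j][1]) ** 2)
            for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centre
        if new_centroids == centroids:
            break  # converged: no centre moved
        centroids = new_centroids
    return labels, centroids


# Made-up (longitude, latitude) pairs forming two obvious groups.
points = [(100.0, 25.0), (101.0, 26.0), (102.0, 24.0),
          (120.0, 40.0), (121.0, 41.0), (122.0, 39.0)]
labels, centres = kmeans(points, centroids=[points[0], points[3]])
```

The number of clusters is fixed by the number of starting centres, which is why the number of groups has to be decided before K-means is run.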
Part IV
Results
A
Geographical clustering
I. Data analysis
1. Descriptive analysis
A table containing 650 rows and 13 columns is established. The results of the
descriptive analysis in SPSS are listed below:
Table 1: The descriptive statistics of each variable
Descriptive Statistics
Variable                                                         N    Minimum  Maximum   Mean       Std. Deviation
Longitude                                                        650  75.49    133.97    114.2158   8.66988
Latitude                                                         650  18.14    50.80     33.4269    6.95958
Population density                                               650  25.00    11195.00  2516.0754  2297.10422
Urban population                                                 650  2.42     1289.13   52.2434    102.01179
Urban area                                                       650  10.00    12909.67  600.3818   1239.44341
Fund for urban construction and maintenance per person           648  3.00     14190.00  1010.5015  1213.77229
Number of public transportation vehicles per 1000000 population
Area of paved roads per person                                   650  1.53     48.73     11.0801    5.45099
Paved roads density                                              650  0.02     15.78     1.6757     1.64742
Daily living consumption of water per person                     649  42.15    803.97    177.3334   86.53371
Per capita public green areas                                    649  0.25     47.12     7.2153     4.37324
Valid N (listwise)                                               619
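The columns of Table 1 can be reproduced for any variable with a few lines of code. The sketch below is an illustration in Python, not the SPSS output itself. It also shows why N can differ between variables (648, 649 or 650) and why "Valid N (listwise)" is smaller still: a row counts listwise only when no variable is missing. The function names and sample values are hypothetical.

```python
import statistics


def describe(values):
    """N, minimum, maximum, mean and sample standard deviation, skipping None."""
    valid = [v for v in values if v is not None]
    return {
        "n": len(valid),
        "min": min(valid),
        "max": max(valid),
        "mean": statistics.mean(valid),
        "std": statistics.stdev(valid),  # n - 1 in the denominator
    }


def valid_n_listwise(table):
    """Count the rows that have no missing value in any column."""
    return sum(all(v is not None for v in row) for row in zip(*table.values()))


# Made-up values with gaps, for illustration only.
table = {
    "water_per_person": [100.0, 200.0, None, 300.0],
    "green_area": [4.0, None, 6.0, 8.0],
}
summary = describe(table["water_per_person"])
complete_rows = valid_n_listwise(table)
```

Here each variable has three valid values, but only two rows are complete in both columns.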
From the table above, we can read the maximum, minimum, mean and standard deviation
of the 11 variables other than urban names and province codes. For instance, it can be
seen that these cities range from 75°29’24” E to 133°58’12” E and from 18°8’24” N to
50°48’ N.
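The decimal-degree extremes in Table 1 (75.49, 133.97, 18.14 and 50.80) can be converted to degrees, minutes and seconds as follows. This is a small sketch; the function name `decimal_to_dms` is hypothetical, and rounding to whole seconds is an assumption.

```python
def decimal_to_dms(value):
    """Convert decimal degrees to (degrees, minutes, seconds) in whole seconds."""
    # Work in total seconds so that rounding can never produce 60 seconds.
    total_seconds = round(abs(value) * 3600)
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return degrees, minutes, seconds


west_edge = decimal_to_dms(75.49)    # minimum longitude in Table 1
east_edge = decimal_to_dms(133.97)   # maximum longitude in Table 1
south_edge = decimal_to_dms(18.14)   # minimum latitude in Table 1
north_edge = decimal_to_dms(50.80)   # maximum latitude in Table 1
```

Since all the cities lie in China, every longitude is east and every latitude is north.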
2. Choose the number of groups in classification analysis
Based on the latitude and longitude of the cities, we obtain the following graph using
SPSS, which shows the geographic distribution of the cities to be studied.
Graph 1: The geographic distribution of the cities to be studied
[Scatter plot of the cities: longitude on the horizontal axis (80.00 to 130.00), latitude on the vertical axis (20.00 to 50.00).]
Because the analysis focuses solely on the cities, regardless of the provinces they
belong to, we classify the cities by latitude and longitude. Since the amount of data is
large, we use K-means cluster analysis, for which the number of groups has to be
chosen in advance.
The formula used to decide the number of groups is the following:
Number of groups = (number of observations) - (the stage after which the distance
grows rapidly)
Using the agglomeration schedule in SPSS, we take the distance at the last step
(80218.109) as 100% and convert all the other values to percentages. The distance
does not exceed 10% until the 641st step, so the stage after which the distance grows
rapidly is 640, and the number of groups = 650 - 640 = 10.
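The rule above can be written as a short function. The sketch below assumes the agglomeration coefficients are available as a list with one value per stage and uses the 10% threshold from the text; the function name and the sample coefficients are hypothetical and are not the values from the SPSS schedule.

```python
def number_of_groups(coefficients, threshold_pct=10.0):
    """Number of groups = observations - last stage at or below the threshold.

    `coefficients` holds one agglomeration distance per stage, in order.
    """
    n_observations = len(coefficients) + 1  # n cases are merged in n - 1 stages
    last = coefficients[-1]                 # the final distance counts as 100%
    stage = 0
    for i, coefficient in enumerate(coefficients, start=1):
        if coefficient / last * 100.0 <= threshold_pct:
            stage = i
    return n_observations - stage


# Made-up schedule for 10 observations: the distance jumps after stage 5.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 30.0, 60.0, 80.0, 100.0]
groups = number_of_groups(sample)
```

With the essay's own figures, 650 observations and a rapid increase after stage 640, the same subtraction gives 650 - 640 = 10 groups.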