Pattern Recognition in the National Bridge Inventory for …ma4cp/Mohamad Alipour_files... · 2016-10-26 · Pattern Recognition in the National Bridge Inventory for Automated Screening

Jan 20, 2020

Pattern Recognition in the National Bridge Inventory for Automated Screening and Assessment of Infrastructure

Mohamad Alipour¹, Devin K. Harris¹, Laura E. Barnes²

¹Department of Civil and Environmental Engineering, University of Virginia
²Department of Systems and Information Engineering, University of Virginia

The size and complexity of the problem of maintaining the aging US transportation infrastructure system, combined with the shortage of resources, necessitates an efficient strategy to prioritize the allocation of funds. Within the suite of tools available for decision-making for bridges, a fundamental characteristic is safe load-carrying capacity. This capacity measure typically requires knowledge and data on the structural details of the constituent members to enable predictions of available resistance relative to loading demands. Bridges that receive low ratings and are deemed incapable of carrying the required loads are “posted” with maximum weight limit signage.

This paper introduces a data-driven solution that enables the automated, rapid, and cost-effective evaluation of load postings for large infrastructure networks. The proposed method leverages the large bridge population in the National Bridge Inventory and the associated bridge descriptors, such as geometrical, operational, functional, and physical features, to extract and define patterns for predicting posting status. A cost-sensitive random forest classification algorithm was trained on over 280,000 bridges in selected categories in the National Bridge Inventory, including steel, reinforced concrete, prestressed concrete, and timber bridges.

Performance evaluation demonstrated the validity of the models, and comparisons with a number of other common classifiers are presented. The trained models detected posted and unposted bridges with average errors of about 11% and 16%, respectively. The trade-off between safety and economy in the models was also studied. Finally, as a product of the data-driven approach, an interactive software interface was developed that accepts user input data on bridges and predicts the posting status. This tool is expected to provide an intuitive method for rapid screening of bridge inventories and estimating deterioration progression, thereby resulting in substantial safety and financial benefits to owners.

KEYWORDS: Bridge infrastructure, Data-Driven, National Bridge Inventory, Random Forests, Load rating and posting

INTRODUCTION

The US infrastructure system is a vital element in support of the nation’s economy, security, and sustainability. An essential component of this system is the aging bridge inventory, which currently comprises more than 610,000 structures with an average age of 43 years, 24% of which are considered deficient (FHWA 2014). With a population of this size, providing strategies and resources for maintenance is a growing challenge for federal, state, and local governments, especially considering that many bridges are reaching or exceeding their intended design service lives.


An essential step in the condition assessment and safety rating of bridges is calculating the safe load capacity. This process is usually referred to as load rating and is carried out by qualified engineers following procedures outlined in the Manual for Bridge Evaluation (AASHTO 2011). Bridges that receive low ratings, i.e., a rating factor less than one (RF < 1.0), are deemed incapable of carrying the required loads and are “posted” with maximum weight limit signs. Vehicles heavier than the posted weight limit must then detour around these bridges, which increases transportation costs. This bottleneck effect is undesirable, especially given the growing demands of commerce, so a safe increase in bridge load rating is valuable to state departments of transportation (DOTs). Accurate and reliable load ratings are therefore critical to ensure operational safety and functionality while avoiding overly conservative ratings that carry obvious economic costs. A recent example highlights the importance of accurate load ratings: in the aftermath of Hurricane Katrina, state legal truck weights needed to be increased to move more supplies into the crisis-stricken regions as part of the relief efforts, and officials needed accurate maximum safe load ratings of bridges to plan aid corridors (FHWA 2006).

The need for efficient bridge rating, together with the size of the problem and the shortage of resources, presents a tremendous challenge and calls for innovative multidisciplinary solutions. One proposed solution aims to leverage the wealth of knowledge embedded within existing datasets such as the National Bridge Inventory (NBI) database. The NBI is a unified database of bridges longer than 20 feet and their associated characteristics. It is compiled from data that the states provide on the bridges in their populations that meet this length requirement, and it is maintained by the Federal Highway Administration (FHWA). The literature includes various examples of the use of data mining techniques on the NBI database to study bridge condition and performance (Chase et al. 1999; Chase and Gáspár 2000; Kim and Yoon 2009; Li and Burgueño 2010; Huang and Ling 2005; Harris et al. 2015), as well as many comparable studies on other datasets in civil and infrastructure engineering (Amiri et al. 2015; Saitta et al. 2009; Farrar and Worden 2012; Jootoo and Lattanzi 2016).

This study proposes an objective method for the assessment of bridge load postings by using emerging machine learning techniques. The proposed method involves the extraction of patterns between operational, geometrical, functional and physical bridge descriptors to arrive at predictions of bridge posting status.

Research Goals and Significance.

As an extension of the data-driven load posting approach (Alipour et al. 2016a, b), the main goal of this paper is to demonstrate the applicability and promise of this approach across different types of bridge systems. To this end, the proposed method was applied to populations of highway bridges extracted from the NBI database and categorized into groups based on structural system and material. A number of other machine learning algorithms were also tested on the datasets to provide baseline comparisons for the method.


Beyond the innovative formulation of the relation between bridge safety rating and the bridge descriptors used, this paper provides a data mining study on over 280,000 bridges in the US. Given the promising results presented herein, the proposed approach can be implemented as a screening tool for the rapid audit of large populations of bridges maintained by state departments of transportation. Moreover, because the prediction of the posting status of a bridge using the proposed method relies solely on general bridge descriptors and does not require detailed design or as-built plans, it can be used to assess the performance status of bridges with insufficient design or as-built information (Alipour et al. 2016a).

PROPOSED METHODOLOGY

The research has three main components, as summarized in Fig. 1. First, the NBI data were studied to select the target populations and were preprocessed into the appropriate form for model development. Second, classification algorithms were trained on the data to enable the inference of posting status (class) from bridge descriptors (features). Third, the models were evaluated on an unseen test set and appropriate performance criteria were reported. The details of these three steps are described in the following sections.

Fig. 1. Flowchart showing research steps

Data Collection and Preprocessing. NBI data for 2015, which include 611,845 structures, were used as the population set. This study focused on in-service highway bridges; therefore, culverts, non-highway bridges (railroad, pedestrian, etc.), and temporary or closed structures were filtered out. This reduced the population of study to 403,255 bridges. Fig. 2 depicts the makeup of this population in terms of material and structural systems. It is evident that the most common bridge types include steel multi-girder, concrete slab, concrete tee-beam, prestressed tee-beam, prestressed box-beam, and wood/timber multi-girder. Bridges within these categories were selected for this study, and a summary of their relevant characteristics is presented in Table 1. It should be noted that, according to the NBI, about ten percent of the initial population shown in Fig. 2 has been load rated using “engineering judgment” rather than traditional mechanics-based capacity calculations. Previous investigation by the authors indicates that in a majority of these cases, sufficient structural details for an analytical rating may not be readily available (Harris et al. 2015), and the current practice in the state DOTs is to rate these structures based on experience, judgment, and observations about the past performance of the structure. As such, these bridges were not used for modeling and are not included in Table 1.

Fig. 2. Material and structural system type of NBI population

Table 1. Bridge population characteristics within primary categories

Material        Structural system    Population    Unposted            Posted             Ratio (unposted/posted)
Concrete        Slab                     52,262     47,467 (90.8%)      4,795 (9.2%)        9.9
Concrete        Tee-Beam                 21,411     19,075 (89.1%)      2,336 (10.9%)       8.2
Steel           Multi-Girder            112,116     93,392 (83.3%)     18,724 (16.7%)       5.0
P/S Concrete    Multi-Girder             50,687     50,199 (99.0%)        488 (1.0%)      102.9
P/S Concrete    Box-Beam                 33,637     32,747 (97.4%)        890 (2.6%)       36.8
Wood            Multi-Girder             10,737      4,704 (43.8%)      6,033 (56.2%)       0.8
Total                                   280,850    247,584 (88.2%)     33,266 (11.8%)       7.4

The NBI data list more than 200 different features for each bridge, many of which describe identification, location, characteristics of the roads or rivers, and so on. Others contain redundant or correlated information. Table 2 summarizes the features that are related to the structural performance and capacity of a bridge and were used for modeling, categorized into four feature groups (Functional Properties, Operational Conditions, Geometrical Properties, and Physical Conditions). The features designated with an asterisk are not items within the NBI, but were developed through transformation or inference from pertinent NBI features. For example, the deck, superstructure, and substructure condition ratings and the deck geometry evaluation ratings, which are originally on a scale of 0 to 9, were re-categorized as poor (rating < 5), fair (rating = 5), and good (rating > 5) following the recommendation of the Manual for Bridge Evaluation (AASHTO 2011). Design loads were also re-categorized into three groups of heavy, light, and other vehicles based on their equivalent truck weight. Two other attributes were defined indirectly from the information in the NBI: continuity and urban (yes or no). Finally, two additional attributes were generated from external sources to provide further context for the operational conditions of a bridge. One is the climatic zone as classified by the National Centers for Environmental Information (Karl and Koss 1984). Each bridge was assigned to one of nine climatic zones: Northwest, West, Southwest, South, Southeast, Northeast, Central, East North Central, and West North Central. The other is the ratio of each state’s gross domestic product (GDP) to its number of bridges (categorized into three groups of high, moderate, and low), which is intended to indicate the economic factors affecting construction, maintenance, and management practices in different parts of the country.

Table 2. Attributes used in modeling and their statistics

* The feature is not originally in the NBI data and was defined based on one or more NBI items.

Model Development

Using the data discussed in the previous section, a classification algorithm was trained on each bridge category, with the input features shown in Table 2 and the goal of predicting the class (posting status). K-fold cross-validation was used to split the data into training and testing sets: the entire population is partitioned into k random subsamples, and in each of k rounds the model is trained on k-1 subsamples and tested on the remaining one, so that every subsample serves as the test set exactly once. In this study, k was chosen as 10 and the sampling was stratified to maintain the class distribution in the subsamples, as recommended in the literature (Witten et al. 2011). The performance criteria used for evaluating the models are the misclassification error (ERR), false positive rate (FPR), and false negative rate (FNR), as given in Eqns. 1-3. Note that ERR is the complement of conventional classification accuracy. In defining these criteria, posted and unposted bridges are referred to as positive (+) and negative (-) instances, respectively, as shown in Table 3.
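The stratified 10-fold protocol described above can be illustrated with a minimal sketch. It is not the Weka routine used in the study; the function names `stratified_folds` and `train_test_splits` and the toy label vector are assumptions made only for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each instance index to one of k folds, class by class,
    so every fold keeps roughly the overall posted/unposted mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin into the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def train_test_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices); each fold is the test set once."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


# Toy labels (hypothetical): 1 = posted, 0 = unposted, at a 9:1 imbalance.
labels = [0] * 900 + [1] * 100
splits = list(train_test_splits(labels, k=10))
```

Because the indices of each class are dealt out separately, every test fold in this toy case holds the same share of posted bridges as the full population.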

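$$\mathrm{FNR} = \frac{FN}{TP + FN} \qquad \text{(Eqn. 1)}$$

$$\mathrm{FPR} = \frac{FP}{FP + TN} \qquad \text{(Eqn. 2)}$$

$$\mathrm{ERR} = \frac{FP + FN}{TP + TN + FP + FN} \qquad \text{(Eqn. 3)}$$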

Table 3. Confusion matrix for the classification task

                            Actual
                            Posted (+)    Unposted (-)
Predicted   Posted (+)      TP            FP
            Unposted (-)    FN            TN
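As a small worked illustration of Eqns. 1-3, the three error measures can be computed directly from the four cells of the confusion matrix. The counts below are hypothetical and are not results from this study; `error_rates` is an illustrative helper name.

```python
def error_rates(tp, fp, fn, tn):
    """Return (FNR, FPR, ERR) from confusion-matrix counts, per Eqns. 1-3."""
    fnr = fn / (tp + fn)                   # share of posted bridges that were missed
    fpr = fp / (fp + tn)                   # share of unposted bridges flagged as posted
    err = (fp + fn) / (tp + tn + fp + fn)  # overall misclassification error
    return fnr, fpr, err


# Hypothetical counts: 100 posted and 900 unposted bridges.
fnr, fpr, err = error_rates(tp=90, fp=150, fn=10, tn=750)
accuracy = 1.0 - err  # ERR is the complement of classification accuracy
```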

A fundamental characteristic of the data under study is the inherent imbalance between the two labels of the class attribute. Table 1 shows the number of posted and unposted instances in each bridge category together with the ratio of unposted to posted instances. This ratio quantifies the imbalance; a value close to one represents a balanced dataset. Moderate to severe imbalance toward unposted bridges exists across the different groups (ratios of 5.0 to 102.9). Wood girder bridges are the only relatively balanced group, and there posted bridges even slightly outnumber unposted ones.
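The imbalance ratios quoted above follow directly from the counts in Table 1. A short sketch of the arithmetic, where the dictionary layout and variable names are illustrative choices:

```python
# (unposted, posted) counts per bridge category, copied from Table 1.
table1 = {
    "Concrete slab": (47_467, 4_795),
    "Concrete tee-beam": (19_075, 2_336),
    "Steel multi-girder": (93_392, 18_724),
    "P/S concrete multi-girder": (50_199, 488),
    "P/S concrete box-beam": (32_747, 890),
    "Wood multi-girder": (4_704, 6_033),
}

# Majority-to-minority ratio as defined in the text: unposted / posted.
ratios = {name: unposted / posted for name, (unposted, posted) in table1.items()}

total_unposted = sum(unposted for unposted, _ in table1.values())
total_posted = sum(posted for _, posted in table1.values())
overall_ratio = total_unposted / total_posted
```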

In the literature on classification tasks, class imbalance is widely known to adversely affect prediction performance with respect to the minority group (Chawla 2005). As an example, in a dataset where 99% of the instances belong to the majority (negative) group (as in P/S girder bridges in this study), the classification algorithm will usually be biased toward the majority and may simply predict all instances to be negative, thus reaching an accuracy of 99%. This is not satisfactory, because none of the minority instances (posted bridges), which are usually of more interest, is correctly detected (FPR = 0 and FNR = 100%). Various methods of resampling, reweighting, or cost-sensitive classification have been proposed in the literature to remedy the negative effect of class imbalance (Visa and Ralescu 2005; Chawla et al. 2002). In this paper, based on previous observation of desirable performance, a cost-sensitive random forest algorithm implemented in the open-source data mining package Weka (Hall et al. 2009) was selected as the main classifier.
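To make the example concrete, consider a hypothetical classifier that labels every P/S multi-girder bridge in Table 1 as unposted (50,199 unposted and 488 posted bridges). Then TP = FP = 0, FN = 488, and TN = 50,199, and Eqns. 1-3 give:

```latex
\begin{aligned}
\mathrm{ERR} &= \frac{FP + FN}{TP + TN + FP + FN} = \frac{0 + 488}{50{,}687} \approx 0.96\% \quad (\text{accuracy} \approx 99.0\%),\\
\mathrm{FNR} &= \frac{FN}{TP + FN} = \frac{488}{0 + 488} = 100\%,\\
\mathrm{FPR} &= \frac{FP}{FP + TN} = \frac{0}{0 + 50{,}199} = 0 .
\end{aligned}
```

The overall accuracy looks excellent even though not a single posted bridge is detected.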

Random forest is an ensemble learning algorithm in which a large number of decision trees are constructed on randomized samples of the data, each considering a few randomly selected attributes, and an instance is classified by taking a majority vote among all of the trees (Breiman 2001). Random forests are known to offer strong predictive performance, scale invariance, and robustness to noise and irrelevant features, as well as desirable behavior on imbalanced datasets (Hastie et al. 2009). As the random forest model is trained, the cost-sensitive learning scheme employed in this paper imposes a higher penalty on a false negative than on a false positive, thus forcing the model to focus more on the correct prediction of the positive instances. In the absence of actual error costs from the bridge owners, which could be calculated in a risk analysis, a penalty value equal to the majority/minority ratio of Table 1 was adopted. For this study, each forest consisted of 200 unpruned trees (with unlimited depth), and the number of random features in each tree was 5, as recommended in Weka for the current attribute set (Hall et al. 2009).
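The effect of an asymmetric penalty can be illustrated with a minimum-expected-cost decision rule applied to the forest's votes. This is only one common way to realize cost-sensitive classification; the paper does not state which mechanism was configured in Weka, so the rule, the function names, and the vote count below are all assumptions for illustration.

```python
def predict_posted(vote_fraction_posted, fn_cost, fp_cost=1.0):
    """Return True (posted) when calling the bridge 'unposted' would carry
    the higher expected cost, given the share of trees voting 'posted'."""
    p = vote_fraction_posted
    expected_cost_if_unposted = p * fn_cost        # risk of a false negative
    expected_cost_if_posted = (1.0 - p) * fp_cost  # risk of a false positive
    return expected_cost_if_posted < expected_cost_if_unposted


def decision_threshold(fn_cost, fp_cost=1.0):
    """Vote fraction above which the rule predicts 'posted'."""
    return fp_cost / (fp_cost + fn_cost)


# Penalty set to the concrete-slab majority/minority ratio from Table 1.
slab_fn_cost = 9.9
votes = 40 / 200  # hypothetical: 40 of 200 trees vote 'posted'

flagged_with_cost = predict_posted(votes, fn_cost=slab_fn_cost)
flagged_without_cost = predict_posted(votes, fn_cost=1.0)
```

With equal costs the rule reduces to a plain majority vote; raising the false-negative cost lowers the vote fraction needed to flag a bridge, which trades additional false alarms for fewer missed postings.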

Finally, to provide context for comparing the proposed modeling strategy with other possible baseline approaches, six other conventional classification algorithms were trained on the data using the same cost-sensitive approach and also a simple under-sampling strategy, in which the majority group was randomly under-sampled to give a 1:1 ratio of the two classes. The baseline classification algorithms used in this paper are listed here together with the model parameters used:

- A pruned decision tree with the C4.5 formulation, binary splits, and a minimum leaf size of 25 (Quinlan 1993) (DT).

- A naïve Bayes classifier (John and Langley 1995) (NB).

- A logistic regression model with a ridge parameter of 1e-8, where nominal features were first transformed into binary numeric features (Le Cessie and Van Houwelingen 1992) (LR).

- A K-nearest neighbor model with k = 9 and a brute-force search for the nearest neighbors using the Euclidean distance criterion (Aha and Kibler 1991) (KNN).

- A support vector machine model using the sequential minimal optimization algorithm with a complexity parameter of one. Nominal features were transformed into binary numeric features prior to modeling (Platt 1998) (SVM).

- An artificial neural network with one hidden layer of 27 nodes, 500 epochs, a learning rate of 0.3, and a momentum of 0.2 (ANN).

For all these models, the implementation available in Weka was adopted and the attributes were normalized before modeling (Hall et al. 2009).
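The 1:1 majority under-sampling strategy mentioned above can be sketched in a few lines. The helper name `undersample_majority`, the fixed seed, and the toy labels are illustrative assumptions, not the Weka filter used in the study.

```python
import random


def undersample_majority(labels, seed=0):
    """Return sorted indices of a 1:1 subset: every minority instance plus an
    equally sized random sample of the majority class (labels are 0 or 1)."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = sorted((positives, negatives), key=len)
    kept_majority = rng.sample(majority, len(minority))
    return sorted(minority + kept_majority)


# Toy labels (hypothetical): 10 posted (1) among 1,000 bridges.
labels = [0] * 990 + [1] * 10
balanced_indices = undersample_majority(labels)
```

Under-sampling discards most of the majority instances, which is one reason a cost-sensitive scheme that keeps all the data may be preferred.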


RESULTS AND DISCUSSION

Model Performance. Fig. 3 shows the performance of the cost-sensitive random forest model across the selected bridge categories. All error measures are below 20%, with an average FNR of 11.2%, FPR of 16.4%, and ERR of 15.8%. This shows that the classifier is capable of discriminating between posted and unposted bridges in every category. The best FNR and FPR are obtained in the concrete slab (9.3%) and P/S box-beam (15.1%) categories, respectively. However, no single category has both the lowest FNR and the lowest FPR, so from this figure alone it is difficult to decide whether the modeling approach performs better in any one category than in the others.

Fig. 4 compares the performance of the cost-sensitive random forest classifier with that of the 1:1 majority under-sampling and with no imbalance treatment across the different bridge categories. According to this figure, in all categories except wood girder bridges, the model with no imbalance treatment shows a very high FNR (44.7%-84.9%) together with a very low FPR (0-4.4%). This effect grows with the majority-to-minority ratio (from left to right in the figure). As discussed in the model development section, this is to be expected because of the class imbalance, which manifests as the inability of the models to detect the posted bridges and their excessive focus on the majority (unposted) group. The figure also shows that the introduction of either a cost-sensitive learning criterion or majority under-sampling largely alleviates this problem. The cost-sensitive option is slightly better than under-sampling at detecting the posted bridges (lower FNR). Note also that this problem does not exist in wood girder bridges, which are relatively balanced.

Fig. 3. Performance of the cost-sensitive random forest model


Fig. 4. Performance comparison of random forest models with different imbalance treatment

Discussions. To compare the random forest model with the selected baseline modeling approaches, Fig. 5 presents the FNR and FPR of the 7 learning algorithms, averaged over the selected bridge categories and the two class imbalance treatment strategies (cost-sensitive and majority sub-sampling). While the best FNR and FPR belong to random forest (11.97%) and SVM (12.75%), respectively, it is difficult to say which model is actually the “best” model. If the only objective of a model is to correctly detect as many posted bridges as possible, the random forest model is the best. However, a higher FPR means that more safe bridges would be incorrectly detected as posted (false alarms). This translates into more unnecessary restrictions or extra conservative testing and analyses to ensure the safety of the bridges, and thus has economic implications. This downside is amplified by the fact that in the imbalanced categories, where unposted (negative) instances significantly outnumber posted ones, a unit increase in FPR results in the misclassification of many more bridges than a unit increase in FNR.
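The last point can be checked with simple arithmetic on the Table 1 totals. The sketch below assumes that a "unit increase" means one percentage point; the variable names are illustrative.

```python
# Totals over all six bridge categories, from Table 1.
unposted, posted = 247_584, 33_266

# One percentage point of FPR applies to the unposted (negative) bridges,
# while one percentage point of FNR applies to the posted (positive) ones.
extra_false_alarms = 0.01 * unposted
extra_missed_postings = 0.01 * posted

# The two counts differ by exactly the unposted/posted imbalance ratio.
how_many_times_more = extra_false_alarms / extra_missed_postings
```

Under this assumption, one point of FPR corresponds to roughly 2,476 additional misclassified bridges, against roughly 333 for one point of FNR.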

Fig. 5. Average performance over different bridge categories and for cost-sensitive and sub-sampled scenarios


To address the FNR-FPR trade-off and gain a better understanding of the models, another performance criterion, the F-Score, was employed. The F-Score (Eqn. 4) is the harmonic mean of two measures of prediction performance, recall and precision (Chawla 2005). Recall (Eqn. 5) is the complement of FNR and, in this scenario, accounts for safety. Precision (Eqn. 6) is the percentage of all instances detected as positive (posted) that are actually positive; it therefore accounts for false alarms and can be regarded as a measure of economy. The F-Score combines the two to provide an average measure of the goodness of a classifier on an imbalanced dataset (Chawla 2005).

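$$F\text{-}Score = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall} \qquad \text{(Eqn. 4)}$$

$$Recall = \frac{TP}{TP + FN} \qquad \text{(Eqn. 5)}$$

$$Precision = \frac{TP}{TP + FP} \qquad \text{(Eqn. 6)}$$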
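Eqns. 4-6 translate directly into code. The counts below are hypothetical and serve only to show the calculation; `precision_recall_f` is an illustrative helper name.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-Score) per Eqns. 4-6."""
    recall = tp / (tp + fn)      # complement of FNR (safety)
    precision = tp / (tp + fp)   # share of flagged bridges that are truly posted (economy)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical counts: 100 posted bridges, 90 of them detected, 150 false alarms.
precision, recall, f_score = precision_recall_f(tp=90, fp=150, fn=10)
```

Note that true negatives do not enter any of the three measures, which is why the F-Score is not inflated by a large unposted majority.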

Fig. 6a summarizes the average F-Scores for the different classifiers over the different bridge categories. This figure shows that the random forest classifier possesses the best F-Score and thus a superior combined performance compared with the baseline classifiers. Fig. 6b, which shows the F-Scores for the cost-sensitive random forest classifier in each category, indicates that the combined performance degrades as class imbalance increases. The highest F-Score belongs to wood girder bridges, which again form a relatively balanced dataset, while P/S girder bridges, the most severely imbalanced dataset (ratio = 102.9), have the lowest F-Score.
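One way to see why the F-Score falls with imbalance is to hold FNR and FPR fixed and vary only the unposted/posted ratio. The sketch below plugs the average rates from Fig. 3 into Eqns. 4-6 for the two extreme ratios in Table 1. This is a hypothetical calculation, since the actual per-category rates differ, and the function name is illustrative.

```python
def f_score_from_rates(fnr, fpr, ratio):
    """F-Score implied by fixed error rates at a given unposted/posted ratio.

    Per posted bridge there are (1 - FNR) true positives and
    FPR * ratio false positives, so precision falls as the ratio grows.
    """
    recall = 1.0 - fnr
    precision = recall / (recall + fpr * ratio)
    return 2.0 * precision * recall / (precision + recall)


avg_fnr, avg_fpr = 0.112, 0.164  # average rates reported for Fig. 3

f_wood = f_score_from_rates(avg_fnr, avg_fpr, ratio=0.8)           # wood multi-girder
f_ps_girder = f_score_from_rates(avg_fnr, avg_fpr, ratio=102.9)    # P/S multi-girder
```

With identical error rates, the many false alarms drawn from a large unposted majority swamp the true positives, so precision and hence the F-Score drop sharply at high ratios.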

Fig. 6. Model evaluation results: (a) average F-Scores of different algorithms over all bridge categories; (b) average F-Scores for the cost-sensitive random forest classifier

Path for Implementation. To realize the practical benefits and facilitate the use of the proposed data-driven approach, an interactive software interface was developed. The software is currently in beta, and the authors expect to make it available in the future. The software requires user input data on bridges (Fig. 7) and predicts the posting status (Fig. 8) based on the cost-sensitive random forest classifier developed in this paper. The graphical user interface (GUI) was designed and implemented using the C# programming language in the Microsoft .NET Framework (Microsoft 2012), and Java APIs were employed to import the classifier trained in Weka in the previous sections. This software is expected to serve as a useful tool for rapidly screening existing bridge populations and anticipating bridge deterioration before emergencies occur, thereby resulting in substantial safety and financial benefits.

Fig. 7. Input screen of the software application

Fig. 8. Output screen of the software application with sample prediction results

CONCLUSIONS

An innovative data-driven method to predict the load posting status of bridge populations was introduced. The method is based on pattern recognition between a set of geometrical, physical, functional, and operational characteristics of bridges in the National Bridge Inventory and their posting status. Six categories of bridges, comprising over 280,000 instances with 25 features, were extracted from the NBI. A cost-sensitive random forest classifier was trained and tested, with average errors of 11% and 16% in detecting posted and unposted bridges, respectively. The model was then compared with a number of other classifiers and an alternative sub-sampling strategy. The trade-off between false positive and false negative error rates was also studied to obtain a better understanding of the performance of the models with respect to safety versus economy. Finally, using the classifier models trained in this paper, an interactive software application was developed, which is expected to serve as a useful tool for screening and evaluation of bridge inventories.

REFERENCES

Aha, D., Kibler, D. (1991). “Instance-based learning algorithms.” Machine Learning.

6:37-66.

AASHTO. (American Association of State Highway and Transportation Officials).

(2011). The Manual for Bridge Evaluation. 2nd ed. Washington, D.C.

Alipour, M., Gheitasi, A., Harris, D.K., Ozbulut, O.E., Barnes, L.E. (2016a). “A data-

driven approach for automated operational safety evaluation of the national

inventory of reinforced concrete slab bridges”. Transportation Research Board

(TRB), 95th Annual Meeting, Washington, DC.

Alipour, M., Harris, D.K., Barnes, L.E., Ozbulut, O.E., Carroll, J. (2016b). “Load

capacity rating of bridge populations using machine learning”, Journal of Bridge

Engineering, ASCE, in press.

Amiri, M., Ardeshir, A., Fazel Zarandi, M.H., and Soltanaghaei, E. (2015). “Pattern

extraction for high-risk accidents in the construction industry: a data-mining

approach”. International journal of injury control and safety promotion. 1-13.

Breiman, L. (2001). “Random forests.” Machine learning, 45(1). 5-32.

Chase, S. B., Small, E. P., Nutakor, C. (1999). “An in-depth analysis of the

national bridge inventory database utilizing data mining, GIS and advanced

statistical methods”. Transportation Research Circular, 498, 1-17.

Chase, S. B., and Gáspár, L. (2000). “Modeling the reduction in load capacity of

highway bridges with age”. Journal of bridge engineering, 5(4), 331-336.

Chawla, N. V. (2005). “Data mining for imbalanced datasets: An overview.” In Data

mining and knowledge discovery handbook (pp. 853-867). Springer US.

Chawla, N.V., Bowyer, K.W., Hall, L.O., and Kegelmeyer, W.P. (2002). “SMOTE:

synthetic minority over-sampling technique.” Journal of artificial intelligence

research. 321-357.

Farrar, C.R. and Worden, K. (2012) Structural health monitoring: a machine learning

perspective. John Wiley & Sons.

FHWA. (Federal Highway Administration). (2006). “Audit of Oversight of Load

Ratings and Postings on Structurally Deficient Bridges on the National Highway

System.” US DOT, Office of Inspector General. Washington, D.C.

FHWA. (Federal Highway Administration). (2014). “National Bridge Inventory

Database”. <https://www.fhwa.dot.gov/bridge/nbi.cfm>. (July 2015).

Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H. (2009).

“The WEKA Data Mining Software: An Update.” SIGKDD Explorations, Volume

11, Issue 1.

Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical

Learning. 2nd edition. Springer Series in Statistics.

Harris, D.K., Ozbulut, O.E., Alipour, M., Usmani, S., and Kassner, B.L. (2015).

“Implications of load rating bridges in Virginia with limited design or as-built

details.” Proc. 7th Intl. Conf. on Structural Health Monitoring of Intelligent

Infrastructure. Torino, Italy.

Huang, J., and Ling, C.X. (2005). “Using AUC and accuracy in evaluating learning

algorithms”. IEEE Transactions on Knowledge and Data Engineering. 17(3). 299-

310.

John, G. H., Langley, P. (1995). “Estimating continuous distributions in Bayesian

classifiers.” In Proceedings of the Eleventh conference on Uncertainty in artificial

intelligence (pp. 338-345). Morgan Kaufmann Publishers Inc.

Jootoo, A., and Lattanzi D. (2016). “A machine learning approach to bridge

prototyping.” Proc. Intl. Conf. on Sustainable design, Engineering and Construction.

Tempe, AZ.

Karl, T., and Koss, W.J. (1984) “Regional and national monthly, seasonal, and annual

temperature weighted by area.” Historical Climatology Series. National Climatic

Data Center.

Kim, Y. J., and Yoon, D. K. (2009). “Identifying critical sources of bridge deterioration

in cold regions through the constructed bridges in North Dakota”. Journal of Bridge

Engineering, 15(5), 542-552.

Le Cessie, S., Van Houwelingen, J.C. (1992). “Ridge Estimators in Logistic

Regression.” Applied Statistics. 41(1):191-201.

Li, Z., and Burgueño, R. (2010). “Using soft computing to analyze inspection results

for bridge evaluation and management”. Journal of Bridge Engineering. 15(4). 430-438.

Microsoft. (2012). Microsoft Visual Studio 2012, One Microsoft Way, Redmond, WA.

Platt, J. (1998). “Fast Training of Support Vector Machines using Sequential Minimal

Optimization.” In B. Schoelkopf and C. Burges and A. Smola, editors, Advances in

Kernel Methods - Support Vector Learning.

Quinlan, J.R. (1993). “C4.5: programs for machine learning”. Morgan

Kaufmann. San Mateo, CA.

Saitta, S., Benny, R., and Smith, I. (2009). Data Mining: Applications in Civil

Engineering. VDM Verlag. Saarbrücken, Germany.

Visa, S., and Ralescu, A. (2005). “Issues in mining imbalanced data sets-a review

paper.” In Proc. 16th Midwest Artificial Intelligence and Cognitive Science Conf.

Witten, I.H., Frank, E., and Hall M.A. (2011). Data Mining: Practical machine learning

tools and techniques. Third ed. Morgan Kaufmann. Burlington MA.