RG_CubeAnalyst

8/9/2019 RG_CubeAnalyst



CUBE ANALYST

VERSION 6.1.0

http://www.citilabs.com/



Copyright © 2007–2013 Citilabs, Inc. All rights reserved.

Citilabs is a registered trademark of Citilabs, Inc. All other brand names and product names used in this book are trademarks, registered trademarks, or trade names of their respective holders.

The information contained in this document is the exclusive property of Citilabs. This work is protected under United States copyright law and the copyright laws of the given countries of origin and applicable international laws, treaties, and/or conventions. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying or recording, or by any information storage or retrieval system, except as expressly permitted in writing by Citilabs.

Citilabs has carefully reviewed the accuracy of this document, but shall not be held responsible for any omissions or errors that may appear. Information in this document is subject to change without notice.

60-010-1

April 24, 2013




Cube Analyst Reference Guide

Contents

About This Document  ix

Chapter 1 Introduction  1

What is Cube Analyst?  2

Scope of this document  3

What’s new?  4

Background  5

Common elements and variations  6

Reading this document  7

Conventions used in this document  8

Computing resources  9

Cost information  10

Chapter 2 Estimation System  11

Framework for handling different data consistently  12

Objectives  13

Handling data variability  14

Options for users  15

Considerations for users  16

  Deciding what information to input  17

  Inputting data  18

  Estimating the matrix  18

  Analyzing the estimated matrix  19

  Improving the estimated matrix  19

Estimating highway and public transport matrices  20

Overview of Cube Analyst  21

Chapter 3 Possible Data Inputs  25

Types of data  26

  Link counts  26

  Turning counts  27

  Prior trip matrix  27

  Trip cost matrix  27

  Partial O-D matrix  27

  Trip ends  28

  Routing information  28

  Cost distribution function  29

  Part-trip data  29

Sets of data  30

Chapter 4 Mathematical Background  31

Mathematical notation  32

  Explaining the letters and symbols  32

  Notation used in the estimation equation  34

Introduction to the mathematics in Cube Analyst  35

  Main mathematical features  35

  Estimation equation  36

  Model parameters  38

  Maximum likelihood objective function  40

  Describing the variation in data  41

  Optimizer: Finding the minimum value  44

Mathematical summary  48

  Maximum likelihood method: Background theory  48

  Application of maximum likelihood to Cube Analyst  49

  Cube Analyst objective function  51

  Cube Analyst trip estimation model  52

  Estimating model parameters  52

  Optimization procedure  55

  Parameter errors  56

  Cell reliability  56

Extensions to the calculations  57

Chapter 5 Data Preparation and Analysis  59

Overview  60

Matrices  61

Trip ends  63

Networks and traffic and passenger counts  64

Screenlines  65

Routings  68

  Highways  68

  Public transport  69

Setting confidence levels  70

  Characteristics of the data  70

  Deciding on confidence values  71

Tuning estimation performance  73

Control of routing information  74

Analyzing the results  75

Chapter 6 Estimation Process  79

Study area  80

Data  81

Estimating the matrix  83

Evaluation: Sensitivity analysis  86

Including part-trip data  87

Chapter 7 Hierarchic Estimation  93

Introduction to hierarchic estimation  94

  Approaches to estimating very large matrices  94

  Different levels of detail: Districts and zones  94

  Different approaches to hierarchic estimation  95

Alternative approaches to hierarchic estimation  96

  Estimation with mixed district and zonal detail  96

  Local matrices  98

  Summary of the hierarchic estimation process  99

Defining districts  104

Running Cube Analyst for hierarchic estimation  106

  Parameter ZCONF  107

Chapter 8 Using Cube Analyst  109

Input data: overview  110

Outputs: overview  111

Estimating large matrices (hierarchic estimation)  112

Estimation process  113

Chapter 9 Reports  115

Summary of Reports  116

Sample reports  118

  Average confidence level  118

  Final five iterations  119

  Matrix totals and zone generation  119

  Zone attractions  120

  Average confidence level (part trip data)  121

  Part trip totals  121

  District matrix  122

  Local matrix  123

Chapter 10 Files  125

Chapter 11 Control Data  127

&PARAM keywords  128

  Standard user control parameters  128

  Secondary user control parameters  133

  Tuning control  135

&OPTION keywords  136

Chapter 12 Program Specific Data  139

Screenline file  140

  Link count format  140

  Turning count format  141

Trip end file  142

Coordinate file  143

Model parameter file  144

Local matrix control file  147

District definition file  148

Intercept file  149

Gradient search file  150

Chapter 13 Notes on Program Use  151

Approaches to running Cube Analyst  152

  Initial estimation  152

  Constrained model parameters  153

  Controlling the optimization process  154

Selection of model form  155

Information in the optimization log file  157

Computation times  159

Running Cube Analyst from Cube Voyager  160

  Running Cube Analyst from a VOYAGER script  160

  Files  160

Chapter 14 Examples  161

Estimation with prior trip and count data only  162

Estimation with prior trip, count, and trip end data  163

Estimation with “warm start” and cost data  164

Estimation with highways part trip data  165

Estimation with public transport part-trip data  166

Hierarchic estimation  167

Example of screenline volumes report  168

Index  169





About This Document

Welcome to Cube Analyst!

This document provides detailed reference information about Cube Analyst.

This document contains the following chapters:

• Chapter 1, “Introduction”

• Chapter 2, “Estimation System”

• Chapter 3, “Possible Data Inputs”

• Chapter 4, “Mathematical Background”

• Chapter 5, “Data Preparation and Analysis”

• Chapter 6, “Estimation Process”

• Chapter 7, “Hierarchic Estimation”

• Chapter 8, “Using Cube Analyst”

• Chapter 9, “Reports”

• Chapter 10, “Files”

• Chapter 11, “Control Data”

• Chapter 12, “Program Specific Data”

• Chapter 13, “Notes on Program Use”

• Chapter 14, “Examples”





1 Introduction

This chapter introduces you to Cube Analyst. Topics include:

• What is Cube Analyst?

• Scope of this document

• What’s new?

• Background

• Common elements and variations

• Reading this document

• Conventions used in this document

• Computing resources

• Cost information




What is Cube Analyst?

Cube Analyst is a program that estimates an origin-destination (O-D) trip matrix. It is an optional, standalone, and separately licensed module in the Cube suite.

Cube Analyst estimates one matrix at a time, and the data should form a set related to that particular matrix. That is, the data should correspond to the same time period as the matrix (hour(s) of day, day of week, time of year) and to the same units of flow (vehicles, PCUs, passengers, etc.).

The characteristic common to all estimation options offered by Cube Analyst is that they make the best, most flexible use of commonly available data sources in the estimation process.

The user gives each data item a “level of confidence” or “reliability,” which governs how much influence that source of data has on the estimate. The estimation process is based on the maximum likelihood technique, coupled with an optimization procedure.




Scope of this document

This document applies to all levels of functionality and all modes of operation of Cube Analyst. Features specific to a variant are noted.

This document concentrates on Cube Analyst; wider matters of matrix estimation, and the context within which Cube Analyst may be used, are described in “Introduction to the Matrix Estimation Programs.” That document also explains terms that have a specific meaning for Cube Analyst and are used in this document.




What’s new?

Cube Analyst can now estimate Cube Voyager Public Transport matrices by using an intercept file output by the Cube Voyager PT program.




Common elements and variations

The characteristic common to all variants of Cube Analyst is that they make the best, most flexible use of most available data sources in the estimation process. These include not only vehicle traffic or passenger flow counts and prior (old) matrices, but also partially observed matrices, zonal trip end (generation and attraction) data, vehicle routing, travel cost matrices, and even previously calibrated trip cost distribution functions. A further form of data, called “part trip data,” can also be used; it is described in “Part-trip data” on page 29.

The user ascribes confidence, or reliability, levels to the data. These levels govern the influence of each data item when different data items (inevitably) imply different trip matrix cell values. The estimation process is based on a statistically rigorous procedure that takes direct account of inherent traffic data variability. It uses the maximum likelihood technique, coupled with a powerful optimization procedure, to derive simultaneously an unusually large set of model parameters. These parameters then determine the estimated trip cell values with correspondingly enhanced precision.
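To make the maximum likelihood idea concrete, the following Python sketch uses a deliberately tiny toy problem; it is not Cube Analyst's model, which Chapter 4, “Mathematical Background,” describes. A single hypothetical growth factor g scales the link flows implied by a prior matrix, the observed counts are assumed to be Poisson distributed, and a Newton-Raphson search (standing in for Cube Analyst's optimizer) finds the g that maximizes the log-likelihood. In this toy case the answer also has a closed form, the sum of the counts divided by the sum of the prior flows, so the search result is easy to check.

```python
import math

def log_likelihood(g, counts, prior_flows):
    """Poisson log-likelihood of the observed counts, up to a constant,
    when each modelled flow is g times the prior-matrix flow."""
    return sum(c * math.log(g * v) - g * v for c, v in zip(counts, prior_flows))

def estimate_growth_factor(counts, prior_flows, tol=1e-12, max_iter=100):
    """Newton-Raphson search on theta = log(g), which keeps g positive."""
    total_counts = sum(counts)
    total_prior = sum(prior_flows)
    theta = 0.0  # start from g = 1, i.e. the prior matrix unchanged
    for _ in range(max_iter):
        g = math.exp(theta)
        gradient = total_counts - total_prior * g   # d logL / d theta
        curvature = -total_prior * g                # d2 logL / d theta2 (< 0)
        step = gradient / curvature
        theta -= step
        if abs(step) < tol:
            break
    return math.exp(theta)

counts = [120.0, 80.0, 100.0]       # hypothetical observed link counts
prior_flows = [100.0, 70.0, 80.0]   # hypothetical flows implied by the prior matrix
g_hat = estimate_growth_factor(counts, prior_flows)
print(round(g_hat, 6))              # 1.2, i.e. 300 / 250
```

Searching over log(g) rather than g keeps every trial value positive, so the logarithm in the likelihood is always defined.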

Nevertheless, the estimation process remains mathematically underspecified, so a key feature of Cube Analyst is the information it provides for assessing the quality of the estimated matrix. This includes comparative and sensitivity analyses, and reports that draw on a range of graphical and tabular presentations. Statistical reports provide the standard errors of model parameter values and indicators of the stability of estimated trip matrix cells (via a sensitivity matrix).

Cube Analyst provides a hierarchic approach to estimation, suited to very large matrices, typically between 2,500 and 5,000 zones. The basic approach is to estimate a general matrix in which zones are automatically grouped into districts. This area-wide estimate is then used to control a set of detailed estimations, which together build up a fully detailed estimate for the entire study area.
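The zone-to-district grouping behind hierarchic estimation can be pictured with a small Python sketch. It assumes a hypothetical mapping from each zone to its district (Cube Analyst forms districts automatically; the mapping and trip values here are invented for illustration) and collapses a zone-level matrix into a district-level matrix by adding each cell to its origin-district and destination-district cell.

```python
from collections import defaultdict

def aggregate_to_districts(zone_matrix, zone_to_district):
    """Sum a zone-to-zone trip matrix into a district-to-district matrix.

    zone_matrix maps (origin_zone, destination_zone) -> trips;
    zone_to_district maps zone -> district (a hypothetical grouping)."""
    district_matrix = defaultdict(float)
    for (origin, destination), trips in zone_matrix.items():
        key = (zone_to_district[origin], zone_to_district[destination])
        district_matrix[key] += trips
    return dict(district_matrix)

zone_to_district = {1: "A", 2: "A", 3: "B"}
zone_matrix = {
    (1, 2): 10.0, (1, 3): 5.0,
    (2, 1): 8.0,  (2, 3): 7.0,
    (3, 1): 4.0,  (3, 2): 6.0,
}
districts = aggregate_to_districts(zone_matrix, zone_to_district)
print(districts)  # {('A', 'A'): 18.0, ('A', 'B'): 12.0, ('B', 'A'): 10.0}
```

Because the aggregation only regroups cells, the matrix total is unchanged (40 trips in both matrices).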




Conventions used in this document

The following conventions are used in this document:

• Parameters, options, and selections appear in upper case.

For example: COSTM

• Technical terms introduced for the first time appear in upper and lower case italics.

For example: Hessian

• Terms and phrases with a particular meaning in the context of Cube Analyst appear in quotes. These phrases may also appear in italics.

For example: “Sensitivity Matrix”




Cost information

For highways, cost data is produced by Citilabs products.

For public transport in TRIPS, cost data is produced by MVPUBM.






2 Estimation System

This chapter discusses the nature of the estimation system. Topics include:

• Framework for handling different data consistently

• Objectives

• Handling data variability

• Options for users

• Considerations for users

• Estimating highway and public transport matrices

• Overview of Cube Analyst




Framework for handling different data consistently

Cube Analyst provides a framework for inputting a variety of information to estimate an O-D matrix. The characteristics of the system are that:

• Some or all of the types of information introduced in “Common elements and variations” on page 6 may be used.

• The system can work with little data, but the accuracy of the estimated matrix improves as more data is input.

• Different information is handled on a consistent basis.

• The variability of data is explicitly accounted for.




Objectives

The aim of Cube Analyst is to maximize the value of existing data and to limit the need for costly surveys. As such, it is mainly concerned with processing information in the best (statistical) manner, though the accuracy of the estimated matrix remains strongly affected by the amount and quality of the information input by the user.

Besides estimating matrices for individual studies, Cube Analyst is suited for use with regular surveys designed to keep matrix information up to date.




Handling data variability

Cube Analyst explicitly considers the variability of data. Inevitably, the different data are inconsistent about what the estimated matrix should be. Because of this inherent variability, collected data items are merely a sample, so their values (even those of simple traffic counts) can only be considered to fall within a range (a distribution). The width of this range reflects the confidence that may be placed in particular items.

Cube Analyst therefore requires the user to input information about how confident they are that each data item is representative of the situation for which the matrix is to be estimated. This information is input as a nominal percentage sample value; in restricted circumstances, it may be an actual sample obtained in a survey. The information about variability determines the relative influence of each data item in the estimation process. It acts algebraically as a weighting value, and is referred to as a “confidence level.”
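As a rough illustration of a confidence level acting “algebraically as a weighting value,” the Python sketch below reconciles two conflicting estimates of the same flow with a weighted average whose weights are hypothetical confidence percentages. This is an assumption made for illustration only; the way Cube Analyst actually brings confidence levels into its objective function is covered in Chapter 4, “Mathematical Background.”

```python
def confidence_weighted_value(estimates):
    """Combine conflicting estimates of one quantity.

    estimates is a list of (value, confidence_percent) pairs; in this
    illustration each confidence acts purely as a relative weight."""
    total_weight = sum(conf for _, conf in estimates)
    if total_weight <= 0:
        raise ValueError("at least one confidence must be positive")
    return sum(value * conf for value, conf in estimates) / total_weight

# A link count of 1,000 held with 80% confidence versus a prior-matrix
# flow of 800 held with 20% confidence (both values hypothetical).
combined = confidence_weighted_value([(1000.0, 80.0), (800.0, 20.0)])
print(combined)  # 960.0
```

The result sits four times closer to the count than to the prior flow, matching the 80:20 ratio of the confidences.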




Options for users

The user does not have to use Cube Analyst in one fixed manner, but rather according to the information that is available and the context within which the matrix is required. Typically, the user will start with whatever information is to hand or can easily be collected.

This provides a fast means of obtaining an initial matrix that can enable a study to proceed, at least for general investigations. Analysis of the resulting matrix and estimation statistics will show where further quality data is most needed. Cube Analyst is then used to integrate this new (and possibly different type of) data to produce an improved estimated matrix.




Considerations for users

Deciding what information to input

This will usually be all information already available, but new data will normally be appropriate for those parts of the study area where most change has taken place since previous surveys, or where traffic schemes or policy proposals require detailed analysis.

Identify notable features and data sources

| Feature (changes in) | Example | Data |
| --- | --- | --- |
| Car ownership | Traffic growth | Counts |
| Land use | New industry, shops; new car parking | Trip ends (generations and attractions) |
| Road/public transport network | New bypass; traffic management; new bus/rail services | Travel times, routing |
| Travel habits | Out-of-town shopping | Observed O-D patterns; PT operators’ boarding & alighting surveys; vehicle licence plate surveys |

Appreciating key land uses




Inputting data

Information may be input in the form of matrices, as trip ends, or as network-related information. The user prepares this data within Cube, which offers a variety of modes of data entry. Extra information is required on data variability; it is input in the same form as the information to which it corresponds. Each data item (for example, each count or trip end) may have an individual confidence level attached to it, but in many cases global values will be used.
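Per-item confidence levels with a global fallback can be represented as in the Python sketch below. The record layout, the field names, and the 50% default are all hypothetical, chosen only to show the lookup; they are not the layout of Cube Analyst's input files, which Chapter 12, “Program Specific Data,” documents.

```python
from dataclasses import dataclass
from typing import Optional

GLOBAL_COUNT_CONFIDENCE = 50.0  # hypothetical global default, in percent

@dataclass
class CountRecord:
    link: tuple                         # (A node, B node), hypothetical
    count: float
    confidence: Optional[float] = None  # None means "use the global value"

def effective_confidence(record: CountRecord,
                         default: float = GLOBAL_COUNT_CONFIDENCE) -> float:
    """Return the item's own confidence if given, else the global default."""
    return record.confidence if record.confidence is not None else default

records = [
    CountRecord(link=(101, 102), count=1250.0),                  # global value
    CountRecord(link=(102, 103), count=980.0, confidence=90.0),  # individual value
]
print([effective_confidence(r) for r in records])  # [50.0, 90.0]
```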

Estimating the matrix

The matrix estimation stage simply requires the user to input the prepared files into Cube Analyst. As described in “Overview of Cube Analyst” on page 21, and in more detail in Chapter 4, “Mathematical Background,” Cube Analyst performs a set of iterative calculations that automatically determine the statistically most likely matrix for the set of input data values provided.

The first time Cube Analyst is run, it creates a set of files that can be used to reduce the run times of subsequent runs. This is either because the need to restructure data is avoided (the intercept file) or because an estimation can take advantage of previously calculated results (the gradient search file and the model parameter file).

This ability to benefit from a previous run of Cube Analyst (for the same basic study) is usually used to help analyze the consequences of changes in data values. For lengthy runs on large matrices, it also provides a convenient means of breaking an estimation into more than one run.
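The benefit of starting from previously calculated results can be seen with any iterative optimizer. The Python sketch below is a stand-in, not Cube Analyst's optimizer or its gradient search file: plain gradient descent on a simple quadratic objective is re-run after a small change to one hypothetical data value, once from scratch and once starting from the previous run's solution.

```python
def minimise(targets, start, rate=0.1, tol=1e-8, max_iter=10_000):
    """Gradient descent on sum((x_i - t_i)^2); returns (solution, iterations)."""
    x = list(start)
    for iteration in range(1, max_iter + 1):
        gradient = [2.0 * (xi - ti) for xi, ti in zip(x, targets)]
        x = [xi - rate * gi for xi, gi in zip(x, gradient)]
        if max(abs(gi) for gi in gradient) < tol:
            return x, iteration
    return x, max_iter

first_data = [100.0, 250.0, 40.0]
first_solution, cold_iterations = minimise(first_data, start=[0.0, 0.0, 0.0])

# A small revision to one data value, then re-estimate two ways.
revised_data = [100.0, 255.0, 40.0]
_, cold_again = minimise(revised_data, start=[0.0, 0.0, 0.0])
_, warm = minimise(revised_data, start=first_solution)
print(cold_again, warm)
```

The warm start needs fewer iterations because it begins much closer to the new optimum.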

With an improved optimizer in Cube Analyst and more powerful computers, such staging of estimations is now rarer, but it remains a typical feature of hierarchic estimations of extremely large matrices. This is assisted by the local matrix control file, which can be edited so that estimations are staged in a manner convenient to the user.

Analyzing the estimated matrix

It is natural and desirable to want to check the quality of the estimated matrix. A typical approach would be to compare the estimated matrix with some observed data that has not been used in the estimation process. However, this approach is not usually appropriate for Cube Analyst, which is designed to take advantage of all reasonably observed data. For example, if the estimated matrix implies that the link flows across a screenline differ from those observed (easily checked by assigning the estimated matrix to the network), then the solution is to re-run the estimation, now incorporating the extra observed data.
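The screenline check described above amounts to the arithmetic in the Python sketch below: the estimated matrix is loaded onto the links using route proportions (the fraction of each O-D movement that uses each link), the loaded flows on the links crossing a screenline are summed, and the total is compared with the observed screenline count. The matrix, proportions, link names, and count are all hypothetical.

```python
def link_flows(matrix, route_proportions):
    """Load O-D trips onto links.

    matrix maps (origin, destination) -> trips; route_proportions maps
    (origin, destination) -> {link: fraction of that movement using the link}."""
    flows = {}
    for od, trips in matrix.items():
        for link, fraction in route_proportions.get(od, {}).items():
            flows[link] = flows.get(link, 0.0) + trips * fraction
    return flows

def screenline_difference(flows, screenline_links, observed_total):
    """Return (modelled total, modelled minus observed) for one screenline."""
    modelled = sum(flows.get(link, 0.0) for link in screenline_links)
    return modelled, modelled - observed_total

estimated_matrix = {(1, 2): 400.0, (1, 3): 300.0, (2, 3): 200.0}
route_proportions = {
    (1, 2): {"a": 1.0},
    (1, 3): {"a": 0.5, "b": 0.5},
    (2, 3): {"b": 1.0},
}
flows = link_flows(estimated_matrix, route_proportions)
modelled, difference = screenline_difference(flows, ["a", "b"], observed_total=950.0)
print(modelled, difference)  # 900.0 -50.0
```

A shortfall like this one is, per the text, the signal to re-run the estimation with the screenline count included as input data.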

The approach to analyzing the quality of the estimated matrix is, therefore, based on:

• Comparing the estimated results with input data values

• Checking the sensitivity of the results if data values are altered

• Analyzing the estimation calculations
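The second check in this list, altering data values and observing the effect, can be illustrated with a toy estimator in Python. As an assumption for illustration only, the estimate here is a single growth factor equal to total observed counts divided by total prior-matrix flows; each hypothetical count is raised by 10% in turn and the relative change in the estimate is reported. Cube Analyst's own sensitivity matrix works at the level of individual trip matrix cells rather than a single factor.

```python
def growth_factor(counts, prior_flows):
    """Toy estimator: total observed counts over total prior-matrix flows."""
    return sum(counts) / sum(prior_flows)

def sensitivity_to_counts(counts, prior_flows, perturbation=0.10):
    """Relative change in the estimate when each count is raised in turn."""
    base = growth_factor(counts, prior_flows)
    changes = []
    for i in range(len(counts)):
        altered = list(counts)
        altered[i] *= 1.0 + perturbation
        changes.append(growth_factor(altered, prior_flows) / base - 1.0)
    return changes

counts = [120.0, 80.0, 100.0]       # hypothetical observed counts
prior_flows = [100.0, 70.0, 80.0]   # hypothetical prior-matrix flows
for i, change in enumerate(sensitivity_to_counts(counts, prior_flows)):
    print(f"count {i}: {change:+.1%}")
```

In this toy the largest count has the largest influence, which points to where checking the data would matter most.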

Besides the information output by Cube Analyst itself, extensive use is made of other Citilabs programs to create tabulations and graphic displays that highlight different characteristics of the estimated matrix.

Improving the estimated matrix

Deficiencies in the quality of the estimated matrix, when signalled by the results of the analysis phase, are remedied by improving the quality or quantity (or both) of the input data. The analysis phase can provide strong pointers to which data is contributing to quality problems, and hence where the user should focus attention.




Estimating highway and public transport matrices

For much of the time, it is not necessary to distinguish between estimating matrices for highway analysis and for public transport analysis; the same principles apply to each. However, there are a number of points to note. The first is that the units of the matrices are usually vehicles for highways and passengers for public transport.

Much of the data and methods of processing are identical for both highways and public transport, but the routing information is derived in quite different ways. There is also the concept of line groups, which only applies to public transport and not to highways.

Assumptions about the quality and quantity of data vary between the modes. Link count data is more readily, and more accurately, available for highways than for public transport. Public transport is often more reliant on part-trip data, as obtained from boarding and alighting surveys; for highways, this form of data may be obtained from licence plate matching surveys.




Overview of Cube Analyst


Cube Analyst’s operations can be considered as a series of activities:

1. Data input and restructuring

For the most part, Cube Analyst simply reads the user’s input data at this initial stage. However, Cube Analyst also analyzes and restructures routing information (from the TRIPS route choice probability (RCP) file or Cube Voyager path file) and count data (from the screenline file) into a more concise and efficient file, called the intercept file. This restructuring can be relatively lengthy so, as noted in “Considerations for users” on page 16, it is possible to re-use an intercept file once it has been created. For Cube Voyager users, the creation of the intercept file is handled by the HIGHWAY program.

2. Calculation initiation

The main Cube Analyst calculations may be viewed as a search for the statistically most likely matrix, given the set of input data values. As this search relates, typically, to many thousands of matrix cell values, the manner of searching is a critical aspect of Cube Analyst.

A calculation called the “method of scoring” directs the start of the searching process. This calculation is always done as the first stage of the estimation calculation, and it may be repeated later, according to the setting of Cube Analyst’s ITERH parameter. (This determines the number of iterations between gradient search matrix calculations.)

There is a “strategy” consideration here. The default method for running Cube Analyst spends time on the “method of scoring” calculation in order to limit subsequent calculations.

Cube Analyst also calculates a suitable value for ITERH. However, it is open to the user to override this strategy by:

• Changing the setting of Cube Analyst’s IHTYPE parameter (used to determine the optimization process) from its default in order to avoid the method of scoring. This reduces the associated calculation time, but means that the searching process is initially less well directed, so the net calculation time may still be longer.

• Setting ITERH to a lower value than the default, which means that the searching process is re-appraised by further applications of the method of scoring. This may be suitable when there are signs that the optimizer is not able to determine a convergent solution in a reasonable number of iterations.

The user should note that these options for tuning the performance of Cube Analyst exist, but need not necessarily apply them, as the default operation is usually entirely satisfactory. It takes some experience with a particular estimation problem to determine its best strategy.

3. Function evaluation

“Function evaluation” is the term used to describe the calculation of a series of estimation results. These are calculated by way of an estimation equation (function), which calculates the values of the estimated cells according to the current values of a series of model parameters. There is a large number of model parameters: usually twice the number of zones, plus the number of screenlines.

These model parameters have an initial value of 1.0, with the consequence that the initial function evaluation (usually) results in an estimated matrix which is identical to the old (“prior”) matrix (see “Prior trip matrix” on page 27).

4. Optimization

The optimizer is a central feature of Cube Analyst; there are two critical elements to it:

a. Objective function: This provides a criterion by which the optimizer can determine whether one value of a particular cell is better than another value. “Maximum likelihood objective function” on page 40 explains how this criterion is derived from statistical maximum likelihood theory and rigorous mathematical calculation. Hence, Cube Analyst defines “better” as “statistically more likely.”

b. Set of search directions and a step length: The optimizer alters the model parameter values, from their starting point of 1.0, to seek an estimated matrix that is an improvement on its current estimates. The search direction determines, for any cell in the matrix, whether model parameters should be increased or decreased, and the step length determines by how much.

The final values of the model parameters are available to view as the model parameter file, so it is possible to see how they have been changed from 1.0.

5. Iterations and convergence

After the optimizer has calculated new model parameter values, the function evaluation process is repeated to obtain the latest estimated matrix (and its derivative values). This overall process is repeated in a series of iterations; at each iteration the optimizer ensures that the new estimated matrix is an improvement on (“more likely” than) the previous one. Because there are so many cells to estimate, and Cube Analyst does not constrain them to integer values, it is almost always possible to make some improvement, however small. Therefore, it is necessary to define a criterion to determine when the iterations have reached an acceptable solution. In Cube Analyst, this criterion is set by the UTOL (“user tolerance”) control parameter. UTOL sets a minimum value on the step length which the optimizer is allowed to use, as very small step lengths indicate that the optimizer is making correspondingly small changes to the estimated matrix. It is usual to leave UTOL at its default value, and allow Cube Analyst to run until it terminates with a “converged” message.
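The iterate-until-converged logic of steps 3 to 5 can be illustrated with a short Python sketch. It is not Cube Analyst’s optimizer: the function `minimize_with_utol`, the steepest-descent direction and the step-halving and step-doubling rules are hypothetical choices made for illustration. What it borrows from the text is that parameters start at 1.0, only improving steps are accepted, and iterations stop once the step length falls below a user tolerance (`utol`, standing in for UTOL).

```python
from typing import Callable, List, Tuple

def minimize_with_utol(
    objective: Callable[[List[float]], float],
    gradient: Callable[[List[float]], List[float]],
    params: List[float],
    step: float = 1.0,
    utol: float = 1e-6,
    max_iter: int = 10_000,
) -> Tuple[List[float], int]:
    """Hypothetical descent loop: accept only improving steps, stop when step < utol."""
    current = objective(params)  # function evaluation at the starting parameters
    for iteration in range(1, max_iter + 1):
        direction = [-g for g in gradient(params)]  # search direction (steepest descent)
        # Shrink the step until a move improves the objective, or the step becomes too small.
        while step >= utol:
            trial = [p + step * d for p, d in zip(params, direction)]
            value = objective(trial)
            if value < current:  # only "more likely" (lower objective) estimates are kept
                params, current = trial, value
                step *= 2.0  # try a longer step on the next iteration
                break
            step /= 2.0
        else:
            # The while-condition failed without a break: step fell below utol ("converged").
            return params, iteration
    return params, max_iter

# Hypothetical two-parameter objective with its minimum at (3, -1); parameters start at 1.0.
def f(x: List[float]) -> float:
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2

def grad_f(x: List[float]) -> List[float]:
    return [2.0 * (x[0] - 3.0), 2.0 * (x[1] + 1.0)]

best, iters = minimize_with_utol(f, grad_f, [1.0, 1.0], utol=1e-6)
print(best, iters)  # [3.0, -1.0] 2
```

Because the example objective is a simple quadratic, the sketch converges in two iterations; a real estimation involves far more parameters, and its convergence behavior depends on the data.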





Possible Data Inputs


Types of data

Cube Analyst can operate using some or all of the following types of data:

• Link counts

• Turning counts

• Prior trip matrix

• Trip cost matrix

• Partial O-D matrix

• Trip ends

• Routing information

• Cost distribution function

• Part-trip data

NOTE: Cube Analyst requires confidence level information for all data types except routing information and cost distribution function.

Link counts

For highways, this information may be surveyed with considerable accuracy using automatic counters, but it may not show the current demand for travel (which the O-D matrix should represent) if congestion has constricted flows.

For public transport, this data is often obtained from estimates of passenger numbers in buses and rail carriages, and is of inherently limited accuracy (but may still be usefully exploited by Cube Analyst).

For both modes, it should be observed that matrices normally apply to average situations, which individual counts will match only to some extent.




Link counts which are spread randomly across the network contribute relatively little information to the estimation of matrix cells. This may be less of a problem for public transport networks offering limited alternative routes than for highway networks with inherently greater route choice options.

Turning counts

The same comments as for link counts apply. Note that turning counts may only be applied when inputting a Cube Voyager path file. They are not supported for an estimation using a TRIPS RCP file.

Prior trip matrix

This matrix might be an out-of-date matrix for the study area, or possibly a previous study’s forecast for the present day. It is not essential to input a prior trip matrix, but in practice one is very desirable as a source of information about the pattern of trip movements.

Trip cost matrix

This matrix summarizes the cost of travel between zones, where cost is normally defined as a user-specified combination of time and distance, plus any tolls, fares, etc. The trip cost matrix may be used as a substitute when some or all of a prior matrix is not available. The costs may be based on either modelled or surveyed speed data.

Partial O-D matrix

This is simply another approach to providing the prior matrix, which makes it possible to use information that specifies some cells of the matrix but not all. The user merely assigns a (relatively) high confidence to those cells which have been observed and allows other information to determine values in the remaining cells. This may be data from the cost matrix, in which case the corresponding prior matrix cells must be zero. Alternatively, non-observed cells are given non-zero values with zero or low confidence levels. Zero values in input matrices are taken to indicate that trips in the corresponding cells are impossible. Cost data are not used to estimate trips for cells which have non-zero prior-trip values.

This approach makes Cube Analyst useful when surveys have been conducted around critical parts of a study area (for example, town centers, travel corridors, etc.), but there remains a need to estimate the matrix for the rest of the area.

Trip ends

The total number of trips generated from and attracted to zones (G&A) may be obtained either from surveys or from mathematical land-use type models. Surveys are appropriate when zone boundaries are such that traffic may be counted entering and leaving zones on distinct trips, rather than merely passing through the zone. This tends to occur only for some zones, for example a car park or an industrial estate, but these are often important zones for a study.

It is possible to use data derived from both methods, for example, a few zones surveyed and the remainder derived from a model, with the resulting trip ends distinguished through differing confidence levels.

Routing information

It is possible to survey routing data, though this is rarely done. The modelling of routing is often not a very good replication of actual (erratic) driver or passenger routing, and it is often not possible to place much reliance on this otherwise important data. Cube Analyst is therefore designed to use routing information, as far as possible, only where the precise routing does not matter. Thus, for skim cost information, small variations in routes may be ignored, while count information is used in “bottleneck” situations where the number of routes is limited to a few alternative links (ideally one).




Cost distribution function

Many areas which have been the subject of previous studies will have a previously calibrated mathematical trip-cost distribution function, as used in the gravity model. Because Cube Analyst contains its own calibration procedures, the information implied by the distribution function is not normally used directly, although the a and b parameters, discussed later, may be fixed with reference to a previously calibrated gravity model.

Part-trip data

This data is surveyed in the form of matrices where the recorded origin and destination are not necessarily the ultimate origin and destination of the trip. This is illustrated in the figure “Definition of part-trip data,” which shows the recorded part of the trip (S - E) relative to the total trip (O - D). It is possible for one or both of points S and E to coincide with the corresponding points O and D. For highways, this data is typically obtained from licence plate matching surveys; for public transport, it is obtained from on-board surveys recording passenger boarding and alighting points.

Definition of part-trip data




Sets of data

Cube Analyst estimates one matrix at a time, and the data should form a set related to this particular matrix; that is, the data should correspond to the same time period (hour(s) of day, day of week, time of year) as the matrix. It should also correspond to the same units of flow (vehicles, pcus, passengers, etc.). Sometimes the user will have to transform data (for example, by factoring) to achieve this, and this will usually imply a reduction (small or large) in confidence levels for the transformed data.

Also, only one set of information may be input into Cube Analyst for an estimation. Hence, if multiple sets exist (say, several traffic counts for the same link), then the user must derive a single set. This may simply be to choose the most recently surveyed set, or it might be a weighted average of all available sets. Multiple sets of data usually allow confidence levels to be increased relative to single sets of data.
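A minimal sketch of deriving a single value from several surveys of the same link follows. The function `combine_counts` and the weights are hypothetical (the guide leaves the weighting scheme to the user), and the optional `factor` stands for a transformation such as factoring to the matrix time period. The counts 1684 and 1739 vph are illustrative only. The sketch does not adjust confidence levels, which remain a matter of user judgment.

```python
from typing import Sequence

def combine_counts(counts: Sequence[float], weights: Sequence[float], factor: float = 1.0) -> float:
    """Weighted average of repeated counts for one link, then scaled by `factor`."""
    if len(counts) != len(weights) or not counts:
        raise ValueError("need one weight per count, and at least one count")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    average = sum(c * w for c, w in zip(counts, weights)) / total_weight
    return average * factor

# Two surveys of the same link (vph); the more recent one is given three times the weight.
print(combine_counts([1684, 1739], [1, 3]))              # 1725.25
# Equal weights, then halved (for example, converting to a shorter time period).
print(combine_counts([1684, 1739], [1, 1], factor=0.5))  # 855.75
```

Choosing the most recent survey instead is the special case of giving that survey all the weight.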






4 Mathematical Background

This chapter describes the mathematics that Cube Analyst uses. Topics include:

• Mathematical notation

• Introduction to the mathematics in Cube Analyst

• Mathematical summary

• Extensions to the calculations




Mathematical notation

This section discusses mathematical notation. Topics include:

• Explaining the letters and symbols

• Notation used in the estimation equation

Explaining the letters and symbols

This section uses mathematical notation, which can look daunting to those who are not accustomed to it. So, first, a word of background explanation. The notation can appear worse than it is because of the use of Greek letters and some specialist mathematical symbols. The problem is that the normal 26-letter Roman alphabet is not sufficient, even considering upper and lower case letters, and remembering that some letters have traditional mathematical meanings and associations. The mathematics presented here is only an extract of the full Cube Analyst mathematics, which uses an even wider range of letters. Also, some traditional mathematical notations are cumbersome when used with vectors and matrices and their elements, as Cube Analyst requires; hence it is better to use alternative forms.

The following table is mainly a pronunciation guide, but some of the symbols and letters are explained further:

Symbol Description

α alpha

β beta

η eta

θ theta

λ lambda

Ξ xi (upper case)

ξ xi (lower case)

Π pi (upper case); symbol for multiplication (product)

Σ sigma (upper case); symbol for summation

Φ phi (upper case)

Ψ psi (upper case)

∂ partial differential operator

∇ nabla; symbol for (partial) differentiation of matrix elements

e exponential constant; the base of natural logarithms (approximately 2.718)

! factorial operator (for example, 4! = 4 × 3 × 2 × 1 = 24)

The notation P(x|X) implies the probability of x, given the value X. Similarly, L(x|X) is the likelihood of x, given X; M(x|X) refers to the log-likelihood of x, given X. When x and X are set in bold, as in the last example, they are multi-valued vectors (or matrices).
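To make the notation concrete, the Python sketch below maps Σ, Π, ! and e onto standard-library calls, and uses them together to evaluate a conditional probability P(x|X). Taking P(x|X) to be the Poisson probability of observing x events when the mean is X is an assumption made here for illustration (the Poisson distribution is discussed later in this chapter); `poisson_p` is a hypothetical helper name.

```python
import math

values = [4, 3, 2, 1]
total = sum(values)          # Σ: 4 + 3 + 2 + 1 = 10
product = math.prod(values)  # Π: 4 × 3 × 2 × 1 = 24
assert product == math.factorial(4)  # 4! gives the same result

def poisson_p(x: int, mean: float) -> float:
    """P(x | X): probability of x events when the mean is X, assuming a Poisson pdf."""
    return math.exp(-mean) * mean ** x / math.factorial(x)

print(total, product)              # 10 24
print(round(poisson_p(2, 3.0), 4)) # 0.224
```

The direct formula is adequate for small counts; for counts in the thousands, working with logarithms (for example, `math.lgamma(x + 1)` for ln x!) avoids overflow.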

Notation used in the estimation equation

This notation is used in “Introduction to the mathematics in Cube Analyst” on page 35 and “Mathematical summary” on page 48.

Notation Description

i Origin zone

j Destination zone

k Link count

K Screenline count (from count sites .....)

Model parameters

Mean travel cost

Any one of the model parameters

H Observed data item

h Estimated data item

NOTE: H and h may take the values shown below:

H (observed) h (estimated) Description

Number of trips from i to j

Number of trips from origin i

Number of trips to destination j

Number of trips through link k



Introduction to the mathematics in Cube Analyst


The design of Cube Analyst means that a user can estimate matrices simply by supplying the program with the appropriate input data and accepting the resulting matrix. However, it is valuable to have some understanding of how Cube Analyst calculates the value of the estimated matrix cells; this insight both helps in providing confidence in the results and in guiding the approach to input data, such as setting confidence levels and considering the potential effects of extra data or improved data quality.

This section provides additional information about how Cube Analyst computes matrix cells. Topics include:

• Main mathematical features

• Estimation equation

• Model parameters

• Maximum likelihood objective function

• Describing the variation in data

• Optimizer: Finding the minimum value

Main mathematical features

This section is intended to cater to those Cube Analyst users who are interested in the detailed mathematical and statistical underpinnings of the estimation process. Users who are more interested in other aspects of the model should proceed to Chapter 5, “Data Preparation and Analysis.”

The basis of Cube Analyst’s calculations is an application of the standard statistical approach known as the maximum likelihood method. This method allows estimates of a set of inputs to guide the estimates of a corresponding set of outputs; the estimates of the set of inputs are obtained from likelihood functions, which are expressions of probability distribution functions (pdfs) associated with the user’s input data. The outputs are calculated from an estimation equation, which must be provided. These points are further explained below.
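As a minimal illustration of the maximum likelihood idea (not Cube Analyst’s own calculation), the Python sketch below treats two counts of the same screenline, 1684 and 1739 vph (the example values used later in this chapter), as independent observations from a Poisson distribution with unknown mean, and searches a grid of candidate means for the one that maximizes the log-likelihood. For this simple case, the answer is known analytically to be the sample mean. The function name and the grid are illustrative choices.

```python
import math

counts = [1684, 1739]  # two counts of the same screenline on equivalent occasions (vph)

def log_likelihood(mean: float, observations) -> float:
    """Poisson log-likelihood: sum over observations of x*ln(mean) - mean - ln(x!)."""
    return sum(x * math.log(mean) - mean - math.lgamma(x + 1) for x in observations)

# Coarse grid search over candidate means, in steps of 0.5 vph.
candidates = [1600 + 0.5 * i for i in range(400)]
best_mean = max(candidates, key=lambda m: log_likelihood(m, counts))
print(best_mean)  # 1711.5, which is the sample mean
```

Cube Analyst’s problem differs in scale and structure: the unknowns are model parameters feeding an estimation equation, rather than a single mean, but the criterion (choose the values under which the observed data are most likely) is the same.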

Given the range of possible input data, the full mathematical expression of Cube Analyst is complex, but it involves some principal components which we use to describe the essential features of Cube Analyst. “Mathematical summary” on page 48 explains the standard Cube Analyst calculations by summarizing the main mathematical steps. “Extensions to the calculations” on page 57 shows how additional features are accommodated in the calculations. This section continues by explaining Cube Analyst’s mathematics in largely descriptive terms, while introducing the main equations. Throughout this section, where it is not otherwise clear from the text, the mathematical notation is as defined in “Mathematical notation” on page 32.

Estimation equation

The heart of the estimation is an equation (“estimation model”) whose output, $T_{ij}$, corresponds to the values of the cells of Cube Analyst’s output matrix for trips between zones $i$ and $j$. The form of this mathematical estimation model in Cube Analyst is:

$$T_{ij} = t_{ij}\, a_i\, b_j \prod_{K} X_K^{\,p_{ijK}} \qquad (1)$$

where $\prod_{K}$ denotes the product of $X_K^{\,p_{ijK}}$ over all the screenline count sites $K$.

This equation contains the following elements:

• Its output, $T_{ij}$

• Some data items:

$t_{ij}$ Prior observation of trips between $i$ and $j$

$p_{ijK}$ Probability of trips between zones $i$ and $j$ using screenline site $K$ (it is possible for a “screenline” to correspond to a single count site, in suitable circumstances)

• Some model parameters: $a_i$, $b_j$, $X_K$

If there is no prior observation for movements between some or all possible origin-destination zone pairs, $ij$, then $t_{ij}$ may be calculated by Cube Analyst from:

.....(2)

Equation (2) introduces further elements:

• One data item:

$c_{ij}$ The generalized cost of travel between zones $i$ and $j$

• Two model parameters: α, β

It may be noted that screenlines are usually organized so that $p_{ijK} = 0$ or $p_{ijK} = 1$. Also, because $t_{ij}$ provides an estimator of the output, as well as possibly being an input data item, it may also be considered as a model parameter. Hence, the data item $t_{ij}$ is also referred to by a separate model-parameter symbol; the two are numerically identical, but logically distinct.

The form of equation (1) has been chosen primarily for reasons of convenience, and for the appropriateness of its form to the data used in the estimation (as we discuss below). It is designed to process information efficiently, but it is not behavioral in nature. This implies that Cube Analyst is suitable for estimating present-day matrices, but not for forecasting, which would require some behavioral assumptions.

Equation (2) is borrowed from the well-known gravity model, which makes the behavioral assumption that people prefer lower-cost journeys to higher-cost ones, but are influenced by the level of trips generated by and attracted to different zones. This is a broad assumption; it means that cost data may be used where no other source of prior matrix data is available, but it is not a precise approach to estimating individual matrix cells.
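A minimal Python sketch of the estimation equation follows, assuming it has the multiplicative form $T_{ij} = t_{ij} a_i b_j \prod_K X_K^{p_{ijK}}$, with prior trips $t_{ij}$, screenline-use probabilities $p_{ijK}$ and model parameters $a_i$, $b_j$, $X_K$. The function name `estimate_matrix`, the nested-list layout and the two-zone data are hypothetical. It shows two points made in the text: with every model parameter at its initial value of 1.0, the estimate reproduces the prior; and the number of $a$, $b$ and $X$ parameters is twice the number of zones plus the number of screenlines.

```python
from typing import List

Matrix = List[List[float]]

def estimate_matrix(prior: Matrix, p: List[List[List[float]]],
                    a: List[float], b: List[float], x: List[float]) -> Matrix:
    """T[i][j] = prior[i][j] * a[i] * b[j] * product over K of x[K] ** p[i][j][K]."""
    n = len(prior)
    result = []
    for i in range(n):
        row = []
        for j in range(n):
            value = prior[i][j] * a[i] * b[j]
            for k, x_k in enumerate(x):
                value *= x_k ** p[i][j][k]  # screenline factor, weighted by use probability
            row.append(value)
        result.append(row)
    return result

prior = [[0.0, 120.0], [80.0, 0.0]]   # two zones, hypothetical prior trips
p = [[[0.0], [1.0]], [[0.0], [0.0]]]  # one screenline, crossed only by trips from zone 1 to zone 2
n_zones, n_screenlines = 2, 1
print(2 * n_zones + n_screenlines)    # 5 parameters: a1, a2, b1, b2, X1

ones = estimate_matrix(prior, p, [1.0] * 2, [1.0] * 2, [1.0])
print(ones == prior)                  # True: the initial estimate equals the prior

adjusted = estimate_matrix(prior, p, [1.0] * 2, [1.0] * 2, [1.1])
print(adjusted)                       # [[0.0, 132.0], [80.0, 0.0]]
```

Raising the single screenline parameter only affects the cell whose trips use that screenline, which is how count data can act on precisely the matrix cells that the routing information identifies.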

Model parameters

For Cube Analyst, therefore, the estimated matrix is entirely dependent on the values given to the model parameters. Cube Analyst is thus, in effect, solely concerned with establishing the most appropriate values for these model parameters. (Cube Analyst’s calculations are in “parameter space,” which accounts for some of the behavior that may be observed in Cube Analyst’s output to the screen and log file while it is computing, where the values of the matrix may change in an apparently erratic manner.) Cube Analyst’s calculations are mainly in the nature of a search for the “best” model parameter values. Apart from the estimation equation itself, the main features of the Cube Analyst calculations are:

• Directing the search for model parameter values (“optimization”)

• Deciding whether the new model parameter values are the “best” (“function evaluation”)

We now describe the general issues for Cube Analyst when setting model parameter values.

Unless the user supplies an input model parameter file (created by an earlier run of Cube Analyst), the model parameters are automatically initialized to 1.0. From equation (1), it may be seen that the initial estimate is then identical to the prior matrix (or based on the cost matrix, equation (2), if no prior matrix value exists).

It is possible to compare the estimated matrix with all of the items of the user’s input data. For example, the sums of the rows and columns of the estimated matrix may be compared with the input trip ends (“Mathematical summary” on page 48 shows this in mathematical terms for all data items). If the result of this comparison indicates that the current estimate is too low, then an improved estimated matrix may be achieved by increasing the value of at least some model parameters.
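The trip-end comparison can be sketched in Python as follows. The helper `trip_end_signals` and the data are hypothetical: it sums each row of a current estimate, compares the total with the observed trips generated at that zone, and reports only the direction in which the estimate would need to move. How Cube Analyst actually balances this signal against all the other data items is the job of its optimizer, which this sketch does not reproduce.

```python
from typing import Dict, List

def trip_end_signals(estimate: List[List[float]], observed_origins: List[float],
                     tolerance: float = 1e-9) -> Dict[int, str]:
    """Compare estimated row sums with observed trips generated at each zone."""
    signals = {}
    for i, (row, target) in enumerate(zip(estimate, observed_origins)):
        row_total = sum(row)
        if row_total < target - tolerance:
            signals[i] = "too low: increase"
        elif row_total > target + tolerance:
            signals[i] = "too high: decrease"
        else:
            signals[i] = "consistent"
    return signals

current = [[0.0, 120.0], [80.0, 0.0]]  # hypothetical current estimate (two zones)
print(trip_end_signals(current, [150.0, 80.0]))
# {0: 'too low: increase', 1: 'consistent'}
```

The same comparison applies to column sums and attractions, and, in principle, to every other data item; when different items point in opposite directions for the same cells, the model parameters must reconcile them.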

The “problem” for Cube Analyst is that there are many items of user data, implying many comparisons of the type just described; some of these comparisons may require the current estimate to be altered one way (increased, say), while other comparisons need the estimate to be altered the other way (decreased, say). The large number of model parameters provides the basis for reconciling these apparent conflicts; by definition, there are (2 × the number of zones) model parameters provided by the $a_i$ and the $b_j$ alone. It may be demonstrated that these are sufficient for equation (1) to define any possible combination of positive, non-zero matrix cell values. Hence, if suitable values of the model parameters can be found, equation (1) can produce a matrix which is consistent with all of the user’s input data, at least if the input data is self-consistent in the first place.

Of course, such consistency never occurs in real applications of Cube Analyst, and the best that may be hoped for is to estimate the matrix which is most likely, given the user’s input data. Achieving this “most likely” result is the next main topic to discuss, but we will stay with model parameters to make a few more points.

In principle, there is nothing in particular to distinguish one model parameter from another; mathematically, they are equal and each may be affected by any item of data. However, the form of the estimation equation allows parameters to be associated naturally with different types of data, such as:

• $a_i$, $b_j$: trip ends, for trips generated at zone i and attracted to zone j

• $X_K$: counts on screenline site K

• α, β: trip cost information

• Prior values: prior trip matrix
This association is useful to the optimizer in reflecting the different (quality) characteristics of the data sets. The nominally redundant parameters provide extra “degrees of freedom” to handle data inconsistencies. This is useful, as the matrix cells affected by a set of screenline data are precisely defined by the routing information.

Maximum likelihood objective function

When Cube Analyst establishes values for the model parameters, it requires a criterion to determine whether the corresponding $T_{ij}$ estimates are “correct,” or are “better” than those given by another set of model parameter values. This criterion is provided by a mathematical equation called an objective function. The objective function, M, for Cube Analyst has the following form:

.....(3)

where:

- h is an estimated data item

- H is an observed data item

- λ is the confidence level associated with H.

“Notation used in the estimation equation” on page 34 shows which items H and h can represent but, in general terms, H is the input data which the user supplies and h is the corresponding value implied by the estimated matrix.

We have already discussed how the form of the estimation equation (1) has been chosen for reasons of effectiveness but remains essentially arbitrary, and how equation (2) derives a weak behavioral basis from the gravity model. It is therefore important to appreciate that, in contrast, the objective function, equation (3), is the result of a statistically rigorous procedure, namely the maximum likelihood method.




The consequence of this is a guarantee, subject to some qualifications which we consider below, that the estimated matrix is the statistically most likely, given the data supplied by the user.

The “correctness” of the estimate remains, of course, dependent on the quality of the input data. Maximum likelihood theory shows that the most likely values are indicated when M in equation (3), which is negative, reaches its minimum possible value. (For reasons of computational convenience, Cube Analyst minimizes the negative of the “log-likelihood” objective function, rather than maximizing the positive version, as the name “maximum likelihood” might suggest.)

The qualifications mentioned above concern, first, the input data sets representing “independent observations,” which is not normally a problem for Cube Analyst users, and second, the input data being described by a probability distribution function, which we now discuss. The derivation of equation (3) for the objective function is outlined in “Mathematical summary” on page 48.

Describing the variation in data

The maximum likelihood method assumes that each item of input data represents an observation from a random distribution of possible values, where the variation of values may be described by a probability distribution function. That is, when the user supplies Cube Analyst with, say, a screenline traffic count value of 1684 vph, this is not considered to be the count for that screenline but, rather, a sample from a distribution. It is common experience that counting the same screenline on another, but equivalent, occasion (for example, at the same time the following week) will provide another count value, say 1739 vph, simply on account of the random variation which is inherent in all traffic (and passenger) data.




The assumption is made, therefore, that all input data for Cube Analyst is subject to variation which may be described by the Poisson probability distribution function (pdf).

Illustration of a Poisson probability distribution function

The Poisson is a well-known pdf, often associated with data which can involve many “events” (for example, 1684 vehicles passing an observer in an hour). It has the statistical property that its mean equals its variance, so its standard deviation grows only as the square root of the mean. This is valuable for data such as count information, where a variation of 100 vph is significant when the mean figure is 200 vph, but much less so when it is 1000 vph; alternatively, a 10% variation implies many vehicles on a mean of 5000 vph, but not on 50 vph. The Poisson distribution reflects these changes in significance in an appropriate way.
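Because a Poisson distribution’s variance equals its mean, its standard deviation is the square root of the mean, so the relative spread shrinks as flows grow. The short Python sketch below (standard library only; `poisson_spread` is a hypothetical helper, and the flows are the ones quoted in the paragraph above) prints the standard deviation, the coefficient of variation, and the size of a 10% change measured in standard deviations.

```python
import math

def poisson_spread(mean: float) -> tuple:
    """Return (standard deviation, coefficient of variation) for a Poisson mean."""
    sd = math.sqrt(mean)  # variance equals the mean, so sd = sqrt(mean)
    return sd, sd / mean

for mean in (50, 200, 1000, 5000):  # mean flows in vph
    sd, cv = poisson_spread(mean)
    z_ten_percent = 0.1 * mean / sd  # a 10% change, measured in standard deviations
    print(f"mean {mean:>5} vph  sd {sd:6.1f}  cv {cv:6.1%}  10% change = {z_ten_percent:.2f} sd")
```

A 10% change is about 0.71 standard deviations at 50 vph but about 7.07 at 5000 vph, which is the contrast the text draws.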

During the original development of Cube Analyst, alternative assumptions about the pdf used to describe data variation were reviewed (the Log-Normal distribution, for example), but these were considered only to add complexity, rather than accuracy. It is usually the case that the Poisson is a good way of describing traffic and passenger data. The Poisson distribution also has the considerable merit that it leads to some mathematical relationships where the role of confidence levels is clearly apparent. In particular, “Mathematical summary” on page 48 shows an element of the calculation concerned with calculating the optimum value of the objective function which has the following general form (see equation (18) later for details):

The $\lambda_1$, $H_1$ and $h_1$ represent, respectively, the confidence level (λ), observed value (H) and estimated value (h) for the first data item, and similarly for the second, third, etc., data items. The form of this equation is directly attributable to the use of the Poisson pdf; another pdf, the Normal pdf for example, would give a different and more complex form.

The significance of equation (18) is two-fold. First, each and every data item is represented in this equation; that is, each cell of the prior matrix, each trip end, each screenline count, and so on. Thus, all items of data are considered together, not in separate categories. (Equation (18) is not the only one to show this; most significantly, so does equation (3), the objective function, amongst others.) The second point is that the data contributes as:

1. A ratio of observed to estimated values

2. A linear combination (that is, simple addition (+)) of data items, each multiplied (weighted) by its own confidence level

This enables the Cube Analyst user to view confidence levels as simple weighting factors, even though the derivation of λ is originally from considerations of data sampling, as discussed in the following section. This would not be the case if a non-Poisson pdf had been used.
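To see why a Poisson assumption produces ratios of observed to estimated values, weighted linearly by confidence levels, suppose (as an assumption for this sketch, not necessarily the exact form of equation (3)) that each data item contributes λ(H ln h − h) to a confidence-weighted Poisson log-likelihood. Its derivative with respect to h is λ(H/h − 1): a ratio of observed to estimated values, scaled by the confidence level. The Python below checks this formula against a numerical derivative; the function names and data items are hypothetical.

```python
import math

def weighted_term(h: float, observed: float, confidence: float) -> float:
    """Assumed contribution of one data item: lambda * (H ln h - h)."""
    return confidence * (observed * math.log(h) - h)

def analytic_derivative(h: float, observed: float, confidence: float) -> float:
    """d/dh of the term above: lambda * (H / h - 1)."""
    return confidence * (observed / h - 1.0)

def numeric_derivative(h: float, observed: float, confidence: float, eps: float = 1e-3) -> float:
    """Central finite difference, used only to check the formula."""
    up = weighted_term(h + eps, observed, confidence)
    down = weighted_term(h - eps, observed, confidence)
    return (up - down) / (2 * eps)

# Hypothetical data items: (observed H, current estimate h, confidence level lambda).
items = [(1684.0, 1600.0, 5.0), (120.0, 150.0, 1.0)]
for observed, estimate, confidence in items:
    print(analytic_derivative(estimate, observed, confidence),
          numeric_derivative(estimate, observed, confidence))
```

A positive value signals that the estimate is below the observation and should rise; a negative value, that it should fall. A larger λ simply scales the signal, which is why confidence levels behave like weights.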







Optimizer: Finding the minimum value

We have already discussed how Cube Analyst is designed to adjust the model parameters, from their initial value of 1.0, so that equation (1) leads to a new value of $T_{ij}$, which provides a new set of estimated data values, h.

Equation (3) can then be used to determine whether the new estimates are more likely given (“more consistent with”) the input data, H. Cube Analyst therefore incorporates a powerful optimizer to amend the model parameters so that the value of M is minimized as far as possible. This minimum is defined mathematically by locating the point at which the gradient of the objective function with respect to the set of model parameters is zero, that is, $\nabla M = 0$. This well-known approach to determining minimum or maximum points is shown in the following figure, which shows in a schematic fashion how the value of the objective function, M, varies according to the value of a parameter.

It is at this stage, in particular, that Cube Analyst is operating in “parameter space.” The principle is, simply, to adjust each parameter by an amount (the “step length”) in a search direction (up or down). The optimizer ensures that Cube Analyst only makes adjustments which improve the situation (that is, which further minimize the objective function, M). Once a set of (improving) adjustments has been made, the Cube Analyst optimizer performs another iteration of adjustments to determine whether more improvements are possible, and so on, until no further decrease in the (negative) value of the objective function is possible.

Two-dimensional schematic view of variations in the objective function according to model parameter values
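To make the step-length-and-direction idea concrete, the Python sketch below implements the simplest end of the spectrum of methods discussed next: a coordinate search that tries moving each parameter up or down by a step length, keeps only moves that reduce the objective, and halves the step when no move helps. It is an illustrative assumption, not Cube Analyst's optimizer; the function name, its arguments and the test objective are hypothetical.

```python
def coordinate_search(f, params, step=0.5, shrink=0.5, tol=1e-8, max_iter=1000):
    """Minimise f by moving one parameter at a time, up or down.

    Only moves that reduce f are accepted. When a full pass over the
    parameters finds no improving move, the step length is shrunk; the
    search stops once the step length falls below tol.
    """
    params = list(params)
    best = f(params)
    for _ in range(max_iter):
        improved = False
        for k in range(len(params)):
            for direction in (+1.0, -1.0):          # search direction: up or down
                trial = params.copy()
                trial[k] += direction * step        # step length times direction
                value = f(trial)
                if value < best:                    # keep improving moves only
                    params, best, improved = trial, value, True
                    break
        if not improved:
            step *= shrink
            if step < tol:
                break
    return params, best


# Usage: a bowl-shaped objective with its minimum at (2, -1),
# starting from parameter values of 1.0.
params, value = coordinate_search(lambda p: (p[0] - 2) ** 2 + (p[1] + 1) ** 2, [1.0, 1.0])
```

Such a simple strategy is cheap per iteration but may need many iterations; it also offers no protection against local minima on a less well-behaved objective.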




This approach places several requirements on the optimizer:

• Efficiency in determining optimum step lengths and directions

• Avoidance of “local minima” and location of the “global minimum” (this means being sure that no values of step length and direction could lead to a better result)

• Identification of the minimum point when in the neighborhood of one (this means achieving a stable convergence point)

There are several possible approaches to calculating optimum step lengths and directions. These may be viewed as a spectrum. At one end are methods that use a simple strategy to define the step length and direction, but spend more time adjusting these elements over more iterations; at the other end are methods that spend more effort calculating the optimum step length and direction, but require fewer iterations.

The direction information is held by Cube Analyst in the gradient search matrix file. This matrix is also known as the Hessian matrix, because the gradient search matrix is an approximation of the Hessian. The degree of approximation depends on the method and on certain aspects of the calculation, notably the proximity to convergence and the number of iterations since the gradient search was last recomputed (controlled in part by the Cube Analyst control parameter ITERH).

The significance of the Hessian matrix for Cube Analyst is that it provides a mathematical description of the relationships between model parameters; indeed, the inverse of the Hessian approximates the variance-covariance matrix of the parameter estimates. The optimizer can exploit this to update the direction information in an optimum manner.

Through the Cube Analyst control parameter IHTYPE, the user can select alternative methods. These are listed below in order of increasing calculation effort given to the step length and direction:

1. Method of steepest descent

2. Newton’s method
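The difference between the two methods can be seen on a small quadratic objective, f(x) = ½xᵀAx − bᵀx, whose Hessian is the constant matrix A. The Python sketch below is illustrative only: the function names, the 2 × 2 matrix A and the vector b are hypothetical, and the steepest-descent version uses an exact line search that is available only because the objective is quadratic. Steepest descent moves along the negative gradient; Newton's method uses the Hessian to set both step length and direction.

```python
def grad(x, A, b):
    """Gradient A x - b of the quadratic 0.5 x'Ax - b'x (2 x 2 case)."""
    return [A[0][0] * x[0] + A[0][1] * x[1] - b[0],
            A[1][0] * x[0] + A[1][1] * x[1] - b[1]]


def steepest_descent(x, A, b, tol=1e-10, max_iter=10_000):
    """Step along -gradient, with the exact line search for a quadratic."""
    for it in range(max_iter):
        g = grad(x, A, b)
        gg = g[0] ** 2 + g[1] ** 2
        if gg ** 0.5 < tol:
            return x, it
        Ag = [A[0][0] * g[0] + A[0][1] * g[1], A[1][0] * g[0] + A[1][1] * g[1]]
        alpha = gg / (g[0] * Ag[0] + g[1] * Ag[1])   # optimal step length
        x = [x[0] - alpha * g[0], x[1] - alpha * g[1]]
    return x, max_iter


def newton(x, A, b, tol=1e-10, max_iter=100):
    """Newton step: solve (Hessian) d = gradient, then move x -> x - d."""
    for it in range(max_iter):
        g = grad(x, A, b)
        if (g[0] ** 2 + g[1] ** 2) ** 0.5 < tol:
            return x, it
        det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
        # Cramer's rule for the 2 x 2 system A d = g (A is the Hessian here)
        d = [(A[1][1] * g[0] - A[0][1] * g[1]) / det,
             (A[0][0] * g[1] - A[1][0] * g[0]) / det]
        x = [x[0] - d[0], x[1] - d[1]]
    return x, max_iter


A = [[3.0, 1.0], [1.0, 2.0]]   # Hessian of the quadratic (symmetric, positive definite)
b = [1.0, 1.0]                 # the minimum is at A^-1 b = (0.2, 0.4)
x_sd, n_sd = steepest_descent([0.0, 0.0], A, b)
x_nt, n_nt = newton([0.0, 0.0], A, b)
```

On this quadratic, Newton's method reaches (0.2, 0.4) in a single iteration while steepest descent needs many more. On a real, non-quadratic objective, Newton-type methods usually need several iterations, and each one costs more to compute.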






Chapter 4 Mathematical summary

This section expands on the explanation of Cube Analyst’s calculations given in “Introduction to the mathematics in Cube Analyst” on page 35.

Topics include:

• Maximum likelihood method: Background theory

• Application of maximum likelihood to Cube Analyst

• Cube Analyst objective function

• Cube Analyst trip estimation model

• Estimating model parameters

• Optimization procedure

• Parameter errors

• Cell reliability

Maximum likelihood method: Background theory

Maximum likelihood is a standard method of estimating the parameters of mathematical modeling equations from sets of relevant data observations. Given values of the model parameters, the pdf defines the probability associated with the observed data. When viewed as a function of the model parameters, the pdf is called a likelihood function. The values of the parameters that maximize this function are called maximum likelihood estimates. They correspond to a model in which the probability of the observed data is maximized. The estimation process has two elements: establishing the likelihood function, and determining the optimum parameter values that maximize it. Mathematically, the theory may be expressed as:

$$P(X = x) = f(x;\,\theta)$$ .....(4)

where:

X = random variable

x = observation

θ = parameter (or function of a parameter)

The likelihood function is then defined to be:

$$L(\theta) = \prod_{i=1}^{n} f(x_i;\,\theta)$$ .....(5)

where:

$$\mathbf{x} = (x_1, x_2, \ldots, x_n)$$

that is, x is a set of observations.

The optimization process is to find the value of θ that maximizes L(θ).
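As a small illustration of equations (4) and (5), the sketch below assumes, hypothetically, that five observations come from a Poisson pdf with unknown mean θ. It evaluates ln L(θ) on a grid and keeps the maximizing value. For the Poisson pdf the maximum likelihood estimate is known to be the sample mean, which the grid search reproduces. The data and function name are illustrative only.

```python
import math


def poisson_log_likelihood(theta, observations):
    """ln L(theta) = sum_i ln f(x_i; theta), with f the Poisson pdf of mean theta."""
    return sum(x * math.log(theta) - theta - math.lgamma(x + 1) for x in observations)


observations = [3, 5, 4, 6, 2]                      # hypothetical data
grid = [k / 100 for k in range(1, 1001)]            # theta = 0.01, 0.02, ..., 10.00
theta_hat = max(grid, key=lambda t: poisson_log_likelihood(t, observations))
sample_mean = sum(observations) / len(observations)
```

Working with ln L rather than L turns the product in equation (5) into a sum, which is numerically safer; the maximizing θ is the same because the logarithm is increasing.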

Application of maximum likelihood to Cube Analyst

In accordance with the above theory, but with slightly altered notation, the following are defined:

H = a data item (the observation above)

h = an estimated item (the parameter above)

It is assumed that the appropriate pdf is

$$P(\lambda H \mid h) = \frac{e^{-\lambda h}\,(\lambda h)^{\lambda H}}{(\lambda H)!}$$ .....(6)

where λ is called the “weighting factor.” It can be seen that λH is a Poisson random variable with mean λh. Thus λ can be considered a scaling parameter that defines the time units in the underlying Poisson process.
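The pdf above can be evaluated directly. The sketch below is a minimal version, assuming h > 0. It computes (λH)! as Γ(λH + 1) through `math.lgamma`, which also accepts non-integer λH; that extension is an assumption of this sketch, not something the guide states. The function name is hypothetical.

```python
import math


def weighted_poisson_pdf(H, h, lam):
    """Probability of observing lam*H when lam*H is Poisson with mean lam*h.

    (lam*H)! is evaluated as Gamma(lam*H + 1) via math.lgamma, computed in
    log space to avoid overflow. Requires h > 0.
    """
    mean = lam * h
    k = lam * H
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


# With lam = 1 this is the ordinary Poisson pdf, so the probabilities sum to 1.
total = sum(weighted_poisson_pdf(H, 4.0, 1.0) for H in range(60))
```

Doubling λ rescales the time units: `weighted_poisson_pdf(1.5, 2.0, 2.0)` is the probability of 3 events when the mean is 4.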

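A likelihood function may thus be defined as:

$$L = \frac{e^{-\lambda h}\,(\lambda h)^{\lambda H}}{(\lambda H)!}$$ .....(7)

Taking logarithms of equation (7) leads to:

$$\ln L = \lambda H \ln(\lambda h) - \lambda h - \ln\big((\lambda H)!\big)$$ .....(8)

It may be noted that ln((λH)!) = constant, since it does not depend on the estimated item, h.

Referring to equation (5), and considering all data items, H, each with its own weighting factor, λ_H, a likelihood function may be defined as:

$$L = \prod_{H} \frac{e^{-\lambda_H h}\,(\lambda_H h)^{\lambda_H H}}{(\lambda_H H)!}$$ .....(9)

For computational ease, the task of maximizing L may be converted to the minimization of −ln L. Dropping the terms that do not depend on h (that is, ln((λ_H H)!) and λ_H H ln λ_H) gives:

$$M = \sum_{H} F(H, h)$$ .....(10)

where

$$F(H, h) = \lambda_H\,(h - H \ln h)$$ .....(11)

Equation (10) therefore represents the general form of the objective function that is minimized by Cube Analyst.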
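To check numerically that minimizing M is equivalent to maximizing L, the sketch below compares M = Σ λ_H (h − H ln h) with −ln L computed directly from the Poisson pdf, for two hypothetical sets of estimates. If the two differ only by a constant, the gap between them is the same for both sets. The per-item form λ_H (h − H ln h) is the one assumed for equations (10) and (11); the data, weights and function names are hypothetical.

```python
import math


def objective_M(observed, estimated, weights):
    """M = sum over data items of lam_H * (h - H * ln h)."""
    return sum(lam * (h - H * math.log(h)) for H, h, lam in zip(observed, estimated, weights))


def neg_log_likelihood(observed, estimated, weights):
    """-ln L built item by item from the Poisson pdf, using lgamma for (lam*H)!."""
    total = 0.0
    for H, h, lam in zip(observed, estimated, weights):
        k, mean = lam * H, lam * h
        total -= k * math.log(mean) - mean - math.lgamma(k + 1)
    return total


observed = [10.0, 20.0, 5.0]   # hypothetical data items H
weights = [1.0, 0.5, 2.0]      # hypothetical weighting factors lam_H
trial_a = [9.0, 22.0, 6.0]     # two hypothetical sets of estimates h
trial_b = [11.0, 18.0, 4.0]

gap_a = neg_log_likelihood(observed, trial_a, weights) - objective_M(observed, trial_a, weights)
gap_b = neg_log_likelihood(observed, trial_b, weights) - objective_M(observed, trial_b, weights)
```

Each term λ_H (h − H ln h) is smallest when h = H, so M is smallest when every estimate reproduces its data item.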







Cube Analyst objective function

Cube Analyst allows varied data items to be used in the estimation; that is, H and h may represent different data items, as shown in the table “Observed data, H, and estimated equivalents, h” below.

Substituting these observed and estimated data items into Equation (10) gives the objective function shown in the table “Objective function, M” below, with the source of the data indicated.

For reasons to do with function evaluation, the estimated tij is treated as a least squares minimization in the objective function.

Observed data, H, and estimated equivalents, h

Observed data value, H | Estimated data value, h | Description
Nij | | Number of trips with origin at zone i and destination at zone j
Oi | | Number of trips with origin at zone i
Dj | | Number of trips with destination at zone j
QK | | Number of trips through screenline K

where: RijK is the proportion of trips in matrix cell (i, j) using screenline K
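The estimated values in this table are where the trip matrix enters the objective function. The sketch below assumes, consistent with the descriptions above, that the estimates of Oi, Dj and QK are the row totals, column totals and RijK-weighted totals of an estimated trip matrix. The function name, the nested-list layout and the sample values are hypothetical.

```python
def estimated_equivalents(T, R):
    """Return estimated equivalents of Oi, Dj and QK for a trip matrix T.

    T[i][j]    : estimated trips from zone i to zone j (hypothetical input)
    R[k][i][j] : proportion of trips in cell (i, j) using screenline k
    """
    n_orig, n_dest = len(T), len(T[0])
    origins = [sum(T[i]) for i in range(n_orig)]                       # row totals
    destinations = [sum(T[i][j] for i in range(n_orig)) for j in range(n_dest)]
    screenlines = [
        sum(Rk[i][j] * T[i][j] for i in range(n_orig) for j in range(n_dest))
        for Rk in R
    ]
    return origins, destinations, screenlines


T = [[0, 10, 5], [8, 0, 7], [3, 4, 0]]                    # hypothetical trip matrix
R = [
    [[0, 1, 1], [0, 0, 0], [0, 0, 0]],                    # screenline 1: all trips from zone 1
    [[0, 0.5, 0], [0.5, 0, 0], [0, 0, 0]],                # screenline 2: half of cells (1,2), (2,1)
]
origins, destinations, screenlines = estimated_equivalents(T, R)
```

Because each estimate is a linear function of the cells of T, its derivative with respect to any cell is simply the corresponding coefficient (1 or RijK).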

Objective function, M = | Comment
 | Screenline counts
 | Trip origins
 | Trip destinations
 | Prior matrix
 | Cost matrix derived

.....(12)

where the marked summation indicates summation over cells which are zero in the prior matrix, but not in the cost matrix.

Cube Analyst trip estimation model

The objective function, equation (12) above, is used to calibrate the trip estimation model of the form:

.....(13)

where tij = Nij

or

Estimating model parameters

It follows, by differentiation of equation (11), that:

$$\frac{\partial}{\partial h}\Big[\lambda_H\,(h - H \ln h)\Big] = \lambda_H\left(1 - \frac{H}{h}\right)$$ .....(14)

and hence, for a model parameter X:

$$\frac{\partial M}{\partial X} = \sum_{H} \lambda_H\left(1 - \frac{H}{h}\right)\frac{\partial h}{\partial X}$$ .....(15)

(Note: undefined for h = 0)
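The derivative of the per-item term λ_H (h − H ln h) with respect to h is λ_H (1 − H/h): it is zero when h = H and undefined at h = 0. The sketch below checks this expression, and the chain rule ∂F/∂X = (∂F/∂h)(∂h/∂X), against a central finite difference for a hypothetical estimate h = X × base. The names and numbers are illustrative assumptions, not Cube Analyst internals.

```python
import math


def F(H, h, lam):
    """Per-item objective term lam * (h - H * ln h)."""
    return lam * (h - H * math.log(h))


def dF_dh(H, h, lam):
    """Analytic derivative lam * (1 - H / h); undefined at h = 0."""
    return lam * (1.0 - H / h)


def dF_dX(X, H, lam, base):
    """Chain rule for a hypothetical estimate h = X * base: dF/dh * dh/dX."""
    return dF_dh(H, X * base, lam) * base


def central_difference(func, x, eps=1e-6):
    """Numerical derivative, used here only to check the analytic forms."""
    return (func(x + eps) - func(x - eps)) / (2.0 * eps)


H, lam, base = 12.0, 0.8, 5.0   # hypothetical data item, weight and base value
numeric_low = central_difference(lambda x: F(H, x * base, lam), 0.5)
numeric_high = central_difference(lambda x: F(H, x * base, lam), 6.0)
```

At X = 2.4 the estimate equals the data item (h = 12), so the derivative vanishes, which is the stationarity condition the optimizer looks for.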







The minimum value of the objective function, M, with respect to a parameter X is found when ∂M/∂X = 0.

The remaining steps are to:

1. Calculate the estimated trips using equation (13) and the current values of the model parameters.

2. Use the table “Observed data, H, and estimated equivalents, h” on page 51 to calculate h for each set of input and estimated data.

3. Calculate ∂h/∂X, as shown below, for each set of estimated data.

leads to

.....(16)

where

.....(17)

and

.....(18)

Note: are constants

Substitutions for equation (15)







is undefined if or .

In equation (16), we need to substitute each set of model parameters in place of the generic parameter. We start by determining the derivative of the model estimate with respect to each parameter, reproducing Model Equation (13):

.....(13)

where = constant

or

let

Then differentiating (13) gives:

(for each ) .....(19)

(for each ) .....(20)

(for ) .....(21)

( ) .....(22)

.....(23)

Finally, we substitute (19) to (23) into (16) for each parameter, and use an optimization procedure to choose parameter values that give estimates that minimize the objective function (12).
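The whole procedure (estimate, compare with data, differentiate, adjust) can be seen in miniature below. This is a deliberately simplified, hypothetical stand-in for equation (13): one factor X_i per origin zone in T_ij = t_ij × X_i, with only origin counts as data. It uses plain gradient descent with backtracking rather than any of the methods selectable through IHTYPE. The names, model form and data are assumptions for illustration.

```python
import math


def calibrate_origin_factors(t, O, lam, iters=2000, tol=1e-10):
    """Fit hypothetical factors X_i in the model T_ij = t_ij * X_i.

    Each estimated item is an origin total h_i = X_i * s_i, where
    s_i = sum_j t_ij (assumed > 0), matched to an observed count O_i.
    Minimises M = sum_i lam_i * (h_i - O_i * ln h_i) by gradient descent,
    halving the step until it gives a sufficient decrease in M.
    """
    s = [sum(row) for row in t]
    X = [1.0] * len(t)                      # parameters start at 1.0

    def M(params):
        return sum(w * (x * si - o * math.log(x * si))
                   for w, x, si, o in zip(lam, params, s, O))

    for _ in range(iters):
        # dM/dX_i = lam_i * (1 - O_i / h_i) * dh_i/dX_i, with dh_i/dX_i = s_i
        g = [w * (1.0 - o / (x * si)) * si for w, x, si, o in zip(lam, X, s, O)]
        gg = sum(v * v for v in g)
        if math.sqrt(gg) < tol:
            break
        current, step = M(X), 1.0
        while step > 1e-12:
            trial = [x - step * v for x, v in zip(X, g)]
            # keep the step only if it is feasible and decreases M enough (Armijo)
            if min(trial) > 0 and M(trial) <= current - 1e-4 * step * gg:
                X = trial
                break
            step *= 0.5
    return X


t = [[0.0, 10.0, 5.0], [8.0, 0.0, 7.0], [3.0, 4.0, 0.0]]   # hypothetical base matrix
O = [30.0, 12.0, 14.0]                                      # hypothetical origin counts
X = calibrate_origin_factors(t, O, lam=[1.0, 1.0, 1.0])
```

Because each origin total can be matched exactly in this toy case, the fitted factors approach O_i / Σ_j t_ij, that is, 2, 0.8 and 2.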







Optimization procedure

Given an ini
