Ildar Batyrshin, Janusz Kacprzyk, Leonid Sheremetov, Lotfi A. Zadeh (Eds.)

Perception-based Data Mining and Decision Making in Economics and Finance


Studies in Computational Intelligence, Volume 36

Editor-in-chief
Prof. Janusz Kacprzyk
Systems Research Institute
Polish Academy of Sciences
ul. Newelska 6
01-447 Warsaw
Poland
E-mail: [email protected]

Further volumes of this series can be found on our homepage: springer.com

Vol. 19. Ajita Ichalkaranje, Nikhil Ichalkaranje, Lakhmi C. Jain (Eds.)
Intelligent Paradigms for Assistive and Preventive Healthcare, 2006
ISBN 978-3-540-31762-3

Vol. 20. Wojciech Penczek, Agata Półrola
Advances in Verification of Time Petri Nets and Timed Automata, 2006
ISBN 978-3-540-32869-8

Vol. 21. Candida Ferreira
Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence, 2006
ISBN 978-3-540-32796-7

Vol. 22. N. Nedjah, E. Alba, L. de Macedo Mourelle (Eds.)
Parallel Evolutionary Computations, 2006
ISBN 978-3-540-32837-7

Vol. 23. M. Last, Z. Volkovich, A. Kandel (Eds.)
Algorithmic Techniques for Data Mining, 2006
ISBN 978-3-540-33879-6

Vol. 24. Alakananda Bhattacharya, Amit Konar, Ajit K. Mandal
Parallel and Distributed Logic Programming, 2006
ISBN 978-3-540-33458-3

Vol. 25. Zoltán Ésik, Carlos Martín-Vide, Victor Mitrana (Eds.)
Recent Advances in Formal Languages and Applications, 2006
ISBN 978-3-540-33460-6

Vol. 26. Nadia Nedjah, Luiza de Macedo Mourelle (Eds.)
Swarm Intelligent Systems, 2006
ISBN 978-3-540-33868-0

Vol. 27. Vassilis G. Kaburlasos
Towards a Unified Modeling and Knowledge-Representation based on Lattice Theory, 2006
ISBN 978-3-540-34169-7

Vol. 28. Brahim Chaib-draa, Jörg P. Müller (Eds.)
Multiagent based Supply Chain Management, 2006
ISBN 978-3-540-33875-8

Vol. 29. Sai Sumathi, S.N. Sivanandam
Introduction to Data Mining and its Applications, 2006
ISBN 978-3-540-34350-9

Vol. 30. Yukio Ohsawa, Shusaku Tsumoto (Eds.)
Chance Discoveries in Real World Decision Making, 2006
ISBN 978-3-540-34352-3

Vol. 31. Ajith Abraham, Crina Grosan, Vitorino Ramos (Eds.)
Stigmergic Optimization, 2006
ISBN 978-3-540-34689-0

Vol. 32. Akira Hirose
Complex-Valued Neural Networks, 2006
ISBN 978-3-540-33456-9

Vol. 33. Martin Pelikan, Kumara Sastry, Erick Cantú-Paz (Eds.)
Scalable Optimization via Probabilistic Modeling, 2006
ISBN 978-3-540-34953-2

Vol. 34. Ajith Abraham, Crina Grosan, Vitorino Ramos (Eds.)
Swarm Intelligence in Data Mining, 2006
ISBN 978-3-540-34955-6

Vol. 35. Ke Chen, Lipo Wang (Eds.)
Trends in Neural Computation, 2007
ISBN 978-3-540-36121-3

Vol. 36. Ildar Batyrshin, Janusz Kacprzyk, Leonid Sheremetov, Lotfi A. Zadeh (Eds.)
Perception-based Data Mining and Decision Making in Economics and Finance, 2007
ISBN 978-3-540-36244-9


Ildar Batyrshin
Janusz Kacprzyk
Leonid Sheremetov
Lotfi A. Zadeh (Eds.)

Perception-based Data Mining and Decision Making in Economics and Finance

With 95 Figures and 37 Tables


Ildar Batyrshin
Mexican Petroleum Institute
Eje Central Lázaro Cárdenas 152
Col. San Bartolo Atepehuacan
07730 Mexico
Mexico

Institute of Problems of Informatics
Academy of Sciences of Tatarstan
Mushtari st., 20, Kazan, 420012
Russia

E-mail: [email protected]

Janusz Kacprzyk
Systems Research Institute
Polish Academy of Sciences
Newelska 6
01-447 Warszawa
Poland

E-mail: [email protected]

Leonid Sheremetov
Mexican Petroleum Institute
Eje Central Lázaro Cárdenas 152
Col. San Bartolo Atepehuacan
07730 Mexico
Mexico

St. Petersburg Institute for
Informatics and Automation
Russian Academy of Sciences
39, 14th Line, St. Petersburg, 199178
Russia

E-mail: [email protected]

Lotfi A. Zadeh
University of California
Computer Science Division
387 Soda Hall
Berkeley, CA 94720-1776
USA

E-mail: [email protected]

Library of Congress Control Number: 2006939141

ISSN print edition: 1860-949X
ISSN electronic edition: 1860-9503
ISBN-10 3-540-36244-4 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-36244-9 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable to prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2007

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Cover design: deblik, Berlin
Typesetting by the editors and SPi using a Springer LaTeX macro package
Printed on acid-free paper SPIN: 11793922 89/SPi 5 4 3 2 1 0


Preface

The primary goal of this book is to present to the scientific and management communities a selection of applications of recent Soft Computing (SC) and Computing with Words and Perceptions (CWP) models and techniques to economic and financial problems. The selected examples can also serve as a starting point for applying SC and CWP techniques to a wider range of problems in economics and finance.

Decision making in the present world is becoming ever more sophisticated, time consuming and difficult for human beings, who require more and more computational support. This book addresses the significant increase in recent years in research on, and applications of, Soft Computing and Computing with Words and Perceptions for decision making in economics and finance. Decision making is heavily based on information and knowledge, usually extracted from the analysis of large amounts of data. Data mining techniques able to integrate human experience can provide more realistic business decision support. Computing with Words and Perceptions, introduced by Lotfi Zadeh, can serve as a basis for such an extension of traditional data mining and decision making systems. Fuzzy logic, as a main constituent of CWP, provides powerful tools for modeling and processing linguistic information defined over numerical domains.

Decision making techniques based on fuzzy logic have in many cases demonstrated better performance than competing approaches. The reason is that traditional, bivalent-logic-based approaches are not a good fit to reality: the reality of pervasive imprecision, uncertainty and partiality of truth. On the other hand, the traditional probabilistic interpretation of uncertainty in practice does not always correspond to the nature of uncertainties that often appear as the effects of subjective estimation. The list of practical situations in which it seems better to avoid the traditional probabilistic interpretation of uncertainty is very long. The centerpiece of fuzzy logic, that everything is, or is allowed to be, a matter of degree, makes it possible to deal better with perception-based information. Such information plays an essential role in economics, finance and, more generally, in all domains in which human perceptions and emotions are in evidence. For instance, this is the case in studies of capital markets and financial engineering, including financial time series modeling; price projections for stocks; volatility analysis and the pricing of options and derivatives; and risk management, to mention just a few.

The book consists of two parts: Data Mining and Decision Making. An introductory chapter by Lotfi A. Zadeh called “Precisiated Natural Language” describes the conceptual structure of Precisiated Natural Language (PNL), which can be employed as a basis for computation with perceptions. PNL makes it possible to treat propositions drawn from a natural language as objects of computation, capturing two fundamental facets of human cognition: (a) partiality (of understanding, truth, possibility, etc.) and (b) granularity (clumping of values of attributes and forming granules with words as labels). The chapter shows that PNL has much higher expressive power than existing approaches to natural language processing based on bivalent logic. Its high expressiveness rests on the concept of a generalized constraint, which represents the meaning of propositions drawn from a natural language while capturing their partiality and granularity. This chapter lays the conceptual basis for the rest of the book.

The first part of the book presents novel techniques of Data Mining. Researchers in the data mining field have traditionally focused their efforts on algorithms for dealing with huge amounts of data. It is nevertheless true that the results obtained with these algorithms are of limited use in practice, which limits the spread and acceptance of data mining in many real-world situations. One of the reasons is that purely statistical approaches, which do not take the experience of experts into account, often miss the actual problem. Perception-based data mining should be able to manipulate linguistic information, fuzzy concepts and perception-based patterns of time series. That is why, in addition to classic data mining techniques and classification algorithms such as decision trees or Bayesian classifiers, the subsequent chapters study other data mining operations, such as clustering, moving approximations and fuzzy association rule generation, that are more suitable for working with perceptual patterns, and describe their applications to typical problems in economics and finance.

The chapter titled “Towards Human-Consistent Data-Driven Decision Support Systems via Fuzzy Linguistic Data Summaries” by Janusz Kacprzyk and Sławomir Zadrożny focuses on the construction of linguistic summaries of data. Summarization, one of the typical tasks of data mining, provides efficient and human-consistent means for analyzing the large amounts of data used in realistic business decision support. The chapter shows how to embed data summarization within a fuzzy querying environment for an effective and efficient computer implementation. The realization of Zadeh's computing with words and perceptions paradigm through fuzzy linguistic database summaries, and indirectly fuzzy querying, can open new vistas in data-driven and also, to some extent, knowledge-driven and Web-based Decision Support Systems.
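To illustrate the underlying mechanism, the truth degree of a linguistic summary such as “most sales are high” can be computed with Zadeh's calculus of linguistically quantified propositions, on which this line of work builds. The membership functions below (`mu_most`, `mu_high`) and the sample data are illustrative assumptions, not taken from the chapter:

```python
def mu_most(p):
    """Piecewise-linear membership of the fuzzy quantifier 'most'
    (a common textbook choice, not the chapter's exact definition)."""
    if p <= 0.3:
        return 0.0
    if p >= 0.8:
        return 1.0
    return (p - 0.3) / 0.5

def summary_truth(data, mu_summarizer, mu_quantifier=mu_most):
    """Truth of the summary 'Q objects are S' by Zadeh's calculus:
    T = mu_Q((1/n) * sum_i mu_S(x_i))."""
    proportion = sum(mu_summarizer(x) for x in data) / len(data)
    return mu_quantifier(proportion)

# 'Most daily sales are high', with 'high' a ramp from 50 to 100 units
mu_high = lambda x: max(0.0, min(1.0, (x - 50) / 50))
sales = [95, 88, 72, 60, 110, 40, 90]
print(round(summary_truth(sales, mu_high), 3))   # → 0.571
```

The same machinery extends to qualified summaries (“most large orders are profitable”) by restricting the proportion to the qualifying records.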

The next chapter, titled “Moving Approximation Transform and Local Trend Associations in Time Series Data Bases” by Batyrshin I., Herrera-Avelar R., Sheremetov L., and Panova A., describes a new technique of time series analysis based on replacing a time series by the sequence of slopes of the linear functions approximating it in sliding windows. Based on the Moving Approximation (MAP) Transform, several measures of local trend association can be introduced that are invariant under linear transformations of the time series. Due to this very important property, the local trend association measures can serve as basic measures of similarity between time series and time series patterns in most problems of time series data mining. The chapter considers several examples of the application of local trend association measures to the construction of association networks for systems described by time series databases. MAP can also serve as a basis for defining perception-based trend patterns like “quickly increasing” or “very slowly decreasing” in intelligent decision making systems that include expert knowledge together with time series databases. Nowadays the Discrete Fourier Transform is the main technique for analyzing time series that describe signal propagation and oscillating processes, where the concept of frequency plays a key role. The MAP transform can serve as a main instrument for analyzing local trends and tendencies of non-oscillating processes, which is important for economic and financial applications.
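The window-slope idea can be sketched as follows. The cosine similarity of the slope sequences used here as an association measure is an illustrative stand-in for the chapter's own measures; like them, it is invariant under positive linear transformations of a series, since slopes simply scale:

```python
def map_transform(series, window):
    """Replace a time series by the sequence of least-squares slopes of
    linear fits over sliding windows of the given width (the MAP idea)."""
    slopes = []
    for start in range(len(series) - window + 1):
        ys = series[start:start + window]
        xs = range(window)
        mx = (window - 1) / 2.0
        my = sum(ys) / window
        num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den = sum((x - mx) ** 2 for x in xs)
        slopes.append(num / den)
    return slopes

def trend_association(a, b, window):
    """Cosine similarity of two slope sequences: an illustrative local
    trend association, invariant under y -> c*y + d with c > 0."""
    sa, sb = map_transform(a, window), map_transform(b, window)
    dot = sum(x * y for x, y in zip(sa, sb))
    na = sum(x * x for x in sa) ** 0.5
    nb = sum(y * y for y in sb) ** 0.5
    return dot / (na * nb)

ts = [1, 2, 4, 3, 5, 7, 6, 8]
scaled = [3 * v + 10 for v in ts]        # positive linear transform
print(trend_association(ts, scaled, 3))  # → 1.0 (up to float rounding)
```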

Most of the information used in economics and finance is stored in time series; this is why the book pays special attention to time series data mining (TSDM). Developing intelligent question answering systems that support decision making procedures over time series databases requires formalizing human perceptions about time, time series values, patterns and shapes, associations between patterns and time series, and so on. The chapter called “Perception Based Patterns in Time Series Data Mining” by Batyrshin I., Sheremetov L., and Herrera-Avelar R. presents an overview of current techniques and analyses them from the point of view of their contribution to perception-based TSDM. The survey considers different approaches to the description of perception-based patterns, which use signs of derivatives, scaling of trends and shapes, linguistic interpretation of patterns obtained as a result of clustering, a grammar for generating complex patterns from shape primitives, and temporal relations between patterns. These descriptions can be extended by fuzzy granulation of time series patterns to make them more adequate to the perceptions used in human reasoning. Several approaches to relating the linguistic descriptions of experts to automatically generated summary texts and linguistic forecasts are considered. Finally, the role of perception-based time series data mining and computing with words and perceptions in the construction of intelligent systems that use expert knowledge and decision making procedures in time series database domains is discussed.

The next chapter, titled “Perception Based Functions in Qualitative Forecasting” by Batyrshin I. and Sheremetov L., discusses the application of fuzzy perception-based functions (PBF) to qualitative forecasting of a new product's life cycle. A PBF is given by a sequence of rules Rk: If T is Tk then S is Sk, where the Tk are perception-based intervals defined on the domain of the independent variable T, and the Sk are perception-based shape patterns of the variable S on the interval Tk. Intervals Tk can be expressed by words like Between N and M, Approximately M, Middle of the Day, or End of the Week; shape patterns Sk can be expressed linguistically, e.g. as Very Large, Increasing, Quickly Decreasing, or Slightly Concave. The authors consider new parametric patterns for modeling convex-concave shapes of PBF and propose a method for reconstructing a PBF from these shape patterns. The patterns can also be used for time series segmentation in perception-based time series data mining.
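Read in a Takagi-Sugeno-like way, such a rule base can be evaluated by weighting each shape pattern with the membership of t in its interval. The triangular intervals, toy shape functions and blending scheme below are illustrative assumptions, not the chapter's reconstruction method:

```python
def tri(a, b, c):
    """Triangular membership function for a perception-based interval,
    e.g. 'approximately b' on the domain of the variable T."""
    def mu(t):
        if t <= a or t >= c:
            return 0.0
        return (t - a) / (b - a) if t <= b else (c - t) / (c - b)
    return mu

def pbf(rules):
    """Perception-based function from rules 'if T is Tk then S is Sk':
    a membership-weighted blend of the rule shape patterns."""
    def f(t):
        pairs = [(mu(t), shape(t)) for mu, shape in rules]
        total = sum(w for w, _ in pairs)
        if total == 0.0:
            return 0.0
        return sum(w * s for w, s in pairs) / total
    return f

# Toy product life cycle: sales increase slowly early on, grow quickly
# mid-cycle, then level off at a large value late in the cycle.
rules = [
    (tri(-1, 0, 4), lambda t: 0.5 * t),        # slowly increasing
    (tri(2, 5, 8), lambda t: 2.0 * t - 5.0),   # quickly increasing
    (tri(6, 10, 13), lambda t: 12.0),          # large, flat
]
life_cycle = pbf(rules)
print(life_cycle(5))   # mid-cycle: the 'quickly increasing' rule dominates
```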

The chapter titled “Towards Automated Share Investment System” by Dymitr Ruta describes a classification model that learns transaction patterns from optimally labelled historical data presented as time series and accordingly gives a profit-driven decision for the current-day transaction. In contrast to traditional regression-based approaches, the proposed model facilitates the job of a busy investor who prefers a simple decision on the current-day transaction (buy, wait, or sell) that would maximise the return on the investment. The model is embedded in an automated client-server platform that handles data collection and maintains client models on the database. The prototype of the system was tested over 20 years of NYSE:CSC share price history, showing a substantial improvement in long-term profit compared to passive long-term investment.

The decision tree (DT) is one of the most popular classification algorithms in current use in data mining. The chapter called “Estimating Classification Uncertainty of Bayesian Decision Tree Technique on Financial Data” by Schetinin et al. studies the interpretability of classification models, which is crucial for experts responsible for making reliable classifications. The decision tree classification model is combined with Bayesian model averaging over all possible DTs, thus achieving the required diversity of the DT ensemble. The authors explore the classification uncertainty of Bayesian Markov Chain Monte Carlo techniques on datasets from the StatLog Repository and on real financial data. The classification uncertainty is assessed within an Uncertainty Envelope technique dealing with the class posterior distribution and a given confidence probability. This technique provides realistic estimates of the classification uncertainty, which can be easily interpreted in statistical terms for the purpose of risk evaluation.

Another important method frequently used in data mining is cluster analysis. In the chapter “Invariant Hierarchical Clustering Schemes” by Batyrshin I. and Rudas T., the properties of a general scheme of parametric invariant clustering procedures, based on transforming a proximity function into a fuzzy equivalence relation, are studied. The scheme makes it possible to build clustering procedures that are invariant to the numbering of objects and to monotone transformations of the proximity values between objects. Several examples illustrate the application of the proposed clustering procedures to the analysis of similarity structures in data.
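One classical instance of such a transformation is the max-min transitive closure of a reflexive, symmetric fuzzy proximity relation; the sketch below assumes such a matrix and is only one member of the general parametric scheme the chapter studies:

```python
def maxmin_compose(r):
    """Max-min composition r o r of a square fuzzy relation."""
    n = len(r)
    return [[max(min(r[i][k], r[k][j]) for k in range(n))
             for j in range(n)] for i in range(n)]

def transitive_closure(r):
    """Max-min transitive closure of a reflexive, symmetric fuzzy
    proximity relation. The result is a fuzzy equivalence relation whose
    alpha-cuts form a hierarchy of crisp partitions, invariant under
    monotone rescaling of the proximity values."""
    while True:
        r2 = maxmin_compose(r)
        if r2 == r:      # fixpoint: the relation is now transitive
            return r
        r = r2           # r2 >= r pointwise when r is reflexive

prox = [[1.0, 0.8, 0.4],
        [0.8, 1.0, 0.5],
        [0.4, 0.5, 1.0]]
eq = transitive_closure(prox)
print(eq[0][2])   # 0.4 is raised to 0.5 via the indirect path through object 1
```

Cutting `eq` at alpha = 0.6 groups objects 0 and 1 while leaving object 2 apart; cutting at 0.5 merges all three, giving the hierarchy.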

The second part of the book focuses on the problems of perceptual Decision Making in economics and finance. As shown in the previous chapters, time series analysis has three goals: forecasting (also called prediction), modeling, and characterization. Almost all managerial decisions are based on forecasting and modeling, and the ability to perform decision modeling and analysis is an essential feature of many real-world applications.

The second part opens with the chapter “Fuzzy Components of Cooperative Markets” by Milan Mareš, which deals with the Walras equilibrium model and its cooperative modification, also analyzing some possibilities for its fuzzification. The main attention is focused on the vagueness of utility functions and prices, which can be considered the most subjective (utilities) and the most unpredictable (prices) components of the model. The elementary properties of the fuzzified model are presented, and the adequacy of the suggested fuzzy set theoretical methods to the specific properties of real market models is briefly discussed.



The chapter titled “Possibilistic-Probabilistic Models and Methods of Portfolio Optimization” by Alexander Yazenin considers a generalization of Markowitz models with fuzzy random variables characterized by corresponding possibility distributions. Such a situation is typical for financial assets, particularly on the Russian market; in this case the profitability of a financial asset can be represented by a fuzzy random variable. Approaches to defining the numerical characteristics of fuzzy random variables are analyzed and proposed, and appropriate methods of calculation, in particular within the framework of the shift-scaled representation, are obtained. Principles of decision making in a fuzzy random environment using these possibilistic-probabilistic models are formulated.

In the chapter “Towards Graded and Nongraded Variants of Stochastic Dominance” by Bernard De Baets and Hans De Meyer, a pairwise comparison method for random variables is established. This comparison results in a probabilistic relation on a given set of random variables. The transitivity of this probabilistic relation is investigated, which allows identifying appropriate strict or weak thresholds, depending upon the copula involved, turning the probabilistic relation into a strict order relation. The proposed method can also be seen as a way of generating graded as well as non-graded variants of the concept of stochastic dominance.
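A minimal numeric sketch of the pairwise comparison, assuming independent discrete random variables (i.e. the product copula; the chapter treats other copulas as well) and the common convention Q(X, Y) = P(X > Y) + 0.5 P(X = Y):

```python
from itertools import product

def prob_relation(x, y):
    """Q(X, Y) = P(X > Y) + 0.5 * P(X = Y) for discrete random variables
    given as {value: probability} dicts, assuming independence."""
    q = 0.0
    for (xv, xp), (yv, yp) in product(x.items(), y.items()):
        if xv > yv:
            q += xp * yp
        elif xv == yv:
            q += 0.5 * xp * yp
    return q

die = {v: 1 / 6 for v in range(1, 7)}   # fair die
coin = {2: 0.5, 5: 0.5}                 # 2 or 5 with equal chances
print(prob_relation(die, coin))         # reciprocity: Q(X,Y) + Q(Y,X) = 1
```

The graded variants of stochastic dominance then arise by thresholding Q, e.g. declaring X better than Y at cutting level alpha whenever Q(X, Y) >= alpha.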

The chapter titled “Option Pricing in the Presence of Uncertainty” by Silvia Muzzioli and Huguette Reynaerts studies the derivation of the European option price in the Cox-Ross-Rubinstein (1979) binomial model in the presence of uncertainty in the volatility of the underlying asset. Two approaches that concentrate on the fuzzification of one or both of the two jump factors are proposed. The first approach assumes that both jump factors are represented by triangular fuzzy numbers; the second assumes that only the up jump factor is uncertain.
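A numeric sketch of the second setting (only the up factor fuzzy), using alpha-cuts propagated by a grid search rather than any closed form from the chapter; all parameter values are made up for illustration:

```python
def call_price_crr(s0, k, u, d, r):
    """One-period Cox-Ross-Rubinstein European call price
    (r is the gross risk-free rate per period)."""
    p = (r - d) / (u - d)              # risk-neutral up probability
    cu = max(s0 * u - k, 0.0)
    cd = max(s0 * d - k, 0.0)
    return (p * cu + (1 - p) * cd) / r

def fuzzy_call_price(s0, k, u_tri, d, r, alpha, grid=200):
    """Alpha-cut of the price when the up factor is a triangular fuzzy
    number (a, b, c). The cut of u is propagated by a grid search,
    since the price need not be monotone in u."""
    a, b, c = u_tri
    lo = a + alpha * (b - a)
    hi = c - alpha * (c - b)
    prices = [call_price_crr(s0, k, lo + i * (hi - lo) / grid, d, r)
              for i in range(grid + 1)]
    return min(prices), max(prices)

# S0 = 100, strike 100, down factor 0.9, gross rate 1.05,
# up factor 'about 1.2' as the triangular fuzzy number (1.15, 1.2, 1.25)
print(fuzzy_call_price(100, 100, (1.15, 1.2, 1.25), 0.9, 1.05, alpha=0.5))
```

At alpha = 1 the cut collapses to the crisp CRR price; lower alpha levels widen the price interval, reflecting the volatility uncertainty.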

In the chapter titled “Non-Stochastic-Model Based Finance Engineering” by Toshihiro Kaino and Kaoru Hirota, a new corporate evaluation model and an option pricing model based on fuzzy measures are proposed, which deal with the ambiguity of subjective human evaluation in the real world.

The chapter called “Collective Intelligence in Multiagent Systems: Interbank Payment Systems Application” by Luis Rocha-Mier, Leonid Sheremetov and Francisco Villarreal describes a new approach to modeling interbank net settlement payment systems (NSPS) in order to analyze the actions of individual depositors. The model is developed within the framework of COllective INtelligence (COIN). This framework focuses on the interactions, at the local and global levels, among the consumer-agents (which lack global knowledge of the environment model) in order to ensure the optimization of a global utility function (GUF). A COIN is defined as a large Multi-Agent System (MAS) with no centralized control or communication, but with a global task to complete. Reinforcement learning algorithms are used at the local level, while techniques based on COIN theory are used to optimize the global behavior. The proposed framework was implemented in NetLogo (an agent-based parallel modeling and simulation environment). The results demonstrate that the interbank NSPS is a good experimental field for the application of COIN theory and show how the behavior of the consumer-agents converges to the Nash equilibrium, adapting their actions to optimize the GUF.

Finally, the chapter titled “Fuzzy Models in Credit Risk Analysis” by Antonio Carlos Pinto Dias Alves presents some concepts guiding credit risk analysis with fuzzy logic systems. Fuzzy quantification theory is used to perform a kind of multivariate analysis that gives more usable answers than traditional logit or probit analysis. The analysis relies on accounting indicators that can efficiently indicate the financial health of a company.

At the moment, most models in the field of finance engineering are based on stochastic theory. As shown in this book, predictions based on these models sometimes miss the actual problem. One of the reasons is that they assume a known probability distribution and try to describe a system with the precision obtained by means of exact probability densities. Nevertheless, as shown in chapter 4, for example, for a classification model that learns transaction patterns, applying a regression model to support or even make investment decisions is inappropriate: on top of being uncertain and unnecessarily complex, it requires a lot of investor attention and further analysis to make an investment decision. In many cases it is impossible or unnecessary to describe the behavior of the modeled system with the precision that may be obtained by means of exact probability densities. The SC models and decision making procedures described in this book overcome these drawbacks, which makes them more suitable for real-world situations. Perception-based models can be a powerful tool in developing a new generation of human-consistent, natural-language-based and easy-to-use decision support systems.

Finally, we would like to thank all the contributing authors for their excellent research papers, their institutions for the support provided for the research work in the field of soft computing, and the Studies in Fuzziness and Soft Computing Series Editorial Board and Springer-Verlag for their interest in the topic and for making the publication of this book possible.

I. Batyrshin
J. Kacprzyk
L. Sheremetov
L.A. Zadeh

March 2006


Contents

Precisiated Natural Language (PNL) ............................................ 1
Lotfi A. Zadeh

1. Data Mining

Towards Human-Consistent Data-Driven Decision Support Systems
via Fuzzy Linguistic Data Summaries .......................................... 37
Janusz Kacprzyk and Sławomir Zadrożny

Moving Approximation Transform and Local Trend Associations
in Time Series Data Bases .................................................... 55
Ildar Batyrshin, Raul Herrera-Avelar, Leonid Sheremetov,
and Aleksandra Panova

Perception Based Patterns in Time Series Data Mining ......................... 85
Ildar Batyrshin, Leonid Sheremetov and Raul Herrera-Avelar

Perception-Based Functions in Qualitative Forecasting ....................... 119
Ildar Batyrshin and Leonid Sheremetov

Towards Automated Share Investment System ................................... 135
Dymitr Ruta

Estimating Classification Uncertainty of Bayesian Decision Tree
Technique on Financial Data ................................................. 155
Vitaly Schetinin, Jonathan E. Fieldsend, Derek Partridge,
Wojtek J. Krzanowski, Richard M. Everson, Trevor C. Bailey,
and Adolfo Hernandez

Invariant Hierarchical Clustering Schemes ................................... 181
Ildar Batyrshin and Tamas Rudas

2. Decision Making

Fuzzy Components of Cooperative Markets ..................................... 209
Milan Mareš

Possibilistic-Probabilistic Models and Methods
of Portfolio Optimization ................................................... 241
Alexander V. Yazenin

Toward Graded and Nongraded Variants of Stochastic Dominance ................ 261
Bernard De Baets and Hans De Meyer

Option Pricing in the Presence of Uncertainty ............................... 275
Silvia Muzzioli and Huguette Reynaerts

Nonstochastic Model-Based Finance Engineering ............................... 303
Toshihiro Kaino and Kaoru Hirota

Collective Intelligence in Multiagent Systems:
Interbank Payment Systems Application ....................................... 331
Luis Rocha-Mier, Leonid Sheremetov and Francisco Villarreal

Fuzzy Models in Credit Risk Analysis ........................................ 353
Antonio Carlos Pinto Dias Alves


List of Contributors

T.C. Bailey
School of Engineering, Computer Science and Mathematics,
University of Exeter, UK
e-mail: [email protected]

I. Batyrshin
Mexican Petroleum Institute, Mexico
e-mail: [email protected]

A. Carlos Pinto Dias Alves
Unidade Gestão de Riscos - Banco do Brasil S.A., Brasil
e-mail: [email protected]

B. De Baets
Department of Applied Mathematics, Biometrics and Process Control,
Ghent University, Belgium
e-mail: [email protected]

H. De Meyer
Department of Applied Mathematics and Computer Science,
Ghent University, Belgium

R.M. Everson
School of Engineering, Computer Science and Mathematics,
University of Exeter, UK

J.E. Fieldsend
School of Engineering, Computer Science and Mathematics,
University of Exeter, UK
e-mail: [email protected]

A. Hernandez
School of Engineering, Computer Science and Mathematics,
University of Exeter, UK
e-mail: [email protected]

R. Herrera-Avelar
Mexican Petroleum Institute, Mexico

K. Hirota
Department of Computational Intelligence and Systems Science,
Interdisciplinary Graduate School of Science and Engineering,
Tokyo Institute of Technology, Japan
e-mail: [email protected]

J. Kacprzyk
Systems Research Institute, Polish Academy of Sciences, Poland
e-mail: [email protected]

T. Kaino
School of Business Administration, Aoyama Gakuin University, Japan
e-mail: [email protected]

W.J. Krzanowski
School of Engineering, Computer Science and Mathematics,
University of Exeter, UK
e-mail: [email protected]

M. Mareš
Institute of Information Theory and Automation UTIA, Czech Republic

S. Muzzioli
Department of Economics, University of Modena and Reggio Emilia, Italy

A. Panova
Kazan Power Engineering Institute, Russia

D. Partridge
School of Engineering, Computer Science and Mathematics,
University of Exeter, UK
e-mail: [email protected]

H. Reynaerts
Department of Applied Mathematics and Computer Science,
Ghent University, Belgium

L. Rocha-Mier
Mexican Petroleum Institute, Mexico
e-mail: [email protected]

T. Rudas
Eötvös Loránd University, Hungary
e-mail: [email protected]

D. Ruta
British Telecom Group, Research & Venturing, UK

V. Schetinin
School of Engineering, Computer Science and Mathematics,
University of Exeter, UK
e-mail: [email protected]

L. Sheremetov
Mexican Petroleum Institute, Mexico
e-mail: [email protected]

F. Villarreal
Mexican Petroleum Institute, Mexico

A.V. Yazenin
Computer Science Department, Tver State University, Russia

L.A. Zadeh
University of California, USA

S. Zadrożny
Systems Research Institute, Polish Academy of Sciences, Poland
e-mail: [email protected]


Precisiated Natural Language (PNL)¹

Lotfi A. Zadeh

¹ Reprinted with permission from AI Magazine, 25(3), Fall 2004, 74–91.

L.A. Zadeh: Precisiated Natural Language, Studies in Computational Intelligence (SCI) 36, 1–33 (2007)
www.springerlink.com © Springer-Verlag Berlin Heidelberg 2007


Abstract

This article is a sequel to an article titled "A New Direction in AI – Toward a Computational Theory of Perceptions," which appeared in the Spring 2001 issue of AI Magazine (volume 22, No. 1, 73–84) [47]. The concept of precisiated natural language (PNL) was briefly introduced in that article, and PNL was employed as a basis for computation with perceptions. In what follows, the conceptual structure of PNL is described in greater detail, and PNL's role in knowledge representation, deduction, and concept definition is outlined and illustrated by examples. What should be understood is that PNL is in its initial stages of development and that the exposition that follows is an outline of the basic ideas that underlie PNL rather than a definitive theory.

A natural language is basically a system for describing perceptions. Perceptions, such as perceptions of distance, height, weight, color, temperature, similarity, likelihood, relevance, and most other attributes of physical and mental objects, are intrinsically imprecise, reflecting the bounded ability of sensory organs, and ultimately the brain, to resolve detail and store information. In this perspective, the imprecision of natural languages is a direct consequence of the imprecision of perceptions [1, 2].

How can a natural language be precisiated – precisiated in the sense of making it possible to treat propositions drawn from a natural language as objects of computation? This is what PNL attempts to do. In PNL, precisiation is accomplished through translation into what is termed a precisiation language. In the case of PNL, the precisiation language is the generalized-constraint language (GCL), a language whose elements are so-called generalized constraints and their combinations. What distinguishes GCL from languages such as Prolog, LISP, SQL, and, more generally, languages associated with various logical systems, for example, predicate logic, modal logic, and so on, is its much higher expressive power.

The conceptual structure of PNL mirrors two fundamental facets of human cognition: (a) partiality and (b) granularity [3]. Partiality relates to the fact that most human concepts are not bivalent, that is, are a matter of degree. Thus, we have partial understanding, partial truth, partial possibility, partial certainty, partial similarity, and partial relevance, to cite a few examples. Similarly, granularity and granulation relate to clumping of values of attributes, forming granules with words as labels, for example, young, middle-aged, and old as labels of granules of age.

Existing approaches to natural language processing are based on bivalent logic – a logic in which shading of truth is not allowed. PNL abandons bivalence. By so doing, PNL frees itself from limitations imposed by bivalence and categoricity, and opens the door to new approaches for dealing with long-standing problems in AI and related fields [4, 50, 52].

At this juncture, PNL is in its initial stages of development. As it matures, PNL is likely to find a variety of applications, especially in the realms of world knowledge representation, concept definition, deduction, decision, search, and question answering.

1 Introduction

Natural languages (NLs) have occupied, and continue to occupy, a position of centrality in AI. Over the years, impressive advances have been made in our understanding of how natural languages can be dealt with on processing, logical, and computational levels. A huge literature is in existence. Among the important contributions that relate to the ideas described in this article are those of Biermann and Ballard [5], Klein [6], Barwise and Cooper [7], Sowa [8, 9], McAllester and Givan [10], Macias and Pulman [11], Mani and Maybury [12], Allan [13], Fuchs and Schwertelm [14], and Sukkarieh [15].

When a language such as precisiated natural language (PNL) is introduced, a question that arises at the outset is: What can PNL do that cannot be done through the use of existing approaches? A simple and yet important example relates to the basic role of quantifiers such as all, some, most, many, and few in human cognition and natural languages.

In classical, bivalent logic the principal quantifiers are all and some. However, there is a literature on so-called generalized quantifiers exemplified by most, many, and few [7, 16]. In this literature, such quantifiers are treated axiomatically, and logical rules are employed for deduction.

By contrast, in PNL quantifiers such as many, most, few, about 5, close to 7, much larger than 10, and so on are treated as fuzzy numbers and are manipulated through the use of fuzzy arithmetic [17–19]. For the most part, inference is computational rather than logical. Following are a few simple examples. First, let us consider the Brian example [17]:

Brian is much taller than most of his close friends. How tall is Brian?

At first glance it may appear that such questions are unreasonable. How can one say something about Brian's height if all that is known is that he is much taller than most of his close friends? Basically, what PNL provides is a system for precisiation of propositions expressed in a natural language through translation into the generalized-constraint language (GCL). Upon translation, the generalized constraints (GCs) are propagated through the use of rules governing generalized-constraint propagation, inducing a generalized constraint on the answer to the question. More specifically, in the Brian example, the answer is a generalized constraint on the height of Brian.

Now let us look at the balls-in-box problem. A box contains balls of various sizes and weights. The premises are:

Most are large.
Many large balls are heavy.
What fraction of balls are large and heavy?

The PNL answer is: most × many, where most and many are fuzzy numbers defined through their membership functions, and most × many is their product in fuzzy arithmetic [18]. This answer is a consequence of the general rule

Q1 As are Bs
Q2 (A and B)s are Cs
----------------------------
(Q1 × Q2) As are (B and C)s

Another simple example is the tall Swedes problem (version 1):

Swedes who are more than twenty years old range in height from 140 centimeters to 220 centimeters. Most are tall. What is the average height of Swedes over twenty?

A less simple version of the problem (version 2) is the following:

Swedes over twenty range in height from 140 centimeters to 220 centimeters. Over 70* percent are taller than 170* centimeters; less than 10* percent are shorter than 150* centimeters, and less than 15 percent are taller than 200* centimeters. What is the average height of Swedes over twenty?

(a* denotes "approximately a"). A PNL-based answer is given in the Appendix.

There is a basic reason that generalized quantifiers do not have an ability to deal with problems of this kind. The reason is that in the theory of generalized quantifiers there is no concept of the count of elements in a fuzzy set. How do you count the number of tall Swedes if tallness is a matter of degree? More generally, how do you define the probability measure of a fuzzy event [20]?
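The product most × many can be made concrete with α-cut interval arithmetic on trapezoidal fuzzy numbers. The following Python sketch is not from the chapter; the trapezoidal calibrations of most and many are illustrative assumptions, and only a few α-levels are sampled.

```python
# Sketch (illustrative, not from the chapter): the product of the fuzzy
# quantifiers "most" and "many" via alpha-cut interval arithmetic.
# A trapezoid (a, b, c, d) has support [a, d] and core [b, c].

def alpha_cut(trap, alpha):
    """Interval [lo, hi] of a trapezoid at level alpha."""
    a, b, c, d = trap
    return (a + alpha * (b - a), d - alpha * (d - c))

def product(t1, t2, levels=5):
    """Alpha-cuts of t1 * t2; for nonnegative supports the product of
    two intervals is [lo1 * lo2, hi1 * hi2]."""
    cuts = []
    for i in range(levels + 1):
        alpha = i / levels
        lo1, hi1 = alpha_cut(t1, alpha)
        lo2, hi2 = alpha_cut(t2, alpha)
        cuts.append((alpha, (lo1 * lo2, hi1 * hi2)))
    return cuts

most = (0.5, 0.75, 1.0, 1.0)   # assumed calibration of "most" on [0, 1]
many = (0.3, 0.5, 0.7, 0.9)    # assumed calibration of "many" on [0, 1]

for alpha, (lo, hi) in product(most, many):
    print(f"alpha={alpha:.1f}: [{lo:.3f}, {hi:.3f}]")
```

The resulting α-cuts describe the fuzzy fraction "most × many"; the membership function of the product is recovered by stacking the intervals.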


What should be stressed is that the existing approaches and PNL are complementary rather than competitive. Thus, PNL is not intended to be used in applications such as text processing, summarization, syntactic analysis, discourse analysis, and related fields. The primary function of PNL is to provide a computational framework for precisiation of meaning rather than to serve as a means of meaning understanding and meaning representation. By its nature, PNL is maximally effective when the number of precisiated propositions is small rather than large and when the chains of reasoning are short rather than long. The following is intended to serve as a backdrop.

It is a deep-seated tradition in science to view the use of natural languages in scientific theories as a manifestation of mathematical imma-turity. The rationale for this tradition is that natural languages are lacking in precision. However, what is not recognized to the extent that it should be is that adherence to this tradition carries a steep price. In particular, a direct consequence is that existing scientific theories do not have the capability to operate on perception-based information – information exemplified by “Most Swedes are tall,” “Usually Robert returns from work at about 6 PM,” “There is a strong correlation between diet and longevity,” and “It is very unlikely that there will be a significant increase in the price of oil in the near future” (Fig. 1).

Fig. 1. Modalities of measurement-based and perception-based information



Such information is usually described in a natural language and is intrinsically imprecise, reflecting a fundamental limitation on the cognitive ability of humans to resolve detail and store information. Due to their imprecision, perceptions do not lend themselves to meaning representation and inference through the use of methods based on bivalent logic. To illustrate the point, consider the following simple examples.

The balls-in-box example. A box contains balls of various sizes. My perceptions of the contents of the box are:

– There are about twenty balls.
– Most are large.
– There are several times as many large balls as small balls.

The question is: What is the number of small balls?

The Robert example (a). My perception is:

– Usually Robert returns from work at about 6 PM.

The question is: What is the probability that Robert is home at about 6:15 PM?

The Robert example (b):

– Most tall men wear large-sized shoes.
– Robert is tall.
– What is the probability that Robert wears large-sized shoes?

An immediate problem that arises is that of meaning precisiation. How can the meaning of the perception "There are several times as many large balls as small balls" or "Usually Robert returns from work at about 6 PM" be defined in a way that lends itself to computation and deduction? Furthermore, it is plausible, on intuitive grounds, that "Most Swedes are tall" conveys some information about the average height of Swedes. But what is the nature of this information, and what is its measure? Existing bivalent-logic-based methods of natural language processing provide no answers to such questions.


The incapability of existing methods to deal with perceptions is a direct consequence of the fact that the methods are based on bivalent logic – a logic that is intolerant of imprecision and partial truth. The existing methods are categorical in the sense that a proposition, p, in a natural language, NL, is either true or not true, with no shades of truth allowed. Similarly, p is either grammatical or ungrammatical, either ambiguous or unambiguous, either meaningful or not meaningful, either relevant or not relevant, and so on. Clearly, categoricity is in fundamental conflict with reality – a reality in which partiality is the norm rather than an exception. But, what is much more important is that bivalence is a major obstacle to the solution of such basic AI problems as commonsense reasoning and knowledge representation [8, 9, 21–25], nonstereotypical summarization [12], unrestricted question answering [26], and natural language computation [5].

PNL abandons bivalence. Thus, in PNL everything is, or is allowed to be, a matter of degree. It is somewhat paradoxical, and yet is true, that precisiation of a natural language cannot be achieved within the conceptual structure of bivalent logic. By abandoning bivalence, PNL opens the door to a major revision of concepts and techniques for dealing with knowledge representation, concept definition, deduction, and question answering. A concept that plays a key role in this revision is that of a generalized constraint [27]. The basic ideas underlying this concept are discussed in the following section. It should be stressed that what follows is an outline rather than a detailed exposition.

2 The Concepts of Generalized Constraint and Generalized-Constraint Language

A conventional, hard constraint on a variable, X, is basically an inelastic restriction on the values that X can take. The problem is that in most realistic settings – and especially in the case of natural languages – constraints have some degree of elasticity or softness. For example, in the case of a sign in a hotel saying "Checkout time is 1 PM," it is understood that 1 PM is not a hard constraint on checkout time. The same applies to "Speed limit is 65 miles per hour" and "Monika is young." Furthermore, there are many different ways, call them modalities, in which a soft constraint restricts the values that a variable can take. These considerations suggest the following expression as the definition of generalized constraint (Fig. 2):


Fig. 2. Generalized constraint

X isr R,

where X is the constrained variable; R is the constraining relation; and r is a discrete-valued modal variable whose values identify the modality of the constraint [1]. The constrained variable may be an n-ary variable, X = (X1,…,Xn); a conditional variable, X|Y; a structured variable, as in Location(Residence(X)); or a function of another variable, as in f(X ). The principal modalities are possibilistic (r = blank), probabilistic (r = p), veristic (r = v), usuality (r = u), random set (r = rs), fuzzy graph (r = fg), bimodal (r = bm), and Pawlak set (r = ps). More specifically, in a possibilistic constraint,

X is R,

R is a fuzzy set that plays the role of the possibility distribution of X. Thus, if U = {u} is the universe of discourse in which X takes its values, then R is a


Fig. 3. Trapezoidal membership function of “small number” (“small number” is context dependent)

fuzzy subset of U and the grade of membership of u in R, µR (u), is the possibility that X = u:

µR(u) = Poss{X = u}.

For example, the proposition p: X is a small number is a possibilistic constraint in which "small number" may be represented as, say, a trapezoidal fuzzy number (Fig. 3) that represents the possibility distribution of X. In general, the meaning of "small number" is context dependent.
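A trapezoidal possibility distribution of this kind is easy to realize directly from its corner points. The sketch below is our own illustration, not part of the chapter; the corner points (0, 0, 2, 5) for "small number" are an assumed, context-dependent calibration.

```python
# Sketch: a trapezoidal membership function for "small number".
# Under the constraint "X is small number", mu(u) = Poss{X = u}.
# The corners (0, 0, 2, 5) are an assumed, context-dependent calibration.

def trapezoid(a, b, c, d):
    """Membership function with support [a, d] and core [b, c]."""
    def mu(u):
        if u < a or u > d:
            return 0.0
        if b <= u <= c:
            return 1.0
        if u < b:                       # rising edge
            return (u - a) / (b - a) if b > a else 1.0
        return (d - u) / (d - c) if d > c else 1.0  # falling edge
    return mu

small = trapezoid(0, 0, 2, 5)

for u in [1, 3, 6]:
    print(u, small(u))  # possibility that X equals u
```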

In a probabilistic constraint:

X isp R,

X is a random variable and R is its probability distribution. For example,

X isp N(m, σ²)

means that X is a normally distributed random variable with mean m and variance σ².

In a veristic constraint, R is a fuzzy set that plays the role of the verity (truth) distribution of X. For example, the proposition "Alan is half German, a quarter French, and a quarter Italian," would be represented as the fuzzy set

Ethnicity (Alan) isv (0.5 | German + 0.25 | French + 0.25 | Italian),

in which Ethnicity (Alan) plays the role of the constrained variable, 0.5 | German means that the verity (truth) value of "Alan is German" is 0.5, and + plays the role of a separator.

In a usuality constraint, X is a random variable, and R plays the role of the usual value of X. For example, X isu small means that usually X is small. Usuality constraints play a particularly important role in commonsense knowledge representation and perception-based reasoning.

In a random set constraint, X is a fuzzy-set valued random variable and R is its probability distribution. For example,

X isrs (0.3\small + 0.5\medium + 0.2\large),

means that X is a random variable that takes the fuzzy sets small, medium, and large as its values with respective probabilities 0.3, 0.5, and 0.2. Random set constraints play a central role in the Dempster–Shafer theory of evidence and belief [28].

In a fuzzy graph constraint, the constrained variable is a function, f, and R is its fuzzy graph (Fig. 4). A fuzzy graph constraint is represented as

F isfg (Σi Ai × Bj(i)),

Fig. 4. Fuzzy graph of a function


in which the fuzzy sets Ai and Bj(i), with j dependent on i, are the granules of X and Y, respectively, and Ai × Bj(i) is the Cartesian product of Ai and Bj(i). Equivalently, a fuzzy graph may be expressed as a collection of fuzzy if–then rules of the form

if X is Ai then Y is Bj(i), i = 1, …, m; j = 1, …, n.

For example:

F isfg (small × small + medium × large + large × small)

may be expressed as the rule set:

if X is small then Y is small
if X is medium then Y is large
if X is large then Y is small

Such a rule set may be interpreted as a description of a perception of f.

A bimodal constraint involves a combination of two modalities: probabilistic and possibilistic. More specifically, in the generalized constraint

X isbm R,

X is a random variable, and R is what is referred to as a bimodal distribution, P, of X, with P expressed as

P: Σi Pj(i) \ Ai,

in which the Ai are granules of X, and the Pj(i), with j dependent on i, are the granules of probability (Fig. 5). For example, if X is a real-valued random variable with granules labeled small, medium, and large and probability granules labeled low, medium, and high, then

X isbm (low\small + high\medium + low\large)

which means that

Prob {X is small} is low
Prob {X is medium} is high
Prob {X is large} is low
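A fuzzy graph constraint of the rule-set form shown above can be evaluated pointwise by max–min (Mamdani-style) composition. The sketch below is our own illustration; the calibrations of small, medium, and large on [0, 10] are assumptions, not taken from the chapter.

```python
# Sketch: a rule set "if X is small then Y is small; if X is medium then
# Y is large; if X is large then Y is small" read as a fuzzy graph of f,
# evaluated by max-min composition. Calibrations on [0, 10] are assumed.

def tri(a, b, c):
    """Triangular membership function peaking at b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

def left_shoulder(b, c):
    """1 up to b, sloping to 0 at c."""
    def mu(x):
        if x <= b:
            return 1.0
        if x >= c:
            return 0.0
        return (c - x) / (c - b)
    return mu

def right_shoulder(a, b):
    """0 up to a, rising to 1 at b."""
    def mu(x):
        if x >= b:
            return 1.0
        if x <= a:
            return 0.0
        return (x - a) / (b - a)
    return mu

small = left_shoulder(1, 5)
medium = tri(2.5, 5.0, 7.5)
large = right_shoulder(5, 9)

rules = [(small, small), (medium, large), (large, small)]

def f_graph(x, y):
    """Membership of (x, y) in the fuzzy graph sum of A_i x B_j(i)."""
    return max(min(a(x), b(y)) for a, b in rules)

print(f_graph(5.0, 9.5))  # the "medium -> large" rule fires fully
```

The value f_graph(x, y) is the degree to which the point (x, y) is compatible with the perception of f encoded by the rules.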


Fig. 5. Bimodal distribution: perception-based probability distribution

In effect, the bimodal distribution of X may be viewed as a description of a perception of the probability distribution of X. As a perception of likelihood, the concept of a bimodal distribution plays a key role in perception-based calculus of probabilistic reasoning [29].

The concept of a bimodal distribution is an instance of combination of different modalities. More generally, generalized constraints may be combined and propagated, generating generalized constraints that are composites of other generalized constraints. The set of all such constraints together with deduction rules – rules that are based on the rules governing generalized-constraint propagation – constitutes the generalized-constraint language (GCL). An example of a generalized constraint in GCL is

(X isp A) and ((X, Y) is B),

where A is the probability distribution of X and B is the possibility distribution of the binary variable (X, Y). Constraints of this form play an important role in the Dempster–Shafer theory of evidence [28].


3 The Concepts of Precisiability and Precisiation Language

Informally, a proposition, p, in a natural language, NL, is precisiable if its meaning can be represented in a form that lends itself to computation and deduction. More specifically, p is precisiable if it can be translated into what may be called a precisiation language, PL, with the understanding that the elements of PL can serve as objects of computation and deduction. In this sense, mathematical languages and the languages associated with propositional logic, first-order and higher-order predicate logics, modal logic, LISP, Prolog, SQL, and related languages may be viewed as precisiation languages. The existing PL languages are based on bivalent logic. As a direct consequence, the languages in question do not have sufficient expressive power to represent the meaning of propositions that are descriptors of perceptions. For example, the proposition “All men are mortal” can be precisiated by translation into the language associated with first-order logic, but “Most Swedes are tall” cannot.

The principal distinguishing feature of PNL is that the precisiation language with which it is associated is GCL. It is this feature of PNL that makes it possible to employ PNL as a meaning-precisiation language for perceptions. What should be understood, however, is that not all perceptions or, more precisely, propositions that describe perceptions, are precisiable through translation into GCL. Natural languages are basically systems for describing and reasoning with perceptions, and many perceptions are much too complex to lend themselves to precisiation.

The key idea in PNL is that the meaning of a precisiable proposition, p, in a natural language is a generalized constraint X isr R. In general, X, R, and r are implicit, rather than explicit, in p. Thus, translation of p into GCL may be viewed as explicitation of X, R, and r. The expression X isr R will be referred to as the GC form of p, written as GC(p).

In PNL, a proposition, p, is viewed as an answer to a question, q. To illustrate, the proposition p: Monika is young may be viewed as the answer to the question q: How old is Monika? More concretely:

p: Monika is young → p*: Age (Monika) is young q: How old is Monika? → q*: Age (Monika) is ?R

where p* and q* are abbreviations for GC(p) and GC(q), respectively.

In general, the question to which p is an answer is not unique. For example, p: Monika is young could be viewed as an answer to the question q: Who is young? In most cases, however, among the possible questions there is one that is most likely. Such a question plays the role of a default question. The GC form of q is, in effect, the translation of the question to which p is an answer. The following simple examples are intended to clarify the process of translation from NL to GCL.

p: Tandy is much older than Dana → (Age(Tandy), Age(Dana)) is much.older,

where much.older is a binary fuzzy relation that has to be calibrated as a whole rather than through composition of much and older.

p: Most Swedes are tall

To deal with this example, it is necessary to have a means of counting the number of elements in a fuzzy set. There are several ways in which this can be done, with the simplest way relating to the concept of ΣCount (sigma count). More specifically, if A and B are fuzzy sets in a space U = {u1, …, un}, with membership functions µA and µB, respectively, then

ΣCount(A) = Σi µA(ui),

and the relative ΣCount, that is, the relative count of elements of A that are in B, is defined as

ΣCount(A/B) = ΣCount(A∩B)/ΣCount(B),

in which the membership function of the intersection A∩B is defined as

µA∩B(u) = µA(u) ∧ µB(u),

where ∧ is min or, more generally, a t-norm [30, 31]. Using the concept of sigma count, the translation in question may be expressed as

Most Swedes are tall → ΣCount(tall.Swedes/Swedes) is most,

where most is a fuzzy number that defines most as a fuzzy quantifier [32, 33] (Fig. 6).
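As a small illustration (not part of the chapter), the sigma count of tall Swedes can be computed for an entirely hypothetical height sample; the piecewise-linear calibration of tall is an assumption.

```python
# Sketch: sigma-count translation of "Most Swedes are tall".
# Both the calibration of "tall" and the height sample are assumptions
# made for illustration only.

def mu_tall(h_cm):
    """Assumed: 0 below 160 cm, 1 above 180 cm, linear in between."""
    if h_cm <= 160:
        return 0.0
    if h_cm >= 180:
        return 1.0
    return (h_cm - 160) / 20

heights = [150, 165, 170, 175, 180, 185, 190, 195]  # hypothetical data

sigma_count_tall = sum(mu_tall(h) for h in heights)

# Relative sigma-count: here the reference set "Swedes" is crisp, so the
# denominator is just the sample size.
rel = sigma_count_tall / len(heights)
print(rel)
```

The constraint "ΣCount(tall.Swedes/Swedes) is most" would then be evaluated by feeding rel into the membership function calibrating most.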


Fig. 6. Calibration of most and usually represented as trapezoidal fuzzy numbers

p: Usually Robert returns from work at about 6 PM q: When does Robert return from work? X: Time of return of Robert from work, Time(Return) R: about 6 PM (6* PM) r: u (usuality) p*: Prob {Time(Return) is 6* PM} is usually.

A less simple example is:

p: It is very unlikely that there will be a significant increase in the price of oil in the near future.

In this example, it is expedient to start with the semantic network representation [8] of p that is shown in Fig. 7. In this representation, E is the main event and E* is a subevent of E:

E: significant increase in the price of oil in the near future
E*: significant increase in the price of oil

Thus, near future is the epoch of E*.

The GC form of p may be expressed as

Prob(E) is R,

where R is the fuzzy probability, very unlikely, whose membership function is related to that of likely by (Fig. 8)

µvery.unlikely(u) = (1 – µlikely(u))²,


Fig. 7. Semantic network of p. (It is very unlikely that there will be a significant increase in the price of oil in the near future)

Fig. 8. Precisiation of very unlikely

where it is assumed for simplicity that very acts as an intensifier that squares the membership function of its operand, and that the membership function of unlikely is the mirror image of that of likely.
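These two conventions – unlikely as the mirror image of likely, and very as a squaring intensifier – are easy to state in code. The sketch below is illustrative; the piecewise-linear calibration of likely on the probability scale is an assumption, and the final line implements the displayed relation for very unlikely.

```python
# Sketch: deriving mu_very_unlikely from an assumed calibration of
# mu_likely, using the conventions stated in the text.

def mu_likely(u):
    """Assumed calibration of "likely" on the probability scale [0, 1]."""
    if u <= 0.5:
        return 0.0
    if u >= 0.8:
        return 1.0
    return (u - 0.5) / 0.3

def mu_very_unlikely(u):
    """The displayed relation: "very" squares, "un-" complements."""
    return (1 - mu_likely(u)) ** 2

print(mu_very_unlikely(0.1))  # -> 1.0: a 10% probability is fully compatible
```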

Given the membership functions of significant increase and near future (Fig. 9), we can compute the degree to which a specified time function that represents a variation in the price of oil satisfies the conjunction of the constraints significant increase and near future. This degree may be employed to compute the truth value of p as a function of the probability distribution of the variation in the price of oil. In this instance, the use of PNL may be viewed as an extension of truth-conditional semantics [34, 13].

What should be noted is that precisiation and meaning representation are not coextensive. More specifically, precisiation of a proposition, p, assumes that the meaning of p is understood and that what is involved is a precisiation of the meaning of p.


Fig. 9. Computation of degree of compatibility

4 The Concept of a Protoform and the Structure of PNL

A concept that plays a key role in PNL is that of a protoform – an abbreviation of prototypical form. Informally, a protoform is an abstracted summary of an object that may be a proposition, command, question, scenario, concept, decision problem, or, more generally, a system of such objects. The importance of the concept of a protoform derives from the fact that it places in evidence the deep semantic structure of the object to which it applies. For example, the protoform of the proposition p: Monika is young is PF(p): A(B) is C, where A is abstraction of the attribute Age, B is abstraction of Monika, and C is abstraction of young. Conversely, Age is instantiation of A, Monika is instantiation of B, and young is instantiation of C. Abstraction may be annotated, for example, A/Attribute, B/Name, and C/Attribute.value. A few examples are shown in Fig. 10. Basically, abstraction is a means of generalization. Abstraction has levels, just as summarization does. For example, successive abstractions of p: Monika is young are A(Monika) is young, A(B) is young, and A(B) is C, with the last abstraction resulting in the terminal protoform, or simply the protoform. With this understanding, the protoform of p: Most Swedes are tall is QAs are Bs, or equivalently, Count(B/A) is Q, and the protoform of p: Usually Robert returns from work at about 6 PM, is Prob(X is A) is B, where X, A, and B are abstractions of "Time (Robert.returns.from.work)," "About 6 PM," and "Usually." For simplicity, the protoform of p may be written as p**.
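The successive abstractions of a proposition of the form Attribute(Name) is Value can be mimicked in a few lines. The string representation below is our own illustrative choice, not a PNL-defined data structure.

```python
# Sketch: successive abstraction of "Age(Monika) is young" toward its
# terminal protoform A(B) is C. The representation is illustrative only.

def abstractions(attribute, name, value):
    """Levels of abstraction of 'attribute(name) is value'."""
    return [
        f"{attribute}({name}) is {value}",  # fully instantiated
        f"A({name}) is {value}",            # abstract the attribute
        f"A(B) is {value}",                 # abstract the name
        "A(B) is C",                        # terminal protoform
    ]

for level in abstractions("Age", "Monika", "young"):
    print(level)
```

Two propositions are PF equivalent at a given level when their strings at that level coincide; at the terminal level, "Monika is young" and "Robert is tall" both reduce to A(B) is C.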

Abstraction is a familiar concept in programming languages and programming systems. As will be seen in the following, the role of abstraction in PNL is significantly different and more essential because PNL abandons bivalence. The concept of a protoform has some links to other basic concepts such as ontology [9, 35–37], conceptual graph [38], and Montague grammar [39]. However, what should be stressed is that the concept of a protoform is not limited – as it is in the case of related concepts – to propositions whose meaning can be represented within the conceptual structure of bivalent logic.

Fig. 10. Examples of translation from NL to PFL

As an illustration, consider a proposition, p, which was dealt with earlier:

p: It is very unlikely that there will be a significant increase in the price of oil in the near future.

With reference to the semantic network of p (Fig. 7), the protoform of p may be expressed as:

Prob(E) is A (A: very unlikely)
E: B(E*) is C (B: epoch; C: near.future)
E*: F(D) (F: significant increase; D: price of oil)
D: G(H) (G: price; H: oil)

Using the protoform of p and calibrations of significant increase, near future, and likely (Fig. 9), we can compute, in principle, the degree to which any given probability distribution of time functions representing the price of oil satisfies the generalized constraint, Prob(E) is A. As was pointed out earlier, if the degree of compatibility is interpreted as the truth value of p, computation of the truth value of p may be viewed as a PNL-based extension of truth-conditional semantics.

By serving as a means of defining the deep semantic structure of an object, the concept of a protoform provides a platform for a fundamental mode of classification of knowledge based on protoform equivalence, or PF equivalence for short. More specifically, two objects are protoform equivalent at a specified level of summarization and abstraction if at that level they have identical protoforms. For example, the propositions p: Most Swedes are tall, and q: Few professors are rich, are PF equivalent since their common protoform is QAs are Bs or, equivalently, Count(B/A) is Q. The same applies to propositions p: Oakland is near San Francisco, and q: Rome is much older than Boston. A simple example of PF-equivalent concepts is: cluster and mountain.

A less simple example involving PF equivalence of scenarios of decision problems is the following. Consider the scenarios of two decision problems, A and B:

Scenario A:

Alan has severe back pain. He goes to see a doctor. The doctor tells him that there are two options: (1) do nothing and (2) do surgery. In the case of surgery, there are two possibilities: (a) surgery is successful, in which case Alan will be pain-free, and (b) surgery is not successful, in which case Alan will be paralyzed from the neck down. Question: Should Alan elect surgery?


Scenario B:

Alan needs to fly from San Francisco to St. Louis and has to get there as soon as possible. One option is to fly to St. Louis via Chicago, and the other is to go through Denver. The flight via Denver is scheduled to arrive in St. Louis at time a. The flight via Chicago is scheduled to arrive in St. Louis at time b, with a < b. However, the connection time in Denver is short. If the connection flight is missed, then the time of arrival in St. Louis will be c, with c > b. Question: Which option is best?

The common protoform of A and B is shown in Fig. 11. What this protoform means is that there are two options, one that is associated with a certain gain or loss and another that has two possible outcomes whose probabilities may not be known precisely.

Fig. 11. Protoform equivalence of scenarios A and B

The protoform language, PFL, is the set of protoforms of the elements of the generalized-constraint language, GCL. A consequence of the concept of PF equivalence is that the cardinality of PFL is orders of magnitude lower than that of GCL or, equivalently, of the set of precisiable propositions in NL. As will be seen in the sequel, the low cardinality of PFL plays an essential role in deduction.
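The notion of PF equivalence can be sketched in code. This is an illustrative encoding, not from the original text; the tuple representation and the level numbering are assumptions:

```python
def abstract(prop):
    """Successively abstract a quantified proposition (Q, A, B).
    Level 0 is the proposition itself; level 3 is the terminal protoform."""
    q, a, b = prop
    return [
        (q, a, b),        # e.g. Most Swedes are tall
        ("Q", a, b),      # Q Swedes are tall
        ("Q", "A", b),    # QAs are tall
        ("Q", "A", "B"),  # QAs are Bs  (terminal protoform)
    ]

def pf_equivalent(p, q, level=3):
    """Two propositions are PF equivalent at a given level of abstraction
    if at that level they have identical protoforms."""
    return abstract(p)[level] == abstract(q)[level]

p = ("most", "Swedes", "tall")       # Most Swedes are tall
q = ("few", "professors", "rich")    # Few professors are rich

print(pf_equivalent(p, q))           # True  -- common protoform: QAs are Bs
print(pf_equivalent(p, q, level=1))  # False -- differ at a lower level
```

The point the sketch makes is structural: equivalence holds or fails relative to a chosen level of summarization and abstraction.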


The principal components of the structure of PNL (Fig. 12) are (1) a dictionary from NL to GCL; (2) a dictionary from GCL to PFL (Fig. 13); (3) a multiagent, modular deduction database, DDB; and (4) a world knowledge database, WKDB.

Fig. 12. Basic structure of PNL

Fig. 13. Structure of PNL dictionaries

The constituents of DDB are modules, with a module consisting of a group of protoformal rules of deduction, expressed in PFL (Fig. 14), that are drawn from a particular domain, for example, probability, possibility, usuality, fuzzy arithmetic [18], fuzzy logic, search, and so on. For example, a rule drawn from fuzzy logic is the compositional rule of inference, expressed in Fig. 14, where A°B is the composition of A and B, defined in the computational part, in which µA, µB, and µA°B are the membership functions of A, B, and A°B, respectively. Similarly, a rule drawn from probability is shown in Fig. 15, where D is defined in the computational part.

Fig. 14. Compositional rule of inference

Fig. 15. Rule drawn from probability

The rules of deduction in DDB are, basically, the rules that govern propagation of generalized constraints. Each module is associated with an agent whose function is that of controlling execution of rules and performing embedded computations. The top-level agent controls the passing of results of computation from a module to other modules. The structure of protoformal, that is, protoform-based, deduction is shown in Fig. 16. A simple example of protoformal deduction is shown in Fig. 17.
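The compositional rule of inference has a standard sup-min formulation: given a fuzzy set A on U and a fuzzy relation R on U × V, the induced fuzzy set B = A°R on V has membership µB(v) = max over u of min(µA(u), µR(u, v)). A minimal sketch; the universes and membership values are assumed for the example:

```python
# Sup-min compositional rule of inference:
#   mu_B(v) = max over u of min(mu_A(u), mu_R(u, v))
def compose(mu_A, mu_R):
    """Compose a fuzzy set (list over U) with a fuzzy relation (matrix over U x V)."""
    n_u = len(mu_A)
    n_v = len(mu_R[0])
    return [max(min(mu_A[u], mu_R[u][v]) for u in range(n_u))
            for v in range(n_v)]

mu_A = [1.0, 0.6, 0.2]        # fuzzy set A over U = {u1, u2, u3} (illustrative)
mu_R = [[0.9, 0.3],           # fuzzy relation R over U x V (illustrative)
        [0.5, 1.0],
        [0.1, 0.7]]

print(compose(mu_A, mu_R))    # [0.9, 0.6]
```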


Fig. 16. Structure of protoform-based deduction

Fig. 17. Example of protoformal reasoning

The world knowledge database (WKDB) consists of propositions that describe world knowledge, for example: Parking near the campus is hard to find on weekdays between 9 and 4; Big cars are safer than small cars; If A/person works in B/city then it is likely that A lives in or near B; If A/person is at home at time t then A has returned from work at t or earlier, on the understanding that A stayed home after returning from work. Much, perhaps most, of the information in WKDB is perception based.

World knowledge – and especially world knowledge about probabilities – plays an essential role in almost all search processes, including searching the Web. The Semantic Web and related approaches have contributed to a significant improvement in the performance of search engines. However, for further progress it may be necessary to add to existing search engines the capability to operate on perception-based information. It will be a real challenge to employ PNL to add this capability to sophisticated knowledge-management systems such as the Web Ontology Language (OWL) [36], Cyc [40], WordNet [41], and ConceptNet [42].

An example of PFL-based deduction in which world knowledge is used is the so-called Robert example. A simplified version of the example is the following.

The initial data set is the proposition (perception) p: Usually Robert returns from work at about 6 PM. The question is q: What is the probability that Robert is home at 6:15 PM?

The first step in the deduction process is to use the NL to GCL dictionary to derive the generalized-constraint forms, GC(p) and GC(q), of p and q, respectively. The second step is to use the GCL to PFL dictionary to derive the protoforms of p and q. The forms are:

p*: Prob(Time(Robert.returns.from.work) is about 6 PM) is usually
q*: Prob(Time(Robert is home) is 6:15 PM) is ?E

and

p**: Prob(X is A) is B
q**: Prob(Y is C) is ?D

The third step is to refer the problem to the top-level agent with the query, Is there a rule or a chain of rules in DDB that leads from p** to q**? The top-level agent reports a failure to find such a chain but success in finding a proximate rule of the form

Prob(X is A) is B
Prob(X is C) is D.

The fourth step is to search the WKDB for a proposition or a chain of propositions that allow Y to be replaced by X. A proposition that makes this possible is (A/person is in B/location) at T/time if A arrives at B before T, with the understanding that A stays at B after arrival.

The last step involves the use of the modified form of q**: Prob(X is E) is ?D, in which E is "before 6:15 PM." The answer to the initial query is given by the solution of the variational problem associated with the rule that was described earlier (Fig. 15):

Prob(X is A) is B
Prob(X is C) is D

The value of D is the desired probability.

What is important to observe is that there is a tacit assumption that underlies the deduction process, namely, that the chains of deduction are short. This assumption is a consequence of the intrinsic imprecision of perception-based information. Its further implication is that PNL is likely to be effective, in the main, in the realm of domain-restricted systems associated with small universes of discourse.

5 PNL as a Definition Language

As we move further into the age of machine intelligence and automated reasoning, a problem that is certain to grow in visibility and importance is that of definability – that is, the problem of defining the meaning of a concept or a proposition in a way that can be understood by a machine.

It is a deeply entrenched tradition in science to define a concept in a language that is based on bivalent logic [43–45]. Thus defined, a concept, C, is bivalent in the sense that every object, X, is either an instance of C or it is not, with no degrees of truth allowed. For example, a system is either stable or unstable, a time series is either stationary or nonstationary, a sentence is either grammatical or ungrammatical, and events A and B are either independent or not independent.

The problem is that bivalence of concepts is in conflict with reality. In most settings, stability, stationarity, grammaticality, independence, relevance, causality, and most other concepts are not bivalent. When a concept that is not bivalent is defined as if it were bivalent, the ancient Greek sorites (heap) paradox comes into play. As an illustration, consider the standard bivalent definition of independence of events, say A and B. Let P(A), P(B), and PA(B) be the probabilities of A, B, and B given A, respectively. Then A and B are independent if and only if PA(B) = P(B).
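The variational problem behind the rule Prob(X is A) is B ⊢ Prob(X is C) is D can be sketched in discretized form: µD(v) is the supremum, over probability distributions p, of µB(Σt µA(t)·p(t)) subject to Σt µC(t)·p(t) = v. The membership calibrations and the coarse grid of candidate distributions below are illustrative assumptions, not part of the original text:

```python
import itertools

T = range(5)                        # discrete time points (illustrative)
mu_A = [0.0, 0.2, 1.0, 0.2, 0.0]    # assumed calibration of "at about t = 2"
mu_C = [0.0, 0.3, 1.0, 1.0, 1.0]    # assumed calibration of "t = 2 or later"

def mu_B(x):
    """Assumed calibration of "usually": 0 below 0.5, 1 above 0.9."""
    return max(0.0, min(1.0, (x - 0.5) / 0.4))

def mu_D(v, tol=0.05, steps=4):
    """Approximate mu_D(v) by brute-force enumeration of coarse
    probability vectors p over T (grid step 1/steps)."""
    best = 0.0
    grid = [i / steps for i in range(steps + 1)]
    for p in itertools.product(grid, repeat=len(T)):
        if abs(sum(p) - 1.0) > 1e-9:
            continue                                   # not a distribution
        if abs(sum(mu_C[t] * p[t] for t in T) - v) > tol:
            continue                                   # violates Prob(X is C) = v
        best = max(best, mu_B(sum(mu_A[t] * p[t] for t in T)))
    return best

print(mu_D(1.0))  # 1.0 -- mass concentrated at t = 2 satisfies both constraints
```

A real solver would replace the brute-force enumeration with nonlinear programming; the sketch only shows the structure of the constraint-propagation step.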


Now assume that the equality is not satisfied exactly, with the difference between the two sides being ε. As ε increases, at which point will A and B cease to be independent?

Clearly, independence is a matter of degree, and furthermore the degree is context dependent. For this reason, we do not have a universally accepted definition of degree of independence [46].
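One way to make "a matter of degree" concrete is an assumed graded measure; the linear decay and the tolerance parameter below are illustrative choices, since, as noted, no universally accepted definition exists:

```python
# Illustrative graded measure of independence of two events: degree 1 when
# P(B|A) = P(B), decaying linearly to 0 as the discrepancy epsilon approaches
# a context-dependent tolerance.  The calibration is an assumption.
def independence_degree(p_b_given_a, p_b, tolerance=0.2):
    eps = abs(p_b_given_a - p_b)
    return max(0.0, 1.0 - eps / tolerance)

print(independence_degree(0.50, 0.50))            # 1.0  -- exactly independent
print(round(independence_degree(0.55, 0.50), 2))  # 0.75 -- nearly independent
print(independence_degree(0.80, 0.50))            # 0.0  -- clearly dependent
```

Changing the tolerance changes where independence effectively ends, which is precisely the context dependence the text describes.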

One of the important functions of PNL is that of serving as a definition language. More specifically, PNL may be employed as a definition language for two different purposes: first, to define concepts for which no general definitions exist, for example, causality, summary, relevance, and smoothness; and second, to redefine concepts for which universally accepted definitions exist, for example, linearity, stability, independence, and so on. In what follows, the concept of independence of random variables will be used as an illustration.

For simplicity, assume that X and Y are random variables that take values in the interval [a, b]. The interval is granulated as shown in Fig. 18, with S, M, and L denoting the fuzzy intervals small, medium, and large. Using the definition of relative ΣCount, we construct a contingency table, C, of the form shown in Fig. 18, in which an entry such as ΣCount(S/L) is a granulated fuzzy number that represents the relative ΣCount of occurrences of Y, which are small, relative to occurrences of X, which are large.

Fig. 18. PNL-based definition of statistical independence


Based on the contingency table, the degree of independence of Y from X may be equated to the degree to which the columns of the contingency table are identical. One way of computing this degree is, first, to compute the distance between two columns and then aggregate the distances between all pairs of columns. PNL would be used for this purpose.
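A minimal sketch of the column-comparison step, with assumed details: each column holds the relative counts of Y-granules (S, M, L) for one X-granule, the distance between columns is a normalized city-block distance, and aggregation takes the worst pair. The numbers are illustrative:

```python
def degree_of_independence(columns):
    """Degree to which the columns of a contingency table are identical:
    1 minus the largest normalized city-block distance between any pair."""
    n = len(columns)
    worst = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = sum(abs(a - b) for a, b in zip(columns[i], columns[j])) / 2.0
            worst = max(worst, d)
    return 1.0 - worst  # identical columns -> degree 1

# Columns of relative counts of Y-granules for X in {S, M, L}
# (illustrative numbers; each column sums to 1).
table = [
    [0.3, 0.4, 0.3],   # distribution of Y given X is small
    [0.3, 0.4, 0.3],   # ... given X is medium
    [0.2, 0.5, 0.3],   # ... given X is large
]
print(round(degree_of_independence(table), 2))  # 0.9
```

Both the distance and the aggregation operator are flexible choices; as the following paragraph notes, a PNL-based definition fixes the framework, not these details.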

An important point that this example illustrates is that, typically, a PNL-based definition involves a general framework with a flexible choice of details governed by the context or a particular application. In this sense, the use of PNL implies an abandonment of the quest for universality, or, to put it more graphically, of the one-size-fits-all modes of definition that are associated with the use of bivalent logic.

Another important point is that PNL suggests an unconventional approach to the definition of complex concepts. The basic idea is to define a complex concept in a natural language and then employ PNL to precisiate the definition.

More specifically, let U be a universe of discourse and let C be a concept that I wish to define, with C relating to elements of U. For example, U is a set of buildings, and C is the concept of tall building. Let p(C) and d(C) be, respectively, my perception and my definition of C. Let I(p(C)) and I(d(C)) be the intensions of p(C) and d(C), respectively, with intension used in its logical sense [34, 43], that is, as a criterion or procedure that identifies those elements of U that fit p(C) or d(C). For example, in the case of tall buildings, the criterion may involve the height of a building.


Informally, a definition, d(C), is a good fit or, more precisely, is co-intensive, if its intension coincides with the intension of p(C). A measure of goodness of fit is the degree to which the intension of d(C) coincides with that of p(C). In this sense, co-intension is a fuzzy concept. As a high-level definition language, PNL makes it possible to formulate definitions whose degree of co-intensiveness is higher than that of definitions formulated through the use of languages based on bivalent logic.

What should be underscored is that in its role as a high-level definition language, PNL provides a basis for a significant enlargement of the role of natural languages in scientific theories.

6 Concluding Remarks

Existing theories of natural languages are based, anachronistically, on Aristotelian logic – a logical system whose centerpiece is the principle of the excluded middle: Truth is bivalent, meaning that every proposition is either true or not true, with no shades of truth allowed.

The problem is that bivalence is in conflict with reality – the reality of pervasive imprecision of natural languages. The underlying facts are (a) a natural language, NL, is, in essence, a system for describing perceptions and (b) perceptions are intrinsically imprecise, reflecting the bounded ability of sensory organs, and ultimately the brain, to resolve detail and store information.

PNL abandons bivalence. What this means is that PNL is based on fuzzy logic – a logical system in which everything is, or is allowed to be, a matter of degree.

Abandonment of bivalence opens the door to exploration of new directions in theories of natural languages. One such direction is that of precisiation. A key concept underlying precisiation is the concept of a generalized constraint. It is this concept that makes it possible to represent the meaning of a proposition drawn from a natural language as a generalized constraint. Conventional, bivalent constraints cannot be used for this purpose. The concept of a generalized constraint provides a basis for construction of GCL – a language whose elements are generalized constraints and their combinations. Within the structure of PNL, GCL serves as a precisiation language for NL. Thus, a proposition in NL is precisiated through translation into GCL. Not every proposition in NL is precisiable. In effect, the elements of PNL are precisiable propositions in NL.

Dedication

This article is dedicated to Noam Chomsky.

References

1. Zadeh, L. A. 1999. From computing with numbers to computing with words – From manipulation of measurements to manipulation of perceptions. IEEE Transactions on Circuits and Systems 45(1): 105–119

2. Zadeh, L. A. 2000. Toward a logic of perceptions based on fuzzy logic. In Discovering the World with Fuzzy Logic: Studies in Fuzziness and Soft Computing (57), eds. V. Novak and I. Perfilieva, pp. 4–25. Heidelberg: Physica-Verlag


3. Zadeh, L. A. 1997. Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets and Systems 90(2): 111–127

4. Novak, V. 1991. Fuzzy logic, fuzzy sets, and natural languages. International Journal of General Systems 20(1): 83–97

5. Biermann, A. W. and Ballard, B. W. 1980. Toward natural language computation. American Journal of Computational Linguistics 6(2): 71–86

6. Klein, E. 1980. A semantics for positive and comparative adjectives. Linguistics and Philosophy 4(1): 1–45

7. Barwise, J. and Cooper, R. 1981. Generalized quantifiers and natural language. Linguistics and Philosophy 4(1): 159–209

8. Sowa, J. F. 1991. Principles of Semantic Networks: Explorations in the Representation of Knowledge. San Francisco: Morgan Kaufmann

9. Sowa, J. F. 1999. Ontological categories. In Shapes of Forms: From Gestalt Psychology and Phenomenology to Ontology and Mathematics, ed. L. Albertazzi, pp. 307–340. Dordrecht, The Netherlands: Kluwer

10. McAllester, D. A. and Givan, R. 1992. Natural language syntax and first-order inference. Artificial Intelligence 56(1): 1–20

11. Macias, B. and Pulman, S. G. 1995. A method for controlling the production of specifications in natural language. The Computer Journal 38(4): 310–318

12. Mani, I. and Maybury, M. T., eds. 1999. Advances in Automatic Text Summarization. Cambridge, MA: The MIT Press

13. Allan, K. 2001. Natural Language Semantics. Oxford: Blackwell

14. Fuchs, N. E. and Schwertel, U. 2003. Reasoning in Attempto Controlled English. In Proceedings of the Workshop on Principles and Practice of Semantic Web Reasoning (PPSWR 2003), pp. 174–188. Lecture Notes in Computer Science. Berlin: Springer

15. Sukkarieh, J. 2003. Mind Your Language! Controlled Language for Inference Purposes. Paper presented at the Joint Conference of the Eighth International Workshop of the European Association for Machine Translation and the Fourth Controlled Language Applications Workshop, Dublin, Ireland, 15–17 May

16. Peterson, P. 1979. On the Logic of Few, Many and Most. Journal of Formal Logic 20(1–2): 155–179

17. Zadeh, L. A. 1983. A computational approach to fuzzy quantifiers in natural languages. Computers and Mathematics 9: 149–184

18. Kaufmann, A. and Gupta, M. M. 1985. Introduction to Fuzzy Arithmetic: Theory and Applications. New York: Van Nostrand

19. Hajek, P. 1998. Metamathematics of Fuzzy Logic: Trends in Logic (4). Dordrecht, The Netherlands: Kluwer

20. Zadeh, L. A. 1968. Probability measures of fuzzy events. Journal of Mathematical Analysis and Applications 23: 421–427


21. McCarthy, J. 1990. Formalizing Common Sense, eds. V. Lifschitz and J. McCarthy. Norwood, New Jersey: Ablex

22. Davis, E. 1990. Representations of Common-sense Knowledge. San Francisco: Morgan Kaufmann

23. Yager, R. R. 1991. Deductive approximate reasoning systems. IEEE Transactions on Knowledge and Data Engineering 3(4): 399–414

24. Sun, R. 1994. Integrating Rules and Connectionism for Robust Commonsense Reasoning. New York: Wiley

25. Dubois, D. and Prade, H. 1996. Approximate and commonsense reasoning: From theory to practice. In Proceedings of the Foundations of Intelligent Systems. Ninth International Symposium, pp. 19–33. Berlin: Springer

26. Lehnert, W. G. 1978. The Process of Question Answering – A Computer Simulation of Cognition. Hillsdale, New Jersey: Lawrence Erlbaum

27. Zadeh, L. A. 1986. Outline of a computational approach to meaning and knowledge representation based on the concept of a generalized assignment statement. In Proceedings of the International Seminar on Artificial Intelligence and Man-Machine Systems, eds. M. Thoma and A. Wyner, pp. 198–211. Heidelberg: Springer

28. Shafer, G. 1976. A Mathematical Theory of Evidence. Princeton, New Jersey: Princeton University Press

29. Zadeh, L. A. 2002. Toward a perception-based theory of probabilistic reasoning with imprecise probabilities. Journal of Statistical Planning and Inference 105(1): 233–264

30. Pedrycz, W. and F. Gomide. 1998. Introduction to Fuzzy Sets. Cambridge, MA: MIT.

31. Klement, P., Mesiar, R., and Pap, E. 2000. Triangular norms – Basic properties and representation theorems. In Discovering the World with Fuzzy Logic: Studies in Fuzziness and Soft Computing (57), eds. V. Novak and I. Perfilieva, pp. 63–80. Heidelberg: Physica-Verlag

32. Zadeh, L. A. 1984. Syllogistic reasoning in fuzzy logic and its application to reasoning with dispositions. In Proceedings of the International Symposium on Multiple-Valued Logic, pp. 148–153. Los Alamitos, CA: IEEE Computer Society

33. Mesiar, R. and H. Thiele. 2000. On T-quantifiers and S-quantifiers. In Discovering the World with Fuzzy Logic: Studies in Fuzziness and Soft Computing (57), eds. V. Novak and I. Perfilieva, pp. 310–318. Heidelberg: Physica-Verlag

34. Cresswell, M. J. 1973. Logic and Languages. London: Methuen

35. Smith, B. and C. Welty. 2002. What is ontology? Ontology: Towards a new synthesis. In Proceedings of the Second International Conference on Formal Ontology in Information Systems. New York: Association for Computing Machinery

36. Smith, M. K., C. Welty, and D. McGuinness, eds. 2003. OWL Web Ontology Language Guide. W3C Working Draft 31. Cambridge, MA: World Wide Web Consortium (W3C)


37. Corcho, O., Fernandez-Lopez, M., and Gomez-Perez, A. 2003. Methodologies, tools and languages for building ontologies. Where is their meeting point? Data and Knowledge Engineering 46(1): 41–64

38. Sowa, J. F. 1984. Conceptual Structures: Information Processing in Mind and Machine. Reading, MA: Addison-Wesley

39. Partee, B. 1976. Montague Grammar. New York: Academic

40. Lenat, D. B. 1995. CYC: A large-scale investment in knowledge infrastructure. Communications of the ACM 38(11): 32–38

41. Fellbaum, C., ed. 1998. WordNet: An Electronic Lexical Database. Cambridge, MA: MIT

42. Liu, H. and Singh, P. 2004. Commonsense reasoning in and over natural language. In Proceedings of the Eighth International Conference on Knowledge-Based Intelligent Information and Engineering Systems. Brighton, U.K.: KES Secretariat, Knowledge Transfer Partnership Centre

43. Gamat, T. F. 1996. Language, Logic and Linguistics. Chicago: University of Chicago Press

44. Gerla, G. 2000. Fuzzy metalogic for crisp logics. In Discovering the World with Fuzzy Logic: Studies in Fuzziness and Soft Computing (57), eds. V. Novak and I. Perfilieva, pp. 175–187. Heidelberg: Physica-Verlag

45. Hajek, P. 2000. Many. In Discovering the World with Fuzzy Logic: Studies in Fuzziness and Soft Computing (57), eds. V. Novak and I. Perfilieva, pp. 302–304. Heidelberg: Physica-Verlag

46. Klir, G. J. 2000. Uncertainty-based information: A critical review. In Discovering the World with Fuzzy Logic: Studies in Fuzziness and Soft Computing (57), eds. V. Novak and I. Perfilieva, pp. 29–50. Heidelberg: Physica-Verlag

47. Zadeh, L. A. 2001. A new direction in AI – Toward a computational theory of perceptions. AI Magazine 22(1): 73–84

48. Lehmke, S. 2000. Degrees of truth and degrees of validity. In Discovering the World with Fuzzy Logic: Studies in Fuzziness and Soft Computing (57), eds. V. Novak and I. Perfilieva, pp. 192–232. Heidelberg: Physica-Verlag

49. Novak, V., and I. Perfilieva, eds. 2000. Discovering the World with Fuzzy Logic: Studies in Fuzziness and Soft Computing. Heidelberg: Physica-Verlag

Appendix

The Tall Swedes Problem (Version 2)

In the following, a* denotes "approximately a." Swedes more than 20 years of age range in height from 140 to 220 cm. Over 70*% are taller than 170* cm; less than 10*% are shorter than 150* cm; and less than 15*% are taller than 200* cm. What is the average height of Swedes over 20?

Fuzzy Logic Solution

Consider a population of Swedes over 20, S = {Swede1, Swede2, …, SwedeN}, with hi, i = 1, …, N, being the height of Swedei.

The datum "Over 70*% of S are taller than 170* cm" constrains the hi in h = (h1, …, hN). The constraint is precisiated through translation into GCL. More specifically, let X denote a variable taking values in S, and let X|(h(X) is ≥ 170*) denote a fuzzy subset of S induced by the constraint h(X) is ≥ 170*. Then

Over 70*% of S are taller than 170* → (1/N) ΣCount(X | h(X) is ≥ 170*) is ≥ 0.7*

where the arrow denotes translation into GCL and ΣCount is the sigma-count of Xs that satisfy the fuzzy constraint h(X) is ≥ 170*. Similarly,

Less than 10*% of S are shorter than 150* → (1/N) ΣCount(X | h(X) is ≤ 150*) is ≤ 0.1*

and

Less than 15*% of S are taller than 200* → (1/N) ΣCount(X | h(X) is ≥ 200*) is ≤ 0.15*

A general deduction rule in fuzzy logic is the following. In this rule, X is a variable that takes values in a finite set U = {u1, u2, …, uN}, and a(X) is a real-valued attribute of X, with ai = a(ui) and a = (a1, …, aN):

(1/N) ΣCount(X | a(X) is C) is B
Av(X) is ?D

where Av(X) is the average value of X over U. Thus, computation of the average value, D, reduces to the solution of the nonlinear programming problem

µD(v) = max over a of µB((1/N) Σi µC(ai))

subject to

v = (1/N) Σi ai   (average value)

where µD, µB, and µC are the membership functions of D, B, and C, respectively. To apply this rule to the constraints in question, it is necessary to form their conjunction. Then, the fuzzy logic solution of the problem may be reduced to the solution of the nonlinear programming problem

µD(v) = max over h of [ µ≥0.7*((1/N) Σi µ≥170*(hi)) ∧ µ≤0.1*((1/N) Σi µ≤150*(hi)) ∧ µ≤0.15*((1/N) Σi µ≥200*(hi)) ]

subject to

v = (1/N) Σi hi

Note that computation of D requires calibration of the membership functions of ≥ 170*, ≥ 0.7*, ≤ 150*, ≤ 0.1*, ≥ 200*, and ≤ 0.15*.
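The sigma-count precisiation of the first datum can be checked numerically. In this sketch the calibrations of ≥ 170* and ≥ 0.7*, and the sample height vector, are all illustrative assumptions:

```python
def ramp(x, lo, hi):
    """Piecewise-linear membership rising from 0 at lo to 1 at hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)

mu_ge_170 = lambda h: ramp(h, 165, 175)   # assumed calibration of ">= 170*"
mu_ge_0_7 = lambda r: ramp(r, 0.6, 0.8)   # assumed calibration of ">= 0.7*"

def sigma_count(mu, xs):
    """Sigma-count: sum of membership values over the population."""
    return sum(mu(x) for x in xs)

# Illustrative sample of N = 10 heights (cm).
heights = [150, 160, 168, 172, 175, 180, 185, 190, 195, 200]

r = sigma_count(mu_ge_170, heights) / len(heights)  # relative sigma-count
print(round(r, 2))             # proportion "taller than 170*"
print(round(mu_ge_0_7(r), 2))  # degree to which "over 70*%" is satisfied
```

The constraint on the average height would then be obtained by maximizing the conjunction of the three calibrated constraints over h, as in the nonlinear program above.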