PREFACE

Linear Algebra plays an important role in Mathematics, Physics, and Engineering due to its wide-ranging applications. The aim of this textbook is to give a rigorous and thorough analysis of various aspects of linear algebra, together with applications. Also, the present book has been designed in a lucid and coherent manner so that the Honours and Postgraduate students of various Universities may reap considerable benefit out of it. I have chosen the topics with great care and have tried to present them systematically with various examples.

The author expresses his sincere gratitude to his teacher Prof. S. Das, Department of Mathematics, R. K. Mission Residential College, Narendrapur, India, who taught him this course at the UG level. The author is thankful to his friends and colleagues, especially Dr. S. Bandyopadhyay, Mr. Utpal Samanta and Mr. Arup Mukhopadhyay of Bankura Christian College, and Dr. Jayanta Majumdar and Pratikhan Mandal of Durgapur Govt. College, for their great help and valuable suggestions in the preparation of the book. The author also extends his thanks to Prof. (Dr.) Madhumangal Pal, Dept. of Applied Mathematics, Vidyasagar University, for his encouragement and handy suggestions.

This book could not have been completed without the loving support and encouragement of my parents, wife (Mousumi) and son (Bubai). I extend my thanks to other well-wishers, relatives and students for enabling me to sustain enthusiasm for this book. Finally, I express my gratitude to Books and Allied (P) Ltd., especially Amit Ganguly, for bringing out this book.

I would like to thank Dr. Sk. Md. Abu Nayeem of Aliah University, West Bengal, and my student Mr. Buddhadeb Roy for their support in writing/typing the LaTeX version.

Critical evaluation, suggestions and comments for further improvement of the book will be appreciated and gratefully acknowledged.

Prasun Kumar Nayak (nayak prasun@rediffmail.com)

Bankura Christian College, Bankura, India, 722 101.


Dedicated to my parents, Sankar Nath Nayak and Mrs. Indrani Nayak, for their continuous encouragement and support.


Contents

1 Theory of Sets 1
1.1 Sets 1
1.1.1 Description of Sets 2
1.1.2 Types of Sets 3
1.2 Algebraic Operation on Sets 6
1.2.1 Union of Sets 6
1.2.2 Intersection of Sets 7
1.2.3 Disjoint Sets 8
1.2.4 Complement of a Set 9
1.2.5 Difference 10
1.2.6 Symmetric Difference 11
1.3 Duality and Algebra of Sets 11
1.4 Cartesian Product of Sets 15
1.5 Cardinal Numbers 18
1.6 Relation 20
1.6.1 Equivalence Relation 22
1.7 Equivalence Class 30
1.7.1 Partitions 31
1.8 Poset 34
1.8.1 Dual Order 35
1.8.2 Chain 36
1.8.3 Universal Bounds 36
1.8.4 Covering Relation 37
1.8.5 Maximal and Minimal Elements 38
1.8.6 Supremum and Infimum 40
1.9 Lattices 42
1.9.1 Lattice Algebra 44
1.9.2 Sublattices 45
1.9.3 Bounded Lattices 45
1.9.4 Distributive Lattices 45
1.9.5 Trivially Complement 46
1.10 Mapping 47
1.10.1 Types of Functions 48
1.10.2 Composite Mapping 57
1.11 Permutation 63
1.11.1 Equal Permutations 63
1.11.2 Identity Permutation 63
1.11.3 Product of Permutations 63
1.11.4 Inverse of Permutations 64
1.11.5 Cyclic Permutation 65
1.12 Enumerable Set 67

2 Theory of Numbers 83
2.1 Number System 83
2.1.1 Non-positional Number System 83
2.1.2 Positional Number System 84
2.2 Natural Number 85
2.2.1 Basic Properties 85
2.2.2 Well Ordering Principle 85
2.2.3 Mathematical Induction 86
2.3 Integers 89
2.3.1 Divisibility 90
2.3.2 Division Algorithm 92
2.4 Common Divisor 94
2.4.1 Greatest Common Divisor 95
2.5 Common Multiple 97
2.5.1 Lowest Common Multiple 97
2.6 Diophantine Equations 99
2.6.1 Linear Diophantine Equations 99
2.7 Prime Numbers 102
2.7.1 Relatively Prime Numbers 103
2.7.2 Fundamental Theorem of Arithmetic 108
2.8 Modular/Congruence System 111
2.8.1 Elementary Properties 111
2.8.2 Complete Set of Residues 117
2.8.3 Reduced Residue System 121
2.8.4 Linear Congruences 122
2.8.5 Simultaneous Linear Congruences 125
2.8.6 Inverse of a Modulo m 130
2.9 Fermat's Theorem 131
2.9.1 Wilson's Theorem 133
2.10 Arithmetic Functions 136
2.10.1 Euler's Phi Function 136
2.10.2 The Möbius Function 141
2.10.3 Divisor Function 143
2.10.4 Floor and Ceiling Functions 144
2.10.5 Mod Function 144

3 Theory of Matrices 149
3.1 Matrix 149
3.1.1 Special Matrices 149
3.1.2 Square Matrix 150
3.2 Matrix Operations 153
3.2.1 Equality of Matrices 153
3.2.2 Matrix Addition 153
3.2.3 Matrix Multiplication 154
3.2.4 Transpose of a Matrix 160
3.3 Few Matrices 161
3.3.1 Nilpotent Matrix 161
3.3.2 Idempotent Matrix 162
3.3.3 Involutory Matrix 163
3.3.4 Periodic Matrix 163
3.3.5 Symmetric Matrices 163
3.3.6 Skew-symmetric Matrices 164
3.3.7 Normal Matrix 165
3.4 Determinants 166
3.4.1 Product of Determinants 171
3.4.2 Minors and Co-factors 181
3.4.3 Adjoint and Reciprocal of Determinant 183
3.4.4 Symmetric and Skew-symmetric Determinants 184
3.4.5 Vandermonde's Determinant 186
3.4.6 Cramer's Rule 186
3.5 Complex Matrices 189
3.5.1 Transpose Conjugate of a Matrix 190
3.5.2 Hermitian Matrix 190
3.5.3 Skew-Hermitian Matrix 191
3.5.4 Unitary Matrix 192
3.5.5 Normal Matrix 192
3.6 Adjoint of a Matrix 192
3.6.1 Reciprocal of a Matrix 195
3.6.2 Inverse of a Matrix 195
3.6.3 Singular Value Decomposition 201
3.7 Orthogonal Matrix 202
3.8 Submatrix 205
3.9 Partitioned Matrix 206
3.9.1 Square Block Matrices 206
3.9.2 Block Diagonal Matrices 207
3.9.3 Block Addition 208
3.9.4 Block Multiplication 208
3.9.5 Inversion of a Matrix by Partitioning 209
3.10 Rank of a Matrix 211
3.10.1 Elementary Operation 212
3.10.2 Row-reduced Echelon Matrix 213
3.11 Elementary Matrices 216
3.11.1 Equivalent Matrices 218
3.11.2 Congruent Matrices 220
3.11.3 Similar Matrices 220

4 Vector Space 235
4.1 Vector Space 235
4.1.1 Vector Subspaces 240
4.2 Linear Sum 246
4.2.1 Smallest Subspace 247
4.2.2 Direct Sum 247
4.3 Quotient Space 249
4.4 Linear Combination of Vectors 251
4.4.1 Linear Span 252
4.4.2 Linear Dependence and Independence 257
4.5 Basis and Dimension 262
4.6 Co-ordinatisation of Vectors 277
4.6.1 Ordered Basis 278
4.6.2 Co-ordinates 278
4.7 Rank of a Matrix 279
4.7.1 Row Space of a Matrix 279
4.7.2 Column Space of a Matrix 281
4.8 Isomorphic 283

5 Linear Transformations 293
5.1 Linear Transformations 293
5.1.1 Kernel of Linear Mapping 297
5.1.2 Image of Linear Mapping 300
5.2 Isomorphism 311
5.3 Vector Space of Linear Transformation 316
5.3.1 Product of Linear Mappings 318
5.3.2 Invertible Mapping 321
5.4 Singular and Non-singular Transformation 324
5.5 Linear Operator 326
5.6 Matrix Representation of Linear Transformation 327
5.7 Orthogonal Linear Transformation 341
5.8 Linear Functional 344
5.8.1 Dual Space 345
5.8.2 Second Dual Space 349
5.8.3 Annihilators 350
5.9 Transpose of a Linear Mapping 354

6 Inner Product Space 365
6.1 Inner Product Space 365
6.1.1 Euclidean Spaces 365
6.1.2 Unitary Space 366
6.2 Norm 369
6.3 Orthogonality 374
6.3.1 Orthonormal Set 375
6.3.2 Orthogonal Complement 376
6.3.3 Direct Sum 377
6.4 Projection of a Vector 380

7 Matrix Eigenfunctions 395
7.1 Matrix Polynomial 395
7.1.1 Polynomials of Matrices 395
7.1.2 Matrices and Linear Operator 396
7.2 Characteristic Polynomial 396
7.2.1 Eigen Value 398
7.2.2 Eigen Vector 398
7.2.3 Eigen Space 410
7.3 Diagonalization 413
7.3.1 Orthogonal Diagonalisation 417
7.4 Minimal Polynomial 420
7.5 Bilinear Forms 425
7.5.1 Real Quadratic Forms 425
7.6 Canonical Form 427
7.6.1 Jordan Canonical Form 435
7.7 Functions of Matrix 439
7.7.1 Powers of a Matrix 440
7.7.2 Roots of a Matrix 441
7.8 Series 443
7.8.1 Exponential of a Matrix 443
7.8.2 Logarithm of a Matrix 445
7.9 Hyperbolic and Trigonometric Functions 446

8 Boolean Algebra 455
8.1 Operation 455
8.1.1 Unary Operation 455
8.1.2 Binary Operation 455
8.2 Boolean Algebra 455
8.2.1 Boolean Algebra as a Lattice 455
8.2.2 Boolean Algebra as an Algebraic System 456
8.2.3 Boolean Algebra Rules 460
8.2.4 Duality 461
8.2.5 Partial Order Relation 466
8.3 Boolean Function 467
8.3.1 Constant 467
8.3.2 Literal 467
8.3.3 Variable 468
8.3.4 Monomial 468
8.3.5 Polynomial 468
8.3.6 Factor 468
8.3.7 Boolean Function 468
8.4 Truth Table 469
8.5 Disjunctive Normal Form 470
8.5.1 Complete DNF 470
8.6 Conjunctive Normal Form 472
8.6.1 Complete CNF 472
8.7 Switching Circuit 475


Chapter 1

Theory of Sets

George Cantor gave an intuitive definition of sets in 1895. Sets are the building blocks of various discrete structures, and the theory of sets is one of the most important tools of mathematics. The main aim of this chapter is to discuss some properties of sets.

1.1 Sets

A well-defined collection of distinct objects is called a set. Each object is known as an element or member of the set. The following are some examples of sets.

(i) All integers.

(ii) The positive rational numbers less than or equal to 5.

(iii) The planets in the solar system.

(iv) Indian rivers.

(v) 4th semester B.Sc. students of Burdwan University.

(vi) The people in a particular locality.

(vii) Cricketers in the world.

By the term 'well defined', we mean that we are given a collection of objects with certain definite property or properties, given in such a way that we can clearly distinguish whether a given object is in our collection or not. The following collections are not examples of sets.

(i) Good students of a class, because 'good' is not a well-defined word; a student may be good in the eyes of one person but not in the eyes of another.

(ii) Tall students, because 'tall' is not a well-defined measurement.

(iii) Girls and boys of a particular locality, because there is no sharp boundary of age by which a girl or boy can be identified with certainty.

Collections of this type are designated as fuzzy sets. The elements of a set must be distinct and distinguishable. By 'distinct', we mean that no element is repeated, and by 'distinguishable', we mean that there is no doubt whether an element is in the set or not.

(i) The standard mathematical symbols used to represent sets are upper-case letters like A, B, X, etc., and the elements of a set are written in lower-case letters like a, b, p, q, x, y, etc.

(ii) If an element x is a member of a set A, we write x ∈ A, read as 'x belongs to A' or 'x is an element of A' or 'x is in A'. The symbol ∈ comes from the Greek letter epsilon. On the other hand, if x is not an element of A, then we write x ∉ A, which is read as 'x does not belong to A' or 'x is not an element of the set A'.

(iii) If A is a set and a is any object, it should be possible to decide whether a ∈ A or a ∉ A. Only then is A termed well-defined.

For example, A = {1, 2, 3, 4, 5} is a set having elements 1, 2, 3, 4, 5. Here 1 ∈ A, but 6 ∉ A. Note that S = {1, 1, 3} is not a set, since its elements are not distinct.

1.1.1 Description of Sets

As a set is determined by its elements, we have to specify the elements of a set A in order to define A. Five common methods are used to describe sets: (i) the roster (or list, enumeration, or tabular) method, (ii) the selector (or rule, or set-builder) method, (iii) the characteristic method, (iv) the diagrammatic method, and (v) the recursion method.

(i) Roster method

In this method, all elements are listed explicitly, separated by commas, and enclosed within braces { }. Sometimes parentheses ( ) or square brackets [ ] may also be used.

A set is defined here by naming all its members, so this method can be used only for finite sets. A set X whose elements are x1, x2, · · · , xn is usually written as X = {x1, x2, · · · , xn}. For example, the set of all natural numbers less than 5 can be represented as A = {1, 2, 3, 4}.

Sometimes it is not humanly possible to enumerate all elements, but after knowing some initial elements one can guess the others. In this case dots are used at the end within the braces. For example, the set of positive integers can be written as A = {1, 2, 3, 4, · · · }, the set of all integers as B = {. . . , −2, −1, 0, 1, 2, . . .}, etc.

It may be noted that the elements of a set can be written in any order, but the name of an element is listed only once. For example, {2, 3, 4}, {4, 3, 2} and {2, 4, 3} all represent the same set. Thus, while we describe a set in this manner, the order of the elements is not important.

(ii) Set-builder method

In this method, a set is specified by stating one or more properties that are uniquely satisfied by its elements. A set in this method is written as

A = {x : P1(x), P2(x), etc.}, (1.1)

i.e., x ∈ A if x satisfies the properties P1(x), P2(x), etc. The symbol ':' is read as 'such that'; it is also denoted by '|' or '/'. For example,

A = {x : x is a positive even integer},
B = {x : x is a vowel of the English alphabet},
C = {x : x is an integer and 1 ≤ x ≤ 10}, etc.

It is required that the property P be such that for any given x ∈ U, the universal set, the proposition P(x) is either true or false.

(iii) The characteristic method

A set can also be defined by a function, usually called a characteristic function, that declares which elements of U are members of the set and which are not. Let U = {u1, u2, . . . , un} be the universal set and A ⊆ U. Then the characteristic function of A is defined as

χA(ui) = 1 if ui ∈ A, and χA(ui) = 0 if ui ∉ A, (1.2)

i.e., the characteristic function maps elements of U to elements of the set {0, 1}, which is formally expressed by χA : U → {0, 1}. For example, let U = {1, 2, 3, . . . , 10} and A = {2, 4, 6, 8, 10}; then χA(2) = 1, χA(4) = 1, χA(6) = 1, χA(8) = 1, χA(10) = 1, and χA(a) = 0 for all other elements a of U. It may be observed that χA is an onto function but not one-one.
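In code, a characteristic function is just a 0–1 valued membership test. The following is a minimal Python sketch (the helper name chi and the sample sets are ours, for illustration only); it reproduces the example above.

```python
def chi(A):
    """Return the characteristic function chi_A : U -> {0, 1} of the set A."""
    return lambda u: 1 if u in A else 0

U = set(range(1, 11))        # universal set {1, 2, ..., 10}
A = {2, 4, 6, 8, 10}
chi_A = chi(A)

print([u for u in sorted(U) if chi_A(u) == 1])  # [2, 4, 6, 8, 10]
print(chi_A(3))                                 # 0: 3 is not in A
```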

(iv) Diagrammatic method

A set can be represented diagrammatically by closed figures like circles, triangles, rectangles, etc. The points in the interior of the figure represent the elements of the set. Such a representation is called a Venn diagram or Venn–Euler diagram, after the British mathematician Venn. In this diagram the universal set U is represented by the interior of a rectangle, and each subset of U is represented by a circle inside the rectangle.

If two sets are equal, they are represented by the same circle. If the sets A and B are disjoint, the circles for A and B are drawn in such a way that they have no common area; if the sets A and B intersect, the circles are drawn with a common area. If A ⊆ B, then the circle for A is drawn fully inside the circle for B. This visual representation helps us to prove set identities very easily.

(v) Recursion method

A set can be described by giving one or more elements of the set and a rule for generating the remaining elements. The underlying process is called recursion. For example, (a) the set A = {1, 4, 7, · · · } can be described as A = {an : a0 = 1, an+1 = an + 3}; (b) F = {Fn : F0 = 0, F1 = 1, Fn = Fn−1 + Fn−2} is a set described by recursion. This set is called the set of Fibonacci numbers.
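Both recursively described sets are easy to generate in a program. A small Python sketch (the function names are ours; a cut-off n is needed because the sets themselves are infinite):

```python
def arithmetic_set(n):
    """First n elements of A = {a0 = 1, a(k+1) = ak + 3} = {1, 4, 7, ...}."""
    a, elems = 1, []
    for _ in range(n):
        elems.append(a)
        a += 3
    return elems

def fibonacci_set(n):
    """First n Fibonacci numbers: F0 = 0, F1 = 1, Fk = F(k-1) + F(k-2)."""
    f, g, elems = 0, 1, []
    for _ in range(n):
        elems.append(f)
        f, g = g, f + g
    return elems

print(arithmetic_set(5))  # [1, 4, 7, 10, 13]
print(fibonacci_set(8))   # [0, 1, 1, 2, 3, 5, 8, 13]
```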

Some standard sets and their notations

Some sets are frequently used in mathematical analysis or in algebraic structures; they are stated below.

N → The set of all natural numbers.
Z → The set of all integers.
Z+ → The set of all positive integers.
Q → The set of all rational numbers.
Q+ → The set of all positive rational numbers.
R → The set of all real numbers.
R+ → The set of all positive real numbers.
C → The set of all complex numbers.

1.1.2 Types of Sets

Null set

A set which contains no element is called a null set or empty set or void set and is denoted by the Greek letter φ (read as 'phi'). In the roster method, it is denoted by { }. For example,

A = {x : x² + 4 = 0 and x ∈ R}

is a null set. To describe the null set, we can use any property which is not true for any element. It may be noted that the sets {φ} and {0} are not null sets.

A set which is not a null set is called a non-empty set.

Singleton set

A set consisting of only a single element is called a singleton or unit set. For example, A = {0}, B = {x : 1 < x < 3, x is an integer}, and the solution set C = {x : x − 2 = 0} = {2}, etc., are examples of singleton sets. Note that {0} is not a null set; since it contains 0 as its member, it is a singleton set.


Finite and infinite sets

A set containing a finite number of elements is called a finite set; otherwise it is called an infinite set, i.e., a set which does not contain a definite number of elements is infinite.

(i) A = {x : x is a consonant of the English alphabet},

(ii) B = {x : 1 < x < 15, x is an integer}

are examples of finite sets, whereas

(i) A = {x : x is a rational number},

(ii) B = {x : x is a straight line in space}

are examples of infinite sets. For a finite set, the process of counting the distinct elements comes to an end; for an infinite set it does not. We write |A| to denote the number of elements of a finite set A.

Indexed set

A set whose elements are themselves sets is often referred to as a family of sets. Let us consider a family of n sets A1, A2, . . . , An in the form F = {Aα | α ∈ I}, where Aα corresponds to an element α in the set I. Here I is said to be an indexing set and α is called the set index. In general, if I is an arbitrary set, then F = {Aα | α ∈ I} is an arbitrary collection of sets Aα indexed by I.

Set of sets

If the elements of a set are themselves some other sets, then this set is known as a family of sets, or a set of sets. For example, A = {{2, 3}, {5, 6, 8}, {z}, R+} is a set of sets.

Subset and superset

Let A and B be two given sets. The set B is said to be a subset of A if

x ∈ B =⇒ x ∈ A, (1.3)

i.e., every element of B is an element of A. This is very often denoted by B ⊆ A (read as 'B is contained or included in A'). This is called set inclusion. For example,

(i) The set of all integers (Z) is a subset of the set of all rational numbers (Q).

(ii) A = {a, b, c, d}, B = {a, b, c, d, e, f}. Here each element of A is also an element of B, thus A ⊆ B.

(iii) A = {1, 5, 7} and B = {1, 5, 7}. Here A ⊆ B and B ⊆ A.

(iv) φ is a subset of every set.

(v) The subsets of A = {2, 3, 4} are φ, {2}, {3}, {4}, {2, 3}, {2, 4}, {3, 4} and {2, 3, 4}.

(vi) A = {2, 3, 5}, B = {2, 3, 6}. Then A ⊈ B and B ⊈ A, because 5 ∈ A but 5 ∉ B, and 6 ∈ B but 6 ∉ A.

If B ⊆ A, then A is called a superset of B, which is read as 'A is a superset of B'.


Proper subset

The set A is called a proper subset of B if every element of A is a member of B and there is at least one element in B which is not in A. It is written as A ⊂ B. Correspondingly, B is a proper subset of A if

(i) x ∈ B =⇒ x ∈ A,

(ii) ∃ y ∈ A such that y ∉ B.

In this case, B ⊆ A and A ≠ B, and B is said to be a proper subset of A, denoted by B ⊂ A. If B is a proper subset of A (i.e., B ⊂ A), then A is called a superset of B. For example,

(i) {1, 2} is a proper subset of {1, 2, 3, 4}.

(ii) The set of vowels is a proper subset of the set of letters of the English alphabet.

(iii) N (set of natural numbers) is a proper subset of Z (set of integers).

Note the following:

(i) If there exists even a single element in A which is not in B, then A is not a subset of B and we write A ⊈ B. For example, {1, 2} ⊈ {2, 4, 6, 8, 9}.

(ii) If A ⊆ B or B ⊆ A, then the sets A and B are said to be comparable. For example, if A = {1, 2} and B = {5, 6, 7}, then A ⊄ B and B ⊄ A, so these sets are not comparable.

(iii) Every set is a subset of itself and every set is a subset of the universal set.

(iv) φ has no proper subset. Also, A ⊆ φ⇒ A = φ.

(v) For any set A, A ⊆ A. This is known as the reflexive law of inclusion.

(vi) If A ⊆ B and B ⊆ C, then A ⊆ C. This is known as transitive law of inclusion.

In a Venn diagram, the universal set is usually represented by a rectangular region and its subsets by closed bounded regions inside the rectangular region.

Equality of sets

If A ⊆ B and B ⊆ A, then A and B contain the same members. Two sets A and B are said to be equal if every element of A is an element of B and also every element of B is an element of A, that is, A ⊆ B and B ⊆ A. The equality of two sets is denoted by A = B. Conversely, if A = B then A ⊆ B and B ⊆ A must be satisfied. For example, A = {1, 4, 9} and B = {4, 9, 1} are equal sets. To indicate that A and B are not equal, we write A ≠ B.

Theorem 1.1.1 The null set is a subset of every set.

Proof: Let A be an arbitrary set. Then, in order to show that φ ⊆ A, we must show that there is no element of φ which is not contained in A. Since φ contains no element at all, no such element can be found. Hence, φ ⊆ A.

Theorem 1.1.2 The number of subsets of a given set containing n elements is 2^n.

Proof: Let A be an arbitrary set containing n elements. Then, one of its subsets is the empty set. Apart from this,

(i) the number of subsets of A, each containing 1 element, is C(n, 1);

(ii) the number of subsets of A, each containing 2 elements, is C(n, 2);

(iii) the number of subsets of A, each containing 3 elements, is C(n, 3);

...

(iv) the number of subsets of A, each containing n elements, is C(n, n).

Therefore, the total number of subsets of A is

1 + C(n, 1) + C(n, 2) + · · · + C(n, n) = (1 + 1)^n = 2^n,

by the binomial theorem. The number of proper subsets of a set with n elements is 2^n − 1.

Power set

A set formed by all the subsets of a given non-empty set S is called the power set of S and is denoted by P(S). If S = {a, b, c}, then

P(S) = {φ, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}.

Note that P(φ) = {φ}. The power set of any given set is always non-empty. The family of all subsets of P(A) is called a second order power set of A and is denoted by P²(A), which stands for P(P(A)). Similarly, higher order power sets P³(A), P⁴(A), . . . are defined.

The order of a set A is defined as the number of elements of A and is denoted by O(A). From the above property, it is observed that the number of elements of the power set P(A) is 2^n if A contains n elements. For example, if A = {1, 2, −1}, then O(A) = 3 and O(P(A)) = 8. In general, if O(A) = n, then O(P(A)) = 2^n and O(P²(A)) = 2^(2^n).
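The count 2^n of Theorem 1.1.2 and the orders O(P(A)) and O(P²(A)) above can be verified by brute-force enumeration. A minimal Python sketch (the helper power_set is ours, built on the standard itertools module):

```python
from itertools import combinations

def power_set(S):
    """Return P(S) as a list of frozensets, one for each subset size r."""
    S = list(S)
    return [frozenset(c) for r in range(len(S) + 1)
            for c in combinations(S, r)]

A = {1, 2, -1}
P = power_set(A)
print(len(P))             # 8 = 2**3, so O(P(A)) = 2**O(A)
print(len(power_set(P)))  # 256 = 2**8 = 2**(2**3), i.e. O(P^2(A)) = 2**(2**n)
```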

Universal sets

In the theory of sets it is observed that all the sets under consideration are subsets of a certain set. This set is called the universal set, and it is usually denoted by U or S. Conversely, the universal set is a superset of every set. For example, the set of real numbers R is the universal set for the set of integers Z and the set of rational numbers Q. Again, the set of integers Z is the universal set for the set of even integers, the set of positive integers, etc.

This is the set of all possible elements that are relevant and considered under a particular context or application, from which sets can be formed. The set U is not unique, and it is a superset of each of the given sets. In a Venn diagram, the universal set is usually represented by a rectangular region.

1.2 Algebraic Operation on Sets

Like addition, multiplication and other operations on numbers in arithmetic, there are certain operations on sets, namely union, intersection, complementation, etc. In this section, we shall discuss several ways of combining different sets and develop some properties among them.

1.2.1 Union of Sets

Let A and B be two given subsets of a universal set U. The union (or join) of the two subsets A and B, denoted by A ∪ B, is defined by

A ∪ B = {x : x ∈ A or x ∈ B or both}, (1.4)


where the 'or' means 'and/or', i.e., the set contains the elements which belong either to A or to B or to both. The Venn diagram of Fig. 1.1 illustrates pictorially the meaning of ∪, where U is the rectangular area, and A and B are disks. Union is also known as the join or logical sum of A and B. Note that the common elements are to be taken only once. For example,


Figure 1.1: Venn diagram of A ∪B (shaded area)

(i) If A = {1, 3, 4, 5, a, b} and B = {a, b, c, 2, 3, 4, 6}, then A ∪ B = {1, 2, 3, 4, 5, 6, a, b, c}.

(ii) If A = [2, 5] and B = [1, 3], then A ∪ B = [1, 5] = {x : 1 ≤ x ≤ 5}.

From the Venn diagram we get the following properties of set union:

(i) Union is idempotent, i.e., A ∪ A = A,

(ii) Set union is associative, i.e., (A ∪B) ∪ C = A ∪ (B ∪ C),

(iii) A ∪ U = U : absorption by U. A ∪ φ = A : identity law.

(iv) Set union is commutative, i.e., A ∪B = B ∪A,

(v) A ⊆ A ∪B and B ⊆ A ∪B for sets A and B.

(vi) A ∪ U = U, where U is the universal set, and if A ⊆ B, then A ∪ B = B.

The union operation can be generalized to any number of sets. The union of the subsets A1, A2, . . . , An is given by

⋃_{i=1}^{n} Ai = A1 ∪ A2 ∪ . . . ∪ An = {x : x ∈ Ai, for some i = 1, 2, . . . , n},

and for a family of sets {Ai : i ∈ I} it is defined as

⋃_{i∈I} Ai = {x : x ∈ Ai, for some i ∈ I}.

1.2.2 Intersection of Sets

Let A and B be two given subsets of a universal set U. The intersection of A and B is denoted by A ∩ B and is defined by

A ∩ B = {x : x ∈ A and x ∈ B}. (1.5)

The intersection of the sets A and B is the set of all elements which are in both the sets A and B. The set A ∩ B is shown in Fig. 1.2. The intersection is also known as the meet; A ∩ B is read as 'A intersection B' or 'A meet B'. For example, (i) let A = {2, 5, 6, 8} and B = {2, 6, 8, 9, 10}; then A ∩ B = {2, 6, 8}; (ii) if A = [2, 5] and B = [1, 3], then A ∩ B = [2, 3] = {x : 2 ≤ x ≤ 3}. From the Venn diagram we get the following properties of intersection:


Figure 1.2: Venn diagram of A ∩B (shaded area)

(i) Intersection is idempotent, i.e., A ∩ A = A, which follows from A ⊆ A,

(ii) A ∩ B ⊆ A and A ∩ B ⊆ B; if A ⊆ B, then A ∩ B = A.

(iii) A ∩ φ = φ : absorption by φ. A ∩ U = A : identity law.

(iv) Set intersection is commutative, i.e., A ∩B = B ∩A.

(v) Set intersection is associative, i.e., (A ∩B) ∩ C = A ∩ (B ∩ C).

The intersection of the n subsets is given by

⋂_{i=1}^{n} Ai = A1 ∩ A2 ∩ . . . ∩ An = {x : x ∈ Ai, ∀ i}.
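In Python, the built-in set type provides these operations directly, and the generalized union and intersection are folds over a family of sets. An illustrative sketch, reusing the example sets of Section 1.2.1:

```python
from functools import reduce

A = {1, 3, 4, 5, 'a', 'b'}
B = {'a', 'b', 'c', 2, 3, 4, 6}
print(A | B)   # union A ∪ B = {1, 2, 3, 4, 5, 6, 'a', 'b', 'c'}
print(A & B)   # intersection A ∩ B = {3, 4, 'a', 'b'}

# Generalized union and intersection of a family A1, A2, A3
family = [{1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 5, 6}]
print(reduce(set.union, family))         # {1, 2, 3, 4, 5, 6}
print(reduce(set.intersection, family))  # {3, 4}
```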

Ex 1.2.1 For the two given sets A = {x : 2 cos² x + sin x ≤ 2} and B = {x : x ∈ [π/2, 3π/2]}, find A ∩ B. [KH 06]

Solution: The solution of the trigonometric relation 2 cos² x + sin x ≤ 2 is given by

2 cos² x + sin x ≤ 2 ⇒ sin x (1 − 2 sin x) ≤ 0
⇒ sin x ≤ 0 and 1 − 2 sin x ≥ 0, or sin x ≥ 0 and 1 − 2 sin x ≤ 0
⇒ sin x ≤ 0 or sin x ≥ 1/2.

If x ∈ [π/2, 3π/2], then the solutions of sin x ≤ 0 are given by π ≤ x ≤ 3π/2, and the solutions of sin x ≥ 1/2 are given by π/2 ≤ x ≤ 5π/6. Therefore,

A ∩ B = [π, 3π/2] ∪ [π/2, 5π/6].

1.2.3 Disjoint Sets

It is sometimes observed that the intersection of two non-empty sets produces a null set. In this case, no element is common to A and B, and these two sets are called disjoint or mutually exclusive sets. Thus two sets A and B are said to be disjoint if and only if A ∩ B = φ, i.e., they have no element in common. The Venn diagram of disjoint sets A and B is shown in Fig. 1.3. For example, (i) A = {1, 2, 3} and B = {6, 7, 9} are disjoint; (ii) if A = {x : x is an even integer} and B = {x : x is an odd integer}, then A ∩ B = φ, i.e., A and B are disjoint.

When A ∩ B ≠ φ, the sets A and B are said to be intersecting.

Note 1.2.1 The three relations B ⊂ A, A ∪ B = A and A ∩ B = B are mutually equivalent, i.e., each one implies the other two.


Figure 1.3: Disjoint sets A and B

1.2.4 Complement of a Set

Let A be a subset of a universal set U. Then the complement of the subset A, with respect to U, denoted by A′, Aᶜ, Ā or −A, is defined by

A′ = {x : x ∈ U but x ∉ A}, (1.6)

i.e., the set contains those elements which belong to the universal set U but are not elements of A. The Venn diagram of Aᶜ is shown in Fig. 1.4. Clearly, if A′ is the complement of A, then

Figure 1.4: Complement of A

A is the complement of A′. For example, if U = {1, 2, 3, 4, 5, 6, 7, 8, 9} and A = {1, 3, 5, 7, 9}, then A′ = {2, 4, 6, 8}. From the Venn diagram we have

(i) (A′)′ = A : involution property.

(ii) U ′ = φ ; φ′ = U .

(iii) If A ⊆ B then Bc ⊆ Ac and conversely, if Ac ⊆ Bc then B ⊆ A.

(iv) A ∪A′ = U : law of excluded middle. A ∩A′ = φ : law of contradiction.

(v) (a) (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ and (b) (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ : De Morgan's laws.

In particular, for a finite family of subsets F = {A1, A2, . . . , An}, De Morgan's laws can be written as

(⋃_{i=1}^{n} Ai)′ = ⋂_{i=1}^{n} A′i and (⋂_{i=1}^{n} Ai)′ = ⋃_{i=1}^{n} A′i.
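De Morgan's laws, including the generalized family form above, can be checked mechanically on a concrete example. A minimal Python sketch (the universal set and the family below are ours, chosen only for illustration):

```python
from functools import reduce

U = set(range(10))
family = [{0, 1, 2, 3}, {2, 3, 5, 7}, {1, 3, 5, 9}]

def comp(X):
    """Complement of X relative to the universal set U."""
    return U - X

# (union of the Ai)' = intersection of the Ai', and dually
assert comp(reduce(set.union, family)) == \
       reduce(set.intersection, [comp(X) for X in family])
assert comp(reduce(set.intersection, family)) == \
       reduce(set.union, [comp(X) for X in family])
print("De Morgan's laws hold for this family.")
```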

Ex 1.2.2 Prove that (A ∩ C) ∪ (B ∩ C ′) = φ⇒ A ∩B = φ.

Solution: The relation (A ∩ C) ∪ (B ∩ C′) = φ gives A ∩ C = φ and B ∩ C′ = φ. Now,

B ∩ C′ = φ ⇒ B ⊆ C.

Therefore, A ∩ B ⊆ A ∩ C = φ, i.e., A ∩ B = φ.


1.2.5 Difference

Let A, B be any two subsets of a universal set U. The difference of two subsets A and B of a universal set U is a subset of A, denoted by A − B or A \ B, and is defined by

A − B = {x : x ∈ A and x ∉ B}, (1.7)

i.e., the set consisting of those elements of A which are not elements of B. Also,

B − A = {x : x ∈ B and x ∉ A}. (1.8)

A − B is also called the relative complement of the set B with respect to the set A, i.e., the complement of B relative to A. The differences A − B and B − A are shown in


Figure 1.5: Set difference A−B and B −A

Fig. 1.5. A − B is read as 'A difference B' or 'A minus B'. For example, if A = {2, 4, 5, 8} and B = {2, 5, 7, 10}, then A − B = {4, 8} and B − A = {7, 10}. From the Venn diagram we have:

(i) A−A = φ, A− φ = A,

(ii) A−B ⊆ A, B −A ⊆ B, and A−B = A if A ∩B = φ.

(iii) Set difference is non-commutative, i.e., A−B 6= B −A,

(iv) A−B = φ when A ⊆ B.

(v) If A ∩ B = φ, then A − B = A and B − A = B,

(vi) A−B = A ∩B′.

(vii) (A−B)∪A = A, (A−B) ∪B = A ∪B and (A−B) ∩B = φ.

(viii) A−B = A if and only if A ∩B = φ.

(ix) A−B, A ∩B and B −A are mutually exclusive.

If the set A is the universal set, the complement is absolute, known as complementation, and is usually denoted by B̄.

Ex 1.2.3 For subsets A, B and C of a universal set U, show that A − (B − C) = (A − B) ∪ (A ∩ C).

Solution: Let x be any element of A − (B − C). Then, by definition,

x ∈ A − (B − C) ⇔ x ∈ A and x ∉ (B − C)
⇔ x ∈ A and (x ∉ B or x ∈ C)
⇔ (x ∈ A and x ∉ B) or (x ∈ A and x ∈ C)
⇔ x ∈ (A − B) or x ∈ (A ∩ C) ⇔ x ∈ (A − B) ∪ (A ∩ C).

Hence, A − (B − C) ⊆ (A − B) ∪ (A ∩ C) and (A − B) ∪ (A ∩ C) ⊆ A − (B − C); consequently, A − (B − C) = (A − B) ∪ (A ∩ C).


1.2.6 Symmetric Difference

Let A and B be two subsets of a universal set U. The symmetric difference of A and B, denoted by A∆B or A ⊕ B, is defined by

A∆B = {x : x ∈ A or x ∈ B, but x ∉ A ∩ B}
= {x : (x ∈ A and x ∉ B) or (x ∈ B and x ∉ A)}.

The set (A − B) ∪ (B − A) is also the symmetric difference of A and B. Thus,

A∆B = (A ∪ B) − (A ∩ B) = (A ∩ B′) ∪ (A′ ∩ B) = (A ∪ B) ∆ (A ∩ B).

The Venn diagram of A∆B is shown in Fig. 1.6. If A = {1, 2, 4, 7, 9} and B = {2, 3, 7, 8, 9},

Figure 1.6: Symmetric difference of A and B (shaded area)

then A∆B = {1, 3, 4, 8}. Note that

(A − B) ∩ (B − A) = (A ∩ B′) ∩ (B ∩ A′)
= A ∩ (B ∩ B′) ∩ A′
= (A ∩ φ) ∩ A′ = φ ∩ A′ = φ.

Therefore, A∆B can be considered as the union of the disjoint subsets A − B and B − A, provided A − B and B − A are both non-empty. From the definition, we have the following:

(i) Symmetric difference is commutative, i.e., A∆B = B∆A,

(ii) Symmetric difference is associative, i.e., (A∆B)∆C = A∆(B∆C),

(iii) A∆φ = A, for every subset A of U.

(iv) A∆A = φ, for every subset A of U.

(v) A∆B = φ iff A = B,

(vi) A ∩ (B∆C) = (A ∩ B)∆(A ∩ C) : distributive property.
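Python's set type provides the symmetric difference as the ^ operator, so the identities above are easy to test on examples; a short illustrative sketch using the sets of Fig. 1.6:

```python
A = {1, 2, 4, 7, 9}
B = {2, 3, 7, 8, 9}
C = {2, 9}   # an extra set, ours, for the distributive check

print(A ^ B)                             # {1, 3, 4, 8}, as computed above
assert A ^ B == (A - B) | (B - A)        # union of the two differences
assert A ^ B == (A | B) - (A & B)        # (A ∪ B) − (A ∩ B)
assert A ^ A == set()                    # A Δ A = φ
assert A & (B ^ C) == (A & B) ^ (A & C)  # distributive property (vi)
```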

1.3 Duality and Algebra of Sets

Principle of duality

Let E be an equation (or law) of set algebra (involving ∪, ∩, U and φ). If we replace ∪ by ∩, ∩ by ∪, U by φ and φ by U in E, then we obtain another law E∗, which is also a valid law. This is known as the principle of duality. For example,


(i) A ∪ (B ∩ C) = (A ∪B) ∩ (A ∪ C), its dual law is A ∩ (B ∪ C) = (A ∩B) ∪ (A ∩ C),

(ii) A ∩ A′ = φ, its dual is A ∪ A′ = U.

It is a fact of set algebra, called the principle of duality, that if any equation E is an identity, then its dual E∗ is also an identity.

Algebra of sets

Some commonly used laws of sets are stated below. Note that the law stated in (b) is the dual of the law in (a), and conversely.

1. Idempotent laws: (a) A ∩ A = A; (b) A ∪ A = A.

2. Identity laws: (i) (a) A ∪ φ = A; (b) A ∩ U = A. (ii) (a) A ∩ φ = φ; (b) A ∪ U = U.

3. Commutative laws: (a) A ∪ B = B ∪ A; (b) A ∩ B = B ∩ A.

4. Associative laws: (a) (A ∪ B) ∪ C = A ∪ (B ∪ C); (b) (A ∩ B) ∩ C = A ∩ (B ∩ C).

5. Distributive laws: (a) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C); (b) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).

6. Inverse laws: (a) A ∪ Aᶜ = U; (b) A ∩ Aᶜ = φ.

7. Domination laws: (a) A ∪ U = U; (b) A ∩ φ = φ.

8. Absorption laws: (a) A ∪ (A ∩ B) = A; (b) A ∩ (A ∪ B) = A.

9. De Morgan's laws: (a) (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ; (b) (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ.

Let S1 and S2 be two set expressions. The equality S1 = S2 means S1 ⇒ S2 as well as S2 ⇒ S1. These are the main algebraic laws on sets.

Property 1.3.1 Let A, B and C be any three sets. Then [WBUT 09]
(i) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C); (ii) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).

Proof: (i) Let x be any element of A ∪ (B ∩ C). Then,

x ∈ A ∪ (B ∩ C) ⇔ x ∈ A or x ∈ (B ∩ C)
⇔ x ∈ A or (x ∈ B and x ∈ C)
⇔ (x ∈ A or x ∈ B) and (x ∈ A or x ∈ C)
⇔ x ∈ (A ∪ B) and x ∈ (A ∪ C)
⇔ x ∈ (A ∪ B) ∩ (A ∪ C).

The symbol ⇔ stands for 'implies and is implied by'; it also stands for 'if and only if'. Hence A ∪ (B ∩ C) ⊆ (A ∪ B) ∩ (A ∪ C) and (A ∪ B) ∩ (A ∪ C) ⊆ A ∪ (B ∩ C). Hence

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).

(ii) Let x be any element of A ∩ (B ∪ C). Then,

x ∈ A ∩ (B ∪ C) ⇔ x ∈ A ∧ x ∈ (B ∪ C)
⇔ x ∈ A ∧ (x ∈ B ∨ x ∈ C)
⇔ (x ∈ A ∧ x ∈ B) ∨ (x ∈ A ∧ x ∈ C)
⇔ x ∈ (A ∩ B) ∨ x ∈ (A ∩ C)
⇔ x ∈ (A ∩ B) ∪ (A ∩ C).

Hence, A ∩ (B ∪ C) ⊆ (A ∩ B) ∪ (A ∩ C) and (A ∩ B) ∪ (A ∩ C) ⊆ A ∩ (B ∪ C). Hence,

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).

Property 1.3.2 Let A, B and C be any three sets. Then,

(i) (A ∪ B)′ = A′ ∩ B′.

(ii) (A ∩ B)′ = A′ ∪ B′.

(iii) A − (B ∪ C) = (A − B) ∩ (A − C).

(iv) A − (B ∩ C) = (A − B) ∪ (A − C).

Proof: (i) Let x be an arbitrary element of (A ∪ B)′. Then,

x ∈ (A ∪ B)′ ⇔ x ∉ (A ∪ B)
⇔ x ∉ A and x ∉ B
⇔ x ∈ A′ and x ∈ B′
⇔ x ∈ (A′ ∩ B′).

Hence (A ∪ B)′ ⊆ A′ ∩ B′ and A′ ∩ B′ ⊆ (A ∪ B)′. Therefore,

(A ∪ B)′ = A′ ∩ B′.

(ii) Similarly, (A ∩ B)′ = A′ ∪ B′.

(iii) Let x be an arbitrary element of A − (B ∪ C). Now,

x ∈ A − (B ∪ C) ⇔ x ∈ A and x ∉ (B ∪ C)
⇔ x ∈ A and (x ∉ B and x ∉ C)
⇔ (x ∈ A and x ∉ B) and (x ∈ A and x ∉ C)
⇔ x ∈ (A − B) and x ∈ (A − C)
⇔ x ∈ (A − B) ∩ (A − C).

Thus, A − (B ∪ C) ⊆ (A − B) ∩ (A − C) and (A − B) ∩ (A − C) ⊆ A − (B ∪ C), and so

A − (B ∪ C) = (A − B) ∩ (A − C).

(iv) Similarly, A − (B ∩ C) = (A − B) ∪ (A − C).

Ex 1.3.1 Prove that (A− C) ∩ (B − C) = (A ∩B)− C.

Solution: We shall use suitable laws of the algebra of sets. Using the results A − C = A ∩ C′ and B − C = B ∩ C′, we get

L.H.S. = (A − C) ∩ (B − C) = (A ∩ C′) ∩ (B ∩ C′)
= (A ∩ C′) ∩ (C′ ∩ B) = {(A ∩ C′) ∩ C′} ∩ B
= (A ∩ C′) ∩ B = (A ∩ B) ∩ C′
= (A ∩ B) − C = R.H.S. (Proved)

Ex 1.3.2 Show that (A − B) ∪ (B − A) = (A ∪ B) − (A ∩ B), where A, B are sets.

Solution: We shall use suitable laws of the algebra of sets. Using the results A − B = A ∩ B′ and B − A = B ∩ A′, we get

L.H.S. = (A − B) ∪ (B − A) = (A ∩ B′) ∪ (B ∩ A′)
= {(A ∩ B′) ∪ B} ∩ {(A ∩ B′) ∪ A′}
= (A ∪ B) ∩ (B′ ∪ B) ∩ (A ∪ A′) ∩ (B′ ∪ A′)
= (A ∪ B) ∩ S ∩ S ∩ (B′ ∪ A′), S being the universal set
= (A ∪ B) ∩ (B′ ∪ A′) = (A ∪ B) ∩ (A ∩ B)′
= (A ∪ B) − (A ∩ B) = R.H.S. (Proved)


Ex 1.3.3 Show that (A ∩B)− (A ∩ C) = (A ∩B)− C, where A,B,C are sets.

Solution: We shall use suitable laws of the algebra of sets.

L.H.S. = (A ∩ B) − (A ∩ C) = (A ∩ B) ∩ (A ∩ C)′
= (A ∩ B) ∩ (A′ ∪ C′) = {(A ∩ B) ∩ A′} ∪ {(A ∩ B) ∩ C′}
= φ ∪ {(A ∩ B) ∩ C′} = (A ∩ B) ∩ C′
= (A ∩ B) − C = R.H.S. (Proved)

Ex 1.3.4 Show that (A ∩ B ∩ C) ∪ (A ∩ B ∩ C′) ∪ (A ∩ B′ ∩ C) ∪ (A ∩ B′ ∩ C′) = A, where A, B, C are sets. [KH 07]

Solution: We shall use suitable laws of the algebra of sets.

L.H.S. = (A ∩ B ∩ C) ∪ (A ∩ B ∩ C′) ∪ (A ∩ B′ ∩ C) ∪ (A ∩ B′ ∩ C′)
= (X ∩ C) ∪ (X ∩ C′) ∪ (Y ∩ C) ∪ (Y ∩ C′), where X = A ∩ B, Y = A ∩ B′
= {X ∩ (C ∪ C′)} ∪ {Y ∩ (C ∪ C′)}
= (X ∩ U) ∪ (Y ∩ U), where U is the universal set
= X ∪ Y = (A ∩ B) ∪ (A ∩ B′)
= A ∩ (B ∪ B′) = A ∩ U = A = R.H.S. (Proved)

Ex 1.3.5 If A ∪ B = A ∪ C and A ∩ B = A ∩ C simultaneously for subsets A, B, C of a set S, prove that B = C. [CH 09]

Solution: We shall use suitable laws of the algebra of sets.

L.H.S. = B = (A ∪ B) ∩ B = (A ∩ B) ∪ (B ∩ B)
= (A ∩ C) ∪ B [as A ∩ B = A ∩ C and B ∩ B = B]
= (A ∪ B) ∩ (C ∪ B)
= (A ∪ C) ∩ (B ∪ C) [as A ∪ B = A ∪ C]
= (A ∩ B) ∪ C = (A ∩ C) ∪ C = C = R.H.S. (Proved)

From this, we conclude that A ∪ B = A ∪ C alone, or A ∩ B = A ∩ C alone, does not necessarily imply B = C.

Ex 1.3.6 Defining the complement A′ of A by A ∪ A′ = S, A ∩ A′ = φ, show that (A ∪ B)′ = A′ ∩ B′.

Solution: Let C = A ∪ B and D = A′ ∩ B′. We shall show that D = C′. Now,

C ∪ D = (A ∪ B) ∪ (A′ ∩ B′)
= (A ∪ B ∪ A′) ∩ (A ∪ B ∪ B′)
= {(A ∪ A′) ∪ B} ∩ {A ∪ (B ∪ B′)}
= (S ∪ B) ∩ (A ∪ S) = S ∩ S = S.

Also, using the relations C = A ∪ B and D = A′ ∩ B′, we get

C ∩ D = (A ∪ B) ∩ (A′ ∩ B′)
= (A ∩ A′ ∩ B′) ∪ (B ∩ A′ ∩ B′)
= (φ ∩ B′) ∪ (φ ∩ A′) = φ ∪ φ = φ.

Hence C′ = D, i.e., (A ∪ B)′ = A′ ∩ B′.


Ex 1.3.7 If A ⊆ B and C is any set then show that A ∪ C ⊆ B ∪ C.

Solution: Let x be any element of A ∪ C. Then x ∈ A or x ∈ C. Also, x ∈ A ⇒ x ∈ B (since A ⊆ B). Therefore,

x ∈ A ∪ C ⇒ x ∈ A or x ∈ C
⇒ x ∈ B or x ∈ C
⇒ x ∈ B ∪ C.

Hence, A ∪ C ⊆ B ∪ C. (Proved)

Ex 1.3.8 Prove that (A′ ∩B′ ∩ C) ∪ (B ∩ C) ∪ (A ∩ C) = C.

Solution: L.H.S. = (A′ ∩ B′ ∩ C) ∪ (B ∩ C) ∪ (A ∩ C). Now consider

(B ∩ C) ∪ (A ∩ C) = (C ∩ B) ∪ (C ∩ A) = C ∩ (B ∪ A) = (A ∪ B) ∩ C.

Again, A′ ∩ B′ ∩ C = (A ∪ B)′ ∩ C. Hence,

L.H.S. = {(A ∪ B)′ ∩ C} ∪ {(A ∪ B) ∩ C}
= {(A ∪ B)′ ∪ (A ∪ B)} ∩ C = S ∩ C (S = the universal set)
= C = R.H.S. (Proved)

Ex 1.3.9 If A, B, C are subsets of U, prove that [A ∩ (B ∪ C)] ∩ [A′ ∪ (B′ ∩ C′)] = φ.

Solution: Using the properties of sets, we get

L.H.S. = [A ∩ (B ∪ C)] ∩ [A′ ∪ (B′ ∩ C′)]
= [A ∩ (B ∪ C) ∩ A′] ∪ [A ∩ (B ∪ C) ∩ (B′ ∩ C′)]
= [A ∩ A′ ∩ (B ∪ C)] ∪ [A ∩ (B ∪ C) ∩ (B ∪ C)′]
= [φ ∩ (B ∪ C)] ∪ [A ∩ φ] = φ ∪ φ = φ.

1.4 Cartesian Product of Sets

Let A and B be two nonempty sets. An ordered pair consists of two elements, say a ∈ A and b ∈ B, and is denoted by (a, b). The element a is called the first element or first coordinate, and the element b is called the second element or second coordinate. The ordered pairs (a, b) and (b, a) are distinct unless a = b. Also, (a, a) is a well-defined ordered pair.

If a, c ∈ A and b, d ∈ B, the two ordered pairs (a, b) and (c, d) are said to be equal, i.e., (a, b) = (c, d), if and only if a = c and b = d.

An ordered triple (a, b, c) consists of three objects, where a is the first, b the second and c the third element of the triple. An ordered triple can also be written in terms of ordered pairs as ((a, b), c). Similarly, an ordered quadruple can be written as (((a, b), c), d), an ordered pair whose first element is itself an ordered pair.

Definition 1.4.1 Let A and B be any two sets. The cartesian product (or cross product or direct product) of A and B, denoted by A × B (read as 'A cross B'), is the set defined by

A × B = {(x, y) : x ∈ A and y ∈ B},


i.e., A × B is the set of all distinct ordered pairs (x, y), where the first element of the pair is an element of A and the second is an element of B. For example, let A = {a, b} and B = {1, 2, 3}. Then

A × B = {(a, 1), (a, 2), (a, 3), (b, 1), (b, 2), (b, 3)} and
B × A = {(1, a), (2, a), (3, a), (1, b), (2, b), (3, b)}.

The geometric representation of A × B is depicted in Fig. 1.7.


Figure 1.7: Representation of A×B

From this example, it is observed that, in general, A × B ≠ B × A.

Let A1, A2, · · · , An be a finite collection of non-empty sets. The cartesian product of the collection, denoted by A1 × A2 × · · · × An, is the set defined by

∏_{i=1}^{n} Ai = A1 × A2 × · · · × An = {(x1, x2, · · · , xn) : xi ∈ Ai}.

In particular, if A1 = A2 = · · · = An = A, the cartesian product of the collection of sets, denoted by Aⁿ, is the set of all ordered n-tuples,

Aⁿ = {(x1, x2, · · · , xn) : xi ∈ A}.

If A = B = R, the set of all real numbers, then R × R (= R²) is the set of all points in the plane; the ordered pair (a, b) represents a point in the plane. Similarly, R × R × R (= R³) is the set of all points in space, i.e., (a, b, c) ∈ R × R × R is a point in space. Below are some important properties of the cartesian product:

(i) The cartesian product is non-commutative, i.e., in general A × B ≠ B × A.

(ii) If n(A) = p and n(B) = q, then n(A × B) = n(B × A) = pq.

(iii) If A = φ or B = φ, then A × B = φ.

(iv) If either A or B is infinite and the other is empty, then A × B = φ.

(v) If either A or B is infinite and the other is non-empty, then A × B is infinite.

Following are some results on cartesian product.

(i) If A ⊆ B then A× C ⊆ B × C for any sets A, B, C.

(ii) If A ⊆ B and C ⊆ D then A× C ⊆ B ×D.

(iii) If A ⊆ B then A×A = (A×B) ∩ (B ×A).
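Cartesian products are available in code through itertools.product. The sketch below reproduces the example A = {a, b}, B = {1, 2, 3} and checks property (ii), n(A × B) = pq, together with the non-commutativity noted above:

```python
from itertools import product

A = {'a', 'b'}
B = {1, 2, 3}

AxB = set(product(A, B))   # {(a,1), (a,2), (a,3), (b,1), (b,2), (b,3)}
BxA = set(product(B, A))

assert len(AxB) == len(A) * len(B)  # n(A x B) = pq
assert AxB != BxA                   # the cartesian product is non-commutative
print(sorted(AxB))
```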


Result 1.4.1 For any non-empty sets A and B, A×B = B ×A iff A = B.

Proof: Let A × B = B × A. Then,

x ∈ A ⇒ (x, y) ∈ A × B, where y ∈ B
⇒ (x, y) ∈ B × A, since A × B = B × A
⇒ x ∈ B.

Thus A ⊆ B. Similarly, B ⊆ A. Hence A = B. Conversely, let A = B. Then A × B = A × A and B × A = A × A. Hence, A × B = B × A.

Ex 1.4.1 For three non-empty sets A, B, C, prove that A × (B ∩ C) = (A × B) ∩ (A × C).

Solution: This result is a direct consequence of the definition. Let (x, y) be an arbitrary element of the set A × (B ∩ C). Then,

(x, y) ∈ A × (B ∩ C) ⇔ x ∈ A and y ∈ (B ∩ C)
⇔ x ∈ A and (y ∈ B and y ∈ C)
⇔ (x ∈ A and y ∈ B) and (x ∈ A and y ∈ C)
⇔ (x, y) ∈ (A × B) and (x, y) ∈ (A × C)
⇔ (x, y) ∈ (A × B) ∩ (A × C).

Therefore, A × (B ∩ C) ⊆ (A × B) ∩ (A × C) and (A × B) ∩ (A × C) ⊆ A × (B ∩ C), and consequently, A × (B ∩ C) = (A × B) ∩ (A × C). Similarly,

A× (B ∪ C) = (A×B) ∪ (A× C).

Ex 1.4.2 For any three sets A, B, C, prove that A × (B − C) = (A × B) − (A × C).

Solution: Let (x, y) be an arbitrary element of A× (B − C). Then,

(x, y) ∈ A × (B − C) ⇔ x ∈ A and y ∈ (B − C)
⇔ x ∈ A and [y ∈ B and y ∉ C]
⇔ [x ∈ A and y ∈ B] and [x ∈ A and y ∉ C]
⇔ [(x, y) ∈ (A × B)] and [(x, y) ∉ (A × C)]
⇔ (x, y) ∈ (A × B) − (A × C).

Therefore, A × (B − C) ⊆ (A × B) − (A × C) and (A × B) − (A × C) ⊆ A × (B − C), and consequently A × (B − C) = (A × B) − (A × C).

Ex 1.4.3 For any sets A, B, C and D, we have [CH 96, 01]

(A × B) ∩ (C × D) = (A ∩ C) × (B ∩ D).

Solution: Let (x, y) be an arbitrary element of (A×B) ∩ (C ×D). Then,

(x, y) ∈ (A × B) ∩ (C × D) ⇔ (x, y) ∈ (A × B) and (x, y) ∈ (C × D)
⇔ (x ∈ A and y ∈ B) and (x ∈ C and y ∈ D)
⇔ (x ∈ A and x ∈ C) and (y ∈ B and y ∈ D)
⇔ x ∈ (A ∩ C) and y ∈ (B ∩ D)
⇔ (x, y) ∈ (A ∩ C) × (B ∩ D).

Thus (A × B) ∩ (C × D) ⊆ (A ∩ C) × (B ∩ D) and (A ∩ C) × (B ∩ D) ⊆ (A × B) ∩ (C × D), so that (A × B) ∩ (C × D) = (A ∩ C) × (B ∩ D). Similarly, for any sets A, B, C and D, we have A ⊆ B and C ⊆ D ⇒ (A × C) ⊆ (B × D).
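All the identities of Exs 1.4.1–1.4.3 can be spot-checked mechanically on small sets. A minimal Python sketch (the sample sets and the helper name `cross` are arbitrary choices of ours):

    from itertools import product

    def cross(X, Y):
        """Cartesian product X x Y as a set of ordered pairs."""
        return set(product(X, Y))

    A, B, C, D = {1, 2}, {2, 3}, {3, 4}, {2, 4}

    assert cross(A, B & C) == cross(A, B) & cross(A, C)      # Ex 1.4.1
    assert cross(A, B | C) == cross(A, B) | cross(A, C)      # companion identity
    assert cross(A, B - C) == cross(A, B) - cross(A, C)      # Ex 1.4.2
    assert cross(A, B) & cross(C, D) == cross(A & C, B & D)  # Ex 1.4.3
    print("all identities hold on the sample sets")

Of course, passing on one sample proves nothing; the ⇔ chains above are the actual proofs. The check merely guards against misremembering an identity.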


1.5 Cardinal Numbers

The number of distinct elements in a set A is called the cardinal number of the set and it is denoted by n(A), |A| or card(A). For example, n(φ) = 0, n({a}) = 1, n({a, b}) = 2, n(Z) = ∞, etc. Following are the important properties of cardinal numbers:

(i) If A and B are disjoint, then

(a) n(A ∩B) = n(φ) = 0 and (b) n(A ∪B) = n(A) + n(B),

(ii) Let A and B be two finite sets with A ∩ B ≠ φ; then

n(A ∪B) = n(A) + n(B)− n(A ∩B),

(iii) If A, B, C are three arbitrary finite sets, then

n(A ∪ B ∪ C) = n(A) + n(B) + n(C) − n(A ∩ B) − n(B ∩ C) − n(C ∩ A) + n(A ∩ B ∩ C),

(iv) Suppose we have any finite number of finite sets, say A1, A2, · · · , Am. Let sk denote the sum of the cardinalities n(Ai1 ∩ · · · ∩ Aik) of all k-fold intersections; then

n(A1 ∪A2 ∪ · · · ∪Am) = s1 − s2 + s3 − · · ·+ (−1)m−1sm.

(v) n(A ∪A) = n(A) and n(A ∩A) = n(A).

The inclusion and exclusion principle

The number of elements in finite sets such as A ∪ B, A ∩ B, A∆B, etc. are obtained by adding as well as deleting certain elements. This method of finding the number of elements in a finite set is known as the inclusion and exclusion principle.
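Property (iv) above, the general inclusion–exclusion formula, translates directly into a short program. A hedged Python sketch (the function name is ours):

    from itertools import combinations

    def union_size(sets):
        """n(A1 U ... U Am) via inclusion-exclusion: s1 - s2 + s3 - ..."""
        total = 0
        for k in range(1, len(sets) + 1):
            sk = sum(len(set.intersection(*combo))
                     for combo in combinations(sets, k))
            total += (-1) ** (k - 1) * sk
        return total

    A, B, C = {1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6, 7}
    print(union_size([A, B, C]))   # 7
    print(len(A | B | C))          # 7, agrees with the direct count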

Ex 1.5.1 If n(A) and n(B) denote the number of elements in the finite sets A and Brespectively, then prove that n(A) + n(B) = n(A ∪B) + n(A ∩B).

Solution: If A and B are disjoint, then n(A ∪ B) is equal to the sum of the number of elements of A and the number of elements of B, that is, n(A ∪ B) = n(A) + n(B). If A and B are not disjoint, then

Figure 1.8: Venn diagram of A and B showing the disjoint regions A − (A ∩ B), A ∩ B and B − (A ∩ B).

A ∪ B is expressed as the union of three disjoint sets A ∩ B, A − (A ∩ B) and B − (A ∩ B). Let us draw the Venn diagram showing A ∩ B′, A ∩ B, A′ ∩ B and A′ ∩ B′, where A and B are two subsets of the universal set U. From the diagram, we have,

A = (A ∩B′) ∪ (A ∩B) and B = (A ∩B) ∪ (A′ ∩B)

and the sets A ∩B′, A ∩B and A′ ∩B are disjoint. Therefore,

n(A) = n(A ∩ B′) + n(A ∩ B) and n(B) = n(A ∩ B) + n(A′ ∩ B)
⇒ n(A) + n(B) = n(A ∩ B′) + 2n(A ∩ B) + n(A′ ∩ B).


Again, from the diagram, we have,

A ∪B = (A ∩B′) ∪ (A ∩B) ∪ (A′ ∩B),

in which (A ∩B′), (A ∩B) and (A′ ∩B) are disjoint sets and so

n(A ∪B) = n(A ∩B′) + n(A ∩B) + n(A′ ∩B).

Subtracting, we get n(A) + n(B) = n(A ∪ B) + n(A ∩ B). Let n(A) = r, n(B) = s and n(A ∩ B) = t. Then, from Fig. 1.8,

n(A− (A ∩B)) = r − t and n(B − (A ∩B)) = s− t.

Therefore, n(A ∪ B) = t + (r − t) + (s − t) = r + s − t, i.e.,

n(A ∪ B) = n(A) + n(B) − n(A ∩ B).

Following are some results on Venn diagrams:

(i) n(A ∪ B) = n(A) + n(B) − n(A ∩ B), and n(A ∪ B) = n(A) + n(B) provided A and B are disjoint, i.e., n(A ∩ B) = 0.

(ii) n(A ∩B′) = n(A)− n(A ∩B) and n(A′ ∩B) = n(B)− n(A ∩B).

(iii) n(A4B) = n(A) + n(B)− 2n(A ∩B).

(iv) n(A′ ∪B′) = n(U)− n(A ∩B) and n(A′ ∩B′) = n(U)− n(A ∪B).

Ex 1.5.2 In a canteen, out of 123 students, 42 students buy ice-cream, 36 buy buns and 10 buy cakes; 15 students buy ice-cream and buns, 10 buy ice-cream and cakes, 4 buy cakes and buns but not ice-cream, and 11 buy ice-cream and buns but no cakes. Find (i) how many students buy nothing at all, (ii) how many students buy at least two items, (iii) how many students buy all three items.

Solution: Define the sets A,B and C such that, A = Set of students who buy cakes, B =Set of students who buy ice-cream, C = Set of students who buy buns.

According to the question, we have n(A) = 10, n(B) = 42, n(C) = 36, n(B ∩ C) = 15, n(A ∩ B) = 10, n[(A ∩ C) − B] = 4, n[(B ∩ C) − A] = 11 and n[A − (B ∪ C)] = 10. Now, we have,

n(B ∪ C) = n(B) + n(C) − n(B ∩ C) = 42 + 36 − 15 = 63,
n(B ∪ C) − n(B) = 63 − 42 = 21,
and n(B ∪ C) − n(C) = 63 − 36 = 27.

The above distribution of the students can be illustrated by a Venn diagram. Now the total number of students buying something

= 10 + 6 + 21 + 4 + 4 + 11 + 17 = 73.

Therefore, the number of students who did not buy anything = 123 − 73 = 50. The number of students buying all three items = 15 − 11 = 4.

Ex 1.5.3 In a recent survey of 500 students in a college, it was found that 150 students read newspaper A, 200 read newspaper B, and 80 students read both the newspapers A and B. Find how many read either newspaper.

Solution: Let X and Y denote the sets of students who read newspapers A and B respectively. It is given that n(X) = 150, n(Y) = 200, n(X ∩ Y) = 80, n(U) = 500. The number of students who read either A or B is n(X ∪ Y). Also,

n(X ∪ Y) = n(X) + n(Y) − n(X ∩ Y) = 150 + 200 − 80 = 270.


Ex 1.5.4 In a class of 42 students, each plays at least one of the three games Cricket, Hockey and Football. It is found that 14 play Cricket, 20 play Hockey and 24 play Football; 3 play both Cricket and Football, 2 play both Hockey and Football, and none play all the three games. Find the number of students who play Cricket but not Hockey.

Solution: Let C, H and F be the sets of students who play Cricket, Hockey and Football respectively. Given that n(C) = 14, n(H) = 20, n(F) = 24, n(C ∩ F) = 3, n(H ∩ F) = 2, n(H ∩ F ∩ C) = 0 and n(H ∪ F ∪ C) = 42 (since each student plays at least one game). We know,

n(H ∪ F ∪ C) = n(H) + n(F ) + n(C)− n(H ∩ F )−n(H ∩ C)− n(F ∩ C) + n(H ∩ F ∩ C).

or, 42 = 20 + 24 + 14 − 2 − n(H ∩ C) − 3 + 0 ⇒ n(H ∩ C) = 11.

Now, the number of students who play Cricket but not Hockey is n(C − H). Also, we know

n(C) = n(C ∩ H) + n(C − H), i.e., n(C − H) = n(C) − n(C ∩ H) = 14 − 11 = 3.

Hence 3 students play Cricket but not Hockey.

1.6 Relation

Let A and B be two non-empty sets. A relation or binary relation ρ between A and B is a subset of A × B. If (a, b) ∈ ρ, where ρ ⊆ A × B, we say that a stands in the relationship ρ with b, written aρb; i.e.,

(a, b) ∈ ρ ⇒ (a, b) ∈ A × B.

For example, let A = {4, 5, 6, 9} and B = {20, 22, 24, 28, 30}. Let us define a relation ρ from A into B by stipulating aρb if and only if a divides b, where a ∈ A and b ∈ B. Then it is clear that

ρ = {(4, 20), (4, 24), (4, 28), (5, 20), (5, 30), (6, 24), (6, 30)}.

If n(A) = n and n(B) = m, then the number of elements of A × B is mn, and the power set of A × B has 2^{mn} elements. Since every subset of A × B is a relation, the total number of binary relations from A to B is 2^{mn}.

A relation ρ between a non-empty set A and A itself is also said to be a binary relation on A. If n(A) = n, then there are 2^{n²} relations on it.

Definition 1.6.1 [Domain and range:] Let ρ be a relation from a set A into the set B. Then the domain of ρ, denoted by D(ρ), is the set

D(ρ) = {a : a ∈ A, (a, b) ∈ ρ for some b ∈ B}. (1.9)

The domain of a binary relation is the set of all first elements of the ordered pairs in the relation. The range or image of ρ, denoted by I(ρ), is the set

I(ρ) = {b : b ∈ B, (a, b) ∈ ρ for some a ∈ A}. (1.10)

The range of a binary relation is the set of all second elements of the ordered pairs in the relation. For example, let A = {a, b, c} and B = {1, 2, 3}, and let ρ = {(a, 1), (b, 1), (c, 2), (a, 2)}. For this relation, D(ρ) = {a, b, c} and I(ρ) = {1, 2}.

If (a, b) ∉ ρ, i.e., (a, b) ∈ (A × B) − ρ, then we say that a does not stand in the relationship ρ with b. Let ρ ⊆ N × N be given by ρ = {(x, y) : x is a divisor of y}. Then xρy holds if x is a divisor of y; for example, 3ρ6 holds but 2ρ5 does not.


Universal and null relation

We know that A × A is a subset of A × A; ρ = A × A is defined as the universal relation on A. If ρ = A × B, then ρ is called the universal relation from A to B. The null set φ is a subset of A × B, and if ρ = φ, then ρ is called the null relation or empty relation from A to B. In general, φ ⊆ ρ ⊆ A × A for any relation ρ on A.

For example, in the set Z of integers, ρ = {(a, b) : a + b is an integer} is a universal relation and ρ = {(a, b) : a + b is not an integer} is a null relation.

Identity relation

For a set A, the relation ρ = {(x, y) : x ∈ A, y ∈ A, x = y} is called the identity or diagonal relation in A, and it will be denoted by IA.

Inverse relation

Let ρ be a relation from a set A to B. The inverse of ρ, denoted by ρ−1, is a relation from the set B to A defined by

ρ−1 = {(y, x) : y ∈ B, x ∈ A, (x, y) ∈ ρ}.

For example,

(i) Let ρ = {(1, y), (1, z), (3, y)} from A = {1, 2, 3} to B = {x, y, z}; then its inverse is

ρ−1 = {(y, 1), (z, 1), (y, 3)}.

(ii) Let A = {a, b, c} and B = {1, 2, 3}. The inverse of the relation ρ = {(a, 1), (b, 1), (c, 2), (a, 2)} is ρ−1 = {(1, a), (1, b), (2, c), (2, a)}.

Note that if ρ ⊆ A × B, then ρ−1 ⊆ B × A. It is easy to verify that D(ρ−1) = I(ρ), the range of ρ, and I(ρ−1) = D(ρ), the domain of ρ. Clearly, if ρ is any relation, then (ρ−1)−1 = ρ.

Composition of relations

Let ρ1 be a relation from a set A into the set B and ρ2 be another relation from the set B into C, i.e., ρ1 ⊆ A × B and ρ2 ⊆ B × C. The composition of ρ1 and ρ2, denoted by ρ1 ∘ ρ2, is the relation from A into C defined by

ρ1 ∘ ρ2 = {(a, c) : (a, b) ∈ ρ1 and (b, c) ∈ ρ2 for some b ∈ B}, (1.11)

i.e., a(ρ1 ∘ ρ2)c holds if there exists some b ∈ B such that aρ1b and bρ2c. For example, let A = {1, 2, 3}, B = {x, y} and C = {t}, and let ρ1 = {(1, x), (2, y), (3, y)}, ρ2 = {(x, t), (y, t)}. Therefore, ρ1 ∘ ρ2 = {(1, t), (2, t), (3, t)}. From this definition it follows that ρ^1 = ρ and ρ^n = ρ ∘ ρ^{n−1}, n > 1.

Definition 1.6.2 [Set operations on relations] Since every binary relation is a set of ordered pairs, the set operations can also be defined on relations.

Let ρ1 and ρ2 be two relations from a set A to a set B. Then ρ1 ∪ ρ2, ρ1 ∩ ρ2, ρ1 − ρ2 and ρ′1 are relations given by

(i) Union: a(ρ1 ∪ ρ2)b ⇔ aρ1b or aρ2b.

(ii) Intersection: a(ρ1 ∩ ρ2)b ⇔ aρ1b and aρ2b.


(iii) Difference: a(ρ1 − ρ2)b ⇔ aρ1b holds but aρ2b does not; a(ρ2 − ρ1)b ⇔ aρ2b holds but aρ1b does not.

(iv) Complement: a(ρ′1)b ⇔ aρ1b does not hold.

The relations correspond to the set operations union, intersection, difference and complementation on sets. For example, let A = {a, b, c}, B = {2, 4, 6}, C = {a, b}, D = {2, 4, 5}. Let ρ1 be a relation from A into B defined as ρ1 = {(a, 2), (b, 4), (c, 6)} and ρ2 be another relation from C into D defined by ρ2 = {(a, 2), (b, 4), (b, 5)}. Thus,

(i) ρ1 ∪ ρ2 = {(a, 2), (b, 4), (c, 6), (b, 5)},

(ii) ρ1 ∩ ρ2 = {(a, 2), (b, 4)},

(iii) ρ1 − ρ2 = {(c, 6)}, ρ2 − ρ1 = {(b, 5)},

and ρ′1 = the set of all ordered pairs of A × B that are not in ρ1

= {(a, 4), (a, 6), (b, 2), (b, 6), (c, 2), (c, 4)}.

1.6.1 Equivalence Relation

Let A be a nonempty set and ρ be a relation on A. Then ρ is called

(i) reflexive, if for all a ∈ A, aρa,

(ii) symmetric, if aρb holds ⇒ bρa must hold, for a, b ∈ A,

(iii) antisymmetric, if aρb and bρa hold then a = b,

(iv) transitive, if aρb and bρc hold ⇒ aρc must hold, for a, b, c ∈ A.

It may be remembered that the relation ρ is not reflexive if (a, a) ∉ ρ for at least one a ∈ A; not symmetric if (a, b) ∈ ρ but (b, a) ∉ ρ for at least one pair (a, b) ∈ ρ; and not transitive if (a, b) ∈ ρ, (b, c) ∈ ρ but (a, c) ∉ ρ for some a, b, c ∈ A. A relation which is not symmetric is not necessarily antisymmetric.
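On a finite set, the three defining properties can be tested by exhaustive checking. A small Python sketch (ours, not the text's), run on the relation of Ex 1.6.1 below restricted to a three-element subset of Z:

    def is_reflexive(rho, A):
        return all((a, a) in rho for a in A)

    def is_symmetric(rho):
        return all((b, a) in rho for (a, b) in rho)

    def is_transitive(rho):
        return all((a, d) in rho
                   for (a, b) in rho for (c, d) in rho if b == c)

    A = {-2, 0, 8}
    rho = {(a, b) for a in A for b in A if a * b >= 0}
    print(is_reflexive(rho, A), is_symmetric(rho), is_transitive(rho))
    # True True False: reflexive and symmetric but not transitive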

Ex 1.6.1 Let ρ be a relation on Z defined by aρb iff ab ≥ 0 for all a, b ∈ Z. Show that ρ isreflexive and symmetric but not transitive.

Solution: (i) Let a ∈ Z. Then obviously a·a = a² ≥ 0, so aρa holds for all a ∈ Z. Therefore ρ is reflexive.

(ii) Let a, b ∈ Z and let aρb hold. Then ab ≥ 0 ⇒ ba ≥ 0 ⇒ bρa. Therefore ρ is symmetric.

(iii) Let a, b, c ∈ Z and let aρb, bρc hold. This does not imply ac ≥ 0. For example, if a = −2, b = 0, c = 8, then ab = 0 ≥ 0 and bc = 0 ≥ 0 hold, but ac = −16 < 0. Hence ρ is not transitive.

That is, ρ is reflexive and symmetric but not transitive.

Ex 1.6.2 Show that the relation aρb if ab > 0, a, b ∈ ℝ, is symmetric and transitive but not reflexive.

Solution: (i) 0 ∈ ℝ, but 0ρ0 does not hold, since 0·0 ≯ 0. So ρ is not reflexive.
(ii) aρb ⇒ ab > 0 ⇒ ba > 0 ⇒ bρa. Hence ρ is symmetric. (iii) Now,

(a, b) ∈ ρ, (b, c) ∈ ρ ⇒ ab > 0, bc > 0
⇒ ab²c > 0 ⇒ ac > 0, since b² > 0
⇒ (a, c) ∈ ρ.

Hence ρ is transitive. Therefore ρ is symmetric and transitive but not reflexive.


Ex 1.6.3 Verify whether the following relations on the set ℝ are reflexive, symmetric or transitive: (i) xρy if |x − y| > 0; (ii) xρy if 1 + xy > 0; (iii) xρy if |x| ≤ y; (iv) xρy if 2x + 3y = 10.

Solution: (i) ρ is not reflexive, as for any x ∈ ℝ, x − x = 0 and hence |x − x| ≯ 0, i.e., xρx does not hold. Again, as |x − y| = |y − x|, we have |x − y| > 0 ⇒ |y − x| > 0, whence xρy ⇒ yρx for all x, y ∈ ℝ. So ρ is symmetric. Consider 0, 1 ∈ ℝ; then |1 − 0| = |0 − 1| = 1 > 0 shows that 1ρ0 and 0ρ1, but 1ρ1 fails as |1 − 1| ≯ 0. Hence ρ is not transitive.

(ii) Since x² ≥ 0 for all x ∈ ℝ, we have 1 + x² > 0 and hence xρx for all x ∈ ℝ, whence ρ is reflexive. Again, for all x, y ∈ ℝ, if 1 + xy > 0 then 1 + yx > 0 as xy = yx, whence xρy ⇒ yρx for all x, y ∈ ℝ. So ρ is symmetric. Consider 3, −1/9, −6 ∈ ℝ. Then 1 + 3·(−1/9) = 2/3 > 0 shows that 3ρ(−1/9), and 1 + (−1/9)(−6) = 5/3 > 0 shows that (−1/9)ρ(−6). But 1 + 3·(−6) = −17 ≯ 0, so 3ρ(−6) fails, whence ρ is not transitive.

(iii) Consider −2 ∈ ℝ; then |−2| = 2 ≰ −2, so (−2)ρ(−2) fails, showing that ρ is not reflexive. Indeed, |−2| = 2 ≤ 5 and so (−2)ρ5, but |5| = 5 ≰ −2, so 5ρ(−2) fails, and consequently ρ is not symmetric. But if a, b, c ∈ ℝ, then |a| ≤ b and |b| ≤ c give |a| ≤ b ≤ c, i.e., aρc; so ρ is transitive.

(iv) As 2·1 + 3·1 = 5 ≠ 10, 1ρ1 fails, so ρ is not reflexive. As 2·1 + 3·(8/3) = 10, 1ρ(8/3) holds; but 2·(8/3) + 3·1 = 25/3 ≠ 10, so (8/3)ρ1 fails. So ρ is not symmetric. As 2·(1/2) + 3·3 = 10, (1/2)ρ3 holds; as 2·3 + 3·(4/3) = 10, 3ρ(4/3) holds; but 2·(1/2) + 3·(4/3) = 5 ≠ 10, proving that (1/2)ρ(4/3) fails. Hence ρ is not transitive, and so there is no question of an equivalence relation.

Ex 1.6.4 Verify whether the following relations are reflexive, symmetric or transitive: (i) in Z, xρy if x + y is odd; (ii) in Z, xρy if |x − y| ≤ y.

Solution: (i) Since x + x = 2x is even, xρx does not hold, and so ρ is not reflexive. Also,

xρy ⇒ x + y is odd ⇒ y + x is odd ⇒ yρx, ∀x, y ∈ Z.

Hence ρ is symmetric. Again, ρ is not transitive, since 1ρ2 (as 1 + 2 is odd) and 2ρ3 (as 2 + 3 is odd) but 1ρ3 fails (since 1 + 3 is even). Hence ρ is not reflexive, is symmetric, and is not transitive.

(ii) Here xρy if |x − y| ≤ y, for x, y ∈ Z. Now |x − x| = 0 ≤ x is not true for negative x; hence ρ is not reflexive. Next,

1ρ3 since |1 − 3| = 2 < 3, but 3ρ1 fails since |3 − 1| = 2 > 1.

Hence ρ is not symmetric. By definition, xρy, yρz ⇒ |x − y| ≤ y, |y − z| ≤ z. Now,

|x − z| = |x − y + y − z| ≤ |x − y| + |y − z| ≤ y + z.

This suggests that |x − z| may not be ≤ z. For example, 9ρ7 and 7ρ4 hold, but 9ρ4 fails since |9 − 4| = 5 > 4. Hence ρ is not transitive.


Result 1.6.1 The following tabular form will give a comprehensive idea:

    xρy iff       Reflexive   Symmetric   Transitive
    y = 4x            ×           ×           ×
    x < y             ×           ×           √
    x ≠ y             ×           √           ×
    xy > 0            ×           √           √
    y ≠ x + 2         √           ×           ×
    x ≤ y             √           ×           √
    xy ≥ 0            √           √           ×
    x = y             √           √           √

where ρ is the binary relation on ℝ. (That xy ≥ 0 is not transitive is shown by the counterexample of Ex 1.6.1.)

Definition 1.6.3 [Equivalence relation] A binary relation ρ on a set A is said to be an equivalence relation (or an RST relation) if

(i) ρ is reflexive: (a, a) ∈ ρ, ∀a ∈ A, i.e., aρa, ∀a ∈ A;

(ii) ρ is symmetric: for any two elements a, b ∈ A, (a, b) ∈ ρ ⇒ (b, a) ∈ ρ, i.e., aρb ⇒ bρa;

(iii) ρ is transitive: (a, b) ∈ ρ, (b, c) ∈ ρ ⇒ (a, c) ∈ ρ, i.e., aρb, bρc ⇒ aρc.

For example, let A = {a, b, c} and let ρ = {(a, a), (b, b), (c, c), (a, b), (b, a)}. It is easy to verify that ρ is reflexive, as (a, a), (b, b), (c, c) ∈ ρ; symmetric, as (a, b) ∈ ρ and (b, a) ∈ ρ; and transitive. Hence ρ is an equivalence relation.

Ex 1.6.5 In Z, xρy if 3x + 4y is divisible by 7. Verify whether the relation is an equivalence relation.

Solution: We know that ∀x ∈ Z, 3x + 4x = 7x is divisible by 7. Hence xρx, ∀x ∈ Z, and so ρ is reflexive. By definition, xρy if 3x + 4y is divisible by 7; so xρy ⇒ 3x + 4y = 7k, where k is an integer. Now,

(3x + 4y) + (3y + 4x) = 7(x + y) = 7k1, where k1 = x + y ∈ Z,
or, 3y + 4x = 7(k1 − k); k1 − k ∈ Z.

Hence 3y + 4x is divisible by 7. Thus xρy ⇒ yρx, ∀x, y ∈ Z. So ρ is symmetric. Now,

xρy, yρz ⇒ 3x + 4y = 7k1, 3y + 4z = 7k2
⇒ 3x + 4z + 7y = 7(k1 + k2)
⇒ 3x + 4z = 7(k1 + k2 − y) = 7k, k ∈ Z
⇒ xρz, ∀x, y, z ∈ Z.

Hence ρ is transitive. Thus ρ is reflexive, symmetric and transitive, i.e., an equivalence relation.

Ex 1.6.6 In Z, define aρb iff a − b is divisible by a given positive integer n. Show that ρ is an equivalence relation.

Solution: (i) For a ∈ Z, a − a = 0 is divisible by n. Hence aρa, ∀a ∈ Z, and so ρ is reflexive. Using the definition,

aρb⇒ a− b is divisible by n

⇒ b− a is divisible by n

⇒ bρa; ∀a, b ∈ Z.


Hence ρ is symmetric. Now,

aρb, bρc ⇒ a − b and b − c are divisible by n

⇒ (a − b) + (b − c) is divisible by n

⇒ (a − c) is divisible by n

⇒ aρc; ∀a, b, c ∈ Z.

Hence ρ is transitive and consequently, it is an equivalence relation.

Ex 1.6.7 In the set of all lines in a plane, consider the relations aρb if a ∥ b, and aσb if a ⊥ b. Verify whether these relations are equivalence relations.

Solution: We know that any line is parallel to itself, so a ∥ a, ∀a. Hence aρa; therefore ρ is reflexive. Also,

aρb⇒ a ‖ b⇒ b ‖ a⇒ bρa.

Hence ρ is symmetric. ρ is transitive, since

aρb, bρc ⇒ a ∥ b, b ∥ c ⇒ a ∥ c.

Hence ρ is reflexive, symmetric and transitive; in other words, it is an equivalence relation. The perpendicularity relation σ, on the other hand, is only symmetric: a ⊥ b ⇒ b ⊥ a, but a ⊥ a never holds, and a ⊥ b, b ⊥ c give a ∥ c rather than a ⊥ c.

Ex 1.6.8 Verify whether the following relations on the set Z are reflexive, symmetric or transitive: (i) xρy if |x − y| ≤ 3; (ii) xρy if x − y is a multiple of 6.

Solution: (i) Let a ∈ Z. Then |a − a| = 0 < 3. Therefore aρa holds for all a ∈ Z, i.e., ρ is reflexive.

Let a, b ∈ Z and aρb holds, i.e., |a − b| ≤ 3. Then |b − a| = |a − b| ≤ 3, i.e., bρa holds.Therefore, ρ is symmetric.

Let a, b, c ∈ Z and let aρb, bρc hold. Then |a − b| ≤ 3 and |b − c| ≤ 3. Now,

|a − c| = |a − b + b − c| ≤ |a − b| + |b − c| ≤ 3 + 3 = 6,

i.e., |a − c| ≤ 3 does not hold for all a, b, c ∈ Z. Therefore ρ is not transitive. For example, 2ρ5 and 5ρ7 hold, but 2ρ7 fails as |2 − 7| = 5 ≰ 3.

(ii) Consider any integer x ∈ Z; then x − x = 0 = 0·6 shows that xρx, ∀x ∈ Z. Hence ρ is reflexive. Now let xρy for some x, y ∈ Z; this means

x − y = 6k, k ∈ Z ⇒ y − x = (−k)·6, where −k ∈ Z,

and this shows that yρx, whence ρ is symmetric. Finally, let x, y, z ∈ Z be such that xρy and yρz. Then x − y = 6k1 and y − z = 6k2 for some k1, k2 ∈ Z. Then

x− y + y − z = 6(k1 + k2)⇒ x− z = 6 · (k1 + k2), where k1 + k2 ∈ Z.

This shows that xρz, whence ρ is transitive. Consequently ρ is an equivalence relation.

Ex 1.6.9 The relation ρ on the set N × N of ordered pairs of natural numbers is defined by '(a, b)ρ(c, d) iff ad = bc'. Prove that ρ is an equivalence relation.

Solution: (i) Let (a, b) ∈ N × N. Then (a, b)ρ(a, b) holds as ab = ba, i.e., ρ is reflexive.
(ii) Let (a, b) and (c, d) be any elements of N × N and let (a, b)ρ(c, d) hold. Therefore,

ad = bc or, da = cb, i.e., (c, d)ρ(a, b)


holds. Hence ρ is symmetric.
(iii) Let (a, b), (c, d), (e, f) ∈ N × N and let (a, b)ρ(c, d), (c, d)ρ(e, f) hold. Then ad = bc and cf = de. Now,

(ad)(cf) = (bc)(de) or, af = be, as cd ≠ 0 ⇒ (a, b)ρ(e, f) holds.

Therefore ρ is transitive. Hence ρ is an equivalence relation.

Ex 1.6.10 Show that the following relation ρ on Z is an equivalence relation: ρ = {(a, b) : a, b ∈ Z and a² + b² is a multiple of 2}. [WBUT 08]

Solution: (i) Let a ∈ Z. Then a² + a² = 2a², which is a multiple of 2. Therefore aρa holds for all a ∈ Z. Thus ρ is reflexive.

(ii) Let a, b ∈ Z and aρb. Then a2 + b2 is a multiple of 2, and also b2 + a2 is a multipleof 2. Therefore bρa and hence ρ is symmetric.

(iii) Let a, b, c ∈ Z and aρb, bρc hold. Then a2 + b2 and b2 + c2 both are multiple of 2.Now,

a2 + c2 = (a2 + b2)− (b2 + c2) + 2c2

is a multiple of 2, i.e., aρc holds. Thus ρ is transitive. Hence ρ is an equivalence relation.

Ex 1.6.11 Let S be the set of all lines in 3-space. Let a relation ρ on the set S be defined by lρm if l, m ∈ S and l, m lie in a common plane. Examine if ρ is an equivalence relation.

Solution: (i) Let l ∈ S. Then l is coplanar with itself. Therefore lρl holds for all l ∈ S, i.e., ρ is reflexive.

(ii) Let l, m ∈ S and let lρm hold. Then obviously mρl holds. That is, lρm ⇒ mρl. Therefore ρ is symmetric.

(iii) Let l, m, n ∈ S and let lρm, mρn both hold. Then l and m lie in a common plane, and m and n lie in a common plane. This does not always imply that l and n lie in a common plane. For example, let l lie on the xz-plane, take m as the x-axis and n as a line on the xy-plane. In this case, lρm and mρn hold, but l and n lie on two different planes.

Thus, ρ is not transitive and hence ρ is not an equivalence relation.

Note 1.6.1 It may be noted that reflexivity, symmetry and transitivity are three independent properties, i.e., no two of them imply the third.

Elements of equivalence relation

In this section, some properties of the relations reflexive, symmetric and transitive arepresented.

Ex 1.6.12 Let A = {a1, a2, . . . , an} be a finite set containing n elements. Find how many relations can be constructed on A, and how many of these are reflexive and how many symmetric.

Solution: Since A has n elements, A × A has n² elements. Hence 2^{n²} relations can be constructed on A, including the null relation and the relation A × A itself.

If the relation ρ on A is reflexive, then all n ordered pairs (ai, ai), i = 1, 2, . . . , n, must be in ρ. Each of the remaining n² − n = n(n − 1) ordered pairs (ai, aj), i ≠ j, may or may not be in ρ. Hence, by the product rule of counting, there are 2^{n(n−1)} reflexive relations on A.

To count the number of symmetric relations, we write A × A as A1 ∪ A2, where

A1 = {(ai, ai) : i = 1, 2, . . . , n} and A2 = {(ai, aj) : i ≠ j; i, j = 1, 2, . . . , n}.

Thus every element of A × A is in exactly one of A1, A2. The set A2 splits into n(n − 1)/2 subsets of the form {(ai, aj), (aj, ai)}, i ≠ j.


To construct a symmetric relation on A, we decide independently, for each of the n diagonal pairs in A1, whether to include it, and, for each of the n(n − 1)/2 paired subsets of A2, whether to include both of its ordered pairs or neither. Hence, by the product rule of counting, there are

2^n · 2^{n(n−1)/2} = 2^{n(n+1)/2}

symmetric relations on A. Therefore, the number of relations which are both reflexive and symmetric is 2^{n(n−1)/2}.

Ex 1.6.13 If R and S are equivalence relations on a set A, prove that R∩S is an equivalencerelation on A.

Solution: Let R and S be equivalence relations on A. Therefore R ⊆ A × A and S ⊆ A × A. Hence R ∩ S ⊆ A × A, i.e., R ∩ S is a relation on A.

(i) Since R and S are reflexive, (a, a) ∈ R and (a, a) ∈ S for all a ∈ A. Thus (a, a) ∈ R ∩ S for all a ∈ A. Hence R ∩ S is reflexive.

(ii) Let (a, b) ∈ R ∩ S. Therefore (a, b) ∈ R and (a, b) ∈ S. Since R and S are symmetric, (b, a) ∈ R and (b, a) ∈ S. Therefore (b, a) ∈ R ∩ S. Hence R ∩ S is symmetric.

(iii) Let (a, b) ∈ R ∩ S and (b, c) ∈ R ∩ S. Therefore (a, b), (b, c) ∈ R and (a, b), (b, c) ∈ S. Since R is transitive, (a, c) ∈ R. Similarly, (a, c) ∈ S. Thus (a, c) ∈ R ∩ S, i.e., R ∩ S is transitive.

Hence R ∩ S is an equivalence relation. But the union of two equivalence relations is not necessarily an equivalence relation. For example, let A = {1, 2, 3} and let R = {(1, 1), (2, 2), (3, 3), (1, 2), (2, 1)}, S = {(1, 1), (2, 2), (3, 3), (2, 3), (3, 2)} be two equivalence relations on A. Then

R ∪ S = {(1, 1), (2, 2), (3, 3), (1, 2), (2, 1), (2, 3), (3, 2)}.

Here (1, 2), (2, 3) ∈ R ∪ S but (1, 3) ∉ R ∪ S, i.e., R ∪ S is not transitive and hence it is not an equivalence relation on A.

Theorem 1.6.1 If R and S are two relations from A into B, then
(a) if R−1 ⊆ S−1, then R ⊆ S;
(b) (R ∩ S)−1 = R−1 ∩ S−1;
(c) (R ∪ S)−1 = R−1 ∪ S−1;
(d) if R is reflexive, R−1 is also reflexive;
(e) R is symmetric iff R = R−1.

Ex 1.6.14 If R is an equivalence relation on a set A, then prove that R−1 is also an equivalence relation on A.

Solution: Since R is an equivalence relation on A, R is reflexive, symmetric and transitive.
(i) Let a ∈ A. Then (a, a) ∈ R, and therefore (a, a) ∈ R−1, i.e., R−1 is reflexive.
(ii) Let (a, b) ∈ R−1. Then (b, a) ∈ R for a, b ∈ A
⇒ (a, b) ∈ R, since R is symmetric
⇒ (b, a) ∈ R−1.

Thus R−1 is symmetric.
(iii) Let (a, b), (b, c) ∈ R−1. Then (b, a), (c, b) ∈ R for a, b, c ∈ A
⇒ (c, b), (b, a) ∈ R
⇒ (c, a) ∈ R, since R is transitive
⇒ (a, c) ∈ R−1.

Therefore R−1 is transitive. Hence R−1 is an equivalence relation on A.


Antisymmetric relation

A relation ρ is said to be antisymmetric if aρb, bρa⇒ a = b. For example,

(i) In ℝ, the relation ≤ is antisymmetric, since a ≤ b, b ≤ a ⇒ a = b.

(ii) In the set of all sets, the relation “ is a subset of ” is antisymmetric for A ⊆ B,B ⊆A⇒ A = B.

(iii) Let F consist of all real-valued functions f(x) defined on [−1, 1], and let f ≥ g mean that f(x) ≥ g(x) for every x ∈ [−1, 1]. Then ≥ is antisymmetric, since f ≥ g and g ≥ f imply f(x) = g(x) for every x, i.e., f = g.

(iv) The relation ρ defined on Z by aρb if and only if a is a divisor of b is not antisymmetric; for example, 2 | −2 and −2 | 2, but 2 ≠ −2. It also contains pairs of elements x, y which are incomparable, in the sense that neither x | y nor y | x holds.

Partial ordering relation

A binary relation ρ defined on a non-empty set A is said to be a partial ordering relation if ρ is reflexive, antisymmetric and transitive. Writing x ≤ y for xρy, there is an alternative notation for specifying a partial ordering relation:

(i) P1: Reflexive: x ≤ x, ∀x ∈ A.

(ii) P2: Anti symmetric: x ≤ y and y ≤ x⇒ x = y, for x, y ∈ A.

(iii) P3: Transitive: x ≤ y and y ≤ z ⇒ x ≤ z, for x, y, z ∈ A.

If x ≤ y and x ≠ y, one writes x < y and says that x 'is less than' or 'properly contained in' y. The relation x ≤ y is also written y ≥ x and read 'y contains x (or includes x)'. Similarly, x < y is also written y > x. Strict inclusion is characterized by the anti-reflexive and transitive laws.

Digraph of a relation

A relation ρ on a finite set A can be represented by a diagram called a digraph or directed graph. Draw a dot for each element of A. Now join the dots corresponding to the elements ai and aj (ai, aj ∈ A) by an arrowed arc if and only if ai ρ aj. In case ai ρ ai for some ai ∈ A, the arrowed arc from ai comes back to itself and forms a loop. The resulting

Figure 1.9: Directed graph of the relation ρ.

diagram of ρ is called a directed graph or digraph; the dots are called vertices and the arrowed arcs are called directed edges or arcs. Thus the ordered pair (A, ρ) is a directed graph of the relation ρ. Here two vertices ai, aj ∈ A are said to be adjacent if aiρaj. For example, let A = {1, 2, 3, 4} and let a relation on A be ρ = {(1, 1), (2, 2), (4, 4), (2, 3), (3, 2), (3, 4), (4, 1), (4, 2)}.


The directed graph (A, ρ) is shown in Fig. 1.9. From the digraph representation of a relation one can test whether it is an equivalence relation or not. The following tests are to be performed:

(i) The relation is reflexive iff there is a loop on each vertex of the digraph.

(ii) The relation is symmetric iff whenever there is an arc from a vertex a to another vertexb (where a, b are two vertices), there should be an arc from b to a.

(iii) The relation is transitive iff whenever there is an arc from a vertex a to a vertex b and an arc from b to a vertex c, there is also an arc from a to c.

Ex 1.6.15 Find the relation determined by Fig. 1.10.

Solution: Here aiρaj iff there is an edge from ai to aj.

Figure 1.10: Digraph on the vertices a, b, c, d.

Thus ρ = {(a, a), (a, c), (b, c), (c, b), (c, c), (d, c), (d, d)}.

Matrix of a relation

A relation between two finite sets can also be represented by a matrix. Let A = {a1, a2, . . . , am} and B = {b1, b2, . . . , bn}. The matrix for the relation ρ is denoted by Mρ = [mij]m×n, where

mij = 1 if (ai, bj) ∈ ρ, and mij = 0 if (ai, bj) ∉ ρ.

The matrix Mρ is called the matrix of ρ. From this matrix one can check the properties of ρ.

Ex 1.6.16 Let A = {a, b, c} and B = {1, 2, 3, 4}. Consider a relation ρ from A into B as ρ = {(a, 1), (a, 3), (b, 2), (b, 3), (b, 4), (c, 1), (c, 2), (c, 4)}. Then the matrix Mρ is

        1 2 3 4
    a   1 0 1 0
    b   0 1 1 1
    c   1 1 0 1

From the matrix Mρ one can draw the digraph of the relation and, conversely, from the digraph the matrix Mρ can also be obtained.

Ex 1.6.17 Let A = {2, 4, 6} and let ρ be given by the digraph shown in Fig. 1.11.

Figure 1.11: Digraph on the vertices 2, 4, 6.

Find the matrix Mρ and the relation ρ.


Solution: The matrix Mρ = [mij], where mij = 1 if (ai, aj) ∈ ρ and mij = 0 otherwise. Therefore,

        2 4 6
    2   0 1 1
    4   1 1 0
    6   0 1 1

and the relation is ρ = {(2, 4), (2, 6), (4, 2), (4, 4), (6, 4), (6, 6)}.

1.7 Equivalence Class

Let ρ be an equivalence relation on a non-empty set A. Then, for each a ∈ A, the elements x ∈ A satisfying xρa constitute a subset of A. This subset is called an equivalence class or equivalence set of a with respect to ρ. The equivalence class of a is denoted by cl(a), class a, (a), [a], Aa or Ca, i.e.,

[a] = {x : x ∈ A and xρa} ⊆ A. (1.12)

Again, the set of all equivalence classes of elements of A under the equivalence relation ρ on A is called the quotient set, denoted by A/ρ, i.e.,

A/ρ = {[a] : a ∈ A}.

For example, let A = {1, 2, 3, 4} and let ρ = {(1, 1), (1, 2), (2, 1), (2, 2), (3, 3), (3, 4), (4, 3), (4, 4)} be an equivalence relation on A. This equivalence relation has the following equivalence classes:

[1] = {1, 2} = [2] and [3] = {3, 4} = [4],

and the quotient set is A/ρ = {[1], [3]} = {{1, 2}, {3, 4}}.
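Computing the classes and the quotient set of a finite equivalence relation is mechanical. A Python sketch (ours), applied to the example just given:

    def quotient_set(A, rho):
        """Distinct equivalence classes [a] = {x in A : (x, a) in rho}."""
        classes = []
        for a in A:
            cls = frozenset(x for x in A if (x, a) in rho)
            if cls not in classes:
                classes.append(cls)
        return classes

    A = {1, 2, 3, 4}
    rho = {(1, 1), (1, 2), (2, 1), (2, 2), (3, 3), (3, 4), (4, 3), (4, 4)}
    print(quotient_set(A, rho))   # [frozenset({1, 2}), frozenset({3, 4})]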

Property 1.7.1 Since ρ is an equivalence relation, it is reflexive; therefore (a, a) ∈ ρ for all a ∈ A. Also,

[a] = {x : x ∈ A and (x, a) ∈ ρ}.

Therefore, from the definition, aρa ⇒ a ∈ [a] for all a ∈ A. Hence [a] ≠ φ for all a ∈ A, so [a] is a non-empty subset of A.

Property 1.7.2 Let ρ be an equivalence relation on the set A. If b ∈ [a], then [a] = [b], where a, b ∈ A.

Proof: Let b ∈ [a]. Then bρa holds. Let x be an arbitrary element of [b]; then

x ∈ [b] and b ∈ [a] ⇒ xρb and bρa ⇒ xρa (transitive property) ⇒ x ∈ [a].

Thus [b] ⊆ [a]. Similarly, it can be proved that [a] ⊆ [b]. Hence we arrive at the conclusion that [a] = [b].

Property 1.7.3 Two equivalence classes are either equal or disjoint.


Proof: If for any two classes [a] and [b], [a] ∩ [b] = φ, then the theorem is proved. If [a] ∩ [b] ≠ φ, then let x ∈ [a] ∩ [b]. Then x ∈ [a] and x ∈ [b]. Therefore,

x ∈ [a], x ∈ [b] ⇒ xρa and xρb hold ⇒ aρx and xρb (symmetric property) ⇒ aρb (transitive property) ⇒ a ∈ [b].

Hence by the previous property, [a] = [b]. Hence for all a, b ∈ A, either [a] = [b] or [a]∩[b] = φ,i.e., equivalence classes are either equal or disjoint.

Property 1.7.4 aρb if and only if a, b belong to the same equivalence class.

Proof: Let a, b belong to the same class [α]. Then aρα and bρα. By symmetry, αρb; hence aρα, αρb ⇒ aρb by transitivity.

Conversely, let aρb; then b ∈ [a]. Also a ∈ [a] (since aρa). Hence a, b belong to the same class.

Property 1.7.5 Let ρ be an equivalence relation on the set A. Then A = ⋃_{a∈A} [a].

Proof: Let a ∈ A. Then a ∈ [a] ⊆ ⋃_{a∈A} [a]; therefore A ⊆ ⋃_{a∈A} [a]. Again, if X = ⋃_{a∈A} [a], then all elements of X belong to A; therefore X ⊆ A, i.e., ⋃_{a∈A} [a] ⊆ A. Hence A = ⋃_{a∈A} [a].

1.7.1 Partitions

Let ρ be an equivalence relation on a non-empty set S. Then the ρ-equivalence classes are each non-empty and pairwise disjoint; further, the union of the family of classes is the set S.

Let S = S1 ∪ S2 ∪ . . . , where S1, S2, S3, . . . are non-empty subsets of S. Precisely, a partition of S is a collection P = {S1, S2, S3, . . .} of non-empty subsets of S such that

(i) each x in S belongs to one of the Si;

(ii) the sets Si are mutually disjoint; that is, if Si ≠ Sj then Si ∩ Sj = φ.

The set of all partitions of a set S is denoted by π(S). The disjoint sets S1, S2, S3, . . . are called cells or blocks. For example,

(i) the subcollection {{1, 3, 5}, {2, 4, 6, 8}, {7, 9}} of subsets of S = {1, 2, · · · , 9} is a partition of S;

(ii) the subcollection {{1, 3, 5}, {2, 6}, {4, 8, 9}} of subsets of S = {1, 2, · · · , 9} is not a partition of S, since 7 ∈ S does not belong to any of the subsets;

(iii) the subcollection {{1, 3, 5}, {2, 4, 6, 8}, {5, 7, 9}} of subsets of S = {1, 2, · · · , 9} is not a partition of S, since {1, 3, 5} and {5, 7, 9} are not disjoint.


Further examples:

(i) Let Z, Z−, Z+, Ze, Zo denote the sets of integers, negative integers, positive integers, even integers and odd integers respectively. Then two partitions of Z are {Z−, {0}, Z+} and {Ze, Zo}.

(ii) Let Z be the set of integers. Consider the relation ρ = {(a, b) : (a − b) is divisible by 5}. It can be shown that ρ is an equivalence relation on Z. This relation partitions the set Z into five equivalence classes [a] = {x : xρa, i.e., x − a is divisible by 5}. Thus,

[0] = {x : x ∈ Z and xρ0} = {x : x − 0 is divisible by 5} = {x : x = 5k, k ∈ Z} = {. . . , −10, −5, 0, 5, 10, . . .}.

Similarly,

[1] = {x : x − 1 = 5k, k ∈ Z} = {. . . , −9, −4, 1, 6, 11, . . .},
[2] = {. . . , −8, −3, 2, 7, 12, . . .},
[3] = {. . . , −7, −2, 3, 8, 13, . . .},
[4] = {. . . , −6, −1, 4, 9, 14, . . .}.

It is observed that [0] ∪ [1] ∪ [2] ∪ [3] ∪ [4] = Z and any two of them are disjoint. Thus a partition of Z is {[0], [1], [2], [3], [4]}.

Ex 1.7.1 Find all partitions of S = {a, b, c}.

Solution: Since S = {a, b, c}, the set of partitions π(S) is given by

π(S) = { {{a}, {b}, {c}}, {{a, b}, {c}}, {{a, c}, {b}}, {{b, c}, {a}}, {{a, b, c}} }.

Ex 1.7.2 Determine whether the sets φ, {1, 3, 5, 8}, {2, 4, 6, 9}, {5, 9, 11, 12} form a partition of the set S = {1, 2, 3, · · · , 12}.

Solution: Let S1 = φ, S2 = {1, 3, 5, 8}, S3 = {2, 4, 6, 9}, S4 = {5, 9, 11, 12}. Here S1 = φ is empty, whereas the blocks of a partition must be non-empty; moreover, S2 ∩ S4 = {5} ≠ φ. Hence the given subsets φ, {1, 3, 5, 8}, {2, 4, 6, 9}, {5, 9, 11, 12} do not form a partition of S.

Theorem 1.7.1 Fundamental theorem of equivalence relations: An equivalence relation ρ on a set A gives a partition of A into mutually disjoint equivalence classes, such that a, b belong to the same class if and only if aρb.

Proof: First, we define the class [a] (for a given a ∈ A) as [a] = {x : aρx, x ∈ A}. Let P be the set of all distinct equivalence classes in A. If a ∈ A, then a ∈ [a] and [a] ∈ P; hence a belongs to the union of all members of P. Hence the union of all members of P is A. Also, the members of P are pairwise disjoint. Hence P is a partition of A. Now, for two elements a, b ∈ A, it can be shown that aρb if and only if they belong to the same equivalence class.

Converse theorem: A partition P of a set A gives an equivalence relation for which the members of P are the equivalence classes.

We define a relation ρ in A by aρb if and only if a, b belong to the same class of P. Since a, a belong to the same class of the partition P, we have aρa, ∀a ∈ A, and so ρ is reflexive. Let


a, b ∈ A and aρb. Now,

aρb⇒ a, b belongs to the same class.

⇒ b, a belongs to the same class.

⇒ bρa.

Hence ρ is symmetric. Let a, b, c ∈ A and let aρb, bρc hold. Then,

aρb, bρc⇒ a, b and b, c belongs to the same class.

Since b belongs to both classes, it follows that these two classes are the same subset of thepartition P and consequently, a, c belong to one and the same subset of P . Thus

a, c belongs to the same class⇒ aρc.

Hence ρ is transitive. Hence ρ is an equivalence relation on A. And the equivalence classesare just the classes of P . This completes the proof.

Ex 1.7.3 In Z, define aρb iff a − b is divisible by 7. Show that ρ is an equivalence relation. Hence find the corresponding partition of Z.

Solution: (i) For a ∈ Z, a − a = 0 is divisible by 7. Hence aρa, ∀a ∈ Z, and so ρ is reflexive. Using the definition,

aρb ⇒ a − b is divisible by 7 ⇒ b − a is divisible by 7 ⇒ bρa; ∀a, b ∈ Z.

Hence ρ is symmetric. Now,

aρb, bρc ⇒ a − b and b − c are divisible by 7 ⇒ (a − b) + (b − c) is divisible by 7 ⇒ (a − c) is divisible by 7 ⇒ aρc; ∀a, b, c ∈ Z.

Hence ρ is transitive and, consequently, it is an equivalence relation. For this relation the equivalence classes are

Ep = {7k + p : k ∈ Z}, p = 0, 1, 2, 3, 4, 5, 6.

Therefore, the distinct equivalence classes can be written as

E0 = {· · · , −14, −7, 0, 7, 14, · · ·};  E1 = {· · · , −13, −6, 1, 8, 15, · · ·};
E2 = {· · · , −12, −5, 2, 9, 16, · · ·};  E3 = {· · · , −11, −4, 3, 10, 17, · · ·};
E4 = {· · · , −10, −3, 4, 11, 18, · · ·}; E5 = {· · · , −9, −2, 5, 12, 19, · · ·};
E6 = {· · · , −8, −1, 6, 13, 20, · · ·}.

Now we see that Z = E0 ∪ E1 ∪ · · · ∪ E6 and Ei ∩ Ej = φ for i ≠ j, i.e., the classes are mutually disjoint. Therefore {E0, E1, E2, E3, E4, E5, E6} is a partition of Z.
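The seven residue classes can be displayed with a few lines of Python (a finite window of Z stands in for the infinite classes; the helper name is ours). Note that Python's % operator returns a non-negative remainder even for negative integers, which is exactly what the classes Ep require:

    def classes_mod(n, window=range(-15, 16)):
        """The classes E_p = {nk + p : k in Z} of a rho b iff n | (a - b),
        restricted to a finite window for display."""
        return {p: [x for x in window if x % n == p] for p in range(n)}

    for p, Ep in classes_mod(7).items():
        print(f"E{p} = {Ep}")
    # E0 = [-14, -7, 0, 7, 14], E1 = [-13, -6, 1, 8, 15], ...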

Ex 1.7.4 Consider the equivalence relation ρ on Z defined by xρy if and only if x² − y² is a multiple of 5. Find the corresponding partition of Z.


Solution: If x ∈ Z, then x = 5k + r, where 0 ≤ r < 5. Therefore x² = 25k² + 10kr + r² ⇒ x² ≡ r² (mod 5). For

r = 0, r² ≡ 0 (mod 5);  r = 1, r² ≡ 1 (mod 5);
r = 2, r² ≡ 4 (mod 5);  r = 3, r² ≡ 9 ≡ 4 (mod 5);
r = 4, r² ≡ 16 ≡ 1 (mod 5).

Hence x² ≡ 0, 1 or 4 (mod 5). Consequently, there are only three classes [0], [1] and [2]. Here,

[0] = {a ∈ Z : a² ≡ 0 (mod 5)} = {5k : k ∈ Z} = A0 (say),
[1] = {a ∈ Z : a² ≡ 1 (mod 5)} = {5k + 1 : k ∈ Z} ∪ {5k + 4 : k ∈ Z} = A1 (say),
[2] = {a ∈ Z : a² ≡ 4 (mod 5)} = {5k + 2 : k ∈ Z} ∪ {5k + 3 : k ∈ Z} = A2 (say).

Hence {A0, A1, A2} is the corresponding partition of Z.

Theorem 1.7.2 The set Z/(n) of residue classes is a finite set of order n.

We shall first show that a − b is divisible by n iff a and b, when divided by n, leave the same remainder. Let a = np + r1, where p, r1 are integers and 0 ≤ r1 < n, and let b = nq + r2, where q, r2 are integers and 0 ≤ r2 < n. Hence,

a− b = n(p− q) + (r1 − r2) (1.13)

If r1 = r2, this relation shows that a− b is divisible by n. Conversely, if a− b is divisible byn, then (1.13) shows that r1 − r2 is divisible by n. But 0 ≤ |r1 − r2| < n. Hence,

r1 − r2 = 0 i.e. r1 = r2.

Hence the result is proved. Now, all the possible remainders are 0, 1, 2, . . . , (n − 1). Accordingly we get n distinct classes in Z/(n). These are denoted by (0), (1), (2), . . . , (n − 1) (called class 0, class 1, . . . , etc.). Hence Z/(n) = {(0), (1), (2), . . . , (n − 1)}, and it is a finite set of order n.

1.8 Poset

Let ρ be a relation on a set A satisfying the following three properties:

(i) (Reflexive): For any a ∈ A, we have aρa.

(ii) (Antisymmetric): If aρb and bρa, then a = b.

(iii) (Transitive): If aρb and bρc, then aρc.

Then ρ is called a partial order or, simply, an order relation, and ρ is said to define a partial ordering of A. The set A with the partial order is called a partially ordered set or, simply, an ordered set or poset. Thus a non-empty set A together with a partial ordering relation ≤ on A is called a partially ordered set (poset), usually denoted by (A, ≤). For example,

(i) Consider the set N of positive integers. Here m ≤ n means 'm divides n', written m|n, i.e., there exists an integer p such that mp = n, for m, n ∈ N. For example, 2|4, 3|12, 7|21, and so on. Thus the relation of divisibility is a partial ordering, and (N, ≤) is a poset.

(ii) Let A be the set of all positive divisors of 72; then (A, ≤) is a poset, where m ≤ n means 'm is a divisor of n', for m, n ∈ A.


(iii) Let P be the set of all real valued continuous functions defined on [0, 1]. Let f, g ∈ Pand f ≤ g mean that f(x) ≤ g(x),∀x ∈ [0, 1]. Then, (P,≤) is a poset.

(iv) Let U be a non-empty universal set, i.e., a collection of sets, and let A be the set of all proper subsets of U. The relation P ≤ Q meaning P ⊆ Q, i.e., set inclusion, for P, Q ∈ A, is a partial ordering of A. Indeed, P ⊆ P for any set P; if P ⊆ Q and Q ⊆ P then P = Q; and if P ⊆ Q and Q ⊆ R then P ⊆ R. Therefore (A, ≤) is a poset.

(v) (ℝ, ≤) is a poset, where m ≤ n means 'm is less than or equal to n', for m, n ∈ ℝ.

Ex 1.8.1 Let A = {0, 1} and α = (a1, a2, a3), β = (b1, b2, b3) ∈ A³. Define a relation ρ on A³ by αρβ if and only if ai ≤ bi, for i = 1, 2, 3. Prove that (A³, ρ) is a poset.

Solution: Here A = {0, 1} and the elements of A³ are of the form (a1, a2, a3); so the elements of A³ are (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1). The relation is defined as αρβ if and only if ai ≤ bi for i = 1, 2, 3. Now,

αρα, ∀α = (a1, a2, a3) ∈ A³,

since ai ≤ ai for each i. Hence ρ is reflexive. Let us now assume that α, β ∈ A³ and that αρβ, βρα both hold. Then

ai ≤ bi and bi ≤ ai ⇒ ai = bi, i = 1, 2, 3 ⇒ α = β.

Hence ρ is antisymmetric. Let α = (a1, a2, a3), β = (b1, b2, b3), γ = (c1, c2, c3) ∈ A³, and let αρβ and βργ both hold. Then

ai ≤ bi and bi ≤ ci ⇒ ai ≤ ci, for i = 1, 2, 3, i.e., αρβ and βργ ⇒ αργ, ∀α, β, γ ∈ A³,

and so the relation ρ is transitive. As ρ is reflexive, antisymmetric and transitive, (A³, ρ) is a poset.

1.8.1 Dual Order

Let ≤ be any partial ordering of a set S and (S, ≤) be a poset. Let ≥ be the binary relation on S such that, for a, b ∈ S, a ≥ b if and only if b ≤ a. Then ≥ is called the converse of the partial ordering relation ≤. It may easily be seen that (S, ≥) is also a poset. It follows that we can replace the relation ≤ in any theorem about posets by the relation ≥ throughout without affecting its truth. This is known as the principle of duality. This duality principle applies to algebra, to projective geometry and to logic.

Ex 1.8.2 Let (A,≤) be a poset. Define a relation ≥ on A by a ≥ b if and only if b ≤ a, fora, b ∈ A, then show that (A,≥) is a poset.

Solution: The relation ≥ on A is defined as a ≥ b if and only if b ≤ a, for a, b ∈ A.

(i) Since, a ≤ a,∀a ∈ A, so, a ≥ a,∀a ∈ A and hence ≥ is reflexive.

(ii) Let a, b ∈ A be such that a ≥ b, b ≥ a, then b ≤ a and a ≤ b. Therefore, b = a, as ≤ isantisymmetric. Therefore, a ≥ b, b ≥ a⇒ a = b,∀a, b ∈ A. Hence, ≥ is antisymmetric.

(iii) Let a, b, c ∈ A be such that a ≥ b, b ≥ c, then b ≤ a and c ≤ b, i.e., c ≤ b and b ≤ a.This implies that c ≤ a since ≤ is transitive, i.e., a ≥ c. Therefore, a ≥ b, b ≥ c⇒ a ≥c,∀a, b, c ∈ A. Hence, ≥ is transitive.

As ≥ is reflexive, antisymmetric and transitive, (A, ≥) is a poset.


1.8.2 Chain

Let (S, ≤) be a poset in which, for every x, y ∈ S, either x ≤ y or y ≤ x. A poset satisfying this condition is said to be 'simply' or 'totally' or 'linearly' ordered and is called a chain. In other words, of any two distinct elements in a chain, one is less and the other greater. A subset of S is called an antichain if no two distinct elements in the subset are related. A poset (S, ≤) is called a totally ordered set, or simply an ordered set, if S is a chain, and in this case the binary relation ≤ is called a total ordering relation.

(i) Any subset S of a poset P is itself a poset under the same inclusion relation (restrictedto S).

(ii) Every subset of a linearly ordered set S must be linearly ordered i.e., any subset of achain is a chain.

(iii) Although an ordered set S may not be linearly ordered, it is still possible for a subsetA of S to be linearly ordered.

We frequently refer to the number of elements in the chain as the length of the chain.Consider the following examples:

(i) Consider the set N of positive integers ordered by divisibility. Then 21 and 7 are comparable, since 7|21. On the other hand, 3 and 5 are non-comparable, since neither 3|5 nor 5|3. Thus N is not linearly ordered by divisibility. Observe that A = {2, 6, 12, 36} is a linearly ordered subset of N, since 2|6, 6|12 and 12|36.

(ii) The set N of positive integers with the usual order ≤ is linearly ordered and henceevery ordered subset of N is also linearly ordered.

(iii) The power set P(A) of a set A with two or more elements is not linearly ordered by set inclusion. For instance, suppose a, b ∈ A. Then {a} and {b} are non-comparable. Observe that the empty set φ, {a} and A do form a linearly ordered subset of P(A), since φ ⊆ {a} ⊆ A. Similarly, φ, {b} and A form a linearly ordered subset of P(A).

1.8.3 Universal Bounds

In any poset P = (S, ≤), the elements O and I of S, when they exist, will be universal bounds of P if, for any element x ∈ S, we have,

O ≤ x and x ≤ I, i.e., O ≤ x ≤ I,∀x ∈ P. (1.14)

We call these elements O and I the least element and the greatest element of S.
Lemma: A given poset (S, ≤) can have at most one least element and at most one greatest element.
Proof: Let O and O∗ both be least elements of (S, ≤). Then, since O is a least element, we have O ≤ O∗, and since O∗ is a least element, we have O∗ ≤ O. Hence, by antisymmetry (P2), O = O∗; similarly, I = I∗.
Posets need not have any universal bounds. Thus, under the usual relation of inequality, the real numbers form a poset (ℝ, ≤) which has no universal bounds (unless −∞ and +∞ are adjoined to form the extended reals).


1.8.4 Covering Relation

Let S be a partially ordered set, and suppose a, b ∈ S. We say that a is an immediate predecessor of b, or that b is an immediate successor of a, or that b is a cover of a, written a ≪ b, if a < b (i.e., a ≤ b and a ≠ b) and there is no element c ∈ S such that a < c < b.

For example, (N, ≤) is a poset, where m ≤ n means m|n, for m, n ∈ N. In (N, ≤), 6 covers 2 and 10 covers 2, but 8 does not cover 2, as 2 < 4 < 8, although 2 < 8.

Suppose S is a finite partially ordered set. Then the order on S is completely known once we know all pairs a, b ∈ S such that a ≪ b, that is, once we know the relation ≪ on S. This follows from the fact that x < y if and only if x ≪ y or there exist elements a1, a2, · · · , am in S such that

x ≪ a1 ≪ a2 ≪ · · · ≪ am ≪ y.

Hasse diagram

The Hasse diagram of a finite partially ordered set S is a directed graph whose vertices are the elements of S and in which there is a directed edge from a to b whenever a ≪ b in S.

A convenient way of displaying the ordering relation among the elements of an ordered set is by means of a graph whose vertices represent the elements of the set. Thus we define a graph, whose vertices are the different elements a, b, c, · · · of S, in which a and b are joined by a segment if and only if a covers b or b covers a. If the graph is drawn so that whenever a covers b the vertex a is at a higher level than the vertex b, then the graph is called the Hasse diagram of S. The Hasse diagram of a poset S is a picture of S; hence it is very useful in describing types of elements of S. Sometimes we define a partially ordered set by simply presenting its Hasse diagram; note that Hasse diagrams need not be connected.
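Because the covering relation determines a finite order completely, the edges of a Hasse diagram can be computed mechanically: take all strict comparabilities and discard those with an intermediate element. A Python sketch (ours), anticipating the divisor poset of Ex 1.8.5:

    def hasse_edges(S, leq):
        """Covering pairs a << b of a finite poset (S, leq): a < b with
        no c strictly between; these are the Hasse diagram's edges."""
        lt = {(a, b) for a in S for b in S if leq(a, b) and a != b}
        return {(a, b) for (a, b) in lt
                if not any((a, c) in lt and (c, b) in lt for c in S)}

    divisors_of_12 = [1, 2, 3, 4, 6, 12]
    print(sorted(hasse_edges(divisors_of_12, lambda a, b: b % a == 0)))
    # [(1, 2), (1, 3), (2, 4), (2, 6), (3, 6), (4, 12), (6, 12)]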

Ex 1.8.3 Let A = {a, b, c, d, e}. The diagram in Fig. 1.12 defines a partial order on A in the natural way; that is, d ≤ b, d ≤ a, e ≤ c, and so on.

Figure 1.12: Hasse diagram of the poset A = {a, b, c, d, e}.

Ex 1.8.4 Let S = {a, b, c}; then the power set ℘(S), i.e., the set of all subsets of S, has the elements φ, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}. The Hasse diagram of this poset is shown in Fig. 1.13.

Figure 1.13: Hasse diagram of (℘(S), ⊆) for S = {a, b, c}.

Ex 1.8.5 Let S be the set of all positive divisors of 12, i.e., S = {1, 2, 3, 4, 6, 12}. Here 2 covers 1, 4 covers 2, 12 covers 4, 3 covers 1, 6 covers 3, 12 covers 6 and 6 covers 2; but 12 does not cover 2, as 2 < 6 < 12, although 2 < 12. The covering diagram of this poset (S, ≤) is given in Fig. 1.14.


Figure 1.14: Hasse diagram of the divisors of 12 ordered by divisibility.

Ex 1.8.6 Let S be the set of all positive divisors of 30, i.e., S = {1, 2, 3, 5, 6, 10, 15, 30}. The covering diagram of this poset (S, ≤) is given in Fig. 1.15.

Figure 1.15: Hasse diagram of the divisors of 30 ordered by divisibility.

Ex 1.8.7 A partition of a positive integer m is a collection of positive integers (repetition allowed) whose sum is m. For instance, there are seven partitions of m = 5, as follows:

5, 3−2, 2−2−1, 1−1−1−1−1, 4−1, 3−1−1, 2−1−1−1.

We order the partitions of an integer m as follows: a partition P1 precedes a partition P2 if the integers in P1 can be added to obtain the integers in P2 or, equivalently, if the integers in P2 can be further subdivided to obtain the integers in P1. For example, 2−2−1 precedes 3−2, as 2 + 1 = 3. On the other hand, 3−1−1 and 2−2−1 are non-comparable. Fig. 1.16 gives the Hasse diagram of the partitions of m = 5.

Figure 1.16: Hasse diagram of the partitions of 5.

1.8.5 Maximal and Minimal Elements

Let S be a partially ordered set. An element a of a poset (S, ≤) is minimal if no other element of S strictly precedes (is less than) a. Similarly, an element b is called a maximal element if no element of S strictly succeeds (is larger than) b. For example,

(i) (N, ≤) is a poset, where m ≤ n means m|n, for m, n ∈ N. This poset (N, ≤) contains no greatest element and no maximal element. The least element is 1, and 1 is the only minimal element.

(ii) Let U = {1, 2, 3} and let A be the set of all non-empty proper subsets of U; then (A, ≤) is a poset, where P ≤ Q means 'P is a subset of Q', i.e., P ⊆ Q, for P, Q ∈ A. This poset (A, ≤) contains no greatest element and no least element. The minimal elements are {1}, {2}, {3}, and the three maximal elements are {1, 2}, {2, 3}, {1, 3}.


Geometrically speaking, a is a minimal element if no edge enters a (from below), and b is a maximal element if no edge leaves b (in the upward direction). An element a ∈ S is called a first element if a ≤ x for every element x ∈ S, i.e., if a precedes every other element in S. Similarly, an element b in S is called a last element if y ≤ b for every y ∈ S, i.e., if b succeeds every other element in S. Note the following:

(i) If S is infinite, then S may have no minimal and no maximal element. For instance,the set Z of integers with the usual order ≤ has no minimal and no maximal element.

(ii) If S is finite, then S must have at least one minimal element and at least one maximal element.

(iii) S can have more than one minimal and more than one maximal element. For example, let X = {1, 2, 3} and S = ℘(X) − {φ, X}, with AρB ⇔ A ⊂ B for all A, B ∈ S. Here {1}, {2}, {3} are minimal in the poset (S, ρ), and {1, 2}, {2, 3}, {1, 3} are maximal in the poset (S, ρ).

(iv) S can have at most one first element, which must be a minimal element, and S can have at most one last element, which must be a maximal element. Generally speaking, S may have neither a first nor a last element, even when S is finite.

(v) The least element in a poset is a minimal element, and the greatest element in a poset is a maximal element, but the converse is not true. Consider the poset whose Hasse diagram is given in Fig. 1.17: a, b, e are minimal elements and j, k are maximal elements. Here f covers c, but f does not cover a.

Figure 1.17: Hasse diagram.

(vi) Let A = {a, b, c, d, e}. The diagram in Fig. 1.12 defines a partial order on A in the natural way; that is, d ≤ b, d ≤ a, e ≤ c, and so on. A has two minimal elements, d and e, and neither is a first element. A has only one maximal element, a, which is also a last element.

(vii) Let A = {1, 2, 3, 4, 6, 8, 9, 12, 18, 24} be ordered by the relation 'x divides y'. The Hasse diagram is given in Fig. 1.18. Unlike rooted trees, the direction of a line in the diagram of a poset is always upward. A has two maximal elements, 18 and 24, and neither is a last element. A has only one minimal element, 1, which is also a first element.

Figure 1.18: Hasse diagram.


(viii) The diagram of a finite linearly ordered set, i.e., a finite chain, consists simply of one path. For example, Fig. 1.19 shows the diagram of a chain with five elements. The chain has one minimal element, x, which is a first element, and one maximal element, v, which is a last element.

Figure 1.19: Hasse diagram of the chain x < y < z < u < v.

(ix) Let A be any non-empty set and let P(A) be the power set of A, ordered by set inclusion. Then the empty set φ is a first element of P(A) since, for any set X ∈ P(A), we have φ ⊆ X. Moreover, A is a last element of P(A) since every element Y of P(A) is, by definition, a subset of A, that is, Y ⊆ A.

1.8.6 Supremum and Infimum

Let A be a subset of a partially ordered set (S, ≤). An element M ∈ S is called an upper bound of the subset A if M succeeds every element of A, i.e., if, for every x ∈ A, we have x ≤ M. If an upper bound of A precedes every other upper bound of A, then it is called the supremum or least upper bound of A, denoted by sup(A), and we write sup(A) = M.

In particular, let (S, ≤) be any poset and let a, b ∈ S be given. Then an element d ∈ S is called the glb or meet of a and b when

d ≤ a, d ≤ b and (x ≤ a, x ≤ b ⇒ x ≤ d), (1.15)

and we write d = a ∧ b.

An element m ∈ S is called a lower bound of a subset A of a poset S if m precedes every element of A, i.e., if, for every y ∈ A, we have m ≤ y. If a lower bound of A succeeds every other lower bound of A, then it is called the infimum or greatest lower bound of A, denoted by inf(A), and we write inf(A) = m.

Dually, an element s ∈ S is called the lub or join of a and b, when

a ≤ s, b ≤ s and a ≤ x, b ≤ x⇒ s ≤ x. (1.16)

In this case, we write s = a ∨ b. Below are some examples:

(i) Let S = {a, b, c, d, e, f} be ordered as pictured in Fig. 1.20, and let A = {b, c, d}.

Figure 1.20: Hasse diagram.

The upper bounds of A are e and f, since only e and f succeed every element of A. The lower bounds of A are a and b, since only a and b precede every element of A. Note that e and f are non-comparable; hence sup(A) does not exist. However, b succeeds a; hence inf(A) = b.


(ii) Let (N, ≤) be the poset of positive integers ordered by divisibility, where m ≤ n means m|n, for m, n ∈ N. The greatest common divisor of m and n in N, denoted by gcd(m, n), is the largest integer which divides m and n. The least common multiple of m and n, denoted by lcm[m, n], is the smallest positive integer divisible by both m and n. From number theory, every common divisor of m and n divides gcd(m, n), and lcm[m, n] divides every common multiple of m and n. Thus

gcd(m, n) = inf(m, n) and lcm[m, n] = sup(m, n).

In other words, inf(m, n) and sup(m, n) do exist for every pair of elements of N ordered by divisibility (see the code sketch after these examples).

(iii) For any positive integer m, we will let Dm denote the set of divisors of m ordered by divisibility. The Hasse diagram of D36 = {1, 2, 3, 4, 6, 9, 12, 18, 36} appears in Fig. 1.21. Again, inf(m, n) = gcd(m, n) and sup(m, n) = lcm[m, n] exist for any pair m, n in Dm.

Figure 1.21: Hasse diagram of D36.

(iv) Let X be a non-empty set and P(X) the power set of X. Then (P(X), ≤) is a poset, where A ≤ B means 'A is a subset of B', i.e., A ⊆ B, for A, B ∈ P(X). In this poset (P(X), ≤), the lub of A and B is the union A ∪ B, and the glb of A and B is the intersection A ∩ B.
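The meet/join description in example (ii) above is easy to confirm numerically: under divisibility, glb is gcd and lub is lcm. A minimal Python check (the helper names are ours):

    from math import gcd

    def meet(m, n):
        """glb of m, n under divisibility."""
        return gcd(m, n)

    def join(m, n):
        """lub of m, n under divisibility (the lcm)."""
        return m * n // gcd(m, n)

    print(meet(12, 18), join(12, 18))   # 6 36
    # Every common divisor of 12 and 18 divides 6,
    # and 36 divides every common multiple of 12 and 18.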

It is important to note that if the elements a, b in a poset (S, ≤) have an upper bound (a lower bound), then they may not have a least upper bound (a greatest lower bound). For the poset whose Hasse diagram is shown in Fig. 1.22, the set A = {e, d, g} has four upper bounds a, b, c, d, and d is the lub of A; but the subset A′ = {a, f, c} has no upper bound.

Figure 1.22: Hasse diagram.

Theorem 1.8.1 Let (A, ≤) be a poset and a, b ∈ A. Then any one of the relations (i) a ≤ b, (ii) a ∧ b = a and (iii) a ∨ b = b implies the other two.

Proof: Given that (A, ≤) is a poset and a, b ∈ A with a ≤ b, we have a ≤ a and a ≤ b. This means that a is a lower bound of a and b. Let m be any lower bound of a and b; then, by definition,


m ≤ a, m ≤ b. Since a is a lower bound of a, b and any lower bound m satisfies m ≤ a, a is the greatest lower bound of a, b, i.e., a = a ∧ b; consequently, (i) ⇒ (ii).

Let relation (ii), i.e., a ∧ b = a, hold. Then, by definition, a is the glb of a and b. Hence a ≤ a, a ≤ b. Also, we have b ≤ b. Now, a ≤ b and b ≤ b give that b is an upper bound of a and b. Let n be any upper bound of a and b; then a ≤ n and b ≤ n. Since b is an upper bound of a and b and b ≤ n for any upper bound n, b is the lub of a, b, i.e., a ∨ b = b; consequently, (ii) ⇒ (iii).

Let relation (iii), i.e., a ∨ b = b, hold. Then, by definition, b is the lub of a and b. As b is an upper bound of a, b, we have a ≤ b; consequently, (iii) ⇒ (i).

By the transitivity of the implication ⇒, it follows that, (i) ⇒ (ii) and (iii); (ii) ⇒ (iii);(iii) ⇒ (i) and (ii). Hence the theorem.

1.9 Lattices

Let L be a nonempty set closed under two binary operations called meet and join, denoted respectively by ∧ and ∨. Then L is called a lattice if the following axioms hold, where a, b, c are elements in L:

(i) (Commutative law): a ∧ b = b ∧ a; a ∨ b = b ∨ a.

(ii) (Associative law): (a ∧ b) ∧ c = a ∧ (b ∧ c); (a ∨ b) ∨ c = a ∨ (b ∨ c).

(iii) (Absorption law): a ∧ (a ∨ b) = a; a ∨ (a ∧ b) = a.

We will sometimes denote the lattice by (L, ∧, ∨) when we want to show which operations are involved. A chain (L, ≤) is a lattice, since lub(a, b) = b and glb(a, b) = a when a ≤ b, and lub(a, b) = a and glb(a, b) = b when b ≤ a. For example, Fig. 1.19 shows the diagram of a chain with five elements, which is a lattice. Below are some examples:

(i) (N, ≤) is a poset, where m ≤ n means m|n, for m, n ∈ N. This poset (N, ≤) is a lattice, where for any two elements m, n ∈ N, m ∨ n = lcm(m, n) and m ∧ n = gcd(m, n).

(ii) Let X be a non-empty set and P(X) be the power set of X. Then (P(X), ≤) is a poset, where A ≤ B means 'A is a subset of B', i.e., A ⊆ B, for A, B ∈ P(X). This poset (P(X), ≤) is a lattice, where for any two elements A, B ∈ P(X), A ∨ B = A ∪ B and A ∧ B = A ∩ B.

(iii) (Z, ≤) is a poset, where m ≤ n means 'm is less than or equal to n', for m, n ∈ Z. This poset (Z, ≤) is a lattice, where for any two elements m, n ∈ Z, m ∧ n = min{m, n} and m ∨ n = max{m, n}. This is a chain.

(iv) Let P be the set of all real-valued continuous functions defined on [0, 1], and let f ≤ g, for f, g ∈ P, mean that f(x) ≤ g(x), ∀x ∈ [0, 1]. Then (P, ≤) is a poset. This poset (P, ≤) is a lattice, where for any two elements f, g ∈ P,

(f ∨ g)(x) = max{f(x), g(x)}; (f ∧ g)(x) = min{f(x), g(x)}; x ∈ [0, 1].

(v) Let U be a non-empty universal set and A be the set of all proper subsets of U; then (A, ≤) is a poset, where P ≤ Q means 'P is a subset of Q', i.e., P ⊆ Q, for P, Q ∈ A. This poset (A, ≤) is not a lattice: taking U = {1, 2, 3}, for instance, the pair of elements {1, 2} and {2, 3} has no lub in A (their only common superset is U itself), and, if the empty set is excluded from A, the pair {1} and {2} has no glb.

(vi) For any positive integer m, we will let Dm denote the set of divisors of m ordered by divisibility (|). Let D30 = {1, 2, 3, 5, 6, 10, 15, 30} denote the set of all divisors of 30. The Hasse diagram of the lattice (D30, |) appears in Fig. 1.23. Here m ∧ n = gcd(m, n) and



Figure 1.23: Hasse diagram

m ∨ n = lcm(m, n). In the Hasse diagram, traverse upwards from the vertices representing a and b until the two paths meet; the element at the meeting point is a ∨ b. By traversing downwards instead, we can get a ∧ b similarly.

Ex 1.9.1 Show that the poset (L,≤) represented by its Hasse diagram (Fig.1.24) is a lattice.


Figure 1.24: Hasse diagram of Ex.1.9.1

Solution: We have to prove that each pair of elements of L = {a, b, c, d, e, f, g} has a lub and a glb.

      a      b      c      d      e      f      g
a   (a,a)  (a,b)  (a,c)  (a,d)  (a,e)  (a,f)  (a,g)
b   (a,b)  (b,b)  (a,d)  (b,d)  (b,e)  (a,g)  (b,g)
c   (a,c)  (a,d)  (c,c)  (c,d)  (a,g)  (c,f)  (c,g)
d   (a,d)  (b,d)  (c,d)  (d,d)  (b,g)  (c,g)  (d,g)
e   (a,e)  (b,e)  (a,g)  (b,g)  (e,e)  (a,g)  (e,g)
f   (a,f)  (a,g)  (c,f)  (c,g)  (a,g)  (f,f)  (f,g)
g   (a,g)  (b,g)  (c,g)  (d,g)  (e,g)  (f,g)  (g,g)

If the element in a-row and b-column is (x, y), then x = a ∧ b and y = a ∨ b.
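The table above can also be generated mechanically. The sketch below (an added illustration; the covering relations are an assumption read off from the table, with a the least and g the greatest element) builds ≤ as the reflexive-transitive closure of the covers and then searches for glb and lub.

from itertools import product

# covering relations of Fig. 1.24, as read off from the table
covers = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"),
          ("b", "e"), ("c", "f"), ("d", "g"), ("e", "g"), ("f", "g")}
elems = "abcdefg"

# the order <= is the reflexive-transitive closure of the covers
leq = {(x, x) for x in elems} | set(covers)
changed = True
while changed:
    changed = False
    for (x, y), (u, v) in product(list(leq), repeat=2):
        if y == u and (x, v) not in leq:
            leq.add((x, v))
            changed = True

def glb(a, b):
    lower = [x for x in elems if (x, a) in leq and (x, b) in leq]
    return next(g for g in lower if all((x, g) in leq for x in lower))

def lub(a, b):
    upper = [x for x in elems if (a, x) in leq and (b, x) in leq]
    return next(s for s in upper if all((s, x) in leq for x in upper))

print(glb("b", "c"), lub("b", "c"))   # prints: a d, agreeing with the table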

Ex 1.9.2 Show that the following posets are not lattices:

(i) L1 = ({1, 2, · · · , 12}, |).

(ii) L2 = ({1, 2, 3, 4, 6, 9}, |).

Solution: For (L1, |), 2 ∨ 7 = lcm{2, 7} = 14 ∉ L1, so L1 is not a lattice. For (L2, |), 4 ∨ 6 = lcm{4, 6} = 12 ∉ L2, so L2 is not a lattice.

Ex 1.9.3 Which of the posets given in Fig. 1.25 are lattices, and which are not?

Solution: P1: d, e, f are upper bounds of b and c. f cannot be the lub of b and c, since d ≤ f and d ≠ f. Neither d nor e can be the lub of b and c, since d ≰ e and e ≰ d. So lub(b, c)



Figure 1.25: Hasse diagrams of Ex. 1.9.3 (posets P1, P2, P3)

does not exist. Hence P1 is not a lattice.
P2: It is not a lattice, since there is no lower bound for (a, b), and hence glb(a, b) does not exist.
P3: It is not a lattice, since glb(a, b) does not exist.

1.9.1 Lattice Algebra

The binary operations ∧ and ∨ in lattices have important algebraic properties, some of themanalogous to those of ordinary multiplication and addition.

Theorem 1.9.1 In any lattice the following identities hold:

(i) L1: x ∧ x = x and x ∨ x = x: Idempotency

(ii) L2: x ∧ y = y ∧ x and x ∨ y = y ∨ x, Commutativity

(iii) L3: (x ∧ y) ∧ z = x ∧ (y ∧ z) and (x ∨ y) ∨ z = x ∨ (y ∨ z), Associativity

(iv) L4: x ∧ (x ∨ y) = x and x ∨ (x ∧ y) = x, Absorption.

Moreover x ≤ y is equivalent to each of the conditions: x∧y = x and x∨y = y : Consistency.

Proof: By the principle of duality, which interchanges ∧ and ∨, it suffices to prove one of the two identities in each of L1–L4.
L1: Since x ∧ y ≤ x, we have x ∧ x ≤ x. Also, d ≤ x, d ≤ y ⇒ d ≤ x ∧ y; it follows that x ≤ x, x ≤ x ⇒ x ≤ x ∧ x. Hence x ∧ x = x.
L2: Since the meaning of glb{x, y} is not altered by interchanging x and y, it follows that x ∧ y = y ∧ x.
L3: Since both x ∧ (y ∧ z) and (x ∧ y) ∧ z represent the glb{x, y, z}, the result follows.
L4: Since x ∧ (x ∨ y) is the glb of x and x ∨ y, we have x ∧ (x ∨ y) ≤ x. Since x ≤ x and x ≤ x ∨ y, it follows that x is a lower bound of x and x ∨ y; and since x ∧ (x ∨ y) is the glb of x and x ∨ y, we must have x ≤ x ∧ (x ∨ y). Hence it follows that x ∧ (x ∨ y) = x.

Theorem 1.9.2 In a lattice L, y ≤ z implies x ∧ y ≤ x ∧ z and x ∨ y ≤ x ∨ z, ∀x ∈ L.

Proof: Since y ≤ z, we have y = y ∧ z. Therefore,

x ∧ y = x ∧ (y ∧ z) = (x ∧ x) ∧ (y ∧ z), as x ∧ x = x
      = x ∧ (x ∧ (y ∧ z)); associative
      = x ∧ ((x ∧ y) ∧ z); associative
      = x ∧ ((y ∧ x) ∧ z); commutative
      = (x ∧ (y ∧ x)) ∧ z = ((x ∧ y) ∧ x) ∧ z
      = (x ∧ y) ∧ (x ∧ z).

Since x ∧ y = (x ∧ y) ∧ (x ∧ z), the consistency condition gives x ∧ y ≤ x ∧ z. By the principle of duality, x ∨ y ≤ x ∨ z.


Theorem 1.9.3 Any lattice satisfies the distributive inequalities (or semi distributive laws):

(i) x ∧ (y ∨ z) ≥ (x ∧ y) ∨ (x ∧ z).

(ii) x ∨ (y ∧ z) ≤ (x ∨ y) ∧ (x ∨ z).

Proof: We have x ∧ y ≤ x and x ∧ y ≤ y ≤ y ∨ z. Therefore, x ∧ y is a lower bound of x and y ∨ z. Since x ∧ (y ∨ z) is the glb of x and y ∨ z, we have

x ∧ y ≤ x ∧ (y ∨ z) and, similarly, x ∧ z ≤ x ∧ (y ∨ z).

These show that x ∧ (y ∨ z) is an upper bound of x ∧ y and x ∧ z. But (x ∧ y) ∨ (x ∧ z) is the lub of x ∧ y and x ∧ z. Therefore

x ∧ (y ∨ z) ≥ (x ∧ y) ∨ (x ∧ z).

Inequality (ii) follows by the principle of duality.

1.9.2 Sublattices

Let T be a non-empty subset of a lattice L (T ⊆ L). We say T is a sublattice of L if T is itself a lattice (with respect to the operations of L). Therefore, T is a sublattice of L if and only if T is closed under the operations ∧ and ∨, i.e.,

a, b ∈ T ⇒ a ∧ b ∈ T and a ∨ b ∈ T.

For example, the set Dm of divisors of m is a sublattice of the positive integers N underdivisibility.

Two lattices L and L′ are said to be isomorphic if there is a one-to-one correspondence f : L → L′ such that

f(a ∧ b) = f(a) ∧ f(b) and f(a ∨ b) = f(a) ∨ f(b).

1.9.3 Bounded Lattices

A lattice L is said to have a lower bound 0 if for any element x ∈ L we have 0 ≤ x. Analogously, L is said to have an upper bound I if for any x ∈ L we have x ≤ I. We say L is bounded if L has both a lower bound 0 and an upper bound I. In such a lattice we have the identities

a ∧ I = a, a ∨ I = I, a ∧ 0 = 0, a ∨ 0 = a,

for any element a ∈ L. For example:

(i) The nonnegative integers with the usual ordering 0 < 1 < 2 < · · · have 0 as the lowerbound but have no upper bound.

(ii) The lattice P (U) of all subsets of any universal set U is a bounded lattice with U asan upper bound and the empty set φ as a lower bound.

Every finite lattice L = {a1, a2, . . . , an} is bounded, with I = a1 ∨ a2 ∨ · · · ∨ an and 0 = a1 ∧ a2 ∧ · · · ∧ an.

1.9.4 Distributive Lattices

A lattice L in which the distributive laws

(i) x ∧ (y ∨ z) = (x ∧ y) ∨ (x ∧ z)

(ii) x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z)



Figure 1.26: Hasse diagrams (a) and (b)

hold for all elements x, y, z in L is called a distributive lattice. We note that, by the principle of duality, condition (i) holds if and only if (ii) holds. If the lattice L is not distributive, it is said to be non-distributive. Fig. 1.26(a) is a non-distributive lattice, since

a ∧ (b ∨ c) = a ∧ I = a but (a ∧ b) ∨ (a ∧ c) = 0 ∨ 0 = 0.

Fig. 1.26(b) is also a non-distributive lattice. In fact, we have the following characterization of such lattices: a lattice L is non-distributive if and only if it contains a sublattice isomorphic to Fig. 1.26(a) or (b).
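As a quick illustration (a sketch added here, not the book's), one can encode the five-element lattice of Fig. 1.26(a), where a, b, c are mutually non-comparable atoms between 0 and I, and watch the distributive law fail:

# elements: "0" (least), "I" (greatest), atoms "a", "b", "c"
def meet(x, y):
    if x == y: return x
    if x == "0" or y == "0": return "0"
    if x == "I": return y
    if y == "I": return x
    return "0"              # two distinct atoms meet in 0

def join(x, y):
    if x == y: return x
    if x == "I" or y == "I": return "I"
    if x == "0": return y
    if y == "0": return x
    return "I"              # two distinct atoms join in I

lhs = meet("a", join("b", "c"))             # a ∧ (b ∨ c) = a ∧ I = a
rhs = join(meet("a", "b"), meet("a", "c"))  # (a ∧ b) ∨ (a ∧ c) = 0 ∨ 0 = 0
print(lhs, rhs)                             # a 0 -- distributivity fails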

Theorem 1.9.4 In a distributive lattice, a ∧ x = a ∧ y and a ∨ x = a ∨ y together implyx = y.

Proof: We have

x = x ∨ (x ∧ a); L4
  = x ∨ (y ∧ a); as a ∧ x = a ∧ y
  = (x ∨ y) ∧ (x ∨ a); distributivity
  = (y ∨ x) ∧ (y ∨ a); as a ∨ x = a ∨ y
  = y ∨ (x ∧ a); distributivity
  = y ∨ (y ∧ a) = y; L4.

1.9.5 Complements

The case in which a ∧ x = a ∧ y = O and a ∨ x = a ∨ y = I is of particular interest. In general, by a complement of an element a in a lattice L with universal bounds O and I, we mean an element x ∈ L such that

a ∧ x = O and a ∨ x = I.

The elements O and I are trivially complementary in any lattice. It is also clear that in any chain, elements other than O and I have no complements.

Theorem 1.9.5 In any distributive lattice L, a given element can have at most one com-plement.

Proof: Let the element a have two complements a′ and a′′. Then

a ∧ a′ = O = a ∧ a′′ and a ∨ a′ = I = a ∨ a′′.

Since the lattice is distributive, Theorem 1.9.4 applies to these two pairs of equalities, and we conclude a′ = a′′.

Theorem 1.9.6 In any distributive lattice, the set of all complemented elements is a sub-lattice.

Proof: Let a, a′ and b, b′ be complementary pairs; then, by definition, a ∧ a′ = O = b ∧ b′ and a ∨ a′ = I = b ∨ b′. Now,


(a ∧ b) ∧ (a′ ∨ b′) = (a ∧ b ∧ a′) ∨ (a ∧ b ∧ b′)
= (a ∧ a′ ∧ b) ∨ (a ∧ O)
= (O ∧ b) ∨ O = O ∨ O = O.

Also, (a ∧ b) ∨ (a′ ∨ b′) = (a ∨ a′ ∨ b′) ∧ (b ∨ a′ ∨ b′) = (I ∨ b′) ∧ (a′ ∨ I) = I ∧ I = I.

Hence a ∧ b and a′ ∨ b′ are complementary. Similarly, a ∨ b and a′ ∧ b′ are complementary. Thus, if L is a distributive lattice and S is the subset of L consisting of the complemented elements of L, then for any two elements a and b (with complements a′ and b′ respectively) in S, both a ∧ b and a ∨ b also belong to S. Hence, by definition, S is a sublattice.

1.10 Mapping

Let A and B be two non-empty sets. A function f from A to B, which is denoted by f : A → B, is a relation from A to B with the property that for every a ∈ A, there is exactly one b ∈ B such that (a, b) ∈ f. Functions are also called mappings or transformations. The element a ∈ A is called an argument of the function f, and f(a) is called the value or image or f-image of a under f.


Figure 1.27: Pictorial representation of domain, co-domain and range

The set A is called the domain of the function f and B is called the co-domain of f. The range of f consists of those elements in B which appear as the image of at least one element of A; it is also known as the image set. The range of f is denoted by f(A), i.e., f(A) = {f(x) : x ∈ A}. Obviously, f(A) ⊆ B. For example, (i) let A = {1, 2, 3, 4} and B = {x, y, z} and let

f = {(1, x), (2, x), (3, y), (4, z)}, i.e., f(1) = x, f(2) = x, f(3) = y, f(4) = z.

Thus each element of A has a unique image in B, so f is a function.
(ii) Let A = {1, 2, 3} and let B = {x, y, z}. Consider the relations

f1 = {(1, x), (2, x)} and f2 = {(1, x), (1, y), (2, z), (3, y)}.

The relation f1 is not a function from A to B, since 3 ∈ A has no image (though every element that does appear has a unique image; f1 is a function from {1, 2} to B). The relation f2 is not a function, as f2(1) = x and f2(1) = y, i.e., 1 ∈ A has two distinct images x and y.

If x (∈ A) corresponds to y (∈ B), it is said that y is the image of x under the mapping f, and it is expressed by writing either xf = y or f(x) = y. In this case x is said to be the pre-image or inverse image of y. Sometimes it is possible to write down a precise formula showing how f(x) is determined by x; for example, f(x) = √x, f(x) = 2x + 5, f(x) = e^x + 2x, etc. That is, by the function f(x) = x³ we mean the function f : R → R which associates with any x ∈ R its cube x³. In the notation of binary relations, f = {(x, x³) : x ∈ R}.

Thus, a subset f of A × B is called a function or mapping from A to B if to each a ∈ A there exists a unique b ∈ B such that the ordered pair (a, b) ∈ f.
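This defining condition is easy to test mechanically. A minimal sketch (the helper name is_function is ours, added for illustration):

def is_function(f, A):
    # f is a set of ordered pairs; each a in A must occur exactly once
    return all(sum(1 for (a, b) in f if a == x) == 1 for x in A)

A = {1, 2, 3}
f1 = {(1, "x"), (2, "x")}                       # 3 has no image
f2 = {(1, "x"), (1, "y"), (2, "z"), (3, "y")}   # 1 has two images
f3 = {(1, "x"), (2, "x"), (3, "y")}
print(is_function(f1, A), is_function(f2, A), is_function(f3, A))
# False False True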


1.10.1 Types of Functions

Constant function

A function f : A → B is said to be a constant function (or a constant mapping) if f mapseach element of A to one and the same element of B, i.e., f(A) is a singleton set. Forexample, f(x) = 5 for all x ∈ R is a constant function.

Identity function

A function f : A → A is said to be the identity function on A if f(x) = x for all x ∈ A. Itis denoted by IA.

Into function

A function f : A → B is said to be an into function if f(A) is a proper subset of B, i.e.,f(A) ⊂ B. In this case, we say that f maps A into B.

Onto function

Let A and B be two non-empty sets. A mapping f : A → B is defined as a surjective mapping or surjection if

∀y ∈ B,∃x ∈ A such that f(x) = y. (1.17)

This is also called an onto mapping and is denoted by f(A) = B. In this case, we say thatf maps A onto B. For example,

(i) Let f : Z → Z be given by f(x) = 3x, x ∈ Z. Then f is an into function, because f(Z) = {0, ±3, ±6, ±9, . . .} is a proper subset of Z (the co-domain).

(ii) Let f : Z → Z be given by f(x) = x + 2, x ∈ Z. Then f is an onto function, because f(Z) = Z (the co-domain).

Pre-image

If f : A → B be a function and x ∈ A then f(x) is a unique element in B. The element xis said to be a pre-image (or inverse image) of f(x).

One-to-one function

A function f : A → B is said to be a one-to-one function if different elements in A have different images in B, i.e., if x1 ≠ x2 then f(x1) ≠ f(x2), for all x1, x2 ∈ A. A one-to-one function is also known as one-one or injective or an injection. For example,

(i) The mapping f : Z+ → Q defined by n → n/(2n + 1) is an injective mapping.

(ii) Let A = {1, 2, 3, 4, . . .} and B = {1, 1/2, 1/3, 1/4, . . .}, and consider the mapping f : x → 1/x. Here each element x ∈ A maps to exactly one element y ∈ B, and distinct elements of A have distinct images. So this is a one-to-one mapping.

Ex 1.10.1 Two sets are given by X = {1, 2, 3, 4}, Y = {α, β, γ} and f = {(1, α), (2, β), (3, β), (4, β)}, g = {(1, α), (2, β), (3, γ)}. Test whether f, g are functions and, if they are functions, test whether they are (i) injective, (ii) surjective.



Figure 1.28: Functions f and g

Solution: The pictorial representation of f is shown in Fig. 1.28(a). From the figure it is seen that f is a function, since every element of X has a unique image. But not every element of Y is an image: γ has no pre-image, so f is not surjective. Also, f is not injective, since the elements 2, 3, 4 have the same image β. The pictorial representation of g is shown in Fig. 1.28(b). g is not a function, because not all elements of the domain X are mapped to elements of the co-domain Y (the element 4 has no image).
Difference between relation and function: Let A and B be two sets. Every subset of A × B is a relation; it is a function if for each x ∈ A there is one and only one ordered pair with first coordinate x. Thus every function is a relation, but the converse is not true. For example, let A = {a, b, c}, B = {0, 1, 2}; then f = {(a, 0), (a, 1), (b, 1), (c, 2)} is a relation but not a function, since two different ordered pairs, viz., (a, 0) and (a, 1), have the same first coordinate. If we consider f as {(a, 0), (b, 1), (c, 2)}, then it becomes a function as well as a relation.
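For finite sets, injectivity and surjectivity can likewise be tested directly from the definitions. A small sketch (assuming the function is stored as a Python dict; the helper names are ours):

def is_injective(f):
    # distinct arguments must have distinct images
    return len(set(f.values())) == len(f)

def is_surjective(f, B):
    # every element of the co-domain must be an image
    return set(f.values()) == set(B)

f = {1: "alpha", 2: "beta", 3: "beta", 4: "beta"}     # the f of Ex 1.10.1
print(is_injective(f))                                # False
print(is_surjective(f, {"alpha", "beta", "gamma"}))   # False: gamma missed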

Ex 1.10.2 If Z∗ be the set of non-negative integers and f : Z → Z∗ is defined by f(x) = (1/2)[x + |x|], test whether it is injective or not. [CH'04]

Solution: The mapping f : Z → Z∗ defined by f(x) = (1/2)[x + |x|] is given by

f(x) = (1/2)(x + x) = x, for x ≥ 0
     = (1/2)(x − x) = 0, for x < 0.

We know, −1,−2 < 0 but f(−1) = f(−2) = 0. Thus f(x1) = f(x2) although x1 6= x2, so isnot injective.

Ex 1.10.3 If Z∗ be the set of non-negative integers and f : Z∗ → Z is defined by

f(x) = x/2, when x is an even integer
     = −(x + 1)/2, when x is an odd integer.

Is f injective or surjective? [CH'03]

Solution: Let x, y ∈ Z∗ be even integers; then f(x) = x/2, f(y) = y/2, so that f(x) = f(y) ⇒ x = y. Next, let x, y ∈ Z∗ be odd integers; then

f(x) = −(x + 1)/2 and f(y) = −(y + 1)/2,

so that again f(x) = f(y) ⇒ x = y. Finally, an even integer is mapped to a non-negative integer and an odd integer to a negative integer, so an even and an odd integer can never share an image. Hence f is injective. Moreover, every y ∈ Z has a pre-image: if y ≥ 0 then y = f(2y), and if y < 0 then y = f(−2y − 1). Therefore f is also surjective, and hence bijective.


Bijective mapping

A mapping f : A → B is defined as a bijective mapping or bijection if it is both injectiveand surjective. For example, let f : Z → Z be given by f(x) = x+ 1,∀ x ∈ Z. This is aninjective and surjective mapping.

Ex 1.10.4 Decide whether the following mappings are surjective or injective:
(i) f : C → ℜ defined by f(a + ib) = a² + b².
(ii) f : Z → Z+ defined by x → x² + 1.
(iii) f : Z+ → Q defined by x → x/(2x + 1).

Solution: (i) By definition, f(2 + 3i) = 4 + 9 = 13 and f(3 + 2i) = 9 + 4 = 13. So f(2 + 3i) = f(3 + 2i) although 2 + 3i ≠ 3 + 2i; hence f is not an injective (one-one) mapping. It is not a surjection either, since −3 ∈ ℜ has no pre-image in C, because a² + b² ≥ 0 for all real values of a and b.

(ii) If x = 2 and x = −2, then f(2) = 2² + 1 = 5 and f(−2) = (−2)² + 1 = 5: both ±2 ∈ Z have the image 5. Therefore f(x1) = f(x2) does not imply x1 = x2, and f is not an injective mapping. It is not a surjection, since 3 ∈ Z+, but 3 = f(n) gives 3 = n² + 1, i.e., n = √2 ∉ Z; hence 3 has no pre-image in Z.

(iii) Let n1 and n2 be such that f(n1) = f(n2). Therefore

n1/(2n1 + 1) = n2/(2n2 + 1) ⇒ 2n1n2 + n1 = 2n1n2 + n2 ⇒ n1 = n2.

Hence f is an injection. For positive n we have n/(2n + 1) > 0; hence the negative numbers in Q have no pre-image, and f is not onto.

Ex 1.10.5 Is the mapping f : ℜ → (−1, 1), defined by f(x) = x/(1 + |x|), a bijective mapping? Justify your answer. Here ℜ is the set of real numbers and (−1, 1) = {x ∈ ℜ : −1 < x < 1}. [KH'06]

Solution: Here the mapping f : ℜ → (−1, 1) is defined by f(x) = x/(1 + |x|). Since 1 + |x| ≥ 1 > 0, the mapping is well defined. Therefore,

f(x) = x/(1 − x), when x < 0, i.e., −1 < f(x) < 0,
     = x/(1 + x), when x > 0, i.e., 0 < f(x) < 1,
     = 0, when x = 0.

In the first case, if f(x1) = f(x2), then

x1/(1 − x1) = x2/(1 − x2) ⇒ x1 = x2.

For the second case, if f(x1) = f(x2), then

x1/(1 + x1) = x2/(1 + x2) ⇒ x1 = x2.


This shows that f is one-one. Let y ∈ (−1, 1) and y = f(x). When x < 0 (so −1 < y < 0), we have

y = x/(1 − x) ⇒ x = y/(1 + y) ⇒ f(y/(1 + y)) = y, with x ∈ ℜ.

When x > 0 (so 0 < y < 1), we have

y = x/(1 + x) ⇒ x = y/(1 − y) ⇒ f(y/(1 − y)) = y, with x ∈ ℜ.

Thus every y ∈ (−1, 1) has a pre-image in ℜ, and since f is one-one, f is also onto. Hence f is a bijective mapping.

Ex 1.10.6 Let S be the set of all 2 × 2 real matrices ( a b ; c d ) (rows separated by a semicolon) with ad − bc ≠ 0, and let ℜ∗ denote the set of all non-zero real numbers. Show that the mapping f : S → ℜ∗ defined by f(( a b ; c d )) = ad − bc is surjective but not injective.

Solution: Let us consider the two real matrices A = ( 2 0 ; 0 1 ) and B = ( −1 0 ; 0 −2 ). Since 2·1 − 0·0 = 2 ≠ 0 and (−1)·(−2) − 0·0 = 2 ≠ 0, we have A, B ∈ S. Now A ≠ B, although, using the definition,

f(( 2 0 ; 0 1 )) = 2 = f(( −1 0 ; 0 −2 )).

Thus A ≠ B but still f(A) = f(B); therefore the given mapping f is not injective. Also, for every non-zero real number r, the matrix ( r 0 ; 0 1 ) has determinant r·1 − 0·0 = r ≠ 0, so it belongs to S, and f(( r 0 ; 0 1 )) = r. Thus every element of ℜ∗ has a pre-image in S, and hence f is surjective.

Ex 1.10.7 Discuss the mapping f : R → (−1, 1) defined by f(x) = x/(1 + x²), x ∈ R, where R is the set of real numbers and (−1, 1) = {x ∈ R : −1 < x < 1}.

Solution: Since x ∈ < so, the given mapping is well defined. Take two elements x1, x2 ∈ <.If f(x1) = f(x2), then

x1

x21 + 1

=x2

x22 + 1

⇒ (x1 − x2)− x1x2(x1 − x2) = 0

⇒ either, x1 = x2 or, x1x2 = 1.

Taking x1 = 5 and x2 = 1/5, we see that f(x1) = f(x2) = 5/26. Thus x1 ≠ x2 but still f(x1) = f(x2); therefore f is not injective. Let y be an arbitrary element of (−1, 1); then

y = x/(x² + 1) ⇒ yx² − x + y = 0 ⇒ x = (1 ± √(1 − 4y²))/(2y), y ≠ 0.

When y > 1/2, we have 1 − 4y² < 0, so √(1 − 4y²) ∉ ℜ and consequently x ∉ ℜ. Similarly, when y < −1/2, again √(1 − 4y²) ∉ ℜ and x ∉ ℜ. Therefore no y with |y| > 1/2 has a pre-image; in fact −1/2 ≤ f(x) ≤ 1/2 for all x, i.e., f does not cover the entire co-domain (−1, 1). So it is not onto.


Restriction Mapping

Let f : A → B and A0 ⊂ A. The mapping g : A0 → B such that g(x) = f(x), ∀x ∈ A0, is said to be the restriction mapping of f to A0. It is denoted by f|A0, read as 'f restricted to A0'; f is said to be an extension of g to A. As an example: let f : R → R be given by f(x) = |x| − x, x ∈ R, and g : R+ → R be given by g(x) = 0, x ∈ R+; then g = f|R+.

Inverse mapping

Let the mapping f : A → B be a one-one onto mapping; then corresponding to each element y ∈ B, ∃ a unique element x ∈ A such that f(x) = y. Thus a mapping, denoted by f−1, is defined as

f−1 : B → A : f−1(y) = x⇔ f(x) = y. (1.18)

The mapping f−1 defined above is called the inverse of f. The pictorial representation of f−1 is shown in Fig. 1.29. If f−1 is the inverse of f and f(x) = y, then x = f−1(y). For example, let A = {1, 2, 3}, B = {a, b, c} and f = {(1, a), (2, b), (3, c)}. Then the inverse relation of f is f−1 = {(a, 1), (b, 2), (c, 3)}, which is a function from B to A. Again, g = {(1, a), (2, a), (3, b)} is a function from A to B, and its inverse relation g−1 = {(a, 1), (a, 2), (b, 3)} is not a


Figure 1.29: Inverse function

function, since a has two images, 1 and 2, under g−1.

Theorem 1.10.1 The necessary and sufficient condition that a mapping is invertible is thatit is one-one and onto.

Proof: Suppose f is invertible; then f−1 : B → A is a function. If f(a1) = f(a2) = b for some a1, a2 ∈ A, then both a1 and a2 are images of b under f−1, and since f−1 is a function, a1 = a2; thus f is one-one. Again, since f−1 is a function, every element b of B has an image a = f−1(b) ∈ A, and then f(a) = b; thus f is onto. Hence the condition is necessary.

Conversely, let f be bijective, i.e., one-one and onto. Since f is onto, every b ∈ B is of the form b = f(a) for at least one a ∈ A, and since f is one-one, this a is unique. Hence the rule f−1(b) = a, where f(a) = b, assigns to each element of B exactly one element of A, i.e., f−1 is a function. Thus f is invertible, and the condition is sufficient.

Theorem 1.10.2 If f : A → B is a bijective function, then f−1 : B → A is also bijective.

Proof: Let b1, b2 be any two elements of B. Since f is one-one and onto, there exist unique elements a1, a2 ∈ A such that b1 = f(a1) and b2 = f(a2), i.e., a1 = f−1(b1) and a2 = f−1(b2). Suppose

f−1(b1) = f−1(b2) ⇒ a1 = a2
⇒ f(a1) = f(a2) ⇒ b1 = b2.

Therefore f−1(b1) = f−1(b2) iff b1 = b2. Hence f−1 is one-one.


To prove f−1 is onto: let a be any element of A. Since f is a function from A to B, there exists a unique element b ∈ B such that b = f(a), i.e., a = f−1(b). That is, the image under f−1 of the element b ∈ B is a ∈ A. Hence f−1 is onto.

Ex 1.10.8 Let A = ℜ − {−1/2}, B = ℜ − {1/2}, and let f : A → B be defined by f(x) = (x − 3)/(2x + 1), ∀x ∈ A. Does f−1 exist?

Solution: Since A = < − − 12, so, the given mapping f(x) = x−3

2x+1 is well defined. Let,x1, x2 ∈ A. If f(x1) = f(x2), then

x1 − 32x1 + 1

=x2 − 32x2 + 1

⇒ 2x1x2 + x1 − x2 − 3

= 2x1x2 − 6x1 + x2 − 3 ⇒ x1 = x2.

Therefore, f is injective. Let y be an arbitrary element of B; then

y = (x − 3)/(2x + 1) ⇒ x = (y + 3)/(1 − 2y),

which is defined for every y ∈ B, since y ≠ 1/2. Also,

f((y + 3)/(1 − 2y)) = [(y + 3)/(1 − 2y) − 3] / [2(y + 3)/(1 − 2y) + 1]
= [y + 3 − 3(1 − 2y)] / [2(y + 3) + 1 − 2y] = 7y/7 = y,

so for each y ∈ B, ∃ an element x = (y + 3)/(1 − 2y) ∈ A such that f(x) = y. Hence f is surjective, and consequently it is bijective. Therefore f−1 exists.

Ex 1.10.9 If the function f : ℜ → ℜ is defined by f(x) = x² + 1, then find f−1(−8) and f−1(17).

Solution: We have, from the definition of the inverse mapping,

f−1(−8) = {x ∈ ℜ : f(x) = −8} = {x ∈ ℜ : x² + 1 = −8} = {x ∈ ℜ : x = ±3√−1} = φ,

as ±3√−1 are not real numbers. Again,

f−1(17) = {x ∈ ℜ : f(x) = 17} = {x ∈ ℜ : x² + 1 = 17} = {x ∈ ℜ : x = ±4} = {4, −4}.

Theorem 1.10.3 The inverse of a bijective mapping is also bijective.

Proof: Let f : A → B be a bijective mapping; we are to show that f−1 : B → A is also bijective. Let y1, y2 ∈ B. As f is bijective, i.e., one-one and onto, there are unique x1, x2 ∈ A with f−1(y1) = x1 and f−1(y2) = x2. Therefore

f−1(y1) = f−1(y2) ⇒ x1 = x2 ⇒ f(x1) = f(x2) ⇒ y1 = y2.

Thus f−1 is one-one. Given any element x ∈ A, we can find an element y = f(x) ∈ B with f−1(y) = x; thus x is the f−1-image of y ∈ B. This shows that f−1 is an onto mapping. Hence the theorem.


Theorem 1.10.4 For a bijective mapping, the inverse mapping is unique.

Proof: Let, if possible, the mapping f : A → B have two inverses, say g : B → A and h : B → A. Let b be any element of B and let g(b) = a1, h(b) = a2, where a1, a2 ∈ A. Since g and h are inverses of f, we have f(a1) = b = f(a2). Since f is a one-one function,

f(a1) = f(a2) ⇒ a1 = a2 ⇒ g(b) = h(b),

i.e., h = g. Thus the inverse mapping of f : A → B is unique.

Theorem 1.10.5 If f is a one-one correspondence between A and B, then f−1of = IA and fof−1 = IB.

Proof: For any a ∈ A, (f−1of)(a) = f−1(f(a)) = a = IA(a); thus f−1of = IA. Again, for any b ∈ B, (fof−1)(b) = f(f−1(b)) = b = IB(b); hence fof−1 = IB.

Ex 1.10.10 f : Q → Q is defined by f(x) = 5x + a, where a, x ∈ Q, the set of rational numbers. Show that f is one-one and onto. Find f−1.

Solution: Let x1, x2 ∈ Q. If f(x1) = f(x2), then 5x1 + a = 5x2 + a ⇒ x1 = x2. Thus f is one-one. Let y = f(x) = 5x + a; then x = (1/5)(y − a). Since y ∈ Q, (1/5)(y − a) ∈ Q. Again, f((y − a)/5) = 5 · (y − a)/5 + a = y. Thus y ∈ Q is the image of the element (y − a)/5 ∈ Q.

Hence f is onto. Since f is bijective, f is invertible, and x = f−1(y) = (1/5)(y − a).

Ex 1.10.11 Show that the functions f : R → (1, ∞) and g : (1, ∞) → R defined by f(x) = 3^{2x} + 1, g(x) = (1/2) log_3(x − 1) are inverses of each other.

Solution: Let x ∈ R; then (gof)(x) is given by

(gof)(x) = g(f(x)) = g(3^{2x} + 1) = (1/2) log_3[(3^{2x} + 1) − 1] = (1/2) log_3 3^{2x} = (1/2) · 2x = x.

Therefore gof = IR. For x ∈ (1, ∞), we have

(fog)(x) = f(g(x)) = f((1/2) log_3(x − 1)) = 3^{2 · (1/2) log_3(x − 1)} + 1 = (x − 1) + 1 = x.

Therefore fog = I(1,∞), and so the functions f : R → (1, ∞) and g : (1, ∞) → R are inverses of each other.

Ex 1.10.12 Show that the function f : [−π/2, π/2] → [−1, 1] such that f(x) = sin x is one-one and onto. Also find f−1.

Solution: Let x1, x2 ∈ [−π/2, π/2] be any two numbers. Let

f(x1) = f(x2) ⇒ sin x1 = sin x2 ⇒ x1 = x2,

since sin is strictly increasing on [−π/2, π/2]. Hence f is one-one. Let y = f(x) = sin x, or x = sin−1 y ∈ [−π/2, π/2], where y ∈ [−1, 1]. Also, f(x) = f(sin−1 y) = sin(sin−1 y) = y. Thus f is onto.

Hence f is one-one and onto, so f is invertible. Now

y = sin x ⇒ x = sin−1 y, or f−1(y) = sin−1 y, ∀y ∈ [−1, 1].


Ex 1.10.13 Show that the mapping f : Z → Z defined by f(x) = x + 2 is a bijective mapping.

Solution: Let x1, x2 ∈ Z with f(x1) = f(x2). Then x1 + 2 = x2 + 2, i.e., x1 = x2. Thus f(x1) = f(x2) iff x1 = x2, and hence f is one-one.

Let y = f(x) = x + 2. Then x = y − 2 ∈ Z, and f(y − 2) = (y − 2) + 2 = y ∈ Z (co-domain). Thus f is onto. Hence f is a bijective mapping.

Ex 1.10.14 Let f : R → R be a function defined by

f(x) = −1, if x is rational,
     = 1, if x is irrational,

then show that f is neither injective nor surjective. (This function is known as Dirichlet's function.)

Solution: The function is not injective, since all rational numbers are mapped to −1 and all irrational numbers are mapped to 1.

It is not surjective, as only two elements of ℜ, viz., −1 and 1, are images of elements of ℜ.

But if we redefine the function f as f : R → {−1, 1}, then it becomes a surjective function.

Ex 1.10.15 Show that the mapping f : R → R given by f(x) = |x|, x ∈ R, is neither one-one nor onto.

Solution: The function f is not one-one, since f(−1) = |−1| = 1 and f(1) = |1| = 1. That is, different elements of R have the same image.

Also, it is not onto, as f(R) = R+ ∪ {0} ≠ R, i.e., only the non-negative numbers in the co-domain R are images of elements of the domain R.

Theorem 1.10.6 If |A| = n = |B|, then the number of bijective mappings from A to B is n!.

Proof: Let Xn be the set of all bijective functions from A to B, when |A| = |B| = n. Let A = {a1, a2, . . . , an} and B = {b1, b2, . . . , bn}. When n = 1 there is only one bijection, {(a1, b1)}.

When n = 2 there are two bijections, viz., {(a1, b1), (a2, b2)} and {(a1, b2), (a2, b1)}. Thus the number of bijections from A to B when n = 2 is 2 = 2!.

When n = 3, we construct all bijections starting from a bijective function of X2 as follows. Let {(a1, b1), (a2, b2)} ∈ X2. One can add the new elements a3 ∈ A and b3 ∈ B to this bijection in three different ways, shown below:
{(a1, b1), (a2, b2), (a3, b3)}, {(a1, b3), (a2, b2), (a3, b1)}, {(a1, b1), (a2, b3), (a3, b2)}.
Similarly, from the second element of X2 one can generate three other bijective functions. Hence X3 contains 6 = 3! bijective functions when |A| = |B| = 3.
We assume that Xn has n! bijections when |A| = |B| = n. Let A = {a1, a2, . . . , an+1}

and B = {b1, b2, . . . , bn+1}, and let Xn+1 be the set of all bijections from A onto B.
Let {(a1, b1), (a2, b2), . . . , (an, bn)} be a bijective function of Xn. By introducing an+1 ∈ A and bn+1 ∈ B, one can generate the following bijections starting from the above bijection:

{(a1, b1), (a2, b2), . . . , (an, bn), (an+1, bn+1)}


Figure 1.30: Pictorial representation of different types of functions: (a) one-one into, (b) many-one into, (c) one-one onto, (d) many-one onto

{(a1, bn+1), (a2, b2), . . . , (an, bn), (an+1, b1)}
{(a1, b1), (a2, bn+1), . . . , (an, bn), (an+1, b2)}
. . .
{(a1, b1), (a2, b2), . . . , (an, bn+1), (an+1, bn)}.

Thus, starting from a single bijective function of Xn, we generate (n + 1) bijective functions by adjoining one element to each of A and B. Hence the number of bijective functions in Xn+1 is n! × (n + 1) = (n + 1)!.

Hence, by mathematical induction, the number of different bijective functions from A to B is n! when |A| = |B| = n.

If f : A → B is a bijective function from A onto B, then |A| = |B|, i.e., both sets have the same number of elements. If |A| ≠ |B|, then f cannot be bijective.
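Theorem 1.10.6 can be confirmed by brute force for small n; the sketch below (an added illustration) builds every bijection by pairing A with each ordering of B:

from itertools import permutations
from math import factorial

A = ["a1", "a2", "a3", "a4"]
B = ["b1", "b2", "b3", "b4"]
# each ordering of B paired with A in order gives one bijection
bijections = [dict(zip(A, perm)) for perm in permutations(B)]
print(len(bijections), factorial(len(A)))   # 24 24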

Result 1.10.1 Let A and B be two non-empty sets with cardinality m and n respectively. The number of possible relations from A to B is 2^{mn}, since a relation is precisely a subset of A × B, which has mn elements.

Many-one function

A function f : A → B is said to be a many-one function if two or more elements of A correspond to the same element of B. The constant function is an example of a many-one function. For example, let A = {1, 2, 3, 4}, B = {0, 1, 2}. Then

(i) f = {(1, 0), (2, 0), (3, 1), (4, 2)} is a many-one into function.

(ii) f = {(1, 0), (2, 1), (3, 1), (4, 2)} is a many-one onto function.

Let A = {1, 2, 3}, B = {0, 1, 2, 3}; then f = {(1, 0), (2, 1), (3, 3)} is a one-one and into function.

Equality of two functions

Two functions f : A → B and g : C → D are said to be equal if A = C, B = D andf(x) = g(x) for all x ∈ A. It is written as f = g.

Ex 1.10.16 Suppose A = {1, 2, 3} and B = {8, 9}. Examine whether the following subsets of A × B are functions from A to B.


(i) f1 = {(1, 8), (1, 9), (2, 8), (3, 9)}

(ii) f2 = {(1, 9), (2, 9), (3, 9)}

(iii) f3 = {(1, 8), (2, 9), (3, 9)}.

How many mappings are there from A into B? Identify the one-one and onto mappings.

Solution: Here A and B are domain and co-domain respectively. Then,

(i) f1 is not a function since 1 ∈ A has two different images 8 and 9.

(ii) f2 is a function, particularly it is a constant function.

(iii) f3 is also a function.

The possible mappings from A to B are
g1 = {(1, 8), (2, 8), (3, 8)}, g2 = {(1, 9), (2, 9), (3, 9)},
g3 = {(1, 8), (2, 8), (3, 9)}, g4 = {(1, 8), (2, 9), (3, 8)},
g5 = {(1, 9), (2, 8), (3, 9)}, g6 = {(1, 9), (2, 9), (3, 8)},
g7 = {(1, 8), (2, 9), (3, 9)}, g8 = {(1, 9), (2, 8), (3, 8)}.

Therefore, there are 2³ = 8 mappings from A to B. Since |A| = 3 > 2 = |B|, every mapping is many-one and none of them is one-one. The onto mappings are g3, g4, g5, g6, g7, g8.
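The enumeration in this solution follows a simple pattern: a mapping from A to B is a choice of an image in B for each of the |A| elements, giving |B|^{|A|} mappings in all. A short sketch of this count (added here for illustration):

from itertools import product

A, B = [1, 2, 3], [8, 9]
# a mapping assigns one image from B to each element of A
maps = [dict(zip(A, images)) for images in product(B, repeat=len(A))]
onto = [g for g in maps if set(g.values()) == set(B)]
print(len(maps), len(onto))   # 8 6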

1.10.2 Composite mapping

Let A, B, C be any non-empty sets and let f : A → B and g : B → C be two functions. If a function h is defined in such a way that h : A → C by h(x) = g(f(x)), x ∈ A, then h is


Figure 1.31: Composite function gof

called the product or composite function of f and g. It is denoted by gof or gf. Thus the product or composite mapping of the mappings f and g, denoted by gof : A → C, is defined by

(gof)(x) = g[f(x)], for all x ∈ A. (1.19)

Under the mapping f , an element x ∈ A is mapped to an element y = f(x) ∈ B. Again, yis mapped by g to an element z ∈ C such that z = g(y) ∈ C and hence z = g(y) = g[f(x)].Obviously, the domain of (gof) is A and co-domain is C. For example, let f : R → R andg : R → R be two functions where f(x) = 3x+ 2 and g(x) = x2 + 1. Now,

(fog)(x) = f(g(x)) = f(x2 + 1) = 3(x2 + 1) + 2 = 3x2 + 5 and(gof)(x) = g(f(x)) = g(3x+ 2) = (3x+ 2)2 + 1 = 9x2 + 12x+ 5.

For this example, it is seen that (fog)(x) 6= (gof)(x). Thus in general fog 6= gof , i.e.,product of the function is non-commutative.
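A two-line check of this worked example (added here; any test value does):

f = lambda x: 3 * x + 2
g = lambda x: x ** 2 + 1

fog = lambda x: f(g(x))   # 3x^2 + 5
gof = lambda x: g(f(x))   # 9x^2 + 12x + 5
print(fog(1), gof(1))     # 8 26 -- fog != gof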


Ex 1.10.17 Let f, g : ℜ → ℜ be two mappings, defined by f(x) = |x| + x, x ∈ ℜ, and g(x) = |x| − x, x ∈ ℜ. Find fg and gf.

Solution: Here the two mappings f, g : ℜ → ℜ are defined by f(x) = |x| + x, g(x) = |x| − x, x ∈ ℜ, i.e.,

f(x) = 2x, if x ≥ 0        g(x) = 0, if x ≥ 0
     = 0, if x < 0              = −2x, if x < 0.

Now, let x ≥ 0, then,

(fg)(x) = f(g(x)) = f(0) = 0 and (gf)(x) = g(f(x)) = g(2x) = 0.

Now, let x < 0, then,

(fg)(x) = f(g(x)) = f(−2x) = −4x and (gf)(x) = g(f(x)) = g(0) = 0.

Therefore, (gf)(x) = 0 for all x ∈ ℜ, and

(fg)(x) = 0, if x ≥ 0
        = −4x, if x < 0.

Theorem 1.10.7 The product of any function with the identity function is the function itself.

Proof: Let f : X → Y, and let us denote by IX and IY the identity functions on X and Y respectively. Then we must show that IY of = f and foIX = f. Since f : X → Y and IY : Y → Y, we have IY of : X → Y. Now, let x be an arbitrary element of X and let f(x) = y. Then

(IY of)(x) = IY [f(x)] = IY (y) = y = f(x) ⇒ IY of = f.

Again, since IX : X → X and f : X → Y, we have foIX : X → Y. Now, for arbitrary x ∈ X, we have

(foIX)(x) = f[IX(x)] = f(x) ⇒ foIX = f.

Therefore, the product of any function with the identity function is the function itself.

Theorem 1.10.8 The product of any invertible mapping f with its inverse mapping f−1,is an identity mapping.

Proof: Let f be a one-one mapping of X onto Y, and let IX and IY be the identity mappings on X and Y respectively. Clearly, f−1 is a one-one mapping of Y onto X. Now,

f : X → Y and f−1 : Y → X ⇒ f−1of : X → X.

Now, let x be an arbitrary element of X and let f(x) = y so that, x = f−1(y). Therefore,

(f−1of)(x) = f−1[f(x)] = f−1(y) = x = IX(x)⇒ f−1of = IX .

Again, f−1 : Y → X and f : X → Y, so fof−1 : Y → Y. Now, for an arbitrary y ∈ Y, there is associated a unique x ∈ X such that f(x) = y, or x = f−1(y). Therefore,

(fof−1)(y) = f [f−1(y)] = f(x) = y = IY (y)⇒ fof−1 = IY .

Hence, f−1of = IX and fof−1 = IY . Therefore, the product of any invertible mapping fwith its inverse mapping f−1, is an identity mapping.


Theorem 1.10.9 Composition of functions is associative.

Proof: Let X, Y, Z, T be four non-empty sets and let f : X → Y, g : Y → Z and h : Z → T be three mappings. Then we are to show that (hog)of = ho(gof). Now, hog : Y → T, gof : X → Z, and so (hog)of : X → T and ho(gof) : X → T. Let x be an arbitrary element of X. Then

[(hog)of](x) = (hog)[f(x)] = h[g(f(x))] = h[(gof)(x)] = [ho(gof)](x), ∀x ∈ X.

Hence (hog)of = ho(gof), and so composition of functions is associative.

Theorem 1.10.10 Let f : X → Y and g : Y → X. If gof is an identity function on Xand fog is an identity function on Y , then g = f−1.

Proof: It is given that gof = IX and fog = IY; we are to show that g = f−1. We first prove that f is invertible, i.e., that it is one-one and onto. Now,

f(x1) = f(x2) ⇒ g[f(x1)] = g[f(x2)] ⇒ (gof)(x1) = (gof)(x2) ⇒ IX(x1) = IX(x2) ⇒ x1 = x2.

So, f is one-one. In order to show that f is onto, let y be an arbitrary element of Y and letg(y) = x. Then,

g(y) = x⇒ f [g(y)] = f(x)⇒ (fog)(y) = f(x)⇒ IY (y) = f(x) ⇒ y = f(x).

Thus, for each y ∈ Y , there is an x such that f(x) = y. So, f is onto. Now,

fog = IY ⇒ f−1o(fog) = f−1oIY
⇒ (f−1of)og = f−1; associativity
⇒ IXog = f−1 ⇒ g = f−1.

Theorem 1.10.11 Let f : A→ B and g : B → C be invertible. Then gof is invertible and(gof)−1 = f−1og−1.

Proof: Since f and g are invertible, they are bijective. Let a1, a2 ∈ A. Now,

(gof)(a1) = (gof)(a2) ⇒ gf(a1) = gf(a2)⇒ f(a1) = f(a2); as g is one-one⇒ a1 = a2; as f is one-one.

Hence gof is one-one. Let c ∈ C. Since g is onto, c has a pre-image, say b ∈ B, such that g(b) = c. Again, since f is onto, b ∈ B has a pre-image a ∈ A such that f(a) = b. Now,

(gof)(a) = gf(a) = g(b) = c.

Thus, for every element c ∈ C, there is an element a ∈ A, such that (gof)(a) = c, and hencegof is onto. Thus (gof) is bijective and so it is invertible. Now,

(gof)(a) = c ⇒ (gof)−1(c) = a.

Also, (f−1og−1)(c) = f−1[g−1(c)] = f−1(b) = a, as g(b) = c and f(a) = b.

⇒ (gof)−1(c) = (f−1og−1)(c), i.e., (gof)−1 = f−1og−1.


Therefore, if f and g are one-one mappings of A onto B and of B onto C respectively, so that f and g are both invertible, then gof is also invertible and (gof)−1 = f−1og−1.

Theorem 1.10.12 Let f : A → B and g : B → C be both injective then gof : A → C isalso injective.

Proof: Let x1, x2 ∈ A. Now,

(gof)(x1) = (gof)(x2) ⇒ g[f(x1)] = g[f(x2)] ⇒ f(x1) = f(x2) (since g is injective) ⇒ x1 = x2 (since f is injective).

Hence gof is injective.

Theorem 1.10.13 Let f : A → B and g : B → C be two surjective functions then gof :A→ C is surjective.

Proof: Let z ∈ C. Since g is surjective, there exists y ∈ B such that g(y) = z. Again, since f is surjective, there exists x ∈ A such that f(x) = y. Now,

(gof)(x) = gf(x) = g(y) = z,

and it is true for arbitrary z. Hence gof is surjective.

Theorem 1.10.14 If f : A→ B and g : B → C be two mappings and gof is bijective, thenf is one-one and g is onto.

Proof: Let any two elements x1, x2 ∈ A be such that f(x1) = f(x2). Now,

(gof)(x1) = g[f(x1)] = g[f(x2)] = (gof)(x2).

Now, since gof is one-one, therefore,

(gof)(x1) = (gof)(x2) ⇒ x1 = x2.

Therefore, f(x1) = f(x2) ⇒ x1 = x2.

Hence f is one-one. Again, since gof : A → C is onto, for each z ∈ C, ∃ an element x ∈ A such that (gof)(x) = z; therefore z = g[f(x)]. Now, for each x ∈ A, we have f(x) = y ∈ B. Thus, for each z ∈ C there is an element y ∈ B such that z = g(y). Hence g : B → C is onto.

Theorem 1.10.15 The inverse of the inverse of a function is the function itself, i.e., (f−1)−1 = f.

Proof: Let a mapping f : A→ B be invertible. Then ∃ a function g = f−1 : B → A suchthat

f(x) = y ⇒ x = g(y), x ∈ A and y ∈ B.

Again, f , being invertible, is one-one and onto and so g is one-one and onto, i.e., g isinvertible and g−1 exists. Now,

(fog)(y) = f[g(y)] = f(x), i.e., (fog)(y) = y, as f(x) = y.

This shows that fog = IB, the identity mapping on B. Thus f is the inverse of g, i.e., f = g−1. This gives (f−1)−1 = f.


Definition 1.10.1 Images and inverse images of sets under a mapping: Let X and Y be any two non-empty sets and f be a mapping of X into Y. Let A ⊆ X and B ⊆ Y. Then we define

f(A) = {y ∈ Y : y = f(x) for some x ∈ A} (1.20)
and f−1(B) = {x ∈ X : f(x) ∈ B}. (1.21)

Thus, y ∈ f(A) ⇒ y = f(x) for some x ∈ A, and x ∈ f−1(B) ⇒ f(x) ∈ B.

Note: Here note that f(x) ∈ f(A) does not necessarily imply that x ∈ A. For example, consider the mapping

f : ℜ → ℜ : f(x) = x², ∀x ∈ ℜ,

and let A = [0, 1] be a subset of ℜ; then obviously f(A) = [0, 1]. Also, by the definition of f, we have f(−1) = 1 ∈ [0, 1] = f(A), but −1 ∉ A. However, x ∈ A ⇒ f(x) ∈ f(A).

Theorem 1.10.16 If X and Y are two non-empty sets and f be a mapping of X into Y ,then for any subsets A and B of X,

(i)f(A ∪B) = f(A) ∪ f(B), (ii)f(A ∩B) ⊆ f(A) ∩ f(B).

Proof: (i) Let y be an arbitrary element of f(A ∪B), then

y ∈ f(A ∪ B) ⇔ y = f(x) for some x ∈ (A ∪ B)
⇔ y = f(x) for some x ∈ A or x ∈ B
⇔ y = f(x), with f(x) ∈ f(A) or f(x) ∈ f(B)
⇔ y ∈ f(A) or y ∈ f(B) ⇔ y ∈ f(A) ∪ f(B).

Consequently, f(A ∪ B) ⊆ f(A) ∪ f(B) and f(A) ∪ f(B) ⊆ f(A ∪ B); hence f(A ∪ B) = f(A) ∪ f(B).
(ii) Let y be an arbitrary element of f(A ∩ B); then

y ∈ f(A ∩B) ⇒ y = f(x) for some x ∈ (A ∩B)⇒ y = f(x) for some x ∈ A and x ∈ B⇒ y = f(x), such that f(x) ∈ f(A) and f(x) ∈ f(B)⇒ y ∈ f(A) and y ∈ f(B) ⇒ y ∈ f(A) ∩ f(B).

Consequently, f(A ∩ B) ⊆ f(A) ∩ f(B). Note that this relation cannot, in general, be replaced by equality. For example, if A = [−1, 0] and B = [0, 1] are two subsets of the set ℜ of all real numbers and

f : ℜ → ℜ : f(x) = x², ∀x ∈ ℜ,

then clearly f(A) = [0, 1] and f(B) = [0, 1], so that f(A) ∩ f(B) = [0, 1]; and since A ∩ B = {0}, we have f(A ∩ B) = {0}. Thus f(A ∩ B) ≠ f(A) ∩ f(B) here, and in general only f(A ∩ B) ⊆ f(A) ∩ f(B) holds.

Theorem 1.10.17 If X and Y be two non-empty sets and f be a mapping of X into Y ,then for any subsets A and B of Y ,

(i)f−1(A ∪B) = f−1(A) ∪ f−1(B) and (ii)f−1(A ∩B) = f−1(A) ∩ f−1(B).


Proof: (i) Let x be an arbitrary element of f−1(A ∪B), then

x ∈ f−1(A ∪B) ⇔ f(x) ∈ (A ∪B)⇔ f(x) ∈ A or f(x) ∈ B⇔ x ∈ f−1(A) or x ∈ f−1(B)⇔ x ∈ f−1(A) ∪ f−1(B).

Consequently, f−1(A ∪ B) ⊆ f−1(A) ∪ f−1(B) and f−1(A) ∪ f−1(B) ⊆ f−1(A ∪ B) andhence, f−1(A ∪B) = f−1(A) ∪ f−1(B).(ii) Let x be an arbitrary element of f−1(A ∩B), then

x ∈ f−1(A ∩B) ⇔ f(x) ∈ (A ∩B)⇔ f(x) ∈ A and f(x) ∈ B⇔ x ∈ f−1(A) and x ∈ f−1(B)⇔ x ∈ f−1(A) ∩ f−1(B).

Consequently, f−1(A ∩ B) ⊆ f−1(A) ∩ f−1(B) and f−1(A) ∩ f−1(B) ⊆ f−1(A ∩ B) andhence, f−1(A ∩B) = f−1(A) ∩ f−1(B).

Theorem 1.10.18 If X and Y be two non-empty sets and f be a mapping of X into Y, then (i) for any subset A of X, A ⊆ f−1[f(A)], and in general A ≠ f−1[f(A)]; (ii) for any subset B of Y, f[f−1(B)] ⊆ B, and further, if B is a subset of the range of f, then f[f−1(B)] = B.

Proof: (i) Let A be any subset of X. If A = φ, the result is obvious. So let A ≠ φ and let x be an arbitrary element of A; then

x ∈ A ⇒ f(x) ∈ f(A) ⇒ x ∈ f−1[f(A)].

Hence A ⊆ f−1[f(A)]. Now, in order to show that, in general, A ≠ f−1[f(A)], consider the mapping f : ℜ → ℜ : f(x) = x², ∀x ∈ ℜ. Let A = [−1, 0]; then obviously f(A) = [0, 1], and therefore f−1[f(A)] = [−1, 1] ≠ A. Thus in general A ≠ f−1[f(A)].
(ii) Let B be any subset of Y and let y be an arbitrary element of f[f−1(B)]; then

y ∈ f [f−1(B)] ⇒ y = f(x) for some x ∈ f−1(B)⇒ y = f(x) such that f(x) ∈ B⇒ y ∈ B ⇒ f [f−1(B)] ⊆ B.

Further, if B is a subset of the range of f , then for each y ∈ B,∃ an x ∈ f−1(B) such thaty = f(x) and so

y ∈ B ⇒ y = f(x) for some x ∈ f−1(B)⇒ y = f(x) for f(x) ∈ f [f−1B]⇒ y ∈ f [f−1B] ⇒ B ⊆ f [f−1(B)].

Hence it follows that f[f−1(B)] = B. Note here that f[f−1(B)] = B holds only when B is a subset of the range of f, so in general f[f−1(B)] ≠ B. For example, consider the mapping f : ℜ → ℜ : f(x) = x², ∀x ∈ ℜ. Let B = [−1, 0]; then f−1(B) = {0}, and therefore f[f−1(B)] = {0} ≠ [−1, 0] = B. Thus in general f[f−1(B)] ≠ B.


1.11 Permutation

A permutation of a non-empty finite set is defined to be a bijective mapping of the finite set onto itself. Let a, b, c, · · · , k be any arrangement of the set of positive integers {1, 2, · · · , n}. The one-one mapping p : S → S of the finite set S = {1, 2, · · · , n} onto itself,

1 2 3 · · · n
↓ ↓ ↓ ↓
a b c · · · k

where p(1) = a, p(2) = b, · · · , p(n) = k, denoted by the two-row symbol (written here with the two rows separated by a semicolon)

p = ( 1 2 3 · · · n ; p(1) p(2) p(3) · · · p(n) ) = ( 1 2 3 · · · n ; a b c · · · k ) (1.22)

is known as a permutation of degree n (or on n symbols). Obviously, the order of the columns in the symbol is immaterial, so long as the corresponding elements above and below in each column remain unchanged. The order in which the first row is written does not matter; what actually matters is which element is replaced by which. Thus ( 1 2 3 ; a b c ), ( 2 1 3 ; b a c ) and ( 2 3 1 ; b c a ) are the same. In the standard form, the elements in the top row are in natural order. If p is a permutation on n symbols, then the set of all such permutations, denoted by Pn, contains n! distinct elements, as n distinct elements can be arranged in n! ways, and is known as the symmetric set of permutations.

Ex 1.11.1 Construct the symmetric set of permutations P3.

Solution: The symmetric set of permutations P3 contains 3! = 6 elements, where each permutation has 3 symbols. Therefore,

P3 = { ( 1 2 3 ; 1 2 3 ), ( 1 2 3 ; 1 3 2 ), ( 1 2 3 ; 2 1 3 ), ( 1 2 3 ; 2 3 1 ), ( 1 2 3 ; 3 1 2 ), ( 1 2 3 ; 3 2 1 ) }.

1.11.1 Equal permutations

Two permutations p and q of degree n are said to be equal if p(a) = q(a) for all a ∈ S. For example, the permutations p = ( 1 2 3 4 ; 2 3 4 1 ) and q = ( 2 4 3 1 ; 3 1 4 2 ) are equal permutations.

1.11.2 Identity permutation

If S = {a1, a2, · · · , an}, then the bijective mapping I : S → S defined by I(ai) = ai is called an identity permutation of degree n. For example,

I = ( a b c ; a b c ) or ( 1 2 3 · · · n ; 1 2 3 · · · n )

are identity permutations. Thus, if there is no change of the elements, i.e., if each element is replaced by itself, then it is called the identity permutation and is denoted by I.

1.11.3 Product of permutations

Since a permutation is just a bijective mapping, the product of two permutations is just the composite of two mappings. Let S = {a1, a2, · · · , an} and let p : S → S and q : S → S be two


permutations of S. Since the range of q equals the domain of p, the composite mapping poq : S → S is defined. Since the permutations p and q are bijective, poq is also bijective; hence poq is a permutation of S. The product of the two permutations, denoted by poq or simply pq, is defined by

pq = ( a1 a2 a3 · · · an ; p[q(a1)] p[q(a2)] p[q(a3)] · · · p[q(an)] ).

Since composition of mappings is non-commutative, pq ≠ qp in general. Also, since composition of mappings is associative, p(qr) = (pq)r. Let p be a permutation on a finite set of degree n; then we define

p^n = p·p · · · p (n factors), ∀n ∈ N,

with p^0 = I. Also, for all integral values of m, n, we have the index laws (i) p^m p^n = p^{m+n} and (ii) (p^m)^n = p^{mn}. As pq ≠ qp in general, (pq)^n = p^n q^n does not hold. If ∃ a least positive integer k such that p^k = I, then k is called the order of the permutation p.

Ex 1.11.2 If p = (1 2 3 4 5), q = (2 3)(4 5), find pq.

Solution: Using the definition of product of permutations we have,

pq =(

1 2 3 4 52 3 4 5 1

)(1 2 3 4 51 3 2 5 4

)The product is given by the following mapping procedure, from the right, follows from thedefinition of mapping f(g)(x) = f(g(x)).

1 2 3 4 5↓ ↓ ↓ ↓ ↓1 3 2 5 4↓ ↓ ↓ ↓ ↓2 4 3 1 5

Therefore the product of p and q is given by ( 1 2 3 4 5 ; 2 4 3 1 5 ) = (1 2 4). Similarly,

qp = ( 1 2 3 4 5 ; 1 3 2 5 4 ) ( 1 2 3 4 5 ; 2 3 4 5 1 ) = ( 1 2 3 4 5 ; 3 2 5 4 1 ) = (1 3 5).
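The column-chasing procedure above is exactly function composition, which the following sketch (permutations stored as Python dicts; an added illustration) reproduces:

def compose(p, q):
    # (pq)(x) = p(q(x)): apply q first, then p
    return {x: p[q[x]] for x in q}

p = {1: 2, 2: 3, 3: 4, 4: 5, 5: 1}   # the cycle (1 2 3 4 5)
q = {1: 1, 2: 3, 3: 2, 4: 5, 5: 4}   # (2 3)(4 5)
print(compose(p, q))   # {1: 2, 2: 4, 3: 3, 4: 1, 5: 5}, i.e. (1 2 4)
print(compose(q, p))   # {1: 3, 2: 2, 3: 5, 4: 4, 5: 1}, i.e. (1 3 5)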

1.11.4 Inverse of permutations

Let S = {a1, a2, · · · , an} and let p : S → S be a permutation of S. As p : S → S is a bijective mapping, it has a unique inverse, which is also bijective. Let p−1 be the inverse; then p−1 : S → S is a permutation of S and is defined by

p−1 = ( p(a1) p(a2) p(a3) · · · p(an) ; a1 a2 a3 · · · an ).

The important property is that pp−1 = p−1p = I.

Ex 1.11.3 If p = (1 2 4 3), q = (1 4 3 2), show that, (pq)−1 = q−1p−1.


Solution: Here the product of p and q is given by

pq = ( 1 2 3 4 ; 2 4 1 3 ) ( 1 2 3 4 ; 4 1 2 3 ) = ( 1 2 3 4 ; 3 2 4 1 ) = (1 3 4).

(pq)−1 = ( 3 2 4 1 ; 1 2 3 4 ) = ( 1 2 3 4 ; 4 2 1 3 ) = (1 4 3).

As p = (1 2 4 3) and q = (1 4 3 2), the inverses are given by

p−1 = ( 2 4 1 3 ; 1 2 3 4 ) = ( 1 2 3 4 ; 3 1 4 2 ); q−1 = ( 1 2 3 4 ; 2 3 4 1 )

⇒ q−1p−1 = ( 1 2 3 4 ; 2 3 4 1 ) ( 1 2 3 4 ; 3 1 4 2 ) = ( 1 2 3 4 ; 4 2 1 3 ) = (1 4 3).

Hence, (pq)−1 = q−1p−1 is verified.

1.11.5 Cyclic permutation

Let S = {a1, a2, · · · , an}. A permutation p : S → S is said to be a cycle of length r, or an r-cycle, if there are r elements ai1, ai2, · · · , air in S such that p(ai1) = ai2, p(ai2) = ai3, · · · , p(air−1) = air, p(air) = ai1, and p(aj) = aj for j ≠ i1, i2, · · · , ir. The cycle is denoted by (ai1, ai2, · · · , air), and the elements ai1, ai2, · · · , air, appearing in a fixed cyclic order, are said to be the elements of the cycle.

(i) Two cycles p and q on the same set S = {a1, a2, · · · , an} are said to be disjoint if they have no common elements.

(ii) A cycle of length 2 is called a transposition. Cycles of length 1 may be ignored.

Theorem 1.11.1 Every permutation on a finite set is either a cycle or it can be expressedas a product of disjoint cycles.

Proof: Let S = {a1, a2, · · · , an} and p : S → S be a permutation on S. Let us consider the elements a1, p(a1), p²(a1), · · ·; these cannot all be distinct, as all of them belong to the finite set S. Let k be the least positive integer such that p^k(a1) = a1. Then a1, p(a1), p²(a1), · · · , p^{k−1}(a1) are k distinct elements of S, because if p^r(a1) = p^s(a1) for some 0 ≤ s < r < k, then p^{r−s}(a1) = a1 with 0 < r − s < k, and this contradicts the fact that k is the least positive integer satisfying p^k(a1) = a1.

Let us consider the k-cycle p1 = (a1, p(a1), p²(a1), · · · , p^{k−1}(a1)). If k = n, then p = p1 and the theorem is proved. If k < n, let am be the first element among a2, a3, · · · , an which does not belong to the cycle p1. Let us consider the elements am, p(am), p²(am), · · ·. None of these belongs to p1, because if p^i(am) = p^j(a1) for some integers i, j, then am = p^{j−i}(a1) would belong to p1, a contradiction. Arguing as before, we arrive at a cycle p2 of length m′, say. If k + m′ = n, then p is the product of the disjoint cycles p1 and p2. If k + m′ < n, let at be the first element of S which does not belong to p1 or p2, and proceed as before. Since S is a finite set, this process terminates after a finite number of steps, and we arrive at the decomposition of p as a product p1p2 · · · pr of disjoint cycles.

(i) Let p be a permutation on a finite set S = {a1, a2, · · · , an}. The order of the permutation p : S → S is the least positive integer k such that p^k = I, I being the identity permutation.

(ii) The order of a k-cycle is k.


(iii) The order of a permutation on a finite set is the l.c.m. of the lengths of its disjoint cycles.

(iv) Every permutation on a finite set S = {a1, a2, · · · , an}, n ≥ 2, can be expressed as a product of transpositions.

(v) A permutation p is said to be an even permutation if it can be expressed as a product of an even number of transpositions.

(vi) A permutation p is said to be an odd permutation if it can be expressed as a product of an odd number of transpositions.

(vii) The number of even permutations on a finite set S = {a1, a2, · · · , an}, n ≥ 2, is equal to the number of odd permutations on it.

Ex 1.11.4 Express p = ( 1 2 3 4 5 6 7 8 ; 3 5 4 1 2 6 8 7 ) as a product of disjoint cycles.

Solution: Here p is not a cycle. Note that

p(1) = 3, p²(1) = p(3) = 4, p³(1) = p(4) = 1.

Thus the first cycle is (1 3 4). Since 2 ∉ (1 3 4), we compute

p(2) = 5, p²(2) = p(5) = 2.

Thus the second cycle is (2 5). Also, as 6 ∉ (1 3 4) and (2 5), and p(6) = 6, the third cycle is (6). Again,

p(7) = 8, p²(7) = p(8) = 7.

Thus the fourth cycle is (7 8). All elements have been exhausted. Therefore,

p = (1 3 4)(2 5)(6)(7 8) = (1 3 4)(2 5)(7 8) = (1 4)(1 3)(2 5)(7 8).

Also, we see that p is the product of 4 (an even number of) transpositions, so p is an even permutation.
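The procedure used in this solution — follow each unvisited element until it returns to its start — can be written down directly. A sketch (added here; the function name is ours):

def disjoint_cycles(p):
    # follow each unvisited element until it returns to its start
    seen, cycles = set(), []
    for start in p:
        if start not in seen:
            cycle, x = [], start
            while x not in seen:
                seen.add(x)
                cycle.append(x)
                x = p[x]
            if len(cycle) > 1:      # cycles of length 1 may be ignored
                cycles.append(tuple(cycle))
    return cycles

p = {1: 3, 2: 5, 3: 4, 4: 1, 5: 2, 6: 6, 7: 8, 8: 7}
print(disjoint_cycles(p))   # [(1, 3, 4), (2, 5), (7, 8)]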

Ex 1.11.5 Describe all the permutations on the set {1, 2, 3} and their respective orders.

Solution: There are 3! = 6 possible permutations on the set S = {1, 2, 3}, given by

( 1 2 3 ; 1 2 3 ) = I, ( 1 2 3 ; 2 3 1 ) = (1 2 3), ( 1 2 3 ; 3 1 2 ) = (1 3 2),
( 1 2 3 ; 1 3 2 ) = (2 3), ( 1 2 3 ; 3 2 1 ) = (1 3), ( 1 2 3 ; 2 1 3 ) = (1 2).

Thus the even permutations are I, (1 2 3), (1 3 2), and the odd permutations are (2 3), (1 3), (1 2). As I¹ = I, the order of I is 1. As (1 2 3) and (1 3 2) are cycles of length 3, their orders are 3. As (2 3), (1 3), (1 2) are cycles of length 2, their orders are 2.


1.12 Enumerable Set

Let N be the set of natural numbers. A set S is defined to be enumerable or denumerable or countable if there is a bijection f : S → N. So, corresponding to every positive integer n, there exists one and only one element of an enumerable set; this element may be denoted by an or bn or un, etc. Thus a countable set can be written as {a1, a2, . . . , an, . . .}. For example, the set S = {2n | n ∈ N} is an enumerable set.

(i) A countable set (in this sense) is an infinite set.

(ii) Obviously an enumerable set is an infinite set, but not every infinite set is enumerable. If an infinite set is enumerable, it is sometimes said to be an enumerably infinite set. It is needless to say that a non-enumerable infinite set cannot be written as {a1, a2, . . . , an, . . .}.

(iii) A set S is defined to be at most enumerable if it is either finite or enumerably infinite.

(iv) Any subset of an enumerable set is at most enumerable.

(v) Any superset of a non-enumerable set is non-enumerable.

Theorem 1.12.1 The union of a finite set and an enumerable set is enumerable.

Proof: Let A be a finite set, written as A = {a1, a2, . . . , ar}. Let B = {b1, b2, . . . , bn, . . .} be an enumerable set. If A ∩ B = φ, we can define a bijective mapping f : N → A ∪ B such that f(1) = a1, f(2) = a2, . . . , f(r) = ar, and then f(r + k) = bk for k = 1, 2, . . .. That is, A ∪ B may be written as

A ∪ B = {a1, a2, . . . , ar, ar+1, ar+2, . . . , ar+k, . . .},

where ar+k = bk, ∀k. Hence A ∪ B is an enumerable set. If A ∩ B ≠ φ, let B1 = B − A; then B1 ∪ A = A ∪ B and B1 ∩ A = φ. Now B1 is an infinite subset of B and therefore B1 is enumerable. Hence B1 ∪ A is enumerable, and so A ∪ B is enumerable.

Theorem 1.12.2 The union of a finite number of enumerable sets is enumerable.

Proof: Let A1, A2, . . . , Ar be a finite number of enumerable sets. Let

A1 = {a11, a12, a13, . . . , a1n, . . .}
A2 = {a21, a22, a23, . . . , a2n, . . .}
. . .
Ar = {ar1, ar2, ar3, . . . , arn, . . .}.

We can write the elements of ∪_{i=1}^{r} Ai as

∪_{i=1}^{r} Ai = {a11, a21, a31, . . . , ar1, a12, a22, a32, . . . , ar2, . . . , a1n, a2n, a3n, . . . , arn, . . .}.

Hence ∪_{i=1}^{r} Ai is an enumerable set.

Theorem 1.12.3 The union of an enumerable collection of enumerable sets is enumerable.


Proof: Let {A1, A2, . . . , An, . . .} be an enumerable collection of sets, where each Ai is an enumerable set. We are to show that ∪_{i=1}^{∞} Ai is enumerable. Let

A1 = {a11, a12, a13, . . . , a1n, . . .}
A2 = {a21, a22, a23, . . . , a2n, . . .}
. . .
Ar = {ar1, ar2, ar3, . . . , arn, . . .}
. . .

The elements of ∪_{i=1}^{∞} Ai are arranged as

a11; a12, a21; a13, a22, a31; . . .

in which there are successive blocks: the kth block contains all aij such that i + j = k + 1, and in each block the elements are written in increasing order of the first suffix. In this arrangement the element aij lies in the (i + j − 1)th block and occupies the ith position in that block. Also, the kth block contains exactly k elements. Hence in the above arrangement aij occurs in the [1 + 2 + · · · + (i + j − 2) + i]th position. Hence ∪_{i=1}^{∞} Ai is an enumerable set. The set of all positive rational numbers can be written as Q+ = ∪_{i=1}^{∞} Ai, where Ai = {n/i : n ∈ N}; being the union of an enumerable collection of enumerable sets, Q+ is enumerable. Now Q+ is similar to Q−; hence Q = Q+ ∪ Q− ∪ {0} is enumerable.
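The block-by-block (diagonal) listing in this proof is easy to realize concretely. The sketch below (an added illustration) generates index pairs (i, j) block by block; applied to Ai = {n/i : n ∈ N}, it enumerates the positive rationals, with repetitions such as 1/1 = 2/2:

from itertools import islice

def diagonal():
    # the kth pass lists the pairs (i, j) with i + j = k,
    # in increasing order of the first suffix i
    k = 2
    while True:
        for i in range(1, k):
            yield (i, k - i)
        k += 1

for i, j in islice(diagonal(), 10):
    print(f"{j}/{i}", end=" ")
# 1/1 2/1 1/2 3/1 2/2 1/3 4/1 3/2 2/3 1/4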

Theorem 1.12.4 The set of real numbers is non-enumerable.

Proof: We shall first show that the interval 0 < x ≤ 1 is non-enumerable. If possible , letus assume , the set is an enumerable set. If the real numbers lying in the above interval,then they can be written as a1, a2, . . . , an, . . .. Since a real number can be expressed as aninfinite decimal(if we agree not to use recurring in this can be done in only one way). Let,

a_1 = 0.a_{11}a_{12}a_{13}...
a_2 = 0.a_{21}a_{22}a_{23}...
...
a_n = 0.a_{n1}a_{n2}a_{n3}...
...

Now we construct a number b = 0.b_1b_2b_3..., where b_r is different from a_{rr}, 0 and 9 for all r. Obviously b is a real number lying in 0 < x ≤ 1, and so must itself appear somewhere in the succession a_1, a_2, ..., a_n, ... if this sequence is to contain all real numbers between 0 and 1. But b is different from every a_i, since it differs from a_i at least in the i-th place of decimal. This contradicts the assumption that the given interval is an enumerable set. Hence 0 < x ≤ 1 is non-enumerable. The set of all real numbers is a superset of this non-enumerable set and hence is non-enumerable.
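Cantor's construction of b is entirely mechanical. Given any claimed enumeration of decimal digit expansions (the rows below are hypothetical), the following sketch produces a number whose r-th digit differs from a_{rr} and avoids 0 and 9, exactly as in the proof:

    def diagonal_number(digit_rows):
        # digit_rows[r][r] is the r-th digit of the r-th listed number;
        # choose b_r different from it, and different from 0 and 9.
        b = []
        for r, row in enumerate(digit_rows):
            d = row[r]
            b.append(1 if d != 1 else 2)
        return "0." + "".join(map(str, b))

    rows = [[1, 4, 1, 5], [2, 7, 1, 8], [3, 3, 3, 3], [5, 0, 0, 0]]
    print(diagonal_number(rows))   # -> 0.2111, which differs from each a_i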

(i) The open interval (0, 1) is non-enumerable. For if it were enumerable, then (0, 1) ∪ {1}, i.e., 0 < x ≤ 1, would also be enumerable, which contradicts the above result.

(ii) The closed interval [0, 1], i.e., 0 ≤ x ≤ 1, being a superset of the non-enumerable set 0 < x ≤ 1, is also non-enumerable.


(iii) Any interval a ≤ x ≤ b is non-enumerable. First we shall prove that a < x ≤ b is non-enumerable. Define the bijection f(x) = (x − a)/(b − a), b > a; as x goes from a to b, f(x) goes from 0 to 1. Hence the interval a < x ≤ b is similar to 0 < x ≤ 1. As 0 < x ≤ 1 is non-enumerable, a < x ≤ b is also non-enumerable. Hence its superset [a, b] is also non-enumerable.

(iv) The open interval (a, b) is non-enumerable.

Exercise 1

Section-A[Multiple Choice Questions]

1. A − φ and φ − A are, respectively, (a) A, A′ (b) φ (c) A, φ (d) A.

2. (A ∩ B) ∪ (B ∩ C) is equal to (a) B (b) A ∪ B ∪ C (c) A ∪ (B ∩ C) (d) (A′ ∩ C′)′ ∩ B.

3. ((((A ∪ B) ∩ A) ∪ B) ∩ A) is equal to (a) A (b) B (c) A ∪ B (d) A ∩ B.

4. (A − B) ∪ (B − A) ∪ (A ∩ B) is equal to (a) A ∪ B (b) Ac ∪ Bc (c) A ∩ B (d) Ac ∩ Bc.

5. The number of elements in the power set P(S) of the set S = {φ, 1, 2, 3} is (a) 2 (b) 4 (c) 8 (d) None of these.

6. Let A be a finite set of size n. The number of elements in the power set of A × A is (a) 2^{2n} (b) 2^{n²} (c) (2^n)² (d) None of these.

7. The number of binary relations on a set with n elements is (a) 2n (b) 2^{n²} (c) 2^n (d) None of these.

8. Suppose A is a finite set with n elements. The number of elements in the largest equivalence relation on A is (a) 1 (b) n (c) n + 1 (d) n².

9. The number of equivalence relations on the set {1, 2, 3, 4} is (a) 4 (b) 15 (c) 16 (d) 24.

10. The power set 2^S of the set S = {3, 1, 4, 5} is

(a) {S, {3}, {1, 4}, {1, 3, 5}, {1, 4, 5}, {3, 4}, φ} (b) {S, 3, 1, 4, 5} (c) {S, {3}, {3, 1, 4}, {3, 5}, φ} (d) None of these.

11. If there is no onto function from {1, 2, ..., m} onto {1, 2, ..., n}, then (a) m = n (b) m < n (c) m > n (d) m ≠ n.

12. If |A ∪ B| = 12, A ⊆ B and |A| = 3, then |B| is (a) 12 (b) 9 (c) ≤ 9 (d) None of these.


13. Let P(S) denote the power set of the set S. Which of the following is always TRUE? (a) P(P(S)) = P(S) (b) P(S) ∩ S = P(S) (c) P(S) ∩ P(P(S)) = φ (d) S ∉ P(S).

14. The number of relations from A to B with |A| = m and |B| = n is (a) mn (b) 2^n (c) 2^m (d) 2^{mn}.

15. If mρn if m² = n, then (a) (−3, −9) ∈ ρ (b) (3, −9) ∈ ρ (c) (−3, 9) ∈ ρ (d) (9, 3) ∈ ρ.

16. The relation ρ on Z defined by mρn if m + n is even is (a) Reflexive (b) Not reflexive (c) Not symmetric (d) Not antisymmetric.

17. φ and A × A are (a) Both reflexive (b) Both symmetric (c) Both antisymmetric (d) Both equivalence relations.

18. The relation aρb if |a − b| = 2, where a and b are real numbers, is (a) Neither reflexive nor symmetric (b) Neither symmetric nor transitive (c) An equivalence relation (d) Symmetric but not transitive.

19. The relation defined in Z by aρb if |a − b| < 2 is (a) Not reflexive (b) Not symmetric (c) Not transitive (d) An equivalence relation.

20. The relation defined in N by mρn if m² = n is (a) Reflexive (b) Symmetric (c) Transitive (d) Antisymmetric.

21. The relation defined in N by mρn if m|n or n|m is (a) Not reflexive (b) Not symmetric (c) Not transitive (d) None of these.

22. A relation defined in N by mρn if m and n are relatively prime is (a) A partial ordering (b) Transitive (c) Not transitive (d) An equivalence relation.

23. The 'subset' relation on a set of sets is (a) A partial ordering (b) Transitive and symmetric only (c) Transitive and antisymmetric only (d) An equivalence relation.

24. The binary relation S = φ on the set A = {1, 2, 3} is (a) Neither reflexive nor symmetric (b) Symmetric and reflexive (c) Transitive and reflexive (d) Transitive and symmetric.

25. The 'less than' relation, <, on reals is

(a) A partial ordering since it is asymmetric and reflexive

(b) A partial ordering since it is anti-symmetric and reflexive

(c) Not a partial ordering because it is not asymmetric and not reflexive

(d) Not a partial ordering because it is not anti-symmetric and not reflexive.

26. A partial order ≤ is defined on the set S = {x, a_1, a_2, ..., a_n, y} as x ≤ a_i for all i and a_i ≤ y for all i, where n ≥ 1. The number of total orders on the set S which contain the partial order ≤ is

(a) 1 (b) n (c) n + 2 (d) n!.

27. The number of possible partial orderings on {a, b, c} in which a ≤ b is (a) 3 (b) 4 (c) 5 (d) 6.


[Figure 1.32: Poset — Hasse diagram on the elements a, b, c, d, e, f, g]

28. In the lattice defined by the Hasse diagram given in Fig. 1.32, how many complements does the element e have?

(a) 2 (b) 3 (c) 0 (d) 1.

29. The maximal and minimal elements of the poset given by the Hasse diagram (Fig. 1.33) are

[Figure 1.33: Poset — Hasse diagram on the elements 1–6]

(a) Max. = 5, 6; Min. = 2 (b) Max. = 5, 6; Min. = 1 (c) Max. = 3, 5; Min. = 1, 6 (d) None of the above.

30. The greatest and least element of the poset given by the Hasse diagram (Fig. 1.34) are

[Figure 1.34: Poset — Hasse diagram on the elements 1–5]

(a) Greatest = 4, 5; least = 1, 2 (b) Greatest = 5; least = 1 (c) Greatest = None; least = 1 (d) None of the above.

31. If a ≤ b ≤ c in a lattice L, then (a) a ∨ b = b ∧ c (b) a ∧ b = b ∨ c (c) a ∨ b = b ∨ c (d) a ∧ b = b ∧ c.

32. In a lattice L, a ∨ b = b. Then (a) a ≥ b (b) b ≤ a (c) a ∧ b = a (d) None of these.

33. Let X = {2, 3, 6, 12, 24} and let ≤ be the partial order defined by x ≤ y if x divides y. Then the number of edges in the Hasse diagram of (X, ≤) is

(a) 3 (b) 4 (c) 5 (d) None of these.

34. In a lattice L, ((a ∧ b) ∨ a) ∧ b is (a) a ∧ b (b) a ∨ b (c) (a ∧ b) ∨ a (d) ((a ∨ b) ∧ a) ∨ b.

35. In a lattice, if a ≤ b and c ≤ d, then (a) b ≤ c (b) a ≤ d (c) a ∨ c ≤ b ∨ d (d) None of these.

36. ({1, 2, 5, 10, 15, a}, |) is a lattice if the smallest value for a is (a) 150 (b) 100 (c) 75 (d) 30.

37. S = {1, 2, 3, 12} and T = {1, 2, 3, 24}; then

(a) S and T are sublattices of (D24, |)


(b) Neither S nor T is a sublattice of (D24, |) (c) S and T are sublattices of ({1, 2, 3, 12}, |) (d) S and T are sublattices of ({1, 2, 3, 24}, |).

38. S = {1, 2, 4, 8} and T = {1, 3, 9}; then

(a) Only S is a sublattice of (D72, |) (b) Only T is a sublattice of (D72, |) (c) Both S and T are sublattices of (D72, |) (d) Neither S nor T is a sublattice of (D72, |).

39. If the posets P1 and P2 are given in Fig. 1.35, then

[Figure 1.35: Poset for Self-Test — diagrams of P1 and P2]

(a) P1 and P2 are lattices (b) P1 is a lattice (c) P2 is a lattice (d) None of them is a lattice.

40. ({1, 2, 4, 6, 12, 24}, |) is (a) Not a poset (b) A lattice (c) A complemented lattice (d) A lattice which is not complemented.

41. The lattice given in Fig. 1.36 is

[Figure 1.36: Hasse diagram]

(a) Complemented but not distributive (b) Distributive but not complemented (c) Both complemented and distributive (d) Neither complemented nor distributive.

42. The lattice given in Fig. 1.37 is

[Figure 1.37: Hasse diagram]

(a) Complemented but not distributive (b) Distributive but not complemented (c) Both complemented and distributive (d) Neither complemented nor distributive.

43. If a, b, c ∈ L, L being a distributive lattice, then (a) (a ∨ b) ∧ c ≤ a ∨ (b ∧ c) (b) (a ∨ b) ∧ c ≤ (a ∧ b) ∧ c (c) (a ∨ b) ∧ c = a ∨ b (d) (a ∨ b) ∧ c = c.

44. (D45, |) is not distributive since (a) {1, 3, 5, 45} is a sublattice of D45 (b) {1, 3, 9, 45} is a sublattice of D45 (c) {1, 5, 9, 15, 45} is a sublattice of D45 (d) {1, 5, 15, 45} is a sublattice of D45.


45. A chain with 3 elements is (a) Complemented but not distributive (b) Distributive but not complemented (c) Both complemented and distributive (d) Neither complemented nor distributive.

46. The lattice given in Fig. 1.38 is

[Figure 1.38: Hasse diagram]

(a) Complemented but not distributive (b) Distributive but not complemented (c) Both complemented and distributive (d) Neither complemented nor distributive.

47. In a distributive lattice, if a ∧ b′ = 0, then (a) b ≤ a (b) a ≤ b (c) a′ ∨ b = 0 (d) a ∨ b′ = 1.

48. ({1, 2, 3, 6, 12, 30, 60}, |) is (a) Complemented but not distributive (b) Distributive but not complemented (c) Both complemented and distributive (d) Neither complemented nor distributive.

49. ({{1}, {1, 2}, {1, 3}, {1, 2, 3}}, ⊆) is (a) Complemented but not distributive (b) Distributive but not complemented (c) Both complemented and distributive (d) Neither complemented nor distributive.

50. The number of functions from an m-element set to an n-element set is (a) m + n (b) m^n (c) n^m (d) m·n.

51. Let A and B be sets with cardinalities m and n respectively. The number of one-one mappings from A to B, when m < n, is

(a) m^n (b) nPm (c) nCm (d) None of these.

52. It is given that there are exactly 97 functions from the set A to B. From this one can conclude that

(a) |A| = 1, |B| = 97 (b) |A| = 97, |B| = 1 (c) |A| = 97, |B| = 97 (d) None of these.

53. Let f : R × R → R × R be a bijective function defined by f(x, y) = (x + y, x − y). The inverse function of f is given by

(a) f⁻¹(x, y) = (1/(x + y), 1/(x − y)) (b) f⁻¹(x, y) = (x − y, x + y) (c) f⁻¹(x, y) = ((x + y)/2, (x − y)/2) (d) f⁻¹(x, y) = (2(x − y), 2(x + y)).

54. The range of g∘f, when f : Z → Z and g : Z → Z are defined by f(n) = n + 1 and g(n) = 2n, is

(a) Z (b) Z⁺ (c) The set of all odd numbers (d) The set of all even numbers.

55. If f, g, h are functions from R → R defined by f(x) = x + 1, g(x) = x² + 2, h(x) = 2x + 1, then (h∘g∘f)(2) is (a) 20 (b) 23 (c) 21 (d) 22.

56. If f, g, h are functions from Z → Z defined by f(x) = x − 3, g(x) = 2x + 3, h(x) = x + 3, then g∘f∘h is (a) f (b) g (c) h (d) h∘g∘f.


57. If f is a function from Z → Z defined by f(x) = x + 2, then f⁻³(10) is (a) 7 (b) 6 (c) 5 (d) 4.

58. If f and g are functions from R⁺ to R⁺ defined by f(x) = e^x and g(x) = x − 3, then (g∘f)⁻¹(x) is (a) log(3 + x) (b) log(3 − x) (c) e^{3−x} (d) log(x − 3).

59. f(A_1 ∩ A_2) = f(A_1) ∩ f(A_2) holds (a) if f is injective (b) if f is surjective (c) if f is any function (d) for no function.

60. The relation {(x, y) ∈ R² : ax + by = c} is an invertible function from R → R if (a) a ≠ 0 (b) b ≠ 0, a ≠ 0 (c) c ≠ 0 (d) c ≠ 0, a ≠ 0.

61. The number of invertible functions from {1, 2, 3, 4, 5} to {a, b, c, d, e} is (a) 5⁵ (b) 2⁵ (c) 5! (d) None of these.

62. The number of odd permutations of the set {1, 3, 5, 7, 9} is (a) 15 (b) 30 (c) 60 (d) 120.

63. Which one of the following is an even permutation? (a) f = (1, 2, 3)(1, 2) (b) f = (1, 2)(1, 3)(1, 4)(2, 5) (c) f = (1, 2, 3, 4, 5)(1, 2, 3)(4, 5) (d) None of these.

64. Which power of the permutation f = (1 2 3 4 ; 1 3 4 2), in two-row notation, gives the identity (1 2 3 4 ; 1 2 3 4)?

(a) f (b) f² (c) f³ (d) f⁴

Section-B[Objective Questions]

1. Let S be a non-empty set and P(S) be its power set. Show that there exists no bijection from S to P(S).

2. Let f : X → Y be a map. Show that f is surjective if there exists a map h : Y → X such that f∘h = I_Y (the identity map).

3. Let f : X → Y be a map. Show that f is injective if and only if there is a map g : Y → X such that g∘f = I_X (the identity map).

4. Consider the map f : R² → R² defined by f(x, y) = (x, 0). Let A = {(x, y) ∈ R² : x − y = 0} and B = {(x, y) ∈ R² : x − y = 1}. Show that f(A ∩ B) ≠ f(A) ∩ f(B).

5. Show that every infinite set contains a countable subset.

6. Let N be the set of positive integers and ≤ be the divisibility relation defined by a ≤ b if and only if b is divisible by a. Show that (N, ≤) is a poset.

7. Let ≤ denote the natural ordering in R. Show that the poset (R, ≤) has neither minimal nor maximal elements.

8. Show that the following posets are not lattices:

(a) ({2, 3, 5, 30, 60, 120, 360}, |) (b) ({1, 2, 3, 4, 6, 8, 12}, |) (c) ({2, 3, 6, 12, 24, 36}, |) (d) ({1, 2, 3, 6, 12, 30}, |).


Section-C[Long Answer Questions]

1. Let A and B be any two sets. Prove that the sets A − B, A ∩ B and B − A are pairwise disjoint. [VH'94, '95, '99]

2. (a) If A, B, C be three nonempty sets such that A ∪ B = A ∪ C and A ∩ B = A ∩ C, prove that B = C. [VH'00, CH'05, '01, BH'03]

(b) If A, B, C be three nonempty sets such that A ∩ C = B ∩ C and A ∩ C′ = B ∩ C′, prove that A = B.

3. For each n ∈ N, let A_n = [n, 2n] = {x ∈ Z : n ≤ x ≤ 2n}. Find the value of ∪_{n=4}^{8} A_n.

4. Prove each of the following set-theoretic statements if it is true, or give a counterexample to disprove it.

(a) A× (B − C) = (A×B)− (A× C). [CH’09]

(b) (AB)′ = (B −A)′. [CH’08,10]

5. For the subsets A, B, C of a universal set U, prove the following:

(a) A − (B − C) = (A − B) ∪ (A ∩ C).

(b) A − (B ∪ C) = (A − B) ∩ (A − C). [KH'07]

(c) A − (B ∩ C) = (A − B) ∪ (A − C). [VH'96]

(d) A × (B ∪ C) = (A × B) ∪ (A × C).

(e) A × (B ∩ C) = (A × B) ∩ (A × C).

(f) (A − B) × C = (A × C) − (B × C).

(g) (A ∪ B)c = Ac ∩ Bc and (h) (A ∩ B)c = Ac ∪ Bc. [KH'08]

6. Prove that

(a) (A − B) ∩ B = φ

(b) A − B, A ∩ B and B − A are mutually disjoint

(c) (A − B) ∪ A = A.

(d) If A ⊆ B then show that A ∪ (B − A) = B.

7. (a) Show that A ⊆ B ⇔ A − B = φ.

(b) If A ⊆ B and C is any set, then show that A ∪ C ⊆ B ∪ C.

(c) If A ∩X = A ∩ Y and A ∪X = A ∪ Y then prove that X = Y .

(d) If A ∩ C = B ∩ C and A ∩ C ′ = B ∩ C ′, then prove that A = B.

8. Simplify the following expression by using the laws of algebra of sets.

(a) [(A ∪B) ∩ Cc ∪Bc]c

(b) (Ac ∩Bc ∩ C) ∪ (B ∩ C) ∪ (A ∩ C)

(c) A ∩ (B ∩ C) ∩ (Ac ∩ (Bc ∩ Cc))

(d) (A ∩B)c ∪ (Ac ∩Bc).

(e) (A ∩B′) ∪ (B ∩ C).


9. Let A = {1, 2, 3, 4}. List all subsets B of A such that {1, 2} ⊆ B.

10. Let A = {1, 2, 3} and B = {a, b}. Find A × B and B × A and verify that A × B ≠ B × A.

11. Prove that

(a) (A × B) ∩ (C × D) = (A ∩ C) × (B ∩ D).

(b) A × (B ∩ C) = (A × B) ∩ (A × C).

(c) (A − B) × C = (A × C) − (B × C).

(d) (A × C) − (B × D) = ((A − B) × C) ∪ ((A ∩ B) × (C − D)).

12. Prove that

(a) A∆B = A∆C ⇒ B = C

(b) A ∩ (B∆C) = (A ∩B)∆(A ∩ C).

13. If A and B are subsets of a set X, then prove that A ⊆ B ⇔ X −B ⊆ X −A.

14. Prove that S × T = T × S ⇔ S = T or one of them is φ.

15. Find the power set of the set A = {a, b, c, 1}.

16. (a) If the number of elements of the set A is n, then show that the number of elements of the power set P(A) is 2^n.

(b) If A and B are two non-empty sets having n elements in common, then prove that A × B and B × A have n² elements in common.

17. If the set X has 5 elements, then find n(P (X)) and P (P (P (φ))).

18. There are 1000 students in a college studying Physics, Chemistry and Mathematics. 658 study Physics, 418 study Chemistry and 328 study Mathematics. Use a Venn diagram to find the number of students studying Physics or Mathematics but not Chemistry. [JECA'03]

19. Among 100 students, 32 study Mathematics, 20 study Physics, 45 study Biology, 15 study Mathematics and Biology, 7 study Mathematics and Physics, 10 study Physics and Biology, and 30 do not study any of the three subjects.

(a) Find the number of students studying all three subjects.

(b) Find the number of students studying exactly one of the three subjects.

20. In a city, three daily newspapers A, B and C are published. 42 percent of the people in that city read A, 6 percent read B, 60 percent read C, 24 percent read A and B, 34 percent read B and C, 32 percent read C and A, and 8 percent do not read any of the three newspapers. Find the percentage of the persons who read all three papers.

21. Let A = {1, 2, 3, 4, 5, 6}. Determine whether or not each of the following is a partition of A: (a) P_1 = {{1, 2, 3}, {1, 4, 5, 6}} (b) P_2 = {{1, 2}, {3, 5, 6}} (c) P_3 = {{1, 3, 5}, {2, 4, 6}} (d) P_4 = {{1, 3}, {5}, {2, 4, 6}}.

22. Let A, B, C be three finite subsets of U. Show that

(a) |A − B| = |A| − |A ∩ B| (b) |A ∪ B| ≤ |A| + |B| (c) |A ∪ B ∪ C| ≤ |A| + |B| + |C|


(d) |A ∪ B| = |A| + |B| − |A ∩ B| (e) |A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|.

23. Let A = {1, 2, 3, 4}. For each of the following relations on A, decide whether it is reflexive, symmetric, antisymmetric or transitive:

(a) {(1, 3), (3, 1)} (b) {(2, 2), (1, 1)} (c) {(1, 2), (1, 4), (2, 3)} (d) {(1, 1), (2, 2), (3, 3), (4, 4), (1, 3), (3, 1)}.

24. The following relations are defined on the set of real numbers. Find whether these relations are reflexive, symmetric or transitive:

(a) aRb iff |a − b| > 0

(b) aRb iff 1 + ab > 0

(c) aRb iff |a| ≤ |b| (d) aRb iff |a| ≥ |b|.

25. In each of the following cases, examine whether the relation ρ is an equivalence relation on the set given below:

(a) ρ = {(a, b) ∈ Z × Z : |a − b| ≤ 3} (b) ρ = {(a, b) ∈ Z × Z : a − b is a multiple of 6} (c) xρy if and only if |x − y| ≤ y; x, y ∈ R.

26. Let Z* be the set of nonzero integers and S = Z × Z*. Let ρ = {x : x = ((r, s), (t, u)) ∈ S × S with ru = st}. Prove that ρ is an equivalence relation. [CH'05]

27. A relation ρ on the set of integers Z is defined by ρ = {(a, b) : a, b ∈ Z and |a − b| ≤ 5}. Is the relation reflexive, symmetric and transitive? [WBUT 07]

28. Determine whether the relation ρ on the set A of all triangles in the plane defined by ρ = {(a, b) : triangle a is similar to triangle b} is an equivalence relation.

29. In the set of all points in a plane, show that the relation of equidistance from the origin is an equivalence relation.

30. A relation ρ on the set of integers Z is defined as aρb iff (a − b) is divisible by m (a positive integer). Show that ρ is an equivalence relation.

31. Determine whether the relation ρ is an equivalence relation on the set of positive integers Z⁺.

(a) aρb iff a = 4b.

(b) aρb iff a = b².

32. If A and B be equivalence relations in a set X, show that A ∩ B is an equivalence relation.

33. Let H be a subgroup of a group G. Show that the relation ρ = {(a, b) ∈ G × G : a⁻¹b ∈ H} is an equivalence relation on the set G.

34. A relation ρ is defined on a set Z by aρb if and only if 2a + 3b is divisible by 5, for a, b ∈ Z. Prove that ρ is an equivalence relation. [VH'97, '05]


35. A relation ρ is defined on a set Z by aρb if and only if either a = b or both a, b are positive, for a, b ∈ Z. Prove that ρ is an equivalence relation on Z. Write down the distinct equivalence classes of ρ.

36. A relation ρ is defined on a set Z by aρb if and only if a − b is divisible by 5, for a, b ∈ Z. Prove that ρ is an equivalence relation. Write down the distinct equivalence classes of ρ. [VH'99, '03]

37. A relation ρ is defined on a set Z by aρb if and only if a + b is even, for a, b ∈ Z. Prove that ρ is an equivalence relation.

38. For natural numbers a and b, define aρb iff a² + b is even. Prove that ρ is an equivalence relation on N.

39. For a, b ∈ R, define aρb iff a− b ∈ Z. Show that ρ is an equivalence relation.

40. For any integers a, b define the following, and examine whether each is an equivalence relation:

(a) aρ1b iff 2a + 3b = 5n for some integer n.

(b) aρ2b iff 3a + 4b is divisible by 7.

41. For a, b ∈ Z define the following, and examine whether each is an equivalence relation:

(a) aρ1b iff a² − b² is divisible by 3.

(b) aρ2b iff 3a + b is a multiple of 4.

42. A relation ρ is defined on a set Z by aρb if and only if a² − b² is an even integer. Prove that ρ is an equivalence relation on Z and write down the equivalence classes. [BH'02]

43. A relation ρ is defined on the set R by aρb if and only if a − b is rational. Prove that ρ is an equivalence relation on R and that the set of equivalence classes is uncountable. [BH'03]

44. A relation ρ is defined on a set Z by aρb if and only if ma + nb is divisible by (m + n), for a, b ∈ Z. Prove that ρ is an equivalence relation. Find out an infinite sequence of positive integers lying in the equivalence class containing 0. [BH'05]

45. Let A be the set of all straight lines in the plane.

(a) aρ1b iff a is parallel to b

(b) aρ2b iff a is perpendicular to b.

Show that ρ1 is an equivalence relation but ρ2 is not.

46. Determine which of the following define equivalence relations in R².

(a) (a, b)ρ(c, d) iff a + 2b = c + 2d.

(b) (a, b)ρ(c, d) iff a² + b = c + d².

(c) (a, b)ρ(c, d) iff ab = cd.

(d) (a, b)ρ(c, d) iff ab = c².

47. Let S be a finite set and let f : S → S. If f is one-to-one, then show that f is onto. Examine whether this remains true if the set S is infinite. [CH: 08]

48. ρ1 is a relation on Z such that ρ1 = {(a, b) : a, b ∈ Z; a − b = 5n, n ∈ Z}. Show that ρ1 is an equivalence relation. If ρ2 be another relation defined by ρ2 = {(a, b) : a, b ∈ Z; a − b = 3n, n ∈ Z},

show that the relation ρ1 ∪ ρ2 is symmetric but not transitive.


49. Given A = {1, 2, 3, 4} and B = {x, y, z}. Let ρ be the relation from A to B defined as ρ = {(1, x), (2, y), (2, z), (3, z)}. (a) Find the inverse relation ρ⁻¹ of ρ. (b) Determine the domain and range of ρ.

50. Given A = {1, 2, 3, 4}. Let ρ be the relation on A defined as ρ = {(1, 1), (2, 2), (2, 3), (3, 2), (4, 1), (4, 4)}. (a) Draw its digraph. (b) Is ρ an equivalence relation?

51. If ρ is an equivalence relation, then prove that ρ⁻¹ is also an equivalence relation in the set A.

52. If R and S are equivalence relations in the set A, then show that R ∩ S is also an equivalence relation in A.

53. Let A = {1, 2, 3, 4, 5}. Consider the two equivalence relations

R = {(1, 2), (1, 1), (2, 1), (2, 2), (3, 3), (4, 4), (4, 5), (5, 4), (5, 5)} and S = {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (1, 3), (3, 1), (4, 5), (5, 4)}. Determine the partitions corresponding to the following relations:

(a) R⁻¹, (b) R ∪ S, (c) R ∩ S.

54. Let ρ be the equivalence relation on the set A = {a, b, c, d} defined by the partition P = {{a, b}, {c, d}}. Determine the elements of the equivalence relation and also find the equivalence classes of ρ.

55. For the partition P = {{a, b}, {c}, {d, e}}, write the corresponding equivalence relation on the set A = {a, b, c, d, e}.

56. Let S = {n ∈ N : 1 ≤ n ≤ 20}. Define a relation ρ on S by aρb iff 5 divides a − b, for all a, b ∈ S. Show that ρ is an equivalence relation on S. Find all the equivalence classes.

57. Let A be a finite set with n elements. Prove that the number of reflexive relations that can be defined on A is 2^{n²−n}, the number of symmetric relations is 2^{n(n+1)/2}, and the number of relations that are both reflexive and symmetric is 2^{n(n−1)/2}.

58. Let A and B be two non-empty sets with cardinality m and n respectively. Show that the number of possible non-empty relations from A to B is 2^{mn} − 1.

59. Let A = {1, 2, 3, 4} and B = {a, b, c}. Determine whether the relation R from A to B is a function. If it is a function, find its domain and range.

(a) R = {(1, a), (2, a), (3, b), (2, b)}, (b) R = {(1, c), (2, a), (3, b)}, (c) R = {(1, a), (2, b), (3, c), (4, b), (1, b)}, (d) R = {(1, c), (2, a), (3, a), (4, c)}.

60. If A = {2, 3, 4}, B = {2, 0, 1, 4} and the relation f is defined as f(2) = 0, f(3) = 4, f(4) = 2, find out whether it defines a mapping.

61. Let f : A → B and g : B → C be two mappings. Show that, if g∘f is injective, then f is injective but g need not be. [CH: 09, 10]

62. Let f : A → B and g : B → C be both surjective; then prove that the composite mapping g∘f : A → C is surjective. Give an example to show that f need not be surjective when g∘f : A → C is surjective.


63. A mapping f : N × N → N is defined by f(m, n) = 2^m·3^n. Show that f is injective but not surjective. [CH: 07]

64. If Z⁺ is the set of positive integers and f(n) = 2n + 1, then show that f : Z⁺ → Z⁺ is a one-one into mapping.

65. If R is the set of real numbers and f(x) = x² + 7, then prove that f : R → R is a many-one into mapping.

66. If A and B be two sets having n distinct elements, show that the number of bijective mappings from A to B is n!. [CH: 07]

67. Show that the function f : Q → Q defined by f(x) = 3x + 4 for all x ∈ Q is one-one and onto, where Q is the set of rational numbers. Also find a formula that defines the inverse function f⁻¹.

68. A function f : Z → Z is defined by

f(x) = x/2, if x is even;
     = 7, if x is odd.

Find the left inverse of f, if it exists. [CH: 10]

69. Consider the sets A = {k, l, m, n} and B = {1, 2, 3, 4}. Let f : A → B be such that (a) f = {(k, 4), (l, 1), (m, 2), (n, 3)}, (b) f = {(k, 1), (l, 2), (m, 1), (n, 2)}.

Determine whether f⁻¹ is a function.

70. Let f(x) = 1 if x is rational, and f(x) = 0 if x is irrational, be a function from R to R. Find f(0.5) and f(√2).

71. Is the mapping f : X → Z defined by f(x) = (2x − 1)/(1 − |2x − 1|) a bijection? Here Z is the set of integers and X = {x : 0 < x < 1}.

72. If A = {1, 2} and B = {a, b}, find all relations from A into B. Determine which of these relations are functions from A to B.

73. Show that the following functions are neither injective nor surjective.

(a) f : R → R given by f(x) = |x| + 1, ∀x ∈ R

(b) f : R → R given by f(x) = sin x, ∀x ∈ R.

74. Show that the following functions are injective but not surjective.

(a) f : Z → Z given by f(x) = 2x + 3, ∀x ∈ Z

(b) f : N → N given by f(x) = 2x, ∀x ∈ N.

75. Show that the following functions are surjective but not injective:

(a) f : Z → {1, −1} given by f(n) = (−1)^n, n ∈ Z

(b) f : N → Z_{10} given by f(n) = [r], where r is the remainder when n is divided by 10.


76. Determine which of the following functions are bijective

(a) f : R → R where f(x) = |x| + 1, ∀x ∈ R (b) f : Z → Q where f(x) = 2x, ∀x ∈ Z (c) f : R → R where f(x) = x² − 3x + 4, ∀x ∈ R (d) f : R → S where f(x) = x/(1 + |x|), where S = {x ∈ R : −1 < x < 1}.

77. Let A be a finite set and let f : A → B be a surjective function. Show that the number of elements of B cannot be greater than that of A.

78. Let A = {1, 2, 3}. Find all possible bijective functions from A onto itself.

79. Let |A| = n. Prove that there can be n! different bijective functions on A.

80. Consider the functions f : R → R and g : R → R where f(x) = x + 2 and g(x) = x². Find fog and gof.

81. Suppose f and g are two functions from R into R such that fog = gof. Does it necessarily imply that f = g? Justify your answer.

82. Let f, g and h : R → R be defined by f(x) = x + 2, g(x) = 1/(1 + x²), h(x) = 3.

Compute gof, fog, gohof, gof⁻¹of and f⁻¹ogof.

83. Let A = {1, 2, 3, 4} and define functions f, g : A → A by f = {(1, 3), (2, 2), (3, 1), (4, 2)} and g = {(1, 4), (2, 3), (3, 1), (4, 2)}. Find fog, gof, g⁻¹ofog, fog⁻¹og and gog⁻¹of.

84. Let A = {a, b, c}. Define f : A → A such that f = {(a, b), (b, a), (c, c)}. Find (a) f², (b) f³, (c) f⁴. [Hint: f³ = fofof]

85. Define f : Z → N by

f(x) = 2|x|, if x < 0;
     = 2x + 1, if x ≥ 0.

Show that f has an inverse and find f⁻¹(25), f⁻¹(20).

86. If f : x → x + 1 and g : x → 3x be mappings of the set of integers into itself, examine whether each of f and g is surjective and injective. Also, show that fg ≠ gf. [VH'95, '05]

87. Prove that the mapping f : R → R defined by f(x) = 2x + 3, x ∈ R, is a bijective mapping. [VH'96]

88. Prove that the mapping f : Q → Q defined by f(x) = 5x + 2, x ∈ Q, is a bijective mapping. [VH'03]

89. Test whether the mapping f : C → R defined by f(x) = |x|, x ∈ C, is a bijective mapping. [VH'98]

90. Show that the mapping f : R → R, defined by f(x) = x³ − x², is surjective but not injective. [BH'03]

91. A mapping f : R → R is defined by f(x) = x/(x² + 1), x ∈ R. Examine whether it is a bijective mapping. [VH'97, CH'05]

92. A mapping f : Z → Z is defined by f(x) = x² + x − 2, x ∈ Z; find f⁻¹(4) and f(f(−2)). [VH'01]


93. A mapping f : R → R is defined by f(x) = x² + x − 2, x ∈ R; find f⁻¹(−8) and f⁻¹({17, 37}).

94. For the mappings f(x) = x² and g(x) = 1 + x, x ∈ R, find the set {x ∈ R : fg(x) = gf(x)}.

95. For the mappings f(x) = |x| + x and g(x) = |x| − x, x ∈ R, find fg, gf and the set {x ∈ R : fg(x) = gf(x)}. [BH'04]

96. For the mappings f : N → Q; f(x) = (3/2)x + 1 and g : Q → Q; g(x) = 6x, examine with justification whether fg and gf are defined.

97. Let the mappings f, g : Z → Z be defined by f(x) = (−1)^x and g(x) = 2x, x ∈ Z; find gf and fg.

98. Prove that the set of rational numbers in [0, 1] is countable. [JECA'06]

99. If, in two-row notation, f = (1 2 3 4 ; 2 4 1 3) and g = (1 2 3 4 ; 4 1 2 3), find fg, f⁻¹, g⁻¹ and prove that (fg)⁻¹ = g⁻¹f⁻¹.

100. Examine whether the permutations (1 2 3 4 5 6 ; 3 1 5 6 4 2) and (1 2 3 4 5 6 7 8 9 ; 4 7 9 1 8 2 6 3 5), in two-row notation, are odd or even.

101. Let X = {a, b, c} and f, g : X → X be defined by f(a) = b, f(b) = c, f(c) = a and g(a) = a, g(b) = c, g(c) = b. Show that fg ≠ gf.

102. A relation ρ is defined on a set Z by aρb if and only if b is a divisor of a, for a, b ∈ Z. Prove that ρ is a partial order relation.

103. Give an example of a partially ordered set which is a lattice, and another which is not a lattice. Justify your answer.

104. Let X = {0, 1, 2, ..., 100}; define a binary relation '≤' on X by x ≤ y if and only if x divides y. (i) Prove that it is a partially ordered set. Find the least and greatest elements of (X, ≤) if they exist. (ii) Is (X, ≤) a lattice? Justify the answer.


Chapter 2

Theory of Numbers

The integers are the main elements of mathematics. The theory of numbers is concerned, at least in its elementary aspects, with basic properties of the integers and more particularly with the positive integers 1, 2, 3, ..., known as natural numbers. Here we shall discuss some basic properties of integers, including the well-ordering principle, mathematical induction, the Euclidean algorithm, representation of integers, etc.

2.1 Number System

Number systems are basically of two types: (i) non-positional number systems and (ii) positional number systems.

2.1.1 Non-positional Number System

In the early days, people counted on their fingers; when ten fingers were not adequate, small stones, balls, sticks and pebbles were used to indicate values. This method of counting uses an additive approach, the non-positional number system. Each symbol represents the same value regardless of its position in the number, and the symbols are simply added to find the value of the particular number. Since it is very difficult to perform arithmetic with such a number system, positional number systems were developed as the centuries passed.

(i) In this system, we have symbols (in the Roman number system) I for 1, II for 2, III for 3, and so on.

(ii) An example of earlier types of notation can be found in Roman numerals, which are essentially additive: III = I + I + I, XXV = X + X + V. New symbols X, C, M, etc. were introduced as the numbers increased in value: thus V is written rather than IIIII for 5.

(iii) The only importance of position in Roman numerals lies in whether a symbol precedes or follows another symbol, i.e., IV = 4, while VI = 6.

(iv) The clumsiness of this system can be seen easily if we try to multiply XII by XIV. Calculating with Roman numerals was so difficult that early mathematicians were forced to perform arithmetic operations almost entirely on abaci, or counting boards, translating their results back to Roman numeral form.

Some Roman numerals are given below in tabular form (a bar over a symbol multiplies its value by 1000):

1 2 3 4 5 6 7 8 9 10
I II III IV V VI VII VIII IX X

11 ... 39 40 41 ... 49 50 51 ... 89
XI ... XXXIX XL XLI ... XLIX L LI ... LXXXIX

90 91 ... 99 100 ... 200 300 400 500 600
XC XCI ... XCIX C ... CC CCC CD D DC

700 800 900 1000 1100 1200 1300 1400 1500 1600 1700
DCC DCCC CM M MC MCC MCCC MCD MD MDC MDCC

1800 1900 2000 5000 10000 50000 100000 500000 1000000
MDCCC MCM MM V̄ X̄ L̄ C̄ D̄ M̄

Pencil-and-paper computations are unbelievably intricate and difficult in such systems. In fact, the ability to perform such operations as addition and multiplication was considered a great accomplishment in earlier civilizations.
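As an illustration of the conversion burden, the following Python sketch (added for illustration; it handles 1–3999 using the subtractive pairs IV, IX, XL, XC, CD, CM, and omits the overlined symbols) turns the multiplication of XII by XIV into convert–multiply–convert:

    PAIRS = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]

    def to_roman(n):
        # Greedy conversion of 1 <= n <= 3999 to Roman numerals.
        out = []
        for value, symbol in PAIRS:
            while n >= value:
                out.append(symbol)
                n -= value
        return "".join(out)

    print(to_roman(12), "x", to_roman(14), "=", to_roman(12 * 14))
    # -> XII x XIV = CLXVIII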

2.1.2 Positional Number System

In a positional number system, there are only a few symbols called digits, and these symbols represent different values depending on the position they occupy in the number. The value of each digit in such a number is determined by three considerations:

(i) the digit itself

(ii) the position of the digit in the number

(iii) the base of the number system.

The positional number systems are grouped as
(i) Decimal number system

(ii) Binary number system

(iii) Octal number system

(iv) Hexadecimal number system.

There are two characteristics of all number systems that are suggested by the value of the base:

(i) The value of the base determines the total number of different symbols or digits available to represent numbers in a positional number system; the base is commonly written as a subscript.

(ii) The second characteristic is that the maximum value of a single digit is always equal to one less than the value of the base. For example, in (0011)_2 no digit exceeds 1, which is one less than the base 2.

Decimal number system: In this number system, the base or radix is equal to 10, because altogether ten symbols or digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 are used. In day-to-day life this number system is the most useful. The general rule for representing numbers in the decimal system using positional notation is

a_{n−1} × 10^{n−1} + a_{n−2} × 10^{n−2} + ... + a_1 × 10 + a_0, (2.1)


which is expressed as a_{n−1} a_{n−2} ... a_1 a_0, where n is the number of digits to the left of the decimal point. Since positions are counted from 0, the leftmost digit of an n-digit number occupies position n − 1, i.e., we subtract 1 from the total number of digits. For example,

(2586)_{10} = 2 × 10^{4−1} + 5 × 10^{4−2} + 8 × 10^{4−3} + 6 × 10^{4−4}.

For the other positional number systems, the reader may consult the author's book on numerical methods.
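Formula (2.1) translates directly into code. The sketch below (illustrative) evaluates a digit string in any base b ≥ 2 by Horner's scheme, reproducing the (2586)_{10} example:

    def positional_value(digits, base):
        # Evaluate a_{n-1}*base^{n-1} + ... + a_1*base + a_0 as in (2.1).
        value = 0
        for d in digits:               # one multiply-add per digit
            value = value * base + d
        return value

    print(positional_value([2, 5, 8, 6], 10))   # -> 2586
    print(positional_value([0, 0, 1, 1], 2))    # -> 3, i.e. (0011)_2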

2.2 Natural Number

The set N of natural numbers is defined as a set in which the following axioms (known as Peano's axioms) are satisfied:

(i) every element a ∈ N has a unique successor denoted by a∗, a∗ ∈ N .

(ii) If two natural numbers have equal successors, then they are themselves equal, i.e.,

a∗ = b∗ ⇒ a = b; ∀a, b ∈ N .

(iii) ∃ a unique element (denoted by 1) in N which has no predecessor.

(iv) If M ⊆ N is such that 1 ∈ M and k ∈ M ⇒ k* ∈ M, then M = N. This is called the principle of mathematical induction or the first principle of finite induction.

The set of numbers 1, 2, 3, ... is called the set of natural numbers and is denoted by N = {1, 2, 3, ...}.

2.2.1 Basic Properties

We are acquainted with the following familiar properties of integers.

(i) Closure law : a+ b ∈ N ; ∀a, b ∈ N

(ii) Associative law: (a+ b) + c = a+ (b+ c); ∀a, b, c ∈ N

(iii) Identity law : a + 0 = 0 + a = a, a.1 = 1.a = a; ∀a ∈ N

(iv) Additive inverse law (in Z) : a + (−a) = 0 = (−a) + a; ∀a ∈ Z

(v) Commutative law : a + b = b + a; ∀a, b ∈ N

(vi) Distributive law : a.(b + c) = a.b + a.c; ∀a, b, c ∈ N

(vii) Cancellation law : a + b = a + c ⇒ b = c; ∀a, b, c ∈ N

The set of all natural numbers is closed with respect to addition and multiplication, but not with respect to subtraction and division.

2.2.2 Well Ordering Principle

The well ordering principle plays an important role in the proof of the next sections. Theprinciple states that,

every non-empty subset of the set N of natural numbers has a unique least element.

Let S be a non-empty subset of the set N of natural numbers. Then ∃ m ∈ S such that m ≤ a, ∀a ∈ S; m is called the least element of S.

From the well ordering principle it follows that every descending chain of natural numbers must terminate.


Theorem 2.2.1 There is no integer m satisfying 0 < m < 1.

Proof: If possible, let there be integers in (0, 1); consider the set S = {n ∈ Z : 0 < n < 1}. By assumption S is a non-empty subset of N. So by the well ordering principle it has a least element, say c, so that 0 < c < 1 and c ∈ N. Therefore, 1 − c > 0 and also c > 0. Thus,

c(1 − c) > 0 ⇒ c − c² > 0 ⇒ 0 < c² < c < 1.

Thus c² ∈ Z and 0 < c² < 1. Hence c² ∈ S, but c² < c, which contradicts the fact that c is the least element of S. This contradiction shows that our assumption is wrong. Hence there is no integer satisfying 0 < m < 1.

2.2.3 Mathematical Induction

Form 1: If M, a set of positive integers, is a subset of N with the two specific properties

(i) the integer 1 ∈M and

(ii) k ∈M ⇒ k + 1 ∈M

then M = N.

Proof: Let F = N − M; it is sufficient to show that F = φ, the null set. Let us suppose F ≠ φ; then F is the set of all positive integers not in M. So by the well ordering principle it has a least element q (say), q ∈ F. Since 1 ∈ M, q ≠ 1, so q − 1 ∈ N. Since 0 < q − 1 < q and q is the least element of F, we have q − 1 ∈ M. The hypothesis (ii) gives

q − 1 ∈ M ⇒ (q − 1) + 1 ∈ M ⇒ q ∈ M,

which is a contradiction. This contradiction shows that F = φ, and consequently M = N.

Form 2: Let P(n) be a mathematical statement involving a positive integer n. If

(i) P(1) is valid,

(ii) validity of P(k) ⇒ validity of P(k + 1),

then P(n) is valid for all positive integers n.

Proof: Let M be the subset of N of natural numbers n for which P(n) is true. Since P(1) is true, 1 ∈ M, and condition (ii) gives k ∈ M ⇒ k + 1 ∈ M. Consider F = N − M; we are to show that F = φ. By the previous form, F = φ, so M = N; hence P(n) is valid for all positive integers n. We see that the two forms are equivalent.

Result 2.2.1 From the above equivalent forms, we see that mathematical induction consists of three steps:

(i) Basis: Show that P(1) is true.

(ii) Inductive hypothesis: Assume that P(k) is true.

(iii) Inductive step: Show that P(k + 1) is true.

Result 2.2.2 Although mathematical induction provides a standard technique for attempting to prove a statement about the positive integers, one disadvantage is that it gives no aid in formulating such statements.

Ex 2.2.1 Prove that 2^{3n} − 1 is divisible by 7 for all n ∈ N.


Solution: Let us write P(n) for 2^{3n} − 1. We have P(1) = 2³ − 1 = 7, which is divisible by 7. Thus the proposition is true for n = 1. Let us consider P(m + 1) − P(m), which is given by

P(m + 1) − P(m) = (2^{3(m+1)} − 1) − (2^{3m} − 1)
= 2^{3m+3} − 2^{3m} = 2^{3m}(8 − 1)
= 2^{3m}·7 = 7p,

where p = 2^{3m}, an integer. Hence P(m + 1) is divisible by 7 if P(m) is so. This proves that the proposition is true for n = m + 1 if it is true for n = m. Hence, by the principle of mathematical induction, the proposition is true for all n ∈ N.
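Induction arguments of this kind are easy to sanity-check numerically. The following sketch verifies both the proposition and the inductive difference P(m + 1) − P(m) = 7·2^{3m} for the first few values:

    for n in range(1, 11):
        assert (2 ** (3 * n) - 1) % 7 == 0             # P(n) divisible by 7
    for m in range(1, 10):
        diff = (2 ** (3 * (m + 1)) - 1) - (2 ** (3 * m) - 1)
        assert diff == 7 * 2 ** (3 * m)                # the inductive step
    print("checked n = 1, ..., 10")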

Ex 2.2.2 Show that n⁵ − n is divisible by 30 for all n ∈ N.

Solution: Let us write P(n) for n⁵ − n. We have

P(1) = 1 − 1 = 0, which is divisible by 30.
P(2) = 2⁵ − 2 = 30, which is divisible by 30.

Thus the proposition is true for n = 1, 2. Let P(m) = m⁵ − m be divisible by 30, i.e., P(m) = 30k, where k ∈ N. Then

P(m + 1) = (m + 1)⁵ − (m + 1)
= m⁵ + 5m⁴ + 10m³ + 10m² + 5m + 1 − m − 1
= (m⁵ − m) + 5m(m + 1)(m + 2)² − 15m(m + 1)²
= 30k + 30q − 30r;  q, r ∈ N.

Thus P(m + 1) is divisible by 30 if P(m) is divisible by 30. Hence, by the principle of mathematical induction, the proposition is true for all n ∈ N.

Ex 2.2.3 Show that n³ − n is divisible by 6 for all n ∈ N.

Solution: Let us write P(n) for n³ − n. We have

P(1) = 1 − 1 = 0, which is divisible by 6.
P(2) = 2³ − 2 = 6, which is divisible by 6.

Thus the proposition is true for n = 1, 2. Let P(m) = m³ − m be divisible by 6, i.e., P(m) = 6k, k ∈ N. Then

P(m + 1) = (m + 1)³ − (m + 1) = m³ + 3m² + 2m
= (m³ − m) + 3m(m + 1) = 6k + 6q;  q ∈ N,

as the product of two consecutive numbers is divisible by 2. P(m + 1) is divisible by 6 if P(m) is divisible by 6. Hence, by the principle of mathematical induction, the proposition is true for all n ∈ N.

Ex 2.2.4 Show that 2·7^n + 3·5^n − 5 is divisible by 24 for all n ∈ N.

Solution: Let us write P(n) for 2·7^n + 3·5^n − 5. We have

P(1) = 2·7 + 3·5 − 5 = 24, which is divisible by 24.


Thus the proposition is true for n = 1. Let P(m) be divisible by 24, i.e., P(m) = 2·7^m + 3·5^m − 5 = 24q, q ∈ N. Now,

P(m + 1) = 2·7^{m+1} + 3·5^{m+1} − 5
= 7[2·7^m + 3·5^m − 5 − 3·5^m + 5] + 3·5^{m+1} − 5
= 7(2·7^m + 3·5^m − 5) − 6·5^m + 30
= 7·24·q − 6·5(5^{m−1} − 1)
= 7·24·q − 6·5·4(5^{m−2} + 5^{m−3} + ... + 1)
= 24[7q − 5(5^{m−2} + 5^{m−3} + ... + 1)].

Therefore, P(m + 1) is divisible by 24 if P(m) is divisible by 24. Hence, by the principle of mathematical induction, the proposition is true for all n ∈ N.

Ex 2.2.5 Show that 3^{4n+2} + 5^{2n+1} is divisible by 14 for all n ∈ N.

Solution: Let us write P(n) for 3^{4n+2} + 5^{2n+1}. We have

P(1) = 3⁶ + 5³ = 14·61, which is divisible by 14.

Thus the proposition is true for n = 1. Let P(m) be divisible by 14, i.e., P(m) = 3^{4m+2} + 5^{2m+1} = 14q, q ∈ N. Thus, 5^{2m+1} = 14q − 3^{4m+2}. Now,

P(m + 1) = 3^{4(m+1)+2} + 5^{2(m+1)+1}
= 3^{4m+2}·3⁴ + 5^{2m+1}·5²
= 3^{4m+2}·81 + 25(14q − 3^{4m+2})
= 3^{4m+2}(81 − 25) + 25·14q
= 14[4·3^{4m+2} + 25q]; where q ∈ N
= 14k; where k = 4·3^{4m+2} + 25q ∈ N.

Therefore, P(m + 1) is divisible by 14 if P(m) is divisible by 14. Hence, by the principle of mathematical induction, the proposition is true for all n ∈ N.

Ex 2.2.6 Show that n^n > 1·3·5···(2n − 1) for n > 1.

Solution: For n = 2, the LHS = 2² = 4 and RHS = 1·3 = 3. As 4 > 3, the inequality holds for n = 2. Let the result hold for n = m, i.e., m^m > 1·3·5···(2m − 1). Hence,

(2m + 1)m^m > 1·3·5···(2m − 1)(2m + 1).

Now, (m + 1)^{m+1} − (2m + 1)m^m = m^{m+1}[(1 + 1/m)^{m+1} − (2 + 1/m)]
= m^{m+1}[(1 + ^{m+1}C_1 (1/m) + ^{m+1}C_2 (1/m²) + ... + 1/m^{m+1}) − (2 + 1/m)]
= m^{m+1}[(m + 1)/(2m) + ... + 1/m^{m+1}] > 0.

Hence, (m + 1)^{m+1} > (2m + 1)m^m > 1·3·5···(2m − 1)(2m + 1).

Thus the inequality holds for n = m + 1 when it holds for n = m. Hence it is true for all positive integers n > 1.

Ex 2.2.7 For what natural numbers n is the inequality 2^n > n² valid?


Solution: We shall prove this by using the principle of mathematical induction. For
n = 1: 2 > 1, so the inequality is valid;
n = 2: 2² = 2², so the inequality is not valid;
n = 3: 2³ < 3², so the inequality is not valid;
n = 4: 2⁴ = 4², so the inequality is not valid;
n = 5: 2⁵ > 5², so the inequality is valid.

Let 2^k > k², where k > 4 and k ∈ N. Also, k² > 2k + 1 for k > 4. Therefore,

2^k + 2^k > k² + 2k + 1 ⇒ 2^{k+1} > (k + 1)².

Thus the inequality is valid for n = k + 1 when it is valid for n = k and k > 4. Hence, by the principle of mathematical induction, the inequality is valid for n = 1 and n > 4.

Ex 2.2.8 Prove that the product of r consecutive natural numbers is divisible by r!.

Solution: Let p_n = n(n + 1)(n + 2)···(n + r − 1); n ∈ N. Then,

p_{n+1} = (n + 1)(n + 2)···(n + r)
⇒ n·p_{n+1} = (n + r)p_n = n·p_n + r·p_n
⇒ p_{n+1} − p_n = (p_n/n) × r = r × product of (r − 1) consecutive natural numbers.

If the product of (r − 1) consecutive natural numbers is divisible by (r − 1)!, then

p_{n+1} − p_n = k·r!;  k ∈ N.

Now, p_1 = r!, so p_2, p_3, p_4, ... are also multiples of r!. Thus, if the product of (r − 1) consecutive natural numbers is divisible by (r − 1)!, then the product of r consecutive natural numbers is divisible by r!. The product of two consecutive natural numbers is divisible by 2!, so the product of three consecutive natural numbers is divisible by 3!, and so on.
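The statement is equivalent to saying that p_n / r! = C(n + r − 1, r) is an integer; a quick numerical check (a sketch, not a proof) is:

    from math import factorial

    def product_of_consecutive(n, r):
        # p_n = n (n + 1) ... (n + r - 1)
        p = 1
        for k in range(n, n + r):
            p *= k
        return p

    assert all(product_of_consecutive(n, r) % factorial(r) == 0
               for n in range(1, 30) for r in range(1, 8))
    print("r! divides every product of r consecutive naturals (checked)")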

Ex 2.2.9 Prove that (2 + √3)^n + (2 − √3)^n is an even integer for all n ∈ N.

Solution: Let p_n be the statement that (2 + √3)^n + (2 − √3)^n is an even integer. Since (2 + √3)¹ + (2 − √3)¹ = 4, an even integer, p_1 is true. Let p_1, p_2, ..., p_k be true; in particular, (2 + √3)^k + (2 − √3)^k is an even integer. Then,

(2 + √3)^{k+1} + (2 − √3)^{k+1} = a^{k+1} + b^{k+1}; where a = 2 + √3, b = 2 − √3
= (a^k + b^k)(a + b) − (a^{k−1} + b^{k−1})ab
= 4(a^k + b^k) − (a^{k−1} + b^{k−1}).

This is an even integer, as a^k + b^k and a^{k−1} + b^{k−1} are even integers by assumption. This shows that p_{k+1} is true whenever p_1, p_2, ..., p_k are true. Therefore, by the principle of mathematical induction, the statement is true for all n ∈ N.

2.3 Integers

The set of all integers, denoted by Z, consists of the whole numbers

Z = {0, ±1, ±2, ±3, ...}. (2.2)

The set of all positive integers is identified with the set of natural numbers N. We shall use the properties and principles of N in connection with the proof of any theorem about positive integers.


2.3.1 Divisibility

In this section, we define divisibility and the division algorithm for two given integers, which are among the most important and fundamental concepts in number theory.

Definition 2.3.1 Let a ∈ Z and let x be any member of Z. Then ax is called a multiple of a. For example,

(i) 3 × 7 = 21, so 21 is called a multiple of 3. It is also called a multiple of 7.

(ii) The number 0 is a multiple of every member of Z, as a.0 = 0, ∀ a ∈ Z.

There exist infinitely many elements which are multiples of a given a ∈ Z.

Definition 2.3.2 An integer a(≠ 0) is said to divide an integer b if

∃ unique c ∈ Z such that ac = b. (2.3)

This is expressed by saying 'a divides b', 'a is a divisor of b', or 'b is divisible by a', and is denoted by a|b. We also say that b is a multiple of a, that a is a divisor of b, or that a is a factor of b.

For example,

(i) 9|63, as 63 = 9.7, where 7 ∈ Z, i.e., 63 is a multiple of 9.

(ii) Also −81 is divisible by 3, as −81 = 3.(−27), and −27 ∈ Z.

(iii) Again 3 ∤ 16, because there is no integer x such that 16 = 3·x.

(±1), (±a) are called the improper divisors of a nonzero integer a. We write a ∤ b to indicate that b is not divisible by a. Divisibility establishes a relation between any two integers, with the following elementary properties.

Property 2.3.1 If a|b, then every divisor of a divides b.

Proof: Since a|b, ∃ c ∈ Z such that b = ac. Let m be any divisor of a; then

a = md, for some d ∈ Z.

Thus, b = mdc ⇒ m|b, as dc ∈ Z.

Property 2.3.2 If a|b and a ≠ 0, then (b/a)|b.

Proof: From the definition, we have

a|b ⇒ b = ac ⇒ b/a = c, an integer.

Now, b = (b/a)·a, a ∈ Z ⇒ (b/a)|b.

b/a is called the divisor conjugate of a.

Property 2.3.3 For integers a, b, c ∈ Z,

(i) a|a, ∀a(≠ 0) ∈ Z (reflexive property);

(ii) 1|a, a|0, −1|a;

(iii) a|b, b|c ⇒ a|c (transitive property). The converse of this property need not hold. For example, if a = 5, b = 10, c = 15, then 5|15 but 10 ∤ 15, although 5|10.


(iv) a|b and b|a if and only if a = ±b.

These properties follow immediately from the definition.

Property 2.3.4 If a, b ∈ Z then a|b implies a|bm.

Proof: If a, b ∈ Z such that a|b, then by definition b = ac; c ∈ Z. Therefore,

bm = (ac)m = a(cm); where cm ∈ Z, ⇒ a|(bm).

Thus, if a|b, then a|(bm), m ∈ Z. The converse is not always true. For example, let b = 5, m = 8 and a = 10; then 10|5·8, i.e., 10|40, but 10 ∤ 5 and 10 ∤ 8.

Also, if a|b, then ma|mb, m ≠ 0. This is known as the multiplication property. Also, the cancellation law states that ma|mb and m ≠ 0 implies a|b.

Property 2.3.5 If a|b and a|c, then a|(bm+ cn);m,n being arbitrary integers.

Proof: The relations a|b, a|c ensure that ∃ suitable integers x, y such that b = ax, c = ay. Hence mb = max and nc = nay. Thus, whatever the choice of integers m and n,

mb + nc = max + nay = a(mx + ny), where m, n, x, y ∈ Z
⇒ a|(bm + cn).

The converse of this result need not hold. For example, let a = 5, b = 6, c = 7, m = 3 and n = 1; then 5|25, but 5 ∤ 6 and 5 ∤ 7.

This property extends by induction to sums of more than two terms. That is, if a|b_k for k = 1, 2, ..., n, then

a|(b_1x_1 + b_2x_2 + ... + b_nx_n), i.e., a | Σ_{i=1}^{n} b_i x_i,

for any integers x_1, x_2, ..., x_n. This is known as the linearity property of divisibility.

Property 2.3.6 If a|b, ∃ c ∈ Z such that b = ac. Also, b ≠ 0 implies c ≠ 0. Upon taking absolute values, we get |b| = |ac| = |a||c|.

Because c ≠ 0, it follows that |c| ≥ 1, whence

|b| = |ac| = |a||c| ≥ |a| ⇒ |b| ≥ |a|.

Thus, if a|b and b ≠ 0, then |a| ≤ |b|. Let b ≠ 0; then

a|b ⇒ |a| ≤ |b|.

Again, if a ≠ 0, then b|a ⇒ |b| ≤ |a|. Therefore, if a ≠ 0, b ≠ 0, then

a|b and b|a ⇒ |a| = |b|.

This is known as the comparison property.

Property 2.3.7 If 0 ≤ a < b and b|a, then a = 0. For, let a ≠ 0; then

b|a ⇒ |b| ≤ |a| ⇒ b ≤ a,

which contradicts a < b, as a, b are both non-negative. This contradiction establishes that a = 0. More generally, if b|a and |a| < |b|, then a = 0; for if a ≠ 0, then b|a ⇒ |b| ≤ |a|, which is contrary to the hypothesis, and hence a = 0.


2.3.2 Division Algorithm

Given integers a and b (b > 0), ∃ unique integers q and r such that

a = bq + r; where, 0 ≤ r < b. (2.4)

Proof: Existence: We begin by considering the set of non-negative integers given by

S = {a − bx : x ∈ Z, a − bx ≥ 0}.

First, we shall show that S is non-empty. To do this, it suffices to exhibit a value of x making a − bx non-negative. Now,

−|a| ≤ a and b ≥ 1 ⇒ |a| ≤ b|a|.
Therefore, a ≥ −|a| ≥ −b|a| ⇒ a + b|a| ≥ 0
⇒ a + b|a| ∈ S.

For the choice x = −|a|, S is a non-empty set of non-negative integers. Thus,

(i) either S contains 0 as its least element, or,

(ii) S does not contain 0, so, S is a nonempty subset of N , by well ordering principle, ithas a least element which is positive.

Hence in each case S has a least element r ≥ 0 (say), and r is of the form a − bq. Thus,

r = a − bq; q ∈ Z ⇒ a = bq + r; q ∈ Z and r ≥ 0.

We shall now show that r < b. If possible let r ≥ b then r − b ≥ 0 and

r − b = a− b(q + 1); where (1 + q) ∈ Z,

so that (r − b) ∈ S, smaller than its smallest member r, which is a contradiction. Hence r < b (since b > 0, r − b < r). Thus ∃ q, r ∈ Z with 0 ≤ r < b such that a = bq + r.

Uniqueness : To prove the uniqueness of integers q, r; we assume that we can findanother pair q1, r1 ∈ Z such that,

a = bq_1 + r_1; 0 ≤ r_1 < b
⇒ 0 = b(q − q_1) + (r − r_1),
or, b(q − q_1) = r_1 − r ⇒ b|(r − r_1),

where |r − r_1| < b ⇒ (r − r_1) = 0 ⇒ r = r_1,

so b(q − q_1) = 0 ⇒ q = q_1, as b > 0.

Thus q and r are unique, ending the proof. Also, it is clear that r = 0 if and only if b|a. This important theorem is known as the division algorithm. The advantage of this algorithm is that it allows us to prove assertions about all the integers by considering only a finite number of cases.

Result 2.3.1 The two integers q and r, termed as quotient and remainder in the divisionof a by b respectively.

Result 2.3.2 Though it is an existence theorem, its proof actually gives us a method forcomputing the quotient q and remainder r.
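The proof is constructive, and in code the quotient and non-negative remainder can be produced for any a and b > 0. In Python, divmod already returns a remainder satisfying 0 ≤ r < b; the sketch below merely makes the invariants of the theorem explicit:

    def division_algorithm(a, b):
        # Return (q, r) with a = b*q + r and 0 <= r < b, for b > 0.
        if b <= 0:
            raise ValueError("b must be positive")
        q, r = divmod(a, b)
        assert a == b * q + r and 0 <= r < b
        return q, r

    print(division_algorithm(17, 5))    # -> (3, 2)
    print(division_algorithm(-17, 5))   # -> (-4, 3); r stays non-negative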


Theorem 2.3.1 If a and b(> 0) be two integers, then ∃ integers Q and R such that

a = bQ ± R;  0 ≤ R ≤ b/2. (2.5)

Proof: For any two integers a and b with b > 0, the division algorithm shows that ∃ q, r ∈ Z such that

a = bq + r;  0 ≤ r < b. (2.6)

Case 1: Let r < b/2. Taking Q = q and R = r in (2.6), we have

a = bQ + R;  0 ≤ R < b/2.

Case 2: Let r > b/2. Now, a = bq + r can be written in the form

a = b(q + 1) + r − b = b(q + 1) − (b − r).

Taking Q = q + 1 and R = b − r, we have

a = bQ − R, 0 ≤ R < b/2, since R = b − r < b − b/2 = b/2.

Thus, combining Cases 1 and 2, we have a = bQ ± R; 0 ≤ R < b/2.

Case 3: Let r = b/2. Then a = bq + r can be written in the form a = bQ + R, where we take Q = q and R = r = b/2. Again,

a = bq + r = b(q + 1) − (b − r) = bQ − R,

where now Q = q + 1 and R = b − r = b/2. Thus, it follows that for r = b/2, Q and R are not unique. In this case, R is called the minimal remainder, i.e., the absolutely least remainder of a with respect to b.
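The minimal remainder can also be computed directly; the following sketch returns (Q, R) with |R| ≤ b/2, choosing R = +b/2 in the ambiguous case r = b/2:

    def minimal_remainder(a, b):
        # Return (Q, R) with a = b*Q + R and -b/2 < R <= b/2, for b > 0.
        q, r = divmod(a, b)
        if 2 * r > b:                  # r > b/2: use a = b(q + 1) - (b - r)
            return q + 1, r - b
        return q, r

    print(minimal_remainder(17, 5))    # -> (3, 2)   since 17 = 5*3 + 2
    print(minimal_remainder(18, 5))    # -> (4, -2)  since 18 = 5*4 - 2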

Theorem 2.3.2 (Generalized division algorithm): Given integers a and b (b ≠ 0), ∃ unique integers q and r such that a = bq + r; 0 ≤ r < |b|.

Proof: When b is positive, this is the previous theorem, so it is enough to consider the case in which b is negative. When b is negative, |b| > 0 as b ≠ 0. By the above theorem, ∃ unique integers q_1 and r such that

a = |b|q_1 + r; 0 ≤ r < |b|
= −bq_1 + r = bq + r, where q = −q_1.

Hence the theorem.

Ex 2.3.1 If n be any positive integer, show that the product of the consecutive naturalnumbers n, n+ 1, n+ 2 is divisible by 6.

Solution: In case of division by 3, one of the numbers 0, 1, 2 will be the remainder, and the corresponding integers are of the form 3k, 3k + 1, 3k + 2; k ∈ Z. If

n = 3k, then 3|n; n = (3k + 1), then 3|n+ 2; n = 3k + 2, then 3|n+ 1.


Hence for any value of n in Z, 3|n(n + 1)(n + 2). In case of division by 2, one of the numbers 0, 1 will be the remainder, and the corresponding integers are of the form 2k, 2k + 1; k ∈ Z. If

n = 2k then 2|n, and if n = 2k + 1 then 2|n + 1.

Hence for any n ∈ Z, 2|n(n + 1), i.e., of the two consecutive integers n, n + 1, one is even, i.e., divisible by 2. Therefore,

2|n(n + 1)(n + 2) and 3|n(n + 1)(n + 2).

Since (2, 3) = 1, 6|n(n + 1)(n + 2). By the above procedure, we can show that the product of m consecutive integers is divisible by m.

Ex 2.3.2 Show that the square of an odd integer is of the form 8k + 1; k ∈ Z.

Solution: By the division algorithm, we see that when an integer is divided by 4, the remainder will be one of 0, 1, 2, 3, and the corresponding integers are of the form 4k, 4k + 1, 4k + 2, 4k + 3. Of these, the forms (4m + 1) and (4m + 3) are odd integers. Now,

(4m + 1)² = 8(2m² + m) + 1,

where 2m² + m ∈ Z, which is of the form 8k + 1, and

(4m + 3)² = 8(2m² + 3m + 1) + 1,

where 2m² + 3m + 1 ∈ Z, which is of the form 8k + 1. Therefore, the square of an odd integer is of the form 8k + 1; k ∈ Z.

Ex 2.3.3 Show that the square of any integer is of the form 4n or (4n + 1), for some n ∈ Z.

Solution: By the division algorithm, we see that when an integer is divided by 2, the remainder is 0 or 1, and the corresponding integers are of the form 2k, 2k + 1; k ∈ Z. Now,

(2k)² = 4k² = 4n; k² = n ∈ Z,
and (2k + 1)² = 4(k² + k) + 1 = 4n + 1; k² + k = n ∈ Z.

Hence the square of any integer is of the form 4n or 4n + 1; n ∈ Z.

2.4 Common Divisor

Let a and b be given arbitrary integers. If d divides two integers a and b, i.e., if both

d|a and d|b, (2.7)

then d is called a common divisor of a and b. The number of divisors of any non-zero integer is finite. Now

(i) 1 is a common divisor of every pair of integers a and b, so the set of positive commondivisors of integers a and b is non empty.

(ii) Every integer divides zero, so that if a = b = 0, then, every integer serves as a commondivisor of a and b. In this instance, the set of positive common divisors of a and b isinfinite.

(iii) However, when at least one of a and b is different from 0, there are only a finite numberof positive common divisors.

Every pair of integers a and b has a common divisor which can be expressed as a linear combination of a and b. Every non-empty finite set of integers has a largest element; the largest of the common divisors is the gcd, defined in the following definition.


2.4.1 Greatest Common Divisor

For two given integers a and b, with at least one of them different from zero, a positiveinteger d is defined to be the greatest common divisor (gcd) of a, b if,

(i) d be a common divisor of a as well as b i.e., d|a, d|b.

(ii) every common divisor of a, b is a divisor of d, i.e., for an integer c,

c|a, c|b⇒ c|d.

The gcd of a, b is denoted by gcd(a, b) or simply (a, b). For more than two integers it isdenoted by (a1, a2, · · · , an). From definition, it follows that,

(a,−b) = (−a, b) = (−a,−b) = (a, b),

where, a, b are integers, not both zero. For example,

(i) (12, 30) = 6, (9, 4) = 1, (0, 5) = 5 etc.

(ii) (12,−30) = 6 and (−16, 40) = 8.

Result 2.4.1 Let d and d_1 be two greatest common divisors of integers a and b. Then, by the definition, we find that d|d_1 and d_1|d. Hence, there exist integers r and t such that d_1 = dr and d = d_1t. Now,

d = d_1t = drt, d ≠ 0 ⇒ rt = 1.

Thus, r = t = ±1, and hence d = ±d_1. So it follows that two different gcd's of a and b differ in their sign only. But we take the positive value as the gcd.

Theorem 2.4.1 Any two integers a, b, not both of which are zero, have a unique gcd, which can be written in the form ma + nb; m, n ∈ Z.

Proof: Let us consider the set S of all positive linear combinations of a and b:

S = {xa + yb : x, y ∈ Z, xa + yb > 0}.
Also, a·a + 0·b = a² (> 0) ∈ S (taking a ≠ 0),

so S is a non-empty subset of N. Therefore, by the well ordering principle, it has a least element r (say), which is of the form

r = ma+ nb; m,n ∈ Z.

We shall first show that r|a and r|b. If r is not a divisor of a, then by the division algorithm ∃ p, q ∈ Z such that a = pr + q; 0 < q < r, i.e.,

q = a − pr = a − p(ma + nb) = (1 − mp)a + (−np)b ∈ S,

where 1 − mp, −np ∈ Z and q > 0. Since q < r, this representation would imply that q is a member of S, contradicting the fact that r is the least element in S. Hence q = 0 and r|a, and similarly r|b. Next let c|a, c|b; then

c|a, c|b ⇒ a = ck_1, b = ck_2; k_1, k_2 ∈ Z,
so r = ma + nb = mck_1 + nck_2 = c(mk_1 + nk_2), where mk_1 + nk_2 ∈ Z.


Thus c|r, and so r = (a, b) with r = ma + nb; m, n ∈ Z. As to the uniqueness of r, let there be another gcd of a, b, say r_1, i.e., r_1 = (a, b) also. Then r|r_1 and r_1|r, i.e., r, r_1 are associates, ⇒ r_1 = ±r. But as r and r_1 are both positive, r = r_1. Hence the gcd is unique and can be expressed as a linear combination of a and b with integral multipliers m and n. This is the basis of the Euclidean algorithm for the existence of the gcd. Note the following:

(i) This method involves repeated application of the division algorithm.

(ii) (a, b) is the least positive integer of the form ma + nb, where m and n range over the integers.

(iii) The representation of d as ma+ nb is not unique.

(iv) If a and b are integers, not both of which are zero, we have,

(a, b) = (b, a) = (−a, b) = (a,−b) = (−a,−b) = (a, b+ ax),

for any integer x.

(v) The theorem does not give any algorithm for expressing (a, b) in the desired form ma + nb.

(vi) If d = (a1, a2, . . . , ar), ai ≠ 0 ∀ i, then ∃ integers m1, m2, . . . , mr such that

d = a1m1 + a2m2 + . . .+ armr.

Theorem 2.4.2 (Method of finding the gcd): For two given positive integers a, b, if a = bq + r; q, r ∈ Z, 0 ≤ r < b, then (a, b) = (b, r).

Proof: Let d = (a, b) and d1 = (b, r). Since d is the gcd of a and b, d|a and d|b, i.e., ∃ k1, k2 ∈ Z such that a = dk1, b = dk2. Now,

a = bq + r ⇒ r = a − bq = dk1 − dk2q = d(k1 − k2q); k1 − k2q ∈ Z.

Thus d|r and d|b, so d|d1. Similarly we can get d1|d: as d1 = (b, r), we have d1|b, d1|r, so b = d1b1 and r = d1r1, whence

a = bq + r = d1b1q + d1r1 = d1(b1q + r1).

Therefore, d1|a and d1|b ⇒ d1|d. Thus d = ±d1, and as d and d1 are both positive, d = d1.

Ex 2.4.1 Find the gcd of 120 and 275 and express the gcd in the form 120m+275n;m,n ∈ Z.

Solution: To find (120, 275), we apply the division algorithm repeatedly:

275 = 120·2 + 35, 120 = 35·3 + 15, 35 = 15·2 + 5, 15 = 5·3 + 0.

Hence (120, 275) = 5. Now,

15 = 5·3 + 0,
35 = 15·2 + 5 ⇒ 5 = 1·35 − 15·2,
120 = 35·3 + 15 ⇒ 15 = 1·120 − 35·3,
275 = 120·2 + 35 ⇒ 35 = 275 − 120·2.


Therefore, the gcd 5 = (120, 275) can be written as

5 = 35 − 15·2 = 35 − (120 − 35·3)·2
= 7·35 − 2·120 = 7(275 − 120·2) − 2·120
= 7·275 + (−16)·120,

which is of the form 120m+ 275n where m = −16 and n = 7.
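The back-substitution above is exactly the extended Euclidean algorithm. As an illustrative sketch (ours, not part of the text), the following Python routine computes (a, b) together with integers m, n satisfying ma + nb = (a, b); the function name egcd is our own choice.

```python
def egcd(a, b):
    """Return (d, m, n) with d = gcd(a, b) and d = m*a + n*b."""
    if b == 0:
        return (abs(a), 1 if a >= 0 else -1, 0)
    # gcd(a, b) = gcd(b, a % b); substitute a % b = a - (a // b) * b
    d, m, n = egcd(b, a % b)
    return (d, n, m - (a // b) * n)

d, m, n = egcd(120, 275)
print(d, m, n)                     # 5 -16 7, i.e., 5 = (-16)*120 + 7*275
assert d == m * 120 + n * 275
```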

Ex 2.4.2 Show that (a, a+ 2) = 1 or 2 for every integer a.

Solution: Let d = (a, a + 2); then d|a and d|(a + 2). Therefore, by the linearity property of divisibility, we have

d|[ma + n(a + 2)]; ∀ m, n ∈ Z.

Taking m = −1 and n = 1, it follows that d|2, i.e., d is either 1 or 2.

Ex 2.4.3 Let a, b be two integers, not both zero. Prove that (ka, kb) = k(a, b) for any positive integer k.

Solution: Let (a, b) = d1, then there exist m,n ∈ Z such that

d1 = ma+ nb, i.e., kd1 = kma+ knb.

Let (ka, kb) = d2. Then d2 divides ka and kb, so d2 divides kma + knb, i.e., d2 divides kd1. On the other hand, d1 divides a and b; hence kd1 divides ka and kb. But d2 = (ka, kb); consequently, kd1|d2. Hence,

(ka, kb) = d2 = kd1 = k(a, b).

This is in effect a distributive law for the gcd.

2.5 Common Multiple

Let a1, a2, · · · , an be integers, all different from zero. An integer b is said to be a common multiple of a1, a2, · · · , an if ai|b for i = 1, 2, · · · , n. Common multiples do exist; for example, 2·3·5 is a common multiple of the integers 2, 3, 5, none of which is zero.

2.5.1 Lowest Common Multiple

Let a, b ∈ Z, both nonzero, and consider the set

S = {x : x ∈ N and x is a common multiple of a and b} = {x : x ∈ N such that a|x and b|x}.

Now a divides |a|, hence a divides |a||b| = |ab|; similarly b divides |ab|, and |ab| ∈ N, so |ab| ∈ S. Therefore S is a non-empty subset of the positive integers. Hence, by the well-ordering principle, S has a least element, say m. This m is called the lowest common multiple (lcm) of a and b, written [a, b]. More generally, the least of the positive common multiples of a1, a2, · · · , an is called the least common multiple and is denoted by [a1, a2, · · · , an]. If m = [a1, a2, · · · , an], then the set of common multiples of these integers is

{0, ±m, ±2m, ±3m, · · · }.

The lcm of any set of nonzero integers is unique. For example, the lcm of 2, 3, 6 is 6; that of −2, −3, −6 is 6; and the lcm of −2, −6, 10 is 30.


Property 2.5.1 The relation between gcd and lcm is [a, b](a, b) = |ab|.

Proof: It is sufficient to prove the result for positive integers only. First consider (a, b) = 1. Suppose [a, b] = m; then m = ka for some k. Then b|ka and (a, b) = 1, therefore b|k, and so b ≤ k and ba ≤ ka. But ba, being a positive common multiple of b and a, cannot be less than the least common multiple, and so

ba = ka = [a, b].

Thus [a, b](a, b) = ab when (a, b) = 1. Now let us consider the general case, (a, b) = d > 1. Then

(a/d, b/d) = 1 ⇒ [a/d, b/d] = (a/d)·(b/d), by the above.

Since [a, b] = d[a/d, b/d] (see Property 2.5.3 below), it follows that [a, b](a, b) = ab. Hence the theorem. Thus, if

(a, b) = d, then [a, b] = ab/d.

From this we have: if (a, b) = 1, then [a, b] = |ab|.
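A minimal sketch (ours) computing [a, b] from the identity [a, b](a, b) = |ab|, using Python's built-in gcd; iterating it pairwise, as in Property 2.5.2 below, handles more than two integers:

```python
from functools import reduce
from math import gcd

def lcm(a, b):
    """Lowest common multiple of two nonzero integers, via [a,b](a,b) = |ab|."""
    return abs(a * b) // gcd(a, b)

print(lcm(2, 3))                     # 6
print(reduce(lcm, [-2, -6, 10]))     # 30, as in the example above
```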

Property 2.5.2 If m = [a1, a2, · · · , an], then for an integer l, ai|l for i = 1, 2, · · · , n if and only if m|l.

Proof: Since m = [a1, a2, · · · , an], ai|m for i = 1, 2, · · · , n. First let m|l; then

ai|m; i = 1, 2, · · · , n and m|l⇒ ai|l; i = 1, 2, · · · , n.

Conversely, let ai|l; i = 1, 2, · · · , n. Suppose m ∤ l; then by the division algorithm, l = mq + r, where 0 < r < m, i.e., r = l − mq. Now,

ai|l, ai|m⇒ ai|r; i = 1, 2, · · · , n.

But then r would be a positive integer smaller than m with ai|r for every i, contradicting the fact that m is the least such integer; hence m|l. Therefore, if [a1, a2] = m2, [m2, a3] = m3, [m3, a4] = m4, . . . , [mn−1, an] = mn, then [a1, a2, . . . , an] = mn.

Property 2.5.3 For k > 0, [ka, kb] = k[a, b].

Proof: Let m = [ka, kb]; then by definition ka|m and kb|m, so m = kx for some integer x. If [a, b] = x1, we note that a|x1 and b|x1, so ak|kx1 and bk|kx1, and hence kx|kx1, i.e., x|x1. Also, ak|kx and bk|kx give a|x and b|x, and so x1|x. Hence x = x1, and therefore

m = kx = kx1 = k[a, b] ⇒ [ka, kb] = k[a, b].

Hence the result.

Ex 2.5.1 Show that, (a+ b, [a, b]) = (a, b) for any two integers a and b.

Solution: Let d = (a, b) and l = [a, b], then, (a+ b, [a, b]) = (a+ b, l). Now,

d = (a, b) ⇒ d|a, d|b ⇒ d|(a + b). Also a|l, b|l; so from d|a and a|l, d|l.

If d is prime, then (a + b, l) = d = (a, b). Now let d not be prime, i.e., let d be a composite number, say d = d1·d2, where (d1, d2) = 1 and d1, d2 < d. Then there exists a positive number d1 (or d2) such that d1|(a + b), d1|l and d1|d. So (a + b, l) = d, i.e., (a + b, [a, b]) = (a, b).


2.6 Diophantine Equations

In this section we consider Diophantine equations, named after the Greek mathematician Diophantus of Alexandria. We apply the term Diophantine equation to an equation in one or more unknowns, with integer coefficients, which is to be solved in integers only. Such an equation is usually indeterminate (i.e., the number of equations is less than the number of unknowns). One of the basic interests in the theory of numbers is to obtain all solutions in Z of a given algebraic polynomial equation

a0x^n + a1x^(n−1) + · · · + an−1x + an = 0; ai ∈ Z.

Such a problem is called a Diophantine problem, and we say that we are solving Diophantine equations. As an example, consider one of the oldest Diophantine equations, x^2 + y^2 = z^2, where x, y, z are pairwise relatively prime integers; its complete solution has the form

x = a^2 − b^2, y = 2ab, z = a^2 + b^2, with (a, b) = 1.

In this type of equation we usually look for solutions in a restricted class of numbers, such as the positive integers or the negative integers.

2.6.1 Linear Diophantine Equations

Let us consider a linear Diophantine equation in two unknown variables x and y,

ax+ by = c; where, a, b, c ∈ Z (2.8)

with a, b integers (not both zero). An integer solution of (2.8) is a pair of integers (x0, y0) that, when substituted into the equation, satisfies it, i.e., ax0 + by0 = c. In finding the solutions of a Diophantine equation ax + by = c, (a, b) = 1, we follow one of the following methods:

(i) Substitution method,

(ii) Simple continued fraction method,

(iii) Euclidean algorithm method.

If (a, b) = 1 and (x0, y0) is a particular solution of the linear Diophantine equation ax + by = c, then all solutions are x = x0 + bk, y = y0 − ak, for integral values of k.

(i) A given linear Diophantine equation can have a number of integral solutions, as is the case with 2x + 4y = 12, where

2·4 + 4·1 = 12; 2·2 + 4·2 = 12;

or may not have even a single solution.

(ii) On the other hand, there are some linear Diophantine equations, like 2x + 6y = 13, which have no solution at all, due to the fact that the LHS is an even integer whatever the choice of x and y, whereas the RHS is not.

So our first task is to find the condition for solvability of a linear Diophantine equation. The following theorem tells us when a Diophantine equation has a solution.

Theorem 2.6.1 The necessary and sufficient condition that ax + by = c has an integral solution is that (a, b) divides c, where a, b, c are integers such that a, b are not both zero.


Proof: Let d = (a, b), the greatest common divisor of a and b. If d ∤ c, then there exist no integers x and y with ax + by = c, since d divides ax + by for all integers x, y. Suppose d|c. Since (a, b) = d, d can be expressed in the form d = ma + nb, where m, n ∈ Z. This can be put in the general form

d = a(m− kb) + b(n+ ka),

where k ∈ Z. We have d|c, i.e., c = ld for some l ∈ Z. Now,

ld = al(m − kb) + bl(n + ka), or, c = al(m − kb) + bl(n + ka).

Let x0 = l(m − kb) ∈ Z and y0 = l(n + ka) ∈ Z, so that (x0, y0) is an integral solution of ax + by = c. Conversely, let (x0, y0) be an integral solution of the equation ax + by = c. Then ax0 + by0 = c, where x0, y0 are integers. Let (a, b) = d; then

d|a and d|b ⇒ d|(ax0 + by0), i.e., d|c.

Now we find all the integral solutions. Starting from integers x0, y0 with ax0 + by0 = d, set

x1 = (c/d)x0 and y1 = (c/d)y0,

so that ax1 + by1 = c. Suppose r and s are integers satisfying ar + bs = c; then

ar + bs = ax1 + by1 = c ⇒ (a/d)(r − x1) = −(b/d)(s − y1). (2.9)

Now d = (a, b), so (a/d, b/d) = 1; then from (2.9) we conclude that (a/d)|(s − y1) and (b/d)|(r − x1), and hence ∃ an integer t such that

r = x1 + (b/d)t and s = y1 − (a/d)t; t ∈ Z.

So the linear Diophantine equation ax + by = c (a, b, c ∈ Z) has a solution iff d = (a, b) divides c. Moreover, for any integral solution (x∗, y∗), ∃ an integer t such that

x∗ = x0 + (b/d)t and y∗ = y0 − (a/d)t.

In fact, (x0 + (b/d)t, y0 − (a/d)t) is an integral solution of the given equation for any integer t, as

a(x0 + (b/d)t) + b(y0 − (a/d)t) = (ax0 + by0) + (ab/d)t − (ab/d)t = ax0 + by0 = c.

Hence, if (x0, y0) is an integral solution of the given equation, then all the integral solutions are given by

x = x0 + (b/d)t; y = y0 − (a/d)t,

where t is any integer. Therefore, there are an infinite number of solutions of the given equation, one for each value of t.
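As an illustrative sketch (ours, not from the text), the following Python function implements this theorem: the extended Euclidean algorithm (as sketched in Section 2.4) tests solvability and produces a particular solution together with the data for the general formula.

```python
def egcd(a, b):
    """Return (d, m, n) with d = gcd(a, b) and d = m*a + n*b."""
    if b == 0:
        return (abs(a), 1 if a >= 0 else -1, 0)
    d, m, n = egcd(b, a % b)
    return (d, n, m - (a // b) * n)

def solve_diophantine(a, b, c):
    """Solve ax + by = c over Z; return (x0, y0, b//d, a//d) so that the
    general solution is x = x0 + (b//d)t, y = y0 - (a//d)t, t in Z."""
    d, m, n = egcd(a, b)
    if c % d != 0:
        return None                  # no integral solution, as d does not divide c
    l = c // d
    return (m * l, n * l, b // d, a // d)

print(solve_diophantine(108, 45, 81))   # (-18, 45, 5, 12): cf. Ex 2.6.1 below
```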


Ex 2.6.1 Find all solutions of the Diophantine equation 108x+ 45y = 81.

Solution: By the Euclidean algorithm, (45, 108) = 9. Because 9|81, an integral solution to this equation exists. To obtain the integer 9 as a linear combination of 108 and 45, we work as follows:

9 = 45 − 2×18 = 45 − 2(108 − 2×45) = 45×5 − 2×108.

Upon multiplying this relation by 9, we arrive at

81 = 9·9 = 9[5·45 + (−2)·108] = 108·(−18) + 45·45,

so that x = −18 and y = 45 provide one integral solution of the given linear Diophantine equation. The equation can also be written in the form

108x + 45y = 81 = 108·(−18) + 45·45,

or, (108/9)(x + 18) = (45/9)(45 − y).

Since 108/9 and 45/9 are prime to each other, we have

(x + 18)/(45/9) = (45 − y)/(108/9) = t, say.

Thus other integral solutions can be expressed as

x = −18 + (45/9)t, y = 45 − (108/9)t, where t ∈ Z,
or, x = −18 + 5t, y = 45 − 12t; where t = 0, ±1, ±2, · · · .

Deduction 2.6.1 All integral solutions of ax + by = c, where a, b, c ∈ N and (a, b) = 1: Since (a, b) = 1, ∃ m, n ∈ Z such that am + bn = 1. Thus,

ax + by = c(am + bn) ⇒ a(x − cm) = −b(y − cn)
⇒ (x − cm)/(−b) = (y − cn)/a = t (say) ∈ Z
⇒ x = cm − bt, y = cn + at; t ∈ Z,

where, as (a, b) = 1, b|(x − cm) and a|(y − cn). This is the general solution in integers. For positive integral solutions we must have

cm − bt > 0 and cn + at > 0 ⇒ −cn/a < t < cm/b.

If we take cm/b = p + f1 and −cn/a = q + f2, where p, q are integers, 0 < f1 ≤ 1 and 0 ≤ f2 < 1, then the condition becomes q < t ≤ p. In this case the total number of solutions in positive integers is p − q.
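A brute-force sketch (our own check, not from the text) that enumerates the positive solutions by scanning t over this interval; it is run here on 5x + 3y = 52, worked as Ex 2.6.2 below.

```python
def positive_solutions(a, b, c, m, n):
    """Positive solutions of ax + by = c, given am + bn = 1.
    General solution: x = c*m - b*t, y = c*n + a*t, t in Z."""
    assert a * m + b * n == 1
    sols = []
    for t in range(-(c * n) // a, (c * m) // b + 1):   # -cn/a < t < cm/b
        x, y = c * m - b * t, c * n + a * t
        if x > 0 and y > 0:
            sols.append((x, y))
    return sols

print(positive_solutions(5, 3, 52, 2, -3))   # [(8, 4), (5, 9), (2, 14)]
```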

Ex 2.6.2 Find all positive integral solution of 5x+ 3y = 52.

Solution: Here 5 and 3 are prime to each other, i.e., d = (5, 3) = 1. Thus there exist m, n ∈ Z such that 5m + 3n = 1; here m = 2, n = −3. Thus,

5x + 3y = 52[5·2 + 3·(−3)], or, 5(x − 104) = −3(y + 156).


Since 5 and 3 are prime to each other, x − 104 is divisible by 3 and y + 156 is divisible by 5, and therefore

(x − 104)/(−3) = (y + 156)/5 = t; t ∈ Z,
or, x = 104 − 3t, y = 5t − 156, where t = 0, ±1, ±2, · · · .

This is the general solution in integers. For a positive integral solution we must have

104 − 3t > 0 and 5t − 156 > 0 ⇒ 156/5 < t < 104/3.

The solutions in positive integers correspond to t = 32, 33, 34, and they are (x, y) = (8, 4), (5, 9) and (2, 14).

Ex 2.6.3 Find all positive integral solution of 5x+ 12y = 80.

Solution: Here 5 and 12 are prime to each other, i.e., d = (5, 12) = 1. Thus there exist m, n ∈ Z such that 5m + 12n = 1; here m = 5, n = −2. Thus,

5x + 12y = 80(5·5 − 12·2), or, 5(x − 400) = −12(y + 160).

Since 5 and 12 are prime to each other, x − 400 is divisible by 12 and y + 160 is divisible by 5, and therefore

(x − 400)/(−12) = (y + 160)/5 = t; t ∈ Z,
or, x = 400 − 12t, y = 5t − 160, where t = 0, ±1, ±2, · · · .

This is the general solution in integers. For a positive integral solution we must have 400 − 12t > 0 and 5t − 160 > 0 ⇒ 32 < t < 100/3. The only solution in positive integers corresponds to t = 33, and the solution is x = 4, y = 5.

Ex 2.6.4 Find all positive integral solution of 12x− 7y = 8.

Solution: Here 12 and 7 are prime to each other, i.e., d = (12, 7) = 1. Thus there exist m, n ∈ Z such that 12m + 7n = 1; here m = 3, n = −5. Therefore,

12x − 7y = 8[12·3 + 7·(−5)], or, 12(x − 24) = 7(y − 40).

Since 12 and 7 are prime to each other, x − 24 is divisible by 7 and y − 40 is divisible by 12, and so

(x − 24)/7 = (y − 40)/12 = t; t ∈ Z,
or, x = 24 + 7t, y = 12t + 40, where t = 0, ±1, ±2, · · · .

This is the general solution in integers. For a positive integral solution we must have 24 + 7t > 0 and 12t + 40 > 0 ⇒ t > −24/7 and t > −10/3, i.e., t ≥ −3. Every such t gives a solution in positive integers; the least, t = −3, gives x = 3, y = 4.

2.7 Prime Numbers

An integer p > 1 is called a prime number, or simply a prime, if there is no positive divisor d of p satisfying 1 < d < p, i.e., if its only positive divisors are 1 and p. If p > 1 is not prime, it is called a composite number.

(i) The integer 1 is regarded as neither prime nor composite.

(ii) 2 is the only even prime number. All other prime numbers are necessarily odd.

For example, the prime numbers less than 10 are 2, 3, 5, 7, while 4, 6, 8, 9 are composite.


2.7.1 Relatively Prime Numbers

Two integers a and b, not both of which are zero, are said to be relatively prime or co-prime if (a, b) = 1. In this case the existence of integers m and n is guaranteed such that

1 = ma+ nb.

For example, 4 and 9 are not prime numbers, but they are relatively prime, as (4, 9) = 1. A set of integers a1, a2, . . . , an, not all zero, is said to be relatively prime (in pairs) if

(ai, aj) = 1 ∀ i ≠ j; i, j = 1, 2, . . . , n. (2.10)

Ex 2.7.1 Prove that, for n > 3, the integers n, n+ 2, n+ 4 cannot be all primes.

Solution: Any integer n is one of the forms 3k, 3k + 1, 3k + 2, where k ∈ Z. If

(i) n = 3k, then n is not a prime.

(ii) n = 3k + 1, then n + 2 = 3(k + 1) and it is not prime.

(iii) n = 3k + 2, then n+ 4 = 3(k + 2) and it is not prime.

Thus in any case, the integers n, n+ 2, n+ 4 cannot be all primes.

Theorem 2.7.1 If m (≠ 0) ∈ Z, then (ma, mb) = |m|(a, b), where a, b ∈ Z are not both zero.

Proof: Let (a, b) = k; then a = kA, b = kB with k ∈ Z and (A, B) = 1. Therefore,

ma = mkA, mb = mkB and (A, B) = 1 ⇒ (ma, mb) = |mk| = |m|(a, b).

Theorem 2.7.2 If d = (a, b) > 0, then a/d and b/d are integers prime to each other.

Proof: We observe that, although a/d and b/d have the appearance of fractions, they are in fact integers, as d is a divisor of both a and b. Since d = (a, b), by the existence theorem ∃ m, n ∈ Z such that d = ma + nb. Therefore,

1 = (a/d)m + (b/d)n.

Since d|a, d|b, the numbers u = a/d and v = b/d are integers with 1 = um + vn. Since u, v are integers, the conclusion is that a/d and b/d are relatively prime.

Theorem 2.7.3 If (a, b) = 1, then for any integer c, (ac, b) = (c, b).

Proof: Let (ac, b) = d and (c, b) = d1. Since (a, b) = 1, ∃ m, n ∈ Z such that

am+ bn = 1 ⇒ acm+ bcn = c.

Now, (ac, b) = d ⇒ d|ac, d|b ⇒ d|acm, d|bcn (as b|bc) ⇒ d|(acm + bcn) ⇒ d|c.

Then, since d|c and d|b and (c, b) = d1, we get d|d1. Also,

(c, b) = d1 ⇒ d1|c, d1|b ⇒ d1|ac, d1|b (as c|ac) ⇒ d1|d, as (ac, b) = d.

Thus it follows that d = d1. For example, (2, 5) = 1 and c = 10: (2·10, 5) = (20, 5) = 5 and (10, 5) = 5, so (2·10, 5) = (10, 5). Therefore, if (ai, b) = 1 ∀ i = 1, 2, . . . , n, then (a1a2 · · · an, b) = 1.


Theorem 2.7.4 If p (> 1) is prime and p|ab, then p|a or p|b, where a, b are any two integers.

Proof: Let p be a prime and a, b integers such that p|ab. If a = 0 or b = 0, the result is true. If p|a, then the result is also true. So let us assume that p ∤ a. Because the only positive divisors of p are 1 and p itself, we have (p, a) = 1, so ∃ m, n ∈ Z such that 1 = ma + np. Multiplying both sides by b, we get

b = mab + npb = (mp)c + npb, writing ab = pc; c ∈ Z,
= p(mc + nb) = p × an integer,

so that p|b. Similarly, if p ∤ b, then p|a. Conversely, suppose the integer p (> 1) satisfies the given condition. Let q be a positive divisor of p with q < p, and write p = qr. Since p|p, we have p|qr; hence either p divides q or p divides r. Since 0 < q < p, p ∤ q, so p|r. Thus ∃ some k ∈ Z such that r = pk, so that

p = qr = qpk ⇒ qk = 1 ⇒ q = 1

and we conclude that 1 and p are the only positive divisors of p. Hence p is a prime. Thus a positive integer p (> 1) with the property that, for any a, b ∈ Z,

p|ab ⇒ p|a or p|b,

is prime. For example, 12|8·3, but neither 8 nor 3 is divisible by 12; hence 12 is not prime. This theorem distinguishes prime numbers from composite numbers, which is a fundamental problem in number theory. Now,

(i) If p = ab, then at least one of a and b must be less than p.

(ii) If a (≠ ±1) ∈ Z, a must have a prime factor.

Ex 2.7.2 Show that the fraction (9n + 8)/(6n + 5) is irreducible for all n ∈ N.

Solution: It suffices to show that (9n + 8, 6n + 5) = 1. Let a = 9n + 8, b = 6n + 5; then 2a − 3b = 1. Therefore a and b are relatively prime. Hence the fraction (9n + 8)/(6n + 5) is irreducible for all n ∈ N.

Theorem 2.7.5 If p is prime and p|a1a2a3 · · · an, then p|ai for some i with 1 ≤ i ≤ n.

Proof: We prove this by the principle of mathematical induction on n, the number of factors. When n = 1, i.e., if p|a1, the result is true. For n = 2, if p ∤ a1, then by the previous theorem p|a2. Assume, as the induction hypothesis, that n > 2 and that whenever p divides a product of fewer than n factors, it divides at least one of the factors. Now let p|a1a2 . . . an. Then either p|an, or p|a1a2 . . . an−1 and the inductive hypothesis ensures that p|ai for some i with 1 ≤ i ≤ n − 1. In any event, p divides one of the integers a1, a2, . . . , an.

Therefore, if p, q1, q2, . . . , qn are all primes and p|q1q2 · · · qn, then p = qk for some k, where 1 ≤ k ≤ n.

Ex 2.7.3 If a, b are both primes with a ≥ b ≥ 5, show that 24|(a^2 − b^2).


Solution: Since a and b are primes > 3, both are of the form 3k + 1 or 3k + 2, where k ∈ Z. If a and b are both of the same form, then 3|(a − b). If one is of the form 3k + 1 and the other of the form 3k + 2, then 3|(a + b). Thus, in any case, 3|(a^2 − b^2).

Again, a, b are odd primes, so each is of the form 4k + 1 or 4k + 3, where k ∈ Z. If a and b are both of the same form, then 4|(a − b) and 2|(a + b). If one is of the form 4k + 1 and the other of the form 4k + 3, then 4|(a + b) and 2|(a − b). Thus, in any case, 8|(a^2 − b^2).

Since (3, 8) = 1, we have 24|(a^2 − b^2).

Theorem 2.7.6 If (a, b) = 1 and b|ac then b|c.

Proof: Since b|ac, ∃ r ∈ Z such that ac = br. Also (a, b) = 1, so ∃ m, n ∈ Z such that 1 = ma + nb. Multiplying this equation by c produces

c = c·1 = c(ma + nb) = mac + nbc = mbr + nbc = b(mr + nc) = b × some integer,

which shows that b|c. This is known as Euclid's lemma. In particular, if ap = bq and (a, b) = 1, then a|q and b|p.

If a and b are not relatively prime, then the result may or may not be true. For example, 12|9·8, but 12 ∤ 9 and 12 ∤ 8.

Theorem 2.7.7 If a|c, b|c and (a, b) = 1 then ab|c.

Proof: Inasmuch as a|c, b|c, ∃ k1, k2 ∈ Z such that c = ak1 = bk2. Again, the relation (a, b) = 1 allows us to write ma + nb = 1 for some m, n ∈ Z. Multiplying this equation by c, it appears that

c = c·1 = c(ma + nb) = mac + nbc
= mabk2 + nbak1, as c = ak1 = bk2,
= ab(mk2 + nk1) = ab × some integer.

Hence we have the divisibility statement ab|c.

Theorem 2.7.8 a|b if and only if ac|bc, where c 6= 0.

Proof: If ac|bc, then bc = (ac)q; q ∈ Z. Therefore,

c(b − aq) = 0 ⇒ b − aq = 0, as c ≠ 0, ⇒ b = aq, i.e., a|b.

The converse part is obvious. Note that without the condition (a, b) = 1, a|c and b|c together may not imply ab|c. For example, 4|12 and 6|12, but 4·6 ∤ 12.

Theorem 2.7.9 If (a, b) = 1, then for any integer q, (a+ bq, b) = 1.

Proof: Let (a + bq, b) = k, where k ≥ 1. If possible, let k > 1. Then

(a + bq, b) = k (> 1) ⇒ k|(a + bq), k|b ⇒ k|[(a + bq)·1 + b(−q)] ⇒ k|a.

Therefore k|a, k|b ⇒ (a, b) ≠ 1, which is a contradiction. Hence k = 1, i.e., (a + bq, b) = 1.

Ex 2.7.4 If (a, b) = 1, prove that (a + b, a^2 − ab + b^2) = 1 or 3.


Solution: Let d = (a + b, a^2 − ab + b^2); then, subtracting suitable multiples of a + b,

(a + b, a^2 − ab + b^2) = d ⇒ (−2ab + b^2, a + b) = d ⇒ (−3ab, a + b) = d ⇒ (3a^2, a + b) = d ⇒ d|3a^2.

Also, (3b^2, a + b) = d ⇒ d|3b^2. Therefore,

d|3a^2 and d|3b^2 ⇒ d|(3a^2, 3b^2) = 3(a^2, b^2) = 3, since (a, b) = 1 gives (a^2, b^2) = 1.

Thus, we have, d = 1 or d = 3.

Theorem 2.7.10 If (a, b) = 1, (a, c) = 1 then (a, bc) = 1.

Proof: Since (a, b) = 1 = (a, c), ∃ m1, m2, n1, n2 ∈ Z such that

1 = m1a + n1b = m2a + n2c
⇒ (n1b)(n2c) = (1 − m1a)(1 − m2a) = 1 − ak, for some integer k,
⇒ ak + (n1n2)bc = 1.

So if r = (a, bc), then r|1, so r = 1 ⇒ (a, bc) = 1. Thus if a is prime to b and a is prime to c, then a is prime to bc. From this theorem we have the following results:

(i) If (a, x) = 1, then (a, x^2) = 1 and, in general, (a, x^n) = 1.

(ii) If a (> 1) and a|x^n, then (a, x) > 1.

(iii) If (a, c) = (b, c) = 1 then (ab, c) = 1.

Theorem 2.7.11 If (a, b) = 1 and c|a then (c, b) = 1.

Proof: Since (a, b) = 1, ∃ m, n ∈ Z such that 1 = ma + nb. Also c|a, so ∃ k ∈ Z such that a = ck. Now,

1 = ma + nb = mkc + nb = (mk)c + nb ⇒ (b, c) = 1; mk, n ∈ Z.

Theorem 2.7.12 If (a, b) = 1 then, (a+ b, ab) = 1.

Proof: Since a is prime to b, ∃ m, n ∈ Z such that am + bn = 1. This expression am + bn = 1 can be written in the form

a(m− n) + (a+ b)n = 1.

Since m − n and n are integers, it follows that a is prime to a + b. Again, the expression am + bn = 1 can be written in the form

(a+ b)m+ b(n−m) = 1.

Since m and n − m are integers, it follows that a + b is prime to b. Hence a + b is prime to ab.

Ex 2.7.5 If (a, b) = 1, prove that (a^2, b) = 1 and (a^2, b^2) = 1.


Solution: Since (a, b) = 1, ∃ m, n ∈ Z such that am + bn = 1. Thus,

a^2m^2 = (1 − bn)^2 = 1 − 2bn + b^2n^2 ⇒ a^2m^2 + b(2n − bn^2) = 1.

Since m^2 and 2n − bn^2 ∈ Z, it follows that (a^2, b) = 1. As (a^2, b) = 1, ∃ m1, n1 ∈ Z such that a^2m1 + bn1 = 1. Then

b^2n1^2 = (1 − a^2m1)^2 = 1 − 2a^2m1 + a^4m1^2 ⇒ a^2(2m1 − a^2m1^2) + b^2n1^2 = 1.

Since 2m1 − a^2m1^2 and n1^2 ∈ Z, it follows that (a^2, b^2) = 1. In general, if d = (a, b), then d^2 = (a^2, b^2).

Theorem 2.7.13 If c|ab and (a, c) = 1 then c|b.

Proof: Since (a, c) = 1, ∃ m, n ∈ Z such that 1 = ma + nc. Also, as c|ab, ∃ k ∈ Z such that ab = kc. Now,

1 = ma + nc ⇒ b = mab + nbc = mkc + nbc = c(mk + nb) = c × an integer ⇒ c|b.

Theorem 2.7.14 If a|c, b|c, (a, b) = d then ab|cd.

Ex 2.7.6 Prove that√m is irrational for any positive prime m.

Solution: If possible, let √m be a rational number. Then

√m = p/q, where p, q ∈ Z, q > 0, (p, q) = 1,
or, m = p^2/q^2 ⇒ p^2 = q^2m = q(qm) ⇒ q|p^2.

If q > 1, then by the fundamental theorem of arithmetic, q has a prime factor, say r. Thus,

r|q and q|p^2 ⇒ r|p^2 ⇒ r|p.

Then (p, q) ≥ r > 1, and a contradiction arises unless q = 1. When q = 1, we have p^2 = m, which is not possible, as the square of any integer cannot be prime. Hence √m is irrational for any prime m.

Ex 2.7.7 Prove that Fermat’s numbers are relatively prime to each other.

Solution: A number of the form Fn = 2^(2^n) + 1 is known as a Fermat number; Fermat believed that Fn is prime for every n ≥ 0.

Let r be a common divisor of the two Fermat numbers Fn and Fn+k (k ≥ 1). Since Fn and Fn+k, being odd integers, cannot have any even integer as a common divisor, r is odd. Now,

Fn = 2^(2^n) + 1; Fn+k = 2^(2^(n+k)) + 1 = (2^(2^n))^(2^k) + 1.

Thus Fn+k − 2 = (2^(2^n))^(2^k) − 1 has a factor 2^(2^n) + 1 = Fn, i.e., Fn|(Fn+k − 2); also r|Fn ⇒ r|(Fn+k − 2). So

r|Fn+k and r|(Fn+k − 2) ⇒ r|[Fn+k − (Fn+k − 2)] ⇒ r|2,

and as r is odd, r = 1. Hence (Fn, Fn+k) = 1, i.e., Fermat numbers are relatively prime to each other.


Theorem 2.7.15 Every integer greater than 1 has a least divisor (other than 1), which is prime.

Proof: Let n (> 1) be a positive integer, and let S be the set of positive divisors of n other than 1. S is non-empty, as n ∈ S (since n|n). Thus S is a non-empty set of natural numbers; hence by the well-ordering principle it has a least element, say k. Then k is the least divisor of n other than 1. We assert that k is prime; for if k were not prime, then

k = k1k2, where 1 < k1 ≤ k2 < k,

and k1|n with k1 < k, which contradicts the fact that k is the least divisor. Hence k is prime. Therefore, a composite number has at least one prime divisor.

Ex 2.7.8 If 2^n − 1 is prime, prove that n is a prime.

Solution: Let n be composite, say n = pq, where p and q are integers greater than 1. Now,

2^n − 1 = 2^(pq) − 1 = (2^p − 1)(2^(p(q−1)) + 2^(p(q−2)) + · · · + 2^p + 1).

Each of the factors on the right-hand side is evidently greater than 1, and therefore 2^n − 1 is composite. Hence, if 2^n − 1 is a prime, then n is prime.

Ex 2.7.9 Let p be prime and a be a positive integer. Prove that a^n is divisible by p if and only if a is divisible by p.

Solution: Let a be divisible by p; then a = pk for some k ∈ Z. Thus

a^n = p^n k^n = p(p^(n−1) k^n) = p·m,

where m = p^(n−1) k^n ∈ Z. This shows that a^n is divisible by p. Next let a not be divisible by p, i.e., (a, p) = 1. Then ∃ u, v ∈ Z such that au + pv = 1. Then

a^n u^n = (1 − pv)^n = 1 − ps, s ∈ Z, or, a^n u^n + ps = 1, so (a^n, p) = 1.

Therefore a^n is not divisible by p. Thus p|a^n ⇒ p|a, and hence a^n is divisible by p if and only if a is divisible by p.

2.7.2 Fundamental Theorem of Arithmetic

Every integer greater than 1 is either a prime number or can be expressed as a product of finitely many positive primes, and the expression is unique up to the order of the factors.

Proof: Existence: Let n (> 1) be a given positive integer. If n is 2 or a prime number, there is nothing to prove. If n is a composite number, then it has a prime factor, and we may choose the least one, say n1 (> 1) (possible by the well-ordering principle); so ∃ an integer r1 such that

n = n1r1, where 1 < r1 < n.

If r1 is prime, then n is a product of two primes and the result is obtained. If r1 is not prime, then it has a least prime factor n2 (> 1) and

n = n1·n2·r2; 1 < r2 < r1 < n.


This process of factorizing the composite factor is continued: n = n1·n2 · · · nk−1·rk−1, where 1 < rk−1 < rk−2 < · · · < r2 < r1 < n is a strictly descending chain of positive integers, and such a chain must terminate after a finite number of steps. At termination rk−1 is a prime, say nk. This leads to the prime factorization n = n1·n2·n3 · · · nk, where the ni are all prime.

Uniqueness: To prove the uniqueness of the representation, let us assume that the integer n can be represented as a product of primes in two ways, say,

n = n1·n2·n3 · · · nk = p1·p2·p3 · · · pl; k ≤ l,

where the n's and p's are all primes. Since p1|n1·n2·n3 · · · nk and p1 is prime, p1|ni for some i, 1 ≤ i ≤ k; and since ni and p1 are both prime, we conclude p1 = ni. Without loss of generality we may take p1 = n1, and cancelling,

n2·n3 · · · nk = p2·p3 · · · pl.

A similar argument shows that n2 = p2, · · · , nk = pk, leaving 1 = pk+1 · · · pl, which is absurd (as each pi is prime and > 1). Hence l = k and pi = ni ∀ i = 1, 2, 3, · · · , k, making the two factorizations of n identical. Thus n > 1 can be expressed as a product of finitely many positive primes, the representation being unique apart from the order of the factors. This is known as the fundamental theorem of arithmetic or the unique factorization theorem.

Result 2.7.1 Standard form: By this theorem, any positive integer n (> 1) can be expressed uniquely in a canonical factorization as

n = p1^α1 · p2^α2 · · · pr^αr, αi ≥ 1, for i = 1, 2, · · · , r,

where p1 < p2 < · · · < pr are the distinct primes dividing n. If no αi in the canonical form of n is greater than 1, then the integer n is said to be square-free. For example, n = 70 = 2·5·7 is a square-free number, whereas 140 = 2^2·5·7 is not square-free.
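An illustrative trial-division sketch (ours, not from the text) that produces the canonical factorization and tests square-freeness:

```python
def factorize(n):
    """Canonical factorization of n > 1 as a dict {prime: exponent}."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:                       # whatever remains is itself prime
        factors[n] = factors.get(n, 0) + 1
    return factors

def square_free(n):
    return all(e == 1 for e in factorize(n).values())

print(factorize(140))                        # {2: 2, 5: 1, 7: 1}
print(square_free(70), square_free(140))     # True False
```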

Result 2.7.2 If a and b are two integers greater than one with, by the fundamental theorem of arithmetic,

a = p1^α1 p2^α2 · · · pr^αr and b = p1^β1 p2^β2 · · · pr^βr

(zero exponents allowed, so that the same primes appear in both), then

(a, b) = p1^min{α1,β1} p2^min{α2,β2} · · · pr^min{αr,βr},
[a, b] = p1^max{α1,β1} p2^max{α2,β2} · · · pr^max{αr,βr}.

For example, let a = 491891400 = 2^3·3^3·5^2·7^2·11·13^2 and b = 1138845708 = 2^2·3^2·7·11^2·13^3·17; then

(a, b) = 2^2·3^2·5^0·7·11·13^2·17^0 = 468468 and [a, b] = 2^3·3^3·5^2·7^2·11^2·13^3·17 = 1195787993400.
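A quick verification of this example (our own check), using Python's built-in gcd and the identity [a, b](a, b) = ab:

```python
from math import gcd

a, b = 491891400, 1138845708
d = gcd(a, b)
print(d, a * b // d)    # 468468 1195787993400
```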

Theorem 2.7.16 (Euclid's Theorem): The number of primes is infinite; alternatively, there is no greatest prime.

Proof: If possible, let the number of primes be finite, say p1, p2, · · · , pn, arranged in ascending order of magnitude p1 < p2 < · · · < pn, so that pn is the greatest prime. Let

q = (p1·p2 · · · pn) + 1.


Here we see that q > 1, so q is divisible by some prime p. But p1, p2, · · · , pn are the only primes, so p must be equal to one of p1, p2, · · · , pn. Now,

p|p1·p2 · · · pn and p|q ⇒ p|(q − p1p2 · · · pn) ⇒ p|1.

The only positive divisor of the integer 1 is 1 itself, and because p > 1, a contradiction arises. (Otherwise: if q is prime, we get a contradiction, as q > pn; if q is composite, it has a prime factor, but none of the primes p1, p2, . . . , pn divides q, since 1 is the remainder in each case, so a prime factor of q must be greater than pn, which is again a contradiction.) This shows that there is no greatest prime, i.e., the number of primes is infinite. Now,

(i) Every positive integer greater than one has a prime divisor(factor).

(ii) If n (an integer greater than one) is not a prime, then n has a prime factor not exceeding √n.

(iii) No rational algebraic formula can represent prime numbers only.

(iv) Consider the consecutive integers (k + 1)! + 2, (k + 1)! + 3, . . . , (k + 1)! + (k + 1). Each of these numbers is composite, as n|[(k + 1)! + n] for 2 ≤ n ≤ k + 1.

Thus there are arbitrarily large gaps in the series of primes.

Ex 2.7.10 If pn is the nth prime number, then pn ≤ 2^(2^(n−1)).

Solution: Clearly, equality holds for n = 1. As the induction hypothesis, assume that the result holds for all integers up to k. By Euclid's argument, p1p2 . . . pk + 1 is divisible by at least one prime, and every such prime divisor is distinct from p1, . . . , pk; hence pk+1 does not exceed the smallest of these, so that

pk+1 ≤ p1p2 . . . pk + 1 ≤ 2·2^2 · · · 2^(2^(k−1)) + 1 = 2^(1+2+2^2+···+2^(k−1)) + 1 = 2^(2^k − 1) + 1.

However, 1 ≤ 2^(2^k − 1) for all k, whence

pk+1 ≤ 2^(2^k − 1) + 2^(2^k − 1) = 2^(2^k).

Thus the result is true for n = k + 1 if it is true up to n = k. It follows that, for n ≥ 1, there are at least n + 1 primes less than 2^(2^n).

Ex 2.7.11 Determine which of the following integers are primes: 287 and 271.

Solution: First we find all primes p such that p^2 ≤ 287; these are 2, 3, 5, 7, 11, 13. Since 7|287 (287 = 7 × 41), 287 is not a prime. The primes p satisfying p^2 ≤ 271 are likewise 2, 3, 5, 7, 11, 13. None of these divides 271; hence 271 is a prime.
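A direct sketch (ours) of this trial-division test, which checks divisibility only by trial divisors whose square does not exceed n, as justified by note (ii) above:

```python
def is_prime(n):
    """n > 1 is prime iff no d with 2 <= d and d*d <= n divides n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

print(is_prime(287), is_prime(271))   # False True
```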

Theorem 2.7.17 If p is a positive prime and n is a positive integer, then a^n is divisible by p iff a is divisible by p, where a is any positive integer.

Proof: Since a is a positive integer, by the fundamental theorem of arithmetic we get

a = p1·p2 · · · pk,

where p1, p2, . . . , pk are primes (not necessarily distinct) with p1 ≤ p2 ≤ . . . ≤ pk. Now a is divisible by p iff p is one of p1, p2, . . . , pk. Also, as a^n = p1^n·p2^n · · · pk^n, the prime p divides a^n iff p occurs among p1, . . . , pk; so a^n is divisible by p iff a is divisible by p.


2.8 Modular/Congruence System

C. F. Gauss introduced the remarkable concept of congruence, and it is this notion that makes it such a powerful technique for the simplification of many problems concerning divisibility of integers.

Let m > 0 be a fixed integer. Then an integer a is said to be congruent to another integer b modulo m if m|(a − b), i.e., if m is a divisor of (a − b). Symbolically, this is expressed as

a ≡ b (mod m). (2.11)

The number m is called the modulus of the congruence, and b is called a residue of a modulo m. In particular, a ≡ 0 (mod m) if and only if m|a. Hence

a ≡ b (mod m) if and only if a − b ≡ 0 (mod m).

For example,

(i) 15 ≡ 7 (mod 8), 2 ≡ −1 (mod 3), 5^2 ≡ −1 (mod 2).

(ii) n is even if and only if n ≡ 0(mod2).

(iii) n is odd if and only if n ≡ 1(mod2).

(iv) a ≡ b (mod 1) for all a, b ∈ Z; this case (m = 1) is not very useful or interesting. Therefore, m is taken to be a positive integer greater than 1.

(v) Let a, b be integers and m a positive integer; then a ≡ b (mod m) if and only if a = km + b for some integer k.

When m ∤ (a − b), we say that a is incongruent to b modulo m, and in this case we write a ≢ b (mod m). For example, 2 ≢ 6 (mod 5), −3 ≢ 3 (mod 5).

Ex 2.8.1 Use the theory of congruences to prove that 7|(2^(5n+3) + 5^(2n+3)) ∀ n (≥ 1) ∈ N.

Solution: 2^(5n+3) + 5^(2n+3) can be written as 8·32^n + 125·25^n. Now,

32^n − 25^n ≡ 0 (mod 7) for all n ≥ 1 (as 32 ≡ 25 (mod 7)),
⇒ 8·32^n − 8·25^n ≡ 0 (mod 7) for all n ≥ 1.

Also, 133·25^n ≡ 0 (mod 7) for all n ≥ 1, and adding, 8·32^n + 125·25^n ≡ 0 (mod 7) for all n ≥ 1. Consequently, 7|(2^(5n+3) + 5^(2n+3)) ∀ n (≥ 1) ∈ N.

2.8.1 Elementary Properties

A congruence is a statement about divisibility from a slightly different point of view, and it is more than a convenient notation. The congruence symbol '≡' may be viewed as a generalized form of the equality sign, in the sense that its behavior with respect to addition and multiplication is reminiscent of ordinary equality. Some of the elementary properties of equality that carry over to congruences appear below.

Property 2.8.1 If a ≡ b (mod m), then a ≡ b (mod n), when n|m, m, n > 0.

Proof: From the definition, n|m ⇒ m = nk for some k ∈ Z. Given a ≡ b (mod m), so

m|(a − b) ⇒ a − b = ml for some l ∈ Z
⇒ a − b = nkl = nr; r = kl ∈ Z
⇒ n|(a − b) ⇒ a ≡ b (mod n).


Property 2.8.2 a ≡ a (mod m), for any m > 0 and a ≡ 0 (mod m), if m|a.

Property 2.8.3 The relation “congruence modulo m,” defined by a ≡ b (mod m) if m|(a−b), is an equivalence relation in the set of integers.

Proof: If m (> 0) is a fixed positive integer, we define a relation ρ for any two elements a, b ∈ Z by

aρb⇔ a ≡ b(mod m).

We are to show that this relation is an equivalence relation.

Reflexivity: Let a be any integer; then a − a = 0 and m|0 for any m (> 0) ∈ Z. Thus

m|(a − a) ⇒ a ≡ a (mod m) for all a ∈ Z ⇒ aρa ∀ a ∈ Z,

so the relation ρ is reflexive.

Symmetry: Let a, b ∈ Z be such that aρb. Then

aρb ⇒ a ≡ b (mod m) ⇒ m|(a − b) ⇒ m|(−1)(a − b) ⇒ m|(b − a) ⇒ b ≡ a (mod m).

Hence aρb ⇒ bρa ∀ a, b ∈ Z, and the relation ρ is symmetric.

Transitivity: Let a, b, c ∈ Z be such that aρb and bρc. Now

aρb, bρc ⇒ a ≡ b (mod m), b ≡ c (mod m) ⇒ m|(a − b) and m|(b − c) ⇒ m|[(a − b) + (b − c)] ⇒ m|(a − c) ⇒ a ≡ c (mod m) ⇒ aρc,

so the relation ρ is transitive. Being reflexive, symmetric and transitive, ρ is an equivalence relation. Thus congruence is an equivalence relation on Z.

Result 2.8.1 Hence the equivalence relation partitions Z into equivalence classes, or residue classes, modulo m. The number of these classes is m. They are denoted by

[a] = the class of all integers ≡ a (mod m).

Hence [0] = [m] = [2m] = · · · and [a] = [a + m] = [a + 2m] = · · · . So the residue classes modulo 5 are

[0] = {· · · , −10, −5, 0, 5, 10, · · · }; [1] = {· · · , −9, −4, 1, 6, · · · };
[2] = {· · · , −8, −3, 2, 7, · · · }; [3] = {· · · , −7, −2, 3, 8, 13, · · · };
[4] = {· · · , −6, −1, 4, 9, 14, · · · }.

Property 2.8.4 Two congruences with the same modulus can be added, subtracted, or multiplied, member by member, as if they were equations. Therefore, if a ≡ b (mod m) and c ≡ d (mod m), then a + c ≡ b + d (mod m), a − c ≡ b − d (mod m) and ac ≡ bd (mod m).

Proof: Since a ≡ b (mod m) and c ≡ d (mod m), we may write a = mq + b and c = ms + d for some q, s ∈ Z. Adding these equations, we obtain

a + c = m(q + s) + (b + d) ⇒ (a + c) − (b + d) = m(q + s).


Since q, s ∈ Z, q + s ∈ Z, and as a congruence statement a + c ≡ b + d (mod m). The converse is not always true: for example, let a = 10, c = 5, b = 1, d = 2 and m = 4; then (10 + 5) ≡ (1 + 2) (mod 4), but 10 ≢ 1 (mod 4) and 5 ≢ 2 (mod 4). Again,

ac = (mq + b)(ms + d) = m(bs + qd + qsm) + bd.

Since b, s, q, m, d ∈ Z, bs + qd + qsm ∈ Z, so ac − bd is divisible by m, whence ac ≡ bd (mod m). Again the converse is not always true: 50 ≡ 2 (mod 4), i.e., 10·5 ≡ 1·2 (mod 4), but 10 ≢ 1 (mod 4) and 5 ≢ 2 (mod 4). In general, if a1 ≡ b1 (mod m), a2 ≡ b2 (mod m), . . . , an ≡ bn (mod m), then

a1·a2 · · · an ≡ b1·b2 · · · bn (mod m).

Property 2.8.5 If a ≡ b (mod m), then ∀x ∈ Z, a + x ≡ b + x (mod m), a − x ≡ b − x(mod m) and ax ≡ bx (mod m).

Proof: As a ≡ b (mod m), ∃ λ ∈ Z such that

a − b = λm ⇒ (a + x) − (b + x) = λm, or, (a + x) ≡ (b + x) (mod m).

Also, as a − b = λm with λ ∈ Z, m|(a − b), and so

m|(ax − bx), where x ∈ Z ⇒ ax ≡ bx (mod m).

The converse of the result a ≡ b (mod m) ⇒ ax ≡ bx (mod m) is not always true. For example,

2·4 ≡ 2·1 (mod 6), whereas 4 ≢ 1 (mod 6).

Thus we conclude that one cannot unrestrictedly cancel a common factor in the arithmetic of congruences. The same holds for any finite number of congruences with the same modulus. On the other hand,

3·(−2) ≡ 2 (mod 8) and 3·14 ≡ 2 (mod 8), and here indeed −2 ≡ 14 (mod 8).

Cancellation is allowed, however, in a restricted sense, which is provided in the following theorem.

Property 2.8.6 If a ≡ b(mod m) and d|m,m > 0, then a ≡ b(mod d).

Proof: Given that a ≡ b (mod m) and d|m, m > 0. This implies that there are two integers x and y such that

(a− b) = xm and m = yd.

Now (a− b) = xyd. So

(a− b) = zd where xy = z.

Hence a ≡ b (mod d). The converse of the result is not always true. For example, let a = 5, b = 2 and d = 3; then 5 ≡ 2 (mod 3) and 3|6, but 5 ≢ 2 (mod 6).

Property 2.8.7 A common factor which is relatively prime to the modulus can always be cancelled. More generally, if m is a positive integer and a, x, y are integers, then

ax ≡ ay (mod m) iff x ≡ y (mod m/d), where d = (a, m).


Proof: Let d = (a, m); d ≠ 0, as m > 0. Then by definition d|a, d|m, so ∃ k, l ∈ Z such that a = kd, m = ld, where k and l are prime to each other. Since ax ≡ ay (mod m), i.e., m|(ax − ay), ∃ q ∈ Z such that ax = mq + ay. Now,

kdx = ldq + kdy ⇒ kx = lq + ky (d ≠ 0) ⇒ k(x − y) = lq ⇒ l|k(x − y).

Since k and l are prime to each other, Euclid's lemma yields l|(x − y), which may be recast as x ≡ y (mod l), i.e.,

x ≡ y (mod m/d) ⇒ x ≡ y (mod m/(a, m)).

Thus, a common factor a can be cancelled provided the modulus is divided by d = (a, m). Conversely, let x ≡ y (mod m/d); then x − y = t(m/d) for some integer t. Hence

ax − ay = t(m/d)a = tm(a/d) = tmk = (mk)t ⇒ ax ≡ ay (mod m).

This theorem gets its maximum force when the requirement that (a, m) = 1 is added, for then the cancellation may be accomplished without a change in modulus. From this theorem we have:

(i) If ax ≡ ay (mod m) and (a,m) = 1, then x ≡ y (mod m).

(ii) x ≡ y (mod mi) for i = 1, 2, · · · , r if and only if x ≡ y (mod [m1, m2, · · · , mr]).

(iii) If ax ≡ ay (mod m) and a|m, then x ≡ y (mod m/a). For example, 5·7 ≡ 5·10 (mod 15); as 5|15, we get 7 ≡ 10 (mod 3).

(iv) When ax ≡ 0(modm), with m a prime, then either a ≡ 0(modm) or x ≡ 0(modm).

(v) If ax ≡ ay (mod m) and m ∤ a, where m is a prime number, then x ≡ y (mod m).

(vi) It is unnecessary to stipulate that a ≢ 0 (mod m). Indeed, if a ≡ 0 (mod m), then (a, m) = m, and in this case x ≡ y (mod 1) for all integers x and y.

Property 2.8.8 Let a, b, c, d be integers and m a positive integer. If a ≡ b (mod m) and c ≡ d (mod m), then ax + cy ≡ (bx + dy) (mod m) for all integers x and y.

Proof: Since a ≡ b (mod m) and c ≡ d (mod m), m|(a − b) and m|(c − d), i.e., ∃ λ, µ ∈ Z such that a − b = mλ and c − d = mµ. For integers x, y we have

(ax + cy) − (bx + dy) = x(a − b) + y(c − d) = mλx + mµy = m(λx + µy).

Since λ, µ, x, y ∈ Z, λx + µy ∈ Z, and we get ax + cy ≡ bx + dy (mod m).

Property 2.8.9 For arbitrary integers a and b, a ≡ b (mod m) iff a and b leave the same nonnegative principal remainder on division by m.

Proof: First let a = λ1m + r and b = λ2m + r, where r is the common principal remainder when a, b are divided by m, λ1, λ2 ∈ Z and 0 ≤ r < m. Therefore

a − b = (λ1 − λ2)m = λm, where λ = λ1 − λ2 ∈ Z, ⇒ m|(a − b), i.e., a ≡ b (mod m).


Conversely, let a ≡ b (mod m), and let a, b leave the remainders r1 and r2 respectively when divided by m. Hence

a = λ1m + r1; 0 ≤ r1 < m, λ1 ∈ Z,
b = λ2m + r2; 0 ≤ r2 < m, λ2 ∈ Z,
⇒ a − b = m(λ1 − λ2) + (r1 − r2) ⇒ r1 − r2 = (a − b) + m(λ2 − λ1).

As a ≡ b (mod m), m|(a − b); also m|m(λ2 − λ1). Therefore

m|[(a − b) + m(λ2 − λ1)] ⇒ m|(r1 − r2) ⇒ r1 − r2 = 0, since 0 ≤ |r1 − r2| < m, ⇒ r1 = r2.

Thus congruent numbers leave the same remainder and, in particular, have the same gcd with m. This theorem provides a useful characterization of congruence modulo m in terms of remainders upon division by m. For example, let m = 7. Since 23 ≡ 2 (mod 7) and −12 ≡ 2 (mod 7), i.e., 23 and −12 leave the same remainder upon division by 7, we have 23 ≡ −12 (mod 7).

Property 2.8.10 If a ≡ b (mod m) then an ≡ bn (mod m) where n is a positive integer.

Proof: For n = 1, the theorem is certainly true. Assume it is true for some positive integer k, so that a^k ≡ b^k (mod m). Also a ≡ b (mod m). These two relations together imply that

a·a^k ≡ b·b^k (mod m) ⇒ a^(k+1) ≡ b^(k+1) (mod m),

so the theorem is true for the positive integer k + 1 if it is true for n = k. Hence the theorem is true for any positive integer n. The converse of the theorem is not true; for example, 5^2 ≡ 4^2 (mod 3), but 5 ≢ 4 (mod 3). Some applications of this property are given below.

Ex 2.8.2 Prove that 19^20 ≡ 1 (mod 181).

Solution: We have 19^2 = 361 = 2·181 − 1, so 19^2 ≡ −1 (mod 181). Therefore,

19^20 ≡ (−1)^10 (mod 181) ≡ 1 (mod 181).

Ex 2.8.3 What is the remainder when 7^30 is divided by 4?

Solution: Let r be the remainder when 7^30 is divided by 4. By definition, 7^30 − r is divisible by 4, where 0 ≤ r < 4, and so 7^30 ≡ r (mod 4). Now,

7 ≡ 3 (mod 4) ⇒ 7^2 ≡ 3^2 (mod 4).

But 3^2 ≡ 1 (mod 4), which implies that (7^2)^15 ≡ 1^15 (mod 4), i.e., 7^30 ≡ 1 (mod 4). Hence the remainder is 1.
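Such remainders can be checked directly (our own note): Python's built-in pow accepts a third modulus argument and computes the power by fast modular exponentiation.

```python
print(pow(7, 30, 4))      # 1, the remainder of 7**30 on division by 4
print(pow(19, 20, 181))   # 1, confirming Ex 2.8.2
```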

Ex 2.8.4 Let f(x) = a0 + a1x + · · · + an−1x^(n−1) + anx^n be a polynomial whose coefficients ai are integers. If a ≡ b (mod m), then f(a) ≡ f(b) (mod m).

Solution: We have a ≡ b (mod m), so a^k ≡ b^k (mod m) for every non-negative integer k. Hence

ai a^k ≡ ai b^k (mod m), where ai ∈ Z.


Putting k = 0, 1, 2, . . . , n with the respective coefficients and adding the congruences, we get

(a0 + a1a + a2a^2 + · · · + ana^n) ≡ (a0 + a1b + a2b^2 + · · · + anb^n) (mod m), i.e., f(a) ≡ f(b) (mod m).

Similarly, if f(x, y, z) is a polynomial in x, y, z with integral coefficients and x ≡ x′ (mod m), y ≡ y′ (mod m), z ≡ z′ (mod m), then

f(x, y, z) ≡ f(x′, y′, z′) (mod m).

Deduction 2.8.1 Let n = ak10^k + ak−110^(k−1) + · · · + a210^2 + a110 + a0, where the ai are integers with 0 ≤ ai ≤ 9, i = 0, 1, · · · , k, be the decimal representation of a positive integer n. Let S = a0 + a1 + · · · + ak and T = a0 − a1 + · · · + (−1)^k ak. Then (see the sketch following this list):

(i) n is divisible by 2 if and only if a0 is divisible by 2;

(ii) n is divisible by 9 if and only if S is divisible by 9;

(iii) n is divisible by 11 if and only if T is divisible by 11.
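A small sketch (ours) of these digit tests, checked against straightforward division:

```python
def div_tests(n):
    """Digit criteria for divisibility by 2, 9 and 11 of a positive integer n."""
    ds = [int(d) for d in str(n)][::-1]           # a0, a1, ..., ak
    S = sum(ds)
    T = sum(d if i % 2 == 0 else -d for i, d in enumerate(ds))
    return (ds[0] % 2 == 0, S % 9 == 0, T % 11 == 0)

n = 23456785
print(div_tests(n))              # (False, False, True): 11 divides 23456785
assert div_tests(n) == (n % 2 == 0, n % 9 == 0, n % 11 == 0)
```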

Ex 2.8.5 Show that an integer N is divisible by 3, if and only if the sum of the digits of Nis divisible by 3.

Solution: The number N can be written as

N = am10^m + am−110^(m−1) + · · · + a110 + a0.

Let f(x) = a0 + a1x + · · · + am−1x^(m−1) + amx^m, so that f(1) = am + am−1 + · · · + a1 + a0.

Therefore f(10) = N and f(1) = the sum of the digits of N = S (say). Now,

10 ≡ 1 (mod 3) ⇒ f(10) ≡ f(1) (mod 3) ⇒ N ≡ S (mod 3) ⇒ 3|(N − S).

Thus 3|N if and only if 3|S. Hence an integer N is divisible by 3 if and only if the sum of the digits of N is divisible by 3.

Ex 2.8.6 N is divisible by 5 if the last digit is either 0 or 5.

Solution: Taking the last digit as a0, the number N can be written as

N = am10^m + am−110^(m−1) + · · · + a110 + a0.

Let f(x) = a0 + a1x + · · · + am−1x^(m−1) + amx^m; then f(10) = N and f(0) = a0. We have

10 ≡ 0 (mod 5) ⇒ f(10) ≡ f(0) (mod 5) ⇒ N ≡ a0 (mod 5) ⇒ 5|(N − a0).

Hence 5|N if and only if 5|a0, and

5|a0 ⇒ either a0 = 0 or a0 = 5.

Hence, 5|N if the last digit of N is either 0 or 5.

Ex 2.8.7 Show that the integer 23456785 is divisible by 11.


Solution: Let N = 23456785; then N can be written as

N = 23456785 = 23 × (1000)^2 + 456 × 1000 + 785.

Let f(x) = 23x^2 + 456x + 785; then f(1000) = N and f(−1) = 23 − 456 + 785 = 352. Now,

1000 ≡ −1 (mod 11) ⇒ f(1000) ≡ f(−1) (mod 11) ⇒ N ≡ 352 (mod 11) ⇒ 11|(N − 352).

Now 11|N if and only if 11|352, and this is true (352 = 11 × 32). Therefore, 11|N.

Ex 2.8.8 Show that the integer 205769 is not divisible by 3.

Solution: Let N = 205769; then N can be written as

N = 205769 = 20 × (100)^2 + 57 × 100 + 69.

Let f(x) = 20x^2 + 57x + 69; then f(100) = N and f(1) = 20 + 57 + 69 = 146. Now,

100 ≡ 1 (mod 3) ⇒ f(100) ≡ f(1) (mod 3) ⇒ N ≡ 146 (mod 3) ⇒ 3|(N − 146).

Now 3|N if and only if 3|146, and this is not true. Therefore, the integer 205769 is not divisible by 3.

Property 2.8.11 If a ≡ b (mod m), d|a, d|b (d > 0), and the integers a, b, m are such that (d, m) = 1, then a/d ≡ b/d (mod m).

Proof: Since d|a, d|b, ∃ a1, b1 ∈ Z such that a = da1, b = db1. Now,

a ≡ b (mod m) ⇒ m|(a − b) ⇒ m|d(a1 − b1), d (> 0) ∈ Z, ⇒ m|(a1 − b1), as (m, d) = 1,
⇒ a1 ≡ b1 (mod m) ⇒ a/d ≡ b/d (mod m).

Also, if a ≡ b (mod m) and a ≡ b (mod n), where (m, n) = 1, then a ≡ b (mod mn). For example, 8·7 ≡ 2·7 (mod 6), where (7, 6) = 1, gives 8 ≡ 2 (mod 6). This is known as the restricted cancellation law.

Property 2.8.12 If a ≡ b (mod m) and if 0 ≤ |a− b| < m then a = b.

Proof: Since m|(a− b), we have, m ≤ |a− b|, unless a− b = 0.

2.8.2 Complete Set of Residues

Consider a fixed modulus m > 0. Given an integer a, let q and r be its quotient and remainder upon division by m, so that

a = mq + r; 0 ≤ r < m. Then a − r = mq ⇒ a ≡ r (mod m).

r is called the least residue of a modulo m. For example, 5 ≡ 1 (mod 4), so 1 is the least residue of 5 modulo 4. Because there are m choices for r, every integer is congruent modulo m to exactly one of the values 0, 1, 2, · · · , m − 1. In particular, a ≡ 0 (mod m) if and only


if m|a. The set of integers {0, 1, 2, · · · , m − 1} is called the set of least non-negative residues modulo m.

For example, let a = 8 and m = 3, then

8 = 3·2 + 2 ⇒ 8 − 2 = 3·2 ⇒ 8 ≡ 2 (mod 3).

Then 2 is the least residue of 8 modulo 3. Consider S = {0, 1, 2} and let a = 7 be any integer. Then 7 is congruent modulo 3 to exactly one member of S, namely 1. If a = 32, then 32 is congruent only to 2 ∈ S modulo 3. Thus any integer a is congruent, modulo 3, to exactly one of the members of S. Here S = {0, 1, 2} is called the set of least non-negative residues modulo 3.

The following are important properties of residue classes:

(i) If [a] and [b] are respectively the residue classes of a and b modulo m, then [a] = [b] if and only if a ≡ b (mod m).

(ii) Two integers a and b are in the same residue class if and only if a ≡ b (mod m).

(iii) The m residue classes [0], [1], · · · , [m − 1] are disjoint and their union is the set of all integers.

(iv) Any two integers in a residue class are congruent modulo m, and any two integers belonging to two different residue classes are incongruent modulo m.

A set S = {r1, r2, . . . , rm} of m integers is called a complete residue system modulo m if

ri ≢ rj (mod m) for 1 ≤ i < j ≤ m.

For example,

(i) Let m = 11, S = {0, 1, 2, . . . , 10}; then ri ≢ rj (mod 11) ∀ i, j = 0, 1, . . . , 10, i ≠ j, so S forms a complete system of residues.

(ii) Let m = 5; then S = {−12, −15, 82, −1, 31} forms a complete system of residues modulo 5.

A complete residue system for a given modulus is not unique: S1 = {0, 1, · · · , 8} and S2 = {5, 6, · · · , 13} are two different complete residue systems modulo 9.
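A check of this definition (an illustration of ours): a set of m integers is a complete residue system mod m exactly when its residues mod m are all distinct.

```python
def is_complete_residue_system(S, m):
    """True iff S consists of m integers that are pairwise incongruent mod m."""
    S = list(S)
    return len(S) == m and len({x % m for x in S}) == m

print(is_complete_residue_system([-12, -15, 82, -1, 31], 5))   # True
print(is_complete_residue_system(range(5, 14), 9))             # True
```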

If {r1, r2, . . . , rm} is a complete set of residues modulo m and (a, m) = 1, then {ar1, ar2, . . . , arm} is also a complete set of residues modulo m. For if ari ≡ arj (mod m) for some 1 ≤ i < j ≤ m, then

ari ≡ arj (mod m) ⇒ m|a(ri − rj), and (a, m) = 1 ⇒ m|(ri − rj) ⇒ ri ≡ rj (mod m),

contradicting the hypothesis that the ri are pairwise incongruent.

A reduced residue system modulo m is a set of integers ri such that (ri, m) = 1, ri ≢ rj (mod m) if i ≠ j, and such that every x prime to m is congruent modulo m to some member ri of the set.

Property 2.8.13 If (a1, a2, · · · , an) is a complete system of residues modulo m, and b1, b2, · · · , bn is any set of integers such that

bi ≡ ai (mod m); i = 1, 2, · · · , n,

then (b1, b2, · · · , bn) is also a complete system (mod m).


Proof: Since (a1, a2, · · · , an) is a complete system, ai ≢ aj (mod m) whenever i ≠ j. Suppose, if possible, that bi ≡ bj (mod m) for some i ≠ j. Since bi ≡ ai (mod m) and bj ≡ aj (mod m), we get

ai ≡ bi ≡ bj ≡ aj (mod m) ⇒ ai ≡ aj (mod m),

which is a contradiction. Hence bi ≢ bj (mod m) for every i and j with i ≠ j, and therefore (b1, b2, · · · , bn) forms a complete residue system (mod m). Further, if a1, a2, · · · , an is a complete system of residues modulo m, then

(i) k + a1, k + a2, · · · , k + an, for any integer k, and

(ii) ka1, ka2, · · · , kan, where (k, m) = 1,

also form a complete system of residues modulo m.

Property 2.8.14 A set of m integers which are in arithmetic progression with commondifference d, (d,m) = 1, forms a complete residue system modulo m.

Proof: Let us consider the A.P. of m terms with common difference d,

a, a + d, a + 2d, · · · , a + (m − 1)d; (d, m) = 1.

If a + id ≡ a + jd (mod m) for some i, j = 0, 1, · · · , m − 1 with i ≠ j, then

id ≡ jd (mod m) ⇒ i ≡ j (mod m), as (d, m) = 1,

which is impossible for distinct i, j between 0 and m − 1. Therefore

a + id ≢ a + jd (mod m),

and the m terms form a complete residue system modulo m.

Ex 2.8.9 Find the remainder when 2^73 + 14^3 is divided by 11.

Solution: Here,

2 ≡ 2 (mod 11), 2^4 ≡ 5 (mod 11), 2^8 ≡ 3 (mod 11)
⇒ 2^10 ≡ 3 × 2^2 (mod 11) ≡ 1 (mod 11)
⇒ 2^70 ≡ 1 (mod 11) ⇒ 2^73 ≡ 8 (mod 11).

Again, 14^3 = (11 + 3)^3 ≡ 3^3 (mod 11) ≡ 5 (mod 11). Therefore,

2^73 + 14^3 ≡ 8 + 5 (mod 11) ≡ 2 (mod 11).

Hence the remainder is 2.

Ex 2.8.10 Find the least positive residue of 2^44 (mod 89).


Solution: We know 2^6 = 64 ≡ −25 (mod 89), i.e.,

2^6 ≡ −5^2 (mod 89) ⇒ (2^6)^2 ≡ (−5^2)^2 (mod 89),
or, 2^12 ≡ 625 − 7 × 89 (mod 89) ⇒ 2^12 ≡ 2 (mod 89)
⇒ 2^11 ≡ 1 (mod 89), as (2, 89) = 1,
or, (2^11)^4 ≡ 1 (mod 89) ⇒ 2^44 ≡ 1 (mod 89).

The least positive residue is 1; therefore 89|(2^44 − 1).

Ex 2.8.11 What is the remainder when 1^5 + 2^5 + · · · + 100^5 is divided by 4?

Solution: We have the following results:

1 ≡ 1 (mod 4) ⇒ 1^5 ≡ 1 (mod 4)
3 ≡ −1 (mod 4) ⇒ 3^5 ≡ −1 (mod 4)
5 ≡ 1 (mod 4) ⇒ 5^5 ≡ 1 (mod 4)
...
99 ≡ −1 (mod 4) ⇒ 99^5 ≡ −1 (mod 4).

Adding the above 50 congruence relations, we get

1^5 + 3^5 + · · · + 99^5 ≡ [1 − 1 + · · · − 1] (mod 4) ≡ 0 (mod 4).

Again, since every even number 2k satisfies (2k)^5 = 32k^5 ≡ 0 (mod 4), we have

2^5 ≡ 0 (mod 4), 4^5 ≡ 0 (mod 4), 6^5 ≡ 0 (mod 4), . . . , 100^5 ≡ 0 (mod 4).

Adding these 50 congruence relations, we get 2^5 + 4^5 + · · · + 100^5 ≡ 0 (mod 4).

Adding the two results, 1^5 + 2^5 + 3^5 + 4^5 + · · · + 100^5 ≡ 0 (mod 4). Thus the given expression is completely divisible by 4, and hence the remainder is zero.
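A one-line brute-force confirmation (ours):

```python
print(sum(i**5 for i in range(1, 101)) % 4)   # 0
```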

Ex 2.8.12 Show that 3·5^(2n+1) + 2^(3n+1) ≡ 0 (mod 17) for all n ≥ 1.

Solution: The expression 3·5^(2n+1) + 2^(3n+1) can be written as

3·5^(2n+1) + 2^(3n+1) = 15·5^(2n) + 2·2^(3n) = 15·25^n + 2·8^n.

We have the following results:

25 ≡ 8 (mod 17) ⇒ 5^2 ≡ 8 (mod 17), or (5^2)^n ≡ 8^n (mod 17) ⇒ 15·5^(2n) ≡ 15·8^n (mod 17).
Also, 2^(3n) = 8^n ⇒ 2·2^(3n) ≡ 2·8^n (mod 17).


Adding the two results, we get

15·5^(2n) + 2·2^(3n) ≡ 8^n(15 + 2) (mod 17) ≡ 0 (mod 17), or, 3·5^(2n+1) + 2^(3n+1) ≡ 0 (mod 17).

Ex 2.8.13 For any natural number n, show that (2n + 1)^2 ≡ 1 (mod 8).

Solution: Here (2n + 1)^2 can be written as

(2n + 1)^2 = 4n^2 + 4n + 1 = 4n(n + 1) + 1.

Now, as n ∈ Z, n and n + 1 are two consecutive integers, so 2|n(n + 1), and thus ∃ k ∈ Z such that n(n + 1) = 2k. Therefore,

(2n + 1)^2 = 4n(n + 1) + 1 = 8k + 1 ⇒ (2n + 1)^2 − 1 = 8k, where k ∈ Z, ⇒ (2n + 1)^2 ≡ 1 (mod 8).

Ex 2.8.14 Prove that 1! + 2! + · · ·+ 1000! ≡ 3 (mod 15).

Solution: For every n ≥ 5, n! contains both factors 3 and 5, so n! ≡ 0 (mod 15) for all n ≥ 5. Now 1! + 2! + 3! + 4! = 33, so that 1! + 2! + 3! + 4! ≡ 3 (mod 15). Hence

1! + 2! + · · · + 1000! ≡ 3 (mod 15),

i.e., the remainder is 3 when 1! + 2! + · · · + 1000! is divided by 15.

Ex 2.8.15 Find all natural numbers n ≤ 100 such that n ≡ 0 (mod 7).

Solution: Here 100 = 14·7 + 2. Hence the required natural numbers n ≤ 100 such that n ≡ 0 (mod 7) are 7, 14, 21, . . . , 98.

2.8.3 Reduced Residue System

By a reduced residue system modulo m we mean any set of φ(m) integers, incongruent modulo m, each of which is relatively prime to m, where φ(m) is Euler's phi function. A reduced residue system modulo m can be obtained by deleting, from a complete set of residues modulo m, those members that are not relatively prime to m. A reduced set of residues therefore consists of the numbers of a complete system which are relatively prime to the modulus. For example, in the modulo 8 system, the complete set of residues is {0, 1, 2, · · · , 7}, and its reduced system is {1, 3, 5, 7}.

Ex 2.8.16 Find the reduced residue system of m = 40.

Solution: We note that 40 = 5·2^3. Suitable partial systems are {1, 9, 17, 33} (the residues prime to 40 that are ≡ 1 (mod 8)) and {1, 11, 21, 31} (those ≡ 1 (mod 5)); multiplying one member of each and reducing mod 40 fills out the whole system, as seen from the following table:

×  | 1   9   17  33
1  | 1   9   17  33
11 | 11  19  27  3
21 | 21  29  37  13
31 | 31  39  7   23

Therefore, the reduced residue system modulo 40 is {1, 3, 7, 9, 11, 13, 17, 19, 21, 23, 27, 29, 31, 33, 37, 39}.
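Equivalently (a sketch of ours), the reduced system can be generated directly by keeping the members of a complete system that are prime to the modulus:

```python
from math import gcd

def reduced_residues(m):
    return [r for r in range(m) if gcd(r, m) == 1]

rs = reduced_residues(40)
print(len(rs))    # 16, the value of Euler's phi at 40
print(rs)         # [1, 3, 7, 9, 11, 13, 17, 19, 21, 23, 27, 29, 31, 33, 37, 39]
```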


Ex 2.8.17 Find the least positive residue of 3^36 (mod 77).

Solution: We know 3^4 = 81 ≡ 4 (mod 77); therefore,

3^12 ≡ 4^3 (mod 77) ≡ −13 (mod 77)
⇒ 3^24 ≡ 169 (mod 77) ≡ 15 (mod 77)
⇒ 3^36 ≡ 15·(−13) (mod 77) ≡ 36 (mod 77).

Hence the least positive residue is 36.

2.8.4 Linear Congruences

Let a, b be two integers and m a positive integer. An equation in an unknown x of the form ax ≡ b (mod m) is called a linear congruence, and by a solution of such an equation we mean an integer x0 for which ax0 ≡ b (mod m). By definition,

ax0 ≡ b (mod m) ⇒ ax0 − b = mk for some k ∈ Z ⇒ m|(ax0 − b).

In finding the solutions, we observe that the set of non-negative integers {0, 1, 2, . . . , m − 1} forms a complete residue system modulo m. When the congruence is solvable, at least one member of this set satisfies it, and the congruence has either a single solution or several solutions which are incongruent to each other mod m. The problem of finding all integers that satisfy the linear congruence ax ≡ b (mod m) is identical with that of obtaining all solutions of the linear Diophantine equation ax − my = b. For example,

(i) Consider 4x ≡ 3 (mod 5) and S = {0, 1, 2, 3, 4}. We see that only 2 ∈ S satisfies the linear congruence, so 2 is the only solution. Also we observe that (a, m) = (4, 5) = 1.

(ii) Consider 6x ≡ 3 (mod 9) and S = {0, 1, 2, ..., 8}. Note that 2, 5, 8 ∈ S satisfy the linear congruence.

Thus this linear congruence has more than one solution. Also,

2 ≢ 5 (mod 9), 5 ≢ 8 (mod 9), 2 ≢ 8 (mod 9).

Hence the solutions are incongruent to each other. Here we observe that (a, m) = (6, 9) ≠ 1, i.e., a and m are not prime to each other. Hence x ≡ 2, 5, 8 (mod 9).
Note: x = 2 and x = −4 both satisfy the linear congruence 2x ≡ 4 (mod 6); as 2 ≡ −4 (mod 6), they are not counted as different solutions. Therefore, when we speak of the number of solutions of a congruence ax ≡ b (mod m), we shall mean the number of incongruent integers satisfying the congruence.

Theorem 2.8.1 If x1 be a solution of the linear congruence ax ≡ b(modm) and if x1 ≡x2(modm), then x2 is also a solution of the congruence.

Proof: Given that x1 be a solution of the linear congruence ax ≡ b(modm), therefore,ax1 ≡ b(modm). Now,

x2 ≡ x1(modm) ⇒ ax2 ≡ ax1(modm)⇒ ax2 ≡ b(modm).


Thus, x2 is a solution of the linear congruence ax ≡ b(modm). From this theorem, we have,if x1 be a solution of the linear congruence ax ≡ b(modm), then

x1 + λm; λ = 0,±1,±2, · · ·

is also a solution. All these solutions belong to one residue class modulo m and these arenot counted as different solutions.

Theorem 2.8.2 Let a, b and m be integers with m > 0 and (a,m) = 1. Then the congruenceax ≡ b(modm) has a unique solution.

Proof: Since (a,m) = 1, ∃u, v ∈ Z such that au+mv = 1 and so

a(bu) +m(bv) = b⇒ a(bu) ≡ b(modm).

This shows that x = bu is a solution of the linear congruence ax ≡ b(modm). Let x1, x2 besolutions of the linear congruence ax ≡ b(modm), then ax1 ≡ b(modm) and ax2 ≡ b(modm).This gives

ax1 ≡ ax2(modm) ⇒ x1 ≡ x2(modm); as (a,m) = 1.

This proves that the congruence has a unique solution. The solutions are written in the form x = bu + λm; λ = 0, ±1, ±2, ···, and they all belong to one and only one residue class modulo m.

Result 2.8.2 In finding the solution when (a, m) = 1, let a/m be expressed as a simple continued fraction with an even number of quotients, and let y0/x0 be the last convergent but one. Then ax0 − my0 = 1, so that

ax0 ≡ 1 (mod m) ⇒ abx0 ≡ b (mod m)
⇒ bx0 is the required solution of ax ≡ b (mod m)
⇒ x ≡ bx0 (mod m).

Ex 2.8.18 Solve the linear congruence 5x ≡ 3(mod 11).

Solution: Since d = (a, m) = (5, 11) = 1, the given linear congruence has a unique solution. Since (5, 11) = 1, ∃ integers u, v such that 5u + 11v = 1. Here u = −2, v = 1, so

5·(−2) + 11·1 = 1, i.e., 5·(−2) ≡ 1 (mod 11) ⇒ 5·(−6) ≡ 3 (mod 11).

Therefore x = −6 is a solution. All the solutions are x ≡ −6 (mod 11), i.e., x ≡ 5 (mod 11); they all belong to one residue class, so the congruence has a unique solution.
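The method of Theorem 2.8.2 — find u, v with au + mv = 1 and take x ≡ bu (mod m) — is exactly the extended Euclidean algorithm. A minimal Python sketch under the assumption (a, m) = 1 (the helper names are ours):

def extended_gcd(a, b):
    # returns (g, u, v) with a*u + b*v = g = gcd(a, b)
    if b == 0:
        return a, 1, 0
    g, u, v = extended_gcd(b, a % b)
    return g, v, u - (a // b) * v

def solve_coprime(a, b, m):
    # unique solution mod m of a*x = b (mod m), assuming gcd(a, m) = 1
    g, u, _ = extended_gcd(a, m)
    assert g == 1
    return (b * u) % m

print(solve_coprime(5, 3, 11))  # 5, agreeing with Ex 2.8.18

For 5x ≡ 3 (mod 11) the routine recovers the same multiplier u = −2 used above.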

Ex 2.8.19 Solve the linear congruence 47x ≡ 11(mod249).

Solution: Since (47, 249) = 1, the given linear congruence 47x ≡ 11 (mod 249) has a unique solution. We express 47/249 as a simple continued fraction with an even number of quotients:

47/249 = 0 + 1/(5 + 1/(3 + 1/(2 + 1/(1 + 1/4)))).

The last convergent but one is y0/x0 = 10/53, so that 47·53 − 249·10 = 1. Therefore,

x ≡ 11 × 53 (mod 249)
or, x ≡ 583 − 2 × 249 (mod 249)
or, x ≡ 85 (mod 249).


Theorem 2.8.3 Let a, b and m be integers with m > 0. The linear congruence ax ≡ b (mod m) has a solution if and only if d|b, where d = (a, m). Moreover, if d|b, then there are exactly d mutually incongruent solutions modulo m.

Proof: The given linear congruence ax ≡ b (mod m), i.e., ax − b = my for some y ∈ Z, is equivalent to the linear Diophantine equation ax − my = b. Since (a, m) = d, we have

d|a and d|m ⇒ a = da1 and m = dm1

for some integers a1 and m1 with (a1, m1) = 1. Then,

ax ≡ b (mod m) ⇒ da1x ≡ b (mod dm1).  (2.12)

We require those values of x for which da1x − b is divisible by dm1. No such value of x is obtained unless d|b. Thus, if dm1|da1x − b and d|b, i.e., if b = db1 for some integer b1, then

dm1|da1x − db1 ⇒ m1|a1x − b1
⇒ a1x − b1 is divisible by m1
⇒ a1x ≡ b1 (mod m1); where (a1, m1) = 1,  (2.13)

which is an equivalent form of (2.12). The congruence (2.13) has one solution x0 < m′, where m′ = m/d, and the d distinct solutions can be written in the form

x0, x0 + m′, x0 + 2m′, ..., x0 + (d−1)m′,
i.e., x0, x0 + m/d, x0 + 2·m/d, ..., x0 + (d−1)·m/d; m′ = m/d,  (2.14)

which are also d incongruent solutions of (2.12). We assert that these integers are incongruent modulo m, and that every other solution is congruent to one of them. If it happens that

x0 + (m/d)k1 ≡ x0 + (m/d)k2 (mod m), where 0 ≤ k1 < k2 ≤ d−1,
⇒ (m/d)k1 ≡ (m/d)k2 (mod m).

Now (m/d, m) = m/d, so the factor m/d can be cancelled to arrive at the congruence

k1 ≡ k2 (mod d) ⇒ d|(k2 − k1),

which is impossible, since 0 < k2 − k1 < d. Therefore k1 = k2, and the d distinct solutions are incongruent to each other modulo m.

Now we show that any other solution x0 + (m/d)k is congruent modulo m to one of the d solutions listed in (2.14). Using the division algorithm, k = qd + r, 0 ≤ r ≤ d−1, and therefore

x0 + (m/d)k = x0 + (m/d)(qd + r) = x0 + mq + (m/d)r
≡ x0 + (m/d)r (mod m); 0 ≤ r < d,

with x0 + (m/d)r being one of our d selected solutions. Thus, if x0 is any solution of ax ≡ b (mod m), then the d incongruent distinct solutions are given by

x = x0 + (m/d)k (mod m); k = 0, 1, 2, ..., d−1,
i.e., x0, x0 + m/d, x0 + 2·m/d, ..., x0 + (d−1)·m/d.


Ex 2.8.20 Solve the linear congruence 20x ≡ 10(mod 35).

Solution: Since d = (a, m) = (20, 35) = 5 and 5|10, the given linear congruence has exactly d = 5 incongruent solutions. The congruence 20x ≡ 10 (mod 35) is equivalent to the linear congruence

4x ≡ 2 (mod 7), where (4, 7) = 1,

so 4x ≡ 2 (mod 7) has a unique solution. Since (4, 7) = 1, ∃ integers u, v such that 4u + 7v = 1. Here u = 2, v = −1, so

4·2 + 7·(−1) = 1 ⇒ 4·2 ≡ 1 (mod 7) ⇒ 4·4 ≡ 2 (mod 7).

Therefore x = 4 is a solution of the reduced congruence, and the incongruent solutions are

x = 4 + (35/5)t = 4 + 7t; t = 0, 1, 2, 3, 4.

Ex 2.8.21 Solve : 25x ≡ 15(mod 120).

Solution: Here d = (25, 120) = 5 and 5(= d) | 15(= b), so the given linear congruence has exactly d = 5 incongruent solutions. The congruence 25x ≡ 15 (mod 120) is equivalent to the linear congruence

5x ≡ 3 (mod 24), where (5, 24) = 1,
or, 5x ≡ −45 (mod 24) ⇒ x ≡ −9 (mod 24)
or, x ≡ 15 (mod 24); i.e., x = 15 + 24t; t = 0, 1, 2, 3, 4,

giving 5 solutions modulo 120. Therefore x = 15, 39, 63, 87, 111 are the 5 incongruent solutions.
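The recipe above — divide through by d = (a, m) and lift the unique reduced solution to d incongruent ones — is easy to automate. A minimal Python sketch (pow(x, -1, m) computes a modular inverse in Python 3.8+; the function name is ours):

from math import gcd

def solve_linear_congruence(a, b, m):
    # all incongruent solutions mod m of a*x = b (mod m); [] if d does not divide b
    d = gcd(a, m)
    if b % d != 0:
        return []
    m1 = m // d
    x0 = (b // d) * pow(a // d, -1, m1) % m1  # unique solution of the reduced congruence
    return [x0 + k * m1 for k in range(d)]    # the d incongruent solutions mod m

print(solve_linear_congruence(20, 10, 35))   # [4, 11, 18, 25, 32]
print(solve_linear_congruence(25, 15, 120))  # [15, 39, 63, 87, 111]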

2.8.5 Simultaneous Linear Congruences

Here we consider the problem of solving a system of simultaneous linear congruences

a1x ≡ b1 (mod m1), a2x ≡ b2 (mod m2), ..., akx ≡ bk (mod mk).  (2.15)

We assume that the moduli mr are relatively prime in pairs. Evidently, the system of two or more linear congruences will not have a solution unless dr|br for all r, where dr = (ar, mr). When these conditions are satisfied, the factor dr can be cancelled in the rth congruence to produce a new system having the same set of solutions as the original one:

a*1 x ≡ b*1 (mod n1), a*2 x ≡ b*2 (mod n2), ..., a*k x ≡ b*k (mod nk),

where nr = mr/dr and (ni, nj) = 1 for i ≠ j. Also, (a*r, nr) = 1. The solutions of the individual congruences assume the form

x ≡ c1 (mod n1), x ≡ c2 (mod n2), · · · , x ≡ ck (mod nk). (2.16)

Thus, the problem is reduced to one of finding a simultaneous solution of a system of con-gruences of this type. The kind of problem that can be solved by simultaneous congruencesis given by the following theorem.

Deduction 2.8.2 Let x ≡ a (mod p) and x ≡ b (mod q) be two simultaneous congruences and let (p, q) = d. Since x ≡ a (mod p), we have x = a + py, where y is given by

a + py ≡ b (mod q) ⇒ py ≡ b − a (mod q).

If b − a is not divisible by d, then no solution exists. But if d|(b − a), there is exactly one solution y1 with 0 ≤ y1 < q/d satisfying the last congruence, and the general value of y is given by

y = y1 + (q/d)t; t ∈ Z,

so that x = x1 + (pq/d)t, where x1 = a + py1.

Hence, x = a + py1 + (pq/d)t; t ∈ Z.

Thus a solution x1 of the given congruences exists if and only if b − a is divisible by d = (p, q), and the congruences are equivalent to a single congruence x ≡ x1 (mod l), where l = [p, q].

Ex 2.8.22 Find the general values of x for x ≡ 1(mod 6) and x ≡ 4(mod 9).

Solution: Here p = 6, a = 1, q = 9, b = 4, so that b − a = 3 and d = (6, 9) = 3; thus d|(b − a) and a solution exists. Now,

x ≡ 1 (mod 6) ⇒ x = 1 + 6y,

where y is given by 1 + 6y ≡ 4 (mod 9) ⇒ 6y ≡ 3 (mod 9),

which has the solution y1 = 2 < q/d = 3, and the general values of y are given by

y = y1 + (q/d)t = 2 + 3t; t ∈ Z,
or, x = 1 + 6(2 + 3t) = 13 + 18t; as x = 1 + 6y,

which gives the general values; the given congruences are equivalent to the single congruence x ≡ 13 (mod 18), where [p, q] = [6, 9] = 18.

Theorem 2.8.4 Let m1,m2, · · · ,mr denote r positive integers with (mi,mj) = 1, 1 ≤ i <j ≤ r. Let a1, a2, · · · , ar be arbitrary integers. Then the system of linear congruences

x ≡ a1 (mod m1), x ≡ a2 (mod m2), · · · , x ≡ ar (mod mr) (2.17)

has a unique simultaneous solution x0 modulo the product m = m1·m2···mr, i.e., x ≡ x0 (mod m).

Proof: Here we take m = m1·m2···mr. For each k = 1, 2, ..., r let us define

Mk = m/mk = m1·m2···mk−1·mk+1···mr,

i.e., Mk is the product of all the integers mi with the factor mk omitted. By hypothesis, the mk are relatively prime in pairs, i.e., (mi, mj) = 1 for i ≠ j, so that (Mk, mk) = 1. Hence, by the theory of linear congruences, it is possible to solve the linear congruence

Mk x ≡ 1 (mod mk).

Let the unique solution be xk. We are to show that the integer

x0 = a1M1x1 + a2M2x2 + ··· + arMrxr

is the simultaneous common solution of the given system of congruences. As mk|Mi for i ≠ k, we have Mi ≡ 0 (mod mk). Hence

x0 = a1M1x1 + a2M2x2 + ··· + arMrxr ≡ akMkxk (mod mk).


But the integer xk was chosen to satisfy the congruence Mkx ≡ 1(modmk), which gives,

Mkxk ≡ 1(mod mk) ⇒ akMkxk ≡ ak(mod mk)⇒ x0 ≡ ak.1 = ak(mod mk).

This shows that a solution x0 to the system (2.17) of congruences exists. Let, x∗0 be anyother integer that satisfies these congruences, then,

x0 ≡ ak ≡ x∗0(mod mk); k = 1, 2, · · · , r⇒ mk|(x0 − x∗0); for each value of k.

As (mi,mj) = 1, we have,m1m2 . . .mr|(x0 − x∗0) ⇒ x∗0 ≡ x0(modm).

This shows the uniqueness of the solution. This is the Chinese remainder theorem.
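The constructive proof translates line for line into code. A minimal Python sketch of the Chinese remainder theorem for pairwise coprime moduli (the function name is ours):

from math import prod

def crt(residues, moduli):
    # unique x0 (mod m1*m2*...*mr) with x0 = a_k (mod m_k), moduli pairwise coprime
    m = prod(moduli)
    x0 = 0
    for a_k, m_k in zip(residues, moduli):
        M_k = m // m_k              # product of the remaining moduli
        x_k = pow(M_k, -1, m_k)     # solves M_k * x = 1 (mod m_k)
        x0 += a_k * M_k * x_k
    return x0 % m

print(crt([36, 5], [41, 17]))      # 651, as in Ex 2.8.23 below
print(crt([2, 5, 4], [7, 19, 5]))  # 499, as in Ex 2.8.24 below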

Ex 2.8.23 Solve the simultaneous linear congruence x ≡ 36(mod 41), x ≡ 5(mod 17).

Solution: From the given simultaneous linear congruences, we see that m1 = 41, m2 = 17, so that (m1, m2) = (41, 17) = 1. Let m = m1·m2 = 41·17 = 697 and let

M1 = m/m1 = 697/41 = 17, M2 = m/m2 = 697/17 = 41,

so that (M1, 41) = 1 and (M2, 17) = 1. Since (M1, 41) = 1, the linear congruence 17x ≡ 1 (mod 41) has a unique solution. Since

17·(−12) + 41·5 = 1, i.e., 17·(−12) ≡ 1 (mod 41),

the solution is x1 ≡ −12 ≡ 29 (mod 41). Since (M2, 17) = 1, the linear congruence 41x ≡ 1 (mod 17) has a unique solution. Since

41·5 + 17·(−12) = 1, i.e., 41·5 ≡ 1 (mod 17),

the solution is x2 ≡ 5 (mod 17). Therefore, the common integer solution of the given system of linear congruences is given by

x0 ≡ a1M1x1 + a2M2x2 ≡ 36·(17·29) + 5·(41·5) (mod 697)
≡ 18773 (mod 697) ≡ 651 (mod 697).

Ex 2.8.24 Solve the following system of linear congruences x ≡ 2(mod 7), x ≡ 5(mod 19)and x ≡ 4(mod 5).

Solution: Let m = 7·19·5 = 665. Now, we consider the following linear congruences:

(m/7)x ≡ 1 (mod 7), (m/19)x ≡ 1 (mod 19) and (m/5)x ≡ 1 (mod 5),
i.e., 95x ≡ 1 (mod 7), 35x ≡ 1 (mod 19) and 133x ≡ 1 (mod 5),
i.e., (91 + 4)x ≡ 1 (mod 7), (38 − 3)x ≡ 1 (mod 19) and (130 + 3)x ≡ 1 (mod 5).

Now, we consider the system of congruences

4x ≡ 1(mod 7),−3x ≡ 1(mod 19) and 3x ≡ 1(mod 5).

Page 135: Linear Algebra by Nayak

128 Theory of Numbers

Now, x = 2 is a solution of the first linear congruence, x = 6 is a solution of the second−3x ≡ 1(mod 19), and x = 2 is a solution of the third 3x ≡ 1(mod 5). Therefore, thelinear congruences

95x ≡ 1(mod 7), 35x ≡ 1(mod 19) and 133x ≡ 1(mod 5)

are satisfied by x = 2, 6, 2 respectively. Hence a solution of the given system is given by,

x0 = 2·(95·2) + 5·(35·6) + 4·(133·2) = 2494

and the unique solution is given by,

x ≡ 2494 (mod 665) ⇒ x ≡ 499 (mod 665).

Ex 2.8.25 If x ≡ a(mod16), x ≡ b(mod5) and x ≡ c(mod11), then show thatx ≡ 385a+ 176b− 560c(mod880).

Solution: Here m1 = 16, m2 = 5, m3 = 11, and they are relatively prime in pairs. Therefore m = m1m2m3 = 880, and the Chinese remainder theorem is applicable. Now,

(m/m1)y1 ≡ 1 (mod 16) ⇒ 55y1 ≡ 1 ≡ 33 (mod 16)
⇒ 5y1 ≡ 3 (mod 16) ⇒ y1 ≡ 7 (mod 16),
(m/m2)y2 ≡ 1 (mod 5) ⇒ 176y2 ≡ 1 (mod 5)
⇒ y2 ≡ 1 (mod 5),
(m/m3)y3 ≡ 1 (mod 11) ⇒ 80y3 ≡ 1 (mod 11)
⇒ y3 ≡ 4 (mod 11) ⇒ y3 ≡ −7 (mod 11).

The integer solution of the system is given by

x0 ≡ (m/m1)y1·a + (m/m2)y2·b + (m/m3)y3·c (mod 880)
≡ 55a × 7 + 176b × 1 + 80c × (−7) (mod 880)
≡ 385a + 176b − 560c (mod 880).

Ex 2.8.26 Find four consecutive integers divisible by 3, 4, 5, 7 respectively.

Solution: Let n, n+ 1, n+ 2 and n+ 3 be four consecutive integers divisible by 3, 4, 5 and7 respectively, then

n ≡ 0(mod3), n+ 1 ≡ 0(mod4), n+ 2 ≡ 0(mod5) and n+ 3 ≡ 0(mod7). · · · (i)

We are to solve the simultaneous linear congruences (i), i.e., x ≡ 0 (mod 3), x ≡ 3 (mod 4), x ≡ 3 (mod 5), x ≡ 4 (mod 7), by using the Chinese remainder theorem. For this, let m = 3·4·5·7 = 420, as the moduli are prime to each other. Now, let

M1 = m/3 = 140, M2 = m/4 = 105, M3 = m/5 = 84 and M4 = m/7 = 60,

where (M1, 3) = 1, (M2, 4) = 1, (M3, 5) = 1 and (M4, 7) = 1.

(i) Since (140, 3) = 1, the linear congruence 140x ≡ 1 (mod 3) has a unique solution (mod 3), namely x = x1 = 2.

(ii) Since (105, 4) = 1, the linear congruence 105x ≡ 1 (mod 4) has a unique solution (mod 4), namely x = x2 = 1.


(iii) Since (84, 5) = 1, the linear congruence 84x ≡ 1 (mod 5) has a unique solution (mod 5), namely x = x3 = 4.

(iv) Since (60, 7) = 1, the linear congruence 60x ≡ 1 (mod 7) has a unique solution (mod 7), namely x = x4 = 2.

Thus, with a1 = 0, a2 = 3, a3 = 3, a4 = 4, the common integer solution of the given system of congruences is given by

x0 ≡ a1M1x1 + a2M2x2 + a3M3x3 + a4M4x4 (mod 420)
≡ 0 + 3·105·1 + 3·84·4 + 4·60·2 ≡ 1803 (mod 420) ≡ 123 (mod 420).

Therefore, the consecutive integers are n, n+ 1, n+ 2 and n+ 3, where,

n = 123 + 420t; t = 0,±1,±2, · · · .

Ex 2.8.27 Find the integer between 1 and 1000 which leaves the remainder 1, 2, 6 whendivided by 9, 11, 13 respectively.

Solution: The required integer between 1 and 1000 is a solution of the system of linear congruences

x ≡ 1(mod 9), x ≡ 2(mod 11) and x ≡ 6(mod 13).

Now we are to solve this system of linear congruences by using the Chinese remainder theorem. For this, let m = 9·11·13 = 1287. Now, we consider the congruences

13.11x ≡ 1(mod 9), 13.9x ≡ 1(mod 11) and 9.11x ≡ 1(mod 13)i.e., 143x ≡ 1(mod 9), 117x ≡ 1(mod 11) and 99x ≡ 1(mod 13)i.e., (144− 1)x ≡ 1(mod 9), (110 + 7)x ≡ 1(mod 11) and (91 + 8)x ≡ 1(mod 13).

Now, we consider the system of congruences

−x ≡ 1(mod 9), 7x ≡ 1(mod 11) and 8x ≡ 1(mod 13).

Notice that, x = 8 is a solution of the first linear congruence −x ≡ 1(mod 9), x = 8 isa solution of the second linear congruence 7x ≡ 1(mod 11), and x = 5 is a solution ofthe third linear congruence 8x ≡ 1(mod 13). Therefore, the linear congruences 143x ≡1(mod 9), 117x ≡ 1(mod 11) and 99x ≡ 1(mod 13) are satisfied by x = 8, 8, 5 respectively.Hence a solution of the given system is given by,

x0 = 1.8.11.13 + 2.8.9.13 + 6.5.9.11 = 5986

and the unique solution is given by,

x ≡ 5986 (mod 1287) ⇒ x ≡ 838 (mod 1287). Hence the required integer between 1 and 1000 is 838.

Ex 2.8.28 Solve the linear congruence 32x ≡ 79 (mod 1225) by applying the Chinese remainder theorem.

Solution: The canonical form of 1225 is 1225 = 5²·7², and (5², 7²) = 1. Thus solving the given linear congruence 32x ≡ 79 (mod 1225) is equivalent to finding a simultaneous solution of the congruences

32x ≡ 79 (mod 25) and 32x ≡ 79 (mod 49),
equivalently, 7x ≡ 4 (mod 25) and 16x ≡ 15 (mod 49). ··· (i)


We are to solve the simultaneous linear congruences (i) by using the Chinese remainder theorem. Solving the individual congruences in (i) gives x ≡ 22 (mod 25) and x ≡ 4 (mod 49), so a1 = 22, a2 = 4. Let m = 25·49 = 1225, as the moduli are prime to each other, and let

M1 = m/25 = 49, M2 = m/49 = 25,

where (M1, 25) = 1, (M2, 49) = 1.

(i) Since (49, 25) = 1, the linear congruence 49x ≡ 1 (mod 25) has a unique solution (mod 25), namely x = x1 = 24.

(ii) Since (25, 49) = 1, the linear congruence 25x ≡ 1 (mod 49) has a unique solution (mod 49), namely x = x2 = 2.

Thus, the common integer solution of the given system of congruences is given by

x0 ≡ a1M1x1 + a2M2x2 (mod m) ≡ 22·49·24 + 4·25·2 (mod 1225)
≡ 26072 (mod 1225) ≡ 347 (mod 1225).

Ex 2.8.29 Give an example of a congruence which has more roots than its degree.

Solution: The congruence x² ≡ 1 (mod 8) has four distinct solutions x = 1, 3, 5, 7, yet x² ≡ 1 (mod 8) is of degree 2.

Result 2.8.3 (Symbolic fraction method) Let the linear congruence be ax ≡ b (mod m), where (a, m) = 1. Then we may write, symbolically,

x ≡ (b + mh)/a (mod m),

where the arbitrary integer h is chosen so that a divides b + mh. For example, if we consider 5x ≡ 2 (mod 16), then

x ≡ 2/5 (mod 16) ⇒ x ≡ (2 + 3·16)/5 (mod 16) ⇒ x ≡ 10 (mod 16).

2.8.6 Inverse of a Modulo m

If (a, m) = 1, then the linear congruence ax ≡ b (mod m) has a unique solution modulo m. The unique solution of ax ≡ 1 (mod m) is sometimes called the multiplicative inverse or reciprocal of a modulo m. From the definition it follows that, if ā is the reciprocal of a, then bā is the solution of ax ≡ b (mod m). An element a is said to be a unit modulo m if it has an inverse modulo m.

Since (1, 12) = 1 = (5, 12) = (7, 12) = (11, 12), the integers 1, 5, 7, 11 are units modulo 12.

Ex 2.8.30 Find the inverse of 12 modulo 17, if it exists.

Solution: Consider the linear congruence 12x ≡ 1(mod17). Since (12, 17) = 1, it followsthat the linear congruence 12x ≡ 1(mod 17) has a solution. Hence there exists an inverseof 12 modulo 17. By division algorithm

17 = 12·1 + 5; 12 = 5·2 + 2 and 5 = 2·2 + 1.

Since (17, 12) = 1, the inverse of 12 exists. Now, from the above we write

1 = 5 − 2·2 = 5 − 2·(12 − 5·2) = 5 − 2·12 + 5·4
= 5·5 − 2·12 = 5·(17 − 12·1) − 2·12
= 5·17 − 5·12 − 2·12 = 12·(−7) + 17·5.

This shows that 12(−7) ≡ 1(mod 17). Therefore, −7 is a solution of 12x ≡ 1(mod 17).Hence, −7 is an inverse of 12 modulo 17.

Page 138: Linear Algebra by Nayak

Fermat’s Theorem 131

Ex 2.8.31 If possible, find the inverse of 35 modulo 48.

Solution: Since (35, 48) = 1, the inverse of 35 modulo 48 exists. Now

48 = 1 × 35 + 13, 35 = 2 × 13 + 9, 13 = 1 × 9 + 4, 9 = 2 × 4 + 1.

Working backwards,

1 = 9 − 2 × 4 = 9 − 2 × (13 − 9) = 3 × 9 − 2 × 13
= 3 × (35 − 2 × 13) − 2 × 13 = 3 × 35 − 8 × 13
= 3 × 35 − 8 × (48 − 35) = 11 × 35 + (−8) × 48.

Hence 11 is the inverse of 35 modulo 48.
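Both inverses can be checked directly; since Python 3.8 the built-in pow computes modular inverses, so a sketch needs one line per example:

print(pow(12, -1, 17))  # 10, i.e. -7 (mod 17), matching Ex 2.8.30
print(pow(35, -1, 48))  # 11, matching Ex 2.8.31
print(12 * 10 % 17, 35 * 11 % 48)  # 1 1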

2.9 Fermat’s Theorem

Let a be an integer and let p be a prime. Then aᵖ ≡ a (mod p); moreover, if p does not divide a, then

a^{p−1} ≡ 1 (mod p).  (2.18)

Proof: Let us consider the set of multiples of a,

R = {a, 2a, 3a, ..., (p−1)a}.

(The nonzero residue classes modulo p form a multiplicative group of order p−1.) We first show that no two distinct members of these (p−1) integers are congruent to each other modulo p. If possible, let

ra ≡ sa (mod p); 1 ≤ s < r ≤ p−1
⇒ (r − s)a ≡ 0 (mod p) ⇒ p|(r − s)a
⇒ p|(r − s) or p|a; as p is prime.

Since 1 ≤ s < r ≤ p−1, we have p ∤ (r − s), and by hypothesis p ∤ a. Hence ra ≢ sa (mod p). Also, ra ≢ 0 (mod p) for r = 1, 2, ..., p−1. Hence

ra ≡ k (mod p), where k ∈ Z and 0 < k ≤ p−1.

Since no two distinct members of R are congruent to each other and there are (p−1) distinct integers a, 2a, ..., (p−1)a, it follows that the (p−1) integers in R must be congruent modulo p to 1, 2, ..., (p−1), taken in some order. Let p not be a divisor of a. Therefore,

a·2a·3a···(p−1)a ≡ 1·2·3···(p−1) (mod p)
⇒ a^{p−1}[1·2·3···(p−1)] ≡ 1·2·3···(p−1) (mod p)
⇒ a^{p−1}(p−1)! ≡ (p−1)! (mod p)
⇒ a^{p−1} ≡ 1 (mod p); as (p, (p−1)!) = 1.

Hence the theorem. The converse of this theorem is not always true. For example, 2³⁴⁰ ≡ 1 (mod 341), although 341 = 11·31 is not a prime number.
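This failure of the converse is easy to witness numerically; a short Python check (pow performs fast modular exponentiation):

print(pow(2, 340, 341))  # 1 -- yet 341 = 11*31 is composite (a Fermat pseudoprime to base 2)
print(pow(3, 340, 341))  # 56 -- base 3 exposes 341 as composite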

Result 2.9.1 Let p be a divisor of a, i.e., p|a; then a = pk for some k ∈ Z. Therefore,

aᵖ − a = a(a^{p−1} − 1) = pk(a^{p−1} − 1) = pt, where t = k(a^{p−1} − 1) ∈ Z,
⇒ aᵖ − a is divisible by p
⇒ aᵖ − a ≡ 0 (mod p) ⇒ aᵖ ≡ a (mod p).

Ex 2.9.1 Prove that (1/5)n⁵ + (1/3)n³ + (7/15)n is an integer for every n.


Solution: By Fermat's theorem, n⁵ ≡ n (mod 5) and n³ ≡ n (mod 3). Then

5|(n⁵ − n) and 3|(n³ − n) ⇒ n⁵ = 5t + n, n³ = 3s + n,

for some integers t and s. Now

(1/5)n⁵ + (1/3)n³ + (7/15)n = (t + s) + (3n + 5n + 7n)/15 = t + s + n = an integer.

Ex 2.9.2 Use Fermat's theorem to prove that a¹² − b¹² is divisible by 13 × 7 = 91, where a, b are both prime to 91.

Solution: Since a is prime to 91, a is prime to both 13 and 7. Using Fermat's theorem,

a¹² − 1 ≡ 0 (mod 13) and a⁶ − 1 ≡ 0 (mod 7).

Since a⁶ − 1 ≡ 0 (mod 7), it follows that a¹² − 1 ≡ 0 (mod 7). As a¹² − 1 ≡ 0 (mod 13) and a¹² − 1 ≡ 0 (mod 7),

a¹² − 1 ≡ 0 (mod 91); as (13, 7) = 1.
Similarly, b¹² − 1 ≡ 0 (mod 91),
⇒ a¹² − b¹² ≡ 0 (mod 91),

which is required.

Theorem 2.9.1 If p is a prime > 2, then 1ᵖ + 2ᵖ + ··· + (p−1)ᵖ ≡ 0 (mod p).

Proof: We have

1ᵖ ≡ 1 (mod p), 2ᵖ ≡ 2 (mod p), ..., (p−1)ᵖ ≡ (p−1) (mod p).

Adding all the results, we get

1ᵖ + 2ᵖ + ··· + (p−1)ᵖ ≡ 1 + 2 + ··· + (p−1) (mod p)
≡ ½p(p−1) (mod p)
≡ 0 (mod p); since p−1 is even.

Theorem 2.9.2 If p is a prime and a is prime to p, then a^{p²−p} ≡ 1 (mod p²).

Proof: Fermat's theorem states that if p is a prime and (a, p) = 1, then a^{p−1} ≡ 1 (mod p). Hence ∃ q ∈ Z such that a^{p−1} = 1 + qp. Therefore,

a^{p²−p} = (a^{p−1})ᵖ = (1 + qp)ᵖ = 1 + p·qp + [p(p−1)/2!](qp)² + ··· + (qp)ᵖ
= 1 + kp², where k ∈ Z.

Hence, by definition, a^{p²−p} ≡ 1 (mod p²).

Ex 2.9.3 Show that any odd prime factor of n² + 1 is of the form 4m + 1, where m is an integer.

Solution: Let p be an odd prime factor of n² + 1; then p is not a divisor of n, and n² ≡ −1 (mod p). By Fermat's theorem,

n^{p−1} ≡ 1 (mod p) ⇒ (n²)^{(p−1)/2} ≡ 1 (mod p)
⇒ (−1)^{(p−1)/2} ≡ 1 (mod p); as n² ≡ −1 (mod p)
⇒ (p−1)/2 is an even integer, say (p−1)/2 = 2m.

Therefore p = 4m + 1, where m is an integer. From this it follows that no odd prime factor of n² + 1 can be of the form 4m − 1, where m is an integer.


Ex 2.9.4 If p is prime to a, then a^{p^{n−1}(p−1)} ≡ 1 (mod pⁿ).

Solution: By Fermat's theorem, a^{p−1} ≡ 1 (mod p). Using the theorem that if a ≡ 1 (mod pⁿ) then aᵖ ≡ 1 (mod p^{n+1}), we get

a^{p(p−1)} ≡ 1 (mod p²), a^{p²(p−1)} ≡ 1 (mod p³), ..., a^{p^{n−1}(p−1)} ≡ 1 (mod pⁿ).

Result 2.9.2 If p is a prime and p ≠ 2, then a^{(p−1)/2} ≡ ±1 (mod p), when p ∤ a.

Proof: By Fermat's theorem, we have

a^{p−1} − 1 ≡ 0 (mod p)
or, (a^{(p−1)/2} − 1)(a^{(p−1)/2} + 1) ≡ 0 (mod p); p ≠ 2
⇒ p|(a^{(p−1)/2} − 1) or p|(a^{(p−1)/2} + 1); as p is prime
⇒ a^{(p−1)/2} − 1 ≡ 0 (mod p) or a^{(p−1)/2} + 1 ≡ 0 (mod p)
⇒ a^{(p−1)/2} ≡ ±1 (mod p).

Ex 2.9.5 Show that the square of any integer is of the form 5k or 5k ± 1.

Solution: If 5|a, then a² = 5k for some integer k. If 5 ∤ a, then by the above result with p = 5,

a^{(5−1)/2} = a² ≡ ±1 (mod 5) ⇒ a² = 5k ± 1; k ∈ Z.

Thus the square of any integer is of the form 5k or 5k ± 1.

Ex 2.9.6 Prove that the eighth power of any integer is of the form 17k or 17k ± 1.

Solution: Let a be an integer divisible by 17; then a⁸ is of the form 17k. If a is not divisible by 17, then (a, 17) = 1, and by Fermat's theorem,

a¹⁶ − 1 ≡ 0 (mod 17) ⇒ (a⁸ − 1)(a⁸ + 1) ≡ 0 (mod 17).
Either a⁸ − 1 ≡ 0 (mod 17), i.e., a⁸ = 17k + 1,
or a⁸ + 1 ≡ 0 (mod 17), i.e., a⁸ = 17k − 1.

Hence a⁸ = 17k or 17k ± 1, where a is any integer.

2.9.1 Wilson’s Theorem

Statement: If p is a prime, then (p−1)! + 1 ≡ 0 (mod p).
Proof: For the prime p, the set of integers which are less than and prime to p is S = {1, 2, 3, ..., p−1}. Let a be one of the integers 1, 2, ..., p−1. Then no two of the integers 1·a, 2·a, ..., (p−1)·a are congruent modulo p; for if ra ≡ sa (mod p) for some integers r, s with 1 ≤ r < s ≤ p−1, then r ≡ s (mod p), as (a, p) = 1, which is a contradiction. Also, none of these is divisible by p. This means that the integers a, 2a, ..., (p−1)a are congruent to 1, 2, ..., p−1 modulo p, taken in some order. So, as (a, p) = 1 for each a ∈ S, the linear congruence ax ≡ 1 (mod p) has a unique solution: for each a ∈ S there exists a unique a′ ∈ S such that aa′ ≡ 1 (mod p). If a = a′, then


a² ≡ 1 (mod p) ⇒ a² − 1 ≡ 0 (mod p)
⇔ p|(a² − 1) ⇔ p|(a + 1)(a − 1)
⇔ either (a − 1) ≡ 0 (mod p) or (a + 1) ≡ 0 (mod p).

Since p is a prime and a < p, it follows that when a − 1 ≡ 0 (mod p), a = 1, and when a + 1 ≡ 0 (mod p), a = p − 1.

If we omit the integers 1 and p − 1 from S, the remaining p − 3 integers 2, 3, ..., p − 2 can be grouped into (p−3)/2 pairs (a, a′) satisfying aa′ ≡ 1 (mod p), a ≠ a′, 1 < a′ < p − 1. Multiplying these (p−3)/2 pair congruences together, we have

2·3···(p−2) ≡ 1 (mod p) ⇒ (p−2)! ≡ 1 (mod p),
or, (p−1)! ≡ (p−1) (mod p) ≡ −1 (mod p)
⇒ (p−1)! + 1 ≡ 0 (mod p).

The converse of this theorem is also true: if (p−1)! + 1 ≡ 0 (mod p), then p(>1) is a prime. For, if p is not a prime, p is composite and has a divisor d with 1 < d < p. Since 1 < d < p, d divides one of the factors of (p−1)!, so d|(p−1)!. Thus

d|(p−1)! + 1 and d|(p−1)! ⇒ d|1,

which is absurd, as d ≠ 1. Therefore p cannot be composite, and so p is prime. This theorem provides a necessary and sufficient condition for determining the primality of a positive integer p. However, when p assumes large values, (p−1)! becomes very large, and in that case the test is impracticable.

Ex 2.9.7 Show that 70! + 1 ≡ 0(mod 71).

Solution: Since 71 is a prime number, by Wilson’s theorem

(71 − 1)! ≡ −1 (mod 71) ⇒ 70! + 1 ≡ 0 (mod 71).

Ex 2.9.8 For a prime p,

(p−1)! ≡ (−1)^{(p−1)/2} [1·2···((p−1)/2)]² (mod p).

Show that the integer ((p−1)/2)! satisfies the congruence x² + 1 ≡ 0 (mod p) or x² − 1 ≡ 0 (mod p) according as p = 4k + 1 or p = 4k + 3.

Solution: We consider the set of integers 1, 2, 3, ..., (p−1)/2, (p+1)/2, ..., (p−2), (p−1), where p is prime. Now, p−1 ≡ −1 (mod p), p−2 ≡ −2 (mod p), ..., (p+1)/2 ≡ −(p−1)/2 (mod p). Then

(p−1)! = 1·2·3···((p−1)/2)·((p+1)/2)···(p−2)·(p−1)
≡ [1·2·3···((p−1)/2)]·[(−(p−1)/2)···(−2)·(−1)] (mod p)
≡ (−1)^{(p−1)/2} [1·2···((p−1)/2)]² (mod p)
≡ (−1)^{(p−1)/2} [((p−1)/2)!]² (mod p); (p−1)/2 being an integer.


Again, by Wilson's theorem, (p−1)! ≡ −1 (mod p). Therefore,

−1 ≡ (−1)^{(p−1)/2} [((p−1)/2)!]² (mod p).

Now, if p is a prime of the form 4k + 1 for some k ∈ N, then

−1 ≡ (−1)^{(4k+1−1)/2} [((p−1)/2)!]² (mod p)
or, −1 ≡ [((p−1)/2)!]² (mod p)
or, [((p−1)/2)!]² ≡ −1 (mod p).

Thus x = ((p−1)/2)! satisfies the congruence x² + 1 ≡ 0 (mod p) when p is of the form 4k + 1.

Again, if p is of the form 4k + 3 for some k ∈ N, then

−1 ≡ (−1)^{(4k+3−1)/2} [((p−1)/2)!]² (mod p)
or, 1 ≡ [((p−1)/2)!]² (mod p)
or, [((p−1)/2)!]² ≡ 1 (mod p).

Thus x = ((p−1)/2)! satisfies the congruence x² − 1 ≡ 0 (mod p).

Ex 2.9.9 If p is an odd prime, then show that

1²·3²·5²···(p−2)² ≡ (−1)^{(p+1)/2} (mod p).

Solution: We use the results p − k ≡ −k (mod p) and k ≡ −(p−k) (mod p). Now,

(p−1)! = 1·2·3···(p−2)·(p−1)
= 1·(p−1)·3·(p−3)···(p−2)·2
≡ 1·(−1)·3·(−3)···(p−2)·(−(p−2)) (mod p)
≡ (−1)^{(p−1)/2} 1²·3²·5²···(p−2)² (mod p).

Again, using Wilson's theorem (p−1)! ≡ −1 (mod p), we have

−1 ≡ (−1)^{(p−1)/2} 1²·3²·5²···(p−2)² (mod p)
or, 1²·3²·5²···(p−2)² ≡ (−1)·(−1)^{(p−1)/2} (mod p)
or, 1²·3²·5²···(p−2)² ≡ (−1)^{(p+1)/2} (mod p).

Ex 2.9.10 If p be a prime number, then show that

(p− 1)! ≡ p− 1(mod(1 + 2 + · · ·+ (p− 1))).


Solution: Using Wilson's theorem (p−1)! ≡ −1 (mod p), we have

(p−1)! ≡ (p−1) (mod p) ⇒ p | (p−1)! − (p−1).

Also, since p−1 is even and (p−1) | (p−1)! − (p−1),

(p−1)/2 | (p−1)! − (p−1),

and as (p, (p−1)/2) = 1,

p·(p−1)/2 | (p−1)! − (p−1) ⇒ (p−1)! − (p−1) ≡ 0 (mod p(p−1)/2)
or, (p−1)! ≡ p−1 (mod (1 + 2 + ··· + (p−1))),

as 1 + 2 + ··· + (p−1) = p(p−1)/2.

Ex 2.9.11 Prove that 4(29)! + 5! is divisible by 31.

Solution: In Wilson's theorem, let p = 31 (a prime); then 30! + 1 ≡ 0 (mod 31), i.e.,

(31 − 1)·29! + 1 ≡ 0 (mod 31)
⇒ −29! + 1 ≡ 0 (mod 31) ⇒ 4·29! − 4 ≡ 0 (mod 31)
⇒ 4·29! − 4 + 124 ≡ 0 (mod 31) ⇒ 4·29! + 120 ≡ 0 (mod 31).

Thus 4·(29)! + 5! is divisible by 31.

2.10 Arithmetic Functions

An arithmetic or number-theoretic function is a real- or complex-valued function whose domain is the set of positive integers. If f is an arithmetic function, we write f(n) for its values. For example, f : N → N defined by f(n) = n or f(n) = n² is an arithmetic function, while f(x) = log x, defined for all positive reals, is not an arithmetic function as such, since its domain is not N. The following are arithmetic functions:

(i) f(n) = 2n for all n ∈ N.

(ii) f(n) = 1/n for all n ∈ N.

(iii) f(n) = n + 1/n for all n ∈ N.

An arithmetic function not identically zero is said to be normalized if f(1) = 1. Several arithmetic functions play an important role in the study of divisibility properties of integers and the distribution of primes. Here we shall discuss two important number-theoretic functions:

(i) the Euler totient (phi) function,

(ii) the Mobius function.

2.10.1 Euler’s Phi Function

Let n ∈ N. The number of positive integers not exceeding n and relatively prime to n is denoted by φ(n), with φ(1) = 1. Thus the function

φ : N → N defined by φ(n) = Σ′_{k=1}^{n} 1  (2.19)

is known as Euler's phi function, where the prime ′ indicates that the sum is extended over those k(< n) satisfying (k, n) = 1. For example, let n = 12; then k = 1, 5, 7, 11 are the values with k < n and (k, n) = 1. Thus


φ(12) = Σ′_{k=1}^{12} 1 = 1 + 1 + 1 + 1 = 4.

A short table for φ(n) is given as follows:

n    : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
φ(n) : 1 1 2 2 4 2 6 4 6 4  10 4  12 6  8

The function φ is a number-theoretic function.

Result 2.10.1 Let n be a prime integer. If p is a positive integer such that p < n, then(p, n) = 1. Hence the number of positive integers not exceeding n and relatively prime to nis n− 1, so that, φ(n) = n− 1.

Result 2.10.2 The following are important:

(i) If n = p1^{α1}·p2^{α2}···pk^{αk}, where the pi (1 ≤ i ≤ k) are distinct primes and αi ∈ N, then the number of positive divisors of n is (1 + α1)(1 + α2)···(1 + αk).

(ii) The highest power of a prime p contained in n! is denoted by k(n!), where

k(n!) = [n/p] + [n/p²] + [n/p³] + ···

Ex 2.10.1 Find the highest power of 3 contained in 100!.

Solution: The highest power of a prime p contained in n! is

k(n!) = [n/p] + [n/p²] + [n/p³] + ···
= [100/3] + [100/3²] + [100/3³] + [100/3⁴] + [100/3⁵] + ···
= 33 + 11 + 3 + 1 + 0 = 48.
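Legendre's formula above has only finitely many nonzero terms, since [n/pⁱ] = 0 once pⁱ > n. A minimal Python sketch (the function name is ours):

def prime_power_in_factorial(n, p):
    # highest power of the prime p dividing n!: [n/p] + [n/p^2] + ...
    k, q = 0, p
    while q <= n:
        k += n // q
        q *= p
    return k

print(prime_power_in_factorial(100, 3))   # 48, as above
print(prime_power_in_factorial(1000, 7))  # 164 (cf. Section-B, Q10)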

Theorem 2.10.1 If p is prime, then φ(pᵏ) = pᵏ(1 − 1/p), where k is a positive integer.

Proof: When k = 1, φ(p) = p − 1, as p is prime. If k > 1, let us arrange the integers from 1 to pᵏ in the following way:

1                2                ···  p−1                  p
p+1              p+2              ···  p+(p−1)              2p
2p+1             2p+2             ···  2p+(p−1)             3p
⋮                ⋮                     ⋮                    ⋮
(p^{k−1}−1)p+1   (p^{k−1}−1)p+2   ···  (p^{k−1}−1)p+(p−1)   p^{k−1}·p

Thus, if k > 1 and q(≤ pᵏ) is a positive integer, then (q, pᵏ) ≠ 1 if and only if q is one of p, 2p, 3p, ..., p^{k−1}·p, and their number is p^{k−1}. If q is not one of these, then (q, pᵏ) = 1. Now there are pᵏ integers from 1 to pᵏ; among them, p^{k−1} are not relatively prime to pᵏ. Hence the remaining integers are relatively prime to pᵏ; their total number is pᵏ − p^{k−1}, and so

φ(pᵏ) = pᵏ − p^{k−1} = pᵏ(1 − 1/p).  (2.20)


If the integer n(>1) is of the form n = p1^{α1} p2^{α2} ··· pr^{αr}, where p1, p2, ..., pr are distinct primes (so the prime powers pi^{αi} are prime to one another), then

φ(n) = φ(p1^{α1}) φ(p2^{α2}) ··· φ(pr^{αr})
= p1^{α1}(1 − 1/p1) · p2^{α2}(1 − 1/p2) ··· pr^{αr}(1 − 1/pr)
= p1^{α1} p2^{α2} ··· pr^{αr} (1 − 1/p1)(1 − 1/p2)···(1 − 1/pr)
= n (1 − 1/p1)(1 − 1/p2)···(1 − 1/pr)
= n ∏_{i=1}^{r} (1 − 1/pi).  (2.21)

Ex 2.10.2 Find φ(191) and φ(260).

Solution: First we test whether 191 is a prime. For this, we check all primes p satisfying p² ≤ 191, namely 2, 3, 5, 7, 11 and 13. None of these divides 191, so 191 is a prime. Therefore, by definition, φ(191) = 191 − 1 = 190.

By the unique factorization theorem, 260 = 2²·5·13. Therefore,

φ(260) = φ(2²·5·13) = 260(1 − 1/2)(1 − 1/5)(1 − 1/13)
= 260 · (1/2) · (4/5) · (12/13) = 96.

Theorem 2.10.2 Let m and n be two positive integers. If m,n are relatively prime, thenφ(mn) = φ(m)φ(n), i.e., φ(n) is multiplicative.

Proof: Given that (m, n) = 1, we consider the product mn. The first mn numbers can be arranged in n lines, each containing m numbers, thus:

1          2          ···  k          ···  m
m+1        m+2        ···  m+k        ···  2m
2m+1       2m+2       ···  2m+k       ···  3m
⋮          ⋮               ⋮               ⋮
(n−1)m+1   (n−1)m+2   ···  (n−1)m+k   ···  nm

Now consider the vertical column beginning with k. If (k, m) = 1, all the terms of this column will be prime to m; but if k and m have a common divisor, no number in the column will be prime to m. The first row contains φ(m) numbers prime to m; therefore there are φ(m) vertical columns in each of which every term is prime to m. Take one such column, beginning with k. This column is an arithmetic progression with common difference m, and since (m, n) = 1 its n terms, when divided by n, leave the remainders

0, 1, 2, 3, ..., n−2, n−1

in some order. Hence the column contains φ(n) integers prime to n. Thus in the table there are φ(m)φ(n) integers which are prime to m and also to n, and therefore to mn; i.e., φ(mn) = φ(m)φ(n). This theorem can be extended to a finite number of positive integers:

φ(m1 m2 ··· mr) = φ(m1) φ(m2) ··· φ(mr),

where m1, m2, ..., mr are prime to one another. More generally, if (m, n) = d, then

φ(mn) = φ(m) φ(n) · d/φ(d).


Theorem 2.10.3 If n is a positive integer, then

φ(2n) = φ(n), if n is odd; φ(2n) = 2φ(n), if n is even.

Proof: When n is odd, 2 and n are prime to each other, so

φ(2n) = φ(2)φ(n) = 1·φ(n) = φ(n).

When n is even, let n = 2ᵏ·p, where p is an odd integer and k ≥ 1. Therefore,

φ(n) = φ(2ᵏ·p) = 2ᵏ(1 − 1/2)φ(p) = 2^{k−1}φ(p),
φ(2n) = φ(2^{k+1})φ(p) = 2ᵏφ(p) ⇒ φ(2n) = 2φ(n).

If p = 1, then also φ(2n) = 2φ(n), since φ(1) = 1.

Ex 2.10.3 Find all integers n such that

(i) φ(n) = n/2, (ii) φ(n) = φ(2n) and (iii) φ(n) = 12.

Solution: (i) By (2.21), φ(n) = n ∏_{p|n}(1 − 1/p), and this equals n(1 − 1/2) = n/2 exactly when 2 is the only prime dividing n. Hence the values of n are n = 2^α; α ∈ N. (In particular, for the prime n = 2, φ(2) = 1 = n/2.)

(ii) By Theorem 2.10.3, φ(2n) = φ(n) when n is odd, while φ(2n) = 2φ(n) ≠ φ(n) when n is even. Therefore the values of n are precisely the odd integers, n = 1, 3, 5, 7, 9, ···.

(iii) Case 1: Let n be prime; then φ(n) = n − 1 = 12 ⇒ n = 13.

Case 2: Let n be composite. If a prime p divides n, then (p − 1) | φ(n) = 12, so p ∈ {2, 3, 5, 7, 13}; checking the admissible canonical forms with (2.21) gives

φ(21) = 2·6 = 12, φ(26) = 1·12 = 12, φ(28) = 2·6 = 12, φ(36) = 2·6 = 12, φ(42) = 1·2·6 = 12.

Thus the values of n satisfying φ(n) = 12 are n = 13, 21, 26, 28, 36, 42.

Ex 2.10.4 Solve for x, y, z ∈ N where φ(x− 5) + φ(3y − 5) + φ(5z − 18) = 3.

Solution: We see that φ(n) ≥ 1 for all n ∈ N, and therefore the given equation is satisfied if and only if φ(x − 5) = φ(3y − 5) = φ(5z − 18) = 1. Now,

φ(x − 5) = 1 ⇒ x − 5 = 1 or 2; i.e., x = 6 or 7,
φ(3y − 5) = 1 ⇒ 3y − 5 = 1 or 2; i.e., y = 2, but y ≠ 7/3, as y ∈ N,
φ(5z − 18) = 1 ⇒ 5z − 18 = 1 or 2; i.e., z = 4, but z ≠ 19/5, as z ∈ N.

Thus the solutions are (6, 2, 4) and (7, 2, 4).


Theorem 2.10.4 Let a, m(>0) be integers. If (a, m) = 1, then a^{φ(m)} ≡ 1 (mod m).

Proof: For each positive integer m ≥ 1, φ(m) is Euler's phi function, defined by

φ(1) = 1 and φ(m) = Σ_{1≤k≤m, (k,m)=1} 1.

Thus for m = 1 the result holds trivially. Fix a positive integer m and take an integer a coprime to m. Let r1, r2, ..., r_{φ(m)} be a reduced residue system mod m. Then ar1, ar2, ..., ar_{φ(m)} is also a reduced residue system mod m, in some order, since each (ari, m) = 1 and they are incongruent to each other. Hence the product of all the integers in the first set is congruent to the product of those in the second set. Therefore,

ar1·ar2···ar_{φ(m)} ≡ r1·r2···r_{φ(m)} (mod m)
⇒ a^{φ(m)} r1·r2···r_{φ(m)} ≡ r1·r2···r_{φ(m)} (mod m)
⇒ a^{φ(m)} ≡ 1 (mod m); as (r1·r2···r_{φ(m)}, m) = 1.

Each ri is relatively prime to m, so we can cancel each ri and obtain the theorem. This is known as the Euler–Fermat theorem. It can be used to calculate the solution of a linear congruence.

Result 2.10.3 If (a, m) = 1, the solution (unique mod m) of the linear congruence ax ≡ b (mod m) is given by x ≡ b·a^{φ(m)−1} (mod m).

Ex 2.10.5 Solve the linear congruence 5x ≡ 3(mod24).

Solution: Since (5, 24) = 1, there is a unique solution, given by

x ≡ 3·5^{φ(24)−1} ≡ 3·5⁷ (mod 24); as φ(24) = φ(3)φ(8) = 2·4 = 8,
≡ 3·5 (mod 24); as 5² ≡ 1 (mod 24) ⇒ 5⁶ ≡ 1 (mod 24),
⇒ x ≡ 15 (mod 24).
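Result 2.10.3 turns into a one-line solver once φ(m) is known. A minimal Python sketch (here the phi value is supplied by hand):

def solve_by_euler(a, b, m, phi_m):
    # x = b * a^(phi(m)-1) (mod m) solves a*x = b (mod m) when (a, m) = 1
    return b * pow(a, phi_m - 1, m) % m

print(solve_by_euler(5, 3, 24, 8))  # 15, as in Ex 2.10.5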

Ex 2.10.6 Solve the linear congruence 25x ≡ 15(mod120).

Solution: Here d = (25, 120) = 5. As d|15, the congruence has exactly five solutionsmodulo 120. To find them, we are to solve the linear congruence 5x ≡ 3(mod24). Thus thefive solutions are given by

x = 15 + 24k; k = 0, 1, 2, 3, 4, or, x ≡ 15, 39, 63, 87, 111 (mod 120).

Ex 2.10.7 If n > 7 is prime, prove that n⁶ − 1 is divisible by 504.

Solution: Note 504 = 7·8·9. Since 7 is a prime and n is prime to 7, by Fermat's theorem n⁶ − 1 is divisible by 7. By Euler's theorem, as n is prime to 9, n^{φ(9)} − 1 is divisible by 9, and

φ(9) = φ(3²) = 9(1 − 1/3) = 6,

so n⁶ − 1 is divisible by 9. Since n > 7 is an odd prime, n is of the form 4k + 1 or 4k + 3, where k ∈ N. Now

n⁶ − 1 = (n − 1)(n + 1)(n⁴ + n² + 1).
If n = 4k + 1, then (n − 1)(n + 1) = 4k(4k + 2), and
if n = 4k + 3, then (n − 1)(n + 1) = (4k + 2)(4k + 4).

Therefore, in either case n⁶ − 1 is divisible by 8. Since 7, 8, 9 are pairwise prime to each other, 7·8·9 | n⁶ − 1, i.e., 504 | n⁶ − 1.


Ex 2.10.8 Use the Euler–Fermat theorem to find the unit digit of 3¹⁰⁰. [NET'11]

Solution: Since 3 is prime to 10, by the Euler–Fermat theorem,

3^{φ(10)} ≡ 1 (mod 10), where φ(10) = 4,
or, 3⁴ ≡ 1 (mod 10) ⇒ 3¹⁰⁰ = (3⁴)²⁵ ≡ 1 (mod 10).

Thus the unit digit of 3¹⁰⁰ is 1.

Result 2.10.4 The sum of the divisors of a positive integer n is denoted by σ(n), i.e., σ(n) = Σ_{d|n} d, and it is an arithmetical function. For example, consider the positive integer 4. The divisors of 4 are 1, 2, 4; therefore σ(4) = 1 + 2 + 4 = 7. Similarly, σ(6) = 12, σ(10) = 18, σ(15) = 24. In general, if n = p1^{α1}·p2^{α2}···pk^{αk}, then

σ(n) = Σ_{d|n} d = [(p1^{α1+1} − 1)/(p1 − 1)]·[(p2^{α2+1} − 1)/(p2 − 1)]···[(pk^{αk+1} − 1)/(pk − 1)].  (2.22)

Ex 2.10.9 Find the number of positive divisors of 50000. [NET’12]

Solution: We have 50000 = 2⁴ × 5⁵. Thus the number of positive divisors of 50000 is

(4 + 1)(5 + 1) = 5 × 6 = 30.

Result 2.10.5 Consider a positive integer n and write S = {1, 2, ..., n}. Define ∼ on S by a ∼ b ⇔ (a, n) = (b, n). Then ∼ is an equivalence relation. For a divisor d of n,

A(d) = {k : (k, n) = d}

is an equivalence class, so S = ⋃_{d|n} A(d). For example, let n = 6, S = {1, 2, ..., 6}. The divisors of 6 are 1, 2, 3, 6. Now

A(1) = {1, 5}, A(2) = {2, 4}, A(3) = {3}, A(6) = {6}.

Note that these sets A(1), A(2), A(3), A(6) are disjoint and their union is S.

2.10.2 The Mobius Function

The Mobius function μ(n) is defined by

μ(n) = 1, if n = 1;
μ(n) = (−1)ʳ, if n = p1·p2···pr (the pi being distinct primes);
μ(n) = 0, if a²|n for some a > 1, i.e., if n has a square factor > 1.

The following table shows some values of μ(n):

n    : 1 2  3  4 5  6 7  8 9 10
μ(n) : 1 −1 −1 0 −1 1 −1 0 0 1

The Mobius function arises in many different places in number theory. One of its fundamental properties is a remarkably simple formula for the divisor sum Σ_{d|n} μ(d), extended over the positive divisors of n.
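A minimal Python sketch of μ by trial division, together with a numerical check of the divisor-sum property proved next (helper names are ours):

def mobius(n):
    # 1 if n = 1; (-1)^r if n is a product of r distinct primes; 0 if a square divides n
    if n == 1:
        return 1
    r, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0        # repeated prime factor
            r += 1
        else:
            p += 1
    if n > 1:
        r += 1                  # one remaining prime factor
    return -1 if r % 2 else 1

print([mobius(n) for n in range(1, 11)])  # [1, -1, -1, 0, -1, 1, -1, 0, 0, 1]
print(sum(mobius(d) for d in range(1, 61) if 60 % d == 0))  # 0, cf. Theorem 2.10.5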


Theorem 2.10.5 If n ≥ 1, then

Σ_{d|n} μ(d) = [1/n] = 1, if n = 1; = 0, if n > 1.

Proof: The formula is clearly true if n = 1. Assume, then, that n > 1.

Case 1: Let n = p^α. Then

Σ_{d|p^α} μ(d) = μ(1) + μ(p) + μ(p²) + ··· + μ(p^α)
= 1 + (−1) + 0 + ··· + 0 = 0.

Case 2: Assume the result is true for integers with at most k distinct prime factors, and let n = a·p^α, where a is an integer with k distinct prime factors and p ∤ a. Now,

Σ_{d|n} μ(d) = Σ_{d|a} μ(d) + Σ_{d|a} μ(pd) + Σ_{d|a} μ(p²d) + ··· + Σ_{d|a} μ(p^α d)
= Σ_{d|a} μ(d) + Σ_{d|a} μ(p)μ(d) + Σ_{d|a} μ(p²)μ(d) + ··· + Σ_{d|a} μ(p^α)μ(d)
= Σ_{d|a} μ(d) − Σ_{d|a} μ(d) = 0,

since μ(p) = −1 and μ(pʲ) = 0 for j ≥ 2. Thus the proof proceeds by induction on the number of distinct prime factors of n > 1. Alternatively:

Case 3: Let n = p1^{a1}·p2^{a2}···pr^{ar} > 1 be the standard factorization of n. In the sum Σ_{d|n} μ(d), the only nonzero terms come from d = 1 and from those divisors of n which are products of distinct primes. Hence,

Σ_{d|n} μ(d) = μ(1) + Σ_{1≤i≤r} μ(pi) + Σ_{1≤i<j≤r} μ(pi pj) + ··· + μ(p1···pr)
= 1 + C(r,1)(−1) + C(r,2)(−1)² + ··· + (−1)ʳ
= (1 − 1)ʳ = 0.

Theorem 2.10.6 If F(n) = Σ_{d|n} f(d) for every positive integer n, then

f(n) = Σ_{d|n} μ(d) F(n/d).

Proof: By the definition,

Σ_{d|n} μ(d) F(n/d) = Σ_{d|n} μ(d) Σ_{δ|(n/d)} f(δ)
= Σ_{δ|n} f(δ) Σ_{d|(n/δ)} μ(d) = f(n),

since by Theorem 2.10.5 the inner sum Σ_{d|(n/δ)} μ(d) vanishes unless n/δ = 1, i.e., δ = n. This is called the Mobius inversion formula. If f(n) and g(n) are two arithmetic functions satisfying the condition f(n) = Σ_{d|n} g(d), then {f(n), g(n)} is called a Mobius pair. For example, {n, φ(n)} is a Mobius pair, since n = Σ_{d|n} φ(d).
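The pair {n, φ(n)} can be verified numerically: inverting n = Σ_{d|n} φ(d) should reproduce φ. A short Python check, reusing the mobius() and phi() sketches above:

def phi_by_inversion(n):
    # Mobius inversion of n = sum over d|n of phi(d): phi(n) = sum mu(d) * n/d
    return sum(mobius(d) * (n // d) for d in range(1, n + 1) if n % d == 0)

print([phi_by_inversion(n) for n in (12, 191, 260)])  # [4, 190, 96], agreeing with phi()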


2.10.3 Divisor Function

Let n be a positive integer. The divisor function τ : N → N, where τ(n) denotes the number of positive divisors of n, satisfies

τ(n) = 1, if n = 1; τ(n) = 2, if n = p (a prime); τ(n) > 2, if n is composite.

Note that {τ(n), 1} is a Mobius pair, since τ(n) = Σ_{d|n} 1. Let a positive integer n(>1) be expressed in canonical form as

n = p1^{α1}·p2^{α2}···pr^{αr}, αi ≥ 1, for i = 1, 2, ..., r,

where p1 < p2 < ··· < pr are primes. If m is a positive divisor of n, then m is of the form p1^{u1}·p2^{u2}···pr^{ur}, where

0 ≤ u1 ≤ α1, 0 ≤ u2 ≤ α2, ..., 0 ≤ ur ≤ αr.

Thus the positive divisors of n are in one-one correspondence with the totality of r-tuples (u1, u2, ..., ur) satisfying the above inequalities. The number of such r-tuples is (α1 + 1)(α2 + 1)···(αr + 1). Hence the total number of positive divisors of n is

τ(n) = (α1 + 1)(α2 + 1)···(αr + 1).

The total number of positive divisors τ(n) includes both the divisors 1 and n. For example,

τ(4) = τ(2²) = 2 + 1 = 3; τ(12) = τ(2²·3) = (2 + 1)(1 + 1) = 6.

The sum of all positive divisors of a positive integer n is denoted by σ(n). Every positive divisor of n in the canonical form is a term in the product

(1 + p1 + ··· + p1^{α1})(1 + p2 + ··· + p2^{α2})···(1 + pr + ··· + pr^{αr}),

and conversely, each term in the product is a divisor of n. Thus the sum of all positive divisors of n = p1^{α1}·p2^{α2}···pr^{αr} is

σ(n) = (1 + p1 + ··· + p1^{α1})(1 + p2 + ··· + p2^{α2})···(1 + pr + ··· + pr^{αr})
= [(p1^{α1+1} − 1)/(p1 − 1)]·[(p2^{α2+1} − 1)/(p2 − 1)]···[(pr^{αr+1} − 1)/(pr − 1)],

with σ(1) = 1. The functions τ and σ are examples of number-theoretic functions. Both τ and σ are multiplicative, i.e., for (m, n) = 1,

τ(mn) = τ(m)τ(n) and σ(mn) = σ(m)σ(n).

A positive integer n is said to be a perfect number if σ(n) = 2n, i.e., if n is the sum of all its positive divisors excluding itself. For example, 6, 28, etc. are perfect numbers.

Ex 2.10.10 Find τ(360) and σ(360).

Solution: The number 360 can be written in canonical form as 360 = 2³·3²·5. Therefore,

τ(360) = (1 + 3)(1 + 2)(1 + 1) = 24,
σ(360) = [(2⁴ − 1)/(2 − 1)]·[(3³ − 1)/(3 − 1)]·[(5² − 1)/(5 − 1)] = 15·13·6 = 1170.
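Both product formulas follow the canonical factorization, so a single factorization loop yields τ and σ together. A minimal Python sketch (the function name is ours):

def tau_sigma(n):
    # tau(n) = prod(alpha+1); sigma(n) = prod((p^(alpha+1) - 1)/(p - 1))
    tau, sigma, p = 1, 1, 2
    while p * p <= n:
        if n % p == 0:
            alpha = 0
            while n % p == 0:
                n //= p
                alpha += 1
            tau *= alpha + 1
            sigma *= (p ** (alpha + 1) - 1) // (p - 1)
        p += 1
    if n > 1:          # leftover prime factor with exponent 1
        tau *= 2
        sigma *= n + 1
    return tau, sigma

print(tau_sigma(360))               # (24, 1170), as above
print(tau_sigma(6), tau_sigma(28))  # sigma(n) = 2n for the perfect numbers 6 and 28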


Ex 2.10.11 The total number of positive divisors of a positive integer n is odd if and onlyif n is a perfect square.

Solution: Let n(>1) be a perfect square, expressed in canonical form as

n = p1^{α1}·p2^{α2}···pr^{αr},

where p1 < p2 < ··· < pr are primes. Then each of α1, α2, ..., αr is an even integer, so each factor αi + 1 is odd and τ(n) = (α1+1)(α2+1)···(αr+1) is odd. If n = 1, a perfect square, then τ(n) = 1, which is also odd.

Conversely, let τ(n) be odd. Then each of the factors α1 + 1, α2 + 1, ..., αr + 1 must be odd. Consequently, each of α1, α2, ..., αr must be even, and n is therefore a perfect square. This completes the proof.

2.10.4 Floor and Ceiling Functions

Let x be any real number. The floor function of x, denoted by ⌊x⌋, is the greatest integer less than or equal to x; that is, ⌊·⌋ : R → Z, where ⌊x⌋ = greatest integer ≤ x. For example, ⌊8.25⌋ = 8, ⌊8.75⌋ = 8, ⌊−10.6⌋ = −11, ⌊8⌋ = 8, ⌊−3⌋ = −3, ⌊√26⌋ = 5, etc.

The ceiling function of x ∈ R is denoted by ⌈x⌉ and is the smallest integer greater than or equal to x. Thus ⌈·⌉ : R → Z, where ⌈x⌉ = least integer ≥ x. For example, ⌈8.25⌉ = 9, ⌈8.75⌉ = 9, ⌈−4.6⌉ = −4, ⌈−5⌉ = −5, ⌈5⌉ = 5, etc.

Properties

1. ⌊x⌋ = n ⇔ n ≤ x < n + 1, where n is an integer.
2. ⌈x⌉ = n ⇔ n − 1 < x ≤ n, where n is an integer.
3. x − 1 < ⌊x⌋ ≤ x ≤ ⌈x⌉ < x + 1.
4. ⌊x + n⌋ = ⌊x⌋ + n, where n is an integer.
5. ⌈x + n⌉ = ⌈x⌉ + n, where n is an integer.
6. ⌊x⌋ + ⌊y⌋ ≤ ⌊x + y⌋ ≤ ⌊x⌋ + ⌊y⌋ + 1.
7. ⌈x⌉ + ⌈y⌉ − 1 ≤ ⌈x + y⌉ ≤ ⌈x⌉ + ⌈y⌉.
8. ⌈⌈x⌉/m⌉ = ⌈x/m⌉, where m is a positive integer.
9. ⌊⌊x⌋/m⌋ = ⌊x/m⌋, where m is a positive integer.
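Properties 3, 8 and 9 are easy to spot-check with Python's math.floor and math.ceil:

import math

x = 8.75
assert x - 1 < math.floor(x) <= x <= math.ceil(x) < x + 1        # property 3
assert math.ceil(math.ceil(x) / 3) == math.ceil(x / 3)           # property 8, m = 3
assert math.floor(math.floor(x) / 3) == math.floor(x / 3)        # property 9, m = 3
print(math.floor(-10.6), math.ceil(-4.6))  # -11 -4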

2.10.5 Mod Function

Let m be a positive integer. The (mod m) function is defined as fm(a) = b, where b is the remainder when a is divided by m. The relation is also written a ≡ b (mod m), 0 ≤ b < m; equivalently, fm(a) = b when (a − b) is divisible by m. The integer m is called the modulus, and a ≡ b (mod m) is read as 'a is congruent to b modulo m'. It can also be defined thus: fm(a) is the unique integer r such that a = mq + r, 0 ≤ r < m, for some integer q. This function is also written a (mod m). For example,

f7(35) = 0, as 7 divides 35 − 0, or 35 = 5 × 7 + 0,
f5(36) = 1, as 5 divides 36 − 1, or 36 = 7 × 5 + 1.

Exercise 2

Section-A[Multiple Choice Questions]

1. Fundamental theorem of arithmetic: every positive integer n > 1 can be expressed uniquely as a product of
(a) Primes (b) Positive integers (c) Perfect squares (d) None of the above.

Page 152: Linear Algebra by Nayak

Arithmetic Functions 145

2. The division algorithm states: let a and b be integers with b ≠ 0; then there exist integers q and r such that
(a) a − bq = r (b) a = bq − r (c) a = q∗r + b (d) All of the above.

3. Suppose a, b and c are integers. Which of the following are true?
(a) If a|b and b|c, then a|c (b) If a|b and b|c, then a|(b + c) and a|(b − c) (c) If x > 0, then gcd(ax, bx) = x + gcd(a, b) (d) For any integer x, gcd(a, b) = gcd(a, b + ax).

4. gcd(540, 168) =(a) 168 (b) 34 (c) 12 (d) none of the above

5. Two integers a and b are said to relatively prime or coprime if(a) gcd(a, b) = a (b) gcd(a, b) = 1 (c) gcd(a, b) = a ∗ b (d) All of the above.

6. For linear congruence equation ax ≡ b(mod m) where, d = gcd(a,m), if d does notdivide b then the equation has(a) Unique solution (b) Has no solution (c) Two solutions (d) None of the above.

7. Consider congruence equation, 8x ≡ 12(mod 28), then(a) Equation has no solution (b) has unique solution as 5 (c) 5,12,19 and 26 arethe four solutions (d) All of the above.

8. Solution of 235x ≡ 54(mod7) is(a) x ≡ 12(mod7) (b) x ≡ 3(mod7) (c) x ≡ 5(mod7) (d) x ≡ 4(mod7)

9. Remainder of 8103 from Fermat theorem when divided by 103 is(a) 8 (b) 7 (c) 6 (d) 10

10. The unit digit of 2100 is NET(June)11(a) 2 (b) 4 (c) 6 (d) 8

11. The number of elements in the setm : 1 ≤ m ≤ 1000,mand 1000 are relatively prime

is NET(June)11(a) 100 (b) 250 (c) 300 (d) 400

12. The number of positive divisors of 50000 is NET(June)12(a) 20 (b) 30 (c) 40 (d) 50

13. The last digit of (38)1031 is NET(June)12(a) 6 (b) 2 (c) 4 (d) 8

Section-B[Objective Questions]

1. Show that every integer > 1 has a prime factor.

2. If n > 1 is an integer, show that there exists a prime integer p such that p divides n.

3. Explain the fundamental theorem of arithmetic by an example.

4. Let a, b, n be positive integers such that a ≡ b (mod n); show that (a, n) = (b, n).

5. Show that 3^{2n} ≡ 1 (mod 8), for all integers n ≥ 1.

6. Find all the integral solutions of the equation 5x+ 4y = 9.

7. Explain the Wilson’s theorem for integers by an example.

Page 153: Linear Algebra by Nayak

146 Theory of Numbers

8. Find φ(14), where φ is the Euler function.

9. Define the Mobius µ-function. Find µ(15).

10. Find the highest power of 7 contained in 1000!. Ans: 164

Section-C[Long Answer Questions]

1. Prove the following by mathematical induction:

(a) 1·2 + 2·2² + 3·2³ + 4·2⁴ + ··· + n·2ⁿ = (n − 1)2^{n+1} + 2, ∀ n ∈ N.

(b) 1 + 2 + ··· + n = n(n+1)/2, ∀ n ∈ N. KU(H)'09

(c) 1·1! + 2·2! + ··· + n·n! = (n + 1)! − 1, ∀ n ≥ 1.

(d) 1/1² + 1/2² + 1/3² + ··· + 1/n² ≤ 2 − 1/n, ∀ n ≥ 1.

(e) 1/2 + 2/2² + 3/2³ + ··· + n/2ⁿ = 2 − (n + 2)/2ⁿ, ∀ n ≥ 1.

2. Prove by mathematical induction that, for every n ∈ N,

(a) 3^{3n+3} − 8n − 7 is divisible by 49. JECA'06

(b) 3^{2n+1} + 2^{n+2} is divisible by 7.

(c) 3^{2n} − 1 is not exactly divisible by 2^{n+3}.

(d) 2^{2n+1} − 9n² + 3n − 2 is divisible by 54. KU(H)'07

(e) 7^{2n} + 16n − 1 is divisible by 64.

(f) 3^{2n} − 8n − 1 is divisible by 64.

3. Prove the following inequalities by induction on n ∈ N:

(a) 2ⁿ < n! for all n ≥ 4.

(b) n! > 3ⁿ for all integers n ≥ 7.

(c) n² < n! for all integers n ≥ 4.

4. Using the division algorithm, show that

(a) The product of any k consecutive integers is divisible by k!.

5. (a) Show that n² + 2 is not divisible by 4 for any integer n.

(b) If p is a prime greater than 3, show that 24|p² − 1.

(c) If n is an integer not divisible by 2 or 3, then 32|(n² + 3)(n² + 7).

(d) If n is an odd integer, then 24|(n² + 3).

(e) Prove that (2n)! is divisible by n!(n + 1)!.

6. (a) Let a1, a2, . . . , an be any non-zero integers and d = (a1, a2, . . . , an). Then ∃m1,m2, . . . ,mn ∈ Z such that d = a1m1 + a2m2 + · · ·+ anmn.

(b) Prove that for any two integers u and v where v > 0 there exists two uniqueintegers m and n such that u = mv + n, where 0 ≤ n < v.

(c) If (a, b) = [a, b] for two positive integers a, b, prove that a = b. BH ′97

(d) If a > 1, prove that (aᵐ − 1, aⁿ − 1) = a^{(m,n)} − 1.

7. Prove that √2, √3, √5 are irrational numbers.

Page 154: Linear Algebra by Nayak

Arithmetic Functions 147

8. Prove that the product of the first n Fermat numbers is 2^{2ⁿ} − 1.

9. Prove that every square integer is either of the forms 5m, 5m + 1, 5m − 1, m is aninteger.

10. Prove that the prime factor of n2 + 1 is of the form 4m+ 1.

11. Find integers m,n such that

(a) (95, 102) = 95m+ 102n.

(b) (723, 24) = 723m+ 24n.

(c) (426, 246) = 426m+ 246n.

12. If (a, b) = 1, show that

(a) (a+ b, a− b) = 1 or 2.

(b) (a + b, ab) = 1 and (a² + b², a²b²) = 1.

13. If k be a positive integer, then (ka, kb) = k(a, b).

14. (a) Prove that n¹² − 1 is divisible by 7, if (n, 7) = 1.

(b) If n and n² + 8 are both prime numbers, prove that n = 3.

(c) If 2ⁿ − 1 is prime, prove that n is a prime.

(d) Prove that n⁴ + 4ⁿ is a composite number for every natural number n > 1.

15. Show that a natural number is divisible by 9 if and only if the sum of its digits isdivisible by 9.

16. Find the integer between 1 and 1000 which leaves the remainder 1, 2, 6 when dividedby 9, 11, 13 respectively.

17. Solve the Diophantine equations:

(a) 56x+ 72y = 40 : x = 20 + 9t, y = −15− 7t.

(b) 8x− 27y = 125 : x = −1169− 27t, y = −351− 8t.

(c) 7x+ 11y = 1 : x = 8− 11t, y = −5 + 7t.

(d) 68x− 157y = 1 : x = −30− 157t, y = −13− 68t.

(e) 13x− 17y = 5 : x = 20− 17t, y = 15− 13t.

18. The sum of two positive integers is 100. If one is divided by 7 the remainder is 1, andif the other is divided by 9 the remainder is 7. Find the numbers. Ans: 57, 43.

19. For any natural number n show that

(a) (2n + 1)² ≡ 1 (mod 8).

(b) 4·6ⁿ + 5^{n+1} ≡ 9 (mod 20).

20. Show that

(a) 2⁴¹ ≡ 3 (mod 23).

(b) 3¹⁵ ≡ 1 (mod 13).

21. Find all the natural numbers n ≤ 100 that satisfy(i) n ≡ 10(mod7) (ii) n ≡ 3(mod17) (iii) n ≡ 10(mod17).

Page 155: Linear Algebra by Nayak

148 Theory of Numbers

22. If a ≡ b(mod m) and x ≡ y(mod m), then prove that

(a) ap + xq ≡ (bp + yq) (mod m)

(b) ax ≡ by(mod m).

23. If a ≡ b(mod m) then prove that (a,m) = (b,m), i.e. congruent numbers have thesame GCD with m.

24. Solve the linear congruence :

(a) 7x ≡ 3(mod15) : Ans: x ≡ 9(mod15)

(b) 37x ≡ 7(mod127) : Ans: x ≡ 86(mod127)

(c) 29x ≡ 1(mod 13).

(d) 15x ≡ 9(mod18) : x = −3 + 6t; t = 0, 1, 2.

25. A certain number of sixes and nines are added to give a sum of 126. If the number ofsixes and nines are interchanged, the new sum is 114. How many sixes and nines werethere originally?

26. Show that the solution of the systemx ≡ a(mod 21), x ≡ b(mod 16) is x ≡ 64a− 63b(mod 336).

27. Find the solution of the system with the help of Chinese Method :

(a) x ≡ 5(mod 4), x ≡ 3(mod 7), x ≡ 2(mod 9).

(b) x ≡ 3(mod 6), x ≡ 5(mod 8), x ≡ 2(mod 11).

(c) x ≡ 1(mod 3), x ≡ 2(mod 5), x ≡ 3(mod7). Ans: x ≡ 52(mod 105)

28. Solve the system of congruence

(a) x ≡ 1(mod 3), x ≡ 2(mod 4), x ≡ 3(mod 5).

(b) x ≡ 11(mod 15), x ≡ 6(mod35), Ans: x ≡ 41(mod 105).

29. Use Fermat’s theorem to prove that for two positive integers a, b; a40− b40 is divisibleby 541 if both a and b are prime to 541.

30. Use Fermat’s theorem to prove that

(a) 1! + 2! + 3! + · · ·+ 79! + 80! ≡ 1(mod 80).

(b) 1^{p−1} + 2^{p−1} + 3^{p−1} + ··· + (p−1)^{p−1} ≡ −1 (mod p)

(c) 1ᵖ + 2ᵖ + 3ᵖ + ··· + (p−1)ᵖ ≡ 0 (mod p)

when p is an odd prime.

31. If p is an odd prime, then show that

2²·4²·6²···(p−1)² ≡ (−1)^{(p+1)/2} (mod p).

32. Show that 28! + 233 ≡ 0(mod899).


Chapter 3

Theory of Matrices

In this chapter we investigate the concepts and properties of matrices and discuss some of the simple operations by which two or more matrices can be combined. Matrices are important in every field of science and engineering.

3.1 Matrix

A matrix is a collection of numbers ordered by rows and columns. It is customary to enclose the numbers of a matrix in brackets [ ] or parentheses ( ). For example, the following is a matrix:

A = [ 3 5 −3
      0 6  1 ].

This matrix has two rows and three columns and it is referred to as a “2 by 3” or 2× 3matrix. Let F be a field of scalars and let the elements aij (i = 1, 2, · · · ,m; j = 1, 2, · · · , n),not necessarily distinct, belong to the field F . If we construct a rectangular array A of mnquantities aij into m rows and n columns, then A is said to be a matrix of order or sizem× n (read as m by n) over the field F and usually written in the form

A = [ a11 a12 ··· a1n
      a21 a22 ··· a2n
      ⋮   ⋮        ⋮
      am1 am2 ··· amn ].  (3.1)

The mn quantities aij are called elements or constituents or coordinates or entries of the matrix. Frequently, the matrix may be written simply as A = [aij] or [aij]m×n or (aij) or (aij)m×n, where aij is the ij-th entry, appearing in the ith row and jth column. The numbers a11, a22, ..., ann form the main or leading or principal diagonal.

Also, when the elements of the matrix A belong to the field F = R of real numbers, A is called a real matrix.

3.1.1 Special Matrices

Row and column matrices

We draw attention to the fact that each row of an m × n matrix has n components, where n is the number of columns, and each column has m components, where m is the number of rows. A matrix having a single row (column) is called a row (column) matrix. The ith row and jth column of the matrix A are

[ai1 ai2 ··· ain], 1 ≤ i ≤ m;    [ a1j
                                   a2j
                                   ⋮
                                   amj ], 1 ≤ j ≤ n,  (3.2)

respectively. For example, consider the real matrix

A = ( 1 6 7
      2 4 3 )

of order (size) 2 × 3. It has two rows, [1 6 7] and [2 4 3], and three columns,

( 1    ( 6    ( 7
  2 ),   4 ),   3 ).

A 1 × n or n × 1 matrix is also known as an n-vector. Row and column matrices are sometimes called row vectors and column vectors. A matrix having only one row is called a row matrix, while a matrix having only one column is called a column matrix.

3.1.2 Square Matrix

For an m × n matrix [aij ]m×n if m = n, i.e., the number of rows equal to the number ofcolumns, then the matrix is said to be a square matrix. A n × n square matrix is said tobe of order n and is sometimes known as n square matrix. The elements a11, a22, · · · , ann

are known as diagonal elements of A. For example,

1 6 72 4 34 3 6

is a square matrix of order 3.

with 1, 4, 6 is the leading diagonal.

Null matrix

A matrix whose entries are all zero, i.e., aij = 0, for all pairs of i and j, then the matrixA = [aij ]m×n is said to be a null or zero matrix of order m×n and is denoted by 0m×n. For

example, 0 =(

0 0 00 0 0

)is an example of a 2× 3 null matrix. If any one of aij ’s is non zero,

then A is said to a non-zero matrix.

Diagonal matrix

A square matrix A with all non-diagonal elements as zero, is called a diagonal matrix. For

example,(

1 00 4

),(

8 00 0

),(

0 00 0

)are the examples of diagonal matrices. So, for a diagonal

matrix A = [aij ], aij = 0, for i 6= j and it is denoted by A = diag(d11, d22, · · · , dnn). If ina diagonal matrix, all the elements are equal, then the diagonal matrix is called scalar orconstant matrix. Thus for a scalar matrix A = [aij ], we have,

aij = k; for i = j;= 0; for i 6= j

and is denoted by [k]. For example, the diagonal matrix(

2 00 2

)is scalar matrix.

Ex 3.1.1 If a matrix B commutes with a diagonal matrix, no two diagonal elements ofwhich are equal to each other, show that B must be a diagonal matrix.

Solution: Let A be a diagonal matrix of order n whose elements are

aij = ai δij ; 1 ≤ i, j ≤ n,

Page 158: Linear Algebra by Nayak

Matrix 151

where ai are scalars such that ai 6= aj if i 6= j. Let the ijth element of B be bij . Given thatAB = BA, so taking ijth elements of both sides, we have,

n∑p=1

aip bpj =n∑

p=1

bip apj

or,n∑

p=1

ai δip bpj =n∑

p=1

bip aj δpj

or, ai bij = bij aj ⇒ (ai − aj) bij = 0.

This shows that, if i 6= j, then bij = 0. The only elements of B which are likely to be differentfrom zero are the diagonal elements bii for 1 ≤ i ≤ n, proving that B is a diagonal matrix.

Identity matrix

If in a scalar matrix all the diagonal elements are unity, then it is called identity matrix orunit matrix. The nth order identity matrix is denoted by In and is written as

In =

1 0 · · · 00 1 · · · 0...

......

0 0 · · · 1

. (3.3)

The identity matrix can be written as I = [δij ], where δij is the kronecker delta, defined by,δij = 0, if i 6= j and δij = 1, if i = j. We shall denote the ith column of I by ei. Thus ei

has 1 in the ith position and 0’s elsewhere. A permutation matrix is a square matrix withentries 0’s and 1’s such that each row and each column contains exactly one 1.

Triangular matrix

If in a square matrix A = [aij ], all the elements below the diagonal are zero, i.e., aij = 0,for i > j, then the square matrix is said to be an upper triangular matrix and unit upper

triangular if aii = 1; aij = 0, i > j for all i, j. For example,

−8 4 90 4 70 0 6

is an upper

triangular matrix.If in a square matrix A = [aij ], all the elements above the diagonal are zero, i.e., aij = 0,

for i < j, then the square matrix is said to be an lower triangular matrix. For example,−8 0 02 4 01 3 6

is a lower triangular matrix and unit lower triangular if aii = 1; aij = 0, i < j

for all i, j. A square matrix A = [aij ] is said to be a triangular matrix, if it is either uppertriangular or lower triangular. In a diagonal matrix the non-diagonal elements are all zero,so diagonal matrix is both upper and lower triangular.

A matrix is said to be upper Hessenberg if aij = 0 when i > j + 1 and lower Hessenbergif aij = 0 for i < j − 1.

Ex 3.1.2 Find an upper triangular matrix A such that A3 =(

8 −570 27

).

Solution: Let the required upper triangular matrix be A =(a b0 c

), then,

Page 159: Linear Algebra by Nayak

152 Theory of Matrices

A2 =(a b0 c

)(a b0 c

)=(a2 ab+ bc0 c2

)A3 = AA2 =

(a b0 c

)(a2 ab+ bc0 c2

)=(a3 a2b+ abc+ bc2

0 c3

),

⇒ a3 = 8; c3 = 27; a2b+ abc+ bc2 = −57

⇒ a = 2, c = 3, b = −3,⇒ A =(

2 −30 3

).

Trace of a matrix

The spur or trace of a square matrix A = [aij ]n×n is the sum of the diagonal elements as

trA = a11 + a22 + · · ·+ ann =n∑

i=1

aii. (3.4)

For example, the trace of the above matrices are 5 + 4 = 9 and 1 + 4 + 6 = 11 respectively.If A be an m × n real matrix, then tr(AAT ) ≥ 0, the equality occurs if A is a null matrix.If A and B are square matrices of the same order, then

(i) trA+ trB = tr(A+B).

(ii) trAT = trA.

(iii) tr(BA) = tr(AB).

For an m × n matrix if m 6= n, i.e., the number of rows not equal to the number of

columns, then the matrix is said to be a rectangular matrix. For example, A =(

1 6 72 4 3

)is

a rectangular matrix or order 2× 3.

Ex 3.1.3 If A and B are any two 2 × 2 matrics, show that AB − BA = I2 cannot holdunder any circumstances.

Solution: If possible, let, AB −BA = I2, then

tr(AB −BA) = tr(I2)or, tr(AB)− tr(BA) = tr(I2) = 1 + 1 = 2.

But tr(AB) = tr(BA); hence this cannot hold. Therefore, AB−BA = I2 cannot hold underany circumstances.

Band matrix

A real matrix A = [aij ]m×n is said to be band matrix with bandwidth k if

aij = 0 for |i− j| > k. (3.5)

If k = 1, then the matrix is called tridiagonal and if k = 0, then it is called diagonal. It iscalled diagonally dominant if

|aii| ≥n∑

j=1;i 6=j

|aij |; i = 1, 2, ..., n. (3.6)

Equations containing a diagonal matrix can be easily solved and hence some algorithms forsolution of linear equations actually try to transform the original matrix to an equivalentdiagonal form.

Page 160: Linear Algebra by Nayak

Matrix Operations 153

Filled and sparse matrix

If most elements of a matrix are nonzero, then it is said to be filled, while if most of theelements are zero, then it is said to be sparse.

3.2 Matrix Operations

In this section, we are to define the algebraic operations on matrices that will produce newmatrices out of given matrices. These operations are useful in application of matrices.

3.2.1 Equality of matrices

Two matrices A = [aij ]m×n and B = [bij ]m×n are said to be equal iff they are of the sameorder and each element of A is equal to the corresponding element of B, i.e., aij = bij for all

i and j. For example, A =(

23 52

32 43

)and B =

(8 259 64

)are equal matrices. Two matrices

are said to be comparable, if they are of the same type.

Ex 3.2.1 Find the values of x, y, z and u which satisfy the matrix equation(x+ 3 2y + xz − 1 4u− 6

)=(

0 −73 2u

).

Solution: Since the matrices are equal, x + 3 = 0, 2y + x = −7, z − 1 = 3, 4u − 6 = 2u.Solution of these equations is x = −3, z = 4, y = −2 and u = 3. Hence the required valuesof x, y, z, u are −3,−2, 4, 3 respectively.

3.2.2 Matrix Addition

For addition of two matrices, the matrices must be of same order. Let A = [aij ]m×n andB = [bij ]m×n be two given matrices of the same order. The sum of A and B, denoted byA+B, is obtained by adding the corresponding elements of A and B as

A+B = C = [cij ]m×n,

then the elements of C can be written as

cij = aij + bij ; 1 ≤ i ≤ m, 1 ≤ j ≤ n.

Let, A =(

5 67 8

), B =

(3 12 0

)and C =

(2 1 39 6 −1

)be three matrices of order 2×2, 2×2, 2×3

respectively. As A and B are in same order, A+B is defined and

A+B =(

5 67 8

)+(

3 12 0

)=(

5 + 3 6 + 17 + 2 8 + 0

)=(

8 79 8

).

Since the two matrices A and C are not of same order, then they are not conformable foraddition, i.e., A+C and hence B+C are not defined. Matrix subtraction works in the sameway, except that the elements are subtracted instead of added. For example, if

A =[a1 b1 c1a2 b2 c2

]and B =

[x1 y1 z1x2 y2 z2

]then,

A−B =[a1 − x1 b1 − y1 c1 − z1a2 − x2 b2 − y2 c2 − z2

].

Page 161: Linear Algebra by Nayak

154 Theory of Matrices

3.2.3 Matrix Multiplication

Multiplication of matrices by a scalar

If A is a matrix [aij ]m×n and k is a scalar quantity, then the product kA or Ak is the matrix[bij ]m×n where bij = kaij . Thus, if k ∈ F and A = [aij ]m×n be any matrix, then

P = kA = [pij ]m×n; where pij = kaij ; 1 ≤ i ≤ m, 1 ≤ j ≤ n.

Therefore, we see that for scalar multiplication, each element of P is obtained by multiplyingthe corresponding element of A by k. The negative of A is obtained by multiplying by (−1)scalarly. The difference between two matrices A and B of same order m× n is defined as

A−B = A+ (−1)B.

For example, Let, A =(

5 67 8

)and B =

(3 12 0

)be two matrices of order 2× 2 respectively.

So, the scalar multiplication by 3 of A and A−B is given by

3A = 3(

5 67 8

)=(

3.5 3.63.7 3.8

)=(

15 1821 24

)A−B =

(5 67 8

)−(

3 12 0

)=(

5− 3 6− 17− 2 8− 0

)=(

2 55 8

).

Two m× n matrices A and B are equal, if (A− B) equals to the null matrix. Let A,B betwo matrices such that A+B and AB is defined. Then the following properties are satisfied:

(i) kA = Ak.

(ii) k(A+B) = kA+ kB; k ∈ F

(iii) (k + l)A = kA+ lA; k, l ∈ F

(iv) A(kB) = k(AB) = (kA)B.

Thus, the scalar multiplication of matrices is commutative, associative and distributive. IfA1, A2, · · · , Ak are m × n matrices and c1, c2, · · · , ck are scalars, then an expression of theform

c1A1 + c2A2 + · · ·+ ckAk

is called a linear combination of A1, A2, · · · , Ak and c1, c2, · · · , ck are called coefficients.

Theorem 3.2.1 Matrix addition is commutative as well as associative.

Proof: Let A = [aij ]m×n, B = [bij ]m×n and C = [cij ]m×n be three matrices of same order,so that A+B,B + C,A+ C,B +A,C +A,C +B are all defined. Let

X = A+B = [xij ]m×n; where, xij = pij + qij

Y = B +A = [yij ]m×n; where, yij = qij + pij .

Here, X any Y are of same orders and

xij = pij + qij = qij + pij ; as pij , qij ∈ F= yij ; for all 1 ≤ i ≤ m and 1 ≤ j ≤ n.

or, A+B = B +A.

Page 162: Linear Algebra by Nayak

Matrix Operations 155

Hence the matrix addition is commutative. Now,

(A+B) + C = [xij ]m×n + [cij ]m×n

= [rij ]m×n; where, rij = xij + cij = aij + bij + cij ,

A+ (B + C) = [aij ]m×n + [sij ]m×n

= [tij ]m×n; where, tij = aij + sij = aij + bij + cij .

Since, rij = tij for every pair of i and j, we have (A + B) + C = A + (B + C). Therefore,matrix addition is associative.

Since matrix addition is associative, we can define A+B +C as the matrix A+ (B +C)which is the same as (A+B) + C. We can extended it as

n∑i=1

Ai = A1 +A2 + · · ·+An.

Multiplication of a matrix by another matrix

If the number of columns of a matrix A be equal to the number of rows of another matrix B,then the matrices A and B are said to be conformable for the product AB and the productAB is said to be defined.

The number of rows and the number of columns of C are equal to the number of rows ofA and the number of columns of B, respectively. Let

A =[a1 b1 c1a2 b2 c2

]and B =

x1 y1x2 y2x3 y3

, then

AB =−−−−−−−→[a1 b1 c1a2 b2 c2

]yx1 y1x2 y2x3 y3

=[a1.x1 + b1.x2 + c1.x3 a1.y1 + b1.y2 + c1.y3a2.x1 + b2.x2 + c2.x3 a2.y1 + b2.y2 + c2.y3

].

For the product of two matrices A and B, the number of columns of the matrix A mustbe equal to the number of rows of matrix B, otherwise it is impossible to find the product ofA and B. Let A = [aij ]m×p and B = [bij ]p×n be two matrices. Here A,B are conformablefor the product AB. The ijth element is obtained by multiplying the ith row of A by thejth column of B. Hence,

a11 a12 · · · a1p

a21 a22 · · · a2p

......

...am1 am2 · · · amp

b11 b12 · · · b1n

b21 b22 · · · b2n

......

...bp1 bp2 · · · bpn

=

c11 c12 · · · c1n

c21 c22 · · · c2n

......

...cn1 cn2 · · · cnn

.

In the product, the matrix A is called the pre-factor and B is called the post-factor. Clearly,AB is the m× n matrix, whose ijth element

cij = ai1b1j + ai2b2j + · · ·+ aipbpj =p∑

k=1

aikbkj . (3.7)

In the product, we say that B is pre-multiplied by A and B is post-multiplied by B. Inorder that both AB and BA should exist, if A be of order m× n, B be of order n×m.

In general matrix multiplication is not commutative. The difference between the twomatrices AB and BA is known as the commutator of A and B and is denoted by

[A,B] = AB −BA. (3.8)

Page 163: Linear Algebra by Nayak

156 Theory of Matrices

If should be clear that [B,A] = −[A,B]. If, in particular, AB is equal to BA, the twomatrices A and B are said to be commute with each other. AB and BA are equal onlywhen both the matrix A and B are square matrix of same order. The anticommutator ofthe matrices A and B, denoted by A,B is defined by,

A,B = AB +BA. (3.9)

Ex 3.2.2 Consider the matrices A =(

1 53 −2

), B =

(2 3 −14 0 5

). Here A is of order 2 × 2

and B is of order 2× 3. So the product AB is defined and

AB =(

1.2 + 5.4 1.3 + 5.0 1.(−1) + 5.53.2 + (−2).4 3.3 + (−2).0 3.(−1) + (−2).5

)=(

22 3 24−2 9 −13

)which is of order 2× 3. Notice that BA is not defined here.

Ex 3.2.3 Consider the matrices A =(

1 53 −2

)and B =

(−2 14 6

), then,

AB =(

18 31−14 −9

), BA =

(1 −1222 8

).

Hence AB,BA both are defined but AB 6= BA. The commutator of A and B is

[A,B] = AB −BA =(

18 31−14 −9

)−(

1 −1222 8

)=(

17 43−36 −17

).

The anticommutator of A and B is

A,B = AB +BA =(

18 31−14 −9

)+(

1 −1222 8

)=(

19 198 −1

).

Ex 3.2.4 Consider the matrices P =(

2 33 5

)and Q =

(1 00 1

). Here

PQ =(

2 33 5

)(1 00 1

)=(

2 33 5

)= QP

So we can conclude that, if A is an m × p matrix and B is a p × n matrix, then AB is anm×n matrix. BA is not defined if m 6= n. If m = n, then order of AB and BA are differentsizes. Even if both AB and BA are defined they may not be of the same order and hencemay not be equal. Even if AB and BA are defined and are same order they may not beequal.

Result 3.2.1 In ordinary algebra, we know,ab = 0 ⇒ either a = 0 or b = 0.

But in matrix theory, if AB = 0, then it is not necessarily imply that either A = 0 or B = 0.For example, let,

A =(

1 22 4

), B =

(6 −4−3 2

), then

AB =(

1 22 4

)(6 −4−3 2

)=(

0 00 0

).

In this case A is called the left divisor of zero and B is called right divisor of zero.

Page 164: Linear Algebra by Nayak

Matrix Operations 157

Result 3.2.2 In ordinary algebra, we know,ab = ac⇒ either a = 0 or b = c.

But in matrix theory, if AB = AC, then it is not necessarily imply that either A = 0 orB = C. For example, let,

A =(

1 22 4

), B =

(4 2 63 6 9

)and C =

(0 6 85 4 8

),

then AB =(

10 14 2420 28 48

)= AC.

but neither A = 0 nor B = C.

Result 3.2.3 Let us consider the matrix multiplication with special structures.

(i) The multiplication of(−13

)and [2 4] gives

(−13

)[2 4] =

(−2 −46 12

).

(ii) Let us consider AT = [1 0 − 2] and B =

235

, then ATB = [−8].

(iii) Let CT = [1 0 − 3 4], then

BCT =

235

[1 0 − 3 4] =

2 0 −6 83 0 −9 125 0 −15 20

.

Ex 3.2.5 If A =[

3 −41 −1

], prove that An =

[1 + 2n −4nn 1− 2n

], where n is a positive integer.

Solution: We shall prove this by using the principle of mathematical induction. Now,

A2 = A.A =[

3 −41 −1

] [3 −41 −1

]=[

3.3 + (−4).1 3.(−4) + (−4).(−1)1.3 + (−1).1 1.(−4) + (−1).(−1)

]=[

5 −82 −3

]=[

1 + 2.2 −4.22 1− 2.2

]A3 =

[5 −82 −3

] [3 −41 −1

]=[

7 −123 −5

]=[

1 + 2.3 −4.33 1− 2.3

].

Thus the result is true for n = 1, 2, 3. Let the result be true for n = k, then,

Ak+1 = Ak.A =[

1 + 2k −4kk 1− 2k

] [3 −41 −1

]=[

3 + 2k −4− 4kk + 1 −1− 2k

]=[

1 + 2(k + 1) −4(k + 1)k + 1 1− 2(k + 1)

].

Thus the result is true for n = k+1 if it is true for n = k, but it is true for n = 1, 2, 3. Thus

by the principle of mathematical induction, An =[

1 + 2n −4nn 1− 2n

].

Theorem 3.2.2 Matrix multiplication is associative.

Page 165: Linear Algebra by Nayak

158 Theory of Matrices

Proof: Let A = [aij ]m×n, B = [bjk]n×p and C = [ckj ]p×q be three matrices such that theproducts A(BC) and (AB)C are defined. We are to show that A(BC) = (AB)C. Now,

AB = [dik]m×p; BC = [ejl]n×q,

where, dik =n∑

j=1

aijbjk and ejl =p∑

k=1

bjkckl. Now, (AB)C = [uil]m×q, where

uil =p∑

k=1

dikckl =p∑

k=1

n∑j=1

aijbjkckl.

Now, A(BC) = [vil]m×q, where,

vil =n∑

j=1

aijcjl =n∑

j=1

p∑k=1

aijbjkckl.

Since the sums are equal, i.e., corresponding elements in (AB)C and A(BC) are equal, soA(BC) = (AB)C.

Theorem 3.2.3 Matrix multiplication is distributive over addition, i.e., if A,B,C be threematrices such that A(B + C), AB and AC, BC are defined, then

(i) A(B + C) = AB +AC, left distributive,

(ii) (A+B)C = AC +BC, right distributive.

Proof: Let A = [aij ]m×n, B = [bjk]n×p and C = [ckj ]p×q be three matrices such thatA(B + C), AB and AC, BC are defined. Now,

B + C = [djk]n×p, where, djk = bjk + cjk.

Let, A(B + C) = [eik]m×p, then,

eik =n∑

j=1

aijdjk =n∑

j=1

aij(bjk + cjk)

=n∑

j=1

aijbjk +n∑

j=1

aijcjk.

Let AB = [fik]m×p and AC = [gik]m×p, then fik =n∑

j=1

aijbjk, gik =n∑

j=1

aijcjk. If AB+AC =

[hik]m×p, then,hik = fik + gik =

n∑j=1

aijbjk +n∑

j=1

aijcjk.

As the corresponding elements of A(B + C) and AB + AC are equal, so we conclude thatA(B+C) = AB+AC. Similarly, the right distributive (A+B)C = AC +BC. Also, if k bea scalar, then k(AB) = (kA)B = A(kB). Using distributive laws, we can prove that(

k∑i=1

Ai

) l∑j=1

Bj

=k∑

i=1

l∑j=1

AiBj ,

where the summation on the RHS can be taken in any order.

Page 166: Linear Algebra by Nayak

Matrix Operations 159

Definition 3.2.1 Let A be a square matrix. For any positive integer m, Am is defined, as

Am = A.A. · · ·A(m times ).

when A is an n × n non-null matrix, we define, A0 = In, in analogy with real numbers.Using the laws of matrix multiplication, it is easy to see that for a square matrix A,

AmAn = Am+n and (Am)n = Amn

for non-negative integers m and n. It is important that, (AB)n 6= AnBn, in general, theequality holds only when AB = BA. Also, it follows that AmAn = AnAm, i.e., the powersof A commute.

Ex 3.2.6 If

413

A =

−4 8 4−1 2 1−3 6 3

, then find A.

Solution: Let the given equation be of the form XA = B. Since the size of the matrix Xis 3× 1 and that of the matrix B therefore the size of the matrix A should be 1× 3. Hencewe can take A = [a b c]. Now, from the given relation we have4

13

[a b c] =

−4 8 4−1 2 1−3 6 3

or,

4a 4b 4ca b c

3a 3b 3c

=

−4 8 4−1 2 1−3 6 3

.

Equating both sides we get, 4a = −4, 4b = 8, 4c = 4; a = −1, b = 2, c = 1; 3a = −3, 3b =6, 3c = 3. Therefore, a = −1, b = 2, c = 1. Hence the required matrix A is [−1 2 1].

Ex 3.2.7 If n ∈ N and A =[

cos θ sin θ− sin θ cos θ

]then show that An =

[cosnθ sinnθ− sinnθ cosnθ

].

Solution: Here we use the principle of mathematical induction. Now,

A2 =[

cos θ sin θ− sin θ cos θ

] [cos θ sin θ− sin θ cos θ

]=[

cos2 θ − sin2 θ 2 sin θ cos θ−2 sin θ cos θ cos2 θ − sin2 θ

] [cos 2θ sin 2θ− sin 2θ cos 2θ

].

Thus the result be true for n = 2. Let the result be true for n = k. Now,

Ak+1 = AkA =[

cos kθ sin kθ− sin kθ cos kθ

] [cos θ sin θ− sin θ cos θ

]=[

cos θ cos kθ − sin kθ sin θ sin kθ cos θ + sin θ cos kθ− sin kθ cos θ − sin θ cos kθ cos θ cos kθ − sin kθ sin θ

]=[

cos(k + 1)θ sin(k + 1)θ− sin(k + 1)θ cos(k + 1)θ

].

Therefore, the result is true for n = k + 1 if the result is true for n = k. But, the resultis true for n = 2. Hence the result is true for n = 2 + 1 = 3, 3 + 1 = 4, . . .. Thus, byMathematical induction the result is true for n = 2, 3, 4, . . . etc, i.e., for any positive integer.

Page 167: Linear Algebra by Nayak

160 Theory of Matrices

3.2.4 Transpose of a Matrix

Let A = [aij ]m×n be a given matrix. An n×m matrix, obtained by interchanging rows andcolumns of A as AT = [aji]n×m is said to be the transpose of the matrix A. For example,let,

A =(

2 3 63 5 −7

)and B = [2 − 1 5]

then, AT =

2 33 56 −7

and BT =

2−15

.

Thus, transpose of the transpose of a matrix is the given matrix itself, i.e., (AT )T = A.

Theorem 3.2.4 If A and B be two matrices such that A+B is defined, then (A+B)T =AT +BT .

Proof: Let A = [aij ]m×n and B = [bij ]m×n be two given matrices such that A + B isdefined. Also, let, A+B = [cij ]m×n, where, cij = aij + bij . Now,

ijth element of (A+B)T = jith element of (A+B)= jith element of [cij ]m×n = jith element of [aij + bij ]m×n

= jith element of [aij ]m×n + jith element of [bij ]m×n

= ijth element of [aji + ijth element of bji]m×n

= ijth element of AT + ijth element of BT

= ijth element of (A+B)T .

Also, order of (A+B)T is n×m and order of (AT +BT ) is n×m. Hence, (A+B)T = AT +BT .If k be a scalar, then (kA)T = kAT . If A,B be two matrices of same order, then,

(rA+ sB)T = rAT + sBT ,

provided s, t are scalars.

Theorem 3.2.5 If A,B be two matrices of appropriate sizes, then (AB)T = BTAT .

Proof: Let A = [aij ]m×n and B = [bjk]n×p be two given matrices such that AB is definedand the order is m× p. Also, order of AT is n×m and order of BT is p× n, so that orderof BTAT is p×m. Therefore,

order of (AB)T = order of BTAT .

Now, ijth element of AB is obtained by multiplying ith row of A with kth column of B,which is

ai1b1k + ai2b2k + · · ·+ ainbnk.

Also, ikth element of AB = kith element of (AB)T , which is

ai1b1k + ai2b2k + · · ·+ ainbnk.

Also, column k of B becomes kth row of BT and ith row of A becomes ith column of AT .Now,

kith element of BTAT = [b1k b2k · · · bnk][ai1 ai2 · · · ain]T

= b1kai1 + b2kai2 + · · ·+ bnkain.

= ai1b1k + ai2b2k + · · ·+ ainbnk

= kith element of (AB)T .

Page 168: Linear Algebra by Nayak

Few Matrices 161

Therefore, (AB)T = BTAT , i.e., transpose of the product of two matrices is equal to theproduct of their transposes taken in reverse order. This statement can be extended to severalmatrices as

(A.B · · ·KL)T = LTKT · · ·BTAT .

This can be proved by induction. From this result, it follows, if A is a square matrix, then(An)T = (AT )n, n ∈ N .

Ex 3.2.8 Find the matrices A and B such that 2A+ 3B = I2 and A+B = 2AT .

Solution: The given system is 2A + 3B = I2 and A + B = 2AT . Now from the firstequation, we have B = 2AT −A. Therefore, from the first equation we have,

2A+ 3B = I2 ⇒ 2A+ 6AT − 3A = I2

⇒ −A+ 6AT = I2 ⇒ −AT + 6A = I2; taking transpose.

Solving the equations −A+ 6AT = I2,−AT + 6A = I2, we get, A = 15I2. Using the relation

B = 2AT −A, we get, B = 15I2.

Ex 3.2.9 Find the matrices A and B such that

2A+BT =(

2 510 2

)and AT + 2B =

(1 84 1

).

Solution: Taking the transpose of first equation we get,

2AT +B =(

2 105 2

). Also, AT + 2B =

(1 84 1

)⇒ −3B =

(0 −6−3 0

)⇒ B =

(0 21 0

).

From the first given equation we get,

A =12

[(2 510 2

)−BT

]=

12

[(2 510 2

)−(

0 12 0

)]=

12

(2− 0 5− 110− 2 2− 0

)=(

1 24 1

).

3.3 Few Matrices

3.3.1 Nilpotent Matrix

For a least positive integer r, ifAr = 0, the null matrix, (3.10)

then the non-null matrix A is said to be nilpotent matrix of order r. The least value of r iscalled the index of it. For example, let

A =(

2 4−1 −2

), then, A2 =

(0 00 0

).

Therefore, A is a nilpotent matrix of index 2.

Ex 3.3.1 Show that A =(ab b2

−a2 −ab

)is a nilpotent matrix of index 2.

Page 169: Linear Algebra by Nayak

162 Theory of Matrices

Solution: We are to show that, A2 = 0. Now,

A2 =(ab b2

−a2 −ab

)(ab b2

−a2 −ab

)=(

0 00 0

).

Therefore, the given matrix A is a nilpotent matrix of index 2.

Ex 3.3.2 Find non-null real matrices(a bc d

)such that it is a nilpotent matrix of index 2.

Solution: Let A =(a bc d

)be non-null real matrices such that A2 = 0. Therefore,

(a bc d

)(a bc d

)=(a2 + bc ab+ bdac+ cd bc+ d2

)=(

0 00 0

)⇒ a2 + bc = 0; ab+ bd = 0; ac+ cd = 0; bc+ d2 = 0⇒ a2 + bc = 0; a = −d.

Therefore, the non null matrices are given by(a bc −a

); a2 + bc = 0; a, b, c, d ∈ <

.

3.3.2 Idempotent Matrix

A matrix A is said to be idempotent matrix if A2 = A. For example, let

A =

2 −3 −5−1 4 51 −3 −4

, then, A2 =

2 −3 −5−1 4 51 −3 −4

= A.

Therefore, A is an idempotent matrix. Identity matrix is idempotent as I2 = I.

Ex 3.3.3 If A be an idempotent matrix of order n, show that In −A is also idempotent.

Solution: Since A is an idempotent matrix, so by definition, A2 = A. Now,

(In −A)2 = (In −A)(In −A) = I2n − InA−AIn +A2

= I2n − 2AIn +A = In − 2A+A = In −A.

Hence, if A is an idempotent matrix, the matrix In −A is so.

Ex 3.3.4 If A and B are two matrices such that AB = A and BA = B, then show thatAT , BT and A,B are idempotent.

Solution: From the given first relation, AB = A, we have,(AB)T = AT ⇒ BTAT = AT .

Also, as B = BA, we have,(BA)TAT = AT ⇒ ATBTAT = AT

⇒ ATAT = AT ⇒ (AT )2 = AT .

From the relation, BA = B, we have,(BA)T = BT ⇒ ATBT = BT .

Page 170: Linear Algebra by Nayak

Few Matrices 163

Also, as A = AB, we have,(AB)TBT = BT ⇒ BTATBT = BT

⇒ BTBT = BT ⇒ (BT )2 = BT .

Therefore, AT and BT are idempotent. Also,

A = AB = A(BA) = (AB)A = AA = A2

and B = BA = B(AB) = (BA)B = BB = B2.

This shows that A and B are also indempotent.

3.3.3 Involuntary Matrix

A matrix A is said to be involuntary matrix if A2 = I. For example, let

A =

−5 −8 03 5 01 2 −1

, then, A2 =

1 0 00 1 00 0 1

= I.

Identity matrix is also involuntary as I2 = I, and hence, identity matrix is involuntary aswell as idempotent matrix.

Ex 3.3.5 Find all non-null real matrices A =(a bc d

)such that it is an involuntary matrix.

Solution: Here, we are to find all non-null real matrices A such that A2 = I2. Therefore,(a bc d

)(a bc d

)=(a2 + bc ab+ bdac+ cd bc+ d2

)=(

1 00 1

)⇒ a2 + bc = 1; ab+ bd = 0; ac+ cd = 0; bc+ d2 = 1⇒ a = ±1, d = ±1, a+ d = 0, a2 + bc = 0.

Therefore, the non null real matrices are given by(a bc −a

); a2 + bc = 0; a, b, c, d ∈ <

, I2,−I2.

3.3.4 Periodic Matrix

A matrix A is said to be periodic matrix if Ak+1 = A, where k is a positive integer, where

the least value of k is the period of A. For example, let A =

2 −3 −5−1 4 51 −3 −4

, then, A2 = A.

Therefore, A is a periodic matrix of period 2.

3.3.5 Symmetric Matrices

A square matrix A = [aij ]n×n is said to be symmetric, if

AT = A, i.e., aij = aji,∀ pairs of (i, j). (3.11)

For example, the matrix A =

a h gh b fg f c

is a symmetric. In the symmetric matrix A, the

elements of A are symmetric with respect to main diagonal of A. A diagonal matrix is

Page 171: Linear Algebra by Nayak

164 Theory of Matrices

always symmetric. Advantage of working with symmetric matrices A is that only half of Aneeds to be stored and the amount of calculation required is also halved.

A matrix A = [aij ]n×n is said to be pseudo-symmetric if

aij = an+1−j,n+1−i, ∀i and j. (3.12)

Now, note the following:

(i) If A,B be two symmetric matrices of the same order and c is a scalar, then A+B andcA are symmetric matrices.

(ii) However, if A,B are symmetric matrices of the same order, then AB may not be

symmetric. For example, let A =(

2 11 4

), B =

(3 44 1

), then AB =

(10 919 8

), which

is not symmetric. If A,B be two symmetric matrices of the same order then AB issymmetric if and only if AB = BA.

(iii) If A be a symmetric matrix, then An is symmetric for all n ∈ N .

(iv) The product of any matrix with its transpose is symmetric. If A be an m× n matrix,then AT and ATA are symmetric matrices of order m and n respectively.

(v) A matrix A is diagonal if and only if it is symmetric and upper triangular.

3.3.6 Skew-symmetric Matrices

A square matrix A = [aij ]n×n is said to be skew-symmetric, if

AT = −A, i.e., aij = −aji,∀ pairs of (i, j). (3.13)

For a skew symmetric matrix A = [aij ]n×n, we have by definition,

aii = −aii; for i = j ⇒ 2aii = 0, i.e., aii = 0.

So all the diagonal elements in a skew symmetric matrix is zero. For example, A =(

0 2−2 0

)is a skew symmetric matrix of order 2. If A be a skew-symmetric matrix, then An issymmetric or skew-symmetric according as n is even or odd positive integer.

Theorem 3.3.1 Every square matrix can be uniquely expressed as a sum of a symmetricmatrix and a skew symmetric matrix.

Proof: Let A be any given square matrix. Let us write,

A =12(A+AT ) +

12(A−AT ) = B + C, say,

where, B = 12 (A+AT ) and C = 1

2 (A−AT ). Now,

BT =12(A+AT )T =

12[AT + (AT )T

]=

12[AT +A] = B.

Therefore, B is a symmetric matrix. Again,

CT =12(A−AT )T =

12[AT − (AT )T

]=

12[AT −A] = −1

2[A−AT ] = −C.

Page 172: Linear Algebra by Nayak

Few Matrices 165

Therefore, C is a skew-symmetric matrix. So, every square matrix can be expressed as asum of a symmetric matrix and a skew symmetric matrix. Now, we are to show that therepresentation is unique. For this, let A = M + N, where M is symmetric and N is skewsymmetric. Now,

AT = (M +N)T = MT +NT = M −N

⇒ A+AT = 2M ; A−AT = 2N,

⇒ B =12(A+AT ) and C =

12(A−AT ).

Thus, the representation is unique. Therefore, every square matrix can be uniquely expressedas a sum of a symmetric matrix and a skew symmetric matrix.

Ex 3.3.6 Express A =

2 5 −37 −1 1−1 3 4

as a sum of a symmetric and skew symmetric matrix.

Solution: For the given matrix A, we have, AT =

2 7 −15 −1 3−3 1 4

. Now,

A+AT =

2 5 −37 −1 1−1 3 4

+

2 7 −15 −1 3−3 1 4

=

4 12 −412 −2 4−4 4 8

A−AT =

2 5 −37 −1 1−1 3 4

2 7 −15 −1 3−3 1 4

=

0 −2 −22 0 −22 2 0

.

Now the symmetric matrix is 12 (A + AT ) and the skew symmetric matrix is 1

2 (A − AT ).Therefore, 2 5 −3

7 −1 1−1 3 4

=12

4 12 −412 −2 4−4 4 8

+12

0 −2 −22 0 −22 2 0

.

This representation is unique. Therefore, the given square matrix A can be uniquely ex-pressed as a sum of a symmetric matrix and a skew symmetric matrix.

Ex 3.3.7 Show that (I3 −A)(I3 +A) is a symmetric matrix, where A is a 3× 3 symmetricor a skew symmetric matrix.

Solution: If A is symmetric, then AT = A and skew-symmetric if AT = −A. Let B =(I3 −A)(I3 +A), then,

B = (I3 −A)(I3 +A) = I3 +A−A−A2 = I3 −A2

BT = [(I3 −A)(I3 +A)]T = [I3 −A2]T = I3 − (AT )2 = I3 −A2,

whatever, A is a symmetric or a skew symmetric matrix. Hence, BT = B and consequentlyB = (I3 −A)(I3 +A) is a symmetric matrix.

3.3.7 Normal Matrix

A real matrix A is normal if it commutes with its transpose AT , i.e., if AAT = ATA. If A issymmetric, orthogonal or skew symmetric, then A is normal. There are also other normal

Page 173: Linear Algebra by Nayak

166 Theory of Matrices

matrices. For example, let, A =(

6 −33 6

), then,

AAT =(

6 −33 6

)(6 3−3 6

)=(

45 00 45

).

Since AAT = ATA, the matrix A is normal.

3.4 Determinants

A very important issue in the study of matrix algebra is the concept of determinant. In thissection, various properties of determinant are studied. The methods for its computationand one of its application are discussed.

The x eliminant of two linear equations a11x+ a12 = 0 and a21x+ a22 = 0 is

−a12

a11= −a22

a21, ; i.e., a11a22 − a12a21 = 0.

Now, the expression (a11a22 − a12a21) can be written in the form∣∣∣∣a11 a12

a21 a22

∣∣∣∣ . Let A = [aij ]

be a square matrix of order n. We define the determinant of A of order n as∣∣∣∣∣∣∣∣∣a11 a12 · · · a1n

a21 a22 · · · a2n

......

...an1 an2 · · · ann

∣∣∣∣∣∣∣∣∣ (3.14)

and it is denoted by detA or |A| or 4. If we consider the matrix A of order 2, then the

determinant of order 2 is |A| =∣∣∣∣a11 a12

a21 a22

∣∣∣∣ , which is the x−eliminant given by the above.

Similarly, for a matrix A of order 3, we have

|A| =

∣∣∣∣∣∣a11 a12 a13

a21 a22 a23

a31 a32 a33

∣∣∣∣∣∣ = a11

∣∣∣∣a22 a23

a32 a33

∣∣∣∣− a12

∣∣∣∣a21 a23

a31 a33

∣∣∣∣+ a13

∣∣∣∣a21 a22

a31 a32

∣∣∣∣= a11(a22a33 − a23a32)− a12(a21a33 − a23a31) + a13(a21a32 − a22a31),

which is the x, y eliminant of the system of equations

a11x+ a12y + a13 = 0; a21x+ a22y + a23 = 0; a31x+ a32y + a33 = 0.

An nth order determinant contains n! terms in its expression of which 12n! terms are positive

and remaining 12n! terms are negative. For example, let

A =

1 −2 02 1 3−1 0 2

, then ,

|A| = 1(1.2− 3.0)− (−2)[2.2− (−1).3] + 0[2.0− (−1).1]= 2 + 14 + 0 = 16.

Definition 3.4.1 Let Mnn be the set of all square matrices of order n, whose elementsbelong to a field of scalars F . Then a mapping f : Mnn → F , which assigns to each matrixA ∈ Mnn, which is called the determinant function on the set Mnn and it is denoted by

Page 174: Linear Algebra by Nayak

Determinants 167

detA, or |A| or det(aij). The determinant associated with a square matrix A = [aij ] of ordern× n is scalar (a real or a complex number) defined by

det A = det(aij) =∑

σ

Sgn σ a1σ(1)a2σ(2) · · · anσ(n),

where σ is the permutation(

1 2 · · · nσ(1) σ(2) · · · σ(n)

)and Sgn σ = ±1 according as σ is even or

an odd permutation, the summation extends over all possible permutations σ(1), σ(2), . . . , σ(n)of n second subscripts in a’s. det A is said to be a determinant of order n and is denoted

by vertical bars

∣∣∣∣∣∣∣∣a11 a12 · · · a1n

a21 a22 · · · a2n

· · · · · · · · · · · ·an1 an2 · · · ann

∣∣∣∣∣∣∣∣ or shortly by |aij |n.

The summation∑σ

is said to be the expansion of det A. It contains n! terms as there

are n! permutations of the set 1, 2, . . . , n. Since there are 12n! even and 1

2n! odd permu-tations on the set 1,2, . . ., n, the expansion of det A contains 1

2n! positive terms and 12n!

negative terms. Each term is a product of n elements. The first subscript of a’s run over1, 2, . . . , n in natural order and the second subscript is a permutation σ(1), σ(2), . . . , σ(n)on the set 1, 2, . . . , n, each of which can occur only once. It is observed that each producta1σ(1)a2σ(2) · · · anσ(n) is constituted by taking one and only one element from each row andeach column of A and has a positive or a negative sign depending on whether Sgn σ is evenor odd.

Let A =[a11 a12

a21 a22

]. Then det A =

∑σSgn σ a1σ(1)a2σ(2), where σ is the permutation on

the set 1, 2. There are two permutations on 1, 2, they are σ1 =(

1 21 2

), σ2 =

(1 22 1

),

σ1 is even and σ2 is odd. Therefore,

det A = Sgn σ1 a11a22 + Sgn σ2 a12a21 = a11a22 − a12a22.

Thus det A =∣∣∣∣a11 a12

a21 a22

∣∣∣∣ = a11a22 − a12a21.

Let A =

a11 a12 a13

a21 a22 a23

a31 a32 a33

; det A =∑σSgn σ a1σ(1)a2σ(2) · · · anσ(n), where σ is a permutation

on 1, 2, 3. There are six permutations on 1, 2, 3, they are

σ1 =(

1 2 31 2 3

), σ2 =

(1 2 31 3 2

), σ3 =

(1 2 32 1 3

), σ4 =

(1 2 32 3 1

),

σ5 =(

1 2 33 1 2

), σ6 =

(1 2 33 2 1

).

Among them σ1, σ4, σ5 are even and σ2, σ3, σ6 are odd. Therefore,det A = Sgn σ1 a11a22a33 + Sgn σ2 a11a23a32 + Sgn σ3 a12a21a33

+Sgn σ4 a12a23a31 + Sgn σ5 a13a21a32 + Sgn σ6 a13a22a31

= a11a22a33 − a11a23a32 − a12a21a33 + a12a23a31 + a13a21a32 − a13a22a31.

If the first two columns of A are adjoined to its right, then the expansion of a 3× 3 deter-minant can be obtained as the product of diagonal elements with the assigned signs shownin Fig. 3.1.

Each product of det A is obtained by taking the row subscripts in the natural orderperforming permutation among the column subscript, and hence it is known as the row

Page 175: Linear Algebra by Nayak

168 Theory of Matrices

a11 a12 a13 a11 a12

a21 a22 a23 a21 a22

a31 a32 a33 a31 a32j j j j jj− − − + + +

Figure 3.1: Row expansion of a third order determinant.

expansion of det A. Similarly, det A obtained by taking the column subscripts in a naturalorder and making all possible permutations among the row subscripts. Thus

det A =∑

σ

Sgn σ aσ(1)1aσ(2)2 · · · aσ(n)n. (3.15)

Ex 3.4.1 Find the number of 2 × 2 matrices over Z3( the field with three elements) withdeterminant 1. [IIT-JAM’10]

Solution: Let us take a 2× 2 matrix as A =[a bc d

]where a, b, c, d can take three 0, 1, 2

values. From the given condition |ad− bc| = 1. Thus either ad = 1, bc = 0 or ad = 0, bc = 1.Now bc = 0, is possible as

b = 0, c = 1; b = 0, c = 0; b = 1, c = 0; b = 2, c = 0; b = 0, c = 1and ad = 1 either through a = 1, b = 1; a = 2, b = 2. Therefore total number of such matrixis = 2× 2× 5 = 20. Also when ad = 2, bc = 1 or ad = 1, bc = 2, then |ad− bc| = 1. Numberof such matrix = 2 + 2 = 4.

Therefore, the number of 2 × 2 matrices over Z3( the field with three elements) withdeterminant 1 is 24.

Ex 3.4.2 Let Dn be a determinant of order n in which the diagonal elements are 1 andthose just above and just below the diagonal elements are a and all other elements are zero.Prove that D4 −D3 + a2D2 = 0 and hence find the value of

44 =

∣∣∣∣∣∣∣∣1 1

2 0 012 1 1

2 00 1

2 1 12

0 0 12 1

∣∣∣∣∣∣∣∣ .Solution: According to the definition of Dn, we get the form of D4 as

D4 =

∣∣∣∣∣∣∣∣1 a 0 0a 1 a 00 a 1 a0 0 a 1

∣∣∣∣∣∣∣∣ =∣∣∣∣∣∣1 a 0a 1 a0 a 1

∣∣∣∣∣∣− a

∣∣∣∣∣∣a a 00 1 a0 a 1

∣∣∣∣∣∣= D3 − a.a

∣∣∣∣ 1 aa 1

∣∣∣∣ = D3 − a2D2.

For the particular given determinant, we have,

44 = 43 −(

12

)2

42 =

∣∣∣∣∣∣1 1

2 012 1 1

20 1

2 1

∣∣∣∣∣∣−(

12

)2 ∣∣∣∣ 1 12

12 1

∣∣∣∣=(

1− 14

)− 1

2

(12− 0)− 1

4

(1− 1

4

)=

516.

Page 176: Linear Algebra by Nayak

Determinants 169

Property 3.4.1 The determinant of a matrix and its transpose are equal, i.e., |A| = |AT |.

Proof: First, consider a matrix A of order 2, then∣∣∣∣a11 a12

a21 a22

∣∣∣∣ = a11a22 − a12a21 = a11a22 − a21a12 = |AT |.

Let us consider a matrix of order 3, then

|A| =

∣∣∣∣∣∣a11 a12 a13

a21 a22 a23

a31 a32 a33

∣∣∣∣∣∣ = a11

∣∣∣∣a22 a23

a32 a33

∣∣∣∣− a12

∣∣∣∣a21 a23

a31 a33

∣∣∣∣+ a13

∣∣∣∣a21 a22

a31 a32

∣∣∣∣= a11(a22a33 − a23a32)− a12(a21a33 − a23a31) + a13(a21a32 − a22a31),= a11(a22a33 − a23a32)− a21(a13a32 − a12a33) + a31(a12a23 − a22a13),

=

∣∣∣∣∣∣a11 a21 a31

a12 a22 a32

a13 a23 a33

∣∣∣∣∣∣ = |AT |.

This property is true for any order of determinants. For example, let A =(

1 24 5

),

|A| =∣∣∣∣1 24 5

∣∣∣∣ = 1.5− 2.4 = −3

and |AT | =∣∣∣∣1 42 5

∣∣∣∣ = 1.5− 2.4 = −3.

Hence |A| = |AT |. From this property, we can say that a theorem which holds for some rowoperations on A, also holds equally well when corresponding column operations are madeon A.

Property 3.4.2 The interchange of two rows (or columns) of a square matrix A changesthe sign of |A|, but its value remains unaltered.

Proof: Let A =

a11 a12 a13

a21 a22 a23

a31 a32 a33

and A∗ =

a21 a22 a23

a11 a12 a13

a31 a32 a33

be the square matrices of order

3, where A∗ be obtained by interchanging any two rows of A. Therefore,

|A∗| =

∣∣∣∣∣∣a21 a22 a23

a11 a12 a13

a31 a32 a33

∣∣∣∣∣∣ = a11

∣∣∣∣a22 a23

a32 a33

∣∣∣∣− a12

∣∣∣∣a21 a23

a31 a33

∣∣∣∣+ a13

∣∣∣∣a21 a22

a31 a32

∣∣∣∣= a11(a22a33 − a23a32)− a12(a21a33 − a23a31) + a13(a21a32 − a22a31),= −a11(a22a33 − a23a32)− a12(a21a33 − a23a31) + a13(a21a32 − a22a31) = −|A|.

Similar proof for any interchange between two columns independently, by considering theequivalent from of the expression of |A|. This is true for any order of square matrices.

Property 3.4.3 If in a square matrix A of order n, two rows (columns) are equal or iden-tical, then the value of |A| = 0.

Proof: Let |A| = 4, the value of the determinant. We know, the interchange of any tworows or columns of a determinant changes the sign of the determinant without changing itsnumerical value. Hence the matrix A remain unchanged. Therefore,

−4 = |A| = 4⇒ 24 = 0 ⇒4 = |A| = 0.

If |A| = 0, then the matrix A is called singular matrix, otherwise it is non-singular.

Page 177: Linear Algebra by Nayak

170 Theory of Matrices

Property 3.4.4 Let the elements of mth row of A are all zero, then if we expand thedeterminant of A with respect to the mth row, each term in the expression, contains a factorzero. Hence the value of |A| is 0. Thus, if a row or a column of any matrix consists entirelyof zeros, then |A| = 0.

Result 3.4.1 If two rows (or columns) of a matrix A become identical, for x = a, then(x − a) is a factor of |A|. Further, if r rows (or columns) become identical for x = a, then(x− a)r−1 is a factor of |A|.

Ex 3.4.3 Prove without expanding that |A| =

∣∣∣∣∣∣a2 a 1b2 b 1c2 c 1

∣∣∣∣∣∣ = −(a− b)(b− c)(c− a).

Solution: Let us consider the elements of A as polynomials in a. When a = b, two rows ofthe matrix A become identical. Therefore a− b is a factor of |A|. Now, let us consider theelements of A as polynomials in b. When b = c, two rows of the matrix A become identical.Therefore b − c is a factor of |A|. Similarly, c − a is a factor of |A|. Also, we see that, theexpression of |A| is a polynomial in a, b and c of degree 3. The leading term in the expansionof |A| is a2b. No other term in the expansion of |A| is a2b. Therefore,

|A| = k(a− b)(b− c)(c− a),

where the constant k is independent of a, b, c. Equating coefficients of a2b from both sidesof this equality, we get

1 = k.1.1.(−1) ⇒ k = −1.

Therefore, |A| = −(a− b)(b− c)(c− a).

Property 3.4.5 If every element of any row (or column) of a matrix A be multiplied by afactor k, then |A| is multiplied by the same factor k.

Property 3.4.6 If we add k times the elements of any row (column) of a matrix A tothe corresponding elements of any other row (column), the value of the determinant of A

remains unchanged. Therefore,∣∣∣∣a bc d

∣∣∣∣, ∣∣∣∣ a bc+ ak d+ bk

∣∣∣∣, ∣∣∣∣a+ bk bc+ dk d

∣∣∣∣ are of the same value.

Also, if in an m× n matrix A, if one row (column) be expressed as a linear combination ofother rows(columns) then |A| = 0.

Property 3.4.7 If every element of any row (or column) of a matrix A can be expressedas the sum of two quantities, then the determinant can also be expressed as the sum of twodeterminants. Thus,∣∣∣∣∣∣∣∣∣

a11 + k1 a12 + k2 · · · a1n + kn

a21 a22 · · · a2n

......

...an1 an2 · · · ann

∣∣∣∣∣∣∣∣∣ =∣∣∣∣∣∣∣∣∣a11 a12 · · · a1n

a21 a22 · · · a2n

......

...an1 an2 · · · ann

∣∣∣∣∣∣∣∣∣+∣∣∣∣∣∣∣∣∣k1 k2 · · · kn

a21 a22 · · · a2n

......

...an1 an2 · · · ann

∣∣∣∣∣∣∣∣∣ .Property 3.4.8 Let f1(x), f2(x), g1(x) and g2(x) are differentiable functions of the realvariable x, then

d

dx

∣∣∣∣f1(x) f2(x)g1(x) g2(x)

∣∣∣∣ =∣∣∣∣ d

dxf1(x)ddxf2(x)

g1(x) g2(x)

∣∣∣∣+ ∣∣∣∣ f1(x) f2(x)ddxg1(x)

ddxg2(x)

∣∣∣∣=∣∣∣∣ d

dxf1(x) f2(x)ddxg1(x) g2(x)

∣∣∣∣+ ∣∣∣∣f1(x) ddxf2(x)

g1(x) ddxg2(x)

∣∣∣∣ .

Page 178: Linear Algebra by Nayak

Determinants 171

This result can be extended to any finite order of determinants.

Ex 3.4.4 If f(x) =

xn sinx cosxn! sin nπ

2 cos nπ2

a a2 a3

, then show that f (n)(0) = 0.

Solution: According to the property of the derivative of the determinant, we get,

f (n) =

n! sin(x+ nπ

2

)cos(x+ nπ

2

)n! sin nπ

2 cos nπ2

a a2 a3

+

xn sinx cosx0 0 0a a2 a3

+

xn sinx cosxn! sin nπ

2 cos nπ2

0 0 0

.

Therefore, [f (n)(x)]x=0 is given by,

[f (n)(x)]x=0 =

n! sin nπ2 cos nπ

2n! sin nπ

2 cos nπ2

a a2 a3

+

xn sinx cosx0 0 0a a2 a3

+

xn sinx cosxn! sin nπ

2 cos nπ2

0 0 0

= 0 + 0 + 0 = 0.

3.4.1 Product of Determinants

The product of two determinants of order n is also a determinant of the order n. Let |aij |and |bij | be two determinants of order n. Then their product is defined by,

|aij |.|bij | = |cij |; where, cij =n∑

k=1

aikbkj , i.e.,

∣∣∣∣∣∣∣∣∣a11 a12 · · · a1n

a21 a22 · · · a2n

......

...an1 an2 · · · ann

∣∣∣∣∣∣∣∣∣ .∣∣∣∣∣∣∣∣∣b11 b12 · · · b1n

b21 b22 · · · b2n

......

...bn1 bn2 · · · bnn

∣∣∣∣∣∣∣∣∣ =

∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣

n∑k=1

a1kb1k

n∑k=1

a1kb2k · · ·n∑

k=1

a1kbnk

n∑k=1

a2kb1k

n∑k=1

a2kb2k · · ·n∑

k=1

a2kbnk

......

...n∑

k=1

ankb1k

n∑k=1

ankb2k · · ·n∑

k=1

ankbnk

∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣.

This rule of multiplication is called the ‘matrix rule’ or the rule of ‘multiplication of rows bycolumns.’ Since interchange of rows and column does not alter the value of the determinant,hence the product can be obtained in other forms also, cij may be taken also as cij =

n∑k=1

aikbjk. This rule of multiplication is called the rule of ‘multiplication of rows.’ Similarly,

we can define ‘multiplication of rows by columns.’ From the definition, if A and B be squarematrices of the same order, then |AB| = |A|.|B|.

Ex 3.4.5 Prove without expanding

∣∣∣∣∣∣∣∣1 a a2 a3 + bcd1 b b2 b3 + cda1 c c2 c3 + dab1 d d2 d3 + abc

∣∣∣∣∣∣∣∣ = 0

Page 179: Linear Algebra by Nayak

172 Theory of Matrices

Solution: The given determinant can be written in the form

∆ =

∣∣∣∣∣∣∣∣1 a a2 a3 + bcd1 b b2 b3 + cda1 c c2 c3 + dab1 d d2 d3 + abc

∣∣∣∣∣∣∣∣ =∣∣∣∣∣∣∣∣1 a a2 a3

1 b b2 b3

1 c c2 c3

1 d d2 d3

∣∣∣∣∣∣∣∣+∣∣∣∣∣∣∣∣1 a a2 bcd1 b b2 cda1 c c2 dab1 d d2 abc

∣∣∣∣∣∣∣∣= ∆1 + ∆2, say.

Now, we simplify ∆2 as

∆2 =1

abcd

∣∣∣∣∣∣∣∣a a2 a3 abcdb b2 b3 abcdc c2 c3 abcdd d2 d3 abcd

∣∣∣∣∣∣∣∣;R′1 = aR1, R

′2 = bR2

;R′3 = cR3, R′4 = dR4

=

∣∣∣∣∣∣∣∣a a2 a3 1b b2 b3 1c c2 c3 1d d2 d3 1

∣∣∣∣∣∣∣∣ = −

∣∣∣∣∣∣∣∣1 a a2 a3

1 b b2 b3

1 c c2 c3

1 d d2 d3

∣∣∣∣∣∣∣∣ = −∆1,

applying three successive interchanges to bring C4 to C1. Therefore ∆ = 0.

Ex 3.4.6 Let m,n ∈ N with m ≥ n− 1 ≥ 1 and(mr

)= mCr. Prove that∣∣∣∣∣∣∣∣∣∣∣

1 1 1 · · · 1(m1

) (m+1

1

) (m+2

1

)· · ·(m+n−1

1

)(m2

) (m+1

2

) (m+2

2

)· · ·(m+n−1

2

)...

......

......(

mn−1

) (m+1n−1

) (m+2n−1

)· · ·(m+n−1

n−1

)

∣∣∣∣∣∣∣∣∣∣∣= 1.

Solution: Let ∆n be the given determinant. Subtracting the preceding column from eachcolumn beginning with the second, we have,

∆n =

∣∣∣∣∣∣∣∣∣∣∣

1 0 0 · · · 0(m1

)1 1 · · · 1(

m2

) (m1

) (m+1

1

)· · ·(m+n−2

1

)...

......

......(

mn−1

) (m

n−1

) (m+1n−1

)· · ·(m+n−2

n−1

)

∣∣∣∣∣∣∣∣∣∣∣.

Expanding in terms of the first row,

∆n =

∣∣∣∣∣∣∣∣∣1 1 · · · 1(m1

) (m+1

1

)· · ·(m+n−2

1

)...

......

...(m

n−2

) (m+1n−2

)· · ·(m+n−2

n−2

)∣∣∣∣∣∣∣∣∣ = ∆n−1.

Therefore ∆n = ∆n−1 = ∆n−2 = · · · = ∆2. But

∆2 =∣∣∣∣ 1 1(

m1

) (m+1

1

) ∣∣∣∣ = 1.

Consequently, ∆n = 1.

Ex 3.4.7 Prove that

∣∣∣∣∣∣b2 + c2 a2 a2

b2 c2 + a2 b2

c2 c2 a2 + b2

∣∣∣∣∣∣ = 4a2b2c2. [WBUT 2007]

Page 180: Linear Algebra by Nayak

Determinants 173

Solution:

∣∣∣∣∣∣b2 + c2 a2 a2

b2 c2 + a2 b2

c2 c2 a2 + b2

∣∣∣∣∣∣ =∣∣∣∣∣∣

0 −2c2 −2b2

b2 c2 + a2 b2

c2 c2 a2 + b2

∣∣∣∣∣∣ [R′1 = R1 −R2 −R3]

= −2

∣∣∣∣∣∣0 c2 b2

b2 c2 + a2 b2

c2 c2 a2 + b2

∣∣∣∣∣∣ = −2

∣∣∣∣∣∣0 c2 b2

b2 a2 0c2 0 a2

∣∣∣∣∣∣ [R′2 = R2 −R1,R′3 = R3 −R1]

= −20− c2(a2b2) + b2(0− a2c2) = −2 (−2 a2 b2 c2) = 4a2b2c2.

Ex 3.4.8 If tan−1

√a− c

c+ x+ tan−1

√a− c

c+ y+ tan−1

√a− c

c+ z= 0, prove that∣∣∣∣∣∣

1 x (a+ x)√c+ x

1 y (a+ y)√c+ y

1 z (a+ z)√c+ z

∣∣∣∣∣∣ = 0.

Solution: Let α = tan−1√

a−cc+x , β = tan−1

√a−cc+y , γ = tan−1

√a−cc+z . Hence,from the given

condition we get,α+ β + γ = 0 ⇒ α+ β = −γ

or, tan(α+ β) = − tan(γ) or,tanα+ tanβ

1− tanα tanβ= − tan γ.

or, tanα+ tanβ + tan γ = tanα tanβ tan γ.

Again let, x + c = X2, y + c = Y 2, z + c = Z2 then, tanα =√

a−cX , tanβ =

√a−cY , tan γ =

√a−cZ . Therefore, from the above relation we get,

√a− c

[1X

+1Y

+1Z

]=

(√a− c)3

XY Z

or, Y Z + ZX +XY = a− c.

Now, the given determinant 4 becomes,

4 =

∣∣∣∣∣∣1 X2 − c (X2 + a− c)X1 Y 2 − c (Y 2 + a− c)Y1 Z2 − c (Z2 + a− c)Z

∣∣∣∣∣∣ ; as a+ x = a+X2 − c, etc.

=

∣∣∣∣∣∣1 X2 (X2 +XY + Y Z + ZX)X1 Y 2 (Y 2 +XY + Y Z + ZX)Y1 Z2 (Z2 +XY + Y Z + ZX)Z

∣∣∣∣∣∣ ; using C ′2 = C2 + cC1

=

∣∣∣∣∣∣1 X2 X2(X + Y + Z) +XY Z1 Y 2 Y 2(X + Y + Z) +XY Z1 Z2 Z2(X + Y + Z) +XY Z

∣∣∣∣∣∣ =∣∣∣∣∣∣1 X2 X2(X + Y + Z)1 Y 2 Y 2(X + Y + Z)1 Z2 Z2(X + Y + Z)

∣∣∣∣∣∣+∣∣∣∣∣∣1 X2 XY Z1 Y 2 XY Z1 Z2 XY Z

∣∣∣∣∣∣= (X + Y + Z)

∣∣∣∣∣∣1 X2 X2

1 Y 2 Y 2

1 Z2 Z2

∣∣∣∣∣∣+XY Z

∣∣∣∣∣∣1 X2 11 Y 2 11 Z2 1

∣∣∣∣∣∣ = (X + Y + Z).0 +XY Z.0 = 0

Ex 3.4.9 Show that the value of

∣∣∣∣∣∣cos(x+ a) sin(x+ a) 1cos(x+ b) sin(x+ b) 1cos(x+ c) sin(x+ c) 1

∣∣∣∣∣∣ independent of x.

Solution: If 4 be the value of the determinant, we have

4 =

∣∣∣∣∣∣cos(x+ a) sin(x+ a) 1

cos(x+ b)− cos(x+ a) sin(x+ b)− sin(x+ a) 0cos(x+ c)− cos(x+ a) sin(x+ c)− sin(x+ a) 0

∣∣∣∣∣∣

Page 181: Linear Algebra by Nayak

174 Theory of Matrices

=∣∣∣∣ cos(x+ b)− cos(x+ a) sin(x+ b)− sin(x+ a)cos(x+ c)− cos(x+ a) sin(x+ c)− sin(x+ a)

∣∣∣∣=∣∣∣∣2 sin 2x+a+b

2 sin a−b2 2 cos 2x+a+b

2 sin b−a2

2 sin 2x+a+c2 sin a−c

2 2 cos 2x+a+c2 sin c−a

2

∣∣∣∣= sin

a− b

2sin

a− c

2

∣∣∣∣2 sin 2x+a+b2 −2 cos 2x+a+b

22 sin 2x+a+c

2 −2 cos 2x+a+c2

∣∣∣∣= 4 sin

a− b

2sin

a− c

2sin

c− b

2,

which is independent of x.

Ex 3.4.10 If A+B + C = π, then show that

∣∣∣∣∣∣sin2A cotA 1sin2B cotB 1sin2 C cotC 1

∣∣∣∣∣∣ = 0.

Solution:∣∣∣∣∣∣sin2A cotA 1sin2B cotB 1sin2 C cotC 1

∣∣∣∣∣∣ =∣∣∣∣∣∣

sin2A cotA 1sin2B − sin2A cotB − cotA 0sin2 C − sin2A cotC − cotA 0

∣∣∣∣∣∣ [R′2 = R2 −R1,R′3 = R3 −R1]

=∣∣∣∣ sin2B − sin2A cotB − cotAsin2 C − sin2A cotC − cotA

∣∣∣∣=∣∣∣∣ sin(B −A) sin(B +A) sin(A−B)/ sinA sinBsin(C −A) sin(C +A) sin(A− C)/ sinA sinC

∣∣∣∣= sin(B −A) sin(C −A)

∣∣∣∣ sin(B +A) −1/ sinA sinBsin(C +A) −1/ sinA sinC

∣∣∣∣ = 0,

as two rows are identical.

Ex 3.4.11 If 2s = a+b+c, show that

∣∣∣∣∣∣a2 (s− a)2 (s− a)2

(s− b)2 b2 (s− b)2

(s− c)2 (s− c)2 c2

∣∣∣∣∣∣ = 2s3(s−a)(s−b)(s−c).

Solution: Let s− a = α, s− b = β, s− c = γ. Hence

α+ β + γ = 3s− (a+ b+ c) = 3s− 2s = s,

β + γ = 2s− (b+ c) = a.

Similarly γ + α = b, α+ β = c. Now,

∆ =

∣∣∣∣∣∣(β + γ)2 α2 α2

β2 (γ + α)2 β2

γ2 γ2 (α+ β)2

∣∣∣∣∣∣=

∣∣∣∣∣∣(β + γ)2 − α2 0 α2

0 (γ + α)2 − β2 β2

γ2 − (α+ β)2 γ2 − (α+ β)2 (α+ β)2

∣∣∣∣∣∣ (C1′ = C1 − C3, C2

′ = C2 − C3)

= (α+ β + γ)2

∣∣∣∣∣∣β + γ − α 0 α2

0 γ + α− β β2

γ − α− β γ − α− β (α+ β)2

∣∣∣∣∣∣= (α+ β + γ)2

∣∣∣∣∣∣β + γ − α 0 α2

0 γ + α− β β2

−2β −2α 2αβ

∣∣∣∣∣∣ (R3′ = R3 −R1 −R2)

Page 182: Linear Algebra by Nayak

Determinants 175

= 2(α+ β + γ)21αβ

∣∣∣∣∣∣αβ + γα− α2 0 α2

0 βγ + αβ − β2 β2

−αβ −αβ αβ

∣∣∣∣∣∣ (C1′′ = C1

′α, C2′′ = C2

′β)

= 2(α+ β + γ)21αβ

∣∣∣∣∣∣αβ + γα α2 α2

β2 βγ + αβ β2

0 0 αβ

∣∣∣∣∣∣ (C1′′′ = C1

′′ + C3, C2′′′ = C2

′′ + C3)

= 2(α+ β + γ)2∣∣∣∣α(β + γ) α2

β2 β(γ + α)

∣∣∣∣ = 2(α+ β + γ)2αβ(β + γ)(γ + α)− αβ

= 2αβγ(α+ β + γ)3 = 2s3(s− a)(s− b)(s− c).

Ex 3.4.12 Show that

∣∣∣∣∣∣(b+ c)2 c2 b2

c2 (c+ a)2 a2

b2 a2 (a+ b)2

∣∣∣∣∣∣ = 2(ab+ bc+ ca)3.

Solution: Using the properties of the determinants, we have,

4 =

∣∣∣∣∣∣(b+ c)2 c2 b2

c2 (c+ a)2 a2

b2 a2 (a+ b)2

∣∣∣∣∣∣ = 1a2b2c2

∣∣∣∣∣∣(ab+ ca)2 b2c2 b2c2

c2a2 (bc+ ab)2 c2a2

a2b2 a2b2 (ca+ bc)2

∣∣∣∣∣∣=

1a2b2c2

∣∣∣∣∣∣(ab+ ca)2 b2c2 − (ab+ ca)2 b2c2 − (ab+ ca)2

c2a2 (bc+ ab)2 − c2a2 0a2b2 0 (ca+ bc)2 − a2b2

∣∣∣∣∣∣ ; C2 − C1, C3 − C1

=(bc+ ca+ ab)2

b2c2

∣∣∣∣∣∣(b+ c)2 bc− ab− ca bc− ab− cac2 bc+ ab− ca 0b2 0 bc− ab+ ca

∣∣∣∣∣∣=

(bc+ ca+ ab)2

b2c2

∣∣∣∣∣∣2bc −2ab −2cac2 bc+ ab− ca 0b2 0 bc− ab+ ca

∣∣∣∣∣∣ ; R1 − (R2 +R3)

=2(bc+ ca+ ab)2

ab3c3

∣∣∣∣∣∣abc −abc −cabc2a bc2 + abc− c2a 0ab2 0 b2c− ab2 + bca

∣∣∣∣∣∣=

2(bc+ ca+ ab)2

ab3c3

∣∣∣∣∣∣abc 0 0c2a bc2 + abc c2aab2 ab2 b2c+ bca

∣∣∣∣∣∣ ; C2 + C1, C3 + C1

=2(bc+ ca+ ab)2

b2c2(bc2 + abc)(bca+ b2c)− a2b2c2 = 2(bc+ ca+ ab)3.

Ex 3.4.13 For a fixed positive integer n, if 4 =

∣∣∣∣∣∣n! (n+ 1)! (n+ 2)!

(n+ 1)! (n+ 2)! (n+ 3)!(n+ 2)! (n+ 3)! (n+ 4)!

∣∣∣∣∣∣, then show that

[ 4(n!)3 − 4] is divisible by n.

Solution: Taking out n!, (n+ 1)! and (n+ 2)! from the first, second and third row respec-tively, we get,

4 = n!(n+ 1)!(n+ 2)!

∣∣∣∣∣∣1 (n+ 1) (n+ 1)(n+ 2)1 (n+ 2) (n+ 2)(n+ 3)1 (n+ 3) (n+ 3)(n+ 4)

∣∣∣∣∣∣= (n!)3(n+ 1)2(n+ 2)

∣∣∣∣∣∣1 (n+ 1) (n+ 1)(n+ 2)0 1 2(n+ 2)0 1 2(n+ 3)

∣∣∣∣∣∣ ; R2 −R1, R3 −R1

Page 183: Linear Algebra by Nayak

176 Theory of Matrices

= (n!)3(n+ 1)2(n+ 2).2

⇒ 4(n!)3

− 4 = 2(n+ 1)2(n+ 2)− 4 = 2n(n2 + 4n+ 5).

Therefore, [ 4(n!)3 − 4] is divisible by n.

Ex 3.4.14 Show that

∣∣∣∣∣∣−bc bc+ b2 bc+ c2

ca+ a2 −ca ca+ c2

ab+ a2 ab+ b2 −ab

∣∣∣∣∣∣ = (ab+ bc+ ca)3.

Solution: Let the value of the determinant be 4, then,

4 =1abc

∣∣∣∣∣∣−abc abc+ ab2 abc+ ac2

bca+ ba2 −bca bca+ bc2

cab+ ca2 cab+ cb2 −cab

∣∣∣∣∣∣=

1abc

∣∣∣∣∣∣abc+ a2b+ a2c abc+ ab2 + b2c abc+ ac2 + bc2

bca+ ba2 −bca bca+ bc2

cab+ ca2 cab+ cb2 −cab

∣∣∣∣∣∣ ;R1 +R2 +R3

=ab+ bc+ ca

abc

∣∣∣∣∣∣a b c

bca+ ba2 −bca bca+ bc2

cab+ ca2 cab+ cb2 −cab

∣∣∣∣∣∣ = (ab+ bc+ ca)

∣∣∣∣∣∣1 1 1

bc+ ab −ca ab+ bcbc+ ac ac+ bc −ab

∣∣∣∣∣∣= (ab+ bc+ ca)

∣∣∣∣∣∣1 0 0

bc+ ab −ca− bc− ab 0ac+ ac 0 −ab− bc− ac

∣∣∣∣∣∣ ;C2 − C1, C3 − C1

= (ab+ bc+ ca)3

∣∣∣∣∣∣1 0 0

bc+ ab −1 0ac+ ac 0 −1

∣∣∣∣∣∣ = (ab+ bc+ ca)3.

Ex 3.4.15 Show that

∣∣∣∣∣∣1 + a2 − b2 2ab −2b

2ab 1− a2 + b2 2a2b −2a 1− a2 − b2

∣∣∣∣∣∣ = (1 + a2 + b2)3.

Solution: Let the value of the determinant be 4, then,

4 =

∣∣∣∣∣∣1 + a2 + b2 0 −2b

0 1 + a2 + b2 2ab(1 + a2 + b2) −a(1 + a2 + b2) 1− a2 − b2

∣∣∣∣∣∣ ;C1 − bC3, C2 + aC3

= (1 + a2 + b2)2

∣∣∣∣∣∣1 0 −2b0 1 2ab −a 1− a2 − b2

∣∣∣∣∣∣= (1 + a2 + b2)2(1− a2 − b2 + 2a2)− 2b(0− b) = (1 + a2 + b2)3.

Ex 3.4.16 Show that

∣∣∣∣∣∣1 + a 1 1

1 1 + b 11 1 1 + c

∣∣∣∣∣∣ = abc(1 + 1

a + 1b + 1

c

).

Solution: Let the value of the determinant be 4, then,

4 = abc

∣∣∣∣∣∣1 + 1

a1b

1c

1a 1 + 1

b1c

1a

1b 1 + 1

c

∣∣∣∣∣∣ = abc

∣∣∣∣∣∣1 + 1

a + 1b + 1

c1b

1c

1 + 1a + 1

b + 1c 1 + 1

b1c

1 + 1a + 1

b + 1c

1b 1 + 1

c

∣∣∣∣∣∣ ;C1 + C2 + C3

Page 184: Linear Algebra by Nayak

Determinants 177

= abc

(1 +

1a

+1b

+1c

) ∣∣∣∣∣∣1 1

b1c

1 1 + 1b

1c

1 1b 1 + 1

c

∣∣∣∣∣∣= abc

(1 +

1a

+1b

+1c

) ∣∣∣∣∣∣1 1

b1c

0 1 00 0 1

∣∣∣∣∣∣ ;R2 −R1, R3 −R1

= abc

(1 +

1a

+1b

+1c

)1− 0 = abc

(1 +

1a

+1b

+1c

).

Ex 3.4.17 Show that

∣∣∣∣∣∣a b ax+ byb c bx+ cy

ax+ by bx+ cy 0

∣∣∣∣∣∣ = (b2 − ac)(ax2 + 2bxy + cy2).

Solution: Let the value of the determinant be 4, then,

4 =

∣∣∣∣∣∣a b 0b c 0

ax+ by bx+ cy −ax2 − 2bxy − cy2

∣∣∣∣∣∣ ;C3 − xC1 − yC2

= (ax2 + 2bxy + cy2)

∣∣∣∣∣∣a b 0b c 0

ax+ by bx+ cy −1

∣∣∣∣∣∣= (ax2 + 2bxy + cy2)−1(ac− b2) = (b2 − ac)(ax2 + 2bxy + cy2).

Ex 3.4.18 Show that

∣∣∣∣∣∣0 (a− b)2 (a− c)2

(b− a)2 0 (b− c)2

(c− a)2 (c− b)2 0

∣∣∣∣∣∣ = 2(b− c)2(c− a)2(a− b)2.

Solution: Let the value of the determinant be 4, then,

4 =

∣∣∣∣∣∣(a− a)2 (a− b)2 (a− c)2

(b− a)2 (b− b)2 (b− c)2

(c− a)2 (c− b)2 (c− c)2

∣∣∣∣∣∣ =∣∣∣∣∣∣a2 a 1b2 b 1c2 c 1

∣∣∣∣∣∣∣∣∣∣∣∣1 −2a a2

1 −2b b2

1 −2c c2

∣∣∣∣∣∣=

∣∣∣∣∣∣a2 a 1

b2 − a2 b− a 0c2 − a2 c− a 0

∣∣∣∣∣∣∣∣∣∣∣∣1 −2a a2

0 −2b+ 2a b2 − a2

0 −2c+ 2a c2 − a2

∣∣∣∣∣∣= (b− a)(c− a)

∣∣∣∣∣∣a2 a 1b+ a 1 0c+ a 1 0

∣∣∣∣∣∣ (b− a)(c− a)

∣∣∣∣∣∣1 −2a a2

0 −2 b+ a0 −2 c+ a

∣∣∣∣∣∣= (b− a)2(c− a)2(b+ a− c− a)(−2c− 2a+ 2b+ 2a) = 2(b− c)2(c− a)2(a− b)2.

Ex 3.4.19 Show that

∣∣∣∣∣∣∣∣x l m 1α x n 1α β x 1α β γ 1

∣∣∣∣∣∣∣∣ = (x− α)(x− β)(x− γ) for any value of l,m, n.

Solution: Let the value of the determinant be 4, then,

4 =

∣∣∣∣∣∣∣∣x l m 1

α− x x− l n−m 0α− x β − l x−m 0α− x β − l γ −m 0

∣∣∣∣∣∣∣∣ ;R2 −R1, R3 −R1, R4 −R1

Page 185: Linear Algebra by Nayak

178 Theory of Matrices

= −

∣∣∣∣∣∣α− x x− l n−mα− x β − l x−mα− x β − l γ −m

∣∣∣∣∣∣ = −(α− x)

∣∣∣∣∣∣1 x− l n−m1 β − l x−m1 β − l γ −m

∣∣∣∣∣∣= −(α− x)

∣∣∣∣∣∣1 x− l n−m0 β − x x− n0 β − x γ − n

∣∣∣∣∣∣ = −(α− x)∣∣∣∣β − x x− nβ − x γ − n

∣∣∣∣= −(α− x)(β − x)

∣∣∣∣1 x− n1 γ − n

∣∣∣∣ = (x− α)(x− β)(x− γ),

which is independent on the value of l,m, n and consequently, 4 = (x−α)(x− β)(x− γ) istrue for any values of l,m, n.

Ex 3.4.20 Prove that,

∣∣∣∣∣∣(b+ c)2 a2 a2

b2 (c+ a)2 b2

c2 c2 (a+ b)2

∣∣∣∣∣∣ = 2abc(a + b + c)3. [WBUT 2004,

2009]

Solution:∣∣∣∣∣∣(b+ c)2 a2 a2

b2 (c+ a)2 b2

c2 c2 (a+ b)2

∣∣∣∣∣∣=

∣∣∣∣∣∣(b+ c)2 a2 − (b+ c)2 a2 − (b+ c)2

b2 (c+ a)2 − b2 0c2 0 (a+ b)2 − c2

∣∣∣∣∣∣ [C ′2 = C2 − C1, C

′3 = C3 − C1]

= (a+ b+ c)2

∣∣∣∣∣∣(b+ c)2 a− (b+ c) a− (b+ c)b2 (c+ a)− b 0c2 0 (a+ b)− c

∣∣∣∣∣∣= (a+ b+ c)2

∣∣∣∣∣∣2bc −2c −2bb2 (c+ a)− b 0c2 0 (a+ b)− c

∣∣∣∣∣∣ [R′1 = R1 −R2 −R3]

= 2(a+ b+ c)2

∣∣∣∣∣∣bc −c −bb2 (c+ a)− b 0c2 0 (a+ b)− c

∣∣∣∣∣∣= 2(a+ b+ c)2

1bc

∣∣∣∣∣∣bc bc −bcb2 bc+ ab− b2 0c2 0 ac+ bc− c2

∣∣∣∣∣∣= 2(a+ b+ c)2

∣∣∣∣∣∣1 −1 1b2 bc+ ab− b2 0c2 0 ac+ bc− c2

∣∣∣∣∣∣[taking common bc from first row]

= 2(a+ b+ c)2

∣∣∣∣∣∣1 0 0b2 bc+ ab b2

c2 c2 ac+ bc

∣∣∣∣∣∣ [C ′2 = C2 + C1, C

′3 = C3 + C1]

= 2(a+ b+ c)2∣∣∣∣ b(a+ c) b2

c2 c(a+ b)

∣∣∣∣ = 2bc(a+ b+ c)2∣∣∣∣a+ c b

c a+ b

∣∣∣∣= 2bc(a+ b+ c)2(ac+ bc+ c2) = 2abc(a+ b+ c)3.

Ex 3.4.21 Show that

∣∣∣∣∣∣a2 + λ ab acab b2 + λ bcac bc c2 + λ

∣∣∣∣∣∣ is divisible by λ2 and find the other factor.

Page 186: Linear Algebra by Nayak

Determinants 179

Solution:

∣∣∣∣∣∣a2 + λ ab acab b2 + λ bcac bc c2 + λ

∣∣∣∣∣∣ = a b c

∣∣∣∣∣∣a+ λ

a b ca b+ λ

b ca b c+ λ

c

∣∣∣∣∣∣[dividing first, second and third rows by a, b, c respectively]

=

∣∣∣∣∣∣a2 + λ b2 c2

a2 b2 + λ c2

a2 b2 c2 + λ

∣∣∣∣∣∣[multiplying first, second and third columns by a, b, c respectively]

=

∣∣∣∣∣∣λ+ a2 + b2 + c2 b2 c2

λ+ a2 + b2 + c2 b2 + λ c2

λ+ a2 + b2 + c2 b2 c2 + λ

∣∣∣∣∣∣ [C ′1 = C1 + C2 + C3]

= (λ+ a2 + b2 + c2)

∣∣∣∣∣∣1 b2 c2

1 b2 + λ c2

1 b2 c2 + λ

∣∣∣∣∣∣ [λ+ a2 + b2 + c2 is taking out]

= (λ+ a2 + b2 + c2)

∣∣∣∣∣∣1 b2 c2

0 λ 00 0 0λ

∣∣∣∣∣∣ [R′2 = R2 −R1, R′3 = R3 −R1]

= (λ+ a2 + b2 + c2) λ2.

Hence λ2 and λ+ a2 + b2 + c2 are only two factors of the given determinant.

Ex 3.4.22 If4∑

i=1

a2i =

4∑i=1

b2i =4∑

i=1

c2i =4∑

i=1

d2i = 1 and

4∑i=1

ai bi =4∑

i=1

bi ci =4∑

i=1

ci di =

4∑i=1

di ai =4∑

i=1

ai ci =4∑

i=1

bi di = 0 then show that∣∣∣∣∣∣∣∣a1 b1 c1 d1

a2 b2 c2 d2

a3 b3 c3 d3

a4 b4 c4 d4

∣∣∣∣∣∣∣∣ = ±1.

Solution: Let the given determinant be ∆. Then

∆2 =

∣∣∣∣∣∣∣∣a1 b1 c1 d1

a2 b2 c2 d2

a3 b3 c3 d3

a4 b4 c4 d4

∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣a1 b1 c1 d1

a2 b2 c2 d2

a3 b3 c3 d3

a4 b4 c4 d4

∣∣∣∣∣∣∣∣Using multiplication by column rule, we get,

∆2 =

∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣

4∑i=1

a2i

4∑i=1

ai bi4∑

i=1

ai ci4∑

i=1

ai di

4∑i=1

bi ai

4∑i=1

b2i4∑

i=1

bi ci4∑

i=1

bi di

4∑i=1

ci ai

4∑i=1

ci bi4∑

i=1

c2i4∑

i=1

ci di

4∑i=1

di ai

4∑i=1

di bi4∑

i=1

di ci4∑

i=1

d2i

∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣=

∣∣∣∣∣∣∣∣1 0 0 00 1 0 00 0 1 00 0 0 1

∣∣∣∣∣∣∣∣= 1 ⇒ ∆ = ± 1.

Ex 3.4.23 Prove that∣∣∣∣∣∣∣∣a+ 1 a a aa a+ 2 a aa a a+ 3 aa a a a+ 4

∣∣∣∣∣∣∣∣ = 24(1 + a

1 + a2 + a

3 + a4

)[WBUT 2004]

Page 187: Linear Algebra by Nayak

180 Theory of Matrices

Solution:∣∣∣∣∣∣∣∣a+ 1 a a aa a+ 2 a aa a a+ 3 aa a a a+ 4

∣∣∣∣∣∣∣∣ = (1.2.3.4)

∣∣∣∣∣∣∣∣1 + a

1a2

a3

a4

a1 1 + a

2a3

a4

a1

a2 1 + a

3a4

a1

a2

a3 1 + a

4

∣∣∣∣∣∣∣∣[dividing first, second, third and fourth columns by 1, 2, 3 and 4 respectively]

= 24

∣∣∣∣∣∣∣∣1 + a

1 + a2 + a

3 + a4

a2

a3

a4

1 + a1 + a

2 + a3 + a

4 1 + a2

a3

a4

1 + a1 + a

2 + a3 + a

4a2 1 + a

3a4

1 + a1 + a

2 + a3 + a

4a2

a3 1 + a

4

∣∣∣∣∣∣∣∣ [C′1 = C1 + C2 + C3 + C4]

= 24(1 +

a

1+a

2+a

3+a

4

) ∣∣∣∣∣∣∣∣1 a

2a3

a4

1 1 + a2

a3

a4

1 a2 1 + a

3a4

1 a2

a3 1 + a

4

∣∣∣∣∣∣∣∣[taking common 1 +

a

1+a

2+a

3+a

4from first column

]

= 24(1 +

a

1+a

2+a

3+a

4

) ∣∣∣∣∣∣∣∣1 a

2a3

a4

0 1 0 00 0 1 00 0 0 1

∣∣∣∣∣∣∣∣[R′2 = R2 −R1, R

′3 = R3 −R1, R

′4 = R4 −R1]

= 24(1 +

a

1+a

2+a

3+a

4

).1 = 24

(1 +

a

1+a

2+a

3+a

4

).

Ex 3.4.24 If u = ax4 + 4bx3 + 6cx2 + 4dx + e, u11 = ax2 + 2bx + c, u12 = bx2 + 2cx + d,u22 = cx2 + 2dx+ e then prove that∣∣∣∣∣∣∣∣

a b c u11

b c d u12

c d e u22

u11 u12 u22 0

∣∣∣∣∣∣∣∣ = −u

∣∣∣∣∣∣a b cb c dc d e

∣∣∣∣∣∣ .Solution:

∣∣∣∣∣∣∣∣a b c u11

b c d u12

c d e u22

u11 u12 u22 0

∣∣∣∣∣∣∣∣ =1x2

∣∣∣∣∣∣∣∣ax2 bx2 cx2 u11x

2

b c d u12

c d e u22

u11 u12 u22 0

∣∣∣∣∣∣∣∣=

1x2

∣∣∣∣∣∣∣∣ax2 + 2bx+ c bx2 + 2cx+ d cx2 + 2dx+ e u11x

2 + 2xu12 + u12

b c d u12

c d e u22

u11 u12 u22 0

∣∣∣∣∣∣∣∣[ using R′1 = R1 + 2R2x+R3]

=1x2

∣∣∣∣∣∣∣∣u11 u12 u22 ub c d u12

c d e u22

u11 u12 u22 0

∣∣∣∣∣∣∣∣ =1x2

∣∣∣∣∣∣∣∣0 0 0 ub c d u12

c d e u22

u11 u12 u22 0

∣∣∣∣∣∣∣∣ [R′1 = R1 −R4]

= − u

x2

∣∣∣∣∣∣b c dc d eu11 u12 u22

∣∣∣∣∣∣= − u

x2

∣∣∣∣∣∣b c dc d e

u11 − 2xb− c u12 − 2cx− d u22 − 2xd− e

∣∣∣∣∣∣ [R′3 = R3 − 2xR1 −R2]

Page 188: Linear Algebra by Nayak

Determinants 181

= − u

x2

∣∣∣∣∣∣b c dc d eax2 bx2 cx2

∣∣∣∣∣∣ = −u

∣∣∣∣∣∣b c dc d ea b c

∣∣∣∣∣∣ = −u

∣∣∣∣∣∣a b cb c dc d e

∣∣∣∣∣∣by interchanging first and third rows and then first and second rows.

3.4.2 Minors and Co-factors

Let us consider an n× n matrix A = [aij ]. Consider Mij , the (n− 1)× (n− 1) sub-matrixof A, which obtained by deleting the ith row and jth column of A. Now, |Mij | is called theminor of aij. The co-factor Aij of the element aij is defined as

Aij = (−1)i+j |Mij |. (3.16)

For the matrix A =

−1 2 43 −6 −5−2 0 1

, |M22|, A22 and |M31|, A31 are given by,

|M22| =∣∣∣∣−1 4−2 1

∣∣∣∣ = −1 + 8 = 7; A22 = (−1)2+2|M22| = 7.

|M31| =∣∣∣∣ 2 4−6 −5

∣∣∣∣ = −10 + 24 = 14; A31 = (−1)3+1|M31| = 14,

which are respectively the minors and co-factors. It is obvious that if (i+ j) be even, thenminor and co-factor of aij are same. Since each term in the expansion of a determinantcontains one element from any particular row (or column), we can express the expression asa linear function of that row (or column), for

|A| =

∣∣∣∣∣∣a11 a12 a13

a21 a22 a23

a31 a32 a33

∣∣∣∣∣∣ = a11

∣∣∣∣a22 a23

a32 a33

∣∣∣∣− a12

∣∣∣∣a21 a23

a31 a33

∣∣∣∣+ a13

∣∣∣∣a21 a22

a31 a32

∣∣∣∣= a11A11 + a12A12 + a13A13,

where A11, A12, A13 are the co-factors of a11, a12 and a13 respectively.Complementary minor, algebraic compliment, principal minor: Let M be aminor of order m in |A| = |aij |n×n. Now, if we delete all the rows and columns formingM , the minor N formed by the remaining rows and columns of order (n−m) is called thecomplementary minor of M . For the determinant

4 =

∣∣∣∣∣∣∣∣−1 2 4 63 −6 −5 0−2 0 1 92 7 5 4

∣∣∣∣∣∣∣∣ ;∣∣∣∣−1 2

3 −6

∣∣∣∣ and∣∣∣∣1 95 4

∣∣∣∣are complementary. Let M be a minor of order r in which rows i1, i2, · · · , ir and columnsj1, j2, · · · , jr are present. Now, the algebraic complement of M is

(−1)i1+i2+···+ir+j1+j2+···+jr × M ′; (3.17)

where M ′ is the algebraic complement of M . Algebraic complement of an element aij in|aij | is the co-factor of aij in |aij |. In the above example, 4 algebraic complement of∣∣∣∣−1 2

3 −6

∣∣∣∣ is (−1)1+2+1+2

∣∣∣∣1 95 4

∣∣∣∣ = ∣∣∣∣1 95 4

∣∣∣∣ .

Page 189: Linear Algebra by Nayak

182 Theory of Matrices

If the row and column indices in a minor are the same or equivalently, then the minor is said

to be principal minor. In the above example, the principal minor of 4 is∣∣∣∣−6 0

7 4

∣∣∣∣ . If we take

the diagonal elements of minor from the diagonal elements of the matrix, then the indicesare equivalent. Since in a principal minor, sum of row and identical column subscripts arealways even, so the sign of minor is always positive. In this case, algebraic complement ofprincipal minor is equal to its complement.

Ex 3.4.25 If ∆ =

∣∣∣∣∣∣h a 01h

1b

1f

0 c f

∣∣∣∣∣∣ and ∆′ =

∣∣∣∣∣∣1bc − 1

ch1

fh

af −fh ch1

fh −1

af1ab −

1h2

∣∣∣∣∣∣, find ∆′

∆2 in its simplest form.

Solution: Hereadj ∆ =

∣∣∣∣∣∣fb −

cf −

fh

ch

−af fh −chaf −h

fhb −

ah

∣∣∣∣∣∣ = −cafh

∣∣∣∣∣∣1bc −

1f2 − 1

ch1

fh

af −fh ch1

fh − 1af

1ab −

1h2

∣∣∣∣∣∣taking out fc, −1, ah from 1st, 2nd and 3rd row respectively. Hence

∆2 = −cafh∆′, or,∆′

∆2= − 1

cafh.

Theorem 3.4.1 Laplace theorem : In an n order square matrix A, |A| can be expressedas the aggregate of the products of all minors of order r formed from any r selected rows ofA and corresponding algebraic complement of them.

According to this theorem, we expand |A| = |aij |4×4. Let the first two rows be selected. Soif we expand the determinant in terms of minors of ordered 2, we get,

|A| =∣∣∣∣a11 a12

a21 a22

∣∣∣∣ (−1)1+2+1+2

∣∣∣∣a33 a34

a43 a44

∣∣∣∣+ ∣∣∣∣a11 a13

a21 a23

∣∣∣∣ (−1)1+2+1+3

∣∣∣∣a32 a34

a42 a44

∣∣∣∣+∣∣∣∣a11 a14

a21 a24

∣∣∣∣ (−1)1+2+1+4

∣∣∣∣a32 a33

a42 a43

∣∣∣∣+ ∣∣∣∣a12 a13

a22 a23

∣∣∣∣ (−1)1+2+2+3

∣∣∣∣a31 a34

a41 a44

∣∣∣∣+∣∣∣∣a12 a14

a22 a24

∣∣∣∣ (−1)1+2+2+4

∣∣∣∣a31 a33

a41 a43

∣∣∣∣+ ∣∣∣∣a13 a14

a23 a24

∣∣∣∣ (−1)1+2+3+4

∣∣∣∣a31 a32

a41 a42

∣∣∣∣ .Ex 3.4.26 Using Laplace’s theorem, show that,

|A| =

∣∣∣∣∣∣∣∣a b c d−b a d −c−c −d a b−d c −b a

∣∣∣∣∣∣∣∣ = (a2 + b2 + c2 + d2)2.

Solution: Using Laplace’s theorem, we get,

|A| = (−1)1+2+1+2

∣∣∣∣ a b−b a

∣∣∣∣ ∣∣∣∣ a b−b a

∣∣∣∣+ (−1)1+2+1+3

∣∣∣∣ a c−b d

∣∣∣∣ ∣∣∣∣−d bc a

∣∣∣∣+(−1)1+2+1+4

∣∣∣∣ a d−b −c

∣∣∣∣ ∣∣∣∣−d ac −b

∣∣∣∣+ (−1)1+2+2+3

∣∣∣∣ b ca d∣∣∣∣ ∣∣∣∣−c b−d a

∣∣∣∣+(−1)1+2+2+4

∣∣∣∣ b da −c

∣∣∣∣ ∣∣∣∣−c a−d −b

∣∣∣∣+ (−1)1+2+3+4

∣∣∣∣ c dd −c

∣∣∣∣ ∣∣∣∣−c −d−d c

∣∣∣∣= (a2 + b2)(a2 + b2) + (ad+ bc)(ad+ bc) + (ac− bd)(ac− bd)

+(ac− bd)(ac− bd) + (ad+ bc)(ad+ bc) + (c2 + d2)(c2 + d2)= (a2 + b2)2 + 2(a2d2 + b2c2 − 2abcd+ a2c2 + b2d2 − 2abcd) + (c2 + d2)2

= (a2 + b2)2 + 2(a2 + b2)(c2 + d2) + (c2 + d2)2 = (a2 + b2 + c2 + d2)2.

Page 190: Linear Algebra by Nayak

Determinants 183

From this we conclude that, if a, b, c and d are real numbers, then the given determinant isnon-singular if and only if at least one of a, b, c, d is non-zero.

3.4.3 Adjoint and Reciprocal of Determinant

Let A = [aij ] be a square matrix of order n and Aij be the cofactor of aij in |A|. Now, |Aij |is called the adjoint or adjugate of |A|. Similarly, reciprocal or inverse is defined by

|A|′ =1|A|

|Aij |; where |A| 6= 0. (3.18)

Theorem 3.4.2 Jacobi’s theorem : Let A = [aij ] be a non-singular matrix of order nand Aij be the cofactor of aij in |A|, then |Aij | = |A|n−1.

Proof: Let A = [aij ]n×n be a square matrix of order n. Let Aij denotes the cofactor ofijth element of aij in detA. Now, |A|.|Aij |

=

∣∣∣∣∣∣∣∣∣a11 a12 · · · a1n

a21 a22 · · · a2n

......

...an1 an2 · · · ann

∣∣∣∣∣∣∣∣∣

∣∣∣∣∣∣∣∣∣A11 A21 · · · An1

A12 A22 · · · An2

......

...A1n A2n · · · Ann

∣∣∣∣∣∣∣∣∣ =

∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣

n∑k=1

a1kA1k

n∑k=1

a1kA2k · · ·n∑

k=1

a1kAnk

n∑k=1

a2kA1k

n∑k=1

a2kA2k · · ·n∑

k=1

a2kAnk

......

...n∑

k=1

ankA1k

n∑k=1

ankA2k · · ·n∑

k=1

ankAnk

∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣=

∣∣∣∣∣∣∣∣∣|A| 0 · · · 00 |A| · · · 0...

......

0 0 · · · |A|

∣∣∣∣∣∣∣∣∣ ; asn∑

k=1

aikAjk = |A|, if i = j and = 0, if i 6= j

= |A|n ⇒ |Aij | = |A|n−1, as |A| 6= 0.

Ex 3.4.27 If the adjugate of 4 =

∣∣∣∣∣∣a h gh b fg f c

∣∣∣∣∣∣ be 4′ =

∣∣∣∣∣∣A H GH B FG F C

∣∣∣∣∣∣, prove that BC−F 2

a =

CA−G2

b = AB−H2

c = 4 andGH −AF

F=HF −BG

G=FG− CH

H= 4.

Solution: Let us consider the following product∣∣∣∣∣∣1 0 0H B FG F C

∣∣∣∣∣∣×∣∣∣∣∣∣a h gh b fg f c

∣∣∣∣∣∣ =

∣∣∣∣∣∣1.a+ 0.h+ 0.g 1.h+ 0.b+ 0.f 1.g + 0.f + 0.cH.a+B.h+ F.g H.h+B.b+ F.f H.g +B.f + F.cG.a+ F.h+ C.g G.h+ F.b+ C.f G.g + F.f + C.c

∣∣∣∣∣∣=

∣∣∣∣∣∣a h g0 4 00 0 4

∣∣∣∣∣∣ = a42

⇒ (BC − F 2).4 = a42, i.e.,BC − F 2

a= 4.

Similarly, by considering the products∣∣∣∣∣∣A H G0 1 0G F C

∣∣∣∣∣∣×∣∣∣∣∣∣a h gh b fg f c

∣∣∣∣∣∣ and

∣∣∣∣∣∣A H GH B F0 0 1

∣∣∣∣∣∣×∣∣∣∣∣∣a h gh b fg f c

∣∣∣∣∣∣

Page 191: Linear Algebra by Nayak

184 Theory of Matrices

we get,BC − F 2

a=CA−G2

b=AB −H2

c= 4 respectively. Now we consider the product∣∣∣∣∣∣

A H G0 0 1G F C

∣∣∣∣∣∣×∣∣∣∣∣∣a h gh b fg f c

∣∣∣∣∣∣ =∣∣∣∣∣∣4 0 0g f c0 0 4

∣∣∣∣∣∣⇒ (HG−AF )4 = f42 ⇒ HF −AF

f= 4.

Similarly, by considering the products∣∣∣∣∣∣A H GH B F1 0 0

∣∣∣∣∣∣×∣∣∣∣∣∣a h gh b fg f c

∣∣∣∣∣∣ and

∣∣∣∣∣∣0 1 0H B FG F C

∣∣∣∣∣∣×∣∣∣∣∣∣a h gh b fg f c

∣∣∣∣∣∣we get, HF−BG

G = FG−CHH = 4 respectively.

Ex 3.4.28 Prove that

∣∣∣∣∣∣bc− a2 ca− b2 ab− c2

ca− b2 ab− c2 bc− a2

ab− c2 bc− a2 ca− b2

∣∣∣∣∣∣ = (a3 + b3 + c3 − 3abc)2.

Solution: Let us consider the determinant ∆ =

∣∣∣∣∣∣a b cb c ac a b

∣∣∣∣∣∣ and its value is −(a3+b3+c3−3abc)

obtained by direct expansion. Now, adjoined of ∆ is ∆′ =

∣∣∣∣∣∣A B CB C AC A B

∣∣∣∣∣∣ where A,B,C are

cofactor of a, b, c in ∆. Therefore, A = bc − a2, B = ac − b2, C = ab − c2. By Jacobi’stheorem, ∆′ = ∆3−1 = ∆2,

or,

∣∣∣∣∣∣bc− a2 ca− b2 ab− c2

ca− b2 ab− c2 bc− a2

ab− c2 bc− a2 ca− b2

∣∣∣∣∣∣ = (a3 + b3 + c3 − 3abc)2.

3.4.4 Symmetric and Skew-symmetric Determinants

If A be a symmetric matrix of order n, then |A| is said to be symmetric determinant of ordern. If A be a skew-symmetric matrix of order n, Then |A| is said to be skew-symmetric deter-

minant of order n.

∣∣∣∣∣∣a h gh b fg f c

∣∣∣∣∣∣ and

∣∣∣∣∣∣0 a b−a 0 c−b −c 0

∣∣∣∣∣∣ are examples of symmetric and skew-symmetric

determinants respectively. Now,

(i) The adjoint of a symmetric determinant is symmetric.

(ii) The square of any determinant is symmetric.

(iii) In a skew-symmetric determinant |A| = |aij |n×n, Aij = (−1)n−1Aji, where Aij andAji are the cofactors of aij and aji in |A| respectively.

(iv) The adjoint of a skew-symmetric determinant of order n is symmetric or skew-symmetricaccording as n is even or odd.

Theorem 3.4.3 The value of every skew-symmetric determinant of odd order is zero.

Page 192: Linear Algebra by Nayak

Determinants 185

Proof: Let A be a skew-symmetric matrix of order n, where n is odd number. Hence bydefinition, AT = −A. Therefore,

|AT | = | −A| ⇒ |A| = (−1)n|A| = −|A|⇒ 2|A| = 0 ⇒ |A| = 0.

Therefore, the value of every skew-symmetric determinant of odd order is zero.

Theorem 3.4.4 The value of every skew-symmetric determinant of even order is a perfectsquare.

Proof: Let A = [aij ] be a skew-symmetric determinant of n, then by definition aij = −aji

and aii = 0. Let,

An =

0 a12 · · · a1n

a21 0 · · · a2n

......

...an1 an2 · · · 0

.

Now let, Aij be the cofactor of aij in |aij |. Then, Aij be a determinant of order (n − 1).Now, if we transform Aij in Aji, then every rows of it must be multiplied by (−1), therefore,Aij = (−1)n−1Aji. Thus,

adj|An| =

∣∣∣∣∣∣∣∣∣0 A12 · · · A1n

−A12 0 · · · A2n

......

...−A1n −A2n · · · 0

∣∣∣∣∣∣∣∣∣ .By Jacobi’s theorem, we have,∣∣∣∣ 0 A12

−A12 0

∣∣∣∣ = |An|

∣∣∣∣∣∣∣∣∣0 a34 · · · a3n

−a34 0 · · · a4n

......

...−a3n −a4n · · · 0

∣∣∣∣∣∣∣∣∣ = |An||An−2|.

Therefore, |An||An−2| = A212, which is a perfect square. For n > 2, it is true. When n = 0,

the result is obvious. Now,|A2| =

∣∣∣∣ 0 a12

−a12 0

∣∣∣∣ = a212,

which is a perfect square. If |A2| is perfect square, then |A4| is also a perfect square, byusing the relation, |An||An−2| = A2

12. Let the result be true for n = m, then it is true form + 2 as |Am+2||Am| is a perfect square. Also, it is true for n = 2, 4. Hence by method ofinduction the result is true for any even positive integer n.

Ex 3.4.29 Show that the value of 4 =

∣∣∣∣∣∣∣∣0 a b c−a 0 d e−b −d 0 f−c −e −f 0

∣∣∣∣∣∣∣∣ is a perfect square.

Solution: Expanding 4 by the minor of the elements of the first column, we get,

4 = a

∣∣∣∣∣∣a b c−d 0 f−e −f 0

∣∣∣∣∣∣− b

∣∣∣∣∣∣a b c0 d e−e −f 0

∣∣∣∣∣∣+ c

∣∣∣∣∣∣a b c0 d e−d 0 f

∣∣∣∣∣∣= af(af − be+ cd)− be(af − be+ cd) + cd(af − be+ cd)= (af − be+ cd)(af − be+ cd) = (af − be+ cd)2.

Since the given determinant 4 is a skew-symmetric determinant of even order, it is verifiedalso that, its value must be a perfect square.

Page 193: Linear Algebra by Nayak

186 Theory of Matrices

3.4.5 Vander-Monde’s Determinant

The Vander-Monde’s determinant is defined by∣∣∣∣∣∣x2

0 x21 x

22

x0 x1 x2

1 1 1

∣∣∣∣∣∣ = (x0 − x1)(x0 − x2)(x1 − x2) =2∏

i<j=0

(xi − xj).

The difference product of 3 numbers x0, x1, x2 is defined as

D.P.(x0, x1, x2) = (x0 − x1)(x0 − x2)(x1 − x2) =2∏

i<j=0

(xi − xj).

Similarly the other DP are defined by

D.P.(x0, x1, x2, x3) = (x0 − x1)(x0 − x2)(x0 − x3)(x1 − x2)(x1 − x3)(x2 − x3)

=3∏

i<j=0

(xi − xj)

D.P.(x0, x1, ..., xn) =n∏

i<j=0

(xi − xj).

Therefore, the Vander-Monde’s determinant can be written in D.P. form as∣∣∣∣∣∣x2

0 x21 x

22

x0 x1 x2

1 1 1

∣∣∣∣∣∣ =2∏

i<j=0

(xi − xj) = D.P.(x0, x1, x2).

Hence the Vander-Monde’s determinant can be written in DP form. In general,∣∣∣∣∣∣∣∣x3

0 x31 · · · xn

n

· · · · · · · · · · · ·x0 x1 · · · xn

1 1 · · · 1

∣∣∣∣∣∣∣∣ =n∏

i<j=0

(xi − xj) = D.P.(x0, x1, x2, ..., xn).

3.4.6 Cramer’s Rule

Let us consider a system of n linear algebraic equations relating in n unknowns x1, x2, . . . , xn

of the explicit forma11x1 + a12x2 + . . .+ a1nxn = b1a21x1 + a22x2 + . . .+ a2nxn = b2

......

an1x1 + an2x2 + . . .+ annxn = bn

(3.19)

where the n2 coefficients aij and the n constants b1, b2, ..., bn are given real numbers. The(3.19) can be written in the matrix notation as Ax = b where the real n × n coefficientmatrix is A in which aij is the coefficient of xj in the ith equation, bT = [b1, b2, ..., bn] isa column n vector which are prescribed and xT = [x1, x2, ..., xn] is the unknown n columnvector to be computed. Equations (3.19) are said to be homogeneous system if bi = 0;∀iand non homogeneous system if at least one bi 6= 0.

This is simplest method for the solution of a nonhomogeneous system (3.19) of n linearequations in n unknowns. Here it will assume that ∆ = det(A) 6= 0, so that unique solutionfor x1, x2, ..., xn exists. From the properties of determinant, we have,

Page 194: Linear Algebra by Nayak

Determinants 187

x14 = x1

∣∣∣∣∣∣∣∣a11 a12 · · · a1n

a21 a22 · · · a2n

· · · · · · · · · · · ·an1 an2 · · · ann

∣∣∣∣∣∣∣∣ =∣∣∣∣∣∣∣∣x1a11 a12 · · · a1n

x1a21 a22 · · · a2n

· · · · · · · · · · · ·x1an1 an2 · · · ann

∣∣∣∣∣∣∣∣=

∣∣∣∣∣∣∣∣a11x1 + a12x2 + · · ·+ a1nxn a12 · · · a1n

a21x1 + a22x2 + · · ·+ a2nxn a22 · · · a2n

· · · · · · · · · · · ·an1x1 + an2x2 + · · ·+ annxn an2 · · · ann

∣∣∣∣∣∣∣∣C ′

1 = C1 + x2C2 + · · ·+ xnCn.

=

∣∣∣∣∣∣∣∣b1 a12 · · · a1n

b2 a22 · · · a2n

· · · · · · · · · · · ·bn an2 · · · ann

∣∣∣∣∣∣∣∣ [Using (3.19)] = 41(say). (3.20)

Therefore, x1 = 41/4. In general, let Aij be the cofactor of aij in ∆, then multiplyingboth sides of the ith equation by Aij for i = 1, 2, . . . , n and then adding we have,

(n∑

i=1

ai1 Aij) x1 + (n∑

i=1

ai2 Aij) x2 + · · ·+ (n∑

i=1

aij Aij) xj + · · ·+ (n∑

i=1

ain Aij) xn

=n∑

i=1

bi Aij .

we multiply each equation by the cofactor in A of the coefficient of xk in that equationand add the results we have,

4.xk =n∑

j=1

Ajkbj ; k = 1, 2, ..., n

where, Aij is the cofactor of aij in det(A) of order n − 1. Hence from equation (3.19) wehave n∑

k=1

aikxk = bi; i = 1, 2, ..., n

⇒ 1∆

n∑k=1

n∑j=1

Ajkbj =1∆

n∑j=1

[n∑

k=1

aikAjk]bj .

Since the RHS would reduce to 4 if b1, b2, ..., bn were replaced by a1k, a2k, ..., ank, so let∆i = determinant obtained from ∆, replacing ith column by the column vector b then theunique solution of (3.19) is given by

xi =∆i

∆= 4−1

n∑j=1

Aijbj ; i = 1, 2, ..., n. (3.21)

This is the well-known Cramer’s rule. Various methods have been devised to evaluate thevalue of a numerical determinants. The following cases may arise:

1. The homogeneous system bi = 0,∀i of equations has a trivial solution x1 = x2 = · · · =xn = 0, if ∆ 6= 0 and an non trivial solution ( at least one of x’s is non-zero) existwhen ∆ = 0. When 4 = 0, the solution is not unique.

2. The non homogeneous system of equations is said to be consistent if 4 6= 0. In thiscase the system (3.19)has an unique solution.

3. If 4 = 0 and all of 4i’s be 0 then the system (3.19) may or may not be consistent. Ifconsistent it admits of infinite number of solutions.

Page 195: Linear Algebra by Nayak

188 Theory of Matrices

4. If4 = 0 and at least one of4i(i = 1, 2, ..., n) be nonzero then the system is inconsistentor incompatible and (3.19) has no solution.

Cramer’s rule have the following serious drawbacks

(i) If the size of (3.19) is large (n ≥ 4), Crammer’s rule requires enormous amount ofcomputation for evaluation of determinants and no efficient method exists for evalua-tion of determinants. Hence this is purely theoretical but certainly not from numericalpoint of view.

(ii) For large systems, this method is not useful as it involves a large number of manipula-tions. The Cramer’s rule involves to compute (n+1) determinants each of order n (fora system (3.19) of n equations and n variables). To evaluate a determinant of order nrequires (n!−1) additions and n!(n−1) multiplications. The total number of multipli-cations and divisions required to solve (3.19) by this method = (n− 1)× (n+ 1)! + n.

The scenario is shown in Figure 3.2.

?4 = 0 4 6= 0

[Consistent with unique solution]

?

?

A system of linear equations

? ?41 = 42 = · · · = 4n = 0

[Consistent withinfinitely many solutions]

At least one 4i’s non-zero[Inconsistent]

?

Figure 3.2: Different cases for existence of solutions.

Ex 3.4.30 Solve the following system of equations by Cramer’s rule

x+ 2y + 3z = 6, 2x+ 4y + z = 7, 3x+ 2y + 9z = 14

Solution: The given system of linear equations can be written in the form Ax = b, where,

A = the coefficient matrix =

1 2 32 4 13 2 9

, bT = (6 7 14) and xT = (x1 x2 x3). Here the

determinant of the coefficient matrix A is given by

4 =

∣∣∣∣∣∣1 2 32 4 13 2 9

∣∣∣∣∣∣ = 1(4.9− 1.2) + 2(1.3− 2.9) + 3(2.2− 4.3) = −20(6= 0).

Here the present system has unique solution. The determinant 4i obtained by replacingthe ith column of 4 by constant vector, given by

41 =

∣∣∣∣∣∣6 2 37 4 114 2 9

∣∣∣∣∣∣ = −20;42 =

∣∣∣∣∣∣1 6 32 7 13 14 9

∣∣∣∣∣∣ = −20;43 =

∣∣∣∣∣∣1 2 62 4 73 2 14

∣∣∣∣∣∣ = −20.

Page 196: Linear Algebra by Nayak

Complex Matrices 189

Hence by Cramer’s rule solution exists. So the unique solution is

x =−20−20

= 1; y =−20−20

= 1, z =−20−20

= 1.

Now the sum of the three equations is 6x+ 8y+ 13z = 27. Substituting the values of x, y, zwe get LHS = 27, which is the check solution.

Ex 3.4.31 Find for what values of a and b, the system of equations

x+ 4y + 2z = 1, 2x+ 7y + 5z = 2b, 4x+ ay + 10z = 2b+ 1

has (i) an unique solution, (ii) no solution and (iii) infinite number of solution over thefield of rational numbers.

Solution: The given system of equations can be written in the form Ax = b, where,

A =

1 4 22 7 54 a 10

, bT = (1 2b 2b+ 1) and xT = (x1 x2 x3).

Hence,

4 =

∣∣∣∣∣∣1 4 22 7 54 a 10

∣∣∣∣∣∣ = 1(7.10− 5.a) + 4(5.4− 10.2) + 2(2.a− 7.4) = 14− a

41 =

∣∣∣∣∣∣1 4 22b 7 5

2b+ 1 a 10

∣∣∣∣∣∣ = 7b− 5a− 68b+ 4ab.

(i) When, 14− a 6= 0, i.e., a 6= 14, then 4 6= 0, in this case, the solution will be unique.(ii) When, 14− a = 0, i.e., a = 14, then 4 = 0, and then 41 = 6 − 12b. If 6 − 12b 6= 0,

i.e., b 6= 12 , then the system has no solution.

(iii) If a = 14 and b = 12 , then the system of linear equations becomes x + 4y + 2z =

1, y − z = 0. Thus the system is consistent and have infinite number of solutions over thefield of rational numbers.

3.5 Complex Matrices

A matrix A is said to be complex, if its elements may be the complex numbers. If A bea complex matrix, then it can be expressed as A = P + iQ, i =

√−1, where P,Q are real

matrices. The matrix A =(−1 + 3i −i3 + i 4

)can be written as

A =(−1 + 3i −i3 + i 4

)=(−1 03 4

)+ i

(3 −11 0

)= P + iQ

so A is a complex matrix. If each element of the matrix A be replaced by its conjugate,then the matrix, so obtained is called the conjugate of A and is denoted by A. Thus, ifA = P + iQ, then A = P − iQ. Thus if A = [aij ], then A = [bij ], where bij = aij . Therefore,

for the above example, A =(−1− 3i i3− i 4

). A matrix A is real if and only if A = A.

Property 3.5.1 If A and B be conjugate of the matrices A and B respectively, then,

(i) A = A.

Page 197: Linear Algebra by Nayak

190 Theory of Matrices

(ii) A+B = A+B; A,B are conformable for addition.

(iii) kA = k A; k being a complex number.

(iv) AB = A B; A,B are conformable for product.

3.5.1 Transpose Conjugate of a Matrix

The transpose of the conjugate of a matrix A is called the transpose conjugate of A and isdenoted by A∗. Thus,

A∗ = (AT ) = (A)T . (3.22)

It is also called as tranjugate of a matrix. For example, let A =(

3 + 2i 3i 3 + 4i

), then,

A =(

3− 2i 3−i 3− 4i

), and so,

A∗ = (A)T =(

3− 2i −i3 3− 4i

)= (AT ).

As A∗ = (AT ) = (A)T , so A is a transpose conjugate matrix. Note that, if A is real, thenA∗ = AT .

Property 3.5.2 If A∗ and B∗ be tranjugates of the matrices A and B respectively, then,

(i) [A∗]∗ = A.

(ii) [A+B]∗ = A∗ +B∗; A,B are conformable for addition.

(iii) [kA]∗ = k A∗; k being a complex number.

(iv) [AB]∗ = B∗ A∗; A,B are conformable for product.

Consider a complex matrix A. The relationship between A and its conjugate transpose A∗

yields following important kinds of complex matrices.

3.5.2 Harmitian Matrix

A complex matrix of order n× n is said to be Harmitian, if, A∗ = A. From the definition,it is clear that, a square complex matrix A = [aij ] is Harmitian if and only if symmetricelements are conjugate, i.e., aij = aji, in which each diagonal element aii must be real. Forexample, consider the following complex matrix

A =

7 1− i 3 + 2i1 + i 2 1 + 2i3− 2i 1− 2i 1

By inspection, the diagonal elements of A are real and symmetric elements 1− i and 1 + i,3 + 2i and 3− 2i, 1 + 2i and 1− 2i are conjugate. Thus A is Harmitian matrix.

Ex 3.5.1 Write down the most general symmetric and hermitian matrix of order 2.

Solution: The most general symmetric complex matrix of order 2 is

As =(a+ ib e+ ife+ if c+ id

); i =

√−1

Page 198: Linear Algebra by Nayak

Complex Matrices 191

which has real independent parameters. The most general hermitian matrix of order 2 canbe written interms of four independent parameters in the form

Ah =(

a c+ idc+ id b

); i =

√−1.

Ex 3.5.2 If A be a square matrix, then show that AA∗ and A∗A are Hermitian.

Solution: Let A∗ and B∗ be the transposed conjugates of A and B respectively, then byproperty (AB)∗ = B∗A∗. Here we have,

[AA∗]∗ = [A∗]∗A∗ = AA∗.

Hence AA∗ is Hermitian. Similarly,[A∗A]∗ = A∗[A∗]∗ = A∗A.

Therefore, A∗A is Hermitian.

3.5.3 Skew-Harmitian Matrix

A complex matrix of order n × n is said to be skew-Harmitian, if, A∗ = (A)T . A squarecomplex matrix A = [aij ] is skew-Harmitian if and only if it is skew-symmetric and eachdiagonal element aii is either zero or purely imaginary. For example, the following complexmatrix

A =

0 2− i 6− 3i−2− i 0 1 + 5i−6− 3i −1 + 5i i

is skew-Harmitian matrix.

Ex 3.5.3 If S = P + iQ be a skew Hermitian matrix, then show that P is a skew symmetricmatrix and Q is real skew symmetric matrix.

Solution: Let S = P + iQ be a skew Hermitian matrix, where P and Q are real matrices.Now, S = P − iQ and (S)T = PT − iQT .

Since, S is skew Hermitian, by definition, (S)T = S, i.e.,

PT − iQT = P + iQ⇒ PT = P and QT = −Q.

Therefore, P is a skew symmetric matrix and Q is real skew symmetric matrix.

Ex 3.5.4 If A be a Hermitian matrix, then show that iA is a skew-Hermitian.

Solution: Since A be a Hermitian matrix, by definition, A∗ = A. Now,

[iA]∗ = i A∗ = −i A∗ = −(iA).

Therefore, iA is skew hermitian.

Ex 3.5.5 Let A be an n×n matrix whose elements are complex numbers. Show that A+A∗

is Harmitian and A−A∗ is skew Harmitian.

Solution: Let P = A+A∗, then using the property (3.5.1), we get,

P = A+A∗ = A+A∗ = A+AT .

Therefore, (P )T = (A)T +A = A+A∗ = P.

Hence Z is Harmitian. Let Q = A−A∗, then using the property (3.5.1), we get,

Q = A−A∗ = A−A∗ = A−AT .

Therefore, (Q)T = (A)T −A = A∗ −A = −Q.

Hence Q is skew Harmitian.

Page 199: Linear Algebra by Nayak

192 Theory of Matrices

3.5.4 Unitary Matrix

A complex square matrix A of order n is said to be unitary, if

A∗A−1 = A−1A∗ = In; i.e., A∗ = A−1. (3.23)

Thus A must be necessarily be square and inverible. We note that a complex matrix A isunitary if and only if its rows (columns) from an orthogonal set relative to the dot productof complex vectors. For example, let,

A =12

(1 + i 1− i1− i 1 + i

); then, A∗ =

12

(1− i 1 + i1 + i 1− i

)so, AA∗ =

12

(1 + i 1− i1− i 1 + i

)12

(1− i 1 + i1 + i 1− i

)=(

1 00 1

)= I2

=12

(1− i 1 + i1 + i 1− i

)12

(1 + i 1− i1− i 1 + i

)= A∗A.

Since A∗A−1 = A−1A∗ = I2, so A is an unitary matrix. Note that, when a matrix A is real,hermitian is same as symmetric and unitary is the same as orthogonal.

Theorem 3.5.1 For an unitary matrix A, |A| = 1.

Proof: Let A be an unitary matrix of order n, then by definition, A∗A = In. Therefore,

|A∗A| = |In| ⇒ |AT ||A| = 1⇒ |A||A| = 1 ⇒ |A||A| = 1⇒ |A|2 = 1 ⇒ |A| = 1.

Therefore, for an unitary matrix A, |A| = 1.

Ex 3.5.6 If A be an unitary matrix and I +A is non-singular

Solution:

3.5.5 Normal Matrix

A complex square matrix A is said to be normal if it commutes with A∗, i.e., if AA∗ = A∗A.For example, let,

A =(

2 + 3i 1i 1 + 2i

), then, A∗ =

(2− 3i −i

1 1− 2i

)AA∗ =

(2 + 3i 1i 1 + 2i

)(2− 3i −i

1 1− 2i

)=(

14 4− 4i4 + 4i 6

)=(

2− 3i −i1 1− 2i

)(2 + 3i 1i 1 + 2i

)= A∗A.

Since AA∗ = A∗A, the complex matrix A is normal.This definition reduces to that for realmatrices when A is real.

3.6 Adjoint of a Matrix

Let A = [aij ]n×n be a square matrix of order n. Let Aij be the cofactor of the ijth elementaij in detA. Then the square matrix (Aij)T is said to be the adjugate or adjoint of A and

Page 200: Linear Algebra by Nayak

Adjoint of a Matrix 193

is denoted by adjA. So, we first find the cofactor of ijth element in detA. Then, the adjoint

of A is obtained by transposing the cofactor. For example, let , A =[

1 −23 4

], then,

A11 = (−1)1+1 |4| = 4, A12 = (−1)1+2 |3| = −3,A21 = (−1)2+1 | − 2| = 2, A22 = (−1)2+2 |1| = 1.

Therefore, adj A =(A11 A12

A21 A22

)T

=(

4 2−3 1

).

For the matrix A =

1 0 2−1 1 0

2 0 1

, we get,

A11 = (−1)1+1

∣∣∣∣1 00 1

∣∣∣∣ = 1, A12 = (−1)1+2

∣∣∣∣−1 02 1

∣∣∣∣ = 1,

A13 = (−1)1+3

∣∣∣∣−1 12 0

∣∣∣∣ = −2, A21 = (−1)2+1

∣∣∣∣0 20 1

∣∣∣∣ = 0,

A22 = (−1)2+2

∣∣∣∣1 22 1

∣∣∣∣ = −3, A23 = (−1)2+3

∣∣∣∣1 02 0

∣∣∣∣ = 0,

A31 = (−1)3+1

∣∣∣∣0 21 0

∣∣∣∣ = −2, A32 = (−1)3+2

∣∣∣∣ 1 2−1 0

∣∣∣∣ = −2,

A33 = (−1)3+3

∣∣∣∣ 1 0−1 1

∣∣∣∣ = 1.

Therefore, the adjugate of A is given by

adj A =

A11 A12 A13

A21 A22 A23

A31 A32 A33

T

=

1 0 −21 −3 −2

−2 0 1

.From definition, we have,

(i) adj(AT ) = (adjA)T , and adj(kA) = kn−1adjA, where k is any scalar.

(ii) If 0 be a zero matrix, which is a square matrix of order n, then adj0 = 0.

(iii) IF I be a unit matrix of order n, then adjI = I.

(iv) If A is symmetric, then adjA is symmetric.

(v) If A is skew-symmetric then adjA is symmetric or skew-symmetric according as theorder of A is odd or even.

(vi) For the matrices A,B, adj (AB) = (adj B) (adj A).

Theorem 3.6.1 If A be a square matrix of order n, then

A.(adjA) = |A| In = (adjA).A. (3.24)

Proof: Let A = [aij ]n×n be a square matrix of order n. Let Aij denotes the cofactor ofijth element of aij in detA. The ijth element of A(adjA) is the inner product of the ith rowof A and the jth column of adjA, as

A(adjA) =

a11 a12 · · · a1n

a21 a22 · · · a2n

......

...an1 an2 · · · ann

A11 A21 · · · An1

A12 A22 · · · An2

......

...A1n A2n · · · Ann

Page 201: Linear Algebra by Nayak

194 Theory of Matrices

=

n∑k=1

a1kA1k

n∑k=1

a1kA2k · · ·n∑

k=1

a1kAnk

n∑k=1

a2kA1k

n∑k=1

a2kA2k · · ·n∑

k=1

a2kAnk

......

...n∑

k=1

ankA1k

n∑k=1

ankA2k · · ·n∑

k=1

ankAnk

=

|A| 0 · · · 00 |A| · · · 0...

......

0 0 · · · |A|

= |A|In, as,n∑

k=1

aikAjk =|A|; if i = j0; if i = j

Similarly, taking the product between adjA andA and proceeding as before we get, (adjA)A =|A|In. Therefore,

A(adjA) = |A|In = (adjA)A.Thus the product of a matrix with its adjoint is commutative and it is a scalar matrix whosediagonal element is |A|.

Theorem 3.6.2 If A be a non-singular matrix of order n, then |adjA| = |A|n−1.

Proof: We know that, A(adjA) = |A|In. Hence,

|A(adjA)| = ||A|In| ⇒ |A||adjA| = |A|n

⇒ |adjA| = |A|n−1; as |A| 6= 0.

Therefore, if A be a non-singular square matrix of order n, then |adjA| = |A|n−1.

Ex 3.6.1 Show that,

∣∣∣∣∣∣bc− a2 ca− b2 ab− c2

ca− b2 ab− c2 bc− a2

ab− c2 bc− a2 ca− b2

∣∣∣∣∣∣ =∣∣∣∣∣∣a b cb c ac a b

∣∣∣∣∣∣2

. KH:09

Solution: If the right hand side is |A|2, the the adjoint of A is given by,

adjA =

bc− a2 ca− b2 ab− c2

ca− b2 ab− c2 bc− a2

ab− c2 bc− a2 ca− b2

.

Let A be a non-singular matrix of order 3, then by theorem (3.6.2), we get,

|adjA| =

∣∣∣∣∣∣bc− a2 ca− b2 ab− c2

ca− b2 ab− c2 bc− a2

ab− c2 bc− a2 ca− b2

∣∣∣∣∣∣ = |A|2 =

∣∣∣∣∣∣a b cb c ac a b

∣∣∣∣∣∣2

.

Theorem 3.6.3 If A be a non-singular matrix of order n, then adj(adjA) = |A|n−2A.

Proof: We know that, A(adjA) = |A|In. Now putting adjA in place of A, we get,

adjA(adj.adjA) = |adjA|Inor, adjA(adj.adjA) = |A|n−1In; as |adjA| = |A|n−1

or, A(adjA)(adj.adjA) = |A|n−1A

or, |A|In(adj.adjA) = |A|n−1A

or, adj.adjA = |A|n−2.A; as |A| 6= 0.

Therefore, if A be a non-singular matrix of order n, then adj(adjA) = |A|n−2A.

Page 202: Linear Algebra by Nayak

Adjoint of a Matrix 195

Ex 3.6.2 Find the matrix A, if adjA =

1 3 −4−2 2 −21 −3 4

.

Solution: Since the adjA is given, so,

adj(adjA) =

∣∣∣∣ 2 −2−3 4

∣∣∣∣ −∣∣∣∣ 3 −4−3 4

∣∣∣∣ ∣∣∣∣3 −42 −2

∣∣∣∣−∣∣∣∣−2 −2

1 4

∣∣∣∣ ∣∣∣∣1 −41 4

∣∣∣∣ −∣∣∣∣ 1 −4−2 −2

∣∣∣∣∣∣∣∣−2 21 −3

∣∣∣∣ −∣∣∣∣1 31 −3

∣∣∣∣ ∣∣∣∣ 1 3−2 2

∣∣∣∣

=

2 0 26 8 104 6 8

.

Now, from the relation, |adjA| = |A|n−1 we have,

|adjA| =

∣∣∣∣∣∣1 3 −4−2 2 −21 −3 4

∣∣∣∣∣∣ = 4 = |A|2 ⇒ |A| = ±2.

Using the relation, adj(adjA) = |A|n−2A, the matrix A is given by,

A =12

2 0 26 8 104 6 8

=

1 0 13 4 52 3 4

.

3.6.1 Reciprocal of a Matrix

For a non-singular square matrix A = [aij ]n×n of order n, the reciprocal matrix of A isdefined by

1|A|

adjA =

1|A|A11

1|A|A21 · · · 1

|A|An11|A|A12

1|A|A22 · · · 1

|A|An2

......

...1|A|A1n

1|A|A2n · · · 1

|A|Ann

. (3.25)

For example, let, A =

6 2 54 2 10 1 −3

, then |A| = 2 and adjA =

−7 11 −812 −18 144 −6 4

. Therefore,

the reciprocal of A is 1|A|

adjA =12

−7 11 −812 −18 144 −6 4

.

3.6.2 Inverse of a Matrix

Let A be a square matrix of order n. For any other square matrix of the same order B, ifA.B = B.A = In, (3.26)

is satisfied then, B is called the reciprocal or inverse of A and is denoted as B = A−1. SinceA and B are conformable from the product AB and BA and AB = BA, A and B are squarematrices of the same order. Thus a matrix A may have an inverse or A may be invertibleonly when it is a square matrix. By the property of adjoint matrix, we have,

A.(adjA) = |A|In = (adjA)A

⇒ A.1|A|

(adjA) = In =1|A|

(adjA)A; provided |A| 6= 0.

Page 203: Linear Algebra by Nayak

196 Theory of Matrices

Again, by property of inverse, A.B = B.A = In. Comparing we get,

B = A−1 =1|A|

adjA, provided |A| 6= 0. (3.27)

Therefore, the inverse of any matrix exists if it be non-singular. The inverse of a nonsingulartriangular matrix is also the same dimension and structure.

Ex 3.6.3 Find the inverse of A =[

1 23 4

].

Solution: To find A−1, let A−1 =[a bc d

], then using AA−1 = I2, we get,[

1 23 4

] [a bc d

]=[

1 00 1

]⇒[a+ 2c b+ 2d3a+ 2c 3b+ 4d

]=[

1 00 1

]⇒ a+ 2c = 1, b+ 2d = 0, 3a+ 2c = 0, 3b+ 4d = 1

⇒ a = −2, b = 1, c =32, d = −1

2.

⇒ A−1 =[a bc d

]=[−2 132 − 1

2

].

Moreover, A−1 satisfies the property that[−2 132 − 1

2

] [1 23 4

]=[

1 00 1

],

we conclude that A is non singular and that A−1 =[−2 132 − 1

2

].

Theorem 3.6.4 The inverse of a matrix is unique.

Proof: Let A be an invertible matrix of order n. Also, let, B and C are the inverses of A.Then by definition of inverse, we have,

A.B = B.A = In and A.C = C.A = In.

Using the property that matrix multiplication is associative, we get,

C.(A.B) = (C.A).B ⇒ C.In = InB ⇒ C = B.

Hence,inverse of a matrix is unique.

Theorem 3.6.5 The necessary and sufficient condition for the existence of the inverse ofa square matrix A is that A is non-singular.

Proof: First, let, A be an n × n invertible matrix and B is the inverse of A. Then, bydefinition, A.B = B.A = In. Therefore,

|A.B| = |In| ⇒ |A|.|B| = 1.

Therefore, |A| 6= 0 and consequently, A is non-singular. Hence the condition is necessary.Conversely, let A be non-singular, i.e., |A| 6= 0. Now,

A.(adjA) = |A|In = (adjA)A

⇒ A.1|A|

(adjA) = In =1|A|

(adjA)A; as |A| 6= 0.

Hence by definition of inverse, A−1 = 1|A|adjA and it exists. Hence the condition is sufficient.

Page 204: Linear Algebra by Nayak

Adjoint of a Matrix 197

Ex 3.6.4 Find the matrix A, if adjA =

2 2 02 5 10 1 1

and |A| = 2.

Solution: Since the adjA and |A| is given,

A−1 =1|A|

(adjA) =12

2 2 02 5 10 1 1

= B, (say).

Therefore, |B| is given by, |B| = 12

∣∣∣∣∣∣2 2 02 5 10 1 1

∣∣∣∣∣∣ = 2 6= 0 and the adjB is given by,

adjB =

∣∣∣∣5 11 1

∣∣∣∣ −∣∣∣∣2 01 1

∣∣∣∣ ∣∣∣∣2 05 1

∣∣∣∣−∣∣∣∣2 10 1

∣∣∣∣ ∣∣∣∣2 00 1

∣∣∣∣ −∣∣∣∣2 02 1

∣∣∣∣∣∣∣∣2 50 1

∣∣∣∣ −∣∣∣∣2 20 1

∣∣∣∣ ∣∣∣∣2 22 5

∣∣∣∣

=

4 −2 2−2 2 −22 −2 6

.

Therefore, the matrix A is given by

A = B−1 =12

4 −2 2−2 2 −22 −2 6

=

2 −1 1−1 1 −11 −1 3

.

Theorem 3.6.6 If A and B are invertible square matrices of same order, then inverseof the product of two matrices is the product of their inverses in the reverse order, i.e.,(AB)−1 = B−1A−1.

Proof: Let A and B are invertible square matrices of same order n, then , |A| 6= 0, and|B| 6= 0. Therefore, |AB| = |A|.|B| 6= 0 and hence AB is invertible. Now,

(AB)(B−1A−1) = A(BB−1)A−1, associate property= AInA

−1 = AA−1 = In.

Again, (B−1A−1)(A) = B−1(A−1A)B; associate property= B−1InB = B−1B = In.

Hence by definition and uniqueness theorem of inverse, we have (AB)−1 = B−1A−1. Con-tinuing, we get, if A1, A2, · · · , Ak be k invertible matrices of the same order, then,

(A1.A2. · · · .Ak)−1 = A−1k . · · · .A−1

2 .A−11 . (3.28)

Theorem 3.6.7 If A be an invertible matrix, A−1 is invertible and (A−1)−1 = A.

Proof: Let, A be an n× n invertible matrix, then A 6= 0 and AA−1 = A−1A = In. Now,

|A|.|A−1| = |AA−1| = |In| = 1,

which shows that |A−1| 6= 0, hence A−1 is invertible. From the definition and uniquenesstheorem of inverse, we get A is the inverse of A−1 and hence (A−1)−1 = A.

Theorem 3.6.8 If A be an invertible matrix, then AT is invertible matrix and

(AT )−1 = (A−1)T .

Page 205: Linear Algebra by Nayak

198 Theory of Matrices

Proof: Let A be invertible, then |A| 6= 0. Thus, |AT | = |A| 6= 0. Therefore, AT is invertible.Also, from the relation AA−1 = A−1A = I, we get,

(AA−1)T = (A−1A)T = IT

⇒ (A−1)TAT = AT (A−1)T = I.

From definition and uniqueness theorem of inverse, we get (A−1)T is the inverse of AT andhence (AT )−1 = (A−1)T .

Theorem 3.6.9 If A be an invertible matrix, then adjA−1 = (adjA)−1 = 1|A|A.

Proof: From the definition of inverse, we have, AA−1 = I = A−1A. Therefore,

adj(AA−1) = adj(I) = adjAadjA−1

or, adjA−1 adjA = I = adjA adjA−1 ⇒ adjA−1 = (adjA)−1.

Again, A−1 = 1|A|adjA, so adjA = |A|A−1 and so,

adjA−1 = (adjA)−1 =1|A|

A.

Theorem 3.6.10 If the sum of the elements in each row of a nonsingular matrix is k(6= 0)then the sum of the elements in each row of the inverse matrix is k−1.

Proof: Let A = [aij ]n×n be a give non singular matrix, where |A| 6= 0. Since, the sum ofthe elements in each row of a nonsingular matrix is k(6= 0), so

n∑j=1

aij = k; i = 1, 2, · · · , n.

Now, sum of the elements of the jth row of A−1 = 1|A|

n∑i=1

Aij . Therefore,

|A| =n∑

i=1

aijAij =n∑

i=1

(n∑

r=1

airAij

)−

n∑i=1

n∑r=1,r 6=j

airAij

= k

n∑i=1

Aij − 0 = kn∑

i=1

Aij .

Therefore, if the sum of the elements in each row of a nonsingular matrix is k(6= 0) then thesum of the elements in each row of the inverse matrix is k−1.

Ex 3.6.5 If A and B are both square matrices of order n and A has an inverse, show that(A+B)A−1(A−B) = (A−B)A−1(A+B).

Solution: Since A has an inverse, so A−1 exists. Now,

LHS = (A+B)A−1(A−B) = (A+B)(A−1A−A−1B)= (A+B)(I −A−1B) = A−AA−1B +B −BA−1B

= A−B +B −BA−1B = A+B −B −BA−1B

= A+AA−1B −B −BA−1B

= A(I +A−1B)−B(I +A−1B) = (A−B)(I +A−1B)= (A−B)(A−1A+A−1B) = (A−B)A−1(A+B) = RHS.

Therefore, if A and B are both square matrices of order n and A has an inverse, show that(A+B)A−1(A−B) = (A−B)A−1(A+B).

Page 206: Linear Algebra by Nayak

Adjoint of a Matrix 199

Ex 3.6.6 Show that if the non singular symmetric matrices A and B commute then A−1B,AB−1

and A−1B−1 are symmetric.

Solution: By the given condition, AB = BA and |A| 6= 0, |B| 6= 0. Also as A and B aresymmetric matrices AT = A,BT = B. Now,

(A−1B)T = BT (A−1)T = B(AT )−1

= BA−1 = A−1BAA−1 = A−1B; as AB = BA⇒ B = A−1BA.

Therefore, A−1B is symmetric. Similarly, AB−1 is also symmetric. Also,

(A−1B−1)T = (BA)−1T = (AB)−1T ; as AB = BA

= (AB)T −1 = (BTAT )−1 = (AT )−1(BT )−1

= A−1B−1; as AT = A and BT = B.

Therefore, A−1B−1 is symmetric.

Ex 3.6.7 If A =(

2 −1−1 2

), then show that A2 − 4A+ 3I = 0. Hence obtain A−1.

Solution: For the given matrix A, we have,

A2 =(

2 −1−1 2

)(2 −1−1 2

)=(

5 −4−4 5

).

Thus the expression A2 − 4A+ 3I becomes,(5 −4−4 5

)− 4

(2 −1−1 2

)+ 3

(1 00 1

)=(

0 00 0

)= 0.

Now the expression A2 − 4A + 3I = 0 shows that A is non singular (so that A−1 exists )and can be written in the form

A2 − 4A+ 3I = 0

⇒ A−1 = −13[A− 4I] = −1

3

(−6 −1−1 −6

).

Ex 3.6.8 Find A, from the equation A

(4 13 2

)=(

2 39 1

).

Solution: The given matrix equation can be written in the form AB = C, then,

|B| =∣∣∣∣4 13 2

∣∣∣∣ = 8− 3 = 5 6= 0.

⇒ B−1 =1|B|

(adjB) =15

(2 −1−3 4

).

So, A = CB−1 =(

2 39 1

)15

(2 −1−3 4

)=

15

(4− 9 −2 + 1218− 3 −9 + 4

)=

15

(−5 1015 −5

)=

15

(−1 23 −1

).

Ex 3.6.9 Show that the inverse of[A OB C

]is[

A−1 O−C−1BA−1 C−1

], where A and C are non-

singular.

Page 207: Linear Algebra by Nayak

200 Theory of Matrices

Solution: Let us consider the product,[A OB C

] [A−1 O

−C−1BA−1 C−1

]=[

AA−1 OBA−1 − CC−1BA−1 CC−1

]=[I OO I

]= I.

Again,[

A−1 O−C−1BA−1 C−1

] [A OB C

]=[

AA−1 O−C−1BA−1A+ C−1B CC−1

]=[I OO I

]= I.

Therefore,[

A−1 O−C−1BA−1 C−1

]is the inverse of

[A OB C

].

Deduction 3.6.1 Solution of system of linear equations by matrix inverse method: Here we shall be concerned with the solution of a system of n linear algebraic equationsrelating in n unknowns x1, x2, ..., xn of the explicit form (3.19), where the n2 coefficientsaij and the n constants b1, b2, ..., bn are given real numbers. The (3.19) can be written inthe matrix notation as Ax = b where the real n × n coefficient matrix is A in which aij isthe coefficient of xj in the ith equation, bT = [b1, b2, ..., bn] is a column n vector which areprescribed and xT = [x1, x2, ..., xn] is the unknown n column vector to be computed up toa desired degree of accuracy.

If det(A) 6= 0, then unique A−1 exists, where the inverse of the matrix A. Thus thematrix inversion method finds solution of (3.19) as

Ax = b⇒ A−1Ax = A−1b =1|A|

(adjA)b

⇒ xi =n∑

j=1

A−1ij ∗ bj ; i = 1, 2, ..., n. (3.29)

Thus we see that in the solution of a system (3.19) by matrix method, the chief problem isthe inversion of the coefficient matrix A. This method is obviously unsuitable for solvinglarge systems, since the computation of A−1 by cofactor i.e., evaluation of determinants,will then become exceedingly difficult.

Ex 3.6.10 Using matrix inversion method, solve the system of equations

x+ 2y + 3z = 6, 2x+ 4y + z = 7, 3x+ 2y + 9z = 14

Solution: The given non homogeneous system can be written as Ax = b, where A iscoefficient matrix and b is constant vector. The solution of the system can be written asx = A−1b provided |A| 6= 0. Here |A| = −20(6= 0). Hence A is nonsingular and A−1 exists.Now

adjA =

34 −12 −10−15 0 5−8 4 0

⇒ A−1 =1−20

34 −12 −10−15 0 5−8 4 0

.Hence the solution is given by A−1b =

[1 1 1

]T. Therefore the solution is given by x = y =

z = 1.

Result 3.6.1 This method is obviously unsuitable for solving large systems, since the com-putation of A−1 by cofactor i.e., evaluation of determinants, will then become exceedinglydifficult. Various methods have been devised to evaluate the value of A−1.

If a given matrix is of higher order, then we apply some numerical methods to find theinverse. For further discussion the reader may see the Numerical book of the Author.

Page 208: Linear Algebra by Nayak

Adjoint of a Matrix 201

3.6.3 Singular Value Decomposition

For a rectangular matrix, like LU, QR decomposition method, a similar decomposition ispossible, which is known as the singular value decomposition. It plays a significant role inmatrix theory. Also it is used to find the generalized inverse of a singular matrix, which hasseveral applications in image processing.Let A be an m × n(m ≥ n) real matrix, then the n × n real sub-matrices ATA and AAT

are symmetric, positive definite and have eigenvalues say λk. Then we can find the northonormalized eigenvectors Xk of ATA such that

ATAXk = λkXk.

Let Yk be orthonormalized eigenvectors of AAT , then

AATYk = λkYk.

Then to solve the eigenvalue problem, find an orthogonal matrix U such that A can bedecomposed into the form

A = UDV T

which is called the singular value decomposition of the matrix A, where the n× n matrix Vconsists of Xk, which are n orthonormalized eigenvectors of ATA. If some λk = 0, then thecorresponding column of V must be identically zero as its norm is 0. Also UTU = V TV =V V T = In and the diagonal matrix D is as

D =

√λ1 0 · · · 00

√λ2 · · · 0

......

0 0 · · ·√λn

.The values

√λ1,

√λ2, . . .

√λn are called the singular values of A satisfying

√λ1 ≥

√λ2 ≥

. . . ≥√λn ≥ 0. Since all eigenvalues of ATA should be non-negative, except for possible

perturbations due to rounding errors, so if any λk is small negative number. If the rank ofA is r(< n), then √

λr+1 =√λr+2 = . . .

√λn = 0.

If all the λi’s are distinct satisfying√λ1 >

√λ2 > . . . >

√λn, then the singular value

decomposition of the matrix A is unique. One of the possible disadvantages of this methodis ATA must be formed, and this may be lead to a loss of information due to use of finite-length computer arithmetic.

Ex 3.6.11 Find the SVD of A =

1 22 11 3

and hence find A−1.

Solution: Here A =

1 22 11 3

so that AT =[

1 2 12 1 3

].

Hence AAT =

1 22 11 3

[1 2 12 1 3

]=[

6 77 14

].

Page 209: Linear Algebra by Nayak

202 Theory of Matrices

Hence the eigenvalues of ATA are λ1 = 18.062, λ2 = 1.9377 and the corresponding eigen-vectors are [0.5803, 1]T , [1,−0.5803]T . Also,

√λ1 = 4.2499,

√λ2 = 1.3920. The eigenvectors

Y1, Y2 of AAT are given by

Y1 =1√λ1

AX1 =1

4.2499

1 22 11 3

[0.58031

]=

0.60710.50840.8424

Y2 =

1√λ2

AX2 =1

1.3920

1 22 11 3

[ 1−0.5803

]=

−0.11541.0716−0.5323

.Hence the singular value decomposition of A is given by

A =

1 22 11 3

=

0.6071 −0.11540.5084 1.07160.8424 −0.5323

[4.2499 00 1.3920

] [0.5803 1

1 −0.5803

].

Thus the A−1 is given by

A−1 = V D−1UT

=[

0.5803 11 −0.5803

] [4.2499 0

0 1.3920

] [1.6071 0.5084 0.8424−0.1606 1.4916 −0.7409

].

3.7 Orthogonal Matrix

A square matrix A of order n is said to be orthogonal ifAAT = ATA = In.

For example, let, A = 13

1 2 22 1 −2−2 2 −1

, then,

AAT =13

1 2 22 1 −2−2 2 −1

13

1 2 −22 1 22 −2 −1

=

19

9 0 00 9 00 0 9

=

1 0 00 1 00 0 1

= ATA.

Hence A is orthogonal matrix. Unit matrices are always orthogonal asIT I = IT I = I.

Ex 3.7.1 Determine the values of α, β, γ so that A =

0 2β γα β −γα −β γ

is orthogonal.

Solution: Since the matrix A is orthogonal, by definition, AAT = I = ATA. So, 0 2β γα β −γα −β γ

0 β α2α β −βγ −γ γ

=

2α2 0 00 6β2 00 0 3γ2

=

1 0 00 1 00 0 1

⇒ 2α2 = 1, 6β2 = 1 and 3γ2 = 1 ⇒ α = ± 1√

2, β = ± 1√

6, γ = ± 1√

3.

Ex 3.7.2 Find an orthogonal matrix of order 3, whose first row is a multiple of (2, 1, 2).

Page 210: Linear Algebra by Nayak

Orthogonal Matrix 203

Solution: Normalizing (2, 1, 2) we get, ( 23 ,

13 ,

23 ). Considering ( 2

3 ,13 ,

23 ) as the first row, let

the orthogonal matrix A be,

A =

23

13

23

p q rx y z

so that AT =

23 p x13 q y23 r z

.

Using the definition of orthogonal matrix AT = ATA = I, we have,

p2 + q2 + r2 = 1, 2p+ q + 2r = 0, 2x+ y + 2z = 0,px+ qy + rz = 0, x2 + y2 + z2 = 1.

Since there are five equations in six unknowns, there are infinite number of solutions satis-fying the equations. Taking q = 0, we have r = −p and so p2 = 1

2 , i.e., p = ± 1√2. Taking,

p = 1√2, we get r = − 1√

2and so x − z = 0, i.e., x = z and y = −4x. Therefore, using the

relation x2 + y2 + z2 = 1, we get, x2 = 118 . Taking x = 1

3√

2, we have, y = − 4

3√

2, z = 1

3√

2.

Therefore, the orthogonal matrix is given by,

A =

23

13

23

p q rx y z

=

23

13

23

1√2

0 − 1√2

13√

2− 4

3√

21

3√

2

.

Theorem 3.7.1 Orthogonal matrix A is non-singular and the value of |A| = ±1.

Proof: Let A be an orthogonal matrix of order n. Then by definition, AAT = ATA = In.Therefore,

|AAT | = |In| ⇒ |AT ||A| = 1 ⇒ |A|2 = 1; as |AT | = |A|.

Hence A is non-singular and |A| = ±1.

Ex 3.7.3 Obtain the most general orthogonal matrix of order 2.

Solution: Let us start with an arbitrary matrix of order 2 which can be written as A =(a bc d

), where a, b, c, d are any scalars, real or complex. If this is to be an orthogonal matrix,

its elements must satisfy AAT = ATA = I2, i.e.,(a bc d

)(a cb d

)=(a2 + b2 ac+ bdac+ bd c2 + d2

)=(

1 00 1

)⇒ a2 + b2 = 1 = c2 + d2; ac+ bd = 0.

The general solution is a = cos θ, b = sin θ, where θ is real or complex and c = cosα, d = sinα,where, α is a scalar. Using the relation,ac+ bd = 0, we get,

cos(θ − α) = 0 ⇒ θ − α = ±π2.

Thus the most general orthogonal matrix of order 2 then becomes

A =(a bc d

)=(

cos θ sin θ± sin θ ± cos θ

)for some value of θ. Choosing the upper signs, we get the most general orthogonal matrixof order 2 with |A| = 1, while, the lower signs, we get the most general orthogonal matrixof order 2 with |A| = −1.

Page 211: Linear Algebra by Nayak

204 Theory of Matrices

Theorem 3.7.2 The product of two orthogonal matrices of same order is orthogonal.

Proof: Let A,B be two orthogonal matrices of order n. Then by definition, AAT = ATA =In and BBT = BTB = In. Now,

(AB)T (AB) = (BTAT )(AB) = BT (ATA)B= (BT In)B = BTB = In.

Similarly, (AB)(AB)T = In. Hence AB is orthogonal.

Theorem 3.7.3 If A be an orthogonal matrix, then A−1 = AT .

Proof: Let A be an orthogonal square matrix of order n, then ATA = AAT = In. Thus,

A(ATA) = AIn and (AAT )A = InA

⇒ [AAT − In]A = 0.

Since A is an orthogonal matrix, so |A| 6= 0 and so,

AAT − In = 0 ⇒ AAT = In = ATA( similarly ).

From the definition and uniqueness of inverse, A−1 = AT . Similarly,, it can be shown thatthe transpose of an orthogonal matrix is orthogonal.

Theorem 3.7.4 The inverse of an orthogonal matrix is orthogonal.

Proof: Let A be an orthogonal matrix of order n, then |A| 6= 0 and A−1 exists. Now,

(A−1)T (A−1) = (AT )−1(A−1) = (AAT )−1

= (In)−1 = In.

Hence A−1 is orthogonal. Also, using the definition we can show that, the transpose of anorthogonal matrix is also orthogonal.

Ex 3.7.4 Let A be an orthogonal matrix. Then kA is an orthogonal matrix if k = ±1.

Solution: Since A be an orthogonal matrix of order n, we have by definition, ATA =AAT = In. Now, kA is an orthogonal matrix, if,

(kA)T (kA) = In ⇒ (kAT )(kA) = In

⇒ k2ATA = In ⇒ k2 = 1, i.e., k = ±1.

Thus, if kA is an orthogonal matrix then k = ±1.

Ex 3.7.5 Let A and B are orthogonal and |A|+ |B| = 0. Prove that A+B is singular.

Solution: Since A and B are orthogonal matrices, so |A| 6= 0 and |B| 6= 0. Let AT +BT =CT , which implies that I +ABT = ACT and B +A = ACTB. Therefore,

|A+B| = |A||CT ||B| = −|A|2|CT |; as |A|+ |B| = 0= −|CT | = −|AT +BT | = |A+B|⇒ 2|A+B| = 0 ⇒ |A+B| = 0.

Therefore, A+B is singular.

Page 212: Linear Algebra by Nayak

Submatrix 205

Ex 3.7.6 If the matrices A and B are orthogonal, then show that the matrix[A OO B

]is also

orthogonal.

Solution: Let C =[A OO B

]. Since A and B are orthogonal, AAT = I and BBT = I. Now,

CCT =[A OO B

] [AT OO BT

]=[AAT OO BBT

]=[I OO I

]= I.

Hence C, i.e.,[A OO B

]is orthogonal.

Ex 3.7.7 Let A be a skew symmetric matrix and (I+A) be a nonsingular matrix, then showthat B = (I −A)(I +A)−1 is orthogonal.

Solution: Since the matrix A is a skew symmetric, so AT = −A, and so, (I −A)T = I +Aand (I +A)T = I −A. Now,

BT = [(I −A)(I +A)−1]T = [(I +A)−1]T (I −A)T

= (I +A)T −1(I −A)T = (I −A)−1(I +A).Also, (I +A)(I −A) = I −A+A−A2 = (I −A)(I +A).

We are to show that BTB = I. For this,

BTB = (I −A)−1(I +A)(I −A)(I +A)−1

= (I −A)−1(I −A)(I +A)(I +A)−1 = I.I = I.

Hence B = (I − A)(I + A)−1 is orthogonal. Conversely, let B = (I − A)(I + A)−1 beorthogonal, then by definition, BTB = I. Therefore,

[(I +A)−1]T (I −A)T (I −A)(I +A)−1 = I

or, [(I +A)T ]−1(I −A)T (I −A)(I +A)−1 = I

or, (I +AT )−1(I −AT )(I −A)(I +A)−1 = I

or, (I −AT )(I −A) = (I +AT )(I +A)or, I −A−AT +ATA = I +A+AT +ATA

or, 2(A+AT ) = 0 ⇒ AT = −A.

Therefore, A is skew symmetric.

3.8 Submatrix

Let A = [aij ]m×n be a matrix. Any matrix, obtained by omitting some rows or columns orboth a given matrix A, is called a submatrix of A. Consider an square matrix A = [aij ] oforder n and delete some, but not all, of its rows or columns, we obtain a sub-matrix of A.

Let A = [aij ]4×4, then,(a31 a32 a33

a41 a42 a43

)is a submatrix of A. Thus sub matrix can be formed

from a given matrix A by deleting some of its rows or columns or both. The determinantof the square matrix of order r, obtained from a given m× n matrix A by omitting (m− r)rows and (n − r) columns is called minor of A of order r. The sub matrix formed by theelements of the first r rows and columns of A is called the leading sub matrix of order r andits determinant is known as the leading minor of order r.

Page 213: Linear Algebra by Nayak

206 Theory of Matrices

3.9 Partitioned Matrix

A matrix A can be divided into sub-matrices if we draw horizontal lines between rows and/or vertical lines between columns, the matrices are obtained called partitioned or blockmatrix of A. Consider the above matrix A = [aij ]4×4, then

a11 a12 a13 a14

. . . . . . . . . . . .a21 a22 a23 a24

a31 a32 a33 a34

a41 a42 a43 a44

,

a11

... a12 a13

... a14

a21

... a22 a23

... a24

. . .... . . . . . .

... . . .

a31

... a32 a33

... a34

a41

... a42 a43

... a44

are partitioned matrices of A. A partitioned matrix can be represented economically bydenoting each constituent submatrix by a single matrix symbol. Thus the above partitionedmatrices of A can be written as(

A11

A21

),

(A11 A12 A13

A21 A22 A23

)respectively, where, for the first matrix, A11 = (a11 a12 a13 a14) and in the second

A11 =(a11

a21

)and so on. The augmented matrix [A

...b] of a linear system Ax = b is a

partitioned matrix. Partitioning of matrices is useful to effect addition and multiplicationby handling smaller matrices.

3.9.1 Square Block Matrices

Let M be a block matrix. Then M is called a square block matrix if:

(i) M is a square matrix.

(ii) The blocks from a square matrix.

(iii) The diagonal blocks are also square matrices.

The latter two conditions will occur if and only if there are the same number of horizontaland vertical lines and they are placed symmetrically. Consider the following two blockmatrices:

A =

1 2... 3 4

... 5

1 1... 1 1

... 1· · · · · · · · · · · · · · · · · · · · ·

9 8... 7 6

... 5· · · · · · · · · · · · · · · · · · · · ·

4 4... 4 4

... 4

3 5... 3 5

... 3

;B =

1 2... 3 4

... 5

1 1... 1 1

... 1· · · · · · · · · · · · · · · · · · · · ·

9 8... 7 6

... 5

4 4... 4 4

... 4· · · · · · · · · · · · · · · · · · · · ·

3 5... 3 5

... 3

.

The block matrix A is not a square matrix, since the second and third diagonal blocks arenot square. On the other hand, the block matrix B is a square block matrix.

Page 214: Linear Algebra by Nayak

Partitioned Matrix 207

3.9.2 Block Diagonal Matrices

Let M = [Aij ] be a square block matrix such that the non diagonal blocks are all zeromatrices, i.e., Aij = 0, for i 6= j. Then M is called a block diagonal matrix. We sometimesdenote such a block diagonal matrix by writting

M = diag(A11, A22, · · · , Arr).

The importance of block diagonal matrices is that the algebra of the block matrix is fre-quently reduced to the algebra of the individual blocks. Specially, suppose f(x) is a poly-nomial and M is the above block diagonal matrix. Then f(M) is a block diagonal matrixand

f(M) = diag (f(A11), f(A22), · · · , f(Arr)) .

Also, M is invertible if and only if each Aii is invertible, and, in such a case, M−1 is a blockdiagonal matrix and

M−1 = diag(A−1

11 , A−122 , · · · , A−1

rr

).

Analogously, a square block matrix is called a block upper triangular matrix if the blocksbelow the diagonal are zero matrices, and a block lower triangular matrix if the blocks abovethe diagonal are zero matrices. Consider the following two block matrices:(i) A is upper triangular since the block below the diagonal is zero block.

A =

1 2

... 0

3 4... 5

· · · · · · · · · · · ·

0 0... 6

.

(ii) B is lower triangular since the blocks above the diagonal are zero blocks.

B =

1... 0 0

... 0· · · · · · · · · · · · · · · · · ·

2... 3 4

... 0

5... 0 6

... 0· · · · · · · · · · · · · · · · · ·

0... 7 8

... 9

(iii) C is diagonal since the blocks above and below the diagonal are zero blocks.

C =

1

... 0 0

3... 2 3

· · · · · · · · · · · ·

0... 4 5

, D =

1 2

... 0

3 4... 5

· · · · · · · · · · · ·

0 6... 7

,

(iv) D is neither upper triangular nor lower triangular. Also, no other partitioning of D willmake it into either a block upper triangular matrix or a block lower triangular matrix.

Page 215: Linear Algebra by Nayak

208 Theory of Matrices

3.9.3 Block Addition

Let A = [Aij ] and B = [Bij ] are block matrices with the same numbers of row and col-umn blocks, and suppose that corresponding blocks have the same size. Then adding thecorresponding blocks of A and B also adds the corresponding elements of A and B as

A+B =

A11 +B11 A12 +B12 · · · A1n +B1n

A21 +B21 A22 +B22 · · · A2n +B2n

. . . . . . . . . . . .Am1 +Bm1 Am2 +Bm2 · · · Amn +Bmn

.

where A and B are conformable for addition. Multiplying each block of A by a scalar by ascalar k multiplies each element of A by k. Thus,

kA =

kA11 kA12 · · · kA1n

kA21 kA22 · · · kA2n

. . . . . . . . . . . .kAm1 kAm2 · · · kAmn

.

Suppose M and N are block diagonal matrices where corresponding blocks have the samesize, say M = diag(Ai) and N = diag(Bi), then M +N = diag(Ai +Bi).

3.9.4 Block Multiplication

Let A = [Aik] and B = [Bkj ] are block matrices such that they are conformable for multi-plications. Then the block multiplication of A and B is given by

AB =

C11 C12 · · · C1n

C21 C22 · · · C2n

. . . . . . . . . . . .Cm1 Cm2 · · · Cmn

, where, Cij =p∑

k=1

AikBkj ,

provided all the products of the form AikBkj can be formed.

Ex 3.9.1 Compute AB using block multiplication, where,

A =

1 2

... 1

3 4... 0

· · · · · · · · · · · ·

0 0... 2

and B =

1 2 3

... 1

4 5 6... 1

· · · · · · · · · · · ·

0 0 0... 1

.

Suppose M be block diagonal matrix, the Mk is defined by,

Mk = diag(Ak

11, Ak22, · · · , Ak

rr

).

Solution: Here, A =(

E F01×2 G

)and B =

(R S

01×3 T

), where E,F,G,R.S, T are the given

blocks, and 01×2 and 01×3 are zero matrices of the indicated sites. Hence,

AB =(

E F01×2 G

)(R S

01×3 T

)=(ER ES + FT01×3 GT

)

=

( 9 12 1519 26 33

) (37

)+(

10

)(0 0 0) 2

=

9 12 15

... 4

19 26 33... 7

. . . . . . . . . . . . · · ·

0 0 0... 2

.

Page 216: Linear Algebra by Nayak

Partitioned Matrix 209

Ex 3.9.2 If M = diag(A,B), where, A =(

1 23 4

), B = [5]. Find M2.

Solution: For the given matrices A and B, we have,

A2 =(

1 23 4

)(1 23 4

)=(

7 1015 22

)and B2 = [25].

Since M is block, square each block:

M2 = diag

([7 1015 22

], [25]

)=

7 10

...

15 22...

· · · · · · · · · · · ·... 25

.

3.9.5 Inversion of a Matrix by Partitioning

When a matrix is very large and it is not possible to store the entire matrix into the primarymemory of a computer at a time, then matrix partition method is used to find the inverseof a matrix. When a few more variables and consequently a few more equations are addedto the original system then also this method is very useful.

Let the coefficient matrix A be partitioned as

A =

B... C

· · · · · · · · ·

D... E

(3.30)

where B is an l× l matrix, C is an l×m matrix, D is an m× l and E is an m×m matrix;and l,m are positive integers with l +m = n. Let A−1 be partitioned as

A^{-1} = \left(\begin{array}{c|c} P & Q \\ \hline R & S \end{array}\right) \qquad (3.31)

where the matrices P, Q, R and S are of the same orders as those of the matrices B, C, D and E respectively. Then

AA^{-1} = \left(\begin{array}{c|c} B & C \\ \hline D & E \end{array}\right)\left(\begin{array}{c|c} P & Q \\ \hline R & S \end{array}\right) = \left(\begin{array}{c|c} I_1 & 0 \\ \hline 0 & I_2 \end{array}\right), \qquad (3.32)

where I1 and I2 are identity matrices of order l and m respectively. From (3.32), we have,

BP + CR = I1; BQ + CS = 0

and DP + ER = 0; DQ + ES = I2.

Now, BQ + CS = 0 gives Q = -B^{-1}CS, i.e., DQ = -DB^{-1}CS. Also, from DQ + ES = I_2, we have (E - DB^{-1}C)S = I_2. Therefore, S = (E - DB^{-1}C)^{-1}. Similarly, the other matrices are


S = (E - DB^{-1}C)^{-1}
Q = -B^{-1}CS
R = -(E - DB^{-1}C)^{-1}DB^{-1} = -SDB^{-1}
P = B^{-1}(I_1 - CR) = B^{-1} - B^{-1}CR.

It may be noted that, to find the inverse of A, it is required to determine the inverses of two matrices B and (E - DB^{-1}C), of orders l × l and m × m respectively.

That is, to compute the inverse of the matrix A of order n × n, the inverses of two lower order (roughly half) matrices are to be determined. If the matrices B, C, D, E are still too large to fit in the computer memory, then they are partitioned further.
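The four formulas above translate directly into code. Here is a minimal sketch (Python with NumPy, both assumptions of this illustration; the helper name partitioned_inverse is hypothetical) that assumes, as the derivation does, that B and the matrix E - DB^{-1}C are themselves invertible.

    import numpy as np

    def partitioned_inverse(A, l):
        """Invert A by the partition formulas above; l is the order of block B."""
        B, C = A[:l, :l], A[:l, l:]
        D, E = A[l:, :l], A[l:, l:]
        B_inv = np.linalg.inv(B)
        S = np.linalg.inv(E - D @ B_inv @ C)   # S = (E - D B^{-1} C)^{-1}
        Q = -B_inv @ C @ S
        R = -S @ D @ B_inv
        P = B_inv - B_inv @ C @ R
        return np.block([[P, Q], [R, S]])

    A = np.array([[3., 3., 4.], [2., 1., 1.], [1., 3., 5.]])
    print(partitioned_inverse(A, 2))   # reproduces A^{-1} found in Ex 3.9.3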

Ex 3.9.3 Find the inverse of the matrix A = \begin{pmatrix} 3 & 3 & 4 \\ 2 & 1 & 1 \\ 1 & 3 & 5 \end{pmatrix} using the matrix partition method. Hence find the solution of the system of equations

3x_1 + 3x_2 + 4x_3 = 5; \quad 2x_1 + x_2 + x_3 = 7; \quad x_1 + 3x_2 + 5x_3 = 6.

Solution: Let the matrix A be partitioned as

A = \left(\begin{array}{cc|c} 3 & 3 & 4 \\ 2 & 1 & 1 \\ \hline 1 & 3 & 5 \end{array}\right) = \left(\begin{array}{c|c} B & C \\ \hline D & E \end{array}\right),

where B = \begin{pmatrix} 3 & 3 \\ 2 & 1 \end{pmatrix}, C = \begin{pmatrix} 4 \\ 1 \end{pmatrix}, D = (1\;\; 3), E = (5),

and A^{-1} = \left(\begin{array}{c|c} P & Q \\ \hline R & S \end{array}\right), where P, Q, R and S are given by

S = (E - DB^{-1}C)^{-1}, \quad R = -SDB^{-1}, \quad P = B^{-1} - B^{-1}CR, \quad Q = -B^{-1}CS.

Now, B^{-1} = -\frac{1}{3}\begin{pmatrix} 1 & -3 \\ -2 & 3 \end{pmatrix} = \frac{1}{3}\begin{pmatrix} -1 & 3 \\ 2 & -3 \end{pmatrix}.

E - DB^{-1}C = 5 - (1\;\; 3)\,\frac{1}{3}\begin{pmatrix} -1 & 3 \\ 2 & -3 \end{pmatrix}\begin{pmatrix} 4 \\ 1 \end{pmatrix} = \frac{1}{3}.

S = 3,
R = -3\,(1\;\; 3)\,\frac{1}{3}\begin{pmatrix} -1 & 3 \\ 2 & -3 \end{pmatrix} = (-5\;\; 6),
P = B^{-1} - B^{-1}CR = \frac{1}{3}\begin{pmatrix} -1 & 3 \\ 2 & -3 \end{pmatrix} - \frac{1}{3}\begin{pmatrix} -1 & 3 \\ 2 & -3 \end{pmatrix}\begin{pmatrix} 4 \\ 1 \end{pmatrix}(-5\;\; 6) = \begin{pmatrix} -2 & 3 \\ 9 & -11 \end{pmatrix},
Q = -\frac{1}{3}\begin{pmatrix} -1 & 3 \\ 2 & -3 \end{pmatrix}\begin{pmatrix} 4 \\ 1 \end{pmatrix}\cdot 3 = \begin{pmatrix} 1 \\ -5 \end{pmatrix}.

Therefore, A^{-1} is given by

A^{-1} = \begin{pmatrix} -2 & 3 & 1 \\ 9 & -11 & -5 \\ -5 & 6 & 3 \end{pmatrix}.


Hence, the solution of the given system of equations is given by

x = A^{-1}b = \begin{pmatrix} -2 & 3 & 1 \\ 9 & -11 & -5 \\ -5 & 6 & 3 \end{pmatrix}\begin{pmatrix} 5 \\ 7 \\ 6 \end{pmatrix} = \begin{pmatrix} 17 \\ -62 \\ 35 \end{pmatrix}.

Hence the required solution is x_1 = 17, x_2 = -62, x_3 = 35.

3.10 Rank of a Matrix

Rank of a matrix A of order m × n is defined to be the greatest positive integer r such that

(i) there exists at least one square sub-matrix of A of order r whose determinant is not equal to zero, and

(ii) the determinant of every square sub-matrix of A of order (r + 1) is zero.

In other words, the rank of A is defined to be the greatest positive integer r such that A has at least one non-zero minor of order r; it is denoted by ρ(A) or r(A). Now,

(i) Rank of a zero matrix is defined to be 0.

(ii) Every minor of order greater than (r + 1) can be expressed in terms of minors of order (r + 1). So every minor of order greater than r is zero.

(iii) Rank of a non-singular square matrix of order n is n, and rank of a singular square matrix of order n is less than n. The rank of a unit matrix of order n is n.

(iv) For a non-zero m × n matrix A, we have 0 < rank of A ≤ min{m, n}.

(v) Rank of A = rank of A^T, since A and A^T have identical minors.

So we first take the highest order minor and continue by decreasing the order of the minor by one, until a non-zero minor is found; its order is the rank of the matrix. If the order of the given matrix is greater than 3, this method becomes laborious in general.

Ex 3.10.1 Find the ranks of the following matrices:

(a) A = \begin{pmatrix} 1 & 0 & -1 \\ 1 & 2 & 3 \\ 0 & 1 & 0 \end{pmatrix}, (b) B = \begin{pmatrix} 1 & 2 & 2 \\ -1 & 0 & -2 \\ 2 & 1 & 4 \end{pmatrix}, (c) C = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{pmatrix}.

Solution: (a) det A = -4 ≠ 0. Therefore, the rank of the matrix A is 3.
(b) det B = 0, as the third column is twice the first. So the rank of B is not 3. But \begin{vmatrix} 1 & 2 \\ -1 & 0 \end{vmatrix} = 2 ≠ 0 (a sub-matrix of order 2). Hence the rank of B is 2.
(c) The sub-matrices of order 2 of C are \begin{vmatrix} 1 & 2 \\ 2 & 4 \end{vmatrix}, \begin{vmatrix} 1 & 3 \\ 2 & 6 \end{vmatrix}, \begin{vmatrix} 2 & 3 \\ 4 & 6 \end{vmatrix}. The determinants of all these sub-matrices are zero, so the rank of C is less than 2. But among the sub-matrices of order one, viz., [1], [2], [3], [4], [6], there is at least one non-zero element, and hence the rank of C is 1.


3.10.1 Elementary Operation

Now we present some convenient operations by which the rank can be easily obtained. An elementary operation is such an operation or transformation. When the transformations are applied to rows, they are called elementary row transformations, and when applied to columns, elementary column transformations.

The following operations on a matrix A = [aij ]m×n are called elementary operations:

(i) Interchange of any two rows (or columns), that is, replace the rth row [a_{r1} a_{r2} · · · a_{rn}] by the sth row [a_{s1} a_{s2} · · · a_{sn}] and replace [a_{s1} a_{s2} · · · a_{sn}] by [a_{r1} a_{r2} · · · a_{rn}]. It is denoted by R_{rs} (or C_{rs}).

(ii) Multiplication of the ith row (or ith column) by a non-zero scalar c, denoted by cR_i (or cC_i) or R_i(c) (or C_i(c)). Multiply the ith row of A by c ≠ 0, i.e., replace [a_{i1} a_{i2} · · · a_{in}] by [ca_{i1} ca_{i2} · · · ca_{in}].

(iii) Addition of c times the jth row (or column) to the ith row (or column), denoted by R_i + cR_j (or C_i + cC_j) or R_{ij}(c) (or C_{ij}(c)). Add c times row j of A to row i of A, i ≠ j, i.e., replace [a_{i1} a_{i2} · · · a_{in}] by [a_{i1} + ca_{j1} a_{i2} + ca_{j2} · · · a_{in} + ca_{jn}].

Comparing the elementary operations with the properties of determinants, we observe that after elementary transformations, a singular matrix remains singular and the determinant of a non-singular matrix is altered only by a non-zero scalar multiple. Also, the determinants of the sub-matrices of all orders in any m × n matrix are affected similarly. Therefore, elementary transformations do not affect the rank of a matrix.
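Because elementary operations preserve rank, the rank can be computed by Gaussian elimination. The following is a minimal sketch (Python with NumPy, both assumptions of this illustration; the helper name is hypothetical), using the three elementary row operations just defined.

    import numpy as np

    def rank_by_elimination(A, tol=1e-12):
        """Rank via elementary row operations: count the pivots left after
        elimination; elementary operations do not change the rank."""
        M = np.array(A, dtype=float)
        rows, cols = M.shape
        r = 0                                       # current pivot row
        for c in range(cols):
            pivot = r + np.argmax(np.abs(M[r:, c])) # partial pivoting
            if abs(M[pivot, c]) < tol:
                continue                            # no pivot in this column
            M[[r, pivot]] = M[[pivot, r]]           # interchange R_{r,pivot}
            M[r] = M[r] / M[r, c]                   # scale the pivot row
            for i in range(rows):
                if i != r:
                    M[i] -= M[i, c] * M[r]          # R_i + c R_r eliminations
            r += 1
            if r == rows:
                break
        return r

    print(rank_by_elimination([[3, 5, 7], [2, 1, 3], [1, 4, 4]]))  # 2 (Ex 3.10.2)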

Ex 3.10.2 Find the rank of A = \begin{pmatrix} 3 & 5 & 7 \\ 2 & 1 & 3 \\ 1 & 4 & 4 \end{pmatrix} by the minor method.

Solution: A is a 3 × 3 matrix, so it has minors of orders 1, 2 and 3. The minor of order 3 is

\begin{vmatrix} 3 & 5 & 7 \\ 2 & 1 & 3 \\ 1 & 4 & 4 \end{vmatrix} = \begin{vmatrix} 0 & 0 & 0 \\ 2 & 1 & 3 \\ 1 & 4 & 4 \end{vmatrix} = 0; \quad R_1 - (R_2 + R_3).

Hence the rank of A is less than 3. The second order minors constructed from the first two rows are

\begin{vmatrix} 3 & 7 \\ 2 & 3 \end{vmatrix} = -5; \quad \begin{vmatrix} 5 & 7 \\ 1 & 3 \end{vmatrix} = 8; \quad \begin{vmatrix} 3 & 5 \\ 2 & 1 \end{vmatrix} = -7.

Similarly, we can construct minors by using the first and third, and second and third rows. Since a non-zero minor of order 2 exists, the rank of the given matrix A is 2.

Ex 3.10.3 Determine the rank of A = \begin{pmatrix} k & 1 & 0 \\ 3 & k-2 & 1 \\ 3(k+1) & 0 & k+1 \end{pmatrix}, for different values of k.

Solution: Using the elementary column operation C_{13}(-3), we get

\begin{pmatrix} k & 1 & 0 \\ 3 & k-2 & 1 \\ 3(k+1) & 0 & k+1 \end{pmatrix} \sim \begin{pmatrix} k & 1 & 0 \\ 0 & k-2 & 1 \\ 0 & 0 & k+1 \end{pmatrix} = B \text{ (say)}.

Now, |B| = \begin{vmatrix} k & 1 & 0 \\ 0 & k-2 & 1 \\ 0 & 0 & k+1 \end{vmatrix} = k(k-2)(k+1).


If k ≠ 0, 2, -1, then |B| ≠ 0 and the rank of the given matrix is 3. If k = 0, one minor of the equivalent matrix B is \begin{vmatrix} 1 & 0 \\ -2 & 1 \end{vmatrix} = 1 ≠ 0; therefore, the rank of the given matrix is 2. Similarly, the rank is 2 for k = 2 or k = -1. Hence the rank of the given matrix is either 3 or 2 for different values of k.

Ex 3.10.4 Determine the rank of the matrix A = \begin{pmatrix} 1 & 2 & 1 & 0 \\ 2 & 4 & 8 & 6 \\ 0 & 0 & 5 & 8 \\ 3 & 6 & 6 & 3 \end{pmatrix}.

Solution: Let us apply elementary row operations on A to reduce it to a row-echelon matrix.

A = \begin{pmatrix} 1 & 2 & 1 & 0 \\ 2 & 4 & 8 & 6 \\ 0 & 0 & 5 & 8 \\ 3 & 6 & 6 & 3 \end{pmatrix} \overset{R_2 - 2R_1,\; R_4 - 3R_1}{\sim} \begin{pmatrix} 1 & 2 & 1 & 0 \\ 0 & 0 & 6 & 6 \\ 0 & 0 & 5 & 8 \\ 0 & 0 & 3 & 3 \end{pmatrix} \overset{\frac{1}{6}R_2}{\sim} \begin{pmatrix} 1 & 2 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 5 & 8 \\ 0 & 0 & 3 & 3 \end{pmatrix}

\overset{R_1 - R_2,\; R_3 - 5R_2,\; R_4 - 3R_2}{\sim} \begin{pmatrix} 1 & 2 & 0 & -1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{pmatrix} \overset{\frac{1}{3}R_3}{\sim} \begin{pmatrix} 1 & 2 & 0 & -1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix} \overset{R_1 + R_3,\; R_2 - R_3}{\sim} \begin{pmatrix} 1 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix} = R, \text{ say}.

R is a row-reduced echelon matrix and R has 3 non-zero rows. Therefore rank R = 3. Since A is row equivalent to R, rank A = 3.

3.10.2 Row-reduced Echelon Matrix

An m × n matrix A is said to be a row-reduced echelon matrix if it satisfies the following properties:

(i) All zero rows, if there be any, appear at the bottom of the matrix.

(ii) The first non-zero entry from the left of a non-zero row is 1. This entry is called the leading one of its row.

(iii) For each non-zero row, the leading one appears to the right of and below any leading ones in preceding rows.

(iv) If a column contains a leading one, then all other entries in that column are zero.

For example, \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 0 \end{pmatrix} are examples of row-reduced echelon matrices. Any matrix A of order m × n and rank r(> 0) can be reduced to one of the following forms

\left(\begin{array}{c|c} I_r & 0 \\ \hline 0 & 0 \end{array}\right), \quad (I_r \;|\; 0), \quad \left(\begin{array}{c} I_r \\ \hline 0 \end{array}\right), \quad (I_r)

by a sequence of elementary transformations. These reduced forms are called the normal form of A.


Ex 3.10.5 Find the rank of the matrix A = \begin{pmatrix} 3 & -1 & 2 \\ -6 & 2 & -4 \\ -3 & 1 & -2 \end{pmatrix} by the normalization method.

Solution: By using the elementary transformations, we get

A \overset{R_{21}(2),\; R_{31}(1)}{\sim} \begin{pmatrix} 3 & -1 & 2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \overset{C_{21}(\frac{1}{3}),\; C_{31}(-\frac{2}{3})}{\sim} \begin{pmatrix} 3 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \overset{R_1(\frac{1}{3})}{\sim} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.

Thus the rank of the given matrix A is 1.

Ex 3.10.6 Obtain the fully reduced form of the matrix \begin{pmatrix} 0 & 0 & 1 & 2 & 1 \\ 1 & 3 & 1 & 0 & 3 \\ 2 & 6 & 4 & 2 & 8 \\ 3 & 9 & 4 & 2 & 10 \end{pmatrix}.

Solution: Let us apply elementary operations on the matrix.

\begin{pmatrix} 0 & 0 & 1 & 2 & 1 \\ 1 & 3 & 1 & 0 & 3 \\ 2 & 6 & 4 & 2 & 8 \\ 3 & 9 & 4 & 2 & 10 \end{pmatrix} \overset{R_{12}}{\sim} \begin{pmatrix} 1 & 3 & 1 & 0 & 3 \\ 0 & 0 & 1 & 2 & 1 \\ 2 & 6 & 4 & 2 & 8 \\ 3 & 9 & 4 & 2 & 10 \end{pmatrix} \overset{R_3 - 2R_1,\; R_4 - 3R_1}{\sim} \begin{pmatrix} 1 & 3 & 1 & 0 & 3 \\ 0 & 0 & 1 & 2 & 1 \\ 0 & 0 & 2 & 2 & 2 \\ 0 & 0 & 1 & 2 & 1 \end{pmatrix}

\overset{R_1 - R_2,\; R_3 - 2R_2,\; R_4 - R_2}{\sim} \begin{pmatrix} 1 & 3 & 0 & -2 & 2 \\ 0 & 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & -2 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \overset{-\frac{1}{2}R_3}{\sim} \begin{pmatrix} 1 & 3 & 0 & -2 & 2 \\ 0 & 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \overset{R_1 + 2R_3,\; R_2 - 2R_3}{\sim} \begin{pmatrix} 1 & 3 & 0 & 0 & 2 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}

\overset{C_2 - 3C_1,\; C_5 - 2C_1}{\sim} \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \overset{C_5 - C_3}{\sim} \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \overset{C_{23}}{\sim} \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \overset{C_{34}}{\sim} \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} = R, \text{ say}.

R is the fully reduced normal form.

Deduction 3.10.1 Solution of a system of linear equations by the rank method: Here we shall be concerned with the numerical computation of the solution of a system of n linear algebraic equations in n unknowns x_1, x_2, ..., x_n of the explicit form (3.19), where the n² coefficients a_{ij} and the n constants b_1, b_2, ..., b_n are given real numbers. The system (3.19) can be written in matrix notation as Ax = b, where A is the real n × n coefficient matrix in which a_{ij} is the coefficient of x_j in the ith equation, b with b^T = [b_1, b_2, ..., b_n] is the prescribed column n-vector, and x with x^T = [x_1, x_2, ..., x_n] is the unknown column n-vector.

(i) A nonhomogeneous system of n equations in n unknowns has a unique solution if and only if A is non-singular, i.e., det(A) ≠ 0.

(ii) If r(A), r(A, b) denote the ranks of the coefficient and the augmented matrix respectively, the necessary and sufficient condition for the existence of a unique solution of the consistent system Ax = b is r(A) = r(A, b) = number of unknowns.


(iii) If r(A) ≠ r(A, b), then the equations are inconsistent or overdetermined and they have no solution. If b = 0 and det(A) ≠ 0, then the system has only the trivial solution x = 0.

(iv) If r(A) = r(A, b) < the number of unknowns, then the equations have an infinite number of solutions.

Homogeneous systems of equations lead to eigenvalue problems; such a system possesses only the trivial solution if |A| ≠ 0 and a non-trivial solution if r(A) = k < n.

This is shown in Figure 3.3.

[Figure 3.3: Different cases for existence of solutions. The flowchart classifies a system of linear equations: it has a solution when rank(A_c) = rank(A) and no solution when rank(A_c) ≠ rank(A); a solvable system has a unique solution when rank(A_c) = rank(A) = n and infinitely many solutions when rank(A_c) = rank(A) < n.]
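The rank tests of Figure 3.3 are easy to automate. The following is a minimal sketch (Python with NumPy, both assumptions of this illustration; the helper name classify_system is hypothetical):

    import numpy as np

    def classify_system(A, b):
        """Classify Ax = b by the rank tests summarized in Figure 3.3."""
        A = np.asarray(A, dtype=float)
        Ac = np.column_stack([A, b])        # the augmented matrix (A | b)
        rA = np.linalg.matrix_rank(A)
        rAc = np.linalg.matrix_rank(Ac)
        n = A.shape[1]                      # number of unknowns
        if rA != rAc:
            return "no solution"
        return "unique solution" if rA == n else "infinitely many solutions"

    # Ex 3.10.7 with a = 1, b = 3: consistent, with infinitely many solutions
    A = [[1, 1, 1], [1, 2, -1], [5, 7, 1]]
    print(classify_system(A, [1, 3, 9]))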

Ex 3.10.7 Find for what values of a and b the system of equations

x + y + z = 1, \quad x + 2y - z = b, \quad 5x + 7y + az = b^2

has (i) a unique solution, (ii) no solution and (iii) an infinite number of solutions over the field of rational numbers.

Solution: The given system of linear equations can be written in the form Ax = B, where the augmented matrix is given by

(A|B) = \left(\begin{array}{ccc|c} 1 & 1 & 1 & 1 \\ 1 & 2 & -1 & b \\ 5 & 7 & a & b^2 \end{array}\right).

Let us apply the elementary row operations on (A|B):

(A|B) \overset{R_2 - R_1,\; R_3 - 5R_1}{\sim} \left(\begin{array}{ccc|c} 1 & 1 & 1 & 1 \\ 0 & 1 & -2 & b-1 \\ 0 & 2 & a-5 & b^2-5 \end{array}\right) \overset{R_1 - R_2,\; R_3 - 2R_2}{\sim} \left(\begin{array}{ccc|c} 1 & 0 & 3 & 2-b \\ 0 & 1 & -2 & b-1 \\ 0 & 0 & a-1 & b^2-2b-3 \end{array}\right).

(i) If a ≠ 1, then the ranks of A and (A|B) are both 3 = the order of the matrix; therefore in this case the system has a unique solution.
(ii) If a = 1 and b² - 2b - 3 ≠ 0, then rank (A|B) = 3 and rank A = 2, and therefore the system is inconsistent. Thus if a = 1 and b ≠ -1, 3, the system has no solution.
(iii) If a = 1 and b² - 2b - 3 = 0, then rank (A|B) = rank A = 2 and therefore the system is consistent. Thus if a = 1, b = -1 or a = 1, b = 3, the system has an infinite number of solutions.


Ex 3.10.8 Find for what values of a and b the system of equations

x_1 + 4x_2 + 2x_3 = 1, \quad 2x_1 + 7x_2 + 5x_3 = 2b, \quad 4x_1 + ax_2 + 10x_3 = 2b + 1

has (i) a unique solution, (ii) no solution and (iii) an infinite number of solutions over the field of rational numbers.

Solution: If the matrix form of the system is AX = B, then

A = \begin{pmatrix} 1 & 4 & 2 \\ 2 & 7 & 5 \\ 4 & a & 10 \end{pmatrix}, \quad X = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \quad B = \begin{pmatrix} 1 \\ 2b \\ 2b+1 \end{pmatrix}.

Let us apply the elementary row operations on (A|B):

(A|B) = \left(\begin{array}{ccc|c} 1 & 4 & 2 & 1 \\ 2 & 7 & 5 & 2b \\ 4 & a & 10 & 2b+1 \end{array}\right) \overset{R_{21}(-2),\; R_{31}(-4)}{\sim} \left(\begin{array}{ccc|c} 1 & 4 & 2 & 1 \\ 0 & -1 & 1 & 2b-2 \\ 0 & a-16 & 2 & 2b-3 \end{array}\right) \overset{R_2(-1),\; R_{32}(2)}{\sim} \left(\begin{array}{ccc|c} 1 & 4 & 2 & 1 \\ 0 & 1 & -1 & 2-2b \\ 0 & a-14 & 0 & 1-2b \end{array}\right).

(i) The row-reduced form of A is \begin{pmatrix} 1 & 4 & 2 \\ 0 & 1 & -1 \\ 0 & a-14 & 0 \end{pmatrix}. The solution of the system will be unique if ρ(A) = 3; for this, a - 14 ≠ 0, i.e., a ≠ 14.
(ii) The system has no solution if ρ(A) ≠ ρ(C), C being the augmented matrix. If a = 14 and 1 - 2b ≠ 0, i.e., b ≠ 1/2, then ρ(A) = 2 and ρ(C) = 3. In this case the system has no solution.
(iii) If a = 14 and b = 1/2, then ρ(A) = ρ(C) = 2. The system is consistent and one (3 - 2 = 1) variable is free. The equations are equivalent to

x_1 + 4x_2 + 2x_3 = 1, \quad x_2 - x_3 = 1.

Considering x_3 as arbitrary, x_1 = -3 - 6x_3, x_2 = 1 + x_3. Putting rational values of x_3, an infinite number of solutions of the system over the field of rational numbers is obtained.

3.11 Elementary Matrices

A square matrix obtained from a unit matrix I_n of order n by a single elementary transformation is called an elementary matrix of order n. There are three different forms of elementary matrices:

(i) The matrix E_{ij} is obtained by interchanging the ith and jth rows (or columns) of a unit matrix. Also, |E_{ij}| = -1.

(ii) The matrix E_i(c) is obtained by multiplying the ith row (or column) of a unit matrix by a non-zero scalar c. Also, |E_i(c)| = c ≠ 0.

(iii) E_{ij}(c) is obtained by multiplying every element of the jth row of a unit matrix by c and adding them to the corresponding elements of the ith row. |E_{ij}(c)| = 1.


Therefore, an elementary matrix is non-singular. Every elementary row (column) transformation of a matrix can be brought about by pre- (post-) multiplication with an elementary matrix. Now,

(i) The interchange of the ith and jth rows of E_{ij} will transform E_{ij} to the unit matrix. This transformation is effected on pre-multiplication by E_{ij}. Thus

E_{ij}E_{ij} = I \;\Rightarrow\; (E_{ij})^{-1} = E_{ij}.

(ii) If the ith row of E_i(c) is multiplied by 1/c, it will transform to the unit matrix. This is nothing but pre-multiplication by E_i(1/c), i.e.,

E_i\!\left(\tfrac{1}{c}\right) E_i(c) = I \;\Rightarrow\; [E_i(c)]^{-1} = E_i\!\left(\tfrac{1}{c}\right).

(iii) Similarly, E_{ij}(-c)E_{ij}(c) = I gives [E_{ij}(c)]^{-1} = E_{ij}(-c).

Thus, the inverse of an elementary matrix is an elementary matrix of the same type.
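These constructions and their inverse relations can be spot-checked numerically. Below is a minimal sketch (Python with NumPy, both assumptions of this illustration; the helper names are hypothetical and indices are 0-based, unlike the 1-based notation of the text):

    import numpy as np

    def E_swap(n, i, j):
        """E_ij: interchange rows i and j of the unit matrix I_n."""
        E = np.eye(n); E[[i, j]] = E[[j, i]]; return E

    def E_scale(n, i, c):
        """E_i(c): multiply row i of I_n by a non-zero scalar c."""
        E = np.eye(n); E[i, i] = c; return E

    def E_add(n, i, j, c):
        """E_ij(c): add c times row j of I_n to row i."""
        E = np.eye(n); E[i, j] = c; return E

    # Each inverse is an elementary matrix of the same type, e.g.
    # [E_ij(c)]^{-1} = E_ij(-c) and [E_i(c)]^{-1} = E_i(1/c):
    n = 3
    assert np.allclose(E_add(n, 0, 1, -2) @ E_add(n, 0, 1, 2), np.eye(n))
    assert np.allclose(np.linalg.inv(E_scale(n, 1, 5)), E_scale(n, 1, 1/5))
    assert np.allclose(np.linalg.inv(E_swap(n, 0, 2)), E_swap(n, 0, 2))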

An elementary matrix is obtained from a unit matrix by subjecting it to any one of the elementary transformations. Let

I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = E_0, \quad E_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad E_2 = \begin{pmatrix} k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad E_3 = \begin{pmatrix} 1 & k & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}

be three matrices obtained from the unit matrix I_3 by the elementary row transformations

I_3 \overset{R_{23}}{\to} E_1, \quad I_3 \overset{kR_1}{\to} E_2, \quad I_3 \overset{R_1 + kR_2}{\to} E_3.

The matrices E_1, E_2, E_3 obtained from a unit matrix by elementary row operations are referred to as left-elementary matrices. The elementary row (column) transformations of a matrix A can be obtained by pre-multiplying (post-multiplying) A by the corresponding elementary matrices. Consider the matrix A = \begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{pmatrix}. Then

E_1A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{pmatrix} = \begin{pmatrix} a_1 & b_1 & c_1 \\ a_3 & b_3 & c_3 \\ a_2 & b_2 & c_2 \end{pmatrix}

E_2A = \begin{pmatrix} k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{pmatrix} = \begin{pmatrix} ka_1 & kb_1 & kc_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{pmatrix}

E_3A = \begin{pmatrix} 1 & k & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{pmatrix} = \begin{pmatrix} a_1 + ka_2 & b_1 + kb_2 & c_1 + kc_2 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{pmatrix}.

Here we see that the matrix A is subjected to the same elementary row transformations R_{23}, kR_1 and R_1 + kR_2, respectively, as the unit matrix was to obtain E_1, E_2, E_3. Similarly, we obtain elementary column transformations of a matrix A by post-multiplying it with a matrix known as a right elementary matrix.

Theorem 3.11.1 If A be an n× n matrix, the following are equivalent

(i) A is invertible

(ii) A is row-equivalent to the n× n identity matrix.


(iii) A is a product of elementary matrices.

Proof: Let R be the row-reduced echelon matrix which is row equivalent to A; then

R = E_k \cdots E_2 E_1 A,

where E_1, E_2, \ldots, E_k are elementary matrices. We know that each elementary matrix E is invertible, so

A = E_1^{-1} E_2^{-1} \cdots E_k^{-1} R.

As a product of invertible matrices is invertible, we see that A is invertible if and only if R is invertible. Since the square matrix R is a row-reduced echelon matrix, R is invertible if and only if each row of R contains a non-zero entry, i.e., if and only if R = I, the identity matrix. We have now shown that A is invertible if and only if R = I, and if R = I, then A = E_1^{-1} E_2^{-1} \cdots E_k^{-1}. Thus (i), (ii) and (iii) are equivalent statements about the n × n matrix A.

Note: If A is an invertible n × n matrix and a sequence of elementary row operations reduces A to the identity, then that same sequence of operations, when applied to I, yields A^{-1}.
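This note is exactly the Gauss–Jordan method: reduce the augmented matrix (A | I) to (I | A^{-1}). A minimal sketch (Python with NumPy, both assumptions of this illustration; the helper name is hypothetical, and invertibility of A is assumed):

    import numpy as np

    def inverse_by_row_ops(A):
        """Invert A by reducing (A | I) to (I | A^{-1})."""
        A = np.array(A, dtype=float)
        n = A.shape[0]
        M = np.hstack([A, np.eye(n)])          # the augmented matrix (A | I)
        for i in range(n):
            p = i + np.argmax(np.abs(M[i:, i]))
            M[[i, p]] = M[[p, i]]              # row interchange R_{ip}
            M[i] /= M[i, i]                    # scale to make the pivot 1
            for r in range(n):
                if r != i:
                    M[r] -= M[r, i] * M[i]     # clear column i elsewhere
        return M[:, n:]

    print(inverse_by_row_ops([[1, 1, 2], [2, 4, 4], [3, 3, 7]]))  # Ex 3.11.1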

Ex 3.11.1 Find the inverse of A, where A = \begin{pmatrix} 1 & 1 & 2 \\ 2 & 4 & 4 \\ 3 & 3 & 7 \end{pmatrix}.

Solution: Let us form the 3 × 6 matrix (A|I_3) and perform elementary row operations to reduce A to a row-reduced echelon matrix.

(A|I_3) = \left(\begin{array}{ccc|ccc} 1 & 1 & 2 & 1 & 0 & 0 \\ 2 & 4 & 4 & 0 & 1 & 0 \\ 3 & 3 & 7 & 0 & 0 & 1 \end{array}\right) \overset{R_2 - 2R_1,\; R_3 - 3R_1}{\sim} \left(\begin{array}{ccc|ccc} 1 & 1 & 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & -2 & 1 & 0 \\ 0 & 0 & 1 & -3 & 0 & 1 \end{array}\right)

\overset{\frac{1}{2}R_2}{\sim} \left(\begin{array}{ccc|ccc} 1 & 1 & 2 & 1 & 0 & 0 \\ 0 & 1 & 0 & -1 & \frac{1}{2} & 0 \\ 0 & 0 & 1 & -3 & 0 & 1 \end{array}\right) \overset{R_1 - R_2}{\sim} \left(\begin{array}{ccc|ccc} 1 & 0 & 2 & 2 & -\frac{1}{2} & 0 \\ 0 & 1 & 0 & -1 & \frac{1}{2} & 0 \\ 0 & 0 & 1 & -3 & 0 & 1 \end{array}\right)

\overset{R_1 - 2R_3}{\sim} \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 8 & -\frac{1}{2} & -2 \\ 0 & 1 & 0 & -1 & \frac{1}{2} & 0 \\ 0 & 0 & 1 & -3 & 0 & 1 \end{array}\right) = (I_3|A^{-1}).

Therefore A^{-1} = \begin{pmatrix} 8 & -\frac{1}{2} & -2 \\ -1 & \frac{1}{2} & 0 \\ -3 & 0 & 1 \end{pmatrix}.

3.11.1 Equivalent Matrices

Two matrices A and B are said to be equivalent if it is possible to pass from one to the other by a chain of elementary transformations; this fact is written as A ∼ B. Equivalent matrices have the following properties:

(i) Any non-singular matrix is equivalent to the unit matrix.


(ii) An m × n matrix B is equivalent to A if and only if B = PAQ, where P and Q are suitable non-singular matrices of orders m and n respectively.

(iii) If B = PA, then B is row equivalent to A, and if B = AQ, then B is column equivalent to A.

(iv) Equivalent matrices have the same rank.

(v) If all the operations performed on a matrix A are elementary row operations, this is equivalent to multiplying A on the left by a suitable non-singular matrix P. Thus PA = I when |A| ≠ 0 and |P| ≠ 0, which gives A = P^{-1}. Therefore, P^{-1} is a product of elementary matrices. Consequently, a non-singular matrix is a product of elementary matrices.

Ex 3.11.2 Express A = \begin{pmatrix} 1 & 2 \\ 2 & 5 \end{pmatrix} as a product of elementary matrices.

Solution: Applying elementary row operations on A, we get

\begin{pmatrix} 1 & 2 \\ 2 & 5 \end{pmatrix} \overset{R_2 - 2R_1}{\sim} \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} \overset{R_1 - 2R_2}{\sim} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.

A is reduced to a row-reduced echelon matrix equivalent to I_2, so A is non-singular. Now,

(R_1 - 2R_2)(R_2 - 2R_1)A = I_2
or, E_{12}(-2)E_{21}(-2)A = I_2
or, A = [E_{21}(-2)]^{-1}[E_{12}(-2)]^{-1} = E_{21}(2)E_{12}(2).

A has been expressed as the product of the two elementary matrices E_{21}(2), E_{12}(2). Also,

A^{-1} = [E_{12}(2)]^{-1}[E_{21}(2)]^{-1} = E_{12}(-2)E_{21}(-2)
= \begin{pmatrix} 1 & -2 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix} = \begin{pmatrix} 5 & -2 \\ -2 & 1 \end{pmatrix}.

Ex 3.11.3 Show that the matrix \begin{pmatrix} 2 & 0 & 1 \\ 3 & 3 & 0 \\ 6 & 2 & 3 \end{pmatrix} is non-singular and express it as a product of elementary matrices.

Solution: Let the given matrix be denoted by A. We apply elementary row operations on A to reduce it to a row-reduced echelon matrix.

A \overset{\frac{1}{2}R_1}{\sim} \begin{pmatrix} 1 & 0 & \frac{1}{2} \\ 3 & 3 & 0 \\ 6 & 2 & 3 \end{pmatrix} \overset{R_2 - 3R_1,\; R_3 - 6R_1}{\sim} \begin{pmatrix} 1 & 0 & \frac{1}{2} \\ 0 & 3 & -\frac{3}{2} \\ 0 & 2 & 0 \end{pmatrix} \overset{\frac{1}{3}R_2}{\sim} \begin{pmatrix} 1 & 0 & \frac{1}{2} \\ 0 & 1 & -\frac{1}{2} \\ 0 & 2 & 0 \end{pmatrix} \overset{R_3 - 2R_2}{\sim} \begin{pmatrix} 1 & 0 & \frac{1}{2} \\ 0 & 1 & -\frac{1}{2} \\ 0 & 0 & 1 \end{pmatrix} \overset{R_1 - \frac{1}{2}R_3,\; R_2 + \frac{1}{2}R_3}{\sim} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.

Since A is row equivalent to I_3, A is non-singular. We observe that

(R_2 + \tfrac{1}{2}R_3)(R_1 - \tfrac{1}{2}R_3)(R_3 - 2R_2)(\tfrac{1}{3}R_2)(R_3 - 6R_1)(R_2 - 3R_1)(\tfrac{1}{2}R_1)A = I_3
or, E_{23}(\tfrac{1}{2})E_{13}(-\tfrac{1}{2})E_{32}(-2)E_2(\tfrac{1}{3})E_{31}(-6)E_{21}(-3)E_1(\tfrac{1}{2})A = I_3
or, A = [E_1(\tfrac{1}{2})]^{-1}[E_{21}(-3)]^{-1}[E_{31}(-6)]^{-1}[E_2(\tfrac{1}{3})]^{-1}[E_{32}(-2)]^{-1}[E_{13}(-\tfrac{1}{2})]^{-1}[E_{23}(\tfrac{1}{2})]^{-1}
or, A = E_1(2)E_{21}(3)E_{31}(6)E_2(3)E_{32}(2)E_{13}(\tfrac{1}{2})E_{23}(-\tfrac{1}{2}).


3.11.2 Congruent Matrices

Let A and B be n × n matrices over a field F. A matrix A is said to be congruent to a matrix B, written A ≃ B, if there exists a non-singular matrix P over F such that B = P^T AP. For example, let A = \begin{pmatrix} 2 & -1 \\ 2 & 3 \end{pmatrix} and B = \begin{pmatrix} 6 & 6 \\ -3 & 9 \end{pmatrix}; then there exists a non-singular matrix P = \begin{pmatrix} 1 & 2 \\ 1 & -1 \end{pmatrix} such that

P^T AP = \begin{pmatrix} 1 & 1 \\ 2 & -1 \end{pmatrix}\begin{pmatrix} 2 & -1 \\ 2 & 3 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 1 & -1 \end{pmatrix} = \begin{pmatrix} 6 & 6 \\ -3 & 9 \end{pmatrix} = B.

Thus, the matrix A is congruent to the matrix B, and the matrix P is called the transforming matrix for the congruence from A to B.

(i) Operations under elementary congruent transformations is known as congruence op-eration.

(ii) If B = PTAP under congruence operation then A and B have the same rank.

An elementary congruent transformation of a matrix is defined as a pair of elementarytransformations, one of which is with rows and the other is the corresponding transformationwith the columns.

3.11.3 Similar Matrices

Let A and B be two square matrices of the same order n over the field F. If there is an invertible n × n matrix P over F such that B = P^{-1}AP, then B is said to be similar to A over F, or B is obtained from A by a similarity transformation. For example, let A = \begin{pmatrix} 5 & -1 \\ 2 & 1 \end{pmatrix} and B = \begin{pmatrix} 5 & 1 \\ -2 & 1 \end{pmatrix}; then there exists a non-singular matrix P = \begin{pmatrix} 1 & 2 \\ 4 & 7 \end{pmatrix} such that

P^{-1}AP = \begin{pmatrix} -7 & 2 \\ 4 & -1 \end{pmatrix}\begin{pmatrix} 5 & -1 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 4 & 7 \end{pmatrix} = \begin{pmatrix} 5 & 1 \\ -2 & 1 \end{pmatrix} = B.

Thus, the matrix A is similar to the matrix B.
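Both worked examples above are easy to verify numerically. A minimal sketch (Python with NumPy, both assumptions of this illustration):

    import numpy as np

    # Verify the congruence example: B = P^T A P
    A = np.array([[2, -1], [2, 3]]); P = np.array([[1, 2], [1, -1]])
    print(P.T @ A @ P)                  # [[ 6  6] [-3  9]]

    # Verify the similarity example: B = P^{-1} A P
    A = np.array([[5, -1], [2, 1]]); P = np.array([[1, 2], [4, 7]])
    print(np.linalg.inv(P) @ A @ P)     # [[ 5.  1.] [-2.  1.]]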

Theorem 3.11.2 Similarity of matrices is an equivalence relation over the set of n × nmatrices over F .

Proof: Let A, B and C be n × n matrices over F. Let ρ be the relation of similarity between matrices; that is, AρB if there exists an invertible matrix P of order n × n such that B = P^{-1}AP. Now,
(i) Since IA = AI, I being the n × n identity matrix, it follows that A = I^{-1}AI for every n × n matrix A. Hence AρA for all A in the set of n × n matrices, so the relation ρ is reflexive.
(ii) Now suppose that AρB holds. Then there exists an invertible matrix P such that

B = P^{-1}AP \;\Rightarrow\; PB = AP \;\Rightarrow\; A = PBP^{-1} = (P^{-1})^{-1}BP^{-1}.

This shows that BρA holds. Hence ρ is symmetric.
(iii) Finally, suppose that AρB and BρC hold; then there exist invertible matrices P and Q such that

A = P^{-1}BP \text{ and } B = Q^{-1}CQ \;\Rightarrow\; A = P^{-1}(Q^{-1}CQ)P = (QP)^{-1}C(QP).


Since QP is invertible, it follows that AρC holds. Hence ρ is transitive. Thus we see that the relation ρ is reflexive, symmetric and transitive, and hence it is an equivalence relation.

Exercise 3

Section-A [Multiple Choice Questions]

1. If A = \begin{pmatrix} 2 & 3 \\ 4 & 5 \end{pmatrix}, B = \begin{pmatrix} -1 & 0 \\ 2 & 3 \end{pmatrix}, then 2A - B is
(a) \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} (b) \begin{pmatrix} 2 & 3 \\ 1 & -1 \end{pmatrix} (c) \begin{pmatrix} 3 & 3 \\ 2 & 2 \end{pmatrix} (d) \begin{pmatrix} 5 & 6 \\ 6 & 7 \end{pmatrix}

2. If A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, then A^2 - 5A is equal to
(a) I (b) 2I (c) 0 (d) A - I

3. If A = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}, then A^3 is
(a) \begin{pmatrix} \cos 3\theta & \sin 3\theta \\ -\sin 3\theta & \cos 3\theta \end{pmatrix} (b) \begin{pmatrix} \cos^3\theta & \sin^3\theta \\ -\sin^3\theta & \cos^3\theta \end{pmatrix} (c) \begin{pmatrix} \cos 2\theta & \sin 2\theta \\ -\sin 2\theta & \cos 2\theta \end{pmatrix} (d) \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}

4. Matrix A has p rows and p + 5 columns. Matrix B has q rows and 11 - q columns. Both AB and BA exist. The values of p and q are
(a) p = 2, q = 3 (b) p = 3, q = 8 (c) p = q = 3 (d) p = 0, q = 0

5. If A + B = \begin{pmatrix} 1 & -2 \\ 2 & 0 \end{pmatrix} and A - B = \begin{pmatrix} 3 & 0 \\ 2 & 6 \end{pmatrix}, then A is
(a) \begin{pmatrix} 2 & -1 \\ 2 & 3 \end{pmatrix} (b) \begin{pmatrix} -1 & -1 \\ 0 & -3 \end{pmatrix} (c) \begin{pmatrix} 2 & 0 \\ 3 & 1 \end{pmatrix} (d) \begin{pmatrix} 4 & -2 \\ 4 & 6 \end{pmatrix}

6. If A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, then A^2 - 4A + 3I is equal to
(a) A (b) I (c) -I (d) 0

7. The value of (2\;\; 3\;\; 4)\begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 5 \\ 3 & 1 & -1 \end{pmatrix} is
(a) (2\;\; 0\;\; 12) (b) (4\;\; 3\;\; 4) (c) (6\;\; 15\;\; -4) (d) (14\;\; 11\;\; 17)

8. If 2A + \begin{pmatrix} 2 & 3 & 5 \\ 0 & -1 & 2 \end{pmatrix} = \begin{pmatrix} 7 & 8 & 5 \\ 4 & -1 & 4 \end{pmatrix}, then A is
(a) \begin{pmatrix} 5 & 5 & 0 \\ 4 & 0 & 2 \end{pmatrix} (b) \begin{pmatrix} 5/2 & 5/2 & 0 \\ 2 & 0 & 1 \end{pmatrix} (c) \begin{pmatrix} 9/2 & 11/2 & 5 \\ 2 & -1 & 3 \end{pmatrix} (d) \begin{pmatrix} 2 & 3 & 4 \\ 0 & 1 & -1 \end{pmatrix}

9. If A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, B = \begin{pmatrix} 2 & 3 \\ 4 & 5 \end{pmatrix}, C = \begin{pmatrix} -1 & -1 \\ -2 & -3 \end{pmatrix}, then the value of AB + AC is
(a) \begin{pmatrix} 1 & 2 \\ 2 & 2 \end{pmatrix} (b) \begin{pmatrix} 3 & 4 \\ 6 & 8 \end{pmatrix} (c) \begin{pmatrix} 3/2 & 2 \\ 3 & 4 \end{pmatrix} (d) \begin{pmatrix} 1/2 & 1 \\ 1 & 1 \end{pmatrix}


10. If \begin{pmatrix} x & 1 \\ 2 & y \end{pmatrix}\begin{pmatrix} 2 \\ 4 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, then the values of x and y are
(a) x = -3/2, y = -1 (b) x = -3, y = 7 (c) x = 1/2, y = 2/3 (d) x = y = 7/5

11. The number of 2 × 2 matrices over Z_3 (the field with three elements) with determinant 1 is [IIT-JAM'10]
(a) 24 (b) 60 (c) 20 (d) 30

12. The value of the determinant \begin{vmatrix} a & b & c \\ 1 & 2 & 0 \\ 2 & 4 & 0 \end{vmatrix} is
(a) a + b + c (b) 2 (c) 4 (d) 0

13. The value of the determinant \begin{vmatrix} 0 & a & b \\ -a & 0 & c \\ -b & -c & 0 \end{vmatrix} is
(a) 0 (b) abc (c) -abc (d) 2abc

14. The value of \begin{vmatrix} 2000 & 2001 & 2002 \\ 2003 & 2004 & 2005 \\ 2006 & 2007 & 2008 \end{vmatrix} is
(a) 2000 (b) 0 (c) 45 (d) none of these [WBUT 2007]

15. The value of \begin{vmatrix} x & x+1 & x+2 \\ x+3 & x+4 & x+5 \\ x+6 & x+7 & x+8 \end{vmatrix} is
(a) x (b) x^3 (c) 0 (d) x + 1

16. The value of \begin{vmatrix} 5 & 10 & 20 \\ 0 & 5 & 10 \\ 0 & 0 & 2 \end{vmatrix} is
(a) 10 (b) 50 (c) 100 (d) 200

17. The value of a skew-symmetric determinant of odd order is
(a) a perfect square (b) 0 (c) 1 (d) none of these

18. The value of x satisfying the equation \begin{vmatrix} x-2 & 2 & 5 \\ x-7 & 3 & 6 \\ 2x-6 & 4 & 7 \end{vmatrix} = 0 is
(a) -6 (b) 0 (c) 3 (d) 5

19. The roots of the equation \begin{vmatrix} 1 & 1 & 1 \\ x & \alpha & \beta \\ x^2 & \alpha^2 & \beta^2 \end{vmatrix} = 0 are
(a) 1, 1 (b) α, β (c) α + β, α - β (d) α + 1, β + 1

20. One factor of the determinant \begin{vmatrix} x+2 & 3 & 3 \\ 3 & x+4 & 5 \\ 3 & 5 & x+4 \end{vmatrix} = 0 is
(a) x - 1 (b) x - 2 (c) x - 3 (d) x + 1


21. The value of the determinant \begin{vmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 4 & 5 \end{vmatrix} is
(a) 2 (b) 6 (c) 0 (d) 1

22. The value of the determinant \begin{vmatrix} 1 & 2 & 1 \\ 2 & 3 & -2 \\ 1 & 4 & -1 \end{vmatrix} is
(a) 5 (b) 0 (c) 10 (d) 15

23. If α, β are the roots of the equation x^2 - 2x + 5 = 0, then the value of \begin{vmatrix} 1 & \beta & \beta \\ 0 & \alpha & -\beta \\ \alpha & 0 & 0 \end{vmatrix} is
(a) 10 (b) 0 (c) α - β (d) 5

24. If ∆_1 and ∆_2 are two determinants whose values are respectively 5 and 10, then the value of their product is
(a) 5 (b) 10 (c) 50 (d) 2

25. Let ∆_1 = \begin{vmatrix} 5 & 10 & 15 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{vmatrix} and ∆_2 = \begin{vmatrix} 5 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -1 \end{vmatrix}. Then ∆_1∆_2 is
(a) 0 (b) 100 (c) -100 (d) 10

26. Let A be a square matrix such that AA^T = I; then det A is equal to
(a) 0 (b) ±2 (c) ±1 (d) none of these

27. The value of \begin{vmatrix} 0 & a & b & c & d \\ -a & 0 & a & -b & d \\ -b & -a & 0 & a & c \\ -c & b & -a & 0 & d \\ -d & -d & -c & -d & 0 \end{vmatrix} is
(a) 0 (b) abcd (c) -abcd (d) 1

28. The cofactor of the element 2 in the determinant \begin{vmatrix} 1 & 2 & 0 \\ 0 & 4 & 3 \\ 1 & -2 & 4 \end{vmatrix} is
(a) 3 (b) -3 (c) 4 (d) 22

29. The minor of the element 1 of the determinant \begin{vmatrix} -1 & 2 & 6 \\ 2 & 1 & 4 \\ 3 & 2 & -1 \end{vmatrix} is
(a) -15 (b) -17 (c) 17 (d) 15

30. If ∆′ is the adjoint determinant of a determinant ∆ of order 4, then the value of ∆′ is
(a) ∆ (b) ∆^2 (c) ∆^3 (d) ∆^4

31. If the value of a determinant is 5 and its first row is multiplied by 3, then the value of the new determinant is
(a) 3 (b) 5 (c) 15 (d) 5/3


32. The determinant of the matrix \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 2 \\ 0 & 1 & 0 & 0 & 2 & 0 \\ 0 & 0 & 1 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 & 1 & 0 \\ 2 & 0 & 0 & 0 & 0 & 1 \end{pmatrix} is NET(Dec)'11
(a) 0 (b) -9 (c) -27 (d) 1

33. The matrix \begin{pmatrix} a & 2 \\ 3 & 1 \end{pmatrix} is singular when a is equal to
(a) 1 (b) 2 (c) 3 (d) 6

34. The matrix \begin{pmatrix} 0 & 1 & -2 \\ 1 & \lambda & 3 \\ 2 & 1 & 2 \end{pmatrix} is singular when λ is equal to
(a) -1/2 (b) -2 (c) 6 (d) -1/2

35. The adjoint of the matrix \begin{pmatrix} 2 & 1 \\ 3 & -2 \end{pmatrix} is
(a) \begin{pmatrix} 2 & 1 \\ 3 & -2 \end{pmatrix} (b) \begin{pmatrix} -2 & -1 \\ -3 & 2 \end{pmatrix} (c) \begin{pmatrix} -2 & 3 \\ 1 & 2 \end{pmatrix} (d) \begin{pmatrix} 1 & 2 \\ -2 & 3 \end{pmatrix}

36. The inverse of the matrix \begin{pmatrix} 3 & 1 \\ 2 & 1 \end{pmatrix} is
(a) \begin{pmatrix} 1 & -1 \\ -2 & 3 \end{pmatrix} (b) \begin{pmatrix} 1 & 1 \\ 2 & 3 \end{pmatrix} (c) \frac{1}{2}\begin{pmatrix} 1 & -1 \\ -2 & 3 \end{pmatrix} (d) \begin{pmatrix} 3 & 1 \\ 2 & 1 \end{pmatrix}

37. The inverse of the matrix \begin{pmatrix} 2 & 3 & 1 \\ 0 & a & -1 \\ 0 & 0 & -3 \end{pmatrix} exists if
(a) a = 0 (b) a = 2 (c) a = -3 (d) a ≠ 0

38. The matrix \begin{pmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{pmatrix} is orthogonal for
(a) all values of α (b) α = π/2 (c) α = 0 (d) α = π

39. The matrix \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 2k \end{pmatrix} is orthogonal when k is equal to
(a) ±1 (b) ±1/2 (c) ±1/3 (d) ±1/4

40. If A is an orthogonal matrix, then det A is equal to
(a) 0 (b) ±2 (c) ±4 (d) ±1

41. If A is an orthogonal matrix, then A^{-1} is equal to
(a) A (b) A^2 (c) A^T (d) none of these

42. If A is an orthogonal matrix, then which matrix is not orthogonal?
(a) 2A (b) A^T (c) A^{-1} (d) A^2

43. Three matrices A, B, C are such that AB = AC. Then B = C when
(a) A is singular (b) A is null (c) A is non-singular (d) for all A


44. If A is a singular matrix, then AB is
(a) singular (b) non-singular (c) orthogonal (d) symmetric

45. The rank of the matrix \begin{pmatrix} 1 & 2 \\ 3 & 0 \end{pmatrix} is
(a) 0 (b) 2 (c) 1 (d) 3

46. The rank of the matrix \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 4 & 0 \end{pmatrix} is
(a) 4 (b) 3 (c) 2 (d) 1

47. The rank of the matrix \begin{pmatrix} 2 & 3 & 4 \\ 4 & 6 & 8 \end{pmatrix} is
(a) 1 (b) 2 (c) 3 (d) none of these

48. The rank of the matrix \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix} is
(a) 0 (b) 1 (c) 2 (d) 3

49. The rank of the matrix \begin{pmatrix} 1 & x & x & x \\ x & 1 & x & x \\ x & x & 1 & x \\ x & x & x & 1 \end{pmatrix} is one when
(a) x = 0 (b) x = 1 (c) x = 2 (d) x = -1/3

50. If the rank of the matrix \begin{pmatrix} 2 & 4 & 2 \\ 2 & 1 & 2 \\ 1 & 0 & x \end{pmatrix} is 2, then the value of x is
(a) 0 (b) 1 (c) 2 (d) 3

51. If A and B are two matrices such that AB is determinable, then rank(AB) is equal to
(a) rank A (b) rank B (c) min{rank A, rank B} (d) max{rank A, rank B}

52. If A = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 2 & 3 \end{pmatrix} and B = \begin{pmatrix} 3 & 2 & 0 \\ 1 & 2 & 5 \\ 3 & 3 & 4 \end{pmatrix}, then rank(AB) is
(a) 2 (b) 3 (c) 1 (d) none of these

53. If the rank of the matrix A is 5, then the rank of the matrix 7A is
(a) 1 (b) 5 (c) 7 (d) 12

54. If P is a non-zero column matrix and Q is a row matrix, then the rank of PQ is
(a) 0 (b) 3 (c) 1 (d) none of these

55. The following system of equations x + 2y = 3, 2x + ay = b has a unique solution if
(a) a = 5 (b) a = 4 (c) a = 4, b = 1 (d) a ≠ 4

56. The system of equations x + 4y + z = 0, 4x + y - z = 0 has
(a) unique solution (b) many solutions (c) no solution


57. The system of equations x + y = 3, x + ay = b has no solution if
(a) a = 1, b = 3 (b) a = 1, b ≠ 3 (c) a ≠ 1, b = 3 (d) a ≠ 1, b ≠ 3

58. For what value of a does the system of equations x + y + z = 1, x + 2y - z = 3, 5x + 7y + az = 9 have many solutions?
(a) 4 (b) 1 (c) 3 (d) 0

59. Let A and A_c represent respectively the coefficient and augmented matrices of a system of n equations containing n variables. The system of equations AX = b has many solutions if
(a) rank(A) ≠ rank(A_c) (b) rank(A) = rank(A_c) = n
(c) rank(A) = rank(A_c) < n (d) none of these

60. Let A, B be n × n real matrices. Which of the following statements is correct? NET'2012(June)
(a) rank(A + B) = rank(A) + rank(B)
(b) rank(A + B) ≤ rank(A) + rank(B)
(c) rank(A + B) = min{rank(A), rank(B)}
(d) rank(A + B) = max{rank(A), rank(B)}

61. Let AX = b be a system of n equations in n variables and A_c be the augmented matrix [A : b]. The system has a unique solution if
(a) rank(A) = rank(A_c) = n (b) rank(A) = rank(A_c) < n
(c) rank(A) = rank(A_c) < n - 1 (d) rank(A) ≠ rank(A_c)

62. The system of equations AX = b, where A is the coefficient matrix of order n × n, is consistent if
(a) rank(A) = rank(A_c) < n (b) rank(A) = rank(A_c) = n
(c) rank(A) ≠ rank(A_c) (d) rank(A) = rank(A_c) ≤ n

63. Let A be a 5 × 4 matrix with real entries such that the space of all solutions of the linear system AX = (1, 2, 3, 4, 5)^t is given by
{(1 + 2s, 2 + 3s, 3 + 4s, 4 + 5s)^t : s ∈ ℝ};
then the rank of A is equal to NET(Dec)'11
(a) 4 (b) 3 (c) 2 (d) 1

64. The value(s) of k for which k(1, 0, 1) is/are the solution(s) of the system of equations x + 2y - z = 0, 2x + y - 2z = 0 is
(a) k = 1 (b) k ≠ 1 (c) k is any value (d) none of these

65. A system of equations is called consistent if it
(a) has no solution (b) has a unique solution (c) has many solutions (d) none of these

66. The system of equations x_1 + 2x_2 = 6, 3x_1 + 6x_2 = 5 is
(a) consistent (b) inconsistent

67. The solution of the following system of equations x_1 + 2x_2 + 3x_3 = 6, -x_2 + 3x_3 = 4, x_3 = 2 is
(a) (1, 1, 1) (b) (2, 1, 1) (c) (0, 2, 2) (d) (-4, 2, 2)

68. The number of solutions of the equations x + 2y = 5, 2x + 4y = 3 is
(a) 1 (b) infinite (c) 2 (d) 0


69. The system of equations 2x - y + 3z = 9, x + y + z = 6, x - y + z = 2 has
(a) a unique non-zero solution (b) infinitely many solutions (c) no solution (d) zero solution

70. Let S = {A : A = [a_{ij}]_{5×5}, a_{ij} = 0 or 1, ∀ i, j; Σ_j a_{ij} = 1, ∀ i and Σ_i a_{ij} = 1, ∀ j}; then the number of elements in S is NET(June)'11
(a) 5^2 (b) 5^5 (c) 5! (d) 55

71. Let D be a non-zero n × n real matrix with n ≥ 2. Which of the following implications is valid? NET(June)'11
(a) det(D) = 0 implies rank(D) = 0 (b) det(D) = 1 implies rank(D) ≠ 1
(c) det(D) = 1 implies rank(D) ≠ 0 (d) det(D) = n implies rank(D) ≠ 1

Section-B [Objective Questions]

1. Give an example of a 3 × 3 skew-Hermitian matrix.

2. Let A = X + iY be a skew-Hermitian matrix. Show that the diagonal elements of X are all purely imaginary or 0 and Y is a real symmetric matrix.

3. Let A be a 3 × 3 real matrix with det(A) = 6. Then show that det(adj A) = 36.

4. Let A be an m × m real matrix. Show that the row rank of A is the same as the column rank of A.

5. Find the column rank of the matrix A = \begin{pmatrix} 1 & 4 & 6 \\ 2 & 5 & 9 \\ 3 & 6 & 12 \end{pmatrix}.

6. Let A be a nonsingular real matrix of order n. Show that det(A^{-1}) = (det A)^{-1}.

7. Consider the group of all non-singular 3 × 3 real matrices under matrix multiplication. Show that the two matrices A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 3 & 0 \\ 1 & 2 & 1 \end{pmatrix} and B = \begin{pmatrix} 3 & 0 & 4 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} are conjugate.

Section-C [Long Answer Questions]

1. Obtain A + B and A - B in each of the following cases:
(a) A = \begin{pmatrix} 3 & -9 \\ 2 & -7 \\ 5 & -6 \end{pmatrix}; B = \begin{pmatrix} 1 & 6 \\ -3 & 0 \\ 8 & -11 \end{pmatrix}.
(b) A = \begin{pmatrix} a^2 & b^2 \\ 2a & -ac \end{pmatrix}; B = \begin{pmatrix} b^2 & bc \\ ac & c^2 \end{pmatrix}.

2. Find AB and BA and determine the commutator and anticommutator.
(a) A = \begin{pmatrix} 3 & 9 \\ -1 & 7 \end{pmatrix}; B = \begin{pmatrix} -2 & 7 \\ 1 & 5 \end{pmatrix}.
(b) A = \begin{pmatrix} 5 & 6 \\ -3 & 2 \end{pmatrix}; B = \begin{pmatrix} -4 & 3 \\ 2 & 1 \end{pmatrix}.


3. Using the following matrices
A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}; B = \frac{1}{2}\begin{pmatrix} -1 & \sqrt{3} \\ \sqrt{3} & 1 \end{pmatrix}; C = \frac{1}{2}\begin{pmatrix} -1 & -\sqrt{3} \\ -\sqrt{3} & 1 \end{pmatrix};
D = \frac{1}{2}\begin{pmatrix} -1 & \sqrt{3} \\ -\sqrt{3} & -1 \end{pmatrix}; F = \frac{1}{2}\begin{pmatrix} -1 & -\sqrt{3} \\ \sqrt{3} & -1 \end{pmatrix};
show that (i) A^2 = B^2 = C^2 = I, (ii) AB = D, (iii) AC = BA = F.

4. If A = \begin{pmatrix} 2 & -5 \\ 3 & 1 \end{pmatrix}, find scalars a, b such that I + aA + bA^2 = 0.

5. How many multiplications of scalars are needed to compute the product AB, where A is an m × n matrix and B is an n × p matrix?

6. If A = \begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix}, B = (3\;\; 0\;\; 1\;\; -5) and C = \begin{pmatrix} 2 \\ 5 \\ 8 \\ 1 \end{pmatrix}, compute ABC. Which of the two possible ways of doing this is easier?

7. Show that, for all k ≥ 2,
(a) for the matrix A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & \omega & \omega^2 \\ 1 & \omega^2 & \omega \end{pmatrix}, \omega = e^{2\pi i/3},
A^k = (-1)^{k/2}\, 3^{k/2}\, I for k even, and A^k = (-1)^{(k-1)/2}\, 3^{(k-1)/2}\, A for k odd;
(b) for the matrix A = \begin{pmatrix} a & b \\ 0 & 1 \end{pmatrix}, A^k = \begin{pmatrix} a^k & b(a^{k-1} + a^{k-2} + \cdots + a + 1) \\ 0 & 1 \end{pmatrix};
(c) for the matrix A = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}, A^k = \begin{pmatrix} 1 & k & \frac{k(k+1)}{2} \\ 0 & 1 & k \\ 0 & 0 & 1 \end{pmatrix}. JECA'08

8. Show that every even power of the matrix A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ a & b & -1 \end{pmatrix}, where a and b are arbitrary scalars, equals the unit matrix and every odd power equals A itself.

9. Find the upper triangular matrix A such that A^3 = \begin{pmatrix} 8 & -57 \\ 0 & 27 \end{pmatrix}. JECA'00

10. Let x = (2, 1, -1, 0)^T, y = (-1, 1, 1, 3), A = \begin{pmatrix} 2 & 1 & 3 \\ -1 & 0 & 1 \end{pmatrix}, u = (-1, 4, 2)^T and v = (1, 4, 3)^T. Show that the matrix product A u x^T y v^T is defined. Evaluate the product.

11. If A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, prove that
A^2 - (a + d)A + (ad - bc)I_2 = 0.
If ad - bc ≠ 0, find A^{-1}. JECA'04

12. Find all 2 × 2 real matrices which commute with


(a) the matrix A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}. Ans: \begin{pmatrix} a & b \\ 0 & a \end{pmatrix}, a, b ∈ ℝ.
(b) the real matrix A = \begin{pmatrix} 2 & 3 \\ 1 & 4 \end{pmatrix}. JECA'98

13. Find the matrices A and B if 2A + B^T = \begin{pmatrix} 2 & 5 \\ 10 & 2 \end{pmatrix}, A^T + 2B = \begin{pmatrix} 1 & 8 \\ 4 & 1 \end{pmatrix}.

14. If AB = B and BA = A, show that A and B are both idempotent.

15. If for two matrices A and B, AB = BA = I_n, then prove that A is nonsingular and A^{-1} = B. JECA'06

16. Find the matrix A if (i) A^2 = \begin{pmatrix} 17 & 8 \\ 8 & 17 \end{pmatrix} and (ii) A^2 = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}.

17. Find all the real matrices A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} such that A^2 = I_2. BH'02

18. If A = \begin{pmatrix} 3 & 1 \\ 2 & 0 \end{pmatrix}, find B = A^3 - 3A^2 + 2A, C = -2A^2 + 3A + I, BC and CB.

19. If A = \begin{pmatrix} 5 & 4 & 0 \\ 1 & 3 & 8 \\ 2 & 6 & 12 \end{pmatrix}, find column vectors u, v such that u^T Av = 8. Are u and v unique?

20. Prove that, if A and B are two matrices such that AB = A and BA = B, then A^T, B^T are idempotent.

21. Show that there are no 2 × 2 matrices A and B such that AB - BA = I_2 holds.

22. Consider the 2 × 2 matrix A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}. If a + d = 1 = ad - bc, then find A^3. Gate'98

23. Prove that \begin{vmatrix} 1 & 3 & 3 \\ 2 & 0 & 9 \\ 3 & 6 & 1 \end{vmatrix} is divisible by 19. JECA'04

24. Let A = \begin{pmatrix} 5 & 9 \\ 1 & 2 \end{pmatrix}. Find |A^{2004} - 3A^{2003}|.

25. If φ(x) = \begin{vmatrix} 1 & 1 & 1 & 1 \\ 1 & x & 1 & 1 \\ 1 & 1 & x & 1 \\ 1 & 1 & 1 & x \end{vmatrix}, prove that φ′(x) = 3(x - 1)^2.

26. If α, β, γ, δ are the roots of the equation x^4 - x^3 + 2x^2 + x + 1 = 0, find the value of \begin{vmatrix} \alpha^2+1 & 0 & 0 & 0 \\ 0 & \beta^2+1 & 0 & 0 \\ 0 & 0 & \gamma^2+1 & 0 \\ 0 & 0 & 0 & \delta^2+1 \end{vmatrix}.

27. If \begin{vmatrix} a & b & a\alpha+b \\ b & c & b\alpha+c \\ a\alpha+b & b\alpha+c & 0 \end{vmatrix} = 0, then prove that either a, b, c are in GP or α is a root of the equation ax^2 + 2bx + c = 0. JECA'02


28. If α, β, γ are the roots of x^2(px + q) = r(x + 1), prove that
\begin{vmatrix} 1+\alpha & 1 & 1 \\ 1 & 1+\beta & 1 \\ 1 & 1 & 1+\gamma \end{vmatrix} = 0.

29. If α, β, γ are the roots of ax^2 + bx + c = 0, then find the value of
\begin{vmatrix} 1 & \cos(\beta-\alpha) & \cos\alpha \\ \cos(\alpha-\beta) & 1 & \cos\beta \\ \cos\alpha & \cos\beta & 1 \end{vmatrix}.

30. Express ∆ = \begin{vmatrix} b^2+c^2 & ab & ac \\ ba & c^2+a^2 & bc \\ ca & cb & a^2+b^2 \end{vmatrix} as a square of a determinant of order 3. Hence determine the value of ∆. CH'98, JECA'05

31. Show that \begin{vmatrix} bc-a^2 & ca-b^2 & ab-c^2 \\ -bc+ca+ab & bc-ca+ab & bc+ca-ab \\ (a+b)(a+c) & (b+c)(b+a) & (c+a)(c+b) \end{vmatrix} = (b-c)(c-a)(a-b)(a+b+c)(ab+bc+ca).

32. Prove that
(a) \begin{vmatrix} (b+c)^2 & a^2 & a^2 \\ b^2 & (c+a)^2 & b^2 \\ c^2 & c^2 & (a+b)^2 \end{vmatrix} = 2abc(a+b+c)^3. BH'98
(b) \begin{vmatrix} 1 & a & a^2 & a^3+bcd \\ 1 & b & b^2 & b^3+cda \\ 1 & c & c^2 & c^3+dab \\ 1 & d & d^2 & d^3+abc \end{vmatrix} = 0. BH'99
(c) \begin{vmatrix} 1+a & 1 & 1 & 1 \\ 1 & 1+b & 1 & 1 \\ 1 & 1 & 1+c & 1 \\ 1 & 1 & 1 & 1+d \end{vmatrix} = abcd\left(1 + \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}\right). CH'00, BH'00, '04, VH'02
(d) \begin{vmatrix} \alpha^3 & \alpha^2 & 1 \\ \beta^3 & \beta^2 & 1 \\ \gamma^3 & \gamma^2 & 1 \end{vmatrix} = -(\alpha-\beta)(\beta-\gamma)(\gamma-\alpha)(\alpha\beta+\beta\gamma+\gamma\alpha). BH'01, '03, VH'05

33. Let m be a positive integer and ∆_r = \begin{vmatrix} 2r-1 & {}^mC_r & 1 \\ m^2-1 & 2^m & m+1 \\ \sin^2(m^2) & \sin^2(m) & \sin(m^2) \end{vmatrix}; then find \sum_{r=1}^{m} ∆_r.

34. Using Laplace's theorem, show that
(a) \begin{vmatrix} 0 & a & b & c \\ -a & 0 & d & e \\ -b & -d & 0 & f \\ -c & -e & -f & 0 \end{vmatrix} = (af - be + cd)^2. BH'02, '05, VH'03
(b) \begin{vmatrix} a & -b & -a & b \\ b & a & -b & -a \\ c & -d & c & -d \\ d & c & d & c \end{vmatrix} = 4(a^2 + b^2)(c^2 + d^2). [WBUT 2005]


(c) |A| = \begin{vmatrix} a & b & c & d \\ -b & a & d & -c \\ -c & -d & a & b \\ -d & c & -b & a \end{vmatrix} = (a^2 + b^2 + c^2 + d^2)^2.
Hence show that the matrix A, in which a, b, c, d are real numbers, is non-singular if and only if at least one of a, b, c, d is non-zero. CH'98, '02

35. Solve the system of equations by Cramer's rule:
(a) x + 2y - 3z = 1, 2x - y + z = 4 and x + 3y = 5.

36. Express the matrix A as the sum of a symmetric and a skew-symmetric matrix.
(i) A = \begin{pmatrix} 2 & 3 & 1 \\ 7 & 5 & 6 \\ 4 & 6 & 7 \end{pmatrix}; (ii) A = \begin{pmatrix} 4 & 5 & 1 \\ 3 & 7 & 2 \\ 1 & 6 & 8 \end{pmatrix} JECA'98; (iii) A = \begin{pmatrix} 1 & 3 & -4 \\ 7 & 0 & 6 \\ 2 & 8 & 1 \end{pmatrix}.

37. Show that every matrix can be expressed uniquely as the sum of a real and a purely imaginary matrix.

38. (a) Show that the sum of two hermitian matrices is a hermitian matrix.
(b) Show that the product of two hermitian matrices is hermitian if and only if the two matrices commute with each other.
(c) Prove that in a Hermitian matrix the diagonal elements are all real.
(d) Let S and A be the matrices obtained by taking the real and imaginary parts, respectively, of each element of a hermitian matrix H, i.e., H = S + iA, where S and A are real. Show that S is a symmetric matrix while A is an antisymmetric matrix.
(e) If H_1 is hermitian and H_2 is antihermitian, show that both H_1 + iH_2 and H_1 - iH_2 are hermitian.
(f) Show that any hermitian matrix of order two can be expressed uniquely as a linear combination of the four matrices
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}; \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}; \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} and \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}.
The last three matrices are known as the Pauli spin matrices for a spin ½ particle in quantum mechanics.

39. (a) If A is a square matrix, then show that A + A^T is symmetric and A - A^T is skew-symmetric.
(b) If A and B are Hermitian, show that A + B, AB + BA are Hermitian and AB - BA is skew-Hermitian.
(c) Let A be an n × n matrix which is both Hermitian and unitary. Then show that A^2 = I. Gate'01
(d) If a matrix A is triangular as well as hermitian, show that A is diagonal.
(e) Let P be a hermitian matrix with the property P^2 = P. Show that for any vector X, the vectors PX and (I - P)X are orthogonal to each other.

40. (a) Show that the most general unitary matrix is \begin{pmatrix} \cos\theta\, e^{i\alpha} & \sin\theta\, e^{i\gamma} \\ -\sin\theta\, e^{i(\beta-\gamma)} & \cos\theta\, e^{i(\beta-\alpha)} \end{pmatrix}.


(b) Show that a unitary triangular matrix must be diagonal.
(c) Prove that the determinant of a Hermitian matrix is real.
(d) Show that a unitary matrix commutes with its own Hermitian conjugate.
(e) If H is a Hermitian matrix, show that U = (H - iI)(H + iI)^{-1} is a unitary matrix.
(f) If A is skew-Hermitian, B is symmetric, AB = BA and B + A is non-singular, show that (B - A)(B + A)^T is unitary.

41. (a) Show that the most general orthogonal matrix of order 2 is \begin{pmatrix} \cos\theta & \sin\theta \\ \mp\sin\theta & \pm\cos\theta \end{pmatrix}.
(b) Show that the most general orthogonal matrix of order 3 is
\begin{pmatrix} \cos\alpha\cos\beta\cos\gamma - \sin\alpha\sin\gamma & -\cos\alpha\cos\beta\sin\gamma - \sin\alpha\cos\gamma & \cos\alpha\sin\beta \\ \sin\alpha\cos\beta\cos\gamma + \cos\alpha\sin\gamma & -\sin\alpha\cos\beta\sin\gamma + \cos\alpha\cos\gamma & \sin\alpha\sin\beta \\ -\sin\beta\cos\gamma & \sin\beta\sin\gamma & \cos\beta \end{pmatrix}.
(c) Find k such that A = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & k \end{pmatrix} is an orthogonal matrix.
(d) Find the condition on the real scalars a and b for which the following matrix is orthogonal: \begin{pmatrix} a+b & b-a \\ a-b & a+b \end{pmatrix}.
(e) If A, B are two orthogonal matrices and det A · det B < 0, prove that A + B is singular. JECA'08
(f) If A is a skew-symmetric matrix, then show that P = (I - A)(I + A)^{-1} is orthogonal. JECA'07
(g) If A and B are commuting orthogonal matrices such that I + A and I + B are non-singular, show that (I - AB)(I + A + B + AB)^{-1} is skew-symmetric.

42. (a) Find the condition for which the matrix \begin{pmatrix} q & p & p \\ p & q & p \\ p & p & q \end{pmatrix}, where p and q are numbers, is nonsingular. Show that the inverse, when it exists, is a matrix of the same form.
(b) If A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, prove by using elementary row operations that A is invertible if and only if ad - bc ≠ 0. If this condition holds, find A^{-1}.
(c) Find the inverse of the matrix A = \begin{pmatrix} \cosh x & \sinh x \\ \sinh x & \cosh x \end{pmatrix}, and hence show that A^n = \begin{pmatrix} \cosh nx & \sinh nx \\ \sinh nx & \cosh nx \end{pmatrix}, n ∈ ℤ.

43. Find the inverse of the matrix
(a) \begin{pmatrix} 1 & -1 & 0 \\ 2 & 3 & 5 \\ -1 & 4 & 0 \end{pmatrix} Ans: \frac{1}{15}\begin{pmatrix} 20 & 0 & 5 \\ 5 & 0 & 5 \\ -11 & 3 & -5 \end{pmatrix} BH'01, '04
(b) \begin{pmatrix} -3 & -3 & 2 \\ -4 & -3 & 2 \\ 2 & 2 & -1 \end{pmatrix} Ans: \begin{pmatrix} 1 & -1 & 0 \\ 0 & 1 & 2 \\ 2 & 0 & 3 \end{pmatrix} VH'02

44. Find the inverse of the matrix by row and column operations.


(a) \begin{pmatrix} 1 & 1 & 3 \\ 0 & 1 & 2 \\ 7 & 1 & 0 \end{pmatrix}; C_3 - (C_1 + 2C_2); -\frac{1}{9}C_3; R_1 - R_2; R_3 - R_2; R_3 - 3R_1. BH'98
(b) \begin{pmatrix} 3 & 1 & -1 \\ 0 & -1 & 2 \\ 1 & 0 & 1 \end{pmatrix}; R_3 - (3R_3 - R_2); C_3 + (2C_2 - C_1); -\frac{1}{2}R_1; -1\cdot R_2. BH'98, '00

45. If A = \begin{pmatrix} 1 & 2 & 1 \\ 1 & -4 & 1 \\ 3 & 0 & -3 \end{pmatrix}, find the matrix B such that AB = 6I_3, and hence solve the system of equations 2x + y + z = 5, x - y = 0, 2x + y - z = 1. CH'01

46. Find the inverse of the matrix \begin{pmatrix} 3 & 1 & 0 \\ 0 & 2 & 3 \\ 1 & 0 & 2 \end{pmatrix} and hence solve the system of equations 3x + y = 4, 2x + 3z = 2, x + 2z = 6. BH'02.

47. Show that the matrix \begin{pmatrix} \frac{1}{3} & \frac{2}{3} & \frac{2}{3} \\ -\frac{2}{3} & -\frac{1}{3} & \frac{2}{3} \\ \frac{2}{3} & -\frac{2}{3} & \frac{1}{3} \end{pmatrix} is orthogonal and hence solve the system of equations x + 2y + 2z = 2, -2x - y + 2z = 1, 2x - 2y + z = 7. BH'03.

48. Solve the following system of equations
(a) x + 2y + 3z = 14, 2x - y + 5z = 15, -3x + 2y + 4z = 13. CH'02
(b) 2x - y + 3z = 1, x + 2y + 2z = 0, -3x + 4y - 4z = -2. BH'98
(c) 2x + 4y + 3z + w = 1, x + 2y + z - 3w = 1, 3x + 6y + 4z - 2w = -4. BH'99
(d) 2x - y + 3z = 0, -x + 2y - 4z = -1, 4x + 3y - 2z = -3. BH'00
(e) y - z = 0, 3x - 2y + 4z = -1, 9x + y + 8z = 0. BH'01
by the matrix inverse method.

49. Solve, if possible, the following system of equations
(a) 2x + y - 3z = 8, x - y - 2z = -2, x + 2y - z = 10. BH'03
(b) 3x + y = 4, 2x + 3z = 2, x + 2z = 6. BH'04
(c) x + y + z = 6, x + 2y + 3z = 14, x - y + z = 2. BH'04.
(d) x - y + 2z = 6, x + y + z = 8, 3x + 5y - 7z = 14. BH'05.
by the matrix method.

50. (a) Determine the values of a, b so that the system of equations x + 4y + 2z = 1, 2x + 7y + 5z = 2b, 4x + ay + 10z = 2b + 1 has (i) a unique solution, (ii) no solution, (iii) many solutions in the field of rational numbers. CH'95
(b) Determine the values of k so that the system of equations x + y - z = 1, 2x + 3y + kz = 3, x + ky + 3z = 2 has (i) a unique solution, (ii) no solution, (iii) many solutions in the field of real numbers. CH'05
(c) Determine the values of a, b so that the system of equations x + 2y + z = 1, 3x + y + 2z = b, ax - y + 4z = b^2 has (i) a unique solution, (ii) no solution, (iii) many solutions in the field of real numbers. CH'97, VH'03


(d) Determine the values of a, b so that the system of equations x + y + z = 1, x + 2y - z = b, 5x + 7y + az = b^2 has (i) a unique solution, (ii) no solution, (iii) many solutions in the field of real numbers. CH'99

51. If α, β, γ are in AP and are roots of x^3 + qx + r = 0, then find the rank of \begin{pmatrix} \alpha & \beta & \gamma \\ \beta & \gamma & \alpha \\ \gamma & \alpha & \beta \end{pmatrix}. JECA'99

52. Determine the rank of the following matrices:
(i) \begin{pmatrix} 1 & 2 & -1 & 0 & 3 \\ 2 & 4 & 4 & 1 & -1 \\ 0 & 0 & 5 & -2 & 4 \\ 3 & 6 & 8 & -1 & 6 \end{pmatrix} (ii) \begin{pmatrix} 1 & 2 & -1 & 0 \\ 2 & 4 & 4 & 1 \\ 0 & 0 & 5 & -2 \\ -1 & -2 & 0 & -3 \end{pmatrix} (iii) \begin{pmatrix} 1 & 2 & -1 & 0 \\ 2 & 4 & 4 & 1 \\ 0 & 0 & 5 & -2 \\ -1 & -2 & 0 & -3 \end{pmatrix}
(iv) \begin{pmatrix} 2 & 4 & -2 & 0 \\ 1 & 2 & 2 & -3 \\ 0 & 0 & 5 & -2 \\ 3 & 6 & 8 & 1 \end{pmatrix} (v) \begin{pmatrix} 1 & -1 & 2 & 0 \\ 2 & 2 & 1 & 5 \\ 1 & 3 & -1 & 0 \\ 1 & 7 & -4 & 1 \end{pmatrix} (vi) \begin{pmatrix} 1 & 3 & 4 & 3 \\ 3 & 9 & 12 & 3 \\ 1 & 3 & 4 & 1 \end{pmatrix}.

53. Obtain a row-reduced echelon matrix which is row equivalent to
(i) \begin{pmatrix} 1 & 2 & -1 & 0 \\ 2 & 4 & 4 & -6 \\ 0 & 0 & 5 & -2 \\ 3 & 6 & 8 & -1 \end{pmatrix} (ii) \begin{pmatrix} 2 & 1 & 3 & -2 \\ 1 & -1 & 5 & 2 \\ 1 & 1 & 1 & 1 \end{pmatrix} (iii) \begin{pmatrix} 0 & 0 & 1 & 2 & 1 \\ 1 & 3 & 1 & 0 & 3 \\ 2 & 6 & 4 & 2 & 8 \\ 3 & 9 & 4 & 2 & 10 \end{pmatrix}
(iv) \begin{pmatrix} 2 & 3 & 1 & 4 \\ 0 & 1 & 2 & -1 \\ 0 & -2 & -4 & 2 \end{pmatrix} (v) \begin{pmatrix} 2 & -1 & 3 \\ 3 & 2 & 1 \\ 1 & -4 & 5 \end{pmatrix}
and hence find its rank.

54. If a, b, c are real and unequal, find the rank of the matrix \begin{pmatrix} a & b & c \\ b & c & a \\ c & a & b \end{pmatrix}, when (i) a + b + c = 0 and (ii) a + b + c ≠ 0.

55. Determine the rank of \begin{pmatrix} a & -1 & 1 \\ -1 & a & -1 \\ -1 & -1 & a \\ 1 & 1 & 1 \end{pmatrix}, when a ≠ -1 and a = -1. VH'03

56. Express the matrix A = \begin{pmatrix} 2 & 0 & 1 \\ 3 & 3 & 0 \\ 6 & 2 & 3 \end{pmatrix} as a product of elementary matrices and hence find A^{-1}. BH'02, CH'05

57. Let θ be a real number. Prove that the matrices \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} and \begin{pmatrix} e^{i\theta} & 0 \\ 0 & e^{-i\theta} \end{pmatrix} are similar over the field of complex numbers. BU(M.Sc.)'02

58. Obtain the normal form under congruence and the rank of the symmetric matrix A = \begin{pmatrix} 0 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{pmatrix}.


Chapter 4

Vector Space

In many applications in mathematics, the sciences, and engineering, the notion of a vector space arises. Here we define the notion and structure of a vector space. In geometry, a vector has 1, 2 or 3 components and it has a direction. In three dimensions, a vector can be represented uniquely by three components. Here the three-dimensional vector is extended to an n-dimensional vector, studied from an algebraic point of view. An n-dimensional vector has n components.

4.1 Vector Space

Let (F, +, ·) be a field and let (V, ⊕) be a system, where V is a non-empty set, and let ⊙ be an external composition of F with V. V is said to be a vector space or linear space over the field F if the following axioms are satisfied:

(i) 〈V, ⊕〉 is an abelian group.

(ii) a ∈ F, α ∈ V ⇒ a ⊙ α ∈ V.

(iii) (a + b) ⊙ α = a ⊙ α ⊕ b ⊙ α; a, b ∈ F and α ∈ V.

(iv) a ⊙ (α ⊕ β) = a ⊙ α ⊕ a ⊙ β; ∀a ∈ F, ∀α, β ∈ V.

(v) (a.b) ⊙ α = a ⊙ (b ⊙ α); a, b ∈ F and α ∈ V.

(vi) 1 ⊙ α = α, where 1 is the identity element in F.

The vector space V over the field F is very often denoted by V(F). The elements of V are called vectors and are denoted by α, β, γ, . . . etc., and the elements of F are called scalars and are denoted by a, b, c, . . . etc. The operation ⊕ is called vector addition, and the operation ⊙ is called scalar multiplication.

1. The field of scalars F is called ground field of the vector space V (F ).

2. The external composition of F with V is called ‘multiplication by scalars’.

3. If F is the field R, of real numbers, then V (R) is called real vector space. Similarly,if F is the field of rational numbers (Q) or the field of complex numbers (C) then V iscalled respectively rational or complex vector space.

4. A vector space V = {θ}, consisting of the zero vector alone, is called a trivial vector space.


Elementary properties

Here we shall discuss some elementary properties of a vector space. In a vector space V(F), we have

(i) cθ = θ; for any c ∈ F and θ ∈ V .

(ii) 0α = θ;∀α ∈ V, 0 ∈ F

(iii) (−a)α = −(aα) = a(−α);∀a ∈ F and α ∈ V.

(iv) a(α− β) = aα− aβ;∀a ∈ F and α, β ∈ V.

(v) aα = θ ⇒either a = 0 or α = θ; a ∈ F and α ∈ V.

(vi) For a, b ∈ F and any non null vector α in V,aα = bα⇒ a = b.

(vii) For α, β ∈ V and any non zero scalar a in F,aα = aβ ⇒ α = β

Proof: The property holds directly from the definition of a vector space.

(i) Since θ is the null element in V, we have, θ + θ = θ in V. Thus,

θ + cθ = cθ; as θ is the additive identity in (V,+)= c(θ + θ) = cθ + cθ

⇒ cθ = θ; by cancellation law in group (V,+).

(ii) 0 is the zero element in F so 0 + 0 = 0 in F. Now

θ + 0α = 0α = (0 + 0)α = 0α+ 0α

Using cancellation law in group (V,+), we have, 0α = θ.

(iii) Since (−a) is the additive inverse of a in F, we have

θ = 0α = [a + (−a)]α = aα + (−a)α.

Thus (−a)α is the additive inverse of aα, i.e., (−a)α = −(aα); similarly, a(−α) = −(aα). Thus, (−a)α = −(aα) = a(−α).

(iv) Using the definition of subtraction, α − β = α + (−β). Thus using the property (iii)we get, a(α− β) = a[α+ (−β)] = aα+ a(−β)

= aα+ [−(aβ)] = aα− aβ.

Hence the property. Also, α+ α = 1α+ 1α = (1 + 1)α = 2α.

(v) Let aα = θ and let a ≠ 0; then a^{-1} exists in F. Now,

aα = θ and a ≠ 0 ⇒ a^{-1}(aα) = a^{-1}θ
⇒ (a^{-1}a)α = θ, as (ab)α = a(bα) and a^{-1}θ = θ
⇒ 1α = θ ⇒ α = θ, as 1α = α by definition.

Thus, aα = θ and a ≠ 0 ⇒ α = θ. Again, let aα = θ and α ≠ θ. Let, if possible, a ≠ 0. Then a^{-1} exists, and so aα = θ ⇒ α = θ, which is a contradiction. So whenever aα = θ and α ≠ θ, then a = 0. Hence,

aα = θ ⇒ either a = 0 or α = θ.


(vi) Let a, b be any two scalars and α a non-null vector in V such that aα = bα holds. Then

aα = bα ⇒ aα − bα = θ ⇒ (a − b)α = θ, and α ≠ θ
⇒ a − b = 0 ⇒ a = b.

(vii) Let α, β be any two vectors in V and a a non-zero scalar in F such that aα = aβ holds. Then

aα = aβ and a ≠ 0 ⇒ aα − aβ = θ and a ≠ 0
⇒ a(α − β) = θ and a ≠ 0 ⇒ α − β = θ ⇒ α = β.

Ex 4.1.1 (Vector space of matrices) Let V be the set of all m × n matrices over the field F. Show that V is a vector space over F under the usual addition of matrices and multiplication of a matrix by a scalar as the two compositions.

Solution: Let A = [aij ]m×n;B = [bij ]m×n;C = [cij ]m×n be any three matrices in V , whereaij , bij , cij ∈ F. The + composition on V , defined by

(aij) + (bij) = (aij + bij)

and the external composition (known as multiplication of matrices by real numbers) bedefined by c(aij) = (caij).(i) A+B = [aij ] + [bij ] = [aij + bij ]m×n.Since aij + bij ∈ F (as F is a field), so, A + B ∈ V ;∀A,B ∈ V . So the closure axiom issatisfied.(ii) We know, matrix addition is always associative, so,

A+ (B + C) = [aij + bij + cij ] = (A+B) + C; ∀A,B,C ∈ V.

(iii) Let θ = [0]m×n; as 0 is the additive identity in F so, 0 ∈ F and so θ ∈ V and

A+ θ = [aij + 0] = [0 + aij ] = θ +A;∀A ∈ V.

Hence θ is the additive identity in V .(iv) As (−aij) is the additive inverse of aij so, (aij) ∈ F and so,

A+ (−A) = (−A) +A = θ; ∀A ∈ V.

Hence (−A) ∈ V is the additive inverse in V .(v) We know, matrix addition is always commutative, so,

A+B = [aij + bij ] = [bij + aij ]; ‘ + ‘ is abelian in F= B +A; ∀A,B ∈ V.

Hence addition (+) composition is commutative in V .(vi) If A = [aij ]m×n and c ∈ F is an arbitrary element, then cA is also a m× n matrix and

cA = c[aij ]m×n = [caij ]m×n.

As caij ∈ F , so, cA ∈ V . Therefore closure axiom with respect to multiplication is satisfied.(vii) Now,

c[A+B] = c[aij + bij ] = [caij + cbij ]= [caij ] + [cbij ] = cA+ cB; ∀A,B ∈ V.

Page 245: Linear Algebra by Nayak

238 Vector Space

(viii)(c+ d)A = [(c+ d)aij ] = [caij + daij ];F is field

= [caij ] + [daij ] = cA+ dA;∀A ∈ V.

(ix)(cd)A = [(cd)aij ] = [cdaij ]

= [c(daij)] = c[daij ] = c(dA).

(x)1A = 1[aij ] = [1aij ] = [aij ] = A; as 1 ∈ F.

Since all the axioms for vector space hold, so, V (F ) is a vector space. This space is calledVector space of Matrices and is denoted by Mmn(F ).

Ex 4.1.2 (Vector space of polynomials) Let P[x] be the set of all polynomials over the real field ℝ. Show that P[x] is a vector space with ordinary addition of polynomials and the multiplication of each coefficient of the polynomial by a member of ℝ as the scalar multiplication composition.

Solution: Let P[x] be the set of all real polynomials. A real polynomial in x of degree k is a function that is expressible as f = c_0 + c_1x + c_2x^2 + . . . + c_kx^k, where c_0, c_1, c_2, . . . , c_k ∈ ℝ, with c_k ≠ 0. The addition composition (+) on P[x] is defined as

f + g = (c_0 + c_1x + c_2x^2 + . . . + c_kx^k) + (d_0 + d_1x + d_2x^2 + . . . + d_lx^l)
= (c_0 + d_0) + (c_1 + d_1)x + . . . + (c_k + d_k)x^k + d_{k+1}x^{k+1} + . . . + d_lx^l, if k < l;
= (c_0 + d_0) + (c_1 + d_1)x + . . . + (c_l + d_l)x^l + c_{l+1}x^{l+1} + . . . + c_kx^k, if k > l;
= (c_0 + d_0) + (c_1 + d_1)x + . . . + (c_k + d_k)x^k, if k = l

(i.e., add coefficients of like-power terms), and an external composition of ℝ with P[x], called 'multiplication of polynomials by real numbers', is defined by

rf(x) = (rc_0) + (rc_1)x + (rc_2)x^2 + . . . + (rc_k)x^k; r(≠ 0) ∈ ℝ.

We are to show that P[x](ℝ) is a vector space with respect to the above defined compositions. It is easy to verify that (P[x], +) is an abelian group. Now, if f, g ∈ P[x], then ∀λ, β ∈ ℝ, we have

(i) λ[f + g] = λ[c_0 + c_1x + . . . + c_kx^k + d_0 + d_1x + . . . + d_lx^l]
= (λc_0 + λd_0) + (λc_1 + λd_1)x + . . .
= (λc_0 + λc_1x + λc_2x^2 + . . .) + (λd_0 + λd_1x + λd_2x^2 + . . .)
= λ(c_0 + c_1x + c_2x^2 + . . .) + λ(d_0 + d_1x + d_2x^2 + . . .)
= λf + λg.

(ii) (λ + β)f = (λ + β)[c_0 + c_1x + . . . + c_kx^k]
= (λ + β)c_0 + (λ + β)c_1x + . . . + (λ + β)c_kx^k = λf + βf,

and the remaining axioms are verified similarly.

Ex 4.1.3 (Continuous function space) Prove that the set C[a, b] of all real-valued continuous functions defined on the interval [a, b] forms a real vector space with respect to addition, defined by

(f + g)(x) = f(x) + g(x); f, g ∈ C[a, b]

and multiplication by a real number λ by

(λf)(x) = λf(x); f ∈ C[a, b].


Solution: Let f, g, h be any three elements of C[a, b]. The addition composition andmultiplication by a scalar is defined by,

(f + g)(x) = f(x) + g(x); f, g ∈ C[a, b](λf)(x) = λf(x); f ∈ C[a, b].

(i) We know, sum of two continuous function is also a continuous function, so,

f + g ∈ C[a, b]; ∀f, g ∈ C[a, b].

Hence closure property holds.(ii) Now,

[f + (g + h)](x) = f(x) + (g + h)(x); by definition= f(x) + g(x) + h(x)= (f + g)(x) + h(x) = [(f + g) + h](x).

Thus, f + (g + h) = (f + g) + h; ∀f, g, h ∈ C[a, b].

Therefore, the addition composition is associative.(iii) Let θ(x) = 0, x ∈ [a, b], then θ(x) is also a continuous function on [a, b], i.e., θ(x) ∈C[a, b] and,

(f + θ)(x) = f(x) + θ(x)= f(x) + 0 = f(x)= θ(x) + f(x) = (θ + f)(x),

⇒ f + θ = f = θ + f ; ∀f ∈ C[a, b].

Hence θ is the additive identity in C[a, b]. The zero vector θ in C[a, b] maps every x ∈ [a, b]into zero element 0 ∈ F .(iv) We know, if f(x) is a continuous function in [a, b], then −f(x) is also a continuous in[a, b] and

[f + (−f)](x) = f(x) + (−f(x)) = θ(x)= (−f(x)) + f(x) = (−f + f)(x)

⇒ f + (−f) = θ = (−f) + f ; ∀f ∈ C[a, b].

Therefore, (−f) is the additive inverse in [a, b].(v) If f(x)+g(x) is continuous function of x, then g(x)+f(x) is also continuous for x ∈ [a, b].So,

f + g = g + f ; ∀f, g ∈ C[a, b].

(vi) The multiplication of a continuous function, with a real number λ is given by,

(λf)(x) = λf(x); λ ∈ <, f ∈ C[a, b]⇒ λf ∈ C[a, b].

(vii) Now,

λ(f + g)(x) = λf(x) + λg(x) = λg(x) + λf(x)= (λf + λg)(x)

⇒ λ(f + g) = λf + λg; ∀f, g ∈ C[a, b].

(viii)

Page 247: Linear Algebra by Nayak

240 Vector Space

[(λ+ µ)f ](x) = [(λ+ µ)f(x)] = [λf(x) + µf(x)]= [λf + µf ](x)

⇒ (λ+ µ)f = λf + µf ;∀f ∈ C[a, b];λ, µ ∈ <.

(ix)[(λµ)f ](x) = [(λµ)f(x)] = λµf(x)

= λ[µf(x)] = λ[µf ](x)⇒ (λµ)f = λ(µf); ∀λ, µ ∈ < and f ∈ C[a, b].

(x)(1f)(x) = 1f(x) = f(x)⇒ 1f = f ; ∀f ∈ C[a, b] and1 ∈ <.

Since all the axioms for vector space hold, so, V (F ) is a vector space. This space is calledVector space of continuous functions.

Ex 4.1.4 Consider the vector space F^3, where F is the Galois field of order 3, i.e., F = {0, 1, 2}, and addition and multiplication in F are modulo 3. In this vector space, find (i) (1, 1, 2) + (0, 2, 2), (ii) the negative of (0, 1, 2) and (iii) 2(1, 1, 2).

Solution: According to the definition of addition,

(1, 1, 2) + (0, 2, 2) = (1 + 0, 1 + 2, 2 + 2) = (1, 3, 4) = (1, 0, 1), as 3 ≡ 0 (mod 3) and 4 ≡ 1 (mod 3).

Let the negative of (0, 1, 2) be (x_1, x_2, x_3); then by definition,

(0, 1, 2) + (x_1, x_2, x_3) = (0, 0, 0) ⇒ (x_1, 1 + x_2, 2 + x_3) = (0, 0, 0)
⇒ x_1 = 0, 1 + x_2 = 0, 2 + x_3 = 0
⇒ x_1 = 0, 1 + x_2 = 3, 2 + x_3 = 3, as 3 ≡ 0 (mod 3)
⇒ x_1 = 0, x_2 = 2, x_3 = 1.

Thus the negative of (0, 1, 2) is (0, 2, 1). Also, by definition,

2(1, 1, 2) = (2, 2, 4) = (2, 2, 1) as 4 ≡ 1(mod3).
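The modulo-3 arithmetic above is easy to check mechanically. A minimal Python sketch (illustrative, not from the text):

```python
# Minimal sketch (not from the text): componentwise arithmetic modulo 3
# reproduces the three computations of Ex 4.1.4 in F^3, F = GF(3).
def add3(u, v):
    return tuple((a + b) % 3 for a, b in zip(u, v))

def scale3(c, u):
    return tuple((c * a) % 3 for a in u)

print(add3((1, 1, 2), (0, 2, 2)))   # (1, 0, 1)
print(scale3(-1, (0, 1, 2)))        # (0, 2, 1), the negative of (0, 1, 2)
print(scale3(2, (1, 1, 2)))         # (2, 2, 1)
```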

4.1.1 Vector Subspaces

In the study of an algebraic structure, it is of interest to examine subsets that possess the same structure as the set under consideration.

Definition 4.1.1 Let V(F) be a vector space. A non empty subset W of V is called a vector subspace (or simply a subspace) of V if W is a vector space in its own right with respect to the addition and 'multiplication by scalar' compositions on V, restricted to the elements of W.

Note that every vector space has at least two subspaces:

(i) In an arbitrary vector space V(F), V itself is a subspace of V. This subspace is called the improper subspace of V.

(ii) In an arbitrary vector space V(F), the set {θ} consisting of the null vector alone forms a subspace. This subspace is called the trivial or zero subspace of V.


Criterion for identifying subspaces

Let V(F) be a vector space. Necessary and sufficient conditions for a non empty subset W of V to be a subspace of V are that,

(i) α ∈W,β ∈W ⇒ α+ β ∈W, i.e., W is closed under vector addition, and

(ii) ∀a ∈ F, and α ∈W ⇒ aα ∈W, i.e., W is closed under scalar multiplication.

Proof: Condition necessary: Let us first suppose that W is a subspace of V(F). Then, by definition, W is a vector space in its own right. Consequently, W must be closed under addition, and the scalar multiplication on W over F must be well defined. Hence,

α ∈W,β ∈W ⇒ α+ β ∈W,

by closure property in the group (W,+) and,

a ∈ F, and α ∈W ⇒ aα ∈W,

by definition of the vector space W(F). Thus, the condition is necessary.
Condition sufficient: Let the given conditions be satisfied in W. Now, if α is an arbitrary element of W and 1 is the unity of F, then −1 ∈ F and therefore, according to the given conditions, we have

−1 ∈ F, α ∈W ⇒ (−1)α ∈W ⇒ −α ∈W.

Thus every element in W has its additive inverse in W . Consequently,

α ∈ W, β ∈ W ⇒ α ∈ W, −β ∈ W ⇒ [α + (−β)] ∈ W ⇒ α − β ∈ W.

This shows that 〈W, +〉 is a subgroup of the additive group 〈V, +〉. Moreover, all the elements of W being elements of V, and the addition of vectors being commutative in V, so it is in W. Therefore, 〈W, +〉 is an abelian subgroup of the additive group 〈V, +〉. Also, it is given that the scalar multiplication composition is well defined in W. Further, all elements of W being elements of V, all the remaining four conditions of the vector space are satisfied by elements of W, as they are hereditary properties. Thus W is by itself a vector space over F. Hence W is a subspace of V(F).

Result 4.1.1 The necessary and sufficient conditions for a non-empty subset W of a vectorspace V (F ) to be a subspace are that,

α ∈W,β ∈W ⇒ aα+ bβ ∈W ; a, b ∈ F. (4.1)

Deduction 4.1.1 Thus a subset W of a vector space V is a subspace of V if and only if the following four properties hold:

(i) α+ β ∈W ; α, β ∈W.

(ii) cα ∈W ; c ∈ F and α ∈W.

(iii) W has a zero vector.

(iv) Each vector in W has an additive inverse in W .

Ex 4.1.5 Let S = {(a, b, c) : a, b, c ∈ < and a − 2b − 3c = 0}. Show that S is a subspace of the real vector space <3.


Solution: Since (0, 0, 0) satisfies a − 2b − 3c = 0, S is a nonempty subset of the real vector space <3(<). Let α = (a1, b1, c1) and β = (a2, b2, c2) ∈ S; then ai, bi, ci ∈ < and

a1 − 2b1 − 3c1 = 0; a2 − 2b2 − 3c2 = 0.

For any two scalars x, y ∈ <, we have,

xα + yβ = x(a1, b1, c1) + y(a2, b2, c2) = (xa1 + ya2, xb1 + yb2, xc1 + yc2).

Since ai, bi, ci ∈ < and x, y ∈ <, we have xa1 + ya2, xb1 + yb2, xc1 + yc2 ∈ < and,

(xa1 + ya2) − 2(xb1 + yb2) − 3(xc1 + yc2)
= x(a1 − 2b1 − 3c1) + y(a2 − 2b2 − 3c2) = x0 + y0 = 0.

Therefore xα + yβ ∈ S, which shows that S is a subspace of <3(<).

Ex 4.1.6 In a real vector space <3, every plane through the origin is a subspace of <3.

Solution: Let a plane through the origin be ax + by + cz = 0, where a, b, c are given real constants with (a, b, c) ≠ (0, 0, 0). Consider the set of points of this plane,

S = {(x, y, z) ∈ <3 : ax + by + cz = 0}.

Obviously, S is a nonempty subset of the real vector space <3(<). Let α = (x1, y1, z1) and β = (x2, y2, z2) ∈ S; then xi, yi, zi ∈ < and

ax1 + by1 + cz1 = 0; ax2 + by2 + cz2 = 0.

For any two scalars p, q ∈ <, we have,

pα + qβ = p(x1, y1, z1) + q(x2, y2, z2) = (px1 + qx2, py1 + qy2, pz1 + qz2).

Since xi, yi, zi ∈ < and p, q ∈ <, we have px1 + qx2, py1 + qy2, pz1 + qz2 ∈ < and,

a(px1 + qx2) + b(py1 + qy2) + c(pz1 + qz2)
= p(ax1 + by1 + cz1) + q(ax2 + by2 + cz2) = p0 + q0 = 0.

Therefore pα + qβ ∈ S, which shows that S is a subspace of <3(<). In <3, a plane that does not pass through the origin is not a subspace. More generally, the set of all solutions of a system of homogeneous linear equations in n unknowns forms a subspace of <n, called the solution space, whereas the solution set of a non homogeneous system of linear equations in n unknowns is not a subspace of <n.

Ex 4.1.7 Show that W = {(x, y, z) ∈ <3 : x^2 + y^2 + z^2 = 5} is not a subspace of <3.

Solution: Note first that θ = (0, 0, 0) ∉ W, since 0 + 0 + 0 ≠ 5; a subspace must contain the null vector, so W cannot be a subspace. Alternatively, let α = (x1, y1, z1) and β = (x2, y2, z2) be any two vectors of W. Then x1^2 + y1^2 + z1^2 = 5 and x2^2 + y2^2 + z2^2 = 5, but

α + β = (x1 + x2, y1 + y2, z1 + z2)

need not belong to W: for instance, α = β = (√5, 0, 0) ∈ W, while α + β = (2√5, 0, 0) gives (2√5)^2 + 0 + 0 = 20 ≠ 5. Hence W is not a subspace of <3.

Ex 4.1.8 If a vector space V is the set of real valued continuous functions over <, then show that the set W of solutions of 2(d^2y/dx^2) − 9(dy/dx) + 2y = 0 is a subspace of V.


Solution: Here W = {y = f(x) : 2(d^2y/dx^2) − 9(dy/dx) + 2y = 0}. Since y = 0 is the trivial solution, θ ∈ W; thus W is a nonempty subset of the real vector space V. Let y1, y2 ∈ W; then,

2(d^2y1/dx^2) − 9(dy1/dx) + 2y1 = 0 and 2(d^2y2/dx^2) − 9(dy2/dx) + 2y2 = 0.

Let a, b be two scalars in <; then,

2(d^2/dx^2)(ay1 + by2) − 9(d/dx)(ay1 + by2) + 2(ay1 + by2)
= a[2(d^2y1/dx^2) − 9(dy1/dx) + 2y1] + b[2(d^2y2/dx^2) − 9(dy2/dx) + 2y2]
= a·0 + b·0 = 0.

Since ay1 + by2 satisfies the differential equation,

y1, y2 ∈ W ⇒ ay1 + by2 ∈ W; a, b ∈ <.

Therefore, W is a subspace.
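The same closure property can be verified symbolically. A small sympy sketch (illustrative; building the solutions from the characteristic roots is an assumption of this sketch, not part of the text's argument):

```python
# Sympy sketch (illustrative): any linear combination a*y1 + b*y2 of two
# solutions of 2y'' - 9y' + 2y = 0 again satisfies the equation.
import sympy as sp

x, a, b, m = sp.symbols('x a b m')
m1, m2 = sp.solve(2*m**2 - 9*m + 2, m)   # roots of 2m^2 - 9m + 2 = 0
y = a*sp.exp(m1*x) + b*sp.exp(m2*x)      # general linear combination
residual = 2*y.diff(x, 2) - 9*y.diff(x) + 2*y
print(sp.simplify(residual))             # 0, so a*y1 + b*y2 lies in W
```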

Ex 4.1.9 Let V be the vector space of all functions from the real field < into <, and let,

Ve = {f ∈ V : f(−x) = f(x), ∀x ∈ <}

be the set of even functions. Show that Ve is a subspace of V .

Solution: First we are to show that Ve is non-empty. The zero function θ belongs to Ve, as,

θ(−x) = 0 = θ(x).

Now, if f, g ∈ Ve, then for any scalars a, b ∈ < and ∀x ∈ <, we have,

(af + bg)(−x) = (af)(−x) + (bg)(−x) = af(−x) + bg(−x)
= af(x) + bg(x); as both f, g are even
= (af)(x) + (bg)(x) = (af + bg)(x).

This shows that, whenever f and g are even functions, af + bg is also even. Thus,

f, g ∈ Ve ⇒ af + bg ∈ Ve;∀a, b ∈ <.

Hence Ve is a subspace of V . Similarly, the set of all odd functions, given by,

Vo = {f ∈ V : f(−x) = −f(x), ∀x ∈ <}

is a subspace of V .

Ex 4.1.10 Let V be the vector space of square matrices of order n over the field <. Prove that the set of all symmetric matrices in V is a subspace of V.

Solution: Let W = {A ∈ V : A^T = A}, where A = [aij]n×n and aij ∈ <. Now, if A, B ∈ W, then for any scalars a, b ∈ <, we have,

[aA + bB]^T = [a[aij] + b[bij]]^T = [aaij + bbij]^T = [aaji + bbji]
= [aaij + bbij]; as aij = aji and bij = bji
= a[aij] + b[bij] = aA + bB.

This shows that A, B ∈ W ⇒ aA + bB ∈ W; ∀a, b ∈ <. Hence W is a subspace of V. Similarly, the set of all skew symmetric matrices in V is a subspace of V.


Ex 4.1.11 Let W denote the collection of all elements of the form [a b; −b a] (written row-wise) from the space M2(F). Prove that W(F) is a subspace of M2(F).

Solution: Here, W = {[a b; −b a] : a, b ∈ F}. Let Ai = [ai bi; −bi ai] ∈ W; i = 1, 2. Then,

A1 + A2 = [a1 + a2  b1 + b2; −(b1 + b2)  a1 + a2] ∈ W

and, for c ∈ F,

cA1 = c[a1 b1; −b1 a1] = [ca1 cb1; −cb1 ca1] ∈ W.

Therefore, W(F) is a subspace of M2(F).

Two important subspaces

Theorem 4.1.1 Let V be a vector space over the field F and let α ∈ V. Then W = {cα : c ∈ F} forms a subspace of V.

Proof: Clearly, W is non empty. We consider two cases, α = θ and α ≠ θ.
Case 1: Let α = θ; then W = {cθ} = {θ}, so W is a subspace of V.
Case 2: Let α ≠ θ; then W is a non empty subset of V, as α ∈ W. Let α1, α2 ∈ W; then for some scalars c1, c2 ∈ F, we have,

α1 = c1α, α2 = c2α ⇒ α1 + α2 = (c1 + c2)α ∈ W; as c1 + c2 ∈ F.

Thus, α1 ∈ W, α2 ∈ W ⇒ α1 + α2 ∈ W. Let k ∈ F be another scalar. Then,

kα1 = k(c1α) = (kc1)α ∈ W, as kc1 ∈ F. So, k ∈ F, α1 ∈ W ⇒ kα1 ∈ W.

Hence W is a subspace of V . This subspace is said to be generated by the vector α and αis said to be the generator of the subspace W .

Theorem 4.1.2 Let V(F) be a vector space and α, β ∈ V. Then the set of all linear combinations W = {cα + dβ : c, d ∈ F} forms a subspace of V.

Proof: As θ = 0α + 0β ∈ W, W is a non empty subset of V. Let α1 = c1α + d1β ∈ W and α2 = c2α + d2β ∈ W, where the scalars c1, c2, d1, d2 ∈ F. Since c1, c2, d1, d2 ∈ F, we have c1 + c2 ∈ F and d1 + d2 ∈ F by the closure axiom in F. Now,

α1 + α2 = c1α + d1β + c2α + d2β = (c1 + c2)α + (d1 + d2)β ∈ W.

Therefore, α1 ∈ W, α2 ∈ W ⇒ α1 + α2 ∈ W.

Let k ∈ F be another scalar. Then,

kα1 = k(c1α + d1β) = (kc1)α + (kd1)β ∈ W; as kc1 ∈ F, kd1 ∈ F.

So, k ∈ F, α1 ∈ W ⇒ kα1 ∈ W.

Thus, W is a subspace of V. The set {α, β} is said to be a generating set of the subspace W. In general, if α1, α2, ..., αr ∈ V, then

W = {c1α1 + c2α2 + ... + crαr : ci ∈ F}

forms a subspace of V, and {α1, α2, ..., αr} is a generating set of the subspace W.


Algebra of vector subspaces

Theorem 4.1.3 The intersection of any two subspaces of a vector space is also a subspaceof the same.

Proof: Let W1, W2 be two subspaces of a vector space V(F). Clearly, θ ∈ W1 and θ ∈ W2, and so θ ∈ W1 ∩ W2 ⇒ W1 ∩ W2 ≠ φ. Thus W1 ∩ W2 is non empty. If W1 ∩ W2 = {θ}, then for α, β ∈ W1 ∩ W2 we have α = β = θ, so ∀a, b ∈ F,

aα + bβ = aθ + bθ = θ ∈ {θ} = W1 ∩ W2.

Thus W1 ∩ W2 is a subspace of V(F). Now, let W1 ∩ W2 ≠ {θ}, and let α, β ∈ W1 ∩ W2 and a, b ∈ F. Therefore,

α, β ∈W1 ∩W2 ⇒ α, β ∈W1 and α, β ∈W2.

Since W1 as well as W2 is a subspace of V ,

α ∈W1, β ∈W1 ⇒ aα+ bβ ∈W1

α ∈W2, β ∈W2 ⇒ aα+ bβ ∈W2

⇒ aα+ bβ ∈W1 ∩W2

α ∈W1 ∩W2, β ∈W1 ∩W2 ⇒ aα+ bβ ∈ W1 ∩W2;∀a, b ∈ F.

Therefore, W1 ∩W2 is also a subspace of V (F ).

Theorem 4.1.4 The intersection of an arbitrary collection of subspaces of a vector spaceis a subspace of the same.

Proof: Let {Wk : k ∈ Λ} be an arbitrary collection of subspaces of a vector space V(F). Then θ ∈ each Wk, so

θ ∈ ⋂_{k∈Λ} Wk ⇒ W = ⋂_{k∈Λ} Wk ≠ φ.

Thus W is non empty. Now, let α, β ∈ W; then

α, β ∈ ⋂_{k∈Λ} Wk ⇒ α, β ∈ each Wk
⇒ aα + bβ ∈ each Wk; [each Wk is a subspace]
⇒ aα + bβ ∈ ⋂_{k∈Λ} Wk; ∀a, b ∈ F.

Hence ⋂_{k∈Λ} Wk is a subspace of V(F).

Theorem 4.1.5 The union of two subspaces of a vector space is its subspace if and only ifone is contained in the other.

Proof: Let W1 and W2 be two subspaces of a vector space V(F); we are to show that W1 ∪ W2 is a subspace of V iff either W1 ⊂ W2 or W2 ⊂ W1, i.e., either W1 − W2 = φ or W2 − W1 = φ. If possible, let W1 ∪ W2 be a subspace while both W1 − W2 ≠ φ and W2 − W1 ≠ φ. Then ∃ vectors α, β such that α ∈ W1 but α ∉ W2, and β ∈ W2 but β ∉ W1. Now,

α ∈ W1 ⇒ α ∈ W1 ∪ W2 and β ∈ W2 ⇒ β ∈ W1 ∪ W2
⇒ α + β ∈ W1 ∪ W2; as W1 ∪ W2 is a subspace of V(F)
⇒ α + β ∈ W1 or α + β ∈ W2.


Again, if α + β ∈ W1, then, W1 being a subspace,

α + β ∈ W1, α ∈ W1 ⇒ (α + β) − α ∈ W1 ⇒ β ∈ W1,

which is a contradiction. Similarly,

α + β ∈ W2, β ∈ W2 ⇒ (α + β) − β ∈ W2 ⇒ α ∈ W2,

which is a contradiction. Thus α + β ∉ W1 and α + β ∉ W2, so α + β ∉ W1 ∪ W2. Hence our assumption that both W1 − W2 ≠ φ and W2 − W1 ≠ φ is not tenable, and so either W1 − W2 = φ or W2 − W1 = φ, i.e., W1 ⊂ W2 or W2 ⊂ W1.
Conversely, let W1 and W2 be two subspaces of a vector space V(F) such that

either W1 ⊂ W2 or W2 ⊂W1

⇒ either W1 ∪W2 = W2 or W1 ∪W2 = W1.

But W1 and W2 are subspaces of V, and W1 ∪ W2 is either equal to W2 or equal to W1; so in each case W1 ∪ W2 is a subspace of V(F). Thus, a vector space cannot be the union of two proper subspaces.

Result 4.1.2 The union of two subspaces of a vector space V(F) is not, in general, a subspace of V(F). For example, let < be the field of real numbers, and let us consider two subspaces S and T of the vector space <3, where,

S = {(a1, a2, 0) : a1, a2 ∈ <} and T = {(0, a2, a3) : a2, a3 ∈ <}.

Consider two particular elements α = (1, 2, 0) ∈ S and β = (0, 3, 4) ∈ T. Here α ∈ S ∪ T and β ∈ S ∪ T, but α + β = (1, 5, 4) ∉ S ∪ T. Thus,

α ∈ S ∪ T, β ∈ S ∪ T but α + β ∉ S ∪ T.

Hence S ∪ T is not a subspace of <3(<).

4.2 Linear Sum

Let W1 and W2 be two subspaces of a vector space V (F ). Then the subset,

W1 + W2 = {s + t : s ∈ W1, t ∈ W2}    (4.2)

is said to be the linear sum of the subspaces W1 and W2. Clearly, each element of W1 + W2 is expressible as the sum of an element of W1 and an element of W2.

Result 4.2.1 Let α ∈W1, then,

α = α+ θ; where α ∈W1 and θ ∈W2

⇒ α ∈W1 +W2 ⇒W1 ⊆W1 +W2.

Similarly, W2 ⊆W1 +W2.

Theorem 4.2.1 Let W1 and W2 be two subspaces of a vector space V (F ), then,

W1 + W2 = {w1 + w2 : w1 ∈ W1 and w2 ∈ W2}

is a subspace of V (F ).


Proof: Let W1, W2 be two subspaces of a vector space V(F). Then each of W1 and W2 is nonempty, and so W1 + W2 ≠ φ. Let α, β be two arbitrary elements of W1 + W2; then,

α = α1 + α2 for some α1 ∈W1 and α2 ∈W2,

and β = β1 + β2 for some β1 ∈W1 and β2 ∈W2.

Let a, b be any two scalars in F , then

α1 ∈ W1, β1 ∈ W1 ⇒ aα1 + bβ1 ∈ W1; as W1 is a subspace,
α2 ∈ W2, β2 ∈ W2 ⇒ aα2 + bβ2 ∈ W2; as W2 is a subspace.

Therefore, we get,

aα + bβ = a(α1 + α2) + b(β1 + β2) = (aα1 + bβ1) + (aα2 + bβ2) ∈ W1 + W2.

Thus, α, β ∈ W1 + W2 ⇒ aα + bβ ∈ W1 + W2; ∀a, b ∈ F.

Thus W1 +W2 is a subspace of V (F ).

4.2.1 Smallest Subspace

Let S be a subset of a vector space V(F). Then a subset U of V is called the smallest subspace containing S if U is a subspace of V containing S and is itself contained in every subspace of V containing S. Such a subspace is also called a subspace generated or spanned by S, and we shall denote it by ⟨S⟩. Clearly, ⟨S⟩ is the intersection of all subspaces of V, each containing S.

Theorem 4.2.2 The subspace W1 + W2 is the smallest subspace of V containing the sub-spaces W1 and W2.

Proof: Let S be any subspace of V containing the subspaces W1 and W2. Let α be an element of W1 + W2; then,

α = α1 + α2, for some α1 ∈ W1, α2 ∈ W2.

Since W1 ⊂ S and W2 ⊂ S, we have α1 ∈ S, α2 ∈ S. Also, as S is a subspace of V, we get,

α1, α2 ∈ S ⇒ α1 + α2 ∈ S ⇒ α ∈ S.

Thus, α = α1 + α2 ∈ W1 + W2 ⇒ α ∈ S ⇒ W1 + W2 ⊂ S.

Thus, W1 +W2 is the smallest subspace containing W1 and W2.

4.2.2 Direct Sum

A vector space V(F) is said to be the direct sum of its subspaces W1 and W2, denoted by V = W1 ⊕ W2, if each element of V is uniquely expressible as the sum of an element of W1 and an element of W2, i.e., if each α ∈ V is uniquely expressed as,

α = α1 + α2; α1 ∈ W1 and α2 ∈ W2.

W1 and W2 are then said to be complementary subspaces.

Theorem 4.2.3 The necessary and sufficient conditions for a vector space V (F ) to be thedirect sum of its subspaces W1 and W2 are,

(i) V = W1 + W2 and (ii) W1 ∩ W2 = {θ}.


Proof: First suppose that V = W1 ⊕ W2. Then each element of V is expressed uniquely as the sum of an element of W1 and an element of W2. Consequently, V = W1 + W2, and so (i) is satisfied. Now, to verify (ii), if possible, let α (≠ θ) ∈ W1 ∩ W2. Then α is a non zero vector common to both W1 and W2. We may write,

α = α + θ, where α ∈ W1 and θ ∈ W2,
α = θ + α, where θ ∈ W1 and α ∈ W2.

This shows that a non zero element α ∈ V is expressible in at least two ways as the sum of an element of W1 and an element of W2. This contradicts the fact that V = W1 ⊕ W2. Hence θ is the only vector common to both W1 and W2, i.e., W1 ∩ W2 = {θ}. Therefore,

V = W1 ⊕ W2 ⇒ V = W1 + W2 and W1 ∩ W2 = {θ}.

Conversely, let the conditions (i) and (ii) hold; we are to show that V = W1 ⊕ W2, i.e., that each element of V is expressed uniquely as the sum of an element of W1 and an element of W2. Now, the condition V = W1 + W2 reveals that each element of V is expressed as the sum of an element of W1 and an element of W2; hence we are to show that this expression is unique. Let, if possible, α ∈ V be expressed as,

α = α1 + α2; α1 ∈ W1 and α2 ∈ W2
and α = β1 + β2; β1 ∈ W1 and β2 ∈ W2
⇒ α1 + α2 = β1 + β2
⇒ α1 − β1 = β2 − α2 ∈ W1 ∩ W2; as α1 − β1 ∈ W1 and β2 − α2 ∈ W2
⇒ α1 − β1 = θ and β2 − α2 = θ; as W1 ∩ W2 = {θ}
⇒ α1 = β1 and α2 = β2.

Thus each element of V is uniquely expressed as the sum of an element of W1 and an element of W2. Hence V = W1 ⊕ W2.

Ex 4.2.1 Let W1 and W2 be two subspaces of <3(<), where W1 be the xy plane and W2 bethe yz plane. Find the direct sum of W1 and W2.

Solution: Here the vector space is V = <3(<) = {(x, y, z) : x, y, z ∈ <}. The two subspaces W1 and W2 are given by,

W1 = {(a, b, 0) : a, b ∈ <}; W2 = {(0, b, c) : b, c ∈ <}.

The linear sum of W1 and W2 is given by,

W1 + W2 = {(a, b, c) : a, b, c ∈ <}.

Since every vector in <3 is the sum of a vector in W1 and a vector in W2, W1 + W2 = <3. Also,

W1 ∩ W2 = {(0, b, 0) : b ∈ <} ≠ {(0, 0, 0)}.

Thus <3 is not the direct sum of W1 and W2. Also, taking the particular vector (5, 7, 9),

(5, 7, 9) = (5, 5, 0) + (0, 2, 9) and (5, 7, 9) = (5, 4, 0) + (0, 3, 9).

Thus the decompositions are not unique; this also shows that <3 is not the direct sum of W1 and W2.
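The non-uniqueness can also be seen numerically. A minimal sketch (not from the text), where the free parameter t runs along the common line W1 ∩ W2:

```python
# Illustrative check: split v = (5, 7, 9) across W1 (xy-plane) and
# W2 (yz-plane) in two different ways; both sums reproduce v.
import numpy as np

v = np.array([5., 7., 9.])
for t in (5., 4.):                 # t parametrises the overlap W1 ∩ W2
    w1 = np.array([5., t, 0.])     # lies in W1: third component is 0
    w2 = v - w1                    # lies in W2: first component is 0
    print(w1, "+", w2, "=", w1 + w2)
```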

Ex 4.2.2 In the vector space V of all real valued continuous functions defined on the set < of real numbers, let Ve and V0 denote the sets of even and odd functions respectively. Show that Ve and V0 are subspaces of V and that V = Ve ⊕ V0.


Solution: It has already been shown that Ve as well as V0 is a subspace of V. Now, in order to show that V = Ve ⊕ V0, we must prove that V = Ve + V0 and Ve ∩ V0 = {θ}. Let f be an arbitrary element of V. Then ∀x ∈ <, we have,

f(x) = (1/2)[f(x) + f(−x)] + (1/2)[f(x) − f(−x)] = fe(x) + f0(x) = (fe + f0)(x),

where fe(x) = (1/2)[f(x) + f(−x)] and f0(x) = (1/2)[f(x) − f(−x)]. Also,

fe(−x) = (1/2)[f(−x) + f(x)] = fe(x) and f0(−x) = (1/2)[f(−x) − f(x)] = −f0(x).

Therefore fe ∈ Ve and f0 ∈ V0, and consequently,

f = fe + f0; where fe ∈ Ve, f0 ∈ V0.

This shows that every element of V is expressed as the sum of an element of Ve and an element of V0, so V = Ve + V0. Also, the condition Ve ∩ V0 = {θ} follows from the fact that the zero function is the only real valued function on < which is both even and odd. Thus,

V = Ve + V0; Ve ∩ V0 = {θ} ⇒ V = Ve ⊕ V0.
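The splitting f = fe + f0 is easy to compute pointwise. A minimal Python sketch (the choice of exp is just an illustrative example; any real function works):

```python
# Pointwise sketch of the even/odd splitting used above.
import math

even = lambda f: (lambda x: 0.5*(f(x) + f(-x)))
odd  = lambda f: (lambda x: 0.5*(f(x) - f(-x)))

f = math.exp                      # exp = cosh + sinh
fe, fo = even(f), odd(f)
x = 0.7
print(math.isclose(fe(x) + fo(x), f(x)))                          # True
print(math.isclose(fe(-x), fe(x)), math.isclose(fo(-x), -fo(x)))  # True True
```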

Ex 4.2.3 Let V be the vector space of square matrices of order n over the field <. Let Vs and Va be the subspaces of symmetric and antisymmetric matrices in V respectively. Show that V = Vs ⊕ Va.

Solution: It has already been shown that Vs as well as Va is a subspace of V. Now, in order to show that V = Vs ⊕ Va, we must prove that V = Vs + Va and Vs ∩ Va = {θ}. Let A be an arbitrary element of V and write A = X + Y, where,

X = (1/2)(A + A^T) and Y = (1/2)(A − A^T).

Since X^T = X and Y^T = −Y, we have X ∈ Vs and Y ∈ Va. If M ∈ Vs ∩ Va, then M = M^T and M^T = −M; it implies that M = −M, i.e., M = 0 = θ. Hence Vs ∩ Va = {θ}. Thus,

V = Vs + Va; Vs ∩ Va = {θ} ⇒ V = Vs ⊕ Va.
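The matrix splitting mirrors the even/odd one. A NumPy sketch (the particular matrix A is an arbitrary example):

```python
# NumPy sketch: X is symmetric, Y is skew-symmetric, and A = X + Y.
import numpy as np

A = np.array([[1., 2., 0.], [4., 3., 5.], [6., 7., 8.]])
X = 0.5 * (A + A.T)   # symmetric part
Y = 0.5 * (A - A.T)   # skew-symmetric part
print(np.allclose(X, X.T), np.allclose(Y, -Y.T), np.allclose(A, X + Y))
# True True True
```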

4.3 Quotient Space

Let W be a subspace of a vector space V (F ) and let α ∈ V . Then the set given by,

α + W = {α + w : w ∈ W} ⊂ V

is called a coset of W in V and is denoted by α+W . The set of all distinct cosets of W isdenoted by V/W .

Theorem 4.3.1 Let W be a subspace of a vector space V (F ). Then the set V/W of allcosets W + α, where α ∈ V , is a vector space over the field F , with respect to addition andscalar multiplication by,

(i) (W + α) + (W + β) = W + (α+ β)


(ii) a(W + α) = W + aα; ∀α, β ∈ V and a ∈ F .

Proof: First we are to show that, the composition is well defined. Let, W + α = W +α1 and W + β = W + β1. Then,

W + α = W + α1 and W + β = W + β1
⇒ α − α1 ∈ W and β − β1 ∈ W
⇒ (α − α1) + (β − β1) ∈ W
⇒ (α + β) − (α1 + β1) ∈ W
⇒ W + (α + β) = W + (α1 + β1)
⇒ (W + α) + (W + β) = (W + α1) + (W + β1).

This shows that the addition composition on (V/W ) is well defined. Again,

W + α = W + α1 ⇒ α − α1 ∈ W
⇒ a(α − α1) ∈ W, i.e., aα − aα1 ∈ W
⇒ W + aα = W + aα1
⇒ a(W + α) = a(W + α1).

So the scalar multiplication is well defined. Now, it is easy to show that (V/W, +) is an abelian group. In fact, the coset W + θ = W is the additive identity, and every coset W + α in V/W has its additive inverse W + (−α) in V/W. Moreover, ∀(W + α), (W + β) ∈ V/W and a, b ∈ F, we have,

(i) a[(W + α) + (W + β)] = a[W + (α + β)] = W + a(α + β) = W + (aα + aβ)
= (W + aα) + (W + aβ) = a(W + α) + a(W + β).
(ii) (a + b)(W + α) = W + (a + b)α = W + (aα + bα) = (W + aα) + (W + bα) = a(W + α) + b(W + α).
(iii) (ab)(W + α) = W + (ab)α = W + a(bα) = a(W + bα) = a[b(W + α)].
(iv) 1(W + α) = W + 1α = W + α.

Hence V/W is a vector space and this vector space V/W is called a quotient space of V byW .

Theorem 4.3.2 Let W be a subspace of a vector space V (F ), and α, β ∈ V . Then α+W =β +W if and only if α− β ∈W .

Proof: First let, α+W = β +W and let γ ∈ α+W . Then,

γ = α + w1 = β + w2; for some w1, w2 ∈ W
⇒ α − β = w2 − w1 ∈ W.

Conversely, let α− β ∈W . Then,

α = β + w3, for some w3 ∈ W, and β = α + w4, for some w4 ∈ W.


Let γ ∈ α+W , then γ = α+ w5 for some w5 ∈W . Thus,

γ = α + w5 = (β + w3) + w5 = β + w6; w6 = w3 + w5 ∈ W
⇒ γ ∈ β + W.

Therefore, γ ∈ α+W ⇒ γ ∈ β +W , so α+W ⊂ β +W. Let δ ∈ β +W , then,

δ ∈ β + W ⇒ δ = β + w7; for some w7 ∈ W
⇒ δ = (α + w4) + w7 = α + w8; w8 = w4 + w7 ∈ W
⇒ δ ∈ α + W; so, β + W ⊂ α + W.

Hence it follows that β +W = α+W. Hence the theorem.

4.4 Linear Combination of Vectors

Let V(F) be a vector space, and S = {α1, α2, ..., αn} be a finite subset of V(F). A vector α ∈ V is said to be a linear combination of the vectors α1, α2, ..., αn if α can be expressed in the form

α = c1α1 + c2α2 + . . .+ cnαn (4.3)

for some scalars c1, c2, ..., cn in F. Note that to decide whether α is a linear combination of α1, α2, ..., αn, we are to solve a system AC = B of linear equations in the unknowns C = (c1, c2, ..., cn), where B = α and the columns of the coefficient matrix A are the αk's. If the system AC = B has no solution, then α cannot be expressed as a linear combination of α1, α2, ..., αn.

Ex 4.4.1 Express (2, 1, −6) as a linear combination of (1, 1, 2), (3, −1, 0) and (2, 0, −1) in the real vector space <3(<).

Solution: Let α = (2, 1, −6), α1 = (1, 1, 2), α2 = (3, −1, 0) and α3 = (2, 0, −1). Here we seek scalars c1, c2, c3 ∈ < such that the relation α = c1α1 + c2α2 + c3α3 holds. Using the values of α1, α2 and α3, we get,

(2, 1, −6) = c1(1, 1, 2) + c2(3, −1, 0) + c3(2, 0, −1)
= (c1 + 3c2 + 2c3, c1 − c2, 2c1 − c3).

Equating corresponding entries leads to the linear system

c1 + 3c2 + 2c3 = 2; c1 − c2 = 1; 2c1 − c3 = −6.

This system is consistent and has the solution c1 = −7/8, c2 = −15/8, c3 = 17/4. Hence α is a linear combination of the αi's, namely α = −(7/8)α1 − (15/8)α2 + (17/4)α3.

Alternatively, we write down the augmented matrix M of the equivalent system of linear equations, where α1, α2, α3 are the first three columns of M and α is the last column, and then reduce M to echelon form:

[1  3  2 |  2]   [1  3  2 |  2 ]   [1  3  2 |  2 ]
[1 −1  0 |  1] ∼ [0 −4 −2 | −1 ] ∼ [0 −4 −2 | −1 ]
[2  0 −1 | −6]   [0 −6 −5 | −10]   [0  0 −4 | −17]

The last matrix corresponds to a triangular system, whose back substitution gives the same solution

c1 = −7/8, c2 = −15/8, c3 = 17/4.
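A quick numeric check (illustrative, not from the text):

```python
# The columns of A are alpha_1, alpha_2, alpha_3; solving A c = alpha
# recovers the coefficients found above.
import numpy as np

A = np.column_stack([[1., 1., 2.], [3., -1., 0.], [2., 0., -1.]])
alpha = np.array([2., 1., -6.])
print(np.linalg.solve(A, alpha))   # [-0.875 -1.875  4.25] = (-7/8, -15/8, 17/4)
```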


Ex 4.4.2 In the vector space <3(<), show that (2, 5, 3) cannot be expressed as a linear combination of the vectors (1, −3, 2), (2, −4, −1) and (1, −5, 7).

Solution: Let α = (2, 5, 3), α1 = (1, −3, 2), α2 = (2, −4, −1) and α3 = (1, −5, 7). The vector α = (2, 5, 3) is a linear combination of α1, α2, α3 if we can find scalars c1, c2, c3 ∈ < such that α = c1α1 + c2α2 + c3α3 holds. Using the values of α1, α2 and α3, we get,

(2, 5, 3) = c1(1, −3, 2) + c2(2, −4, −1) + c3(1, −5, 7)
= (c1 + 2c2 + c3, −3c1 − 4c2 − 5c3, 2c1 − c2 + 7c3).

Equating corresponding entries leads to the linear system

c1 + 2c2 + c3 = 2; −3c1 − 4c2 − 5c3 = 5; 2c1 − c2 + 7c3 = 3.

This system is inconsistent and has no solution. Hence α cannot be expressed as a linear combination of the αi's.
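The inconsistency can be detected by the standard rank criterion (a solution exists iff rank(A) equals the rank of the augmented matrix); this criterion is a standard fact, not stated in the text:

```python
# Rank test: rank(A) < rank([A | b]) means the system has no solution.
import numpy as np

A = np.array([[1., 2., 1.], [-3., -4., -5.], [2., -1., 7.]])
b = np.array([[2.], [5.], [3.]])
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(np.hstack([A, b])))
# 2 3  ->  no solution, so (2, 5, 3) is not a linear combination
```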

Ex 4.4.3 Express p(t) = 3t^2 + 5t − 5 as a linear combination of the polynomials p1(t) = t^2 + 2t + 1, p2(t) = 2t^2 + 5t + 4 and p3(t) = t^2 + 3t + 6.

Solution: We seek scalars c1, c2, c3 ∈ < such that p(t) = c1p1 + c2p2 + c3p3, i.e.,

3t^2 + 5t − 5 = c1(t^2 + 2t + 1) + c2(2t^2 + 5t + 4) + c3(t^2 + 3t + 6)
= (c1 + 2c2 + c3)t^2 + (2c1 + 5c2 + 3c3)t + (c1 + 4c2 + 6c3)
⇒ c1 + 2c2 + c3 = 3; 2c1 + 5c2 + 3c3 = 5; c1 + 4c2 + 6c3 = −5.

This system has the solution c1 = 3, c2 = 1, c3 = −2. Therefore, p(t) = 3p1 + p2 − 2p3.

Ex 4.4.4 Let A = [1 1; 1 0], B = [0 0; 1 1], C = [0 2; 0 −1] and X = [4 0; 2 0] (all written row-wise). Express X as a linear combination of A, B, C.

Solution: We seek scalars p, q, r such that X = pA + qB + rC holds. Now,

[4 0; 2 0] = p[1 1; 1 0] + q[0 0; 1 1] + r[0 2; 0 −1] = [p  p + 2r; p + q  q − r]

⇒ p = 4, p + 2r = 0, p + q = 2, q − r = 0, i.e., p = 4, q = −2, r = −2.

Hence the required relation is X = 4A − 2B − 2C.
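The same coefficients can be found by flattening each matrix into a vector and solving the resulting 4 × 3 system in the least squares sense (a sketch, not the text's method):

```python
# Columns of M are vec(A), vec(B), vec(C); lstsq returns the exact solution
# here because one exists.
import numpy as np

A = np.array([[1., 1.], [1., 0.]])
B = np.array([[0., 0.], [1., 1.]])
C = np.array([[0., 2.], [0., -1.]])
X = np.array([[4., 0.], [2., 0.]])
M = np.column_stack([A.ravel(), B.ravel(), C.ravel()])
coeffs, *_ = np.linalg.lstsq(M, X.ravel(), rcond=None)
print(np.round(coeffs, 6))   # [ 4. -2. -2.]  ->  X = 4A - 2B - 2C
```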

4.4.1 Linear Span

Let V (F ) be a vector space and S be a non-empty subset of V .

(i) If S is finite, say S = {α1, α2, ..., αn}, the set of all linear combinations of the vectors of S, which is the smallest subspace of V containing S (see Theorem 4.4.1), is defined as the linear span of S and is denoted by L(S) or span S. Thus,

L(S) = {∑_{i=1}^{n} ciαi : ci ∈ F}.    (4.4)


(ii) If S is infinite, the set of all linear combinations of a finite number of vectors from Sis defined as a linear span of S and is denoted by L(S) or span S.

In both cases, the space L(S) is said to be generated or spanned by the set S, and S is said to be the set of generators of L(S). For convenience, L(φ) = {θ}, where φ is the null set.

Ex 4.4.5 Determine the subspace of <3 spanned by the vectors α = (1, 2, 3), β = (−1, 2, 4). Show that γ = (3, 2, 2) is in the subspace.

Solution: L{α, β} is the set of all vectors cα + dβ, where c, d are real numbers, and

cα + dβ = c(1, 2, 3) + d(−1, 2, 4) = (c − d, 2c + 2d, 3c + 4d).

If γ ∈ L{α, β}, then there must be real numbers c, d such that

(3, 2, 2) = (c − d, 2c + 2d, 3c + 4d).

Therefore, c − d = 3, 2c + 2d = 2, 3c + 4d = 2. These equations are consistent and their solution is c = 2, d = −1. Hence γ ∈ L{α, β}, i.e., γ belongs to the subspace generated by the vectors α, β.

Ex 4.4.6 Find the condition among x, y, z such that the vector (x, y, z) belongs to the space generated by α = (2, 1, 0), β = (1, −1, 2), γ = (0, 3, −4).

Solution: If (x, y, z) ∈ L{α, β, γ}, then (x, y, z) can be expressed as a linear combination of α, β, γ. Let (x, y, z) = c1α + c2β + c3γ; then,

(x, y, z) = c1(2, 1, 0) + c2(1, −1, 2) + c3(0, 3, −4)
= (2c1 + c2, c1 − c2 + 3c3, 2c2 − 4c3).

This gives 2c1 + c2 = x, c1 − c2 + 3c3 = y, 2c2 − 4c3 = z. Multiplying the second equation by 2 and subtracting from the first, we get,

3c2 − 6c3 = x − 2y, or, c2 − 2c3 = (x − 2y)/3.

Again, from the third equation, c2 − 2c3 = z/2. Hence (x − 2y)/3 = z/2, or, 2x − 4y − 3z = 0,
which is the required condition.
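A quick numeric confirmation (illustrative): random combinations of α, β, γ all satisfy the derived condition.

```python
# Every linear combination of alpha, beta, gamma satisfies 2x - 4y - 3z = 0.
import numpy as np

alpha = np.array([2., 1., 0.])
beta  = np.array([1., -1., 2.])
gamma = np.array([0., 3., -4.])
rng = np.random.default_rng(0)
for _ in range(5):
    c1, c2, c3 = rng.normal(size=3)
    x, y, z = c1*alpha + c2*beta + c3*gamma
    print(np.isclose(2*x - 4*y - 3*z, 0.0))   # True each time
```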

Theorem 4.4.1 The linear span L(S) of a non empty subset S of a vector space V(F) is the smallest subspace of V containing S.

Proof: Let S = {α1, α2, ..., αn} ⊂ V and let,

L(S) = {∑_{i=1}^{n} ciαi : ci ∈ F}.

Now, L(S) is a nonempty subset of V. Moreover, for any α1 ∈ S we can write,

α1 = 1α1 ⇒ α1 ∈ L(S).
Therefore, α1 ∈ S ⇒ α1 ∈ L(S) ⇒ S ⊆ L(S).

Now, in order to show that L(S) is a subspace of V(F), let α and β be any two arbitrary elements of L(S). Then each one of them is a linear combination of a finite number of elements of S. Let,

α = ∑_{i=1}^{m} aiαi; αi ∈ S and ai ∈ F,
and β = ∑_{j=1}^{n} bjβj; βj ∈ S and bj ∈ F.

Now, ∀a, b ∈ F, we have,

aα + bβ = a(∑_{i=1}^{m} aiαi) + b(∑_{j=1}^{n} bjβj) = ∑_{i=1}^{m} (aai)αi + ∑_{j=1}^{n} (bbj)βj,

which is a linear combination of a finite number of elements of S and so is a member of L(S). Thus,

α ∈ L(S), β ∈ L(S) ⇒ aα + bβ ∈ L(S); ∀a, b ∈ F.

Thus L(S) is a subspace of V(F). Next, let W be any subspace of V containing the set S, and let α ∈ L(S). Then,

α = c1α1 + c2α2 + . . .+ cnαn,

for some scalars c1, c2, . . . , cn ∈ F . Since W is a subspace of V containing αi and ci ∈ F , itfollows that

ciαi ∈W ; i = 1, 2, . . . , n.

Since W is a subspace and c1α1, c2α2, . . . , cnαn ∈W, it follows that,

c1α1 + c2α2 + . . .+ cnαn ∈W⇒ α ∈W.

Thus, α ∈ L(S) ⇒ α ∈W ⇒ L(S) ⊂W.

Hence L(S) is the smallest subspace of V (F ) containing S and it is called the subspacespanned or generated by S.

Theorem 4.4.2 If S and T be nonempty finite subsets of a vector space V (F ), then

(i) S ⊆ T ⇒ L(S) ⊆ L(T ).

(ii) S is a subspace of V ⇔ L(S) = S.

(iii) L(L(S)) = L(S).

(iv) L(S ∪ T ) = L(S) + L(T ).

Proof: (i) Let S = {α1, α2, ..., αr} and T = {α1, α2, ..., αr, αr+1, ..., αn} be two subsets of a vector space V(F) with S ⊆ T, and let α ∈ L(S). Then for some scalars ci ∈ F, we have,

α = c1α1 + c2α2 + ... + crαr.

Thus, α ∈ L(S) ⇒ α = ∑_{i=1}^{r} ciαi = ∑_{i=1}^{n} ciαi, taking cr+1 = cr+2 = ... = cn = 0
⇒ α ∈ L(T) ⇒ L(S) ⊆ L(T).

(ii) Let S be a subspace of V and let α ∈ L(S). Then α is a linear combination of a finite number of elements of S, i.e.,

α = ∑_{i=1}^{n} ciαi; for some ci ∈ F, αi ∈ S.

But S is a subspace of V, so ∑_{i=1}^{n} ciαi ∈ S, and therefore α ∈ S; hence L(S) ⊆ S. Also,

α ∈ S ⇒ α = 1·α ∈ L(S),

so S ⊆ L(S). Combining these two, we conclude L(S) = S. Conversely, let L(S) = S; then L(S) being a subspace of V, so is S.

(iii) Let S1 = L(S); then S1 is a subspace of V(F). Therefore, by (ii), we have,

L(S1) = S1, i.e., L(L(S)) = L(S).

(iv) It has already been proved that the linear sum of two subspaces is a subspace and that the linear span of a non empty subset of a vector space is a subspace. Hence L(S) + L(T) as well as L(S ∪ T) is a subspace of V(F). Let S and T be two finite sets, given by,

S = {α1, α2, ..., αr} and T = {β1, β2, ..., βk},

and let α ∈ L(S ∪ T). Then α is a linear combination of a finite number of elements of S ∪ T. Thus,

α ∈ L(S ∪ T) ⇒ α = ∑_{i=1}^{r} ciαi + ∑_{i=1}^{k} diβi; for some ci, di ∈ F
= λ + µ (say); λ = ∑_{i=1}^{r} ciαi ∈ L(S), µ = ∑_{i=1}^{k} diβi ∈ L(T)
⇒ L(S ∪ T) ⊆ L(S) + L(T).    (4.5)

Again, let α ∈ L(S) + L(T); then,

α = α1 + α2; for some α1 ∈ L(S) and α2 ∈ L(T)
= ∑_{i=1}^{r} ciαi + ∑_{i=1}^{k} diβi; for some ci, di ∈ F
∈ L(S ∪ T); as α1, α2, ..., αr, β1, β2, ..., βk ∈ S ∪ T.

So, α ∈ L(S) + L(T) ⇒ α ∈ L(S ∪ T)
⇒ L(S) + L(T) ⊆ L(S ∪ T).    (4.6)

Hence from (4.5) and (4.6) we have L(S ∪ T) = L(S) + L(T).

Theorem 4.4.3 If S and T are two non empty finite subsets of a vector space V(F) and each element of T is a linear combination of the vectors of S, then L(T) ⊂ L(S).

Proof: Let S = {α1, α2, ..., αr}, T = {β1, β2, ..., βm} and let βi = ci1α1 + ci2α2 + ... + cirαr for some cij ∈ F; i = 1, 2, ..., m and j = 1, 2, ..., r. Let α be an element of L(T); then

α = a1β1 + a2β2 + ... + amβm; for some ai ∈ F
= a1 ∑_{j=1}^{r} c1jαj + a2 ∑_{j=1}^{r} c2jαj + ... + am ∑_{j=1}^{r} cmjαj
= ∑_{j=1}^{r} (a1c1j + a2c2j + ... + amcmj)αj
= d1α1 + d2α2 + ... + drαr ∈ L(S),
where dj = a1c1j + a2c2j + ... + amcmj ∈ F; j = 1, 2, ..., r.

Thus, α ∈ L(T) ⇒ α ∈ L(S), and so L(T) ⊂ L(S).


Ex 4.4.7 Determine the subspace of <3 spanned by the vectors α = (1, 2, 3), β = (3, 1, 0). Examine if γ = (2, 1, 3) and δ = (−1, 3, 6) are in the subspace.

Solution: Let S = {α, β}, where α = (1, 2, 3) and β = (3, 1, 0). Then,

L(S) = L{α, β} = {cα + dβ : c, d ∈ <}
= {c(1, 2, 3) + d(3, 1, 0) : c, d ∈ <}
= {(c + 3d, 2c + d, 3c) : c, d ∈ <} ⊆ <3.

If γ ∈ L(S), then there must be real numbers c, d such that,

(2, 1, 3) = (c + 3d, 2c + d, 3c) ⇒ c + 3d = 2, 2c + d = 1, 3c = 3.

These equations are inconsistent, and so γ is not in L(S). If δ ∈ L(S), then there must be real numbers c, d such that,

(−1, 3, 6) = (c + 3d, 2c + d, 3c) ⇒ c + 3d = −1, 2c + d = 3, 3c = 6.

These equations are consistent, with c = 2, d = −1, so that δ = 2α − β, showing that δ ∈ L(S).

Ex 4.4.8 In the vector space V3(<), consider the vectors α = (1, 2, 1), β = (3, 1, 5), γ = (3, −4, 7). Show that the subspaces spanned by S = {α, β} and T = {α, β, γ} are the same.

Solution: Since S ⊆ T, we have L(S) ⊆ L(T). Now, we are to show that γ can be expressed as a linear combination of α and β. Let γ = aα + bβ for some scalars a, b ∈ <; then,

(3, −4, 7) = a(1, 2, 1) + b(3, 1, 5) = (a + 3b, 2a + b, a + 5b)
⇒ a + 3b = 3, 2a + b = −4, a + 5b = 7 ⇒ a = −3, b = 2.

Therefore, γ = −3α + 2β. Now, let δ be an arbitrary element of L(T). Then,

δ = a1α + a2β + a3γ; for some scalars a1, a2, a3 ∈ <
= a1α + a2β + a3(−3α + 2β) = (a1 − 3a3)α + (a2 + 2a3)β ∈ L(S).

Thus δ ∈ L(T) ⇒ δ ∈ L(S) ⇒ L(T) ⊆ L(S). Therefore L(S) = L(T), and so the subspaces spanned by S = {α, β} and T = {α, β, γ} are the same.

Ex 4.4.9 Let S = {α, β, γ} and T = {α, β, α + β, β + γ} be subsets of a real vector space V. Show that L(S) = L(T).

Solution: S and T are finite subsets of V and each element of T is a linear combination of the vectors of S; therefore L(T) ⊂ L(S). Again,

α = α + 0β + 0(α + β) + 0(β + γ)
β = 0α + β + 0(α + β) + 0(β + γ)
γ = 0α − β + 0(α + β) + (β + γ).

This shows that each element of S is a linear combination of the vectors of T, and so L(S) ⊂ L(T). It follows that L(S) = L(T).


Ex 4.4.10 If α = (1, 2, −1), β = (2, −3, 2), γ = (4, 1, 3) and δ = (−3, 1, 2) ∈ <3(<), prove that L{α, β} ≠ L{γ, δ}.

Solution: Let, if possible, L{α, β} = L{γ, δ}. Then for arbitrary a, b ∈ < there exist scalars x, y ∈ < such that xα + yβ = aγ + bδ, i.e.,

x(1, 2, −1) + y(2, −3, 2) = a(4, 1, 3) + b(−3, 1, 2)
⇒ (x + 2y, 2x − 3y, −x + 2y) = (4a − 3b, a + b, 3a + 2b)
⇒ x + 2y = 4a − 3b, 2x − 3y = a + b, −x + 2y = 3a + 2b.

From the first and third equations, x = (1/2)(a − 5b), y = (1/4)(7a − b). But then,

2x − 3y = (a − 5b) − (3/4)(7a − b) = −(17/4)(a + b) ≠ a + b in general.

Therefore the equations are inconsistent, and so L{α, β} ≠ L{γ, δ}.

Theorem 4.4.4 The linear sum of two subspaces W1 and W2 of a vector space V(F) is generated by their union: W1 + W2 = L(W1 ∪ W2).

Proof: It has already been proved that the linear sum of two subspaces is a subspace and that the linear span of a non empty subset of a vector space is a subspace. Consequently, W1 + W2 as well as L(W1 ∪ W2) is a subspace of V(F). Now, let α ∈ W1 + W2; then

α = α1 + α2; for some α1 ∈ W1 and α2 ∈ W2
= 1α1 + 1α2.

Therefore α is a linear combination of α1, α2 ∈ W1 ∪ W2, and so α ∈ L(W1 ∪ W2). Thus,

α ∈ W1 + W2 ⇒ α ∈ L(W1 ∪ W2) ⇒ W1 + W2 ⊆ L(W1 ∪ W2).

Again, W1 ⊆ W1 + W2 and W2 ⊆ W1 + W2, so W1 ∪ W2 ⊆ W1 + W2. But L(W1 ∪ W2) is the smallest subspace containing W1 ∪ W2, and W1 + W2 is a subspace containing W1 ∪ W2; so L(W1 ∪ W2) ⊆ W1 + W2. Therefore L(W1 ∪ W2) = W1 + W2.

4.4.2 Linearly Dependence and Independence

The concepts of linear dependence and independence play an important role in the theory of linear algebra and in mathematics in general.

Definition 4.4.1 Let V(F) be a vector space. A finite set of vectors S = {α1, α2, ..., αn} of V(F) is said to be linearly dependent (LD) if ∃ scalars c1, c2, ..., cn ∈ F, not all zero, such that,

c1α1 + c2α2 + ... + cnαn = θ.    (4.7)

An arbitrary set S of vectors of a vector space V(F) is said to be linearly dependent in V if ∃ a finite subset of S which is linearly dependent in V.

Ex 4.4.11 Prove that the set of vectors {α1, α2, α3}, where α1 = (2, 2, −3), α2 = (0, −4, 1) and α3 = (3, 1, −4), in <3(<) is linearly dependent.


Solution: Let c1, c2, c3 ∈ < be three scalars such that c1α1 + c2α2 + c3α3 = θ holds. Then,

c1(2, 2, −3) + c2(0, −4, 1) + c3(3, 1, −4) = θ = (0, 0, 0)
⇒ (2c1 + 3c3, 2c1 − 4c2 + c3, −3c1 + c2 − 4c3) = (0, 0, 0)
⇒ 2c1 + 3c3 = 0; 2c1 − 4c2 + c3 = 0; −3c1 + c2 − 4c3 = 0.

One non trivial solution is c1 = 3, c2 = 1, c3 = −2. So ∃ scalars c1 = 3, c2 = 1, c3 = −2, not all zero, such that c1α1 + c2α2 + c3α3 = θ holds. Thus {α1, α2, α3} is linearly dependent in <3.
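The dependence can be verified with the determinant test (a standard fact, not stated in the text): three vectors of <3 are linearly dependent iff the matrix having them as rows is singular.

```python
import numpy as np

M = np.array([[2., 2., -3.], [0., -4., 1.], [3., 1., -4.]])
print(np.isclose(np.linalg.det(M), 0.0))   # True: dependent
print(3*M[0] + 1*M[1] - 2*M[2])            # [0. 0. 0.], the relation found above
```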

Ex 4.4.12 If C is the field of complex numbers, prove that the vectors (x1, y1), (x2, y2) ∈ V2(C) are linearly dependent if and only if x1y2 − x2y1 = 0.

Solution: Let a, b ∈ C; then,

a(x1, y1) + b(x2, y2) = θ ⇒ (ax1 + bx2, ay1 + by2) = (0, 0)
⇒ ax1 + bx2 = 0; ay1 + by2 = 0.

The necessary and sufficient condition for these equations to possess a common non zero solution (a, b) is that,

|x1 y1|
|x2 y2| = 0, i.e., x1y2 − x2y1 = 0.

Hence the given vectors are linearly dependent if and only if x1y2 − x2y1 = 0.

Definition 4.4.2 Let V(F) be a vector space. A finite set of vectors S = {α1, α2, ..., αn} of V(F) is said to be linearly independent (LI) if, for scalars c1, c2, ..., cn ∈ F,

c1α1 + c2α2 + ... + cnαn = θ ⇒ c1 = c2 = ... = cn = 0.    (4.8)

An arbitrary set S of vectors of a vector space V(F) is said to be linearly independent in V if every finite subset of S is linearly independent in V.

Ex 4.4.13 Prove that the set of vectors α1 = (2, 1, 4), α2 = (−3, 2, −1) and α3 = (1, −3, −2) in V3(<) is linearly independent.

Solution: Let c1, c2, c3 ∈ < be three scalars such that c1α1 + c2α2 + c3α3 = θ holds. Then,

c1(2, 1, 4) + c2(−3, 2, −1) + c3(1, −3, −2) = θ = (0, 0, 0)
⇒ (2c1 − 3c2 + c3, c1 + 2c2 − 3c3, 4c1 − c2 − 2c3) = (0, 0, 0)
⇒ 2c1 − 3c2 + c3 = 0; c1 + 2c2 − 3c3 = 0; 4c1 − c2 − 2c3 = 0
⇒ c1 = 0, c2 = 0, c3 = 0.

Since the only solution is the trivial one, {α1, α2, α3} is linearly independent in V3(<).

Ex 4.4.14 In the vector space P[x] of all polynomials over the field F, the infinite set S = {1, x, x^2, ...} is linearly independent.

Solution: In order to show that the given infinite set S is linearly independent, we must show that every finite subset of S is linearly independent. Let A = {x^{m1}, x^{m2}, ..., x^{mr}} be an arbitrary finite subset of S, so that each mi is a distinct non negative integer. Now, let a1, a2, ..., ar ∈ F be r scalars such that,

a1x^{m1} + a2x^{m2} + ... + arx^{mr} = θ

holds. Since the zero polynomial has all its coefficients zero, by the definition of equality of polynomials, a1 = a2 = ... = ar = 0. This shows that A is linearly independent, and hence S is linearly independent.


Ex 4.4.15 Find the values of x such that the vectors (1, 2, 1), (x, 3, 1) and (2, x, 0) are linearly dependent. [WBUT 2005]

Solution: The given vectors are linearly dependent iff c1(1, 2, 1) + c2(x, 3, 1) + c3(2, x, 0) = θ has a non trivial solution, i.e., iff

|1 2 1|
|x 3 1| = 0 ⇒ 1(0 − x) − 2(0 − 2) + 1(x^2 − 6) = 0,
|2 x 0|

or, x^2 − x − 2 = 0, or, (x − 2)(x + 1) = 0, or, x = −1, 2. Hence the required values of x are −1 and 2.
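The same determinant can be handled symbolically. A sympy sketch (illustrative):

```python
# Solve det = 0 for the parameter x.
import sympy as sp

x = sp.symbols('x')
M = sp.Matrix([[1, 2, 1], [x, 3, 1], [2, x, 0]])
print(sp.factor(M.det()))        # (x - 2)*(x + 1)
print(sp.solve(M.det(), x))      # [-1, 2]
```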

Ex 4.4.16 Prove that the vector space of all periodic functions f(x) with period T contains an infinite set of linearly independent vectors.

Solution: Let us consider the infinite set of periodic functions

S = {1, cos(2nπx/T), sin(2nπx/T) : n ∈ N}.

For each positive integer n, consider the finite subset Sn of S,

Sn = {1, cos(2πx/T), sin(2πx/T), ..., cos(2nπx/T), sin(2nπx/T)}.

To prove Sn linearly independent, let a0, a1, ..., an, b1, b2, ..., bn be scalars such that,

a0 + ∑_{r=1}^{n} [ar cos(2rπx/T) + br sin(2rπx/T)] = 0    ...(i)

for all x. Integrating (i) over [0, T], we get,

∫_0^T a0 dx + ∑_{r=1}^{n} [ar ∫_0^T cos(2rπx/T) dx + br ∫_0^T sin(2rπx/T) dx] = 0
⇒ a0T = 0 ⇒ a0 = 0.

Now we use the following integration formulas:

∫_0^T cos(2rπx/T) cos(2kπx/T) dx = ∫_0^T sin(2rπx/T) sin(2kπx/T) dx = (T/2)δrk,
∫_0^T sin(2rπx/T) cos(2kπx/T) dx = 0.

Multiplying both sides of (i) by cos(2kπx/T) and integrating from 0 to T, we get,

a0 ∫_0^T cos(2kπx/T) dx + ∑_{r=1}^{n} [ar ∫_0^T cos(2rπx/T) cos(2kπx/T) dx + br ∫_0^T sin(2rπx/T) cos(2kπx/T) dx] = 0
⇒ a0·0 + ∑_{r=1}^{n} [ar(T/2)δrk + br·0] = 0 ⇒ ak = 0; k = 1, 2, ..., n.

Similarly, multiplying both sides of (i) by sin(2kπx/T) and integrating from 0 to T, we get,

bk = 0; k = 1, 2, ..., n.

Thus Sn is linearly independent for every positive integer n, and consequently S is linearly independent.


Ex 4.4.17 Prove that, if α, β, γ are linearly independent vectors of a complex vector space V(C), then so also are α + β, β + γ, γ + α.

Solution: Let a, b, c ∈ C be any three scalars such that

a(α + β) + b(β + γ) + c(γ + α) = θ
⇒ (a + c)α + (a + b)β + (b + c)γ = θ
⇒ a + c = 0, a + b = 0, b + c = 0; as {α, β, γ} is linearly independent.

As

|1 0 1|
|1 1 0| = 2 ≠ 0,
|0 1 1|

the homogeneous system has only the trivial solution. Thus a = b = c = 0, which shows that {α + β, β + γ, γ + α} is linearly independent.

Theorem 4.4.5 A set containing a single non null vector is linearly independent.

Proof: Let S = {α}, α ≠ θ, be a subset of a vector space V(F). Let, for some scalar a ∈ F,

aα = θ ⇒ a = 0; as α ≠ θ.

Therefore the set is linearly independent. The empty set φ is also (vacuously) linearly independent.

Theorem 4.4.6 Every set of vectors containing the null vector is linearly dependent.

Proof: Let S = {α1, α2, ..., αr, ..., αm}, where at least one of them, say αr = θ. Then it is clear that,

0α1 + 0α2 + ... + 1αr + ... + 0αm = θ, i.e., ∑_{i=1}^{m} ciαi = θ, where cr = 1 ≠ 0.

Hence S is linearly dependent. Thus we conclude that if the set S = {α1, α2, ..., αm} of vectors in a vector space V(F) is linearly independent, then none of the vectors in S can be a zero vector.

Theorem 4.4.7 Every subset of a linearly independent set is linearly independent.

Proof: Let the set S = {α1, α2, ..., αm} be a linearly independent set of vectors and let

T = {α1, α2, ..., αr}; 1 ≤ r ≤ m,

be a subset of S. Let for some scalars c1, c2, ..., cr ∈ F, we have,

c1α1 + c2α2 + ... + crαr = θ
⇒ c1α1 + c2α2 + ... + crαr + 0αr+1 + ... + 0αm = θ
⇒ c1 = c2 = ... = cr = 0; as S is linearly independent.

This shows that T is linearly independent.

Ex 4.4.18 Prove that the four vectors x = (1, 0, 0), y = (0, 1, 0), z = (0, 0, 1), u = (1, 1, 1) in <3 form a linearly dependent subset of <3, but any three of them are linearly independent.

Solution: Let us consider the relation ax + by + cz + du = θ, i.e.,

a(1, 0, 0) + b(0, 1, 0) + c(0, 0, 1) + d(1, 1, 1) = (0, 0, 0)
or, (a + d, b + d, c + d) = (0, 0, 0)
or, a + d = 0, b + d = 0, c + d = 0, or, a = b = c = −d.

Let d = −1; then a = b = c = 1. Hence x, y, z, u are linearly dependent, and the relation is

(1, 0, 0) + (0, 1, 0) + (0, 0, 1) − (1, 1, 1) = θ.

If we take the three vectors x, y, z, then the relation forces a = b = c = 0; hence x, y, z are linearly independent. If we take y, z, u, then

d = 0, b + d = 0, c + d = 0, or, b = c = d = 0.

Thus y, z, u are linearly independent. Similarly, {x, y, u}, {x, z, u}, etc., are all linearly independent.

Theorem 4.4.8 Every superset of a linearly dependent set is linearly dependent.

Proof: Let S = {α1, α2, ..., αr} be a linearly dependent set of vectors and let

T = {α1, α2, ..., αr, αr+1, ..., αm}

be a superset of S. Now, S being linearly dependent, ∃ scalars c1, c2, ..., cr, not all zero, such that

c1α1 + c2α2 + ... + crαr = θ
⇒ c1α1 + c2α2 + ... + crαr + cr+1αr+1 + ... + cmαm = θ, where cr+1 = ... = cm = 0.

Thus ∃ scalars c1, c2, ..., cm, not all zero, such that ∑_{i=1}^{m} ciαi = θ holds. Hence the set T is linearly dependent.

Theorem 4.4.9 A set {α1, α2, ..., αn} of nonzero vectors in a vector space V(F) is linearly dependent if and only if some αk (2 ≤ k ≤ n) in the set is a linear combination of its preceding vectors α1, α2, ..., αk−1.

Proof: Let S = {α1, α2, ..., αn} be a linearly dependent set of non null vectors. Since α1 ≠ θ, the set {α1} is linearly independent. Let k be the first integer such that {α1, α2, ..., αk} is linearly dependent. Clearly, 2 ≤ k ≤ n. Thus, ∃ scalars c1, c2, ..., ck, not all zero, such that

c1α1 + c2α2 + ... + ckαk = θ.

Here ck ≠ 0, for otherwise {α1, α2, ..., αk−1} would be linearly dependent, which would contradict our hypothesis that k is the first integer between 2 and n for which {α1, α2, ..., αk} is linearly dependent. Since ck ≠ 0, ck^{−1} ∈ F exists. Thus, from the above relation, we get,

αk = −ck^{−1}c1α1 − ck^{−1}c2α2 − ... − ck^{−1}ck−1αk−1,

i.e., αk is a linear combination of the preceding vectors α1, α2, ..., αk−1 of the set.
Conversely, let some αk (2 ≤ k ≤ n) be a linear combination of the preceding vectors α1, α2, ..., αk−1, so that ∃ scalars c1, c2, ..., ck−1 such that,

αk = c1α1 + c2α2 + ... + ck−1αk−1
⇒ c1α1 + c2α2 + ... + ck−1αk−1 + (−1)αk = θ
⇒ ∑_{i=1}^{k} ciαi = θ; where ck = −1 ≠ 0.

Since the above equality holds with at least one non zero scalar, the set of vectors {α1, α2, ..., αk} is linearly dependent. Since every superset of a linearly dependent set is linearly dependent, it follows that {α1, α2, ..., αk, ..., αn} is linearly dependent. From this theorem, it is observed that if {α1, α2, ..., αn} is a linearly independent set of vectors in a vector space, then the vectors must be distinct and none can be the zero vector.


Theorem 4.4.10 A set of vectors {α1, α2, ..., αn} in a vector space V(F) is linearly dependent if and only if at least one of the vectors of the set can be expressed as a linear combination of the others.

Proof: Since the set S = {α1, α2, ..., αn} is linearly dependent, ∃ scalars c1, c2, ..., cn ∈ F, not all zero, such that,

c1α1 + c2α2 + ... + cnαn = θ.

Let ck ≠ 0; then ck^{−1} exists in F, where ckck^{−1} = ck^{−1}ck = 1, the identity element in F. Now,

ckαk = −c1α1 − c2α2 − ... − ck−1αk−1 − ck+1αk+1 − ... − cnαn
⇒ αk = −ck^{−1}c1α1 − ... − ck^{−1}ck−1αk−1 − ck^{−1}ck+1αk+1 − ... − ck^{−1}cnαn
= d1α1 + d2α2 + ... + dk−1αk−1 + dk+1αk+1 + ... + dnαn,

where dr = −ck^{−1}cr; r = 1, 2, ..., k − 1, k + 1, ..., n. This shows that αk is a linear combination of the vectors α1, ..., αk−1, αk+1, ..., αn.
Conversely, let one of the vectors of S, say αj, be a linear combination of the other vectors of the set. Then for some scalars ci ∈ F (i = 1, 2, ..., j − 1, j + 1, ..., n), we have,

αj = c1α1 + c2α2 + ... + cj−1αj−1 + cj+1αj+1 + ... + cnαn
⇒ c1α1 + c2α2 + ... + cj−1αj−1 + (−1)αj + cj+1αj+1 + ... + cnαn = θ.

Here all the scalars in the equality belong to F and at least one of them (namely −1) is non zero, so S = {α1, α2, ..., αn} is linearly dependent.

Theorem 4.4.11 If S is a linearly independent subset of the vector space V (F ) and L(S) =V , then no proper subset of S can span V .

Proof: Let T be a proper subset of S and, if possible, let L(T) = V. Since T is a proper subset, there is a vector α ∈ S − T. Now α ∈ V, so α is a linear combination of the vectors of T; hence T ∪ {α} is linearly dependent. But T ∪ {α} ⊆ S, and every subset of the linearly independent set S is linearly independent, a contradiction. Thus T cannot span V.

4.5 Basis and Dimension

Here we are to discuss the structure of a vector space V(F) by determining a smallest set of vectors in V that completely describes V.

Definition 4.5.1 Basis of a vector space : Let V(F) be a vector space. A nonempty subset S of vectors in V(F) is said to be a basis of V if

(i) S is linearly independent in V, and

(ii) S generates V, i.e., L(S) = V.

If α1, α2, ..., αn form a basis for a vector space V(F), then they must be distinct and non null, so we write them as a set S = {α1, α2, ..., αn}. Note that if {α1, α2, ..., αn} is a basis for the vector space V(F), then {cα1, α2, ..., αn} is also a basis whenever c ≠ 0. Thus a basis for a non zero vector space is never unique.

Definition 4.5.2 Dimension of a vector space : Let V(F) be a vector space. The vector space V(F) is said to be finite dimensional or finitely generated if there exists a finite subset S of vectors in V such that V = L(S).


(i) A vector space which is not finitely generated is known as an infinite dimensional vector space.

(ii) The null space {θ}, every non-empty subset of which is linearly dependent, has no basis; it is nevertheless finite dimensional, since it is generated by the null set φ. So the vector space {θ} is said to be of dimension 0.

The number of elements in any basis set S of a finite dimensional vector space V(F) is called the dimension of the vector space and is denoted by dim V. For example, the set S = {e1, e2, ..., en}, where e1 = (1, 0, ..., 0), e2 = (0, 1, 0, ..., 0), ..., en = (0, 0, ..., 1), is a basis of <n(<), known as the standard basis of <n.

Ex 4.5.1 Show that the vectors α1, α2, α3, where α1 = (1, 0, −1), α2 = (1, 2, 1) and α3 = (0, −3, 2), form a basis of V3(<). Express each of the standard basis vectors as a linear combination of these vectors.

Solution: Let S = {α1, α2, α3}. To show that S is linearly independent, let c1, c2, c3 ∈ < be scalars such that,

c1α1 + c2α2 + c3α3 = θ
⇒ c1(1, 0, −1) + c2(1, 2, 1) + c3(0, −3, 2) = θ
⇒ (c1 + c2, 2c2 − 3c3, −c1 + c2 + 2c3) = (0, 0, 0).

Thus we obtain the homogeneous linear system

c1 + c2 = 2c2 − 3c3 = −c1 + c2 + 2c3 = 0,

whose coefficient determinant

| 1 1  0|
| 0 2 −3| = 10 ≠ 0.
|−1 1  2|

Thus the homogeneous system has only the solution c1 = c2 = c3 = 0, which shows that S is linearly independent. To show that S spans V3(<), let (a, b, c) be an arbitrary element of V3(<). We now seek constants c1, c2, c3 ∈ < such that

(a, b, c) = c1α1 + c2α2 + c3α3 = (c1 + c2, 2c2 − 3c3, −c1 + c2 + 2c3)
⇒ c1 = (1/10)(7a − 2b − 3c), c2 = (1/10)(3a + 2b + 3c), c3 = (1/5)(a − b + c).

Thus every (a, b, c) ∈ V3(<) can be written as,

(a, b, c) = (1/10)(7a − 2b − 3c)(1, 0, −1) + (1/10)(3a + 2b + 3c)(1, 2, 1) + (1/5)(a − b + c)(0, −3, 2),

i.e., every element of V3(<) can be expressed as a linear combination of elements of S; so L(S) = V3(<), and consequently S is a basis of V3(<). In particular,

(1, 0, 0) = (7/10)(1, 0, −1) + (3/10)(1, 2, 1) + (1/5)(0, −3, 2)
(0, 1, 0) = −(1/5)(1, 0, −1) + (1/5)(1, 2, 1) − (1/5)(0, −3, 2)
(0, 0, 1) = −(3/10)(1, 0, −1) + (3/10)(1, 2, 1) + (1/5)(0, −3, 2).
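A numeric sketch (not from the text): with the αi as the columns of a matrix B, a nonzero determinant confirms the basis, and the columns of B^{-1} are exactly the coordinates of the standard basis vectors found above.

```python
import numpy as np

B = np.column_stack([[1., 0., -1.], [1., 2., 1.], [0., -3., 2.]])
print(np.linalg.det(B))    # 10.0, nonzero, so a basis
print(np.linalg.inv(B))    # column j = coordinates of e_j; first column [0.7, 0.3, 0.2]
```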

Ex 4.5.2 Let W = {(x, y, z) ∈ <3 : x − 4y + 3z = 0}. Show that W is a subspace of <3. Find a basis and the dimension of the subspace W of <3.

Solution: As in the previous examples, it is easily verified that W is a subspace of <3. Let α = (a, b, c) ∈ W; then a, b, c ∈ < satisfy a − 4b + 3c = 0. Therefore,

α = (a, b, c) = (4b − 3c, b, c) = b(4, 1, 0) + c(−3, 0, 1).

Let β = (4, 1, 0) and γ = (−3, 0, 1); then every α ∈ W satisfies

α = bβ + cγ; b, c ∈ < ⇒ α ∈ L{β, γ} ⇒ W ⊂ L{β, γ}.

Again, β ∈ W and γ ∈ W, so L{β, γ} ⊂ W; hence W = L{β, γ}. Now,

c1β + c2γ = θ; c1, c2 ∈ <
⇒ c1(4, 1, 0) + c2(−3, 0, 1) = θ
⇒ 4c1 − 3c2 = 0, c1 = 0, c2 = 0 ⇒ c1 = c2 = 0.

Therefore β, γ are linearly independent in W. Hence {β, γ} is a basis of W, and dim W = 2.

Ex 4.5.3 Show that W = {(x, y, z) ∈ <3 : 2x − y + 3z = 0 and x + y + z = 0} is a subspace of <3. Find a basis of W. What is its dimension?

Solution: W is non-empty, as (0, 0, 0) ∈ W. Let α = (a1, a2, a3), β = (b1, b2, b3) ∈ W. Then,

2a1 − a2 + 3a3 = 0, a1 + a2 + a3 = 0 and 2b1 − b2 + 3b3 = 0, b1 + b2 + b3 = 0.

For any c1, c2 ∈ <,

2(c1a1 + c2b1) − (c1a2 + c2b2) + 3(c1a3 + c2b3) = c1(2a1 − a2 + 3a3) + c2(2b1 − b2 + 3b3) = 0
and (c1a1 + c2b1) + (c1a2 + c2b2) + (c1a3 + c2b3) = c1(a1 + a2 + a3) + c2(b1 + b2 + b3) = 0.

Now, c1α + c2β = (c1a1 + c2b1, c1a2 + c2b2, c1a3 + c2b3); therefore c1α + c2β ∈ W. Hence W is a subspace of <3. Let ξ = (a, b, c) be any vector of W. Then 2a − b + 3c = 0 and a + b + c = 0. Solving these two equations, we get a = −4c/3, b = c/3. Therefore,

ξ = (−4c/3, c/3, c) = (c/3)(−4, 1, 3).

Since c is arbitrary and every vector of W is a scalar multiple of (−4, 1, 3),

W = L{(−4, 1, 3)}.

Hence {(−4, 1, 3)} is a basis of W, and since the number of vectors in this basis is 1, the dimension of W is 1.

Ex 4.5.4 Show that W = {(x1, x2, x3, x4) ∈ <4 : x1 − x2 + x3 − x4 = 0} is a subspace of the four dimensional real vector space <4. Find the dimension of W.

Solution: Let α = (a1, a2, a3, a4), β = (b1, b2, b3, b4) be two vectors of W. Then a1 − a2 + a3 − a4 = 0 and b1 − b2 + b3 − b4 = 0, and for any c, d ∈ <,

(ca1 + db1) − (ca2 + db2) + (ca3 + db3) − (ca4 + db4)
= c(a1 − a2 + a3 − a4) + d(b1 − b2 + b3 − b4) = 0.

Then cα + dβ = (ca1 + db1, ca2 + db2, ca3 + db3, ca4 + db4) ∈ W. Hence W is a subspace of <4. Now, for α = (a1, a2, a3, a4) ∈ W, we have a4 = a1 − a2 + a3, so

α = (a1, a2, a3, a1 − a2 + a3) = a1(1, 0, 0, 1) + a2(0, 1, 0, −1) + a3(0, 0, 1, 1).

Since a1, a2, a3 are arbitrary and α is a linear combination of (1, 0, 0, 1), (0, 1, 0, −1), (0, 0, 1, 1), we have,

W = L{(1, 0, 0, 1), (0, 1, 0, −1), (0, 0, 1, 1)}.

Again, the vectors (1, 0, 0, 1), (0, 1, 0, −1), (0, 0, 1, 1) are linearly independent, since

a(1, 0, 0, 1) + b(0, 1, 0, −1) + c(0, 0, 1, 1) = θ

implies a = b = c = 0. Therefore {(1, 0, 0, 1), (0, 1, 0, −1), (0, 0, 1, 1)} is a basis of W. Since there are three vectors in the basis of W, the dimension of W is 3.
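A sympy sketch (illustrative): W is the null space of the 1 × 4 matrix [1, −1, 1, −1], and the size of a nullspace basis gives dim W directly. Note sympy may return a basis different from, but equivalent to, the one derived above.

```python
import sympy as sp

A = sp.Matrix([[1, -1, 1, -1]])
basis = A.nullspace()
print(len(basis))                 # 3, so dim W = 3
print([list(v) for v in basis])   # e.g. [1,1,0,0], [-1,0,1,0], [1,0,0,1]
```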


Ex 4.5.5 Show that S = {t^2 + 1, t − 1, 2t + 2} is a basis for the vector space P2.

Solution: To do this, we must show that S spans P2 and is linearly independent. To show that it spans P2, we take any vector in P2, i.e., a polynomial at^2 + bt + c, and must find constants k1, k2 and k3 such that,

at^2 + bt + c = k1(t^2 + 1) + k2(t − 1) + k3(2t + 2)
= k1t^2 + (k2 + 2k3)t + (k1 − k2 + 2k3).

Since two polynomials agree for all values of t only if the coefficients of the respective powers of t agree, we get the linear system

a = k1; b = k2 + 2k3; c = k1 − k2 + 2k3
⇒ k1 = a; k2 = (1/2)(a + b − c); k3 = (1/4)(b + c − a).

Hence S spans P2. To show that S is linearly independent, we form,

k1(t^2 + 1) + k2(t − 1) + k3(2t + 2) = θ
⇒ k1t^2 + (k2 + 2k3)t + (k1 − k2 + 2k3) = θ
⇒ k1 = 0; k2 + 2k3 = 0; k1 − k2 + 2k3 = 0.

The only solution to this homogeneous system is k1 = k2 = k3 = 0, which implies that S is linearly independent. Thus S is a basis for P2. In general, the set of vectors {t^n, t^{n−1}, ..., t, 1} forms a basis for the vector space Pn, called the natural or standard basis for Pn.

Ex 4.5.6 Find a basis for the subspace V of P2 consisting of all vectors of the form pt^2 + qt + s, where s = p − q.

Solution: Every vector in V, being of the form pt^2 + qt + s with s = p − q, can be written as

pt^2 + qt + (p − q) = p(t^2 + 1) + q(t − 1),

so the vectors t^2 + 1 and t − 1 span V. Moreover, these vectors are linearly independent, because neither one is a scalar multiple of the other. This conclusion could also be reached by writing the equation,

k1(t^2 + 1) + k2(t − 1) = θ ⇒ k1t^2 + k2t + (k1 − k2) = θ.

Since this equation is to hold for all values of t, we must have k1 = 0 and k2 = 0. Hence {t^2 + 1, t − 1} is a basis of V, and dim V = 2.

Ex 4.5.7 Prove that the set S = {(1, 2, 1), (2, 1, 1), (1, 1, 2)} is a basis of <3.

Solution: Let α = (1, 2, 1), β = (2, 1, 1), γ = (1, 1, 2). Now,

    |1 2 1|
∆ = |2 1 1| = 1(2 − 1) − 2(4 − 1) + 1(2 − 1) = −4 ≠ 0.
    |1 1 2|

Hence {α, β, γ} is linearly independent. Let δ = (x, y, z) be an arbitrary vector of <3, and let us examine whether δ ∈ L{α, β, γ}. If possible, let δ = c1α + c2β + c3γ, where the ci's are real. Therefore,

c1 + 2c2 + c3 = x, 2c1 + c2 + c3 = y, c1 + c2 + 2c3 = z.

This is a system of three linear equations in c1, c2, c3 whose coefficient determinant is ∆ = −4 ≠ 0. Therefore there exists a unique solution for c1, c2, c3, which proves that δ ∈ L{α, β, γ}. Thus any vector of <3 can be generated by the vectors α, β, γ; hence {α, β, γ} generates <3. Therefore {α, β, γ} is a basis of <3.


Ex 4.5.8 Prove that (2, 0, 0), (0, −1, 0) are linearly independent but do not form a basis of <3.

Solution: Let c1(2, 0, 0) + c2(0, −1, 0) = (0, 0, 0); then 2c1 = 0, −c2 = 0, so c1 = c2 = 0. Therefore the given vectors are linearly independent. Now consider the vector (1, 2, 3) of <3. Then

d1(2, 0, 0) + d2(0, −1, 0) = (1, 2, 3), or, (2d1, −d2, 0) = (1, 2, 3).

Equating both sides, we get 2d1 = 1, −d2 = 2, 0 = 3. The last relation is impossible. Hence the vector (1, 2, 3) ∈ <3 cannot be expressed in terms of the given vectors, i.e., the given vectors do not generate <3. Hence they do not form a basis of <3.

Ex 4.5.9 If {α, β, γ} is a basis of a real vector space V and c ≠ 0 is a real number, examine whether {α + cβ, β + cγ, γ + cα} is a basis of V or not. [WBUT 2003]

Solution: Let α1 = α + cβ, α2 = β + cγ, α3 = γ + cα. Consider the relation c1α1 + c2α2 + c3α3 = θ, where c1, c2, c3 are real. Therefore,

c1(α + cβ) + c2(β + cγ) + c3(γ + cα) = θ
or, (c1 + cc3)α + (cc1 + c2)β + (cc2 + c3)γ = θ
or, c1 + cc3 = 0, cc1 + c2 = 0, cc2 + c3 = 0,

since α, β, γ are linearly independent. The coefficient determinant of this system in c1, c2, c3 is

    |1 0 c|
∆ = |c 1 0| = c^3 + 1.
    |0 c 1|

If c^3 + 1 = 0, i.e., c = −1, then ∆ = 0; hence the vectors α1, α2, α3 are linearly dependent, and {α + cβ, β + cγ, γ + cα} does not form a basis. But if c ≠ −1, then ∆ ≠ 0 and α1, α2, α3 are linearly independent. V is a vector space of dimension 3 and {α1, α2, α3} is a linearly independent set containing 3 vectors of V; therefore {α1, α2, α3} is a basis of V.

Ex 4.5.10 Let V be the vector space of all polynomials with real coefficients of degree at most n, where n ≥ 2. Considering elements of V as functions from ℝ to ℝ, define W = {p ∈ V : ∫₀¹ p(x) dx = 0}. Show that W is a subspace of V and dim(W) = n. [IIT-JAM'11]

Solution: The subset W of V is given by

W = {p ∈ V : ∫₀¹ p(x) dx = 0}.

Clearly the zero polynomial belongs to W, as ∫₀¹ 0 dx = 0, so W is nonempty. Let p1(x), p2(x) ∈ W; then

∫₀¹ p1(x) dx = 0 and ∫₀¹ p2(x) dx = 0.

Let a, b ∈ ℝ. Then

∫₀¹ [ap1(x) + bp2(x)] dx = a ∫₀¹ p1(x) dx + b ∫₀¹ p2(x) dx = a · 0 + b · 0 = 0.

This implies ap1(x) + bp2(x) ∈ W whenever p1(x), p2(x) ∈ W; hence W is a subspace of V. Now

let p ∈ W. Then ∫₀¹ p(x) dx = 0, where p(x) is a polynomial of degree at most n, say

p(x) = a0 + a1x + a2x² + · · · + anxⁿ

⇒ ∫₀¹ p(x) dx = ∫₀¹ [a0 + a1x + a2x² + · · · + anxⁿ] dx = a0 + a1/2 + a2/3 + · · · + an/(n + 1) = 0.


Thus W consists precisely of those p ∈ V whose coefficients satisfy the single nontrivial linear equation a0 + a1/2 + a2/3 + · · · + an/(n + 1) = 0. Hence W is the kernel of a nonzero linear functional on V, and since dim V = n + 1, we get dim W = (n + 1) − 1 = n.
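The dimension count can be verified for a small degree bound. The sketch below, assuming sympy is available (an assumption, not part of the text), computes the null space of the single linear condition on the coefficients (a0, a1, . . . , an) for the illustrative choice n = 4:

```python
import sympy as sp

n = 4  # illustrative degree bound (a hypothetical choice)
# Row of the single condition a0 + a1/2 + ... + an/(n+1) = 0
row = sp.Matrix([[sp.Rational(1, k + 1) for k in range(n + 1)]])
basis = row.nullspace()   # basis of W in coefficient coordinates
print(len(basis))         # 4 = n, as claimed
```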

Existence theorem

Theorem 4.5.1 Every finite dimensional vector space has a finite basis.

Proof: Let V(F) be a finite dimensional vector space; then V = L(S), where S is a finite subset of V. Let S = {α1, α2, . . . , αn}, and we may assume S does not contain θ, as

L(S − {θ}) = L(S).

If S is linearly independent, then it is a finite basis of V and the theorem follows. If S is linearly dependent, ∃ some αk (2 ≤ k ≤ n) in S such that αk is a linear combination of the preceding vectors α1, α2, . . . , αk−1. If S1 = S − {αk}, then

L(S1) = L(S) = V.

If S1 is linearly independent, then it is a finite basis of V and we are done. If S1 is linearly dependent, then ∃ some αl (l > k) which is a linear combination of the preceding vectors. In the same way, if S2 = S1 − {αl}, then

L(S2) = L(S1) = L(S) = V.

Now, if S2 is linearly independent, it becomes a finite basis; otherwise we continue to proceed in the same manner till, after a finite number of steps, we obtain a linearly independent subset of S which generates V. Each step consists in the exclusion of one α, and the resulting set still generates V. At worst we may be left with a single element generating V, which is clearly linearly independent and so becomes a basis. Thus there must exist a linearly independent subset of S generating V.

Result 4.5.1 (Deletion theorem). If a vector space V over a field F is spanned by a linearly dependent set {α1, α2, . . . , αn}, then V can also be generated by a suitable proper subset of {α1, α2, . . . , αn}.

Replacement theorem

Theorem 4.5.2 Let {α1, α2, . . . , αn} be a basis of a vector space V(F) and β (≠ θ) ∈ V, where β = ∑_{i=1}^{n} ciαi, ci ∈ F. Then if ck ≠ 0, β can replace αk to give a new basis of V.

Proof: Since ck ≠ 0, ck⁻¹ exists in F and ck⁻¹ck = 1, where 1 is the identity element of F. Now,

β = c1α1 + c2α2 + . . . + ck−1αk−1 + ckαk + ck+1αk+1 + . . . + cnαn
or, ckαk = β − c1α1 − c2α2 − . . . − ck−1αk−1 − ck+1αk+1 − . . . − cnαn
⇒ αk = ck⁻¹[β − c1α1 − c2α2 − . . . − ck−1αk−1 − ck+1αk+1 − . . . − cnαn]
= d1α1 + d2α2 + . . . + dk−1αk−1 + dkβ + dk+1αk+1 + . . . + dnαn,

where the dr are given by

dr = −ck⁻¹cr for r = 1, 2, . . . , k − 1, k + 1, . . . , n, and dk = ck⁻¹.

Hence αk is a linear combination of the vectors α1, α2, . . . , αk−1, β, αk+1, . . . , αn. Now we are to show that {α1, α2, . . . , αk−1, β, αk+1, . . . , αn} is linearly independent. Let p1, p2, . . . , pn be n scalars such that


∑_{i=1}^{k−1} piαi + pkβ + ∑_{i=k+1}^{n} piαi = θ

⇒ ∑_{i=1}^{k−1} piαi + pk(∑_{i=1}^{n} ciαi) + ∑_{i=k+1}^{n} piαi = θ

⇒ ∑_{i=1}^{k−1} (pi + pkci)αi + pkckαk + ∑_{i=k+1}^{n} (pi + pkci)αi = θ.

Comparing the coefficients, we get, as {α1, α2, . . . , αn} is LI,

pi + pkci = 0, i = 1, 2, . . . , k − 1; pkck = 0; pi + pkci = 0, i = k + 1, . . . , n.

Since ck ≠ 0, pkck = 0 gives pk = 0, and then pi = 0 for all i = 1, 2, . . . , k − 1, k + 1, . . . , n.

This shows that {α1, α2, . . . , αk−1, β, αk+1, . . . , αn} is linearly independent. Now we are to show that

L{α1, α2, . . . , αk−1, β, αk+1, . . . , αn} = V.

For this, let S = {α1, α2, . . . , αk−1, αk, αk+1, . . . , αn} and T = {α1, α2, . . . , αk−1, β, αk+1, . . . , αn}.

Since β is a linear combination of the vectors of S, each element of T is a linear combination of the vectors of S. Therefore,

L(T) ⊂ L(S).

Also, since αk is a linear combination of the vectors of T, each element of S is a linear combination of the vectors of T. Therefore,

L(S) ⊂ L(T).

Therefore L(S) = L(T) = V. Hence T = {α1, α2, . . . , αk−1, β, αk+1, . . . , αn} fulfills both the conditions for a basis of V. Hence T is a new basis of V.
Corollary: If {α1, α2, . . . , αn} is a basis of the finite dimensional vector space V(F), then any linearly independent set of vectors of V contains at most n vectors.

Ex 4.5.11 Prove that the set of vectors (1, 1, 0, 1), (1, −2, 0, 0), (1, 0, −1, 2) is LI in ℝ⁴. Extend this set to a basis of ℝ⁴. Express α = (x1, x2, x3, x4) in terms of the basis so formed.

Solution: Let α1 = (1, 1, 0, 1), α2 = (1, −2, 0, 0), α3 = (1, 0, −1, 2). Suppose that for some scalars c1, c2, c3 ∈ ℝ,

c1α1 + c2α2 + c3α3 = θ
⇒ c1(1, 1, 0, 1) + c2(1, −2, 0, 0) + c3(1, 0, −1, 2) = θ
⇒ (c1 + c2 + c3, c1 − 2c2, −c3, c1 + 2c3) = θ
⇒ c1 + c2 + c3 = 0 = c1 − 2c2 = −c3 = c1 + 2c3
⇒ c1 = c2 = c3 = 0.

Hence {α1, α2, α3} is linearly independent in ℝ⁴. Let {e1, e2, e3, e4} be the standard basis of ℝ⁴. Then α1 = 1e1 + 1e2 + 0e3 + 1e4.

Since the coefficient of e1 is non zero, by the replacement theorem {α1, e2, e3, e4} is a new basis. Now,


α2 = 1e1 − 2e2 = α1 − 3e2 − e4.

Since the coefficient of e2 is non zero, by the replacement theorem {α1, α2, e3, e4} is a new basis of ℝ⁴. Also,

α3 = 1e1 − e3 + 2e4 = α1 − e2 − e3 + e4
= α1 + (1/3)[α2 − α1 + e4] − e3 + e4
= (2/3)α1 + (1/3)α2 − e3 + (4/3)e4.

Since the coefficient of e3 is non zero, by the replacement theorem {α1, α2, α3, e4} is a new basis of ℝ⁴ and L(e1, e2, e3, e4) = L(α1, α2, α3, e4). Also,

e3 = (2/3)α1 + (1/3)α2 − α3 + (4/3)e4, e2 = (1/3)(α1 − α2 − e4), e1 = (2/3)α1 + (1/3)α2 − (2/3)e4. Now,

α = (x1, x2, x3, x4) = x1e1 + x2e2 + x3e3 + x4e4
= (2x1/3 + x2/3 + 2x3/3)α1 + (x1/3 − x2/3 + x3/3)α2 − x3α3 − (2x1/3 + x2/3 − 4x3/3 − x4)e4.

Ex 4.5.12 Obtain a basis of ℝ³ containing the vector (−1, 0, 2).

Solution: ℝ³ is a real vector space of dimension 3. The standard basis of ℝ³ is {e1, e2, e3}, where e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1). Let α = (−1, 0, 2) be the given vector; then

α = −1e1 + 0e2 + 2e3.

Since the coefficient of e1 is non zero, by the replacement theorem e1 can be replaced by α to give a new basis of ℝ³. Hence a basis of ℝ³ containing the given vector is

{(−1, 0, 2), (0, 1, 0), (0, 0, 1)}.

The replacement can be done in more than one way, and thus different bases for ℝ³ can be obtained.

Ex 4.5.13 Obtain a basis of ℝ³ containing the vectors (2, −1, 0) and (1, 3, 2).

Solution: We know {e1, e2, e3} is the standard basis of ℝ³, where

e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1).

Let α = (2, −1, 0) and β = (1, 3, 2) be the two given vectors; then

α = 2e1 − e2 + 0e3.

Since the coefficient of e1 is non zero, by the replacement theorem α can replace e1 to give a new basis {α, e2, e3} of ℝ³. Now,

β = 1e1 + 3e2 + 2e3 = (1/2)(α + e2) + 3e2 + 2e3 = (1/2)α + (7/2)e2 + 2e3.

Since the coefficient of e2 is non zero, by the replacement theorem β can replace e2 to give a new basis {α, β, e3} of ℝ³. Hence a basis of ℝ³ containing the given vectors is

{(2, −1, 0), (1, 3, 2), (0, 0, 1)}.


Invariance theorem

Theorem 4.5.3 Let V(F) be a finite dimensional vector space; then any two bases of V have the same number of vectors.

Proof: Let V(F) be a finite dimensional vector space and let B1 = {α1, α2, . . . , αn} and B2 = {β1, β2, . . . , βr} be two bases of V(F). We are to show that n = r. If possible, let r > n. Then αi ≠ θ, βi ≠ θ, and by the replacement theorem β1 can replace some αi to give a new basis of V. Without loss of generality (by changing the order of the α's), we may say {β1, α2, . . . , αn} is a basis of V(F). Let

β2 = c1β1 + ∑_{i=2}^{n} ciαi;

then some ci (i > 1) must be non zero, for if ci = 0 for all i > 1, then β2 = c1β1, showing that {β1, β2} is linearly dependent, whereas {β1, β2}, being a subset of a linearly independent set, is linearly independent. Thus ∃ some ci (i > 1) ≠ 0. Hence β2 can replace some αi (i ≥ 2) to give a new basis of V. Proceeding in this way, after n steps we arrive at {β1, β2, . . . , βn} as a basis of V(F).

Therefore βn+1 is a linear combination of β1, β2, . . . , βn, giving that {β1, β2, . . . , βn, βn+1} is linearly dependent, which is a contradiction, since it is a subset of the linearly independent set {β1, β2, . . . , βn, . . . , βr}. This contradiction shows that our assumption is wrong. Hence r ≯ n. Similarly, interchanging the roles of the two bases, we have r ≮ n, and consequently r = n.

Thus, although a vector space has many bases, we have just shown that for a particular vector space V all bases have the same number of vectors. Therefore all finite dimensional vector spaces of the same dimension differ only in the nature of their elements; their algebraic properties are identical.

Extension theorem

Theorem 4.5.4 Every linearly independent subset of a finite dimensional vector space V(F) is either a basis of V or can be extended to form a basis of V.

Proof: Let S = {α1, α2, . . . , αr} be a linearly independent subset of a finite dimensional vector space V(F). Now, L(S) being the smallest subspace containing S, we have L(S) ⊂ V. If L(S) = V, then S is a finite basis of V. If L(S) is a proper subspace of V, then

V − L(S) ≠ φ.

Let β1 ∈ V − L(S) and set S1 = {α1, α2, . . . , αr, β1}; we are to prove that S1 is linearly independent. Let there exist scalars c1, c2, . . . , cr, cr+1 ∈ F such that

∑_{i=1}^{r} ciαi + cr+1β1 = θ.

We assert that cr+1 = 0, because if cr+1 ≠ 0, then cr+1⁻¹ exists in F, and then

β1 = −∑_{i=1}^{r} cr+1⁻¹ciαi, −cr+1⁻¹ci ∈ F,

shows that β1 ∈ L(S), which is a contradiction. So cr+1 = 0. Also, {α1, α2, . . . , αr} is linearly independent, so c1 = c2 = . . . = cr = 0 as well. Therefore S1 = {α1, α2, . . . , αr, β1} is linearly independent. Now L(S1) ⊂ V. If L(S1) = V, then S1 is a basis of V, where S1 ⊃ S


and, S1 being an extension of S, the theorem is proved. If L(S1) is a proper subspace of V, then

V − L(S1) ≠ φ;

we proceed as before and get S2 = {α1, α2, . . . , αr, β1, β2} ⊃ S1. If L(S2) = V, then S2 is a basis; if not, we continue to repeat the procedure till, after a finite number of steps, we get a linearly independent set containing S which generates V. As V is finite dimensional, after a finite number of steps we come to a finite set

Sk = {α1, α2, . . . , αr, β1, β2, . . . , βk}

which is an extension of S and also a basis of V. Hence either S is already a basis or it can be extended to form a basis of V.

Deduction 4.5.1 Every set of (n + 1) or more vectors in an n dimensional vector space V(F) is linearly dependent.

Proof: Since V(F) is a finite dimensional vector space of dimension n, every basis of V contains exactly n vectors. If possible, let S be a linearly independent subset of V containing (n + 1) vectors. Then by the extension theorem, either S is already a basis of V or it can be extended to form a basis of V.

In either case a basis of V would contain (n + 1) or more vectors, which is contrary to the hypothesis that V is n-dimensional. Thus S is linearly dependent, and so is every superset of it.

Deduction 4.5.2 If V(F) is a finite dimensional space of dimension n, then any linearly independent set of n vectors in V forms a basis of V.

Proof: Since V(F) is a finite dimensional vector space with dim V = n, every basis of V contains exactly n vectors. Now, if S is a linearly independent set of n vectors in V, then by the extension theorem either S is already a basis of V or it can be extended to form a basis of V. But in the latter case a basis of V would contain more than n vectors, contradicting the hypothesis that V is n-dimensional. Consequently the former alternative holds, i.e., S forms a basis of V.

Deduction 4.5.3 If V(F) is a finite dimensional space of dimension n, then any subset consisting of n vectors of V which generates V forms a basis of V.

Proof: Since V(F) is a finite dimensional vector space with dim V = n, every basis of V contains exactly n vectors. Let S be a set of n vectors in V generating V. If S is linearly independent, then it forms a basis of V; otherwise, by the deletion theorem, there would exist a proper subset of S forming a basis of V.

In the latter case a basis of V would contain fewer than n elements, contradicting the hypothesis that V is n-dimensional. Hence S cannot be linearly dependent, and so it must form a basis of V.

Theorem 4.5.5 Let V(F) be a vector space. A subset B = {α1, α2, . . . , αn} of V is a basis of V if and only if every element of V has a unique representation as a linear combination of the vectors of B.

Proof: Let B = {α1, α2, . . . , αn} be a basis of V(F) and α ∈ V. Then every vector α ∈ V can be written as a linear combination of the vectors in B, as B spans V. Now let


α = ∑_{i=1}^{n} ciαi, for some scalars ci ∈ F,

and α = ∑_{i=1}^{n} diαi, for some other scalars di ∈ F.

Subtracting the second from the first, we obtain

∑_{i=1}^{n} (ci − di)αi = α − α = θ

⇒ ci − di = 0 ∀i, as {α1, α2, . . . , αn} is linearly independent
⇒ ci = di, 1 ≤ i ≤ n,

and so ci’s are unique. Hence there is only one way to express α as a linear combinationof the vectors in B. Conversely, let B = α1, α2, . . . , αn be a subset of V such that everyvector of V has a unique representation as linear combination of the vectors of B. Clearly

V = L(B) = L(α1, α2, . . . , αn).

Now, θ ∈ V , and by the condition θ has an unique representation as a linear combination of

the vectors of B. Let, θ =n∑

i=1

ciαi, which is satisfied by c1 = c2 = . . . = cn = 0 and because

of uniqueness in the condition, it follows that,n∑

i=1

ciαi = θ ⇒ ci = 0;∀i.

⇒ B is a LI set ⇒ B is a basis of V (F ).

Result 4.5.2 If U is a subspace of a finite dimensional vector space V and dim V = n, then U is finite dimensional and dim U ≤ n.

Ex 4.5.14 Find a basis of ℝ³ containing the vectors (1, 2, 0) and (1, 3, 1).

Solution: Since dim ℝ³ = 3, three vectors are needed to generate ℝ³. Let α = (1, 2, 0), β = (1, 3, 1), and take the third vector to be e1 = (1, 0, 0).

Now the determinant formed by the vectors e1, α, β is

|1 0 0; 1 2 0; 1 3 1| = 2 ≠ 0.

So the vectors e1, α, β are linearly independent. Also, the number of vectors is three and they belong to ℝ³. Hence {(1, 2, 0), (1, 3, 1), (1, 0, 0)} is a basis of ℝ³ containing α and β.

Ex 4.5.15 W1 and W2 are two subspaces of ℝ⁴ defined by W1 = {(x, y, z, w) : x, y, z, w ∈ ℝ, 3x + y + z + 2w = 0}, W2 = {(x, y, z, w) : x, y, z, w ∈ ℝ, x + y − z + 2w = 0}. Find dim(W1 ∩ W2).

Solution: The subspace W1 ∩ W2 = {(x, y, z, w) : x, y, z, w ∈ ℝ, 3x + y + z + 2w = 0, x + y − z + 2w = 0}. Now, solving the equations 3x + y + z + 2w = 0, x + y − z + 2w = 0 for y, z, we get y = −2x − 2w, z = −x. Therefore,

(x, y, z, w) = (x, −2x − 2w, −x, w) = −x(−1, 2, 1, 0) + w(0, −2, 0, 1).

Thus the set {(−1, 2, 1, 0), (0, −2, 0, 1)} generates the subspace W1 ∩ W2. The vectors (−1, 2, 1, 0) and (0, −2, 0, 1) are linearly independent, as

c1(−1, 2, 1, 0) + c2(0, −2, 0, 1) = (0, 0, 0, 0) implies c1 = 0, c2 = 0.

Hence {(−1, 2, 1, 0), (0, −2, 0, 1)} is a basis of W1 ∩ W2, and hence the dimension of W1 ∩ W2 is 2.
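Since W1 ∩ W2 is exactly the null space of the 2 × 4 coefficient matrix of the two defining equations, the dimension can be cross-checked mechanically. A sketch assuming sympy is available (not part of the text):

```python
import sympy as sp

# Rows are the constraints 3x+y+z+2w = 0 and x+y-z+2w = 0
A = sp.Matrix([[3, 1, 1, 2],
               [1, 1, -1, 2]])
basis = A.nullspace()
print(len(basis))        # 2 = dim(W1 ∩ W2)
for v in basis:
    print(v.T)           # a basis of the intersection
```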


Dimension of a subspace

Theorem 4.5.6 Every non null subspace W of a finite dimensional vector space V(F) is finite dimensional, and dim W ≤ dim V.

Proof: Since V is finite dimensional, every basis of V contains a finite number of elements, say n, and so every set of (n + 1) or more vectors in V is linearly dependent. Consequently, a linearly independent set of vectors in W contains at most n elements. Let

S = {α1, α2, . . . , αm}, m ≤ n,

be a maximal linearly independent set in W. Now, if α is an arbitrary element of W, then S being a maximal linearly independent set,

S1 = {α1, α2, . . . , αm, α}

is linearly dependent, and hence the vector α is a linear combination of α1, α2, . . . , αm, showing that S generates W. Accordingly, dim W = m ≤ n = dim V. Moreover, when W is a proper subspace of V, ∃ a vector β ∈ V not contained in W, and as such β cannot be expressed as a linear combination of the elements of S, the basis of W. Consequently, the set obtained by adjoining β to S forms a linearly independent subset of V, and so a basis of V contains more than m vectors. Hence, in this case,

dim W < dim V.

Again, if V = W, then every basis of V is also a basis of W, and therefore

V = W ⇒ dim W = dim V.

On the other hand, let W be a subspace of V such that

dim W = dim V = n (say).

Now, if S is a basis of W, then being a linearly independent subset of V containing n vectors, it also generates V. Thus each one of V and W is generated by S, so in this case V = W. Hence

V = W ⇔ dim W = dim V.

Dimension of a linear sum

Theorem 4.5.7 Let W1 and W2 be subspaces of a finite dimensional vector space V(F). Then W1 + W2 is finite dimensional and

dim(W1 + W2) = dim W1 + dim W2 − dim(W1 ∩ W2).

Proof: Every subspace of a finite dimensional vector space is finite dimensional; V(F) being finite dimensional, so therefore are its subspaces W1, W2, W1 ∩ W2 and W1 + W2, and

dim(W1 ∩ W2) ≤ dim W1, dim W1 ≤ dim(W1 + W2) ≤ dim V,
dim(W1 ∩ W2) ≤ dim W2, dim W2 ≤ dim(W1 + W2) ≤ dim V.

Let B = {α1, α2, . . . , αr} be a basis of W1 ∩ W2. W1 ∩ W2 being a subspace of W1 as well as of W2, B can be extended to form a basis of W1 and a basis of W2. Let the extended sets B1 and B2, which form the bases of W1 and W2 respectively, be

B1 = {α1, α2, . . . , αr; β1, β2, . . . , βs},
B2 = {α1, α2, . . . , αr; γ1, γ2, . . . , γt}.


Obviously dim(W1 ∩ W2) = r, dim W1 = r + s and dim W2 = r + t. Consider the set B0 = {α1, α2, . . . , αr; β1, β2, . . . , βs; γ1, γ2, . . . , γt}. We shall show that the set B0 is a basis of W1 + W2. First we show that L(B0) = W1 + W2. Now,

α ∈ L(B0) ⇒ α = ∑_{i=1}^{r} ciαi + ∑_{i=1}^{s} biβi + ∑_{i=1}^{t} kiγi
= [∑_{i=1}^{r} ciαi + ∑_{i=1}^{s} biβi] + [∑_{i=1}^{r} 0αi + ∑_{i=1}^{t} kiγi]
= δ1 + δ2, where δ1 ∈ W1 and δ2 ∈ W2

⇒ α ∈ W1 + W2 ⇒ L(B0) ⊆ W1 + W2. (4.9)

Again, β ∈ W1 + W2 ⇒ β = ξ1 + ξ2, where ξ1 ∈ W1, ξ2 ∈ W2, so

β = [∑_{i=1}^{r} ciαi + ∑_{i=1}^{s} biβi] + [∑_{i=1}^{r} kiαi + ∑_{i=1}^{t} liγi]
= ∑_{i=1}^{r} c′iαi + ∑_{i=1}^{s} biβi + ∑_{i=1}^{t} liγi, where c′i = ci + ki and bi, li ∈ F

⇒ β ∈ L(B0) ⇒ W1 + W2 ⊆ L(B0). (4.10)

Hence from (4.9) and (4.10) it follows that L(B0) = W1 + W2. Next we are to show that B0 is LI. For this, let there exist scalars xi (i = 1, 2, . . . , r), yi (i = 1, 2, . . . , s) and zi (i = 1, 2, . . . , t) in F such that

∑_{i=1}^{r} xiαi + ∑_{i=1}^{s} yiβi + ∑_{i=1}^{t} ziγi = θ

⇒ ∑_{i=1}^{t} (−zi)γi = ∑_{i=1}^{r} xiαi + ∑_{i=1}^{s} yiβi = δ (say).

The right side of the last relation lies in W1 and the left side lies in W2, so δ ∈ W1 ∩ W2. Hence

δ = ∑_{i=1}^{r} xiαi + ∑_{i=1}^{s} yiβi = ∑_{i=1}^{r} uiαi, for some ui ∈ F

⇒ ∑_{i=1}^{r} (xi − ui)αi + ∑_{i=1}^{s} yiβi = θ

⇒ xi − ui = 0, i = 1, 2, . . . , r, and yi = 0, i = 1, 2, . . . , s, as B1 is LI.

With yi = 0 the original relation reduces to ∑_{i=1}^{r} xiαi + ∑_{i=1}^{t} ziγi = θ, and since B2 is LI, xi = 0 and zi = 0 as well. Therefore B0 is linearly independent, so B0 is a basis of the finite dimensional subspace W1 + W2, and

dim(W1 + W2) = r + s + t = (r + s) + (r + t) − r
= dim W1 + dim W2 − dim(W1 ∩ W2).

Ex 4.5.16 Suppose W1 and W2 are distinct four-dimensional subspaces of a vector space V, where dim V = 6. Find the possible dimensions of W1 ∩ W2.

Solution: Since the subspaces W1 and W2 are distinct four-dimensional subspaces of a vector space V, W1 + W2 properly contains W1 and W2. Consequently dim(W1 + W2) > 4.


But dim(W1 + W2) cannot be greater than 6, as dim V = 6. Therefore we have the following two possibilities: (i) dim(W1 + W2) = 5, or (ii) dim(W1 + W2) = 6. Using the theorem on the dimension of a linear sum, we have

dim(W1 ∩ W2) = dim W1 + dim W2 − dim(W1 + W2) = 8 − dim(W1 + W2).

Therefore, (i) dim(W1 ∩ W2) = 3, or (ii) dim(W1 ∩ W2) = 2.

Ex 4.5.17 If U = L{(1, 2, 1), (2, 1, 3)} and W = L{(1, 0, 0), (0, 1, 0)}, show that U, W are subspaces of ℝ³. Determine dim U, dim W, dim(U ∩ W) and dim(U + W).

Solution: Let α = (1, 2, 1), β = (2, 1, 3), γ = (1, 0, 0), δ = (0, 1, 0). Then {α, β} is linearly independent, as c1(1, 2, 1) + c2(2, 1, 3) = (0, 0, 0) implies c1 = 0, c2 = 0. Also, it is given that U = L{α, β}. Hence U is a subspace of ℝ³ with basis {α, β}, so dim U = 2.

Again {γ, δ} is linearly independent, as d1(1, 0, 0) + d2(0, 1, 0) = (0, 0, 0) implies d1 = 0, d2 = 0. Also, {γ, δ} generates W. Therefore W is a subspace of ℝ³ and {γ, δ} is a basis of W, so dim W = 2.

Let ξ be a vector in U ∩ W. Then ξ = aα + bβ for some real numbers a, b, and also ξ = cγ + dδ for some real numbers c, d. Therefore,

a(1, 2, 1) + b(2, 1, 3) = c(1, 0, 0) + d(0, 1, 0)
or, (a + 2b, 2a + b, a + 3b) = (c, d, 0)
or, a + 2b = c, 2a + b = d, a + 3b = 0.

Solving, we get a = −3b, c = −b, d = −5b. Hence ξ = (−b, −5b, 0) = −b(1, 5, 0), where b is arbitrary.

Therefore U ∩ W is a subspace of dimension 1. Now,

dim(U + W) = dim U + dim W − dim(U ∩ W) = 2 + 2 − 1 = 3.

Thus, dim U = 2, dim W = 2, dim (U ∩W ) = 1 and dim (U +W ) = 3.
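All four dimensions can be recovered from ranks alone, since dim(U + W) is the rank of the spanning vectors of U and W stacked together. A sketch assuming sympy is available (not part of the text):

```python
import sympy as sp

U = sp.Matrix([[1, 2, 1], [2, 1, 3]])   # rows span U
W = sp.Matrix([[1, 0, 0], [0, 1, 0]])   # rows span W

dim_U, dim_W = U.rank(), W.rank()
dim_sum = sp.Matrix.vstack(U, W).rank()   # dim(U + W)
dim_int = dim_U + dim_W - dim_sum         # linear-sum dimension formula
print(dim_U, dim_W, dim_int, dim_sum)     # 2 2 1 3
```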

Dimension of a direct sum

Theorem 4.5.8 If a finite dimensional vector space V(F) is the direct sum of its subspaces W1 and W2, then dim V = dim W1 + dim W2.

Proof: Since V is finite dimensional, so therefore are its subspaces W1 and W2. Let S1 = {α1, α2, . . . , αk} and S2 = {β1, β2, . . . , βl} be bases of W1 and W2 respectively, so that dim W1 = k and dim W2 = l. We are to show that S = {α1, α2, . . . , αk, β1, β2, . . . , βl} is a basis of V. Now, since V = W1 ⊕ W2, every γ ∈ V can be expressed as

γ = α + β, α ∈ W1 and β ∈ W2
= ∑_{i=1}^{k} ciαi + ∑_{j=1}^{l} djβj, for some ci, dj ∈ F.

This shows that γ can be expressed as a linear combination of elements of S. Thus S generates V. Now we are to show that S is linearly independent. For this,

∑_{i=1}^{k} ciαi + ∑_{j=1}^{l} djβj = θ

⇒ ∑_{i=1}^{k} ciαi = ∑_{j=1}^{l} (−dj)βj ∈ W1 ∩ W2,


as ∑_{i=1}^{k} ciαi ∈ W1 and ∑_{j=1}^{l} (−dj)βj ∈ W2

⇒ ∑_{i=1}^{k} ciαi = θ and ∑_{j=1}^{l} (−dj)βj = θ, as W1 ∩ W2 = {θ}

⇒ ci = 0 ∀i and dj = 0 ∀j, as the bases S1 and S2 are LI
⇒ S is linearly independent.

Thus S is a basis of V, and consequently

dim V = k + l = dim W1 + dim W2.

Theorem 4.5.9 Existence of complementary subspace: Every subspace of a finite dimensional vector space has a complement.

Proof: Let W1 be a subspace of a finite dimensional vector space V(F). Then we are to find a subspace W2 of V such that V = W1 ⊕ W2. Since V is finite dimensional, so is its subspace W1. Let S1 = {α1, α2, . . . , αm} be a basis of W1. Then S1 is a linearly independent subset of V and therefore it can be extended to form a basis of V. Let the extended set

S2 = {α1, α2, . . . , αm, β1, β2, . . . , βn}

be a basis of V. Let us denote by W2 the subspace generated by β1, β2, . . . , βn. We shall show that V = W1 ⊕ W2, which is equivalent to V = W1 + W2 and W1 ∩ W2 = {θ}. Let γ be an arbitrary element of V. As S2 is a basis of V, we have

γ = ∑_{i=1}^{m} aiαi + ∑_{j=1}^{n} bjβj, for some scalars ai, bj
= α + β, where α = ∑_{i=1}^{m} aiαi ∈ W1 and β = ∑_{j=1}^{n} bjβj ∈ W2.

Thus each element of V is expressible as the sum of an element of W1 and an element of W2, so V = W1 + W2. Now, in order to show that W1 ∩ W2 = {θ}, let α = ∑_{i=1}^{m} aiαi ∈ W1 and β = ∑_{j=1}^{n} bjβj ∈ W2 be equal. Then

α = β ⇒ ∑_{i=1}^{m} aiαi = ∑_{j=1}^{n} bjβj ⇒ ∑_{i=1}^{m} aiαi + ∑_{j=1}^{n} (−bj)βj = θ

⇒ ai = 0 ∀i and bj = 0 ∀j, as S2 is linearly independent

⇒ ∑_{i=1}^{m} aiαi = θ and ∑_{j=1}^{n} bjβj = θ ⇒ α = β = θ.

Thus no non-zero vector is common to both W1 and W2, i.e., W1 ∩ W2 = {θ}. Therefore

V = W1 ⊕ W2.

Theorem 4.5.10 Dimension of a quotient space: Let V(F) be a finite dimensional vector space and W a subspace of V. Then

dim(V/W) = dim V − dim W.

Proof: Let dim V = n and dim W = m. Let S1 = {α1, α2, . . . , αm} be a basis of W. By the extension theorem, S1 can be extended to S2 = {α1, α2, . . . , αm, β1, β2, . . . , βn−m}, a basis of V. We claim that the set


S3 = {W + β1, W + β2, . . . , W + βn−m}

of (n − m) cosets is a basis of V/W. First we are to show that S3 is linearly independent. Now, for some scalars b1, b2, . . . , bn−m ∈ F, we have

b1(W + β1) + b2(W + β2) + . . . + bn−m(W + βn−m) = W + θ
⇒ (W + b1β1) + (W + b2β2) + . . . + (W + bn−mβn−m) = W + θ
⇒ W + (b1β1 + b2β2 + . . . + bn−mβn−m) = W + θ
⇒ b1β1 + b2β2 + . . . + bn−mβn−m ∈ W
⇒ b1β1 + b2β2 + . . . + bn−mβn−m = a1α1 + a2α2 + . . . + amαm, for some ai ∈ F
⇒ a1α1 + a2α2 + . . . + amαm + (−b1)β1 + (−b2)β2 + . . . + (−bn−m)βn−m = θ
⇒ a1 = a2 = . . . = am = 0 and b1 = b2 = . . . = bn−m = 0, as S2 is LI;
in particular, b1 = b2 = . . . = bn−m = 0.

Therefore S3 is linearly independent. Moreover, if W + α is an arbitrary element of V/W, then α ∈ V, and S2 being a basis of V, we have for some scalars ai, bj ∈ F,

α = ∑_{i=1}^{m} aiαi + ∑_{j=1}^{n−m} bjβj

⇒ W + α = W + [∑_{i=1}^{m} aiαi + ∑_{j=1}^{n−m} bjβj] = [W + ∑_{i=1}^{m} aiαi] + [W + ∑_{j=1}^{n−m} bjβj]
= W + ∑_{j=1}^{n−m} bjβj, as ∑_{i=1}^{m} aiαi ∈ W ⇒ W + ∑_{i=1}^{m} aiαi = W
= ∑_{j=1}^{n−m} bj(W + βj).

This shows that W + α ∈ L(W + β1, . . . , W + βn−m). Thus each element of V/W is expressible as a linear combination of elements of S3, i.e., S3 generates V/W. So S3 is a basis of V/W. Therefore

dim(V/W) = n − m = dim V − dim W.

Ex 4.5.18 Let V = ℝ⁴ and let W be the subspace of V generated by the vectors (1, 0, 0, 0), (1, 1, 0, 0). Find a basis of the quotient space V/W.

Solution: Let α = (1, 0, 0, 0) and β = (1, 1, 0, 0). Since α, β are linearly independent, {α, β} is a basis of W. This linearly independent set can be extended to a basis of V. Let γ = (0, 0, 1, 0) and δ = (0, 0, 0, 1); then S1 = {α, β, γ, δ} is linearly independent in V and so is a basis of V. A basis of the quotient space V/W is {W + γ, W + δ}, and so dim V/W = 2. Here dim V = 4, dim W = 2 and dim V/W = 2 = 4 − 2, so that

dim(V/W) = dim V − dim W.

4.6 Co-ordinatisation of Vectors

Let V be an n-dimensional vector space; then V has a basis S with n vectors in it. Here we shall work with an ordered basis S = {α1, α2, . . . , αn} for V.


4.6.1 Ordered Basis

If the vectors of the basis set S of a finite dimensional vector space V(F) are enumerated in some fixed order, then S is called an ordered basis.

4.6.2 Co-ordinates

Let S = {α1, α2, . . . , αn} be an ordered basis of a finite dimensional vector space V(F). Then each α ∈ V can be uniquely expressed in the form

α = c1α1 + c2α2 + · · · + cnαn = ∑_{i=1}^{n} ciαi (4.11)

for scalars c1, c2, . . . , cn. For each α ∈ V, the unique ordered n-tuple (c1, c2, . . . , cn) is called the co-ordinate vector of α relative to the ordered basis S and is denoted by (α)S. The entries of (α)S are called the co-ordinates of α with respect to S.

(i) We insist that the set of vectors in S be ordered, because a change in (α)S occurs if the relative order of the vectors in S is changed.

(ii) For a non zero vector space Vn(F), the co-ordinates of all vectors in Vn are unique relative to the ordered basis S.

(iii) The co-ordinate vectors of the vectors in an abstract space V(F) of dimension n relative to an ordered basis B are the elements of Fⁿ.

Ex 4.6.1 Find the co-ordinate vector of α = (1, 3, 1) relative to the ordered basis B = {α1, α2, α3} of ℝ³, where α1 = (1, 1, 1), α2 = (1, 1, 0), α3 = (1, 0, 0).

Solution: It is easy to verify that B is a basis of ℝ³(ℝ). Let there exist scalars c1, c2, c3 ∈ ℝ such that c1α1 + c2α2 + c3α3 = α, so

c1(1, 1, 1) + c2(1, 1, 0) + c3(1, 0, 0) = (1, 3, 1)
⇒ (c1 + c2 + c3, c1 + c2, c1) = (1, 3, 1).

Setting corresponding components equal to each other, we obtain the system

c1 + c2 + c3 = 1; c1 + c2 = 3; c1 = 1 ⇒ c1 = 1, c2 = 2, c3 = −2.

This is the unique solution of the system, and hence the co-ordinate vector of α with respect to the basis B is (α)B = (1, 2, −2). Conversely, if the co-ordinate vector relative to the ordered basis B is (a, b, c), then the vector α is given by

α = aα1 + bα2 + cα3
= a(1, 1, 1) + b(1, 1, 0) + c(1, 0, 0) = (a + b + c, a + b, a).
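Finding (α)B amounts to solving a 3 × 3 linear system whose columns are the basis vectors. A sketch assuming numpy is available (not part of the text):

```python
import numpy as np

# Columns of B are the ordered basis vectors α1, α2, α3
B = np.array([[1, 1, 1],
              [1, 1, 0],
              [1, 0, 0]], dtype=float).T
alpha = np.array([1, 3, 1], dtype=float)
print(np.linalg.solve(B, alpha))   # [ 1.  2. -2.]  i.e. (α)B = (1, 2, -2)
```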

Ex 4.6.2 In the vector space V of polynomials in t of degree at most 3, consider the basis B = {1, 1 − t, (1 − t)², (1 − t)³}. Find the co-ordinate vector of α = 3 − 2t − t² ∈ V relative to the basis B.


Solution: It is easy to verify that B is a basis of V. Set α as a linear combination of the polynomials in the basis B, using the unknown scalars c1, c2, c3, c4 ∈ ℝ:

c1 · 1 + c2(1 − t) + c3(1 − t)² + c4(1 − t)³ = α = 3 − 2t − t²

⇒ (c1 + c2 + c3 + c4) + (−c2 − 2c3 − 3c4)t + (c3 + 3c4)t² + (−c4)t³ = 3 − 2t − t².

Setting the coefficients of equal powers of t equal to each other, we obtain the system of linear equations

c1 + c2 + c3 + c4 = 3; c2 + 2c3 + 3c4 = 2; c3 + 3c4 = −1; c4 = 0,

from which we have the unique solution c1 = 0, c2 = 4, c3 = −1 and c4 = 0. Hence the co-ordinate vector of α with respect to the basis B is (α)B = (0, 4, −1, 0).
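The coefficient comparison can be automated symbolically. A sketch assuming sympy is available (not part of the text):

```python
import sympy as sp

t, c1, c2, c3, c4 = sp.symbols('t c1 c2 c3 c4')
expr = c1 + c2*(1 - t) + c3*(1 - t)**2 + c4*(1 - t)**3 - (3 - 2*t - t**2)
# Each coefficient of a power of t must vanish
coeffs = sp.Poly(expr, t).all_coeffs()
print(sp.solve(coeffs, [c1, c2, c3, c4]))   # {c1: 0, c2: 4, c3: -1, c4: 0}
```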

Ex 4.6.3 In the vector space W of 2 × 2 symmetric matrices over ℝ, consider the basis

B = {(1 −1; −1 2), (4 1; 1 0), (3 −2; −2 1)},

where each matrix is written row by row. Find the co-ordinate vector of the matrix α = (1 2; 2 4) ∈ W relative to the basis B.

Solution: It is easy to verify that B is a basis of W. Set α ∈ W as a linear combination of the matrices in the basis B, using the unknown scalars c1, c2, c3 ∈ ℝ:

c1(1 −1; −1 2) + c2(4 1; 1 0) + c3(3 −2; −2 1) = α = (1 2; 2 4)

⇒ (c1 + 4c2 + 3c3, −c1 + c2 − 2c3; −c1 + c2 − 2c3, 2c1 + c3) = (1 2; 2 4).

Setting corresponding entries equal to each other, we obtain the system of linear equations

c1 + 4c2 + 3c3 = 1; −c1 + c2 − 2c3 = 2; 2c1 + c3 = 4,

from which we have the unique solution c1 = 3, c2 = 1, c3 = −2. Hence the co-ordinate vector of α with respect to the basis B is (α)B = (3, 1, −2). Since dim W = 3, (α)B must be a vector in ℝ³.

4.7 Rank of a Matrix

Here we obtain an effective method for finding a basis for a vector space V spanned by a given set of vectors. We attach to a matrix A a unique number that, as we show later, gives information about the dimension of the solution space of a homogeneous system with coefficient matrix A.

4.7.1 Row Space of a Matrix

Let A = [aij]m×n be an arbitrary m × n matrix over the field F, i.e., aij ∈ F. Let R1, R2, . . . , Rm be the m row vectors of A, where Ri ∈ Vn. Then L(R1, R2, . . . , Rm) is a subspace of the linear space Fⁿ, called the row space of A and denoted by R(A). The dimension of the row space R(A) is called the row rank of A.

(i) R(Aᵀ) = C(A).

(ii) The matrices A and B are row equivalent, written A ∼ B, if B can be obtained from A by a sequence of elementary row operations.


(iii) Row equivalent matrices have the same row space.

(iv) Every matrix A is row equivalent to a unique matrix in row canonical form.

Ex 4.7.1 Find the row space and row rank of the matrix A = (6 7 2 1; 1 2 1 4; 2 4 2 8).

Solution: Here the row vectors are R1 = (6, 7, 2, 1), R2 = (1, 2, 1, 4) and R3 = (2, 4, 2, 8). The row space of A is the linear span of the row vectors R1, R2, R3. Hence the row space is

R(A) = {a1R1 + a2R2 + a3R3 : a1, a2, a3 ∈ ℝ},

where a1R1 + a2R2 + a3R3 = (6a1 + a2 + 2a3, 7a1 + 2a2 + 4a3, 2a1 + a2 + 2a3, a1 + 4a2 + 8a3).

Now {R1, R2, R3} is linearly dependent, as R3 = 2R2; but {R1, R2} is linearly independent. Hence {R1, R2} is a basis of the row space R(A), and so dim R(A) = 2. Consequently the row rank of A is 2.
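The row rank and a row-space basis drop out of the row canonical form directly. A sketch assuming sympy is available (not part of the text):

```python
import sympy as sp

A = sp.Matrix([[6, 7, 2, 1],
               [1, 2, 1, 4],
               [2, 4, 2, 8]])
print(A.rank())     # 2 = row rank of A
R, _ = A.rref()
print(R)            # the nonzero rows form a basis of R(A)
```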

Ex 4.7.2 Determine which of the following matrices have the same row space:

A = (1 −2 −1; 3 −4 5), B = (1 −1 2; 2 3 −1), C = (1 −1 3; 2 −1 10; 3 −5 1).

Solution: We row reduce each matrix to row canonical form:

(1 −2 −1; 3 −4 5) ∼ (1 −2 −1; 0 2 8) [R2 − 3R1] ∼ (1 0 7; 0 2 8) [R1 + R2] ∼ (1 0 7; 0 1 4).

(1 −1 2; 2 3 −1) ∼ (1 −1 2; 0 5 −5) [R2 − 2R1] ∼ (1 −1 2; 0 1 −1) ∼ (1 0 1; 0 1 −1).

(1 −1 3; 2 −1 10; 3 −5 1) ∼ (1 −1 3; 0 1 4; 0 −2 −8) ∼ (1 −1 3; 0 1 4; 0 1 4) ∼ (1 0 7; 0 1 4; 0 0 0).

Since the non zero rows of the reduced form of A and of the reduced form of C are the same, A and C have the same row space. On the other hand, the non zero rows of the reduced form of B are not the same as the others, so B has a different row space.
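Comparing row spaces thus reduces to comparing the nonzero rows of the row canonical forms. A sketch assuming sympy is available (not part of the text):

```python
import sympy as sp

A = sp.Matrix([[1, -2, -1], [3, -4, 5]])
B = sp.Matrix([[1, -1, 2], [2, 3, -1]])
C = sp.Matrix([[1, -1, 3], [2, -1, 10], [3, -5, 1]])

def nonzero_rref_rows(M):
    R, _ = M.rref()
    return [tuple(R.row(i)) for i in range(M.rows) if any(R.row(i))]

print(nonzero_rref_rows(A) == nonzero_rref_rows(C))   # True: same row space
print(nonzero_rref_rows(A) == nonzero_rref_rows(B))   # False
```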

Ex 4.7.3 Let α1 = (1, 1, −1), α2 = (2, 3, −1), α3 = (3, 1, −5) and β1 = (1, −1, −3), β2 = (3, −2, −8), β3 = (2, 1, −3). Show that the subspace of ℝ³ generated by the αi is the same as the subspace generated by the βi.

Solution: Consider two matrices A and B, where the rows of A are the αi and the rows of B are the βi. We row reduce each matrix to row canonical form:

A = (1 1 −1; 2 3 −1; 3 1 −5) ∼ (1 1 −1; 0 1 1; 0 −2 −2) ∼ (1 0 −2; 0 1 1; 0 0 0).

B = (1 −1 −3; 3 −2 −8; 2 1 −3) ∼ (1 −1 −3; 0 1 1; 0 3 3) ∼ (1 0 −2; 0 1 1; 0 0 0).

Since the non zero rows of the reduced form of A and of the reduced form of B are the same, A and B have the same row space. Hence the subspace of ℝ³ generated by the αi is the same as the subspace generated by the βi.


4.7.2 Column Space of a Matrix

Let A = [aij]m×n be an arbitrary m × n matrix over the field F, i.e., aij ∈ F. Let C1, C2, . . . , Cn be the n column vectors of A, where Ci ∈ Vm. Then L(C1, C2, . . . , Cn) is a subspace of the linear space Fᵐ, called the column space of A and denoted by C(A). The dimension of the column space C(A) is called the column rank of A.

Ex 4.7.4 Find the column space and column rank of the matrix A = (6 7 2 1; 1 2 1 4; 2 4 2 8).

Solution: Here the column vectors are C1 = (6, 1, 2), C2 = (7, 2, 4), C3 = (2, 1, 2) and C4 = (1, 4, 8). The column space of A is the linear span of the column vectors C1, C2, C3, C4. Hence the column space of A is

C(A) = {b1C1 + b2C2 + b3C3 + b4C4 : b1, b2, b3, b4 ∈ ℝ},

where b1C1 + b2C2 + b3C3 + b4C4 = b1(6, 1, 2) + b2(7, 2, 4) + b3(2, 1, 2) + b4(1, 4, 8) = (6b1 + 7b2 + 2b3 + b4, b1 + 2b2 + b3 + 4b4, 2b1 + 4b2 + 2b3 + 8b4).

Now {C1, C2, C3, C4} is linearly dependent, but {C1, C2} is linearly independent. Hence {C1, C2} is a basis of the column space C(A), and so dim C(A) = 2. Consequently the column rank of A is 2.

Theorem 4.7.1 Let A = [aij]m×n be a matrix over the field F and let P = [pij]m×m be a square matrix over the same field. Then

(i) the row space of PA is a subspace of the row space of A;

(ii) the row space of PA is the same as the row space of A if P is non singular.

Proof: (i) Here A = [aij]m×n and P = [pij]m×m are the two given matrices. Let R1, R2, . . . , Rm be the row vectors of A and ρ1, ρ2, . . . , ρm be the row vectors of PA. Then

ρi = pi1R1 + pi2R2 + . . . + pimRm, i = 1, 2, . . . , m.

Therefore each ρi is a linear combination of the vectors R1, R2, . . . , Rm. Hence

L(ρ1, ρ2, . . . , ρm) ⊂ L(R1, R2, . . . , Rm),

i.e., the row space of PA is a subspace of the row space of A.

(ii) Let PA = B. Since P is non singular, P⁻¹ exists and A = P⁻¹B. Hence, by (i), the row space of P⁻¹B is a subspace of the row space of B, i.e., R(A) ⊂ R(PA). Again R(PA) ⊂ R(A) by (i), and so R(A) = R(PA).

Corollary: If B is row equivalent to A, then ∃ a non singular square matrix P of order m, a product of elementary matrices, such that B = PA. Hence

R(A) = R(PA) = R(B).

Therefore row equivalent matrices have the same row space.

Corollary: It follows that

R(A) = R(PA) = R(B) ⇒ dim R(A) = dim R(B) ⇒ row rank of A = row rank of B.

Hence pre-multiplication by a non singular matrix does not alter the row rank.


Result 4.7.1 The elementary row transformations on A, (i) Ri ↔ Rj, (ii) Ri → kRi, k ≠ 0, and (iii) Ri → kRj + Ri, do not alter the row space or the row rank of the matrix.

Theorem 4.7.2 The non zero row vectors of a row reduced echelon form R of the matrix A = [aij]m×n form a basis of the row space of A.

Proof: Let A = [aij]m×n, and let R1, R2, . . . , Rr be the non zero row vectors of the row reduced echelon matrix R. The other m − r row vectors of R are null vectors θ. Hence the row space of R is generated by

{R1, R2, . . . , Rr, θ},

i.e., R(R) = L(R1, R2, . . . , Rr, θ). Since this generating set contains the null vector θ, it is linearly dependent. Using the deletion theorem, the null vector θ can be deleted from the generating set; the new generating set is {R1, R2, . . . , Rr}, which contains only non null vectors. Now we shall show that {R1, R2, . . . , Rr} is linearly independent. Let

Ri = (ai1, ai2, . . . , ain).

Since R is a row reduced echelon matrix, there are positive integers k1, k2, . . . , kr satisfying the following conditions:

(i) the leading 1 of Ri occurs in column ki;

(ii) k1 < k2 < . . . < kr;

(iii) a_{i kj} = δij;

(iv) aij = 0 if j < ki.

Let us consider the relation c1R1 + c2R2 + . . . + crRr = θ, where the ci are scalars. Then

c1R1 + c2R2 + . . . + crRr = θ
⇒ c1(a11, a12, . . . , a1n) + . . . + cr(ar1, ar2, . . . , arn) = (0, 0, . . . , 0)
⇒ (c1a11 + . . . + crar1, c1a12 + . . . + crar2, . . . , c1a1n + . . . + crarn) = (0, 0, . . . , 0).

Here, by condition (iii),

a_{1k1} = 1, a_{2k1} = 0, . . . , a_{rk1} = 0;
a_{1k2} = 0, a_{2k2} = 1, . . . , a_{rk2} = 0;
. . .
a_{1kr} = 0, a_{2kr} = 0, . . . , a_{rkr} = 1.

Equating the k1-th, k2-th, . . . , kr-th components, we have

c1 = c2 = . . . = cr = 0.

Hence {R1, R2, . . . , Rr} is linearly independent, and consequently it is a basis of the row space of R. Since R is row equivalent to A, the row space of A is the same as that of R, and therefore {R1, R2, . . . , Rr} is a basis of the row space of A. Moreover, every matrix is row equivalent to a unique row reduced echelon matrix, called its row canonical form.

Corollary: The row rank of a row reduced echelon matrix R is the number of nonzero rows of R. Hence dim R(R) = dim R(A) = row rank of A.

Corollary: The determinant rank of A is the order of the largest square submatrix of A whose determinant is not zero. Hence

dim R(A) = row rank of A = determinant rank of A.


Theorem 4.7.3 For any m× n matrix A, the row rank and the column rank are equal.

Proof: Let A = [aij]m×n be a matrix, where aij ∈ F. Also let R1, R2, . . . , Rm and C1, C2, . . . , Cn be the row and column vectors of A respectively. Let the row rank of A be r, and let {α1, α2, . . . , αr} be a basis of the row space of A, where

αi = (bi1, bi2, . . . , bin)

(the basis being chosen from among the rows of A, so that bij = akj for some k). Since {α1, α2, . . . , αr} is a basis, for some suitable scalars cij ∈ F,

R1 = c11α1 + c12α2 + . . . + c1rαr
R2 = c21α1 + c22α2 + . . . + c2rαr
. . .
Rm = cm1α1 + cm2α2 + . . . + cmrαr.    (i)

The j-th component of Ri is aij, so considering the j-th components in (i), we have

a1j = c11b1j + c12b2j + . . . + c1rbrj
a2j = c21b1j + c22b2j + . . . + c2rbrj
. . .
amj = cm1b1j + cm2b2j + . . . + cmrbrj.

Hence Cj = b1jβ1 + b2jβ2 + . . . + brjβr for j = 1, 2, . . . , n, where

βi = (c1i, c2i, . . . , cmi)ᵀ, i = 1, 2, . . . , r.

This shows that every column vector of A belongs to the linear span of the r vectors β1, β2, . . . , βr, and so the column space of A has dimension at most r. Hence

dimension of the column space ≤ r ⇒ column rank of A ≤ r
⇒ column rank of A ≤ row rank of A. (ii)

Also, the row rank of A equals the column rank of Aᵀ; applying (ii) to Aᵀ,

column rank of Aᵀ ≤ row rank of Aᵀ, i.e., row rank of A ≤ column rank of A. (iii)

Combining (ii) and (iii), we have row rank of A = column rank of A. Also, if A and B are two matrices of the same type over the same field F, then

rank(A + B) ≤ rank(A) + rank(B).
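The equality of row rank and column rank is easy to observe numerically, since the rank of a matrix and of its transpose always coincide. A sketch assuming numpy is available (not part of the text):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 6, size=(4, 6)).astype(float)   # an arbitrary test matrix
print(np.linalg.matrix_rank(A))      # row rank
print(np.linalg.matrix_rank(A.T))    # column rank -- always the same value
```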

4.8 Isomorphic

We have seen that to each vector α ∈ V there corresponds, relative to a given basis B = {α1, α2, . . . , αn}, an n-tuple [α]B in Fⁿ. On the other hand, if (c1, c2, . . . , cn) ∈ Fⁿ, then ∃ a vector in V of the form


c1α1 + c2α2 + . . . + cnαn.

Thus the basis B determines a one-to-one correspondence between the vectors in V and the n-tuples in Fⁿ. Also, if α = ∑_{i=1}^{n} ciαi corresponds to (c1, c2, . . . , cn) and β = ∑_{i=1}^{n} diαi corresponds to (d1, d2, . . . , dn), then

α + β = ∑_{i=1}^{n} (ci + di)αi corresponds to (c1, c2, . . . , cn) + (d1, d2, . . . , dn),

and for any scalar m ∈ F,

mα = ∑_{i=1}^{n} (mci)αi corresponds to m(c1, c2, . . . , cn),

i.e., [α + β]B = [α]B + [β]B and [mα]B = m[α]B.

Thus the one-to-one correspondence V → Fⁿ preserves the vector space operations of vector addition and scalar multiplication. Such a structure-preserving one-to-one correspondence is called an isomorphism, and we say V and Fⁿ are isomorphic, i.e., V ≅ Fⁿ.

Ex 4.8.1 Test whether the following matrices in V = M2×3 are linearly independent or not:

A = (1 2 −3; 4 0 1), B = (1 3 −4; 6 5 4), C = (3 8 −11; 16 10 9).

Solution: The co-ordinate vectors of the matrices in the usual basis of M2×3 are

[A] = [1, 2, −3, 4, 0, 1]; [B] = [1, 3, −4, 6, 5, 4]; [C] = [3, 8, −11, 16, 10, 9].

Form the matrix M whose rows are the above co-ordinate vectors and reduce M to an echelon form:

(1 2 −3 4 0 1; 1 3 −4 6 5 4; 3 8 −11 16 10 9) ∼ (1 2 −3 4 0 1; 0 1 −1 2 5 3; 0 2 −2 4 10 6) ∼ (1 2 −3 4 0 1; 0 1 −1 2 5 3; 0 0 0 0 0 0).

Since the echelon matrix has only two nonzero rows, the co-ordinate vectors [A], [B] and [C] span a subspace of dimension 2 and so are linearly dependent. Accordingly, the original matrices A, B, C are linearly dependent.
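The same test can be run numerically by flattening each matrix into its coordinate vector and computing a rank. A sketch assuming numpy is available (not part of the text):

```python
import numpy as np

A = np.array([[1, 2, -3], [4, 0, 1]])
B = np.array([[1, 3, -4], [6, 5, 4]])
C = np.array([[3, 8, -11], [16, 10, 9]])

M = np.vstack([A.ravel(), B.ravel(), C.ravel()])   # coordinate vectors as rows
print(np.linalg.matrix_rank(M))   # 2 < 3, so A, B, C are linearly dependent
```

Indeed C = A + 2B, which exhibits the dependence explicitly.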

Exercise 4

Section-A[Multiple Choice Questions]

1. Let V1 and V2 be subspaces of a vector space V. Which of the following is necessarily a subspace of V? NET(June)12
(a) V1 ∩ V2 (b) V1 ∪ V2 (c) V1 + V2 = {x + y : x ∈ V1, y ∈ V2} (d) V1 \ V2 = {x : x ∈ V1, x ∉ V2}

2. Let n be a positive integer and let Hn be the space of all n × n matrices A = (aij) with entries in ℝ satisfying aij = ars whenever i + j = r + s (i, j, r, s = 1, 2, · · · , n). Then the dimension of Hn, as a vector space over ℝ, is [NET(Dec)11]
(a) n² (b) n² − n + 1 (c) 2n + 1 (d) 2n − 1

3. The value of k for which the vectors (1, 5) and (2, k) are linearly dependent is
(a) k = 1 (b) k = 5 (c) k = 2 (d) k = 10

4. The value of k for which the vectors (1, 0, 0), (0, 2, 0) and (0, 0, k) are linearly dependent is
(a) k = 0 (b) k = 1 (c) k = 2 (d) k = −1


5. The values of x for which the vectors (x, 1, 0), (0, x, 1) and (−1, 1, 1) are linearly dependent are
(a) 0, 1 (b) 0, 2 (c) 0, 3 (d) 1, 2

6. The value of k for which the vectors (1, k, 2), (0, 1, 2) and (1, 1, 1) are linearly independent is
(a) k = 3/2 (b) k ≠ 3/2 (c) k = −3/2 (d) k ≠ −3/2

7. If {(1, 2, −1), (2, 0, 1), (−1, 1, k)} is a basis of ℝ³, then the value of k is
(a) k = −2 (b) k ≠ −2 (c) k = 0 (d) k ≠ −1

8. If {(1, 0, 0), (0, 1, 0), (0, 0, 1)} is a basis of a vector space V, then its dimension is
(a) 0 (b) 1 (c) 2 (d) 3

9. If W = {(x, y, z) ∈ ℝ³ : x + y = 2z} is a subspace of ℝ³, then one of its bases is
(a) {(2, 0, 1), (0, 2, 1)} (b) {(1, 1, 0), (0, 1, 1)} (c) {(−2, 1, 0), (1, 0, 1)} (d) none of these

10. If W = {(x, y, z) ∈ ℝ³ : x + 2y − 3z = 0} is a subspace of ℝ³, then one of its bases is
(a) {(1, 1, 1), (1, 0, 1)} (b) {(−2, 1, 0), (3, 0, 1)} (c) {(1, 0, 0), (0, 1, 0)} (d) {(−2, 1, 0), (1, 0, 0)}

11. If S = {(x, y, z) ∈ ℝ³ : x + y + z = 0} is a subspace of ℝ³, then the dimension of S is
(a) 0 (b) 1 (c) 2 (d) 3

12. If S = {(x, y, z) ∈ ℝ³ : 2x − y + 4z = 0, x − y + z = 0} is a subspace of ℝ³, then the dimension of S is
(a) 4 (b) 3 (c) 2 (d) 1

13. The set {(1, 0, 0), (0, 1, 0), (0, 0, 1)} is a basis of
(a) ℝ (b) ℝ² (c) ℝ³ (d) ℝ⁴

14. The set {(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)} is a basis of
(a) ℝ (b) ℝ² (c) ℝ³ (d) ℝ⁴

15. If α = (3, 7), β = (2, 4) and γ = (−1, 1), then the linear combination of α in terms of β, γ is
(a) α = β + 5γ (b) α = −β + 5γ (c) α = (1/3)β + (5/3)γ (d) α = −(1/3)β + γ

16. If U and W are two subspaces of V and dim U = 2, dim W = 3, dim(U ∩ W) = 1, then dim(U + W) is
(a) 1 (b) 3 (c) 6 (d) 4

17. Let U = {(x, y, z) ∈ ℝ³ : x + y + z = 0}, V = {(x, y, z) ∈ ℝ³ : x − y + 2z = 0}, W = {(x, y, z) ∈ ℝ³ : 2x − y + z = 0}. Then S = {(1, 2, 0), (0, 1, 1)} is a basis of
(a) U (b) W (c) V (d) none of these

18. If V is the vector space of all polynomials of degree ≤ n, then the dimension of V is
(a) 0 (b) 1 (c) n (d) infinite

19. Let T : ℝⁿ → ℝⁿ be a linear transformation, where n > 2. For k ≤ n, let E = {v1, v2, · · · , vk} ⊆ ℝⁿ and F = {Tv1, Tv2, · · · , Tvk}. Then [IIT-JAM'11]
(a) If E is linearly independent, then F is linearly independent (b) If F is linearly independent, then E is linearly independent (c) If E is linearly independent, then F is linearly dependent (d) If F is linearly independent, then E is linearly dependent


20. The dimension of the vector space of all symmetric matrices of order n × n (n ≥ 2) with real entries and trace equal to zero is NET(June)11
(a) (n² − n)/2 − 1 (b) (n² + n)/2 − 1 (c) (n² − 2n)/2 − 1 (d) (n² + 2n)/2 − 1

21. The dimension of the vector space of all symmetric matrices A = (aij) of order n × n (n ≥ 2) with real entries, a11 = 0 and trace equal to zero is NET(June)12
(a) (n² + n − 4)/2 (b) (n² − n + 4)/2 (c) (n² + n − 3)/2 (d) (n² − n + 3)/2

22. Let C be an n × n real matrix and let W be the vector space spanned by {I, C, C², · · · , C²ⁿ}. The dimension of the vector space W is NET(June)12
(a) 2n (b) at most n (c) n² (d) at most 2n

23. Let M be the vector space of all 3 × 3 real matrices and let A = (2 1 0; 0 2 0; 0 0 3). Which of the following are subspaces of M? NET(June)11
(a) {X ∈ M : XA = AX} (b) {X ∈ M : X + A = A + X} (c) {X ∈ M : trace(AX) = 0} (d) {X ∈ M : det(AX) = 0}

24. Let W = {p(B) : p is a polynomial with real coefficients}, where B = (0 1 0; 0 0 1; 1 0 0). The dimension d of the vector space W satisfies NET(June)11
(a) 4 ≤ d ≤ 6 (b) 6 ≤ d ≤ 9 (c) 3 ≤ d ≤ 8 (d) 3 ≤ d ≤ 4

Section-B[Objective Questions]

1. Show that the straight lines and the planes in ℝ³ through the origin (0, 0, 0) are proper subspaces of the linear space ℝ³.

2. Show that the dimension of the vector space V = {(x1, x2, · · · , xn) ∈ ℝⁿ : x1 + x2 + · · · + xn = 0} is n − 1.

3. Prove that every finitely generated vector space has a finite basis.

4. Let V1 = {(x, y, z) ∈ ℝ³ : x = y = z} and V2 = {(0, y, z) : y, z ∈ ℝ}. Find V1 ∩ V2.

5. Let V = {(x, y, z) ∈ ℝ³ : 2x + 3y + 5z = 5}. Is V a vector space over ℝ? Justify your answer.

6. Show that the following sets of vectors are linearly independent:
(a) {(1, 0, 1), (0, 1, 1), (1, 1, 1)} in ℝ³
(b) {(1, 3, 0), (0, 1, 1)} in ℝ³
(c) {(1, 1, 1, 0), (1, 1, 0, 1), (1, 0, 1, 1), (0, 1, 1, 1)} in ℝ⁴
(d) {(2, 6, −1, 8), (0, 10, 4, 3), (0, 0, −1, 4), (0, 0, 0, 8)} in ℝ⁴
(e) {(1, 2, 2), (2, 1, 2), (2, 2, 1)} in ℝ³. [WBUT 2004]

7. Test whether each of the following sets of vectors
(a) (1, 2, −1), (3, −1, 2) and (5, 3, 0) BH'98
(b) (1, 0, −1), (2, −1, 3) and (0, −1, 5) BH'99
(c) (1, 2, 3), (2, 1, 3) and (3, 0, −1) BH'96
in Euclidean 3-space is linearly dependent or not.


8. Show that the following sets of vectors are linearly dependent:
(a) {(1, 2, 0), (3, −1, 1), (4, 1, 1)} in ℝ³
(b) {(1, 0, −1), (2, 1, 3), (−1, 0, 0), (1, 0, 1)} in ℝ³
(c) {(1, 2, −3, 4), (3, −1, 2, 1), (1, −5, 8, −7)} in ℝ⁴.

9. Determine k so that the set S is linearly dependent:
(a) S = {(1, 3, 1), (2, k, 0), (0, 4, 1)} in ℝ³
(b) S = {(1, 2, 1), (k, 1, 1), (0, 1, 1)} in ℝ³ [WBUT 2007]
(c) S = {(1, 2, 1), (k, 3, 1), (2, k, 0)} in ℝ³ [WBUT 2005]
(d) {(k, 1, 1), (1, k, 1), (1, 1, k)} [VH 97]
(e) S = {(0, 1, k), (1, k, 1), (k, 1, 0)} in ℝ³(ℝ). [SET 10]

10. Find t for which the following vectors are linearly independent:
(a) (cos t, sin t), (− sin t, cos t)
(b) (cos t, sin t), (sin t, cos t)
(c) (e^{αt}, αe^{αt}), (e^{βt}, βe^{βt}).

11. Let A = (1 1; 1 1), B = (1 1; 0 0), C = (1 0; 0 1) be three matrices in M2(ℝ). Are they linearly independent over ℝ? Justify your answer. [BH'06]

12. Examine whether the set S is a basis:
(a) S = {(1, 1, 0), (0, 1, 1), (1, 0, 1)} for ℝ³,
(b) S = {(1, 1, 0), (1, 0, 0), (1, 1, 1)} for ℝ³,
(c) S = {(1, 2, −1, −2), (2, 3, 0, −1), (1, 2, 1, 4), (1, 3, −1, 0)} for V4(ℝ),
(d) S = {(2, 1, 0, 1), (1, −1, 2, 0), (3, 0, 2, 1), (0, 1, −2, 3)} for ℝ⁴,
(e) S = {(1, 2, 1), (2, 1, 0), (1, −1, 2)} for ℝ³, VH99, 01
(f) S = {(1, 2, −1, −2), (2, 3, 0, −1), (1, 2, 1, 4), (1, 3, −1, 0)} for V4(ℝ). VH'96

13. Examine whether in ℝ³ the vector (1, 0, 7) is in the span of S = {(0, −1, 2), (1, 2, 3)}.

Section-C[Long Answer Questions]

1. (a) Let P[x] be the set of all polynomials in x of degree ≤ n over the real field ℝ, i.e., V = {f(x) : f(x) = a0 + a1x + a2x² + · · · + anxⁿ, ai ∈ ℝ}. Show that P[x] is a vector space with ordinary addition of polynomials and, as the scalar multiplication composition, the multiplication of each coefficient of the polynomial by a member of ℝ.

(b) Let V be the set of all m × n matrices whose elements belong to the field F. Show that V is a vector space over F with respect to the operations of matrix addition and scalar multiplication.

(c) Show that the set C of complex numbers is a vector space over itself under the usual compositions.

(d) Show that the set of all odd functions from ℝ to itself is a vector space with respect to addition and scalar multiplication of functions.

(e) Let V be the set of all ordered pairs (x, y) of real numbers and let ℝ be the field of real numbers. Define

(x1, y1) + (x2, y2) = (3y1 + 3y2, −x1 − x2) and c(x, y) = (3cx, −cx).

Verify that V with these operations is not a vector space over ℝ.


(f) Let V = ℝ² = {(a1, a2) : a1, a2 ∈ ℝ} and F = ℝ. Define addition and scalar multiplication on ℝ² as follows:

(a1, a2) + (b1, b2) = (a1 + b1 + 1, a2 + b2 + 1) and
c(a1, a2) = (ca1 + c − 1, ca2 + c − 1).

Show that V is a vector space over ℝ.

(g) (Vector space of n-tuples) Let n be a fixed integer (≥ 1). Then (ℝⁿ, +, ·) is a vector space over ℝ, where

(x1, x2, · · · , xn) + (y1, y2, · · · , yn) = (x1 + y1, x2 + y2, · · · , xn + yn),
c(x1, x2, · · · , xn) = (cx1, cx2, · · · , cxn).

(h) The set of all ordered triplets (x1, x2, x3) of real numbers such that

x1/3 = x2/4 = x3/2

forms a real vector space, where the operations of addition and multiplication are defined as above; i.e., the set of all points on any line passing through the origin in ℝ³ forms a vector space.

(i) V = {f(t) : f(t) = c1 cos t + c2 sin t, c1, c2 ∈ F}, that is, f is a solution of the differential equation d²x/dt² + x = 0 and c1, c2 are scalars of the field. Then V is a vector space over F.

(j) The power set V of a fixed non-empty set Ω forms a vector space over F = {0, 1} with respect to the operations

A + B = (A − B) ∪ (B − A) = A Δ B,
cA = A if c = 1 and cA = φ, the null set, if c = 0.

(k) The set of all real-valued random variables on a fixed sample space forms a vector space over ℝ under the usual operations.

(l) For the vector space F^X, show that the set {f : 0 ∈ range of f} is a generating set provided X has at least two elements, and {f : 0 ∉ range of f} is a generating set provided F has at least 3 elements.

2. (a) Show that S = {p(t) : ∫₋₁¹ (2t + 3)p(t) dt = 0} is a subspace of P4 and find a generating set of S with size ≤ 3.

(b) If a vector space V is the set of real valued continuous functions over ℝ, then show that the set W of solutions of d²y/dx² + p dy/dx + qy = 0 is a subspace of V.

(c) Let ℝ³ be the vector space of all 3-tuples of real numbers. Show that W = {(x1, x2, 0) : x1, x2 ∈ ℝ} is a subspace of ℝ³.

(d) Show that the set of all n × n real diagonal matrices is a subspace of ℝⁿˣⁿ.

(e) Show that the set of all n × n real symmetric matrices is a subspace of ℝⁿˣⁿ.

(f) Show that the set of all n × n real skew-symmetric matrices is a subspace of ℝⁿˣⁿ.

(g) Show that the set of all n × n real triangular matrices is a subspace of ℝⁿˣⁿ.

3. Does the set S = {(x, y) ∈ ℝ² : xy ≥ 0} form a vector space? Give reasons. Gate'04

4. Show that any field can be considered to be a vector space over a subfield of it. BH'98

5. Show that the following subsets of ℝ³ are subspaces of ℝ³:
(i) W = {(x, 2x, 3x) : x ∈ ℝ}
(ii) W = {(x, y, z) ∈ ℝ³ : x + 2y + 4z = 0}
(iii) W = {(x, y, z) ∈ ℝ³ : 2x − 2y + 5z = 0, x − y + 2z = 0}.


6. Show that the following subsets of ℝ³ are not subspaces of ℝ³:
(i) U = {(x, y, 3) : x, y ∈ ℝ}
(ii) U = {(x, y, z) ∈ ℝ³ : x + 2y − z = 3}
(iii) U = {(x, y, z) ∈ ℝ³ : 2x − y + z = 1, x − 3y + z = 0}.

7. (a) Show that f1(t) = 1, f2(t) = t − 2 and f3(t) = (t − 2)² form a basis of P4. Express 3t² − 5t + 4 as a linear combination of f1, f2, f3.

(b) Show that α = (8, 17, 36) is a linear combination of β = (1, 0, 5), γ = (0, 3, 4), δ = (1, 1, 1).

8. (a) Show that α = (1, 0, 0), β = (0, 1, 0), γ = (0, 0, 1) and δ = (1, 1, 1) in V3 form a linearly dependent set in ℝ³, but any three of them are linearly independent.

(b) Show that the vectors 2x³ + x² + x + 1, x³ + 3x² + x − 2 and x³ + 2x² − x + 3 of P[x], the real vector space of all polynomials, are linearly independent.

(c) Show that the vectors (1, 2, 1), (3, 1, 5) and (3, −4, 7) are linearly dependent in ℝ³. VH'96

(d) Determine the subspace of ℝ³ spanned by the vectors (1, 2, 3), (2, 3, 4) and examine if (1, 1, 1) is in this subspace.

(e) Show that the vectors 1 + x + 2x², 3 − x + x², 2 + x, −7 + 5x + x² of P(x), the vector space of polynomials in x over ℝ, are linearly dependent.

(f) If α, β and γ are linearly independent vectors, find the number of linearly independent vectors in the set {α − β, β − γ, γ − α}. BH'99

9. Let f1, f2, · · · , fn be real valued functions defined on [a, b] such that each fi has continuous derivatives up to order (n − 1). If the Wronskian W(f1, f2, · · · , fn)(x) = 0 and W(f1, f2, · · · , fn−1)(x) ≠ 0 in [a, b], show that f1, f2, · · · , fn are LD on [a, b]. Gate'97

10. The vectors (a1, b1) and (a2, b2) are linearly independent in ℝ². Find the rank of the subspace of ℝ² generated by (a1, b1), (a2, b2), (a3, b3). BH'98

11. Find the dimension of the vector space of all solutions of the set of equations x1 + x2 + x3 = 0, x1 − x3 = 0 and x2 − x3 = 0. BH'98

12. Show that the vectors (1, 2, 0), (3, −1, 1) and (4, 1, 1) are linearly dependent. Find two of them which are linearly independent. BH'02

13. Show that the vector space of all periodic functions f(t) with period T contains an infinite set of linearly independent vectors.

14. (a) Examine whether in ℝ³, (1, 0, 7) is in the span of S = {(0, −1, 2), (1, 2, 3)}. BH'06

(b) Consider the vectors α1 = (1, 3, 2) and α2 = (−2, 4, 3) in ℝ³. Show that the span of {α1, α2} is

{(c1, c2, c3) : c1 − 7c2 + 10c3 = 0},

and show that it can also be written as {(α, β, (−α + 7β)/10) : α, β ∈ ℝ}.

(c) Consider the vectors α1 = (1, 0, 1, −1), α2 = (1, −1, 2, 0) and α3 = (2, 1, 1, −3) in ℝ⁴. Show that the span of {α1, α2, α3} is

{(c1, c2, c3, c4) : c1 + c2 + c4 = 0, 2c1 − c3 + c4 = 0},

and show that it can also be written as {(α, β, α − β, −α − β) : α, β ∈ ℝ}.


(d) Consider the vectors α1 = (1, 2, 1, −1), α2 = (2, 4, 1, 1), α3 = (−1, −2, −2, 4) and α4 = (3, 6, 2, 0) in ℝ⁴. Show that the span of {α1, α2, α3, α4} is

{(c1, c2, c3, c4) : 2c1 − c2 = 0, 2c1 − 3c3 − c4 = 0},

and show that it can also be written as {(α, 2α, β, 2α − 3β) : α, β ∈ ℝ}.

(e) In ℝ², let α = (3, 1), β = (2, −1). Show that L(α, β) = ℝ².

(f) Show that the set S = {(1, 2, 3, 0), (2, 1, 0, 3), (1, 1, 1, 1), (2, 3, 4, 1)} is linearly dependent in ℝ⁴. Find a linearly independent subset S1 of S such that L(S) = L(S1).

15. (a) For what values of a do the vectors (1 + a, 1, 1), (1, 1 + a, 1) and (1, 1, 1 + a) form a basis of V3(ℝ)?

(b) Show that the set {(1, 0, 0), (1, 1, 0), (1, 1, 1)} forms a basis of V3(ℝ). Find the co-ordinates of the vector (a, b, c) with respect to the above basis. VH'99

(c) Show that {α + β, β + γ, γ + α} and {α, α + β, α + β + γ} are bases of ℝ³ if {α, β, γ} is a basis of ℝ³.

16. (a) Find a basis for ℝ³ containing the set of vectors (1, 2, −1), (2, 4, −2). BH'99

(b) Find a basis for ℝ³ containing the set of vectors (1, 2, 0), (1, 3, 1). VH'04, '00

(c) Find a basis for ℝ³ containing the vector (1, 2, 0). BH'00

17. Show that if {α1, α2, α3} is a basis of a vector space V of dimension 3, then {α1 + α2 + α3, α2 + α3, α3} is also a basis of V. BH'03, '05

18. Show that the sets S = {α, β} and T = {α, β, α − β} of real vectors generate the same vector space. BH'01

19. (a) Suppose that S = {(x, y, z) ∈ ℝ³ : z = 2x − y}. Show that S is a subspace of the vector space ℝ³ over the field of reals. Find a basis of S containing the vector (1, 2, 0). VH'04

(b) Suppose that S = {(x, y, z) ∈ ℝ³ : 2x + y − z = 0}. Show that S is a subspace of the vector space ℝ³ over the field of reals. Find a basis and the dimension of S. NBH'05

(c) Show that S = {(x, y, z) ∈ ℝ³ : x + y − z = 0 and 2x − y + z = 0} is a subspace of the vector space ℝ³. Find the dimension of S. VH'02

(d) Suppose that S = {(x, y, z) ∈ ℝ³ : x + y + z = 0}. Show that S is a subspace of the vector space ℝ³ over the field of reals. Find a basis of S. VH'00, '97

(e) Show that S = {a + ib, c + id} is a basis of C(ℝ) if ad − bc ≠ 0.

(f) Let V2 =(

a bb c

); a, b, c ∈ Q

, show that V2 is a subspace of the vector space

of 2×2 real matrices with usual matrix addition and scalar multiplication. Writea basis of V2 and find dimension of V2.

20. (a) Find a basis for the subspace in <3 defined by {(x1, x2, x3) : x1 + x2 + x3 = 0}. [JU(M.Sc.)‘06]

(b) Find the conditions on a, b, c so that (a, b, c) ∈ <3 belongs to the space generatedby α = (2, 1, 0), β = (1,−1, 2) and γ = (0, 3,−4).

(c) Determine a basis of the subspace spanned by the vectors α1 = (2,−3, 1), α2 =(3, 0, 1), α3 = (0, 2, 1) and α4 = (1, 1, 1).


(d) Let α1 = (1, 2, 0, 3, 0), α2 = (1, 2,−1,−1, 0), α3 = (0, 0, 1, 4, 0), α4 = (2, 4, 1, 10, 1)and α5 = (0, 0, 0, 0, 1). Find the dimension of the linear span of α1, α2, · · · , α5.Gate’04

21. (a) Show that the yz-plane W = {(0, b, c)} in <3 is generated by (0, 1, 1) and (0, 2,−1).

(b) Show that the complex numbers w = 2 + 3i and z = 1− 2i generate the complexfield C as a vector space over the real field <.

22. Find a basis of the span of the vectors (1, 0, 1, 2,−1), (2, 1,−1, 3, 0), (0,−1, 3, 1,−2),(3, 1, 0, 1, 1), (3, 1, 0, 3, 0) in <5.

23. Using the replacement theorem, determine a basis of <4 containing the vectors (1, 2, 1, 3), (2, 1, 1, 0) and (3, 2, 1, 1). [CH10]

24. Consider the subspaces of <4:
S = {(x, y, z, w) ∈ <4 : 2x + y + z + w = 0},
T = {(x, y, z, w) ∈ <4 : x + 2y + z + w = 0}.

Determine a basis of the subspace S ∩ T and hence determine dim(S ∩ T). [CH10]

25. Two subspaces of R3 are U = {(x, y, z) : x + y + z = 0} and W = {(x, y, z) : x + 2y − z = 0}. Find dim U, dim W, dim(U ∩ W), dim(U + W).

26. Let W be an m dimensional subspace of an n dimensional vector space V , wherem < n. Find dim(V/W ). Gate’98

27. Extend the linearly independent subset A of V to a basis of V in each of the followingcases:

(a) V = <n and A = {(1, 1, · · · , 1)}.

(b) V = F4 with F = GF(3) and A = {(1, 2, 1, 2), (1, 1, 2, 1)}.

28. Show that α1 = 2 + 3t, α2 = 3 + 5t, α3 = 5 − 8t² + t³ and α4 = 4t − t² form a basis of P4. Find the coordinates of the vector a0 + a1t + a2t² + a3t³ with respect to this coordinate system.

29. In a real vector space <3(<), show that the co-ordinate vector of α = (3, 1,−4) withrespect to the basis (1, 1, 1), (0, 1, 1), (0, 0, 1) is identical to α itself.

30. (a) Let S and T be subspaces of a vector space V with dim(S) = 2, dim(T ) = 3 anddim(V ) = 5. Find the minimum and maximum possible values of dim(S+T ) andshow that every (integer) value between these can be attained.

(b) Let S and T be two subspaces of <24 such that dim(S) = 19 and dim(T ) = 17.Find the possible value of dim(S ∩ T ). Gate’04

31. Determine the rank of the row space of

( 1 0 1
  0 1 0
  1 1 1 ).

32. Show that A and B have the same column space if and only if A^T and B^T have the same row space.

33. Find two different subspaces of dimension 1 and two different subspaces of dimension 2 contained in the subspace {(α1, α2, α3, α4) : α1 + 2α2 + 3α3 + 4α4 = 0} of <4.


34. Let f : <3 → <3 be a linear mapping. Show that ∃ a one-dimensional subspace V of <3 such that V is invariant under f. Gate’96

35. Let B = {α1, α2, α3} be the basis for C3, where α1 = (1, 0,−1), α2 = (1, 1, 1) and α3 = (2, 2, 0). Find the dual basis of B. BU(M.Sc)‘03


Chapter 5

Linear Transformations

We have studied homomorphisms from one algebraic system to another, namely, group homomorphisms and ring homomorphisms. On parallel lines we shall study vector space homomorphisms. Since a vector space V(F) comprises two algebraic systems, a group (V, +) and a field (F, +, ·), there may be some confusion as to which operations are to be preserved by such a function. Generally, vector space homomorphisms are called linear mappings or linear transformations.

In this chapter, the notion of a linear transformation (or mapping or function) and its different properties are studied. We shall show that a linear transformation can be represented by a matrix. The kernel or null space, rank, nullity and other features of linear transformations are also discussed here.

5.1 Linear Transformations

Let V and W be two vector spaces over the same field F . Then a mapping T : V → Wwith domain V and codomain W is called a linear transformation or linear mapping orhomomorphism of V into W , if

(i) Additive property: T (α+ β) = T (α) + T (β);

(ii) Homogeneous property: T (cα) = cT (α);

for all α and β in V and all scalars c in F. Thus T : V → W is linear if it preserves the two basic operations of a vector space: vector addition and scalar multiplication. Note the following facts:

(i) A linear mapping T : V → W is completely characterized by the condition (principle of superposition)

T(aα + bβ) = aT(α) + bT(β); ∀ a, b ∈ F and α, β ∈ V. (5.1)

This single condition, which replaces the above two conditions, is sometimes used as the definition. More generally, for any scalars ci ∈ F and for any vectors αi ∈ V, we get

T(∑_{i=1}^n ciαi) = ∑_{i=1}^n ciT(αi),

i.e., T(c1α1 + c2α2 + · · · + cnαn) = c1T(α1) + c2T(α2) + · · · + cnT(αn).

(ii) Taking c = 0 in the homogeneous property, T(θV) = T(0α) = 0T(α) = θW, i.e. every linear mapping takes the null vector of V to the null vector θW of W.


(iii) The term linear transformation rather than linear mapping is frequently used for linearmappings of the form T : <n → <m.

(iv) The mapping T : V → V defined by T (α) = α;∀α ∈ V , is a linear mapping. This iscalled the identity mapping on V and is denoted by IV .

(v) Two linear transformations T1 and T2 from V (F ) to W (F ) are said to be equal iffT1(α) = T2(α); ∀ α ∈ V .

(vi) The mapping T : V →W defined by T (α) = θW ;∀α ∈ V , θW being the null vector inW , is a linear mapping. This is called the zero mapping and is denoted by 0T .

(vii) A one-to-one linear transformation of V onto W is called an isomorphism. If ∃ an isomorphism of V onto W, we say that V is isomorphic to W, written V ≅ W.

(viii) Any m × n matrix A over F determines a mapping TA : Fn → Fm by TA(α) = Aα (where the vectors of Fn and Fm are written as columns). This matrix mapping is linear.

(ix) A transformation T is called nilpotent of index n if T^n = θ but T^{n−1} ≠ θ.

The following are some important linear transformations:

(i) Projection: T : <3 → <2, defined by T (x, y, z) = (x, y).

(ii) Dilation: T : <3 → <3, defined by T (α) = rα; r > 1.

(iii) Contraction: T : <3 → <3, defined by T (α) = rα; 0 < r < 1.

(iv) Reflection: T : <2 → <2, defined by T(x, y) = (x,−y).

(v) Rotation: T : <2 → <2, defined by T(α) = ( cos θ  −sin θ
                                                sin θ   cos θ ) α.

In geometry, rotations, reflections and projections provide us with another class of linear transformations. These transformations can be used to study rigid motions in <n.
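As an illustrative aside (not part of the original text), the superposition condition (5.1) can be checked numerically for the rotation mapping; the sketch below uses Python with NumPy, and the angle, scalars and vectors are arbitrary choices for the demonstration.

```python
import numpy as np

# Rotation T(v) = Rv on R^2; verify T(a*alpha + b*beta) = a*T(alpha) + b*T(beta).
theta = 0.7  # an arbitrary angle (assumption for the demo)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

def T(v):
    return R @ v

a, b = 2.0, -3.0
alpha = np.array([1.0, 4.0])
beta = np.array([-2.0, 5.0])
print(np.allclose(T(a * alpha + b * beta), a * T(alpha) + b * T(beta)))  # True
```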

Ex 5.1.1 The mapping T : R2 → R defined by T (x, y) = 2x + y for all (x, y) ∈ R2 is alinear transformation.

Solution: Let c1, c2 ∈ R and α = (a1, a2), β = (b1, b2) be any two elements of <2. Then

T (c1α+ c2β) = T (c1a1 + c2b1, c1a2 + c2b2)= 2(c1a1 + c2b1) + (c1a2 + c2b2) = c1(2a1 + a2) + c2(2b1 + b2)= c1T (a1, a2) + c2T (b1, b2) = c1T (α) + c2T (β).

Hence T is a linear transformation.

Ex 5.1.2 Let F[x] be the vector space of all polynomials in the indeterminate x over <. Prove that the mapping T : F[x] → F[x] defined by T[p(x)] = (d/dx)[p(x)]; ∀ p(x) ∈ F[x], is a linear mapping.


Solution: Here the mapping T : F[x] → F[x] is defined by T[p(x)] = (d/dx)[p(x)]; ∀ p(x) ∈ F[x]. Now, ∀ a, b ∈ < and p1(x), p2(x) ∈ F[x], we have,

T[ap1(x) + bp2(x)] = (d/dx)[ap1(x) + bp2(x)]; by definition
= a (d/dx)[p1(x)] + b (d/dx)[p2(x)]
= aT[p1(x)] + bT[p2(x)].

Therefore, the mapping T : F[x] → F[x] is linear. This T : F[x] → F[x] is called the derivative mapping. If p(x) is a polynomial of degree n, then

T^{n+1}[p(x)] = (d^{n+1}/dx^{n+1}) p(x) = 0; ∀ p(x) ∈ F[x],

i.e. T^{n+1} = θ. Thus ∃ a non-zero transformation T such that a finite power of T is θ.

Ex 5.1.3 Let V be the vector space of all real valued continuous functions on [a, b]. Prove that the mapping T : V → < defined by T[f] = ∫_a^b f(x)dx; f ∈ V, is a linear mapping.

Solution: Here the mapping T : V → < is defined by T[f] = ∫_a^b f(x)dx; f ∈ V. Now, ∀ a1, a2 ∈ < and f, g ∈ V, we have,

T[a1f + a2g] = ∫_a^b [a1f(x) + a2g(x)]dx; by definition
= a1 ∫_a^b f(x)dx + a2 ∫_a^b g(x)dx = a1T[f] + a2T[g].

Therefore, the mapping T : V → < is linear. This mapping T : V → < is called the integral mapping. The linear transformation T[f](x) = ∫_0^x f(t)dt is not only continuous but has a continuous first derivative.

Ex 5.1.4 Let W (F ) be a subspace of a vector space V (F ). Prove that the mapping T : V →V/W defined by T (α) = α+W ; α ∈ V is a linear transformation.

Solution: Let a, b ∈ F , then ∀ α, β ∈ V , we have,

T (aα+ bβ) = aα+ bβ +W = (aα+W ) + (bβ +W )= a (α+W ) + b (β +W ) = aT (α) + bT (β)

Therefore, T is a linear mapping.

Ex 5.1.5 Let the mapping T : P1 → P2 be defined by T[p(x)] = xp(x) + x². Is T a linear transformation?

Solution: Let p1(x), p2(x) ∈ P1, then

T[p1(x) + p2(x)] = x[p1(x) + p2(x)] + x²
= (xp1(x) + x²) + (xp2(x) + x²) − x²
= T[p1(x)] + T[p2(x)] − x² ≠ T[p1(x)] + T[p2(x)].

Therefore, we conclude that T is not a linear transformation.


Properties of linear transformation

Let V and W be two vector spaces over the same field F and T : V → W be a linearmapping. Let α, β ∈ V , and θV , θW be the null elements of V and W respectively. We nowhave,

(i) T (θV ) = θW .

(ii) T (−α) = −T (α); ∀α ∈ V.

(iii) T (α− β) = T (α)− T (β); ∀α, β ∈ V

(iv) T(α + α + · · · + α) (n terms) = T(nα) = nT(α), n a positive integer,

(v) T(−mα) = mT(−α) = −mT(α), m a positive integer,

(vi) T((m/n)α) = (m/n)T(α), m and n integers, n ≠ 0.

Proof: These properties follow from the definition of a linear transformation. Here T : V → W is a linear mapping. (i) For any α ∈ V, we have,

T (α) = T (α+ θV ) = T (α) + T (θV ); as T is linear⇒ T (α) + θW = T (α) + T (θV ); θW is the zero in W⇒ T (θV ) = θW ; by cancellation law in (W,+).

Thus, if T(θV) ≠ θW, then T is not a linear transformation. For example, let T : <2 → <2, defined by T(x, y) = (x + 4, y + 7), be a translation mapping. Note that T(θ) = T(0, 0) = (4, 7) ≠ (0, 0). Thus the zero vector is not mapped to the zero vector; hence T is not linear.
(ii) Using the first result, we have,

θW = T(θV) = T[α + (−α)]
= T(α) + T(−α); since T is linear
⇒ T(−α) = −T(α).

In particular, if T is a linear operator on V, then T(θ) = θ.
From this result it follows that the homogeneity property T(cα) = cT(α) of a linear transformation follows from the additivity property when c is rational, but this is not the case if c is irrational. Again, a transformation may satisfy the property of homogeneity without satisfying the property of additivity.

Ex 5.1.6 Prove that the following mappings are not linear: (i) T : <2 → <2 defined by T(x, y) = (xy, x). (ii) T : <2 → <3 defined by T(x, y) = (x + 5, 7y, x + y). (iii) T : <3 → <2 defined by T(x, y, z) = (|x|, y + z).

Solution: (i) Let α = (1, 2) and β = (3, 4), then α + β = (4, 6). Also, by definition,

T(α) = (2, 1) and T(β) = (12, 3) ⇒ T(α) + T(β) = (14, 4),
but T(α + β) = (24, 6) ≠ T(α) + T(β).

Therefore, the mapping T : <2 → <2 defined by T(x, y) = (xy, x) is not linear.
(ii) Since T(0, 0) = (5, 0, 0) ≠ (0, 0, 0), the mapping T : <2 → <3 defined by T(x, y) = (x + 5, 7y, x + y) is not linear.
(iii) Let α = (1, 2, 3) and c = −3, then cα = (−3,−6,−9) so that, by definition, T(cα) = (3,−15). Also, we have

T(α) = (1, 5) and cT(α) = (−3,−15) ≠ (3,−15) = T(cα).

Therefore, the mapping T : <3 → <2 defined by T(x, y, z) = (|x|, y + z) is not linear.

5.1.1 Kernel of a Linear Mapping

Let V and W be two vector spaces over the same field F and T : V → W be a linear mapping. The kernel or null space of the linear mapping T, denoted by Ker T, is the set of elements of V that are mapped to θW, θW being the null vector in W, i.e.,

Ker T = {α : α ∈ V and T(α) = θW}. (5.2)

For example, let T : V → V and T0 : V → W be the identity and zero mappings respectively; then N(T) = {θ} and N(T0) = V. Since θV ∈ ker T, ker T is never an empty set.

Figure 5.1: Range and kernel of a linear transformation T : V → W.

ker T is also called the null space of T and is denoted by N(T). Also, dim N(T) ≤ dim V.

Ex 5.1.7 A mapping T : <3 → <3, defined by, T (x, y, z) = (x + 2y + 3z, 3x + 2y + z, x +y + z); (x, y, z) ∈ <3. Show that T is linear. Find kerT and dimension of kerT .

Solution: Let α = (x1, y1, z1) ∈ <3 and β = (x2, y2, z2) ∈ <3. Now,

T (α) = (x1 + 2y1 + 3z1, 3x1 + 2y1 + z1, x1 + y1 + z1)T (β) = (x2 + 2y2 + 3z2, 3x2 + 2y2 + z2, x2 + y2 + z2)

T (α+ β) = T (x1 + x2, y1 + y2, z1 + z2)= ((x1 + x2) + 2(y1 + y2) + 3(z1 + z2), 3(x1 + x2)+ 2(y1 + y2) + (z1 + z2), (x1 + x2) + (y1 + y2) + (z1 + z2))= (x1 + 2y1 + 3z1, 3x1 + 2y1 + z1, x1 + y1 + z1)+ (x2 + 2y2 + 3z2, 3x2 + 2y2 + z2, x2 + y2 + z2)= T (α) + T (β); ∀α, β ∈ <3.


Let c ∈ < be any scalar, then cα = (cx1, cy1, cz1). Therefore, using definition,

T (cα) = T (cx1, cy1, cz1)= (cx1 + 2cy1 + 3cz1, 3cx1 + 2cy1 + cz1, cx1 + cy1 + cz1)= c(x1 + 2y1 + 3z1, 3x1 + 2y1 + z1, x1 + y1 + z1)= cT (α); ∀c ∈ < and α ∈ <3.

Hence T is a linear mapping. By the definition of ker T,

ker T = {(x1, y1, z1) ∈ <3 : T(x1, y1, z1) = (0, 0, 0)},
i.e., x1 + 2y1 + 3z1 = 0, 3x1 + 2y1 + z1 = 0, x1 + y1 + z1 = 0.

From the first two equations, by cross-multiplication,

x1/(−4) = y1/8 = z1/(−4), i.e., x1/(−1) = y1/2 = z1/(−1) = k (say),

or, x1 = −k, y1 = 2k, z1 = −k,

which satisfies the last equation x1 + y1 + z1 = 0. Thus, (x1, y1, z1) = k(−1, 2,−1); k ∈ <. Let α = (−1, 2,−1); then ker T = L{α} and so dim ker T = 1.
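The computation above can be cross-checked mechanically; the following sketch (an added illustration, not from the text) represents T by its matrix and confirms that (−1, 2,−1) spans ker T:

```python
import numpy as np

# Matrix of T(x, y, z) = (x + 2y + 3z, 3x + 2y + z, x + y + z)
A = np.array([[1, 2, 3],
              [3, 2, 1],
              [1, 1, 1]])
v = np.array([-1, 2, -1])
print(A @ v)                       # [0 0 0]: v lies in ker T
print(np.linalg.matrix_rank(A))    # 2, so dim(ker T) = 3 - 2 = 1
```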

Ex 5.1.8 Find a basis and dimension of kerT , where the linear mapping T : <3 → <2 isdefined by T (x, y, z) = (x+ y, y + z).

Solution: To find a basis and the dimension of ker T, set T(α) = θ, where α = (x, y, z). This gives the homogeneous system

(x + y, y + z) = (0, 0) ⇒ x + y = 0 = y + z.

Thus x = −y and z = −y, with y the free variable, so dim(ker T) = nullity(T) = 1. Now, (1,−1, 1) is a nonzero solution, and so {(1,−1, 1)} forms a basis for ker T.

Deduction 5.1.1 Kernel of a matrix mapping: The kernel of any m × n matrix A over F, viewed as a linear map A : Fn → Fm, consists of all vectors α for which Aα = θ. This means that the kernel of A is the solution space of the homogeneous system Ax = θ, called the null space of A.

Ex 5.1.9 Consider the matrix mapping A : <4 → <3, where

A = ( 1 2 3  1
      1 3 5 −2
      3 8 13 −3 ).

Find a basis and the dimension of the kernel of A.

Solution: By definition, ker A is the solution space of the homogeneous system Ax = θ, where we take the variables as x = (x1, x2, x3, x4)^T. Therefore, reduce the matrix A of coefficients to echelon form:

( 1 2 3  1 )      ( 1 2 3  1 )      ( 1 2 3  1 )
( 1 3 5 −2 )  →   ( 0 1 2 −3 )  →   ( 0 1 2 −3 )
( 3 8 13 −3 )     ( 0 2 4 −6 )      ( 0 0 0  0 ).

Thus Ax = θ becomes

x1 + 2x2 + 3x3 + x4 = 0 and x2 + 2x3 − 3x4 = 0.


These equations show that the variables x3 and x4 are free variables, so that dim ker(A) = 2. Setting (x3, x4) = (1, 0) and (0, 1) in turn gives the solutions (1,−2, 1, 0) and (−7, 3, 0, 1), so a basis of ker A is {(1,−2, 1, 0), (−7, 3, 0, 1)}.
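For comparison, a computer algebra system reproduces the same kernel basis from the row-reduced form; the sketch below (an added illustration using SymPy) is one way to check the hand computation:

```python
from sympy import Matrix

A = Matrix([[1, 2, 3, 1],
            [1, 3, 5, -2],
            [3, 8, 13, -3]])
# nullspace() solves Ax = 0 from the row-reduced form of A
for v in A.nullspace():
    print(v.T)   # Matrix([[1, -2, 1, 0]]) and Matrix([[-7, 3, 0, 1]])
```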

Theorem 5.1.1 Let V and W be two vector spaces over a field F and let T : V →W be alinear mapping. Then kerT is a subspace of V .

Proof: By definition of kerT , we have,

KerT = α : α ∈ V and T (α) = θW .

Since T (θV ) = θW ; so θV ∈ kerT . Therefore, kerT is non empty. To prove the theorem, weconsider the following two cases:Case 1: Let kerT = θV , then obviously kerT is a subspace of V .Case 2: Let kerT 6= θV , and let α and β ∈ kerT . Then by definition,

T (α) = θW ; T (β) = θW .

Since T is a linear transformation, so ∀a, b ∈ F , we have,

T (aα+ bβ) = T (aα) + T (bβ); as T is linear= aT (α) + bT (β); as T is linear= aθW + bθW = θW

⇒ aα + bβ ∈ kerT ; ∀α, β ∈ kerT and a, b ∈ F.

Hence kerT is a subspace of V . This subspace is also known as null space of T . Thedimension of kerT is called the nullity of T , i.e., nullity (T)=dim (ker T).

Theorem 5.1.2 Let V and W be two finite dimensional vector spaces over a field F andlet T : V →W be linear. Then T is injective if and only if kerT = θV .

Proof: First, let T be an injective mapping from V into W. Since T(θV) = θW, θV is a preimage of θW, and since T is injective, θV is the only preimage of θW. Therefore,

ker T = {θV}.

Conversely, let kerT = θV .We wish to show that T is one-to-one. Let α, β be two elementsof V such that T (α) = T (β) in W . Now,

θW = T (α) − T (β) = T (α− β); as T is linear⇒ α− β ∈ kerT ⇒ α− β = θV ; as kerT = θV .

Thus, T (α) = T (β) ⇒ α = β

and so T is injective. Hence the theorem. Note that if a linear mapping T : V → V is such that ker T = {θV}, then a basis of V is mapped onto another basis of V. Thus T is injective if and only if dim(ker T) = 0.
Note: If T(α) = β and T(γ) = β, then α − γ ∈ ker T. In other words, any two solutions of T(α) = β differ by an element of the kernel of T.

Ex 5.1.10 Prove that there cannot be a one-one linear transformation T : <3 → <2.

Solution: Here dim(<3) = 3 and dim(<2) = 2, so dim(Im T) ≤ 2. Therefore, by the dimension theorem,

dim(<3) = dim(Im T) + dim(ker T)
or, 3 ≤ 2 + dim(ker T) ⇒ dim(ker T) ≥ 1 ≠ 0.

Thus, there cannot be a one-one linear transformation T : <3 → <2.


Theorem 5.1.3 Let V and W be vector spaces over a field F and T : V → W be a linearmapping such that kerT = θ. Then the images of a linearly independent set of vectors inV are linearly independent in W .

Proof: Let S = {α1, α2, . . . , αn} be a linearly independent set in V. We are to show that {T(α1), T(α2), . . . , T(αn)} is a LI set in W. Suppose, for some scalars c1, c2, . . . , cn ∈ F,

c1T(α1) + c2T(α2) + . . . + cnT(αn) = θW
⇒ T(c1α1) + T(c2α2) + . . . + T(cnαn) = θW; as T is linear
⇒ T(c1α1 + c2α2 + . . . + cnαn) = θW; as T is linear
⇒ c1α1 + c2α2 + . . . + cnαn = θV; as ker T = {θV}
⇒ c1 = c2 = . . . = cn = 0; as S is LI.

Hence T (α1), T (α2), . . . , T (αn) is a linearly independent set of vectors in W .

5.1.2 Image of Linear Mapping

Let V and W be two vector spaces over the same field F and T : V → W be a linearmapping. The range or image of the linear mapping T , denoted by R(T ) or ImT , is theset of all images of all elements of V, i.e.,

R(T ) = ImT = T (α) ∈W : α ∈ V . (5.3)

If ImT = W, we say that T is onto.

Ex 5.1.11 Show that the mapping T : <3 → <3, defined by
T(x, y, z) = (2x + y + 3z, 3x − y + z,−4x + 3y + z); (x, y, z) ∈ <3,
is linear. Find Im T and the dimension of Im T.

Solution: Let α = (x1, y1, z1) ∈ <3 and β = (x2, y2, z2) ∈ <3. By definition,

T (α) = (2x1 + y1 + 3z1, 3x1 − y1 + z1,−4x1 + 3y1 + z1)T (β) = (2x2 + y2 + 3z2, 3x2 − y2 + z2,−4x2 + 3y2 + z2)

T (α+ β) = T (x1 + x2, y1 + y2, z1 + z2)= (2(x1 + x2) + (y1 + y2) + 3(z1 + z2), 3(x1 + x2)− (y1 + y2) + (z1 + z2),−4(x1 + x2) + 3(y1 + y2) + (z1 + z2))= (2x1 + y1 + 3z1, 3x1 − y1 + z1,−4x1 + 3y1 + z1)+ (2x2 + y2 + 3z2, 3x2 − y2 + z2,−4x2 + 3y2 + z2)= T (α) + T (β); ∀α, β ∈ <3.

Let c ∈ <. Then cα = (cx1, cy1, cz1). By definition,

T (cα) = T (cx1, cy1, cz1)= (2cx1 + cy1 + 3cz1, 3cx1 − cy1 + cz1,−4cx1 + 3cy1 + cz1)= c(2x1 + y1 + 3z1, 3x1 − y1 + z1,−4x1 + 3y1 + z1)= cT (α); ∀c ∈ < and α ∈ <3.

Hence T is linear. Let α be an arbitrary vector in ImT . Then,

α = (2x+ y + 3z, 3x− y + z,−4x+ 3y + z)= x(2, 3,−4) + y(1,−1, 3) + z(3, 1, 1).


Hence α is a linear combination of the vectors (2, 3,−4), (1,−1, 3) and (3, 1, 1), so

Im T = L{(2, 3,−4), (1,−1, 3), (3, 1, 1)}.

Now

| 2  1 3 |
| 3 −1 1 | = 0,
| −4 3 1 |

so the set S = {(2, 3,−4), (1,−1, 3), (3, 1, 1)} is linearly dependent; indeed (3, 1, 1) = (4/5)(2, 3,−4) + (7/5)(1,−1, 3). Since (2, 3,−4) and (1,−1, 3) are linearly independent,

Im T = L{(2, 3,−4), (1,−1, 3)} and dim Im T = 2.

Ex 5.1.12 Find a linear transformation T : <3 → <3 whose image Im T is the subspace S = {(x, y, z) ∈ <3 : x + y + z = 0}; for example, T(x, y, z) = (x, y,−x − y).

Solution: Every vector of S has the form (x, y,−x − y) = x(1, 0,−1) + y(0, 1,−1). Let {e1, e2, e3} be the standard basis of <3; then there exists a unique linear transformation T such that
T(e1) = (1, 0,−1) = β, T(e2) = (0, 1,−1) = γ, T(e3) = (0, 0, 0) = θ.

Now α = (x, y, z) = xe1 + ye2 + ze3, so that

T(α) = T(x, y, z) = xT(e1) + yT(e2) + zT(e3)
= x(1, 0,−1) + y(0, 1,−1) + z(0, 0, 0) = (x, y,−x − y),

whose image is exactly L{(1, 0,−1), (0, 1,−1)} = S.

Ex 5.1.13 Find a linear mapping T : <3 → <3, whose image space is spanned by (1, 2, 3)and (4, 5, 6).

Solution: Let e1, e2, e3 be a standard basis of <3. Then there exists unique lineartransformation T such that T (e1) = (1, 2, 3) = α, T (e2) = (4, 5, 6) = β and T (e3) =(0, 0, 0) = θ. Let (x, y, z) ∈ <3, then

(x, y, z) = xe1 + ye2 + ze3

⇒ T (x, y, z) = xT (e1) + yT (e2) + zT (e3)= x(1, 2, 3) + y(4, 5, 6) + z(0, 0, 0)= (x+ 4y, 2x+ 5y, 3x+ 6y).

Since {e1, e2, e3} is a basis of <3, {α, β, θ} generates the range space. Thus the range space is L{α, β, θ} = L{α, β}.

Deduction 5.1.2 Image of a matrix mapping: Let A be any m × n matrix over a field F, viewed as a linear map A : Fn → Fm. The usual basis vectors span Fn, so their images Aei; i = 1, 2, · · · , n, which are precisely the columns of A, span the image of A. Therefore Im A = column space(A).

Ex 5.1.14 Consider the matrix mapping A : <4 → <3, where

A = ( 1 2 3  1
      1 3 5 −2
      3 8 13 −3 ).

Find a basis and the dimension of the image of A.

Solution: By definition, the column space of A is Im A. Therefore, reduce the matrix A^T to echelon form:

( 1  1  3 )      ( 1  1  3 )      ( 1 1 3 )
( 2  3  8 )  →   ( 0  1  2 )  →   ( 0 1 2 )
( 3  5 13 )      ( 0  2  4 )      ( 0 0 0 )
( 1 −2 −3 )      ( 0 −3 −6 )      ( 0 0 0 ).

Thus a basis of Im A is {(1, 1, 3), (0, 1, 2)} and dim(Im A) = 2.
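A quick machine check (an added illustration using SymPy) confirms the rank; note that SymPy's columnspace() returns the pivot columns of A itself, which is a different but equally valid basis of Im A:

```python
from sympy import Matrix

A = Matrix([[1, 2, 3, 1],
            [1, 3, 5, -2],
            [3, 8, 13, -3]])
print(A.columnspace())   # pivot columns (1,1,3)^T and (2,3,8)^T span Im A
print(A.rank())          # 2 = dim(Im A)
```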


Theorem 5.1.4 Let V and W be two vector spaces over a field F and let T : V →W be alinear mapping. Then ImT is a subspace of W .

Proof: Let θV and θW be the null elements of V and W respectively. Since,

T (θV ) = θW ; so θW ∈ ImT.

Hence Im T is not an empty set. To prove the theorem, consider the following two cases:
Case 1: If Im T = {θW}, then obviously Im T is a subspace of W.
Case 2: Let Im T ≠ {θW}. Let γ, δ ∈ Im T; then ∃ α, β ∈ V such that T(α) = γ and T(β) = δ. Therefore, ∀ a, b ∈ F, we have,

aγ + bδ = aT(α) + bT(β)
= T(aα) + T(bβ); as T is linear
= T(aα + bβ); as T is linear
⇒ aγ + bδ ∈ Im T; as aα + bβ ∈ V.

This proves that, ImT is a subspace of W . ImT is also called the range of T and is denotedby R(T ). The dimension of R(T ) is called the rank of T .

Ex 5.1.15 Let a linear transformation T : <3 → <3 be defined by

T(α) = Aα, where A = ( 1 0 1
                        1 1 2
                        2 1 3 ) and α = (a1, a2, a3)^T.

Is T onto? Is T one-to-one? Find a basis for range T and for ker T.

Solution: Given any β = (a, b, c)^T ∈ <3, where a, b and c are any real numbers, can we find α so that T(α) = β? We seek a solution of the linear system

( 1 0 1 ) ( a1 )   ( a )
( 1 1 2 ) ( a2 ) = ( b )
( 2 1 3 ) ( a3 )   ( c ),

and reducing the augmented matrix to echelon form gives

( 1 0 1 | a )
( 0 1 1 | b − a )
( 0 0 0 | c − a − b ).

Thus a solution exists only when c − a − b = 0, so T is not onto. To find a basis for range T, we note that

T(α) = Aα = (a1 + a3, a1 + a2 + 2a3, 2a1 + a2 + 3a3)^T
= a1(1, 1, 2)^T + a2(0, 1, 1)^T + a3(1, 2, 3)^T.

This means that {(1, 1, 2)^T, (0, 1, 1)^T, (1, 2, 3)^T} spans range T, i.e., range T is the subspace of <3 spanned by the columns of the matrix defining T. The first two vectors in this set are LI, and the third is the sum of the first two. Therefore, the first two vectors form a basis for range T, and dim(range T) = 2. To find ker T, we wish to find all α ∈ <3 so that T(α) = θ. Solving the resulting homogeneous system

a1 + a3 = 0, a1 + a2 + 2a3 = 0, 2a1 + a2 + 3a3 = 0,

we find that a1 = −a3 and a2 = −a3. Thus ker T consists of all vectors of the form k(−1,−1, 1)^T, where k ∈ <, and dim(ker T) = 1. As ker T ≠ {θ}, it follows that T is not one-to-one.
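The same conclusions follow directly from the rank of A; the short sketch below (an added numerical illustration, not from the text) verifies them:

```python
import numpy as np

A = np.array([[1, 0, 1],
              [1, 1, 2],
              [2, 1, 3]])
print(np.linalg.matrix_rank(A))    # 2 < 3, so T is neither onto nor one-to-one
print(A @ np.array([-1, -1, 1]))   # [0 0 0]: (-1, -1, 1)^T spans ker T
```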

Theorem 5.1.5 Let V and W be finite dimensional vector spaces over a field F and T : V → W be linear. Let β = {α1, α2, . . . , αn} be a basis of V; then

R(T) = Span(T(β)) = Span{T(α1), T(α2), . . . , T(αn)}.

Proof: Let γ ∈ ImT , then ∃ an element α ∈ V such that T (α) = γ. Since α ∈ V , we canwrite α = c1α1 + c2α2 + . . .+ cnαn, where c1, c2, . . . , cn are uniquely determined. Thus,

T (α) = T (c1α1 + c2α2 + . . .+ cnαn)= T (c1α1) + T (c2α2) + . . .+ T (cnαn)= c1T (α1) + c2T (α2) + . . .+ cnT (αn); asT is linear .

As each T(αi) ∈ Im T, it follows that Im T is generated by T(α1), T(α2), . . . , T(αn); thus T(α) is completely determined by the elements T(α1), T(α2), . . . , T(αn). When ker T = {θ}, the set {α1, α2, . . . , αn} being LI, {T(α1), T(α2), . . . , T(αn)} is a basis of Im T. Therefore, we conclude that if α1, α2, . . . , αn span a vector space V, then T(α1), T(α2), . . . , T(αn) span Im T, where T : V → W is linear.
Note: This theorem provides a method for finding a spanning set for the range of a linear transformation. For example, define a linear transformation

T : P2(<) → M2×2(<) by T(f(x)) = ( f(1) − f(2)   0
                                     0             f(0) ).

Since β = {1, x, x²} is a basis for P2(<), we have

R(T) = Span(T(β)) = Span{T(1), T(x), T(x²)}
= Span{ ( 0 0 ; 0 1 ), ( −1 0 ; 0 0 ), ( −3 0 ; 0 0 ) }
= Span{ ( 0 0 ; 0 1 ), ( −1 0 ; 0 0 ) }.

Thus we have found a basis for R(T), and so dim(R(T)) = 2.

Deduction 5.1.3 Let T : V → W be a linear transformation of an n-dimensional vector space V into a vector space W. Also, let S = {α1, α2, · · · , αn} be a basis of V. If α is an arbitrary vector in V, then T(α) is completely determined by T(α1), T(α2), . . . , T(αn).

Theorem 5.1.6 Let V and W be finite dimensional vector spaces over a field F and T :V →W be a linear mapping. Then ImT is finite dimensional.


Proof: Since V is finite dimensional, let dim V = n and S = {α1, α2, . . . , αn} be a basis of V. If β ∈ Im T, then ∃ a vector α ∈ V such that β = T(α). Since α ∈ V,

α = ∑_{i=1}^n ciαi for some ci ∈ F, so that
β = T(∑_{i=1}^n ciαi) = ∑_{i=1}^n ciT(αi); as T is linear.

Therefore, Im T = L{T(α1), T(α2), . . . , T(αn)}, i.e., Im T is generated by a finite set, so it is finite dimensional. Note that,

(i) if kerT = θ, then the images of a LI set of vectors in V are LI in W and, thenT (α1), T (α2), . . . , T (αn) is a basis of ImT .

(ii) If T : V → V be a linear mapping on V such that kerT = θ, then a basis of V ismapped onto another basis of V .

Ex 5.1.16 A mapping T : <3 → <3 is defined byT (x1, x2, x3) = (x1 + x2 + x3, 2x1 + x2 + 2x3, x1 + 2x2 + x3),

where (x1, x2, x3) ∈ <3. Show that T is a linear mapping. Find ImT and dimImT .

Solution: It can easily be verified that T is a linear mapping. If {α1, α2, α3} is a basis of the domain space <3, Im T is the linear span of the vectors T(α1), T(α2), T(α3). We know {e1, e2, e3}, where e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1), is the standard basis of <3, and

T(e1) = (1, 2, 1), T(e2) = (1, 1, 2), T(e3) = (1, 2, 1).

Since T(e1) = T(e3), Im T is the linear span of the vectors (1, 2, 1) and (1, 1, 2). Hence Im T = L{(1, 2, 1), (1, 1, 2)}. Since the vectors (1, 2, 1) and (1, 1, 2) are linearly independent, dim Im T = 2.

Ex 5.1.17 A mapping T : <3 → <4 is defined by
T(x, y, z) = (y + z, z + x, x + y, x + y + z),
where (x, y, z) ∈ <3. Show that {T(e1), T(e2), T(e3)} is a basis of Im T, where {e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1)} is the standard basis of <3.

Solution: First, we find ker T. Let α = (x, y, z) ∈ <3 be such that T(α) = θ; then

y + z = 0, z + x = 0, x + y = 0, x + y + z = 0
⇒ x = y = z = 0, i.e., ker T = {θ}.

Using the definition of T, we have T(e1) = (0, 1, 1, 1), T(e2) = (1, 0, 1, 1) and T(e3) = (1, 1, 0, 1). We show that the set {(0, 1, 1, 1), (1, 0, 1, 1), (1, 1, 0, 1)} is LI. Let c1, c2, c3 ∈ < be scalars such that

c1T(e1) + c2T(e2) + c3T(e3) = θ
⇒ (c2 + c3, c1 + c3, c1 + c2, c1 + c2 + c3) = (0, 0, 0, 0)
⇒ c1 = c2 = c3 = 0.

This shows that {T(e1), T(e2), T(e3)} is LI in <4; since {e1, e2, e3} spans <3, these images span Im T, hence {T(e1), T(e2), T(e3)} is a basis of Im T.


Ex 5.1.18 Let T : P1 → P2 be a linear transformation for which we know that T (x+ 1) =x2 − 1 and T (x− 1) = x2 + x. Find T (7x+ 3) and T (ax+ b).

Solution: It can easily be verified that {x + 1, x − 1} is a basis for P1. Also, 7x + 3 can be written as the linear combination 7x + 3 = 5(x + 1) + 2(x − 1); therefore,

T(7x + 3) = T(5(x + 1) + 2(x − 1)) = 5T(x + 1) + 2T(x − 1)
= 5(x² − 1) + 2(x² + x) = 7x² + 2x − 5.

Writing ax + b as a linear combination of the given basis vectors, we see that

ax + b = ((a + b)/2)(x + 1) + ((a − b)/2)(x − 1)
⇒ T(ax + b) = ((a + b)/2)T(x + 1) + ((a − b)/2)T(x − 1)
= ((a + b)/2)(x² − 1) + ((a − b)/2)(x² + x)
= ax² + ((a − b)/2)x − (a + b)/2.

Theorem 5.1.7 Let T : V → V be a linear transformation on a finite dimensional vectorspace V (F ). Then the following statements are equivalent:

(i) ImT ∩ kerT = θ.

(ii) T [T (α)] = θ ⇒ T (α) = θ; α ∈ V.

Proof: First, we suppose that (i) holds. Then,

T [T (α)] = θ ⇒ T (α) ∈ kerT and T (α) ∈ ImT ; as α ∈ V⇒ T (α) ∈ kerT ∩ ImT⇒ T (α) = θ; as ImT ∩ kerT = θ.

Thus (i) ⇒ (ii). Conversely, suppose that (ii) is true and, if possible, let (i) be not true. Then ∃ α(≠ θ) ∈ Im T ∩ ker T. Now

α ∈ Im T ∩ ker T ⇒ α ∈ Im T and α ∈ ker T
⇒ α = T(β) for some β ∈ V, and T(α) = θ
⇒ T[T(β)] = T(α) = θ
⇒ T(β) = θ; by (ii)
⇒ α = T(β) = θ,

which is a contradiction. Therefore Im T ∩ ker T = {θ}. Hence (ii) ⇒ (i), and the given statements are equivalent.

Theorem 5.1.8 (Sylvester’s Law) Let V and W be two vector spaces over a field F . Let Vbe a finite dimensional and T : V →W be linear, then

rank(T ) + nullity(T ) = dim(ImT ) + dim(kerT ) = dimV. (5.4)


Proof: We know that if V is a finite dimensional vector space, then both ker T and Im T are finite dimensional.
Case 1: Let ker T = V; then T(α) = θW for every α ∈ V. Hence Im T = {θW} and so dim(Im T) = 0. Thus,

dim(ker T) + dim(Im T) = dim V + 0 = dim V.

Hence the theorem holds in this case.
Case 2: Let ker T = {θ} and let {α1, α2, . . . , αn} be a basis of V, so that dim V = n. Then {T(α1), T(α2), . . . , T(αn)} is a basis of Im T, so dim Im T = n. Thus dim ker T = 0 and therefore,

dim(ker T) + dim(Im T) = 0 + n = dim V.

Case 3: Let ker T be a non-trivial proper subspace of V. Let S1 = {α1, α2, . . . , αm} be a basis of ker T, so that dim(ker T) = m. Then S1 is a LI subset of V and so it can be extended to form a basis of V. Let the extended basis of V be S2 = {α1, α2, . . . , αm, αm+1, . . . , αn}. We claim that S = {T(αm+1), . . . , T(αn)} is a basis of Im T. First we show that S spans Im T. Let β ∈ Im T, so ∃ α ∈ V such that β = T(α). Since S2 is a basis of V, we can find a unique set of scalars c1, c2, . . . , cn such that α = ∑_{i=1}^n ciαi. Then

β = T(α) = T(∑_{i=1}^n ciαi) = ∑_{i=1}^n ciT(αi); since T is linear
= ∑_{i=m+1}^n ciT(αi); as T(αi) = θ for i = 1, 2, . . . , m.

Thus, every element of Im T is expressible as a linear combination of elements of S and so S generates Im T. Now we show that S is LI. Suppose that

∑_{i=m+1}^n ciT(αi) = θ ⇒ T(∑_{i=m+1}^n ciαi) = θ; as T is linear
⇒ ∑_{i=m+1}^n ciαi ∈ Ker T
⇒ ∑_{i=m+1}^n ciαi = ∑_{j=1}^m bjαj; as S1 generates Ker T
⇒ ∑_{j=1}^m bjαj + ∑_{i=m+1}^n (−ci)αi = θ
⇒ b1 = b2 = . . . = bm = 0 and cm+1 = . . . = cn = 0; as S2 is LI.

This shows that S is LI and consequently S is a basis of Im T. Thus, rank(T) = dim(Im T) = n − m and nullity(T) = dim(Ker T) = m. Therefore,

rank(T) + nullity(T) = (n − m) + m = n = dim V.

This theorem is called the dimension theorem.
Result: Reflecting on the action of a linear transformation, we see intuitively that the larger the nullity, the smaller the rank: the more vectors that are carried into θ, the smaller the range. The same heuristic reasoning tells us that the larger the rank, the smaller the nullity. This balance between rank and nullity is made precise by the dimension theorem.
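The dimension theorem is easy to confirm mechanically for any particular matrix mapping; for instance, the sketch below (an added illustration using SymPy, not from the text) checks rank(A) + nullity(A) = n for the matrix of Ex 5.1.9:

```python
from sympy import Matrix

A = Matrix([[1, 2, 3, 1],
            [1, 3, 5, -2],
            [3, 8, 13, -3]])
rank = A.rank()                  # dim(Im A)
nullity = len(A.nullspace())     # dim(ker A)
print(rank, nullity, rank + nullity == A.cols)   # 2 2 True
```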


Deduction 5.1.4 If T : V → W is a linear transformation and dimV = dimW , then T isone-to-one if and only if T maps V onto W .

Proof: First suppose T is one-to-one, so that Ker T = {θ}, which means that dim(Ker T) = 0. Using Sylvester's theorem, we have,

dim(ker T) + dim(Im T) = dim V = dim W, by hypothesis.

Here dim(ker T) = 0; therefore dim(Im T) = dim W, which shows that T is onto. Conversely, let T be onto. Then Im T = W, which implies that dim(Im T) = dim W. We have,

dim(ker T) + dim(Im T) = dim V = dim W
⇒ dim(ker T) + dim W = dim W
⇒ dim(ker T) = 0 ⇒ ker T = {θ}.

Hence T is one-to-one.

Deduction 5.1.5 Surprisingly, the conditions of being one-to-one and onto are equivalent in an important special case.

Let V and W be vector space of equal(finite) dimension, and let T : V → W be linear.Then the following statements are equivalent:

(i) T is one-to-one.

(ii) T is onto.

(iii) rank(T ) = dim(V ).

Ex 5.1.19 Determine the linear mapping T : <3 → <3 that maps the basis (0, 1, 1), (1, 0, 1), (1, 1, 0) of <3 to the vectors (2, 1, 1), (1, 2, 1), (1, 1, 2) respectively. Find Ker T and Im T. Verify that dim Ker T + dim Im T = 3.

Solution: Let α1 = (0, 1, 1), α2 = (1, 0, 1), α3 = (1, 1, 0) and β1 = (2, 1, 1), β2 = (1, 2, 1), β3 = (1, 1, 2) ∈ <3. Let α = (x, y, z) be an arbitrary vector in <3; then ∃ unique scalars c1, c2, c3 ∈ < such that

α = c1α1 + c2α2 + c3α3
⇒ (x, y, z) = c1(0, 1, 1) + c2(1, 0, 1) + c3(1, 1, 0) = (c2 + c3, c1 + c3, c1 + c2)
⇒ c2 + c3 = x, c1 + c3 = y, c1 + c2 = z
⇒ c1 = (1/2)(y + z − x), c2 = (1/2)(x − y + z), c3 = (1/2)(x + y − z).

Since T is linear,

T(α) = c1T(0, 1, 1) + c2T(1, 0, 1) + c3T(1, 1, 0) = c1β1 + c2β2 + c3β3
= (1/2)(y + z − x)(2, 1, 1) + (1/2)(x − y + z)(1, 2, 1) + (1/2)(x + y − z)(1, 1, 2)
⇒ T(x, y, z) = (y + z, x + z, x + y); (x, y, z) ∈ <3,

which is the required linear transformation. Now let (x, y, z) ∈ Ker T; then

y + z = 0, x+ z = 0, x+ y = 0 ⇒ x = y = z = 0.


Hence Ker T = {θ} and so dim Ker T = 0. Also, Im T is the linear span of the vectors T(α1), T(α2), T(α3), i.e.

Im T = L{T(α1), T(α2), T(α3)},

where {α1, α2, α3} is any basis of the domain space <3. Since {(0, 1, 1), (1, 0, 1), (1, 1, 0)} is a basis of <3,

Im T = L{(2, 1, 1), (1, 2, 1), (1, 1, 2)}.

Now, as the set of vectors {(2, 1, 1), (1, 2, 1), (1, 1, 2)} is linearly independent, dim Im T = 3. Hence,

dim Ker T + dim Im T = 0 + 3 = 3.
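In matrix terms, T is determined by solving M·αi = βi for the matrix M of T; the sketch below (an added illustration, not from the text) recovers the formula T(x, y, z) = (y + z, x + z, x + y) numerically:

```python
import numpy as np

P = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)   # columns: alpha_1, alpha_2, alpha_3
B = np.array([[2, 1, 1],
              [1, 2, 1],
              [1, 1, 2]], dtype=float)   # columns: beta_1, beta_2, beta_3
M = B @ np.linalg.inv(P)                 # matrix of T w.r.t. the standard basis
print(np.round(M))                       # [[0,1,1],[1,0,1],[1,1,0]]: T(x,y,z) = (y+z, x+z, x+y)
print(int(np.linalg.matrix_rank(M)))     # 3, so ker T = {0} and Im T = R^3
```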

Ex 5.1.20 Let T : R2 → R2 be defined by T (x, y) = (x+ y, x). Prove that T is one-to-oneand onto.

Solution: It is easy to verify that T is a linear transformation and that N(T) = {θ}; so T is one-to-one. Since R2 is finite dimensional and T maps R2 into itself, T must also be onto (Deduction 5.1.4).

Ex 5.1.21 Determine the linear mapping T : R3 → R2 which maps the basis vectors(1, 0, 0), (0, 1, 0), (0, 0, 1) of R3 to the vectors (1, 1), (2, 3), (3, 2) respectively. Find KerTand ImT .

Solution: Let α = (x, y, z) be an arbitrary vector in <3, then

α = (x, y, z) = x(1, 0, 0) + y(0, 1, 0) + z(0, 0, 1)⇒ T (x, y, z) = xT (1, 0, 0) + yT (0, 1, 0) + zT (0, 0, 1)

= x(1, 1) + y(2, 3) + z(3, 2) = (x+ 2y + 3z, x+ 3y + 2z),

which is the required linear transformation. To find ker T, let β = (x, y, z) ∈ <3 be such that T(β) = θ; then

x + 2y + 3z = 0 = x + 3y + 2z ⇒ y = z and x = −5y.

Thus ker T = L{(−5, 1, 1)} and so dim ker T = 1. Let γ be an arbitrary vector in Im T; then γ can be expressed in the form

γ = x(1, 1) + y(2, 3) + z(3, 2).

Now {(1, 1), (2, 3), (3, 2)} forms a LD set but {(1, 1), (2, 3)} forms a LI set, so that Im T = L{(1, 1), (2, 3)} and dim Im T = 2.

Ex 5.1.22 Let T : P2(<) → P3(<) be a linear transformation defined by T(f(x)) = 2f′(x) + ∫_0^x 3f(t) dt. Prove that T is injective.

Solution: Let β = {1, x, x²} be a basis of P2(<); then

R(T) = span{T(1), T(x), T(x²)} = span{3x, 2 + (3/2)x², 4x + x³}.

Since {3x, 2 + (3/2)x², 4x + x³} is linearly independent, rank(T) = 3. Since dim(P3(<)) = 4, T is not onto. From the dimension theorem,

nullity(T) + 3 = 3 ⇒ nullity(T) = 0 ⇒ N(T) = {θ}.

Therefore T is one-to-one.


Deduction 5.1.6 Let us consider a system of m linear equations in n unknowns, Ax = B, where the coefficient matrix A may be viewed as a linear mapping A : Fn → Fm. Thus, the solution set of the equation Ax = B may be viewed as the preimage of the vector B ∈ Fm under the linear mapping A. Further, the solution set of the associated homogeneous system Ax = θ may be viewed as the kernel of the linear mapping A. Applying Sylvester's theorem to this homogeneous system yields

dim(ker A) = dim Fn − dim(Im A) = n − rank A.

Here n is exactly the number of unknowns in the homogeneous system Ax = θ. Note that if r is the rank of the coefficient matrix A, which is also the number of pivot variables in an echelon form of Ax = θ, then s = n − r is the number of free variables. Accordingly, dim W = s, where W is the solution space, and any s linearly independent solutions form a basis for W.

Theorem 5.1.9 (Linear mapping with prescribed images): Let V and W be two vector spaces over the same field F. Let {α1, α2, . . . , αn} be an ordered basis of the finite dimensional vector space V and β1, β2, . . . , βn be an arbitrary set (not necessarily distinct) of n vectors in W. Then ∃ a unique linear mapping T : V → W such that T(αi) = βi, for i = 1, 2, . . . , n.

Proof: Let α be an arbitrary element of V. The proof has three basic steps.
Step 1: Definition of the mapping: Since {α1, α2, . . . , αn} is a basis of V, ∃ a unique n-tuple (c1, c2, . . . , cn) of scalars such that

α = c1α1 + c2α2 + · · · + cnαn.

For this vector α let us define a mapping T : V → W by

T(α) = c1β1 + c2β2 + · · · + cnβn.

Then T is a well-defined rule associating with each vector α in V a vector T(α) in W. From the definition it is clear that T(αi) = βi; i = 1, 2, . . . , n. Since the constants ci are unique, T is well defined.
Step 2: Linearity of the mapping: Here we are to prove that T is linear. For this let α, β ∈ V, where α = ∑_{i=1}^n ciαi and β = ∑_{i=1}^n diαi, with the unique scalars ci, di ∈ F determined by the basis {α1, α2, . . . , αn}. Then for a, b ∈ F, we have

T(aα + bβ) = T[∑_{i=1}^n (aci + bdi)αi]
= ∑_{i=1}^n (aci + bdi)βi; by the definition of T, as aci + bdi ∈ F
= a ∑_{i=1}^n ciβi + b ∑_{i=1}^n diβi = aT(α) + bT(β).

Therefore T is linear.
Step 3: Uniqueness: To prove the uniqueness of T, let us assume ∃ another linear mapping U : V → W such that U(αi) = βi, for i = 1, 2, . . . , n. Thus

U(α) = U(∑_{i=1}^n ciαi) = ∑_{i=1}^n ciU(αi); as U is linear
= ∑_{i=1}^n ciβi = T(α); ∀ α ∈ V
⇒ U = T, i.e., T is unique.


Hence the theorem is proved. This shows that the linear transformation T with T(αi) = βi is unique. The theorem tells us that a linear mapping is completely determined by its values on the elements of a basis.
Corollary: Let V and W be vector spaces and suppose that V has a finite basis {α1, α2, . . . , αn}. If U, T : V → W are linear and U(αi) = T(αi) for i = 1, 2, . . . , n, then U = T.

Let T : <2 → <2 be a linear transformation defined by T (x, y) = (2y−x, 3x) and supposethat U : <2 → <2 is linear. If we know that U(1, 2) = (3, 3) and U(1, 1) = (1, 3) then U = T .This follows from the corollary and from the fact that (1, 2), (1, 1) is a basis for <2.

Ex 5.1.23 Let T : <2 → < where T (1, 1) = 3 and T (0, 1) = −2. Find T (x, y).

Solution: Let α = (1, 1), β = (0, 1); {(1, 1), (0, 1)} is a basis of <2. Hence the linear map T(x, y) exists and is unique. For a, b ∈ <, the linear combination of the vectors α and β is given by,

aα + bβ = a(1, 1) + b(0, 1) = (a, a + b)
⇒ T(aα + bβ) = aT(α) + bT(β)
⇒ T(a, a + b) = 3a − 2b
⇒ T(x, y) = 3x − 2(y − x) = 5x − 2y (putting x = a, y = a + b),

which is the unique linear transformation.

Ex 5.1.24 Describe the linear operator T on <3 that maps the basis vectors (1, 0, 0), (0, 1, 0), (0, 0, 1) to (1, 1, 1), (0, 1,−1) and (1, 2, 0) respectively. Find T, T(1, 1,−1) and T(2, 2,−2). [WBUT 2003]

Solution: Let α1 = (1, 0, 0), α2 = (0, 1, 0), α3 = (0, 0, 1). Also, let β = (b1, b2, b3) be anyelement of <3. We have to determine the expression for T (b1, b2, b3).

Now, there exist scalars c1, c2 and c3 such that β = c1α1 + c2α2 + c3α3. That is,(b1, b2, b3) = c1(1, 0, 0) + c2(0, 1, 0) + c3(0, 0, 1) = (c1, c2, c3).

These equations give, c1 = b1, c2 = b2, c3 = b3. Therefore,

T (β) = T (c1α1 + c2α2 + c3α3) = c1T (α1) + c2T (α2) + c3T (α3)= c1(1, 1, 1) + c2(0, 1,−1) + c3(1, 2, 0)= (c1 + c3, c1 + c2 + 2c3, c1 − c2).

Thus the required transformation isT (x, y, z) = (x+ z, x+ y + 2z, x− y).

Therefore,T (1, 1,−1) = (0, 0, 0) and T (2, 2,−2) = (0, 0, 0).

Ex 5.1.25 Find T (x, y) where T : <2 → <3 is defined by T (1, 2) = (3,−1, 5), T (0, 1) =(2, 1,−1). Also, find T (1, 1).

Solution: Let α1 = (1, 2) and α2 = (0, 1). Let β = (b1, b2) be any element of <2. Now, there exist scalars c1, c2 such that β = c1α1 + c2α2. That is,

(b1, b2) = c1(1, 2) + c2(0, 1) = (c1, 2c1 + c2).

Therefore, b1 = c1 and b2 = 2c1 + c2, i.e., c1 = b1, c2 = b2 − 2b1. Hence,

T (β) = c1T (α1) + c2T (α2) = c1(3,−1, 5) + c2(2, 1,−1)= (3c1 + 2c2,−c1 + c2, 5c1 − c2).

That is, T (x, y) = (−x+ 2y,−3x+ y, 7x− y) and hence T (1, 1) = (1,−2, 6).


5.2 Isomorphism

Let V and W be vector spaces over a field F. A linear mapping T : V → W is said to be an isomorphism if T is both one-to-one and onto. If there exists an isomorphism of V onto W, we say that V is isomorphic to W and we write V ≅ W.

(i) Since T is both one-to-one and onto, T is invertible and T−1 : W → V is also a linearmapping which is both one-to-one and onto.

(ii) The existence of an isomorphism T : V →W implies the existence of another isomor-phism T−1 : W → V . In this case, V and W are said to be isomorphic.

The importance of isomorphism lies in the fact that an isomorphic image of a vector space may be easier to study or to visualize than the original vector space. For example, <n is easier to visualize than a general n-dimensional real vector space. Note that not all properties of a vector space are reflected in its isomorphic images, for

(i) to study a vector space through its isomorphic image, we need to fix a basis of the vector space;

(ii) there may be operations and properties which are specific to a given vector space but have no counterpart in an isomorphic image.

For example, the product of two elements of Pn is defined in a natural way, but has no counterpart in <n. Similarly, angles and distances need not be preserved by an isomorphism from <n to itself.

Ex 5.2.1 If C be the vector space of all complex numbers over the field <, of all real numbers,then the mapping T : C → <2 defined by T (a+ ib) = (a, b); ∀(a+ ib) ∈ C is an isomorphism.

Solution: First we are to show that T is a linear transformation. For that, let α1, α2 ∈ Cwhere α1 = a1 + ib1 and α2 = a2 + ib2; ai, bi ∈ <. Now, ∀ x, y ∈ <, we have,

T(xα1 + yα2) = T[x(a1 + ib1) + y(a2 + ib2)]
= T[(xa1 + ya2) + i(xb1 + yb2)]
= (xa1 + ya2, xb1 + yb2); by definition
= x(a1, b1) + y(a2, b2) = xT(α1) + yT(α2); ∀ α1, α2 ∈ C.

= x(a1, b1) + y(a2, b2) = xT (α1) + yT (α2); ∀ α1, α2 ∈ C.

Hence T is linear mapping. Now,

T (α1) = T (α2) ⇒ T (a1 + ib1) = T (a2 + ib2)⇒ (a1, b1) = (a2, b2)⇒ a1 = a2 ; b1 = b2

⇒ a1 + ib1 = a2 + ib2 ⇒ α1 = α2; ∀ α1, α2 ∈ C

Therefore T is one-to-one. Also for each (a, b) ∈ <2, ∃ a complex number (a+ ib) such thatT (a+ ib) = (a, b). So T is onto. Hence T is an isomorphism and therefore C(<) ∼= <2(<).

Ex 5.2.2 The linear mapping T : R3 → R3 maps the vectors (1, 2, 3), (3, 0, 1) and (0, 3, 1)to (−3, 0,−2), (−5, 2,−2) and (4,−1, 1) respectively. Show that T is an isomorphism.

Solution: Let α = (x, y, z) be an arbitrary vector in <3; then

α = (x, y, z) = c1(1, 2, 3) + c2(3, 0, 1) + c3(0, 3, 1) = (c1 + 3c2, 2c1 + 3c3, 3c1 + c2 + c3)
⇒ c1 + 3c2 = x, 2c1 + 3c3 = y, 3c1 + c2 + c3 = z
⇒ c1 = (1/6)(−x − y + 3z), c2 = (1/18)(7x + y − 3z), c3 = (1/9)(x + 4y − 3z).


Thus the linear transformation becomes

T(x, y, z) = c1(−3, 0,−2) + c2(−5, 2,−2) + c3(4,−1, 1)
= ((−3x + 6y − 6z)/3, (2x − y)/3, (−x + 2y − 3z)/3).

Now we show that T is one-to-one; for this we show that ker T = {θ}. T(α) = θ gives

(−3x + 6y − 6z)/3 = 0, (2x − y)/3 = 0, (−x + 2y − 3z)/3 = 0
⇒ x = y = z = 0,

so that ker T = {θ} and T is one-to-one. Since dim <3 = 3 is finite and T is a one-to-one linear operator on <3, T is also onto (Deduction 5.1.4), so Im T = <3 and consequently T is an isomorphism.
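Since the question is only whether T is invertible, one can also test the determinant of the matrix of T built from the given data; the sketch below (an added illustration, not from the text) does exactly that:

```python
import numpy as np

P = np.array([[1, 3, 0],
              [2, 0, 3],
              [3, 1, 1]], dtype=float)   # columns: (1,2,3), (3,0,1), (0,3,1)
B = np.array([[-3, -5, 4],
              [0, 2, -1],
              [-2, -2, 1]], dtype=float) # columns: their prescribed images
M = B @ np.linalg.inv(P)                 # matrix of T in the standard basis
print(np.linalg.det(M))                  # 1/3 (approximately): nonzero, so T is an isomorphism
```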

Theorem 5.2.1 Let V be a finite dimensional vector space of dimension n over a field F. Then V(F) is isomorphic to the n-tuple space Fn(F).

Proof: Since dim V = n, let {α1, α2, . . . , αn} be a basis of the finite dimensional vector space V(F). Then, for each α ∈ V, ∃ a unique ordered set c1, c2, . . . , cn of scalars such that α = ∑_{i=1}^n ciαi. Let us consider the mapping T : V → Fn defined by,

T(α) = (c1, c2, . . . , cn); ∀ α = ∑_{i=1}^n ciαi ∈ V.

If α = ∑_{i=1}^n ciαi and β = ∑_{i=1}^n diαi are any two elements of V, then ∀ x, y ∈ F we have xα + yβ = ∑_{i=1}^n (xci + ydi)αi, so that

T(xα + yβ) = (xc1 + yd1, xc2 + yd2, . . . , xcn + ydn)
= x(c1, c2, . . . , cn) + y(d1, d2, . . . , dn)
= xT(α) + yT(β); ∀ α, β ∈ V.

Hence T is a homomorphism. Also,

T(α) = T(β) ⇒ (c1, c2, . . . , cn) = (d1, d2, . . . , dn)
⇒ ci = di; ∀ i = 1, 2, . . . , n
⇒ ∑_{i=1}^n ciαi = ∑_{i=1}^n diαi ⇒ α = β.

This shows that T is one-one. Again, corresponding to each vector (c1, c2, . . . , cn) ∈ Fn, ∃ a vector ∑_{i=1}^n ciαi in V such that T(∑_{i=1}^n ciαi) = (c1, c2, . . . , cn). Therefore, T is onto. Thus


T is an isomorphism and hence V ≅ Fn(F). Thus the mapping α → [α]S, where S is any basis of a vector space V (of dim n), which maps each α ∈ V to its co-ordinate vector [α]S, is an isomorphism between V and Fn. Note the following facts:

(i) The isomorphism T depends on the choice of the ordered basis {α1, α2, . . . , αn}; different choices of ordered basis give different isomorphisms.

(ii) Since T : V → Fn is an isomorphism, both V and Fn have the same structure asa vector space except for the names of their elements. Therefore, Fn serves as aprototype of a vector space V over F of dimension n.

(iii) A real vector space V of dimension n and the vector space <n of n−tuple are isomorphicand therefore they have the same structure as vector spaces.

(iv) Let (i1, i2, · · · , in) be a fixed permutation of (1, 2, · · · , n). Then the mapping T defined by T(x1, x2, · · · , xn) = (xi1, xi2, · · · , xin) is an isomorphism from Fn to itself. Such a transformation is called a permutation transformation.

All the concepts and properties defined only through vector addition and scalar multiplica-tion are valid in any isomorphic image of a vector space.

Theorem 5.2.2 If W1 and W2 are complementary subspaces of a vector space V(F), then the correspondence that assigns to each vector α ∈ W2 the coset W1 + α is an isomorphism between W2 and V/W1.

Proof: Since W1, W2 are complementary subspaces of the vector space V(F), we have V = W1 ⊕ W2; that is, V = W1 + W2 and W1 ∩ W2 = {θ}.

Let us consider the mapping T : W2 → V/W1, defined by T(α) = W1 + α; ∀ α ∈ W2. Now, ∀ α, β ∈ W2 and a, b ∈ F, we have

T(aα + bβ) = W1 + (aα + bβ) = (W1 + aα) + (W1 + bβ)
= a(W1 + α) + b(W1 + β) = aT(α) + bT(β).

This shows that T is a linear transformation. Also, for all α, β ∈ W2 we have,

T(α) = T(β) ⇒ W1 + α = W1 + β
⇒ α − β ∈ W1
⇒ α − β ∈ W1 ∩ W2; as α, β ∈ W2 ⇒ α − β ∈ W2
⇒ α − β = θ; as W1 ∩ W2 = {θ}
⇒ α = β.

So T is one-to-one.

Now, let W1 + γ be an arbitrary element of V/W1, so that γ ∈ V. Thus γ = α + β, for some α ∈ W1 and β ∈ W2. Hence

W1 + γ = W1 + (α + β) = (W1 + α) + (W1 + β)
= W1 + β = T(β); as α ∈ W1 ⇒ W1 + α = W1.

Thus, corresponding to each member W1 + γ of V/W1, ∃ β ∈ W2 such that T(β) = W1 + γ. So T is onto. Hence T is an isomorphism and W2 ≅ V/W1.

Theorem 5.2.3 Two finite dimensional vector spaces over the same field are isomorphic ifand only if they are of the same dimension.


Proof: First let V and W be two finite dimensional vector spaces over the same field F such that dim V = dim W = n. Let S1 = {α1, α2, . . . , αn} and S2 = {β1, β2, . . . , βn} be bases of V and W respectively. Then ∀ α ∈ V, ∃ a unique ordered set c1, c2, . . . , cn of scalars such that α = ∑_{i=1}^n ciαi. Let us consider the mapping T : V → W, defined by

T(α) = ∑_{i=1}^n ciβi; ∀ α = ∑_{i=1}^n ciαi ∈ V.

The map is well defined. Now, if α = ∑_{i=1}^n ciαi and β = ∑_{i=1}^n diαi are any two vectors in V, then ∀ a, b ∈ F we have

T(aα + bβ) = T(∑_{i=1}^n (aci + bdi)αi) = ∑_{i=1}^n (aci + bdi)βi
= a ∑_{i=1}^n ciβi + b ∑_{i=1}^n diβi = aT(α) + bT(β).

This shows that T is a linear transformation, i.e., a homomorphism. Also, if α = ∑_{i=1}^n ciαi and β = ∑_{i=1}^n diαi are such that T(α) = T(β), then

∑_{i=1}^n ciβi = ∑_{i=1}^n diβi, i.e., ∑_{i=1}^n (ci − di)βi = θ
⇒ ci − di = 0; ∀ i, as S2 is LI
⇒ ∑_{i=1}^n ciαi = ∑_{i=1}^n diαi ⇒ α = β.

Therefore T is one-one. Again, if γ ∈ W, then for some scalars ci we have

γ = ∑_{i=1}^n ciβi; as S2 generates W
= T(∑_{i=1}^n ciαi), where ∑_{i=1}^n ciαi ∈ V.

Thus, for each γ ∈ W, ∃ ∑_{i=1}^n ciαi ∈ V such that T(∑_{i=1}^n ciαi) = γ. This shows that T is onto and hence V ≅ W. Conversely, let V ≅ W and let T be a corresponding isomorphism. Let S1 = {α1, α2, . . . , αn} be a basis of V. Then we claim that S2 = {T(α1), T(α2), . . . , T(αn)} is a basis of W. S2 is linearly independent, since

c1T (α1) + c2T (α2) + · · ·+ cnT (αn) = θ

⇒ T (c1α1 + c2α2 + · · ·+ cnαn) = T (θ); since T is LT⇒ c1α1 + c2α2 + · · ·+ cnαn = θ; since T is one-one⇒ c1 = c2 = · · · = cn = 0; as S1 is LI .


Now, in order to show that W = L{T(α1), T(α2), . . . , T(αn)}, let β be an arbitrary element of W; then, T being onto, ∃ α ∈ V such that T(α) = β. Thus, for some scalars ci, we have,

β = T(α) = T(∑_{i=1}^n ciαi); as S1 generates V
= ∑_{i=1}^n ciT(αi); as T is linear.

Thus every vector in W is a linear combination of elements of S2 and so, S2 generates W. Hence S2 is a basis of W and so, dim V = n = dim W.

Theorem 5.2.4 Let V and W be finite dimensional vector spaces over a field F and φ :V → W be an isomorphism. Then for a set of vectors S in V , S is linearly independent inV if and only if φ(S) is linearly independent in W.

Proof: Let S = α1, α2, . . . , αn be a linearly independent set in V . For some scalarsc1, c2, · · · , cn ∈ F , let us consider the relation, c1φ(α1) + c2φ(α2) + . . . + cnφ(αn) = θW ,which implies,

φ(c1α1 + c2α2 + . . .+ cnαn) = θW

⇒ c1α1 + c2α2 + . . .+ cnαn = θV , as φ is isomorphism⇒ c1 = c2 = · · · = cn = 0,

as {α1, α2, . . . , αn} is a linearly independent set in V. Therefore, {φ(α1), φ(α2), · · · , φ(αn)} is linearly independent in W. Conversely, let {β1, β2, . . . , βr} be a linearly independent set in W. Since φ is an isomorphism, ∃ unique elements α1, α2, . . . , αr in V such that

φ(αi) = βi, i = 1, 2, · · · , r,

and we take S = {α1, α2, . . . , αr}. For some scalars c1, c2, · · · , cr ∈ F, let us consider the relation c1α1 + c2α2 + . . . + crαr = θV, which implies

φ(c1α1 + c2α2 + . . . + crαr) = θW
⇒ c1φ(α1) + c2φ(α2) + . . . + crφ(αr) = θW; as φ is linear
⇒ c1β1 + c2β2 + . . . + crβr = θW
⇒ c1 = c2 = · · · = cr = 0,

as {β1, β2, . . . , βr} is a linearly independent set in W; therefore {α1, α2, . . . , αr} is a linearly independent set in V.

Theorem 5.2.5 Any homomorphism of a finite dimensional vector space onto itself is anisomorphism.

Proof: Let T be a homomorphism of a finite dimensional vector space V(F) onto itself. Let dim V = n and S = {α1, α2, . . . , αn} be a basis of V. We are to show that S1 = {T(α1), T(α2), . . . , T(αn)} is also a basis of V. Let α ∈ V. Since T is onto, ∃ β ∈ V such that T(β) = α. Let

β = c1α1 + c2α2 + · · · + cnαn; ci ∈ F
⇒ α = T(β) = T(c1α1 + c2α2 + · · · + cnαn)
= c1T(α1) + c2T(α2) + · · · + cnT(αn); as T is linear.

Thus, each α ∈ V is expressible as a linear combination of elements of S1, so S1 generates V. Now, dim V = n and S1 is a set of n vectors of V generating V, so S1 is a linearly independent


set. Now we shall show that T is one-one. For this, let α = ∑_{i=1}^n ciαi and β = ∑_{i=1}^n diαi be two elements of V such that T(α) = T(β). Then

T(α) = T(β) ⇒ T(∑_{i=1}^n ciαi) = T(∑_{i=1}^n diαi)
⇒ ∑_{i=1}^n ciT(αi) = ∑_{i=1}^n diT(αi); as T is linear
⇒ ∑_{i=1}^n (ci − di)T(αi) = θ
⇒ ci − di = 0; ∀ i, as S1 is LI
⇒ ∑_{i=1}^n ciαi = ∑_{i=1}^n diαi ⇒ α = β; ∀ α, β ∈ V.

Hence T is also one-one and consequently it is an isomorphism.

5.3 Vector Space of Linear Transformation

In this section, we define algebraic operations on the set of all linear mappings.

Theorem 5.3.1 (Algebraic operations on the set of all linear mappings): Let V and W be two vector spaces over the same field F and let T : V → W, S : V → W be two linear mappings. Then the set L(V, W) of all linear transformations from V into W is a vector space with respect to the sum T + S : V → W and the scalar multiplication cT : V → W, defined by

(i) (T + S)(α) = T (α) + S(α); ∀α ∈ V.

(ii) (cT )(α) = cT (α); ∀α ∈ V and c ∈ F.

Proof: First we show that if T and S are linear then T + S and cT are also linear transformations. For this, let α, β ∈ V; then

(T + S)(α) = T(α) + S(α); (T + S)(β) = T(β) + S(β)
(T + S)(α + β) = T(α + β) + S(α + β); by definition
= T(α) + T(β) + S(α) + S(β); T, S linear
= [T(α) + S(α)] + [T(β) + S(β)]
= (T + S)(α) + (T + S)(β).

For any scalar k ∈ F, we have,

(T + S)(kα) = T(kα) + S(kα); by definition
= kT(α) + kS(α); T, S linear
= k[T(α) + S(α)] = k(T + S)(α).

Hence T + S is linear mapping. Again,

(cT )(α+ β) = cT (α+ β); by definition= c[T (α) + T (β)]; T is linear= cT (α) + cT (β) = (cT )(α) + (cT )(β).


and (cT )(kα) = c[T (kα)]; by definition, k ∈ F= c[kT (α)]; T is linear= ckT (α) = k(cT )(α).

Therefore, cT is also a linear transformation. Thus the scalar multiplication composition iswell defined. It is easy to prove that 〈L(V,W ),+〉 is an abelian group. However, it may benoted that the mapping

θ : V →W : θ(α) = θ;∀α ∈ V

is a linear transformation and θ is the zero of L(V,W ). Also, for each T ∈ L(V,W ), themapping (−T ), defined by

(−T )(α) = −T (α); ∀α ∈ V

is a linear transformation, which is the additive inverse of T . Also, the two compositionssatisfy the following properties

(i) [k(T + S)](α) = k[(T + S)(α)] = k[T (α) + S(α)]; k ∈ F= kT (α) + kS(α)= (kT )(α) + (kS)(α) = (kT + kS)(α); ∀α ∈ V

⇒ k(T + S) = kT + kS.

(ii) [(m+ n)T ](α) = (m+ n)T (α); m,n ∈ F= mT (α) + nT (α) = (mT )(α) + (nT )(α)= (mT + nT )(α); ∀α ∈ V

⇒ (m+ n)T = mT + nT.

(iii) [(mn)T ](α) = (mn)T (α); m,n ∈ F= m[nT (α)] = m[(nT )(α)]= [m(nT )](α); ∀α ∈ V

⇒ (mn)T = m(nT ).

(iv) (1T )(α) = 1T (α) = T (α) ⇒ 1T = T,

1 being the identity element in F. Therefore, L(V, W) is a vector space. This linear space L(V, W) of all linear mappings has domain V and co-domain W; as a linear mapping T : V → W is also a homomorphism of V into W, the linear space L(V, W) is also denoted by Hom(V, W). Two particular cases of the linear space L(V, W) are of profound interest: the first is L(V, V) and the second is L(V, F), where F is regarded as a vector space over itself. The linear space L(V, F) is said to be the dual space of V.

Theorem 5.3.2 Let V and W be two finite dimensional vector spaces over the same field F .Let dimV = m and dimW = n, then, the vector space L(V,W ) of all linear transformationsfrom V into W is of dimension mn.

Proof: Let B1 = {α1, α2, · · · , αm} and B2 = {β1, β2, · · · , βn} be ordered bases of V and W respectively. Then, for each pair of positive integers (x, y), where 1 ≤ x ≤ n and 1 ≤ y ≤ m, there exists a linear transformation fxy : V → W such that

fxy(αi) = δiy βx; δiy being the Kronecker delta,

i.e., fxy sends αy to βx and every other basis vector of B1 to θ. We shall show that the set S = {fxy : 1 ≤ x ≤ n; 1 ≤ y ≤ m} is a basis of L(V, W). It is clear that S contains mn elements. Now, suppose for some scalars axy,

∑_{x=1}^{n} ∑_{y=1}^{m} axy fxy = θ; θ is the zero of L(V,W)

⇒ [∑_{x=1}^{n} ∑_{y=1}^{m} axy fxy](αi) = θ(αi) = θ

⇒ ∑_{x=1}^{n} ∑_{y=1}^{m} axy fxy(αi) = ∑_{x=1}^{n} ∑_{y=1}^{m} axy δiy βx = θ

⇒ ∑_{x=1}^{n} axi βx = θ; ∀i, (1 ≤ i ≤ m)

⇒ a1i = 0, a2i = 0, ..., ani = 0; ∀i, as B2 is LI
⇒ axy = 0, 1 ≤ x ≤ n; 1 ≤ y ≤ m.

This shows that S is LI. Now, let f be an arbitrary element of L(V,W). Then f(αi) ∈ W for each i = 1, 2, ..., m. Let f(αi) = a1iβ1 + a2iβ2 + ... + aniβn. Now,

[∑_{x=1}^{n} ∑_{y=1}^{m} axy fxy](αi) = ∑_{x=1}^{n} ∑_{y=1}^{m} axy fxy(αi) = ∑_{x=1}^{n} ∑_{y=1}^{m} axy δiy βx
= ∑_{x=1}^{n} axi βx = f(αi)

⇒ f = ∑_{x=1}^{n} ∑_{y=1}^{m} axy fxy.

Thus, every element of L(V,W) is a linear combination of elements of S, i.e., S generates L(V,W). So S is a basis of L(V,W). Hence, dim L(V,W) = mn.
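As an illustrative sketch (an assumption added here, not in the text): when V = F^m and W = F^n carry their standard bases, the basis maps fxy of the proof are exactly the matrix units Exy, and any linear map decomposes as f = ∑ axy fxy, so dim L(V,W) = mn.

```python
# Sketch: the mn matrix units E_xy form a basis of the n x m matrices.
import numpy as np

m, n = 3, 2                                   # dim V = 3, dim W = 2
units = {}
for x in range(n):
    for y in range(m):
        E = np.zeros((n, m))
        E[x, y] = 1.0                         # E_xy = matrix of f_xy
        units[(x, y)] = E

A = np.array([[1., 2., 3.], [4., 5., 6.]])    # an arbitrary f in L(V, W)
recon = sum(A[x, y] * units[(x, y)] for (x, y) in units)
assert np.allclose(recon, A)                  # f = sum a_xy f_xy (mn = 6 terms)
```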

5.3.1 Product of Linear Mappings

Let U, V and W be three vector spaces over the same field F (not necessarily of the same dimension). Let T : U → V and S : V → W be two linear mappings. Then their product, denoted by S∘T : U → W, is the composite mapping (composite of S with T) defined by

(S∘T)(α) = S(T(α)); ∀α ∈ U.

Generally S∘T is written simply as ST and is also said to be the product mapping ST. In general ST ≠ TS.

Ex 5.3.1 Let T : P2 → P2 and S : P2 → P2 be defined by T(ax² + bx + c) = 2ax + b and S(ax² + bx + c) = 2ax² + bx. Compute TS and ST.

Solution: To compute TS and ST we have,

(TS)(ax² + bx + c) = T(S(ax² + bx + c)) = T(2ax² + bx) = 4ax + b.
(ST)(ax² + bx + c) = S(T(ax² + bx + c)) = S(2ax + b) = 2ax.

Theorem 5.3.3 The product of two linear transformations is linear.


Proof: Let U, V and W be three vector spaces over the same field F. Let T : U → V and S : V → W be two linear mappings. We are to show that ST : U → W is linear. Let α, β ∈ U and a, b ∈ F. Then we have

(ST)(aα + bβ) = S[T(aα + bβ)]; by definition
= S[aT(α) + bT(β)]; T is linear
= aS[T(α)] + bS[T(β)]; S is linear
= a(ST)(α) + b(ST)(β).

This proves that the composite mapping ST : U → W is linear.

Theorem 5.3.4 The product of linear mappings is associative.

Proof: Let T1 : V1 → V2, T2 : V2 → V3, T3 : V3 → V4 be three linear transformations such that T3T2T1 is well defined. Let α ∈ V1. Then

[T3(T2T1)](α) = T3[(T2T1)(α)] = T3[T2(T1(α))]
= (T3T2)[T1(α)] = [(T3T2)T1](α)

⇒ T3(T2T1) = (T3T2)T1.

Hence the product of linear mappings is associative.

Theorem 5.3.5 Let U, V and W be finite dimensional vector spaces over the same field F. If S and T are linear transformations from U into V and from V into W respectively, then rank(TS) ≤ rank(T) and rank(TS) ≤ rank(S).

Proof: Here S : U → V and T : V → W are two linear mappings, so that TS : U → W. Since S(U) ⊆ V, we have

S(U) ⊆ V ⇒ T[S(U)] ⊆ T(V) ⇒ (TS)(U) ⊆ T(V)
⇒ range(TS) ⊆ range(T)
⇒ dim[range(TS)] ≤ dim[range(T)].

Hence rank(TS) ≤ rank(T). Again, a linear mapping cannot increase dimension, so

dim[T(S(U))] ≤ dim S(U) ⇒ rank(TS) = dim[(TS)(U)] = dim[T(S(U))] ≤ dim S(U) = rank(S).
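A quick numerical illustration, not from the text (the dimensions and the rank-one S are assumptions chosen for the demonstration): for matrices representing S : U → V and T : V → W, the rank of the product never exceeds either factor's rank.

```python
# Sketch of Theorem 5.3.5 with matrices: rank(TS) <= min(rank T, rank S).
import numpy as np

rng = np.random.default_rng(1)
MS = rng.standard_normal((4, 5))      # S : U(=R^5) -> V(=R^4)
MS[1:] = 0                            # force rank(S) = 1
MT = rng.standard_normal((3, 4))      # T : V -> W(=R^3)
r = np.linalg.matrix_rank
assert r(MT @ MS) <= min(r(MT), r(MS))
```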

Ex 5.3.2 Let S and T be linear mappings of ℝ² to ℝ² defined by S(x, y) = (x + y, y), (x, y) ∈ ℝ², and T(x, y) = (x, x + y), (x, y) ∈ ℝ². Determine TS, ST, (TS − ST)², (ST − TS)².

Solution: Both linear mappings S and T are defined on ℝ². The composite mappings ST : ℝ² → ℝ² and TS : ℝ² → ℝ² are given by

ST(x, y) = S(T(x, y)) = S(x, x + y) = (2x + y, x + y); (x, y) ∈ ℝ².
TS(x, y) = T(S(x, y)) = T(x + y, y) = (x + y, x + 2y); (x, y) ∈ ℝ².


To evaluate (TS − ST)² and (ST − TS)², we first calculate TS − ST and ST − TS, which are given by

(TS − ST)(x, y) = (−x, y); (x, y) ∈ ℝ²
(ST − TS)(x, y) = (x, −y); (x, y) ∈ ℝ²

⇒ (TS − ST)²(x, y) = (TS − ST)(TS − ST)(x, y) = (TS − ST)(−x, y) = (x, y); (x, y) ∈ ℝ²,
and (ST − TS)²(x, y) = (ST − TS)(ST − TS)(x, y) = (ST − TS)(x, −y) = (x, y); (x, y) ∈ ℝ².

Now, ker(TS − ST)² = ker(ST − TS)² = {θ}. Thus we see that (TS − ST)² = I_{ℝ²} = (ST − TS)².

Ex 5.3.3 Let V be the linear space of all real polynomials p(x). Let D and T be linear mappings on V defined by D(p(x)) = (d/dx)p(x), p(x) ∈ V, and T(p(x)) = xp(x), p(x) ∈ V. Show that DT − TD = I_V and DT² − T²D = 2T.

Solution: The linear mappings D and T on V are defined by D(p(x)) = (d/dx)p(x) and T(p(x)) = xp(x), p(x) ∈ V. Now,

(DT − TD)(p(x)) = D(T(p(x))) − T(D(p(x)))
= D(xp(x)) − T((d/dx)p(x)) = (d/dx)(xp(x)) − x(d/dx)p(x)
= x(d/dx)p(x) + p(x) − x(d/dx)p(x) = p(x).

Therefore, DT − TD ≡ I_V. Now,

DT²p(x) = DT(T(p(x))) = DT(xp(x)) = D(T(xp(x))) = D(x²p(x)) = 2xp(x) + x²(d/dx)p(x)
T²Dp(x) = T²(Dp(x)) = T²((d/dx)p(x)) = TT((d/dx)p(x)) = T(x(d/dx)p(x)) = x²(d/dx)p(x)

⇒ (DT² − T²D)p(x) = 2xp(x) = 2Tp(x) ⇒ DT² − T²D = 2T.
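The identities just proved can also be checked symbolically; the following sketch (not part of the text, using Python's sympy) verifies both on an arbitrary differentiable p.

```python
# Symbolic check of DT - TD = I and DT^2 - T^2 D = 2T, with D = d/dx, T = x*.
import sympy as sp

x = sp.symbols('x')
p = sp.Function('p')(x)              # an arbitrary polynomial (or function)
D = lambda q: sp.diff(q, x)
T = lambda q: x * q

assert sp.simplify(D(T(p)) - T(D(p)) - p) == 0               # (DT - TD)p = p
assert sp.simplify(D(T(T(p))) - T(T(D(p))) - 2*T(p)) == 0    # (DT^2 - T^2 D)p = 2Tp
```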

Ex 5.3.4 Let T1 : ℝ³ → ℝ³ and T2 : ℝ³ → ℝ³ be defined by T1(x, y, z) = (2x, y − z, 0) and T2(x, y, z) = (x + y, 2x, 2z). Find the formulae defining the mappings

(a) T1 + T2, (b) 3T1 − 2T2, (c) T1T2, (d) T2T1².

Solution: Using the given transformations,

(a) (T1 + T2)(x, y, z) = T1(x, y, z) + T2(x, y, z) = (2x, y − z, 0) + (x + y, 2x, 2z)
= (3x + y, 2x + y − z, 2z).

(b) (3T1 − 2T2)(x, y, z) = 3T1(x, y, z) − 2T2(x, y, z) = 3(2x, y − z, 0) − 2(x + y, 2x, 2z)
= (4x − 2y, −4x + 3y − 3z, −4z).

(c) T1T2(x, y, z) = T1(x + y, 2x, 2z) = (2x + 2y, 2x − 2z, 0).

(d) T2T1²(x, y, z) = T2T1T1(x, y, z) = T2T1(2x, y − z, 0) = T2(4x, y − z, 0) = (4x + y − z, 8x, 0).


Ex 5.3.5 Give an example of a linear transformation T : ℝ² → ℝ² such that T²(α) = −α for all α ∈ ℝ². [IIT-JAM'10]

Solution: We seek a linear transformation T : ℝ² → ℝ² such that T²(α) = −α for all α ∈ ℝ², i.e., a matrix T with

T² =
[ −1   0 ]
[  0  −1 ]

Let

T =
[ a  b ]
[ c  d ]

then

T² =
[ a² + bc   ab + bd ]   [ −1   0 ]
[ ac + cd   bc + d² ] = [  0  −1 ]

⇒ a² + bc = −1; ab + bd = 0; ac + cd = 0; bc + d² = −1.

If b = c = 0, then a² = −1 and d² = −1, i.e., a = ±i and d = ±i, so that

T =
[ i  0 ]        [ −i   0 ]
[ 0  i ]  and   [  0  −i ]

which are not real. So suppose b ≠ 0 or c ≠ 0; then ab + bd = b(a + d) = 0 and ac + cd = c(a + d) = 0 force a + d = 0, i.e., d = −a, and the remaining conditions reduce to bc = −(1 + a²). Taking a = 1, d = −1 and b = √2, c = −√2 (so that bc = −2 = −(1 + a²)), such a matrix is

T =
[  1    √2 ]
[ −√2   −1 ]
⇒ T² =
[ −1   0 ]
[  0  −1 ]

Therefore, over the real field,

T =
[  1    √2 ]
[ −√2   −1 ]

serves; the rotation T(x, y) = (−y, x), with matrix

[ 0  −1 ]
[ 1   0 ]

is the simplest such example.
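Both answers can be verified numerically; the following is a short sketch (not in the original) checking T² = −I for the corrected matrix above and for the 90° rotation.

```python
# Check that T @ T = -I for two real 2x2 matrices.
import numpy as np

T1 = np.array([[1.0, np.sqrt(2)], [-np.sqrt(2), -1.0]])
T2 = np.array([[0.0, -1.0], [1.0, 0.0]])   # rotation through 90 degrees
for T in (T1, T2):
    assert np.allclose(T @ T, -np.eye(2))
```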

5.3.2 Invertible Mapping

Let V and W be vector spaces over a field F. A linear mapping T : V → W is said to be invertible if ∃ a mapping S : W → V such that ST = I_V and TS = I_W, where I_V and I_W are the identity operators on V and W respectively. In this case, S is said to be an inverse of T and is denoted by T⁻¹.

Theorem 5.3.6 Let V and W be vector spaces over the same field F. If T : V → W is invertible, then T has a unique inverse.

Proof: To prove the uniqueness, let T1 : W → V and T2 : W → V be two inverses of T. Then by definition,

T1T = T2T = I_V; TT1 = TT2 = I_W.

Now, using the associative law, we get

T1(TT2) = (T1T)T2 ⇒ T1I_W = I_V T2 ⇒ T1 = T2.

This proves that the inverse of T is unique, and the inverse of the invertible linear mapping T is denoted by T⁻¹. Also, T⁻¹ : W → V is a linear transformation and (T⁻¹)⁻¹ = T.

Theorem 5.3.7 Let V and W be vector spaces over the same field F. A linear mapping T : V → W is invertible if and only if T is one-to-one and onto.


Proof: Let the linear mapping T : V → W be invertible; then ∃ a mapping S : W → V such that ST = I_V and TS = I_W. First, we show that T is one-to-one and onto. For this, let α, β ∈ V be such that T(α) = T(β). Then

T(α) = T(β) ⇒ ST(α) = ST(β)
⇒ I_V(α) = I_V(β); since ST = I_V
⇒ α = β.

Therefore, T is one-to-one. To prove that T is onto, let δ ∈ W. As TS = I_W, we have T(S(δ)) = δ, which shows that S(δ) is a pre-image of δ under T. Therefore T is onto. Thus, when T is invertible, it is both one-to-one and onto.

Conversely, let T : V → W be both one-to-one and onto. Let α ∈ V be such that T(α) = β ∈ W. As T is one-to-one, α is the unique pre-image of β under T. Since T is onto, each β ∈ W has a pre-image in V. Let us define a mapping S : W → V by S(β) = α. Then

ST(α) = S(β) = α; ∀α ∈ V
and TS(β) = T(α) = β; ∀β ∈ W
⇒ ST = I_V and TS = I_W.

Hence T is invertible and this completes the proof.

Hence T is invertible and this completes the proof.

Theorem 5.3.8 Let V and W be vector spaces over the same field F. If a linear mapping T : V → W is invertible, then the inverse mapping T⁻¹ : W → V is linear.

Proof: Here T⁻¹ : W → V is the inverse mapping of the linear mapping T : V → W, so TT⁻¹ = I_W and T⁻¹T = I_V. Let γ, δ ∈ W be such that T⁻¹(γ) = α ∈ V and T⁻¹(δ) = β ∈ V. Thus T(α) = γ and T(β) = δ. Since T is linear, we have

T(aα + bβ) = aT(α) + bT(β); a, b ∈ F
⇒ T(aα + bβ) = aγ + bδ
⇒ T⁻¹[T(aα + bβ)] = T⁻¹(aγ + bδ)
⇒ I_V(aα + bβ) = T⁻¹(aγ + bδ)
⇒ aα + bβ = T⁻¹(aγ + bδ); as aα + bβ ∈ V
⇒ aT⁻¹(γ) + bT⁻¹(δ) = T⁻¹(aγ + bδ).

This proves that T⁻¹ is linear. In particular, if T : V → V is an invertible linear mapping on V, then the linear mapping T⁻¹ : V → V has the property that T⁻¹T = TT⁻¹ = I_V, and both T and T⁻¹ are automorphisms of V. Moreover, as TT⁻¹ = I_W, T⁻¹T = I_V and inverses are unique, we conclude that (T⁻¹)⁻¹ = T.

Ex 5.3.6 Let S and T be linear mappings of ℝ³ to ℝ³ defined by S(x, y, z) = (z, y, x), (x, y, z) ∈ ℝ³, and T(x, y, z) = (x + y + z, y + z, z), (x, y, z) ∈ ℝ³. (i) Determine TS and ST. (ii) Prove that both S and T are invertible. Verify that (ST)⁻¹ = T⁻¹S⁻¹.

Solution: Using the definitions, we get

TS(x, y, z) = T(S(x, y, z)) = T(z, y, x) = (x + y + z, x + y, x); (x, y, z) ∈ ℝ³,
ST(x, y, z) = S(T(x, y, z)) = S(x + y + z, y + z, z) = (z, y + z, x + y + z); (x, y, z) ∈ ℝ³.


Now, ker(TS) = ker(ST) = {θ}, so TS and ST are invertible operators on ℝ³; consequently both S and T are invertible. Now let

(ST)(x, y, z) = (z, y + z, x + y + z) = (a, b, c)
⇒ z = a, y + z = b, x + y + z = c, i.e., x = c − b, y = b − a, z = a
⇒ (ST)⁻¹(a, b, c) = (c − b, b − a, a)
⇒ (ST)⁻¹(x, y, z) = (z − y, y − x, x).

Now we evaluate T⁻¹ and S⁻¹.

T(x, y, z) = (x + y + z, y + z, z) = (a1, b1, c1)
⇒ x + y + z = a1, y + z = b1, z = c1
⇒ x = a1 − b1, y = b1 − c1, z = c1
⇒ T⁻¹(a1, b1, c1) = (a1 − b1, b1 − c1, c1), i.e., T⁻¹(x, y, z) = (x − y, y − z, z).

Again, S(x, y, z) = (z, y, x) = (a2, b2, c2)
⇒ x = c2, y = b2, z = a2
⇒ S⁻¹(a2, b2, c2) = (c2, b2, a2), i.e., S⁻¹(x, y, z) = (z, y, x).

Hence T⁻¹S⁻¹ is given by

T⁻¹S⁻¹(x, y, z) = T⁻¹(S⁻¹(x, y, z)) = T⁻¹(z, y, x) = (z − y, y − x, x).

Hence (ST)⁻¹ = T⁻¹S⁻¹ is verified.

Ex 5.3.7 Let S : ℝ³ → ℝ³ and T : ℝ³ → ℝ³ be two linear mappings defined by S(x, y, z) = (z, y, x) and T(x, y, z) = (x + y + z, y + z, z), (x, y, z) ∈ ℝ³. Prove that both S and T are invertible. Verify that (ST)⁻¹ = T⁻¹S⁻¹.

Solution: Let S(x, y, z) = (0, 0, 0). Then (z, y, x) = (0, 0, 0) implies x = y = z = 0, so ker(S) = {θ} and S is one-to-one. Also, the domain and co-domain of S are of equal dimension 3; therefore S is onto. Hence S is invertible. Now,

S(x, y, z) = (z, y, x), i.e., (x, y, z) = S⁻¹(z, y, x), or S⁻¹(x, y, z) = (z, y, x).

For the mapping T, let T(x, y, z) = (0, 0, 0). Then (x + y + z, y + z, z) = (0, 0, 0). This gives

x + y + z = 0, y + z = 0, z = 0, i.e., x = y = z = 0.

Therefore ker(T) = {θ} and T is one-to-one. Also, it is onto. Hence T is bijective and invertible. Now,

T(x, y, z) = (x + y + z, y + z, z) = (u, v, w), say,

where u = x + y + z, v = y + z, w = z, i.e., z = w, y = v − w and x = u − v. Therefore,

T⁻¹(u, v, w) = (x, y, z) = (u − v, v − w, w), or T⁻¹(x, y, z) = (x − y, y − z, z).

Last part: Now,

ST(x, y, z) = S(x + y + z, y + z, z) = (z, y + z, x + y + z).

Therefore (ST)⁻¹(z, y + z, x + y + z) = (x, y, z), or (ST)⁻¹(x, y, z) = (z − y, y − x, x).

Again, T⁻¹S⁻¹(x, y, z) = T⁻¹(z, y, x) = (z − y, y − x, x). Thus (ST)⁻¹ = T⁻¹S⁻¹, as verified.
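In the standard basis these two operators are given by matrices, and the verification (ST)⁻¹ = T⁻¹S⁻¹ can be repeated numerically; a short sketch (not in the text):

```python
# S(x,y,z) = (z,y,x) and T(x,y,z) = (x+y+z, y+z, z) as standard-basis matrices.
import numpy as np

S = np.array([[0., 0., 1.], [0., 1., 0.], [1., 0., 0.]])
T = np.array([[1., 1., 1.], [0., 1., 1.], [0., 0., 1.]])
lhs = np.linalg.inv(S @ T)                   # (ST)^{-1}
rhs = np.linalg.inv(T) @ np.linalg.inv(S)    # T^{-1} S^{-1}
assert np.allclose(lhs, rhs)
```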

Ex 5.3.8 Let T be a linear operator on ℝ³ defined by T(x, y, z) = (2x, 4x − y, 2x + 3y − z). Show that T is invertible and find a formula for T⁻¹.


Solution: We find ker T by setting T(α) = θ, where α = (x, y, z). Therefore T(x, y, z) = θ gives

(2x, 4x − y, 2x + 3y − z) = (0, 0, 0)
⇒ 2x = 0, 4x − y = 0, 2x + 3y − z = 0 ⇒ x = y = z = 0.

This system has only the trivial solution (0, 0, 0), so ker T = {θ} and T is one-one. Also, V = ℝ³, W = ℝ³, so that dimV = dimW and T is onto. Hence, by the theorem, T is non-singular and so it is invertible. Let δ = (a, b, c) be the image of (x, y, z) under T. Then (x, y, z) is the image of (a, b, c) under T⁻¹. Thus,

T(x, y, z) = (2x, 4x − y, 2x + 3y − z) = (a, b, c)
⇒ 2x = a, 4x − y = b, 2x + 3y − z = c
⇒ x = a/2, y = 2a − b, z = 7a − 3b − c
⇒ T⁻¹(a, b, c) = (a/2, 2a − b, 7a − 3b − c).

Thus the formula for T⁻¹ : ℝ³ → ℝ³ is T⁻¹(x, y, z) = (x/2, 2x − y, 7x − 3y − z).

Ex 5.3.9 If T : V → V is a linear operator on V such that T² − T + I = θ, then show that T is invertible.

Solution: Here it is given that T² − T + I = θ. Thus,

T² − T + I = θ ⇒ I = T − T² = T(I − T),
and likewise I = T − T² = (I − T)T.

Thus the linear operator I − T is a two-sided inverse of T:

T(I − T) = (I − T)T = I.

Hence T is invertible, with T⁻¹ = I − T; in particular, T is one-to-one and onto.
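A concrete check (an illustration added here, not from the text): the companion matrix of t² − t + 1 satisfies the given relation, and I − T is indeed its inverse.

```python
# If T^2 - T + I = 0 then T^{-1} = I - T; companion-matrix example.
import numpy as np

T = np.array([[0., -1.], [1., 1.]])   # characteristic polynomial t^2 - t + 1
I = np.eye(2)
assert np.allclose(T @ T - T + I, 0)  # T satisfies T^2 - T + I = 0
assert np.allclose(T @ (I - T), I)    # hence T is invertible, T^{-1} = I - T
assert np.allclose((I - T) @ T, I)
```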

5.4 Singular and Non-singular Transformation

Let V, W be two vector spaces over the same field F. A linear transformation T : V → W is said to be singular if ∃ α (≠ θ) ∈ V such that T(α) = θ_W, and non-singular if

T(α) = θ_W ⇒ α = θ_V, i.e., ker T = {θ}. (5.5)

Non-singular linear mappings may also be characterized as those mappings that carry independent sets into independent sets. Thus a linear transformation T : V → W is non-singular if it is invertible, i.e., if T⁻¹ exists.

Theorem 5.4.1 Let V and W be two vector spaces over the same field F. Then a linear transformation T : V → W is non-singular if and only if T maps every LI subset of V onto a LI subset of W.


Proof: Let the linear transformation T : V → W be non-singular and let S = {α1, α2, ..., αn} be a LI subset of V. For any scalars c1, c2, ..., cn ∈ F, let

c1T(α1) + c2T(α2) + ... + cnT(αn) = θ_W
⇒ T(c1α1 + c2α2 + ... + cnαn) = θ_W; T is linear
⇒ c1α1 + c2α2 + ... + cnαn = θ_V; T is non-singular
⇒ c1 = c2 = ... = cn = 0.

Thus S1 = {T(α1), T(α2), ..., T(αn)} is LI. Conversely, let the image under T of every LI subset of V be a LI subset of W. Now, if α is a non-zero element of V, then {α} is LI and so, by the given hypothesis, {T(α)} is LI; consequently T(α) ≠ θ_W. Hence

T(α) = θ_W ⇒ α = θ_V,

so T is non-singular. From this theorem, we have the following consequences:

(i) A linear transformation T : V → V is non-singular iff the image of a basis S of V under the mapping T is again a basis of V.

(ii) A linear transformation T : V → V on a finite dimensional vector space V is invertible iff T is non-singular.

(iii) Let V and W be vector spaces over the same field F and T : V → W a linear transformation. If T is non-singular, then dimV ≤ dimW. For example, since dim ℝ³ = 3 is less than dim ℝ⁴ = 4, for any linear T : ℝ⁴ → ℝ³ we have dim(Im T) ≤ 3 < 4, so ker T ≠ {θ}; accordingly, no linear mapping T : ℝ⁴ → ℝ³ can be non-singular.

(iv) A basis S of Vn(F) is carried to another basis by a non-singular transformation.

Theorem 5.4.2 A linear transformation T : V → W is an isomorphism if and only if T is non-singular.

Proof: Let the linear transformation T : V → W be an isomorphism. Then it is clearly one-to-one, so θ_V is the only vector such that T(θ_V) = θ_W. Consequently,

T(α) = θ_W ⇒ α = θ_V.

This shows that T is non-singular. Conversely, let T be non-singular. Then

T(α) = T(β) ⇒ T(α) − T(β) = θ_W
⇒ T(α − β) = θ_W; T is linear
⇒ α − β = θ_V; T is non-singular
⇒ α = β.

Hence T is one-to-one. Further, if {α1, α2, ..., αn} is a basis of V, then S1 = {T(α1), T(α2), ..., T(αn)} is a LI subset of W; W being finite-dimensional of the same dimension, it follows that {T(α1), T(α2), ..., T(αn)} is a basis of W. Now an arbitrary element β ∈ W can be expressed as

β = c1T(α1) + c2T(α2) + ... + cnT(αn); ci ∈ F
= T(c1α1 + c2α2 + ... + cnαn); T is linear,

which lies in Range T. Thus every element of W is in the range of T, i.e., W ⊆ R(T). So, as R(T) ⊆ W, we get W = R(T) and T is onto. Hence T is an isomorphism.


5.5 Linear Operator

So far we have discussed some properties of linear mappings of V to W, where V and W are vector spaces over a field F. Let T : V → W and S : V → W be two linear mappings over the same field F. The sum T + S and the scalar product cT, c ∈ F, as mappings from V to W, are defined by

(i) (T + S)(α) = T(α) + S(α); ∀α ∈ V.

(ii) (cT)(α) = cT(α); ∀α ∈ V and c ∈ F.

We denote the vector space of all linear transformations from V into W by L(V,W). Now we shall consider the special case W = V. A linear mapping T : V → V is called a linear operator on V. The set of all linear operators on a vector space V over a field F forms, in its own right, a linear space over F, denoted by L(V,V).

Deduction 5.5.1 The important feature is that we can define another binary composition, called multiplication, on this set. Let T and S be two linear operators on V; then the composite mappings T∘S and S∘T are both linear operators on V. If we define ST by S∘T, then ST : V → V is given by ST(α) = S[T(α)] for all α ∈ V. Since the composition of linear mappings is associative, multiplication is associative, i.e., (ST)U = S(TU) for all S, T, U ∈ L(V,V). The mapping I_V : V → V defined by I_V(α) = α for all α ∈ V is the identity operator. Also, multiplication is distributive with respect to addition, T(S + U) = TS + TU and (S + U)T = ST + UT, for

[T(S + U)]α = T[(S + U)α] = T[Sα + Uα]
= T(Sα) + T(Uα); [T is linear]
= (TS + TU)α, for all α ∈ V.

Therefore T(S + U) = TS + TU and, similarly, (S + U)T = ST + UT. Thus the linear space L(V,V) is a ring under addition and multiplication. It is a non-commutative ring with unity, I_V being the unity in the ring.

Theorem 5.5.1 Let T : V → V be a linear operator on a finite dimensional vector space V over a field F. Then the following five statements are equivalent:

(i) T is non-singular, i.e., ker T = {θ};

(ii) T is one-to-one;

(iii) T is an onto mapping;

(iv) T maps every linearly independent set of V to another linearly independent set;

(v) T maps a basis of V to another basis.

Proof: Suppose (i) holds, i.e., the linear operator T is non-singular; then T is invertible. Therefore T is bijective, i.e., one-to-one and onto; hence (ii) holds. Moreover, T being linear, the mapping T : V → V is an isomorphism.

Suppose (ii) holds. Then, T being one-one, dim Ker T = 0. Sylvester's law dim Ker T + dim Im T = dim V gives dim Im T = n. But Im T ⊆ V and dim V = n. Hence Im T = V, which proves that T is an onto mapping; hence (iii) holds.

Suppose (iii) holds and let {α1, α2, ..., αn} be a linearly independent set of V. Since T is onto, dim Im T = dim V. Therefore dim Ker T = 0 and consequently Ker T = {θ}. Since Ker T = {θ}, the images of linearly independent sets in V are linearly independent; hence (iv) holds. In particular, the set {T(α1), T(α2), ..., T(αn)}, being a linearly independent set of n elements, is a basis of V; hence (v) holds.

Suppose (v) holds. Let {α1, α2, ..., αn} be a basis of V. Then {T(α1), T(α2), ..., T(αn)} is another basis of V. Let ξ ∈ Ker T and let ξ = c1α1 + c2α2 + ... + cnαn. Since T is linear, c1T(α1) + c2T(α2) + ... + cnT(αn) = θ. This implies c1 = c2 = ... = cn = 0, since {T(α1), T(α2), ..., T(αn)} is a linearly independent set. Therefore ξ ∈ Ker T ⇒ ξ = θ and consequently Ker T = {θ}, so T is one-to-one. Now dim Ker T + dim Im T = dim V gives dim Im T = dim V. But Im T ⊆ V, so Im T = V and T is onto. T being both one-to-one and onto, T is invertible and therefore non-singular; hence (i) holds.

Thus all five conditions are equivalent.

5.6 Matrix Representation of Linear Transformation

In the previous section we studied linear transformations by examining their ranges and null spaces. In this section, we develop a one-to-one correspondence between matrices and linear transformations that allows us to utilize properties of one to study properties of the other.

Let V and W be two finite dimensional vector spaces over a field F with dimV = n (≠ 0), dimW = m (≠ 0), and let T : V → W be a linear mapping. Let P = (α1, α2, ..., αn) and Q = (β1, β2, ..., βm) be ordered bases of V and W respectively. T is completely determined by the images T(α1), T(α2), ..., T(αn). Since (β1, β2, ..., βm) is ordered, each T(αi) ∈ W is a linear combination of the vectors β1, β2, ..., βm in a unique manner, say

T(α1) = c11β1 + c21β2 + ... + cm1βm
T(α2) = c12β1 + c22β2 + ... + cm2βm
...
T(αn) = c1nβ1 + c2nβ2 + ... + cmnβm,

where, for each j, the co-ordinates c1j, c2j, ..., cmj ∈ F are uniquely determined by the ordered basis (β1, β2, ..., βm).

Let γ = ∑_{i=1}^{n} xiαi be an arbitrary vector of V and let T(γ) = ∑_{j=1}^{m} yjβj; xi, yj ∈ F. Now,

T(γ) = T(x1α1 + x2α2 + ... + xnαn)
= x1T(α1) + x2T(α2) + ... + xnT(αn)
= x1(c11β1 + c21β2 + ... + cm1βm) + x2(c12β1 + c22β2 + ... + cm2βm)
+ ... + xn(c1nβ1 + c2nβ2 + ... + cmnβm).

As {β1, β2, ..., βm} is linearly independent, we have

y1 = c11x1 + c12x2 + ... + c1nxn
y2 = c21x1 + c22x2 + ... + c2nxn
...
ym = cm1x1 + cm2x2 + ... + cmnxn

or, in matrix form,

[ y1 ]   [ c11 c12 ··· c1n ] [ x1 ]
[ y2 ] = [ c21 c22 ··· c2n ] [ x2 ]
[ ⋮  ]   [  ⋮            ⋮ ] [ ⋮  ]
[ ym ]   [ cm1 cm2 ··· cmn ] [ xn ]

or, Y = AX, i.e., [T(α)]_Q = A[α]_P (5.6)

where Y, i.e., [T(α)]_Q, is the co-ordinate column vector of T(α) with respect to the basis Q in W; X, i.e., [α]_P, is the co-ordinate column vector of α in V; and A = [cij]_{m×n} is called the


matrix associated with T, or the matrix representation of T, or the matrix of T relative to the ordered bases P = (α1, α2, ..., αn) and Q = (β1, β2, ..., βm). Graphically,

α → T → T(α)
[α]_P → A → [T(α)]_Q = A[α]_P

The top horizontal arrow represents the linear transformation T from the n-dimensional vector space V into the m-dimensional vector space W, taking the vector α in V to the vector T(α) in W. The bottom horizontal arrow represents the matrix A: [T(α)]_Q, a co-ordinate vector in the m-dimensional space, is obtained simply by multiplying [α]_P, a co-ordinate vector in the n-dimensional space, by the matrix A. The following are some facts:

(i) If V = W and P = Q, then we write T(α) = Aα.

(ii) A can be written as [T] or m(T). Also, the co-ordinate vector of T(αj) relative to the ordered basis (β1, β2, ..., βm) is given by the jth column of A.

(iii) For given α1, α2, ..., αn and β1, β2, ..., βm, the matrix A is unique.

(iv) If the matrix A is given, then the linear mapping T is unique.

(v) Let γ = (a1, a2, ..., an) ∈ F^n; then γ can be written as γ = ∑_{i=1}^{n} ai ei, where {e1, e2, ..., en} is the standard basis. For the given A, we have

T(γ) = (∑_i c1i ai, ∑_i c2i ai, ..., ∑_i cmi ai)

relative to the standard basis.

Physicists and others who deal at great length with linear transformations perform most of their computations with the matrices of the linear transformations.
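Equation (5.6) is easy to realise computationally. The following sketch (not part of the text) builds the matrix A column by column for the mapping of Ex 5.6.1 below, using the standard basis for both P and Q, and confirms [T(α)]_Q = A[α]_P.

```python
# Columns of A are the Q-coordinates of T(alpha_j); then T(v) = A v.
import numpy as np

def T(v):
    x1, x2, x3 = v
    return np.array([2*x1 + x2 - x3, x2 + 4*x3, x1 - x2 + 3*x3])

P = np.eye(3)                                   # standard ordered basis
A = np.column_stack([T(P[:, j]) for j in range(3)])
v = np.array([1.0, -2.0, 3.0])                  # an arbitrary vector
assert np.allclose(T(v), A @ v)                 # [T(v)]_Q = A [v]_P
```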

Ex 5.6.1 A linear mapping T : ℝ³ → ℝ³ is defined by
T(x1, x2, x3) = (2x1 + x2 − x3, x2 + 4x3, x1 − x2 + 3x3); (x1, x2, x3) ∈ ℝ³.
Find the matrix of T relative to the ordered basis ((0, 1, 1), (1, 0, 1), (1, 1, 0)) of ℝ³.

Solution: Using the definition of T, we have

T(0, 1, 1) = (0, 5, 2), T(1, 0, 1) = (1, 4, 4), T(1, 1, 0) = (3, 1, 0).

Each image must now be expressed in the given ordered basis ((0, 1, 1), (1, 0, 1), (1, 1, 0)) itself:

T(0, 1, 1) = (0, 5, 2) = (7/2)(0, 1, 1) − (3/2)(1, 0, 1) + (3/2)(1, 1, 0)
T(1, 0, 1) = (1, 4, 4) = (7/2)(0, 1, 1) + (1/2)(1, 0, 1) + (1/2)(1, 1, 0)
T(1, 1, 0) = (3, 1, 0) = −1(0, 1, 1) + 1(1, 0, 1) + 2(1, 1, 0).

Thus, the matrix representation of T relative to the ordered basis ((0, 1, 1), (1, 0, 1), (1, 1, 0)) of ℝ³ is

[T] = m(T) =
[  7/2  7/2  −1 ]
[ −3/2  1/2   1 ]
[  3/2  1/2   2 ]

(the detailed determination of these co-ordinates is repeated in Ex 5.6.9(ii) below).

Ex 5.6.2 Let T : ℝ² → ℝ³ be a linear transformation defined by T(x, y) = (x + 3y, 0, 2x − 4y). Find the matrix representation relative to the bases α = (e1, e2) and γ = (e3, e2, e1).


Solution: Let α = (e1, e2) and β = (e1, e2, e3) be the standard ordered bases for ℝ² and ℝ³ respectively. Now

T(1, 0) = (1, 0, 2) = 1e1 + 0e2 + 2e3
and T(0, 1) = (3, 0, −4) = 3e1 + 0e2 − 4e3.

Hence

[T]_α^β =
[ 1   3 ]
[ 0   0 ]
[ 2  −4 ]

If γ = (e3, e2, e1) ≠ β, then with respect to the ordered bases α and γ we get

[T]_α^γ =
[ 2  −4 ]
[ 0   0 ]
[ 1   3 ]

Ex 5.6.3 In T : ℝ² → ℝ², T maps (1, 1) to (3, −3) and (1, −1) to (5, 7). Determine the matrix of T relative to the ordered basis ((1, 0), (0, 1)).

Solution: We see that {(1, 1), (1, −1)} is a basis for ℝ². Let scalars c1, c2 and d1, d2 ∈ ℝ be such that

(1, 0) = c1(1, 1) + c2(1, −1) and (0, 1) = d1(1, 1) + d2(1, −1).

Solving these linear equations in c1, c2, d1, d2 gives c1 = c2 = 1/2, d1 = 1/2, d2 = −1/2, and so

(1, 0) = (1/2)(1, 1) + (1/2)(1, −1) and (0, 1) = (1/2)(1, 1) − (1/2)(1, −1).

Since T is a linear transformation, we have

T(1, 0) = (1/2)T(1, 1) + (1/2)T(1, −1) = (1/2)(3, −3) + (1/2)(5, 7) = (4, 2) = 4(1, 0) + 2(0, 1)
T(0, 1) = (1/2)T(1, 1) − (1/2)T(1, −1) = (1/2)(3, −3) − (1/2)(5, 7) = (−1, −5) = −1(1, 0) − 5(0, 1).

Hence, the matrix of T is

[ 4  −1 ]
[ 2  −5 ]

Let (x, y) ∈ ℝ²; then (x, y) = x(1, 0) + y(0, 1), so that

T(x, y) = xT(1, 0) + yT(0, 1); as T is linear
= x(4, 2) + y(−1, −5) = (4x − y, 2x − 5y),

which is the required linear transformation.

Ex 5.6.4 The matrix of a linear transformation T : ℝ³ → ℝ² is

A =
[ 1  2  −3 ]
[ 4  2  −1 ]

relative to the ordered bases ((1, 2, 1), (2, 0, 1), (0, 3, 4)) of ℝ³ and ((2, 1), (0, 5)) of ℝ². Determine the transformation. Find the matrix of T relative to the ordered bases ((1, 1, 0), (1, 0, 1), (0, 1, 1)) of ℝ³ and ((1, 0), (0, 1)) of ℝ².

Solution: Here, by definition,

T(1, 2, 1) = 1(2, 1) + 4(0, 5) = (2, 21)
T(2, 0, 1) = 2(2, 1) + 2(0, 5) = (4, 12)
T(0, 3, 4) = −3(2, 1) − 1(0, 5) = (−6, −8).


Let (a, b, c) ∈ ℝ³ and, for some scalars ci ∈ ℝ,

(a, b, c) = c1(1, 2, 1) + c2(2, 0, 1) + c3(0, 3, 4)
= (c1 + 2c2, 2c1 + 3c3, c1 + c2 + 4c3)
⇒ c1 + 2c2 = a, 2c1 + 3c3 = b, c1 + c2 + 4c3 = c
⇒ c1 = (1/13)(3a + 8b − 6c), c2 = (1/13)(5a − 4b + 3c), c3 = (1/13)(−2a − b + 4c).

Thus, the linear transformation T : ℝ³ → ℝ² is given by

T(a, b, c) = c1T(1, 2, 1) + c2T(2, 0, 1) + c3T(0, 3, 4); T linear
= c1(2, 21) + c2(4, 12) + c3(−6, −8)
= (2c1 + 4c2 − 6c3, 21c1 + 12c2 − 8c3)
= ((38a + 6b − 24c)/13, (139a + 128b − 122c)/13); (a, b, c) ∈ ℝ³.

Now, using the above linear transformation T : ℝ³ → ℝ², we get

T(1, 1, 0) = (44/13, 267/13) = (44/13)(1, 0) + (267/13)(0, 1)
T(1, 0, 1) = (14/13, 17/13) = (14/13)(1, 0) + (17/13)(0, 1)
T(0, 1, 1) = (−18/13, 6/13) = (−18/13)(1, 0) + (6/13)(0, 1).

Therefore, the required matrix of T is

[T] =
[  44/13  14/13  −18/13 ]
[ 267/13  17/13    6/13 ]

Ex 5.6.5 Let (α1, α2, α3) and (β1, β2) be ordered bases of the real vector spaces V and W, and let a linear mapping T : V → W map the basis vectors as

T(α1) = β1 + β2, T(α2) = 2β1 − β2, T(α3) = β1 + 3β2.

Find the matrix of T relative to the ordered bases (α1, α2, α3) of V and (β1, β2) of W.

Solution: For the given linear mapping, the matrix of T is

[T] =
[ 1   2  1 ]
[ 1  −1  3 ]

Now we calculate the matrix of T relative to the ordered bases (α1 + α2, α2, α3) of V and (β1, β1 + β2) of W. For this,

T(α1 + α2) = T(α1) + T(α2) = 3β1 = 3β1 + 0(β1 + β2)
T(α2) = 2β1 − β2 = 3β1 − 1(β1 + β2)
T(α3) = β1 + 3β2 = −2β1 + 3(β1 + β2).

Therefore the matrix of T relative to these bases is

[T] =
[ 3   3  −2 ]
[ 0  −1   3 ]

Ex 5.6.6 Let N be the vector space of all real polynomials of degree at most 3. Define S : N → N by (Sp)(x) = p(x + 1), p ∈ N. Find the matrix of S in the basis (1, x, x², x³), considered as column vectors. [NET(June)12]

Solution: We have S : N → N defined by (Sp)(x) = p(x + 1), p ∈ N. Then

(S1)(x) = 1 = 1·1 + 0·x + 0·x² + 0·x³
(Sx)(x) = x + 1 = 1·1 + 1·x + 0·x² + 0·x³
(Sx²)(x) = (x + 1)² = 1 + 2x + x² = 1·1 + 2·x + 1·x² + 0·x³
(Sx³)(x) = (x + 1)³ = 1 + 3x + 3x² + x³ = 1·1 + 3·x + 3·x² + 1·x³.


Therefore, the matrix of S with respect to the basis (1, x, x², x³) is

[ 1  1  1  1 ]
[ 0  1  2  3 ]
[ 0  0  1  3 ]
[ 0  0  0  1 ]
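This matrix can be verified numerically; the sketch below (not in the original) applies it to the coefficient vector of an arbitrary cubic and compares against direct evaluation of p(x + 1).

```python
# The matrix of (Sp)(x) = p(x+1) in the basis (1, x, x^2, x^3).
import numpy as np

M = np.array([[1., 1., 1., 1.],
              [0., 1., 2., 3.],
              [0., 0., 1., 3.],
              [0., 0., 0., 1.]])
p = np.array([4., -1., 2., 5.])       # p(x) = 4 - x + 2x^2 + 5x^3
q = M @ p                             # coordinates of p(x + 1)
xs = np.linspace(-2., 2., 7)
val = lambda c, t: sum(ci * t**i for i, ci in enumerate(c))
assert np.allclose(val(q, xs), val(p, xs + 1.))
```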

Theorem 5.6.1 Let V, W be two finite dimensional vector spaces over a field F and T : V → W be a linear transformation. Then the rank of T = the rank of the matrix of T.

Proof: Let {α1, α2, ..., αn} and {β1, β2, ..., βm} be ordered bases of the vector spaces V and W respectively, so that dimV = n and dimW = m. Let the matrix of T relative to the chosen bases be m(T), given by

m(T) =
[ c11 c12 ··· c1n ]
[ c21 c22 ··· c2n ]
[  ⋮            ⋮ ]
[ cm1 cm2 ··· cmn ]

where the scalars cij ∈ F are uniquely determined by the basis {β1, β2, ..., βm}. Therefore,

T(αj) = c1jβ1 + c2jβ2 + ... + cmjβm; j = 1, 2, ..., n.

Since (α1, α2, ..., αn) is an ordered basis of V, the images T(α1), T(α2), ..., T(αn) generate Im T. Let the rank of T be r, so that dim Im T = r. Without loss of generality, let {T(α1), T(α2), ..., T(αr)} be a basis of Im T. Then each of T(αr+1), T(αr+2), ..., T(αn) belongs to L{T(α1), T(α2), ..., T(αr)}, the linear span of this basis. Let us consider the isomorphism φ : W → F^m defined by

φ(c1β1 + c2β2 + ... + cmβm) = (c1, c2, ..., cm)^T;

then we have

φ(T(α1)) = (c11, c21, ..., cm1)^T; φ(T(α2)) = (c12, c22, ..., cm2)^T; ...; φ(T(αn)) = (c1n, c2n, ..., cmn)^T.

Since {T(α1), T(α2), ..., T(αr)} is a LI set and φ is an isomorphism, {φ(T(α1)), φ(T(α2)), ..., φ(T(αr))} is a LI set of m-tuples of F^m. Since each of T(αr+1), T(αr+2), ..., T(αn) belongs to L{T(α1), T(α2), ..., T(αr)} and φ is an isomorphism, each of φ(T(αr+1)), φ(T(αr+2)), ..., φ(T(αn)) belongs to L{φ(T(α1)), φ(T(α2)), ..., φ(T(αr))}. Therefore the first r column vectors of m(T) are LI and each of the remaining n − r column vectors belongs to the linear span of the first r column vectors. Consequently, the column rank of m(T) is r and therefore the rank of m(T) is r. Hence the theorem.

Theorem 5.6.2 (Matrix representation of composite mapping) Let T : V → U and S : U → W be linear mappings, where V, U, W are finite dimensional vector spaces over a field F. Then, relative to a choice of ordered bases, m(ST) = m(S)·m(T), where m(T) denotes the matrix of T relative to the chosen bases.

Proof: Let (α1, α2, ..., αn), (β1, β2, ..., βp) and (γ1, γ2, ..., γm) be ordered bases of V, U and W respectively, so that dimV = n, dimU = p and dimW = m. Relative to these ordered bases, let the matrices of the transformations T, S and ST be

A =
[ a11 a12 ··· a1n ]
[ a21 a22 ··· a2n ]
[  ⋮            ⋮ ]
[ ap1 ap2 ··· apn ]
(p × n), B =
[ b11 b12 ··· b1p ]
[ b21 b22 ··· b2p ]
[  ⋮            ⋮ ]
[ bm1 bm2 ··· bmp ]
(m × p), C =
[ c11 c12 ··· c1n ]
[ c21 c22 ··· c2n ]
[  ⋮            ⋮ ]
[ cm1 cm2 ··· cmn ]
(m × n)


respectively. Then, by definition,

T(αj) = a1jβ1 + a2jβ2 + ... + apjβp
S(βj) = b1jγ1 + b2jγ2 + ... + bmjγm
ST(αj) = c1jγ1 + c2jγ2 + ... + cmjγm,

where j = 1, 2, ..., n. As S and T are linear, ST(αj) = S[T(αj)], and therefore

S[a1jβ1 + a2jβ2 + ... + apjβp] = a1jS(β1) + a2jS(β2) + ... + apjS(βp)
= a1j[b11γ1 + ... + bm1γm] + a2j[b12γ1 + ... + bm2γm] + ... + apj[b1pγ1 + ... + bmpγm]
= (∑_{k=1}^{p} b1k akj)γ1 + (∑_{k=1}^{p} b2k akj)γ2 + ... + (∑_{k=1}^{p} bmk akj)γm.

Therefore c1j = ∑_{k=1}^{p} b1k akj, c2j = ∑_{k=1}^{p} b2k akj, ..., cmj = ∑_{k=1}^{p} bmk akj, and consequently

[ c11 ··· c1n ]   [ b11 ··· b1p ] [ a11 ··· a1n ]
[  ⋮        ⋮ ] = [  ⋮        ⋮ ] [  ⋮        ⋮ ]
[ cm1 ··· cmn ]   [ bm1 ··· bmp ] [ ap1 ··· apn ]

That is, m(ST) = m(S)·m(T), and hence the theorem.

Ex 5.6.7 Let U, V, W be vector spaces of dimensions 3, 2, 4 respectively, with respective ordered bases ((1, 0, 0), (1, 1, 0), (1, 1, 1)) of U, ((1, −1), (0, 1)) of V, and ((0, 0, 1, 0), (−1, 0, 0, 1), (1, 1, 1, −1), (0, 1, 2, 1)) of W. The linear mappings T : U → V and S : V → W are defined by T(x, y, z) = (2x − 4y + 9z, 5x + 3y − 2z) and S(x, y) = (3x + 4y, 5x − 2y, x + 7y, 4x). Show that [ST] = [S][T].

Solution: First, the mapping T is defined by T(x, y, z) = (2x − 4y + 9z, 5x + 3y − 2z). Therefore,

T(1, 0, 0) = (2, 5) = 2(1, −1) + 7(0, 1)
T(1, 1, 0) = (−2, 8) = −2(1, −1) + 6(0, 1)
T(1, 1, 1) = (7, 6) = 7(1, −1) + 13(0, 1).

Therefore, the matrix representation of T is

[T] =
[ 2  −2   7 ]
[ 7   6  13 ]

The mapping S is defined by S(x, y) = (3x + 4y, 5x − 2y, x + 7y, 4x). Therefore,

S(1, −1) = (−1, 7, −6, 4) = −16(0, 0, 1, 0) + 5(−1, 0, 0, 1) + 4(1, 1, 1, −1) + 3(0, 1, 2, 1)
S(0, 1) = (4, −2, 7, 0) = 5(0, 0, 1, 0) − 10(−1, 0, 0, 1) − 6(1, 1, 1, −1) + 4(0, 1, 2, 1).

Therefore, the matrix representation of S is

[S] =
[ −16    5 ]
[   5  −10 ]
[   4   −6 ]
[   3    4 ]

Using the definition of the composite mapping,

ST(1, 0, 0) = 2S(1, −1) + 7S(0, 1) = 3(0, 0, 1, 0) − 60(−1, 0, 0, 1) − 34(1, 1, 1, −1) + 34(0, 1, 2, 1),
ST(1, 1, 0) = −2S(1, −1) + 6S(0, 1) = 62(0, 0, 1, 0) − 70(−1, 0, 0, 1) − 44(1, 1, 1, −1) + 18(0, 1, 2, 1),
ST(1, 1, 1) = 7S(1, −1) + 13S(0, 1) = −47(0, 0, 1, 0) − 95(−1, 0, 0, 1) − 50(1, 1, 1, −1) + 73(0, 1, 2, 1).


Therefore the matrix representation of the composite mapping ST is given by

[ST] =
[   3   62  −47 ]
[ −60  −70  −95 ]
[ −34  −44  −50 ]
[  34   18   73 ]

which equals the product

[ −16    5 ]
[   5  −10 ]   [ 2  −2   7 ]
[   4   −6 ] · [ 7   6  13 ] = [S][T].
[   3    4 ]
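The final equality can also be confirmed by direct matrix multiplication; a short numerical sketch (not in the original):

```python
# [S][T] = [ST] for the coordinate matrices computed in Ex 5.6.7.
import numpy as np

S = np.array([[-16., 5.], [5., -10.], [4., -6.], [3., 4.]])   # 4 x 2
T = np.array([[2., -2., 7.], [7., 6., 13.]])                  # 2 x 3
ST = np.array([[3., 62., -47.],
               [-60., -70., -95.],
               [-34., -44., -50.],
               [34., 18., 73.]])                              # 4 x 3
assert np.allclose(S @ T, ST)
```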

Theorem 5.6.3 Let V and W be finite dimensional vector spaces over a field F and let T : V → W be a linear mapping. Then T is invertible (non-singular) if and only if the matrix of T relative to some chosen bases is non-singular.

Proof: First, let T : V → W be invertible; then, by definition, T is one-to-one and onto. Since T is one-to-one, dim Ker T = 0, and as T is onto, Im T = W. Then

dim Ker T + dim Im T = dim V ⇒ dim V = dim W = n (say).

Then the matrix of T, i.e., m(T), is an n × n matrix. Also, the rank of T = the rank of m(T). Therefore m(T), being an n × n matrix of rank n, is non-singular.

Conversely, let the matrix m(T) be non-singular. Then m(T) is a square matrix, of order n say, and the rank of m(T) is n. Since the order of m(T) is n, dim V = dim W = n. Since the rank of T = the rank of m(T), the rank of T is n. Therefore Im T = W and T is onto. Again,

dim Ker T + dim Im T = dim V ⇒ dim Ker T = 0.

Hence T is one-to-one. T being both one-to-one and onto, T is invertible. This completes the proof.

Theorem 5.6.4 (Matrix of the inverse mapping) Let V and W be two finite dimensional vector spaces of the same dimension over a field F and let T : V → W be an invertible mapping. Then, relative to chosen ordered bases, the matrix of the inverse mapping T′ : W → V is given by m(T′) = [m(T)]⁻¹.

Proof: Let dimV = dimW = n and let (α1, α2, ..., αn), (β1, β2, ..., βn) be ordered bases of V and W respectively. Relative to the chosen bases, let the matrices of T and T′ be

A = (aij)n×n =
[ a11 a12 ··· a1n ]
[ a21 a22 ··· a2n ]
[  ⋮            ⋮ ]
[ an1 an2 ··· ann ]
and B = (bij)n×n =
[ b11 b12 ··· b1n ]
[ b21 b22 ··· b2n ]
[  ⋮            ⋮ ]
[ bn1 bn2 ··· bnn ]

respectively. Then

T(αj) = a1jβ1 + a2jβ2 + ... + anjβn
T′(βj) = b1jα1 + b2jα2 + ... + bnjαn,

for j = 1, 2, ..., n. Since the mapping T′ : W → V is the inverse of T, T′T = I_V and TT′ = I_W. Therefore,

T′T(αj) = T′[a1jβ1 + a2jβ2 + ... + anjβn]
= a1jT′(β1) + a2jT′(β2) + ... + anjT′(βn); as T′ is linear
= a1j[b11α1 + b21α2 + ... + bn1αn] + a2j[b12α1 + b22α2 + ... + bn2αn]
+ ... + anj[b1nα1 + b2nα2 + ... + bnnαn]
= (b11a1j + b12a2j + ... + b1nanj)α1 + (b21a1j + b22a2j + ... + b2nanj)α2
+ ... + (bn1a1j + bn2a2j + ... + bnnanj)αn.


But T′T = I_V ⇒ T′T(αj) = αj. Therefore,

bi1a1j + bi2a2j + ... + binanj = 1 if i = j, and = 0 if i ≠ j.

It follows that m(T′)·m(T) = I_n. Similarly, using TT′ = I_W ⇒ TT′(βj) = βj, we obtain

ai1b1j + ai2b2j + ... + ainbnj = 1 if i = j, and = 0 if i ≠ j.

It follows that m(T)·m(T′) = I_n = m(T′)·m(T). Hence, by the definition of the inverse of a matrix, we get m(T′) = [m(T)]⁻¹.

Ex 5.6.8 Let (α1, α2, α3), (β1, β2, β3) be ordered bases of the real vector spaces V and W respectively. A linear mapping T : V → W maps the basis vectors as T(α1) = β1, T(α2) = β1 + β2, T(α3) = β1 + β2 + β3. Find the matrix of T relative to the ordered bases (α1, α2, α3) of V and (β1, β2, β3) of W. Find the matrix of T⁻¹ relative to the same chosen ordered bases.

Solution: The linear mapping T : V → W which maps the basis vectors as T(α1) = β1, T(α2) = β1 + β2, T(α3) = β1 + β2 + β3 can be written as

T(α1) = 1β1 + 0β2 + 0β3; T(α2) = 1β1 + 1β2 + 0β3; T(α3) = 1β1 + 1β2 + 1β3.

Therefore, the matrix representation is

m(T) =
[ 1 1 1 ]
[ 0 1 1 ]
[ 0 0 1 ]

We see that the matrix representation m(T) is non-singular, and therefore T is non-singular; so T⁻¹ exists and T⁻¹ is linear. Thus the inverses are given by T⁻¹(β1) = α1, T⁻¹(β1 + β2) = α2, T⁻¹(β1 + β2 + β3) = α3, i.e., as T⁻¹ is linear,

T⁻¹(β1) = α1; T⁻¹(β1) + T⁻¹(β2) = α2; T⁻¹(β1) + T⁻¹(β2) + T⁻¹(β3) = α3.

These can be written as

T⁻¹(β1) = α1 = 1α1 + 0α2 + 0α3;
T⁻¹(β2) = α2 − α1 = −1α1 + 1α2 + 0α3;
T⁻¹(β3) = α3 − α2 = 0α1 − 1α2 + 1α3.

Therefore, the matrix representation is

m(T⁻¹) =
[ 1 −1  0 ]
[ 0  1 −1 ]
[ 0  0  1 ]

We see that

m(T)·m(T⁻¹) =
[ 1 1 1 ] [ 1 −1  0 ]   [ 1 0 0 ]
[ 0 1 1 ] [ 0  1 −1 ] = [ 0 1 0 ] = m(T⁻¹)·m(T).
[ 0 0 1 ] [ 0  0  1 ]   [ 0 0 1 ]

Therefore, [m(T )]−1 exists and [m(T )]−1 = m(T−1).
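A one-line numerical confirmation of m(T⁻¹) = [m(T)]⁻¹ for the matrices of this example (an illustrative sketch, not part of the text):

```python
import numpy as np

mT = np.array([[1., 1., 1.], [0., 1., 1.], [0., 0., 1.]])
mTinv = np.array([[1., -1., 0.], [0., 1., -1.], [0., 0., 1.]])
assert np.allclose(np.linalg.inv(mT), mTinv)
assert np.allclose(mT @ mTinv, np.eye(3))
```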

Theorem 5.6.5 (Isomorphism between linear mappings and matrices) Let V and W be finite dimensional vector spaces over a field F with dimV = n and dimW = m. Then the linear space over F of all linear mappings of V to W, i.e., L(V,W), and the vector space of all m × n matrices over F, i.e., M_{m,n}, are isomorphic.


Proof: Let (α1, α2, ..., αn) and (β1, β2, ..., βm) be ordered bases of V and W respectively. Let us define a mapping m : L(V,W) → M_{m,n} by

m(T) = (aij)m×n for T ∈ L(V,W),

(aij)m×n being the matrix of T relative to the ordered bases (α1, α2, ..., αn) of V and (β1, β2, ..., βm) of W. Let T ∈ L(V,W), S ∈ L(V,W); then T + S ∈ L(V,W). Let m(T) = (aij)m×n, m(S) = (bij)m×n, m(T + S) = (cij)m×n. Then

T(αj) = a1jβ1 + a2jβ2 + ... + amjβm
S(αj) = b1jβ1 + b2jβ2 + ... + bmjβm
(T + S)(αj) = c1jβ1 + c2jβ2 + ... + cmjβm,

for j = 1, 2, ..., n. Since T and S are linear, (T + S)(αj) = T(αj) + S(αj), so

c1jβ1 + c2jβ2 + ... + cmjβm = (a1j + b1j)β1 + (a2j + b2j)β2 + ... + (amj + bmj)βm.

As {β1, β2, ..., βm} is linearly independent, we have cij = aij + bij for i = 1, 2, ..., m; j = 1, 2, ..., n. Hence it follows that m(T + S) = m(T) + m(S). Let k ∈ F; then kT ∈ L(V,W). Let m(kT) = (dij)m×n; then

(kT)(αj) = d1jβ1 + d2jβ2 + ... + dmjβm for j = 1, 2, ..., n.

Again, (kT)(αj) = kT(αj) by the definition of kT; therefore

d1jβ1 + d2jβ2 + ... + dmjβm = k[a1jβ1 + a2jβ2 + ... + amjβm] = (ka1j)β1 + (ka2j)β2 + ... + (kamj)βm.

Since {β1, β2, ..., βm} is linearly independent, dij = kaij for i = 1, 2, ..., m; j = 1, 2, ..., n. It follows that m(kT) = km(T), and so m is a homomorphism. To prove that m is an isomorphism, let m(T) = m(S) for some T, S ∈ L(V,W). Let m(T) = (aij)m×n, m(S) = (bij)m×n; then

T(αj) = a1jβ1 + a2jβ2 + ... + amjβm for j = 1, 2, ..., n;
S(αj) = b1jβ1 + b2jβ2 + ... + bmjβm for j = 1, 2, ..., n.

Now m(T) = m(S) ⇒ aij = bij for all i, j; hence T(αj) = S(αj) for j = 1, 2, ..., n. Let γ be an arbitrary vector in V. Since T(αj) = S(αj) for all basis vectors αj, we get T(γ) = S(γ) for all γ ∈ V, and this implies T = S. Therefore m(T) = m(S) ⇒ T = S, proving that m is one-to-one. To prove that m is onto, let (aij)m×n ∈ M_{m,n}. Then there exists a unique linear mapping T : V → W whose matrix is (aij): if we prescribe the jth column of (aij) as the co-ordinates of T(αj) relative to (β1, β2, ..., βm), i.e., T(αj) = a1jβ1 + a2jβ2 + ... + amjβm, then T is determined uniquely with (aij) as the associated matrix. Thus m is an isomorphism and therefore the linear spaces L(V,W) and M_{m,n} are isomorphic.

Theorem 5.6.6 Let V and W be two finite dimensional vector spaces over a field F and let T : V → W be a linear mapping. Let A be the matrix of T relative to one pair of ordered bases of V and W and C be the matrix of T relative to a different pair of ordered bases. Then the matrix C is equivalent to A, i.e., ∃ non-singular matrices P and Q such that C = P⁻¹AQ.

Proof: Let V and W be two finite dimensional vector spaces over a field F with dimV = n and dimW = m, say. Let A be the matrix of T relative to the ordered bases (α1, α2, ..., αn) of V and (β1, β2, ..., βm) of W, and let C be the matrix of T relative to the ordered bases (γ1, γ2, ..., γn) of V and (δ1, δ2, ..., δm) of W. Let A = (aij)m×n, C = (cij)m×n. Then

T(αj) = a1jβ1 + a2jβ2 + ... + amjβm;
T(γj) = c1jδ1 + c2jδ2 + ... + cmjδm,

for j = 1, 2, ..., n. Let T1 : V → V be such that T1(αi) = γi for i = 1, 2, ..., n, and let Q be the matrix of T1 relative to the ordered basis (α1, α2, ..., αn) of V. Since T1 maps a basis of V onto another basis, T1 is non-singular and therefore Q is non-singular. Let Q = (qij)n×n; then

γj = T1(αj) = q1jα1 + q2jα2 + ... + qnjαn; for j = 1, 2, ..., n.

Let T2 : W → W be such that T2(βi) = δi for i = 1, 2, ..., m, and let P be the matrix of T2 relative to the ordered basis (β1, β2, ..., βm) of W. Since T2 maps a basis of W onto another basis, T2 is non-singular and therefore P is non-singular. Let P = (pij)m×m; then δj = T2(βj) = p1jβ1 + p2jβ2 + ... + pmjβm for j = 1, 2, ..., m. Now,

T(γj) = c1jδ1 + c2jδ2 + ... + cmjδm
= c1j[p11β1 + p21β2 + ... + pm1βm] + c2j[p12β1 + p22β2 + ... + pm2βm]
+ ... + cmj[p1mβ1 + p2mβ2 + ... + pmmβm]
= (∑_{k=1}^{m} p1k ckj)β1 + ... + (∑_{k=1}^{m} pmk ckj)βm,

and T(γj) = T[q1jα1 + q2jα2 + ... + qnjαn]
= q1jT(α1) + q2jT(α2) + ... + qnjT(αn)
= q1j[a11β1 + a21β2 + ... + am1βm] + q2j[a12β1 + a22β2 + ... + am2βm]
+ ... + qnj[a1nβ1 + a2nβ2 + ... + amnβm]
= (∑_{k=1}^{n} a1k qkj)β1 + ... + (∑_{k=1}^{n} amk qkj)βm.

Since (β1, β2, ..., βm) is a basis, we have

∑_{k=1}^{m} pik ckj = ∑_{k=1}^{n} aik qkj for i = 1, 2, ..., m; j = 1, 2, ..., n.

This gives PC = AQ. Since P is non-singular, C = P⁻¹AQ. Thus a linear mapping T ∈ L(V,W) has different matrices relative to different pairs of ordered bases of V and W, but all such matrices are equivalent. Here Q is the transition matrix, or matrix of the change of basis, from (α1, α2, ..., αn) to (γ1, γ2, ..., γn) in V, and P is that from (β1, β2, ..., βm) to (δ1, δ2, ..., δm) in W.

Theorem 5.6.7 Let V and W be two finite dimensional vector spaces over a field F with dimV = n and dimW = m, and let A, C be m × n matrices over F such that C = P⁻¹AQ for some non-singular matrices P, Q. Then ∃ a linear mapping T ∈ L(V,W) such that A and C are the matrices of T relative to different pairs of ordered bases of V and W.

Proof: Let us write A = (aij)m×n, C = (cij)m×n, P = (pij)m×m, Q = (qij)n×n. Let (α1, α2, ..., αn), (β1, β2, ..., βm) be a pair of ordered bases of V and W respectively, and let the mapping T : V → W be defined by

T(αj) = a1jβ1 + a2jβ2 + ... + amjβm, j = 1, 2, ..., n.


Then T is a uniquely determined linear mapping and the matrix of T relative to the ordered bases (α1, α2, ..., αn) and (β1, β2, ..., βm) is A. Let a mapping T1 : V → V be defined by T1(αj) = q1jα1 + q2jα2 + ... + qnjαn. Then T1 is a uniquely determined mapping on V. Since Q is a non-singular matrix, {T1(α1), T1(α2), ..., T1(αn)} is a basis of V. Let T1(αi) = γi, i = 1, 2, ..., n. Let the mapping T2 : W → W be defined by T2(βj) = p1jβ1 + p2jβ2 + ... + pmjβm. Then T2 is a uniquely determined mapping on W. Since P is a non-singular matrix, {T2(β1), T2(β2), ..., T2(βm)} is a basis of W. Let T2(βi) = δi, i = 1, 2, ..., m. Let T′ be the linear mapping belonging to L(V,W) whose matrix relative to the ordered bases (γ1, γ2, ..., γn) of V and (δ1, δ2, ..., δm) of W is C. Then

T′(γj) = c1jδ1 + c2jδ2 + ... + cmjδm
= c1j[p11β1 + p21β2 + ... + pm1βm] + c2j[p12β1 + ... + pm2βm] + ... + cmj[p1mβ1 + ... + pmmβm]
= (∑_{k=1}^{m} p1k ckj)β1 + ... + (∑_{k=1}^{m} pmk ckj)βm,

and T(γj) = T[q1jα1 + q2jα2 + ... + qnjαn]
= q1jT(α1) + q2jT(α2) + ... + qnjT(αn)
= q1j[a11β1 + ... + am1βm] + q2j[a12β1 + ... + am2βm] + ... + qnj[a1nβ1 + ... + amnβm]
= (∑_{k=1}^{n} a1k qkj)β1 + ... + (∑_{k=1}^{n} amk qkj)βm.

Now, C = P⁻¹AQ, i.e., PC = AQ, which gives

∑_{k=1}^{m} pik ckj = ∑_{k=1}^{n} aik qkj for i = 1, 2, ..., m; j = 1, 2, ..., n.

So T′(γj) = T(γj) for j = 1, 2, ..., n, and since (γ1, γ2, ..., γn) is a basis of V, this proves T = T′. Thus A is the matrix of T relative to the ordered bases (α1, α2, ..., αn) of V and (β1, β2, ..., βm) of W, and C is the matrix of T relative to the ordered bases (γ1, γ2, ..., γn) of V and (δ1, δ2, ..., δm) of W; P is the matrix of the non-singular mapping T2 ∈ L(W,W) that maps βi to δi, and Q is the matrix of the non-singular mapping T1 ∈ L(V,V) that maps αi to γi. Thus ∃ a linear mapping T ∈ L(V,W) such that A and C are matrices of T relative to different pairs of ordered bases of V and W. This is the converse of the previous theorem.

Theorem 5.6.8 Let V and W be two finite dimensional vector spaces over a field F with dimV = n and dimW = m, and let A be an m × n matrix over F. Relative to two different pairs of ordered bases of V and W, let A determine two linear maps T1 and T2, so that m(T1) = A, m(T2) = A. Then ∃ non-singular linear mappings S ∈ L(V,V) and R ∈ L(W,W) such that T1 = R⁻¹T2S.

Proof: Let (α1, α2, ..., αn) and (γ1, γ2, ..., γn) be a pair of ordered bases of V and let the mapping S : V → V be defined by S(αi) = γi, i = 1, 2, ..., n. Let (β1, β2, ..., βm), (δ1, δ2, ..., δm) be a pair of ordered bases of W and let R : W → W be defined by R(βi) = δi, i = 1, 2, ..., m. Then S and R are non-singular. Let A = (aij)m×n and consider

T1(αj) = a1jβ1 + a2jβ2 + ... + amjβm;
T2(γj) = a1jδ1 + a2jδ2 + ... + amjδm, j = 1, 2, ..., n.


Then T2S(αj) = T2(γj) = a1jδ1 + a2jδ2 + ... + amjδm, and

RT1(αj) = R(a1jβ1 + a2jβ2 + ... + amjβm)
= a1jR(β1) + a2jR(β2) + ... + amjR(βm)
= a1jδ1 + a2jδ2 + ... + amjδm.

Since T2S(αj) = RT1(αj) for j = 1, 2, ..., n, we get T2S = RT1 and therefore T1 = R⁻¹T2S.

Theorem 5.6.9 Let V and W be finite dimensional vector spaces over a field F with dimV = n and dimW = m, and let T1, T2 be two linear maps in L(V,W) such that T1 = R⁻¹T2S, where R is a non-singular mapping in L(W,W) and S is a non-singular mapping in L(V,V). Then m(T1) = m(T2) relative to suitably chosen different pairs of ordered bases of V and W.

Proof: Let (α1, α2, ..., αn), (β1, β2, ..., βm) be a pair of ordered bases of V and W respectively. Let S(αi) = γi and R(βi) = δi. Since R and S are non-singular, (S(α1), S(α2), ..., S(αn)) is a basis of V and (R(β1), R(β2), ..., R(βm)) is a basis of W. Let A = (aij)m×n be the matrix of T1 relative to the ordered bases (α1, α2, ..., αn) of V and (β1, β2, ..., βm) of W. Then

T1(αj) = a1jβ1 + a2jβ2 + ... + amjβm,
RT1(αj) = R(a1jβ1 + a2jβ2 + ... + amjβm) = a1jδ1 + a2jδ2 + ... + amjδm.

Since RT1 = T2S, we have

T2S(αj) = a1jδ1 + a2jδ2 + ... + amjδm, or, T2(γj) = a1jδ1 + a2jδ2 + ... + amjδm.

This shows that A is also the matrix of T2 relative to the ordered bases (γ1, γ2, ..., γn) of V and (δ1, δ2, ..., δm) of W. This is the converse of the previous theorem.

Deduction 5.6.1 (Matrix representation of a linear operator) Let V be a vector space of dimension n over a field F and T : V → V a linear operator on V. Let (α1, α2, ..., αn) be an ordered basis of V; then, by the theorem, T is completely determined by the images T(α1), T(α2), ..., T(αn), and each T(αi) is a linear combination of the basis vectors α1, α2, ..., αn, say

T(α1) = a11α1 + a21α2 + ... + an1αn
T(α2) = a12α1 + a22α2 + ... + an2αn
...
T(αn) = a1nα1 + a2nα2 + ... + annαn,

where the aij are unique scalars in F determined by the ordered basis (α1, α2, ..., αn). Proceeding by arguments similar to those for linear mappings, the matrix representation of the linear operator T is Y = AX, where X is the co-ordinate vector of an arbitrary element ξ of V relative to the ordered basis (α1, α2, ..., αn) and Y is the co-ordinate vector of T(ξ) relative to the same basis. Relative to the ordered basis (α1, α2, ..., αn), the co-ordinate vectors of T(α1), T(α2), ..., T(αn) are

(a11, a21, ..., an1)^T, (a12, a22, ..., an2)^T, ..., (a1n, a2n, ..., ann)^T

respectively, and

m(T) = [T] = [[T(α1)], [T(α2)], ..., [T(αn)]].


The matrix

A =
[ a11 a12 ··· a1n ]
[ a21 a22 ··· a2n ]
[  ⋮            ⋮ ]
[ an1 an2 ··· ann ]

is said to be the matrix of T relative to the basis (α1, α2, ..., αn) and is denoted by m(T).

Ex 5.6.9 A linear mapping T : ℝ³ → ℝ³ is defined by T(x1, x2, x3) = (2x1 + x2 − x3, x2 + 4x3, x1 − x2 + 3x3), (x1, x2, x3) ∈ ℝ³. Find the matrix of T relative to the ordered bases

(i) ((1, 0, 0), (0, 1, 0), (0, 0, 1)) of ℝ³;

(ii) ((0, 1, 1), (1, 0, 1), (1, 1, 0)) of ℝ³.

Solution: (i) First, we write T(1, 0, 0), T(0, 1, 0) and T(0, 0, 1) as linear combinations of the basis vectors (1, 0, 0), (0, 1, 0), (0, 0, 1) of ℝ³ as

T(1, 0, 0) = (2, 0, 1) = 2(1, 0, 0) + 0(0, 1, 0) + 1(0, 0, 1)
T(0, 1, 0) = (1, 1, −1) = 1(1, 0, 0) + 1(0, 1, 0) − 1(0, 0, 1)
T(0, 0, 1) = (−1, 4, 3) = −1(1, 0, 0) + 4(0, 1, 0) + 3(0, 0, 1).

Therefore the matrix of T is

[T] =
[ 2   1  −1 ]
[ 0   1   4 ]
[ 1  −1   3 ]

(ii) First, we write T(0, 1, 1), T(1, 0, 1) and T(1, 1, 0) as linear combinations of the basis vectors (0, 1, 1), (1, 0, 1), (1, 1, 0) of ℝ³ as

T(0, 1, 1) = (0, 5, 2) = (7/2)(0, 1, 1) − (3/2)(1, 0, 1) + (3/2)(1, 1, 0)
T(1, 0, 1) = (1, 4, 4) = (7/2)(0, 1, 1) + (1/2)(1, 0, 1) + (1/2)(1, 1, 0)
T(1, 1, 0) = (3, 1, 0) = −1(0, 1, 1) + 1(1, 0, 1) + 2(1, 1, 0).

Therefore the matrix of T is

[T] =
[  7/2  7/2  −1 ]
[ −3/2  1/2   1 ]
[  3/2  1/2   2 ]

Theorem 5.6.10 Let V be a finite dimensional vector space of dimension n over a field F. Let T ∈ L(V,V) and let m(T) be the matrix of T with respect to an ordered basis (α1, α2, ..., αn) of V. Then m(T1T2) = m(T1)·m(T2), ∀T1, T2 ∈ L(V,V).

Proof: Let m(T1) = (aij)n×n, m(T2) = (bij)n×n; aij, bij ∈ F. Then

T1(αj) = a1jα1 + a2jα2 + ... + anjαn for j = 1, 2, ..., n;
T2(αj) = b1jα1 + b2jα2 + ... + bnjαn for j = 1, 2, ..., n.

Therefore,

T1T2(αj) = T1(T2(αj)) = T1(b1jα1 + b2jα2 + ... + bnjαn)
= b1jT1(α1) + b2jT1(α2) + ... + bnjT1(αn)
= b1j(a11α1 + ... + an1αn) + ... + bnj(a1nα1 + ... + annαn)
= (∑_{k=1}^{n} a1k bkj)α1 + ... + (∑_{k=1}^{n} ank bkj)αn.

This shows that,


m(T1T2) =
[ ∑ a1k bk1  ∑ a1k bk2  ···  ∑ a1k bkn ]
[ ∑ a2k bk1  ∑ a2k bk2  ···  ∑ a2k bkn ]
[     ⋮          ⋮               ⋮     ]
[ ∑ ank bk1  ∑ ank bk2  ···  ∑ ank bkn ]
= (aij)n×n · (bij)n×n = m(T1)·m(T2).

Ex 5.6.10 The matrix of a linear mapping T : ℝ³ → ℝ³ relative to the ordered basis ((−1, 1, 1), (1, −1, 1), (1, 1, −1)) of ℝ³ is

[ 1 2 2 ]
[ 2 1 3 ]
[ 3 3 1 ]

Find the matrix of T relative to the ordered basis ((0, 1, 1), (1, 0, 1), (1, 1, 0)).

Solution: Since the matrix m(T) of the linear mapping T is given,

T(−1, 1, 1) = 1(−1, 1, 1) + 2(1, −1, 1) + 3(1, 1, −1) = (4, 2, 0)
T(1, −1, 1) = 2(−1, 1, 1) + 1(1, −1, 1) + 3(1, 1, −1) = (2, 4, 0)
T(1, 1, −1) = 2(−1, 1, 1) + 3(1, −1, 1) + 1(1, 1, −1) = (2, 0, 4).

Let (x, y, z) ∈ ℝ³ and ci ∈ ℝ; then

(x, y, z) = c1(−1, 1, 1) + c2(1, −1, 1) + c3(1, 1, −1)
⇒ −c1 + c2 + c3 = x, c1 − c2 + c3 = y, c1 + c2 − c3 = z
⇒ c1 = (1/2)(y + z), c2 = (1/2)(z + x), c3 = (1/2)(x + y)
⇒ T(x, y, z) = c1T(−1, 1, 1) + c2T(1, −1, 1) + c3T(1, 1, −1)
= (1/2)(y + z)(4, 2, 0) + (1/2)(z + x)(2, 4, 0) + (1/2)(x + y)(2, 0, 4)
= (2x + 3y + 3z, 2x + y + 3z, 2x + 2y).

Therefore T(0, 1, 1) = (6, 4, 2), T(1, 0, 1) = (5, 5, 2) and T(1, 1, 0) = (5, 3, 4). Let

(6, 4, 2) = c1(0, 1, 1) + c2(1, 0, 1) + c3(1, 1, 0)
⇒ c2 + c3 = 6, c1 + c3 = 4, c1 + c2 = 2 ⇒ c1 = 0, c2 = 2, c3 = 4.

Let (5, 5, 2) = c1(0, 1, 1) + c2(1, 0, 1) + c3(1, 1, 0); then c1 = 1, c2 = 1 and c3 = 4. Lastly, let (5, 3, 4) = c1(0, 1, 1) + c2(1, 0, 1) + c3(1, 1, 0); then c1 = 1, c2 = 3 and c3 = 2. Therefore, the matrix representation of T is

[ 0 1 1 ]
[ 2 1 3 ]
[ 4 4 2 ]

Also, m(T) is non-singular, so T is non-singular and T⁻¹ exists.

Theorem 5.6.11 Let V be a finite dimensional vector space of dimension n over a field F. Then a linear mapping T ∈ L(V,V) is invertible iff the matrix of T relative to a chosen ordered basis of V, i.e., m(T), is non-singular.

Proof: Let T : V → V be an invertible mapping. Then there exists a linear mapping T′ : V → V such that TT′ = T′T = I, I being the identity mapping on V. So m(TT′) = m(T′T) = m(I), that is, m(T)·m(T′) = m(T′)·m(T) = I_n. This shows that m(T) is non-singular.

Conversely, let m(T) be non-singular. Then there exists a matrix P such that m(T)·P = P·m(T) = I_n. If T′ is the linear mapping such that m(T′) = P with respect to the same chosen basis of V, then m(T)m(T′) = m(T′)m(T) = I_n, that is, m(TT′) = m(T′T) = m(I_V). Since T → m(T) is an isomorphism, we have TT′ = T′T = I_V. This shows that T is invertible and T′ is the inverse of T. This completes the proof.


Ex 5.6.11 Let (α1, α2, α3) be an ordered basis of a real vector space V and let a linear mapping T : V → V be defined by T(α1) = α1 + α2 + α3, T(α2) = α1 + α2, T(α3) = α1. Show that T is non-singular. Find the matrix of T⁻¹ relative to the ordered basis (α1, α2, α3).

Solution: Let m(T) be the matrix of T relative to the ordered basis (α1, α2, α3). The linear mapping T : V → V which maps the basis vectors as T(α1) = α1 + α2 + α3, T(α2) = α1 + α2, T(α3) = α1 can be written as

T(α1) = 1α1 + 1α2 + 1α3; T(α2) = 1α1 + 1α2 + 0α3; T(α3) = 1α1 + 0α2 + 0α3.

Therefore, the matrix representation is

m(T) =
[ 1 1 1 ]
[ 1 1 0 ]
[ 1 0 0 ]

Since m(T) is non-singular, T is non-singular. Hence

T⁻¹(α1 + α2 + α3) = α1; T⁻¹(α1 + α2) = α2; T⁻¹(α1) = α3.

Since the mapping T⁻¹ is linear, we have

T⁻¹(α1) + T⁻¹(α2) + T⁻¹(α3) = α1; T⁻¹(α1) + T⁻¹(α2) = α2; T⁻¹(α1) = α3.

Solving, T⁻¹(α1) = α3; T⁻¹(α2) = α2 − α3; T⁻¹(α3) = α1 − α2. Therefore

m(T⁻¹) =
[ 0  0   1 ]
[ 0  1  −1 ]
[ 1 −1   0 ]

We have seen that the matrix m(T) associated with a linear mapping T ∈ L(V,V) depends on the choice of an ordered basis.

5.7 Orthogonal Linear Transformation

Let V be a finite dimensional Euclidean space. A linear mapping T : V → V is said to be an orthogonal transformation on V if

⟨T(α), T(β)⟩ = ⟨α, β⟩, ∀α, β ∈ V. (5.7)

Thus an orthogonal transformation preserves inner products. Let V be a Euclidean space. If a linear mapping T : V → V is orthogonal on V, then for all α, β ∈ V,

(i) ⟨α, β⟩ = 0 ⇒ ⟨T(α), T(β)⟩ = 0;

(ii) ||T(α)|| = ||α||;

(iii) ||T(α) − T(β)|| = ||α − β||;

(iv) T is one-one.

Result 5.7.1 If ⟨α, β⟩ = 0 and T is an orthogonal transformation, then ⟨T(α), T(β)⟩ = ⟨α, β⟩ = 0. Thus an orthogonal transformation preserves orthogonality. But a linear transformation preserving orthogonality is not necessarily an orthogonal transformation. On the Euclidean space ℝ, the mapping T : ℝ → ℝ defined by T(α) = 2α is a linear transformation and ⟨T(α), T(β)⟩ = ⟨2α, 2β⟩ = 4⟨α, β⟩. Hence T is not an orthogonal transformation, though ⟨α, β⟩ = 0 ⇒ ⟨T(α), T(β)⟩ = 0, i.e., though orthogonality of vectors is preserved under T.
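The contrast in Result 5.7.1 can be seen numerically; in the sketch below (not in the text, with an arbitrarily chosen rotation angle) a plane rotation preserves inner products exactly, while the doubling map preserves orthogonality but scales inner products by 4.

```python
# Rotation preserves <a, b>; T(a) = 2a does not, though it keeps orthogonality.
import numpy as np

th = 0.7
R = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
rng = np.random.default_rng(2)
a, b = rng.standard_normal(2), rng.standard_normal(2)
assert np.isclose((R @ a) @ (R @ b), a @ b)     # orthogonal transformation
assert np.isclose((2*a) @ (2*b), 4 * (a @ b))   # scaling by 2 quadruples <a,b>
```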


Theorem 5.7.1 A linear mapping T : V → V is orthogonal if and only if T preserves the length of vectors, i.e., ||T(α)|| = ||α|| for all α ∈ V.

Proof: First let T be orthogonal; then ∀α ∈ V we have ⟨T(α), T(α)⟩ = ⟨α, α⟩, i.e.,

||T(α)||² = ||α||² ⇒ ||T(α)|| = ||α||.

Conversely, let ||T(α)|| = ||α||, ∀α ∈ V. Now, if α, β ∈ V, then ||T(α) − T(β)|| = ||T(α − β)|| = ||α − β||, i.e.,

⟨T(α − β), T(α − β)⟩ = ⟨α − β, α − β⟩
or, ⟨T(α) − T(β), T(α) − T(β)⟩ = ⟨α − β, α − β⟩; as T is linear
or, ||T(α)||² − 2⟨T(α), T(β)⟩ + ||T(β)||² = ||α||² − 2⟨α, β⟩ + ||β||²
or, ⟨T(α), T(β)⟩ = ⟨α, β⟩; since ||T(α)|| = ||α||, ||T(β)|| = ||β||.

Hence the theorem.

Theorem 5.7.2 A linear mapping T : V → V is orthogonal if and only if, for every unit vector α ∈ V, T(α) is also a unit vector.

Proof: First, let the linear mapping T : V → V be orthogonal; then

||α|| = 1 ⇒ ||T(α)|| = √⟨T(α), T(α)⟩ = √⟨α, α⟩ = 1.

Conversely, let T be linear and let ||α|| = 1 ⇒ ||T(α)|| = 1. We shall show that ||T(α)|| = ||α||, ∀α ∈ V. If α = θ, then T(θ) = θ and hence ||T(θ)|| = ||θ||. Let α ≠ θ. If ||α|| = 1, then the result holds by hypothesis. If ||α|| ≠ 1, then α̂ = (1/||α||)α is a unit vector, and hence ||T(α̂)|| = ||α̂|| = 1 gives

||T((1/||α||)α)|| = ||(1/||α||)α|| ⇒ ||(1/||α||)T(α)|| = (1/||α||)||α||
⇒ (1/||α||)||T(α)|| = 1 ⇒ ||T(α)|| = ||α||.

Hence, by the previous theorem, T is orthogonal.

Theorem 5.7.3 Let {α1, α2, · · · , αn} be a basis of a Euclidean space V of dimension n. Then a linear mapping T : V → V is orthogonal on V if and only if ⟨T(αi), T(αj)⟩ = ⟨αi, αj⟩, ∀i, j.

Proof: First, let the linear mapping T : V → V be orthogonal on V; then, by definition, ⟨T(α), T(β)⟩ = ⟨α, β⟩, ∀α, β ∈ V. Therefore,

⟨T(αi), T(αj)⟩ = ⟨αi, αj⟩, ∀i, j.

Conversely, let ⟨T(αi), T(αj)⟩ = ⟨αi, αj⟩, ∀i, j. Let α, β be any two elements of V, say

α = Σ_{k=1}^{n} ak αk;  β = Σ_{k=1}^{n} bk αk,

where ak, bk ∈ ℝ. Since T is a linear transformation, T(α) = Σ_{k=1}^{n} ak T(αk), T(β) = Σ_{k=1}^{n} bk T(αk), and so,


⟨T(α), T(β)⟩ = ⟨Σ_{k=1}^{n} ak T(αk), Σ_{l=1}^{n} bl T(αl)⟩ = Σ_{k=1}^{n} Σ_{l=1}^{n} ak bl ⟨T(αk), T(αl)⟩

and ⟨α, β⟩ = ⟨Σ_{k=1}^{n} ak αk, Σ_{l=1}^{n} bl αl⟩ = Σ_{k=1}^{n} Σ_{l=1}^{n} ak bl ⟨αk, αl⟩.

Since ⟨T(αk), T(αl)⟩ = ⟨αk, αl⟩ for all k, l, the two double sums are equal; hence ⟨T(α), T(β)⟩ = ⟨α, β⟩, ∀α, β ∈ V, and therefore T is orthogonal on V.

Theorem 5.7.4 Let V be a finite dimensional Euclidean space. Then a linear mapping T : V → V is orthogonal on V if and only if T maps an orthonormal basis to another orthonormal basis.

Proof: Let {α1, α2, · · · , αn} be an orthonormal basis of V; then by definition,

⟨αi, αj⟩ = 1 if i = j, and 0 if i ≠ j.

First let T : V → V be an orthogonal mapping on V; then by definition ⟨T(α), T(β)⟩ = ⟨α, β⟩, ∀α, β ∈ V, and so ⟨T(αi), T(αj)⟩ = ⟨αi, αj⟩, ∀i, j. Therefore,

⟨T(αi), T(αj)⟩ = 1 if i = j, and 0 if i ≠ j.

This proves that {T(α1), T(α2), · · · , T(αn)} is an orthonormal set, and as it contains n vectors, it is an orthonormal basis of V.
Conversely, let {α1, α2, · · · , αn} be an orthonormal basis of V such that {T(α1), T(α2), · · · , T(αn)} is also an orthonormal set. Any α, β ∈ V can be written as

α = Σ_{i=1}^{n} ai αi, ai ∈ ℝ;  β = Σ_{i=1}^{n} bi αi, bi ∈ ℝ.

Since {α1, α2, · · · , αn} is an orthonormal basis of V, we have ⟨α, β⟩ = Σ_{i=1}^{n} ai bi. Since T : V → V is linear,

T(α) = Σ_{i=1}^{n} ai T(αi);  T(β) = Σ_{i=1}^{n} bi T(αi)
⇒ ⟨T(α), T(β)⟩ = Σ_{i=1}^{n} ai bi,

as {T(α1), T(α2), · · · , T(αn)} is an orthonormal set. Therefore ⟨T(α), T(β)⟩ = ⟨α, β⟩ for all α, β ∈ V, and this shows that T is an orthogonal mapping.

Theorem 5.7.5 Let V be a Euclidean space of dimension n. Let A be the matrix of the linear mapping T : V → V relative to an orthonormal basis. Then T is orthogonal on V if and only if A is a real orthogonal matrix.

Proof: Let A = [aij]n×n be the matrix of T relative to the ordered orthonormal basis (α1, α2, · · · , αn) of V; then

T(αj) = Σ_{i=1}^{n} aij αi, for j = 1, 2, · · · , n,

⟨T(αi), T(αj)⟩ = Σ_{k=1}^{n} aki akj; as (α1, α2, · · · , αn) is orthonormal.


Let the mapping T : V → V be orthogonal on V; then ⟨T(αi), T(αj)⟩ = ⟨αi, αj⟩. Since (α1, α2, · · · , αn) is an orthonormal set,

Σ_{k=1}^{n} aki akj = 1 if i = j, and 0 if i ≠ j.

Therefore AᵀA = In, and this shows that A is an orthogonal matrix. Conversely, let A be an orthogonal matrix; then AᵀA = In. Therefore,

Σ_{k=1}^{n} aki akj = 1 if i = j, and 0 if i ≠ j.

Hence ⟨T(αi), T(αj)⟩ = Σ_{k=1}^{n} aki akj, as {α1, α2, · · · , αn} is an orthonormal set. Consequently,

⟨T(αi), T(αj)⟩ = 1 if i = j, and 0 if i ≠ j.

This proves that {T(α1), T(α2), · · · , T(αn)} is an orthonormal set, and as it contains n vectors, it is an orthonormal basis of V. As T maps the orthonormal basis {α1, α2, · · · , αn} to another orthonormal basis of V, T is orthogonal.
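As a concrete illustration of Theorem 5.7.5, a rotation of the plane is orthogonal: its matrix A relative to the standard (orthonormal) basis satisfies AᵀA = I and it preserves dot products. The following is a minimal numerical sketch, assuming NumPy; the angle 0.7 and the random vectors are arbitrary choices made only for this check.

```python
import numpy as np

theta = 0.7                      # an arbitrary angle
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation of the plane

print(np.allclose(A.T @ A, np.eye(2)))            # True: A^T A = I_2

rng = np.random.default_rng(0)
a, b = rng.standard_normal(2), rng.standard_normal(2)
# <T(a), T(b)> = <a, b>: lengths and angles are preserved
print(np.isclose((A @ a) @ (A @ b), a @ b))       # True
```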

5.8 Linear Functional

In this section we are concerned exclusively with linear transformations from a vector space V into its field of scalars F, which is itself a vector space of dimension 1 over F. Let V(F) be a vector space. A linear functional on V is a linear mapping φ : V → F such that for every α, β ∈ V and for every a, b ∈ F,

φ(aα+ bβ) = aφ(α) + bφ(β). (5.8)

A linear functional on V is a linear mapping from V into F . For example,

(i) Let V be the vector space of continuous real valued functions on [0, 2π]. The function h : V → ℝ defined by

h(x) = (1/2π) ∫₀^{2π} x(t) g(t) dt,  g ∈ V fixed,

is a linear functional on V. In particular, if g(t) = sin nt or cos nt, h(x) is often called the nth Fourier coefficient of x.

(ii) Let V = Mn×n(F ) and define φ : V → F by φ(A) = tr(A), the trace of A. Then, φ isa linear functional.

(iii) Let V be a finite dimensional vector space, and let α = {α1, α2, . . . , αn} be an ordered basis of V. For each i = 1, 2, . . . , n, define φi(β) = ai, where [β]α = (a1, a2, . . . , an)ᵀ is the co-ordinate vector of β relative to α. Then φi is a linear functional on V, called the ith co-ordinate function with respect to the basis α. Note that φi(αj) = δij.


5.8.1 Dual Space

Let V(F) be a vector space. The set of all linear functionals on V(F) is also a vector space over F, with addition and scalar multiplication defined by

(φ+ ψ)(α) = φ(α) + ψ(α) and (aφ)(α) = aφ(α), (5.9)

where φ and ψ are linear functionals on V and a ∈ F . This vector space L(V, F ) is calledthe dual space of V and is denoted by V ∗.

Ex 5.8.1 Let φ : ℝ³ → ℝ and ψ : ℝ³ → ℝ be the linear functionals defined by φ(x, y, z) = 2x − 3y + z and ψ(x, y, z) = 4x − 2y + 3z. Find (i) φ + ψ, (ii) 3φ and (iii) 2φ − 5ψ.

Solution: Here we use the property of linearity of the functionals φ and ψ. Now

(i) (φ + ψ)(x, y, z) = φ(x, y, z) + ψ(x, y, z) = (2x − 3y + z) + (4x − 2y + 3z) = 6x − 5y + 4z.

(ii) By the same property, we have (3φ)(x, y, z) = 3φ(x, y, z) = 3(2x − 3y + z) = 6x − 9y + 3z.

(iii) (2φ − 5ψ)(x, y, z) = 2φ(x, y, z) − 5ψ(x, y, z) = 2(2x − 3y + z) − 5(4x − 2y + 3z) = −16x + 4y − 13z.

Ex 5.8.2 Let φ be the linear functional on ℝ² defined by φ(2, 1) = 15 and φ(1,−2) = −10. Find φ(x, y) and φ(−2, 7).

Solution: Let φ(x, y) = ax+ by. Using the conditions φ(2, 1) = 15 and φ(1,−2) = −10, wehave,

2a+ b = 15 and a− 2b = −10 ⇒ a = 4, b = 7.

Thus φ(x, y) = 4x+ 7y and so φ(−2, 7) = 41.

Theorem 5.8.1 Let {α1, α2, · · · , αn} be a basis of a finite dimensional vector space V(F). Let φ1, φ2, · · · , φn ∈ V∗ be the linear functionals defined by φi(αj) = δij, where δij is the Kronecker delta. Then {φi : i = 1, 2, · · · , n} is a basis of V∗.

Proof: Let φ be an arbitrary element of V∗ and let us suppose that

φ(α1) = a1, φ(α2) = a2, · · · , φ(αn) = an.

We first show that {φi : i = 1, 2, · · · , n} spans V∗. Let ψ = Σ_{i=1}^{n} ai φi; then

ψ(α1) = (a1φ1 + a2φ2 + · · · + anφn)(α1)
= a1φ1(α1) + a2φ2(α1) + · · · + anφn(α1)
= a1·1 + a2·0 + · · · + an·0 = a1 = φ(α1).

In general, ψ(αi) = ai = φ(αi), i = 1, 2, · · · , n. Since φ and ψ agree on the basis vectors, φ = ψ = Σ_{i=1}^{n} ai φi; accordingly, {φi : i = 1, 2, · · · , n} spans V∗. Now, we are


to show that {φi : i = 1, 2, · · · , n} is LI. For this, let

c1φ1 + c2φ2 + · · · + cnφn = θ
⇒ (c1φ1 + c2φ2 + · · · + cnφn)(αi) = θ(αi)
⇒ c1φ1(αi) + c2φ2(αi) + · · · + cnφn(αi) = 0
⇒ Σ_{k=1}^{n} ck δki = 0, for i = 1, 2, · · · , n,

so that c1 = c2 = · · · = cn = 0. Hence {φi : i = 1, 2, · · · , n} is LI and consequently it forms a basis of V∗. This basis is called the dual basis, or the basis dual to {αi}.

Ex 5.8.3 Find the dual basis of the ordered basis set S = {(2, 1), (3, 1)} of ℝ².

Solution: Let the dual basis of S be S∗ = {φ1, φ2}. To find an explicit formula for φ1, we consider the equations

1 = φ1(2, 1) = φ1(2e1 + e2) = 2φ1(e1) + φ1(e2)
0 = φ1(3, 1) = φ1(3e1 + e2) = 3φ1(e1) + φ1(e2).

Solving these equations, we obtain φ1(e1) = −1 and φ1(e2) = 3, i.e., φ1(x, y) = −x + 3y. Similarly, φ2(x, y) = x − 2y.
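In coordinates, computing a dual basis reduces to a matrix inversion: if the rows of M are the basis vectors and the rows of C are the coefficient vectors of φ1, φ2, then the conditions φi(αj) = δij read C Mᵀ = I, so C = (Mᵀ)⁻¹ = (M⁻¹)ᵀ. A small sketch, assuming NumPy:

```python
import numpy as np

M = np.array([[2.0, 1.0],        # rows: the basis vectors (2,1) and (3,1)
              [3.0, 1.0]])

# phi_i(alpha_j) = delta_ij  <=>  C @ M.T = I  =>  C = inv(M.T) = inv(M).T
C = np.linalg.inv(M).T
print(C)   # [[-1.  3.], [ 1. -2.]]  ->  phi1 = -x + 3y, phi2 = x - 2y
```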

Theorem 5.8.2 (Dual basis): The dual space of an n-dimensional vector space is n-dimensional.

Proof: Let V(F) be an n-dimensional vector space and let V∗ be the dual space of V. Let S = {α1, α2, · · · , αn} be an ordered basis of V(F). Then for each i = 1, 2, · · · , n there exists a unique linear functional φi on V such that φi(αj) = δij. Let S′ = {φ1, φ2, · · · , φn}; then clearly S′ ⊆ V∗. We are to show that S′ is a basis of V∗. To show that {φi : i = 1, 2, · · · , n} is linearly independent, let

c1φ1 + c2φ2 + · · · + cnφn = θ
⇒ (c1φ1 + c2φ2 + · · · + cnφn)(αi) = θ(αi)
⇒ c1φ1(αi) + c2φ2(αi) + · · · + cnφn(αi) = 0
⇒ Σ_{k=1}^{n} ck δki = 0, for i = 1, 2, · · · , n,

so that c1 = c2 = · · · = cn = 0. Hence S′ = {φi : i = 1, 2, · · · , n} is LI. Further, let φ be an arbitrary element of V∗ and let us suppose that

φ(α1) = a1, φ(α2) = a2, · · · , φ(αn) = an.

Let α be any element of V. Since S is a basis of V, α = Σ_{i=1}^{n} ci αi for some ci ∈ F.

Therefore,

φ(α) = φ(c1α1 + c2α2 + · · · + cnαn)
= c1φ(α1) + c2φ(α2) + · · · + cnφ(αn)
= c1a1 + c2a2 + · · · + cnan
= Σ_{i=1}^{n} ci ai = Σ_{i=1}^{n} ai (Σ_{j=1}^{n} cj δij)


= Σ_{i=1}^{n} ai (Σ_{j=1}^{n} cj φi(αj))
= Σ_{i=1}^{n} ai φi(Σ_{j=1}^{n} cj αj)
= Σ_{i=1}^{n} ai φi(α) = (a1φ1 + a2φ2 + · · · + anφn)(α).

Thus φ = a1φ1 + a2φ2 + · · · + anφn, i.e., each element of V∗ can be expressed as a linear combination of elements of S′. Thus S′ generates V∗, and consequently S′ is a basis of V∗. Accordingly, dim V∗ = n = dim V.

Ex 5.8.4 Find the dual basis of the basis set S = {(1,−2, 3), (1,−1, 1), (2,−4, 7)} of V3(ℝ).

Solution: Let α1 = (1,−2, 3), α2 = (1,−1, 1) and α3 = (2,−4, 7); then the basis set of V3(ℝ) is S = {α1, α2, α3}. We are to find the dual basis S∗ = {φ1, φ2, φ3} of S. We seek functionals

φi(x, y, z) = aix + biy + ciz; i = 1, 2, 3,

where, by definition of the dual basis, φi(αj) = δij. Using these conditions, we have

φ1(α1) = φ1(1,−2, 3) = 1; φ2(α1) = φ2(1,−2, 3) = 0; φ3(α1) = φ3(1,−2, 3) = 0
φ1(α2) = φ1(1,−1, 1) = 0; φ2(α2) = φ2(1,−1, 1) = 1; φ3(α2) = φ3(1,−1, 1) = 0
φ1(α3) = φ1(2,−4, 7) = 0; φ2(α3) = φ2(2,−4, 7) = 0; φ3(α3) = φ3(2,−4, 7) = 1.

Thus we have the following systems of equations

a1 − 2b1 + 3c1 = 1; a1 − b1 + c1 = 0; 2a1 − 4b1 + 7c1 = 0
a2 − 2b2 + 3c2 = 0; a2 − b2 + c2 = 1; 2a2 − 4b2 + 7c2 = 0
a3 − 2b3 + 3c3 = 0; a3 − b3 + c3 = 0; 2a3 − 4b3 + 7c3 = 1.

Solving, the first system yields a1 = −3, b1 = −5, c1 = −2, so φ1(x, y, z) = −3x − 5y − 2z. Similarly, φ2(x, y, z) = 2x + y and φ3(x, y, z) = x + 2y + z. Therefore S∗ = {φ1, φ2, φ3} is the dual basis of S, where φ1, φ2, φ3 are defined as above.

Ex 5.8.5 Find a basis of the annihilator W⁰ of the subspace W of ℝ⁴ spanned by α1 = (1, 2,−3, 4) and α2 = (0, 1, 4,−1).

Solution: Here we seek a basis of the set of linear functionals φ(x, y, z, t) = ax + by + cz + dt such that φ(α1) = 0 and φ(α2) = 0. Thus

φ(1, 2,−3, 4) = a + 2b − 3c + 4d = 0
φ(0, 1, 4,−1) = b + 4c − d = 0.

The system of two equations in the unknowns a, b, c, d is in echelon form with free variables c and d.
(i) Let c = 1, d = 0; then b = −4, a = 11. In this case the linear functional is φ1(x, y, z, t) = 11x − 4y + z.
(ii) Let c = 0, d = 1; then b = 1, a = −6. In this case the linear functional is φ2(x, y, z, t) = −6x + y + t.
The linear functionals φ1 and φ2 form a basis of the annihilator W⁰.

Ex 5.8.6 Let V be the vector space of polynomials over ℝ of degree at most 2. Let φ1, φ2, φ3 be the linear functionals on V defined by

φ1(f(t)) = ∫₀¹ f(t) dt, φ2(f(t)) = f′(1), φ3(f(t)) = f(0).

Here f(t) = a + bt + ct² ∈ V and f′(t) denotes the derivative of f(t). Find the basis {f1(t), f2(t), f3(t)} of V that is dual to {φ1, φ2, φ3}.


Solution: Let αi = fi(t) = ai + bit + cit², i = 1, 2, 3. By the definition of dual basis, we have φ1(αi) = δ1i. Thus,

φ1(α1) = ∫₀¹ (a1 + b1t + c1t²) dt = a1 + b1/2 + c1/3 = 1
φ1(α2) = ∫₀¹ (a2 + b2t + c2t²) dt = a2 + b2/2 + c2/3 = 0
φ1(α3) = ∫₀¹ (a3 + b3t + c3t²) dt = a3 + b3/2 + c3/3 = 0.

Using the definition φ2(f(t)) = f′(1), we get

φ2(α1) = f1′(1) = b1 + 2c1 = 0
φ2(α2) = f2′(1) = b2 + 2c2 = 1
φ2(α3) = f3′(1) = b3 + 2c3 = 0.

Using the definition φ3(f(t)) = f(0), we get

φ3(α1) = f1(0) = a1 = 0
φ3(α2) = f2(0) = a2 = 0
φ3(α3) = f3(0) = a3 = 1.

Solving each system yields a1 = 0, b1 = 3, c1 = −3/2; a2 = 0, b2 = −1/2, c2 = 3/4; and a3 = 1, b3 = −3, c3 = 3/2. Thus f1(t) = 3t − (3/2)t², f2(t) = −(1/2)t + (3/4)t² and f3(t) = 1 − 3t + (3/2)t². The polynomials f1(t), f2(t), f3(t) given above form the basis of V that is dual to {φ1, φ2, φ3}.
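The nine conditions φi(fj) = δij can be verified symbolically. A sketch assuming SymPy is available (not part of the original text):

```python
import sympy as sp

t = sp.symbols('t')
f1 = 3*t - sp.Rational(3, 2)*t**2
f2 = -sp.Rational(1, 2)*t + sp.Rational(3, 4)*t**2
f3 = 1 - 3*t + sp.Rational(3, 2)*t**2

phis = [lambda f: sp.integrate(f, (t, 0, 1)),   # phi_1(f) = int_0^1 f dt
        lambda f: sp.diff(f, t).subs(t, 1),     # phi_2(f) = f'(1)
        lambda f: f.subs(t, 0)]                 # phi_3(f) = f(0)

# Prints the Kronecker-delta pattern: [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print([[p(f) for f in (f1, f2, f3)] for p in phis])
```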

Theorem 5.8.3 Let {α1, α2, · · · , αn} be a basis of V and let {φ1, φ2, · · · , φn} be the dual basis of V∗. Then for any vector β ∈ V, β = φ1(β)α1 + φ2(β)α2 + · · · + φn(β)αn, and for any linear functional ψ ∈ V∗, ψ = ψ(α1)φ1 + ψ(α2)φ2 + · · · + ψ(αn)φn.

Proof: Let β = a1α1 + a2α2 + · · ·+ anαn, for some ai ∈ F . Thus,

φ1(β) = φ1(a1α1 + a2α2 + · · ·+ anαn)= a1φ1(α1) + a2φ1(α2) + · · ·+ anφ1(αn)= a1.1 + a2.0 + · · ·+ an.0 = a1.

Thus, in general, φi(β) = ai; i = 1, 2, · · · , n and therefore,

β = φ1(β)α1 + φ2(β)α2 + · · ·+ φn(β)αn.

Applying a functional ψ ∈ V∗ to both sides, we get

ψ(β) = φ1(β)ψ(α1) + φ2(β)ψ(α2) + · · · + φn(β)ψ(αn)
= ψ(α1)φ1(β) + ψ(α2)φ2(β) + · · · + ψ(αn)φn(β)
= (ψ(α1)φ1 + ψ(α2)φ2 + · · · + ψ(αn)φn)(β).

This relation holds ∀β ∈ V, and so ψ ≡ ψ(α1)φ1 + ψ(α2)φ2 + · · · + ψ(αn)φn. This theorem gives the relationship between bases and their duals.

Theorem 5.8.4 Let α = {α1, α2, · · · , αn} and β = {β1, β2, · · · , βn} be bases of V, and let φ = {φ1, φ2, · · · , φn} and ψ = {ψ1, ψ2, · · · , ψn} be the bases of V∗ dual to α and β respectively. If T is the transition matrix from α to β, then (T⁻¹)ᵗ is the transition matrix from φ to ψ.


Proof: Let the elements of β be written as linear combinations of the elements of α as

βi = Σ_{j=1}^{n} aij αj; i = 1, 2, · · · , n,

so that, by the given condition, T = [aij]n×n. Now the elements of ψ can be written in terms of φ as

ψi = Σ_{j=1}^{n} bij φj; i = 1, 2, · · · , n,

where R = [bij]n×n. We shall show that R = (T⁻¹)ᵗ. Let Ri = (bi1, bi2, · · · , bin) be the ith row of R and Cj = (aj1, aj2, · · · , ajn)ᵗ be the jth column of Tᵗ. Then, by the definition of dual basis,

ψi(βj) = (Σ_{k=1}^{n} bik φk)(Σ_{l=1}^{n} ajl αl) = Σ_{k=1}^{n} bik ajk = RiCj = δij

⇒ RTᵗ =
[ R1C1 R1C2 · · · R1Cn ]
[ R2C1 R2C2 · · · R2Cn ]
[   ⋮     ⋮    ⋱    ⋮   ]
[ RnC1 RnC2 · · · RnCn ]
=
[ 1 0 · · · 0 ]
[ 0 1 · · · 0 ]
[ ⋮  ⋮  ⋱  ⋮ ]
[ 0 0 · · · 1 ]
= In

⇒ R = (Tᵗ)⁻¹ = (T⁻¹)ᵗ.

Therefore, if T is the transition matrix from α to β, then (T⁻¹)ᵗ is the transition matrix from φ to ψ.
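The relation R = (T⁻¹)ᵗ can be tested numerically by representing each basis by the matrix whose rows are its vectors in standard coordinates, and each dual basis by the coefficient matrix from the sketch after Ex 5.8.3. A hypothetical random example, assuming NumPy (random matrices are almost surely invertible):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))   # rows: basis alpha, in standard coordinates
N = rng.standard_normal((3, 3))   # rows: basis beta

T = N @ np.linalg.inv(M)          # transition matrix: beta_i = sum_j T_ij alpha_j

C = np.linalg.inv(M).T            # rows: dual basis phi of alpha
D = np.linalg.inv(N).T            # rows: dual basis psi of beta
R = D @ np.linalg.inv(C)          # transition matrix from phi to psi

print(np.allclose(R, np.linalg.inv(T).T))   # True: R = (T^{-1})^t
```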

5.8.2 Second Dual Space

Let V(F) be a vector space. Then its dual space V∗, consisting of all linear functionals on V, is also a vector space. Hence V∗ itself has a dual space V∗∗, called the second dual of V, which consists of all linear functionals on V∗.

Theorem 5.8.5 Each element α ∈ V determines a specific element α̂ ∈ V∗∗.

Proof: For every φ ∈ V∗, we define a mapping α̂ : V∗ → F by α̂(φ) = φ(α). First, we are to show that α̂ : V∗ → F is linear. For this, let a, b ∈ F and φ, ψ ∈ V∗; we have

α̂(aφ + bψ) = (aφ + bψ)(α) = aφ(α) + bψ(α) = aα̂(φ) + bα̂(ψ).

Therefore the mapping α̂ : V∗ → F is linear, i.e., α̂ ∈ V∗∗. Next we show that, if V is finite dimensional, then the mapping α ↦ α̂ of V into V∗∗ is an isomorphism of V onto V∗∗. Let α (≠ θ) ∈ V; then ∃φ ∈ V∗ such that

φ(α) ≠ 0 ⇒ α̂(φ) = φ(α) ≠ 0 ⇒ α̂ ≠ θ.

Since α ≠ θ ⇒ α̂ ≠ θ, the mapping α ↦ α̂ is non-singular and hence injective. Since V is finite dimensional, we have

dim V = dim V∗ = dim V∗∗.

Hence the mapping α ↦ α̂ is an isomorphism of V onto V∗∗, so that each element α ∈ V determines a unique element α̂ ∈ V∗∗. Moreover, this isomorphism between V and V∗∗ does not depend on any choice of bases for the two vector spaces.


(i) Let V be a finite dimensional vector space and let α ∈ V. If α̂(φ) = 0 for all φ ∈ V∗, then α = θ.

(ii) Let {φ1, φ2, . . . , φn} be an ordered basis for V∗. Then by this theorem we conclude that for this ordered basis there exists a dual basis {α̂1, α̂2, . . . , α̂n} in V∗∗, i.e.,

δij = α̂i(φj) = φj(αi); ∀i, j.

Thus {φ1, φ2, . . . , φn} is the dual basis of {α1, α2, . . . , αn}. Therefore, if V is a finite dimensional vector space with dual space V∗, then every ordered basis for V∗ is the dual basis of some basis for V.

5.8.3 Annihilators

Let V(F) be a vector space and let S be a subset (not necessarily a subspace) of V. The annihilator of S, denoted by S⁰, is defined by

S⁰ = {φ ∈ V∗ : φ(α) = 0, ∀α ∈ S}, (5.10)

where φ is a linear functional in V∗. Clearly, {θ}⁰ = V∗ and V⁰ = {θ}.

Theorem 5.8.6 If S is any subset of a vector space V(F), then S⁰ is a subspace of V∗.

Proof: By the definition of annihilators, θ ∈ S⁰, as θ(α) = 0 ∀α ∈ S; thus S⁰ ≠ ∅. Let φ, ψ ∈ S⁰; then ∀α ∈ S we have φ(α) = 0 and ψ(α) = 0. Thus for any scalars a, b ∈ F we have

(aφ + bψ)(α) = aφ(α) + bψ(α) = a·0 + b·0 = 0
⇒ aφ + bψ ∈ S⁰.

Thus φ ∈ S⁰, ψ ∈ S⁰ implies aφ + bψ ∈ S⁰, ∀a, b ∈ F. Hence S⁰ is a subspace of V∗.

Theorem 5.8.7 Let V(F) be a finite dimensional vector space and let W be a subspace of V. Then dim W + dim W⁰ = dim V.

Proof: Case 1: If W = {θ}, then W⁰ = V∗. Thus dim W = 0 and

dim W⁰ = dim V∗ = dim V.

In this case the result follows.
Case 2: Again, if W = V, then W⁰ = {θ}. Therefore

dim W = dim V and dim W⁰ = 0.

So in this case also the result follows.
Case 3: Now let W be a proper subspace of V. Let dim V = n and dim W = r, 0 < r < n. We are to show that dim W⁰ = n − r. Let S = {α1, α2, · · · , αr} be a basis of W. By the extension theorem, S can be extended to a basis S1 = {α1, α2, · · · , αr, β1, β2, · · · , βn−r} of V. Let {φ1, · · · , φr, ψ1, · · · , ψn−r} be the basis of V∗ dual to S1, so that

φi(αj) = δij, ψi(βj) = δij, φi(βj) = 0, ψi(αj) = 0.

By the definition of the dual basis, each ψi annihilates each αj, and so ψi ∈ W⁰, i = 1, 2, · · · , n − r. We assert that {ψi} is a basis of W⁰. It is LI, as it is a subset of a LI set. We are to show that {ψi} spans W⁰. Let ψ ∈ W⁰; then, by Theorem 5.8.3,

ψ = ψ(α1)φ1 + ψ(α2)φ2 + · · · + ψ(αr)φr + ψ(β1)ψ1 + · · · + ψ(βn−r)ψn−r
= 0φ1 + 0φ2 + · · · + 0φr + ψ(β1)ψ1 + · · · + ψ(βn−r)ψn−r
= ψ(β1)ψ1 + · · · + ψ(βn−r)ψn−r.


Thus each element of W⁰ is a linear combination of elements of {ψ1, · · · , ψn−r}. Hence {ψ1, ψ2, · · · , ψn−r} spans W⁰ and so is a basis of W⁰. Hence dim W⁰ = n − r = dim V − dim W.
Corollary: Here W is exactly the set of vectors α such that ψi(α) = 0, i = 1, 2, · · · , n − r. In case r = n − 1, W is the null space of the single functional ψ1. Thus, if W is an r-dimensional subspace of an n-dimensional vector space V, then W is the intersection of (n − r) hyperspaces in V.
Corollary: Let W1 and W2 be two subspaces of a finite dimensional vector space. If W1 = W2 then obviously W1⁰ = W2⁰. If W1 ≠ W2, then one of the two subspaces contains a vector which is not in the other. Let α ∈ W2 but α ∉ W1. By the previous corollary, there is a linear functional φ such that φ(β) = 0 for all β ∈ W1, but φ(α) ≠ 0. Then φ is in W1⁰ but not in W2⁰, and W1⁰ ≠ W2⁰. Thus, if W1 and W2 are subspaces of a finite dimensional vector space, then W1 ≠ W2 if and only if W1⁰ ≠ W2⁰.

Result 5.8.1 The first corollary says that, if we select some ordered basis for the space, each r-dimensional subspace can be described by specifying (n − r) homogeneous linear conditions on the coordinates relative to that basis. Now we look briefly at systems of homogeneous linear equations from the point of view of linear functionals. Consider the system of linear equations

A11x1 + A12x2 + . . . + A1nxn = 0
A21x1 + A22x2 + . . . + A2nxn = 0
⋮
Am1x1 + Am2x2 + . . . + Amnxn = 0 (5.11)

for which we wish to find the solutions. If we let φi, i = 1, 2, · · · , m, be the linear functional on Fⁿ defined by

φi(x1, x2, · · · , xn) = Ai1x1 + Ai2x2 + . . . + Ainxn,

then we are seeking the subspace annihilated by φ1, φ2, · · · , φm. Row reduction of the coefficient matrix provides a systematic method of finding this subspace. The n-tuple (Ai1, Ai2, · · · , Ain) gives the coordinates of the linear functional φi relative to the basis which is dual to the standard basis for Fⁿ. The row space of the coefficient matrix may thus be regarded as the space of linear functionals spanned by φ1, φ2, · · · , φm. The solution space is the subspace annihilated by this space of functionals.
Now we describe the system of equations from the 'dual' point of view, i.e., suppose that we are given m vectors in Fⁿ as

αi = (Ai1, Ai2, · · · , Ain)

and we wish to find the annihilator of the subspace spanned by these vectors. As a typical linear functional on Fⁿ has the form

φ(x1, x2, · · · , xn) = c1x1 + c2x2 + · · · + cnxn,

the condition that φ be in the annihilator is that

Σ_{j=1}^{n} Aij cj = 0; i = 1, 2, · · · , m,

which shows that (c1, c2, · · · , cn) is a solution of the homogeneous system Ax = 0. Therefore, from the 'dual' point of view, row reduction gives us a systematic method of finding the annihilator of the subspace spanned by a given finite set of vectors in Fⁿ.


Ex 5.8.7 Let W be the subspace of ℝ⁵ which is spanned by the vectors α1 = (2,−2, 3, 4,−1), α2 = (−1, 1, 2, 5, 2), α3 = (0, 0,−1,−2, 3) and α4 = (1,−1, 2, 3, 0). How does one describe W⁰, the annihilator of W?

Solution: Let us form a 4 × 5 matrix A with row vectors α1, α2, α3, α4 and find the row reduced echelon matrix R which is row equivalent to A:

A =
[  2 −2  3  4 −1 ]
[ −1  1  2  5  2 ]
[  0  0 −1 −2  3 ]
[  1 −1  2  3  0 ]
→ R =
[ 1 −1 0 −1 0 ]
[ 0  0 1  2 0 ]
[ 0  0 0  0 1 ]
[ 0  0 0  0 0 ].

Let φ be a linear functional on ℝ⁵, φ(x1, x2, x3, x4, x5) = Σ_{j=1}^{5} cj xj; then φ is in W⁰ if and only if φ(αi) = 0 for i = 1, 2, 3, 4, i.e., if and only if

Σ_{j=1}^{5} Aij cj = 0, i = 1, 2, 3, 4 ⇔ Σ_{j=1}^{5} Rij cj = 0, i = 1, 2, 3
⇒ c1 − c2 − c4 = 0, c3 + 2c4 = 0, c5 = 0.

We obtain all such linear functionals φ by assigning arbitrary values to c2 and c4, say c2 = a and c4 = b, so that c1 = a + b, c3 = −2b, c5 = 0. Therefore W⁰ consists of all linear functionals φ of the form

φ(x1, x2, x3, x4, x5) = (a + b)x1 + ax2 − 2bx3 + bx4.

The dimension of W⁰ is 2, and a basis {φ1, φ2} for W⁰ can be found by taking a = 1, b = 0 and a = 0, b = 1 respectively:

φ1(x1, x2, x3, x4, x5) = x1 + x2; φ2(x1, x2, x3, x4, x5) = x1 − 2x3 + x4.

The general φ in W⁰ above is φ = aφ1 + bφ2.
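Computationally, W⁰ is the null space of the matrix A whose rows span W: the conditions φ(αi) = 0 say exactly that the coefficient vector (c1, · · · , c5) solves Ac = 0. A sketch assuming SymPy:

```python
import sympy as sp

A = sp.Matrix([[ 2, -2,  3,  4, -1],
               [-1,  1,  2,  5,  2],
               [ 0,  0, -1, -2,  3],
               [ 1, -1,  2,  3,  0]])

# Each null-space vector is the coefficient vector of a functional in W^0;
# the two vectors printed span the same space as (1,1,0,0,0) and (1,0,-2,1,0).
for c in A.nullspace():
    print(c.T)
```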

Ex 5.8.8 Find the subspace which φ1, φ2, φ3 annihilate, where the three functionals on ℝ⁴ are φ1(x1, x2, x3, x4) = x1 + 2x2 + 2x3 + x4, φ2(x1, x2, x3, x4) = 2x2 + x4, φ3(x1, x2, x3, x4) = −2x1 − 4x3 + 3x4.

Solution: The subspace which φ1, φ2, φ3 annihilate may be found explicitly by forming a 3 × 4 matrix A with the coefficients as row vectors and finding the row reduced echelon matrix which is row equivalent to A:

A =
[  1 2  2 1 ]
[  0 2  0 1 ]
[ −2 0 −4 3 ]
→
[ 1 0 2 0 ]
[ 0 1 0 0 ]
[ 0 0 0 1 ].

Therefore the linear functionals ψ1, ψ2, ψ3 given by

ψ1(x1, · · · , x4) = x1 + 2x3, ψ2(x1, · · · , x4) = x2, ψ3(x1, · · · , x4) = x4

span the same subspace of (ℝ⁴)∗ and annihilate the same subspace of ℝ⁴ as do φ1, φ2, φ3. The subspace annihilated consists of the vectors with x1 = −2x3, x2 = x4 = 0.

Theorem 5.8.8 If φ ∈ V∗ annihilates a subset W of V, then φ annihilates the linear span L(W) of W.


Proof: Let α ∈ L(W); then ∃ α1, α2, · · · , αn ∈ W for which

α = Σ_{i=1}^{n} ai αi, for some scalars ai ∈ F
⇒ φ(α) = Σ_{i=1}^{n} ai φ(αi) = Σ_{i=1}^{n} ai·0 = 0.

Since α is an arbitrary element of L(W), φ annihilates L(W), i.e., W⁰ = (L(W))⁰.

Theorem 5.8.9 If W is a subspace of a finite dimensional vector space V(F), then W∗ ≅ V∗/W⁰.

Proof: Here,

dim(V∗/W⁰) = dim V∗ − dim W⁰
= dim V − dim W⁰; as dim V∗ = dim V
= dim V − (dim V − dim W) = dim W = dim W∗.

Hence, by the theorem of isomorphism, W∗ ≅ V∗/W⁰.

Theorem 5.8.10 For any subsets W, W1, W2 of V: (i) W ⊂ W⁰⁰, and (ii) W1 ⊂ W2 ⇒ W2⁰ ⊂ W1⁰.

Proof: (i) Let α ∈ W; then for every linear functional φ ∈ W⁰ we have

α̂(φ) = φ(α) = 0 ⇒ α̂ ∈ (W⁰)⁰.

Under the identification of V and V∗∗, we have α ∈ W⁰⁰. Hence W ⊂ W⁰⁰.
(ii) Let φ ∈ W2⁰; then φ(α) = 0, ∀α ∈ W2. But W1 ⊂ W2, hence φ annihilates every element of W1, i.e., φ ∈ W1⁰. Hence W2⁰ ⊂ W1⁰.

Theorem 5.8.11 Let S and T be subspaces of a finite dimensional vector space V(F); then (S + T)⁰ = S⁰ ∩ T⁰.

Proof: Let φ ∈ (S + T)⁰, i.e., φ(α) = 0, ∀α ∈ S + T; then φ annihilates S + T. Clearly s ∈ S ⇒ s ∈ S + T and t ∈ T ⇒ t ∈ S + T, so φ(s) = 0, ∀s ∈ S and φ(t) = 0, ∀t ∈ T. Therefore

φ ∈ (S + T)⁰ ⇒ φ ∈ S⁰ and φ ∈ T⁰
⇒ φ ∈ S⁰ ∩ T⁰ ⇒ (S + T)⁰ ⊆ S⁰ ∩ T⁰.

Again, let ψ ∈ S⁰ ∩ T⁰, i.e., ψ ∈ S⁰ and ψ ∈ T⁰; then ψ annihilates both S and T. For any α = s + t ∈ S + T, with s ∈ S and t ∈ T,

ψ(α) = ψ(s) + ψ(t) = 0 + 0 = 0.

Thus ψ annihilates S + T, i.e., ψ ∈ (S + T)⁰, and so S⁰ ∩ T⁰ ⊆ (S + T)⁰. Hence (S + T)⁰ = S⁰ ∩ T⁰. Similarly, if S and T are subspaces of a finite dimensional vector space V(F), then (S ∩ T)⁰ = S⁰ + T⁰.

Theorem 5.8.12 For any subset S of a finite dimensional vector space V(F), L(S) = S⁰⁰.

Proof: By Theorem 5.8.8, S⁰ = [L(S)]⁰. Now L(S), being a subspace of V, satisfies [L(S)]⁰⁰ = L(S). Thus,

S⁰ = [L(S)]⁰ ⇒ S⁰⁰ = [L(S)]⁰⁰ = L(S).

Therefore, for any subset S of a vector space V(F), L(S) = S⁰⁰.


Ex 5.8.9 Let W be a subspace of ℝ⁴ spanned by (1, 2,−3, 4), (1, 3,−2, 6) and (1, 4,−1, 8). Find a basis of the annihilator of W.

Solution: Let α1 = (1, 2,−3, 4), α2 = (1, 3,−2, 6), α3 = (1, 4,−1, 8) and S = {α1, α2, α3}. It suffices to find a basis of the set of linear functionals

φ(x, y, z, w) = ax + by + cz + dw

for which φ(α1) = φ(α2) = φ(α3) = 0. Thus

φ(1, 2,−3, 4) = a + 2b − 3c + 4d = 0
φ(1, 3,−2, 6) = a + 3b − 2c + 6d = 0
φ(1, 4,−1, 8) = a + 4b − c + 8d = 0.

The system reduces to the echelon form a + 2b − 3c + 4d = 0, b + c + 2d = 0, with free variables c and d. Let c = 0, d = 1; then b = −2, a = 0, and hence the linear functional is φ1(x, y, z, w) = −2y + w. Let c = 1, d = 0; then b = −1, a = 5, and φ2(x, y, z, w) = 5x − y + z. The set of linear functionals {φ1, φ2} is LI and so is a basis of W⁰, the annihilator of W.

5.9 Transpose of a Linear Mapping

Let U and V be two vector spaces over the same field F, and let T : U → V be an arbitrary linear mapping. For any linear functional φ ∈ V∗, the composite mapping φ∘T is a linear mapping from U into F, so that φ∘T ∈ U∗. The mapping T∗ : V∗ → U∗, defined by

[T∗(φ)](α) = φ[T(α)]; ∀φ ∈ V∗ and α ∈ U (5.12)

is called the adjoint or transpose of the linear transformation T.

Ex 5.9.1 Let φ be the linear functional on ℝ² defined by φ(x, y) = 3x − 2y. For the linear mapping T : ℝ³ → ℝ², defined by T(x, y, z) = (x + y + z, 2x − y), find [T∗(φ)](x, y, z).

Solution: Using the definition of the transpose mapping, we have T∗(φ) = φ∘T, i.e.,

[T∗(φ)](α) = φ[T(α)]
⇒ [T∗(φ)](x, y, z) = φ[T(x, y, z)] = φ(x + y + z, 2x − y)
= 3(x + y + z) − 2(2x − y) = −x + 5y + 3z.
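In terms of matrices relative to the standard bases and their duals, composing a coefficient row c with T multiplies it on the right by the matrix A of T; equivalently, the matrix of T∗ is Aᵀ. A quick check of Ex 5.9.1, assuming NumPy:

```python
import numpy as np

A = np.array([[1,  1, 1],    # matrix of T(x, y, z) = (x + y + z, 2x - y)
              [2, -1, 0]])
c = np.array([3, -2])        # phi(u, v) = 3u - 2v as a coefficient row

# (phi o T)(x) = c . (A x) = (c A) . x, so c A gives the coefficients of T*(phi)
print(c @ A)                 # [-1  5  3]  ->  -x + 5y + 3z
```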

Theorem 5.9.1 The adjoint T ∗ of a linear transformation T is also linear.

Proof: Let U and V be two vector spaces over the same field F and T : U → V a linear transformation. Now, ∀φ, ψ ∈ V∗, a, b ∈ F and α ∈ U, we have

[T∗(aφ + bψ)](α) = (aφ + bψ)[T(α)]
= aφ[T(α)] + bψ[T(α)]
= a[T∗(φ)](α) + b[T∗(ψ)](α)
= [aT∗(φ) + bT∗(ψ)](α)
⇒ T∗(aφ + bψ) = aT∗(φ) + bT∗(ψ).

Thus the adjoint T∗ of a linear transformation T is also linear.


Theorem 5.9.2 Let U and V be two vector spaces over the same field F and T : U → V a linear transformation; then ker(T∗) = [Im(T)]⁰.

Proof: Let T∗ be the adjoint of the linear transformation T : U → V. Then, for φ ∈ V∗,

φ ∈ ker(T∗) ⇔ T∗(φ) = θ ⇔ φ[T(α)] = 0, ∀α ∈ U
⇔ φ(β) = 0, ∀β = T(α) ∈ Im(T)
⇔ φ ∈ [Im(T)]⁰; hence ker(T∗) = [Im(T)]⁰.

Hence the theorem. Similarly, if T : U → V is linear and U is finite dimensional, then (ker T)⁰ = Im(T∗). Thus the null space of T∗ is the annihilator of the range of T.

Theorem 5.9.3 Let U and V be two finite dimensional vector spaces over the same field F and T : U → V a linear transformation; then rank(T) = rank(T∗).

Proof: Since U and V have finite dimension,

dim V = dim[Im(T)] + dim[Im(T)]⁰
= dim[Im(T)] + dim[ker(T∗)] = ρ(T) + γ(T∗).

But T∗ is a linear transformation from V∗ into U∗, so we have

dim V∗ = ρ(T∗) + γ(T∗)
⇒ dim V = ρ(T∗) + γ(T∗); as dim V = dim V∗
⇒ ρ(T) = ρ(T∗), i.e., rank(T) = rank(T∗).

Let N be the null space of T and n = dim U. Every functional in the range of T∗ is in the annihilator of N; for suppose ψ = T∗(φ) for some φ ∈ V∗; then, for α ∈ N,

ψ(α) = (T∗φ)(α) = φ(T(α)) = φ(θ) = 0.

Now the range of T∗ is a subspace of N⁰, and

dim N⁰ = n − dim N = rank(T) = rank(T∗),

so that the range of T∗ must be exactly N⁰. Therefore the range of T∗ is the annihilator of the null space of T.

Exercise 5

Section-A [Multiple Choice Questions]

1. Let T : ℝ² → ℝ³ be a linear transformation given by T(x1, x2) = (x1 + x2, x1 − x2, x2); then rank T is

(a) 0 (b) 1 (c) 2 (d) 3.

2. The rank and nullity of T, where T is a linear transformation from ℝ² → ℝ³ defined by T(a, b) = (a − b, b − a,−a), are respectively
(a) (1,1) (b) (2,0) (c) (0,2) (d) (2,1)

3. Which of the following is not a linear transformation?
(a) T : ℝ² → ℝ² : T(x, y) = (2x − y, x) (b) T : ℝ² → ℝ³ : T(x, y) = (x + y, y, x)
(c) T : ℝ³ → ℝ³ : T(x, y, z) = (x + y + z, 1,−1) (d) T : ℝ → ℝ² : T(x) = (2x,−x)


4. Let T : ℝ² → ℝ² be the linear transformation such that T((1, 2)) = (2, 3) and T((0, 1)) = (1, 4). Then T((5, 6)) is [IIT-JAM'10]
(a) (6,−1) (b) (−6,1) (c) (−1,6) (d) (1,−6)

5. Let T1 and T2 be linear operators on ℝ² defined as follows: T1(a, b) = (b, a), T2(a, b) = (0, b). Then T1T2, defined by T1T2(a, b) = T1(T2(a, b)), maps (1, 2) into
(a) (2,1) (b) (1,0) (c) (0,2) (d) (2,0)

6. Let T : ℝ³ → ℝ³ be the linear transformation whose matrix with respect to the standard basis {e1, e2, e3} of ℝ³ is

[ 0 0 1 ]
[ 0 1 0 ]
[ 1 0 0 ].

Then T [IIT-JAM'10]
(a) maps the subspace spanned by e1 and e2 onto itself (b) has distinct eigen values (c) has eigen vectors that span ℝ³ (d) has a non-zero null space.

7. Let T : ℝ³ → ℝ³ be the linear transformation whose matrix with respect to the standard basis of ℝ³ is

[  0  a  b ]
[ −a  0  c ]
[ −b −c  0 ],

where a, b, c are real numbers, not all zero. Then T [IIT-JAM'10]
(a) is one-to-one (b) is onto (c) does not map any line through the origin onto itself (d) has rank 1.

8. For m ≠ n, let T1 : ℝⁿ → ℝᵐ and T2 : ℝᵐ → ℝⁿ be linear transformations such that T1T2 is bijective. If R(T) is the rank of T, then [IIT-JAM'11]
(a) R(T1) = n and R(T2) = m (b) R(T1) = m and R(T2) = n (c) R(T1) = n and R(T2) = n (d) R(T1) = m and R(T2) = m

9. Let W be the vector space of all real polynomials of degree at most 3. Define T : W → W by (Tp)(x) = p′(x), where p′ is the derivative of p. The matrix of T in the basis {1, x, x², x³}, considered as column vectors, is given by [NET(June)11]

(a)
[ 0 0 0 0 ]
[ 0 1 0 0 ]
[ 0 0 2 0 ]
[ 0 0 0 3 ]
(b)
[ 0 0 0 0 ]
[ 1 0 0 0 ]
[ 0 2 0 0 ]
[ 0 0 3 0 ]
(c)
[ 0 1 0 0 ]
[ 0 0 2 0 ]
[ 0 0 0 3 ]
[ 0 0 0 0 ]
(d)
[ 0 1 2 3 ]
[ 0 0 0 0 ]
[ 0 0 0 0 ]
[ 0 0 0 0 ]

10. Let N be the vector space of all real polynomials of degree at most 3. Define S : N → N by (Sp)(x) = p(x + 1), p ∈ N. Then the matrix of S in the basis {1, x, x², x³}, considered as column vectors, is given by [NET(June)12]

(a)
[ 1 0 0 0 ]
[ 0 2 0 0 ]
[ 0 0 3 0 ]
[ 0 0 0 4 ]
(b)
[ 1 1 1 1 ]
[ 1 1 2 3 ]
[ 0 0 1 3 ]
[ 0 0 0 1 ]
(c)
[ 1 1 2 3 ]
[ 1 1 2 3 ]
[ 2 2 2 3 ]
[ 3 3 3 3 ]
(d)
[ 0 0 0 0 ]
[ 1 0 0 0 ]
[ 0 1 0 0 ]
[ 0 0 1 0 ]

11. Let T be a linear transformation on the real vector space ℝⁿ over ℝ such that T² = λT for some λ ∈ ℝ. Then [NET(June)11]

(a) ||Tx|| = |λ| ||x|| for all x ∈ ℝⁿ.

(b) if ||Tx|| = ||x|| for some nonzero vector x ∈ ℝⁿ, then λ = ±1.

(c) T = λI, where I is the identity transformation on ℝⁿ.

(d) if ||Tx|| > ||x|| for a nonzero vector x ∈ ℝⁿ, then T is necessarily singular.


12. For a positive integer n, let Pn denote the space of all polynomials p(x) with coefficients in ℝ such that deg p(x) ≤ n, and let Bn denote the standard basis of Pn given by Bn = {1, x, x², · · · , xⁿ}. If T : P3 → P4 is the linear transformation defined by

T(p(x)) = x²p′(x) + ∫₀ˣ p(t) dt

and A = (aij) is the 5 × 4 matrix of T with respect to the standard bases B3 and B4, then [NET(Dec)11]
(a) a32 = 3/2 and a33 = 7/3 (b) a32 = 3/2 and a33 = 0 (c) a32 = 0 and a33 = 7/3 (d) a32 = 0 and a33 = 0.

13. Consider the linear transformation T : ℝ⁷ → ℝ⁷ defined by T(x1, x2, · · · , x6, x7) = (x7, x6, · · · , x2, x1). Which of the following statements are true? [NET(Dec)11]
(a) The determinant of T is 1 (b) There is a basis of ℝ⁷ with respect to which T is a diagonal matrix (c) T⁷ = I (d) The smallest n such that Tⁿ = I is even.

14. Let M2(ℝ) denote the set of 2 × 2 real matrices. Let A ∈ M2(ℝ) be of trace 2 and determinant −3. Identifying M2(ℝ) with ℝ⁴, consider the linear transformation T : M2(ℝ) → M2(ℝ) defined by T(B) = AB. Then which of the following statements are true? [NET(Dec)11]
(a) T is diagonalizable (b) 2 is an eigen value of T (c) T is invertible (d) T(B) = B for some B ≠ 0 in M2(ℝ).

15. If U and V are vector spaces of dimension 4 and 6 respectively, then dim hom(V, U) is
(a) 4 (b) 6 (c) 10 (d) 24

16. Let V = {f(x) ∈ R[x] : deg f(x) ≤ 1}, where R is the field of real numbers. Define e1∗, e2∗ : V → R by

e1∗[f(x)] = ∫₀¹ f(x) dx and e2∗[f(x)] = ∫₀² f(x) dx.

Then the basis of V whose dual basis is {e1∗, e2∗} is
(a) {1 + x, 1 − x} (b) {2 − 2x, −1/2 + x} (c) {2 + 2x, 1/2 − x} (d) {(1 − x)/2, (1 + x)/2}.

17. Let T be a linear transformation on a vector space V such that T² − T + I = 0; then
(a) T is singular (b) T is non-singular (c) T is invertible (d) T is not invertible.

18. Let T be a linear transformation on ℝ³ given by T(x, y, z) = (2x, 4x − y, 2x + 3y − z); then
(a) T is singular (b) T is non-singular (c) T is invertible (d) T⁻¹(x, y, z) = (x/2, 2x − y, 7x − 3y − z).

Section-B [Objective Questions]

1. Test whether the following mappings are linear or not.

(a) T : R2 → R2 defined by T (x, y) = (x+ y, x).

(b) T : R3 → R defined by T (x, y, z) = x+ y + z.

(c) T : R2 → R2 defined by T (x, y) = (2x+ 1, 2y − 1).

(d) T : R3 → R3 defined by T (x, y, z) = (3x− 2y + 3z, y, 2x+ y − z).


(e) T : R3 → R2 defined by T (x, y, z) = (xy, x+ y).

(f) T : R3 → R3 defined by T (x, y, z) = (3x, 2y, z).

(g) T : R4 → R2 defined by T (x, y, z, t) = (x+ y, z + t).

(h) T : R3 → R2 defined by T(x, y, z) = (|x − y|, |y|).

(i) T : R3 → R3 defined by T(x, y, z) = (xy, yz, zx).

(j) T : R2 → R3 defined by T (x, y) = (x, x− y, x+ y).

(k) T : V → V defined by T(u(t)) = au″(t) + bu′(t) + cu(t), where V is the vector space of functions having derivatives of all orders and a, b, c are arbitrary scalars.

(l) T : P4(x) → P4(x) defined by T(p(x)) = xp′(x) + p(x), where P4(x) is the set of all polynomials of degree at most 4.

(m) T : V2×2 → P3(x) defined by T([ a b ; c d ]) = (a − b) + (b − c)x + (c − d)x² + (d − a)x³.

(n) T : P2(x) → P3(x) defined by T(p(x)) = p(x) + 5 ∫₀¹ p(t) dt.

2. Let V be the vector space of all n× n matrices over the field F , and let B be a fixedn × n matrix. If T (A) = AB − BA, verify that T is a linear transformation from Vinto V .

3. Show that the translation map T : ℝ² → ℝ², defined by T(x, y) = (x + 4, y − 3), is not a linear transformation.

4. Show that a linear transformation T : U → V is injective if and only if the kernel of T is {θ}.

5. Let T : ℝ² → ℝ² be the linear transformation which rotates each vector α ∈ ℝ² by an angle π/4. Show that T has no eigen vectors.

Section-C [Long Answer Questions]

1. (a) Show that T : <2 → <2 defined by T (x, y) = (2x+ y, x) is a linear mapping.

(b) Let Mmn(F) be the vector space of m × n matrices over the field F and let [aij] be a fixed m × m matrix over F. Define T : Mmn → Mmn by

T([bij]) = [aij]·[bij]; ∀ [bij] ∈ Mmn.

Prove that T is a linear transformation.

(c) Let F³ and F² be two vector spaces over the same field F. Define T : F³ → F² by

T (a, b, c) = (a, b); ∀ (a, b, c) ∈ F 3

Prove that T is linear.

(d) Let P be a fixed m × n matrix with entries in the field F and let Q be a fixedn× n matrix over F . Prove that T : Fm×n → Fm×n defined by T (A) = PAQ isa linear transformation.

(e) Show that the translation map T : <2 → <2 defined by T (x, y) = (x + 4, y − 3)is not a linear transformation. [ BH‘07]

2. Let T : U → V be a linear transformation. Show that the general solution of T (x) = yis the sum of the general solution of T (x) = 0 and a particular solution of T (x) = y.


3. (a) A mapping T : <3 → <3, defined by, T (x, y, z) = (x + 2y + 3z, 3x + 2y + z, x +y + z); (x, y, z) ∈ <3. Show that T is linear. Find kerT and dimension of kerT .

(b) Show that T : <3 → <3, defined by T (x, y, z) = (x + y, y + z, z + x) is a lineartransformation. Determine dimension of kerT and ImT .

(c) Find a basis and dimension of kerT , where the linear mapping T : <3 → <2 isdefined by T (x, y, z) = (x+ y, y + z).

(d) Let V be the vector space over < of all polynomials of degree at most 6. LetT : V → V be given by T (p(x)) = p′(x). Determine the rank and nullity of themapping T . [JU(M.Sc.)‘06]

4. (a) Let T : R2 → R where T (1, 1) = 3 and T (0, 1) = −2. Find T (x, y).

(b) Let S : R3 → R where T (0, 1,−2) = 1, T (0, 0, 1) = −2 and T (1, 1, 1) = 3. FindT (x, y, z).

(c) If T : R2 → R3 is defined by T (1, 2) = (3,−1, 5), T (0, 1) = (2, 1,−1). Show thatT (x, y) = (−x+ 2y,−3x+ y, 7x− y).

5. (a) Let (1, 1,−1), (4, 1, 1), (1,−1, 2) be a basis of R3 and let T : R3 → R2 be the lin-ear transformation such that T (1, 1,−1) = (1, 0), T (4, 1, 1) = (0, 1), T (1,−1, 2) =(1, 1). Find T (x, y, z).

(b) Determine the linear mapping T : R2 → R2 which maps the basis vectors(1, 1), (0, 1) of R2 to the vectors (2, 0), (1, 0) respectively.

(c) Find a linear transformation S : R3 → R2 which maps the vectors (1, 1, 1),(1, 1, 0), (1, 0, 0) to (2, 1), (2, 1), (2, 1) respectively.

(d) Find a linear transformation T : R4 → R3 which transforms the elementaryvectors (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1) to (1, 2, 3), (1, 1, 2), (1, 2, 2) and(2, 1, 3) respectively.

(e) Show that the linear transformation T : R3 → R3 which transforms the vectors(3,−1,−2), (1, 1, 0), (−2, 0, 2) to twice the elementary vectors 2(1, 0, 0), 2(0, 1, 0),2(0, 0, 1) is (x− y + z, x+ y + z, x− y + 2z).

6. Determine ker(T ) and nullity T when T is given by(a) T (x, y, z) = (x− y, y − z, z − x)(b) T (x, y, z) = x+ y − z(c) T (x, y) = (cosx, sin y).

7. Let F be a subfield of the complex numbers and let T be the function from F 3 intoF 3 defined by

T (x, y, z) = (x− y + 2z, 2x+ y,−x− 2y + 2z).

(a) Verify that T is a linear transformation.

(b) If (a, b, c) is a vector in F 3, what are the condition on a, b and c that the vectorbe in the range of T? What is the rank of T?

(c) What are the conditions on a, b and c that (a, b, c) be in null space of T? Whatis the nullity of T?

(d) Let T be a linear operator defined on a finite dimensional vector space V . Ifrank(T 2) = rank(T ), find R(T ) ∩ N(T ), where R(T ), N(T ) denote respectivelythe range and the null space of T . [Gate’97]


(e) Let V be the vector space of square matrices of order n over F. Let T : V → F be the trace mapping T(A) = a11 + a22 + · · · + ann, where A = [aij]. Prove that T is linear on V. Find the nullity and Im(T). Also verify the dimension theorem.

8. Prove that T defined below is a linear transformation:

(a) T : ℝ³ → ℝ² defined by T(x, y, z) = (x − y, 2z)

(b) T : ℝ² → ℝ³ defined by T(x, y) = (x + y, 0, 2x − y)

(c) T : M2×3(F) → M2×2(F) defined by

T(
[ a11 a12 a13 ]
[ a21 a22 a23 ]
) =
[ 2a11 − a12  a13 + 2a12 ]
[ 0           0          ]

(d) T : P2(ℝ) → P3(ℝ) defined by T(f(x)) = xf(x) + f′(x).

In each case find the nullity and rank of T and verify the dimension theorem.

9. (a) If T : <2 → <2 is given by T (x, y) = (x, 0) for all (x, y) ∈ <2. Show that T is alinear transformation and verify that

dim ker(T ) + dim im(T ) = 2. [WBUT 2004]

(b) If T : <3 → <3 is defined as T (x, y, z) = (x − y, y − z, z − x). Show that T isa linear transformation and verify Sylvester’s law, viz., rank of T + nullity ofT = 3.

10. If A ∈ ℝ^{m×m} and B ∈ ℝ^{n×n} have a common eigenvalue λ ∈ ℝ, show that the linear operator T : ℝ^{m×n} → ℝ^{m×n}, defined by T(X) = AX − XB, is singular. [Gate'98]

11. Let V and W be finite dimensional vector spaces, let T : V → W be a linear transformation, and let {u1, u2, · · · , un} be a subset of V such that {Tu1, Tu2, · · · , Tun} is LI in W. Show that {u1, u2, · · · , un} is LI in V. Deduce that, if T is onto, then dim V ≥ dim W. [Gate'99]

12. (a) Find a linear transformation T : <4 → <4 for which the null space is spanned by(2,−2, 1,−2), (−3, 4, 3, 1) and range space by (3, 2, 1, 0), (0, 1, 2, 3).

(b) Find a linear transformation whose kernal is spanned by (1, 2, 3, 4) and (0, 1, 1, 1).

(c) Find a linear transformation T : <4 → <3 for which the null space is spannedby (2, 1,−1,−2) and image space by (cos θ, sin θ, 0), (− sin θ, cos θ, 0) and (0, 0, 1),where θ is an arbitrary real number.

13. (a) A linear mapping T : <3 → <3 maps the vectors (2, 1, 1), (1, 2, 1) and (1,1,2) to(1, 1,−1), (1,−1, 1) and (1, 0, 0) respectively. Show that T is not an isomorphism.

(b) A linear mapping S : <3 → <3 maps the vectors (0,1,1), (1, 0, 1) and (1,1,0) to(2, 1, 1), (1, 2, 1) and (1, 1, 2) respectively. Show that S is not an isomorphism.

(c) A linear mapping T : <3 → <3 maps the basis vectors α, β, γ to α + β, β + γ, γrespectively. Show that T is an isomorphism.

14. Let S and T be linear mappings of <3 to <3 defined byS(x, y, z) = (z, y, x); (x, y, z) ∈ <3 andT (x, y, z) = (x+ y + z, y + z, z); (x, y, z) ∈ <3.

Determine TS and ST . Prove that both S and T are invertible, verify that (ST )−1 =T−1S−1.


15. Consider the basis S = α1, α2, α3 for <3 where α1 = (1, 1, 1), α2 = (1, 1, 0) and α3 =(1, 0, 0) and T : <3 → <2 be a linear transformation, such that T (α1) = (1, 0), T (α2) =(2,−1) and T (α3) = (4, 3). Find T (2,−3, 5). [Gate’2k]

16. Let T : V → V be a linear transformation on a vector space V over the field Ksatisfying the property Tx = θ ⇒ x = θ. If x1, x2, · · · , xn are linearly independentelements in V , show that Tx1, Tx2, · · · , Txn are also linearly independent. [Gate’01]

17. Let T : ℝ³ → ℝ² be a linear transformation defined by T(x, y, z) = (x + y, y − z). Then find the matrix of T with respect to the ordered bases ((1, 1, 1), (1,−1, 0), (0, 1, 0)) and ((1, 1), (1, 0)). [Gate'03]

18. Let V be the vector space of polynomials in t over ℝ. Let I : V → ℝ be the mapping defined by I[p(t)] = ∫₀¹ p(t) dt. Prove that I is a linear functional on V.

19. (a) Show that the following mapping T : <3 → <3 be defined by T (x1, x2, x3) =(2x1 + x2 + 3x3, 3x1 − x2 + x3,−4x1 + 3x2 + x3) where x1, x2, x3 ∈ < is linear.Find rank of T . [CH‘05, ‘00]

(b) Let T : <3 → <3 be defined by T (x1, x2, x3) = (x1 + x2, x2 + x3, x3 + x1) wherex1, x2, x3 ∈ <. Show that T is a linear map. Determine the dimension of kerTand ImT . [CH‘99]

(c) Let T : <3 → <4 be a linear transformation defined by T (x1, x2, x3) = (x2 +x3, x3 + x1, x1 + x2, x1 + x2 + x3) where x1, x2, x3 ∈ <. Find kerT . Whatconclusion can you draw regarding the linear dependence and independence ofthe image set of the set of vectors (1, 0, 0), (0, 1, 0), (0, 0, 1). CH‘04

(d) Determine the linear transformation T : <3 → <4 that maps the vectors (1, 2, 3), (1, 3, 2), (2, 3, 1)of <3 to the vector (0, 1, 1, 1), (1, 0, 1, 1) and (1, 1, 0, 1) respectively. Find kerTand rank of T . [CH‘06]

(e) Let a linear transformation T : <4 → <2 be defined by T (x1, x2, x3, x4) = (3x1−2x2−x3−4x4, x1 +x2−2x3−3x4) where x1, x2, x3, x4 ∈ <. Find rank T , nullityT and the basis of kerT . [CH‘02]

(f) Let V be the vector space of 2 × 2 matrices over ℝ. Let T : V → V be the linear mapping defined by T(A) = AM − MA, where

M =
[ 1 2 ]
[ 0 3 ].

Find a basis of ker T and the dimension of it. [CH'04]

20. (a) Let T : <3 → <3 be defined by T (x1, x2, x3) = (x1 + x2, x2 + x3, x3 + x1) wherex1, x2, x3 ∈ <. Show that T is a linear map. Find the matrix associated with itwith respect to the standard ordered basis of <3. [CH‘05, ‘01]

(b) Let T : <3 → <3 be a linear transformation defined by T (x1, x2, x3) = (x1 −x2, x1 + 2x2, x2 + 3x3) where x1, x2, x3 ∈ <. Find the matrix representation of Twith respect to the ordered bases (1, 0, 0), (0, 1, 0), (0, 0, 1) and (1, 1, 0), (1, 0, 1), (0, 1, 1).[CH‘05, ‘99]

(c) A linear transformation T : <3 → <3 transforms the vectors (1, 0, 0), (1, 1, 0),(1, 1, 1) to the vectors (1, 3, 2), (3, 4, 0) and (2, 1, 3) respectively. Find T and thematrix representation of T relative to the standard basis of <3. [CH‘07]

21. A linear mapping f : <3 → <3 maps the vectors (2, 1, 1), (1, 2, 1), (1, 1, 2) to (1, 1,−1),(1,−1, 1), (1, 0, 0) respectively. Examine, whether f is an isomorphism. [CH‘03]


22. Find the linear transformation T on <3 which maps the basis (1, 0, 0), (0, 1, 0), (0, 0, 1)to (1, 1, 1), (0, 1,−1) and (1, 2, 0) respectively. Find the images of (1, 1,−1) and(2, 2,−2) under T and hence show that T is one-one. [ BH‘03]

23. If α1 = (1,−1), α2 = (2,−1), α3 = (−3, 2), β1 = (1, 0), β2 = (0, 1), β3 = (1, 1), is there a linear transformation T from R² into R² such that T(αi) = βi?

24. (a) Let a linear mapping T : ℝ³ → ℝ³ be defined by T(a, b, c) = (a + b + c, 2b + 2c, 3c). Find the matrix of T with respect to the standard basis of ℝ³. Using this, find the matrix of T with respect to the ordered basis C = {(1, 0, 0), (1, 1, 0), (3, 4, 2)}. Hence comment on the nature of the elements of C. [BU(M.Sc.)'02]

(b) Prove that the transformation T : <3 → <2 be defined by T (x1, x2, x3) = (3x1 −2x2 + x3, x1 − 3x2 − 2x3) where x1, x2, x3 ∈ < is a linear transformation. Findthe matrix of T with respect to the ordered bases (1, 0, 0), (0, 0, 1), (0, 1, 0) and(0, 1), (1, 0) of <3 and <2 respectively. [BH(M.Sc)‘99, ‘98]

25. The matrix representation of a linear mapping T : ℝ³ → ℝ² relative to the ordered bases {(0, 1, 1), (1, 0, 1), (1, 1, 0)} of ℝ³ and {(1, 0), (1, 1)} of ℝ² is

[ 1 2 4 ]
[ 2 1 0 ].

Find T. Also determine the rank of T. [CH'02]

26. Find the matrix of the linear transformation T on a real vector space of dimension 2, defined by T(x, y) = (2x − 3y, x + y), with respect to the ordered basis {(1, 0), (0, 1)}, and also determine whether the transformation T is non-singular. [BH'04]

27. The matrix representation of a linear transformation T : ℝ³ → ℝ³ relative to the standard basis of ℝ³ is

[  1 1 2 ]
[ −1 2 1 ]
[  0 1 3 ].

Find the explicit representation of T and the matrix representation of T relative to the ordered basis {(1, 1, 1), (0, 1, 1), (0, 0, 1)}. [CH'07, '04, '98]

28. A linear transformation T : <3 → <3 is defined by T (x, y, z) = (x+ 3y + 3z, 2x+ y +3z, 2x+2y). Determine the matrix of T , relative to the ordered basis (2, 1, 1), (1, 2, 1),(1, 1, 2) of <3. Is T invertible? If so determine the matrix of T−1 relative to the samebasis. [CH‘06]

29. T : ℝ⁴ → ℝ³ is linear and is such that T(x, y, z, w) = (x + y + z + w, 5x + 7y + z + w, 4x + 6y). Determine the matrix of T relative to the ordered bases {(1, 1, 0, 0), (1, 0, 1, 0), (1, 1, 1, 0), (1, 1, 1, 1)} of ℝ⁴ and {(1, 2, 1), (2, 1, 1), (1, 1, 2)} of ℝ³. [CH'10]

30. (a) The matrix of a linear mapping T : ℝ³ → ℝ³ with respect to the ordered basis {(0, 1, 1), (1, 0, 1), (1, 1, 0)} of ℝ³ is given by

[ 0  3  0 ]
[ 2  3 −2 ]
[ 2 −1  2 ].

Determine the matrix of T relative to the ordered basis {(2, 1, 1), (1, 2, 1), (1, 1, 2)} of ℝ³. Is T invertible? If so, determine the matrix of T⁻¹ relative to the same basis. [CH'03]

(b) Prove that no linear transformation from ℝ³ to ℝ⁴ is invertible. [CH'10]


(c) Show that the following mapping f : <3 → <3 defined by f(x, y, z) = (−3x +3y − 2z, 6y − 3z, x − y + 2z) for all (x, y, z) ∈ <3 is linear. Is f non-singular?Justify your answer. Find the matrix of the above linear mapping f relative tothe ordered basis (1, 0, 0), (0, 1, 0), (0, 0, 1). CH‘97

(d) Show that the following transformation T is one-to-one. Find the left inverse ofT , where T (x, y, z) = (x+ y + z, x, y − z).

(e) Let T : <3 → <3 defined by T (x, y, z) = (x − y, x + 2y, y + 3z). Show that T isinvertible and determine T−1.

(f) Let T : <3 → <3 defined by T (x, y, z) = (3x + y − 2z,−x + y,−2x + 2z) isinvertible and determine T−1.

31. (a) The linear transformation T on <3 maps the basis vector (1, 0, 0), (0, 1, 0), (0, 0, 1)to (1, 1, 1), (0, 1,−1) and (1, 2, 0) respectively. Find T (1,−1, 1). BH‘02

(b) The linear transformation T on <3 maps the basis vector (1, 0, 0), (0, 1, 0), (0, 0, 1)to (2, 2, 2), (0,−1, 1) and (1, 3, 0) respectively. Find T (2, 1,−1) and T (2, 2,−2). [BH‘04]

32. Let V be a vector space and T a linear transformation from V into V . Prove thefollowing statements are equivalent

(a) The intersection of the range of T and the null space of T is a zero subspace ofV .

(b) If T (T (α)) = θ, then T (α) = θ.

33. Let V be the space of n× 1 matrices over F and let W be the space of m× 1 matricesover F and let T be the LT from V into W defined by T (X) = AX. Prove that T isthe zero transformation iff A is the zero matrix.

34. Let T : R³ → R² be the LT defined by T(x, y, z) = (x − y, 2z). Then prove that N(T) = {(a, a, 0) : a ∈ R} and R(T) = R². Also prove that the mapping is not injective (i.e., nullity(T) = 1).

35. Describe explicitly a linear transformation from R³ into R³ which has as its range the subspace spanned by (1, 0,−1) and (1, 2, 2).

36. Let V be the vector space of all real polynomials p(x). Let D and T be linear mappings on V defined by

D(p(x)) = d/dx (p(x)), p(x) ∈ V and T(p(x)) = ∫₀ˣ p(t) dt, p(x) ∈ V.

(a) Show that DT = I_V but TD ≠ I_V. (b) Find the null space of TD.

37. Let V be the linear space of all real polynomials p(x). Let D and T be linear mappings on V defined by D(p(x)) = d/dx (p(x)), p(x) ∈ V, and T(p(x)) = xp(x), p(x) ∈ V. Show that
(a) DT − TD = I_V (b) DT² − T²D = 2T.

38. Let S and T be linear mappings of R3 to R3 defined by S(x, y, z) = (z, y, x) andT (x, y, z) = (x, x+ y, x+ y + z), (x, y, z) ∈ R3.(a) Determine ST and TS(b) Prove that both S and T are invertible. Verify that (ST )−1 = T−1S−1.


39. A linear transformation T is given below. Find another linear transformation S suchthat ST = TS = IV .(a) T (x, y) = (−y, 3x+ 5y)(b) T (x, y, z) = (−x− 2y + z, x+ y, x).

40. Find the composite transformation T1T2T3 when T1(x, y, z) = (x−y, 0, x+y), T2(x, y, z) =(x+ y, x+ y + z), T3(x, y, z) = (x, y − x, x− y + z).

41. Let V be a vector space over a field F and S, T be the linear mappings on V . IfST − TS = IV , prove that (a) ST 2 − T 2S = 2T, (b) STn − TnS = nTn−1, for n ≥ 2.

42. Let α1, α2, α3 and β1, β2, β3 be ordered bases of the real vector spaces V and Wrespectively. A linear mapping T : V →W maps the basis vector of V as T (α1) = β1,T (α2) = β1 + β2, T (α3) = β1 + β2 + β3. Show that T is non-singular transformationand find the matrix of T−1 relative to the above ordered bases. [ CH: 10]


Chapter 6

Inner Product Space

In the earlier chapters we have studied concepts like subspaces, dimension, linear dependence and independence, linear transformations and their representations, which are valid over an arbitrary field F. Here we shall discuss the definition and the properties of the inner product on an arbitrary vector space V; in particular, we confine ourselves to vector spaces over the field F, where F is either the real field ℝ or the complex field C. Fortunately, there is a single concept, known in Physics as the scalar product or dot product of two vectors, which covers both the concepts of length and angle.

6.1 Inner Product Space

In linear algebra, the scalar product is usually called the inner product.

6.1.1 Euclidean Spaces

Let V(ℝ) be a real vector space. A real inner product (or dot product or scalar product) on V is a mapping f : V × V → ℝ that assigns to each ordered pair of vectors (α, β) of V a real number f(α, β), generally denoted by α·β or ⟨α, β⟩, satisfying the following axioms:

(i) Symmetry: ⟨α, β⟩ = ⟨β, α⟩; ∀α, β ∈ V

(ii) Linearity: ⟨α, β + γ⟩ = ⟨α, β⟩ + ⟨α, γ⟩; ∀α, β, γ ∈ V

(iii) Homogeneity: ⟨cα, β⟩ = c⟨α, β⟩ = ⟨α, cβ⟩; ∀α, β ∈ V and c ∈ ℝ

(iv) Positivity: ⟨α, α⟩ > 0 if α (≠ θ) ∈ V, and ⟨α, α⟩ = 0 iff α = θ.

A real vector space with an inner product defined on it is known as a Euclidean space. Thus the inner product on a Euclidean space is a positive definite symmetric bilinear form. Properties (i) and (iii) together give

⟨aα + bβ, γ⟩ = a⟨α, γ⟩ + b⟨β, γ⟩.

This property states that an inner product function is linear in the first position. Similarly,

⟨α, aβ + bγ⟩ = a⟨α, β⟩ + b⟨α, γ⟩,

so the inner product function is also linear in its second position. Thus an inner product of linear combinations of vectors equals the corresponding linear combination of the inner products of the vectors. If V is an inner product space, then by the dimension of V we mean the dimension of V as a real vector space, and a set W is a basis for V if W is a basis for the real vector space V.



Ex 6.1.1 Show that ⟨α, β⟩ = 2x1y1 − x1y2 − x2y1 + 3x2y2 is an inner product in ℝ², where α = (x1, x2) and β = (y1, y2).

Solution: Here the given ⟨α, β⟩, with α = (x1, x2) and β = (y1, y2), can be written as

⟨α, β⟩ = βᵀAα, where A =
[  2 −1 ]
[ −1  3 ].

Since A is a real symmetric matrix, ⟨α, β⟩ = ⟨β, α⟩ holds. Conditions (ii) and (iii) are obvious. Now,

⟨α, α⟩ = αᵀAα = 2x1² − 2x1x2 + 3x2² = 2[x1 − x2/2]² + (5/2)x2² > 0 for α ≠ θ.

Therefore all the conditions of the definition of inner product are satisfied; hence the given ⟨α, β⟩ is an inner product in ℝ². Alternatively, we see that the diagonal elements 2, 3 of A are positive and det A = 5 > 0, so that A is positive definite. Thus ⟨α, β⟩ is an inner product.
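Positive definiteness of the symmetric matrix A can also be checked through its eigenvalues, which for a symmetric matrix are real. A minimal sketch, assuming NumPy:

```python
import numpy as np

A = np.array([[ 2.0, -1.0],      # <alpha, beta> = beta^T A alpha
              [-1.0,  3.0]])

# A symmetric A defines an inner product iff all its eigenvalues are positive
print(np.allclose(A, A.T))                 # True: A is symmetric
print(np.all(np.linalg.eigvalsh(A) > 0))   # True: eigenvalues (5 ± sqrt 5)/2 > 0
```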

Ex 6.1.2 Find the values of k so that ⟨α, β⟩ = x1y1 − 3x1y2 − 3x2y1 + kx2y2 is an inner product in ℝ², where α = (x1, x2) and β = (y1, y2).

Solution: From the positive definite property of the definition of inner product, we must have ⟨α, α⟩ > 0 for α ≠ θ. Hence

⟨α, α⟩ = x1² − 6x1x2 + kx2² > 0.

This relation holds only if (−6)² − 4·1·k < 0, i.e., k > 9. Alternatively, the inner product can be written in the form

⟨α, β⟩ = βᵀAα, where A =
[  1 −3 ]
[ −3  k ].

Now ⟨α, β⟩ is an inner product if A is positive definite, i.e., 1 > 0 and k − 9 > 0, i.e., k > 9.

6.1.2 Unitary Space

Let V(C) be a vector space over the complex field C. A complex inner product is a mapping f : V × V → C that assigns to each ordered pair of vectors (α, β) of V a complex number f(α, β), generally denoted by ⟨α, β⟩, satisfying the following properties:

(i) Conjugate symmetry: ⟨α, β⟩ = \overline{⟨β, α⟩}, the complex conjugate of ⟨β, α⟩.

(ii) Linearity: ⟨cα + dβ, γ⟩ = c⟨α, γ⟩ + d⟨β, γ⟩; ∀α, β, γ ∈ V; c, d ∈ C

(iii) Positive definiteness: ⟨α, α⟩ > 0 ∀α (≠ θ) ∈ V, and ⟨α, α⟩ = 0 iff α = θ.

As ⟨α, α⟩ = \overline{⟨α, α⟩}, ⟨α, α⟩ is a real number. A complex vector space V together with a complex inner product defined on it is said to be a complex inner product space or unitary space. Thus the inner product in a unitary space is a positive definite Hermitian form.

Deduction 6.1.1 Using properties (i) and (ii), we obtain

⟨α, cβ⟩ = \overline{⟨cβ, α⟩} = \overline{c⟨β, α⟩} = c̄ \overline{⟨β, α⟩} = c̄ ⟨α, β⟩.

The inner product is conjugate linear in the second argument, i.e.,

⟨α, cβ + dγ⟩ = c̄⟨α, β⟩ + d̄⟨α, γ⟩; ∀α, β, γ ∈ V and ∀c, d ∈ C.


Combining linearity in the first position and conjugate linearity in the second position, we obtain, by induction,

⟨Σᵢ cᵢαᵢ, Σⱼ dⱼβⱼ⟩ = Σᵢ Σⱼ cᵢ d̄ⱼ ⟨αᵢ, βⱼ⟩. (6.1)

Note: A vector space with an associated inner product is called an inner product space or pre-Hilbert space.

Ex 6.1.3 Show that Vn(ℝ) is a Euclidean vector space.

Solution: Let α = (a1, a2, . . . , an) and β = (b1, b2, . . . , bn) ∈ Vn(ℝ), where ai, bi ∈ ℝ. We define the dot or standard inner product in Vn(ℝ) by

⟨α, β⟩ = a1b1 + a2b2 + · · · + anbn.

Now, (i) ⟨α, β⟩ = a1b1 + a2b2 + · · · + anbn = b1a1 + b2a2 + · · · + bnan = ⟨β, α⟩; ∀α, β ∈ Vn(ℝ) (since ai, bi ∈ ℝ).

(ii) Let γ = (c1, c2, . . . , cn) ∈ Vn(ℝ); then

⟨α, β + γ⟩ = a1(b1 + c1) + a2(b2 + c2) + · · · + an(bn + cn)
= (a1b1 + a2b2 + · · · + anbn) + (a1c1 + a2c2 + · · · + ancn)
= ⟨α, β⟩ + ⟨α, γ⟩; ∀α, β, γ ∈ Vn(ℝ).

(iii) Let k ∈ ℝ; then

⟨kα, β⟩ = ka1b1 + ka2b2 + · · · + kanbn = k(a1b1 + a2b2 + · · · + anbn) = k⟨α, β⟩; ∀α, β ∈ Vn(ℝ).

Similarly, ⟨α, kβ⟩ = k⟨α, β⟩. Hence ⟨kα, β⟩ = k⟨α, β⟩ = ⟨α, kβ⟩; ∀α, β ∈ Vn(ℝ).

(iv) If α ≠ θ, then ai ≠ 0 for at least one i, and

⟨α, α⟩ = a1² + a2² + · · · + an² > 0.

Hence all four axioms of a Euclidean vector space are satisfied, and so Vn(ℝ) is a Euclidean vector space. The inner product defined above is known as the standard inner product in Vn(ℝ). Similarly, the canonical inner product on Cⁿ is ⟨α, β⟩ = β∗α = Σᵢ aᵢ b̄ᵢ.

Result 6.1.1 An inner product defined by 〈α, β〉 =∞∑

i=1

aibi converges absolutely for any

pair of points in V . This inner product defined as l2 space or Hilbert space.

Ex 6.1.4 Prove that the vector space C[a, b] of real valued continuous functions on [a, b] isan infinite dimensional Euclidean vector space, if

〈f, g〉 =∫ b

a

f(t)g(t)dt for f, g ∈ C[a, b].

Solution: Here we are to show that properties of the definition of inner product on V aresatisfied. Let f, g, h ∈ C[a, b] and k ∈ <. Then

(i) 〈f, g〉 =∫ b

a

f(t)g(t)dt =∫ b

a

g(t)f(t)dt

= 〈g, f〉 ;∀f, g ∈ C[a, b]

Page 375: Linear Algebra by Nayak

368 Inner Product Space

(ii) 〈f, g + h〉 =∫ b

a

f(t)[g(t) + h(t)]dt

=∫ b

a

f(t)g(t)dt+∫ b

a

f(t)h(t)dt

= 〈f, g〉+ 〈f, h〉 ;∀f, g, h ∈ C[a, b]

(iii) 〈kf, g〉 =∫ b

a

kf(t)g(t)dt = k

∫ b

a

f(t)g(t)dt

= k〈f, g〉 ; ∀ k ∈ <

(iv) 〈f, f〉 =∫ b

a

f2(t)dt > 0; if f(t) 6= 0 ∀ t ∈ [a, b]

= 0; if f(t) = 0 ∀ t ∈ [a, b]

Thus all the axioms of a Euclidean vector space are satisfied. Hence C[a, b] is an infinitedimensional Euclidean vector space. The vector space P (t) of all polynomials is a subspaceof C[a, b] for any interval [a, b], and hence the above is also an inner product of P (t).

Similarly if U denotes the vector space of complex continuous functions on the real [a, b],

then the inner product in U is defined by 〈f, g〉 =∫ b

a

f(t)g(t)dt.

Ex 6.1.5 If M is the set of all m× n matrices over R and〈A,B〉 = Tr(BTA) for A,B ∈M ,

prove that M is a Euclidean vector spaces.

Solution: Let A = [aij ]m×n, B = [bij ]m×n, C = [cij ]m×n ∈ M where aij , bij , cij ∈ < andλ ∈ <. Then, (i) 〈A,B〉 = Tr(BTA) =

n∑i=1

n∑k=1

bkiaki

=n∑

i=1

n∑k=1

akibki ; Since, aij , bij ∈ <

= Tr(ATB) = 〈B,A〉 ; ∀ A,B ∈M

(ii) 〈A,B + C〉 = Tr(B + C)TA =n∑

i=1

n∑k=1

(bki + cki)aki

=n∑

i=1

n∑k=1

bkiaki +n∑

i=1

n∑k=1

ckiaki

= Tr(BTA) + Tr(CTA) = 〈A,B〉+ 〈A,C〉; ∀ A,B,C ∈M

(iii) 〈λA,B〉 = Tr(BTλA) =n∑

i=1

n∑k=1

bki(λaki)

= λn∑

i=1

n∑k=1

bkiaki ; [since λ, aij , bij ∈ <]

= λTr(BTA) = λ〈A,B〉; ∀ A,B ∈M and λ ∈ <

(iv) 〈A,A〉 = Tr(ATA) =n∑

i=1

n∑k=1

akiaki

=n∑

i=1

n∑k=1

a2ki > 0; when A 6= 0

= 0; when A = 0

Page 376: Linear Algebra by Nayak

Norm 369

Hence M is a Euclidean space similarly if U denotes the vector space of m×n matrices overC, then the inner product in V is defined by

〈A,B〉 = Tr(B∗A),where, B∗ = conjugate transpose of the matrix B.

6.2 Norm

For any α ∈ V , a inner product space, the norm (or length or magnitude of α), denoted bythe non negative values ‖α‖, is defined by

‖α‖ =√〈α, α〉. (6.2)

This definition of length seems reasonable because at least we have ||α|| > 0, if α 6= θ. Thisdistance between two vectors α and β in the inner product space V is

d(α, β) = ||α− β|| =√〈α− β, α− β〉. (6.3)

A vector space together with a norm on it is called a normed vector space or normed linearspace.

Property 6.2.1 When c is a real or complex number,

‖cα‖ =√〈cα, cα〉 =

√|c|2〈α, α〉

= |c|√〈α, α〉 = |c|.‖α‖.

Property 6.2.2 If α 6= θ, then 〈α, α〉 > 0 and so ‖α‖ > 0 and if α = θ, then

〈α, α〉 = 〈θ, θ〉 = 0 ⇒ ‖α‖ = 0.

Therefore, ||α|| ≥ 0 and ||α|| = 0 if and only if α = θ.

Property 6.2.3 If α, β ∈ V , then the non negative real number ‖α− β‖ is called distancebetween α and β. If 〈α, α〉 = 1 i.e. ‖α‖ = 1, then α is called unit vector or is said to benormalized. Hence non zero vector α ∈ V can be normalized by us if α = 1

‖α‖α.

Ex 6.2.1 Find the norm of α, where α = (1− 2i, 3 + i, 2− 5i) ∈ C3 .

Solution: (i) Using the usual inner product in C3 we have,

〈α, α〉 = αα = 〈1− 2i, 3 + i, 2− 5i〉〈1− 2i, 3 + i, 2− 5i〉= 〈1− 2i, 3 + i, 2− 5i〉〈1 + 2i, 3− i, 2 + 5i〉= 5 + 10 + 29 = 44.

Hence ‖α‖ =√〈α, α〉 =

√44 = 2

√11.

Ex 6.2.2 Find the norm of A =(

1 23 −4

)in the space of 2× 2 matrices over <.

Solution: Let V be the vector space of 2 × 2 matrices over <. Then the inner product inV is defined by 〈A,B〉 = tr(BTA). Now,

〈A,A〉 = tr

(1 32 −4

)(1 23 −4

)= tr

[10 −10−10 20

]= 30.

Hence, ‖A‖ =√〈A,A〉 =

√30.

Page 377: Linear Algebra by Nayak

370 Inner Product Space

Ex 6.2.3 Consider f(t) = 3t−5 and g(t) = t2 in the polynomial P (t) with the inner product.Find 〈f, g〉, ||f ||, ||g||.

Solution: Using the definition of inner product,

〈f, g〉 =∫ 1

0

f(t) g(t) dt =∫ 1

0

(3t− 5)t dt = −1112.

According to the definition of norm,

||f ||2 = 〈f, f〉 =∫ 1

0

(3t− 5)2 dt = 13 ⇒ ||f || =√

13

||g||2 = 〈g, g〉 =∫ 1

0

(t2)2 dt =15⇒ ||g|| = 1√

5.

Cauchy Schwartz’s inequality in Euclidean space

Let α and βbe any two vectors in an Euclidean space V (F ), then |〈α, β〉| ≤ ‖α‖.‖β‖.Proof: Case I: Let one or both of α, β be θ, then both sides are zero and the equality signholds.Case II: Let us consider, two non null linearly dependent vectors α, β. Then ∃ c(6= 0) ∈ <such that α = cβ and hence ‖α‖ = |c|.‖β‖ and

〈α, β〉 = 〈cβ, β〉 = c〈β, β〉 = c‖β‖2

|〈α, β〉| = |c|‖β‖2 = ‖α‖.‖β‖

In this case, the equality sign holds.Case III: Let α, β be not linearly independent. Then α − cβ 6= θ for all real c. Then byusing the properties of norms,

〈α − cβ, α− cβ〉 > 0⇒ 〈α, α〉 − 2c〈α, β〉+ k2〈β, β〉 > 0. (6.4)

Since 〈α, α〉, 〈α, β〉, 〈β, β〉 are all real and 〈β, β〉 6= 0, the expression 〈α, α〉−2c〈α, β〉+c2〈β, β〉is a real quadratic polynomial in c. For some c, (6.4) holds if

〈α, β〉2 − 〈α, α〉〈β, β〉 < 0⇒ [|〈α, β〉|]2 ≤ ‖α‖2.‖β‖2

Cauchy Schwartz’s inequality in unitary space

Let α and β be any two vectors in an unitary space V (C), then |〈α, β〉| ≤ ‖α‖.‖β‖.Proof: Let α = θ, then ||α|| = 0. In this case, we have,

〈α, β〉 = 〈θ, β〉 = 〈0θ, β〉 = 0〈θ, β〉 = 0.

Therefore, |〈α, β〉| = 0. Thus, if α = θ, then, both sides being 0, the equality holds. Now,we consider the case, when α 6= θ. Then 1

||α||2 is a positive real number, since ||α|| > 0, forα 6= θ. Let us consider the vector

γ = β − 〈β, α〉||α||2

α.

Therefore, we have,

Page 378: Linear Algebra by Nayak

Norm 371

〈γ, γ〉 =(β − 〈β, α〉

||α||2α, β − 〈β, α〉

||α||2α

)=(β, β − 〈β, α〉

||α||2α

)− 〈β, α〉||α||2

(α, β − 〈β, α〉

||α||2α

); linearity

= 〈β, β〉 − 〈β, α〉||α||2

〈β, α〉 − 〈β, α〉||α||2

〈α, β〉+〈β, α〉||α||2

〈β, α〉||α||2

〈α, α〉

= ||β||2 − 〈β, α〉||α||2

〈β, α〉 − 〈α, β〉||α||2

〈α, β〉+〈β, α〉||α||2

〈β, α〉||α||2

||α||2

= ||β||2 − 〈α, β〉||α||2

〈α, β〉 = ||β||2 − |〈α, β〉|2

||α||2,

as 〈α, β〉 and 〈α, β〉 are complex conjugate. Again from the definition, 〈γ, γ〉 = ||γ||2 ≥ 0,we have,

||β||2 − |〈α, β〉|2

||α||2≥ 0

⇒ |〈α, β〉|2 ≤ ||α||2||β||2 ⇒ |〈α, β〉| ≤ ‖α‖.‖β‖.

This is also known as Cauchy Schwartz’s inequality.

Result 6.2.1 Let α = (α1, α2, . . . , αn), β = (β1, β2, . . . , βn) ∈ Vn(C), the unitary space,then

〈α, β〉 =n∑

i=1

αiβi; ‖α‖2 =n∑

i=1

|αi|2 ; ‖β‖2 =n∑

i=1

|βi|2

Hence the Cauchy Schwartz’s inequality for unitary space becomes∣∣∣∣∣n∑

i=1

αiβi

∣∣∣∣∣2

≤n∑

i=1

|αi|2n∑

i=1

|βi|2

The equality sign holds when either, (i) ai = 0 or bi = 0 or both ai = bi = 0; i = 1, 2, . . . , nor, (ii) ai = cbi for some nonzero real c ; i = 1, 2, . . . , n.

Result 6.2.2 Let α and β be two non null vectors in V (<). If θ be the angle between them,then

cos θ =〈α, β〉‖α‖.‖β‖

For unitary space, when α = (α1, α2, . . . , αn), β = (β1, β2, . . . , βn) we have

cos θ =∑αi βi√∑

|αi|2.∑|βj |2

By the Cauchy Schwartz’s inequality −1 ≤ cos θ ≤ 1, and so the angle exists and is unique.

Result 6.2.3 Let f and g be any real continuous functions defined in [0, 1], then the CauchySchwartz’s inequality is[

〈f, g〉]2

=[∫ 1

0

f(t)g(t)dt]2≤∫ 1

0

f2(t)dt∫ 1

0

g2(t)dt.

Ex 6.2.4 Find the angle between the vectors α = (2, 3, 5) and β = (1,−4, 3) in <3.

Page 379: Linear Algebra by Nayak

372 Inner Product Space

Solution: By definition of inner product, we have,

||α|| =√〈α, α〉 =

√4 + 9 + 25 =

√38

||β|| =√〈β, β〉 =

√1 + 16 + 9 =

√26

〈α, β〉 = 2− 12 + 15 = 5.

If θ be the angle between the vectors α and β, then it is given by

cos θ =〈α, β〉

||α|| ||β||=

5√38

√26.

Since cos θ is positive, θ is an acute angle.

Ex 6.2.5 Let V be a vector space of polynomials with inner product given by 〈f, g〉 =∫ 1

0

f(t)g(t)dt. Taking f(x) = x+ 2, g(x) = 2x− 3, find 〈f, g〉 and ‖f‖.

Solution: By definition of inner product, we have,

〈f, g〉 =∫ 1

0

f(t)g(t)dt =∫ 1

0

(t+ 2)(2t− 3)dt = −296.

Also, using the definition of norm, we have,

〈f, f〉 =∫ 1

0

f(t)f(t)dt =∫ 1

0

(t+ 2)2dt =193.

〈g, g〉 =∫ 1

0

g(t)g(t)dt =∫ 1

0

(2t− 3)2dt =133.

Hence, ‖f‖ =√〈f, f〉 =

√193 and so the length of f(x) is

√193 . If θ be the angle between

f(t) and g(t), then

cos θ =〈f, g〉‖f‖.‖g‖

=−29/6√

193

= − 292√

19× 13.

Since cos θ is negative, θ is an obtuse angle.

Ex 6.2.6 Find the cosine of the θ between α and β if α =(

2 13 −1

);β =

(0 −12 3

)in the

vector space of 2× 2 matrices over R.

Solution: By definition of inner product, we have,

〈α, α〉 = tr

(2 31 −1

)(2 13 −1

)= tr

(13 −1−1 2

)= 15

〈β, β〉 =(

0 2−1 3

)(0 −12 3

)= tr

(4 66 10

)= 14

〈α, β〉 = tr(βTα) = tr

(0 2−1 3

)(2 13 −1

)= 2.

Hence, the cosine of the θ between α and β is given by,

cos θ =〈α, β〉‖α‖.‖β‖

=2√

(15× 14)=

2√210

.

Since cos θ is positive, θ is an acute angle.

Page 380: Linear Algebra by Nayak

Norm 373

Ex 6.2.7 If in an inner product space ‖α+ β‖ = ‖α‖+ ‖β‖ holds, prove that α and β arelinearly dependent.

Solution: If α, β are any two vectors, then by Cauchy Schwartz’s inequality, we have|〈α, β〉| ≤ ‖α‖‖β‖.

Using the given condition, we get,

‖α+ β‖2 = [‖α‖+ ‖β‖]2

⇒ 〈α+ β, α+ β〉 = ‖α‖2 + ‖β‖2 + 2 · ‖α‖‖β‖⇒ 〈α, α〉+ 〈α, β〉+ 〈β, α〉+ 〈β, β〉 = 〈α, α〉+ 〈β, β〉+ 2 · ‖α‖‖β‖⇒ 2Re〈α, β〉 = 2 · ‖α‖‖β‖.

Since Re〈α, β〉 ≤ |〈α, β〉| we have ‖α‖.‖β‖ ≤ |〈α, β〉|. From this and Cauchy Schwartz’sinequality we have, ‖α‖.‖β‖ = |〈α, β〉|. Thus the equality shows that α and β are linearlydependent. The converse of this example is not true. For example, let α = (−1, 1, 0) andβ = (2,−2, 0) in V3(<). Hence β = −2α so that α and β are linearly dependent. But,

‖α‖ =√

1 + 1 + 0 =√

2; ‖β‖ =√

(2)2 + (−2)2 + 0 = 2√

2

‖α+ β‖ =√

12 + (−1)2 + 0 =√

2.

Hence, ‖α+ β‖ 6= ‖α‖+ ‖β‖.

Ex 6.2.8 If α and β are any two vectors in an inner product space V (F ), then,‖α+ β‖ ≤ ‖α‖.‖β‖.

Solution: Since α, β ∈ V (F ), the inner product space, by Cauchy Schwartz’s inequality,we have,

|〈α, β〉| ≤ ‖α‖.‖β‖.Using the definition of inner product we have,

‖α+ β‖2 = 〈α+ β, α+ β〉 = 〈α, α〉+ 〈α, β〉+ 〈β, α〉+ 〈β, β〉= ‖α‖2 + 〈α, β〉+

√〈α, β〉+ ‖β‖2

= ‖α‖2 + 2Re〈α, β〉+ ‖β‖2

≤ ‖α‖2 + 2Re|〈α, β〉|+ ‖β‖2

≤ ‖α‖2 + 2‖α‖.‖β‖+ ‖β‖2 =(‖α‖+ ‖β‖

)2

.

Hence ‖α + β‖ ≤ ‖α‖ + ‖β‖. This is well known triangle inequality. Let α and β be twoadjacent side of a triangle as indicated then α+β is the another side of the triangle forward byα and β. Geometrically it stats that “the length of one side of a triangle is less than or equalto the sum of the lengths of the other two sides”. In the similar manner, if α1, α2, · · · , αnbe an orthogonal set of vectors, then

||α1 + α2 + · · ·+ αn||2 = ||α1||2 + ||α2||2 + · · ·+ ||αn||2,

which is the well known Pythagoras theorem.

Ex 6.2.9 If α, β be any two vectors in an inner product space V , then

‖α+ β‖2 + ‖α− β‖2 = 2[‖α‖2 + ‖β‖2

].

Page 381: Linear Algebra by Nayak

374 Inner Product Space

Solution: Since α, β are two vectors, by definition,

‖α+ β‖2 + ‖α− β‖2 = 〈α+ β, α+ β〉+ 〈α− β, α− β〉= 〈α, α〉+ 〈α, β〉+ 〈β, α〉+ 〈β, β〉+ 〈α, α〉 − 〈α, β〉−〈β, α〉+ 〈β, β〉 = 2‖α‖2 + 2‖β‖2.

This is the well known parallelogram law. Let if α, β are two adjacent sides of a parallelo-gram, then α+β and α−β represent the diagonally of it. Hence the geometrical significantof this law indicates that “ sum of squares of the diagonal of a parallelogram is equal to thesum of the squares of its sides”.

To obtain the real polar form 〈α, β〉, subtracting, we get,

‖α+ β‖2 − ‖α− β‖2 = 4〈α, β〉

⇒ 〈α, β〉 =14[‖α+ β‖2 − ‖α− β‖2

],

which shows the inner product can be obtained from the norm function.

Ex 6.2.10 Prove that for any inner product space V ,||aα+ bβ||2 = |a|2||α||2 + |b|2||β||2 + ab〈α, β〉+ ab〈β, α〉.

Solution: Using the definition,||aα+ bβ||2 = 〈aα+ bβ, aα+ bβ〉 = a〈α, aα+ bβ〉+ b〈β, aα+ bβ〉

= a〈aα+ bβ, α〉+ b〈aα+ bβ, β〉= aa〈α, α〉+ b〈β, α〉+ ba〈α, β〉+ b〈β, β〉= aa〈α, α〉+ b〈β, α〉+ ba〈α, β〉+ b〈β, β〉= aa〈α, α〉+ ab〈β, α〉+ ba〈α, β〉+ bb〈β, β〉= |a|2||α||2 + ab〈α, β〉+ ba〈β, α〉+ |b|2||β||2.

6.3 Orthogonality

Let V (F ) be an inner product space and α, β ∈ V . Then the vector α is said to be orthogonalto the vector β, if, 〈α, β〉 = 0. (6.5)

Using the symmetric property 〈β, α〉 = 〈α, β〉 = 0 of the inner product, we say that if α isorthogonal to β, then β is orthogonal to α. Hence if 〈α, β〉 = 0, then α and β are orthogonal.

If the set of vectors S = α1, α2, · · · , αn in an inner product space V (F ) be such thatany two distinct vectors in S are orthogonal, i.e.,

〈αi, αj〉 = 0 for, i 6= j, (6.6)

then the set S is called an orthogonal set. This orthogonal set plays a fundamental role inthe theory of Fourier series.

Result 6.3.1 The null vector θ ∈ V is orthogonal to any non null vector α ∈ V , as

〈θ, α〉 = 〈θα, α〉 = θ〈α, α〉 = 0.

Also, the null vector θ is the only vector orthogonal to itself. For it, if α(6= θ) is orthogonalto every α ∈ V , then 〈α, α〉 = 0 ⇒ α = θ.

An orthogonal set of vectors may contain the null vector θ.

Page 382: Linear Algebra by Nayak

Orthogonality 375

Result 6.3.2 If α is orthogonal to β, then every scalar multiple of α is also orthogonal toβ. Since, 〈kα, β〉 = k〈α, β〉 = k.0 = 0,

so if α is perpendicular to β, then kα is also perpendicular to β.

Ex 6.3.1 Show that sin t and cos t are orthogonal functions in the vector space of continuousfunctions C[−π, π].

Solution: According to the definition of inner product, in the vector space, C[−π, pi] ofcontinuous functions on [−π, π], we get,

〈sin t, cos t〉 =∫ π

−π

sin t cos t dt =[12

sin2 t

−π

= 0.

Thus sin t and cos t are orthogonal functions in the vector space C[−π, π].

Ex 6.3.2 Find a non zero vector γ that is perpendicular to α = (1, 2, 1) and β = (2, 5, 4) in<3.

Solution: Let γ = (x, y, z), then we want 〈α, γ〉 = 0 and 〈β, γ〉 = 0. This yields a homoge-neous system

x+ 2y + z = 0; 2x+ 5y + 4z = 0or, x+ 2y + z = 0; y + 2z = 0.

Set z = 1 to obtain y = −2 and x = 3. Thus (3,−2, 1) is a desired non zero vector orthogonalto α and β. Normalizing γ, we obtain the unit vector

γ =1||γ||

γ =1√14

(3,−2, 1)

orthogonal to α and β.

6.3.1 Orthonormal Set

Let V (F ) be an inner product space. Normalizing an orthogonal set S returns to the processof multiplying each vector in S by the reciprocal of its length in order to transform S intoan orthonormal set of vectors. A set S = α1, α2, . . . , αn of vectors in V (F ) is said to beorthonormal if 〈αi, αj〉 = 0 ; if i 6= j

= 1 ; if i = j.

A vector α ∈ V is said to be normalised if ||α|| = 1. The orthogonal set of vectors does notcontain the null vectors θ as ||θ|| = 0. Note that, if α1, α2, · · · , αn is an orthogonal set ofvectors, then kα1, kα2, · · · , kαn is an orthogonal, for any scalars k1, k2, · · · , kn.

Theorem 6.3.1 Every orthogonal (or orthonormal ) set of non null vectors in an innerproduct space is linearly independent.

Proof: Let T = α1, α2, . . . , αn be an orthogonal subset of an inner product space V (F ),where 〈αi, αj〉 = 0 for i 6= j. Let us consider the relation,

c1α1 + c2α2 + · · ·+ cnαn = θ,

where ci’s are scalars. Now, taking the inner product we get,

〈c1α1 + c2α2 + · · · + cnαn, αi〉 = 〈θ, αi〉 = 0; i = 1, 2, · · · , nor, c1〈α1, αi〉 + c2〈α2, αi〉+ · · ·+ cn〈αn, αi〉 = 0,or, ci〈αi, αi〉 = 0; since 〈αi, αj〉 = 0 for i 6= j.

Since αi 6= θ and hence 〈αi, αj〉 6= 0, therefore ci = 0. Thus c1 = c2 = · · · = cn = 0, showsthat T = α1, α2, . . . , αn is linearly independent.

Page 383: Linear Algebra by Nayak

376 Inner Product Space

6.3.2 Orthogonal Complement

Let V (F ) be an inner product space, and W be any subset of V . The orthogonal complementof W , denoted by W⊥ is defined by

W⊥ = α ∈ V ; 〈α, β〉 = 0, ∀ β ∈W

Therefore, W⊥ consists of all vectors in V that are orthogonal to every vector β ∈ W . Inparticular, for a given vector α ∈ V , we have,

α⊥ = β ∈ V ; 〈β, α〉 = 0,

i.e., α⊥ consists of all vectors in V that are orthogonal to the given vector α.

Result 6.3.3 Clearly θ ∈ W⊥. Let us consider two scalars a, b ∈ F and two element ofW⊥ are α1, α2. Thus for any β ∈W , we have

〈aα1 + bα2, β〉 = a〈α1, β〉+ b〈α2, β〉= a0 + b0 = 0

Hence aα1 + bα2 ∈W⊥ and so W⊥ is a subspace of V . From this result we conclude, if Wbe a subset of a vector space V , the W⊥ is a subspace of V .

Result 6.3.4 Let W = θ, then ∀ α ∈ V , we have 〈α, θ〉 = 0. Hence W⊥ = θ⊥ = V.

Result 6.3.5 By definition, V ⊥ = α : 〈α, β〉 = 0 ; ∀ β ∈ V . Also, we have,

〈α, α〉 = 0 ⇒ α = θ so that V ⊥ = θ.

Result 6.3.6 Suppose W is a subspace of V , then both W and W⊥ are subspaces of V .Let α ∈W ∩W⊥, then,

α ∈W and α ∈W⊥ ⇒ 〈α, α〉 = 0 ⇒ α = θ

⇒ W ∩W⊥ = θ

Ex 6.3.3 Find the basis for the subspace α⊥ of <3, where, α = (1, 3,−4).

Solution: According to the definition, α⊥ consists of all vectors β = (x, y, z) such that〈α, β〉 = 0 ⇒ x+ 3y − 4z = 0.

If y = 1, z = 0, then β1 = (−3, 1, 0) and if y = 0, z = 1, then β2 = (4, 0, 1), whereβ1, β2 form a basis for the solution space of the equation, and hence a basis of α⊥. Thecorresponding orthonormal basis is

1√10

(−3, 1, 0), 1√17

(4, 0, 1).

Ex 6.3.4 Let α = (1,−2,−1, 3) be a vector in <4. Find the orthogonal and orthonormalbasis for α⊥.

Solution: Since α⊥ consists of all vectors β = (x, y, z, t) ∈ <4 such that 〈α, β〉 = 0.Therefore, we are to find the solutions of the linear equation x− 2y− z+ 3t = 0. A non nullsolution of x− 2y − z + 3t = 0 is β1 = (0, 1, 1, 1). Now, find a non null solution solution ofthe system,

x− 2y − z + 3t = 0, y + z + t = 0

say, β2 = (5, 1, 0,−1). Lastly, find a non null solution of the linear system,

x− 2y − z + 3t = 0, y + z + t = 0, 5x+ y − t = 0,

say, β3 = (1,−7, 9,−2). Thus, (0, 1, 1, 1), (5, 1, 0,−1), (1,−7, 9,−2) is orthogonal basis forα⊥. The corresponding orthonormal basis is

1√3(0, 1, 1, 1), 1√

27(5, 1, 0,−1), 1√

135(1,−7, 9,−2)

.

Page 384: Linear Algebra by Nayak

Orthogonality 377

6.3.3 Direct Sum

Let W be a subspace of V . Then V is called the direct sum of two subspace W and W⊥,denoted by W ⊕W⊥ is given by

V = W ⊕W⊥

Property 6.3.1 Let U and W be subspace of a finite dimensional inner product space V,then,

(U +W )⊥ = U⊥ ∩W⊥ and (U +W )⊥ = U⊥ +W⊥.

Theorem 6.3.2 Let W be a subspace of a finite dimensional inner product space V . ThenV is the direct sum of W and W⊥ ( i.e. V = W ⊕W⊥) and W⊥⊥ = W .

Proof: Since W is a subspace of V , so it has an orthogonal basis. Let S = α1, α2, . . . , αkbe a basis of W . By extension theorem, S be extended to S1 = α1, α2, . . . , αk, . . . , αnto form a basis of V . Using Gram-Schmidt orthonormalization process to S1 we get anorthonormal basis β1, β2, . . . , βn of V where

βi =k∑

j=1

aij αj ; i = 1, 2, . . . , n.

Hence β1, β2, . . . , βn is an orthonormal basis ofW as β1, β2, . . . , βk ∈W . Thus we concludethat, there is an orthonormal basis of W which is part of an orthonormal basis of V .

Hence, ∃ an orthonormal basis β1, β2, . . . , βk of W which is part of an orthonormalbasis β1, β2, . . . , βn of V . Also as β1, β2, . . . , βn is orthonormal, βk+1, . . . , βn ∈ W⊥. Ifβ ∈W , then

β =n∑

i=1

aiβi =k∑

i=1

aiβi +n∑

i=k+1

aiβi ∈W +W⊥

Hence, β ∈ V ⇒ β ∈ W +W⊥ so that V = W +W⊥. Also if α ∈ W ∩W⊥, then α ∈ Wand α ∈W⊥ and

〈α, α〉 = 0 ⇒ α = θ ⇒W ∩W⊥ = θ.

These shows that V = W ⊕ W⊥ as a basic result in linear algebra. We prove this forfinite dimensional vector space V , but it also holds for spaces of arbitrary dimension. Bydefinition, W⊥ = α, 〈α, β〉 = 0; ∀β ∈ W. The relation 〈α, β〉 = 0 = 〈β, α〉 suggests thatα is orthogonal to β. Therefore,

β ∈ (W⊥)⊥ ⇒W ⊆W⊥⊥.

W⊥ is a subspace of V . Therefore, W⊥⊥ is also a subspace of V . Now, γ ∈W⊥⊥ ⇒ γ ∈ V.So, γ can be expressed in the form γ = α + β, where α ∈ W⊥ and β ∈ W. Therefore,〈α, γ〉 = 0, i.e.,

〈α, α〉+ 〈α, β〉 = 0 ⇒ 〈α, α〉 = 0⇒ α = θ ⇒ α = θ ⇒ γ = β ∈W⇒ γ ∈W⊥⊥ ⇒ γ ∈W, i.e.,W⊥⊥ ⊆W.

Therefore, W = W⊥⊥ and hence the theorem.

Ex 6.3.5 Find the basis of the subspace W of R4 orthogonal to α = (1,−2, 3, 4) and β =(3,−5, 7, 8)

Page 385: Linear Algebra by Nayak

378 Inner Product Space

Solution: Here dim <4 = 4, so a basis of R4 contains four linearly independent vectors.Here we are to find out an orthogonal set of four vectors out of them two are orthogonal toα = (1,−2, 3, 4) and β = (3,−5, 7, 8). Since, α and β are LI and W = α, β so, α, β is abasis of W . Therefore, dimW = 2. We know,

<4(<) = W ⊕W⊥, dim<4(<) = dimW + dimW⊥

⇒ 4 = 2 + dimW⊥ ⇒ dimW⊥ = 2.

Thus the basis of W⊥ consists of two vectors. Let the other elements be γ = (x1, x2, x3, x4)and δ = (y1, y2, y3, y4), which are the basis of W⊥. For orthogonality, α.γ = 0 and β.δ = 0,so,

x1 − 2x2 + 3x3 + 4x4 = 0 and 3y1 − 5y2 + 7y3 + 8y4 = 0.

The rank of the coefficient matrix of the system of linear equations is 2. Now, γ = (0, 2, 0, 1)and δ = (5, 0, 1, 1) satisfies the relation. Hence the basis of the subspace W of <4 orthogonalto α, β, i.e., a basis of W⊥ is (0, 2, 0, 1), (5, 0, 1, 1).

Ex 6.3.6 Let V be the vector space of 2× 2 matrices over <. (i) Show thatα =

(1 00 0

), β =

(0 10 0

), γ =

(0 01 0

), δ =

(0 00 1

)form a orthogonal basis of V . (ii) Find the basis for the orthogonal complement of (a) thediagonal matrices, (b) the symmetric matrices.

Solution: (i) The relation c1α + c2β + c3γ + c4δ = θ holds, if c1 = c2 = c3 = c4 = 0.Therefore, the given set of vectors α, β, γ, δ are linearly independent and hence forms anorthogonal basis of V .(ii)(a) Here the diagonal matrices are α and δ. Let W1 be the subspace of V spanned by α

and δ. Hence we seek all matrices X =(a cb d

)such that

〈α,X〉 = 0 = 〈δ,X〉

⇒ tr

[(a bc d

)(1 00 0

)]= 0 = tr

[(a bc d

)(0 00 1

)]⇒ tr

(a 0c 0

)= 0 = tr

(0 b0 d

)⇒ a = 0 = d.

The free variables are b and c. First we choose, b = 1, c = 0 so the solution is X1 =(

0 01 0

)and if we choose, b = 0, c = 1, then the solution is X2 =

(0 10 0

). Thus X1, X2 is a basis

of W1⊥.

(b) Here the symmetric matrices are α, δ. Let W2 be the subspace of V spanned by α, δ .

Hence we seek all matrices X =(a cb d

)such that

〈α,X〉 = 0 = 〈δ,X〉

⇒ tr

[(a bc d

)(1 00 0

)]= 0 = tr

[(a bc d

)(0 00 1

)]⇒ tr

(a 0c 0

)= 0 = tr

(0 b0 d

)⇒ a = 0 = d

Page 386: Linear Algebra by Nayak

Orthogonality 379

Taking the free variables as b = −1, c = 1, then the solution is X =(

0 −11 0

).

Ex 6.3.7 Let V be the inner product space P3 with the inner product 〈f, g〉 =∫ 1

0

f(t)g(t)dt.

Let W be the subspace of P3 with basis 1, t2. Find a basis for W⊥.

Solution: Let p(t) = at3 + bt2 + ct+d be an element of W⊥. Since p(t) must be orthogonalto each of the vectors in the given basis for W , we have,

〈p(t), 1〉 =∫ 1

0

(at3 + bt2 + ct+ d).1dt =a

4+b

3+c

2+ d = 0,

〈p(t), t2〉 =∫ 1

0

(at3 + bt2 + ct+ d).t2dt =a

6+b

5+c

4+d

3= 0.

Solving the homogeneous system, we get a = 3l + 16m; b = − 154 l − 15m; c = l; d = m.

Therefore,

p(t) = (3l + 16m)t3 + (−154l − 15m)t2 + lt+m

= l(3t3 − 154t2 + t) +m(16t3 − 15t2 + 1).

Now, (3t3− 154 t

2 + t), (16t3− 15t2 + 1) are LI, as they are not multiples of each other andW⊥ = (3t3 − 15

4 t2 + t), (16t3 − 15t2 + 1). Hence they form a basis of W⊥.

Ex 6.3.8 Find the orthogonal complement of the row space of the matrix

A =

1 1 22 3 53 4 7

.

Solution: Consider the system of linear homogeneous equations AX = 0, with the givenmatrix A as the coefficient matrix. Therefore,

x1 + x2 + 2x3 = 0, 2x1 + 3x2 + 5x3 = 0, 3x1 + 4x2 + 7x3 = 0⇒ x1 + x2 + 2x3 = 0, 2x1 + 3x2 + 5x3 = 0 ⇒ x2 + x3 = 0.

Thus the solutions are given by k(1, 1,−1); k ∈ Z.. Thus the orthogonal complement ofthe row space of A is L((1, 1,−1)).

Ex 6.3.9 Find the orthogonal basis of the row space of the matrix A =

1 1 1 11 2 1 02 1 2 3

.

Solution: Apply elementary row operations on the given matrix A, we get,1 1 1 11 2 1 02 1 2 3

1 −1 0 11 2 1 00 −3 0 3

0 −1 0 11 2 1 00 −1 0 1

0 −1 0 11 2 1 00 0 0 0

0 −1 0 11 1 1 10 0 0 0

.

Thus, we obtain a row echelon matrix R3 whose row vectors are (0,−1, 0, 1) and (1, 2, 1, 0).Thus the basis of the row space of A is (0,−1, 0, 1), (1, 2, 1, 0) and they are orthogonal.Hence the orthonormal basis of the row space of the matrixA is 1√

2(0,−1, 0, 1), 1√

6(1, 2, 1, 0).

Page 387: Linear Algebra by Nayak

380 Inner Product Space

Ex 6.3.10 Find the orthogonal basis for α⊥ in C3, where α = (1, i, 1 + i).

Solution: Here α⊥ consists of all vectors β = (x, y, z) such that,

〈α, β〉 = α.β = x− iy + (1− i)z = 0.

Let x = 0, then one of the solutions is α1 = (0, 1 − i, i). Now, we are to find a solution ofthe system

x− iy + (1− i)z = 0; (1 + i)y − iz = 0.

Let z = 2, then the solution is α2 = (−1 + i, 1 + i, 2). Thus we see that α1, α2 form anorthogonal basis for α⊥. The corresponding orthonormal basis is

1√3(0, 1− i, i), 1√

8(−1 + i, 1 + i, 2).

6.4 Projection of a Vector

Let β(6= θ) be a fixed vector in inner product space V . Then for a vector α(6= θ) in V ,∃ c ∈ F such that

〈α− cβ, β〉 = 0 ⇒ c =〈α, β〉〈β, β〉

. (6.7)

It is analogous to a coefficient in the Fourier series of a function. The unique scalar c isdefined as the scalar component of α along β or the Fourier coefficient of α with respect toβ. cβ is said to be the projection of α along β and is given by,

Proj (α, β) = cβ =〈α, β〉〈β, β〉

β. (6.8)

Ex 6.4.1 Find the Fourier coefficient and projection of α = (1, 3, 1, 2) along β = (1,−2, 7, 4)in <4.

Solution: Using the definition of inner product, we have,〈α, β〉 = 1− 6 + 7 + 8 = 10 and ||β||2 = 〈β, β〉 = 1 + 4 + 49 + 16 = 70.

Since β 6= θ, the Fourier coefficient c is given by,

c =〈α, β〉〈β, β〉

=1070

=17.

The projection of α = (1, 3, 1, 2) along β = (1,−2, 7, 4) in <4 is given by,

Proj (α, β) = cβ =17(1,−2, 7, 4).

Ex 6.4.2 Find the Fourier coefficient and projection of α = (1 − i, 3i, 1 + i) along β =(1, 2− i, 3 + 2i) in C3.

Solution: Using the definition of inner product, we have,

〈α, β〉 = 〈(1− i, 3i, 1 + i), (1, 2− i, 3 + 2i)〉= (1− i).1 + 3i(2 + i) + (1 + i)(3− 2i) = 3(1 + 2i).

||β||2 = 〈β, β〉 = 12 + (2− i)(2 + i) + (3 + 2i)(3− 2i) = 19.

Since β 6= θ, the Fourier coefficient c is given by,

c =〈α, β〉〈β, β〉

=319

(1 + 2i).

Page 388: Linear Algebra by Nayak

Projection of a Vector 381

Thus the projection of α along β in C3 is given by,

Proj (α, β) = cβ =319

(1 + 2i)(1, 2− i, 3 + 2i)

=319

(1 + 2i, 4 + 3i,−1 + 8i).

Ex 6.4.3 Find the Fourier coefficient and projection of α = t2 along β = t+3 in the vectorspace P (t).

Solution: Using the definition of inner product, we have,

〈α, β〉 =∫ 1

0

t2(t+ 3)dt =54; ||β||2 =

∫ 1

0

(t+ 3)2dt =373.

Since β 6= θ, the Fourier coefficient c is given by,

c =〈α, β〉〈β, β〉

=54× 3

37=

15148

.

The projection of α = t2 along β = t+ 3 in P (t) is given by,

Proj (α, β) = cβ =15148

(t+ 3).

Ex 6.4.4 Find the Fourier coefficient and projection of α =(

1 23 4

)along β =

(1 15 5

)in

M22.

Solution: Using the definition of inner product, we have,

〈α, β〉 = tr

(1 51 5

)(1 23 4

)= tr

(16 2216 22

)= 38;

||β||2 = tr

(1 51 5

)(1 15 5

)= tr

(26 2626 26

)= 52.

Since β 6= θ, the Fourier coefficient c is given by,

c =〈α, β〉〈β, β〉

=3852

=1926.

The projection of α along β is given by,

Proj (α, β) = cβ =1926

(1 15 5

).

Theorem 6.4.1 Let α1, α2, . . . , αr from an orthogonal set of non null vectors in V , and

α be any vector in V − L(α1, α2, . . . , αr). If β = α −r∑

k=1

ckαk; where ck = 〈α,αk〉〈αk,αk〉 , the

scalar component of α along β, then β is orthogonal to α1, α2, . . . , αr.

Proof: Given that ci = 〈α,αi〉〈αi,αi〉 = the component (Fourier coefficient ) of α along the given

vector αi. By definition of inner product, we have for i = 1, 2, . . . , r;

〈β, αi〉 = 〈α−r∑

k=1

ckαk, αi〉 = 〈α, αi〉 −r∑

k=1

ck〈αk, αi〉

= 〈α, αi〉 − c1.0− · · · − c1〈αi, αi〉 − · · · − cr.0( since α′is are orthonormal)

= 〈α, αi〉 −〈α, αi〉〈αi, αi〉

〈αi, αi〉 = 0

Page 389: Linear Algebra by Nayak

382 Inner Product Space

This shows that β is orthogonal to each αi. Hence the theorem is proved. From this theorem,we have,

(i) Let S = α1, α2, . . . , αr, α and T = α1, α2, . . . , αr, β. Given β = α −r∑

k=1

ckαk so

that each vector in T is a linear combination of the vectors of S. Hence L(T ) ⊂ L(S).

By the same argument, as α = β +r∑

k=1

ckαk, we have, L(S) ⊂ L(T ). Hence it follows

that L(S) = L(T ).

(ii) This theorem is valid when α ∈ L(α1, α2, . . . , αr). In this case, α =r∑

k=1

ckαk, where

ck is the scalar component of α along αk. Clearly β = θ and so β is orthogonal to eachαi and

L(T ) = L(α1, α2, . . . , αr, θ) = L(α1, α2, . . . , αr)= L(α1, α2, . . . , αr, α); as α is a LC of α1, α2, . . . , αr= L(S)

Theorem 6.4.2 An orthogonal set of non-null vectors in a finite dimensional inner productspace is either a basis or can be extended to an orthogonal basis.

Proof: Let S = α1, α2, . . . , αr be an orthogonal set of non null vectors in V and dimV =n, where, 1 ≤ r ≤ n. Therefore S = α1, α2, . . . , αr is a linearly independent set.Case 1: Let r = n, then S = α1, α2, . . . , αn is an orthogonal basis of V .Case 2: Let r < n, then L(S) = L(α1, α2, . . . , αr) is a proper subspace of V , and so thereexists a non-null vectors αr+1 ∈ V but αr+1 /∈ L(α1, α2, . . . , αr). Let

c1α1 + c2α2 + · · ·+ crαr + cr+1βr+1 = θ,

where ci’s are scalars. If cr+1 6= 0, then βr+1 is a linear combination of α1, α2, . . . , αr sothat βr+1 ∈ L(α1, α2, . . . , αr), which is a contradiction. Therefore, cr+1 must be zero andso the set α1, α2, . . . , αr, βr+1 is linearly independent. Let,

αr+1 = βr+1 −r∑

i=1

diαi; di =〈βr+1, αi〉||αi||2

where di is the scalar component of αr+1 along αi, then αr+1 is non-null and 〈αr+1, αi〉 = 0for i = 1, 2, . . . , r. Therefore, S1 = α1, α2, . . . , αr+1 is an orthogonal set and the given setS has been extended to the orthogonal setS1.If r + 1 = n, then S1 is an orthogonal basis of V . If r + 1 < n, then by repeated ap-plication, we obtain in a finite number of steps an orthogonal set of n vectors Sn−r =α1, α2, . . . , αr+1, . . . , αn in V . This set Sn−r being an orthogonal set of non null vectors,in linearly independent and so form a basis of V . If this extended set is normalized, then Vhas an orthonormal basis.

Result 6.4.1 Let S = α1, α2, · · · , αn be an orthogonal basis of V , then they are linearlyindependent. Now any α ∈ V can be expressed as a linear combination of vectors of S as

α =〈α, α1〉〈α1, α1〉

α1 +〈α, α2〉〈α2, α2〉

α2 + · · ·+ 〈α, αn〉〈αn, αn〉

αn.

Gram-Schmidt process of orthogonalization

Theorem 6.4.3 Every non-null subspace W of a finite dimensional inner product space Vpossesses an orthogonal basis.

Page 390: Linear Algebra by Nayak

Projection of a Vector 383

Proof: Let V be an inner product space of n vectors and dimW = r where 1 ≤ r ≤ n.Let S = α1, α2, . . . , αr be a basis of W , then S is linearly independent and none of theelements of S in θ. Now we construct an orthogonal basis. Since all the basis vectors arenon-null, set

β1 = α1 and β2 = α2 − cβ1.

If β2 is orthogonal to β1, then0 = 〈β2, β1〉 = 〈α2 − cβ1, β1〉

= 〈α2, β1〉 − c1〈β1, β1〉 ⇒ c1 =〈α2, β1〉〈β1, β1〉

⇒ β2 = α2 −〈α2, β1〉〈β1, β1〉

β1

i.e. c1 is the complement of α2 along α1 and c1β1 is the projection of α2 upon β1. For thevalue of c1, β2 is orthogonal to β1 and

L(β1, β2) = L(β1, α2) = L(α1, α2).

Let α3 /∈ L(β1, β2) and let β3 = α3 − d1β1 − d2β2, where d1β1, d2β2 are the projection ofα3 upon β1, β2 respectively. If β3 is orthogonal to β1, β2 then

〈β3, β1〉 = 0 ; 〈β3, β2〉 = 0

⇒ d1 =〈α3, β1〉〈β1, β1〉

; d2 =〈α3, β2〉〈β2, β2〉

⇒ β3 = α3 −〈α3, β1〉〈β1, β1〉

β1 −〈α3, β2〉〈β2, β2〉

β2

and L(β1, β2, β3) = L(β1, β2, α3) = L(α1, α2, α3).

Proceeding in this way we can construct β1, β2, . . . , βr where

βr = αr −〈αr, β1〉〈β1, β1〉

β1 −〈αr, β2〉〈β2, β2〉

β2 − · · · −〈αr, βr−1〉〈βr−1, βr−1〉

βr−1

and βr 6= θ as S is linearly independent. Also

(βr, βi) = 0 ; i = 1, 2 . . . , r − 1L(β1, β2, . . . , βr) = L(α1, α2, . . . , αr)

Hence β1, β2, . . . , βr is an orthogonal basis of the subspace W . Let, β1, β2, . . . , βr be anorthogonal basis of the subspace W, then for any β ∈ V, we have,

β = c1β1 + c2β2 + · · ·+ crβr,

where ci are the Fourier coefficients of β with respect to βi, i = 1, 2, · · · , r. Since W =L(β1, β2, . . . , βr), where β1, β2, . . . , βr form an orthogonal basis of the subspace W,then,

Proj(β,W ) = c1β1 + c2β2 + · · ·+ crβr. (6.9)

Here ci is the component of β along βi.

Ex 6.4.5 Apply Gram Schmidt process to obtain an orthogonal basis of <3 using the stan-dard inner product having given that (1, 0, 1), (1, 0,−1), (0, 3, 4) is a basis.

Page 391: Linear Algebra by Nayak

384 Inner Product Space

Solution: Let α1 = (1, 0, 1), α2 = (1, 0,−1), α3 = (0, 3, 4). Since∣∣∣∣∣∣1 1 00 0 31 −1 4

∣∣∣∣∣∣ = 6 6= 0

so S = α1, α2, α3 is linearly independent. Also S contains 3 vectors and dim<3 = 3.Hence S is a basis of <3. Let us construct an orthogonal basis β1, β2, β3 by Gram-Schmidtprocess. For which, let

β1 = α1 = (1, 0, 1)

β2 = α2 − c1β1 = α2 −〈α2, β1〉〈β1, β1〉

β1

= (1, 0,−1)− 1.1 + 0.0 + 1.(−1)1.1 + 0.0 + 1.1

(1, 0, 1) = (1, 0,−1)

and β3 = α3 −〈α3, β1〉〈β1, β1〉

β1 −〈α3, β2〉〈β2, β2〉

β2

= (0, 3, 4)− 0.1 + 3.0 + 4.11.1 + 0.0 + 1.1

(1, 0, 1)− 0.1 + 3.0 + 4.(−1)1.1 + 0.0 + (−1).(−1)

(1, 0,−1)

= (0, 3, 4)− 2(1, 0, 1) + 2(1, 0,−1) = (0, 3, 0).

Thus, β2 is orthogonal to β1 and β3 is orthogonal to β1, β2. Also,L(β1, β2, β3) = L(α1, α2, α3).

Therefore, an orthogonal basis of the subspace is (1, 0, 1)(1, 0,−1)(0, 3, 0) and the corre-sponding orthogonal basis is

1√2(1, 0, 1), 1√

2(1, 0,−1), (0, 1, 0).

Ex 6.4.6 Find the orthogonal basis for <4 containing two vectors(1, 2, 1,−1) and (0, 1, 2,−1).

Solution: Since dim<4 = 4, the basis of <4 contains four linearly independent vectors.Let e1, e2, e3, e4 be a standard basis of R4, where e1 = (1, 0, 0, 0), e2 = (0, 1, 0, 0), e3 =(0, 0, 1, 0), and e4 = (0, 0, 0, 1). Let α1 = (1, 2, 1,−1) and α2 = (0, 1, 2,−1). Now,

α1 = (1, 2, 1,−1) = 1e1 + 2e2 + 1e3 + (−1)e4.

Since the coefficient of e1 is non zero, using replacement theorem, α1, e2, e3, e4 is a basis.Also, α2 = (0, 1, 2,−1) = 0e1 + 1e2 + 2e3 + (−1)e4

= 0(α1 − 2e2 − r3 + e4) + e2 + 2e3 − e4

= 0α1 + e2 + 2e3 − e4.

Since the coefficient of e2 is non zero, so α1, α2, e3, e4 is a new basis. Now we constructan orthogonal basis β1, β2, β3, β4 by Gram-Schmidt process. For which, letβ1 = α1 = (1, 2, 1,−1).

β2 = α2 −〈α2, β1〉〈β1, β1〉

β1

= (0, 1, 2,−1)− 0.1 + 1.2 + 2.1 + (−1).(−1)1.1 + 2.2 + 1.1 + (−1).(−1)

(1, 2, 1,−1)

= (0, 1, 2,−1)− 57(1, 2, 1,−1) = −1

7(5, 3,−9, 2)

β3 = e3 −〈e3, β1〉〈β1, β1〉

β1 −〈e3, β2〉〈β2, β2〉

β2

Page 392: Linear Algebra by Nayak

Projection of a Vector 385

= (0, 0, 1, 0)− 17(1, 2, 1,−1)−

97

11949

× (−17)(5, 3,−9, 2)

= (0, 0, 1, 0)− (17,27,17,−1

7) +

9119

(5, 3,−9, 2) =117

(4,−1, 3, 5).

β4 = e4 −〈e4, β1〉〈β1, β1〉

β1 −〈e4, β2〉〈β2, β2〉

β2 −〈e4, β3〉〈β3, β3〉

β3

= (0, 0, 0, 1)− (−1)7

(1, 2, 1,−1)− −(2/7)119/49

(−57,−3

7,97,−2

7)

− (5/17)3/17

(− 417,− 1

17,

317,

517

) =53(−1

3,13, 0,

13).

Here, β1 is orthogonal to β2, β3, β4, β2 is orthogonal to β3, β4 and β3 is orthogonal to β4.Also,

Lβ1, β2, β3, β4 = Lα1, α2, α3, α4.Therefore, β1, β2, β3, β4 is the required orthogonal basis of the subspace.

Ex 6.4.7 Given α1 = 1, α2 = 1 + x, α3 = x + x2 as a basis of P2(x) over R. Define theinner product as

〈f, g〉 =∫ 1

−1

f(x)g(x) dx

where f(x), g(x) are elements of P2(x). Construct an orthonormal basis of P2(x) from thegiven set.

Solution: Let β1 = α1 = 1. β2 = α2 − c21β1, where c21 = 〈α2,β1〉〈β1,β1〉 . Now,

〈α2, β1) =1∫

−1

(1 + x).1 dx = 2 and 〈β1, β1〉 =1∫

−1

1.1 dx = 2.

Therefore β2 = α2 − 22β1 = 1 + x− 1 = x. Similarly,

β3 = α3 − c31β1 − c32β2, where c31 = 〈α3,β1〉〈β1,β1〉 , c32 = 〈α3,β2〉

〈β2,β2〉 .

Again, 〈α3, β1〉 =1∫

−1

(x+ x2).1 dx = 23 , 〈α3, β2〉 =

1∫−1

(x+ x2).x dx = 23

and (β2, β2) =1∫

−1

x2 dx = 23 .

Therefore β3 = α3 − 2/32 β1 − 2/3

2/3β2 = (x+ x2)− 13 − x = x2 − 1

3 .

Thus, the set 1, x, x2 − 1/3 is an orthogonal basis of P2(x). Again,

‖β1‖2 = 〈β1, β1〉 =∫ 1

−1

1 dx = 2, ‖β2‖2 = 〈β2, β2〉 =∫ 1

−1

x2 dx =23,

‖β3‖2 = 〈β3, β3〉 =∫ 1

−1

(x2 − 1/3)2 dx =845.

Therefore, ‖β1‖ =√

2, ‖β2‖ =√

63 , ‖β3‖ = 2

√10

15 . Hence an orthogonal basis of P2(x) is

√22,

√6

2x,

√104

(3x2 − 1).

Ex 6.4.8 Find the orthogonal basis for <3 containing the vectors (− 1√2, 1√

2, 0) with the

standard inner product.

Page 393: Linear Algebra by Nayak

386 Inner Product Space

Solution: We know, dim<3 = 3, so a basis of <3 contains there linearly independentvectors. First we construct an orthogonal set of three vectors with α1 = (− 1√

2, 1√

2, 0) as an

element. Let the other elements be α2 = (x1, x2, x3) and α3 = (y1, y2, y3). Since α1, α2, α3

are orthogonal, we have,

α1.α2 = 0 ⇒ − 1√2x1 +

1√2x2 + 0x3 = 0 ⇒ x1 = x2

α1.α3 = 0 ⇒ − 1√2y1 +

1√2y2 + 0y3 = 0 ⇒ y1 = y2

α2.α3 = 0 ⇒ x1y1 + x2y2 + x3y3 = 0⇒ 2x1y1 + x3y3 = 0

For simplicity, we choose y1 = 0, then x3 = 0, y3 6= 0. Thus the introduced vectors areα2 = (x1, x2, 0), α3 = (0, 0, y3). Hence,

1||α2||

α2 =1√2(1, 1, 0) ;

1||α3||

α3 = (0, 0, 1)

Thus an orthogonal basis is 1√2(−1, 1, 0), 1√

2(1, 1, 0), (0, 0, 1).

Ex 6.4.9 Find an orthogonal basis for the space of solutions of the linear equation 3x−2y+z = 0.

Solution: First we shall find a basis, not necessarily orthogonal. For instance we give z anarbitrary value say, z = 1. Thus we have to satisfy 3x − 2y = −1. By inspection, we let,x = 1, y = 2 or x = 3, y = 5, i.e.

α1 = (1, 2, 1) and α2 = (3, 5, 1).

Then obviously α1, α2 are linearly independent. The space of solution has dimension 2, soα1, α2 form a basis of that space of solution. There are of course many basis for this space.Let

β1 = α1 = (1, 2, 1)

β2 = α2 −〈α2, β1〉〈β1, β1〉

β1 = (3, 5, 1)− 3.1 + 5.2 + 1.11.1 + 2.2 + 1.1

(1, 2, 1)

= (3, 5, 1)− 146

(1, 2, 1) =13(2, 1,−4)

Then β1, β2 is an orthogonal basis of the given space of solutions.

Ex 6.4.10 Find an orthogonal basis for the space of solution of the linear equations 3x −2y + z + w = 0 = x+ y + 2w

Solution: Let W be the space of solutions in <4. Then W is the space orthogonal tothe two vectors (3,−2, 1, 1) and (1, 1, 0, 2). These are obviously linearly independent (by

any number of arguments, you can prove at once that the matrix(−2 11 0

)has rank 2, for

instance). Hence dimW = 4− 2 = 2. Next we find a basis for the space of solutions. Let usput w = 1, and

3x− 2y + z = −1; x+ y = −2

Page 394: Linear Algebra by Nayak

Projection of a Vector 387

by ordinary elimination. If we put y = 0, then we get a solution with x = −2 and

z = −1− 3x+ 2y = 5

If we put y = 1, then we get a solution with x = −3 and

z = −1− 3x+ 2y = 10

Thus we get the two solutions α1 = (−2, 0, 5, 1) and α2 = (−3, 1, 10, 0). These two solu-

tions are linearly independent, because for instance the matrix(−1 0−3 1

)has rank 2. Hence

α1, α2 is a basis for the space of solutions. To find an orthogonal basis, let

β1 = α1 = (−2, 0, 5, 1)

β2 = α2 −〈α2, β1〉〈β1, β1〉

β1 = (−3, 1, 10, 1)− 6 + 50 + 14 + 25 + 1

(−2, 0, 5, 1)

=110

(−3, 10, 5,−9)

Thus β1, β2 is an orthogonal basis for the space of solutions.

Ex 6.4.11 Find an orthogonal and orthonormal basis for the subspace W of C3 spanned byα1 = (1, i, 1) and α2 = (1 + i, 0, 2).

Solution: To find an orthogonal and orthonormal basis for the subspace W of C3, we use,the Gram-Schmidt process of orthogonalization. Let,

β1 = α1 = (1, i, 1)

β2 = α2 −〈α2, β1〉〈β1, β1〉

β1 = (1 + i, 0, 2)− 〈(1 + i, 0, 2), (1, i, 1)〉〈(1, i, 1), (1, i, 1)〉

(1, i, 1)

= (1 + i, 0, 2)− 3 + i

3(1, i, 1) =

13(2i, 1− 3i, 3− i).

Thus, the orthogonal basis is (1, i, 1), (2i, 1− 3i, 3− i). Also,

||β1||2 = 〈β1, β1〉 = 3, ||β2||2 = 24.

Hence the orthonormal basis is 1√3(1, i, 1), 1

2√

6(2i, 1− 3i, 3− i).

Ex 6.4.12 Let W be a real valued solution space ofd2y

dx2+ 4y = 0. Find the orthogonal and

orthonormal basis for W .

Solution: Since y = φ(x) = 0 satisfies the differential equation d2ydx2 + 4y = 0, so φ(x) ∈ W

and consequently, W is non empty and it is easily verified that W is a subspace of V (<).

The solution ofd2y

dx2+ 4y = 0 is of the type,

y = c1 cos 2x+ c2 sin 2x, for c1, c2 ∈ <.

Thus, W = L(cos 2x, sin 2x). If c1 cos 2x+ c2 sin 2x = 0, for c1, c2 ∈ < is an identity, thenc1 = c2 = 0. Therefore, cos 2x, sin 2x is linearly independent and it is a basis of W , sothat dimW = 2. To find the orthogonal basis, we use the inner product definition

〈f, g〉 =∫ π

0

f(t)g(t)dt

Page 395: Linear Algebra by Nayak

388 Inner Product Space

and use the Gram-Schmidt process of orthogonalization. Let the orthogonal basis beβ1, β2, then,

β1 = cos 2x and β2 = sin 2x− 〈sin 2x, cos 2x〉〈cos 2x, cos 2x〉

= sin 2x.

Again, using the inner product definition 〈f, g〉 =∫ π

0

f(t)g(t)dt, we have

〈cos 2x, cos 2x〉 =∫ π

0

cos2 2xdx =π

2; 〈sin 2x, sin 2x〉 =

∫ π

0

sin2 2xdx =π

2.

The corresponding orthonormal basis is√

2π cos 2x,

√2π sin 2x

.

Ex 6.4.13 Find the orthonormal basis of the row space of the matrix

A =

1 2 01 0 −12 2 1

.

Solution: Let α1 = (1, 2, 0), α2 = (1, 0,−1), α3 = (2, 2, 1) be three rows of A. The vectors

α1, α2, α3 are linearly independent as

∣∣∣∣∣∣1 2 01 0 −12 2 1

∣∣∣∣∣∣ = −4 6= 0. Thus α1, α2, α3 is a basis of

the row space of the matrix A. To find orthogonal basis, let,β1 = α1 = (1, 2, 0).β2 = α2 − c21β1 = α2 − 〈α2,β1〉

〈β1,β1〉β1

= (1, 0,−1)− 15 (1, 2, 0) = 1

5 (4,−2,−5)and β3 = α3 − c31β1 − c32β2 = α3 −

〈α3, β1〉〈β1, β1〉

β1 −〈α3, β2〉〈β2, β2〉

β2

= (2, 2, 1)− 65(1, 2, 0) +

145

(4,−2,−5) =19(8,−4, 8).

Hence the orthogonal basis of row space is β1, β2, β3, i.e., (1, 2, 0), 15 (4,−2,−5),

19 (8,−4, 8) and the corresponding orthonormal basis is β1/‖β1‖, β2/‖β2‖, β3/‖β3‖, i.e., 1√

5(1, 2, 0),

13√

5(4,−2,−5),

13(2,−1, 2)

.

Ex 6.4.14 Let V be the subspace of <4 spanned by (1, 1, 1, 1), (1,−1, 2, 2), (1, 2,−3,−4).Find the orthogonal and orthonormal basis for V . Find the projection of α = (1, 2,−3, 4)onto V .

Solution: Let α1 = (1, 1, 1, 1), α2 = (1,−1, 2, 2), α3 = (1, 2,−3,−4). To find an orthogonaland orthonormal basis for the subspace V of <4, we use, the Gram-Schmidt process oforthogonalization. Let,

β1 = α1 = (1, 1, 1, 1)

β2 = α2 −〈α2, β1〉〈β1, β1〉

β1 = (1,−1, 2, 2)− 44(1, 1, 1, 1) = (0,−2, 1, 1).

β3 = α3 −〈α3, β1〉〈β1, β1〉

β1 −〈α3, β2〉〈β2, β2〉

β2

= (1, 2,−3,−4)− −44

(1, 1, 1, 1)− −116

(0,−2, 1, 1) =16(12,−4,−1,−7).

Page 396: Linear Algebra by Nayak

Projection of a Vector 389

Therefore, an orthogonal basis for the subspace V of <4, is,(1, 1, 1, 1), (0,−2, 1, 1), (12,−4,−1,−7)

and the corresponding orthonormal basis for the subspace V of <4, is 1

2 (1, 1, 1, 1), 1√6(0,−2, 1, 1), 1√

210(12,−4,−1,−7).

To extend the above orthogonal basis to the orthogonal basis for <4, we adjoin with the givenbasis of <4 any vector of the fundamental system to form a basis of <4 and then proceed tofind the orthogonal basis of <4. Now, we need only compute the Fourier coefficients.

c1 =〈α, β1〉〈β1, β1〉

= 1; c2 =〈α, β2〉〈β2, β2〉

= −12; c3 =

〈α, β3〉〈β3, β3〉

= − 110.

Since W = L(β1, β2, β3), where β1, β2, β3 form an orthogonal basis of the subspace W,then,

Proj(β,W ) = c1β1 + c2β2 + c3β3

= (1, 1, 1, 1)− 12(0,−2, 1, 1)− 1

10(12,−4,−1,−7) =

15(−1, 12, 3, 6).

Ex 6.4.15 Let S = (1, 1, 1, 1), (1, 1,−1,−1), (1,−1, 1,−1), (1,−1,−1, 1) of <4. (i) Showthat S is orthogonal and a basis of <4. (ii) Express α = (1, 3,−5, 6) as a linear combinationof the vectors of S. (iii) Find the co-ordinates of an arbitrary vector β = (a, b, c, d) ∈ <4,relative to the basis S.

Solution: Let α1 = (1, 1, 1, 1), α2 = (1, 1,−1,−1), α3 = (1,−1, 1,−1), α4 = (1,−1,−1, 1).(i) Using the definition of inner product, we get,

〈α1, α2〉 = 〈α1, α3〉 = 〈α1, α4〉 = · · · = 〈α3, α4〉 = 0.

Thus, S is orthogonal and hence S is linearly independent. As any four linearly independentvectors form a basis of <4, so S is a basis for <4.(ii) To express α = (1, 3,−5, 6) as a linear combination of the vectors of S, we are to findthe scalars c1, c2, c3, c4 ∈ < such that,

α = c1α1 + c2α2 + c3α3 + c4α4

or, ((1, 3,−5, 6) = c1(1, 1, 1, 1) + c2(1, 1,−1,−1) + c3(1,−1, 1,−1) + c4(1,−1,−1, 1)or, c1 + c2 + c3 + c4 = 1; c1 + c2 − c3 − c4 = 3;

c1 − c2 + c3 − c4 = 1; c1 − c2 − c3 + c4 = 1;

⇒ c1 =54; c2 =

34; c3 = −13

4; c4 =

94.

(iii) Since S is orthogonal, we need only find the fourier coefficients of β with respect to thebasis vectors. Therefore,

c1 =〈β, α1〉〈α1, α1〉

=12(a+ b+ c+ d); c2 =

〈β, α2〉〈α2, α2〉

=12(a+ b− c− d);

c3 =〈β, α3〉〈α3, α3〉

=12(a− b+ c− d); c4 =

〈β, α4〉〈α4, α4〉

=12(a− b− c+ d)

are the co-ordinate of β with respect to the basis S.

Page 397: Linear Algebra by Nayak

390 Inner Product Space

Exercise 6

Section-A[Multiple Choice Questions]

1. The transformation T : R2 → R defined by T (x, y) = x+ y+α is linear if α equals to(a) 5 (b) 2 (c) 1 (d) 0

2. The transformation T : R2 → R defined by T (x, y) = xk + y is linear if k equals to(a) 0 (b) 1 (c) 2 (d) 3

3. If a linear transformation T : R2 → R2 be defined by T (x1, x2) = (x1 +x2, 0) then ker(T ) is [WBUT 2007](a) (1,−1) (b) (1, 0) (c) (0, 0) (d) (0, 1), (1, 0)

4. The transformation T : R2 → R2 defined by T (x, y) = (x2, y2) is(a) linear (b) non-linear

5. The integral operator I : R→ R defined by If(x) =b∫

a

f(x) dx is

(a) linear (b) non-linear

6. The integral operator D : R→ R defined by f(x) =df

dxf(x) is

(a) linear (b) non-linear

7. The transformation T : R2 → R is linear then 5T is(a) linear (b) non-linear

8. The ker (T ) for the mapping T : R3 → R3 defined by T (x, y, z) = (x+ y, y + z, z + x)is(a) (0, 0, 1) (b) (0, 0, 0) (c) (1,−1,−1) (d) (1,−1, 1), (−1, 1,−1)

9. The Im (T ) for the mapping T : R3 → R2 defined by T (x, y, z) = (x+ z, y + z) is(a) L(1, 0), (0, 1) (b) L(1, 0), (0, 1), (1, 1) (c) L(1, 0) (d) L(0, 1)

10. For a bijective mapping T : R3 → R3 the rank of T is(a) 1 (b) 2 (c) 3 (d) 4

11. If T : R3 → R3 is bijective then nullity of T is(a) 0 (b) 1 (c) 2 (d) 3

12. If T (x, y, z) = (x+ 2y − z, x+ 3y + 2z), x, y, z ∈ R then (1, 2, 0) is(a) (1,2) (b) (5,7) (c) (5,2) (d) (2,7)

13. Let T : V → V be a linear transformation. If im(T ) = V then ker(T ) is(a) V (b) (1, 0, 0), (0, 1, 0), (0, 0, 1) (c) θ (d) none of these

14. Let T : R3 → R2 and S : R3 → R2 be defined by T (x, y, z) = (x + y, y + z) andS(x, y, z) = (x− z, y) then (T1 + T2)(x, y, z) is(a) (x+ y, y + z) (b) (2x+ y − z, 2y + z) (c) (x+ y, x+ 2y)(d) (x− y, x− z)

15. If S and T are linear operators on R2 defined by S(x, y) = (y, x) and T (x, y) = (x, 0)then ST is equal to(a) (x, y) (b) (x, 0) (c) (0, x) (d) (y, x)

Page 398: Linear Algebra by Nayak

Projection of a Vector 391

16. If S and T are two linear operators on R2 defined by S(x, y) = (x + y, x − y) andT (x, y) = (y, x) then 2S + 3T is(a) (2x+ 5y, 5x− 2y) (b) (x+ 2y, 2x− y) (c) (x+ 4y, 4x) (d) (2x, 3y)

17. If S : R2 → R2 defined by S(x, y) = (x, x+ y) then S2 is(a) (x, 2x+ 2y) (b) (x, 2x+ y) (c) (x+ y, x) (d) (x2, (x+ y)2)

18. If T1 and T2 be two operators defined by T1(x, y) = (−x, y) and T2(x, y) = (0, y) thenT2T

21 is

(a) (0, x) (b) (x, 0) (c) (0, y) (d) (x,−y)

19. Let T : R2 → R2 be defined by T (x, y) = (x+ y, x). Then T−1(x, y) is(a) (x− y, x) (b) (x, x+ y) (c) (x− y, x+ y) (d) (y, x− y)

20. Let T : R2 → R2 be defined by T (x, y) = (y, x) then T−1(3, 4) is(a) (3, 4) (b) (4, 3) (c) (−3,−4) (d) (−4,−3)

21. Let S : R2 → R2 and T : R2 → R2 be two mappings defined by S(x, y) = (x + y, x)and T (x, y) = (y, x) then T−1S−1(x, y) is(a) (x, y) (b) (y, x) (c) (y, x− y) (d) (x− y, y)

22. If ‖α‖ = 2 then the norm of the vector −5α is(a) −10 (b) 10 (c) 2 (d) −2

23. If α and β are orthogonal vectors then ‖α+ β‖2 is(a) ‖α‖2 + ‖β‖2 (b) ‖α‖2 − ‖β‖2 (c) (‖α‖+ ‖β‖)2 (d) none of these

24. If α = (4, 0, 3) is a vector of an inner product space then the normalized α is(a) (4, 0, 3) (b) 1

5 (4, 0, 3) (c) (1, 0, 1) (d) (1, 0, 0)

25. Let α and β be two vectors of a Euclidean space V then (α+ β, α− β) = 0 iff(a) ‖α+ β‖ = ‖α− β‖ (b) ‖α‖ = ‖β‖ (c) ‖α− β‖ = 0 (d) ‖α+ β‖ = 0

26. Let V be the vector space of polynomials with inner product given by (f, g) =1∫0

f(t)g(t) dt. If f(t) = t2 − t then ‖f(t)‖ is

(a) −1/6 (b) 1/30 (c) −1/30 (d) 1/6

27. If V is a vector space of all polynomials in t with inner product defined by (f, g) =1∫0

f(t)g(t) dt. If f(t) = 2t+ 1 and g(t) = t2 + 1 then (f, g) is

(a) 1/6 (b) 1/10 (c) 17/6 (d) 6/17

28. If α, β are two vectors in a real inner product space such that ‖α‖ = ‖β‖ then (α +β, α− β) is equal to(a) 0 (b) 1 (c) −1 (d) none of these

29. Let α = (2, 3, 4) and β = (−1, 1, k) be two vectors in a Euclidean space. If α and βare orthogonal then k is equal to(a) 0 (b) 1 (c) 1/4 (d) −1/4

30. If the vectors α = (k, 0, 0) and β = (0, k, 0) are orthogonal then k is(a) 0 (b) 1 (c) −1 (d) for all values of k

Section-B[Objective Questions]

Page 399: Linear Algebra by Nayak

392 Inner Product Space

1. Find the value(s) of k ∈ <, such that 〈α, β〉 = x1x2−k(x2y1 +x1y2)+ y1y2 is an innerproduct. [Gate’04]

2. Let A be a 2× 2 orthogonal matrix of trace and determinant 1. Show that the anglebetween Au and u = [1, 0]T is 450. [Gate’02]

3. Let T : <4 → < be a linear functional defined by T (x1, x2, x3, x4) = −x2. Find theunique vector α ∈ <4 such that f(β) = 〈α, β〉 for all β ∈ <4. [Gate’04]

4. In Euclidean 2 space give an orthonormal basis of which one vector is in the directionof the vector (1, 2). [BH’98]

Section-C[Long Answer Questions]

1. Let V be an inner product space over < and T : V → V be a LT such that 〈Tu, v〉 =〈u, Tv〉 and 〈Tu, u〉 ≥ 0, for all u, v ∈ V . Prove that

|〈Tu, v〉|2 ≤ 〈Tu, u〉〈Tv, v〉,∀u, v ∈ V. [Gate’99]

2. Consider the inner product 〈α, β〉 = βTAα on <3, where A =

2 1 −11 1 0−1 0 3

. Find an

orthonormal basis B of S = (x1, x2, x3) : x1 + x2 + x3 = 0 and then extend it to anorthonormal basis C of <3.

3. Prove that the set of vectors (2, 3,−1), (1,−2,−4), (2,−1, 1) is an orthogonal basisof <3 with usual inner product and express the vector (4, 3, 2) at a linear combinationof these basis vectors. [BH’04]

4. If u, v are two vectors of an inner product space V , then show that||u+ v||2 + ||u− v||2 = 2||u||2 + 2||v||2. [BH’05]

5. Find an orthonormal basis of the subspace of <4 spanned by(2,−1, 0,−1), (6, 1, 4,−5) and (4, 1, 3,−4).

6. Let α1 = (1, 1, 1, 1), α2 = (0, 1, 1, 1), α3 = (0, 0, 1, 1) and α4 = (0, 0, 0, 1) in <4. Start-ing from α1, α2, α3, α4 obtain an orthonormal basis of <4. If you use α4, α3, α2α1what is the orthonormal basis obtained?

7. (a) Use Gram-Schmidt process to obtain an orthogonal basis from the basis set(1, 2,−2), (2, 0, 1), (1, 1, 0) of the Euclidean space <3 with standard inner prod-uct. [VH’05,’98]

(b) Use Gram-Schmidt process to obtain an orthogonal basis of the subspace of theEuclidean space <4 generated by the set (1, 1, 0, 1), (1,−2, 0, 0), (1, 0,−1, 2).[BU(M.Sc.)98]

(c) Use Gram-Schmidt process to obtain an orthogonal basis of <3 containing thevector (− 1√

2, 1√

2, 0) with the standard inner product. [VH’03]

(d) Use Gram-Schmidt process to obtain an orthogonal basis from the basis set(1, 1, 1), (0, 1, 2), (2, 1, 1) of the Euclidean space <3 with standard inner prod-uct. [VH’01]

8. Find the orthogonal basis for of the subspace W of C3, spanned byα1 = (1, i, 0) and α2 = (1, 2, 1− i).

Page 400: Linear Algebra by Nayak

Projection of a Vector 393

9. Extend (2, 3,−1), (1,−2,−4) to an orthogonal basis of <3 and then find the orthonor-nal basis. [VH’02]

10. What is the orthogonal complement of the subspace of the even polynomials in Pn(<)

with respect to the inner product 〈p, q〉 =1∫

−1

p(t)q(t)dt?

11. Obtain an inner product on <2 such that ||〈2, 3〉T || < ||〈1, 1〉T ||.

12. (a) Show that if αT = ( 12 ,

12 ,

12 ,

12 ) and βT = (1

2 ,−12 ,

12 ,−

12 ) then A = I −ααT − ββT

is an orthogonal projector.

(b) Obtain the orthogonal projector (with respect to the Euclidean inner product)

into the column space of A =

3 2 11 3 −2−2 1 −3

.

13. (a) Obtain an orthogonal matrix with ( 12 ,

12 ,

12 ,

12 )T as the first column.

(b) Obtain an orthogonal matrix of order 3 on the integers whose first row is (1, 2,−1).[VH 01, 05]

Answer

Section-A[Multiple Choice Questions]

1. d 2. b 3. a 4. b 5. a 6. a 7. a 8. b 9. a 10. c 11. a12. b 13. c 14. b 15. c 16. a 17. b 18. c 19. d 20. b 21. d 22. b23. a 24. b 25. b 26. b 27. c 28. a 29. d 30. b

Page 401: Linear Algebra by Nayak

394 Inner Product Space

Page 402: Linear Algebra by Nayak

Chapter 7

Matrix Eigenfunctions

One of the most important topic in linear algebra is determination of eigenvalues and eigen-vectors. For a square matrix, eigenfunctions (eigenvalue and eigenvector) plays a significantrole in the field of Applied Mathematics, Applied Physics, Economics, Astronomy, Engineer-ing and Statistics. The analysis of electrical circuit, small oscillations, frequency analysis indigital system, etc. can be done with the help of eigenvalues and eigenvectors. These areuseful in the study of canonical forms of a matrix under similarity transformations and inthe study of quadratic forms, especially the extrema of quadratic form.

7.1 Matrix Polynomial

If the elements of a matrix A be polynomial in x with degree n atmost, then,A = xnA0 + xn−1A1 + · · ·+ xAn−1 +An, (7.1)

where Ai are the square matrices of the same order as that of A. Such a polynomial (7.1) iscalled matrix polynomial of degree n, provided the leading co-efficient A0 6= 0. The symbolx is called indeterminate. For example, 1 + x x2 − 1 1

2 x2 + x+ 2 2x2 + 3 x x2 + 5

= x2

0 1 00 1 01 0 1

+ x

1 0 00 1 00 1 0

+

1 −1 12 2 23 0 5

= A0x

2 +A1x+A2,

where the coefficients A0, A1, A2 are real matrices of order 3×3 as of A is a matrix polynomialof degree 2. We say that the metric polynomial is r−rowed, if the order of each of the matrixcoefficients Ai; i = 1, 2, · · · , n be r. Two matrix polynomials are said to be equal, if and onlyif the coefficients of the like powers of the indeterminate x are same.

7.1.1 Polynomials of Matrices

Let us consider a polynomial f(x) = c0xn + c1x

n−1 + · · · + cn−1x + cn over a field F . LetA be a given square matrix, then we define,

f(A) = c0An + c1A

n−1 + · · ·+ cn−1A+ cnI, (7.2)

where I is an unit matrix of the same order as A, is a polynomial of matrix A. A polynomialf(x) is said to annihilate A if f(A) = 0, the zero matrix. Let f and g be polynomials. Forany square matrix A and scalar k,

(i) (f + g)(A) = f(A) + g(A)

(ii) (fg)(A) = f(A)g(A)

395

Page 403: Linear Algebra by Nayak

396 Matrix Eigenfunctions

(iii) (kf)(A) = kf(A)

(iv) f(A)g(A) = g(A)f(A)

(iv) tells us that any two polynomials in A commute.

7.1.2 Matrices and Linear Operator

Let T : V → V be a linear operator on a vector space V . Powers of T are defined by thecomposition operation, i.e.

T 2 = T.T, T 3 = T 2.T, . . .

Also, for a polynomial f(x) = anxn + · · ·+ a1x+ a0, we define f(T ) like matrices as

f(T ) = anTn +

... + a1T + a0I

where I is the identity mapping. We say that T is zero or root of f(x) if f(T ) = 0, the zeromapping. Suppose A is a matrix representation of a linear operator T . Then f(A) is thematrix representation of f(T ), and, in particular, f(T ) = 0 if and only if f(A) = 0.

Ex 7.1.1 Let A =(

1 −24 5

). Find f(A), where f(x) = x2−3x+7 and f(x) = x2−6x+13.

Solution: Given that, A =(

1 −24 5

), therefore,

A2 =(

1 −24 5

)(1 −24 5

)=(−7 −1224 17

).

For f(x) = x2 − 3x+ 7 the value of f(A) = A2 − 3A+ 7I becomes,

f(A) =(−7 −1224 17

)− 3

(1 −24 5

)+ 7

(1 00 1

)=(−3 −612 9

).

For f(x) = x2 − 6x+ 13 the value of f(A) = A2 − 6A+ 13I becomes,

f(A) =(−7 −1224 17

)− 6

(1 −24 5

)+ 13

(1 00 1

)=(

0 00 0

).

In the second case, A is the root of f(x).

7.2 Characteristic Polynomial

Characteristic polynomial of a matrix

If A = [aij ]n×n be a given square matrix of order n over the field F , then, an ordinarypolynomial in λ of the nth degree with scalar coefficients as

χA(λ) = |A− λI| =

∣∣∣∣∣∣∣∣∣a11 − λ a12 · · · a1n

a21 a22 − λ · · · a2n

......

...an1 an2 · · · ann − λ

∣∣∣∣∣∣∣∣∣ , (7.3)

Page 404: Linear Algebra by Nayak

Characteristic Polynomial 397

where, I is the unit matrix of order n, is defined as characteristic polynomial or characteristicmatrix of A. The equation

χA(λ) = |A− λI| = 0 (7.4)

is defined as the characteristic equation of the matrix A. The degree of the characteristicequation is the same as the order of the square matrix A. Let us write it as

χA(λ) = C0λn + C1λ

n−1 + C2λn−2 + · · ·+ Cn−1λ+ Cn = 0

where the coefficients Ci are functions of the elements of A. It can be shown that

C0 = (−1)n; C1 = (−1)n−1n∑

i=1

aii,

C2 = (−1)n−2 × sum of the principle minors of order 2,

and so on. Lastly, Cn = det A. Let A =(a11 a12

a21 a22

), be a matrix of order 2, then the

characteristic polynomial of A is,

χA(λ) = |A− λI| =∣∣∣∣a11 − λ a12

a21 a22 − λ

∣∣∣∣ = (a11 − λ)(a22 − λ)− a12a21

= λ2 − (a11 + a22)λ+ (a11a22 − a12a21)= λ2 − tr(A) λ+ |A|,

where tr(A) denotes the trace of A, i.e., the sum of the diagonal elements of A. Similarly, for a matrix of order 3, the characteristic polynomial is

χ_A(λ) = |A - λI| = \begin{vmatrix} a_{11}-λ & a_{12} & a_{13} \\ a_{21} & a_{22}-λ & a_{23} \\ a_{31} & a_{32} & a_{33}-λ \end{vmatrix} = -\big[λ^3 - tr(A)λ^2 + (A_{11}+A_{22}+A_{33})λ - |A|\big],

where A_{11}, A_{22}, A_{33} are the cofactors of a_{11}, a_{22}, a_{33} respectively. In general, if A is a square matrix of order n, then

|λI - A| = λ^n - S_1λ^{n-1} + S_2λ^{n-2} - \cdots + (-1)^nS_n = (-1)^n|A - λI|,

where S_k is the sum of the principal minors of A of order k.

Ex 7.2.1 Find the characteristic polynomial of A = \begin{pmatrix} 1 & 3 \\ 4 & 5 \end{pmatrix}.

Solution: The characteristic polynomial of A is given by

χ_A(λ) = |A - λI| = \begin{vmatrix} 1-λ & 3 \\ 4 & 5-λ \end{vmatrix} = (λ-1)(λ-5) - 12 = λ^2 - 6λ - 7.

Also, tr(A) = 1 + 5 = 6 and |A| = -7, so χ_A(λ) = λ^2 - 6λ - 7.
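As an illustrative cross-check (not part of the text's method), NumPy's np.poly returns the coefficients of the monic characteristic polynomial of a square matrix:

import numpy as np

A = np.array([[1, 3], [4, 5]])
# coefficients of the monic characteristic polynomial, highest power first
print(np.poly(A))                      # [ 1. -6. -7.]  ->  lambda^2 - 6*lambda - 7
print(np.trace(A), np.linalg.det(A))   # 6  -7.0  (tr A and |A|, as above)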

7.2.1 Eigen Value

If the matrix A is of order n, then the characteristic equation of A is an nth degree equation in λ. The roots of

χA(λ) = |A− λI| = 0 (7.5)

are defined as the characteristic roots or latent roots or eigen values of the square matrix A. The spectrum of A is the set of distinct characteristic roots of A. Thus, if A = [a_{ij}]_{n×n}, then the eigen values of the matrix A are obtained from the characteristic equation

|λI - A| = λ^n + p_1λ^{n-1} + p_2λ^{n-2} + \cdots + p_n = 0,

where p_1, p_2, \ldots, p_n can be expressed in terms of the elements a_{ij} of the matrix A = [a_{ij}]_{n×n} over F. Clearly |λI - A| is a monic (i.e., the leading coefficient is 1) polynomial of degree n, since the highest power of λ occurs in the term \prod_{i=1}^{n}(λ - a_{ii}). So, by the fundamental theorem of algebra, it has exactly n (not necessarily distinct) roots; these are also the roots of χ_A(λ) = |A - λI| = 0. We usually denote the characteristic roots of A by λ_1, λ_2, \ldots, λ_n, so that

|λI - A| = (λ - λ_1)(λ - λ_2) \cdots (λ - λ_n).

Ex 7.2.2 The characteristic roots of a 3×3 matrix A are known to be in arithmetic progression. Determine them, given tr(A) = 15 and |A| = 80.

Solution: Let the characteristic roots of the matrix A be a - d, a, a + d. Then

(a - d) + a + (a + d) = 15 ⇒ a = 5.
Also, (a - d)a(a + d) = 80 ⇒ (a^2 - d^2)a = 80 ⇒ 25 - d^2 = 16 ⇒ d = ±3.

Therefore, the characteristic roots of the matrix are 2, 5, 8.

7.2.2 Eigen Vector

Let us consider the matrix equation AX = λX, i.e., (A - λI)X = 0, (7.6)

where A = [a_{ij}]_{n×n} is a given n×n matrix, X = [x_1, x_2, \ldots, x_n]^T is an n×1 non-null column matrix and λ is a scalar, i.e.,

\begin{pmatrix} a_{11}-λ & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22}-λ & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn}-λ \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.

Thus, corresponding to each eigen value λ_i of the matrix A, there is a non-null solution of (A - λ_iI)X = 0. If X = X_i is the corresponding non-null solution, then the column vector X_i is defined as the eigen, invariant, characteristic or latent vector, or pole. Determination of the scalar λ and the non-null vector X satisfying AX = λX is known as the eigen value problem.

Ex 7.2.3 Determine the eigen values and eigen vectors of the matrix \begin{pmatrix} 2 & 2 & 1 \\ 1 & 3 & 1 \\ 1 & 2 & 2 \end{pmatrix}.

Solution: The characteristic equation of the given matrix A is |A - λI| = 0, i.e.,

\begin{vmatrix} 2-λ & 2 & 1 \\ 1 & 3-λ & 1 \\ 1 & 2 & 2-λ \end{vmatrix} = 0
or, (2 - λ)(λ^2 - 5λ + 4) + 3(λ - 1) = 0
or, (λ - 1)^2(λ - 5) = 0 ⇒ λ = 1, 1, 5.

Thus the eigen values of the given matrix are 1, 1, 5, and 1 is a 2-fold eigen value of the matrix A. The spectrum of A is {1, 5}. Corresponding to λ = 1, consider the equation (A - I)X = 0, where A is the given matrix and X = [x_1, x_2, x_3]^T. The coefficient matrix is given by

A - I = \begin{pmatrix} 1 & 2 & 1 \\ 1 & 2 & 1 \\ 1 & 2 & 1 \end{pmatrix} \sim \begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.

The system of equations is equivalent to x_1 + 2x_2 + x_3 = 0. We see that [1, 0, -1]^T is one of the non-null column solutions, which is an eigen vector corresponding to the eigen value λ = 1. For λ = 5, the coefficient matrix is given by

A - 5I = \begin{pmatrix} -3 & 2 & 1 \\ 1 & -2 & 1 \\ 1 & 2 & -3 \end{pmatrix} \sim \begin{pmatrix} 0 & 8 & -8 \\ 0 & -4 & 4 \\ 1 & 2 & -3 \end{pmatrix} \sim \begin{pmatrix} 0 & 0 & 0 \\ 0 & -1 & 1 \\ 1 & 0 & -1 \end{pmatrix}.

The system of equations is equivalent to -x_2 + x_3 = 0, x_1 - x_3 = 0, so that x_3 = 1 gives x_1 = x_2 = 1. Hence [1, 1, 1]^T is an eigen vector corresponding to the eigen value λ = 5.
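A hedged numerical check of this example: np.linalg.eig returns the eigen values and normalized eigen vectors (as columns), which agree with those found above up to scaling:

import numpy as np

A = np.array([[2.0, 2.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 2.0, 2.0]])
w, V = np.linalg.eig(A)
print(np.round(w, 6))        # eigen values 5, 1, 1 (in some order)
for j in range(3):           # each column satisfies A v = lambda v
    assert np.allclose(A @ V[:, j], w[j] * V[:, j])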

Cayley-Hamilton theorem

Theorem 7.2.1 Every square matrix satisfies its characteristic equation.

Proof: Let A be a matrix of order n and I be the unit matrix of the same order. Its characteristic equation is |A - λI| = 0, i.e.,

λ^n + a_1λ^{n-1} + a_2λ^{n-2} + \cdots + a_n = 0; \quad a_i scalars,
so that |A - λI| = (-1)^n(λ^n + a_1λ^{n-1} + a_2λ^{n-2} + \cdots + a_n).

We are to show that A^n + a_1A^{n-1} + a_2A^{n-2} + \cdots + a_{n-1}A + a_nI = 0. Now, the cofactors of the elements of the matrix A - λI are polynomials in λ of degree at most (n - 1), so adj(A - λI) is a matrix polynomial in λ of degree (n - 1), say

adj(A - λI) = λ^{n-1}A_1 + λ^{n-2}A_2 + \cdots + λA_{n-1} + A_n,

where the A_i are suitable constant matrices of order n. Now, using the relation (A - λI)\,adj(A - λI) = |A - λI|\,I, we get

(A - λI)[λ^{n-1}A_1 + λ^{n-2}A_2 + \cdots + λA_{n-1} + A_n] = (-1)^n(λ^n + a_1λ^{n-1} + a_2λ^{n-2} + \cdots + a_n)I.

This relation is true for all values of λ; so, equating coefficients of the like powers λ^n, λ^{n-1}, \ldots of λ on both sides, we get

-A_1 = (-1)^nI,
AA_1 - A_2 = (-1)^na_1I,
AA_2 - A_3 = (-1)^na_2I,
\vdots
AA_n = (-1)^na_nI.

Multiplying these relations successively by A^n, A^{n-1}, \ldots, A, I respectively and adding, we get

0 = (-1)^n[A^n + a_1A^{n-1} + a_2A^{n-2} + \cdots + a_{n-1}A + a_nI]
⇒ A^n + a_1A^{n-1} + a_2A^{n-2} + \cdots + a_{n-1}A + a_nI = 0. (7.7)

Thus every matrix A is a root of its characteristic polynomial.

Deduction 7.2.1 Now, if |A| ≠ 0, then A^{-1} exists. In this case, multiplying (7.7) by A^{-1}, we get

A^{n-1} + a_1A^{n-2} + a_2A^{n-3} + \cdots + a_{n-1}I + a_nA^{-1} = 0
⇒ A^{-1} = -\frac{1}{a_n}A^{n-1} - \frac{a_1}{a_n}A^{n-2} - \frac{a_2}{a_n}A^{n-3} - \cdots - \frac{a_{n-1}}{a_n}I.

Therefore, the Cayley-Hamilton theorem can be applied to find the inverse of a matrix.
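A sketch of this deduction in code (assuming NumPy, and taking the coefficients a_i from the monic polynomial returned by np.poly; valid only when a_n ≠ 0, i.e., A non-singular):

import numpy as np

def inverse_by_cayley_hamilton(A):
    """A^{-1} = -(A^{n-1} + a1*A^{n-2} + ... + a_{n-1} I)/a_n, where
    lambda^n + a1*lambda^{n-1} + ... + a_n is the monic char. polynomial."""
    n = A.shape[0]
    a = np.poly(A)                 # [1, a1, ..., an]
    B = np.eye(n)                  # Horner: builds A^{n-1} + a1 A^{n-2} + ...
    for ak in a[1:-1]:
        B = B @ A + ak * np.eye(n)
    return -B / a[-1]

A = np.array([[1.0, 1.0], [-1.0, 3.0]])
Ainv = inverse_by_cayley_hamilton(A)
print(Ainv)                                  # [[0.75 -0.25] [0.25 0.25]]
print(np.allclose(A @ Ainv, np.eye(2)))      # True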

Result 7.2.1 Suppose A = [a_{ij}] is a triangular matrix. Then A - λI is a triangular matrix with diagonal entries a_{ii} - λ, and hence the characteristic polynomial is

|A - λI| = (a_{11} - λ)(a_{22} - λ) \cdots (a_{nn} - λ).

Result 7.2.2 Suppose the characteristic polynomial of an n-square matrix A is a product of n distinct linear factors (λ - λ_1), (λ - λ_2), \ldots, (λ - λ_n); then A is similar to the diagonal matrix D = diag(λ_1, λ_2, \ldots, λ_n).

Ex 7.2.4 What are the possible eigen values of a square matrix A (over the field ℝ) satisfying A^3 = A?

Solution: If λ is an eigen value of A with eigen vector X ≠ θ, then A^3X = λ^3X. Since A^3 = A, this gives λ^3X = λX, i.e., (λ^3 - λ)X = 0, so

λ^3 = λ ⇒ λ(λ^2 - 1) = 0 ⇒ λ = -1, 0, 1.

Ex 7.2.5 Verify the Cayley-Hamilton theorem for A = \begin{pmatrix} 1 & 1 \\ -1 & 3 \end{pmatrix} and hence find A^{-1} and A^6.

Solution: The characteristic equation of the given matrix A is

|A - λI| = \begin{vmatrix} 1-λ & 1 \\ -1 & 3-λ \end{vmatrix} = 0 ⇒ λ^2 - 4λ + 4 = 0.

By the Cayley-Hamilton theorem, we should have A^2 - 4A + 4I = 0. Now,

\begin{pmatrix} 1 & 1 \\ -1 & 3 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ -1 & 3 \end{pmatrix} - 4\begin{pmatrix} 1 & 1 \\ -1 & 3 \end{pmatrix} + 4\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 4 \\ -4 & 8 \end{pmatrix} - \begin{pmatrix} 4 & 4 \\ -4 & 12 \end{pmatrix} + \begin{pmatrix} 4 & 0 \\ 0 & 4 \end{pmatrix} = 0,

so the Cayley-Hamilton theorem is verified. Therefore,

A^{-1} = I - \frac{1}{4}A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} - \frac{1}{4}\begin{pmatrix} 1 & 1 \\ -1 & 3 \end{pmatrix} = \frac{1}{4}\left[\begin{pmatrix} 4 & 0 \\ 0 & 4 \end{pmatrix} - \begin{pmatrix} 1 & 1 \\ -1 & 3 \end{pmatrix}\right] = \frac{1}{4}\begin{pmatrix} 3 & -1 \\ 1 & 1 \end{pmatrix}.

Now, dividing λ^6 by λ^2 - 4λ + 4, we get

λ^6 = (λ^2 - 4λ + 4)(λ^4 + 4λ^3 + 12λ^2 + 32λ + 80) + 192λ - 320.

Since A^2 - 4A + 4I = 0, this gives A^6 = 192A - 320I = \begin{pmatrix} -128 & 192 \\ -192 & 256 \end{pmatrix}.

Ex 7.2.6 If A = \begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix}, express 2A^5 - 3A^4 + A^2 - 5I as a linear polynomial in A.

Solution: The characteristic equation of the given matrix A is |A - λI| = 0, i.e.,

\begin{vmatrix} 3-λ & 1 \\ 1 & 2-λ \end{vmatrix} = 0 ⇒ λ^2 - 5λ + 5 = 0.

By the Cayley-Hamilton theorem, we have A^2 - 5A + 5I = 0. Now, dividing 2λ^5 - 3λ^4 + λ^2 - 5 by λ^2 - 5λ + 5, we get

2λ^5 - 3λ^4 + λ^2 - 5 = (λ^2 - 5λ + 5)(2λ^3 + 7λ^2 + 25λ + 91) + 330λ - 460
⇒ 2A^5 - 3A^4 + A^2 - 5I = 330A - 460I.

Ex 7.2.7 Verify the Cayley-Hamilton theorem for A = \begin{pmatrix} 0 & 0 & 1 \\ 3 & 1 & 0 \\ -2 & 1 & 4 \end{pmatrix} and hence find A^{-1}.

Solution: The characteristic equation of the given matrix A is

|A - λI| = 0 ⇒ \begin{vmatrix} 0-λ & 0 & 1 \\ 3 & 1-λ & 0 \\ -2 & 1 & 4-λ \end{vmatrix} = 0, or λ^3 - 5λ^2 + 6λ - 5 = 0.

By the Cayley-Hamilton theorem, we should have A^3 - 5A^2 + 6A - 5I = 0. For verification, we have

A^2 = \begin{pmatrix} 0 & 0 & 1 \\ 3 & 1 & 0 \\ -2 & 1 & 4 \end{pmatrix}\begin{pmatrix} 0 & 0 & 1 \\ 3 & 1 & 0 \\ -2 & 1 & 4 \end{pmatrix} = \begin{pmatrix} -2 & 1 & 4 \\ 3 & 1 & 3 \\ -5 & 5 & 14 \end{pmatrix},

A^3 = \begin{pmatrix} -2 & 1 & 4 \\ 3 & 1 & 3 \\ -5 & 5 & 14 \end{pmatrix}\begin{pmatrix} 0 & 0 & 1 \\ 3 & 1 & 0 \\ -2 & 1 & 4 \end{pmatrix} = \begin{pmatrix} -5 & 5 & 14 \\ -3 & 4 & 15 \\ -13 & 19 & 51 \end{pmatrix}.

Therefore, the expression A^3 - 5A^2 + 6A - 5I becomes

\begin{pmatrix} -5 & 5 & 14 \\ -3 & 4 & 15 \\ -13 & 19 & 51 \end{pmatrix} - 5\begin{pmatrix} -2 & 1 & 4 \\ 3 & 1 & 3 \\ -5 & 5 & 14 \end{pmatrix} + 6\begin{pmatrix} 0 & 0 & 1 \\ 3 & 1 & 0 \\ -2 & 1 & 4 \end{pmatrix} - 5\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = 0.

Thus, the Cayley-Hamilton theorem is verified. To find A^{-1}, from A^3 - 5A^2 + 6A - 5I = 0 we get

A^{-1} = \frac{1}{5}(A^2 - 5A + 6I) = \frac{1}{5}\begin{pmatrix} 4 & 1 & -1 \\ -12 & 2 & 3 \\ 5 & 0 & 0 \end{pmatrix}.

Ex 7.2.8 Verify the Cayley-Hamilton theorem for A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} and hence find A^{-1} and A^{50}.

Solution: The characteristic equation of the given matrix A is

|A - λI| = 0 ⇒ \begin{vmatrix} 1-λ & 0 & 0 \\ 1 & 0-λ & 1 \\ 0 & 1 & 0-λ \end{vmatrix} = 0, or λ^3 - λ^2 - λ + 1 = 0.

By the Cayley-Hamilton theorem, we should have A^3 - A^2 - A + I = 0. For verification, using

A^2 = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} and A^3 = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix},

the expression A^3 - A^2 - A + I becomes

\begin{pmatrix} 1 & 0 & 0 \\ 2 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} - \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} - \begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} + \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = 0.

Hence the Cayley-Hamilton theorem is verified. Now, using the relation A^3 - A^2 - A + I = 0, we have A^{-1} = -A^2 + A + I, i.e.,

A^{-1} = -\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} + \begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} + \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ -1 & 1 & 0 \end{pmatrix}.

From the relation A^3 - A^2 - A + I = 0, we see that

A^3 = A^2 + A - I
⇒ A^4 = A^3 + A^2 - A = 2A^2 - I
⇒ A^5 = A^4 + A^3 - A^2 = A^3 + A^2 - I.

Thus, for every integer n ≥ 3, we have A^n = A^{n-2} + A^2 - I. Using this recurrence relation, A^4 = A^2 + A^2 - I and A^6 = A^4 + A^2 - I, i.e.,

A^4 = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 2 & 0 & 1 \end{pmatrix}; \quad A^6 = \begin{pmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 3 & 0 & 1 \end{pmatrix}.

Continuing the pattern, A^{2k} = \begin{pmatrix} 1 & 0 & 0 \\ k & 1 & 0 \\ k & 0 & 1 \end{pmatrix}, and hence A^{50} = \begin{pmatrix} 1 & 0 & 0 \\ 25 & 1 & 0 \\ 25 & 0 & 1 \end{pmatrix}.
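A one-line numerical confirmation of the closed form (NumPy assumed):

import numpy as np

A = np.array([[1, 0, 0], [1, 0, 1], [0, 1, 0]])
print(np.linalg.matrix_power(A, 50))
# [[ 1  0  0]
#  [25  1  0]
#  [25  0  1]]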

Ex 7.2.9 A matrix A has eigen values 1 and 4 with corresponding eigenvectors (1, -1)^T and (2, 1)^T respectively. Find the matrix A. [Gate'97]

Solution: Let the required matrix be A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}. Then from the equation AX = λX we have

\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = 1\begin{pmatrix} 1 \\ -1 \end{pmatrix} ⇒ a_{11} - a_{12} = 1, \; a_{21} - a_{22} = -1;

\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\begin{pmatrix} 2 \\ 1 \end{pmatrix} = 4\begin{pmatrix} 2 \\ 1 \end{pmatrix} ⇒ 2a_{11} + a_{12} = 8, \; 2a_{21} + a_{22} = 4.

Solving these equations, we get a_{11} = 3, a_{12} = 2, a_{21} = 1 and a_{22} = 2. Therefore, the matrix is A = \begin{pmatrix} 3 & 2 \\ 1 & 2 \end{pmatrix}.

Theorem 7.2.2 If the eigen values of A are distinct, then the corresponding eigen vectors are linearly independent.

Proof: Let x_k be the eigen vector of an n × n square matrix A corresponding to the eigen value λ_k, where the λ_k, k = 1, 2, \ldots, n, are distinct. Let x_k = [x_{k1}, x_{k2}, \ldots, x_{kn}]^T for k = 1, 2, \ldots, n. Thus we have Ax_k = λ_kx_k; k = 1, 2, \ldots, n. Therefore,

A^2x_k = A(Ax_k) = λ_k(Ax_k) = λ_k(λ_kx_k) = λ_k^2x_k.

By the principle of mathematical induction, we conclude that A^px_k = λ_k^px_k for any positive integer p. Let us consider the relation X = c_1x_1 + c_2x_2 + \cdots + c_nx_n = θ, where the c_i are scalars. Equating the ith component to 0, we get

c_1x_{1i} + c_2x_{2i} + \cdots + c_nx_{ni} = 0,

true for i = 1, 2, \ldots, n. Since X = θ, we have AX = θ. Therefore,

A(c_1x_1 + c_2x_2 + \cdots + c_nx_n) = θ
or, c_1(Ax_1) + c_2(Ax_2) + \cdots + c_n(Ax_n) = θ
or, c_1λ_1x_1 + c_2λ_2x_2 + \cdots + c_nλ_nx_n = θ.

Equating the ith component to 0, we get

c_1λ_1x_{1i} + c_2λ_2x_{2i} + \cdots + c_nλ_nx_{ni} = 0.

Again, equating the ith component of A^2X = θ to zero gives

c_1λ_1^2x_{1i} + c_2λ_2^2x_{2i} + \cdots + c_nλ_n^2x_{ni} = 0.

Continuing this process, we lastly get

c_1λ_1^{n-1}x_{1i} + c_2λ_2^{n-1}x_{2i} + \cdots + c_nλ_n^{n-1}x_{ni} = 0.

These n equations in the n unknowns c_kx_{ki} have a non-null solution if and only if

\begin{vmatrix} 1 & 1 & \cdots & 1 \\ λ_1 & λ_2 & \cdots & λ_n \\ λ_1^2 & λ_2^2 & \cdots & λ_n^2 \\ \vdots & \vdots & & \vdots \\ λ_1^{n-1} & λ_2^{n-1} & \cdots & λ_n^{n-1} \end{vmatrix} = 0,

which is true if and only if some two λ's are equal, contradicting the hypothesis. Thus the system has only the null solution, and we must have

c_1x_{1i} = c_2x_{2i} = \cdots = c_nx_{ni} = 0; \quad i = 1, 2, \ldots, n.

Hence c_1x_1 = c_2x_2 = \cdots = c_nx_n = θ. But x_1, x_2, \ldots, x_n are non-null vectors, since they are eigen vectors; hence c_1 = c_2 = \cdots = c_n = 0. This shows that {x_1, x_2, \ldots, x_n} is linearly independent.

Properties of eigen values

Property 7.2.1 Two similar matrices have the same eigen values.

Proof: Let A and B be two similar matrices. Then there exists a non-singular matrix P such that B = P^{-1}AP. The characteristic polynomial of B is

|B - λI| = |P^{-1}AP - λI| = |P^{-1}AP - λP^{-1}IP|
= |P^{-1}(A - λI)P| = |P^{-1}||A - λI||P|
= |A - λI||P^{-1}P| = |A - λI|.

Therefore, A and B have the same characteristic polynomial and hence the same eigen values. But matrices having the same eigen values may not be similar. For example, the matrices A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} and B = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} have the same characteristic polynomial and hence the same eigen values 1, 1. But there is no non-singular matrix P such that P^{-1}AP = B (indeed, P^{-1}AP = P^{-1}IP = I ≠ B for every P), so B is not similar to A.

Property 7.2.2 If A and B are two square invertible matrices, then AB and BA have the same eigen values.

Proof: Now, AB can be written in the form AB = B^{-1}B(AB) = B^{-1}(BA)B. So AB and BA are similar matrices, and therefore AB and BA have the same eigen values. Similarly, A^{-1}B and BA^{-1} have the same eigen values.

Property 7.2.3 The eigen values of a matrix and its transpose are the same.

Proof: Let A be a square matrix; then the eigen values of A are the roots of the equation |A - λI| = 0. Now,

|A - λI| = |(A - λI)^T| = |A^T - λI^T| = |A^T - λI|, as I^T = I.

Since A and A^T have the same characteristic polynomial, A and A^T have the same eigen values.

Property 7.2.4 If λ_1, λ_2, \ldots, λ_n are the eigen values of A, then kλ_1, kλ_2, \ldots, kλ_n are the eigen values of kA, k being an arbitrary scalar.

Proof: Let A be a square matrix of order n and let X_i be an eigen vector of the matrix A corresponding to the eigen value λ_i. Then AX_i = λ_iX_i; i = 1, 2, \ldots, n. Therefore,

k(AX_i) = k(λ_iX_i), i.e., (kA)X_i = (kλ_i)X_i; \quad i = 1, 2, \ldots, n,

showing that kλ_i is an eigen value of kA for i = 1, 2, \ldots, n. Moreover, the corresponding eigen vectors of A and kA are the same. On the other hand, if λ_1 and λ_2 are two eigen values corresponding to the same eigen vector X of A, then

AX = λ_1X, \; AX = λ_2X ⇒ (λ_1 - λ_2)X = 0.

Since the eigen vector X ≠ θ, we get λ_1 = λ_2. Thus an eigen vector X of a matrix A cannot correspond to more than one eigen value of A.

Property 7.2.5 The product of the eigen values of A is |A|.

Proof: Let λ_1, λ_2, \ldots, λ_n be the eigen values of A; then

|A - λI| = (-1)^n(λ - λ_1)(λ - λ_2) \cdots (λ - λ_n).

Putting λ = 0, |A| = (-1)^n(-1)^nλ_1λ_2 \cdots λ_n = λ_1λ_2 \cdots λ_n.

This shows that the product of the eigen values of A is |A|. Thus, if A is non-singular, then |A| ≠ 0, and therefore none of the eigen values of a non-singular matrix is 0.

Property 7.2.6 If A is a square matrix, then the sum of the characteristic roots of A is equal to the trace of A.

Proof: Let A = [a_{ij}]_{n×n} be a square matrix of order n. Then, by definition,

tr(A) = a_{11} + a_{22} + \cdots + a_{nn} = \sum_{i=1}^{n} a_{ii}.

Let λ_1, λ_2, \ldots, λ_n be the eigen values of A; then

|A - λI| = (-1)^n(λ - λ_1)(λ - λ_2) \cdots (λ - λ_n).

The coefficient of λ^{n-1} on the right hand side of this relation is (-1)^{n+1}\sum_{i=1}^{n} λ_i, while the corresponding coefficient in |A - λI| is (-1)^{n-1}\sum_{i=1}^{n} a_{ii}. Therefore,

(-1)^{n+1}\sum_{i=1}^{n} λ_i = (-1)^{n-1}\sum_{i=1}^{n} a_{ii} ⇒ tr(A) = \sum_{i=1}^{n} a_{ii} = \sum_{i=1}^{n} λ_i.

Property 7.2.7 If λ_1, λ_2, \ldots, λ_n are the eigen values of a non-singular matrix A of order n, then

(i) \frac{1}{λ_1}, \frac{1}{λ_2}, \ldots, \frac{1}{λ_n} are the eigen values of A^{-1};

(ii) λ_1^m, λ_2^m, \ldots, λ_n^m are the eigen values of A^m, m a positive integer.

Proof: Let A be an n × n non-singular matrix, so that |A| ≠ 0, and let X_r be an eigen vector of A corresponding to the eigen value λ_r. Then

AX_r = λ_rX_r ⇒ X_r = A^{-1}(λ_rX_r) = λ_r(A^{-1}X_r), i.e., A^{-1}X_r = \frac{1}{λ_r}X_r.

This shows that \frac{1}{λ_r} is an eigen value of A^{-1}, r = 1, 2, \ldots, n, with the same corresponding eigen vector (note λ_r ≠ 0, since A is non-singular). Now,

A^2X_r = A(AX_r) = A(λ_rX_r) = λ_r(AX_r) = λ_r(λ_rX_r) = λ_r^2X_r.

Let it be true for m = k, i.e., A^kX_r = λ_r^kX_r; then

A(A^kX_r) = A^{k+1}X_r = A(λ_r^kX_r) = λ_r^k(AX_r) = λ_r^k(λ_rX_r) = λ_r^{k+1}X_r.

The result is true for m = k + 1 if it is true for m = k. By the principle of mathematical induction we conclude that A^mX_r = λ_r^mX_r, where m is a positive integer. This shows that the λ_r^m are the eigen values of A^m. Combining (i) and (ii), we conclude that λ_1^{-m}, λ_2^{-m}, \ldots, λ_n^{-m} are the eigen values of A^{-m}, m a positive integer.

Property 7.2.8 If λ is an eigen value of a non-singular matrix A, then \frac{|A|}{λ} is an eigen value of adj A.

Proof: Since A is non-singular, A^{-1} exists and is given by A^{-1} = \frac{1}{|A|}\,adj A. If I is the unit matrix of order n, then AA^{-1} = A^{-1}A = I and the characteristic polynomial becomes

|A - λI| = |A - λAA^{-1}| = \left|A - \frac{λA}{|A|}\,adj A\right| = |A|\left|I - \frac{λ}{|A|}\,adj A\right| = \frac{λ^n}{|A|^{n-1}}\left|\frac{|A|}{λ}I - adj A\right|.

Since A is non-singular, λ ≠ 0, and therefore the characteristic equation becomes

|A - λI| = 0 ⇒ \frac{λ^n}{|A|^{n-1}}\left|\frac{|A|}{λ}I - adj A\right| = 0, i.e., \left|adj A - \frac{|A|}{λ}I\right| = 0.

This shows that if λ is an eigen value of a non-singular matrix A, then \frac{|A|}{λ} is an eigen value of adj A.

Property 7.2.9 If λ is an eigen value of an orthogonal matrix A, then so is λ^{-1}.

Proof: Let A be an orthogonal matrix, so that AA^T = I. Since λ is a characteristic root of A, |A - λI| = 0. Therefore,

|A - λAA^T| = 0 ⇒ |A(I - λA^T)| = 0 ⇒ |A||I - λA^T| = 0.

Now λ ≠ 0 and |A| ≠ 0, since A is non-singular; hence |I - λA^T| = (-λ)^n\left|A^T - \frac{1}{λ}I\right| = 0, i.e.,

\left|A^T - \frac{1}{λ}I\right| = 0,

showing that λ^{-1} is a characteristic root of A^T. But A^T and A have the same characteristic roots. Hence λ^{-1} is also a characteristic root of A.

Property 7.2.10 The eigenvalues of an idempotent matrix are either 1 or 0.

Proof: A matrix A is idempotent if A^2 = A. Let λ be an eigenvalue of A and X a corresponding eigenvector, so that AX = λX. Multiplying this equation by A,

A^2X = λAX ⇒ AX = λ(AX), as A^2 = A,
or, λX = λ(λX), as AX = λX,
or, (λ^2 - λ)X = 0 ⇒ λ^2 - λ = 0, as X ≠ 0.

Hence λ(λ - 1) = 0, i.e., the eigenvalues of A are either 1 or 0.

Property 7.2.11 The eigen values of a real symmetric matrix are all real.

Proof: Let A be a real symmetric matrix, so that A^T = A. Assume that some root of the characteristic equation of A belongs to the complex field C; let α + iβ, i = \sqrt{-1}, be such a complex root, so that |A - (α + iβ)I| = 0. Let

B = [A - (α + iβ)I][A - (α - iβ)I] = (A - αI)^2 + β^2I.

Then |B| = |A - (α + iβ)I|\,|A - (α - iβ)I| = 0 \cdot |A - (α - iβ)I| = 0, as |A - (α + iβ)I| = 0.

Since |B| = 0, there is a non-null real vector X such that BX = 0. Therefore,

X^TBX = 0 ⇒ 0 = X^T[(A - αI)^2 + β^2I]X
⇒ 0 = X^T(A - αI)^2X + β^2X^TX
⇒ 0 = X^T(A - αI)^T(A - αI)X + β^2X^TX, as (A - αI)^T = A^T - αI^T = A - αI,
⇒ 0 = [(A - αI)X]^T[(A - αI)X] + β^2X^TX. (7.8)

Now, (A - αI)X is a real column vector and X is a real non-zero column vector. Therefore,

[(A - αI)X]^T[(A - αI)X] ≥ 0 and X^TX > 0.

Thus the relation (7.8) is possible only when β = 0, which shows that all the roots of the characteristic equation of A are real.

Property 7.2.12 The eigenvalues of a real skew-symmetric matrix are purely imaginary or zero.

Proof: As in the previous case, we can show that (λ + \bar{λ})\bar{X}^TX = 0, since for a real skew-symmetric matrix \bar{A}^T = A^T = -A. Since \bar{X}^TX ≠ 0, λ + \bar{λ} = 0, or λ = -\bar{λ}.

Let λ = a + ib; then \bar{λ} = a - ib. Therefore, from the relation λ = -\bar{λ}, we have a + ib = -a + ib, or a = 0. Therefore λ = ib, i.e., λ is purely imaginary or zero.

Property 7.2.13 The eigen vectors corresponding to distinct eigen values of a real symmetric matrix A are mutually orthogonal.

Proof: Let A be a real symmetric matrix, so that A^T = A; then the eigen values of A are all real. Let X_1, X_2 be two eigen vectors corresponding to the eigen values λ_1, λ_2 (λ_1 ≠ λ_2). Then

AX_1 = λ_1X_1 and AX_2 = λ_2X_2
⇒ X_2^TAX_1 = λ_1(X_2^TX_1); \; X_1^TAX_2 = λ_2(X_1^TX_2).

Taking the transpose of the first relation and noting that A^T = A, we get

X_1^TAX_2 = λ_1(X_1^TX_2) = λ_2(X_1^TX_2) ⇒ (λ_1 - λ_2)X_1^TX_2 = 0.

But λ_1 ≠ λ_2, so X_1^TX_2 = 0, where X_1, X_2 are non-null vectors. Hence X_1 and X_2 are orthogonal.

Property 7.2.14 The eigen values of an orthogonal matrix are ±1.

Proof: Let A be an orthogonal matrix and let λ_i be an eigen value with X_i as the corresponding eigen vector. Then

AX_i = λ_iX_i ⇒ (AX_i)^T = (λ_iX_i)^T
⇒ (AX_i)^T(AX_i) = λ_i^2X_i^TX_i
⇒ X_i^T(A^TA)X_i = λ_i^2X_i^TX_i
⇒ X_i^TX_i = λ_i^2X_i^TX_i, as A^TA = I.

Since the eigen vector X_i is non-null, X_i^TX_i ≠ 0, and so λ_i^2 = 1, i.e., λ_i = ±1. Hence the (real) eigen values of an orthogonal matrix are of unit modulus.

Property 7.2.15 For a square matrix A, the following statements are equivalent:

(i) A scalar λ is an eigen value of A.

(ii) The matrix A− λI is singular.

(iii) The scalar λ is a root of the characteristic polynomial of A.

Property 7.2.16 The eigen values of a unitary matrix are of unit modulus.

Proof: Let A be a unitary matrix, so that A^θA = I, where A^θ denotes the transpose conjugate of A. Let X be an eigen vector of A corresponding to the eigen value λ; then

AX = λX ⇒ (AX)^θ = (λX)^θ, taking the transpose conjugate,
⇒ X^θA^θ = \bar{λ}X^θ ⇒ X^θA^θAX = \bar{λ}X^θλX
⇒ X^θ(A^θA)X = λ\bar{λ}X^θX
⇒ X^θIX = λ\bar{λ}X^θX, i.e., (1 - λ\bar{λ})X^θX = 0.

Since X^θX ≠ 0, it follows that λ\bar{λ} = 1, i.e., |λ|^2 = 1, i.e., |λ| = 1.

Property 7.2.17 The eigen values of a Hermitian matrix are all real. The eigen vectors corresponding to distinct eigen values are orthogonal.

Proof: Let A be a Hermitian matrix, so that A^θ = A, and let X be an eigen vector of A corresponding to the eigen value λ; then AX = λX. Thus

X^θ(AX) = X^θ(λX) ⇒ X^θAX = λX^θX
⇒ [X^θAX]^θ = [λX^θX]^θ, taking the transpose conjugate,
⇒ X^θA^θ[X^θ]^θ = \bar{λ}X^θ[X^θ]^θ.

Since A is Hermitian, A^θ = A and [X^θ]^θ = X, and so

λX^θX = \bar{λ}X^θX ⇒ (λ - \bar{λ})X^θX = 0.

Since X^θX ≠ 0, it follows that λ - \bar{λ} = 0, i.e., λ is real. (Similarly, the eigen values of a skew-Hermitian matrix are purely imaginary or zero.)

Let X_1 and X_2 be two eigenvectors of A corresponding to the distinct eigen values λ_1 and λ_2 respectively, so that AX_1 = λ_1X_1 and AX_2 = λ_2X_2. Taking the transpose conjugate of the second relation, we have X_2^θA = λ_2X_2^θ, where we use the fact that λ_2 is real. Therefore,

X_2^θAX_1 = λ_1X_2^θX_1 and X_2^θAX_1 = λ_2X_2^θX_1 ⇒ (λ_1 - λ_2)X_2^θX_1 = 0.

Since λ_1 ≠ λ_2, we have X_2^θX_1 = 0, showing that the vectors X_1 and X_2 are orthogonal to each other.

Leverrier-Faddeev method to find eigen values

In this method we generate the coefficients of the characteristic polynomial. The characteristic polynomial, in monic form, is

P(λ) = λ^n + p_1λ^{n-1} + \cdots + p_{n-1}λ + p_n = (-1)^n|A - λI|, (7.9)

where λ_1, λ_2, \ldots, λ_n is the complete set of roots of the polynomial. The method is based on Newton's formula for the sums of the powers of the roots of an algebraic equation. Let S_k = λ_1^k + \cdots + λ_n^k, k = 1, 2, \ldots, n, so that

S_1 = \sum_{i=1}^{n} λ_i = tr(A); \; S_2 = \sum_{i=1}^{n} λ_i^2 = tr(A^2); \; \ldots; \; S_n = \sum_{i=1}^{n} λ_i^n = tr(A^n).

For k ≤ n, using Newton's formula S_k + p_1S_{k-1} + \cdots + p_{k-1}S_1 = -kp_k, we have

p_1 = -S_1; \; p_2 = -\frac{1}{2}[S_2 + p_1S_1]; \; \ldots; \; p_n = -\frac{1}{n}[S_n + p_1S_{n-1} + \cdots + p_{n-1}S_1].

Hence the coefficients p_1, p_2, \ldots, p_n of the characteristic polynomial can easily be found when S_1, S_2, \ldots, S_n are known. In practice, the sequence of matrices A_i is generated by the following scheme:

A_1 = A; \quad a_1 = tr A_1; \quad B_1 = A_1 - a_1I
A_2 = AB_1; \quad a_2 = \frac{1}{2} tr A_2; \quad B_2 = A_2 - a_2I
\vdots
A_n = AB_{n-1}; \quad a_n = \frac{1}{n} tr A_n; \quad B_n = A_n - a_nI,

where B_n is a null matrix. The coefficients of the characteristic polynomial are then a_1 = -p_1, a_2 = -p_2, \ldots, a_n = -p_n. The Leverrier-Faddeev method may also be used to determine all the eigenvectors. Suppose the matrices B_1, B_2, \ldots, B_{n-1} and the eigenvalues λ_1, λ_2, \ldots, λ_n are known. Then the eigenvector x^{(i)} can be determined using the formula

x^{(i)} = λ_i^{n-1}e_0 + λ_i^{n-2}e_1 + λ_i^{n-3}e_2 + \cdots + e_{n-1}, (7.10)

where e_0 is a unit vector and e_1, e_2, \ldots, e_{n-1} are column vectors of the matrices B_1, B_2, \ldots, B_{n-1} of the same order as e_0. Using this method one can also compute the inverse of the matrix A. As mentioned, B_n = 0; that is, A_n - a_nI = 0, or AB_{n-1} = a_nI. From this relation one can write B_{n-1} = a_nA^{-1}. This gives

A^{-1} = \frac{1}{a_n}B_{n-1} = \frac{1}{-p_n}B_{n-1}. (7.11)
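The scheme translates directly into a short program. The following is a sketch under the stated conventions (Python/NumPy; function and variable names are our own; n ≥ 2 assumed):

import numpy as np

def leverrier_faddeev(A):
    """Leverrier-Faddeev scheme: returns the a_1, ..., a_n and B_{n-1}.
    The monic characteristic polynomial is
    lambda^n - a_1 lambda^{n-1} - a_2 lambda^{n-2} - ... - a_n."""
    n = A.shape[0]
    Ak = np.array(A, dtype=float)
    a, Bs = [], []
    for k in range(1, n + 1):
        ak = np.trace(Ak) / k
        a.append(ak)
        B = Ak - ak * np.eye(n)     # B_k = A_k - a_k I
        Bs.append(B)
        Ak = A @ B                  # A_{k+1} = A B_k
    return np.array(a), Bs[-2]      # the a_i and B_{n-1}

A = np.array([[1.0, 6.0, 1.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
a, B_prev = leverrier_faddeev(A)
print(a)    # [  6.  -5. -12.]  ->  lambda^3 - 6 lambda^2 + 5 lambda + 12
print(np.allclose(np.linalg.inv(A), B_prev / a[-1]))   # True, by (7.11)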

Ex 7.2.10 Find the characteristic polynomial of the matrix A = \begin{pmatrix} 1 & 6 & 1 \\ 1 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}.

Solution: Here (i) A_1 = A. Now, a_1 = tr(A) = 1 + 2 + 3 = 6. Hence

B_1 = A_1 - a_1I = \begin{pmatrix} -5 & 6 & 1 \\ 1 & -4 & 0 \\ 0 & 0 & -3 \end{pmatrix}.

(ii) A_2 = AB_1 = \begin{pmatrix} 1 & 6 & 1 \\ 1 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}\begin{pmatrix} -5 & 6 & 1 \\ 1 & -4 & 0 \\ 0 & 0 & -3 \end{pmatrix} = \begin{pmatrix} 1 & -18 & -2 \\ -3 & -2 & 1 \\ 0 & 0 & -9 \end{pmatrix}.

Hence a_2 = \frac{1}{2}(1 - 2 - 9) = -5 and B_2 = A_2 - a_2I = \begin{pmatrix} 6 & -18 & -2 \\ -3 & 3 & 1 \\ 0 & 0 & -4 \end{pmatrix}.

(iii) A_3 = AB_2 = \begin{pmatrix} 1 & 6 & 1 \\ 1 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}\begin{pmatrix} 6 & -18 & -2 \\ -3 & 3 & 1 \\ 0 & 0 & -4 \end{pmatrix} = \begin{pmatrix} -12 & 0 & 0 \\ 0 & -12 & 0 \\ 0 & 0 & -12 \end{pmatrix}.

Thus a_3 = \frac{1}{3} tr A_3 = -12 and B_3 = A_3 - a_3I = 0. Hence the characteristic polynomial is λ^3 - 6λ^2 + 5λ + 12 = 0, and the eigenvalues of A are -1, 3, 4. To find the eigenvectors, let us take

e_0 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \; e_1 = \begin{pmatrix} 1 \\ 0 \\ -3 \end{pmatrix}, \; e_2 = \begin{pmatrix} -2 \\ 1 \\ -4 \end{pmatrix},

the third columns of I, B_1 and B_2 respectively.

From the formula (7.10), x^{(i)} = λ_i^2e_0 + λ_ie_1 + e_2, we get the results of the calculation in the following table:

λ_i        λ_i^2 e_0      λ_i e_1         e_2             x^{(i)}
λ_1 = -1   (0, 0, 1)^T    (-1, 0, 3)^T    (-2, 1, -4)^T   (-3, 1, 0)^T
λ_2 = 3    (0, 0, 9)^T    (3, 0, -9)^T    (-2, 1, -4)^T   (1, 1, -4)^T
λ_3 = 4    (0, 0, 16)^T   (4, 0, -12)^T   (-2, 1, -4)^T   (2, 1, 0)^T

7.2.3 Eigen Space

If the eigen values of A are real, then the eigen vectors X_1, X_2, \ldots, X_n ∈ ℝ^n. The subspace generated by the non-null eigen vectors corresponding to an eigen value λ is known as the eigen space or characteristic space of the matrix A and is denoted by E_λ. If λ is an eigen value of A, then the algebraic multiplicity of λ is defined to be the multiplicity of λ as a root of the characteristic polynomial of A, while the geometric multiplicity of λ is defined to be the dimension of its eigen space, i.e., dim E_λ. The geometric multiplicity of an eigen value λ is less than or equal to its algebraic multiplicity. If the geometric multiplicity of λ is equal to its algebraic multiplicity, then λ is said to be regular.

Theorem 7.2.3 The set of eigen vectors of an n × n matrix A over a field F corresponding to an eigen value λ of A, together with the zero column vector, is a subspace of V_n(F).

Proof: Let E_λ be the set of all eigen vectors of A corresponding to the eigen value λ; obviously, each vector of E_λ is an n × 1 column vector. Let X_1, X_2 ∈ E_λ and c_1, c_2 ∈ F. Then AX_1 = λX_1 and AX_2 = λX_2. Now,

A(c_1X_1 + c_2X_2) = A(c_1X_1) + A(c_2X_2) = c_1(AX_1) + c_2(AX_2) = c_1(λX_1) + c_2(λX_2) = λ(c_1X_1 + c_2X_2).

This shows that c_1X_1 + c_2X_2 ∈ E_λ ∪ {θ} whenever X_1, X_2 ∈ E_λ. Hence E_λ ∪ {θ}, where θ is the zero column vector in V_n(F), is a subspace of V_n(F); it is known as the characteristic subspace corresponding to the eigen value λ, or the eigen space of λ.

Characteristic polynomial of block diagonal matrices

Let A be a block triangular matrix, say A = \begin{pmatrix} A_1 & B \\ 0 & A_2 \end{pmatrix}, where A_1 and A_2 are square matrices. Then A - λI is also a block triangular matrix, with diagonal blocks A_1 - λI and A_2 - λI. Thus

|A - λI| = \begin{vmatrix} A_1 - λI & B \\ 0 & A_2 - λI \end{vmatrix} = |A_1 - λI|\,|A_2 - λI|.

Thus the characteristic polynomial of A is the product of the characteristic polynomials of the diagonal blocks A_1 and A_2. In general, if A is a block triangular matrix with diagonal blocks A_1, A_2, \ldots, A_r, then the characteristic polynomial of A is

|A - λI| = |A_1 - λI|\,|A_2 - λI| \cdots |A_r - λI|.

Ex 7.2.11 Find the characteristic polynomial of the block triangular matrix

A = \begin{pmatrix} 9 & -1 & 5 & 7 \\ 8 & 3 & 2 & -4 \\ 0 & 0 & 3 & 6 \\ 0 & 0 & -1 & 8 \end{pmatrix}.

Solution: The given block triangular matrix A can be written in the form A = \begin{pmatrix} A_1 & B \\ 0 & A_2 \end{pmatrix}, where A_1 = \begin{pmatrix} 9 & -1 \\ 8 & 3 \end{pmatrix} and A_2 = \begin{pmatrix} 3 & 6 \\ -1 & 8 \end{pmatrix}. Now, the characteristic polynomials of A_1 and A_2 are

|A_1 - λI| = λ^2 - 12λ + 35 = (λ - 5)(λ - 7),
|A_2 - λI| = λ^2 - 11λ + 30 = (λ - 5)(λ - 6).

Accordingly, the characteristic polynomial of A is

|A - λI| = (λ - 5)(λ - 7)(λ - 5)(λ - 6) = (λ - 5)^2(λ - 6)(λ - 7).

Characteristic polynomial of linear operator

Let T : V → V be a linear operator on a finite-dimensional vector space V(F). For any polynomial f(t) = c_0 + c_1t + \cdots + c_nt^n, let us define

f(T) = c_0I + c_1T + \cdots + c_nT^n,

where I is the identity mapping and powers of T are defined by the composition operation. The characteristic polynomial of the linear operator T is defined to be the characteristic polynomial of a matrix representation of T. The Cayley-Hamilton theorem states that "a linear operator T is a zero of its characteristic polynomial."

Eigen function: Let T : V → V be a linear operator on a finite-dimensional vector space. A scalar λ is called an eigenvalue of T if there exists a non-null vector α such that

T(α) = λα.

Every non-null vector satisfying this relation is called an eigen vector of T corresponding to the eigen value λ. A scalar λ is an eigen value of T if and only if T - λI is singular. The set E_λ of all eigen vectors belonging to an eigen value λ (together with θ), which is the kernel of T - λI, is a subspace of V, called the eigen space of λ.

Note that if A and B are matrix representations of T, then B = P^{-1}AP, where P is a change-of-basis matrix. Thus A and B are similar, and they have the same characteristic polynomial. Accordingly, the characteristic polynomial of T is independent of the particular basis in which the matrix representation of T is computed.

Ex 7.2.12 For the linear operator T : V → V defined by T(x, y) = (3x + 3y, x + 5y), find all eigen values and a basis for each eigen space.

Solution: The matrix A that represents the linear operator T relative to the standard basis of ℝ^2 is

A = [T] = \begin{pmatrix} 3 & 3 \\ 1 & 5 \end{pmatrix}.

The characteristic polynomial of a linear operator is equal to the characteristic polynomial of any matrix A that represents it. Therefore, the characteristic polynomial of T is

|A - λI| = \begin{vmatrix} 3-λ & 3 \\ 1 & 5-λ \end{vmatrix} = λ^2 - 8λ + 12 = (λ - 2)(λ - 6),

so the eigen values of T are 2 and 6. For λ = 2, the system (A - 2I)X = 0 reduces to x + 3y = 0, so {(3, -1)} is a basis of E_2; for λ = 6 it reduces to x - y = 0, so {(1, 1)} is a basis of E_6.

Ex 7.2.13 Let T : ℝ^3 → ℝ^3 be defined by T(x, y, z) = (2x + y - 2z, 2x + 3y - 4z, x + y - z). Find all eigen values of T and find a basis of each eigen space.

Solution: The matrix A that represents the linear operator T relative to the standard basis of ℝ^3 is

A = [T] = \begin{pmatrix} 2 & 1 & -2 \\ 2 & 3 & -4 \\ 1 & 1 & -1 \end{pmatrix}.

The characteristic polynomial of a linear operator is equal to the characteristic polynomial of any matrix A that represents it. The characteristic equation |A - λI| = 0 for the linear operator T gives

\begin{vmatrix} 2-λ & 1 & -2 \\ 2 & 3-λ & -4 \\ 1 & 1 & -1-λ \end{vmatrix} = 0, i.e., λ^3 - 4λ^2 + 5λ - 2 = (λ - 1)^2(λ - 2) = 0.

Thus the eigen values of A are 1, 2. Now we find linearly independent eigenvectors for each eigenvalue of A. Corresponding to λ = 1, consider the equation (A - I)X = 0, where X = [x_1, x_2, x_3]^T. The coefficient matrix is given by

A - I = \begin{pmatrix} 1 & 1 & -2 \\ 2 & 2 & -4 \\ 1 & 1 & -2 \end{pmatrix} \sim \begin{pmatrix} 1 & 1 & -2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} ⇒ x_1 + x_2 - 2x_3 = 0.

We see that [1, -1, 0]^T and [2, 0, 1]^T are two linearly independent eigen vectors corresponding to the eigen value λ = 1. Similarly, for λ = 2 we obtain

A - 2I = \begin{pmatrix} 0 & 1 & -2 \\ 2 & 1 & -4 \\ 1 & 1 & -3 \end{pmatrix} \sim \begin{pmatrix} 0 & 1 & -2 \\ 0 & 0 & 0 \\ 1 & 1 & -3 \end{pmatrix} ⇒ x_1 + x_2 - 3x_3 = 0, \; x_2 - 2x_3 = 0.

We see that [1, 2, 1]^T is a solution, and so it is an eigen vector corresponding to the eigen value λ = 2.

Ex 7.2.14 For the linear operator D : V → V defined by D(f) = \frac{df}{dt}, where V is the space of functions with basis S = {sin t, cos t}, find the characteristic polynomial.

Solution: First we find the matrix A representing the differential operator D relative to the basis S. Now,

D(sin t) = cos t = 0 \cdot sin t + 1 \cdot cos t,
D(cos t) = -sin t = (-1) \cdot sin t + 0 \cdot cos t,
⇒ A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.

Therefore, the characteristic polynomial of the linear operator D is

|A - λI| = \begin{vmatrix} 0-λ & -1 \\ 1 & 0-λ \end{vmatrix} = λ^2 + 1.

7.3 Diagonalization

Diagonalization of a matrix

A given n-square matrix A with eigen values λ_1, λ_2, \ldots, λ_n is said to be diagonalisable if there exists a non-singular matrix P such that

D = P^{-1}AP = diag(λ_1, λ_2, \ldots, λ_n) (7.12)

is diagonal. Thus the n×n matrix A is diagonalisable if A is similar to an n×n diagonal matrix. Below we derive a necessary and sufficient condition for diagonalizability of a matrix A.

Theorem 7.3.1 An n × n matrix A over the field F is diagonalisable if and only if A has n linearly independent eigen vectors.

Proof: First, let A be diagonalisable. Then by definition A is similar to a diagonal matrix D = diag(λ_1, λ_2, \ldots, λ_n), where λ_1, λ_2, \ldots, λ_n are the eigen values of A, and there exists a non-singular matrix P of order n such that

A = PDP^{-1}, i.e., AP = PD.

Let X_1, X_2, \ldots, X_n be the n column vectors of P; then the ith column vector of AP equals the ith column vector of PD, i.e., AX_i = λ_iX_i. This relation shows that X_i is an eigen vector of A corresponding to the eigen value λ_i. Since P is non-singular, its column vectors are linearly independent in V_n(F). Consequently, A has n linearly independent eigen vectors.

Conversely, let X_1, X_2, \ldots, X_n be n linearly independent eigen vectors of A corresponding to the eigen values λ_1, λ_2, \ldots, λ_n respectively, so that AX_i = λ_iX_i. Let P be the n × n matrix whose ith column vector is X_i. Since the X_i are linearly independent in V_n(F), P is non-singular. If D = diag(λ_1, λ_2, \ldots, λ_n), then AP = PD, i.e., A = PDP^{-1}. Consequently, A is similar to D, and A is diagonalisable.

Ex 7.3.1 Diagonalise the matrix A = \begin{pmatrix} 1 & -3 & 3 \\ 3 & -5 & 3 \\ 6 & -6 & 4 \end{pmatrix}, if possible.

Solution: The characteristic equation of the given matrix A is

|A - λI| = \begin{vmatrix} 1-λ & -3 & 3 \\ 3 & -5-λ & 3 \\ 6 & -6 & 4-λ \end{vmatrix} = 0, or (2 + λ)^2(4 - λ) = 0 ⇒ λ = -2, -2, 4.

Thus the eigen values of the given matrix are -2, -2, 4, and -2 is a 2-fold eigen value of the matrix A. Corresponding to λ = -2, consider the equation (A + 2I)X = 0, where X = [x_1, x_2, x_3]^T. The coefficient matrix is given by

A + 2I = \begin{pmatrix} 3 & -3 & 3 \\ 3 & -3 & 3 \\ 6 & -6 & 6 \end{pmatrix} \sim \begin{pmatrix} 1 & -1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.

The system of equations is equivalent to x_1 - x_2 + x_3 = 0. We see that [1, 1, 0]^T and [1, 0, -1]^T generate the eigen space of the eigen value -2 and form a basis of the eigen space E_{-2}. For λ = 4, the coefficient matrix is given by

A - 4I = \begin{pmatrix} -3 & -3 & 3 \\ 3 & -9 & 3 \\ 6 & -6 & 0 \end{pmatrix} \sim \begin{pmatrix} 1 & 1 & -1 \\ 0 & 2 & -1 \\ 0 & 0 & 0 \end{pmatrix} ⇒ x_1 + x_2 - x_3 = 0, \; 2x_2 - x_3 = 0,

so that x_3 = 2 gives x_1 = x_2 = 1. Hence [1, 1, 2]^T is an eigen vector corresponding to the eigen value λ = 4; it generates and forms a basis of the eigen space E_4. The three vectors [1, 1, 0]^T, [1, 0, -1]^T and [1, 1, 2]^T are linearly independent, so the given matrix A is diagonalisable, and a diagonalising matrix is

P = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ 0 & -1 & 2 \end{pmatrix}, so that P^{-1} = -\frac{1}{2}\begin{pmatrix} 1 & -3 & 1 \\ -2 & 2 & 0 \\ -1 & 1 & -1 \end{pmatrix}

⇒ P^{-1}AP = -\frac{1}{2}\begin{pmatrix} 1 & -3 & 1 \\ -2 & 2 & 0 \\ -1 & 1 & -1 \end{pmatrix}\begin{pmatrix} 1 & -3 & 3 \\ 3 & -5 & 3 \\ 6 & -6 & 4 \end{pmatrix}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ 0 & -1 & 2 \end{pmatrix} = \begin{pmatrix} -2 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 4 \end{pmatrix},

where the diagonal elements are the eigen values of A.

Ex 7.3.2 Show that the matrix A = \begin{pmatrix} 3 & -5 \\ 2 & -3 \end{pmatrix} is diagonalizable over the complex field C.

Solution: The characteristic polynomial of A is

|A - λI| = λ^2 - (3 - 3)λ + (-9 + 10) = λ^2 + 1.

Now, we consider the following two subcases:

(i) If A is regarded as a matrix over the real field ℝ, then the characteristic polynomial has no real roots. Thus A has no eigen values and no eigen vectors, and so A is not diagonalizable over ℝ.

(ii) If A is regarded as a matrix over the complex field C, then it has two distinct eigen values i and -i.

Therefore, X_1 = (5, 3 - i)^T and X_2 = (5, 3 + i)^T are linearly independent eigen vectors of A corresponding to the eigen values i and -i respectively. Thus,

P = \begin{pmatrix} 5 & 5 \\ 3-i & 3+i \end{pmatrix} and D = P^{-1}AP = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}.

As expected, the diagonal entries in D are the eigen values of A. Therefore, the matrix A is diagonalizable over the complex field C.

Definition 7.3.1 For an r-fold eigenvalue λ of the matrix A, r is called the algebraic multiplicity of λ. If k is the number of linearly independent eigenvectors corresponding to an eigenvalue λ, then k is the geometric multiplicity of λ. The geometric multiplicity of an eigenvalue is less than or equal to its algebraic multiplicity. If the geometric multiplicity of λ is equal to its algebraic multiplicity, then λ is said to be regular.

Ex 7.3.3 Let A = \begin{pmatrix} 1 & -1 \\ 1 & 3 \end{pmatrix}. Find the algebraic and geometric multiplicities of the eigenvalues. Also, diagonalise A, if possible.

Solution: The characteristic equation of A is

|A - λI| = \begin{vmatrix} 1-λ & -1 \\ 1 & 3-λ \end{vmatrix} = 0, or λ^2 - 4λ + 4 = 0 ⇒ (λ - 2)^2 = 0.

Therefore, the eigenvalues are λ = 2, 2; hence the algebraic multiplicity of 2 is 2. Let [x_1, x_2]^T be an eigenvector corresponding to 2. Then

\begin{pmatrix} 1-2 & -1 \\ 1 & 3-2 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, or \begin{pmatrix} -1 & -1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},

or x_1 + x_2 = 0. Let x_2 = k; then x_1 = -k. Thus the eigenvectors are

\begin{pmatrix} -k \\ k \end{pmatrix} = k\begin{pmatrix} -1 \\ 1 \end{pmatrix},

i.e., there is only one independent eigenvector corresponding to λ = 2. So the geometric multiplicity of the eigenvalue 2 is 1. Since the 2 × 2 matrix A has only one independent eigenvector, A is not diagonalisable.

Deduction 7.3.1 Suppose a matrix A can be diagonalized as P^{-1}AP = D, where D is diagonal. Then A has the extremely useful diagonal factorization A = PDP^{-1}. Using this factorization, the algebra of A reduces to the algebra of the diagonal matrix D, which can be easily evaluated. Suppose D = diag(λ_1, λ_2, \ldots, λ_n); then

A^m = (PDP^{-1})^m = PD^mP^{-1} = P\,diag(λ_1^m, λ_2^m, \ldots, λ_n^m)\,P^{-1}.

More generally, for a polynomial f(t),

f(A) = f(PDP^{-1}) = Pf(D)P^{-1} = P\,diag(f(λ_1), f(λ_2), \ldots, f(λ_n))\,P^{-1}.

Furthermore, if the diagonal entries of D are nonnegative, let

B = P\,diag(\sqrt{λ_1}, \sqrt{λ_2}, \ldots, \sqrt{λ_n})\,P^{-1}.

Then B is a nonnegative square root of A, i.e., B^2 = A, and the eigen values of B are nonnegative.
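A sketch of the diagonal factorization at work (NumPy; assumes A is diagonalisable with nonnegative eigen values for the square-root step):

import numpy as np

A = np.array([[3.0, 1.0], [2.0, 2.0]])
lam, P = np.linalg.eig(A)                 # columns of P are eigen vectors

# A^4 via P D^4 P^{-1}
A4 = P @ np.diag(lam**4) @ np.linalg.inv(P)
print(np.round(A4))                       # [[171.  85.] [170.  86.]]

# square root: B = P sqrt(D) P^{-1}, so that B @ B = A
B = P @ np.diag(np.sqrt(lam)) @ np.linalg.inv(P)
print(np.allclose(B @ B, A))              # True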

Ex 7.3.4 Let A = \begin{pmatrix} 3 & 1 \\ 2 & 2 \end{pmatrix}. Find A^4 and f(A), where f(t) = t^3 - 5t^2 + 3t + 6, and a square root of A.

Solution: The characteristic polynomial of A is

|A - λI| = \begin{vmatrix} 3-λ & 1 \\ 2 & 2-λ \end{vmatrix} = (λ - 1)(λ - 4).

Thus the eigen values of A are 1, 4. We see that X_1 = (1, -2)^T and X_2 = (1, 1)^T are linearly independent eigen vectors corresponding to λ_1 = 1 and λ_2 = 4 respectively, and hence they form a basis of ℝ^2. Therefore, A is diagonalisable. Let

P = \begin{pmatrix} 1 & 1 \\ -2 & 1 \end{pmatrix}, so that P^{-1} = \frac{1}{3}\begin{pmatrix} 1 & -1 \\ 2 & 1 \end{pmatrix} and P^{-1}AP = \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix} = D.

Thus the diagonal elements are the eigen values of A. Using the diagonal factorization A = PDP^{-1}, with 1^4 = 1 and 4^4 = 256, we get

A^4 = PD^4P^{-1} = \begin{pmatrix} 1 & 1 \\ -2 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 256 \end{pmatrix}\frac{1}{3}\begin{pmatrix} 1 & -1 \\ 2 & 1 \end{pmatrix} = \begin{pmatrix} 171 & 85 \\ 170 & 86 \end{pmatrix}.

Also, f(1) = 5 and f(4) = 2; hence

f(A) = Pf(D)P^{-1} = \begin{pmatrix} 1 & 1 \\ -2 & 1 \end{pmatrix}\begin{pmatrix} 5 & 0 \\ 0 & 2 \end{pmatrix}\frac{1}{3}\begin{pmatrix} 1 & -1 \\ 2 & 1 \end{pmatrix} = \begin{pmatrix} 3 & -1 \\ -2 & 4 \end{pmatrix}.

Using \sqrt{1} = 1 and \sqrt{4} = 2, we obtain

A^{1/2} = B = P\sqrt{D}P^{-1} = \begin{pmatrix} 1 & 1 \\ -2 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}\frac{1}{3}\begin{pmatrix} 1 & -1 \\ 2 & 1 \end{pmatrix} = \begin{pmatrix} 5/3 & 1/3 \\ 2/3 & 4/3 \end{pmatrix},

where B^2 = A and B has positive eigen values 1 and 2.

Ex 7.3.5 Let A = \begin{pmatrix} 2 & 2 \\ 1 & 3 \end{pmatrix}.
(a) Find all eigen values and corresponding eigenvectors.
(b) Find a nonsingular matrix P such that D = P^{-1}AP is diagonal.
(c) Find A^6 and f(A), where f(t) = t^4 - 3t^3 - 6t^2 + 7t + 3.
(d) Find a matrix B such that B^3 = A and B has real eigen values.

Solution: (a) The characteristic polynomial of A is

|A - λI| = \begin{vmatrix} 2-λ & 2 \\ 1 & 3-λ \end{vmatrix} = (λ - 1)(λ - 4).

Thus the eigen values of A are 1, 4. The eigen vectors corresponding to λ_1 = 1 and λ_2 = 4 are X_1 = (2, -1)^T and X_2 = (1, 1)^T respectively.

(b) Since the vectors X_1 and X_2 are linearly independent, A is diagonalisable. Let P be the matrix whose columns are X_1 and X_2 respectively; then

P = \begin{pmatrix} 2 & 1 \\ -1 & 1 \end{pmatrix}, so that P^{-1} = \frac{1}{3}\begin{pmatrix} 1 & -1 \\ 1 & 2 \end{pmatrix} and D = P^{-1}AP = \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix},

where the diagonal elements are the eigen values of A.

(c) Using the diagonal factorization A = PDP^{-1}, with 1^6 = 1 and 4^6 = 4096, we get

A^6 = PD^6P^{-1} = \begin{pmatrix} 2 & 1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 4096 \end{pmatrix}\frac{1}{3}\begin{pmatrix} 1 & -1 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 1366 & 2730 \\ 1365 & 2731 \end{pmatrix}.

Also, f(1) = 2 and f(4) = -1; hence

f(A) = Pf(D)P^{-1} = \begin{pmatrix} 2 & 1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} 2 & 0 \\ 0 & -1 \end{pmatrix}\frac{1}{3}\begin{pmatrix} 1 & -1 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & -2 \\ -1 & 0 \end{pmatrix}.

(d) Here \begin{pmatrix} 1 & 0 \\ 0 & \sqrt[3]{4} \end{pmatrix} is the real cube root of D. Hence the real cube root of A is

B = P\sqrt[3]{D}P^{-1} = \begin{pmatrix} 2 & 1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & \sqrt[3]{4} \end{pmatrix}\frac{1}{3}\begin{pmatrix} 1 & -1 \\ 1 & 2 \end{pmatrix} = \frac{1}{3}\begin{pmatrix} 2 + \sqrt[3]{4} & -2 + 2\sqrt[3]{4} \\ -1 + \sqrt[3]{4} & 1 + 2\sqrt[3]{4} \end{pmatrix}.

7.3.1 Orthogonal Diagonalisation

A square matrix A is said to be orthogonally diagonalisable if there exists an orthogonal non-singular matrix P such that

P^{-1}AP = a diagonal matrix.

In this case, P is said to diagonalise A orthogonally.

Theorem 7.3.2 A square matrix is orthogonally diagonalisable if and only if it is real symmetric.

Proof: First, let A be orthogonally diagonalisable; then there exists an orthogonal matrix P such that P^{-1}AP = D, where D is a diagonal matrix. Since P is orthogonal, P^T = P^{-1}, and so

A = PDP^{-1} = PDP^T ⇒ A^T = [PDP^T]^T = PD^TP^T = PDP^T = A, as D is diagonal so D^T = D,

which shows that A is a symmetric matrix. Conversely, let A be a real symmetric matrix of order n; then A has n linearly independent eigen vectors. Using the Gram-Schmidt process, these n eigen vectors can be converted to a linearly independent orthogonal set, which can be normalized to get a set of n orthonormal eigen vectors. Let P be the n × n matrix whose column vectors are these n orthonormal eigen vectors. Clearly, P is an orthogonal matrix and P^{-1}AP is a diagonal matrix. Thus A is orthogonally diagonalisable if it is symmetric.
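For a real symmetric matrix, np.linalg.eigh produces an orthonormal eigen basis directly, so orthogonal diagonalisation can be checked numerically (a sketch):

import numpy as np

A = np.array([[7.0, 3.0], [3.0, -1.0]])
lam, P = np.linalg.eigh(A)       # P is orthogonal when A is symmetric
print(lam)                       # [-2.  8.]
print(np.allclose(P.T @ A @ P, np.diag(lam)))   # True: P^T A P = D
print(np.allclose(P.T @ P, np.eye(2)))          # True: P is orthogonal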

Ex 7.3.6 Let A = \begin{pmatrix} 7 & 3 \\ 3 & -1 \end{pmatrix}; find an orthogonal matrix P such that D = P^{-1}AP is diagonal.

Solution: The characteristic polynomial of A is

|A - λI| = λ^2 - (7 - 1)λ + (-7 - 9) = λ^2 - 6λ - 16 = (λ - 8)(λ + 2).

Thus the eigenvalues of A are -2, 8. The eigen vectors corresponding to λ_1 = -2 and λ_2 = 8 are X_1 = (1, -3)^T and X_2 = (3, 1)^T respectively. Since A is symmetric, the eigen vectors X_1 and X_2 are orthogonal. Normalizing X_1 and X_2, we obtain the unit vectors

\hat{X}_1 = \left(\frac{1}{\sqrt{10}}, \frac{-3}{\sqrt{10}}\right)^T and \hat{X}_2 = \left(\frac{3}{\sqrt{10}}, \frac{1}{\sqrt{10}}\right)^T.

Finally, let P be the matrix whose columns are the unit vectors \hat{X}_2 and \hat{X}_1 respectively; then

P = \begin{pmatrix} 3/\sqrt{10} & 1/\sqrt{10} \\ 1/\sqrt{10} & -3/\sqrt{10} \end{pmatrix} and P^{-1} = P^T = \begin{pmatrix} 3/\sqrt{10} & 1/\sqrt{10} \\ 1/\sqrt{10} & -3/\sqrt{10} \end{pmatrix},

so that

D = P^{-1}AP = \begin{pmatrix} 3/\sqrt{10} & 1/\sqrt{10} \\ 1/\sqrt{10} & -3/\sqrt{10} \end{pmatrix}\begin{pmatrix} 7 & 3 \\ 3 & -1 \end{pmatrix}\begin{pmatrix} 3/\sqrt{10} & 1/\sqrt{10} \\ 1/\sqrt{10} & -3/\sqrt{10} \end{pmatrix} = \begin{pmatrix} 8 & 0 \\ 0 & -2 \end{pmatrix}.

As expected, the diagonal entries in D are the eigen values of A.

Ex 7.3.7 Diagonalise the matrix A = \begin{pmatrix} 6 & 4 & -2 \\ 4 & 12 & -4 \\ -2 & -4 & 13 \end{pmatrix}, if possible.

Solution: Here the given matrix A is a real symmetric matrix. The characteristic equation of A is

|A - λI| = \begin{vmatrix} 6-λ & 4 & -2 \\ 4 & 12-λ & -4 \\ -2 & -4 & 13-λ \end{vmatrix} = 0,
or (4 - λ)(λ^2 - 27λ + 162) = 0, or (λ - 9)(λ - 18)(4 - λ) = 0 ⇒ λ = 4, 9, 18.

Thus the eigen values of the given matrix are 4, 9, 18. Corresponding to λ = 4, consider the equation (A - 4I)X = 0, where X = [x_1, x_2, x_3]^T. The coefficient matrix is given by

A - 4I = \begin{pmatrix} 2 & 4 & -2 \\ 4 & 8 & -4 \\ -2 & -4 & 9 \end{pmatrix} \sim \begin{pmatrix} 1 & 2 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 7 \end{pmatrix} ⇒ x_1 + 2x_2 - x_3 = 0, \; 7x_3 = 0.

We see that [-2, 1, 0]^T generates the eigen space of the eigen value 4 and forms a basis of the eigen space E_4. For λ = 9, the coefficient matrix is given by

A - 9I = \begin{pmatrix} -3 & 4 & -2 \\ 4 & 3 & -4 \\ -2 & -4 & 4 \end{pmatrix} \sim \begin{pmatrix} 1 & 7 & -6 \\ 0 & 5 & -4 \\ 0 & 0 & 0 \end{pmatrix} ⇒ x_1 + 7x_2 - 6x_3 = 0, \; 5x_2 - 4x_3 = 0,

so that x_3 = 5 gives x_1 = 2, x_2 = 4. Hence [2, 4, 5]^T is an eigen vector corresponding to the eigen value λ = 9; it generates and forms a basis of the eigen space E_9. For λ = 18, the coefficient matrix is given by

A - 18I = \begin{pmatrix} -12 & 4 & -2 \\ 4 & -6 & -4 \\ -2 & -4 & -5 \end{pmatrix} \sim \begin{pmatrix} 2 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix} ⇒ 2x_1 + x_3 = 0, \; x_2 + x_3 = 0,

so that x_3 = -2 gives x_1 = 1, x_2 = 2. Hence [1, 2, -2]^T is an eigen vector corresponding to the eigen value λ = 18; it generates and forms a basis of the eigen space E_{18}. The three vectors [-2, 1, 0]^T, [2, 4, 5]^T and [1, 2, -2]^T are linearly independent and mutually orthogonal, so the given matrix A is orthogonally diagonalisable, and the diagonalising orthogonal matrix (with the normalized eigen vectors as columns) is

P = \begin{pmatrix} -2/\sqrt{5} & 2/(3\sqrt{5}) & 1/3 \\ 1/\sqrt{5} & 4/(3\sqrt{5}) & 2/3 \\ 0 & 5/(3\sqrt{5}) & -2/3 \end{pmatrix}, so that P^{-1}AP = \begin{pmatrix} 4 & 0 & 0 \\ 0 & 9 & 0 \\ 0 & 0 & 18 \end{pmatrix},

where the diagonal elements are the eigen values of A.

Diagonalization of linear operator

A linear operator T : V → V is said to be diagonalisable if it can be represented by a diagonal matrix D. By definition, the linear operator T : V → V is diagonalisable if and only if there exists a basis S = {α_1, α_2, \ldots, α_n} of V for which

T(α_1) = λ_1α_1, \; T(α_2) = λ_2α_2, \; \ldots, \; T(α_n) = λ_nα_n. (7.13)

In such a case, T is represented by the diagonal matrix D = diag(λ_1, λ_2, \ldots, λ_n) relative to the basis S = {α_1, α_2, \ldots, α_n}.

Ex 7.3.8 Each of the following real matrices defines a linear transformation on ℝ^2:

(a) A = \begin{pmatrix} 5 & 6 \\ 3 & -2 \end{pmatrix}; (b) B = \begin{pmatrix} 1 & -1 \\ 2 & -1 \end{pmatrix}; (c) C = \begin{pmatrix} 5 & -1 \\ 1 & 3 \end{pmatrix}.

For each matrix, find all eigen values and a maximal set S of linearly independent eigen vectors. Which of these linear operators are diagonalisable?

Solution: (a) The characteristic polynomial of A is

\begin{vmatrix} 5-λ & 6 \\ 3 & -2-λ \end{vmatrix} = λ^2 - 3λ - 28 = (λ + 4)(λ - 7).

Therefore, the eigen values of A are -4, 7. For λ_1 = -4, if X_1 = (x_1, x_2)^T is a non-null eigen vector, then

AX_1 = -4X_1 ⇒ 9x_1 + 6x_2 = 0, \; 3x_1 + 2x_2 = 0 ⇒ 3x_1 + 2x_2 = 0.

Thus X_1 = (2, -3)^T is an eigen vector corresponding to λ_1 = -4. Similarly, X_2 = (3, 1)^T is an eigen vector corresponding to λ_2 = 7.

So S = {(2, -3), (3, 1)} is a maximal set of linearly independent eigen vectors. Since S is a basis of ℝ^2, A is diagonalisable. Using the basis S, A can be represented by the diagonal matrix

D = \begin{pmatrix} 2 & 3 \\ -3 & 1 \end{pmatrix}^{-1}\begin{pmatrix} 5 & 6 \\ 3 & -2 \end{pmatrix}\begin{pmatrix} 2 & 3 \\ -3 & 1 \end{pmatrix} = \begin{pmatrix} -4 & 0 \\ 0 & 7 \end{pmatrix}.

(b) The characteristic polynomial of B is

\begin{vmatrix} 1-λ & -1 \\ 2 & -1-λ \end{vmatrix} = λ^2 + 1 = (λ + i)(λ - i).

There is no real characteristic root of B. Thus B, a real matrix representing a linear transformation on ℝ^2, has no real eigen values and no real eigen vectors; in particular, B is not diagonalisable over ℝ.

Over C, the eigen values of B are i, -i. Then X_1 = (1, 1 - i)^T and X_2 = (1, 1 + i)^T are linearly independent eigen vectors of B corresponding to the eigen values i and -i respectively. Now S = {(1, 1 - i), (1, 1 + i)} is a basis of C^2 consisting of eigen vectors of B. Using this basis, B can be represented by the diagonal matrix

D = \begin{pmatrix} 1 & 1 \\ 1-i & 1+i \end{pmatrix}^{-1}\begin{pmatrix} 1 & -1 \\ 2 & -1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 1-i & 1+i \end{pmatrix} = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}.

As expected, the diagonal entries in D are the eigen values of B, so B is diagonalizable over the complex field C.

(c) The characteristic polynomial of C is

\begin{vmatrix} 5-λ & -1 \\ 1 & 3-λ \end{vmatrix} = λ^2 - 8λ + 16 = (λ - 4)^2.

Therefore, the eigen values of C are 4, 4. For λ_1 = 4, if X_1 = (x_1, x_2)^T is a non-null eigen vector, then

CX_1 = 4X_1 ⇒ x_1 - x_2 = 0.

The homogeneous system has only one independent solution, say (1, 1)^T, so (1, 1)^T is an eigen vector of C. Furthermore, since there are no other eigen values, the set S = {(1, 1)} is a maximal set of linearly independent eigen vectors of C. Since S is not a basis of ℝ^2, C is not diagonalisable.

7.4 Minimal Polynomial

It turns out that, in the case of some matrices having eigen values with multiplicity greater than unity, there may exist polynomials of degree less than n which the matrix satisfies, i.e., at which the matrix evaluates to the zero matrix.

Minimal polynomial of a matrix

Let the characteristic polynomial of a matrix A be

χ(λ) = (λ_1 - λ)^{d_1}(λ_2 - λ)^{d_2} \cdots (λ_l - λ)^{d_l}; \quad \sum_{i=1}^{l} d_i = n.

The Cayley-Hamilton theorem states that

χ(A) = (λ_1I - A)^{d_1}(λ_2I - A)^{d_2} \cdots (λ_lI - A)^{d_l} = 0.

If r_1, r_2, \ldots, r_l are the smallest positive integers for which

J(A) ≡ (λ_1I - A)^{r_1}(λ_2I - A)^{r_2} \cdots (λ_lI - A)^{r_l} = 0,

where r_i ≤ d_i (1 ≤ i ≤ l), then

J(λ) = (λ_1 - λ)^{r_1}(λ_2 - λ)^{r_2} \cdots (λ_l - λ)^{r_l} (7.14)

is called the minimal polynomial of the matrix A. The degree of the minimal polynomial of an n × n matrix A is at most n. It follows at once that if all the eigen values of a matrix are distinct, its minimal polynomial equals its characteristic polynomial.

Theorem 7.4.1 The minimal polynomial m(t) of a matrix A divides every polynomial which has A as a zero. In particular, m(t) divides the characteristic polynomial of A.

Proof: Suppose f(t) is a polynomial for which f(A) = 0. By the division algorithm, there exist polynomials q(t) and r(t) for which

f(t) = m(t)q(t) + r(t), (7.15)

where either r(t) = 0 or deg r(t) < deg m(t). Substituting t = A in (7.15) and using the facts that f(A) = 0 and m(A) = 0, we get r(A) = 0. If r(t) ≠ 0, then r(t) is a polynomial of degree less than that of m(t) such that r(A) = 0, which is a contradiction, since by definition the minimal polynomial m(t) is a polynomial of least degree such that m(A) = 0. Hence r(t) = 0, and so

f(t) = m(t)q(t), i.e., m(t) divides f(t).

As a particular case, since A satisfies its own characteristic equation by the Cayley-Hamilton theorem, m(t) divides the characteristic polynomial.

Theorem 7.4.2 Let m(t) be the minimal polynomial of an n-square matrix A. Then the characteristic polynomial of A divides (m(t))^n.

Proof: Let the minimal polynomial of the n-square matrix A be

m(t) = t^r + c_1t^{r-1} + \cdots + c_{r-1}t + c_r.

Define the matrices B_j as follows:

B_0 = I, so I = B_0;
B_1 = A + c_1I, so c_1I = B_1 - AB_0;
B_2 = A^2 + c_1A + c_2I, so c_2I = B_2 - AB_1;
\vdots
B_{r-1} = A^{r-1} + c_1A^{r-2} + \cdots + c_{r-1}I, so c_{r-1}I = B_{r-1} - AB_{r-2}.

Also, we have

-AB_{r-1} = c_rI - (A^r + c_1A^{r-1} + \cdots + c_{r-1}A + c_rI) = c_rI - m(A) = c_rI.

Setting B(t) = t^{r-1}B_0 + t^{r-2}B_1 + \cdots + tB_{r-2} + B_{r-1}, we get

(tI - A)B(t) = (t^rB_0 + t^{r-1}B_1 + \cdots + tB_{r-1}) - (t^{r-1}AB_0 + t^{r-2}AB_1 + \cdots + AB_{r-1})
= t^rB_0 + t^{r-1}(B_1 - AB_0) + t^{r-2}(B_2 - AB_1) + \cdots + t(B_{r-1} - AB_{r-2}) - AB_{r-1}
= t^rI + c_1t^{r-1}I + c_2t^{r-2}I + \cdots + c_{r-1}tI + c_rI = m(t)I.

Taking determinants, |tI - A||B(t)| = |m(t)I| = (m(t))^n. Since |B(t)| is a polynomial, |tI - A| divides (m(t))^n. Hence the characteristic polynomial divides (m(t))^n.

Theorem 7.4.3 The characteristic polynomial and the minimal polynomial of a matrix A have the same irreducible factors.

Proof: Let f(t) be an irreducible polynomial. If f(t) divides the minimal polynomial m(t), then, since m(t) divides the characteristic polynomial, f(t) must divide the characteristic polynomial. On the other hand, if f(t) divides the characteristic polynomial of A, then by the above theorem f(t) must divide (m(t))^n. But f(t) is irreducible; hence f(t) also divides m(t). Thus m(t) and the characteristic polynomial have the same irreducible factors.

Result 7.4.1 This theorem does not say that m(t) equals the characteristic polynomial, only that any irreducible factor of one must divide the other. In particular, since a linear factor is irreducible, m(t) and the characteristic polynomial have the same linear factors, so they have the same roots. Thus we conclude: a scalar λ is an eigen value of the matrix A if and only if λ is a root of the minimal polynomial of A.

Ex 7.4.1 Find the characteristic and minimal polynomials of each of the following matrices:

A = \begin{pmatrix} 3 & 1 & -1 \\ 2 & 4 & -2 \\ -1 & -1 & 3 \end{pmatrix} and B = \begin{pmatrix} 3 & 2 & -1 \\ 3 & 8 & -3 \\ 3 & 6 & -1 \end{pmatrix}.

Solution: The characteristic polynomial χ_A(λ) of A is

|A - λI| = \begin{vmatrix} 3-λ & 1 & -1 \\ 2 & 4-λ & -2 \\ -1 & -1 & 3-λ \end{vmatrix} = 24 - 28λ + 10λ^2 - λ^3 = (λ - 2)^2(6 - λ).

The characteristic polynomial χ_B(λ) of B is

|B - λI| = \begin{vmatrix} 3-λ & 2 & -1 \\ 3 & 8-λ & -3 \\ 3 & 6 & -1-λ \end{vmatrix} = 24 - 28λ + 10λ^2 - λ^3 = (λ - 2)^2(6 - λ).

Thus the characteristic polynomials of both matrices are the same. Since the characteristic polynomial and the minimal polynomial have the same irreducible factors, both (t - 2) and (6 - t) must be factors of m(t). Also, m(t) must divide the characteristic polynomial. Hence m(t) must be one of the following:

f(t) = (t - 2)(6 - t) or g(t) = (t - 2)^2(6 - t).

(i) By the Cayley-Hamilton theorem, g(A) = χ_A(A) = 0, so we need only test f(t). Now,

(A - 2I)(6I - A) = \begin{pmatrix} 1 & 1 & -1 \\ 2 & 2 & -2 \\ -1 & -1 & 1 \end{pmatrix}\begin{pmatrix} 3 & -1 & 1 \\ -2 & 2 & 2 \\ 1 & 1 & 3 \end{pmatrix} = 0.

Therefore, m(t) = (t - 2)(6 - t) is the minimal polynomial of A.

(ii) By the Cayley-Hamilton theorem, g(B) = χ_B(B) = 0, so we need only test f(t). Now,

(B - 2I)(6I - B) = \begin{pmatrix} 1 & 2 & -1 \\ 3 & 6 & -3 \\ 3 & 6 & -3 \end{pmatrix}\begin{pmatrix} 3 & -2 & 1 \\ -3 & -2 & 3 \\ -3 & -6 & 7 \end{pmatrix} = 0.

Therefore, m(t) = (t - 2)(6 - t) is the minimal polynomial of B.

Deduction 7.4.1 Consider the following two n-square matrices:

J(λ, n) = \begin{pmatrix} λ & 1 & 0 & \cdots & 0 & 0 \\ 0 & λ & 1 & \cdots & 0 & 0 \\ \vdots & & & & & \vdots \\ 0 & 0 & 0 & \cdots & λ & 1 \\ 0 & 0 & 0 & \cdots & 0 & λ \end{pmatrix} and A = \begin{pmatrix} λ & a & 0 & \cdots & 0 & 0 \\ 0 & λ & a & \cdots & 0 & 0 \\ \vdots & & & & & \vdots \\ 0 & 0 & 0 & \cdots & λ & a \\ 0 & 0 & 0 & \cdots & 0 & λ \end{pmatrix},

where a ≠ 0. The matrix J(λ, n), called a Jordan block, has λ's on the diagonal, 1's on the superdiagonal and 0's elsewhere. The matrix A, which is a generalization of J(λ, n), has λ's on the diagonal, a's on the superdiagonal and 0's elsewhere. Now f(t) = (t - λ)^n is both the characteristic and the minimal polynomial of each of J(λ, n) and A.

Ex 7.4.2 Find the minimal polynomial of the matrix A = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 5 \end{pmatrix}.

Solution: The characteristic polynomial χ(t) of A is given by

|A - tI| = \begin{vmatrix} 2-t & 1 & 0 & 0 \\ 0 & 2-t & 0 & 0 \\ 0 & 0 & 2-t & 0 \\ 0 & 0 & 0 & 5-t \end{vmatrix} = (t - 2)^3(t - 5).

Since the characteristic polynomial and the minimal polynomial have the same irreducible factors, both t - 2 and t - 5 must be factors of m(t). Also, m(t) must divide the characteristic polynomial. Hence m(t) must be one of the following three polynomials:

(i) m(t) = (t - 2)(t - 5), (ii) m(t) = (t - 2)^2(t - 5), (iii) m(t) = (t - 2)^3(t - 5).

For candidate (i), we have

(A - 2I)(A - 5I) = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}\begin{pmatrix} -3 & 1 & 0 & 0 \\ 0 & -3 & 0 & 0 \\ 0 & 0 & -3 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} ≠ 0.

For candidate (ii), we have

(A - 2I)^2(A - 5I) = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 9 \end{pmatrix}\begin{pmatrix} -3 & 1 & 0 & 0 \\ 0 & -3 & 0 & 0 \\ 0 & 0 & -3 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} = 0.

For candidate (iii), (A - 2I)^3(A - 5I) = 0 follows from the Cayley-Hamilton theorem. Since m(t) is the polynomial of least degree that annihilates A, we have m(t) = (t - 2)^2(t - 5).
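The candidate tests above can be reproduced numerically (a sketch assuming NumPy):

import numpy as np

A = np.array([[2., 1., 0., 0.],
              [0., 2., 0., 0.],
              [0., 0., 2., 0.],
              [0., 0., 0., 5.]])
I = np.eye(4)
f1 = (A - 2*I) @ (A - 5*I)               # candidate (t-2)(t-5)
f2 = (A - 2*I) @ (A - 2*I) @ (A - 5*I)   # candidate (t-2)^2 (t-5)
print(np.allclose(f1, 0), np.allclose(f2, 0))   # False True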

Deduction 7.4.2 Let us consider an arbitrary monic polynomial

f(t) = t^n + c_{n-1}t^{n-1} + \cdots + c_1t + c_0,

and the nth order square matrix A with 1's on the subdiagonal, last column [-c_0, -c_1, \ldots, -c_{n-1}]^T, and 0's elsewhere:

A = \begin{pmatrix} 0 & 0 & \cdots & 0 & -c_0 \\ 1 & 0 & \cdots & 0 & -c_1 \\ \vdots & & & & \vdots \\ 0 & 0 & \cdots & 1 & -c_{n-1} \end{pmatrix}.

Then A is called the companion matrix of the polynomial f(t). Moreover, the characteristic and minimal polynomials of the companion matrix A are both equal to the original polynomial f(t).
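A sketch of the companion construction (NumPy; the helper name companion is ours): the eigen values of the companion matrix coincide with the roots of f(t):

import numpy as np

def companion(coeffs):
    """Companion matrix of the monic polynomial
    t^n + c_{n-1} t^{n-1} + ... + c_1 t + c_0,
    with coeffs = [c_0, c_1, ..., c_{n-1}]."""
    n = len(coeffs)
    A = np.zeros((n, n))
    A[1:, :-1] = np.eye(n - 1)       # 1's on the subdiagonal
    A[:, -1] = -np.array(coeffs)     # last column [-c0, ..., -c_{n-1}]^T
    return A

A = companion([8, 6, -5])            # f(t) = t^3 - 5 t^2 + 6 t + 8
print(np.sort_complex(np.linalg.eigvals(A)))
print(np.sort_complex(np.roots([1, -5, 6, 8])))   # same roots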

Ex 7.4.3 Find a matrix whose minimal polynomial is t^3 - 5t^2 + 6t + 8.

Solution: Here the given monic polynomial is f(t) = t^3 - 5t^2 + 6t + 8. Let A be the companion matrix of the polynomial f(t); then, by definition,

A = \begin{pmatrix} 0 & 0 & -8 \\ 1 & 0 & -6 \\ 0 & 1 & 5 \end{pmatrix}.

The characteristic and minimal polynomials of the companion matrix A are both equal to the original given polynomial f(t).

Minimal polynomial of linear operator

The minimal polynomial of an operator T is defined independently of the theory of matrices, as the monic polynomial of lowest degree which has T as a zero. However, for any polynomial f(t),

f(T) = 0 if and only if f(A) = 0,

where A is any matrix representation of T. Accordingly, T and A have the same minimal polynomial.

(i) The minimal polynomial m(t) of a linear operator T divides every polynomial that has T as a zero. In particular, the minimal polynomial m(t) divides the characteristic polynomial of T.

(ii) The characteristic and minimal polynomials of a linear operator T have the same irreducible factors.

(iii) A scalar λ is an eigen value of a linear operator T if and only if λ is a root of the minimal polynomial m(t) of T.

Minimal polynomial of block diagonal matrices

Let A be a block diagonal matrix with diagonal blocks A_1, A_2, \ldots, A_r. Then the minimal polynomial of A is equal to the least common multiple of the minimal polynomials of the diagonal blocks A_i.

Ex 7.4.4 Find the characteristic and minimal polynomials of the block diagonal matrix

A = \begin{pmatrix} 2 & 5 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 \\ 0 & 0 & 4 & 2 & 0 \\ 0 & 0 & 3 & 5 & 0 \\ 0 & 0 & 0 & 0 & 7 \end{pmatrix}.

Solution: The given block diagonal matrix can be written in the form

A = diag(A_1, A_2, A_3), where A_1 = \begin{pmatrix} 2 & 5 \\ 0 & 2 \end{pmatrix}, A_2 = \begin{pmatrix} 4 & 2 \\ 3 & 5 \end{pmatrix}, A_3 = [7].

The characteristic polynomials of A_1, A_2, A_3 are

|A_1 - λI| = (λ - 2)^2; \; |A_2 - λI| = (λ - 2)(λ - 7); \; |A_3 - λI| = λ - 7.

Thus the characteristic polynomial of A is

|A - λI| = |A_1 - λI|\,|A_2 - λI|\,|A_3 - λI| = (λ - 2)^2(λ - 2)(λ - 7)(λ - 7) = (λ - 2)^3(λ - 7)^2.

The minimal polynomials m_1(t), m_2(t), m_3(t) of the diagonal blocks A_1, A_2, A_3 respectively are equal to their characteristic polynomials, i.e.,

m_1(t) = (t - 2)^2; \; m_2(t) = (t - 2)(t - 7); \; m_3(t) = t - 7.

But m(t), the minimal polynomial of A, is equal to the least common multiple of m_1(t), m_2(t) and m_3(t), i.e., m(t) = (t - 2)^2(t - 7).

7.5 Bilinear Forms

Let V be an n-dimensional Euclidean space. A map B(α, β) : V × V → ℝ is said to be a bilinear form if it is linear and homogeneous in each of the arguments α, β. If B is a symmetric bilinear form, then we can write

B(α, α) = ⟨Bα, α⟩ = α^T B α,    (7.16)

where B = [b_ij] is a real symmetric n × n matrix, known as the matrix of the quadratic form, and B(α, α) is known as a quadratic form. For example, the expression

5x_1y_1 + 2x_1y_2 - 3x_1y_3 + 7x_2y_1 - 5x_2y_2 + 3x_3y_3

is a bilinear form in the variables x_1, x_2, x_3 and y_1, y_2, y_3. If we change the basis so that α = Pα′, then

B(Pα′, Pα′) = α′^T P^T B P α′.    (7.17)

Thus the matrices of the two quadratic forms (7.16) and (7.17) are connected by a congruence: if we change coordinates in a quadratic form, its matrix changes to a matrix which is congruent to the matrix of the original quadratic form. In other words, if two quadratic forms have matrices congruent to each other, then they represent the same quadratic form, only with respect to two coordinate systems connected by a non-singular transformation.
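The congruence rule (7.16)-(7.17) is easy to check numerically. A minimal sketch (not from the book; the particular B and P are illustrative choices):

```python
# Sketch: a quadratic form and the congruence rule, checked with NumPy.
import numpy as np

B = np.array([[5.0, 2.0], [2.0, 1.0]])      # symmetric matrix of a quadratic form
P = np.array([[1.0, 1.0], [0.0, 1.0]])      # a non-singular change of coordinates

alpha_new = np.array([2.0, -1.0])           # coordinates alpha' in the new basis
alpha = P @ alpha_new                       # alpha = P alpha'

q_old = alpha @ B @ alpha                   # alpha^T B alpha
q_new = alpha_new @ (P.T @ B @ P) @ alpha_new   # alpha'^T (P^T B P) alpha'
print(np.isclose(q_old, q_new))             # True: same form in the new coordinates
```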

7.5.1 Real Quadratic Forms

A homogeneous expression of the second degree in any number of variables, of the form

Q(α, α) = α^T B α = Σ_{i=1}^{n} Σ_{j=1}^{n} a_ij x_i x_j;  a_ij = a_ji

        = (x_1, x_2, ..., x_n) [ a_11 a_12 ... a_1n    [ x_1
                                 a_21 a_22 ... a_2n      x_2
                                  ...           ...      ...
                                 a_n1 a_n2 ... a_nn ]    x_n ],    (7.18)

where the a_ij are constants belonging to a field of numbers and x_1, x_2, ..., x_n are variables belonging to a field of numbers (not necessarily the same), is defined as a quadratic form. If the variables assume real values only, the form is said to be a quadratic form in real variables. When the constants a_ij and the variables x_i are all real, the expression is said to be a real quadratic form with B as its associated matrix. For example,

Page 433: Linear Algebra by Nayak

426 Matrix Eigenfunctions

(i) x_1^2 + 2x_1x_2 + 3x_2^2 is a real quadratic form in 2 variables with associated matrix

B_1 = [ 1 1
        1 3 ].

(ii) x_1^2 + 3x_2^2 + 3x_3^2 - 4x_2x_3 + 4x_3x_1 - 2x_1x_2 is a real quadratic form in 3 variables with associated matrix

B_2 = [  1 -1  2
        -1  3 -2
         2 -2  3 ].

Now, a real quadratic form Q(α, α) is said to be

(i) positive definite, if Q > 0 for all α ≠ θ;

(ii) positive semi-definite, if Q ≥ 0 for all α and Q = 0 for some α ≠ θ;

(iii) negative definite, if Q < 0 for all α ≠ θ;

(iv) negative semi-definite, if Q ≤ 0 for all α and Q = 0 for some α ≠ θ;

(v) indefinite, if Q > 0 for some α ≠ θ and Q < 0 for some α ≠ θ.

These five classes of quadratic forms are called value classes.

Ex 7.5.1 Find the quadratic form that corresponds to the symmetric matrix A = [ 5 -3; -3 8 ].

Solution: The quadratic form Q(α) that corresponds to a symmetric matrix A is Q(α) = α^T A α, where α = (x_1, x_2)^T is the column vector of unknowns. Thus,

Q(α) = (x_1, x_2) [ 5 -3; -3 8 ] (x_1; x_2) = 5x_1^2 - 6x_1x_2 + 8x_2^2.

Ex 7.5.2 Examine whether the quadratic form 5x^2 + y^2 + 5z^2 + 4xy - 8xz - 4yz is positive definite or not.

Solution: The given quadratic form can be written as

Q(x, y, z) = 5x^2 + y^2 + 5z^2 + 4xy - 8xz - 4yz = (2x + y - 2z)^2 + x^2 + z^2.

Since Q > 0 for all (x, y, z) ≠ (0, 0, 0) and Q = 0 only when x = y = z = 0, i.e., α = θ, Q is positive definite. Alternatively, if Q(α, α) = α^T B α, then the associated matrix B is given by

B = [  5  2 -4
       2  1 -2
      -4 -2  5 ].

The leading principal minors of B are

5,  | 5 2; 2 1 | = 1,  |B| = 1,

all positive; hence the given quadratic form Q is positive definite.
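Sylvester's criterion used above is straightforward to automate. A sketch (not from the book) for the matrix of Ex 7.5.2:

```python
# Sketch: Sylvester's criterion and the eigenvalue test for positive definiteness.
import numpy as np

B = np.array([[ 5.0,  2.0, -4.0],
              [ 2.0,  1.0, -2.0],
              [-4.0, -2.0,  5.0]])

minors = [np.linalg.det(B[:k, :k]) for k in range(1, 4)]   # leading principal minors
print(minors)                                # approximately [5.0, 1.0, 1.0] -> all positive
print(np.all(np.linalg.eigvalsh(B) > 0))     # True: all eigenvalues positive
```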

Ex 7.5.3 Prove that the real quadratic form Q = ax^2 + bxy + cy^2 is positive definite if a > 0 and b^2 < 4ac (a, b, c ≠ 0).


Solution: The given quadratic form can be written as

Q(x, y) = ax^2 + bxy + cy^2 = a[(x + (b/2a) y)^2 + ((4ac - b^2)/4a^2) y^2].

If a > 0 and b^2 < 4ac, both bracketed terms are non-negative and they vanish simultaneously only at x = y = 0; hence Q > 0 for all (x, y) ≠ (0, 0), i.e., Q is positive definite.

Also, if Q(α, α) = α^T B α, then the associated matrix B is given by

B = [  a  b/2
      b/2  c ].

The leading principal minors of B are a and | a b/2; b/2 c | = ac - b^2/4. For a, b, c ≠ 0, if a > 0 and b^2 < 4ac, then all leading principal minors are positive; hence the given quadratic form Q is positive definite.

Ex 7.5.4 Examine whether the quadratic form x^2 + 2y^2 + 2z^2 - 2xy + 2xz - 4yz is positive definite or not.

Solution: The given quadratic form can be written as

Q(x, y, z) = x^2 + 2y^2 + 2z^2 - 2xy + 2xz - 4yz = (x - y + z)^2 + (y - z)^2.

Thus Q ≥ 0 for all (x, y, z), while Q = 0 for the non-zero vector x = 0, y = z = 1. Hence Q ≥ 0 for all α, with Q = 0 for some α ≠ θ, and so Q is positive semi-definite.

Ex 7.5.5 Examine whether the quadratic form x^2 + y^2 - 2z^2 + 2xy - 2yz - 2xz is positive definite or not.

Solution: The given quadratic form can be written as

Q(x, y, z) = x^2 + y^2 - 2z^2 + 2xy - 2yz - 2xz = (x + y - z)^2 - 3z^2.

We see that Q > 0 for some α ≠ θ and Q < 0 for some α ≠ θ. For example, if (x, y, z) = (1, 0, 0), then Q > 0; if (x, y, z) = (0, 0, 1), then Q < 0; and if (x, y, z) = (1, -1, 0), then Q = 0. So the given form Q is indefinite.

7.6 Canonical Form

Let us consider the real quadratic form Q(α, α) = α^T A α, where A is a real symmetric matrix of order n. From the spectral theorem, we know that the eigenvectors of A form an orthonormal basis of V. Let P be the n-square matrix whose columns are orthonormal eigenvectors of A; then |P| ≠ 0 and P^T = P^{-1}, and hence the non-singular linear transformation α = Pα′ will transform α^T A α to

α′^T P^T A P α′ = α′^T P^{-1} A P α′ = α′^T D α′ = Q′(α′, α′),    (7.19)

where D = P^{-1} A P = P^T A P is the diagonal matrix whose diagonal elements are the eigenvalues of the matrix A, i.e., D = diag(λ_1, λ_2, ..., λ_n). Now Q′(α′, α′) is a real quadratic form, obtained from Q by a linear transformation. The matrix D of Q′ is congruent to A, and Q′ is said to be congruent to Q. When expressed in terms of coordinates, equation (7.19) has the form

Q′(α′, α′) = α′^T D α′ = λ_1 x_1′^2 + λ_2 x_2′^2 + ... + λ_n x_n′^2.    (7.20)
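The reduction (7.19)-(7.20) can be carried out numerically with a symmetric eigensolver. A sketch (not from the book), using the matrix that appears in Ex 7.6.1 below as a sample:

```python
# Sketch: principal-axes reduction of a quadratic form via NumPy's eigh.
import numpy as np

A = np.array([[ 5.0,  0.0, -5.0],
              [ 0.0,  1.0, -2.0],
              [-5.0, -2.0, 10.0]])   # symmetric matrix of a quadratic form

lam, P = np.linalg.eigh(A)           # columns of P: orthonormal eigenvectors
D = P.T @ A @ P                      # = diag(lambda_1, ..., lambda_n)
print(np.allclose(D, np.diag(lam)))  # True
```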


Let λ_1, λ_2, ..., λ_p be the positive eigenvalues of A, λ_{p+1}, λ_{p+2}, ..., λ_r the negative eigenvalues of A, and λ_{r+1}, ..., λ_n the zero eigenvalues of A, where r is the rank of A. If A is an n × n real symmetric matrix of rank r (≤ n), then there exists a non-singular matrix P such that P^T A P, i.e., D, becomes diagonal with the form

D = [ I_p
           -I_{r-p}
                    0 ];  0 ≤ p ≤ r.

Thus p, r - p and n - r are defined to be the positive, the negative and the zero indices of inertia, and this is expressed by writing

In(A) = (p, r - p, n - r).    (7.21)

The quantity s = p - (r - p) is defined as the signature. We can reduce equation (7.20) further by applying the transformation

x_i′ = (1/√|λ_i|) x_i″,  i = 1, 2, ..., r;   x_i′ = x_i″,  i = r + 1, ..., n.    (7.22)

The transformation (7.22) turns (7.20) into the quadratic form

x_1″^2 + ... + x_p″^2 - x_{p+1}″^2 - ... - x_r″^2.    (7.23)

We have thus reduced the quadratic form (7.20) to the quadratic form (7.23), which is a sum of square terms with coefficients +1 and -1. The quadratic form (7.23) is called the canonical or normal form of Q. The number of positive terms in the normal form is called the index.

Deduction 7.6.1 Sylvester's law of inertia: Sylvester's law of inertia states that when a quadratic form is reduced to a normal form such as (7.23), the rank and signature of the form remain invariant, i.e., In(A) is independent of the method of reducing (7.20) to the canonical form (7.23).

Deduction 7.6.2 Classification of quadratic forms: A quadratic form Q(α, α) = α^T A α is said to be positive definite if Q(α, α) > 0 ∀ α ≠ θ, negative definite if Q(α, α) < 0 ∀ α ≠ θ, positive semi-definite if Q(α, α) ≥ 0 ∀ α, negative semi-definite if Q(α, α) ≤ 0 ∀ α, and indefinite if Q(α, α) takes a positive value for some α ≠ θ as well as a negative value for some other α ≠ θ. Thus

(i) the quadratic form Q(α, α) = α^T A α is positive definite if r = n and all the eigenvalues are positive, i.e., In(A) = (n, 0, 0). In this case the canonical form becomes x_1″^2 + x_2″^2 + ... + x_n″^2.

(ii) the quadratic form Q(α, α) = α^T A α is negative definite if all the eigenvalues are negative, i.e., In(A) = (0, n, 0). In this case the canonical form becomes -x_1″^2 - x_2″^2 - ... - x_n″^2.

(iii) the quadratic form Q(α, α) = α^T A α is positive semi-definite if r < n and r - p = 0, i.e., In(A) = (p, 0, n - p), and negative semi-definite if In(A) = (0, r, n - r).

(iv) the quadratic form Q(α, α) = α^T A α is indefinite if In(A) = (p, r - p, n - r), where p > 0 and r - p > 0.


Each quadratic form must be of one of these five types. Sylvester's criterion states that a real symmetric matrix A is positive definite if and only if all the leading principal minors of A are positive. The semi-definite case is analogous, with 'positive' replaced by 'non-negative', but there all principal minors (not merely the leading ones) must be non-negative.

Ex 7.6.1 Reduce the quadratic form 5x_1^2 + x_2^2 + 10x_3^2 - 4x_2x_3 - 10x_3x_1 to the normal form.

Solution: The given quadratic form can be written as

Q = (x_1 x_2 x_3) [ 5 0 -5; 0 1 -2; -5 -2 10 ] (x_1; x_2; x_3),

where the associated symmetric matrix is A = [ 5 0 -5; 0 1 -2; -5 -2 10 ]. Let us apply congruence operations on A to reduce it to the normal form:

A --R3 + R1--> [ 5 0 -5; 0 1 -2; 0 -2 5 ] --C3 + C1--> [ 5 0 0; 0 1 -2; 0 -2 5 ]
--R3 + 2R2--> [ 5 0 0; 0 1 -2; 0 0 1 ] --C3 + 2C2--> [ 5 0 0; 0 1 0; 0 0 1 ]
--(1/√5)R1--> [ √5 0 0; 0 1 0; 0 0 1 ] --(1/√5)C1--> [ 1 0 0; 0 1 0; 0 0 1 ].

The rank of the quadratic form is r = 3 and the number of positive indices of inertia is p = 3, which is the index. Therefore, the signature of the quadratic form is 2p - r = 3. Here n = r = p = 3, so the quadratic form is positive definite.
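By Sylvester's law of inertia, the counts p, r - p, n - r found by this congruence reduction can equally be read off from the eigenvalues. A sketch (not from the book; the tolerance 1e-12 is an illustrative choice for treating a tiny eigenvalue as zero):

```python
# Sketch: rank, inertia and signature of the form in Ex 7.6.1.
import numpy as np

A = np.array([[ 5.0,  0.0, -5.0],
              [ 0.0,  1.0, -2.0],
              [-5.0, -2.0, 10.0]])

lam = np.linalg.eigvalsh(A)
p = int(np.sum(lam > 1e-12))      # positive index of inertia
q = int(np.sum(lam < -1e-12))     # negative index of inertia
r = p + q                         # rank
print((p, q, 3 - r))              # In(A) = (3, 0, 0)
print(p - q)                      # signature = 3
```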

Ex 7.6.2 Let Q = x^2 + 6xy - 7y^2. Find the orthogonal substitution that diagonalizes Q.

Solution: The symmetric matrix A that represents Q is A = [ 1 3; 3 -7 ]. The characteristic polynomial of A is

|A - λI| = λ^2 - (1 - 7)λ + (-7 - 9) = λ^2 + 6λ - 16 = (λ + 8)(λ - 2).

The eigenvalues of A are 2 and -8. Thus, using x_1 and x_2 as new variables, a diagonal form of Q is

Q(x_1, x_2) = 2x_1^2 - 8x_2^2.

The corresponding orthogonal substitution is obtained by finding an orthogonal set of eigenvectors of A. The eigenvectors corresponding to λ_1 = 2 and λ_2 = -8 are X_1 = (3, 1)^T and X_2 = (-1, 3)^T respectively. Since A is symmetric, the eigenvectors X_1 and X_2 are orthogonal. Now we normalize X_1 and X_2 to obtain respectively the unit vectors

X̂_1 = (3/√10, 1/√10)^T and X̂_2 = (-1/√10, 3/√10)^T.

Finally, let P be the matrix whose columns are these unit vectors; then (x, y)^T = P(x_1, x_2)^T is the required orthogonal change of coordinates, i.e.,

P = [ 3/√10 -1/√10
      1/√10  3/√10 ]  and  x = (1/√10)(3x_1 - x_2),  y = (1/√10)(x_1 + 3x_2).


One can also express x_1 and x_2 in terms of x and y by using P^{-1} = P^T as

x_1 = (1/√10)(3x + y);  x_2 = (1/√10)(-x + 3y).

Classification of conics

(i) The general equation of a quadratic conic in two variables x and y can be written in the form

ax^2 + 2hxy + by^2 + gx + fy + c = 0,

(x y) [ a h; h b ] (x; y) + (g f)(x; y) + c = 0,

or, α^T A α + K^T α + c = 0,    (7.24)

where A = [ a h; h b ] is a real symmetric matrix and hence orthogonally diagonalizable, K^T = (g f) and α^T = (x, y). Let λ_1, λ_2 be the eigenvalues of the real symmetric matrix A and α_1, α_2 the corresponding eigenvectors, so that for P = [α_1, α_2] we have P^{-1} = P^T, i.e., P is an orthogonal matrix, and

AP = PD, where D = diag(λ_1, λ_2) = [ λ_1 0; 0 λ_2 ].

If we apply the rotation α = Pα′, then equation (7.24) reduces to

α′^T P^T A P α′ + K^T P α′ + c = 0, or λ_1 x′^2 + λ_2 y′^2 + g′x′ + f′y′ + c = 0,    (7.25)

where the rotation α = Pα′ transforms the principal axes into coordinate axes. Let us now apply the translation x′ = x″ + δ, y′ = y″ + µ. If λ_1 ≠ 0, the coefficient of x″ may be made zero by a suitable choice of δ, and if λ_2 ≠ 0, the coefficient of y″ may be made zero by a suitable choice of µ. Therefore the conic (7.25) can be transformed to one of the three general forms below (see also the numerical sketch after this list):

1. Let In(A) = (2, 0, 0); then the standard form becomes

x^2/a^2 + y^2/b^2 = 1, an ellipse;  = 0, a single point.

2. Let In(A) = (1, 1, 0), i.e., rank of A = 2 with one positive and one negative eigenvalue; then the standard form becomes

x^2/a^2 - y^2/b^2 = 1, a hyperbola;  = 0, a pair of intersecting straight lines.

3. Let In(A) = (1, 0, 1); then the standard form becomes

x^2 = 4ay, a parabola;  x^2 = 1, a pair of parallel straight lines;  x^2 = 0, a single straight line.
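The case distinction above depends only on the inertia of the quadratic part. A sketch (not from the book; the helper name inertia2 and the tolerance are illustrative):

```python
# Sketch: classifying the quadratic part of a conic by its inertia.
import numpy as np

def inertia2(a, h, b, tol=1e-12):
    lam = np.linalg.eigvalsh(np.array([[a, h], [h, b]]))
    p = int(np.sum(lam > tol))
    q = int(np.sum(lam < -tol))
    return (p, q, 2 - p - q)

print(inertia2(2.0, 0.0, 3.0))    # (2, 0, 0): ellipse type
print(inertia2(1.0, 2.0, 1.0))    # (1, 1, 0): hyperbola type
print(inertia2(1.0, 1.0, 1.0))    # (1, 0, 1): parabola type (eigenvalues 2 and 0)
```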


(ii) The general equation of a quadric in the variables x, y, z can be written in the form

ax^2 + by^2 + cz^2 + 2hxy + 2gzx + 2fyz + ux + vy + wz + d = 0,

(x y z) [ a h g; h b f; g f c ] (x; y; z) + (u v w)(x; y; z) + d = 0,

or, α^T A α + K^T α + d = 0,    (7.26)

where A = [ a h g; h b f; g f c ] is a real symmetric matrix and hence orthogonally diagonalizable, K^T = (u v w) and α^T = (x, y, z). Rotating the coordinate axes to coincide with the orthogonal eigen axes, or principal axes, and translating the origin suitably, the quadric (7.26) can be reduced to one of the following six general forms, assuming that λ_1 > 0 and that the constant on the right-hand side of the final expression, if any, is positive.

1. Let In(A) = (3, 0, 0), i.e., rank of A = 3 and none of λ_1, λ_2, λ_3 is zero. Then the standard form becomes

x^2/a^2 + y^2/b^2 + z^2/c^2 = 1, an ellipsoid;  = 0, a single point.

2. Let In(A) = (2, 1, 0); then the standard form becomes

x^2/a^2 + y^2/b^2 - z^2/c^2 = 1, an elliptic hyperboloid of one sheet;  = 0, an elliptic cone.

3. Let In(A) = (1, 2, 0); then the standard form becomes

x^2/a^2 - y^2/b^2 - z^2/c^2 = 1, an elliptic hyperboloid of two sheets;  = 0, an elliptic cone.

4. Let In(A) = (2, 0, 1); then the standard form becomes

x^2/a^2 + y^2/b^2 = z, an elliptic paraboloid;  = 1, an elliptic cylinder;  = 0, a single straight line.

5. Let In(A) = (1, 1, 0); then the standard form becomes

x^2/a^2 - y^2/b^2 = z, a hyperbolic paraboloid;  = 1, a hyperbolic cylinder;  = 0, a pair of intersecting planes.


6. Let In(A) = (1, 0, 2); then the standard form becomes

x^2/a^2 = δy + µz, a parabolic cylinder;  = 1, a pair of parallel planes;  = 0, a single plane.

Ex 7.6.3 Reduce 2y^2 - 2xy - 2yz + 2zx - x - 2y + 3z - 2 = 0 into canonical form.

Solution: The quadratic equation 2y^2 - 2xy - 2yz + 2zx - x - 2y + 3z - 2 = 0 in x, y, z can be written as

(x y z) [ 0 -1 1; -1 2 -1; 1 -1 0 ] (x; y; z) + (-1 -2 3)(x; y; z) - 2 = 0,

i.e., X^T A X + BX - 2 = 0.

The characteristic equation of A is

|A - λI| = 0 ⇒ | -λ -1 1; -1 2-λ -1; 1 -1 -λ | = 0 ⇒ λ^3 - 2λ^2 - 3λ = 0 ⇒ λ = 3, -1, 0.

The eigenvectors corresponding to the eigenvalues 3, -1, 0 are k_1(1, -2, 1), k_2(1, 0, -1) and k_3(1, 1, 1). The orthonormal eigenvectors are

(1/√6, -2/√6, 1/√6), (1/√2, 0, -1/√2) and (1/√3, 1/√3, 1/√3)

respectively. Let

P = [ 1/√6  1/√2  1/√3
     -2/√6   0    1/√3
      1/√6 -1/√2  1/√3 ];

then P^T A P = diag(3, -1, 0) and BP = (√6, -2√2, 0). By the orthogonal transformation X = PX′, where X′^T = (x′ y′ z′), the equation reduces to

3x′^2 - y′^2 + √6 x′ - 2√2 y′ - 2 = 0, or 3(x′ + 1/√6)^2 - (y′ + √2)^2 = 1/2.

Applying the transformation x″ = x′ + 1/√6, y″ = y′ + √2, z″ = z′, the equation finally reduces to 3x″^2 - y″^2 = 1/2, which is the canonical form; it represents a hyperbolic cylinder.
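The rotation used in Ex 7.6.3 can be verified numerically. A sketch (not from the book):

```python
# Sketch: checking P^T A P and B P for the reduction in Ex 7.6.3.
import numpy as np

A = np.array([[ 0.0, -1.0,  1.0],
              [-1.0,  2.0, -1.0],
              [ 1.0, -1.0,  0.0]])
B = np.array([-1.0, -2.0, 3.0])

s6, s2, s3 = np.sqrt(6), np.sqrt(2), np.sqrt(3)
P = np.array([[ 1/s6,  1/s2, 1/s3],
              [-2/s6,  0.0,  1/s3],
              [ 1/s6, -1/s2, 1/s3]])

print(np.round(P.T @ A @ P, 10))    # diag(3, -1, 0)
print(np.round(B @ P, 10))          # [sqrt(6), -2*sqrt(2), 0]
```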

Ex 7.6.4 Reduce the quadratic form 2x^2 + 5y^2 + 10z^2 + 4xy + 6xz + 12yz into canonical form.

Solution: The given quadratic form in x, y, z can be written in the form

Q = (x y z) [ 2 2 3; 2 5 6; 3 6 10 ] (x; y; z) = X^T A X,

where the associated symmetric matrix is A. Let us apply congruence operations on A to reduce it to the normal form:

A --R2 - R1, R3 - (3/2)R1--> [ 2 2 3; 0 3 3; 0 3 11/2 ] --C2 - C1, C3 - (3/2)C1--> [ 2 0 0; 0 3 3; 0 3 11/2 ]
--R3 - R2--> [ 2 0 0; 0 3 3; 0 0 5/2 ] --C3 - C2--> [ 2 0 0; 0 3 0; 0 0 5/2 ]
--(1/√2)R1, (1/√3)R2, √(2/5)R3--> [ √2 0 0; 0 √3 0; 0 0 √(5/2) ] --(1/√2)C1, (1/√3)C2, √(2/5)C3--> [ 1 0 0; 0 1 0; 0 0 1 ].

The rank of the quadratic form is r = 3 and the number of positive indices of inertia is p = 3, which is the index. Therefore, the signature of the quadratic form is 2p - r = 3. Here n = r = p = 3, so the quadratic form is positive definite. The corresponding normal form is x^2 + y^2 + z^2.

Ex 7.6.5 Obtain a non-singular transformation that will reduce the quadratic form x^2 + 2y^2 + 3z^2 - 2xy + 4yz to the normal form.

Solution: The given quadratic form in x, y, z can be written as

Q = (x y z) [ 1 -1 0; -1 2 2; 0 2 3 ] (x; y; z) = X^T A X,

where the associated symmetric matrix is A. Let us apply congruence operations on A to reduce it to the normal form:

A --R2 + R1--> [ 1 -1 0; 0 1 2; 0 2 3 ] --C2 + C1--> [ 1 0 0; 0 1 2; 0 2 3 ]
--R3 - 2R2--> [ 1 0 0; 0 1 2; 0 0 -1 ] --C3 - 2C2--> [ 1 0 0; 0 1 0; 0 0 -1 ].

The rank of the quadratic form is r = 3 and the number of positive indices of inertia is p = 2, which is the index. Therefore, the signature of the quadratic form is 2p - r = 1. Here n = r = 3 and p = 2 < r, so the quadratic form is indefinite. The corresponding normal form is x^2 + y^2 - z^2. Let X = PX′, where X′^T = (x′ y′ z′) and P is non-singular, transform the form into the normal form X′^T D X′; then D (= P^T A P) is a diagonal matrix. Writing E_ij(c) for the elementary matrix that adds c times row j to row i, the property of elementary matrices gives

E_32(-2) E_21(1) A E_21(1)^T E_32(-2)^T = D

⇒ P^T = E_32(-2) E_21(1) = [ 1 0 0; 0 1 0; 0 -2 1 ] [ 1 0 0; 1 1 0; 0 0 1 ] = [ 1 0 0; 1 1 0; -2 -2 1 ].

Thus the transformation X = PX′ becomes

x = x′ + y′ - 2z′,  y = y′ - 2z′,  z = z′.


Ex 7.6.6 Show that the quadratic form x_1x_2 + x_2x_3 + x_3x_1 can be reduced to the canonical form y_1^2 - y_2^2 - y_3^2 by means of the transformation x_1 = y_1 - y_2 - y_3, x_2 = y_1 + y_2 - y_3, x_3 = y_3.

Solution: The quadratic form x_1x_2 + x_2x_3 + x_3x_1 in x_1, x_2, x_3 can be written as

Q = (x_1 x_2 x_3) [ 0 1/2 1/2; 1/2 0 1/2; 1/2 1/2 0 ] (x_1; x_2; x_3) = X^T A X,

where the associated symmetric matrix is A. Let us apply congruence operations on A to reduce it to the normal form:

A --R1 + R2--> [ 1/2 1/2 1; 1/2 0 1/2; 1/2 1/2 0 ] --C1 + C2--> [ 1 1/2 1; 1/2 0 1/2; 1 1/2 0 ]
--R2 - (1/2)R1, R3 - R1--> [ 1 1/2 1; 0 -1/4 0; 0 0 -1 ] --C2 - (1/2)C1, C3 - C1--> [ 1 0 0; 0 -1/4 0; 0 0 -1 ]
--2R2--> [ 1 0 0; 0 -1/2 0; 0 0 -1 ] --2C2--> [ 1 0 0; 0 -1 0; 0 0 -1 ].

The rank of the quadratic form is r = 3 and the number of positive indices of inertia is p = 1, which is the index. The corresponding normal form is y_1^2 - y_2^2 - y_3^2. With E_ij(c) denoting the elementary matrix that adds c times row j to row i and E_2(2) the matrix that doubles the second row, the property of elementary matrices gives

E_2(2) E_31(-1) E_21(-1/2) E_12(1) A [E_12(1)]^T [E_21(-1/2)]^T [E_31(-1)]^T [E_2(2)]^T = D

⇒ P^T = E_2(2) E_31(-1) E_21(-1/2) E_12(1)

= [ 1 0 0; 0 2 0; 0 0 1 ] [ 1 0 0; 0 1 0; -1 0 1 ] [ 1 0 0; -1/2 1 0; 0 0 1 ] [ 1 1 0; 0 1 0; 0 0 1 ] = [ 1 1 0; -1 1 0; -1 -1 1 ].

Let X = PY, where Y^T = (y_1 y_2 y_3) and P is non-singular, transform the quadratic form into the normal form Y^T D Y; then D (= P^T A P) is a diagonal matrix. Thus the transformation X = PY becomes

x_1 = y_1 - y_2 - y_3,  x_2 = y_1 + y_2 - y_3,  x_3 = y_3.

Ex 7.6.7 Reduce the quadratic form 2x_1x_3 + x_2x_3 to diagonal form.

Solution: Since the diagonal terms are absent and the coefficient of x_1x_3 is non-zero, we make the change of variables x_1 = y_1, x_2 = y_2 and x_3 = y_3 + y_1. The quadratic form is transformed to

2y_1^2 + y_1y_2 + 2y_1y_3 + y_2y_3 = 2(y_1 + (1/4)y_2 + (1/2)y_3)^2 - (1/8)y_2^2 - (1/2)y_3^2 + (1/2)y_2y_3.

With z_1 = y_1 + (1/4)y_2 + (1/2)y_3, z_2 = y_2 and z_3 = y_3, the quadratic form becomes

2z_1^2 - (1/8)z_2^2 - (1/2)z_3^2 + (1/2)z_2z_3 = 2z_1^2 - (1/8)(z_2 - 2z_3)^2.

Finally, making the transformation z_1 = u_1, z_2 - 2z_3 = u_2 and z_3 = u_3, the quadratic form becomes 2u_1^2 - (1/8)u_2^2.

It can be checked that the transformation from the x's to the u's is x_1 = u_1 - (1/4)u_2 - u_3, x_2 = u_2 + 2u_3 and x_3 = u_1 - (1/4)u_2. The u's can be expressed in terms of the x's as u_1 = (1/2)x_1 + (1/4)x_2 + (1/2)x_3, u_2 = 2x_1 + x_2 - 2x_3 and u_3 = x_3 - x_1. Thus the given quadratic form can be written as

2((1/2)x_1 + (1/4)x_2 + (1/2)x_3)^2 - (1/8)(2x_1 + x_2 - 2x_3)^2

or, ((1/√2)x_1 + (1/(2√2))x_2 + (1/√2)x_3)^2 - ((1/√2)x_1 + (1/(2√2))x_2 - (1/√2)x_3)^2,

with coefficients 1, -1 and 0.

7.6.1 Jordan Canonical Form

We have shown that every complex matrix is similar to an upper triangular matrix. Also, it is similar to a diagonal matrix if and only if its minimal polynomial has distinct roots. When the minimal polynomial has repeated roots, the Jordan canonical form theorem implies that the matrix is similar to D + N, where D is a diagonal matrix with the eigenvalues as its diagonal elements and N is a nilpotent matrix of a suitably simplified form, namely one whose first superdiagonal elements are either 1 or 0, with at least one 1. In other words, the matrix is similar to a block diagonal matrix, where each diagonal block is of the form

[ λ 1
    λ 1
      λ ],

whose diagonal elements are all equal to an eigenvalue λ and whose first superdiagonal elements are all 1. This block diagonal form is known as the Jordan canonical form.

Result 7.6.1 Let T : V → V be a linear operator whose characteristic and minimal polynomials are

P(λ) = (λ - λ_1)^{n_1} ... (λ - λ_r)^{n_r};  m(λ) = (λ - λ_1)^{m_1} ... (λ - λ_r)^{m_r},

where the λ_i are the distinct eigenvalues, n_i is the algebraic multiplicity of λ_i, and m_i (≤ n_i) is the exponent of λ - λ_i in the minimal polynomial. Then T has a block diagonal matrix representation J whose diagonal entries are blocks of the form

J_ij = [ λ_i 1 0 ... 0 0
         0 λ_i 1 ... 0 0
         ...
         0 0 0 ... λ_i 1
         0 0 0 ... 0 λ_i ].

For each λ_i the corresponding blocks J_ij have the following properties:

(i) There is at least one J_ij of order m_i, and all other J_ij with λ_i as diagonal element are of order ≤ m_i.

(ii) The sum of the orders of the J_ij is n_i.

(iii) The number of J_ij having diagonal element λ_i equals the geometric multiplicity of λ_i.


(iv) The number of J_ij of each possible order is uniquely determined by T.

A k-th order Jordan submatrix referring to the number λ_0 is a matrix of order k, 1 ≤ k ≤ n, of the form

[ λ_0 1 0 ... 0 0
  0 λ_0 1 ... 0 0
  ...
  0 0 0 ... λ_0 1
  0 0 0 ... 0 λ_0 ].

In other words, one and the same number λ_0 from the field F occupies the principal diagonal, with unity along the diagonal immediately above and zeros elsewhere. Thus

[λ_0],  [ λ_0 1; 0 λ_0 ],  [ λ_0 1 0; 0 λ_0 1; 0 0 λ_0 ]

are respectively Jordan submatrices of first, second and third order. A Jordan matrix of order n is a matrix of order n having the form

J = [ J_1 0 ... 0
      0 J_2 ... 0
      ...
      0 0 ... J_m ].

The blocks along the principal diagonal are Jordan submatrices, or Jordan blocks, of certain orders (not necessarily distinct), referring to certain numbers (not necessarily distinct either) lying in the field F. Thus a matrix is a Jordan matrix if and only if it has the form

[ λ_1 ε_1 0 ... 0 0
  0 λ_2 ε_2 ... 0 0
  ...
  0 0 0 ... λ_{n-1} ε_{n-1}
  0 0 0 ... 0 λ_n ],

where the λ_i, i = 1, 2, ..., n, are arbitrary numbers in F and every ε_j, j = 1, 2, ..., n - 1, is equal to unity or zero; note that ε_j = 1 forces λ_j = λ_{j+1}. Diagonal matrices are a special case of Jordan matrices: they are the Jordan matrices whose submatrices are all of order 1.

Theorem 7.6.1 Let J be a Jordan block of order k. Then J has exactly one eigenvalue, which is equal to the scalar on the main diagonal. The corresponding eigenvectors are the non-zero scalar multiples of the k-dimensional unit coordinate vector [1, 0, ..., 0]^T.

Proof: Suppose that the diagonal entries of J are equal to λ. A column vector X = [x_1, x_2, ..., x_k]^T satisfies the equation JX = λX if and only if its components satisfy the following k scalar equations:

λx_1 + x_2 = λx_1,  λx_2 + x_3 = λx_2,  ...,  λx_{k-1} + x_k = λx_{k-1},  λx_k = λx_k.

From the first (k - 1) equations we obtain x_2 = x_3 = ... = x_k = 0, so λ is an eigenvalue of J and all eigenvectors have the form x_1[1, 0, ..., 0]^T with x_1 ≠ 0. To show that λ is the only eigenvalue of J, assume that JX = µX for some scalar µ ≠ λ. Then the components satisfy the following k scalar equations:

λx_1 + x_2 = µx_1,  λx_2 + x_3 = µx_2,  ...,  λx_{k-1} + x_k = µx_{k-1},  λx_k = µx_k.

Because λ ≠ µ, the last relation gives x_k = 0, and from the other equations we get x_{k-1} = x_{k-2} = ... = x_2 = x_1 = 0. Hence only the zero vector satisfies JX = µX, so no scalar different from λ can be an eigenvalue of J. This theorem describes all the eigenvalues and eigenvectors of a Jordan block.
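Both the eigenvector structure of a single Jordan block and the block decomposition can be inspected in SymPy. A sketch (not from the book), using the matrix of Ex 7.6.10 below as a second example:

```python
# Sketch: a Jordan block and a Jordan decomposition in SymPy.
from sympy import Matrix

J = Matrix([[2, 1, 0],
            [0, 2, 1],
            [0, 0, 2]])          # 3x3 Jordan block with eigenvalue 2
print(J.eigenvects())            # one eigenvalue (2), eigenspace spanned by (1, 0, 0)^T

A = Matrix([[-1, 3, 0],
            [ 0, 2, 0],
            [ 2, 1, -1]])
P, Jform = A.jordan_form()       # A = P * Jform * P**-1
print(Jform)                     # a [2] block and a 2x2 block for eigenvalue -1
```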

Ex 7.6.8 Find all possible Jordan canonical forms for those matrices whose characteristic polynomial Δ(t) and minimal polynomial m(t) are as follows:

1. Δ(t) = (t - 2)^5; m(t) = (t - 2)^2.

2. Δ(t) = (t - 7)^5; m(t) = (t - 7)^2.

3. Δ(t) = (t - 2)^7; m(t) = (t - 2)^3.

4. Δ(t) = (t - 2)^4 (t - 5)^3; m(t) = (t - 2)^2 (t - 5)^3.

5. Δ(t) = (t - 2)^4 (t - 3)^2; m(t) = (t - 2)^2 (t - 3)^2.

Solution: (1) Since Δ(t) has degree 5, J must be a 5 × 5 matrix, and all diagonal elements must be 2, since 2 is the only eigenvalue. Moreover, since the exponent of t - 2 in m(t) is 2, J must have one Jordan block of order 2, and the others must be of order 2 or 1. Thus there are only two possibilities:

(i) J = diag([ 2 1; 0 2 ], [ 2 1; 0 2 ], [2]).

(ii) J = diag([ 2 1; 0 2 ], [2], [2], [2]).

(2) In a similar way, there are only two possibilities:

(i) J = diag([ 7 1; 0 7 ], [ 7 1; 0 7 ], [7]).

(ii) J = diag([ 7 1; 0 7 ], [7], [7], [7]).

(3) Let M_k denote a Jordan block of order k with diagonal element 2. Then, similarly, there are only four possibilities:

(i) diag(M_3, M_3, M_1).

(ii) diag(M_3, M_2, M_2).


(iii) diag(M_3, M_2, M_1, M_1).

(iv) diag(M_3, M_1, M_1, M_1, M_1).

(4) The Jordan canonical form is one of the following block diagonal matrices:

(i) J = diag([ 2 1; 0 2 ], [ 2 1; 0 2 ], [ 5 1 0; 0 5 1; 0 0 5 ]).

(ii) J = diag([ 2 1; 0 2 ], [2], [2], [ 5 1 0; 0 5 1; 0 0 5 ]).

The first matrix occurs if T has two independent eigenvectors belonging to the eigenvalue 2, and the second occurs if the linear operator T has three independent eigenvectors belonging to 2.

(5) The Jordan canonical form is one of the following block diagonal matrices:

(i) J = diag([ 2 1; 0 2 ], [ 2 1; 0 2 ], [ 3 1; 0 3 ]).

(ii) J = diag([ 2 1; 0 2 ], [2], [2], [ 3 1; 0 3 ]).

Result 7.6.2 Let V be an n-dimensional linear space with complex scalars and let T : V → V be a linear transformation of V into itself. Then there is a basis for V relative to which T has a block diagonal matrix representation diag(J_1, J_2, ..., J_m), with each J_k being a Jordan block.

Ex 7.6.9 Find all possible Jordan canonical forms for a linear operator T : V → V whose characteristic polynomial is Δ(t) = (t - 2)^3 (t - 5)^2. In each case, find the minimal polynomial m(t).

Solution: Since t - 2 has exponent 3 in Δ(t), 2 must appear three times on the diagonal. Similarly, 5 must appear twice. Thus there are six possibilities:

(i) diag([ 2 1 0; 0 2 1; 0 0 2 ], [ 5 1; 0 5 ]),  (ii) diag([ 2 1 0; 0 2 1; 0 0 2 ], [5], [5]),

(iii) diag([ 2 1; 0 2 ], [2], [ 5 1; 0 5 ]),  (iv) diag([ 2 1; 0 2 ], [2], [5], [5]),

(v) diag([2], [2], [2], [ 5 1; 0 5 ]),  (vi) diag([2], [2], [2], [5], [5]).

The exponent of each factor in the minimal polynomial m(t) is equal to the size of the largest block for that eigenvalue. Thus

(i) m(t) = (t - 2)^3 (t - 5)^2,  (ii) m(t) = (t - 2)^3 (t - 5),  (iii) m(t) = (t - 2)^2 (t - 5)^2,

(iv) m(t) = (t - 2)^2 (t - 5),  (v) m(t) = (t - 2)(t - 5)^2,  (vi) m(t) = (t - 2)(t - 5).

Ex 7.6.10 Verify that the matrix A = [ -1 3 0; 0 2 0; 2 1 -1 ] has eigenvalues 2, -1, -1. Find a non-singular matrix C with initial entry C_11 = 1 that transforms A to the Jordan canonical form

C^{-1} A C = [ 2 0 0; 0 -1 1; 0 0 -1 ].


Solution: The characteristic equation of the matrix A is |A - λI| = 0, i.e.,

| -1-λ 3 0; 0 2-λ 0; 2 1 -1-λ | = 0 ⇒ (2 - λ)(1 + λ)^2 = 0 ⇒ λ = -1, -1, 2.

The eigenvector corresponding to λ = 2 is obtained by solving the equations

-3x + 3y = 0;  2x + y - 3z = 0 ⇒ x = y = z.

The eigenvector corresponding to λ = 2 is therefore k(1, 1, 1)^T, where k is a non-zero constant. The eigenvector corresponding to λ = -1 is obtained by solving the equations

3y = 0;  2x + y = 0 ⇒ x = y = 0.

The eigenvector corresponding to λ = -1 is (0, 0, a)^T, where a is an arbitrary non-zero number. We construct the matrix C whose first two columns are the eigenvectors corresponding to λ = 2 and λ = -1. Since C_11 = 1, we must have k = 1. The third column is chosen in such a way that AC = CB, where B is the Jordan canonical form; say

C = [ 1 0 b; 1 0 c; 1 a d ].

Therefore

[ -1 3 0; 0 2 0; 2 1 -1 ] [ 1 0 b; 1 0 c; 1 a d ] = [ 1 0 b; 1 0 c; 1 a d ] [ 2 0 0; 0 -1 1; 0 0 -1 ]

⇒ -b + 3c = -b,  2c = -c,  2b + c - d = a - d ⇒ c = 0, a = 2b.

Hence C = [ 1 0 b; 1 0 0; 1 2b d ], where b ≠ 0 and d is arbitrary.
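One admissible choice of the free parameters can be checked symbolically. A sketch (not from the book; b = 1 and d = 0 are illustrative choices):

```python
# Sketch: verifying C^{-1} A C for Ex 7.6.10 with b = 1, a = 2b = 2, d = 0.
from sympy import Matrix

A = Matrix([[-1, 3, 0], [0, 2, 0], [2, 1, -1]])
C = Matrix([[1, 0, 1], [1, 0, 0], [1, 2, 0]])

print(C.inv() * A * C)   # Matrix([[2, 0, 0], [0, -1, 1], [0, 0, -1]])
```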

7.7 Functions of Matrix

As we define and study various functions of a variable in algebra, it is possible to define and evaluate functions of a matrix. We shall study the following functions of a matrix in this section: integral powers (positive and negative), fractional powers (roots), and the exponential, logarithmic, trigonometric and hyperbolic functions.

There are two methods by which a function of a matrix can be evaluated. The first is a rather straightforward method based on the diagonalization of a matrix and is therefore applicable to diagonalizable matrices only. The second method is based on the existence of a minimal polynomial and can be used to evaluate functions of any matrix.

Functions of a diagonalizable matrix

Let A be a diagonalizable matrix and let P be a diagonalizing matrix for A, so that

P^{-1} A P = Λ,  A = P Λ P^{-1},    (7.27)

where Λ is a diagonal matrix containing the eigenvalues of A. Now, if f is any function of a matrix, then we have

f(A) = P f(Λ) P^{-1}.    (7.28)

Thus, if we can define a function of a diagonal matrix, we can define and evaluate the function of any diagonalizable matrix. The discussion of this section evidently applies to square matrices only.
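Equation (7.28) translates directly into code. A sketch (not from the book; the helper name matrix_function is an illustrative choice, and the sketch restricts itself to real symmetric matrices, which are orthogonally diagonalizable):

```python
# Sketch: f(A) = P f(Lambda) P^{-1} for a real symmetric matrix A.
import numpy as np

def matrix_function(A, f):
    """Evaluate f(A) via the eigendecomposition A = P diag(lam) P^T."""
    lam, P = np.linalg.eigh(A)
    return P @ np.diag(f(lam)) @ P.T

A = np.array([[1.5, 0.5], [0.5, 1.5]])    # the matrix of (7.38) below
print(matrix_function(A, np.exp))         # e^A
S = matrix_function(A, np.sqrt)
print(np.allclose(S @ S, A))              # True: a square root of A
```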


7.7.1 Powers of a Matrix

We have, in fact, had many occasions so far in this book to use the powers of a matrix. Thus, we define the square of a matrix by A^2 = AA, the cube by A^3 = AAA, etc. In general, if k is a positive integer, we define the k-th power of A as the matrix obtained by multiplying A with itself k times, that is,

A^k = AAA...A (k times).    (7.29)

If A is non-singular, we have defined its inverse A^{-1} as the matrix whose product with A gives the unit matrix. The negative powers of A are then similarly defined. If m is a negative integer, let k = -m, so that

A^m = (A^{-1})^k = A^{-1}A^{-1}...A^{-1} (k times).    (7.30)

Finally, in analogy with the functions of a variable, we define

A^0 = I.    (7.31)

Although all the integral powers of A have thus been defined in a straightforward manner, the actual evaluation may be tedious for large values of k. The calculation is considerably simplified by using the diagonalizability of A. For, taking the k-th power of A and using the second of equations (7.27), we have

A^k = (PΛP^{-1})(PΛP^{-1})...(PΛP^{-1}) (k times) = PΛ^kP^{-1}.    (7.32)

Similarly, if m = -k is a negative integer and A is non-singular, then

A^m = PΛ^mP^{-1} = P(Λ^{-1})^kP^{-1}.    (7.33)

Ex 7.7.1 Find A^k, where k is any integer, positive or negative, and

A = [ 4/3  √2/3
      √2/3 5/3 ].    (7.34)

Solution: The eigenvalues and the corresponding eigenvectors of A are found to be

(i) 1, (√2, -1)^T;  (ii) 2, (1, √2)^T.

We therefore have

P = [ √2 1; -1 √2 ],  P^{-1}AP = [ 1 0; 0 2 ] ≡ Λ.

The matrix A is seen to be non-singular. For any integer k, therefore, we have

A^k = PΛ^kP^{-1} = [ √2 1; -1 √2 ] [ 1 0; 0 2^k ] [ √2/3 -1/3; 1/3 √2/3 ]
    = (1/3) [ 2^k + 2, (2^k - 1)√2; (2^k - 1)√2, 2^{k+1} + 1 ].

Note that, in particular, A^0 = I. Also,

A^{50} = (1/3) [ 2^{50} + 2, (2^{50} - 1)√2; (2^{50} - 1)√2, 2^{51} + 1 ],
A^{-10} = (1/3) [ 2^{-10} + 2, (2^{-10} - 1)√2; (2^{-10} - 1)√2, 2^{-9} + 1 ].
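The closed form of Ex 7.7.1 can be checked numerically. A sketch (not from the book):

```python
# Sketch: checking the closed form for A^k against direct matrix powering.
import numpy as np

s2 = np.sqrt(2)
A = np.array([[4/3, s2/3], [s2/3, 5/3]])

k = 5
closed = (1/3) * np.array([[2**k + 2,        (2**k - 1) * s2],
                           [(2**k - 1) * s2,  2**(k + 1) + 1]])
print(np.allclose(np.linalg.matrix_power(A, k), closed))   # True
```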


7.7.2 Roots of a Matrix

In elementary algebra, we say that y is a k-th root of x if y^k = x. Similarly, we shall say that a matrix B is a k-th root of a matrix A if B^k = A. The object is to find all the matrices B which satisfy this relation for a given matrix A.

To begin with, consider a diagonal matrix Λ whose elements are given by (Λ)_ij = d_i δ_ij. It is evident that Λ^k is again a diagonal matrix whose diagonal elements are d_i^k, i.e., (Λ^k)_ij = d_i^k δ_ij. Now let p = 1/k and consider a diagonal matrix D whose elements are given by (D)_ij = d_i^p δ_ij. Clearly, the k-th power of D will equal Λ, that is, D^k = Λ. Then consider the matrix B = PDP^{-1} = PΛ^pP^{-1}. Taking the k-th power of B, we find

B^k = (PΛ^pP^{-1})(PΛ^pP^{-1})...(PΛ^pP^{-1}) (k times) = PΛP^{-1} = A.    (7.35)

Thus B = PΛ^pP^{-1} is a k-th root of A. The same result holds good for any fractional power. Thus, if q is any fraction, we have

A^q = PΛ^qP^{-1}.    (7.36)

Ex 7.7.2 Find A^{3/7}, where A is the matrix of equation (7.34).

Solution: In this case, the diagonalization of A is given by

P = [ √2 1; -1 √2 ],  P^{-1}AP = [ 1 0; 0 2 ] ≡ Λ.

Hence we have Λ^{3/7} = [ 1 0; 0 2^{3/7} ]. This gives

A^{3/7} = PΛ^{3/7}P^{-1} = (1/3) [ 2^{3/7} + 2, (2^{3/7} - 1)√2; (2^{3/7} - 1)√2, 2^{10/7} + 1 ].    (7.37)

It should be realized that the k-th root of a number is not unique; in fact, there are exactly k k-th roots of any number except zero. Similarly, the k-th root of a matrix is not unique. If a matrix A has m non-zero eigenvalues, there will be k^m matrices whose k-th power equals A.

Ex 7.7.3 Find all the square roots of the matrix

A = [ 3/2 1/2
      1/2 3/2 ].    (7.38)

Solution: The eigenvalues and the corresponding eigenvectors of the given matrix are found to be

(i) 2, (1, 1)^T;  (ii) 1, (1, -1)^T.

We therefore have

P = [ 1 1; 1 -1 ],  P^{-1} = (1/2) [ 1 1; 1 -1 ],  Λ = [ 2 0; 0 1 ].

The square roots of Λ are Λ^{1/2} = [ ±√2 0; 0 ±1 ]. We can choose any of the four matrices Λ^{1/2} to obtain A^{1/2}. Using the relation A^{1/2} = PΛ^{1/2}P^{-1}, we find that A has four square roots, given by ±B and ±C, where

B = [ (√2+1)/2 (√2-1)/2; (√2-1)/2 (√2+1)/2 ];  C = [ (√2-1)/2 (√2+1)/2; (√2+1)/2 (√2-1)/2 ].


Evaluation of functions using the Cayley-Hamilton theorem

The method discussed so far for evaluating the various functions of a matrix is based on the principle that if A = PΛP^{-1}, where Λ is diagonal, then f(A) = Pf(Λ)P^{-1}. It has, however, the major drawback that it is applicable to diagonalizable matrices only. There is an alternative method for evaluating the functions of a matrix which is based on the use of the Cayley-Hamilton theorem and which is applicable to any matrix.

We know that any polynomial in a matrix, of whatever degree, is equal to a polynomial of degree ≤ m - 1, where m is the degree of the minimal polynomial. The result, in fact, holds good not only for polynomials but also for any arbitrary function of a matrix, provided the function is sufficiently differentiable. Thus, if the degree of the minimal polynomial of a matrix A is m, any function f(A) can be expressed as a linear combination of the m linearly independent matrices I, A, A^2, ..., A^{m-1}, i.e.,

f(A) = r(A),    (7.39)

where r(A) = α_{m-1}A^{m-1} + α_{m-2}A^{m-2} + ... + α_1 A + α_0 I.    (7.40)

The scalars α_i are determined as follows. If λ_i is a k-fold degenerate eigenvalue of A, the algebraic functions f(λ) and r(λ) satisfy the k equations

f(λ_i) = r(λ_i),  df(λ_i)/dλ = dr(λ_i)/dλ,  d^2f(λ_i)/dλ^2 = d^2r(λ_i)/dλ^2,  ...,  d^{k-1}f(λ_i)/dλ^{k-1} = d^{k-1}r(λ_i)/dλ^{k-1}.    (7.41)

Here the notation d^l f(λ_i)/dλ^l denotes the l-th derivative of f(λ) evaluated at λ = λ_i. We shall now use this method to solve some of the problems discussed earlier in this chapter and also apply it to non-diagonalizable matrices.
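For distinct eigenvalues, the conditions (7.41) reduce to a small linear system for the α's. A sketch (not from the book), applied to the matrix of (7.34):

```python
# Sketch: the interpolation method (7.39)-(7.41) for f(A) = A^p,
# where A has the distinct eigenvalues 1 and 2.
import numpy as np

s2 = np.sqrt(2)
A = np.array([[4/3, s2/3], [s2/3, 5/3]])
p = 0.5                                     # any real power

# solve f(lam) = alpha1*lam + alpha0 at lam = 1 and lam = 2
M = np.array([[1.0, 1.0], [2.0, 1.0]])      # rows: [lam, 1]
alpha1, alpha0 = np.linalg.solve(M, np.array([1.0**p, 2.0**p]))

fA = alpha1 * A + alpha0 * np.eye(2)        # r(A) = f(A)
print(np.allclose(fA @ fA, A))              # for p = 1/2: a square root of A
```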

Ex 7.7.4 Find A^p, where p is any number and A is the matrix of equation (7.34).

Solution: The matrix A is of order 2 and has distinct eigenvalues λ_1 = 1, λ_2 = 2. The degree of the minimal polynomial is therefore m = 2. We have f(A) = A^p, so that f(λ) = λ^p. Let r(A) = α_1 A + α_0 I, so that r(λ) = α_1 λ + α_0. Since both the eigenvalues are non-degenerate, we have the two conditions

f(λ_1) = r(λ_1),  f(λ_2) = r(λ_2),    (7.42)

which give 1 = α_1 + α_0 and 2^p = 2α_1 + α_0. The solution is found to be α_1 = 2^p - 1, α_0 = 2 - 2^p. Putting these back in r(A), we have f(A) = r(A) = α_1 A + α_0 I, i.e.,

A^p = (2^p - 1) [ 4/3 √2/3; √2/3 5/3 ] + (2 - 2^p) [ 1 0; 0 1 ] = (1/3) [ 2^p + 2, (2^p - 1)√2; (2^p - 1)√2, 2^{p+1} + 1 ].

If p is non-integral, this will not be the only matrix equal to A^p.


Ex 7.7.5 Find all the square roots of the matrix of equation (7.38).

Solution: The matrix A is of order 2 with distinct eigenvalues λ_1 = 2, λ_2 = 1; hence m = 2. We have f(A) = A^{1/2}, f(λ) = λ^{1/2}; let r(A) = α_1 A + α_0 I, so that r(λ) = α_1 λ + α_0. The coefficients α_1 and α_0 satisfy the conditions

f(λ_1) = r(λ_1),  f(λ_2) = r(λ_2),

or, ±√2 = 2α_1 + α_0,  ±1 = α_1 + α_0,

where all four sign combinations are valid. These give the four sets of solutions

α_1 = ±(√2 - 1), α_0 = ±(2 - √2);  α_1 = ±(√2 + 1), α_0 = ∓(2 + √2),

where we take either the upper signs or the lower signs in each pair. Using these in the equation A^{1/2} = α_1 A + α_0 I, we get the four square roots ±B and ±C, where B and C are given by

B = [ (√2+1)/2 (√2-1)/2; (√2-1)/2 (√2+1)/2 ];  C = [ (√2-1)/2 (√2+1)/2; (√2+1)/2 (√2-1)/2 ].

7.8 Series

A series such as

S = Σ_{k=0}^{∞} a_k A^k    (7.43)

in a matrix A, where the a_k are scalar coefficients, is said to converge if and only if every element of the right-hand side converges. In that case, the series of equation (7.43) is equal to the matrix f(A), of the same order as A, whose elements are given by

[f(A)]_ij = Σ_{k=0}^{∞} a_k (A^k)_ij.    (7.44)

We shall state without proof the result that a series f(A) in a matrix A is convergent if and only if the corresponding algebraic series f(λ_i) is convergent for every eigenvalue λ_i of A. Thus, if

f(λ) = Σ_{k=0}^{∞} a_k λ^k    (7.45)

exists for |λ| < R, then

f(A) = Σ_{k=0}^{∞} a_k A^k    (7.46)

exists if and only if every eigenvalue λ_i of A satisfies |λ_i| < R. R is called the radius of convergence of the series.

7.8.1 Exponential of a Matrix

The exponential series is defined by

e^λ = Σ_{k=0}^{∞} λ^k/k!,    (7.47)

and is convergent for every finite value of λ. Similarly, we shall define the exponential of a matrix A by

e^A = Σ_{k=0}^{∞} A^k/k!,    (7.48)

which will be a matrix of the same order as A and will exist for every finite square matrix A, because all of its elements (and hence its eigenvalues) are finite.

To begin with, let us obtain the exponential of a diagonal matrix. Let Λ be a diagonal matrix with elements (Λ)_ij = λ_i δ_ij. Then

e^Λ = Σ_{k=0}^{∞} Λ^k/k!.    (7.49)

The ij-element of exp(Λ) will be given by

[exp(Λ)]_ij = Σ_{k=0}^{∞} (Λ^k)_ij/k! = Σ_{k=0}^{∞} λ_i^k δ_ij/k! = e^{λ_i} δ_ij.    (7.50)

It is therefore evident that if

Λ = diag(λ_1, λ_2, ..., λ_n),    (7.51)

then

exp(Λ) = diag(exp(λ_1), exp(λ_2), ..., exp(λ_n)).    (7.52)

Now consider the series

exp(A) = I + A + A^2/2! + A^3/3! + ... + A^k/k! + ...    (7.53)

Let P be a matrix which brings A to the diagonal form Λ. Multiplying equation (7.53) from the left by P^{-1} and from the right by P, and remembering that P^{-1}A^kP = Λ^k, we have

P^{-1}(exp(A))P = I + Λ + Λ^2/2! + Λ^3/3! + ... + Λ^k/k! + ... = exp(Λ).    (7.54)

It follows immediately that

exp(A) = P(exp(Λ))P^{-1}.    (7.55)

Having defined the exponential function, it is possible to define the matrix exponent of any number. In elementary algebra, we have a^x = exp(x ln a), where ln denotes the natural logarithm. Similarly, we define

a^A = exp(A ln a) = P exp(Λ ln a) P^{-1},    (7.56)

where

exp(Λ ln a) ≡ a^Λ = diag(a^{λ_1}, a^{λ_2}, ..., a^{λ_n}).    (7.57)

Ex 7.8.1 Find e^A and 4^A if A is the matrix of equation (7.38).

Solution: The matrices P and Λ are given by

P = [ 1 1; 1 -1 ],  P^{-1} = (1/2) [ 1 1; 1 -1 ],  Λ = [ 2 0; 0 1 ].

Thus we have

e^Λ = [ e^2 0; 0 e ],  4^Λ = [ 4^2 0; 0 4 ] = [ 16 0; 0 4 ].    (7.58)

Therefore,

e^A = Pe^ΛP^{-1} = (1/2) [ e^2 + e, e^2 - e; e^2 - e, e^2 + e ],

and 4^A = P4^ΛP^{-1} = [ 10 6; 6 10 ].
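The result of Ex 7.8.1 can be cross-checked against a general-purpose matrix exponential. A sketch (not from the book; it requires SciPy):

```python
# Sketch: checking e^A from Ex 7.8.1 against scipy.linalg.expm.
import numpy as np
from scipy.linalg import expm

A = np.array([[1.5, 0.5], [0.5, 1.5]])
e, e2 = np.e, np.e**2
closed = 0.5 * np.array([[e2 + e, e2 - e],
                         [e2 - e, e2 + e]])
print(np.allclose(expm(A), closed))    # True
```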

7.8.2 Logarithm of a Matrix

We say that x is the natural logarithm of y if e^x = y, and write x = ln y. Similarly, given a matrix A, we shall say that a matrix B is the natural logarithm of A if e^B = A. Therefore, by definition,

B = ln A ⇔ exp(B) = A.    (7.59)

We shall first find the logarithm of a diagonal matrix. If Λ ≡ [λ_i δ_ij] is a diagonal matrix, its logarithm is the diagonal matrix of the same order given by

D ≡ ln Λ = diag(ln λ_1, ln λ_2, ..., ln λ_n).    (7.60)

To prove this, consider exp(D). Remembering that exp(ln x) = x and using equation (7.52), it is easy to see that

exp(D) = exp(ln Λ) = Λ.    (7.61)

Therefore, by definition, D is the natural logarithm of Λ. Now let A be a diagonalizable matrix as in equations (7.27), and set B = PDP^{-1}. We have

e^B = I + B + B^2/2! + ... + B^k/k! + ...
    = I + PDP^{-1} + (PDP^{-1})^2/2! + ... + (PDP^{-1})^k/k! + ...
    = P[I + D + D^2/2! + ... + D^k/k! + ...]P^{-1}
    = Pe^DP^{-1} = PΛP^{-1} = A.    (7.62)

Hence B = P(ln Λ)P^{-1} is a natural logarithm of A.


Ex 7.8.2 Find the logarithm of the matrix

A = [ 39 -50 -20; 15 -16 -10; 30 -50 -11 ].    (7.63)

Solution: For the given matrix A, the matrices P, Λ and P^{-1} are found to be

P = [ 3 1 2; 1 1 1; 2 -1 2 ],  Λ = [ 9 0 0; 0 9 0; 0 0 -6 ],  P^{-1} = (1/3) [ 3 -4 -1; 0 2 -1; -3 5 2 ].

We have

ln Λ = [ ln 9 0 0; 0 ln 9 0; 0 0 ln 6 + iπ ].

Therefore, ln A is given by

ln A = P(ln Λ)P^{-1} = (1/3) [ 9a - 6b, 10(b - a), 4(b - a); 3(a - b), 5b - 2a, 2(b - a); 6(a - b), 10(b - a), 4b - a ],

where a = ln 9, b = ln(-6) = ln 6 + iπ.

7.9 Hyperbolic and Trigonometric Functions

The exponential function also leads to the hyperbolic and the trigonometric functions of a matrix. Thus, for any square matrix A, we define the hyperbolic functions by

sinh A = (1/2)(e^A - e^{-A}) = Σ_{k=0}^{∞} A^{2k+1}/(2k + 1)!,    (7.64)

cosh A = (1/2)(e^A + e^{-A}) = Σ_{k=0}^{∞} A^{2k}/(2k)!,    (7.65)

which exist for any matrix A with finite elements. Similarly, the trigonometric functions are defined by

sin A = (1/2i)(e^{iA} - e^{-iA}) = Σ_{k=0}^{∞} (-1)^k A^{2k+1}/(2k + 1)!,    (7.66)

cos A = (1/2)(e^{iA} + e^{-iA}) = Σ_{k=0}^{∞} (-1)^k A^{2k}/(2k)!,    (7.67)

which also exist for any matrix A with finite elements.
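These definitions can be exercised numerically via (7.28). A sketch (not from the book), which also checks the familiar identity sin^2 A + cos^2 A = I:

```python
# Sketch: sin and cos of a symmetric matrix through its eigendecomposition.
import numpy as np

A = np.array([[1.5, 0.5], [0.5, 1.5]])
lam, P = np.linalg.eigh(A)

sinA = P @ np.diag(np.sin(lam)) @ P.T
cosA = P @ np.diag(np.cos(lam)) @ P.T
print(np.allclose(sinA @ sinA + cosA @ cosA, np.eye(2)))   # True
```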

Ex 7.9.1 Find sin πA and cos πA, where

A = [ -47/2 53 30; -12 53/2 15; 2 -7/2 -2 ].    (7.68)

Solution: The matrices P, Λ and P^{-1} associated with the given matrix A are found to be

P = [ 5 8 1; 0 2 1; 4 3 -1 ],  Λ = [ 1/2 0 0; 0 1 0; 0 0 -1/2 ],  P^{-1} = [ 5 -11 -6; -4 9 5; 8 -17 -10 ].

We therefore have

sin πΛ = Σ_{k=0}^{∞} (-1)^k (πΛ)^{2k+1}/(2k + 1)! = [ sin(π/2) 0 0; 0 sin π 0; 0 0 -sin(π/2) ] = [ 1 0 0; 0 0 0; 0 0 -1 ].

Therefore,

sin πA = P sin(πΛ) P^{-1} = [ 17 -38 -20; -8 17 10; 28 -61 -34 ].

Similarly,

cos πΛ = [ 0 0 0; 0 -1 0; 0 0 0 ],

so that

cos πA = P cos(πΛ) P^{-1} = [ 32 -72 -40; 8 -18 -10; 12 -27 -15 ].

Exercise 7

Section-A [Multiple Choice Questions]

1. The characteristic equation of the matrix [ 1 2; -1 3 ] is
(a) λ^2 - 2λ + 2 = 0 (b) λ^3 - 4λ + 6 = 0 (c) λ^2 - 4λ + 4 = 0 (d) λ^2 - 4λ + 6 = 0

2. If A = [ 1 -1; 2 2 ], then the value of the matrix expression A^2 - 3A + 3I is
(a) 0 (b) 2I (c) -I (d) -A

3. The eigenvalues of the matrix [ 1 2 4 2; 0 3 1 5; 0 0 0 1; 0 0 0 2 ] are
(a) 1, 3, 1, 2 (b) 1, 3, 0, 2 (c) 1, 2, 4, 2 (d) 1, 0, 0, 0

4. The sum of the eigenvalues of A = [ 1 2 3; 0 2 3; 0 0 2 ] is [WBUT 2007]
(a) 5 (b) 2 (c) 1 (d) 6

5. The sum of the eigenvalues of the matrix [ 1 2 3; 4 5 6; 2 1 1 ] is
(a) 7 (b) 6 (c) 4 (d) 5


6. The product of the eigenvalues of the matrix [ 1 2 5; 0 3 0; 0 0 -4 ] is
(a) 0 (b) -12 (c) 12 (d) 10

7. If λ is an eigenvalue of the matrix A, then one eigenvalue of the matrix 5A is
(a) 2λ (b) λ (c) λ^{-1} (d) 5λ

8. If λ is an eigenvalue of A, then one eigenvalue of A^3 is
(a) λ (b) λ^2 (c) λ^3 (d) λ^k

9. Let A = [ 1 -1; 0 2 ]. The eigenvalues of A^5 are
(a) 1, 2 (b) 1, -1 (c) 1, 32 (d) 1, 10

10. If 2 is an eigenvalue of a non-singular matrix A, then 1/2 is an eigenvalue of
(a) A^2 (b) A^3 (c) 2A (d) A^{-1}

11. If a matrix A satisfies the relation A^2 = A, then the eigenvalues of A are
(a) 0, 1 (b) 0, -1 (c) -1, 1 (d) 1, 2

12. The eigenvalues of the matrix [ 1 2 3; 2 4 5; 3 5 6 ] are all
(a) zero (b) real (c) imaginary (d) real/imaginary

13. The eigenvectors of the matrix [ 1 1; 1 3 ] are
(a) (-1, 1)^T (b) (1, 3)^T (c) (-2, 2)^T, (1, -1)^T (d) (-1, 1)^T, (1, 3)^T

14. If 1, 5 are the eigenvalues of the matrix A and P diagonalises it, then P^{-1}AP is
(a) [ 1 5; 0 0 ] (b) [ 1 0; 0 5 ] (c) [ 1 0; 5 0 ] (d) [ 2 0; 0 10 ]

15. If 2 is an eigenvalue of the matrix A of order 2 × 2, then the rank of the matrix A - 2I is
(a) 0 (b) 2 (c) 1 (d) none of these

16. The algebraic multiplicity of the eigenvalue 1 of the matrix A = [ 1 2 0; 0 2 0; 0 0 1 ] is
(a) 0 (b) 1 (c) 2 (d) 3

17. If A is a real orthogonal matrix, then the magnitude of its eigenvalues is
(a) 1 (b) 2 (c) 3 (d) 1/2

18. The characteristic roots of a real skew-symmetric matrix are
(a) all real (b) all zero (c) all imaginary (d) either all zeros or purely imaginary


19. The characteristic roots of the matrix A = [ 8 -6 2; -6 7 -4; 2 -4 3 ] are
(a) 0, 3, 7 (b) 0, 5, 15 (c) 0, 3, 15 (d) 1, 3, 7

20. If the characteristic values of a square matrix of third order are 4, 2, 3, then the value of its determinant is
(a) 6 (b) 9 (c) 24 (d) 54

21. If λ is a non-zero characteristic root of a non-singular matrix A, then a characteristic root of A^{-1} is
(a) |A|λ (b) |A|/λ (c) 1/λ (d) 1/(|A|λ)

22. The characteristic equation of the matrix A = [ 1 0 2; 0 2 1; 2 0 3 ] is
(a) λ^3 - 6λ^2 + 5λ - 3 = 0 (b) λ^3 + 6λ^2 - 7λ - 2 = 0 (c) -λ^3 + 6λ^2 + 7λ + 3 = 0 (d) -λ^3 + 6λ^2 - 7λ - 2 = 0

23. For the matrix A = [ 0 1; 1 0 ], A^{-1} is equal to
(a) I (b) A (c) 2A (d) (1/2)A

24. Suppose the matrix A = [ 40 -29 -11; -18 30 -12; 26 24 -50 ] has a certain complex number λ ≠ 0 as an eigenvalue. Which of the following numbers must also be an eigenvalue of A? [NET(Dec)11]
(a) λ + 20 (b) λ - 20 (c) 20 - λ (d) -20 - λ

25. Let A be a 3 × 3 matrix with trace(A) = 3 and det(A) = 2. If 1 is an eigenvalue of A, then the eigenvalues of A^2 - 2I are
(a) 1, 2(i - 1), -2(i + 1) (b) -1, 2(i - 1), 2(i + 1) (c) 1, 2(i + 1), -2(i + 1) (d) -1, 2(i - 1), -2(i + 1)

26. Let A, B be n × n positive definite matrices and I be the n × n identity matrix. Then which of the following are positive definite? [NET(June)11]
(a) A + B (b) ABA (c) A^2 + I (d) AB

27. Let N be a 3 × 3 non-zero matrix with the property N^3 = 0. Which of the following is/are true? [NET(June)11]
(a) N is not similar to a diagonal matrix (b) N is similar to a diagonal matrix (c) N has one non-zero eigenvector (d) N has three linearly independent eigenvectors

28. Let ω be a complex number such that ω^3 = 1 but ω ≠ 1. If

A = [ 1 ω ω^2; ω ω^2 1; ω^2 ω 1 ],

then which of the following statements are true? [NET(Dec)11]
(a) A is invertible (b) rank(A) = 2 (c) 0 is an eigenvalue of A (d) there exist linearly independent vectors v, w ∈ C^3 such that Av = Aw = 0


29. Let A = [ 0 0 0 -4; 1 0 0 0; 0 1 0 5; 0 0 1 0 ]. Then a Jordan canonical form of A is given by [NET(Dec)11]
(a) [ -1 0 0 0; 0 1 0 0; 0 0 2 0; 0 0 0 -2 ] (b) [ -1 1 0 0; 0 1 0 0; 0 0 2 0; 0 0 0 -2 ] (c) [ 1 1 0 0; 0 1 0 0; 0 0 2 0; 0 0 0 -2 ] (d) [ -1 1 0 0; 0 -1 0 0; 0 0 2 0; 0 0 0 -2 ]

30. Which of the following matrices are positive definite? [NET(June)12]
(a) [ 2 1; 1 2 ] (b) [ 1 2; 2 1 ] (c) [ 4 -1; -1 4 ] (d) [ 0 4; 4 0 ]

31. Let A be a non-zero linear transformation on a real vector space V of dimension n. Let the subspace V_0 ⊂ V be the image of V under A. Let k = dim V_0 < n and suppose that for some λ ∈ ℝ, A^2 = λA. Then [NET(June)12]
(a) λ = 1 (b) det A = |λ|^n (c) λ is the only eigenvalue of A (d) there is a non-trivial subspace V_1 ⊂ V such that Ax = 0 for all x ∈ V_1

32. Let N be a non-zero 3 × 3 matrix with the property N^2 = 0. Which of the following is/are true? [NET(June)12]
(a) N is not similar to a diagonal matrix (b) N is similar to a diagonal matrix (c) N has one non-zero eigenvector (d) N has three linearly independent eigenvectors

33. Let A be a 2 × 2 non-zero matrix with entries in C such that A^2 = 0. Which of the following statements must be true? [NET(Dec)11]
(a) PAP^{-1} is diagonal for some invertible 2 × 2 matrix P with entries in ℝ (b) A has two distinct eigenvalues in C (c) A has only one eigenvalue in C, with multiplicity 2 (d) Av = v for some v ∈ C^2, v ≠ 0

34. Let λ, µ be distinct eigenvalues of a 2 × 2 matrix A. Then which of the following statements must be true? [NET(Dec)11]
(a) A^2 has distinct eigenvalues (b) A^3 = ((λ^3 - µ^3)/(λ - µ))A - λµ(λ + µ)I (c) the trace of A^n is λ^n + µ^n for every positive integer n (d) A^n is not a scalar multiple of the identity for any positive integer n

35. Let A = [ 0 0 0 -4; 1 0 0 5; 0 1 0 5; 0 0 1 0 ]. Then the Jordan canonical form of A is [NET(Dec)11]
(a) [ -1 0 0 0; 0 1 0 0; 0 0 2 0; 0 0 0 -2 ] (b) [ -1 1 0 0; 0 1 0 0; 0 0 2 0; 0 0 0 -2 ] (c) [ 1 1 0 0; 0 1 0 0; 0 0 2 0; 0 0 0 -2 ] (d) [ -1 1 0 0; 0 -1 0 0; 0 0 2 0; 0 0 0 -2 ]

36. Let A be a 3 × 3 matrix with real entries such that det(A) = 6 and the trace of A is 0. If det(A + I) = 0, then the eigenvalues of A are [NET(Dec)11]
(a) -1, 2, 3 (b) -1, 2, -3 (c) 1, 2, -3 (d) -1, -2, 3

37. Let A be a 4 × 4 matrix with real entries such that -1, 1, 2, -2 are its eigenvalues. If B = A^4 - 5A^2 + 5I, then which of the following statements are true? [NET(Dec)11]
(a) det(A + B) = 0 (b) det(B) = 1 (c) trace of A - B is 0 (d) trace of A + B is 4


38. Let J be the 3 × 3 matrix all of whose entries are 1. Then [NET(Dec)11]
(a) 0 and 3 are the only eigenvalues of J (b) J is positive semidefinite, i.e., ⟨Jx, x⟩ ≥ 0 for all x ∈ ℝ^3 (c) J is diagonalizable (d) J is positive definite, i.e., ⟨Jx, x⟩ > 0 for all x ∈ ℝ^3 with x ≠ 0

Section-B [Objective Questions]

1. Show that the eigenvalues of a Hermitian matrix are all real.

2. Let A = R(θ) = [ cos θ -sin θ; sin θ cos θ ]. Show that A does not have any real eigenvalue unless A = ±I.

3. Verify the Cayley-Hamilton theorem for the square matrix A = [ 2 1; 0 5 ].

4. Using the Cayley-Hamilton theorem, find the inverse of the matrix A = [ 3 -5; -1 2 ].

5. What are the possible eigenvalues of a square matrix A (over the field ℝ) satisfying A^3 = A?

6. Let V be the vector space of all real differentiable functions over the field of reals and D : V → V be the differential operator. Is every non-zero real an eigenvalue of D? Support your answer.

Section-C [Long Answer Questions]

1. Prove that the spectrum of [ 1 0 2; 0 -1 1; 0 -1 0 ] is 1, ω, ω^2.

2. Let A be an n × n normal matrix. Show that Ax = λx if and only if A*x = λ̄x. If A has, in addition, n distinct real eigenvalues, show that A is Hermitian. [Gate'96]

3. Let A be a 6 × 6 diagonal matrix with characteristic polynomial x(x + 1)^2(x - 1)^3. Find the dimension of γ, where γ = {B ∈ M_6(ℝ) : AB = BA}. [Gate'97]

4. The eigenvalues of a 3 × 3 real matrix P are 1, -2, 3; show that P^{-1} = (1/6)(5I + 2P - P^2). [Gate'96]

5. Show that the characteristic polynomial of αI + βA, in terms of that of A, is (λ - α)^n or β^n χ_A((λ - α)/β) according as β = 0 or not.

6. Let λ_1 and λ_2 be two distinct eigenvalues of a real square matrix A. If u and v are eigenvectors of A corresponding to the eigenvalues λ_1 and λ_2 respectively, examine whether u + v is an eigenvector of A.

7. Find the distinct eigenvalues of U, where U is a 3 × 3 complex Hermitian and unitary matrix. [Gate'96]

8. Let V be the vector space of all real differentiable functions over the field of reals and D : V → V be the differential operator. Is every non-zero real an eigenvalue of D? Support your answer. [BH'06]


9. Let A be a 3 × 3 matrix with eigenvalues 1, -1, 0.

10. Find the cubic equation which is satisfied by the matrix A = [ 1 2 1; 0 1 -1; 3 -1 1 ].

11. Use the Cayley-Hamilton theorem to find A^{-1}, where A = [ 2 1; 3 5 ]. [SET 10, BH 00]

12. State the Cayley-Hamilton theorem and verify the same for the matrix

(i) [ 2 1; -1 1 ] BH'98, (iii) [ 2 1; 0 3 ] BH'00, (iv) [ 3 -5; -1 2 ] BH'06,
(v) [ 1 3 6; -3 -5 -6; 3 3 4 ] VH'02, (vi) [ 1 3 -3; 3 -5 3; 6 -6 4 ] CH'98, (vii) [ 0 1 -1; -1 0 2; 1 1 0 ] CH'03.

Also express A^{-1} as a polynomial in the matrix A, and find A^{-1} using the Cayley-Hamilton theorem.

13. Using the Cayley-Hamilton theorem find A^{50}, for A = [ 1 0 0; 1 0 1; 0 1 0 ]. BH'04

14. Find the eigenvalues and eigenvectors of the matrix

(i) [ 3 2 2; 1 4 1; -2 -4 1 ] BH'98, (ii) [ 1 1 1; -1 -1 -1; 0 0 1 ] BH'99, (iii) [ 8 -6 2; -6 7 -4; 2 -4 3 ] BH'00,
(iv) [ 1 -3 3; 3 -5 3; 6 -6 4 ] BH'03, (v) [ 2 0 0; 0 3 0; 0 0 5 ] VH'05, (vi) [ -1 3 0; 3 -2 0; 0 0 1 ] BH'05,
(vii) [ 6 -2 2; -2 3 -1; 2 -1 3 ] CH'05, (viii) [ 3 0 3; 0 3 0; 3 0 3 ] CH'95, (ix) [ 0 0 1; 0 1 0; 1 0 0 ] CH'97,
(x) [ 1 -2 2; 2 -2 4; 3 -3 6 ] CH'00, (xi) [ 2 -2 3; 1 1 1; 1 3 -1 ] CH'02.

Also find the eigenvalues of A^{-1}.

15. Find matrices P and Q such that P [ 1 1 1; 1 -1 -1; 3 1 1 ] Q is in normal form. CH'04

16. Let (i) A = [ 0 2 4; 2 0 6; 4 6 0 ] BH'98, (ii) A = [ 1 2 1; 2 5 -2; 1 -2 17 ] CH'97, VH'02, (iii) A = [ 2 1 1; 1 3 1; 1 2 2 ] CH'99, (iv) A = [ 3 2 1; 2 3 1; 0 0 1 ] CH'01;
find a non-singular matrix P such that PAP^T is a diagonal matrix.

17. Find an orthogonal matrix P such that P^{-1}AP is a diagonal matrix, where

A = (i) [ 2 -2; -2 5 ], (ii) [ 2 -2 0; -2 1 -2; 0 -2 0 ] [BH 02], (iii) [ 5 -6 -6; -1 4 2; 3 -6 -4 ]. [JU(M.Sc.)'06]


18. Obtain a non-singular transformation that will reduce the quadratic form x_1^2 + 2x_2^2 + x_3^2 - 2x_1x_2 - 2x_2x_3 to the normal form. BU(M.Sc)'99

19. Verify that the quadratic form 5x_1^2 + 26x_2^2 + 10x_3^2 + 4x_2x_3 + 14x_3x_1 + 6x_1x_2 is positive semi-definite. BU(M.Sc)'03

20. Diagonalise the matrix (i) [ 2 0; 1 3 ] BH'99, (ii) [ 3 -1; -1 3 ] BH'00, (iii) [ 2 -2 0; -2 1 -2; 0 -2 0 ] BH'04, '05, (iv) [ 4 2; 3 -1 ].

21. Show that the matrix [ 2 1; 0 2 ] is not diagonalisable. CH'98

22. Verify that the matrix A = [ -11 -7 -5; 16 11 6; 12 6 7 ] has eigenvalues 1, 3, 3. Find a non-singular matrix C with initial entry C_11 = 1 that transforms A to the Jordan canonical form

C^{-1}AC = [ 1 0 0; 0 3 1; 0 0 3 ].

23. Find an orthogonal transformation x = Py to reduce the quadratic form q = x^T A x = x_1^2 + 4x_1x_2 + x_2^2 - x_3^2 on ℝ^3 to a diagonal form y^T D y, where the diagonal elements of D are the eigenvalues of A. Hence find the signature and determine the nature of the definiteness of q. BU(M.Sc.)'02

24. Let T : V → V be a linear operator on a finite dimensional vector space V over the field K and let p(t) be the minimal polynomial of T. If T is diagonalizable, show that p(t) = (t - λ_1)(t - λ_2)...(t - λ_r) for some distinct scalars λ_1, λ_2, ..., λ_r. Gate'02

25. Let J_n be the n × n matrix each of whose entries equals 1. Find the nullity and the characteristic polynomial of J_n. Gate'03

26. Determine all possible Jordan canonical forms J for a matrix of order 5 whose minimal polynomial is m(t) = (t - 2)^2.

27. If A is a complex 5 × 5 matrix with characteristic polynomial f(t) = (t - 2)^2(t + 7)^2 and minimal polynomial m(t) = (t - 2)^2(t + 7), write the Jordan canonical form of A. BU(M.Sc)'03

28. Find a matrix whose minimal polynomial is (x - 1)(x - 2).

29. Find the quadratic form to which x_1^2 + 2x_2^2 - x_3^2 + 2x_1x_2 + x_2x_3 transforms by the change of variables y_1 = x_1 - x_3, y_2 = x_2 - x_3, y_3 = x_3 by actual substitution. Verify that the matrix of the resulting quadratic form is congruent to the matrix of the original quadratic form.

30. Show that the quadratic form x_1x_2 + x_2x_3 + x_3x_1 can be reduced to the canonical form y_1^2 - y_2^2 - y_3^2 by means of the transformation x_1 = y_1 - y_2 - y_3, x_2 = y_1 + y_2 - y_3, x_3 = y_3. BU(M.Sc.)'98

31. Reduce the quadratic form 2x_1x_3 + x_2x_3 to diagonal form.


32. Find the Jordan normal form of the matrix A over the field of real numbers, where A = [ 4 -1 1; 4 0 2; 2 -1 3 ]. BU(M.Sc)'99

Answer

Section-A [Multiple Choice Questions]

1. d  2. c  3. b  4. a  5. a  6. b  7. d  8. c  9. c  10. d  11. a  12. b  13. a  14. b  15. c  16. c  17. a


Chapter 8

Boolean Algebra

In this chapter we shall adopt the definition of a modern abstract mathematical structure known as Boolean algebra, introduced by the famous mathematician George Boole; we give the axiomatic definition of Boolean algebra due to Huntington [1904]. This algebra became an essential tool for the analysis and design of electronic computers, dial telephones, switching systems and many kinds of electronic control devices.

8.1 Operation

To describe a modern concept like set algebra, we shall first define what is called an operation.

8.1.1 Unary Operation

A unary operation on a set of elements is a rule which assigns to every element in the set another element from the set. For example, if S is the set of positive real numbers, the function 'square root' is a unary operation which assigns to each a ∈ S the element √a.

8.1.2 Binary Operation

A binary operation on a set of elements is a rule which assigns a unique element from the set to each pair of elements from the set. For example, ordinary addition, subtraction and multiplication over the set of real numbers are examples of binary operations.

8.2 Boolean Algebra

Boolean algebra can be defined either as an algebraic system or a special lattice.

8.2.1 Boolean Algebra as a Lattice

A Boolean algebra B is a complemented distributive lattice. Equivalently, a lattice B is a Boolean algebra if

(i) B is bounded with bounds 0 and 1.

(ii) every element a ∈ B has a complement a′ satisfying a ∨ a′ = 1 and a ∧ a′ = 0.

(iii) a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c) and a ∨ (b ∧ c) = (a ∨ b) ∧ (a ∨ c) for all a, b, c ∈ B.


8.2.2 Boolean Algebra as an Algebraic System

A non-empty set B of elements a, b, c, . . . on which two binary operators ‘+’ (called addi-tion) and ‘·’ (called multiplication) and one unary operator ‘′’ (called complementation) aredefined, is said to be a boolean algebra B,+, ·,′ , if the following postulates are satisfied :P1 : Closure Property:

(i) Closure with respect to ‘+’, i.e., for all a, b ∈ B we have a + b ∈ B.

(ii) Closure with respect to ‘·’, i.e., for all a, b ∈ B we have a · b ∈ B.

P2 : Commutative law:

(i) The operator ‘+’ is commutative, i.e., a + b = b + a, ∀a, b ∈ B.

(ii) The operator ‘·’ is commutative, i.e., a · b = b · a, ∀a, b ∈ B.

P3 : Existence of identity:

(i) ∀a ∈ B, ∃ an identity element 0 ∈ B such that a + 0 = 0 + a = a.

(ii) ∀a ∈ B, ∃ an identity element 1 ∈ B such that a · 1 = 1 · a = a.

P4 : Distributive law: Each of the operations + and · is distributive over the other, i.e.,

a · (b + c) = a · b + a · c; (·) over (+), ∀a, b, c ∈ B,
a + (b · c) = (a + b) · (a + c); (+) over (·), ∀a, b, c ∈ B.

P5 : Existence of complement: For every element a ∈ B, ∃ an element a′ ∈ B (called the complement of a) such that

a + a′ = a′ + a = 1, the identity element for ·,
a · a′ = a′ · a = 0, the identity element for +.

These axioms were given by Huntington [1904]. A Boolean algebra is generally denoted by a 6-tuple (B, +, ·, ′, 0, 1) or simply by (B, +, ·, ′).

(i) Notice that 0′ = 1 and 1′ = 0, for, by P3 and P5, we have 1 + 0 = 1 and 1 · 0 = 0. Thus the identity elements are complementary to each other.

(ii) A trivial Boolean algebra is given by ({0}, +, ·, ′). Here 0 + 0 = 0, 0 · 0 = 0, 0′ = 0 and 1 = 0. All the axioms P1–P5 hold trivially, with both sides equal to 0.

Ex 8.2.1 Let B = {1, 2, 3, 5, 6, 10, 15, 30}, the set of all positive divisors of 30. For a, b ∈ B, let the binary and unary operations on B be defined as

(i) a + b = the LCM of a and b,

(ii) a · b = the GCD of a and b,

(iii) a′ = 30/a.

Prove that (B, +, ·, ′) is a Boolean algebra.


Solution: We have the following composition tables:

 +  |  1   2   3   5   6  10  15  30 |  ′
 1  |  1   2   3   5   6  10  15  30 | 30
 2  |  2   2   6  10   6  10  30  30 | 15
 3  |  3   6   3  15   6  30  15  30 | 10
 5  |  5  10  15   5  30  10  15  30 |  6
 6  |  6   6   6  30   6  30  30  30 |  5
10  | 10  10  30  10  30  10  30  30 |  3
15  | 15  30  15  15  30  30  15  30 |  2
30  | 30  30  30  30  30  30  30  30 |  1

 ·  |  1   2   3   5   6  10  15  30
 1  |  1   1   1   1   1   1   1   1
 2  |  1   2   1   1   2   2   1   2
 3  |  1   1   3   1   3   1   3   3
 5  |  1   1   1   5   1   5   5   5
 6  |  1   2   3   1   6   2   3   6
10  |  1   2   1   5   2  10   5  10
15  |  1   1   3   5   3   5  15  15
30  |  1   2   3   5   6  10  15  30

P1 : From the composition tables we see that B is closed under +, · and ′.

P2 : Both operations are commutative, since the LCM of a and b is the LCM of b and a, and the GCD of a and b is the GCD of b and a; hence a + b = b + a and a · b = b · a.

P3 : 1 is the identity element for +, since a + 1 = 1 + a = a, ∀a ∈ B; similarly, 30 is the identity element for ·, since a · 30 = 30 · a = a, ∀a ∈ B.

P4 : Each operation is distributive over the other, since, with the help of the properties of LCM and GCD, we get

a · (b + c) = a · b + a · c and a + b · c = (a + b) · (a + c), ∀a, b, c ∈ B.

P5 : For the element 3, for instance,

3 + 3′ = 3 + 30/3 = 3 + 10 = 30, the identity element for ·,
3 · 3′ = 3 · (30/3) = 3 · 10 = 1, the identity element for +.

Thus for any element a ∈ B, ∃ a′ ∈ B such that a + a′ = 30, the identity element for ·, and a · a′ = 1, the identity element for +.

Hence (B, +, ·, ′) is a Boolean algebra, in which 1 is the zero element and 30 is the unit element.

Ex 8.2.2 Let B be the set of all positive divisors of 48. For a, b ∈ B, let the binary and unary operations on B be defined as

(i) a + b = the LCM of a and b,

(ii) a · b = the GCD of a and b,

(iii) a′ = 48/a.

Prove that (B, +, ·, ′) is not a Boolean algebra.


Solution: Here B = {1, 2, 3, 4, 6, 8, 12, 16, 24, 48}, and the composition tables are given by

 +  |  1   2   3   4   6   8  12  16  24  48 |  ′
 1  |  1   2   3   4   6   8  12  16  24  48 | 48
 2  |  2   2   6   4   6   8  12  16  24  48 | 24
 3  |  3   6   3  12   6  24  12  48  24  48 | 16
 4  |  4   4  12   4  12   8  12  16  24  48 | 12
 6  |  6   6   6  12   6  24  12  48  24  48 |  8
 8  |  8   8  24   8  24   8  24  16  24  48 |  6
12  | 12  12  12  12  12  24  12  48  24  48 |  4
16  | 16  16  48  16  48  16  48  16  48  48 |  3
24  | 24  24  24  24  24  24  24  48  24  48 |  2
48  | 48  48  48  48  48  48  48  48  48  48 |  1

 ·  |  1   2   3   4   6   8  12  16  24  48
 1  |  1   1   1   1   1   1   1   1   1   1
 2  |  1   2   1   2   2   2   2   2   2   2
 3  |  1   1   3   1   3   1   3   1   3   3
 4  |  1   2   1   4   2   4   4   4   4   4
 6  |  1   2   3   2   6   2   6   2   6   6
 8  |  1   2   1   4   2   8   4   8   8   8
12  |  1   2   3   4   6   4  12   4  12  12
16  |  1   2   1   4   2   8   4  16   8  16
24  |  1   2   3   4   6   8  12   8  24  24
48  |  1   2   3   4   6   8  12  16  24  48

We know that 1 is the zero element and 48 is the unit element, as in the previous example. Also, 8′ = 48/8 = 6, and

8 + 8′ = the LCM of 8 and 6 = 24 ≠ 48, the unit element,
8 · 8′ = the GCD of 8 and 6 = 2 ≠ 1, the zero element.

Hence (B, +, ·, ′) is not a Boolean algebra. We see that B contains elements like 16 which are divisible by a square integer greater than 1, i.e., 48 is not square-free.
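As an illustrative aside (a sketch added to this discussion, not part of the original solutions), the complement law can be checked mechanically for the divisors of any positive integer n; running it for n = 30 and n = 48 reproduces the conclusions of Ex 8.2.1 and Ex 8.2.2.

from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def complement_law_holds(n):
    # a + b = lcm(a, b), a.b = gcd(a, b), a' = n/a on the divisors of n
    divisors = [d for d in range(1, n + 1) if n % d == 0]
    for a in divisors:
        c = n // a                     # candidate complement a' = n/a
        if lcm(a, c) != n or gcd(a, c) != 1:
            return False               # need a + a' = n and a.a' = 1
    return True

print(complement_law_holds(30))   # True:  30 is square-free
print(complement_law_holds(48))   # False: fails already at a = 8

The check fails for 48 precisely because 48 is not square-free, in line with the observation above.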

Ex 8.2.3 Let S be a given non-empty set and let P(S) be the power set of S. The binary and unary operations on P(S) are defined as

(i) A + B = A ∪ B, the union of the subsets A, B ∈ P(S).

(ii) A · B = A ∩ B, the intersection of the subsets A, B ∈ P(S).

(iii) A′ = the complement of the subset A in S.

Prove that (P(S), +, ·, ′) is a Boolean algebra.

Solution: Let A, B and C be any three subsets of S. The Huntington postulates for a Boolean algebra are satisfied by the following properties of sets:

(i) A ∪ B = B ∪ A, A ∩ B = B ∩ A.

(ii) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C), A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).

(iii) A ∪ φ = A, A ∩ S = A.

(iv) A ∪ A′ = S, A ∩ A′ = φ.

Thus (P(S), +, ·, ′) is a Boolean algebra, i.e., P(S) is a Boolean algebra under the set theoretical operations of union, intersection and complementation. The null set φ ∈ P(S) is the zero element and S is the unit element in this Boolean algebra (P(S), +, ·, ′).
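As another illustrative aside (a sketch added here, not part of the text), the set-theoretic postulates used above can be verified exhaustively for a small S in Python:

from itertools import combinations

S = frozenset({1, 2, 3})
PS = [frozenset(c) for r in range(len(S) + 1)
      for c in combinations(sorted(S), r)]      # the power set P(S)

for A in PS:
    assert A | (S - A) == S and A & (S - A) == frozenset()  # A + A' = S, A.A' = 0
for A in PS:
    for B in PS:
        for C in PS:
            assert A | (B & C) == (A | B) & (A | C)   # + distributes over .
            assert A & (B | C) == (A & B) | (A & C)   # . distributes over +
print("P(S) satisfies the complement and distributive postulates")

Commutativity and the identity laws hold just as directly for ∪ and ∩, so the remaining postulates follow at once, with φ as zero and S as unit, as claimed.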

Ex 8.2.4 Consider the set B = {a, b} and the binary operations + and · defined on the elements of B as

 + | a  b        · | a  b
 a | a  b        a | a  a
 b | b  b        b | a  b

Prove that (B, +, ·, ′) is a Boolean algebra.


Solution: We are to show that the postulates for a Boolean algebra are satisfied.

P1 : From the composition tables, we see that both operations obey the closure axiom.

P2 : Both operations are commutative, since from the tables we have a + b = b + a = b and a · b = b · a = a.

P3 : a is the identity element for +, since a + a = a and b + a = a + b = b. Similarly, b is the identity element for ·, since a · b = a and b · b = b.

P4 : Each operation is distributive over the other, since

a · (a + b) = a · b = a and a · a + a · b = a + a = a.

Again, b · (a + b) = b · b = b and b · a + b · b = a + b = b.

Similarly, a + (a · b) = a + a = a and (a + a) · (a + b) = a · b = a.

P5 : The complement of a is b, as a + b = b, the identity element for ·, and a · b = a, the identity element for +; similarly, the complement of b is a. Hence (B, +, ·, ′) is a Boolean algebra.

Ex 8.2.5 Prove that there does not exist a Boolean algebra containing only three elements.

Solution: Every Boolean algebra (B, +, ·, ′) contains two distinct elements, the zero element 0 and the unit element 1, satisfying

a + 0 = a and a · 1 = a, ∀a ∈ B;

the smallest Boolean algebra is the two-point algebra B = {0, 1}. Suppose a Boolean algebra B contains an element a other than 0 and 1, i.e., B = {0, 1, a}, where a ≠ 0 and a ≠ 1. Then the complement a′ ∈ B satisfies

a + a′ = 1 and a · a′ = 0.

We are to show that a′ ≠ a, a′ ≠ 0 and a′ ≠ 1. First, let a′ = a; then

a · a′ = a · a ⇒ 0 = a,

as a · a′ = 0 and a · a = a. We arrive at a contradiction, so a′ ≠ a. Next, let a′ = 0; then

a + a′ = a + 0 ⇒ 1 = a,

as a + a′ = 1 and a + 0 = a. We arrive at a contradiction, and consequently a′ ≠ 0. Lastly, let a′ = 1; then

a · a′ = a · 1 ⇒ 0 = a,

as a · a′ = 0 and a · 1 = a. In this case also we arrive at a contradiction, and therefore a′ ≠ 1. Thus a′ is distinct from a, 0 and 1, so B must contain at least a fourth element. This shows that a Boolean algebra cannot contain exactly three elements 0, 1 and a.

Deduction 8.2.1 Difference between Boolean algebra and the algebra of real numbers: Comparing Boolean algebra with arithmetic and ordinary algebra (the field of real numbers), we note the following differences:


(i) Commutative and associative laws are true in both algebras, but Huntington's postulates do not include the associative law.

(ii) The distributive law of '+' over '·', namely a + b · c = (a + b) · (a + c), is not valid in ordinary algebra.

(iii) Boolean algebra does not have additive or multiplicative inverses; therefore no cancellations are allowed (i.e., there are no subtraction and division operations).

(iv) The operation of complementation (′) is not available in ordinary algebra.

(v) The idempotent laws a + a = a and a · a = a hold in Boolean algebra but do not hold in the algebra of real numbers.

(vi) Boolean algebra is linear in character but the algebra of real numbers is not: in the former, a + a = a and a · a = a, but in the latter a + a = 2a and a · a = a^2.

(vii) Boolean algebra is more symmetric in its properties, and hence the principle of duality holds in it; no such symmetry is true in the algebra of real numbers.

8.2.3 Boolean Algebra Rules

Below are some important rules of Boolean algebra which are used to simplify Boolean expressions.

(i) 0 + x = x
(ii) 1 + x = 1
(iii) x + x = x
(iv) 0 · x = 0
(v) 1 · x = x
(vi) x · x = x
(vii) x · x′ = 0
(viii) x + x′ = 1
(ix) (x′)′ = x
(x) x + y = y + x
(xi) x · y = y · x
(xii) x + (y + z) = (x + y) + z
(xiii) x · (y · z) = (x · y) · z
(xiv) x · (y + z) = x · y + x · z
(xv) x + x·z = x
(xvi) x·(x + y) = x
(xvii) (x + y)(x + z) = x + y·z
(xviii) x + x′·y = x + y
(xix) x·y + y′·z + y·z = x·y + z
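Since each rule involves only finitely many variables, it can be verified exhaustively on the two-element algebra {0, 1}. The following Python sketch (added for illustration; the rule set shown is a sample, not the full list) does this with OR, AND and complement:

from itertools import product

checks = {
    "x + x = x":         lambda x, y, z: (x | x) == x,
    "x + x.z = x":       lambda x, y, z: (x | (x & z)) == x,
    "x.(x + y) = x":     lambda x, y, z: (x & (x | y)) == x,
    "(x+y)(x+z) = x+yz": lambda x, y, z: ((x | y) & (x | z)) == (x | (y & z)),
    "x + x'y = x + y":   lambda x, y, z: (x | ((1 - x) & y)) == (x | y),
    "xy + y'z + yz = xy + z":
        lambda x, y, z: ((x & y) | ((1 - y) & z) | (y & z)) == ((x & y) | z),
}
for name, rule in checks.items():
    assert all(rule(x, y, z) for x, y, z in product((0, 1), repeat=3)), name
print("all sampled identities hold on {0, 1}")

An identity in finitely many variables that holds in the two-element algebra in fact holds in every Boolean algebra, so such an exhaustive check is more than a mere plausibility test.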

Ex 8.2.6 In a Boolean algebra B, prove the following:

(i) x + x′y = x + y and x · (x′ + y) = x · y,

(ii) xy + x′z + yz = xy + x′z and (x + y) · (x′ + z) · (y + z) = (x + y) · (x′ + z),

(iii) (x + y)(y + z)(z + x) = xy + yz + zx.

Solution: (i) Using the Boolean algebra rules, we get

x + x′y = (x + x′) · (x + y) = 1 · (x + y) = x + y.

Therefore x + x′y = x + y. The dual of this is x · (x′ + y) = x · y.

(ii) Using the Boolean algebra rules, we get


xy + x′z + yz = xy + x′z + yz · 1 = xy + x′z + yz(x + x′)
= xy + x′z + yzx + yzx′ = xy + xyz + x′z + x′yz
= xy(1 + z) + x′z(1 + y) = xy · 1 + x′z · 1 = xy + x′z.

The dual of the above is (x + y) · (x′ + z) · (y + z) = (x + y) · (x′ + z).

(iii) Using the Boolean algebra rules, we get

LHS = (x + y)(y + z)(z + x) = (y + x)(y + z)(z + x)
= (y + xz)(z + x) = (y + xz)z + (y + xz)x
= yz + xzz + yx + xzx = yz + xz + yx + xz
= xy + yz + zx.

Ex 8.2.7 Show that, in a Boolean algebra, a·b′ = 0 ⇒ a + b = b and a·b = a, where a, b ∈ B. [BH‘87]

Solution: Using the Boolean algebra rules, we get

a + b = (a + b)·1 = (a + b)·(b + b′)
= b + a·b′ = b + 0 = b,

and

a·b = a·b + 0 = a·b + a·b′
= a·(b + b′) = a·1 = a.

Ex 8.2.8 Show that, in a Boolean algebra,

(x + y)·(x′ + z)·(y + z) = x·z + x′·y + y·z,

where x, y, z ∈ B. [BH‘86, ‘94]

Solution: Using the Boolean algebra rules, we get

LHS = (x + y)·(x′ + z)·(y + z) = (x·x′ + x·z + y·x′ + y·z)·(y + z)
= (x·z + x′·y + y·z)·(y + z), as x·x′ = 0,
= x·y·z + x·z + x′·y + x′·y·z + y·z, as x·x = x and x + x = x,
= (x + x′)·y·z + x·z + x′·y + y·z, grouping x·y·z + x′·y·z,
= y·z + x·z + x′·y + y·z = x·z + x′·y + y·z, as y·z + y·z = y·z.

Definition 8.2.1 (i) By a proposition in a Boolean algebra we mean either a statement or an algebraic identity in the Boolean algebra. For example, the statement “In a Boolean algebra, 0 is unique” is a proposition.

(ii) A Boolean algebra is said to be degenerate if it contains only one element; in this case 0 = 1.

8.2.4 Duality

By the dual of a proposition A in a Boolean algebra we mean the proposition obtained from A by replacing + with ·, · with +, 1 with 0 and 0 with 1. For example, the dual of the proposition x + y = y + x is the proposition x·y = y·x, and vice versa. We can obtain the following two properties of a Boolean algebra (B, +, ·, ′) directly from the Huntington postulates: for each a ∈ B,

(i) a + 1 = 1, (ii) a · 0 = 0,


where 1 and 0 represent the identity elements with respect to · and +. The second relation, which can be obtained from the first by changing + to · and 1 to 0, is called the dual of the first property. The same is observed in each pair of Huntington's postulates, where each postulate of a pair can be obtained from the other by interchanging + and ·, and consequently 0 and 1.

Duality theorem: Starting with a Boolean relation, we can derive another Boolean relation by

(i) changing each + sign to a · sign,

(ii) changing each · sign to a + sign,

(iii) and consequently interchanging 0 and 1.

If a proposition A is derivable from the axioms of a Boolean algebra, then the dual of A is also derivable from those axioms.

Duality property or dual expression: The algebraic expression or property P′ obtained in this way, the counterpart of an algebraic expression or property P, is called the dual of P.

Boolean relation                      Dual
a + b = b + a                         a·b = b·a
(a + b) + c = a + (b + c)             (a·b)·c = a·(b·c)
a·(b + c) = a·b + a·c                 a + b·c = (a + b)·(a + c)
a + 0 = a                             a·1 = a
a + 1 = 1                             a·0 = 0
a + a = a                             a·a = a
a + a′ = 1                            a·a′ = 0
(a′)′ = a                             (a′)′ = a
(a + b)′ = a′·b′                      (a·b)′ = a′ + b′
a + a·b = a                           a·(a + b) = a
a + a′·b = a + b                      a·(a′ + b) = a·b

This duality works for every statement and every theorem in a Boolean algebra. The principle of duality states that every true theorem about a Boolean algebra whose statement involves only the three operations +, ·, ′ remains true if + and ·, and the identity elements 0 and 1, are interchanged throughout.

Properties of Boolean algebra: We derive the following properties of the Boolean algebra (B, +, ·, ′) directly from Huntington's postulates.

Property 8.2.1 In a Boolean algebra B, the two identity elements, 0 for + and 1 for ·, are each unique.

Proof: If possible, let there be two identity elements 0 and 0₁ for the binary operation +. Hence a + 0 = a and a + 0₁ = a, ∀a ∈ B. Then

0 + 0₁ = 0, since 0₁ is an identity element,
0₁ + 0 = 0₁, since 0 is an identity element,
and 0 = 0 + 0₁ = 0₁ + 0 = 0₁, by the commutative property.

Hence the identity element 0 for + is unique. Again, if possible, let there be two identities 1 and 1₁ for the operation ·. Hence a · 1 = a and a · 1₁ = a, ∀a ∈ B. Then

1 · 1₁ = 1, since 1₁ is an identity element,
1₁ · 1 = 1₁, since 1 is an identity element,
and 1 = 1 · 1₁ = 1₁ · 1 = 1₁, by the commutative property.

Hence the identity element 1 for · is unique.


Property 8.2.2 In a boolean algebra B the complement of each element is unique.

Proof: Let a be an arbitrary element in B. Then ∃ a′ ∈ B such that

a + a′ = 1, the identity for ·, and a · a′ = 0, the identity for +.

Let us suppose that a″ in B is also a complement of a. Then a + a″ = 1 and a · a″ = 0. Now,

a′ = a′ · 1 = a′ · (a + a″) = (a′ · a) + (a′ · a″)
= 0 + (a′ · a″) = (a · a″) + (a′ · a″)
= (a″ · a) + (a″ · a′), by the commutative law,
= a″ · (a + a′) = a″ · 1 = a″.

Hence the complement of a is unique. Therefore, in a Boolean algebra, each a ∈ B has a unique complement in B.

Property 8.2.3 For every a, b ∈ B; (a+ b)′ = a′ · b′ and (a · b)′ = a′ + b′.

Proof: Using the definition, we have

(a + b) + a′·b′ = [(a + b) + a′]·[(a + b) + b′], by the distributive law,
= [(a′ + a) + b]·[a + (b + b′)], by the commutative and associative laws,
= (1 + b)·(a + 1), as a + a′ = 1,
= 1·1 = 1, as a + 1 = 1.

Therefore (a + b) + a′·b′ = 1. Again,

(a + b)·(a′·b′) = a·(a′·b′) + b·(a′·b′), by the distributive law,
= (a·a′)·b′ + a′·(b′·b), by the commutative and associative laws,
= 0·b′ + a′·0, as a·a′ = 0,
= 0 + 0 = 0, as a·0 = 0.

Therefore a′·b′ satisfies all the properties required of the complement of (a + b). Since the complement is unique, we have (a + b)′ = a′·b′, ∀a, b ∈ B. Similarly,

(a·b) + (a′ + b′) = 1 and (a·b)·(a′ + b′) = 0,

from which we have (a·b)′ = a′ + b′. These are the well-known De Morgan's laws.

Property 8.2.4 For any a ∈ B; a+ a = a and a · a = a.

Proof: Using the definition, we have

a + a = (a + a)·1, by the existence of identity,
= (a + a)·(a + a′), as a + a′ = 1,
= a + a·a′, by the distributive law,
= a + 0 = a, as a·a′ = 0.


Therefore, in any Boolean algebra B, we have a + a = a. Now,

a·a = a·a + 0, by the existence of identity,
= a·a + a·a′, as a·a′ = 0,
= a·(a + a′), by the distributive law,
= a·1 = a, as a + a′ = 1, the identity element for ·.

Therefore, in any Boolean algebra B, we have a·a = a. These laws are known as the idempotent laws.

Property 8.2.5 For all a, b ∈ B; a+ a · b = a, a · (a+ b) = a.

Proof: Using the definition of a Boolean algebra, we have

a + a·b = a·1 + a·b, as 1 is the identity element for ·,
= a·(1 + b), by the distributive law,
= a·(b + 1), by the commutative law,
= a·1 = a, as b + 1 = 1.

Therefore, for any Boolean algebra B, we have a + a·b = a. Also,

a·(a + b) = (a + 0)·(a + b), as 0 is the identity element for +,
= a + 0·b, by the distributive law,
= a + b·0, by the commutative law,
= a + 0 = a, as b·0 = 0.

Therefore, for any Boolean algebra B, we have a·(a + b) = a. These laws are known as the laws of absorption.

Property 8.2.6 For each a, b, c ∈ B;(a+ b) + c = a+ (b+ c), (a · b) · c = a · (b · c)

Proof: Let x = a + (b + c) and y = (a + b) + c. Then

a·x = a·[a + (b + c)] = a·a + a·(b + c), by the distributive law,
= a + a·(b + c) = a, by the idempotent and absorption laws,

and

a·y = a·[(a + b) + c] = a·(a + b) + a·c, by the distributive law,
= a + a·c = a, by the idempotent and absorption laws.

Therefore a·x = a·y. Also,

a′·x = a′·[a + (b + c)] = a′·a + a′·(b + c), by the distributive law,
= 0 + a′·(b + c) = a′·(b + c), since a·a′ = 0 and 0 + a = a,

and

a′·y = a′·[(a + b) + c] = a′·(a + b) + a′·c, by the distributive law,
= a′·a + a′·b + a′·c, by the distributive law,
= 0 + a′·b + a′·c, since a·a′ = 0,
= a′·b + a′·c = a′·(b + c), as 0 + a = a and by the distributive law.

Therefore a′·x = a′·y. From these two results we get

x = 1·x = (a + a′)·x = a·x + a′·x = a·y + a′·y = (a + a′)·y = 1·y = y,

i.e., (a + b) + c = a + (b + c), ∀a, b, c ∈ B. Similarly, taking z = a·(b·c) and t = (a·b)·c, we can prove that (a·b)·c = a·(b·c), ∀a, b, c ∈ B. These laws are known as the associative laws.


Property 8.2.7 For every a ∈ B, (a′)′ = a (involution).

Proof: For each a ∈ B, there exists a unique element a′ ∈ B such that a·a′ = 0 and a + a′ = 1. Hence,

a′ + a = 1 and a′·a = 0, by the commutative law.

These imply that a is the complement of a′, i.e., (a′)′ = a.

Property 8.2.8 In every Boolean algebra, 0′ = 1 and 1′ = 0.

Proof: For every a ∈ B, we have a·1 = a and 0 + a = a. Replacing a by 0 in the first and by 1 in the second, we get

0·1 = 0 and 0 + 1 = 1.

This shows that 1 is the complement of 0 in B. Hence 0′ = 1, and by duality, 1′ = 0.

Property 8.2.9 For all a, x, y ∈ B: a + x = a + y and a′ + x = a′ + y ⇒ x = y; also a·x = a·y and a′·x = a′·y ⇒ x = y.

Proof: From the first pair of hypotheses, we have

(a + x)·(a′ + x) = (a + y)·(a′ + y)
⇒ (x + a)·(x + a′) = (y + a)·(y + a′), by the commutative law,
⇒ x + a·a′ = y + a·a′, by the distributive law,
⇒ x + 0 = y + 0, as a·a′ = 0,
⇒ x = y, since x + 0 = x.

Thus, in a Boolean algebra, a + x = a + y and a′ + x = a′ + y ⇒ x = y. Similarly, from the second pair of hypotheses,

a·x + a′·x = a·y + a′·y
⇒ (a + a′)·x = (a + a′)·y, by the distributive law,
⇒ 1·x = 1·y, as a + a′ = 1,
⇒ x = y, since 1·x = x.

Thus, in a Boolean algebra, a·x = a·y and a′·x = a′·y ⇒ x = y.

Property 8.2.10 For each a ∈ B: a + 1 = 1 and a · 0 = 0 (universal bounds).

Proof: Using the axioms and the properties above, we have

a + 1 = (a + 1)·1 = 1·(a + 1)
= (a + a′)·(a + 1)
= a + (a′·1), by the distributive law,
= a + a′ = 1.

Using duality, we have a·0 = 0.

Ex 8.2.9 Show that, in a Boolean algebra, for x, y ∈ B, [BH‘90]

(i) (x′ + x·y)′ = x·y′ and (ii) [(x′ + y)′·(x + y′)]′ = x′ + y.


Solution: (i) Using the Boolean algebra rules, we get

LHS = (x′ + x·y)′ = (x′)′·(x·y)′, by De Morgan's law,
= x·(x′ + y′), by De Morgan's law,
= x·x′ + x·y′ = 0 + x·y′ = x·y′.

(ii) Using the Boolean algebra rules, we get

LHS = [(x′ + y)′·(x + y′)]′ = [(x′ + y)′]′ + (x + y′)′, by De Morgan's law,
= (x′ + y) + x′·(y′)′ = x′ + y + x′·y
= x′·(1 + y) + y = x′ + y.

Definition 8.2.2 A non-empty subset S of a Boolean algebra B is said to be a subalgebra of B if S is also a Boolean algebra under the same binary operations of B. Consider the non-empty subsets S1 = {1, 30}, S2 = {1, 2, 15, 30}, S3 = {1, 5, 6, 30} and S4 = {1, 3, 10, 30}. Each of these sets, which are subsets of S = {1, 2, 3, 5, 6, 10, 15, 30}, is closed under +, · and ′, and hence is a subalgebra of (S, +, ·, ′).

8.2.5 Partial Order Relation

Let B be a Boolean algebra and x, y ∈ B. We define x to be related to y if and only if x·y = x, and denote this relation by ‘≤’, i.e., x ≤ y. According to this definition and using the properties of Boolean algebra, we have

(i) x ≤ y, x ≤ z ⇒ x ≤ y·z.

(ii) x ≤ y ⇒ x ≤ y + z.

(iii) x ≤ y ⇒ y′ ≤ x′ and y′ ≤ x′ ⇒ x ≤ y.

Theorem 8.2.1 In a Boolean algebra, x, y ∈ B, x ≤ y if and only if x+ y = y.

Proof: Let x ≤ y; then x·y = x, so x + y = x·y + y = y, by the absorption law.

Conversely, let x + y = y; then x·y = x·(x + y) = x, again by absorption.

Theorem 8.2.2 The relation ≤ in a Boolean algebra is a partial order relation.

Proof: The relation ≤ in a Boolean algebra is defined by ‘x is related to y if and only if x·y = x’. We are to show that the relation is reflexive, antisymmetric and transitive.

(i) We know x·x = x, ∀x ∈ B, so x ≤ x and ≤ is reflexive.

(ii) Let x·y = x and y·x = y for x, y ∈ B. Then

x = x·y = y·x = y,

so x ≤ y and y ≤ x ⇒ x = y. Therefore ≤ is antisymmetric.

(iii) Let x·y = x and y·z = y for x, y, z ∈ B. Then

x·z = (x·y)·z = x·(y·z) = x·y = x,

so x ≤ y and y ≤ z ⇒ x ≤ z. Therefore ≤ is transitive. Consequently, ≤ is a partial order relation.
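As an illustrative aside (a sketch added here), the three defining properties can be checked directly for the divisor algebra of Ex 8.2.1, where x·y = gcd(x, y), so that x ≤ y reduces to ‘x divides y’:

from math import gcd
from itertools import product

B = [1, 2, 3, 5, 6, 10, 15, 30]
def leq(x, y):
    return gcd(x, y) == x              # x <= y iff x.y = x

assert all(leq(x, x) for x in B)                                 # reflexive
assert all(x == y for x, y in product(B, B)
           if leq(x, y) and leq(y, x))                           # antisymmetric
assert all(leq(x, z) for x, y, z in product(B, repeat=3)
           if leq(x, y) and leq(y, z))                           # transitive
print("<= is a partial order on the divisors of 30")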


8.3 Boolean Function

Consider the set B = {1, 0}, with the binary operations + and · and the complement ′ defined on the elements of B as

 + | 1  0        · | 1  0        x | x′
 1 | 1  1        1 | 1  0        1 | 0
 0 | 1  0        0 | 0  0        0 | 1

P1 : The closure axiom is obvious from the tables, since the result of each operation is either 1 or 0, and 1, 0 ∈ B.
P2 : Both operations are commutative, as follows from the symmetry of the operator tables.
P3 : From the tables we see that

0 + 0 = 0 and 0 + 1 = 1 + 0 = 1,
1 · 1 = 1 and 1 · 0 = 0 · 1 = 0,

which establishes the two identity elements, 0 for + and 1 for ·, as defined in postulate P3.
P4 : The distributive law a·(b + c) = a·b + a·c can be shown to hold from the operator tables by forming a truth table of all possible values of a, b and c:

a b c | a·(b + c) | a·b + a·c
1 1 1 |     1     |     1
1 1 0 |     1     |     1
1 0 1 |     1     |     1
1 0 0 |     0     |     0
0 1 1 |     0     |     0
0 1 0 |     0     |     0
0 0 1 |     0     |     0
0 0 0 |     0     |     0

The distributive law of + over · can be shown to hold by means of a similar truth table.
P5 : From the complement table it is easily shown that

0 + 0′ = 0 + 1 = 1 and 1 + 1′ = 1 + 0 = 1,
0 · 0′ = 0 · 1 = 0 and 1 · 1′ = 1 · 0 = 0,

i.e., a + a′ = 1 and a · a′ = 0. Therefore the set B = {0, 1}, together with the Boolean sum +, the Boolean product · and the Boolean complement ′, is called the two-element Boolean algebra.

8.3.1 Constant

A symbol representing a specified element of a Boolean algebra will be called a constant; 0 and 1 are examples of constants.

8.3.2 Literal

A literal is a primed or unprimed (complemented or uncomplemented) variable. Thus the two literals x and x′ correspond to the variable x. The expression x + x′y has three literals: x, x′ and y.

A single literal, or a product of two or more literals, is known as a product term. The expression x + x′y has two product terms. A single literal, or a sum of two or more literals, is known as a sum term. For example, f = y′·(x + z)·(y′ + z′) contains three sum terms.


8.3.3 Variable

Any literal symbol like x, y, z, x1, x2, · · · representing an arbitrary element of a Boolean algebra will be called a variable. A variable represents an arbitrary or unspecified element of B. A Boolean variable assumes only two values, 0 and 1, i.e., it takes values from Z2, where Z2 is the Boolean algebra {0, 1}. Two Boolean variables are said to be independent if they assume values independently of each other. Note that x and x′ are not independent variables. Let x, y, z, · · · be Boolean variables; then

(i) x+ y = y + x and x.y = y.x, commutative laws.

(ii) (x+ y) + z = x+ (y + z), and (x.y).z = x.(y.z), associative laws.

(iii) x.(y + z) = (x.y) + (x.z) and x+ (y.z) = (x+ y).(x+ z), distributive laws.

(iv) x + x = x and x·x = x, idempotent laws.

(v) x+ 0 = x and x.1 = x, identity laws.

(vi) x+ x′ = 1 and x.x′ = 0, inverse laws.

(vii) x+ 1 = 1 and x.0 = 0, dominance laws.

(viii) x + x·y = x and x·(x + y) = x, absorption laws.

(ix) (x+ y)′ = x′.y′ and (x.y)′ = x′ + y′, De-Morgan’s laws.

(x) (x′)′ = x, double complement law.

8.3.4 Monomial

In a Boolean algebra, a single element, with or without a prime, or several such elements connected by the operation ·, is said to be a monomial. x, y′, xyz′, . . . etc. are examples of monomials.

8.3.5 Polynomial

In a Boolean algebra, an expression consisting of some monomials connected by the operation + is called a polynomial. x + xy′ + x′yz is an example of a polynomial.

Each monomial in a polynomial is called a term of the polynomial.

8.3.6 Factor

If an expression consists of some elements and polynomials connected by the operation ·, then each of the elements and polynomials is called a factor of the expression. A factor may or may not be linear. The factors of the expression x(x + y′)(x′ + y + z) are x, (x + y′) and (x′ + y + z).

8.3.7 Boolean Function

An expression which represents a combination of a finite number of constants and variables by the operations +, · or ′ is said to be a Boolean function. In the expression (a + b′)x + a′y + 0, the symbols 0, a and b are constants and x and y are variables; it is a Boolean function if a, b, 0, x, y are elements of a Boolean algebra. x + x′, xy′ + a and xyz′ + x′yz + y′z + 1 are functions of one, two and three variables respectively. If f(x, y) = x + y′, then f(0, 0) = 1, f(0, 1) = 0, f(1, 0) = 1 and f(1, 1) = 1. Let f, g, h, · · · be Boolean expressions; then


(i) f + g = g + f and f.g = g.f , commutative laws.

(ii) (f + g) + h = f + (g + h), and (f.g).h = f.(g.h), associative laws.

(iii) f.(g + h) = (f.g) + (f.h) and f + (g.h) = (f + g).(f + h), distributive laws.

(iv) f + f = f and f·f = f, idempotent laws.

(v) f + 0 = f and f.1 = f , identity laws.

(vi) f + f ′ = 1 and f.f ′ = 0, inverse laws.

(vii) f + 1 = 1 and f.0 = 0, dominance laws.

(viii) f + f·g = f and f·(f + g) = f, absorption laws.

(ix) (f + g)′ = f ′.g′ and (f.g)′ = f ′ + g′, De-Morgan’s laws.

(x) (f ′)′ = f, double complement law.

8.4 Truth Table

A Boolean function f, which is a combination of a finite number of Boolean variables connected by the operations + (OR) and/or · (AND), assumes a value (either 1 or 0) when the variables involved in it are assigned their truth values. This value of the function f is called its truth value corresponding to that particular set of values of the variables. Mathematicians have found a convenient way of expressing the truth values of a function f, for all possible combinations of the truth values of the independent variables which appear in the expression of f, in the form of a table. Such a table is called the truth table of the function f.

Definition 8.4.1 A Boolean expression f in the variables x1, x2, · · · , xn is called a maxterm if

f = e1 + e2 + · · · + en,

where each ei denotes either xi or xi′. For example, x + y + z′, x′ + y′ + z and x′ + y′ + z′ are maxterms in the variables x, y, z.

Definition 8.4.2 A Boolean expression f in the variables x1, x2, · · · , xn is called a minterm if

f = e1·e2· · · · ·en,

where each ei denotes either xi or xi′. For example, xyz′, x′y′z and x′y′z′ are minterms in the variables x, y, z.

Ex 8.4.1 Obtain the truth table for the Boolean function f(x1, x2, x3) = x1 + (x2·x3′).

Solution: We need the set of 2^n possible combinations of 1 and 0. Here f involves three independent variables x1, x2 and x3; each of these variables can have the value 1 or 0, so the total number


of possible combinations of their truth values is 2³ = 8.

x1 x2 x3 | x3′ | x2·x3′ | f
 1  1  1 |  0  |   0    | 1
 1  1  0 |  1  |   1    | 1
 1  0  1 |  0  |   0    | 1
 1  0  0 |  1  |   0    | 1
 0  1  1 |  0  |   0    | 0
 0  1  0 |  1  |   1    | 1
 0  0  1 |  0  |   0    | 0
 0  0  0 |  1  |   0    | 0
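Such tables are easy to generate mechanically. The following Python sketch (an illustration added here, not part of the text) reproduces the table above by enumerating all 2³ assignments:

from itertools import product

def f(x1, x2, x3):
    return x1 | (x2 & (1 - x3))        # + is OR, . is AND, ' is complement

print("x1 x2 x3 | f")
for x1, x2, x3 in product((1, 0), repeat=3):
    print(f"{x1:2} {x2:2} {x3:2} | {f(x1, x2, x3)}")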

8.5 Disjunctive Normal Form

A Boolean function is said to be in disjunctive normal form (DNF) in n variables x1, x2, . . . , xn, for n > 0, if each term of the function is a monomial of the type f1(x1)·f2(x2) · · · fn(xn), where fi(xi) is xi or xi′ for each i = 1, 2, . . . , n, and no two terms are identical. xy′ + xy and xyz + x′yz + x′y′z are Boolean functions in DNF.

8.5.1 Complete DNF

The disjunctive normal form of a Boolean function in n variables is said to be the complete disjunctive normal form if it contains 2^n terms. For example, xy + x′y + xy′ + x′y′ is the complete disjunctive normal form in two variables x and y, and xyz + x′yz + xy′z + xyz′ + x′y′z + x′yz′ + xy′z′ + x′y′z′ is the complete disjunctive normal form in three variables x, y, z.

Note: Each term of a complete disjunctive normal form in n variables x1, x2, . . . , xn contains each xi, in either the form xi or the form xi′, for all i; thus the complete disjunctive normal form consists of 2^n terms.

Note: Every complete disjunctive normal form is identically 1, and conversely the unit function is in complete disjunctive normal form. For example,

xy + x′y + xy′ + x′y′ = (x + x′)y + (x + x′)y′ = y + y′ = 1.

Note: An incomplete disjunctive normal form is not unique, except when its reduced form involves the minimum number of variables. For example,

f = xy = xy(z + z′), as z + z′ = 1,
= xyz + xyz′;
f = xyz + xyz′ + x′y′z + xy′z = xy(z + z′) + (x′ + x)y′z = xy + y′z.

Note: The complement of an incomplete disjunctive normal form consists of exactly those terms needed to make it complete. For example, the complement of xyz + xy′z + xy′z′ + x′y′z′ is x′yz + xyz′ + x′y′z + x′yz′.

Note: Two Boolean functions are equal iff their respective disjunctive normal forms have the same terms.

Note: Since no sum of minterms is identically 0, the zero function (i.e., 0) cannot be expressed in disjunctive normal form.

Ex 8.5.1 Express the Boolean function f = x + (x′·y′ + x′·z)′ in disjunctive normal form.


Solution: Using the properties, we get

f = x + (x′·y′ + x′·z)′ = x + (x′·y′)′·(x′·z)′, by De Morgan's law,
= x + (x + y)·(x + z′), by De Morgan's law,
= x + x + y·z′, by the distributive law,
= x + y·z′ = x·(y + y′)·(z + z′) + y·z′·(x + x′)
= x·(y·z + y·z′ + y′·z + y′·z′) + x·y·z′ + x′·y·z′
= x·y·z + x·y·z′ + x·y′·z + x·y′·z′ + x·y·z′ + x′·y·z′
= x·y·z + x·y·z′ + x·y′·z + x·y′·z′ + x′·y·z′,

which is in the full disjunctive normal form.

Ex 8.5.2 Express the Boolean function f = (x + y + z)·(xy + xz), x, y, z ∈ B, in full disjunctive normal form.

Solution: Using the properties, we get

f = (x + y + z)·(xy + xz) = xxy + xxz + xyy + xyz + xyz + xzz
= xy + xz + xy + xyz + xyz + xz, as x·x = x,
= xy + xz + xyz = xy(z + z′) + xz(y + y′) + xyz, as y + y′ = 1 = z + z′,
= xyz + xyz′ + xyz + xy′z + xyz = xyz + xyz′ + xy′z,

which is in the disjunctive normal form.

Ex 8.5.3 Express the Boolean function f = (x + y)(x + y′)·(x′ + z), x, y, z ∈ B, in full disjunctive normal form. [CH‘99]

Solution: Using the properties, we get

f = (x + y)(x + y′)·(x′ + z) = (x + y·y′)·(x′ + z), by the distributive law,
= x·(x′ + z), as y·y′ = 0,
= x·x′ + x·z = x·z, as x·x′ = 0,
= x·z·(y + y′) = x·y·z + x·y′·z, as y + y′ = 1,

which is in the disjunctive normal form.

Ex 8.5.4 Express the Boolean function f = (x + y + z)(xy + x′·z)′ in disjunctive normal form in the variables x, y, z.

Solution: Let us construct the truth table of the expression f = (x + y + z)(xy + x′·z)′ for all possible assignments of the values 1 or 0 to x, y and z. There are 2³ = 8 such assignments. Hence,

x y z | f
1 1 1 | 0
1 1 0 | 0
1 0 1 | 1
1 0 0 | 1
0 1 1 | 0
0 1 0 | 1
0 0 1 | 0
0 0 0 | 0


Now we consider only those rows in which the value of f is 1. These are rows 3, 4 and 6. For each of these rows we construct the minterms xy′z, xy′z′ and x′yz′. Hence f = (x + y + z)(xy + x′·z)′ = xy′z + xy′z′ + x′yz′, in disjunctive normal form in the variables x, y, z.
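This truth-table method is itself mechanical. The following Python sketch (added for illustration) builds the same DNF by collecting a minterm for every row where f = 1:

from itertools import product

def f(x, y, z):                        # f = (x + y + z).(xy + x'z)'
    return (x | y | z) & (1 - ((x & y) | ((1 - x) & z)))

def minterm(vals):
    return ".".join(n if v else n + "'" for n, v in zip("xyz", vals))

terms = [minterm(vals) for vals in product((1, 0), repeat=3) if f(*vals)]
print(" + ".join(terms))               # prints: x.y'.z + x.y'.z' + x'.y.z'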

8.6 Conjunctive Normal Form

A Boolean function is said to be in conjunctive normal form (CNF) in n variables x1, x2, . . . , xn, for n > 0, if the function is a product of linear factors of the type f1(x1) + f2(x2) + · · · + fn(xn), where fi(xi) is xi or xi′ for each i = 1, 2, . . . , n, and no two factors are identical. For example, (x + y)(x′ + y′) and (x + y + z)(x + y′ + z)(x′ + y + z) are Boolean functions in conjunctive normal form.

8.6.1 Complete CNF

The conjunctive normal form of a Boolean function in n variables is said to be the complete conjunctive normal form if it contains 2^n factors. For example, (x + y)(x′ + y)(x + y′)(x′ + y′) is the complete conjunctive normal form in two variables x and y.

Note: Each factor of a complete conjunctive normal form in n variables x1, x2, . . . , xn contains each xi, in either the form xi or the form xi′, for all i; thus the complete conjunctive normal form consists of 2^n factors.

Note: Every complete conjunctive normal form is identically 0, and conversely the zero function can be expressed in complete conjunctive normal form. For example,

(x + y)(x′ + y)(x + y′)(x′ + y′) = (y + x·x′)(y′ + x·x′) = (y + 0)(y′ + 0) = y·y′ = 0.

For three variables, the complete conjunctive normal form gives

(x + y + z)(x′ + y + z)(x + y′ + z)(x + y + z′)(x′ + y′ + z)(x′ + y + z′)(x + y′ + z′)(x′ + y′ + z′)
= (y + z + x·x′)(y′ + z + x·x′)(y + z′ + x·x′)(y′ + z′ + x·x′)
= (y + z)(y′ + z)(y + z′)(y′ + z′)
= (z + y·y′)(z′ + y·y′) = z·z′ = 0.

Note: Two Boolean functions, each expressed in conjunctive normal form, are equal iff they contain identical factors.

Note: An incomplete conjunctive normal form is not unique, except when its reduced form involves the minimum number of variables. For example, f = y = (y + x)(y + x′) = (y + x + z)(y + x + z′)(y + x′ + z)(y + x′ + z′).

Note: The complement of an incomplete conjunctive normal form consists of exactly those factors needed to make it complete. For example, the complement of (x + y)(x′ + y′) is (x′ + y)(x + y′).

Note: The unit function cannot be expressed in conjunctive normal form.

Ex 8.6.1 Express the Boolean function f = (x + y + z)·(x·y + x·z) in full conjunctive normal form.

Solution: Using the properties, we get

f = (x + y + z)·(x·y + x·z) = (x + y + z)·x·(y + z), by the distributive law.


Since y·y′ = z·z′ = x·x′ = 0, we may write x = x + y·y′ + z·z′ and y + z = y + z + x·x′, so that

f = (x + y + z)·(x + y·y′ + z·z′)·(y + z + x·x′)
= (x + y + z)·(x + y + z)(x + y + z′)(x + y′ + z)(x + y′ + z′)·(x + y + z)(x′ + y + z), by the distributive law,
= (x + y + z)·(x + y + z′)·(x + y′ + z)·(x + y′ + z′)·(x′ + y + z),

which is in the full conjunctive normal form.

Ex 8.6.2 Express the Boolean function f = xyz + (x + y)(y + z), x, y, z ∈ B, in its conjunctive normal form.

Solution: Using the properties, we get

f = xyz + (x + y)(y + z) = (xyz + x + y)·(xyz + y + z), by the distributive law,
= (x + y)·(y + z), by absorption, as x + x·(yz) = x and y + y·(xz) = y,
= (x + y + z·z′)·(y + z + x·x′), as z·z′ = x·x′ = 0,
= (x + y + z)(x + y + z′)·(x + y + z)(x′ + y + z)
= (x + y + z)(x + y + z′)(x′ + y + z),

which is in the conjunctive normal form.

Ex 8.6.3 Express the Boolean function f = (x + y + z)(xy + x′·z)′ in conjunctive normal form in the variables x, y, z.

Solution: Let us construct the truth table of the expression f = (x + y + z)(xy + x′·z)′ for all possible assignments of the values 1 or 0 to x, y and z. There are 2³ = 8 such assignments. Hence,

x y z | f
1 1 1 | 0
1 1 0 | 0
1 0 1 | 1
1 0 0 | 1
0 1 1 | 0
0 1 0 | 1
0 0 1 | 0
0 0 0 | 0

Now we consider only those rows in which the value of f is 0. These are rows 1, 2, 5, 7 and 8. For each of these rows we construct the maxterms x′ + y′ + z′, x′ + y′ + z, x + y′ + z′, x + y + z′ and x + y + z. Hence f = (x + y + z)(xy + x′·z)′ = (x′ + y′ + z′)(x′ + y′ + z)(x + y′ + z′)(x + y + z′)(x + y + z), in conjunctive normal form in the variables x, y, z.
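A companion sketch to the previous one (again an illustration added here): a maxterm is built from every row where f = 0, complementing each variable that equals 1 in that row, and the CNF is the product of these maxterms.

from itertools import product

def f(x, y, z):                        # the same f = (x + y + z).(xy + x'z)'
    return (x | y | z) & (1 - ((x & y) | ((1 - x) & z)))

def maxterm(vals):
    return "(" + " + ".join(n + "'" if v else n
                            for n, v in zip("xyz", vals)) + ")"

factors = [maxterm(vals) for vals in product((1, 0), repeat=3) if not f(*vals)]
print("".join(factors))
# prints: (x' + y' + z')(x' + y' + z)(x + y' + z')(x + y + z')(x + y + z)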

Deduction 8.6.1 Conversion of one normal form to another: This is done by double complementation. Let f = xyz + x′y′z + xyz′ + xy′z; then the complement of f is f′, given by

f′ = x′y′z′ + xy′z′ + x′yz′ + x′yz.

Now the complement of f′ is (f′)′, which is given by

(f′)′ = (x′y′z′ + xy′z′ + x′yz′ + x′yz)′
= (x + y + z)(x′ + y + z)(x + y′ + z)(x + y′ + z′).

This is the conjunctive normal form of f.


Ex 8.6.4 Change the Boolean function f = xy + x′y + x′y′ from DNF to CNF.

Solution: Using the properties of Boolean algebra, we get

f = xy + x′y + x′y′ = [(xy + x′y + x′y′)′]′
= [(xy)′·(x′y)′·(x′y′)′]′ = [(x′ + y′)(x + y′)(x + y)]′, by De Morgan's law,
= x′ + y, by the method of the complete CNF.

Ex 8.6.5 Change the Boolean function f = (x + y + z)(x + y + z′)(x + y′ + z)(x′ + y + z′) from CNF to DNF.

Solution: Using the properties of Boolean algebra, we get

f = (x + y + z)(x + y + z′)(x + y′ + z)(x′ + y + z′)
= [{(x + y + z)(x + y + z′)(x + y′ + z)(x′ + y + z′)}′]′
= [x′y′z′ + x′y′z + x′yz′ + xy′z]′, by De Morgan's law,
= xyz + xy′z′ + x′yz + xyz′, by the method of the complete DNF.

Ex 8.6.6 Let f(x, y, z) = x + (y·z′) be a Boolean function. Express f(x, y, z) in CNF. What is its DNF?

Solution: Here f(x, y, z) contains three independent variables x, y and z, and hence there are 2³ = 8 possible combinations of 1 and 0 as truth values of x, y and z. According to the definition, the truth table is

x y z | f
1 1 1 | 1
1 1 0 | 1
1 0 1 | 1
1 0 0 | 1
0 1 1 | 0
0 1 0 | 1
0 0 1 | 0
0 0 0 | 0

Since we have 0 in the last column of the 5th, 7th and 8th rows, we construct the corresponding maxterms, which are respectively (x + y′ + z′), (x + y + z′) and (x + y + z). Consequently, the required Boolean expression in CNF is given by

fc = (x + y′ + z′)·(x + y + z′)·(x + y + z).

Since we have 1 in the last column of the 1st, 2nd, 3rd, 4th and 6th rows, we construct the corresponding minterms, which are respectively x·y·z, x·y·z′, x·y′·z, x·y′·z′ and x′·y·z′. Consequently, the required Boolean expression in DNF is given by

fd = x·y·z + x·y·z′ + x·y′·z + x·y′·z′ + x′·y·z′.

Ex 8.6.7 Let f(x, y, z) be a Boolean function such that f(x, y, z) = 1 if and only if at least two variables take the value 1. Express f(x, y, z) in CNF. What is its DNF?


Solution: Here f(x, y, z) contains three independent variables x, y and z, and hence there are 2³ = 8 possible combinations of 1 and 0 as truth values of x, y and z. According to the definition, the truth table is

x y z | f
1 1 1 | 1
1 1 0 | 1
1 0 1 | 1
1 0 0 | 0
0 1 1 | 1
0 1 0 | 0
0 0 1 | 0
0 0 0 | 0

Since we have 0 in the last column of the 4th, 6th, 7th and 8th rows, we construct the corresponding maxterms, which are respectively (x′ + y + z), (x + y′ + z), (x + y + z′) and (x + y + z). Consequently, the required Boolean expression in CNF is given by

fc = (x′ + y + z)·(x + y′ + z)·(x + y + z′)·(x + y + z).

Since we have 1 in the last column of the 1st, 2nd, 3rd and 5th rows, we construct the corresponding minterms, which are respectively x·y·z, x·y·z′, x·y′·z and x′·y·z. Consequently, the required Boolean expression in DNF is given by

fd = x·y·z + x·y·z′ + x·y′·z + x′·y·z.

8.7 Switching Circuit

In this section we present an application of Boolean algebra in the design of electrical switching circuits. Here the two-element algebra plays an important role.

Definition 8.7.1 An electrical switch is a mechanical device, attached to a point in a wire, having only two possible states: ON or OFF, i.e., closed or open. The switch allows current to flow through the point when it is in the ON state, and no current can flow through the point when it is in the OFF state.

Here we consider switches which are bi-stable, either ON or OFF, i.e., closed or open. We say that the Boolean expression represents the circuit, and the circuit realizes the Boolean expression.

Ex 8.7.1 A committee of three persons A, B, C decides proposals by a majority of votes. B has voting weight 1 and each of A and C has voting weight 2. Each can press a button to cast his vote. Design a simple circuit so that a light will glow when a majority of votes is cast in favour of the proposal. [CH‘03, ‘07]

Solution: Let x, y, z be the switches pressed by A, B, C respectively, and let v be the number of votes cast. By the given conditions,

x = 0 gives 0 votes, x = 1 gives 2 votes;
y = 0 gives 0 votes, y = 1 gives 1 vote;
z = 0 gives 0 votes, z = 1 gives 2 votes.


If f is the Boolean function of x, y, z, the light will glow when f(x, y, z) = 1. Now f(x, y, z) = 1 when v ≥ 3 and f(x, y, z) = 0 for v < 3. The table for the function f is given below:

x y z | v | f
1 1 1 | 5 | 1
1 1 0 | 3 | 1
1 0 1 | 4 | 1
1 0 0 | 2 | 0
0 1 1 | 3 | 1
0 1 0 | 1 | 0
0 0 1 | 2 | 0
0 0 0 | 0 | 0

Using the properties of Boolean algebra, we can simplify f as follows:

f(x, y, z) = x·y·z + x·y·z′ + x·y′·z + x′·y·z = x·y·(z + z′) + x·y′·z + x′·y·z
= x·y + x·y′·z + x′·y·z, as z + z′ = 1,
= x·(y + y′·z) + x′·y·z, by the distributive law,
= x·(y + y′)·(y + z) + x′·y·z, by the distributive law,
= x·(y + z) + x′·y·z, as y + y′ = 1,
= x·y + x·z + x′·y·z = x·y + (x + x′·y)·z, by the distributive law,
= x·y + (x + x′)·(x + y)·z = x·y + (x + y)·z = x·y + y·z + z·x.

The simplified form is given in the following circuit.

Figure 8.1: Three parallel branches of two series switches each, x–y, y–z and z–x, realising f = x·y + y·z + z·x.
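As a quick sanity check (an added sketch, not part of the original solution), the simplified expression can be compared against the weighted-vote specification on all eight inputs:

from itertools import product

for x, y, z in product((0, 1), repeat=3):
    v = 2 * x + 1 * y + 2 * z                  # voting weights of A, B, C
    f = (x & y) | (y & z) | (z & x)            # simplified circuit
    assert f == (1 if v >= 3 else 0)
print("f = x.y + y.z + z.x realises the weighted-majority rule")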

Ex 8.7.2 A committee of three approves a proposal by majority vote. Each member can vote for the proposal by pressing a button at the side of his chair. These three buttons are connected to a light bulb, which is turned on whenever a majority of votes is cast. Design a circuit so that the current passes and the light bulb is turned on only when the proposal is approved.

Solution: Let x, y, z denote the three switches. First we construct a Boolean expression f in the independent variables x, y and z for the required circuit. Let f(x, y, z) be a Boolean function such that f(x, y, z) = 1 if and only if at least two of the variables take the value 1, i.e., whenever a majority of votes is cast the light bulb is in the 1 state. The truth table


is given below:

x y z | f
1 1 1 | 1
1 1 0 | 1
1 0 1 | 1
1 0 0 | 0
0 1 1 | 1
0 1 0 | 0
0 0 1 | 0
0 0 0 | 0

Using the properties of Boolean algebra, we can simplify f as follows:

f(x, y, z) = x·y·z + x·y·z′ + x·y′·z + x′·y·z = x·y + y·z + z·x.

Ex 8.7.3 A committee consists of the President, Vice-President and Secretary. A proposal is approved if and only if it receives a majority vote, or the vote of the President plus one other member. Each member approves the proposal by pressing a button attached to his chair. Design a switching circuit, controlled by the buttons, which allows current to pass if and only if a proposal is approved.

Solution: Let x, y, z denote the three switches controlled by the President, Vice-President and Secretary respectively, and let v be the number of votes cast. First we construct a Boolean expression f in the independent variables x, y and z for the required circuit. The truth table is given below:

x y z | v | f
1 1 1 | 3 | 1
1 1 0 | 2 | 1
1 0 1 | 2 | 1
1 0 0 | 1 | 0
0 1 1 | 2 | 1
0 1 0 | 1 | 0
0 0 1 | 1 | 0
0 0 0 | 0 | 0

Using the properties of Boolean algebra, we can simplify f as follows:

f(x, y, z) = x·y·z + x·y·z′ + x·y′·z + x′·y·z = x·y + y·z + z·x.

Ex 8.7.4 A light bulb in a room is controlled independently by three wall switches at the three entrances of the room, in such a way that the state of the light bulb changes on flicking any one of the switches (irrespective of its previous state). Design a simple circuit connecting these three wall switches and the light bulb. [CH‘05]

Solution: Let x, y, z denote the three wall switches. First we construct a Boolean expression f in the independent variables x, y and z for the required circuit. Let f(x, y, z) be a Boolean


function. The truth table is given below:

x y z | f
1 1 1 | 1
1 1 0 | 0
1 0 1 | 0
1 0 0 | 1
0 1 1 | 0
0 1 0 | 1
0 0 1 | 1
0 0 0 | 0

Using the properties of Boolean algebra, we can simplify f as follows:

f(x, y, z) = x·y·z + x·y′·z′ + x′·y·z′ + x′·y′·z = x·(y·z + y′·z′) + x′·(y·z′ + y′·z).

The corresponding simplified form is given in the following circuit.

Figure 8.2: Staircase circuit realising f = x·(y·z + y′·z′) + x′·(y·z′ + y′·z).
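Note that f is 1 exactly when an odd number of switches are on, i.e., f is the three-variable parity (XOR) function; the following added sketch confirms that the simplified form agrees with it:

from itertools import product

for x, y, z in product((0, 1), repeat=3):
    simplified = (x & ((y & z) | ((1 - y) & (1 - z)))) \
                 | ((1 - x) & ((y & (1 - z)) | ((1 - y) & z)))
    assert simplified == (x ^ y ^ z)
print("f = x.(yz + y'z') + x'(yz' + y'z) is the 3-way parity function")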

Ex 8.7.5 A light bulb in a room is controlled independently by two wall switches at the two entrances of the room, in such a way that the state of the light bulb changes on flicking either of the switches (irrespective of its previous state). Design a simple circuit connecting these two wall switches and the light bulb. [CH‘98, ‘06]

Solution: Let x, y denote the two wall switches and f the Boolean function. First we construct a Boolean expression f in the independent variables x and y for the required circuit. The light will glow when f(x, y) = 1. By the given condition, the light is in the “off” state when both switches are in the “off” state, so f(x, y) = 0 when x = 0, y = 0. The truth table for the function is

x y | f
1 1 | 0
1 0 | 1
0 1 | 1
0 0 | 0

Using the properties of Boolean algebra, we can simplify f as f(x, y) = x′·y + x·y′. The corresponding simplified form is given in the following circuit.

Figure 8.3: Two-switch staircase circuit realising f = x′·y + x·y′.

Ex 8.7.6 Simplify the circuit shown in Fig. 8.4 below.

Solution: The circuit is represented by the expression

f(x, y, z) = z·(x + y′) + z′·x + (z + y′)·z′
= x·z + y′·z + z′·x + z·z′ + y′·z′
= x·z + y′·z + x·z′ + y′·z′
= x·(z + z′) + y′·(z + z′) = x + y′.

The corresponding simplified form is given in the following circuit


Figure 8.4: The given circuit, realising f = z·(x + y′) + z′·x + (z + y′)·z′.

Figure 8.5: The simplified circuit, x in parallel with y′.

Exercise 8

Section-A[Multiple Choice Questions]

1. Principle of duality is defined as

(a) ≤ is replaced by ≥
(b) LUB becomes GLB
(c) all properties remain unaltered when ≤ is replaced by ≥
(d) all properties remain unaltered when ≤ is replaced by ≥, other than the 0 and 1 elements.

2. What values of A,B,C and D satisfy the following simultaneous Boolean equations?

A+AB = 0, AB = AC, AB +AC + CD = CD.

(a) A = 1, B = 0, C = 0, D = 1 (b) A = 1, B = 1, C = 0, D = 0 (c) A = 1, B = 0, C = 1, D = 1 (d) A = 1, B = 0, C = 0, D = 0.

3. The absorption law is defined as(a) a ∗ (a ∗ b) = b (b) |C| ≤ 2 (c) a ∗ (a ∗ b) = b⊕ b (d) C does not exist.

4. The Boolean expression A + BC equals (a) (A + B)(A + C) (b) (A + B)(A + C) (c) (A + B)(A + C) (d) None of the above.

5. Simplifying Boolean expression ABCD +ABCD we get,(a) ABC (b) ABC (c) A+BCD (d) AB + CD

6. The minimization of Boolean expression AC +AB +ABC +BC, we get,(a) A ·B + C (b) AB + C (c) AB +BC (d) None of the above.

7. How many truth tables can be made from one function table? (a) 1 (b) 2 (c) 3 (d) 8.

8. The term sum of product in Boolean algebra means

(a) AND function of several OR functions

(b) OR function of several AND functions

(c) AND function of several AND functions

(d) OR function of several OR functions.


Section-B[Objective Questions]

1. Show that in any Boolean algebra (x′)′ = x, where x′ is the complement of x.

2. Write the dual of each Boolean expression:

(a) a(a′ + b) = ab.

(b) (a+ 1)(a+ 0) = a.

(c) (a+ b)(b+ c) = ac+ b.

Section-C[Long Answer Questions]

1. Show that the power set P(X) of a non-empty set X is a poset with respect to the set inclusion relation ⊆. Show further that ⟨P(X), ⊆⟩ is a linearly ordered set if and only if X is a singleton set.

2. Establish that the set of all real numbers does not form a Boolean algebra with respect to usual addition and multiplication. [CH‘10]

3. Show that the set S = {a, b, c, d} with operations + and · defined below

 + | a  b  c  d        · | a  b  c  d
 a | a  b  c  d        a | a  a  a  a
 b | b  b  d  d        b | a  b  a  b
 c | c  d  c  d        c | a  a  c  c
 d | d  d  d  d        d | a  b  c  d

forms a Boolean algebra.

4. Let U be a given universal set. A non-empty class S of subsets of U is said to be a field of sets if S is closed under the set theoretical operations of union, intersection and complementation, i.e., such that

(a) A ∈ S, B ∈ S ⇒ A ∪ B ∈ S.

(b) A ∈ S, B ∈ S ⇒ A ∩ B ∈ S.

(c) A ∈ S ⇒ A′ (the complement of A in U) ∈ S.

The universal set U is called the space.

5. Let U be the family of all finite subsets of ℝ and their respective complements in ℝ. Show that U forms a Boolean algebra under the usual set theoretical operations of union, intersection and complementation.

6. Prove that the Boolean algebra (B,+, .,′ ) becomes a poset with respect to the relation≤, defined by a ≤ b if and only if a+ b = b for a, b ∈ B.

7. Prove that a Boolean algebra of three elements {0, 1, a} cannot exist. [CH‘06]

8. In a Boolean algebra B, for any a, b and c, prove the following

(a) (a+ b′)(a′ + b′)(a+ b)(a′ + b) = 0.

(b) (a+ b)(a′ + b′) = ba′ + ab′.

Page 488: Linear Algebra by Nayak

Switching Circuit 481

(c) (a+ b)(b+ c)(c+ a) = ab+ bc+ ca.

(d) ab+ a′b+ ab′ + a′b′ = 1. [BH‘83, 96, 99]

(e) a+ b = a+ c and a.b = a.c⇒ b = c. [CH‘10]

(f) a+ ab = a; ∀a, b ∈ B. [CH‘08]

(g) b+ a = c+ a and b+ a′ = c+ a′ ⇒ b = c. [CH‘07]

9. Prove that the following properties are equivalent in a Boolean algebra B: (i) a + b = a, (ii) a + b′ = 1, and (iii) a·b = b, ∀a, b ∈ B.

10. Express the following boolean functions in both DNF and CNF.

(a) f(x, y, z) = (x+ y + z)(xy + x′z)′, [CH‘06]

(b) f(x, y, z) = xy′ + yz′ + zx′,

(c) f(x, y, z) = (x+ y)(x+ y′)(x′ + z),

(d) f(x, y, z) = (x′ + y′ + z)(x+ y′ + z′)(x′ + y + z′)

11. Express the following CNF into an expression of DNF.

(a) (x+ y′ + z)(x+ y + z′)(x+ y′ + z′)(x′ + y + z)(x′ + y + z′)(x′ + y′ + z), [CH‘08]

(b) (x+ y)(y + z)(x′ + y′ + z′), [CH‘10]

12. What is a truth table? Construct the truth table for the function f = xy′z + x′z′ + y.

13. Using the truth table, find full conjunctive normal form of the following Booleanexpression x′y′z + xy′z′ + xy′z + xyz′ + xyz.

14. A Boolean function f(x, y, z) is such that f(1, 1, 0) = f(0, 1, 1) = f(1, 0, 1) = f(1, 1, 1) =1 and f(x, y, z) = 0 for all other cases. Find the function f(x, y, z) in minimized form.

15. f1 and f2 are Boolean functions of three variables, as given in the following truth table:

x y z | f1 | f2
1 1 1 | 0  | 1
1 1 0 | 1  | 1
1 0 1 | 0  | 1
1 0 0 | 0  | 0
0 1 1 | 1  | 1
0 1 0 | 1  | 0
0 0 1 | 0  | 0
0 0 0 | 0  | 0

Find the simplified expressions of f1 and f2, and then obtain switching circuits to realise the output functions.

16. Find the Boolean function which represents the circuit shown in Fig. 8.6 below, and simplify the function if possible. [CH‘09]

17. State De-Morgan’s laws and verify them using truth tables.

18. Express the following in CNF in the smallest possible number of variables:

(a) x′y + xyz′ + xy′z + x′y′z′t+ t′.


Figure 8.6: The circuit of Question 16, with three parallel branches y′z, x′z and xyz.

(b) (x+ y)(x+ y′)(x′ + z).

(c) xy′ + xz + xy.

(d) (x′ + y′)z + (x+ z)(x′ + z′).

19. Construct the truth table and draw switching circuit diagram of the following Booleanfunctions:

(a) f = xy + yz + zx

(b) f = (xy + xz + x′z′)z′(x+ y + z)

(c) f = xyz′ + xy′z + x′y′z′.

(d) f = x+ y[z + x′(y′ + z′)].

20. A committee consists of the President, Vice-President, Secretary and Treasurer. A proposal is approved if and only if it receives a majority vote, or the vote of the President plus one other member. Each member approves the proposal by pressing a button attached to his chair. Design a switching circuit, controlled by the buttons, which allows current to pass if and only if a proposal is approved. [CH‘04]

21. Draw the switching circuit representing the switches x, y, x′ and y′ such that the light bulb in the circuit glows only if x is ‘ON’ and y is ‘OFF’, or y is ‘ON’ and x is ‘OFF’. [BH‘90]

22. Construct the switching circuit representing ab+ ab′ + a′b′ and show that the circuitis equivalent to the switching circuit a+ b′. [BH‘87]

23. Let f(x, y, z) be a Boolean function which assumes the value 1 if and only if exactly one of the variables takes the value 1. Construct a truth table of f and hence write f in CNF. Draw a switching circuit corresponding to the DNF. [CH‘10]


Bibliography

[1] A. R. Rao and P. Bhimasankaram, Linear Algebra, TMH.

[2] D. Dubois and H. Prade, Fuzzy Sets and Fuzzy Systems: Theory and Applications, Academic Press, New York, 1980.

[3] I. Niven and H. Zuckerman, An Introduction to the Theory of Numbers, John Wiley and Sons.

[4] K. Atanassov, Intuitionistic Fuzzy Sets: Theory and Applications, Physica-Verlag, 1999.

[5] K. Hoffman and R. Kunze, Linear Algebra, PHI.

[6] L. A. Zadeh, Fuzzy sets, Information and Control, 8 (1965) 338-352.

[7] M. K. Sen, Ghosh and Mukhopadhyay, Topics in Abstract Algebra, Universities Press.

[8] P. K. Nayak, Numerical Analysis (Theory and Applications), Asian Books Pvt. Limited, 2007.

[9] P. K. Nayak, Mechanics: Newtonian, Classical and Relativistic (Theory and Applications), Asian Books Pvt. Limited, 2009.

[10] S. K. Mapa, Higher Algebra, Sarat Books Pvt. Limited.

[11] S. Lipschutz and M. L. Lipson, Discrete Mathematics, Schaum's Series.
