CHAPTER 2
LITERATURE REVIEW
_________________________________________
2.1 INTRODUCTION
Traditionally, secrecy was required mainly in diplomatic and military
communications. Nowadays it plays [AND and SCH 2005] an
important role in our everyday lives, for example in managing our
financial affairs. Cryptography plays a vital role in maintaining the
privacy of electronic information against threats. In doing so, a
combination of symmetric cryptographic systems for encryption and
decryption of data and public-key systems for managing the keys
is used [HIG 1997, SCH 1994]. Assessing the strength of
cryptographic systems is an essential step in employing cryptography
for information security [LAM et al. 2004]. Cryptanalysis plays a key
role in this context. The main objective of cryptanalysis is not only to
recover the plain text, but also to estimate the strength of the
algorithm, which is useful in designing a good cryptographic algorithm.
There has been an explosive growth in unclassified research in different
aspects of cryptology and cryptanalysis. Many cryptosystems that were
thought to be secure have been broken, and a large variety of mathematical
tools that are useful in cryptanalysis have been developed. Different
approaches are available [ADA 2006, BEL 2003, CAR and MAG 2007,
NAW 2004] in the literature to perform cryptanalysis on either block
ciphers or stream ciphers. Classical ciphers are divided into two broad
categories: substitution ciphers and transposition ciphers.
Cryptanalysis of classical ciphers is a popular cryptological
application for meta-heuristic search.
2.2 REVIEW OF CRYPTANALYSIS TECHNIQUES
Substitution ciphers represent the basic building blocks of
complex and secure ciphers that are used [MAR et al. 2005, MAS
1988, MAU and WOL 2000, XUE et al. 2009] today. Understanding
the vulnerability of simple ciphers is important while building more
complex ciphers. Many cryptanalysis techniques are available [CHR
2006, DAV 2004, FAI and YOY 2006, GOL 2006, KNU 2002, KNU and
MIT 2005, LAS et al. 2005, PING et al. 2009, RAP and SID 2006, VAS
2004, VAS and GAR 2007] in the literature to break substitution
ciphers, each with its own advantages and disadvantages.
While attacking cipher models, one can consider a key recovery
attack, in which the goal is to derive the secret key; a decryption
attack, in which the goal is to decrypt the cipher text; or key recovery
from a decryption attack. Different techniques have been explored in the
literature to find the key of the cipher and thereby decrypt the
entire cipher text. Several possible methods are available [DUN and
KEL 2007, LUN et al. 2008, MAS et al. 2006, OLS 2007, SKR 2007,
YAN 2008] in the literature to break a substitution cipher,
including exhaustive search, simulated annealing, frequency analysis,
genetic algorithms, particle swarm optimization, tabu search and
relaxation algorithms.
The exhaustive search method is the simplest of all algorithms
used to break substitution ciphers. This technique is possible when
the cryptographic system has a finite key space, allowing all
possible key combinations to be checked until the correct one is
found. It is an acceptable technique for breaking a mono-alphabetic
shift cipher. Exhaustive search is rarely the best first choice,
since it is time consuming, but it decrypts the text with 100% accuracy.
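For the mono-alphabetic shift cipher the key space has only 26 members, so the exhaustive search described above can be sketched in a few lines. The function names and the crib-word stopping test below are illustrative assumptions, not taken from any of the cited works:

```python
import string

def shift_decrypt(ciphertext, shift):
    """Undo a Caesar/shift cipher for uppercase letters; other
    characters are passed through unchanged."""
    out = []
    for ch in ciphertext:
        if ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord('A') - shift) % 26 + ord('A')))
        else:
            out.append(ch)
    return ''.join(out)

def exhaustive_attack(ciphertext, crib):
    """Try all 26 keys and return the first (key, plaintext) pair whose
    decryption contains a known crib word; None if no key matches."""
    for key in range(26):
        candidate = shift_decrypt(ciphertext, key)
        if crib in candidate:
            return key, candidate
    return None
```

For example, the ciphertext "DWWDFN DW GDZQ" (enciphered with shift 3) is recovered as "ATTACK AT DAWN" with key 3.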
Michael Lucks described [LUC 1988] a method for the automatic
decryption of simple substitution ciphers. The method uses
exhaustive key search controlled by a few constraints imposed on
word patterns. It does not use any statistical analysis or
language heuristics, and is therefore language independent: it can
be applied to any language for which a sufficient online dictionary is
available.
The brute force method is a way to break simple substitution ciphers,
but the number of possible keys that need to be checked is large.
In practice, it is impossible to complete an exhaustive search within a
reasonable amount of time. To overcome this, new algorithms have been
developed for faster breaking of the cipher.
Ryabko et al. suggested [RYA et al. 2005] an attack on block
ciphers called the gradient statistical attack, and presented the
possibility of applying it to ciphers for which no attacks other than
exhaustive key search are known. The described method is a chosen-
plaintext attack. Analysis of the statistical properties of block ciphers
is used in the process of cryptanalysis. The applicability of this method
to RC5 cryptanalysis is demonstrated experimentally.
Raphael presented [RAP and SID 2006] a framework designed to
describe block cipher cryptanalysis techniques compactly,
regardless of their individual differences. The framework
describes possible attacks on block ciphers and specifies the
technical details of each attack and its respective strength, and is
used to describe various attacks on popular and recent block
ciphers.
Biham studied [BIH 1998] the effect of multiple modes under
chosen-ciphertext attack when the underlying cryptosystems are
DES and FEAL-8. It is shown that in many cases these modes are
weaker than the corresponding multiple ECB mode. In most cases,
these modes are not much more secure than a single encryption using
the same cryptosystem. It is suggested to use a single mode and to use
multiple encryption as the underlying cryptosystem of that single mode.
Automated attack algorithms have been developed for which human
intervention is not necessary. These methods terminate either after
a predetermined number of iterations or after the message is
successfully decrypted. One such automated attack algorithm is the
genetic algorithm, which is widely used for cracking substitution
ciphers.
Joe Gester proposed [GOE 2003] and implemented the simplest
approach of searching for a key, based on the more likely keywords
generated from the key space. The proposed genetic algorithm
involves an iterative process of evaluating the fitness of the
individuals in the population. Genetic operators are then selectively
applied to members of the population to create a new generation, and
the process is repeated. Each generation is created by selecting
members of the previous generation, weighted according to their
fitness.
The proposed method uses a simple genetic algorithm to search
the key space of cryptograms. If this is not satisfactory,
an attempt is made to search a smaller problem space by restricting
the keys searched to those generated by a keyword. In this
approach the populations reach local maxima rapidly or never seem to
converge to anything resembling English.
Like other heuristic algorithms, genetic algorithms do not
produce exact results; they give solutions that are close to the
correct one. Experiments performed using the genetic algorithm
method suggest that a fitness of about 0.9 is enough to determine
the vowel and consonant substitutions, after which visual
examination by a human can be used to decrypt the entire text.
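The select-mutate-repeat loop described above can be sketched as a minimal genetic algorithm over key permutations. The fitness function is supplied by the caller (in practice an n-gram statistic of the text decrypted under the key); the function names and parameter values here are illustrative assumptions, not the cited authors' implementation:

```python
import random

def mutate(key):
    """Genetic mutation: swap two positions of the key permutation."""
    a, b = random.sample(range(len(key)), 2)
    key = list(key)
    key[a], key[b] = key[b], key[a]
    return ''.join(key)

def evolve(fitness, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ",
           pop_size=20, generations=50):
    """Minimal GA loop: rank the population by fitness, keep the fitter
    half as survivors, and refill with mutated copies of survivors."""
    pop = [''.join(random.sample(alphabet, len(alphabet)))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]
        pop = survivors + [mutate(random.choice(survivors))
                           for _ in range(pop_size - len(survivors))]
    return max(pop, key=fitness)
```

A real attack would use selection weighted by fitness and crossover operators as well; this sketch keeps only the parts needed to show the iteration structure.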
David Oranchak proposed [ORN 2008] an algorithm for the decryption of
homophonic substitution ciphers. In a mono-alphabetic cipher each
ciphertext symbol maps to only one letter of plaintext, but a
homophonic substitution cipher, unlike a mono-alphabetic one, maps
each plaintext letter to one or more ciphertext symbols. Moreover,
homophonic ciphers conceal language statistics in the enciphered
messages and make [GAN 1993] statistics-based attacks more
difficult. So, an approach that uses a dictionary-based attack with
a genetic algorithm is presented.
Alabassal et al. proposed [ALA and WAH 2004] a method to
discover the key of a Feistel cipher using a genetic algorithm approach.
The possibility of using genetic algorithms in key search is attractive
due to their ability to reduce the complexity of the search problem.
The complexity of the attack is reduced by 50%.
Yean Li et al. performed [LI et al. 2005] a study on the effect of an
optimization-heuristic cryptanalytic attack on block ciphers. The
possible key solution generated by the heuristic function is used to
decrypt the known ciphertext. The fitness value of a solution is
obtained by decrypting the known ciphertext and then calculating the
percentage of character-location matches between the original text and
the retrieved text. The search for the correct key combination continues
until a solution match, or the closest match within the constraints of
the test environment, is found.
Tabu search is another optimization technique used for breaking
substitution ciphers. It is an iterative search characterized
by the use of a flexible memory, which helps it escape local minima and
search beyond the local minimum. Experimental results suggest
that the genetic algorithm recovers slightly more characters than the
other algorithms. Tabu search and genetic algorithm frameworks have
been applied to three types of ciphers, viz. AES, Hill and
columnar transposition. The genetic algorithm produced results more
efficiently in terms of performance than tabu search. However,
according to the available literature, the genetic algorithm did not
perform well on the Hill cipher and AES.
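The flexible-memory idea behind tabu search can be sketched as follows, assuming a caller-supplied fitness function over keys: the search always moves to the best neighbouring key, but a short-term memory of recently visited keys forbids immediate backtracking, which is what lets it move beyond a local optimum. All names and parameters are illustrative assumptions:

```python
from collections import deque

def tabu_search(fitness, initial_key, tabu_size=50, iterations=200):
    """Best-neighbour search with a tabu list: neighbours are all keys
    reachable by one pairwise swap; recently visited keys are excluded,
    so the search can leave a local maximum instead of cycling."""
    current = best = initial_key
    tabu = deque([initial_key], maxlen=tabu_size)
    n = len(initial_key)
    for _ in range(iterations):
        neighbours = []
        for a in range(n):
            for b in range(a + 1, n):
                k = list(current)
                k[a], k[b] = k[b], k[a]
                k = ''.join(k)
                if k not in tabu:
                    neighbours.append(k)
        if not neighbours:
            break
        current = max(neighbours, key=fitness)  # move even if worse
        tabu.append(current)
        if fitness(current) > fitness(best):
            best = current
    return best
```

Note that the move to the best non-tabu neighbour is made even when it lowers the fitness; only the separately tracked best-so-far key is returned.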
Simulated annealing is another technique, similar to the genetic
algorithm, that is used to break substitution ciphers. The
main difference is that the genetic algorithm maintains a pool of
possible keys at each moment, while simulated annealing keeps one
value at a time. Combined with a few other simplifications, this
makes simulated annealing much simpler to implement than either
genetic algorithms or tabu search.
The particle swarm optimization method is another machine-learning-
based method used for breaking substitution ciphers. The algorithm
starts by selecting a random population of potential solutions, each
of which is called a particle. Particle swarm optimization is a good
method for breaking simple substitution ciphers as long as bigrams
are used to calculate the fitness of particles.
Laskari et al. [LAS et al. 2007] applied the particle swarm
optimization (PSO) method to problems arising in the
cryptanalysis of block-cipher cryptosystems. PSO originates from the
field of evolutionary computation; it is a population-based
algorithm which exploits a population of individuals to search
promising regions of the function space. The method is applied to
the problem of locating the key of a simplified version of DES, and
proves efficient and effective where deterministic optimization
methods fail. The work mainly investigates the problem of identifying
the missing bits of the key used in a simplified Feistel cipher, DES
reduced to four rounds.
The relaxation algorithm is another technique used [PEL and ROS
1979] to break substitution ciphers. This is a graph-based technique
that relies on iterative, parallel updating of values associated with
each node. The nodes of the graph are the elements of the cipher
alphabet. Each node has an associated random variable representing
the probabilities of the possible characters that the node may stand
for. The probabilities of a node are updated based on the appearance
of its two neighbours in the ciphertext and the trigram statistics of
the original language.
Brute force attacks are successful in solving simple ciphers, but
cryptanalysis of complex ciphers requires specialized techniques and
powerful computing systems. Nalini et al. performed [NAL and RAO
2007] a systematic study of efficient heuristics for successful attacks
on some block ciphers. For a systematic study of attacks on ciphers
using heuristics, it is desirable to have simple ciphers that are
tractable and at the same time incorporate representative features of
practical ciphers.
Algebraic cryptanalysis is a method in which cryptanalysis begins
by constructing [BAR and BIH 2008, BAR 2009] a system of
polynomial equations in terms of plaintext bits, ciphertext bits and
key bits. The technique uses [SIM 2009] modern equation solvers to
attack cryptographic algorithms. The power of the equation solver,
its speed, and the amount of memory available determine whether
the system is solvable or not.
One aspect of differential cryptanalysis that appears [SEL 2008] to
have been overlooked is the use of several differences to attack a
cipher simultaneously. The analysis of a cryptosystem should not only
measure its strength under the best single differential attack, but
also take the best several attacks into account. These simultaneous
attacks reduce the search space by a reasonable factor and do result
in a significant improvement. The relative costs of encryption, XOR,
and memory I/O have to be taken into consideration [SIE et al. 2004],
and trial and error is required for an accurate answer.
Schneier explored [SCH 2000] different cryptanalytic techniques
and ways to break new algorithms. Breaking a cipher does not
necessarily mean finding a practical way to recover the plaintext from
the cipher text; it also means finding a weakness in the cipher that
can be exploited with complexity less than that of a brute force attack.
Several attempts at cryptanalysis of the RC5 cipher are found [BIR and
KUS 1998, KIM et al. 2009] in the literature. Kaliski and Yin evaluated
[KES et al. 1996] the strength of the RC5 algorithm with respect to
linear and differential attacks. They found a linear attack on RC5 with
6 rounds that uses 2^57 known plaintexts, and whose plaintext
requirement becomes impractical beyond 6 rounds. The best previously
known attack requires 2^54 chosen plaintexts to derive the full set of
25 subkeys for 12-round RC5 with 32-bit words. Alex Biryukov et al.
proposed a method that drastically improves these results through a
novel partial differential approach; the proposed attack requires only
2^44 chosen plaintexts.
Blowfish is a Feistel cipher in which the round function F is part
of the private key. A differential cryptanalysis of Blowfish is possible
either against a reduced number of rounds or with the piece of
information that describes the function F. Vaudenay showed [VAU
2006] that disclosure of F allows a differential cryptanalysis that
can recover the rest of the key with 2^48 chosen plaintexts against
a number of rounds reduced to eight.
New techniques have been developed for cryptanalysis based on
impossible differentials, and these techniques are used for attacks.
Biham et al. described [BIH 2002] the application of these techniques
to the block ciphers IDEA and Khufu. The new attacks cover more
rounds than the previously best known attacks, demonstrating the
power of the new cryptanalytic techniques.
2.3 CRYPTANALYSIS USING LANGUAGE MODELS
The role of cryptanalysis is also to study a cryptographic system
with an emphasis on exploring the weaknesses of the system. The
complex properties of natural languages play an important role in
cryptanalysis. Different approaches to cryptanalysis in the literature
use [GON 1973, STA et al. 2003, RAV and KNI 2009-2] language
characteristics to understand the strength of a cipher system. One such
approach deals with frequency statistics. This is based on the
assumption that each letter in the plain text is consistently
substituted by a single letter of the ciphered text.
Frequency analysis is the process of determining the frequency with
which each symbol of the encrypted text occurs in the cipher text.
This information is used, along with knowledge of the frequency of
symbols in the language underlying the cipher, to determine which
ciphertext symbol maps to which plaintext symbol. The frequency
analysis algorithm is a fast approach to deciphering encrypted text,
but it requires knowledge of the language statistics of the original
text. A disadvantage is that it relies on constant human interaction
to determine the next move in the process.
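The core rank-matching step of frequency analysis can be sketched as below. The resulting key is only an initial guess, which is why the method needs human interaction (or a follow-up search) to finish the job; the frequency ordering used here is a common approximation for English, an assumption rather than part of the cited works:

```python
from collections import Counter

# English letters ordered by typical frequency (an approximation).
ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"

def frequency_key_guess(ciphertext):
    """Guess a substitution key by matching frequency ranks: the most
    common ciphertext symbol is mapped to 'E', the next to 'T', and so
    on down ENGLISH_ORDER. Returns a dict cipher-symbol -> plain-letter."""
    counts = Counter(c for c in ciphertext if c.isalpha())
    ranked = [sym for sym, _ in counts.most_common()]
    return {sym: ENGLISH_ORDER[i] for i, sym in enumerate(ranked)}
```

On short texts the observed ranks deviate from the language ranks, so several of the guessed mappings will typically be wrong; correcting them is the human (or search) step the text describes.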
Rinza et al. presented [RIN et al. 2008] a method for deciphering
texts in Spanish using the probability of usage of letters in the
language. The method basically performs cryptanalysis of a
mono-alphabetic cryptosystem. It uses the probability and usage of
letters in the Spanish language to break encrypted text files,
assigning weights to the different letters of the Spanish alphabet.
For this purpose, an analysis of the frequency of symbols in Spanish
plain text is done; the same analysis is carried out on the cipher
text. Every encrypted character is mapped to a single character in
the original message and vice versa, and in this way the original
text is retrieved from the cipher text. A few characters vary because
there are letters and symbols with the same frequency. This method of
deciphering cryptograms in Spanish gave positive results, but the
deciphering is not 100% successful.
A simple substitution cipher applies a substitution to the set of
letters of the plaintext alphabet, and different letters in the cipher
text correspond to different letters in the plaintext. To encode a
text by character-wise substitution, an “infinite key” is used and each
letter in the plaintext is replaced by a ciphertext letter by means of
a one-to-one self-mapping of the set of letters; knowledge of the key
is then necessary to reconstruct the plaintext. The work of Mineev et
al. is concerned [MIN and CHU 2008] with a similar smoothing effect on
the simple substitution cipher resulting from contracting the alphabet
by quadratic residues and quadratic non-residues in finite fields. As
a sample, the Russian alphabet is considered in the proposed work.
A single-letter frequency analysis is helpful for obtaining an initial
key from which to perform a more powerful bigram analysis. Apart from
single characters, the relation between ciphertext and plaintext in
terms of bigrams and trigrams plays a vital role. Samuel W. Hasinoff
presented [HAS 2003] a system for the automatic solution of short
substitution ciphers. The proposed system operates by using an n-gram
model of English and stochastic local search over all possible keys of
the key space. This method resulted in a median of 94% of cipher
letters correctly decoded.
Sujith Ravi et al. studied [RAV and KNI 2009-1] attacks on a
Japanese syllable substitution cipher. Different natural language
processing techniques are used to attack the cipher. They made
several novel improvements over previous probabilistic methods and
report improved results.
In general the receiver uses the key to convert cipher text to plain
text, but a third party who intercepts the message can guess the
original plaintext by analyzing the repetition patterns of the cipher
text. From a natural language perspective, this cryptanalysis task can
be viewed as a kind of unsupervised tagging problem. Language modeling
(LM) techniques are used to rank proposed decipherments. This work
mainly attacks difficult cipher systems that have more characters
than English, and cipher lengths that are not solved by low-order
language models, and relates language-model perplexity to
decipherment accuracy.
Jakobsen proposed [JAC 1995] a method for the cryptanalysis of
substitution ciphers in which an initial guess of the key is
refined through a number of iterations. In each step the plain text
recovered using the current key is evaluated to estimate how close
the key is to the correct one. To solve a cipher using this method,
the bigram distributions of letters in the cipher text and the plain
text are sufficient. The method is suitable for both mono-alphabetic
and poly-alphabetic substitution ciphers.
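This iterative key-refinement idea can be sketched as a pairwise-swap hill climb. Here `score` stands in for a comparison of the bigram statistics of the recovered text against those of the language, and the function and parameter names are illustrative assumptions rather than Jakobsen's exact algorithm:

```python
def hill_climb_key(ciphertext, score, initial_key, rounds=3):
    """Refine a substitution key: try swapping every pair of key
    positions and keep any swap that makes the decrypted text score
    strictly higher. key[i] is the plaintext letter assigned to the
    ciphertext letter chr(ord('A') + i)."""
    alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

    def decrypt(key):
        return ciphertext.translate(str.maketrans(alphabet, key))

    key = initial_key
    for _ in range(rounds):
        for a in range(len(alphabet)):
            for b in range(a + 1, len(alphabet)):
                cand = list(key)
                cand[a], cand[b] = cand[b], cand[a]
                cand = ''.join(cand)
                if score(decrypt(cand)) > score(decrypt(key)):
                    key = cand
    return key
```

In Jakobsen's formulation the bigram counts of the recovered text are updated incrementally after each swap instead of re-decrypting, which is what makes the method fast; the sketch above trades that efficiency for brevity.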
G. W. Hart proposed [HAR 1994] a method for solving cryptograms
which works well even in difficult cases where only a small sample of
text is available and the probability distribution of letters is far
from what is expected. The method performs well on longer and easier
cryptograms. Exponential time is required in the worst case, but in
practice it is quite fast. The method fails when none of the plain
text words exist in the dictionary.
A deciphering model was developed [LEE et al. 2006] by Lee et al. to
automate the cryptanalysis of mono-alphabetic substitution ciphers. It
uses an enhanced frequency analysis technique in a three-level
hierarchical approach: to decipher a mono-alphabetic substitution
cipher, monogram frequencies, keyword rules and a dictionary are used
one by one. The approach was tested on two short cryptograms, and both
achieved successful deciphering results in good computational time. It
is observed that the enhanced frequency analysis approach performs
faster decryption than Hart's approach.
Knight et al. discussed [KNI and YAM 1999, KNI et al. 2006] a
number of natural language decipherment problems that use
unsupervised learning, including letter substitution ciphers,
phonetic decipherment, character code conversion and word-based
ciphers of importance to machine translation. An efficient algorithm
is implemented that accomplishes a naive application of the
Expectation Maximization (EM) algorithm to break a substitution
cipher.
Ravi and Knight introduced [RAV and KNI 2008] another method
that uses low-order letter n-gram models to solve substitution
ciphers. The method is based on integer programming, which performs
an optimal search over the key space and guarantees that no key is
overlooked; it can be executed with standard integer programming
solvers. The proposed method studies the variation of decipherment
accuracy as a function of n-gram order and cipher length.
Ravi and Knight created [RAV and KNI 2009-2] fifty ciphers each of
lengths 2, 4, 8, . . . , 256. These ciphers were solved with 1-gram,
2-gram and 3-gram language models, and the average percentage of
cipher text decoded incorrectly was recorded. It is observed that the
solution obtained by integer programming is exact, achieving better
accuracy; for short cipher lengths, a much higher improvement is
observed when the integer programming method is used. The unigram
model works badly in this scenario, which is in line with Bauer's
observation for short texts. The work mainly focuses on letter
substitution ciphers which also include spaces, and compares the
decipherment of ciphers with and without spaces using different
n-gram English models.
2.4 INFORMATION THEORETIC APPROACH
Entropy is a statistical parameter that measures how much
information is produced, on average, per letter of text in a
language. Redundancy measures the amount of constraint imposed
on a text in the language because of its statistical nature. Shannon
proposed a method to estimate the entropy and redundancy of a
language. This method uses the knowledge of language statistics
possessed by speakers of the language, and depends on predicting
the next letter when the preceding text is known. Some properties
of an ideal predictor are developed.
An approach for finding n-gram entropy is developed [SHA 1951].
For this purpose a study is done on 26-letter English, where spaces
and punctuation are ignored. The n-gram entropies are calculated from
letter, bigram and trigram frequencies; the estimated entropy values
for n = 1, 2, 3 are 4.14, 3.56 and 3.3 bits respectively. Based on the
frequencies of symbols in the reduced text, it is possible to set
bounds on the n-gram entropy of the original language.
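The frequency-based estimates above can be sketched as a plug-in entropy computation over observed n-gram counts. This is a crude per-letter estimate from raw counts, not Shannon's predictor-based method, and it is biased downward on small samples:

```python
import math
from collections import Counter

def ngram_entropy(text, n):
    """Plug-in estimate of per-letter n-gram entropy: collect all
    overlapping n-grams, form their empirical distribution p, and
    return (-sum p*log2(p)) / n bits per character."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values()) / n
```

Applied to a large English corpus with n = 1 this kind of estimate approaches the cited unigram value of about 4.1 bits, while higher n captures more of the inter-letter constraints and gives lower per-letter values.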
The approach proposed [SHA 1948, SHA 1949] by Shannon deals
with the basic mathematical structure of secrecy systems. Shannon's
work defines theoretical secrecy, the immunity of a system against
cryptanalysis when the cryptanalyst has unlimited time and computing
power for the analysis of cryptograms. This is related to
communication systems in which noise is present, and the notions of
entropy and equivocation are applied to cryptography.
The work is also concerned with practical secrecy, defined as the
level of security necessary to make the system secure against an
enemy with a limited amount of time and limited computational
resources for attacking the intercepted cryptogram. This leads to
methods for constructing systems that require a large amount of work
to solve. An analysis of the basic weaknesses of secrecy systems is
made. H. Yamamoto presented [YAM 1991] a survey of different
information theoretic approaches in cryptology, addressing Shannon's
cipher system, Simmons' authentication approach, the wiretap channel,
and secret sharing communication system approaches.
Diffie and Hellman introduced [HEL 1977, DIF and HEL 1979]
another approach to achieving practical security based on
computational complexity, using trapdoor functions and one-way
functions. Authenticity is introduced to prevent active attacks, and
several authentication mechanisms have been developed in the
literature. Simmons proposed an authentication mechanism applicable
to any type of system. The conclusion drawn from this work is that
the information theoretic approach is as important as the
computational complexity approach.
Shannon, in his information theoretic approach to cryptography,
assumes that computational abilities are unlimited. The work
proposed by Hellman is an extension of Shannon's theory: the
concept of matching a cipher to a language and the trade-off between
global and local uncertainty is developed. Hellman defined a model in
which the messages are divided into two subsets: one set of all
meaningful messages, each with the same a priori probability, and the
other of meaningless messages, which are assigned a priori
probabilities of zero.
Borissov and Lee computed [BOR and LEE 2007] bounds on a
theoretical measure of the strength of a system under known-plaintext
attack. Dunham proposed [DUN 1980] the key equivocation, the
conditional entropy of the key given the cipher text and the
corresponding plain text, as a measure of the strength of the
system. For the simple substitution cipher, lower and upper bounds
were found. This work concluded that key recovery by known-plaintext
attack on substitution ciphers is more difficult when the cipher has
many fixed points.
A study is carried out [MAU 1993, MAU 1999, RIV 1991, VER
1998, ZHA 2005] on estimating the key equivocation of secrecy
systems. In general, when the block length is large, it is difficult
to find the key equivocation. A simplified method for computing the
key equivocation for two classes of secrecy systems with Additive-Like
Instantaneous Block (ALIB) ciphers was developed [ZHA 2005] by Zhang;
the criterion here is the key equivocation rather than the error
probability. For simple substitution ciphers, bounds are derived for
the message equivocation in terms of the key equivocation; the message
equivocation approaches the key equivocation quickly. It is also
observed that the exponential behaviour of the message equivocation is
not determined by the redundancy of the message source.
Maurer presents [MAU 1993] a review of the relation between
information theory and cryptography. Shannon's approach fixes a
lower bound on the size of the secret key needed to achieve a
particular level of security. More recent models depart from
Shannon's approach, showing that even with a short key it is possible
to provide perfect secrecy. Models such as wiretap and broadcast
channels and privacy amplification are considered for illustration.
A parametric evaluation and analysis of the behaviour of
algorithms with respect to cryptographic strength is presented [PRI
and TOM 1987] by Prieto. In general, unicity distance is considered a
parameter for evaluating the strength of a cipher. Prieto proposed
two more factors to evaluate the quality of an algorithm: the
invulnerability factor and the quality factor.
According to Shannon's information theoretic approach, the unicity
distance is the minimum length of cipher text required to determine
the key. When the cipher text length is less than the unicity
distance, the predicted key has a non-zero error probability, for
which an upper bound is proposed [JAB 1996] by Jabri. It is observed
that this probability is inversely proportional to the logarithm of
the key size and directly proportional to the redundancy of the
source.
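Shannon's estimate of the unicity distance, U = H(K)/D, can be computed directly. The redundancy figure in the comment below is the commonly quoted value for English (log2 26 ≈ 4.7 bits per letter minus roughly 1.5 bits of actual information per letter), used here as an assumption rather than a result of the cited works:

```python
import math

def unicity_distance(key_space_size, redundancy_per_char):
    """Shannon's estimate U = H(K) / D: the ciphertext length (in
    characters) at which a unique meaningful decryption is expected.
    H(K) = log2(number of keys); redundancy_per_char is D in bits,
    about 3.2 for English under a simple substitution cipher."""
    return math.log2(key_space_size) / redundancy_per_char
```

A simple substitution cipher has 26! keys, so H(K) ≈ 88.4 bits; with D ≈ 3.2 bits per character this gives a unicity distance of roughly 28 characters, consistent in magnitude with the short-cryptogram figures quoted in this section.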
Bauer computed [BAU 2007] the unicity distance for different
substitution and transposition techniques using different n-gram
language models. The estimated values for n = 1, 2, 3 are 167, 74 and
59 respectively. The solution becomes less uncertain with increasing
length of the cryptotext, and beyond some length near the unicity
distance it becomes unproblematic, provided sufficient effort can be
made.
Sujith Ravi and K. Knight carried out [RAV and KNI 2009-2] empirical
testing of Shannon's information theory for decipherment uncertainty,
including the unicity distance. For n-gram language models with
n = 1, 2, 3 the estimated values of the unicity distance are 173, 74
and 50 respectively, which are similar to the results of Bauer. For
real ciphers, these unicity points do not match the predicted
numbers; the difference is due to certain assumptions made by
Shannon in the computation of the unicity distance for random ciphers.
The results confirm that the unicity distance is a function of the
language statistics used to attack the cipher: it becomes lower as
more language statistics are incorporated into the analysis.
Cryptanalysis is useful in finding the strength of a cryptosystem.
In a practical model, one can test a block cipher with different
known attacks and assign a certain security level to it. Quantifying
the security of a block cipher precisely, and proving that it
satisfies specific security requirements, is a difficult task. A
parametric evaluation in the form of the unicity distance can be
effectively incorporated in the analysis to provide information about
the number of spurious solutions and whether a single solution to the
given cipher exists. This helps in identifying the strength of a
cryptosystem.
2.5 PROBLEM STATEMENT
Shannon's model of perfect secrecy is the target for any researcher
in the present-day context. Real-world computational restrictions
impose limits on the upper bound of this model. However, selection of
an algorithm based on the requirements of the application is the focus
of activity in the field of security systems. Shannon proposed the
unicity distance, an ideal measure in this context. The message text
and the associated language drive the design aspects of cryptosystems
in the wake of widespread use among various linguistic populations. In
this context, language complexity versus the strength of the algorithm
needs to be evaluated for effective selection criteria before the
design of a cryptosystem. The present work is aimed at addressing
these issues.
Various language models have been proposed in a holistic manner, and
most of them are closely associated with Roman-script-based
languages. Indic scripts possess a different characteristic nature,
with a primitive unit called the syllable; the machine representation
of a syllable has a variable block size in these scripts. In this
context it is highly difficult to adopt block ciphers while addressing
Shannon's model. An algorithm is proposed for the encipherment and
decipherment of Indian-language message text. The statistical
behaviour of language units and their significance in the wake of
language redundancy are important a priori knowledge while addressing
decipherment problems. A complete study of the above parameters on
various languages, with specific reference to Indian scripts, is
addressed in the present work.
2.6 METHODOLOGY
Four languages, viz. English, Telugu, Kannada and Hindi, are
considered while addressing the issue of unicity distance versus
language redundancy. Large corpora of 10,00,000 characters and
32,00,000, 17,00,000 and 9,00,000 code points respectively are created
for the purpose of evaluation. The adequacy of the model is evaluated
using a decipherment approach; ciphertext-only attack is the main
setting adopted for the present evaluation. The conditional and
unconditional probability distributions of language units (unigrams,
bigrams and trigrams) are computed to build a priori knowledge. Test
samples of varying size, from 6,000 to 1,10,000, are used for
evaluation in the decipherment approach. Retrieval efficiency is
considered a measure equivalent to the unicity distance while
concluding the strength of the algorithm based on language complexity.
2.7 ORGANISATION OF THE THESIS
In Chapter 1, issues related to cryptanalysis, including recent
trends, are introduced. The limitations of key size and the necessity
for the algorithm to be made public are also discussed. The
information theoretic approach for evaluating the strength of a
cryptosystem is introduced in this chapter.
Chapter 2, the current chapter, elaborates the review of different
decipherment issues and their relative merits and demerits. A detailed
description of cryptanalysis using language models is presented. A
review of the information theoretic approach proposed by Shannon, and
the reintroduction of this model in the recent period, is given in
this chapter.
Chapter 3 introduces the information theoretic approach and its
applicability in the decipherment process. Shannon's concept of an
ideal secrecy system is discussed. The role of entropy and redundancy
and their impact on estimating the strength of the algorithm are
explored. The parametric evaluation using unicity distance for four
different languages, viz. English, Telugu, Kannada and Hindi, is
presented for varying key sizes.
Chapter 4 describes the adequacy of the proposed model. A
cryptographic model for the encryption and decryption of Indic scripts
is proposed along with a decipherment model. Evaluation is carried out
using unconditional and conditional probability distribution
approaches for English, Telugu, Kannada and Hindi. Text retrieval
efficiency is compared for unigram, bigram and trigram unconditional
probability distributions. The significance of the conditional
probability distribution and its impact on text retrieval is
emphasised. Supporting evaluation is presented in this chapter.
Chapter 5 provides a detailed summary of the work with salient
features. This chapter explores open problems for future
enhancements.