
Models and approaches for Differential Power Analysis

Jul 16, 2015


Andrej Šimko


[email protected]

20.05.2014

1 Introduction

With increasing computational power and new ways of attacking cryptosystems, there is a need for larger key sizes everywhere. In 2011 NIST recommended that the transition to keys with at least 112 bits of security be completed by the end of 2014 to accommodate the new security needs of the Federal government [9]. Fig. 1 compares the equivalent key sizes of different algorithms such as RSA. According to this table, RSA 1024 is no longer considered secure enough for today’s purposes.

However, as we will show, longer keys don’t necessarily mean better security: keys can sometimes be obtained when the implementation of a cryptographic algorithm has faults and weaknesses that are not connected with the mathematical model behind the algorithm. The most prominent results in recovering private keys nowadays lie in exploiting physical information leakage and implementation details rather than in brute-forcing keys or employing advanced cryptanalysis. Side channel attacks target the physical aspects of cryptosystems in numerous ways, taking into account implementation-specific characteristics such as output timing, power consumption, electromagnetic radiation, thermal radiation, or acoustic emanations to recover the secret parameters. This is possible because of physically observable phenomena that are caused by the execution of computing tasks in microelectronic devices. For example, a microprocessor consumes power and time in computing its tasks, dissipates heat, radiates an electromagnetic field or even makes noise.

Fig. 1. Comparison table of key sizes [3].


One of the most widely used side-channel attacks is based on power analysis. In this paper we introduce various attack approaches that exploit the power consumption of a device, describe the models used to achieve this goal, and briefly mention some countermeasures that can protect cryptographic devices against these kinds of attacks.

2 Side channel attacks and breaking cryptosystems

Theoretically, breaking a well-designed cryptosystem like RSA with a key length of 1024 bits, or AES with 128 bits, is still not feasible with the computational power available nowadays. Side channel attacks, however, do not target design flaws of these algorithms, nor do they try to brute-force the entire key space or exploit theoretical weaknesses (as cryptanalysis does). They attack the physical layer and the implementation of the particular cryptosystem.

The longest key successfully broken to this day without the help of any side channel attack remains RSA 768-bit. This was accomplished at the beginning of 2010 using the Number Field Sieve (NFS) and took 2 and a half years on many hundreds of machines (the authors estimated that the same computation would have taken almost 2 000 years on a single-core 2.2 GHz AMD Opteron processor with 2 GB RAM [1]).

Another accomplishment was the factoring of the 1061-bit Mersenne number (2^1061 − 1) by the Special Number Field Sieve (SNFS) in 2012. Although the authors say that factoring this number was not as computationally difficult as factoring RSA 768-bit, it still holds the record for the largest factored number so far. The computational power used to factor this number was equal to 3 CPU centuries.

By exploiting hardware and implementation vulnerabilities, even longer keys can be broken in orders of magnitude less time. In 2010, researchers from the University of Michigan developed a way of successfully breaking RSA 1024-bit, which took approximately 100 hours. To mount the attack they injected transient faults into the target machine by regulating the voltage supply of the system [4]. The attacked RSA implementation was the one used in OpenSSL 0.9.8i, which was a widely used package.

Success in side-channel attacks hinges on breaking the cryptosystem down into smaller parts and solving them one by one. For example, all block ciphers have multiple internal states that are kept secret and a final state which is then output. We call an internal state of a block cipher an intermediate state. For example, DES has 16 rounds and thus states [1…15] are intermediate states. Side channel attacks try to find additional exploitable information about these intermediate states, or about the operations used in the transition from one round to another. The attacker can guess parts of the key (and check whether they are guessed correctly), or exploit statistical properties that make checkable values slightly non-random.

3 Introduction to the power analysis

This non-invasive, easily automated side channel attack measures the power consumption of a device while it performs cryptographic operations, and it can be mounted on a black-box device (it doesn’t necessarily require deeper knowledge of the attacked device). Since in this scenario the attacker is usually passive, it is also hard or impossible to detect the leakage of secret information. With power analysis, the term trace needs to be introduced. A trace is a set of power consumption measurements taken across a cryptographic operation. For example, a 1 millisecond operation sampled at 10 MHz yields a trace containing 10000 points. The higher the sampling rate of the device, the better the resolution and accuracy of the traces, which results in a lower percentage of errors.

Many papers and studies nowadays focus on exploiting power analysis and push the boundaries of successful attack scenarios further and further. On the other hand, these papers also propose countermeasures against the attacks, so it is up to the right people and companies to stay up to date on this progress and protect their devices from unwanted leakage. In 2005, it was estimated that roughly 200 papers had been published on power analysis attacks alone [17].

4 Leakage models

4.1 Hamming weight model

The simplest power model is based on the assumption that the amount of power consumed while performing an operation is proportional to the number of ‘1’ bits in the processed data. In other words, the consumed current is related to the energy required to flip the bits from one state to the next. This model was first used in Kocher’s original DPA paper. The Hamming weight is simply the number of bits set to one. This model often helps if the previous value of a register is unknown (but still constant) and is used for software crypto implementations (data buses have to be charged to ‘1’) [6]. We introduce 𝑑𝑗 as the value of bit 𝑗 (it can only take the values 0 or 1) of an 𝑚-bit word.

$$HW = \sum_{j=0}^{m-1} d_j$$

4.2 Hamming distance model

In the Hamming distance model, the previous value needs to be known. The key assumption of this model is that every transition from 1 to 0 and from 0 to 1 requires the same amount of energy. Let 𝑥 and 𝑥′ be two consecutive values of the running algorithm (where 𝑥′ is an unknown constant machine word at the reference state). The Hamming distance model represents only the data-dependent part of the consumption, rather than the entire power consumption of a chip [12]. This model is best suited to registers and hardware crypto units [6].

𝐻𝐷(𝑥, 𝑥′) = 𝐻𝑊(𝑥 ⊕ 𝑥′)
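The two models above can be sketched in a few lines of Python. This is only an illustration: the function names are ours, and the example register values are hypothetical.

```python
def hamming_weight(value: int) -> int:
    """Number of bits set to 1 in a non-negative integer."""
    return bin(value).count("1")


def hamming_distance(x: int, x_ref: int) -> int:
    """Number of bit positions in which x and x_ref differ: HW(x XOR x_ref)."""
    return hamming_weight(x ^ x_ref)


# A hypothetical 8-bit register going from 0b10100101 to 0b11110000.
print(hamming_weight(0b10100101))                # 4 bits are set
print(hamming_distance(0b10100101, 0b11110000))  # 4 bits flip
```

Under the Hamming weight model the attacker predicts leakage from a single value, while the Hamming distance model needs both the value and the reference state it overwrites.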

4.3 Switching distance model

We can also take into consideration that there is indeed a difference in power consumption between a transition from 0 to 1 and the other way around. A more accurate model based on this assumption is the switching distance model.

Another advantage lies in defeating some countermeasures that have been deployed to prevent power analysis attacks. One such countermeasure is pre-charging buses with a random value, because the Hamming distance model can’t be used to predict the leakage if either one of the two values is unknown.

We introduce the normalized difference of the transition leakages as 𝛿, where 𝑃0→1 is the power consumed by a 0 → 1 transition and 𝑃1→0 is the power consumed by a 1 → 0 transition.

$$\delta = \frac{P_{0 \to 1} - P_{1 \to 0}}{P_{0 \to 1}}$$

This leads to the improved power consumption model shown in Table 1.

Table 1. Improved power consumption model [13].

4.4 Signed distance model

This idealized emanation model assumes that charging and discharging the capacitance involves a leakage of +1 or -1. The signed distance is a special case of the switching distance model with 𝛿 = 2. As a consequence, the power consumption is spread over a larger set of discrete values (compared to the Hamming weight and Hamming distance models), which should considerably improve on the previous approaches.

The main drawback of this approach is that it requires more detailed knowledge of the chip under attack, which can call for a semi-invasive approach. This is because the sign information is only accessible if the probe can be positioned accurately [13].

$$SD(x, x') = \sum_{i=0}^{n-1} \left( x'(i) - x(i) \right)$$
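As a minimal sketch, the switching distance of Table 1 and the signed distance can be written as follows. The function names and the 8-bit default width are our own choices, and we assume that 𝑥 is the earlier value and 𝑥′ the later one, so that a 0 → 1 transition contributes +1 as in Table 1.

```python
def switching_distance(prev: int, nxt: int, delta: float, width: int = 8) -> float:
    """Leakage of the transition prev -> nxt following Table 1.

    Each 0 -> 1 bit transition costs 1, each 1 -> 0 transition costs
    1 - delta, and unchanged bits cost nothing.
    """
    total = 0.0
    for i in range(width):
        before = (prev >> i) & 1
        after = (nxt >> i) & 1
        if before == 0 and after == 1:
            total += 1.0
        elif before == 1 and after == 0:
            total += 1.0 - delta
    return total


def signed_distance(prev: int, nxt: int, width: int = 8) -> int:
    """Sum over all bit positions of (later bit - earlier bit)."""
    return sum(((nxt >> i) & 1) - ((prev >> i) & 1) for i in range(width))


# 0b0011 -> 0b0101: one bit rises, one bit falls, one bit stays set.
print(switching_distance(0b0011, 0b0101, delta=0.0))  # 2.0, same as the Hamming distance
print(switching_distance(0b0011, 0b0101, delta=2.0))  # 0.0, same as the signed distance
print(signed_distance(0b0011, 0b0101))                # 0
```

With 𝛿 = 0 both transition directions cost the same and the model reduces to the Hamming distance, while 𝛿 = 2 gives the +1 / -1 leakage of the signed distance model.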

5 Simple Power Analysis (SPA)

SPA is the most basic technique based on power analysis: it directly interprets the power consumption and tries to deduce the operation that is being performed. It requires only a small number of traces. An example of SPA measurements is shown in Fig. 2.

Transition | Power
0 → 0 | 0
0 → 1 | 1
1 → 0 | 1 − δ
1 → 1 | 0

Fig. 2. SPA analysis of a single run of AES encryption on a smart card [5].


As we can see in Fig. 2, there is a pattern repeating itself 10 times, which corresponds to the 10 rounds of AES-128. Since it is no secret that AES-128 performs 10 rounds, one can object that this isn’t much of an attack. SPA can, however, shed some light on the inner workings of an unknown or proprietary algorithm, help in determining the parts of traces which could be relevant to an attacker, or provide additional useful information about the sequence of operations which depends on the data flow. SPA is thus also helpful practically everywhere conditional branches depend on a secret parameter, since it can reveal the sequence of the instructions executed. Fig. 3 shows power traces of 2 rounds of DES. The upper trace shows an execution path with a jump, whereas in the bottom trace no jump is performed. The point of divergence is clearly visible at clock cycle 6.

5.1 Examples of SPA usage:

• DES key schedule: breaks the 56-bit key into two 28-bit halves. Each half is then rotated left independently of the other one. A conditional branch checks whether a ‘1’ bit was shifted off the end so that it can be wrapped around. Hence ‘0’ and ‘1’ bits have different execution paths and SPA features.

• DES permutations: since DES implementations perform many bit permutations, conditional branching in software or microcode can cause significant differences in the power consumption for ‘0’ and ‘1’ bits.

• Comparison: comparison operations typically take a conditional branch when the comparison is not successful (the compared values are different).

• Multipliers: modular multiplication circuits tend to leak a lot of information about the processed data. Although the leakage function depends on the multiplier design, it is often strongly correlated with operand values and Hamming weights.

• Exponentiators: exponentiation is carried out by multiplication and squaring operations. If the implementations of these two operations differ (which is a reasonable choice, since specific optimizations for squaring result in faster code execution), the power consumption also differs and thus leaks information about the secret exponent. An example of SPA on RSA squaring can be seen in Fig. 4. Clearly, the square and multiply operations result in different traces.
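The exponentiation example can be illustrated with a short sketch. We assume a left-to-right binary square-and-multiply implementation and an idealized attacker who can tell every squaring ('S') from every multiplication ('M') in a single trace. The function names and the toy modulus are ours.

```python
def square_and_multiply(base: int, exponent: int, modulus: int):
    """Left-to-right binary exponentiation that also logs its operations.

    The log ('S' for square, 'M' for multiply) stands in for what an
    attacker would read off a power trace.
    """
    result = 1
    operations = []
    for bit in bin(exponent)[2:]:          # most significant bit first
        result = (result * result) % modulus
        operations.append("S")
        if bit == "1":                     # branch that depends on the secret
            result = (result * base) % modulus
            operations.append("M")
    return result, operations


def recover_exponent(operations) -> int:
    """Rebuild the exponent from the observed operation sequence."""
    bits = []
    for op in operations:
        if op == "S":
            bits.append("0")               # every exponent bit starts with a squaring
        else:
            bits[-1] = "1"                 # a multiply after it marks a '1' bit
    return int("".join(bits), 2)


secret = 0b1011001
value, ops = square_and_multiply(5, secret, 3233)
print("".join(ops))                        # SMSSMSMSSSM
print(recover_exponent(ops) == secret)     # True
```

Real traces are noisy and the two operations may look similar, so this only shows why a key-dependent sequence of operations is dangerous, not how hard it is to read one in practice.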

Fig. 3. power traces of 2 rounds of DES [11].


Fig. 4. SPA of RSA [10].

6 Differential Power Analysis (DPA)

DPA was first introduced by Paul Kocher, Joshua Jaffe and Benjamin Jun in 1998 [11]. It finds data dependencies in the power consumption traces and recovers the secret key by analyzing its influence on known data. It does, however, require better equipment (a higher sampling rate) and the use of statistical analysis. Where SPA needs only one trace, DPA needs a large number of traces in which the same secret key operates on different data. It usually has two phases, data collection and data analysis. The analysis phase makes extensive use of statistics for noise cancellation (to improve the Signal-to-Noise Ratio), which, along with error correction, yields additional information. DPA can be automated using little or no information about the target implementation because it locates correlated regions in a device’s power consumption. Although knowledge of the plaintext is not required, DPA can use known plaintext or known ciphertext to find the secret key. The leakage of symmetric algorithms is much lower than that of asymmetric ones, because of the higher computational complexity of the multiplication operations used in asymmetric crypto.

Noise is a hindrance when performing DPA. It can come from several sources, for example electromagnetic radiation or thermal noise, but also from wrong alignment of the measurements (temporal misalignment) or quantization errors caused by a mismatch between the sampling clock and the clock of the device. Proper alignment of the power traces is thus crucial: every instruction needs to sit at the same sample offset in every trace. This can be achieved either with pattern-matching methods (since the majority of the signal is instruction-dependent rather than data-dependent) or with the least squares method (minimizing the sum of squared differences between two traces).

6.1 Explanation of work of DPA

• Attacker observes multiple encryption operations (with different sets of data) and captures their power traces

• Attacker records ciphertexts (no knowledge of plaintext is needed)

• Power traces are partitioned into subsets according to a property of the state being processed

• Statistical methods are used to find differences in the subsets (when they are observed, a data leak has been detected) to determine whether a key block guess is correct

• Attacking is done one intermediate state after another until the output value state


Let the selection function 𝐷(𝐶, 𝑏, 𝐾𝑠) compute the value of target bit 𝑏, given the ciphertext 𝐶 and key guess 𝐾𝑠. Let 𝑚 be the number of collected power traces (observed encryption operations) of 𝑘 samples each (𝑇1:𝑚[1:𝑘]), with corresponding ciphertext values 𝐶1:𝑚 (which are also recorded by the attacker). The traces are sorted into two groups, those with 𝐷(𝐶, 𝑏, 𝐾𝑠) = 0 and those with 𝐷(𝐶, 𝑏, 𝐾𝑠) = 1. If the key guess 𝐾𝑠 is correct, the average power trace for 𝐷(𝐶, 𝑏, 𝐾𝑠) = 1 will be slightly higher at the point of correlation and the average trace for 𝐷(𝐶, 𝑏, 𝐾𝑠) = 0 will be slightly lower. If, however, the key guess 𝐾𝑠 is incorrect, the selection function 𝐷(𝐶, 𝑏, 𝐾𝑠) will equal the correct value of bit 𝑏 with a probability of 50% for each ciphertext, yielding average traces that are approximately equal. Let ∆𝐷[𝑗] be the differential trace, computed as the difference between the two average traces. For an incorrect key guess 𝐾𝑠, ∆𝐷[𝑗] should approach zero, and for a correct key guess 𝐾𝑠, ∆𝐷[𝑗] should approach the target bit’s power contribution at the correlated sample(s). The correct value of 𝐾𝑠 can thus be identified from the spikes in its differential trace ∆𝐷[𝑗]. These assumptions and equations are presented in Kocher’s original paper (see Fig. 5).

Fig. 5. differential power trace formula [11].
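The differential trace described above is a difference of two averages, which can be sketched as follows. The function name and the four toy traces are hypothetical, and the selection bits are passed in directly instead of being computed from ciphertexts and a key guess.

```python
def differential_trace(traces, selection_bits):
    """Difference of means between traces with D = 1 and traces with D = 0.

    traces:         list of m traces, each a list of k power samples
    selection_bits: list of m values of the selection function, each 0 or 1
    """
    ones = [t for t, d in zip(traces, selection_bits) if d == 1]
    zeros = [t for t, d in zip(traces, selection_bits) if d == 0]
    if not ones or not zeros:
        raise ValueError("both groups need at least one trace")
    k = len(traces[0])
    mean_ones = [sum(t[j] for t in ones) / len(ones) for j in range(k)]
    mean_zeros = [sum(t[j] for t in zeros) / len(zeros) for j in range(k)]
    return [a - b for a, b in zip(mean_ones, mean_zeros)]


# Four hypothetical traces of three samples; only sample 1 depends on the target bit.
traces = [
    [5.0, 7.0, 3.0],   # target bit = 1
    [5.0, 6.0, 3.0],   # target bit = 0
    [5.0, 7.0, 3.0],   # target bit = 1
    [5.0, 6.0, 3.0],   # target bit = 0
]
print(differential_trace(traces, [1, 0, 1, 0]))  # [0.0, 1.0, 0.0]: spike at sample 1
print(differential_trace(traces, [1, 1, 0, 0]))  # [0.0, 0.0, 0.0]: flat trace
```

The first call partitions the traces as a correct key guess would, the second as a guess whose predictions are unrelated to the processed bit. In a real attack the same computation is repeated for every key guess over many noisy traces.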

A correctly guessed key value 𝐾𝑠 can be seen in Fig. 6. It shows 4 traces, where the top one is the average power consumption during DES operations. Below it are 3 differential traces, of which the top one was computed with the correct key guess 𝐾𝑠 and the lower two with incorrect key guesses. For these computations 1000 different samples with known plaintexts were used.

Fig. 6. DPA traces (one correct and two incorrect) [11].


6.2 DPA on public key cryptography

DPA can also be used to attack public key cryptography (which, as said earlier, leaks more information than symmetric cryptography). There are two main statistical methods for doing that: attacking the secret parameter with the distance of means method or with the correlation method. Since a detailed explanation of these statistical methods is out of the scope of this paper, see [14] for further information. The following three attack scenarios (SEMD, MESD, ZEMD) were first used in [18]. A summary of these 3 approaches can be found in Table 2.

6.2.1 Single-Exponent, Multiple-Data Attack (SEMD)

SEMD assumes the attacker is able to exponentiate many random messages with at least one known (public) exponent and with the secret exponent. This situation can arise, for example, in a smart card system that supports the ISO7816 standard “external authenticate” command [18], which forces the smart card to use its public key. The main idea behind SEMD is that by comparing the two power signals obtained (one using the public exponent, one using the secret exponent), the attacker can see where they differ and thus learn the secret exponent. The DPA technique is used because the noise present makes a direct comparison of the power signals unreliable. Average signals are calculated and subtracted as in the distance of means method. This makes the contribution of the random data disappear, and only the signals that depend on the exponent average out to two different values, depending on the operation performed. As can be seen in Fig. 7, the comparison between the secret and the known exponent leaks information, because the energy in the DPA signal is greater where the two exponents differ. For this particular case, 20 000 exponentiations were used.

6.2.2 Multiple-Exponent, Single-Data Attack (MESD)

MESD assumes an adversary with access to a device that can use exponents of the attacker’s choosing. This approach improves the Signal-to-Noise Ratio (SNR) and relies on the assumption that the attacker can repeatedly exponentiate a constant value (which might not be known to him) with parameters chosen by him. The number of exponentiation operations needed per exponent bit is roughly 200 (see Table 2). The attacker systematically guesses one bit at a time, and once he has determined it, he moves on to the remaining bits.

Fig. 7. SEMD attack results [18].


6.2.3 Zero-Exponent, Multiple-Data Attack (ZEMD)

The ZEMD attack assumes the attacker knows the modulus and the exponentiation algorithm used in the hardware, but does not require the attacker to know any exponents. Knowledge of the algorithm and hardware is the key to predicting the intermediate values of the square-and-multiply algorithm using an offline simulation. Since the number of algorithms that compute modular exponentiations is small, it is likely that the adversary will learn which one is used.

Table 2. Summary of Power Analysis Attacks on Exponentiations [18]

6.3 High-Order DPA

While normal DPA analyzes information about a single event across samples, high-order DPA (HO-DPA) correlates information between multiple cryptographic sub-operations (not just one operation). HO-DPA thus combines two (second-order DPA) or more (nth-order DPA) samples within a single power trace. The selection function can assign different weights to different traces, and can also be used to divide traces into two or more categories [8]. Let 𝑃[𝑖] be the power consumption at time 𝑖, which can be split into three parts. Let 𝐻𝑊[𝑖] be the Hamming weight at time 𝑖, let 𝜀 be the incremental amount of power for each extra ‘1’, let 𝐿 be the additive constant portion of the total power, and let 𝑛 be the noise (although this part can be ignored if sufficient statistical averaging is used). The linear relationship for total power consumption can be written as 𝑃[𝑖] = 𝜀 ∙ 𝐻𝑊[𝑖] + 𝐿 + 𝑛.

6.4 Automated template DPA

In contrast with normal DPA and other approaches in which noise is viewed as a hindrance to be reduced or eliminated, this approach focuses on modeling the noise itself, which is then used to extract the desired information. Since it uses all the information available in each sample for classification, it is the strongest form of side-channel attack possible (in an information-theoretic sense, given the few samples that are available) [7]. The template attack has one key requirement: the attacker must have access to an identical device which he can program (this process is called profiling). This is because the adversary needs to capture a precise, detailed characterization of the noise, called the template. While previous approaches average the noise away, with the right characterization one can extract more valuable information even from a single, much weaker sample. The automated template attack can also be used against stream ciphers with ephemeral keys, like RC4 (in this case only 1 sample is required to break it) [7]. An interesting attack that also analyzes the noise itself (but does not rely on power consumption) breaks RSA keys using EM emanations collected from an SSL accelerator inside a closed server, from 15 feet away [7].

This approach defeats countermeasures that rely on the assumption that an adversary can’t obtain more than one sample, or can obtain only a limited number of samples.


7 Correlation Power Analysis (CPA)

The basic idea of this approach is that, at the correct time, the power consumption of all traces is correlated with the correct key. At other times and for other keys there should be only a low correlation in the traces. We are therefore trying to discover the correct key value 𝑐𝑘 and the time 𝑐𝑡 at which it is used. For this purpose a model of the power consumption has to be created for use in the analysis phase of the attack. This model needs to approximate the power consumption of the target cryptographic device during an encryption operation. The power predicted by the model for a key guess is then correlated with the actual measured power consumption. As in normal DPA, the highest peak of the correlation plot gives away the correct key.

Suppose we have 1000 traces, each consisting of 1 million points. Each trace uses a different plaintext, so we have 1000 plaintexts. Since the key is unknown, we have 256 guesses for the first byte (256 different values of 1 byte). This gives a matrix of 1000x256 hypothetical power values and a matrix of 1000000x256 correlation values, with a peak for the correct key at the correct time (𝑐𝑘, 𝑐𝑡) [10].

We can compute the power consumption using the Hamming distance model, introducing 𝑏 as the power consumption we are not interested in (offsets, time-dependent components and noise) and 𝑎 as a scalar gain between the Hamming distance and the consumed power 𝑊, where 𝐻(𝐷 ⊕ 𝑅) is the Hamming distance between the data word 𝐷 and the reference state 𝑅.

𝑊 = 𝑎𝐻(𝐷⊕ 𝑅) + 𝑏
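A toy version of the correlation step can be sketched as follows. Several things here are assumptions made for brevity and are not taken from the cited papers: the target is a 4-bit S-box lookup (the S-box of the PRESENT cipher) instead of a full cipher, the reference state 𝑅 is zero, each trace is reduced to a single noise-free sample, and the gain and offset values are arbitrary.

```python
import math

# 4-bit S-box of the PRESENT cipher, used only as a small nonlinear target.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_weight(value: int) -> int:
    return bin(value).count("1")


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def simulate_power(plaintexts, key, a=0.5, b=3.0):
    """Noise-free toy measurements W = a * H(D xor R) + b with R = 0."""
    return [a * hamming_weight(SBOX[p ^ key]) + b for p in plaintexts]


def cpa_attack(plaintexts, measurements):
    """Return (best key guess, its correlation) over all 16 nibble guesses."""
    scores = []
    for guess in range(16):
        predicted = [hamming_weight(SBOX[p ^ guess]) for p in plaintexts]
        scores.append((pearson(predicted, measurements), guess))
    best_corr, best_guess = max(scores)
    return best_guess, best_corr


plaintexts = list(range(16)) * 4
measurements = simulate_power(plaintexts, key=0xB)
print(cpa_attack(plaintexts, measurements))
```

Because the correlation coefficient is invariant under the gain 𝑎 and the offset 𝑏, the attacker does not need to know either of them. A real attack would compute one such correlation per key guess and per sample point, which gives the correlation matrix described above.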

8 Mutual Information Analysis (MIA)

One of the latest attacks (developed after SPA, DPA, high-order DPA, template DPA, and CPA) applies an information-theoretic distinguisher to devise an attack without any device characterization. It uses only generic assumptions and is thus more effective. In contrast to all previous attacks, which tried to reduce the number of measurements required for a successful attack by employing more sophisticated power consumption models, MIA takes a different approach. The number of measurements taken is hence increased. MIA doesn’t require a training device (unlike template attacks), nor does it make any restrictive assumptions about the real leakage function (e.g. Hamming distance). MIA estimates the full probability density for each bit of the hypothetical key from observations of the target device’s leakage.

Let 𝑋 and 𝑌 be random variables. The Mutual Information [19] is given by:

𝐼(𝑋;𝑌) = 𝐻(𝑋) − 𝐻(𝑋|𝑌) = 𝐻(𝑋) + 𝐻(𝑌) − 𝐻(𝑋,𝑌) = 𝐼(𝑌;𝑋).

The Mutual Information satisfies 0 ≤ 𝐼(𝑋;𝑌) ≤ 𝐻(𝑋). If 𝑋 and 𝑌 are independent, the lower bound is reached. When 𝑋 is fully determined by 𝑌, the upper bound is achieved. The larger the Mutual Information, the closer the relation between 𝑋 and 𝑌 can be observed.
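The definition and its two bounds can be checked numerically with a small sketch. It uses a plug-in estimate on discrete samples, which is our simplification: a real MIA has to estimate densities of continuous leakage, for example with histograms.

```python
import math
from collections import Counter


def entropy(samples) -> float:
    """Shannon entropy in bits of the empirical distribution of samples."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mutual_information(xs, ys) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


xs = [0, 1, 2, 3] * 4
ys = [0] * 4 + [1] * 4 + [2] * 4 + [3] * 4   # every (x, y) pair occurs exactly once

print(mutual_information(xs, xs))   # X fully determined by Y: I = H(X) = 2 bits
print(mutual_information(xs, ys))   # X and Y independent: I = 0 bits
```

In an attack, 𝑋 would be the leakage predicted under a key hypothesis and 𝑌 the measured leakage, and the hypothesis with the largest estimated Mutual Information would be kept.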

MIA can also be useful in attacking a device which has good countermeasures against DPA in place, for example Dual Rail Precharge (DRP) logic [19], which was recommended to mitigate the threat of DPA.


9 Countermeasures

The main problem with power analysis attacks is that they are passive most of the time, which makes detecting an attack nearly impossible. The best countermeasure against all side channel attacks would be to make it impossible to obtain the measurements needed for the power analysis. This implies some kind of aggressive shielding, which would significantly increase cost and size (it would, however, make the attack invasive and detectable). Some countermeasures cope only with a specific kind of power analysis attack, while more general countermeasures help against several attack scenarios.

To date, there are many books, recommendations, studies and papers on how to cope with power analysis side channel attacks. For example, in 2012 the Air Force Institute of Technology released a 100-page thesis on protecting AES from DPA [15]. Another interesting 161-page book on countermeasures against side channel attacks was released by the Universita Di Roma [16]. The last paper on countermeasures mentioned here gives a nice and comprehensive list of many side channel attacks developed within the first 10 years after their first publication [17]. We will only briefly introduce some countermeasures, because going into detail is out of the scope of this paper.

9.1 Main categories of countermeasures

9.1.1 Randomization

The first approach is randomizing the data that may leak through side channels. We can add random timing shifts and wait states, insert dummy instructions, randomize the execution order of operations, start executing operations at different offsets, randomize the parameters of elliptic curves, and so on.

9.1.2 Blinding

Blinding in cryptography takes place between two parties, the client and the provider. The client has an input x, and the provider has the function y=f(x). The client, however, doesn’t want the provider to learn x or y, and therefore blinds the message x with an encoding function E(x). The provider then computes f(E(x)) and the client applies a decoding function to obtain y=D(f(E(x))). This countermeasure is typically used in hardware modules [17].
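As one concrete instance of this scheme, the sketch below blinds an RSA private-key operation. Choosing RSA as f is our assumption, since the text above describes blinding generically. The key is a textbook toy key that is far too small for real use, and all function names are hypothetical. The three-argument pow with exponent -1 needs Python 3.8 or newer.

```python
import math
import secrets

# Toy RSA key for illustration only.
N = 61 * 53              # modulus 3233
PUBLIC_EXP = 17
PRIVATE_EXP = 2753       # 17 * 2753 = 1 (mod 3120)


def random_blinding_factor() -> int:
    """Pick a random r in 2..N-1 that is invertible modulo N."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return r


def blind(x: int, r: int) -> int:
    """E(x): multiply x by r raised to the public exponent."""
    return (x * pow(r, PUBLIC_EXP, N)) % N


def provider_f(value: int) -> int:
    """f: the private-key exponentiation whose power trace could leak."""
    return pow(value, PRIVATE_EXP, N)


def unblind(y_blinded: int, r: int) -> int:
    """D: remove the blinding factor, since f(E(x)) = f(x) * r (mod N)."""
    return (y_blinded * pow(r, -1, N)) % N


x = 65
r = random_blinding_factor()
y = unblind(provider_f(blind(x, r)), r)
print(y == pow(x, PRIVATE_EXP, N))    # True
```

Since r changes on every call, the value that is actually exponentiated is unknown to an attacker who only knows x, so intermediate values cannot be predicted from the message.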

9.1.3 Masking

Masking is a randomization of the intermediate values which are processed by the operations of the algorithm. This technique is typically used in software [17]. After masking the message or key and running through all the computations, the values need to be unmasked at the end, which is called mask correction. In the case of AES, we have a random mask 𝑚, a masking function 𝑓 and a byte value 𝑥. We can compute the masked value as 𝑓(𝑥,𝑚) = 𝑥 ∗ 𝑚, where ∗ can be either bit-wise XOR (additive masking) or multiplication over a finite field (multiplicative masking). Masking is then applied to all the intermediate bytes during the AES computations.
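Additive masking of a table lookup can be sketched as follows. To keep the example short we assume a 4-bit S-box (that of the PRESENT cipher) in place of the AES S-box, and the function names are ours.

```python
import secrets

# 4-bit S-box of the PRESENT cipher, standing in for the AES S-box.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def masked_sbox_table(mask_in: int, mask_out: int):
    """Recompute the table so that table[x ^ mask_in] == SBOX[x] ^ mask_out."""
    table = [0] * 16
    for x in range(16):
        table[x ^ mask_in] = SBOX[x] ^ mask_out
    return table


def masked_lookup(x: int, mask_in: int, mask_out: int) -> int:
    """S-box lookup in which only masked values are used as index and result."""
    masked_x = x ^ mask_in                        # f(x, m) = x XOR m
    table = masked_sbox_table(mask_in, mask_out)
    masked_y = table[masked_x]                    # equals SBOX[x] ^ mask_out
    return masked_y ^ mask_out                    # mask correction


x = 0x9
m_in, m_out = secrets.randbelow(16), secrets.randbelow(16)
print(masked_lookup(x, m_in, m_out) == SBOX[x])   # True for any pair of masks
```

Because fresh masks are drawn for every execution, the Hamming weight of the value that is looked up no longer depends on x alone, which is what first-order DPA and CPA rely on.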


9.2 Countermeasures against the SPA

Randomizing the exponentiation algorithm would probably be the best course of action. This can be done by starting the computation from a random starting point in the exponent and continuing towards the most significant bit. The algorithm would then return to the starting point and finish the exponentiation by computing towards the least significant bit.

9.3 Countermeasures against original DPA

DPA attacks need more samples to eliminate noise and extract the desired features. There are a number of things that can be done to secure algorithms against DPA:

• Forcing time misalignment by introducing random timing shifts so that computed mean values don’t correspond to the consumptions of the same instructions

• Reduction of signal sizes such as: ─ using constant execution path ─ choosing critical assembler instructions with those which “consumption signature” is difficult to

analyze ─ balancing Hamming Weights and state transitions

• Introducing noise into measurements • Designing cryptosystems with realistic assumptions about the underlying hardware (for example

using nonlinear key update procedures) • Message blinding would prevent MESD and ZESD, but not SEMD • SEMD can be prevented by exponent blinding • Dual Rail Precharge (DRP): the main idea behind DRP is using pair of bits instead of just one, for

example 0 = (0,1) and 1 = (1,0).
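The dual-rail encoding from the last item can be modeled in software to show why it balances Hamming weights. This is only a sketch of the data encoding, with hypothetical function names. Real DRP is a hardware logic style, and its precharge phase is not modeled here.

```python
def drp_encode(byte):
    """Encode each bit as a pair: 0 -> (0,1), 1 -> (1,0)."""
    encoded = 0
    for i in range(8):
        bit = (byte >> i) & 1
        pair = 0b10 if bit else 0b01
        encoded |= pair << (2 * i)
    return encoded


def drp_decode(encoded):
    """Recover the byte; the pairs (0,0) and (1,1) are invalid."""
    byte = 0
    for i in range(8):
        pair = (encoded >> (2 * i)) & 0b11
        if pair not in (0b01, 0b10):
            raise ValueError("invalid dual-rail pair")
        byte |= (pair >> 1) << i
    return byte


def hamming_weight(value):
    """Number of bits set to 1."""
    return bin(value).count("1")


weights = {hamming_weight(drp_encode(b)) for b in range(256)}
```

Every encoded byte contains exactly eight ones, so `weights` holds a single value. A leakage model based on the Hamming weight of the data therefore no longer distinguishes between byte values.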

9.4 Countermeasures against the template attacks

Since template attacks require the attacker to possess an identical device and to collect a large number of templates in an adaptive manner, this task can be made harder. The key to defeating template side-channel attacks is randomization in the computations that the attacker cannot control on their own experimental device. Such randomization techniques include address/data scrambling and blinding/masking of data and key bits. Minimizing the contamination caused by the use of sensitive information in clear text is also a recommended way of building a secure implementation.

10 Conclusion

We have shown that power analysis techniques are still being actively researched because they pose a serious threat to the security of cryptographic devices. When cryptographic algorithms are implemented incorrectly, or developed without real knowledge of the underlying hardware, many power-analysis attacks become possible, and these can break even cryptosystems that are infeasible to break from a mathematical point of view. New models and attack scenarios are constantly emerging, which makes more systems vulnerable to side-channel attacks. We briefly introduced the main power analysis attacks: SPA, different kinds of DPA, CPA, and MIA. These approaches use different attack models and require different numbers of samples or deeper analysis. Even the countermeasures employed today are not always effective against all kinds of power analysis attacks.


11 Resources

[1] KLEINJUNG, Thorsten. Factorization of a 768-bit RSA modulus. [online]. 1.4. 18.02.2010 [cit. 2014-05-11]. Available from: http://eprint.iacr.org/2010/006.pdf

[2] CHILDERS, Greg. Factorization of a 1061-bit number by the Special Number Field Sieve [online]. 04.08.2012 [cit. 2014-05-11]. Available from: https://eprint.iacr.org/2012/444.pdf

[3] NIST Special Publication 800-57. Recommendation for Key Management – Part 1: General. 3. revision. July 2012. Available from: http://csrc.nist.gov/publications/nistpubs/800-57/sp800-57_part1_rev3_general.pdf

[4] PELLEGRINI, Andrea. UNIVERSITY OF MICHIGAN. Fault Based Attack of RSA Authentication [online]. 2010 [cit. 2014-05-11]. Available from: http://web.eecs.umich.edu/~valeria/research/publications/DATE10RSA.pdf

[5] STANDAERT, Francois-Xavier. Introduction to Side-Channel Attacks [online]. 2010 [cit. 2014-05-11]. Available from: http://www.springer.com/cda/content/document/cda_downloaddocument/9780387718279-c1.pdf?SGWID=0-0-45-855982-p173805617

[6] Finding the key in the haystack: A practical guide to Differential Power Analysis. ZN000H. [online]. 30.12.2009 [cit. 2014-05-11]. Available from: http://events.ccc.de/congress/2009/Fahrplan/attachments/1502_dpa_slides_26c3.pdf

[7] CHARI, Suresh. Template Attacks [online]. 2003 [cit. 2014-05-11]. Available from: http://saluc.engr.uconn.edu/refs/sidechannel/chari02template.pdf

[8] MESSERGES, Thomas S. Using Second-Order Power Analysis to Attack DPA Resistant Software [online]. 2000 [cit. 2014-05-12]. Available from: http://download.springer.com/static/pdf/989

[9] BARKER, Elaine. NIST. NIST Special Publication 800-131A: Transitions: Recommendation for Transitioning the Use of Cryptographic Algorithms and Key Lengths [online]. 2011 [cit. 2014-05-14]. Available from: http://csrc.nist.gov/publications/nistpubs/800-131A/sp800-131A.pdf

[10] OREN, Yossi. Information Security – Theory vs. Reality: Lecture 3: Power Analysis [online]. 2011 [cit. 2014-05-14]. Available from: http://www.cs.tau.ac.il/~tromer/courses/infosec11/lecture3.pptx

[11] KOCHER, Paul, Joshua JAFFE and Benjamin JUN. Differential Power Analysis [online]. 1998 [cit. 2014-05-14]. Available from: http://www.cryptography.com/public/pdf/DPA.pdf

[12] BRIER, Eric. GEMPLUS CARD INTERNATIONAL. Correlation Power Analysis with a Leakage Model [online]. 2004 [cit. 2014-05-11]. Available from: https://www.iacr.org/archive/ches2004/31560016/31560016.pdf

[13] PEETERS, Eric, Francois-Xavier STANDAERT and Jean-Jacques QUISQUATER. Power and Electromagnetic Analysis: Improved Model, Consequences and Comparisons [online]. 2007 [cit. 2014-05-15]. Available from: http://svn-crypto.dice.ucl.ac.be/crypto/services/download/publications.pdf.ac0e15acc12f1794.7064663235322e706466.pdf

[14] AIGNER, Manfred and Elisabeth OSWALD. Power Analysis Tutorial [online]. 2001 [cit. 2014-05-19]. Available from: https://www.iaik.tugraz.at/content/research/implementation_attacks/introduction_to_impa/dpa_tutorial.pdf

[15] FRITZKE, Austin W. AIR FORCE INSTITUTE OF TECHNOLOGY. Obfuscating Against Side-Channel Power Analysis: Using Hiding Techniques for AES [online]. 2012 [cit. 2014-05-19]. Available from: http://www.dtic.mil/dtic/tr/fulltext/u2/a557233.pdf

[16] GIANCANE, Luca. Side-Channel Attacks and Countermeasures: Design of Secure IC's Devices for Cryptographic Applications. 2011. ISBN 978-3847371526. Available from: http://padis.uniroma1.it/bitstream/10805/975/2/Giancane_PhD.pdf

[17] ZHOU, YongBin and DengGuo FENG. STATE KEY LABORATORY OF INFORMATION SECURITY, Institute of Software, Chinese Academy of Sciences. Side-Channel Attacks: Ten Years After Its Publication and the Impacts on Cryptographic Module Security Testing [online]. 2005 [cit. 2014-05-19]. Available from: http://csrc.nist.gov/groups/STM/cmvp/documents/fips140-3/physec/papers/physecpaper19.pdf

[18] MESSERGES, Thomas S., Ezzy A. DABBISH and Robert H. SLOAN. Power Analysis Attacks of Modular Exponentiation in Smartcards [online]. 1999 [cit. 2014-05-19]. Available from: http://saluc.engr.uconn.edu/refs/sidechannel/messerges99power.pdf

[19] GIERLICHS, Benedikt, Lejla BATINA and Pim TUYLS. Mutual Information Analysis: A Generic Side-Channel Distinguisher [online]. 2008 [cit. 2014-05-19]. Available from: http://link.springer.com/chapter/10.1007%2F978-3-540-85053-3_27