
https://doi.org/10.1007/s10836-020-05904-2

Formal Verification of ECCs for Memories Using ACL2

Mahum Naseer1 · Waqar Ahmad1 · Osman Hasan1

Received: 12 April 2020 / Accepted: 2 September 2020
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract

Due to the ever-increasing toll of soft errors in memories, Error Correction Codes (ECCs) like Hamming and Reed-Solomon codes have been used to protect data in memories, in applications ranging from space to terrestrial workstations. In the past seven decades, most of the research has focused on providing better ECC strategies for data integrity in memories, but research efforts to develop better verification methodologies for the newer ECCs have not kept the same pace. As memory sizes keep increasing, exhaustive simulation-based testing of ECCs is no longer practical. Hence, formal verification, particularly theorem proving, provides an efficient, yet scarcely explored, alternative for ECC verification. We propose a framework, with extensible libraries, for the formal verification of ECCs using the ACL2 theorem prover. The framework is easy to use and particularly targets the needs of formally verified ECCs in memories. We also demonstrate the usefulness of the proposed framework by verifying two of the most commonly used ECCs, i.e., Hamming and Convolutional codes. To illustrate that the ECCs verified using our formal framework are practically reliable, we utilized a formal record-based memory model to formally verify that the inherent properties of the ECCs, like Hamming distance, codeword decoding, and error detection/correction, remain consistent even when the ECC is implemented on the memory.

Keywords Error Correction Codes (ECCs) · Memory soft errors · Hamming codes · Convolutional codes · Formal verification · Theorem proving · ACL2

1 Introduction

Soft errors are a type of error that does not cause permanent damage to semiconductor devices [56], yet leads to temporary faults in them. In particular, radiation-induced soft errors have been a major concern in semiconductor devices since the 1970s [12, 60]. In a long chain of events, both the high-speed protons in cosmic rays and the alpha particles emitted during the decay of radioactive impurities in IC packaging material induce silicon-based semiconductor memories to change their logic states, hence resulting in soft errors [10, 47].

Responsible Editor: V. D. Agrawal

Mahum Naseer, mnaseer.msee16seecs@seecs.edu.pk

Waqar Ahmad, waqar.ahmad@seecs.edu.pk

Osman Hasan, osman.hasan@seecs.nust.edu.pk

1 School of Electrical Engineering and Computer Science (SEECS), National University of Sciences and Technology (NUST), Islamabad, Pakistan

Recent advancements in technology, including circuit miniaturization, voltage reduction, and increased circuit clock frequencies, have augmented the problem of soft errors in memories [10, 48]. The most obvious drawbacks of memory errors include the loss of correct data and the addition of faulty data into the memory. However, depending on the application/system using the memory, the severity of these memory errors could vary. This is summarized in Fig. 1. In a LEON3 processor, a memory error may simply cause a result error, i.e., an erroneous output from an algorithm running on the system, or a system timeout, i.e., the termination of an application without any result [39]. Similarly, in a Xilinx FPGA, such errors may cause the system to halt [33].

Error Correction Codes (ECCs) [44] are used to cater for memory errors by adding extra bits, often called parity or check bits, to the data bits in the memory. The parity bits are calculated using the available data bits, and in case of an error, the lost data is retrieved using these parity bits. Hence, ECCs are considered to be the most effective solution for memory errors [10], and since the introduction of Hamming codes [27] in 1950, ECCs have remained an active domain of research.

Published online: 26 September 2020

Journal of Electronic Testing (2020) 36:643–663

Fig. 1 Impact of a) Technology Miniaturization, b) Voltage Scaling, and c) Increased Clock Frequency on memories, and its consequences

1.1 Motivation

Simulation-based testing is the most commonly used technique for ensuring the correctness of ECCs in memories [7, 20, 28, 51, 54]. Initially, errors are injected at the input of the memory model, in a process known as fault injection. The performance of the ECC, i.e., how well an ECC corrects the errors, is then evaluated at the output of the model. This approach of testing is quite effective for smaller memories, where exhaustive simulation can be somewhat achievable. However, as the memory size grows, it becomes increasingly difficult to employ exhaustive simulation [20]. So, a common practice is to pick random combinations of input errors, and observe the response of an ECC in the presence of those error combinations [7, 20, 28]. This undermines the reliability of simulation results in determining the resilience of ECCs against the errors.

Formal methods [29] have been extensively used to provide an efficient alternative to simulation-based testing. The main idea here is to first construct a mathematical model of the given system using a state-machine or an appropriate logic, and then use logical reasoning and deduction methods to formally verify that this model exhibits the desired characteristics of the system. The desired behavior of the system is also specified mathematically using appropriate logic. This overcomes the need of applying combinations of errors at the input, and shifts the focus of the verification task to formal reasoning instead.

Generally, there are two major categories of formal methods to ensure system resilience: model checking and theorem proving. While the former was explored in earlier research for ensuring circuit reliability, its use was limited to smaller systems [43] due to the large number of states formed in larger systems and the resulting state-space explosion [18]. The latter resolves the limitation posed by large/infinite state space in model checking using induction [26]. However, this is achieved at the cost of increased complexity of implementation [19].

In this paper, we develop a framework for the formal verification of ECCs used in memories using the semi-automated theorem prover A Computational Logic for Applicative Common Lisp (ACL2) [37]. ACL2 is a powerful Lisp-based tool in the sense that it not only provides automatic proof execution, but also enables its users to direct the proof procedures in a meaningful way using the hints facility. In addition, it is an efficient tool that speeds up proof procedures by using previously proved lemmas to verify new theorems; this means that a theorem that may require a huge amount of time for verification if proved using only the basic logic axioms can be proved in a significantly shorter duration using auxiliary lemmas.

1.2 Challenges

The major challenges while using ACL2 for a framework for verification of ECCs used in memories are:

• Formal Modelling: The ECCs are generally represented either as a system of encoder and decoder equations/logic, or as a hardware circuit implementation [32, 51, 54]. Hence, the first challenge is to create a system model that fulfills all the specifications of the ECC equations/logic/hardware implementation.

• Formal Reasoning: In formal methods, having a logical explanation for the behavior of an ECC is not sufficient to ensure the correctness of the model. So, the second challenge is to use mathematical reasoning to verify the model and its associated properties. There are two major classes of properties that we dealt with: (i) properties that verify the working of the ECC in the absence of any errors, and (ii) properties that verify the ECC performance in the presence of an error.

• Translating the formal ECC model and its properties into the Lisp (ACL2) language: Hand-written mathematical models and proofs may be unsound due to the possibility of human errors in the modeling/proof procedure. Hence, we need to translate our model and the associated mathematical proof procedure to ACL2. This requires some degree of expertise in understanding the Lisp language and handling the tool.

• Model Reusability: To minimize the formalization efforts during the verification of new ECCs, such as Turbo codes, the existing formalization framework needs to be compositional in nature in order to provide reusable definitions and theorems. This ensures wider applicability of the framework for the verification of ECCs with different encoder/decoder designs, varied codeword lengths, and diverse error correction capabilities.

1.3 Our Contributions

Earlier works [1, 26] in the domain of theorem proving aimed at building coding theory libraries using the SSReflect and Lean theorem provers. The primary objective of these works was not to analyze memory reliability in the presence of soft errors, but rather to ensure error-free communication. Hence, their implementation strategies focused on noisy channels, rather than specific memory models.

In this paper, we propose a formal framework consisting of generic ECC properties using ACL2. This formal ECC framework allows us to verify the correctness of the ECCs generally used in memories. Our novel contributions in this paper are as follows:

• Providing extensible libraries, with functions and properties, like the error injection function, which are common to multiple ECCs.

• Formalization and verification of Hamming codes, which are among the most widely used ECCs in memories and form the basis of several multiple-error-correcting codes like two-dimensional codes.

• Formalization of convolutional codes, which provide the foundation for the formal verification of more sophisticated ECCs, such as turbo codes. To the best of our knowledge, this is the first endeavor towards the formal verification of convolutional codes.

• Utilizing the ACL2 theorem prover for developing libraries for the formal verification of ECCs used in memories. ACL2 is a semi-automatic tool, which provides the necessary automation to facilitate the verification process. This signifies the ease of use of our libraries.

• Implementation of the formally verified ECCs on a formal record-based memory model to demonstrate that the verified properties of the ECCs pertaining to encoding/decoding, error detection, and error correction are generic enough to easily comply with any given memory model.

The rest of the paper is organized as follows: Section 2 provides an overview of the current state-of-the-art in the domain of formal verification for memory/system error resilience. Section 3 describes the preliminaries for the ACL2 theorem prover. Section 4 presents our proposed methodology. Sections 5 and 6 define the formalization of our two libraries, i.e., the Standard Library for ECCs and the ECCs Library. Section 7 demonstrates a case study using a record-based memory model. Finally, Section 9 concludes our work with guidelines to extend our approach for verification of advanced ECCs using our framework.

2 Related Work

In this section, we give a brief overview of the available literature about the formal analysis for system/memory reliability, using different formal methods.

A Binary Decision Diagram (BDD) [16] is simply a directed acyclic graph that provides an alternate, and often more compact, representation of Boolean relations. Some earlier works [15, 41] utilized BDD-based verification to ensure resilience of circuit models in the presence of errors. Equivalence checking was employed, using BDDs, to compare the performance of two identical models, i.e., the reference/golden model and the model under fault injection [41]. However, the error protection approaches under study included simple error detection codes and Triple Modular Redundancy (TMR) circuits. For the error detection codes, the approaches also remained inconclusive in ensuring the functional correctness of the given circuit in the absence of an error. Moreover, the size of a BDD grows as the number of input variables/bits increases. In such cases, obtaining a compact BDD representation, for efficient equivalence checking, depends on the optimum variable ordering, which has been found to be an NP-hard problem [13]. Hence, in practice, BDD-based verification may not be an appropriate approach for large memories.

Model checking [18] utilizes the system’s state space to explore specifications satisfied by the system as input transitions through the system. Model checking-based approaches have been proposed in numerous works [8, 43] to ensure resilience of circuits’ components in the presence of soft errors. The main idea is to ensure that the components return to their correct state after the occurrence of an error. In a similar approach, the knowledge of the vulnerable circuit components was shown to be useful in reducing the power required for error protection of the circuit [55], by securing only the components most vulnerable to such errors. Other model checking-based efforts are mainly targeted towards verifying the performance of simple error correction approaches like TMR [31] and elementary parity encoding [5, 43] in memory elements. However, all these works lacked the inclusion of sophisticated error correction codes, such as Hamming codes, which are commonly used in state-of-the-art systems for error resilience. A dedicated tool, BLUEVERI [46], was proposed to exhaustively explore the state-space of IBM’s ECC circuits to identify their design bugs. However, BLUEVERI requires an in-depth knowledge of the ECC’s hardware implementation, hence limiting its portability to other memories. Also, due to the exhaustive search of the state-space, the computational (timing and memory) requirement of the tool increases as the size of the ECC circuits increases; this makes the tool infeasible for large memories. Although model checking is a highly automated verification technique, state-space explosion, i.e., an exponential growth in the size of the formal model with the increase in the number of state variables, is a common problem in model checking-based approaches, which is likely to overshadow their performance if used for the verification of sophisticated ECCs in larger systems/memories [18].

To overcome the issue of scalability in model checking, Boolean Satisfiability (SAT) solvers are commonly used [35, 59]. SAT solvers use the propositional expressions for the system and its properties to deduce whether the properties hold true for the system. In case the property does not hold true for the system, the prover provides a counterexample. In the bound-based SAT approaches [23, 24], the circuit components were categorized as either robust, non-robust (i.e., giving incorrect output in the presence of an error), or non-classified (i.e., not causing an erroneous output but causing Silent Data Corruption). The idea proposed was to use upper and lower bounds for the time/number of cycles to identify the robust and non-robust circuit components in the circuits, instead of doing a thorough search to find a satisfying solution. However, the approaches still verified robustness for small circuits only.

Theorem proving [19] uses a deductive reasoning approach to verify the specifications of a system. Although this approach requires an in-depth understanding of the system under study and a high level of expertise in formal reasoning, it can be effectively applied to larger systems as well.

The earliest work [9] related to ECC verification in theorem proving demonstrated a case study of the project on properties of computer algebra. The formal proofs for some properties of Hamming and Bose-Chaudhuri-Hocquenghem (BCH) codes were presented using the Isabelle theorem prover [34]. An earlier use of the ACL2 theorem prover was for register dependability analysis in the presence of an error [49]. TMR was employed on a single register and thus the register was triplicated. The correction of a fault injected on one of the registers, via majority voting, was then formally verified. This was the first study focusing on the use of theorem proving for hardware verification of a memory component in the presence of an error.

The formal verification of Hamming codes and the decoding algorithm of Low Density Parity Check (LDPC) codes, using the SSReflect extension of the Coq proof assistant [1, 2, 4], was the first elaborate work entirely focusing on the formalization of ECCs using theorem proving. A systematic approach was taken to initially formalize the Hamming encoder and decoder. The approach was gradually extended to the formalization and verification of the decoding algorithm for LDPC codes. The work was later expanded to include the formalization and verification of the Reed-Solomon codes [3] and BCH codes [4]. Like [1], these were elaborate studies to verify the theoretical model of important classes of ECCs. All the indicated works were aimed at ensuring error-free communication.

The most recent work published in the field of formal verification of ECCs includes verification efforts [26, 40] using the coding theory library, Cotoleta, based on the Lean theorem prover [42]. The usefulness of the established library was demonstrated by the formal verification of Hamming (7, 4) codes. However, the work was not extended to indicate the practicality of the library for formal verification of other commonly used ECCs.

To the best of our knowledge, despite soft errors being a growing concern in memories, no extensive research has focused on using theorem proving for the formal verification of ECCs, which is the scope of the current paper.

3 ACL2 Preliminaries

ACL2 [37] is a first-order-logic theorem prover featuring several powerful verification algorithms for proving theorems automatically or by using user-guided intelligent hints. It is also equipped with a mechanical theorem prover allowing users to construct proofs interactively by using a proof builder. ACL2’s logic is directly executable, which implies that a system model can be tested by using concrete executions besides symbolic verification. Since ACL2 is developed in Common Lisp, it also offers high execution efficiency provided by the underlying Lisp compilers.

A user interacts with ACL2 using a REPL (read-eval-print loop), and a new function can be defined by using the keyword defun.

(defun Function-Name
  (input1 input2 ...)
  *function body*)

Similarly, a proof attempt can be invoked by using the defthm event, which takes a proof goal and then attempts to verify it automatically by utilizing several clause-processors [38], which are ACL2’s automatic verification algorithms.

(defthm Theorem-Name
  *proof goal*
  *:hints / :instructions (optional)*)

The keyword :hints provides user guidance to a defthm event to direct the proof attempt. As mentioned earlier, ACL2 also offers a proof builder facility, which allows the user to control the prover like an interactive theorem prover. Similar to :hints, the proof builder commands can also be supplied to the defthm event using the keyword :instructions. The keyword :instructions initiates the series of commands that the prover must carry out in the specified sequence in order to verify the proof goal.

An important concept in ACL2 is the encapsulation principle [38], which allows the introduction of constrained functions and the specification of constraints over these functions by using the keyword encapsulate. A concept closely related to encapsulation is the derived rule of inference called functional instantiation. This rule states that a theorem may be derived by replacing the function symbols of any theorem by new function symbols, provided that the new symbols satisfy all the constraints on the replaced symbols [14].

The ACL2 functions car and cdr return the first and second members of a pair, respectively. Similarly, the constants t and nil represent true and false, respectively. booleanp, true-listp, and alistp are ACL2 recognizers for Boolean constants/variables, true-lists (i.e., lists terminated by nil), and association lists (i.e., lists of pairs), respectively. Other commonly used ACL2 functions include equal, which checks whether the given constants/variables/lists are equal, cons, which constructs a pair from the supplied arguments, and append, which concatenates the available lists.

4 Proposed Methodology

Despite having a unique logic for encoding/decoding, all ECCs have two requirements in common:

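1. In the absence of any error, decoding the codeword must provide the data in the correct form:

$$\forall\, \mathit{data}.\ \exists\, \mathit{code}.\ \big((\mathit{Encode}(\mathit{data}) = \mathit{code}) \implies (\mathit{Decode}(\mathit{code}) = \mathit{data})\big)$$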

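2. Depending on their error-correcting capability, all ECCs must be able to correct error(s):

$$\forall\, \mathit{data}.\ \exists\, \mathit{code}.\ \exists\, \mathit{bad\_code}.\ \big((\mathit{Encode}(\mathit{data}) = \mathit{code}) \land (\mathit{Fault\_inject}(\mathit{code}) = \mathit{bad\_code}) \implies (\mathit{Decode}(\mathit{bad\_code}) = \mathit{data})\big)$$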
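The two requirements can also be exercised on concrete data before a proof is attempted. The following is a minimal Python sketch, not part of the ACL2 framework: it uses a 3x repetition code as a hypothetical stand-in ECC, and the names encode, decode, and fault_inject are assumptions chosen to echo the functions in the two formulas.

```python
from itertools import product

# Hypothetical stand-in ECC: a 3x repetition code (not one of the paper's ECCs).
def encode(data):
    """Repeat every data bit three times."""
    return [bit for bit in data for _ in range(3)]

def decode(code):
    """Majority-vote each group of three bits."""
    return [sum(code[i:i + 3]) >= 2 for i in range(0, len(code), 3)]

def fault_inject(code, position):
    """Return a copy of code with the bit at position flipped."""
    return code[:position] + [not code[position]] + code[position + 1:]

def requirement_1(data):
    """Requirement 1: without errors, decoding the codeword returns the data."""
    return decode(encode(data)) == data

def requirement_2(data):
    """Requirement 2: every single injected error is corrected."""
    code = encode(data)
    return all(decode(fault_inject(code, p)) == data for p in range(len(code)))

# Exhaustive check over all 4-bit data words.
all_words = [list(bits) for bits in product([False, True], repeat=4)]
print(all(requirement_1(w) for w in all_words))
print(all(requirement_2(w) for w in all_words))
```

Such an exhaustive check is only feasible for short words, which is the limitation that motivates proving the same properties symbolically.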

Similarly, while the encoder and decoder functions for every ECC are unique, there are a few essential functions required for the verification of all ECCs. For instance, to verify the reliability of an ECC in the presence of an error, the fault/error injection function is needed. We propose a framework, developed using the ACL2 theorem prover, with extensible libraries of functions, which are essential for the verification of ECCs. The proposed methodology is shown in Fig. 2.

The first step in our methodology is to formally represent the system (in our case, ECC) specifications. The whole operation of an ECC is captured in its encoding and decoding algorithms, which are hence inputs to the proposed framework. Due to the diversity of memory sizes on which each ECC is applicable, and the varying degree of error protection required by memories used in different applications, code-specific information, like the data word size and the code rate of the ECC, is also taken as input to the framework.

As indicated earlier, there are certain functions, like fault injection, which exhibit the same behavior for many ECCs. We developed a library of formalized standard functions, as described in Section 5, with the most commonly used functions in ECCs. The next step in our proposed procedure is to use the ECC specifications and code-specific information indicated above, along with the functions from our library of formalized standard functions, to formulate formal ECC encoder and decoder models/functions. The encoder and decoder functions together provide the formal model of the ECC.

To ensure the functional and behavioral correctness of the ECC model, ECC properties/theorems are formally expressed using the ECC model and the library of formalized standard functions. In the final step of the proposed methodology, the ACL2 theorem prover identifies if the indicated properties hold true for a given ECC model. If a given property fails, it gives valuable feedback, which helps to correct the formalization of the ECC. In case ACL2 verifies all the properties, the ECC is deemed verified. The verified ECC is finally stored in the ECCs library, which will be discussed in detail in Section 6.

Fig. 2 Proposed Methodology

5 Library of Formalized Standard Functions

In this section, we formally define some standard functions that are used in ECC formalization and verify their key properties using ACL2.

5.1 Fault Injection

The functions bit_flip_tlist and bit_flip_pair are formally defined to inject errors at an arbitrary location in the codeword. bit_flip_tlist is applicable to codewords from block codes, described in Section 6, while bit_flip_pair is defined to work with convolutional codes. Since both functions are behaviorally similar, we only discuss bit_flip_tlist here for brevity.

Definition 1

(defun bit_flip_tlist (n lst)
  (if (zp n)
      (cons (not (car lst)) (cdr lst))
      (cons (car lst)
            (bit_flip_tlist (- n 1) (cdr lst)))))

where lst represents the codeword and n is the location where the error is inserted. zp is an ACL2 recognizer for zero. The function recursively calls itself, with the top-most element removed and n decremented by 1 in each iteration, i.e., (bit_flip_tlist (- n 1) (cdr lst)), until (zp n) is true, i.e., n becomes zero. Once this condition is true, the top-most element of lst, (car lst), is flipped using not and joined to the remaining codeword, (cdr lst), using cons.
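To see the recursion on a concrete codeword, the definition can be mirrored in Python. This is a sketch under two assumptions that are not part of the ACL2 source: codewords are Python lists of booleans, and 0 <= n < len(lst) holds (ACL2 tolerates an out-of-range n because the car of nil is nil, whereas this mirror would raise an IndexError).

```python
def bit_flip_tlist(n, lst):
    """Flip the element at 0-based location n, mirroring Definition 1."""
    if n <= 0:  # plays the role of (zp n) for integer n
        # Flip the head and keep the tail unchanged.
        return [not lst[0]] + lst[1:]
    # Keep the head and recurse on the tail with n decremented.
    return [lst[0]] + bit_flip_tlist(n - 1, lst[1:])

codeword = [True, False, True, True]
print(bit_flip_tlist(2, codeword))  # [True, False, False, True]
```

Only the element at location 2 changes; the input list itself is left untouched because every step builds a new list, as cons does in ACL2.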

The behavior of the fault injection model is formally described in the following properties:


Theorem 1.1

(defthm len-bit-flip-tlist
  (implies (and (>= n 0)
                (< n (len lst))
                (true-listp lst))
           (equal (len lst)
                  (len (bit_flip_tlist n lst)))))

The above theorem ensures that fault injection does not add or remove bits from a codeword. The theorem is structured as an implication, denoted by the keyword implies. The first argument of the implication gives the necessary premises/conditions, which must be satisfied for the property to hold true, while the second argument indicates the conclusion or the property itself. The above theorem states that if the error location n is a non-negative number, (>= n 0), less than the length of lst, (< n (len lst)), and lst is a true-list, then the lengths of the codeword before and after fault injection, i.e., (len lst) and (len (bit_flip_tlist n lst)), are equal.

Theorem 1.2

(defthm list-change-after-flip-tlist
  (implies (and (>= n 0)
                (< n (len lst))
                (true-listp lst))
           (not (equal lst (bit_flip_tlist n lst)))))

Fault injection always changes a codeword bit. This is described in the above theorem by indicating that the original codeword, lst, and the codeword after fault injection, (bit_flip_tlist n lst), are different.

Theorem 1.3

(defthm single-bit-flips-tlist
  (implies (and (>= n 0)
                (< n (len lst))
                (true-listp lst)
                (not (endp lst)))
           (equal (count_element_mismatch_tlist
                   lst (bit_flip_tlist n lst))
                  1)))

The above property states that the codewords before and after fault injection differ at exactly one codeword location. The function count_element_mismatch_tlist is another function from our library of formalized standard functions, which gives the number of places in which the codewords lst and (bit_flip_tlist n lst) differ.
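Theorems 1.1 to 1.3 can be sanity-checked by brute force on short codewords. The Python sketch below is an illustration rather than the ACL2 proof: both functions are assumed mirrors of the ACL2 definitions operating on lists of booleans, and check_fault_injection and its max_len bound are hypothetical names introduced here.

```python
from itertools import product

def bit_flip_tlist(n, lst):
    """Assumed Python mirror of the fault injection function."""
    if n <= 0:
        return [not lst[0]] + lst[1:]
    return [lst[0]] + bit_flip_tlist(n - 1, lst[1:])

def count_element_mismatch_tlist(a, b):
    """Assumed Python mirror of the comparator; a and b have equal length."""
    if not a:
        return 0
    rest = count_element_mismatch_tlist(a[1:], b[1:])
    return rest if a[0] == b[0] else rest + 1

def check_fault_injection(max_len=6):
    """Test the three properties for every codeword up to max_len bits."""
    for length in range(1, max_len + 1):
        for bits in product([False, True], repeat=length):
            lst = list(bits)
            for n in range(length):
                flipped = bit_flip_tlist(n, lst)
                assert len(flipped) == len(lst)  # Theorem 1.1
                assert flipped != lst  # Theorem 1.2
                assert count_element_mismatch_tlist(lst, flipped) == 1  # Theorem 1.3
    return True

print(check_fault_injection())
```

The check covers only finitely many lengths, whereas the ACL2 theorems hold for true-lists of any length.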

5.2 Comparator

Once faults are injected into the codeword, the error-correcting capability of the ECC is typically determined on the basis of the number of faults that an ECC can detect/correct. The BDD-based simulation approach in [41] proposed the use of a golden device, i.e., an uncorrupted device, to compare and evaluate the performance of the device in which the fault is injected.

We use a similar strategy in our work. We formally define two functions, i.e., count_element_mismatch_tlist and count_element_mismatch_pair, that count the number of mismatches between the original codeword and the codeword after fault injection. Again, we only discuss count_element_mismatch_tlist, which works for block codes. The approach used for count_element_mismatch_pair is similar and is applicable to the convolutional codes.

Definition 2

(defun count_element_mismatch_tlist (A B)
  (if (endp A)
      0
      (if (equal (car A) (car B))
          (count_element_mismatch_tlist
           (cdr A) (cdr B))
          (+ 1
             (count_element_mismatch_tlist
              (cdr A) (cdr B))))))

where the ACL2 function endp returns true if the given list is empty. The above function recursively counts the number of mismatched elements in the corresponding locations of the codewords A and B. For instance, if A = (1 0 1) and B = (1 0 0), then the function count_element_mismatch_tlist returns 1.
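The example above can be reproduced with a Python mirror of the comparator. This is a sketch that assumes both codewords are Python lists of the same length; the ACL2 definition also accepts a shorter B, because the car of nil is nil, which this mirror does not emulate.

```python
def count_element_mismatch_tlist(a, b):
    """Count locations where a and b differ, recursing on a as Definition 2 does."""
    if not a:  # plays the role of (endp A)
        return 0
    rest = count_element_mismatch_tlist(a[1:], b[1:])
    # Add 1 only when the heads differ.
    return rest if a[0] == b[0] else 1 + rest

A = [1, 0, 1]
B = [1, 0, 0]
print(count_element_mismatch_tlist(A, B))  # 1
```

For equal-length binary codewords this count is the Hamming distance between them.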

We proved some essential properties about the above comparator function in ACL2 as follows:


Theorem 2.1

(defthm count-mistmatch-in-equal-lists-tlist
  (implies (true-listp lst)
           (equal (count_element_mismatch_tlist
                   lst lst)
                  0)))

The above property states that identical codewords differ at zero locations. In other words, when comparing identical codewords, the comparator function should return 0.

Theorem 2.2

(defthm count-mismatch-append-tlist
  (implies (and (true-listp A)
                (true-listp B)
                (true-listp C))
           (equal
            (count_element_mismatch_tlist
             (append A B) (append A C))
            (count_element_mismatch_tlist
             B C))))

Adding identical bits in the corresponding locations of the codewords does not affect the number of mismatches. The above theorem states this property formally by indicating that the number of element mismatches in two codewords B and C is the same as that in the codewords (append A B) and (append A C), where A is another codeword appended to the top of codewords B and C.
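Both comparator properties can be checked exhaustively on short lists. The Python sketch below is an illustration, not the ACL2 proof; bit_lists and check_comparator are hypothetical helper names, and B and C are restricted to equal lengths, which is an assumption of this mirror rather than a premise of Theorem 2.2.

```python
from itertools import product

def count_element_mismatch_tlist(a, b):
    """Assumed Python mirror of the comparator; a and b have equal length."""
    if not a:
        return 0
    rest = count_element_mismatch_tlist(a[1:], b[1:])
    return rest if a[0] == b[0] else rest + 1

def bit_lists(max_len):
    """Yield every list of booleans with at most max_len elements."""
    for length in range(max_len + 1):
        for bits in product([False, True], repeat=length):
            yield list(bits)

def check_comparator(max_len=3):
    """Test Theorems 2.1 and 2.2 on all short lists."""
    for a in bit_lists(max_len):
        assert count_element_mismatch_tlist(a, a) == 0  # Theorem 2.1
        for b in bit_lists(max_len):
            for c in bit_lists(max_len):
                if len(b) != len(c):
                    continue  # this mirror only handles equal lengths
                # Theorem 2.2: a common prefix adds no mismatches.
                assert (count_element_mismatch_tlist(a + b, a + c)
                        == count_element_mismatch_tlist(b, c))
    return True

print(check_comparator())
```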

5.3 Binary-to-Decimal Converter

During verification, it is often simpler to compare the data words as decimal numbers, rather than comparing them as binary sequences. Hence, we define bin_to_dec, a 4-bit binary-to-decimal converter. This is particularly helpful in ECCs where the error location is available in binary, and the error correction mechanism makes use of error locations indicated in the decimal number system.

Definition 3

(defun bin_to_dec (a b c d)
  (+ (if d 1 0)
     (if c 2 0)
     (if b 4 0)
     (if a 8 0)))

We verified some properties related to bin_to_dec in ACL2 as follows:

Theorem 3.2

(defthm even-binary-number
  (implies (equal d nil)
           (evenp (bin_to_dec a b c d))))

Having the last bit of the binary number as nil implies an even number. evenp is ACL2’s built-in recognizer for even integers.

Theorem 3.3

(defthm odd-binary-number
  (implies (equal d t)
           (oddp (bin_to_dec a b c d))))

In case the last bit of the binary number is t, this implies that the number is odd. oddp is ACL2’s built-in recognizer for odd integers.
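Because the converter has only four Boolean inputs, its parity properties can be confirmed over all 16 combinations. The Python sketch below mirrors Definition 3 under the assumption that Python booleans stand in for the ACL2 constants t and nil; it complements the ACL2 theorems and does not replace them.

```python
from itertools import product

def bin_to_dec(a, b, c, d):
    """4-bit binary to decimal, with a as the most significant bit (Definition 3)."""
    return (1 if d else 0) + (2 if c else 0) + (4 if b else 0) + (8 if a else 0)

print(bin_to_dec(True, False, True, False))  # binary 1010 -> 10

for a, b, c in product([False, True], repeat=3):
    assert bin_to_dec(a, b, c, False) % 2 == 0  # Theorem 3.2: last bit nil -> even
    assert bin_to_dec(a, b, c, True) % 2 == 1   # Theorem 3.3: last bit t -> odd
```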

6 Formal ECCs Library

ECCs can be divided into two broad categories: the block codes and the convolutional codes [45]. The block codes are a branch of codes where, depending on the type of memory, data blocks of constant length are used to generate a fixed number of parity bits. The convolutional codes, on the other hand, have a fixed code rate m/n, where n output (codeword) bits are generated for m input (data) bits. This section describes the formalization of two ECCs, one from each ECC category: the Hamming (7,4) codes and the 1/2 rate Convolutional codes. Both are extensively used in memories, and are common to several advanced Error Correction Codes. A brief overview of the encoding and decoding algorithms of both ECCs is first highlighted. This is followed by the formalization of these ECCs. Finally, the details of the formal verification of their associated properties are provided. The ACL2 script for the two ECCs, and their associated properties, comprises approximately 1000 lines of code, and can be downloaded from [57].

6.1 Hamming Codes

Hamming codes [27] are among the first ECCs introduced. They are systematic Single Error Correction-Double Error Detection (SEC-DED) codes, using (k + 1) parity bits for protecting d data bits, to generate an n-bit codeword. The initial k parity bits are allocated the $2^n$th bit positions (i.e., bit locations 1, 2, 4, 8, ...) of the codeword. The last parity bit takes up the last bit of the codeword. The remaining bits of the codeword represent the data.

The calculation of the parity and the syndrome bits involves modulo-2 additions. For the initial k bits, the sequence of this calculation involves the use of the bits shown in Table 1, whereas, for the last parity/syndrome bit, all the preceding bits are used for calculation. The initial k parity bits are used for single error correction; the last parity bit provides double error detection capability.

Table 1 Bits involved in the calculation of parity/syndrome bits in Hamming Code

| Parity/Syndrome position | Bits on which modulo-2 addition is applied (for parity bits) | Bits on which modulo-2 addition is applied (for syndrome bits) | Pattern |
|---|---|---|---|
| 1 | 3, 5, 7, 9, 11, 13, ... | 1, 3, 5, 7, 9, 11, 13, ... | Pick 1 bit, skip 1 bit (starting from 1) |
| 2 | 3, 6, 7, 10, 11, ... | 2, 3, 6, 7, 10, 11, ... | Pick 2 bits, skip 2 bits (starting from 2) |
| 4 | 5, 6, 7, 12, 13, ... | 4, 5, 6, 7, 12, 13, ... | Pick 4 bits, skip 4 bits (starting from 4) |

Although the error-correction capability of Hamming codes is very limited, they form the basis of several recent multiple error detection/correction codes [21, 52, 53]. They are among the most widely used error-resilient codes in memories due to their small area overhead, latency, and power requirement, and their simple combinatorial circuit architectures. They are often employed in space and military applications [30].

We have formalized the Hamming (7,4) code. It protects a 4-bit data word, using 3 parity bits for single-error correction and an additional parity bit for double-error detection. The gate-level structure of the Hamming (7,4) code encoder is shown in Fig. 3. The parity bits are determined as:

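$$
\begin{aligned}
p_1 &= d_3 \oplus d_5 \oplus d_7 \\
p_2 &= d_3 \oplus d_6 \oplus d_7 \\
p_4 &= d_5 \oplus d_6 \oplus d_7 \\
p_8 &= p_1 \oplus p_2 \oplus d_3 \oplus p_4 \oplus d_5 \oplus d_6 \oplus d_7
\end{aligned}
\tag{1}
$$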

The formal model of the Hamming (7,4) encoder, shown in Fig. 3, is described in ACL2 as follows:

Fig. 3 Encoder for Hamming (7,4) Code

Definition 4

  (defun hamming7-4-encode (x3 x5 x6 x7)
    (cons (xor (xor x3 x5) x7)
    (cons (xor (xor x3 x6) x7)
    (cons x3
    (cons (xor (xor x5 x6) x7)
    (cons x5
    (cons x6
    (cons x7
    (cons (xor ....
               (xor (xor x3 x5) x7)
               (xor (xor x3 x6) x7))
               x3)
               (xor (xor x5 x6) x7))
               x5)
               x6)
               x7)))))
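As a minimal illustration of Eq. 1, the following Python sketch computes the eight codeword bits in the order p1, p2, d3, p4, d5, d6, d7, p8. It is not part of the ACL2 script: the name `hamming74_encode` and the use of integers 0/1 in place of ACL2 booleans are assumptions made here for readability.

```python
def hamming74_encode(d3, d5, d6, d7):
    """Return the 8-bit codeword [p1, p2, d3, p4, d5, d6, d7, p8] of Eq. 1."""
    p1 = d3 ^ d5 ^ d7                      # covers positions 3, 5, 7
    p2 = d3 ^ d6 ^ d7                      # covers positions 3, 6, 7
    p4 = d5 ^ d6 ^ d7                      # covers positions 5, 6, 7
    p8 = p1 ^ p2 ^ d3 ^ p4 ^ d5 ^ d6 ^ d7  # overall parity of the first 7 bits
    return [p1, p2, d3, p4, d5, d6, d7, p8]


print(hamming74_encode(1, 0, 1, 1))  # [0, 1, 1, 0, 0, 1, 1, 0]
```

Because p8 is the parity of all preceding bits, every codeword produced this way has an even number of ones.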

The decoder has an ACL2 structure similar to that of the encoder. At the decoder, the syndrome bits corresponding to the parity bits are calculated as:

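$$
\begin{aligned}
s_1 &= p_1 \oplus d_3 \oplus d_5 \oplus d_7 \\
s_2 &= p_2 \oplus d_3 \oplus d_6 \oplus d_7 \\
s_4 &= p_4 \oplus d_5 \oplus d_6 \oplus d_7 \\
s_8 &= p_1 \oplus p_2 \oplus d_3 \oplus p_4 \oplus d_5 \oplus d_6 \oplus d_7 \oplus p_8
\end{aligned}
\tag{2}
$$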

In case of no error, all the syndrome bits are zero. Non-zero syndrome bits indicate the presence of error(s). In case of a single error, the syndrome bit $s_8$ is non-zero; the remaining syndromes $(s_1\ s_2\ s_4)$ give the binary location of the error. For instance, the syndromes (1 0 1) represent an error at the fifth codeword location. The correction of the error involves flipping the erroneous bit. In case of a double error, the syndrome bit $s_8$ is zero, while the remaining syndrome bits are not all zero. This combination indicates the presence of a double error.
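The syndrome rules above can be sketched in Python as follows. This is an illustrative sketch rather than the ACL2 decoder: the function names, the 0/1 integer encoding, and the `("ok", data)` / `("double-error", None)` return convention are assumptions introduced here (the ACL2 decoder instead signals a double error with a single-bit output).

```python
def hamming74_encode(d3, d5, d6, d7):
    """Codeword [p1, p2, d3, p4, d5, d6, d7, p8] following Eq. 1."""
    p1, p2, p4 = d3 ^ d5 ^ d7, d3 ^ d6 ^ d7, d5 ^ d6 ^ d7
    p8 = p1 ^ p2 ^ d3 ^ p4 ^ d5 ^ d6 ^ d7
    return [p1, p2, d3, p4, d5, d6, d7, p8]


def hamming74_decode(c):
    """Apply the syndromes of Eq. 2 to an 8-bit word c."""
    p1, p2, d3, p4, d5, d6, d7, p8 = c
    s1 = p1 ^ d3 ^ d5 ^ d7
    s2 = p2 ^ d3 ^ d6 ^ d7
    s4 = p4 ^ d5 ^ d6 ^ d7
    s8 = p1 ^ p2 ^ d3 ^ p4 ^ d5 ^ d6 ^ d7 ^ p8
    pos = s1 + 2 * s2 + 4 * s4          # binary location given by (s1 s2 s4)
    if s8 == 0 and pos != 0:
        return ("double-error", None)   # detected, but not correctable
    fixed = list(c)
    if s8 == 1 and pos != 0:
        fixed[pos - 1] ^= 1             # flip the erroneous bit
    # s8 == 1 with pos == 0 means only p8 flipped; the data bits are intact
    return ("ok", [fixed[2], fixed[4], fixed[5], fixed[6]])


word = hamming74_encode(1, 0, 1, 1)
word[4] ^= 1                            # inject an error at codeword location 5
print(hamming74_decode(word))           # ('ok', [1, 0, 1, 1])
```

In the example, the flipped bit yields the syndromes (1 0 1) discussed above, so location 5 is corrected before the data bits are extracted.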

Some important behavioral properties of Hamming codes, as formally verified in ACL2, are as follows:


Theorem 4.1

  (defthm hamm_distance4
    (implies (not (equal (bin_to_dec a b c d)
                         (bin_to_dec e f g h)))
             (>= (count_element_mismatch_tlist
                   (hamming7-4-encode a b c d)
                   (hamming7-4-encode e f g h))
                 4)))

The Hamming distance for an SEC-DED Hamming (7,4) code is greater than or equal to 4, as stated in the above theorem. In other words, if the data words (a b c d) and (e f g h) differ, even at a single binary location, the codewords (hamming7-4-encode a b c d) and (hamming7-4-encode e f g h) must differ in at least 4 binary codeword locations.
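For a code this small, the distance property can also be checked by exhaustive enumeration. The Python sketch below does so under the assumption that the encoder follows Eq. 1; the helper `count_mismatch` is a hypothetical stand-in for the ACL2 mismatch counter and is not taken from the script.

```python
from itertools import product


def hamming74_encode(d3, d5, d6, d7):
    """Codeword [p1, p2, d3, p4, d5, d6, d7, p8] following Eq. 1."""
    p1, p2, p4 = d3 ^ d5 ^ d7, d3 ^ d6 ^ d7, d5 ^ d6 ^ d7
    p8 = p1 ^ p2 ^ d3 ^ p4 ^ d5 ^ d6 ^ d7
    return [p1, p2, d3, p4, d5, d6, d7, p8]


def count_mismatch(u, v):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(a != b for a, b in zip(u, v))


words = list(product((0, 1), repeat=4))
min_distance = min(
    count_mismatch(hamming74_encode(*a), hamming74_encode(*b))
    for a in words for b in words if a != b
)
print(min_distance)  # 4
```

Enumeration of this kind stops being practical as word sizes grow, which is the motivation for stating the property as a theorem instead.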

Theorem 4.2

  (defthm hamm-NO-ERROR2
    (implies (equal (hamming7-4-encode x3 x5 x6 x7)
                    (list x1 x2 x3 x4 x5 x6 x7 x8))
             (equal (hamming7-4-decode x1 x2 x3 x4 x5 x6 x7 x8)
                    (list x3 x5 x6 x7))))

In case of no error, the codeword must be correctly decoded into the data word, as described by the above theorem. So, assuming (list x1 x2 x3 x4 x5 x6 x7 x8) to be a codeword generated by encoding (x3 x5 x6 x7), decoding (x1 x2 x3 x4 x5 x6 x7 x8) must return the original data bits, i.e., (x3 x5 x6 x7).

Now, we formally verify the single error correction property of Hamming codes as follows:

Theorem 4.3

  (defthm hamm-SEC1
    (implies (equal 1
                    (count_element_mismatch_tlist
                      (hamming7-4-encode x3 x5 x6 x7)
                      (list x1 x2 x3 x4 x5 x6 x7 x8)))
             (equal (hamming7-4-decode x1 x2 x3 x4 x5 x6 x7 x8)
                    (list x3 x5 x6 x7))))

Theorem 4.4

  (defthm hamm-SEC2
    (implies (and (booleanp a)
                  (booleanp b)
                  (booleanp c)
                  (booleanp d)
                  (equal (hamming7-4-encode a b c d)
                         (list e f g h i j k l)))
             (and (equal (hamming7-4-decode e f (not g) h i j k l)
                         (list a b c d))
                  (equal (hamming7-4-decode e f g h (not i) j k l)
                         (list a b c d))
                  (equal (hamming7-4-decode e f g h i (not j) k l)
                         (list a b c d))
                  (equal (hamming7-4-decode e f g h i j (not k) l)
                         (list a b c d)))))

We split the single error correcting property of the Hamming codes into two theorems: Theorem 4.3 states that if an error occurs in any of the parity bits of the codeword, it is removed during decoding, whereas Theorem 4.4 states that if an error occurs in any of the data bits of the codeword, it is corrected by the decoder.

Theorem 4.5

  (defthm hamm-DED
    (implies (and (booleanp a)
                  (booleanp b) ...
                  (booleanp l)
                  (equal 2
                         (count_element_mismatch_tlist
                           (hamming7-4-encode a b c d)
                           (list e f g h i j k l))))
             (equal (len (hamming7-4-decode e f g h i j k l))
                    1)))

Hamming codes are double-error detecting as well. A single-bit value is sufficient to indicate the presence of a double-bit error. The double-error detection property of the Hamming codes is hence verified by the above theorem, which states that if a codeword (e f g h i j k l) has two errors, the presence of the errors is indicated by the decoder with a single-bit output, i.e., (equal (len (hamming7-4-decode e f g h i j k l)) 1).

6.2 Convolutional Codes

Initially adopted for error correction in communication networks [58], convolutional codes are now also being investigated for ensuring memory chip reliability [25, 51]. In convolutional codes, each input (data) bit is used to generate multiple output (codeword) bits during encoding, which in turn are used to retrieve the data bits again via decoding. As expected of most error-correcting strategies used in communication networks, the encoding and decoding procedures of convolutional codes generally happen sequentially over multiple clock cycles. However, combinatorial counterparts of the convolutional encoder and decoder are also available for use in memories [25]. Here, instead of being treated as a stream of bits, the input is considered as blocks of fixed length (i.e., the word length). The blocks act like overlapping sliding windows to perform the encoding and decoding operations. Output bits may be interleaved [50] to minimize data corruption in case of soft errors.

The convolutional codewords, or more specifically the code, since the length of the code is arbitrary, comprise only the output bits. The number of output bits determined using each input bit defines the code rate of the convolutional code. In this work, we consider 1/2 rate convolutional codes, which produce two output bits for each input bit.

The encoding process involves determining the output bits using modulo-2 addition on the current and previous input bits, as dictated by the generator polynomials. The convolutional code we have chosen in this work uses the generator polynomials G1 = (1 1 1) and G2 = (1 0 1). Hence, the convolutional encoder, shown in Fig. 4, translates to a finite-state machine (FSM), shown in Fig. 5, and generates the following output bit equations:

$$
\begin{aligned}
x_n &= d_{n-2} \oplus d_{n-1} \oplus d_n \\
y_n &= d_{n-2} \oplus d_n
\end{aligned}
\tag{3}
$$

Definition 5a

  (defun encode_xor (old data)
    (if (endp old)
        nil
      (let* ((xn (xor (xor (first old)
                           (second old))
                      (first data)))
             (yn (xor (first old)
                      (first data))))
        (cons (cons xn yn)
              (encode_xor (cdr old) (cdr data))))))

The (encode_xor) function implements the convolutional encoder equations described above, applying xor recursively until the entire list old has been encoded, i.e., until (endp old) becomes true. The codeword bits $x_n$ and $y_n$ generated at each recursion are combined into a pair using (cons xn yn). Hence, the final codeword is an association list (pair list), where each element is a pair of $x_n$ and $y_n$ corresponding to the data bit $d_n$.

Fig. 4 1/2 Rate Convolutional Encoder, with Generator Polynomials G1 = (1 1 1) and G2 = (1 0 1)

Definition 5b

  (defun encoder_conv_xor (lst)
    (encode_xor (append '(nil nil) lst) lst))

The definition (encoder_conv_xor) prepends two nils to the start of the data sequence lst and passes the result to the encoder definition (encode_xor).

For the decoder to start decoding at a known state, the encoding is generally initiated at State 00, i.e., with both $d_{n-1}$ and $d_{n-2}$ as nil, as implemented by (encoder_conv_xor). Likewise, the encoding terminates at State 00 as well, which is practically implemented in the encoder by appending two nil bits at the end of the data. In ACL2, we implemented this condition by ending the recursion when the sequence old, which has two more elements than the data list, ends. This process of starting and ending the encoding process at the same state is called tail-biting [25].
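The encoder equations and the State 00 start/termination can be sketched in Python as below. The sketch is an illustration only: `conv_encode`, the explicit zero padding, and the 0/1 integers in place of nil/t are assumptions introduced here and are not taken from the ACL2 script.

```python
def conv_encode(data):
    """Rate-1/2 encoder with G1 = (1 1 1) and G2 = (1 0 1).

    Returns one (x, y) pair per step; two extra pairs come from the
    terminating zero bits that drive the encoder back to State 00.
    """
    padded = [0, 0] + list(data) + [0, 0]   # start and end in State 00
    code = []
    for n in range(2, len(padded)):
        d2, d1, d0 = padded[n - 2], padded[n - 1], padded[n]
        code.append((d2 ^ d1 ^ d0, d2 ^ d0))  # (x_n, y_n) of Eq. 3
    return code


print(conv_encode([1, 0, 1, 1]))
# [(1, 1), (1, 0), (0, 0), (0, 1), (0, 1), (1, 1)]
```

A four-bit input therefore yields six output pairs, i.e., two more pairs than there are data bits.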

We followed the combinatorial decoder design described in [25]. For a 1/2 rate convolutional code, decoding the current data bit involves the use of 2 predictions of the data bit:

$$
\begin{aligned}
d_{na} &= y_{n-1} \oplus y_n \oplus x_{n-1} \\
d_{nb} &= x_{n+1} \oplus y_{n+1}
\end{aligned}
\tag{4}
$$

In case of no error, the predictions $d_{na}$ and $d_{nb}$ must match, i.e., $d_{na} = d_{nb}$. A mismatch indicates the presence of an error. The erroneous bit is correctly restored by the use of the future predictions $d_{(n+2)a}$ and $d_{(n+2)b}$:

$$
\begin{aligned}
d_{(n+2)a} &= y_{n+1} \oplus y_{n+2} \oplus x_{n+1} \\
d_{(n+2)b} &= x_{n+3} \oplus y_{n+3}
\end{aligned}
\tag{5}
$$

Considering a single error condition, if $d_{na} \neq d_{nb}$, then either $y_n$ is corrupted or one of $x_{n+1}$ and $y_{n+1}$ is corrupted. If $d_{(n+2)a} = d_{(n+2)b}$, we can conclude that $y_n$ was the corrupted bit. However, if $d_{(n+2)a} \neq d_{(n+2)b}$, the erroneous bit is identified in the next decoder cycle.
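The selection between the two predictions can be sketched in Python as follows. This is a hedged illustration of the rule in Eqs. 4 and 5, not the ACL2 decoder: the names `conv_encode` and `conv_decode`, the explicit argument `k` (number of data bits), and the zero padding of the codeword at both ends are assumptions made here.

```python
def conv_encode(data):
    """Rate-1/2 encoder (Eq. 3), started and terminated in State 00."""
    padded = [0, 0] + list(data) + [0, 0]
    return [(padded[n - 2] ^ padded[n - 1] ^ padded[n], padded[n - 2] ^ padded[n])
            for n in range(2, len(padded))]


def conv_decode(code, k):
    """Recover k data bits from (x, y) pairs, tolerating one flipped bit."""
    # Pairs outside the codeword are all-zero, consistent with State 00.
    pad = [(0, 0)] + list(code) + [(0, 0)] * 3
    out = []
    for n in range(k):
        (xp, yp), (_, yn), (x1, y1), (_, y2), (x3, y3) = pad[n:n + 5]
        d_na = yp ^ yn ^ xp          # Eq. 4, prediction a
        d_nb = x1 ^ y1               # Eq. 4, prediction b
        d_n2a = y1 ^ y2 ^ x1         # Eq. 5, future prediction a
        d_n2b = x3 ^ y3              # Eq. 5, future prediction b
        # A mismatch in the future predictions means prediction b cannot be
        # trusted, so prediction a is used; otherwise prediction b is used.
        out.append(d_na if d_n2a != d_n2b else d_nb)
    return out


code = conv_encode([1, 0, 1, 1])
code[1] = (code[1][0], code[1][1] ^ 1)   # flip y_1
print(conv_decode(code, 4))              # [1, 0, 1, 1]
```

The choice mirrors the boolean structure of the decoder definition that follows: one prediction is gated by the mismatch of the future predictions and the other by its negation.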

Definition 5c

  (defun decode_xor (code)
    (if (endp (cdddr code))
        nil
      (cons (or (and (xor (xor (cdr (first code))
                               (cdr (second code)))
                          (car (first code)))
                     (xor (xor (xor (cdr (third code))
                                    (cdr (fourth code)))
                               (car (third code)))
                          (xor (car (fifth code))
                               (cdr (fifth code)))))
                (and (xor (car (third code))
                          (cdr (third code)))
                     (xor (xor (xor (cdr (third code))
                                    (cdr (fourth code)))
                               (car (third code)))
                          (not (xor (car (fifth code))
                                    (cdr (fifth code)))))))
            (decode_xor (cdr code)))))

Definition 5d

  (defun decoder_conv_xor (code)
    (decode_xor (append '((nil nil)) code)))

Our decoder definitions in ACL2 are similar to the encoder definitions. The function (decoder_conv_xor) initializes the decoding operation at State 00, while (decode_xor) continues decoding as a sequence of boolean operations.

Fig. 5 FSM corresponding to 1/2 Rate Convolutional Code, with Generator Polynomials G1 = (1 1 1) and G2 = (1 0 1)

The important behavioral properties of Convolutional codes that we verified are:

Theorem 5.1

  (defthm len-encoder-xor
    (implies (true-listp lst)
             (equal (len (encoder_conv_xor lst))
                    (+ 2 (len lst)))))

The encoding process in 1/2 rate convolutional codes adds two extra (nil) bits to the end of the data bits, i.e., the terminating nils. Since this addition of the terminating nils was not explicit in our definition, the above property verifies that two extra bits have been added to our data. We already know that any extra bits at the end of the data sequence lst will be nil because we considered lst to be a true list, i.e., (true-listp lst). The above theorem verifies that the length of the codeword (encoder_conv_xor lst) exceeds the length of the original list of data bits lst by 2, due to the 2 pairs of output bits $(x_{n+1}\ y_{n+1})$ and $(x_{n+2}\ y_{n+2})$.

Theorem 5.2

  (defthm min-hamm-dist-conv
    (implies (and (true-listp A)
                  (true-listp B)
                  (alistp (encoder_conv_FSM A))
                  (alistp (encoder_conv_FSM (cons t B)))
                  (alistp (encoder_conv_FSM (cons nil B))))
             (equal (count_element_mismatch
                      (append (encoder_conv_FSM A)
                              (encoder_conv_FSM (cons t B)))
                      (append (encoder_conv_FSM A)
                              (encoder_conv_FSM (cons nil B))))
                    5)))

As indicated in [17], the minimum Hamming distance between codewords in a 1/2 rate convolutional code is 5. For any data bits (append A '(x) B), where x is a boolean variable, the codewords generated by (append (encoder_conv_FSM A) (encoder_conv_FSM (cons t B))) and (append (encoder_conv_FSM A) (encoder_conv_FSM (cons nil B))) differ at 5 binary locations, which corresponds to a Hamming distance of 5.
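The effect of a single differing data bit can be observed with a small Python sketch. It assumes a hypothetical `conv_encode` that follows Eq. 3 and encodes the whole sequence in one pass with termination at State 00; it is not the `encoder_conv_FSM` of the theorem, and the sample lists `A` and `B` are arbitrary.

```python
def conv_encode(data):
    """Rate-1/2 encoder (Eq. 3), started and terminated in State 00."""
    padded = [0, 0] + list(data) + [0, 0]
    return [(padded[n - 2] ^ padded[n - 1] ^ padded[n], padded[n - 2] ^ padded[n])
            for n in range(2, len(padded))]


def count_mismatch(u, v):
    """Number of differing bits between two lists of (x, y) pairs."""
    return sum(a != b for p, q in zip(u, v) for a, b in zip(p, q))


A, B = [1, 0, 1], [0, 1, 1, 0]
with_t = conv_encode(A + [1] + B)     # differing data bit set
with_nil = conv_encode(A + [0] + B)   # differing data bit cleared
print(count_mismatch(with_t, with_nil))  # 5
```

Since the encoder is linear over modulo-2 addition, the difference between the two codewords is the encoder's response to a single 1, namely the pairs (1 1), (1 0), (1 1), which contain five ones.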

Theorem 5.3

  (defthm decode-decoder
    (implies (alistp code)
             (equal (decode_xor code)
                    (cdr (decoder_conv_xor code)))))

Like encoding, decoding is generally initiated at State 00. If decoding is not initiated at State 00, the first data bit is lost during decoding. This loss of a data bit is indicated in the above theorem by (cdr (decoder_conv_xor code)), which represents that, without the initial nil bits, the decoder returns all data bits except the first, i.e., (car (decoder_conv_xor code)).


Theorem 5.4

  (defthm no_error_conv1
    (implies (and (true-listp data)
                  (boolean-listp data)
                  (> (len data) 3))
             (equal (decode_xor
                      (encode_xor data (cddr data)))
                    (cdddr data)))
    :instructions (:induct (:change-goal nil t)
                   :prove :demote (:dv 1 2 2 1 1)
                   (:rewrite encode-cdr)
                   :up (:rewrite decode-cdr2)
                   :top (:dv 2 2 1)
                   (:rewrite decode-first-element-code2)
                   (:dv 1)
                   (:rewrite lemma-decode-encode-once)
                   :top
                   :prove :prove))

In case of no error, the data bits are correctly decoded from the codeword. This is represented formally by (equal (decode_xor (encode_xor data (cddr data))) (cdddr data)), where (cddr data) is the input data bits, excluding the initial nil bits, while (encode_xor data (cddr data)) is the codeword input to the decoder function decode_xor. We prove that the decoder output is (cdddr data) instead of simply data to eliminate the impact of $d_{n-2}$ and $d_{n-1}$ added to the input before encoding, and of $x_{n-1}$ and $y_{n-1}$ supplied to the codeword before decoding.

The above proof goal makes use of ACL2's proof builder facility along with several previously proved lemmas, i.e., small, intermediate properties, to complete the proof procedure. As mentioned in Section 3, the proof builder commands are supplied to the defthm event using :instructions. :induct applies a suitable induction scheme to the proof goal, :dv dives to a clause/term inside the proof goal, :rewrite replaces the expression with an equivalent form using the indicated (proven) lemma, and :prove calls the ACL2 automatic prover. The lemmas used in the above theorem initially rewrite (encode_xor (cdr data) (cdddr data)), which is generated after the induction step, into a simpler form (cdr (decode_xor (encode_xor data (cddr data)))). Next, the lemma decode-first-element-code2 expands the simplified expression to an equivalent cons-based expression (cons BOOL (cdr (decode_xor (encode_xor data (cddr data))))), where BOOL represents the boolean relation of codeword bits given as follows:

$$
\big((y_0 \oplus y_1 \oplus x_0) \cdot ((y_2 \oplus y_3 \oplus x_2) \oplus (x_4 \oplus y_4))\big) + \big((x_2 \oplus y_2) \cdot \overline{(y_2 \oplus y_3 \oplus x_2) \oplus (x_4 \oplus y_4)}\big)
\tag{6}
$$

Finally, BOOL is rewritten into its correct data-based form (cadddr data) using lemma-decode-encode-once.

Theorem 5.5

  (defthm decode-bitflip-encode-flg
    (implies (and (true-listp data)
                  (boolean-listp data)
                  (booleanp flg)
                  (> (len data) 3)
                  (zp n)
                  flg)
             (equal (decode_xor
                      (bit_flip_pair n flg
                        (encode_xor data (cddr data))))
                    (cons CODE_ERR_X0
                          (cdr (decode_xor
                                 (bit_flip_pair n flg
                                   (encode_xor data (cddr data))))))))
    :instructions ((:use lemma-decode-bitflip-encode-flg)
                   :demote (:dv 1 2)
                   (:rewrite equal-cons-car-cdr)
                   :top
                   :prove :prove))

As a single codeword of a 1/2 rate convolutional code is composed of two bits $x_n$ and $y_n$, an error can occur in either of these two bit locations. The above theorem represents an equivalent form of the change in the codeword due to an error injected via (bit_flip_pair n flg (encode_xor data (cddr data))) at $x_n$ of the 0th location of the codeword. The error in bit $x_0$ is ensured by the premises (zp n) and flg. Error injection rewrites the data bit decoded from the erroneous codeword bit as the boolean expression CODE_ERR_X0, and concatenates it with the remaining data bits using the ACL2 function cons. The expression CODE_ERR_X0 is mathematically represented as:

$$
\big((y_0 \oplus y_1 \oplus \overline{x_0}) \cdot ((y_2 \oplus y_3 \oplus x_2) \oplus (x_4 \oplus y_4))\big) + \big((x_2 \oplus y_2) \cdot \overline{(y_2 \oplus y_3 \oplus x_2) \oplus (x_4 \oplus y_4)}\big)
\tag{7}
$$


Here again, the proof goal makes use of the proof-builder commands and the lemma lemma-decode-bitflip-encode-flg, which expands the car of (decode_xor (bit_flip_pair n flg (encode_xor data (cddr data)))) into the expression CODE_ERR_X0, and equal-cons-car-cdr, which coalesces the car and cdr of the stated cons pair. Similar to the previous theorem, the premise (> (len data) 3) ensures that the number of data bits is greater than 3, discounting the additional bits $d_{n-2}$, $d_{n-1}$ and $(x_{n-1}\ .\ y_{n-1})$ added for the encoding and decoding procedures.

Theorem 5.6

  (defthm SEC-flg
    (implies (and (true-listp data)
                  (boolean-listp data))
             (equal CODE_ERR_X0
                    (fourth data))))

The above theorem formally verifies the error-correction property when an error is injected at the codeword location $x_0$. The expression CODE_ERR_X0 models the error injected at the location $x_0$ of the codeword. The correct data bit, i.e., (fourth data), is successfully retrieved in the presence of a single error at $x_0$. A similar procedure (i.e., Theorems 5.5 and 5.6) can be easily applied to demonstrate single-error correction when the error is injected at the bit $y_0$.

As described by the decoder Equations 4 and 5, the decoding of a single data bit requires five codeword bits. The procedure used for the verification of Theorem 5.6 can be further extended to correct an error occurring in any of these five codeword bits in the decoding process. Hence, this demonstrates the ACL2 formalization of the single error correction property of the 1/2 rate convolutional codes.

7 Case Study: Record-Based Memory

In order to demonstrate the effectiveness of our proposed formalization of ECCs, described in Section 4, we present a formal error analysis on a record-based byte-addressable memory, presented in [36]. We first describe the formal memory model and then utilize our ECC formalization along with the memory model to detect and correct errors that may occur during read and write processes in the memory. The complete formalization framework [57], including its implementation on the memory model, consists of more than 1600 lines of Common Lisp code, including both defun and defthm events, which take approximately 1300 seconds of verification time on a MacBook (1.6 GHz Intel Dual-Core i5 CPU).

7.1 Record-Based Memory Model

Records [22] are data storing structures that can be accessed by the user for reading as well as writing data in memories. A record can hence be considered as a simple abstraction for the memory. Such memory models have two basic operations: load, which reads data from a specific memory location, and store, which writes new data into the memory. Both of these operations are formalized in ACL2 [36] as follows:

7.1.1 Load: Retrieving data from Memory

The load_byte function accepts the memory model and a memory location, and returns the value at the given location in the specified memory. The memory is modeled as an association list with byte-sized values at each list location.

Definition 6a

  (defun load_byte (n mem)
    (if (zp n)
        (car mem)
      (load_byte (- n 1) (cdr mem))))

where n represents the memory location and mem specifies the memory model. load_byte recursively calls itself until the memory location n reduces to zero, i.e., (zp n) becomes true. The required value from the memory is then returned at this point, using (car mem).

7.1.2 Store: Entering data into the Memory

The store_byte function accepts the memory model, the value that needs to be stored in the memory, and the memory location where this value must be entered. The updated memory is then returned at the output.

Definition 6b

  (defun store_byte (n mem byte)
    (if (zp n)
        (cons byte (cdr mem))
      (cons (car mem)
            (store_byte (- n 1) (cdr mem) byte))))

where n represents the memory location, mem specifies the memory, and byte is the value that needs to be stored in the memory at the designated location. Just like load_byte, the recursion is performed on the variable n. Once (zp n) becomes true, the ACL2 built-in function cons places byte at the designated memory location.
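The two recursive definitions can be mirrored in Python as a rough sketch. Python lists stand in for ACL2 lists and `None` stands in for the nil returned by (car nil); both substitutions, and the sample memory contents, are assumptions made here for illustration.

```python
def load_byte(n, mem):
    """Return the value at location n by recursing on n, as in Definition 6a."""
    if n <= 0:
        return mem[0] if mem else None
    return load_byte(n - 1, mem[1:])


def store_byte(n, mem, byte):
    """Return a new memory with byte at location n, as in Definition 6b."""
    if n <= 0:
        return [byte] + mem[1:]
    head = mem[0] if mem else None
    return [head] + store_byte(n - 1, mem[1:], byte)


mem = [[0] * 8, [1] * 8, [0, 1] * 4]
updated = store_byte(1, mem, [1, 0] * 4)
print(load_byte(1, updated))  # [1, 0, 1, 0, 1, 0, 1, 0]
print(load_byte(1, mem))      # [1, 1, 1, 1, 1, 1, 1, 1]
```

As in the ACL2 model, store_byte does not modify its argument; it returns an updated copy, so the original memory is still available after the store.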


7.2 Byte-Addressable Memory

The load and store functions defined above can work with data of any size. A byte-addressable memory is a type of memory where the smallest element that is stored in or retrieved from the memory is 8 bits long. To ensure that our memory model is byte-addressable, a recognizer byte-alistp is defined, which ensures that each value in our memory is byte-sized only.

Definition 6c

  (defun byte-alistp (mem)
    (cond ((atom mem) (equal mem nil))
          (t (and (consp (car mem))
                  (equal (len (car mem)) 8)
                  (byte-alistp (cdr mem))))))

This recursive function returns t if the memory model mem is entirely empty (i.e., (equal mem nil)) or each element of mem has a length of 8 (i.e., each memory location holds one byte of value, as indicated by (equal (len (car mem)) 8)).

7.3 Memory Properties

To ensure that the functions load_byte, store_byte, and byte-alistp fulfill the criteria of a memory model, the following properties are verified:

Theorem 6.1

  (defthm load-store
    (implies (and (< n (len mem))
                  (byte-alistp mem)
                  (equal (len byte) 8))
             (equal (load_byte n (store_byte n mem byte))
                    byte)))

The above theorem formally verifies that data remains unchanged after being stored in and then retrieved from the memory. It states that, given a valid memory location (< n (len mem)) in a byte-addressable memory (byte-alistp mem), a byte of data remains unchanged after consecutive store and load operations, (load_byte n (store_byte n mem byte)).

Theorem 6.2

  (defthm overwrite
    (implies (and (< n (len mem))
                  (byte-alistp mem)
                  (equal (len byte1) 8)
                  (equal (len byte2) 8))
             (equal (store_byte n
                                (store_byte n mem byte1)
                                byte2)
                    (store_byte n mem byte2))))

Overwriting places new data in place of the old data. Hence, storing byte2 at the location where byte1 was already stored, i.e., (store_byte n (store_byte n mem byte1) byte2), is the same as storing byte2 at the memory location directly, i.e., (store_byte n mem byte2). For overwriting in a byte-addressable memory, both byte1 and byte2 must be of equal length, i.e., 8 bits long, as indicated by (equal (len byte1) 8) and (equal (len byte2) 8).

Theorem 6.3

  (defthm store-changes-mem
    (implies (and (< n (len mem))
                  (byte-alistp mem)
                  (equal (len byte) 8)
                  (not (equal byte
                              (load_byte n mem))))
             (not (equal mem
                         (store_byte n mem byte)))))

The given property indicates that if the memory location n did not already hold the value byte, storing byte at the memory location n, as shown by (store_byte n mem byte), will change the contents of the memory.


Theorem 6.4

  (defthm copy-paste-in-mem
    (implies (and (< n1 (len mem))
                  (< n2 (len mem))
                  (not (equal n1 n2))
                  (byte-alistp mem)
                  (equal (len (load_byte n1 mem)) 8))
             (equal (load_byte n2
                               (store_byte n2 mem
                                           (load_byte n1 mem)))
                    (load_byte n1 mem))))

Copying data from the memory location n1 and pasting it at the memory location n2 produces a replica of the information at the two memory locations. This is represented by the above theorem. Both n1 and n2 must be valid memory locations, i.e., (< n1 (len mem)) and (< n2 (len mem)), for the property to hold true. Since store_byte writes one byte of data at a time, an additional constraint that ensures only a single byte of data is copied from location n1 is included in the premises of the theorem as (equal (len (load_byte n1 mem)) 8).
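The four memory properties can be spot-checked on random inputs with the Python sketch below. Such testing only samples the input space and is no substitute for the ACL2 proofs; the slicing-based `load_byte` and `store_byte`, the helper names, and the fixed random seed are all assumptions made for this illustration.

```python
import random


def load_byte(n, mem):
    return mem[n]


def store_byte(n, mem, byte):
    return mem[:n] + [byte] + mem[n + 1:]


def byte_alistp(mem):
    """True when every memory location holds a list of exactly 8 bits."""
    return all(isinstance(b, list) and len(b) == 8 for b in mem)


def properties_hold(mem, n1, n2, byte1, byte2):
    """Check Theorems 6.1 to 6.4 for one concrete choice of inputs."""
    load_store = load_byte(n1, store_byte(n1, mem, byte1)) == byte1
    overwrite = (store_byte(n1, store_byte(n1, mem, byte1), byte2)
                 == store_byte(n1, mem, byte2))
    changes = byte1 == load_byte(n1, mem) or store_byte(n1, mem, byte1) != mem
    copied = load_byte(n1, mem)
    copy_paste = load_byte(n2, store_byte(n2, mem, copied)) == copied
    return load_store and overwrite and changes and copy_paste


rng = random.Random(0)
random_byte = lambda: [rng.randint(0, 1) for _ in range(8)]
mem = [random_byte() for _ in range(16)]
results = [properties_hold(mem, rng.randrange(16), rng.randrange(16),
                           random_byte(), random_byte())
           for _ in range(200)]
print(byte_alistp(mem), all(results))  # True True
```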

7.4 Implementation of Hamming Codes on the Memory Model

The Hamming (7,4) codes, discussed in the previous section, generate an 8-bit codeword for 4 bits of data. Hence, they are well-suited for a byte-addressable memory that stores and retrieves byte-sized codewords. We implemented the Hamming encoder and decoder definitions along with the memory load and store definitions to demonstrate that the operation of Hamming codes is unchanged even when implemented on a memory model. This is indicated by the following properties:

Theorem 7.1

  (defthm mem-noerror-hamm
    (implies (and (< n (len mem))
                  (byte-alistp mem)
                  (equal (len (list e f g h i j k l)) 8)
                  (booleanp a)
                  (booleanp b)
                  (booleanp c)
                  (booleanp d)
                  (equal (hamming7-4-encode a b c d)
                         (list e f g h i j k l)))
             (equal (hamming7-4-decode ...
                      (load_byte n
                        (store_byte n mem
                          (list e f g h i j k l))))
                    (list a b c d))) ... )

The above theorem indicates that, given a byte-addressable memory mem, a valid memory location n, and a boolean codeword (list e f g h i j k l) generated from the data bits (a b c d) by the Hamming encoder (hamming7-4-encode a b c d), the codeword stored in the memory is retrieved in its correct form in case of no error. The retrieved codeword decodes to the correct data bits (list a b c d). The proof procedure makes use of Theorem 6.1 (omitted in the text above for the sake of simplicity) to rewrite all instances of (load_byte n (store_byte n mem (list e f g h i j k l))) as (list e f g h i j k l).

Theorem 7.2

  (defthm mem-single-error-hamm
    (implies (and (< n (len mem))
                  (byte-alistp mem)
                  (booleanp a)
                  (booleanp b)
                  (booleanp c)
                  (booleanp d)
                  (equal (hamming7-4-encode a b c d)
                         (list e f g h i j k l)))
             (...(implies
                   (equal (len (list e f g h i j k (not l)))
                          8)
                   (equal (hamming7-4-decode...
                            (load_byte n
                              (store_byte n mem
                                (list e f g h i j k (not l)))))
                          (list a b c d))))...)

In case of a single-bit error, the codeword retrieved from the memory contains an error, i.e., if (list e f g h i j k l) is the correct codeword, an error may cause (list e f g h i j k (not l)) to be stored in the memory instead. However, the Hamming decoder can extract the correct data bits from the codeword containing a single-bit error.
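The scenario of Theorem 7.2 can be walked through with a short Python sketch: a codeword with one flipped bit is stored, loaded back, and decoded. All names (`hamming74_encode`, `hamming74_correct`, the slicing-based memory functions) and the sample memory are assumptions for illustration; the corrector below handles single errors only and omits double-error detection.

```python
def hamming74_encode(d3, d5, d6, d7):
    """Codeword [p1, p2, d3, p4, d5, d6, d7, p8] following Eq. 1."""
    p1, p2, p4 = d3 ^ d5 ^ d7, d3 ^ d6 ^ d7, d5 ^ d6 ^ d7
    p8 = p1 ^ p2 ^ d3 ^ p4 ^ d5 ^ d6 ^ d7
    return [p1, p2, d3, p4, d5, d6, d7, p8]


def hamming74_correct(c):
    """Correct at most one flipped bit and return the four data bits."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s4      # 0 when the error, if any, is in p8
    fixed = list(c)
    if pos:
        fixed[pos - 1] ^= 1
    return [fixed[2], fixed[4], fixed[5], fixed[6]]


def load_byte(n, mem):
    return mem[n]


def store_byte(n, mem, byte):
    return mem[:n] + [byte] + mem[n + 1:]


mem = [[0] * 8 for _ in range(4)]
data = [1, 0, 1, 1]
codeword = hamming74_encode(*data)
faulty = codeword[:7] + [codeword[7] ^ 1]   # last bit flipped, as in Theorem 7.2
mem = store_byte(2, mem, faulty)
print(hamming74_correct(load_byte(2, mem)) == data)  # True
```

The memory functions return the stored byte unchanged, so the decoder sees exactly the faulty codeword and the correction behaves as it does without the memory.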


7.5 Implementation of Convolutional Codes on the Memory Model

The 1/2 rate convolutional codes, discussed in Section 6.2, can work with chunks of data of an arbitrary number of bits. To use our model of convolutional codes with the byte-addressable memory model, we assume the data to be 4 bits long, hence generating an 8-bit codeword.

Theorem 7.3

  (defthm mem-noerror-conv
    (implies (and (< n (len mem))
                  (byte-alistp mem)
                  (equal (len (cdddr data)) 8)
                  (true-listp data)
                  (boolean-listp data)
                  (> (len data) 3))
             (equal (decode_xor
                      (load_byte n
                        (store_byte n mem
                          (encode_xor data (cddr data)))))
                    (cdddr data))))

In the absence of any error, a codeword formed by the convolutional encoder encode_xor is correctly retrieved from the memory, using Theorem 6.1. Also, the retrieved codeword is decoded into the correct data bit sequence (cdddr data), as dictated by Theorem 5.4.

Our case study verifies that the operation of an ECC is memory/technology independent: the memory model does not affect the inherent behavior of the ECC. The error detection and correction properties of the ECC remain consistent when the ECC is used on a byte-addressable memory model. Moreover, unlike the traditional simulation-based ECC analysis, our proposed approach can be easily extended for the verification of ECCs formalized for much larger memories.

8 Comparison to Existing Works

As briefly highlighted in Section 3, an extension of the SSReflect library based on the Coq theorem prover was proposed to provide a formalization of ECCs for ensuring error-free communication [1–4]. While the approach provided substantial efforts to ensure the "theoretical" reliability of ECCs deployed in noisy communication channels, the framework is difficult to relate to the hardware implementation of ECCs. Hence, it remains inadequate to provide "practical" reliability guarantees for ECCs.

Similarly, the Cotoleta library, formalized in the Lean theorem prover for coding theory, provides a formalization of the Hamming code [26]. However, the scope of this work is focused on providing a purely mathematical implementation, as opposed to the hardware implementation of the block codes.

In contrast, our framework provides a more hardware-realizable approach for ECC formalization. Our libraries formalize ECC encoders and decoders using boolean operators, which can be visualized as gate-level hardware semantics. Hence, it provides a meaningful verification approach for ECCs in practical applications. Unlike the existing works [2, 4], where errors were modeled as probabilistic noise, our framework formalizes errors in terms of a fault injection model, where an error can occur at "any" bit location with a 100% probability. This captures the behavior of soft errors in memories more closely, and aligns with our motivation of verifying ECCs deployed in memories. Moreover, the previous works focused solely on the formalization of block codes [1–4, 26]. Our framework, on the other hand, provides an ECC library catering to both block and convolutional codes.

9 Conclusion and Future Directions

Simulation has been the most widely adopted testing methodology for ensuring the reliability of any given ECC used with memory. Similarly, model checking has also been explored for analyzing the reliability of the data in memories. However, both of them show limitations when analyzing large memory models. On the other hand, our proposed theorem proving-based verification approach provides formally verified ECC properties on information bits/codewords of fixed and arbitrary lengths. We believe that our formal framework provides an essential stepping stone toward the verification of ECCs used in memories.

In this research, we utilize ACL2 to formally analyze the encoding, decoding, error detection and error correction properties of the Hamming and Convolutional codes. We also used a record-based, byte-addressable memory abstraction to establish that our ECC models ensure data correctness even when implemented on a memory.

Hamming and convolutional codes also form the basis of several newer ECCs. For instance, the two-dimensional matrix codes, in simple terms, apply Hamming codes to the rows of the data-bit matrix while its columns use simple parities [6]. Similarly, the widely acknowledged Turbo codes are an extension of concatenated convolutional codes [11]. This provides an opportunity for the improvement and enhancement of our framework. Our existing libraries can be used as the basis for the verification of more advanced ECCs. This will ultimately lead to a more extensive framework for the verification of ECCs, not only for memories but also for the communication systems that widely employ ECCs. Furthermore, the framework can also act as a cog in the wheel for well-rounded memory verification for real-world applications.

References

1. Affeldt R, Garrigue J (2015) Formalization of Error-Correcting codes: from Hamming to modern coding theory. In: Urban C, Zhang X (eds) Proceedings of International conference on interactive theorem proving. Springer, LNCS, vol 9236, pp 17–33

2. Affeldt R, Garrigue J (2015) Formalization of Error-Correcting Codes using SSReflect. MI Lect Note Ser 61:76–78

3. Affeldt R, Garrigue J, Saikawa T (2016) Formalization of Reed-Solomon Codes and progress report on Formalization of LDPC Codes. In: Proceedings of International Symposium on Information Theory and Its Applications. IEEE, pp 532–536

4. Affeldt R, Garrigue J, Saikawa T (2020) A library for formalization of linear Error-Correcting codes. Journal of Automated Reasoning, pp 1–42

5. Arbel E, Koyfman S, Kudva P, Moran S (2014) Automated detection and verification of parity-protected memory elements. In: Proceedings of International Conference on Computer-Aided Design. IEEE, pp 1–8

6. Argyrides C, Pradhan DK, Kocak T (2011) Matrix codes for reliable and cost efficient memory chips. Trans VLSI Syst 19(3):420–428

7. Argyrides CA, Reviriego P, Pradhan DK, Maestro JA (2010) Matrix-Based Codes for adjacent error correction. Trans Nuclear Sci 57(4):2106–2111

8. Baarir S, Braunstein C, Encrenaz E, Ilie JM, Mounier I, Poitrenaud D, Younes S (2011) Feasibility analysis for robustness quantification by symbolic model checking. Formal Methods Syst Des 39(2):165–184

9. Ballarin C, Paulson LC (1999) A Pragmatic Approach to Extending Provers by Computer Algebra – with Applications to Coding Theory. Fund Inf 39(1,2):1–20

10. Baumann R (2005) Soft errors in advanced computer systems. Des Test Comput 22(3):258–266. https://doi.org/10.1109/MDT.2005.69

11. Berrou C, Glavieux A, Thitimajshima P (1993) Near Shannon limit error-correcting coding and decoding: Turbo-codes 1. In: Proceedings of International conference on communications, vol 2. IEEE, pp 1064–1070

12. Binder D, Smith EC, Holman AB (1975) Satellite anomalies from galactic cosmic rays. Trans Nuclear Sci 22(6):2675–2680. https://doi.org/10.1109/TNS.1975.4328188

13. Bollig B, Wegener I (1996) Improving the variable ordering of OBDDs is NP-complete. Trans Comput 45(9):993–1002

14. Boyer RS, Goldschlag DM, Kaufmann M, Moore JS (1991) Functional instantiation in First-Order logic. Academic Press, pp 7–26

15. Burlyaev D, Fradet P, Girault A (2014) Verification-guided voter minimization in triple-modular redundant circuits. In: Proceedings of Design, Automation & Test in Europe Conference & Exhibition. IEEE, pp 1–6

16. Cabodi G, Murciano M (2006) BDD-based Hardware Verification. In: Proceedings of International conference on formal methods for the design of computer, Communication, and Software Systems. Springer, pp 78–107

17. Can B, Yomo H, De Carvalho E (2006) Hybrid Forwarding Scheme for Cooperative Relaying in OFDM based Networks. In: Proceedings of International conference on communications, vol 10. IEEE, pp 4520–4525

18. Clarke EM (1997) Model checking. In: Proceedings of International Conference on Foundations of Software Technology and Theoretical Computer Science. Springer, pp 54–56

19. Clarke EM, Wing JM (1996) Formal methods: State of the art and future directions. Comput Surv 28(4):626–643

20. Cota E, Lima F, Rezgui S, Carro L, Velazco R, Lubaszewski M, Reis R (2001) Synthesis of an 8051-Like Micro-Controller tolerant to transient faults. J Electron Test 17(2):149–161

21. Das A, Touba NA (2018) Low Complexity Burst Error Correcting Codes to Correct MBUs in SRAMs. In: Proceedings of Great Lakes Symposium on VLSI. ACM, pp 219–224

22. Davis J (2006) Memories: Array-like records for ACL2. In: Proceedings of International workshop on the ACL2 theorem prover and its applications. ACM, pp 57–60

23. Fey G, Sulflow A, Frehse S, Drechsler R (2011) Effective robustness analysis using bounded model checking techniques. Trans Comput-Aided Des Integr Circ Syst 30(8):1239–1252

24. Frehse S, Fey G, Sulflow A, Drechsler R (2009) Robustness Check for Multiple Faults using Formal Techniques. In: Proceedings of Euromicro conference on digital system design, architectures, Methods and Tools. IEEE, pp 85–90

25. Frigerio L, Radaelli MA, Salice F (2008) Convolutional Coding for SEU mitigation. In: Proceedings of European Test Symposium. IEEE, pp 191–196

26. Hagiwara M, Nakano K, Kong J (2016) Formalization of Coding Theory using Lean. In: Proceedings of International Symposium on Information Theory and Its Applications. IEEE, pp 522–526

27. Hamming RW (1950) Error detecting and error correcting codes. Bell Syst Techn J 29(2):147–160

28. Han H, Touba NA, Yang JS (2017) Exploiting unused spare columns and replaced columns to enhance memory ECC. Trans Comput-Aided Des Integr Circ Syst 36(9):1580–1591

29. Hasan O, Tahar S (2015) Formal verification methods. In: Encyclopedia of information science and technology, 3 edn. IGI Global, pp 7162–7170

30. Hentschke R, Marques F, Lima F, Carro L, Susin A, Reis R (2002) Analyzing area and performance penalty of protecting different digital modules with hamming code and triple modular redundancy. In: Proceedings of Integrated Circuits and Systems Design. IEEE, pp 95–100

31. Holler A, Kajtazovic N, Preschern C, Kreiner C (2014) Formal fault tolerance analysis of algorithms for redundant systems in early design stages. In: Proceedings of Software Engineering for Resilient Systems. Springer, pp 71–85

32. Hsiao MY (1970) A class of optimal minimum Odd-Weight-Column SEC-DED codes. J Res Dev 14(4):395–401

33. Hussein J, Swift G (2015) Mitigating Single-Event Upsets. Xilinx White Paper (WP395) (v1.1). Available at: https://www.xilinx.com/support/documentation/white_papers/wp395-Mitigating-SEUs.pdf

34. Isabelle Theorem Prover (2020) Available at: https://isabelle.in.tum.de/

35. Jiang JHR, Lee CC, Mishchenko A, Huang CY (2010) To SAT or not to SAT: Scalable exploration of functional dependency. Trans Comput 59(4):457–467

36. Kaufmann M, Sumners R (2002) Efficient Rewriting of Operations on Finite Structures in ACL2

37. Kaufmann M, Moore JS, Manolios P (2000) Computer-Aided Reasoning: An Approach. Kluwer Academic Publishers

38. Kaufmann M, Moore JS, Ray S, Reeber E (2009) Integrating external deduction tools with ACL2. J Appl Log 7(1):3–25


39. Kchaou A, Youssef WEH, Tourki R, Bouesse F, Ramos P, Velazco R (2016) A deep analysis of SEU consequences in the internal memory of LEON3 processor. In: Proceedings of Latin-American Test Symposium. IEEE, pp 178–178. https://doi.org/10.1109/LATW.2016.7483358

40. Kong J, Webb DJ, Hagiwara M (2018) Formalization of insertion/deletion codes and the Levenshtein metric in lean. In: Proceedings of Information Theory and Its Applications. IEEE, pp 11–15

41. Krautz U, Pflanz M, Jacobi C, Tast HW, Weber K, Vierhaus HT (2006) Evaluating Coverage of Error Detection Logic for Soft Errors using Formal Methods. In: Proceedings of Design automation & test in europe conference, vol 1. IEEE, pp 1–6. https://doi.org/10.1109/DATE.2006.244062

42. Lean Theorem Prover (2020) Available at: https://leanprover.github.io/

43. Leveugle R (2005) A new approach for early dependability evaluation based on formal property checking and controlled mutations. In: Proceedings of International On-Line Testing Symposium. IEEE, pp 260–265

44. Lin S, Costello DJ (1983) Coding for reliable digital transmission and storage. Prentice-Hall, chap 1, pp 1–14

45. Lin S, Costello DJ (1983b) Error Control Coding: Fundamentals and Applications. Pearson-Prentice Hall

46. Lvov A, Lastras-Montano LA, Paruthi V, Shadowen R, El-zein A (2012) Formal verification of error correcting circuits using computational algebraic geometry. In: Proceedings of Formal Methods in Computer-Aided Design. IEEE, pp 141–148

47. May TC, Woods MH (1979) Alpha-Particle-Induced Soft errors in dynamic memories. Trans Electron Dev 26(1):2–9. https://doi.org/10.1109/T-ED.1979.19370

48. Nicolaidis M (1999) Time redundancy based soft-error tolerance to rescue nanometer technologies. In: Proceedings of VLSI Test Symposium. IEEE, pp 86–94

49. Pierre L, Clavel R, Leveugle R (2009) ACL2 for the Verification of Fault-tolerance Properties: First Results. In: Proceedings of International Workshop on the ACL2 Theorem Prover and Its Applications. ACM, pp 90–99. https://doi.org/10.1145/1637837.1637852

50. Radke WH (2011) Fault-tolerant non-volatile integrated circuit memory. US Patent 8,046,542

51. Rastogi A, Agarawal M, Gupta B (2009) SEU Mitigation-using 1/3 Rate Convolution Coding. In: Proceedings of International Conference on Computer Science and Information Technology. IEEE, pp 180–183

52. Sanchez-Macian A, Reviriego P, Maestro JA (2012) Enhanced detection of double and triple adjacent errors in hamming codes through selective bit placement. Trans Device Mater Reliab 12(2):357–362

53. Sanchez-Macian A, Reviriego P, Maestro JA (2014) Hamming SEC-DAED and extended hamming SEC-DED-TAED codes through selective shortening and bit placement. Trans Device Mater Reliab 14(1):574–576

54. Sanchez-Macian A, Reviriego P, Maestro JA (2016) Combined SEU and SEFI Protection for Memories using Orthogonal Latin Square Codes. Trans Circ Syst I: Reg Papers 63(11):1933–1943

55. Seshia SA, Li W, Mitra S (2007) Verification-guided Soft Error Resilience. In: Proceedings of Design, Automation & Test in Europe Conference & Exhibition. IEEE, pp 1–6

56. Slayman CW (2005) Cache and memory error detection, correction, and reduction techniques for terrestrial servers and workstations. Trans Device Mater Reliab 5(3):397–404. https://doi.org/10.1109/TDMR.2005.856487

57. Verifi-ECC (2020) Available at: https://github.com/Mahum123/Verifi-ECC.git

58. Viterbi A (1971) Convolutional Codes and their Performance in Communication Systems. Trans Commun Technol 19(5):751–772

59. Zhang P, Muccini H, Li B (2010) A classification and comparison of model checking software architecture techniques. J Syst Softw 83(5):723–744

60. Ziegler JF, Curtis HW, Muhlfeld HP, Montrose CJ, Chin B, Nicewicz M, Russell CA, Wang WY, Freeman LB, Hosier P, LaFave LE, Walsh JL, Orro JM, Unger GJ, Ross JM, O’Gorman TJ, Messina B, Sullivan TD, Sykes AJ, Yourke H, Enger TA, Tolat V, Scott TS, Taber AH, Sussman RJ, Klein WA, Wahaus CW (1996) IBM Experiments in soft fails in computer electronics (1978–1994). IBM J Res Dev 40(1):3–18. https://doi.org/10.1147/rd.401.0003

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Mahum Naseer received her B.E. degree in Electronics Engineering from NED University of Engineering and Technology, and her M.S. degree in Electrical Engineering from National University of Sciences and Technology (NUST), Pakistan, in 2016 and 2018, respectively. Her current research interests include reliability analysis of systems, error control coding, resilient systems, and formal methods for system verification.

Waqar Ahmad received his Ph.D. and M. Phil degrees from National University of Sciences and Technology (NUST), Islamabad, Pakistan and Quaid-i-Azam University, Islamabad, Pakistan in 2012 and 2017, respectively. He worked as a postdoctoral fellow at the Hardware Verification Group (HVG) of Concordia University, Montreal, Canada for two years from 2018 to 2019. He also volunteers, on a part-time basis, as a research associate with HVG at Concordia University and the System Analysis and Verification (SAVe) Lab at NUST. His areas of interest include formal reasoning and dependability analysis of safety-critical systems. He has published more than 20 research papers. He won the young researcher award from the Heidelberg Laureate Forum (HLF-18), Germany, the best researcher awards from SAVe Lab in 2015 and 2016, and the best paper award at WCE-11, London, UK. He is also a member of IEEE Young Professionals.


Osman Hasan received his BEng (Hons) degree from the University of Engineering and Technology, Peshawar, Pakistan in 1997, and the MEng and PhD degrees from Concordia University, Montreal, Quebec, Canada in 2001 and 2008, respectively. Before his PhD, he worked as an ASIC Design Engineer from 2001 to 2004 at LSI Logic. He worked as a postdoctoral fellow at the Hardware Verification Group (HVG) of Concordia University for one year until August 2009. Currently, he is an Associate Professor and the Head of Department of Electrical Engineering at the School of Electrical Engineering and Computer Science of National University of Science and Technology (NUST), Islamabad, Pakistan. He is the founder and director of the System Analysis and Verification (SAVe) Lab at NUST, which mainly focuses on the design and formal verification of energy, embedded and e-health related systems. He has received several awards and distinctions, including Pakistan’s Higher Education Commission’s Best University Teacher (2010) and Best Young Researcher Award (2011), and the President’s gold medal for the best teacher of the University from NUST in 2015. Dr. Hasan is a senior member of IEEE and a member of the ACM, the Association for Automated Reasoning (AAR) and the Pakistan Engineering Council.
