Synthesis of Speed Independent Circuits Based on Decomposition

Tomohiro Yoneda (*)
National Institute of Informatics
[email protected]

Hiroomi Onda
Tokyo Institute of Technology
[email protected]

Chris Myers (**)
University of Utah
[email protected]

(*) This research is supported by JSPS Joint Research Projects.
(**) This research is supported by NSF Japan Program award INT-0087281 and SRC grant 2002-TJ-1024.

Abstract

This paper presents a decomposition method for speed-independent circuit design that is capable of significantly reducing the cost of synthesis. In particular, this method synthesizes each output individually. It begins by contracting the STG to include only transitions on the output of interest and its trigger signals. Next, the reachable state space for this contracted STG is analyzed to determine a minimal number of additional signals which must be reintroduced into the STG to obtain CSC. The circuit for this output is then synthesized from this STG. Results show that the quality of the circuit implementation is nearly as good as the one found from the full reachable state space, but it can be applied to find circuits for which full state space methods cannot be successfully applied. The proposed method has been implemented as a part of our tool nutas (Nii-Utah Timed Asynchronous circuit Synthesis system), and its very first version is available at http://research.nii.ac.jp/~yoneda.

Key Words: Decomposition, synthesis, STGs, abstraction, speed-independent circuits.

1. Introduction

Logic synthesis [1, 2, 3] from low level specification languages is one of the major approaches to the automated synthesis of asynchronous circuits. This approach can potentially synthesize more optimized circuits with higher performance than other methods such as the syntax directed translation method [4, 5, 6, 7, 8, 9]. It, however, usually requires an enumeration of the state space of the given specification, and it often suffers from the state explosion problem. Thus, large specifications expressed in hardware description languages have usually been synthesized by syntax directed translation methods or similar techniques that do not require state space enumeration, sometimes with local optimization techniques such as [10]. This paper tackles the challenge of using logic synthesis also for large specifications derived from hardware description languages, as it has the potential to provide further global optimization in the future through timed circuit synthesis [11]. In this approach, a specification written in some high-level language is first translated to a signal transition graph (STG), and then logic synthesis is applied to this STG. This method requires a compiler to generate STGs with the complete state coding (CSC) property, and an efficient logic synthesis method. A preliminary tool for the former is described in [12], and an improved version is described in [13]. Guaranteeing CSC by such a correct-by-construction method, which may not give optimal solutions in the number of inserted state variables, is practical for large STGs, because automatic CSC solvers sometimes do not handle such STGs well. Note that only a small number of inserted state variables are actually used to implement each output, and so the delays of the circuits are not significantly affected even in a non-optimal solution. This paper handles the latter issue, and aims at reducing the average cost for logic synthesis from STGs by decomposing a specification and running the logic synthesis procedure for each small sub-specification.

The idea for decomposition based synthesis was first proposed by Chu [14]. In his work, one primary output is picked up, and the given STG is modified by replacing each transition for the signal that does not affect the output by a dummy transition. Then, the modified STG is reduced by eliminating selected dummy transitions while preserving the behavior. A correct circuit can be synthesized from this reduced STG with usually much smaller cost. This work, however, had two open problems. First, the reduction of STGs, called contraction, was not formalized. For a simple STG such as a marked graph, its contraction is easy. But, in the general case, the formalized algorithm was unknown at that time. Second, it was not straightforward to decide if a signal actually affects the output signal or not, and no algorithm to make this decision is given in his thesis. As for the first problem, Vogler and Wollowski recently formalized the contraction algorithm using a bisimulation relation in [15], and Zheng and Myers developed a timed contraction algorithm in [16]. On the other hand, Puri and Gu tried to solve the second problem in [17]. Their algorithm greedily removes an irrelevant signal (with respect to the output signal) such that the number of CSC conflicts does not increase by hiding the signal. This algorithm is, however, not


so helpful for our purpose, because it needs the state graph of the original STG, which cannot be constructed due to state explosion for very large STGs. Beister, Eckstein, and Wollowski proposed a similar decomposition based method for extended-burst-mode machines [18].

The main contribution of this work is to propose a new algorithm to find a sufficient set of input signals for a given output for the decomposition based synthesis approach. The algorithm starts with a small set of signals which are certainly needed for the output signal, and uses only the state graphs of the contracted STGs for determining other necessary input signals. Since the state graphs of the contracted STGs are usually very small, it does not suffer from the state explosion problem. Furthermore, its decision procedure computes candidates of the necessary signals in many cases more directly than the greedy algorithm in [17], although some cases need heuristics.

The proposed algorithm, however, has the following restrictions on the class of STGs to be handled. First, the given STG must be 1-safe and output semi-modular, where intuitively, the number of tokens in each place must not exceed one in a 1-safe STG, and output transitions are not disabled by or do not disable any other transitions in an output semi-modular STG. Note that these are required in almost all logic synthesis algorithms. Second, the given STG must have CSC. This is not so restrictive for our purpose, because our compiler from a high-level specification language guarantees it as mentioned before. Finally, the given STG must satisfy the following two properties: (1) the guided simulation with respect to each output signal terminates, and (2) for every two reachable markings of the STG, either one is reachable from the other. These requirements come from the analysis method of the CSC violation traces, and are explained in Section 4. We believe that many specifications to which logic synthesis is applied satisfy these requirements. At least, every benchmark circuit specification shown in Section 5 satisfies them.

The rest of this paper is organized as follows. Section 2 shows the basic theory of our decomposition based synthesis, where several notations are based on [15] and [19]. Section 3 describes the overview of the proposed method, and Section 4 explains in detail how the input sets are determined, which is the main issue of this paper. Several experimental results are shown in Section 5, and Section 6 gives our conclusion.

2. Basic theory

An STG $G = (P, T, F, \mu_0, \ell, \mathit{In}, \mathit{Out})$ is a labeled net, where $P$ is a set of places, $T$ is a set of transitions ($P \cap T = \emptyset$), $F \subseteq (P \times T) \cup (T \times P)$ is the flow relation, $\mu_0$ is the initial marking, $\ell : T \to (\mathit{In} \cup \mathit{Out}) \times \{+, -\} \cup \{\lambda\}$ is the labeling function, and $\mathit{In}$ and $\mathit{Out}$ are the input and output signal sets. Let $\mathit{sig}(G)$ denote $\mathit{In} \cup \mathit{Out}$. A transition $t$ with $\ell(t) = \lambda$ is called dummy. For $x \in \mathit{sig}(G)$, an $x$-transition denotes a transition $t$ with $\ell(t) = x+$ or $x-$. For any transition $t$, $\bullet t = \{p \in P \mid (p, t) \in F\}$ and $t\bullet = \{p \in P \mid (t, p) \in F\}$ denote the source places and the destination places of $t$, respectively. Transitions $t$ and $t'$ such that $\bullet t \cap \bullet t' \neq \emptyset$ are said to be in conflict. Note that when STGs $G$, $G_1$, etc. are considered, their corresponding components $P$, $T$, etc., $P_1$, $T_1$, etc. are implicitly considered.

A marking $\mu$ of $G$ is any subset of $P$. A transition $t$ is enabled in a marking $\mu$ if $\bullet t \subseteq \mu$ (all its source places have tokens in $\mu$); otherwise, it is disabled. If a transition $t$ is enabled in $\mu$, it can fire, and a new marking $\mu' = (\mu - \bullet t) \cup t\bullet$ is obtained, denoted by $\mu \xrightarrow{t} \mu'$. For a sequence $\sigma = t_1 t_2 \cdots$ of transitions, $\mu \xrightarrow{\sigma} \mu'$ is defined similarly ($\mu$ is equal to $\mu'$ for an empty $\sigma$). $\sigma$ is called a trace, if there exists $\mu'$ such that $\mu_0 \xrightarrow{\sigma} \mu'$. Let $\mathit{trace}(G)$ denote the set of all traces of $G$. A trace may contain multiple occurrences of the same transition. In this paper, it is assumed that those occurrences of the same transition are distinguished by some appropriate way, such as by attaching firing counts.
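The enabling and firing rules above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: markings are modeled as frozen sets of place names (which matches the 1-safe setting), and the class name `Net` and its methods are hypothetical names introduced here, not part of any tool mentioned in the paper.

```python
class Net:
    """A 1-safe net given by the source and destination places of each transition."""

    def __init__(self, pre, post):
        # pre[t]: source places of t; post[t]: destination places of t.
        self.pre = {t: frozenset(ps) for t, ps in pre.items()}
        self.post = {t: frozenset(ps) for t, ps in post.items()}

    def enabled(self, marking):
        """Transitions whose source places all hold a token in the marking."""
        return {t for t, src in self.pre.items() if src <= marking}

    def fire(self, marking, t):
        """New marking: remove tokens from source places, add to destinations."""
        if not self.pre[t] <= marking:
            raise ValueError(f"{t} is not enabled")
        return (marking - self.pre[t]) | self.post[t]

    def is_trace(self, initial, sequence):
        """True if the sequence can fire from the initial marking."""
        marking = initial
        for t in sequence:
            if not self.pre[t] <= marking:
                return False
            marking = self.fire(marking, t)
        return True
```

For a two-place ring with `t1` moving the token from `p0` to `p1` and `t2` moving it back, `is_trace` accepts `t1 t2 t1` and rejects a sequence starting with `t2`.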

Each marking has a state vector, which represents the values of signals in $\mathit{In} \cup \mathit{Out}$. Different markings may have the same state vector. In this paper, a state implies a state vector or a set of markings with the same state vector. It is sometimes convenient to annotate to a state the information whether the outputs are excited to rise or fall. For this purpose, R or F is used in addition to 0 or 1 in state vectors. R represents the binary value of 0, but it implies that the output is excited to rise. F indicates the signal is 1, but it is excited to fall. When these two notations with and without R/F should be distinguished, we call the former decorated states, and the latter nondecorated states. For example, suppose that two markings $\mu$ and $\mu'$ have decorated states (1010) and (101R). They have the common nondecorated state (1010), but the behavior of the output is different in those markings. This situation is called a CSC violation, and these two markings are a CSC violation pair. If an STG has a CSC violation pair, we say that the STG does not have CSC. Otherwise, it has CSC. If an STG does not have CSC, a circuit cannot be synthesized from the STG.
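As a minimal sketch of this definition, the following Python fragment finds CSC violation pairs among markings whose decorated states are given as strings. It assumes that only output positions carry R or F, so that two markings with the same nondecorated state but different decorated states differ in the excitation of some output. The function names are hypothetical.

```python
from itertools import combinations


def nondecorated(state):
    """Drop the excitation information: R stands for 0, F stands for 1."""
    return state.replace("R", "0").replace("F", "1")


def csc_violation_pairs(decorated):
    """Pairs of markings with equal nondecorated but different decorated states.

    `decorated` maps a marking name to its decorated state string.
    """
    items = sorted(decorated.items())
    return [
        (m1, m2)
        for (m1, s1), (m2, s2) in combinations(items, 2)
        if s1 != s2 and nondecorated(s1) == nondecorated(s2)
    ]
```

With the decorated states (1010) and (101R) from the example above, the two markings are reported as one violation pair.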

The property called output semi-modularity is also necessary to synthesize a circuit from an STG. Formally, this property is violated, if and only if there are two chains $\sigma_1$ and $\sigma_2$ of dummy transitions such that their first transitions are in conflict, and that either of the non-dummy transitions that follow $\sigma_1$ or $\sigma_2$ is related to an output signal. The most simple case is that $\sigma_1$ and $\sigma_2$ are empty, and so an output transition disables or is disabled by an input or output transition.

For $o \in \mathit{Out}$, $ES(o+)$ denotes a set of reachable nondecorated states where $o$ can rise, and $QS(o+)$ is a set of reachable nondecorated states where $o$ is stable high. $ES(o-)$ and $QS(o-)$ are defined similarly. The other states are unreachable, and this set is denoted by $UR$. From the definition of CSC, if and only if an STG has CSC, its $ES(o+)$, $QS(o+)$, $ES(o-)$, and $QS(o-)$ sets are disjoint for each $o \in \mathit{Out}$.


In this paper, the implementation technologies considered are atomic gates and generalized-C (gC) elements. A circuit for an atomic gate implementation for each $o \in \mathit{Out}$ is defined by a cover $C(o)$, which is a set of states where the logic function of the circuit produces 1. The cover is correct with respect to $G$, if it satisfies

$$C(o) - UR = ES(o+) \cup QS(o+).$$

The gC implementation needs two covers $C(o+)$ and $C(o-)$, and they are correct with respect to $G$, if they satisfy

$$ES(o+) \subseteq C(o+) - UR \subseteq ES(o+) \cup QS(o+),$$
$$ES(o-) \subseteq C(o-) - UR \subseteq ES(o-) \cup QS(o-).$$

If the covers are correct with respect to $G$, then the corresponding circuit is also correct with respect to $G$.

For a nondecorated state $s$ and a set $D$ of signals, the $D$-closure of $s$, denoted by $cl_D(s)$, is a set of all nondecorated states, including $s$, such that their state vectors are the same if the signals in $D$ are projected out. The core of a $D$-closure is the common state vector obtained by projecting out the signals in $D$. For example, if $D$ consists of two signals, $cl_D(s)$ contains the four state vectors that agree with $s$ on every signal outside $D$, and its core is $s$ with the two positions of $D$ removed. The mappings from a $D$-closure $cl_D(s)$ to its core $s'$ and its inverse are defined by $\mathit{core}_D(cl_D(s))$ and $\mathit{core}_D^{-1}(s')$. Note that both are one-to-one mappings. The $D$-closure and these mappings are extended to sets as follows: $cl_D(S) = \bigcup_{s \in S} cl_D(s)$, $\mathit{core}_D(cl_D(S)) = \{\mathit{core}_D(cl_D(s)) \mid s \in S\}$, and $\mathit{core}_D^{-1}(S') = \bigcup_{s' \in S'} \mathit{core}_D^{-1}(s')$.
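The closure and core constructions can be sketched in Python as follows. This is an illustrative sketch only: state vectors are modeled as tuples of 0/1 values in the order of a signal list, and the function names `closure` and `core` as well as the signal names in the usage note are assumptions made for this example.

```python
from itertools import product


def core(state, signals, hidden):
    """Project the hidden signals out of a state vector."""
    return tuple(v for sig, v in zip(signals, state) if sig not in hidden)


def closure(state, signals, hidden):
    """All state vectors that agree with `state` outside the hidden signals."""
    free = [i for i, sig in enumerate(signals) if sig in hidden]
    result = set()
    for values in product((0, 1), repeat=len(free)):
        vec = list(state)
        for i, v in zip(free, values):
            vec[i] = v
        result.add(tuple(vec))
    return result
```

For signals `("a", "b", "c", "d")`, the state `(1, 0, 0, 1)`, and hidden signals `{"b", "c"}`, the closure has four members and every member has the core `(1, 1)`.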

For an STG $G$ and $o \in \mathit{Out}$, a set $D$ of signals is an irrelevant input set for $o$, if

1. $D \subseteq \mathit{In} \cup \mathit{Out} - \{o\}$,

2. $cl_D(ES(o+)) - UR \subseteq ES(o+)$, and

3. $cl_D(ES(o-)) - UR \subseteq ES(o-)$.

From this definition, the following lemma holds.

Lemma 1 For an STG $G$ with CSC and an irrelevant input set $D$ for its output $o$, $cl_D(ES(o+))$, $cl_D(QS(o+))$, $cl_D(ES(o-))$, and $cl_D(QS(o-))$ are disjoint.

The proof is shown in the appendix. From this lemma and $cl_D(S) \supseteq S$ for any set $S$, $QS(o+)$ and $QS(o-)$ satisfy

1. $cl_D(QS(o+)) - UR \subseteq QS(o+)$, and

2. $cl_D(QS(o-)) - UR \subseteq QS(o-)$,

when $G$ has CSC and $D$ is an irrelevant input set.
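The three conditions that define an irrelevant input set can be checked directly when the excitation sets and the reachable states are available explicitly. The sketch below assumes exactly that (which is only realistic for small examples, since the point of the paper is to avoid building the full state graph); state vectors are 0/1 tuples, removing the unreachable states is done by intersecting with the reachable set, and all function and parameter names are hypothetical.

```python
from itertools import product


def closure(state, signals, hidden):
    """All state vectors that agree with `state` outside the hidden signals."""
    free = [i for i, sig in enumerate(signals) if sig in hidden]
    result = set()
    for values in product((0, 1), repeat=len(free)):
        vec = list(state)
        for i, v in zip(free, values):
            vec[i] = v
        result.add(tuple(vec))
    return result


def is_irrelevant_input_set(hidden, output, signals, es_rise, es_fall, reachable):
    """Check conditions 1 to 3 for the candidate set `hidden` and one output."""
    hidden = set(hidden)
    # Condition 1: hidden signals are signals of the STG other than the output.
    if output in hidden or not hidden <= set(signals):
        return False
    # Conditions 2 and 3: closing an excitation set over the hidden signals
    # must not reach any reachable state outside that excitation set.
    for es in (es_rise, es_fall):
        closed = set()
        for s in es:
            closed |= closure(s, signals, hidden)
        if not (closed & reachable) <= es:
            return False
    return True
```

In a toy example where output `o` simply follows input `a` while `b` toggles independently, hiding `b` passes the check and hiding `a` fails it.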

For an STG $G$ and a set $D$ of signals, let $G_D$ denote an STG obtained from $G$ by making transitions related to signals in $D$ dummy. Then, the following lemma holds.

Lemma 2 Suppose that an STG $G$ has CSC and is output semi-modular. For any $o \in \mathit{Out}$ and any irrelevant input set $D$ for $o$, a speed-independent circuit for $o$ synthesized from $G_D$ is correct with respect to $G$.

Intuitively, this can be explained as follows. From the above properties for $cl_D(ES(o+))$, $cl_D(ES(o-))$, $cl_D(QS(o+))$, and $cl_D(QS(o-))$, even if the values of signals in $D$ are changed in a state, the resulting state falls in the same state set (i.e., $ES(o+)$, $ES(o-)$, $QS(o+)$, or $QS(o-)$) as the original state, if it is reachable. Hence, the behavior of an STG is not affected by projecting out the signals in $D$. A more formal proof is shown in the appendix.

On the other hand, for a non-irrelevant input set $D$, $G_D$ no longer has CSC as shown below.

Lemma 3 Suppose that an STG $G$ has CSC, and for $o \in \mathit{Out}$, a set $D$ of signals with $D \subseteq \mathit{In} \cup \mathit{Out} - \{o\}$ is not an irrelevant input set for $o$. Then, $G_D$ does not have CSC with respect to $o$.

(Proof) Since $D$ is not an irrelevant input set, there exist either states $s \in ES(o+)$ and $s' \in cl_D(s) - UR$ such that $s' \notin ES(o+)$, or states $s \in ES(o-)$ and $s' \in cl_D(s) - UR$ such that $s' \notin ES(o-)$. In the former case, $s'$ must be in $QS(o+) \cup ES(o-) \cup QS(o-)$. But, the value of $o$ in $s$ is 0, and so is the value of $o$ in $s'$ from $o \notin D$. Thus, $s' \in QS(o-)$ holds. $s$ and $s'$ are mapped to the same state in $G_D$. Hence, $s \in ES(o+)$ and $s' \in QS(o-)$ imply that $G_D$ has a CSC violation with respect to $o$. A similar discussion holds for the latter case. (Q.E.D.)

For STGs $G_1$ and $G_2$, a simulation from $G_1$ to $G_2$ is a relation $\mathcal{R}$ between markings of $G_1$ and $G_2$ satisfying

• $(\mu_{0,1}, \mu_{0,2}) \in \mathcal{R}$, and

• for all $(\mu_1, \mu_2) \in \mathcal{R}$ and all $\mu_1 \xrightarrow{\sigma} \mu_1'$ with $\sigma = t_1 t_2 \cdots t_n$ $(n \geq 0)$, there exist some firing sequence $\sigma' = t_1' t_2' \cdots t_m'$ $(m \geq 0)$ and $\mu_2'$ such that $\mu_2 \xrightarrow{\sigma'} \mu_2'$, $\ell_1(\sigma) = \ell_2(\sigma')$ and $(\mu_1', \mu_2') \in \mathcal{R}$, where for $u = u_1 u_2 \cdots u_k$ $(k \geq 0)$, $\ell(u)$ is obtained from $\ell(u_1)\ell(u_2)\cdots\ell(u_k)$ by deleting $\lambda$.

If $\mathcal{B}$ is a simulation from $G_1$ to $G_2$, and $\mathcal{B}^{-1}$ is a simulation from $G_2$ to $G_1$, then $\mathcal{B}$ is a bisimulation between $G_1$ and $G_2$. Let $G_1 \preceq G_2$ and $G_1 \approx G_2$ denote that there exist a simulation from $G_1$ to $G_2$ and a bisimulation between $G_1$ and $G_2$, respectively.

For an STG $G$, $o \in \mathit{Out}$, and $V \subseteq \mathit{sig}(G)$ such that $o \in V$, $\mathit{abs}(G, V, o)$ is any STG with the input signal set $V - \{o\}$ and the output signal set $\{o\}$ such that $\mathit{abs}(G, V, o) \approx G_D$ with $D = \mathit{sig}(G) - V$. $\mathit{abs}(G, V, o)$ can be obtained by the net contraction algorithm, and it can usually be constructed such that its state space is much smaller than that of $G$. The details can be found, for instance, in [15].

Theorem 1 Suppose that an STG $G$ has CSC and is output semi-modular. For $o \in \mathit{Out}$ and some $V \subseteq \mathit{sig}(G)$ with $o \in V$, if $\mathit{abs}(G, V, o)$ has CSC, then a speed-independent circuit for $o$ synthesized from $\mathit{abs}(G, V, o)$ is correct with respect to $G$.

(Proof) If $D = \mathit{sig}(G) - V$ is not an irrelevant input set for $o$, then from Lemma 3, $G_D$ does not have CSC with respect to $o$. This implies that $\mathit{abs}(G, V, o)$ does not have CSC either, because $\mathit{abs}(G, V, o) \approx G_D$. Hence, $D$ is an irrelevant input set for $o$. From Lemma 2, a correct speed-independent circuit for $o$ is synthesized from $G_D$. Again, from the bisimilarity between $G_D$ and $\mathit{abs}(G, V, o)$, $\mathit{abs}(G, V, o)$ produces the same circuit as the one obtained from $G_D$. (Q.E.D.)

    decomposition_based_synthesis(G) {
        forall o ∈ Out {
            G_abs = obtain_synthesizable_abs(G, o)
            if (G_abs == "impossible") then abort
            C_o = logic_synthesis(G_abs)
        }
    }

Figure 1. Top-level algorithm for synthesis.

From this theorem,if an input set ¼ suchthat o�¾:<j� �e¼z vJ� hasCSCis determined,a correctspeed-independentcircuit for v canbe synthesizedefficiently. The main con-tribution of this work is to develop its decisionprocedurewithoutconstructingthestategraphof � .

3. Decompositionbasedsynthesisoverview

Thetop level algorithmfor theproposeddecompositionbasedsynthesisis shown in Figure1. It tries to computeasynthesizableabstraction�*ËgÌ � for eachoutputsignal v of� . This is actually o�¾%<j��e�¼�Bv^� thathasCSCfor some¼ . Ifit is impossible,thenit is provenin thetheoremshown laterthat � doesnothaveCSC,andsothealgorithmaborts.Oth-erwise,anordinaryspeed-independentlogic synthesistoolsuchasPetrify or ATACSis appliedto � Ë6Ì � to synthesizeacircuit for v .

The algorithm for obtaining a synthesizable abstraction is shown in Figure 2. It first constructs the initial input set for $o$ by taking the signals that make $o$ enabled, called trigger signals for $o$ [footnote 1]. This is because trigger signals belong to no irrelevant input sets as shown in the proof of Lemma 2.

For this initial input set $V$, the algorithm next computes $\mathit{abs}(G, V, o)$ [footnote 2], and checks if it has CSC. If it does, the algorithm returns it. Otherwise, some set of traces of $\mathit{abs}(G, V, o)$ that cause CSC violations, $CSCV_V$, is extracted [footnote 3] by generating and checking the state graph of $\mathit{abs}(G, V, o)$. Note that this state graph is usually much smaller than that of the original $G$. The algorithm then analyzes each $\alpha \in CSCV_V$ and tries to find candidate inputs to be added in order to resolve the CSC violation. The algorithm may decide that it is impossible to resolve the CSC violation by adding any signals in $G$. In this case, the algorithm returns "impossible". Otherwise, candidate contains a set of requirements such that each requirement is a set of signals, and it is satisfied if at least one of them is added to $V$. In order to resolve the CSC violation, every requirement must be satisfied. Those requirements are added in the constraint matrix to set up a covering problem. This process is repeated for every CSC violation trace in $CSCV_V$. Finally, the covering problem is solved for those requirements, and the optimal set of signals is added to $V$. This $V$ is used to compute a new $\mathit{abs}(G, V, o)$, and the algorithm repeats the above process.

Footnote 1: This is actually implemented in a conservative way such that it takes the signals related to the first non-dummy transitions reached from $o+$ or $o-$ transitions by the upward net traversal of $G$.

Footnote 2: A simplified version of the algorithm shown in [15] is used to compute $\mathit{abs}(G, V, o)$.

Footnote 3: In our current implementation, one shortest trace is selected for each CSC violation pair, because using all CSC violation traces is very expensive.

    obtain_synthesizable_abs(G, o) {
        V = initial_input_set(G, o)
        loop {
            G_abs = obtain_abs(G, V, o)
            if (G_abs has CSC) then return G_abs
            CSCV_V = obtain_CSC_violation_trace_set(G_abs)
            forall α ∈ CSCV_V {
                candidate = analyze_CSCV_trace(α, G, V, o, μ0, null)
                if (candidate == "impossible") then
                    return "impossible"
                add_constraints_matrix(candidate)
            }
            new_V = solve_covering_problem()
            V = V ∪ new_V
        }
    }

Figure 2. Algorithm to obtain synthesizable abstraction.
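The covering step can be sketched as a minimum hitting set computation: each requirement is a set of signals, and a solution must contain at least one signal from every requirement. The sketch below is an assumption-laden illustration, not the solver used in the tool: it takes "optimal" to mean the fewest added signals and uses exhaustive search, which is adequate only for small requirement sets. The name `solve_covering` is hypothetical.

```python
from itertools import combinations


def solve_covering(requirements):
    """Return a smallest set of signals that intersects every requirement.

    Returns None if some requirement is empty and therefore cannot be met.
    """
    reqs = [frozenset(r) for r in requirements]
    if any(not r for r in reqs):
        return None
    universe = sorted(set().union(*reqs)) if reqs else []
    # Try candidate sets in order of increasing size, so the first hit is minimal.
    for size in range(len(universe) + 1):
        for choice in combinations(universe, size):
            chosen = set(choice)
            if all(chosen & r for r in reqs):
                return chosen
    return None
```

For the requirements `{a, b}`, `{b, c}`, and `{d}`, the sketch returns `{b, d}`: signal `b` satisfies the first two requirements and `d` the third.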

4. Analyzing CSC violation trace

This section shows the detailed algorithm for analyze_CSCV_trace that analyzes the CSC violation traces and determines a set of requirements for an appropriate input set for $o$. As shown in Figure 2, analyze_CSCV_trace receives $\alpha$ as a parameter. This $\alpha$ is a trace of $\mathit{abs}(G, V, o)$ that causes a CSC violation. In order to find an appropriate input set for $o$, it is necessary to obtain a concrete trace of $G$ that corresponds to $\alpha$. This can be done without generating the state graph of $G$ by a technique similar to the one used for the partial order reduction, which we call guided simulation. This section starts with the guided simulation algorithm, and then shows how to determine the candidates for the input. In the following, an interface signal means the signals used in $\mathit{abs}(G, V, o)$, i.e., the signals in $V$, and a noninterface signal means the remaining signals of $G$, i.e., the signals in $D = \mathit{sig}(G) - V$. The corresponding transitions are called similarly.

    analyze_CSCV_trace(α, G, V, o, μ, ω) {
        if (α is empty) {
            candidate = find_inputs(ω, G, V, o)
            return candidate
        }
        α1 = pick the first transition of α
        T_f = find_firing_trans(μ, α1)
        if (T_f is empty) return "backtrack"
        forall t ∈ T_f {
            μ' = fire(μ, t)
            α' = α
            if (t == α1)
                remove first transition of α'
            result = analyze_CSCV_trace(α', G, V, o, μ', ω·t)
            if (result ≠ "backtrack")
                exit forall
        }
        return result
    }

Figure 3. Algorithm for the guided simulation.

4.1. Guided simulation

For a given abstracted trace $\alpha$, the guided simulation obtains a trace $\omega$ of $G$ such that a trace obtained by projecting out the noninterface signals from $\omega$ is equal to $\alpha$. More formally, $\mathit{del}(D, \omega) = \alpha$ with $D = \mathit{sig}(G) - V$, where for $\rho = t_1 t_2 t_3 \cdots$ and a set $X$ of signals, if $\ell(t_1) \in X \times \{+, -\}$, then $\mathit{del}(X, \rho) = \rho'$, otherwise, $\mathit{del}(X, \rho) = t_1 \rho'$, with $\rho' = \mathit{del}(X, t_2 t_3 \cdots)$.
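The projection used in this definition can be written as a short Python function. This is a sketch under assumptions: a trace is a list of transition names, labels are strings such as `"a+"` or `"x-"` with `None` standing for a dummy transition, and the function name `delete_signals` is hypothetical. The recursion of the definition is unrolled into a loop, which gives the same result.

```python
def delete_signals(hidden, trace, label):
    """Drop from `trace` every transition whose label belongs to a hidden signal.

    `label` maps a transition name to a string like "a+" or "x-",
    or to None for a dummy transition (dummies are kept, as in the definition).
    """
    result = []
    for t in trace:
        lab = label.get(t)
        if lab is not None and lab[:-1] in hidden:
            continue  # transition of a hidden signal: projected out
        result.append(t)
    return result
```

With labels `t1: a+`, `t2: x+`, `t3: o+`, `t4: x-` and hidden signal `x`, the trace `t1 t2 t3 t4` is projected to `t1 t3`.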

Although an interface transition and a noninterface transition that are concurrent can fire in any order, our algorithm fires all possible noninterface transitions before firing an interface transition because this greatly simplifies the analysis algorithm shown later. We call a trace satisfying this property a regular trace. This simplification can, however, cause situations where the guided simulation does not terminate. This happens, for example, when there exists a loop in which noninterface transitions can fire forever. Practically, such situations can be detected by counting the continuous firings of noninterface transitions and checking if it exceeds some upper bound.

The algorithm for the guided simulation is shown in Figure 3. It forms the body of the recursive procedure analyze_CSCV_trace. If $\alpha$ is nonempty, the algorithm picks the first transition of $\alpha$, denoted by $\alpha_1$, and computes by find_firing_trans($\mu$, $\alpha_1$) a set of transitions that should be fired for $\alpha_1$. Figure 4 shows the algorithm for find_firing_trans. It first computes a set of potentially necessary transitions for $\alpha_1$, which is $\{\alpha_1\} \cup (\mathit{enabled}(\mu) \cap \mathit{NonIF})$ if $\alpha_1$ is enabled, where $\mathit{NonIF}$ is a set of all noninterface transitions, and necessary($\mu$, $\alpha_1$) otherwise. The former is for satisfying regularity, and the latter is a minimal set of enabled noninterface transitions such that $\alpha_1$ can never be enabled if none of those transitions is fired. For example, consider the nets shown in Figure 5, where the transitions except for $\alpha_1$ are noninterface. In the case of Figure 5(a), the necessary set of $\alpha_1$ is $\{t_1\}$ or $\{t_2\}$. $\{t_1\}$ can be a necessary set of $\alpha_1$, because $\alpha_1$ cannot be enabled if $t_1$ is not fired. Similarly, $\{t_2\}$ can be another necessary set of $\alpha_1$. On the other hand, in the case of Figure 5(b), even if $t_1$ is not fired, $\alpha_1$ may be enabled by a token produced by firing $t_2$. If neither $t_1$ nor $t_2$ is fired, $\alpha_1$ cannot be enabled. Thus, the necessary set of $\alpha_1$ for this case is $\{t_1, t_2\}$. find_firing_trans then chooses a set of transitions that should actually be fired. The first two conditions are for firing enabled noninterface transitions earlier than $\alpha_1$ in order to satisfy regularity. If result contains an enabled noninterface transition that conflicts with no other transitions (i.e., $\mathit{conflict}(u) = \{u\}$, where $\mathit{conflict}(u) = \{u' \mid \bullet u \cap \bullet u' \neq \emptyset\}$), then it can be fired alone. Otherwise, all conflicting transitions except for $\mathit{conflict}(\alpha_1)$ are returned and used for backtracking. Finally, if there exist no transitions that are concurrent with $\alpha_1$, $\alpha_1$ and its conflicting enabled noninterface transitions are returned.

    find_firing_trans(μ, t) {
        if (t ∈ enabled(μ))
            result = {t} ∪ (enabled(μ) ∩ NonIF)
        else
            result = necessary(μ, t)
        if (∃u ∈ result ∩ NonIF s.t. conflict(u) = {u})
            return {u}
        if (result − conflict(t) ≠ ∅)
            return (result − conflict(t))
        return result
    }

    necessary(μ, t) {
        if (t is already visited) return ∅
        if (t ∈ enabled(μ)) return {t}
        result = the set of all transitions
        forall (p ∈ •t − μ) {
            N = ∅
            forall (t' ∈ •p ∩ NonIF)
                N = N ∪ necessary(μ, t')
            if (N is smaller than result)
                result = N
        }
        return result
    }

Figure 4. Algorithms for finding transitions to be fired and for constructing necessary sets.

If find_firing_trans returns more than one transition, then those transitions are fired one by one in analyze_CSCV_trace. This is because it is impossible to find the exact transition that should be fired in the marking, and so the algorithm relies on backtracking. If an inappropriate transition is fired, then it becomes impossible to fire the next transition of $\alpha$ in some marking, and an empty set is returned by find_firing_trans. In this case, backtracking occurs by returning "backtrack".

Figure 5. Examples for the necessary set construction, nets (a) and (b).

Although it is guaranteed by this backtracking mechanism that the concrete trace that corresponds to $\alpha$ is certainly obtained, $G$ with many conflicts sometimes causes a lot of backtrackings. Our tool supports two options that we consider to be practical solutions for this problem. One is to keep every conflicting transition in $\mathit{abs}(G, V, o)$, and the other is to keep some of those conflicting transitions, which are selected by the users through specific comments in the STG file. When translating specifications written in a high-level language, our compiler automatically suggests the selection of conflicting transitions that should be kept in $\mathit{abs}(G, V, o)$ by using the latter option.

The fired transitions are appended to $\omega$ in each recursive call of analyze_CSCV_trace. When $\alpha$ becomes empty, $\omega$ holds the corresponding concrete trace, which is passed to find_inputs.

4.2. Determining the input set

Each CSC violation trace $\alpha$ of $\mathit{abs}(G, V, o)$ constructed by obtain_CSC_violation_trace_set is assumed to satisfy that $\alpha = (\alpha_0, \alpha_1)$, $\mu_0' \xrightarrow{\alpha_0} \mu_1'$ ($\mu_0'$ is the initial marking of $\mathit{abs}(G, V, o)$), $\mu_1' \xrightarrow{\alpha_1} \mu_2'$, $\mu_1'$ and $\mu_2'$ are a CSC violation pair, and there are no markings between $\mu_1'$ and $\mu_2'$ that have the same nondecorated state as that of $\mu_1'$. The corresponding concrete trace $\omega$ generated by the guided simulation is also of the form $(\omega_0, \omega_1)$ such that $\omega_0$ and $\omega_1$ end by interface transitions. Let $\mu_1$ and $\mu_2$ denote the markings of $G$ obtained by $\omega_0$ and $\omega_1$, respectively (see Figure 6). Note that this assumption simplifies the input set decision procedure, but in order to guarantee that every CSC violation pair is reached from the initial marking by a simple path, the STG needs to satisfy the property that for its every two reachable markings, either one is reachable from the other.

Figure 6. Labeling of a concrete trace.

For two interface transitions $t$ and $t'$ in $\sigma$, if $t$ fires before $t'$, it is denoted by $(t, t') \in R^{\sigma}_{\mathit{order}}$. For two (interface or noninterface) transitions $t_1$ and $t_2$ in $\sigma$, if $t_1$ causes $t_2$, that is, $t_2$ fires by consuming the token produced by the firing of $t_1$, it is denoted by $(t_1, t_2) \in R^{\sigma}_{\mathit{cause}}$. If $t_1$ and $t_2$ are related by the transitive closure of the union of $R^{\sigma}_{\mathit{order}}$ and $R^{\sigma}_{\mathit{cause}}$, i.e., $(t_1, t_2) \in (R^{\sigma}_{\mathit{order}} \cup R^{\sigma}_{\mathit{cause}})^{+}$, then we say that $t_1$ is an ancestor of $t_2$ in $\sigma$, denoted by $[t_1 \Rightarrow_{\sigma} t_2]$. Since the specific abstracted trace $\tau$ is focused on, this ancestor relation represents an actual causality relation with respect to $\tau$. The ordering relation of interface signals is also included because, if the firing order of concurrent interface transitions in $\tau$ changes, then it is considered to be a different CSC violation trace with a different CSC violation state pair, and such a different trace is handled separately in the forall loop of obtain_synthesizable_abs. This ancestor relation plays a key role in the proposed algorithm. Thus, in the first step of find_input, this ancestor relation is set up, which is actually done by constructing a data structure similar to an occurrence net [20].
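As a rough illustration of the ancestor relation, the following sketch computes the transitive closure of the union of the interface firing order and the cause relation over one concrete trace. It uses a plain boolean matrix rather than the occurrence-net-like structure mentioned in the text, and the function name, the index-pair encoding of the cause relation, and the example trace are assumptions.

```python
from typing import List, Set, Tuple


def ancestor_relation(
    trace: List[str], interface: Set[str], cause: Set[Tuple[int, int]]
) -> List[List[bool]]:
    """anc[i][j] is True when trace[i] is an ancestor of trace[j].

    cause holds (i, j) index pairs meaning that trace[j] consumed a token
    produced by trace[i].
    """
    n = len(trace)
    edges = set(cause)
    iface = [i for i, t in enumerate(trace) if t in interface]
    # Firing order of interface transitions: consecutive pairs are enough,
    # because the closure below adds the remaining ordered pairs.
    edges.update(zip(iface, iface[1:]))
    anc = [[False] * n for _ in range(n)]
    for i, j in edges:
        anc[i][j] = True
    for k in range(n):                  # Warshall's transitive closure
        for i in range(n):
            if anc[i][k]:
                for j in range(n):
                    if anc[k][j]:
                        anc[i][j] = True
    return anc


# Hypothetical trace: a+ causes x+, x+ causes y+, b+ causes c+.
TRACE = ["a+", "x+", "b+", "y+", "c+"]
ANC = ancestor_relation(TRACE, {"a+", "b+", "c+"}, {(0, 1), (1, 3), (2, 4)})
print(ANC[0][4], ANC[1][2])
```

Here a+ is an ancestor of c+ only through the interface ordering, while x+ is not an ancestor of b+, so the two are treated as concurrent in this trace.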

In order to resolve the CSC violation between $\mu_1$ and $\mu_2$, it is necessary to find a noninterface signal $y$ such that $\sigma_1$ contains an odd number of $y$-transitions. If $y$-transitions certainly fire an odd number of times in $\sigma_1$, then the CSC violation is resolved. However, the existence of concurrent transitions makes this decision difficult. Thus, we need to define the following notions (see Figure 7).

$\mathit{last}(\omega)$ denotes the last transition in $\omega$, and $\mathit{trans}(y, \omega)$ is a sequence $\langle t_0 t_1 t_2 \cdots t_{n-1} \rangle$ of all $y$-transitions (i.e., $t_i$ is labeled $y+$ or $y-$ for each $i$) firing in $\omega$ in this order. For a trace $\omega = \langle \omega_1, \omega_2 \rangle$ such that at least $\omega_2$ ends with an interface transition, and a noninterface signal $y$, $y$ is confined by $\omega_2$ in $\omega$, if

• when a $y$-transition fires in $\omega_1$, for the last transition $t_{e_1}$ of $\mathit{trans}(y, \omega_1)$, $[t_{e_1} \Rightarrow_{\omega} \mathit{last}(\omega_1)]$, and
• when a $y$-transition fires in $\omega_2$, for the first transition $t_{s_2}$ and the last transition $t_{e_2}$ of $\mathit{trans}(y, \omega_2)$, $[\mathit{last}(\omega_1) \Rightarrow_{\omega} t_{s_2}]$ and $[t_{e_2} \Rightarrow_{\omega} \mathit{last}(\omega_2)]$.

Figure 7. Confinement relation.

If $y$ is confined by $\omega_2$ in $\omega$, it is guaranteed that all $y$-transitions that can fire between $\mathit{last}(\omega_1)$ and $\mathit{last}(\omega_2)$ are just those in $\mathit{trans}(y, \omega_2)$. From the regularity and the assumption that $\omega_2$ ends with an interface transition, any noninterface transition that is concurrent with $\mathit{last}(\omega_2)$ fires before $\mathit{last}(\omega_2)$. Thus, $\mathit{last}(\omega_2)$ is an ancestor of the next $y$-transition that fires after $\omega_2$. For this reason, it is not necessary to consider the first $y$-transition after $\omega_2$. In the case of $\sigma = \langle \sigma_0, \sigma_1 \rangle$, $\sigma_0$ also ends with an interface transition. Thus, the condition $[\mathit{last}(\omega_1) \Rightarrow_{\omega} t_{s_2}]$ is not necessary either. The other cases shown later, however, need this condition.

If $y$ is confined by $\omega_2$ in $\omega$, and $\mathit{trans}(y, \omega_2)$ contains an odd number of transitions, then $y$ is odd-confined by $\omega_2$ in $\omega$. Even-confined is defined similarly.

Consider the first interface transition in $\sigma_1$, and divide $\sigma_1$ into $\sigma_2$ and $\sigma_3$ with it, i.e., $\sigma_1 = \langle \sigma_2, \sigma_3 \rangle$ and $\sigma_2$ ends with this first interface transition. Figure 6 shows the relation among $\sigma_0, \ldots, \sigma_3$. The following lemma holds.

Lemma 4 The CSC violation with respect to $\sigma$ is resolved by adding a noninterface signal $y$ to $I$, if $y$ is odd-confined by $\sigma_1$ in $\sigma$, and $\sigma_2$ contains no $y$-transitions.

(Proof) Since $y$ is odd-confined by $\sigma_1$ in $\sigma$, it is guaranteed by the ancestor relation that the state vector of $\mu_1$, projected to $I \cup \{y\}$, is different from that of $\mu_2$. Furthermore, from both the assumption with respect to $\tau$ that there are no markings between $\mu'_1$ and $\mu'_2$ that have the same binary state vector as $\mu'_1$, and the assumption that $\sigma_2$ contains no $y$-transitions, no new CSC violation pair is introduced.

(Q.E.D.)
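The parity argument behind Lemma 4 can be checked on a small example. The sketch below only covers the parity part: a signal that toggles an odd number of times between two markings makes their state vectors differ once it is added to the projection. It does not check confinement, which additionally needs the ancestor relation. The helper names, the trace, and the signal values are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def signal_of(transition: str) -> str:
    """Strip the trailing '+' or '-' from a transition name."""
    return transition[:-1]


def fire_all(code: Dict[str, int], trace: Iterable[str]) -> Dict[str, int]:
    """Apply a trace to a binary state vector and return the new vector."""
    result = dict(code)
    for t in trace:
        result[signal_of(t)] = 1 if t.endswith("+") else 0
    return result


def odd_parity_signals(trace: List[str], candidates: Set[str]) -> Set[str]:
    """Candidate signals with an odd number of transitions in the trace."""
    counts = Counter(signal_of(t) for t in trace)
    return {s for s in candidates if counts[s] % 2 == 1}


def project(code: Dict[str, int], signals: Set[str]) -> Tuple[int, ...]:
    """State vector restricted to the given signals, in sorted signal order."""
    return tuple(code[s] for s in sorted(signals))


# Hypothetical data: state vector at the first marking of the pair and the
# concrete trace segment leading to the second marking.
code_1 = {"a": 1, "b": 1, "c": 0, "x1": 1, "x2": 1}
sigma_1 = ["c+", "x1-", "c-"]
code_2 = fire_all(code_1, sigma_1)

print(project(code_1, {"a", "b", "c"}) == project(code_2, {"a", "b", "c"}))
print(odd_parity_signals(sigma_1, {"x1", "x2"}))
```

With only the interface signals the two vectors coincide, while adding the odd-parity signal separates them. Whether that separation holds for every interleaving is exactly what the confinement condition is meant to guarantee.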

In cases where $\sigma_2$ contains $y$-transitions, every marking obtained by a $y$-transition in $\sigma_2$ has the same state vector, if it is projected to $I$, because $\mu_3$ is obtained by the first interface transition in $\sigma_1$. Thus, those markings in odd positions cause a CSC violation with $\mu_2$, because their values of $y$ are the same as that of $\mu_2$. One example is shown in Figure 8. The first $y+$ in $\sigma_2$ leads to a marking whose state vector has a CSC violation with that of $\mu_2$. This CSC violation cannot be resolved using only the signal $y$.

Figure 8. Resolving CSC violation by adding essential signals.

Such a CSC violation, however, can be resolved by adding another noninterface signal $u$ in addition to $y$ such that $u$ is odd-confined by $\sigma_4$ in $\sigma$ and the first $u$-transition in $\sigma_4$ fires in $\sigma_3$, where $\sigma_4$ is the suffix of $\sigma_1$ after the first $y$-transition. The final column of Figure 8 shows how the CSC violation is resolved in this case. This additional signal $u$ is called essential for $y$ in $\sigma$. Hence, in this case, a signal $y$ together with its essential signal $u$ resolves the CSC violation. The precise definition of essential signals is as follows.

Suppose that $y$ is odd-confined by $\sigma_1$ in $\sigma$, and $\mathit{trans}(y, \sigma_1) = \langle t_1 t_2 \cdots t_n \rangle$, where $t_k$ is the last $y$-transition that fires in $\sigma_2$. For every odd integer $i \le k$, a noninterface signal $u_i$ ($\ne y$) is essential for $y$ in $\sigma$ with respect to $i$, if (1) $u_i$ is odd-confined by $\omega_i$ in $\sigma$, where $\omega_i$ is the suffix of $\sigma_1$ after $t_i$, and (2) for the first $u_i$-transition $t_{u_i}$ in $\omega_i$,

• if either $t_{i+1}$ does not exist or $[\mathit{last}(\sigma_2) \Rightarrow_{\sigma} t_{i+1}]$, then $[\mathit{last}(\sigma_2) \Rightarrow_{\sigma} t_{u_i}]$,
• else, $[t_{i+1} \Rightarrow_{\sigma} t_{u_i}]$

holds. A set of essential signals that includes some essential signal $u_i$ for every odd integer $i \le k$ is called a sufficient essential signal set for $y$.

Figure 9 shows a sufficient essential signal set $\{u_1, u_3\}$ for $y$. Each essential signal distinguishes the (common) state vector of all the markings between the one obtained by $t_i$ and the one where $t_{i+1}$ fires from that of $\mu_2$. For this reason, the first $u_i$-transition $t_{u_i}$ must certainly fire after $t_{i+1}$. Furthermore, $\mathit{last}(\sigma_2)$ plays the role of $t_{i+1}$ in the last section of $\sigma_2$. From these discussions, the following theorem, which is a generalized version of Lemma 4, holds.

Figure 9. An example where more than one essential signal is necessary.

Theorem 2 The CSC violation with respect to $\sigma$ is resolved by adding a noninterface signal $y$ that is odd-confined by $\sigma_1$ in $\sigma$ and its sufficient essential signal set to $I$.

Note that this is a sufficient condition. Thus, even if signals satisfying the above condition are not found, CSC violations may be resolved. For example, consider the STG $G$ shown in Figure 10 and its output $c$. The trigger signals for $c$ are $a$ and $b$. $\mathit{abs}(G, \{a, b\}, c)$ has a CSC violation trace $\tau = \langle a+, b+, c+, c- \rangle$ with $\tau_0 = \langle a+, b+ \rangle$ and $\tau_1 = \langle c+, c- \rangle$. The guided simulation generates $\sigma_0 = \langle a+, x_1+, x_2+, b+ \rangle$ and $\sigma_1 = \langle c+, x_1-, c- \rangle$. $x_1$-transitions appear an odd number of times in $\sigma_1$. $x_1$ is, however, not confined by $\sigma_1$ in $\sigma$, because $x_1+$ and $b+$ are concurrent, and so $[x_1+ \not\Rightarrow_{\sigma} b+]$. There exists no other noninterface signal that is confined by $\sigma_1$. Hence, no signal satisfies the above condition. However, $G$ itself has CSC. Thus, although $x_1$ and $x_2$ do not satisfy our sufficient condition, $x_1$ together with $x_2$ can resolve the CSC violation.

On the other hand, it is possible to decide that a given CSC violation can never be resolved by adding any signals. One sufficient condition is as follows.

Theorem 3 If there exists a (noninterface or interface) transition $t$ in $\sigma_2$ such that $[t \Rightarrow_{\sigma} \mathit{last}(\sigma_2)]$, and for any noninterface signal $u$, $u$ is even-confined by the suffix of $\sigma_1$ from $t$ in $\sigma$, then $G$ has no CSC (see Figure 11).

(Proof) The state vector of the marking where $t$ fires is the same as that of $\mu_2$. Furthermore, from $[t \Rightarrow_{\sigma} \mathit{last}(\sigma_2)]$ and the regularity, they certainly cause a CSC violation. Since for any noninterface signal $u$, $u$ is even-confined, no noninterface signal can distinguish those state vectors. Hence, this CSC violation cannot be resolved even by adding all noninterface signals. This means that $G$ has no CSC. (Q.E.D.)

Figure 10. An STG where our sufficient condition does not work. (Inputs: a, b. Outputs: c, x1, x2. Transitions: a+, a-, b+, b-, c+, c-, x1+, x1-, x2+, x2-.)

Figure 11. Unresolvable CSC violation.

Table 1. Experimental results (1).

Circuit   (#I,#O)   Petrify                            Proposed method
                    CPU(s)         Mem(MB)   area      CPU(s) (Petrify+other)   Max(MB)   area   ave.#L
cb        (10,10)   9.6            4.6       82        3.5 = (3.3+0.2)          3.4       82     1.2
cachem    (11,16)   219.5          7.7       122       36.8 = (36.1+0.7)        4.0       123    1.5
lf6       (21,41)   * (>59272.4)   (>742)    –         98.9 = (94.9+4.1)        4.7       200    1.2
(*: BDD manager overflow: > 30000000 nodes)

Table 2. Experimental results (2).

Circuit           (#I,#O)   Petrify                      Proposed method
                            CPU(s)   Mem(MB)   area      CPU(s) (Petrify+other)   Max(MB)   area
FIR5_2mul_csc     (7,19)    78.8     7.8       151       32.4 = (31.2+1.2)        4.3       150
IIR2_2mul_d_csc   (7,19)    240.2    12.3      150       179.6 = (176.9+2.7)      5.8       151
LMS4_pr12_csc     (9,18)    354.6    18.6      177       28.3 = (26.6+1.7)        4.5       177

If the condition of Theorem 3 is satisfied, find_input returns "impossible". Otherwise, it tries to find noninterface signals that can resolve the given CSC violation. If it succeeds, there are usually many choices. For example, suppose that signals $a$, $b$, $c$, and $d$ can be $y$, and $c$ has a sufficient essential signal set $\{p\}$, and $d$ has a sufficient essential signal set $\{q\}$ or $\{r\}$. The whole condition can be expressed by

$$a \lor b \lor (c \land p) \lor (d \land (q \lor r)).$$

To set up the covering problem, this is transformed into product-of-sums form such as

$$(a \lor b \lor c \lor d)(a \lor b \lor c \lor q \lor r)(a \lor b \lor p \lor d)(a \lor b \lor p \lor q \lor r).$$

Each clause is a requirement (see Section 3), and find_input returns a set of those requirements.
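The product-of-sums transformation can be sketched as a plain distribution of OR over AND. The code below is an illustrative sketch, not the tool's implementation. It assumes a condition of the shape "a or b or (c and p) or (d and (q or r))", where each top-level alternative is a conjunction of disjunctions. The names `to_product_of_sums` and `lit` and the signal names are hypothetical.

```python
from itertools import product
from typing import FrozenSet, Sequence, Set

Clause = FrozenSet[str]      # a disjunction of signal names
Term = Sequence[Clause]      # a conjunction of disjunctions


def to_product_of_sums(terms: Sequence[Term]) -> Set[Clause]:
    """Turn an OR of terms into an AND of clauses.

    Distribution picks one disjunction from every term and merges the picks
    into a single clause; all combinations together form the product of sums.
    """
    clauses = {frozenset().union(*choice) for choice in product(*terms)}
    # Absorption: a clause that strictly contains another clause is redundant.
    return {c for c in clauses if not any(other < c for other in clauses)}


def lit(*names: str) -> Clause:
    """Build a disjunction from signal names."""
    return frozenset(names)


# a or b or (c and p) or (d and (q or r)), with assumed signal names.
condition = [
    [lit("a")],
    [lit("b")],
    [lit("c"), lit("p")],
    [lit("d"), lit("q", "r")],
]
for clause in sorted(to_product_of_sums(condition), key=sorted):
    print(" | ".join(sorted(clause)))
```

Each printed clause corresponds to one requirement of the covering problem: at least one signal of every clause has to be added to the input set.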

As long as the condition of Theorem 3 does not hold, even if no noninterface signals satisfy the sufficient condition for resolving the CSC violation, it is worth adding some signals as shown in the example of Figure 10. find_input uses some heuristics to choose those signals, for instance, choosing a noninterface signal whose transitions appear an odd number of times in $\sigma_1$.

5. Experimental results

The proposed method has been implemented using the C language. This section evaluates the potential performance of the proposed method and the quality of the synthesized circuits. The experiments here have been done on a 2.8 GHz Pentium 4 workstation with 4 gigabytes of memory. As for the selection of conflicting transitions in order to avoid backtracking, the compiler suggestion is used for the first set of examples, and the option to keep every conflicting transition is used for the second and third sets. Note that the performance of Petrify shown in these experiments is just used for suggesting the complexity of each example, not for comparing the superiority of both methods. On the other hand, the estimated area sizes shown by Petrify are a good criterion to evaluate the overhead of our method.

The first set of experimental results is shown in Table 1. In this experiment, the STGs generated from a high-level specification language by the method shown in [12] are used. Those are for the sub-circuits of the instruction cache system for the TITAC2 asynchronous microprocessor [21]. These STGs originally contain many dummy transitions generated by the compiler from the high-level specification language, and those dummy transitions degrade the performance of the synthesis process by Petrify or the contraction process in our method. Thus, the reduced STGs obtained by removing those dummy transitions from the original STGs are used. The second column of the table shows the number of input signals and output signals of the circuits. The third main column shows the CPU times, the memory usage, and the estimated area of the circuits synthesized by Petrify with the gC implementation option. Petrify cannot synthesize "lf6" due to BDD manager overflow. The fourth main column shows the results of our method. For fair comparison, Petrify is used for the logic synthesis of each abstracted STG, e.g., Petrify is run 41 times to synthesize the sub-circuit for each output in the case of "lf6", although any logic synthesis tool can be used. Our CPU times show both the run times of this final logic synthesis process by Petrify and the remaining run times for contraction, state space generation of abstracted STGs, and analysis of CSC violation traces. The final sub-column shows the average loop counts in obtain_synthesizable_abs.

Table 3. Experimental results (3).

Circuit          (#I,#O)   Petrify           Proposed method
                           CPU(s)   area     CPU(s)   area
alloc-outbound   (4,5)     0.05     21       0.12     21
atod             (3,4)     0.06     23       0.18     23
chu150           (3,3)     0.06     21       0.12     21
chu172           (3,3)     0.03     7        0.06     7
converta         (2,3)     0.06     28       0.19     28
dff              (2,2)     0.05     14       0.14     14
master-read      (6,8)     2.23     53       1.72     53
mp-forward-pkt   (12,3)    0.12     29       0.08     29
nak-pa           (4,6)     0.14     31       0.23     31
nowick           (3,3)     0.08     25       0.10     25
pe-rcv-ifc       (4,7)     1.68     64       1.61     65
pe-send-ifc      (5,5)     1.38     64       1.51     64
ram-read-sbuf    (5,6)     0.23     32       0.28     32
rcv-setup        (3,2)     0.03     14       0.12     14
sbuf-ram-write   (5,7)     0.27     30       0.40     30
sbuf-read-ctl    (3,5)     0.08     19       0.13     19
sbuf-send-pkt2   (4,5)     0.15     25       0.33     25
sendr-done       (2,2)     0.02     7        0.03     7
trimos-send      (3,6)     0.38     48       1.38     48
vbe10b           (4,7)     0.30     47       0.62     47
vbe4a            (3,3)     0.06     9        0.05     9
vbe6a            (4,4)     0.22     42       0.66     42
vmebus-arb       (3,2)     0.23     14       0.46     14
wrdata           (1,4)     0.03     22       0.19     22
wrdatab          (4,6)     0.43     53       0.86     53

These results show that our method efficiently handles a large specification that the traditional method cannot handle. As for the area overhead, it seems small, but since the large example cannot be synthesized by Petrify, more comparisons are necessary.

For this purpose, some examples from [22] and the standard benchmark examples are used. Table 2 and Table 3 show these results. From these results, the area overhead is just 0.2%. Thus, even though our method uses restricted information for synthesizing sub-circuits, the quality, at least with respect to the area size, is not badly affected.

6. Conclusion

This paper presents a decomposition method for efficient synthesis of very large speed-independent circuits. While this method does have some overhead for small circuits, it allows for the synthesis of large circuits that could not be synthesized using flat synthesis methods. The experimental results show that the area overhead appears to be very small.

Although the theory and algorithms presented in this paper are for untimed circuits, the proposed idea can be extended to timed circuit synthesis by minor modification as follows. Let $G_{\mathit{timed}}$ be a timed STG, where transitions have the earliest and latest firing times, and $G_{\mathit{untimed}}$ be its untimed version (i.e., an STG obtained from $G_{\mathit{timed}}$ by removing all earliest and latest firing times). If $G_{\mathit{untimed}}$ satisfies the requirements for our method, it is safe to apply our decision procedure to $G_{\mathit{untimed}}$ in order to obtain the necessary signals of $G_{\mathit{timed}}$. Since the decision procedure does not use timing information, the input signal sets may not be optimal, i.e., some redundant signals may be included, but necessary signals are never missed. Thus, if the resultant input sets are used by the timed contraction of $G_{\mathit{timed}}$ followed by the timed logic synthesis procedure, an optimized circuit using the timing information can be synthesized for $G_{\mathit{timed}}$. The timed circuit synthesis using this idea from high-level specification languages together with its compiler is future work, as is the comparison with other approaches like syntax directed translation.

Acknowledgment

We'd like to thank H. Saito for giving us his benchmark circuits. We also thank the referees for their helpful comments.

References

[1] J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, and A. Yakovlev. Petrify: a tool for manipulating concurrent specifications and synthesis of asynchronous controllers. IEICE Transactions on Information and Systems, E80-D(3):315–325, March 1997.

[2] P. A. Beerel, C. J. Myers, and T. H.-Y. Meng. Covering conditions and algorithms for the synthesis of speed-independent circuits. IEEE Transactions on Computer-Aided Design, March 1998.

[3] R. M. Fuhrer, S. M. Nowick, M. Theobald, N. K. Jha, B. Lin, and L. Plana. Minimalist: An environment for the synthesis, verification and testability of burst-mode asynchronous machines. Technical Report TR CUCS-020-99, Columbia University, NY, July 1999.

[4] Steven M. Burns and Alain J. Martin. Syntax-directed translation of concurrent programs into self-timed circuits. In J. Allen and F. Leighton, editors, Advanced Research in VLSI, pages 35–50. MIT Press, 1988.

[5] Kees van Berkel, Joep Kessels, Marly Roncken, Ronald Saeijs, and Frits Schalij. The VLSI-programming language Tangram and its translation into handshake circuits. In Proc. European Conference on Design Automation (EDAC), pages 384–389, 1991.

[6] Euiseok Kim, Jeong-Gun Lee, and Dong-Ik Lee. Automatic process-oriented control circuit generation for asynchronous high-level synthesis. In Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 104–113. IEEE Computer Society Press, April 2000.

[7] Joep Kessels and Ad Peeters. The Tangram framework: Asynchronous circuits for low power. In Proc. of Asia and South Pacific Design Automation Conference, pages 255–260, February 2001.

[8] Doug Edwards and Andrew Bardsley. Balsa: An asynchronous hardware synthesis language. The Computer Journal, 45(1):12–18, 2002.

[9] A. Bystrov and A. Yakovlev. Asynchronous circuit synthesis by direct mapping: Interfacing to environment. In Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 127–136, April 2002.

[10] Tiberiu Chelcea and Steven M. Nowick. Resynthesis and peephole transformations for the optimization of large-scale asynchronous systems. In Proc. ACM/IEEE Design Automation Conference, June 2002.

[11] Chris J. Myers and Teresa H.-Y. Meng. Synthesis of timed asynchronous circuits. IEEE Transactions on VLSI Systems, 1(2):106–119, June 1993.

[12] Tomohiro Yoneda and Chris Myers. Synthesizing timed circuits from high level specification languages. NII Technical Report, NII-2003-003E, 2003.

[13] A. Matsumoto. High level synthesis of asynchronous circuits (in Japanese). Master Thesis, Tokyo Institute of Technology, 2004.

[14] Tam-Anh Chu. Synthesis of Self-Timed VLSI Circuits from Graph-Theoretic Specifications. PhD thesis, MIT Laboratory for Computer Science, June 1987.

[15] Walter Vogler and Ralf Wollowski. Decomposition in asynchronous circuit design. In J. Cortadella, A. Yakovlev, and G. Rozenberg, editors, Concurrency and Hardware Design, volume 2549 of Lecture Notes in Computer Science, pages 152–190. Springer-Verlag, 2002.

[16] H. Zheng, E. Mercer, and C. J. Myers. Modular verification of timed circuits using automatic abstraction. IEEE Transactions on Computer-Aided Design, 22(9), September 2003.

[17] Ruchir Puri and Jun Gu. A modular partitioning approach for asynchronous circuit synthesis. In Proc. ACM/IEEE Design Automation Conference, pages 63–69, June 1994.

[18] J. Beister, G. Eckstein, and R. Wollowski. From STG to extended-burst-mode machines. In Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 145–158, April 1999.

[19] Chris Myers. Asynchronous Circuit Design. John Wiley & Sons, 2001.

[20] Kenneth McMillan. Using unfoldings to avoid the state explosion problem in the verification of asynchronous circuits. In G. v. Bochman and D. K. Probst, editors, Proc. International Workshop on Computer Aided Verification, volume 663 of Lecture Notes in Computer Science, pages 164–177. Springer-Verlag, 1992.

[21] Akihiro Takamura, Masashi Kuwako, Masashi Imai, Taro Fujii, Motokazu Ozawa, Izumi Fukasaku, Yoichiro Ueno, and Takashi Nanya. TITAC-2: An asynchronous 32-bit microprocessor based on scalable-delay-insensitive model. In Proc. International Conf. Computer Design (ICCD), pages 288–294, October 1997.

[22] H. Saito. Synthesis of Globally Delay Insensitive Locally Timed Asynchronous Circuits from Register Transfer Level Descriptions. PhD thesis, University of Tokyo, 2003.

Appendix

Proof of Lemma 1 From $x \notin D$, the value of $x$ distinguishes the elements in $\mathit{cl}_D(\mathit{ES}(x+))$ from those in $\mathit{cl}_D(\mathit{QS}(x+))$ and $\mathit{cl}_D(\mathit{ES}(x-))$. Similarly, $\mathit{cl}_D(\mathit{QS}(x-))$ is disjoint from $\mathit{cl}_D(\mathit{QS}(x+))$ and $\mathit{cl}_D(\mathit{ES}(x-))$. $\mathit{cl}_D(\mathit{ES}(x+))$ and $\mathit{cl}_D(\mathit{QS}(x-))$ are disjoint for the following reason. Suppose that they have a common element $s$. Then, there must exist $s_1 \in \mathit{ES}(x+)$ and $s_2 \in \mathit{QS}(x-)$ such that $s \in \mathit{cl}_D(s_1)$ and $s \in \mathit{cl}_D(s_2)$. Then, from the definition of $D$-closure, $\mathit{cl}_D(s_1) = \mathit{cl}_D(s_2)$ must hold, and so $s_2 \in \mathit{cl}_D(s_1)$ is implied. This, however, contradicts $\mathit{cl}_D(\mathit{ES}(x+)) - \mathit{UR} \subseteq \mathit{ES}(x+)$, because $\mathit{ES}(x+)$ and $\mathit{QS}(x-)$ are disjoint. Hence, $\mathit{cl}_D(\mathit{ES}(x+))$ and $\mathit{cl}_D(\mathit{QS}(x-))$ are disjoint. The disjointness between $\mathit{cl}_D(\mathit{ES}(x-))$ and $\mathit{cl}_D(\mathit{QS}(x+))$ can be shown similarly. (Q.E.D.)

Proof of Lemma 2 For this proof, we focus on the gC implementation, and furthermore, only $c(x+)$ is considered, because the proof for the other cover and the atomic gate implementation can be done similarly. Let $\mathit{ES}'$ and $\mathit{QS}'$ denote the excitation and stable state sets of $G_D$. From the construction of $G_D$, those are obtained by $\mathit{ES}'(x+) = \pi_D(\mathit{cl}_D(\mathit{ES}(x+)))$, $\mathit{QS}'(x+) = \pi_D(\mathit{cl}_D(\mathit{QS}(x+)))$, and so on. Since some unreachable states of $G$ are mapped into those excitation or stable states of $G_D$, the unreachable state set $\mathit{UR}'$ of $G_D$ satisfies $\pi_D^{-1}(\mathit{UR}') \subseteq \mathit{UR}$.

The first thing to be shown is that $G_D$ has CSC and is output semi-modular. The former is straightforward from Lemma 1. For the latter, suppose that $G_D$ is not output semi-modular. This can happen if an output transition $t_o$ is enabled by an irrelevant transition $t_d$ in $G$, and $t_d$ is replaced by a dummy transition in $G_D$, resulting in a new chain of dummy transitions to $t_o$ from a dummy transition conflicting with some transition. Let $s_1 \xrightarrow{t_d} s_2$ be the state transition caused by $t_d$. Since $t_d$ makes $t_o$ enabled, $s_1 \notin \mathit{ES}(x+)$ and $s_2 \in \mathit{ES}(x+)$ hold (assuming that $t_o$ is a rising transition of an output $x$). This, however, contradicts that $t_d$ is a transition related to an irrelevant signal, because $s_1 \in \mathit{cl}_D(s_2)$ but $s_1 \notin \mathit{ES}(x+) \cup \mathit{UR}$. Therefore, $G_D$ is output semi-modular. A similar discussion holds for the case that $t_d$ is followed by a chain of dummy transitions. Since $G$ is output semi-modular, there are no other cases in which $G_D$ is not output semi-modular. Hence, $G_D$ can be concluded to be output semi-modular. The signals related to transitions like $t_d$ are called trigger signals for $x$, and they do not belong to any irrelevant input set for $x$.

The above two facts guarantee the existence of a correct circuit with respect to $G_D$. Let $c'(x+)$ denote one of its covers. Note that this cover satisfies

$$\mathit{ES}'(x+) \subseteq c'(x+) - \mathit{UR}' \subseteq \mathit{ES}'(x+) \cup \mathit{QS}'(x+). \tag{1}$$

The next thing to be shown is that this cover also satisfies the correctness condition of $G$. Since $c'(x+)$ should be considered in the state space of $G$, this is shown by

$$\mathit{ES}(x+) \subseteq \pi_D^{-1}(c'(x+)) - \mathit{UR} \subseteq \mathit{ES}(x+) \cup \mathit{QS}(x+). \tag{2}$$

The above $\pi_D^{-1}(c'(x+)) - \mathit{UR}$ can be rewritten as follows.

$$\begin{aligned}
\pi_D^{-1}(c'(x+)) - \mathit{UR}
&= \pi_D^{-1}\bigl((c'(x+) - \mathit{UR}') \cup (c'(x+) \cap \mathit{UR}')\bigr) - \mathit{UR} \\
&= \bigl(\pi_D^{-1}(c'(x+) - \mathit{UR}') \cup \pi_D^{-1}(c'(x+) \cap \mathit{UR}')\bigr) - \mathit{UR} \\
&= \pi_D^{-1}(c'(x+) - \mathit{UR}') - \mathit{UR}.
\end{aligned} \tag{3}$$

This (3) is obtained from $\pi_D^{-1}(\mathit{UR}') \subseteq \mathit{UR}$. Applying $\pi_D^{-1}$ to (1) derives

$$\begin{aligned}
\pi_D^{-1}(\mathit{ES}'(x+)) &\subseteq \pi_D^{-1}(c'(x+) - \mathit{UR}') \subseteq \pi_D^{-1}(\mathit{ES}'(x+) \cup \mathit{QS}'(x+)) \\
\mathit{cl}_D(\mathit{ES}(x+)) &\subseteq \pi_D^{-1}(c'(x+) - \mathit{UR}') \subseteq \mathit{cl}_D(\mathit{ES}(x+)) \cup \mathit{cl}_D(\mathit{QS}(x+)) \\
\mathit{cl}_D(\mathit{ES}(x+)) - \mathit{UR} &\subseteq \pi_D^{-1}(c'(x+) - \mathit{UR}') - \mathit{UR} \subseteq (\mathit{cl}_D(\mathit{ES}(x+)) - \mathit{UR}) \cup (\mathit{cl}_D(\mathit{QS}(x+)) - \mathit{UR}) \\
\mathit{ES}(x+) &\subseteq \pi_D^{-1}(c'(x+) - \mathit{UR}') - \mathit{UR} \subseteq \mathit{ES}(x+) \cup \mathit{QS}(x+)
\end{aligned} \tag{4}$$

The final equation (4) holds because $D$ is an irrelevant input set. From (3) and (4), (2) is obtained. (Q.E.D.)