Research Article
A Novel Rules Based Approach for Estimating Software Birthmark
Shah Nazir¹, Sara Shahzad¹, Sher Afzal Khan², Norma Binti Alias³, and Sajid Anwar⁴
¹Department of Computer Science, University of Peshawar, Peshawar 25000, Pakistan
²Department of Computer Science, Abdul Wali Khan University, Mardan 23200, Pakistan
³Ibnu Sina Institute for Fundamental Science Studies, Universiti Teknologi Malaysia, 81310 Johor Bahru, Malaysia
⁴Institute of Management Sciences, Peshawar 25000, Pakistan
Correspondence should be addressed to Shah Nazir; snshahnzr@gmail.com
Received 29 December 2014; Accepted 19 March 2015
Academic Editor: Patricia Melin
Copyright © 2015 Shah Nazir et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Software birthmark is a unique quality of software to detect software theft. Comparing birthmarks of software can tell us whether a program or software is a copy of another. Software theft and piracy are rapidly increasing problems of copying, stealing, and misusing software without proper permission, as specified in the applicable license agreement. The estimation of a birthmark can play a key role in understanding the effectiveness of that birthmark. In this paper, a new technique is presented to evaluate and estimate software birthmarks based on the two most sought-after properties of birthmarks, that is, credibility and resilience. For this purpose, soft computing concepts such as probabilistic and fuzzy computing have been taken into account, and fuzzy logic is used to estimate the properties of a birthmark. The proposed fuzzy rule based technique is validated through a case study, and the results show that the technique is successful in assessing the specified properties of the birthmark, its resilience and credibility. This, in turn, shows how much effort will be required to detect the originality of the software based on its birthmark.
1 Introduction
A software birthmark is an intrinsic property of software that is used to detect the theft of software systems. A software system can be stolen or pirated, which ultimately results in financial loss to the owner organization. Software piracy is a global problem of unauthorized copying, installation, use, distribution, or sale of software other than what is officially documented as exclusive rights by the authors, as described in the relevant license agreement. With the growth of the software development industry and of Internet use, software piracy has become a red alert sign for numerous software companies. Software companies encounter tremendous losses due to software piracy, while software pirates earn huge sums of money from it. The general international community is not yet aware of the seriousness of this crime. Software piracy happens in diverse ways, which include hard-disk loading, softlifting, counterfeit goods, rental software, and bulletin board piracy [1]. Original licensed software has many advantages, including assurance that the disk carries no virus, regular software upgrades, consistent technical support, complete documentation, and quality assurance.
Different advanced techniques are used for the detection and prevention of software theft, such as software watermarking and software fingerprinting [2–9]. Software watermarks assert ownership of a program, while fingerprints are used to track intellectual property. Besides these techniques, software birthmarking is a property based approach that identifies the inherent properties of a program to check and show the originality of software. Most research on software birthmarks focuses on how to describe the appropriate properties for detecting software theft.
The contribution of this paper is to estimate software birthmarks in order to show their effectiveness. The estimation is based on the well-defined properties of birthmarks. The method provides an intelligent solution to estimate two commonly used properties, credibility and resilience, which in turn provide estimates of the birthmark. For this purpose, a fuzzy model has been designed, based on membership functions and fuzzy rules, to provide an appropriate estimation for software birthmarks.

Hindawi Publishing Corporation, The Scientific World Journal, Volume 2015, Article ID 579390, 8 pages, http://dx.doi.org/10.1155/2015/579390
The structure of the rest of the paper is as follows. In Section 2, research on software birthmarks is discussed. Section 3 gives a detailed description of materials and methods, including the details of software birthmarks, classifications of birthmarks, and a comparison with watermarks. The proposed methodology for estimating software birthmarks is presented in Section 4. Results and discussion of the method are presented in Section 5. The paper is concluded in Section 6.
2 Related Work
Until now, researchers have considered two important properties of a software birthmark to evaluate its effectiveness: credibility and resilience. Zeng et al. [10] report that few theoretical frameworks are available that properly analyze and verify the success of software birthmarks; evaluation is mainly done through experiments. They presented a semantics based abstract interpretation framework that describes two important properties of software birthmarks, namely, credibility and resilience. The effectiveness of the framework is verified with the help of a static n-gram birthmark and a static API birthmark. Myles and Collberg [11] presented a technique called “Whole Program Path Birthmarking” for detecting software theft. The technique is based on the complete control flow of the program. Two important properties, credibility and tolerance, are considered to evaluate the efficiency of the technique. The authors demonstrate that the whole program path birthmark is more resilient than existing techniques; furthermore, even if an embedded watermark is destroyed by program transformation, the birthmark can still identify the theft. Park et al. [8] used a static API trace birthmark for detecting Java theft. The method evaluates the birthmark in terms of credibility and resilience, and the experimental results show that the static API birthmark can detect modules of two packages where the other birthmark fails.
Kakimoto et al. [12] analyzed birthmark similarities in ArgoUML and visualized them using multidimensional scaling. Chan et al. [13] proposed a dynamic software birthmark system for Java programs based on the object reference graph. The method was evaluated on large programs, most of them megabytes in size, and the results showed that it was useful in detecting code theft. Wang et al. [14] borrowed CHI (χ² statistics), a feature selection method from text classification, to introduce instruction-word selection for software birthmarks. The algorithm creates sample programs for the protected program and extracts instruction words from the samples according to an instruction-word library. To find their correlation, the χ² statistic is calculated for each instruction word and program. The experimental results show that the selection algorithm considerably enhances the robustness and credibility of the birthmark. Choi et al. [6] proposed a static API software birthmark for Windows binary executables. They compared 49 Windows executables and showed that their birthmark can differentiate and detect copies. The birthmark was compared with the Windows dynamic birthmark and shown to be more suitable for GUI applications. Lim [15] presented a customized k-gram birthmark that tolerates small changes to programs by applying partial matching of k-grams. The experimental results show that customizing the k-gram birthmark improves the birthmark properties of credibility and resilience. The idea of rule based estimation has been used by Tyagi and Sharma [16] to measure the reliability of component based systems. Their fuzzy rules measure reliability based on four factors: application complexity, reusability, component dependency, and operational profile.
3 Materials and Method
The following are the main concepts used to define the proposed birthmark estimation technique.
3.1 Software Piracy
The software industry has faced huge financial losses due to software piracy. Software piracy is performed by end-users as well as dealers, and it causes serious problems that hinder the success of the international software industry. Piracy of software is a global problem of illegal copying, installation, use, distribution, or sale of software in any manner other than that expressed in the appropriate license agreement. Pirates gain easy profits from the sale of pirated software, which ultimately affects the business of the software industry. Figure 1 shows how software is pirated from its original business market.
Original licensed software offers a number of high-value benefits to customers, including assurance of software quality, availability of upgrades, technical support and user documentation, and lower bandwidth consumption. Pirated software, on the other hand, does not provide such facilities. If an organization uses pirated software, the system may fail, which might put the organization at risk of huge financial loss.
3.2 Software Birthmark
A software birthmark is a unique property of every type of software that can help in detecting software theft. It consists of the intrinsic characteristics of a program or software that can be used to spot theft. Comparing the birthmarks of two programs tells us whether one is a copy of the other. The following definitions of birthmark are given by Tamada et al. [17].
Definition 1 (birthmark). Let $p$ and $q$ be programs, and let $f$ be a function for extracting a set of properties from a program. Then $f(p)$ is a birthmark of $p$ if and only if both of the following conditions hold:
(1) $f(p)$ is obtained only from $p$ itself;
(2) $q$ is a copy of $p \Rightarrow f(p) = f(q)$.
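As an illustration of Definition 1, the minimal Python sketch below treats the names of the functions a program calls, collected with the standard `ast` module, as a hypothetical static birthmark $f$. This extractor (`api_birthmark`) and the two sample programs are assumptions made for illustration, loosely in the spirit of API based birthmarks; they are not the extraction method of Tamada et al. [17]. Renaming a variable, a trivial form of copying, leaves $f$ unchanged, so condition (2) holds for this pair.

```python
from __future__ import annotations

import ast


def api_birthmark(source: str) -> tuple[str, ...]:
    """Hypothetical static birthmark f(p): names of the functions p calls.

    ast.walk visits nodes in a fixed breadth-first order, so the
    resulting tuple is deterministic for a given program structure.
    """
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):          # e.g. print(...)
                names.append(func.id)
            elif isinstance(func, ast.Attribute):   # e.g. math.sqrt(...)
                names.append(func.attr)
    return tuple(names)


# p is an "original" program; q is a copy with a renamed variable.
p = "import math\nx = float(input())\nprint(math.sqrt(x) + abs(x))\n"
q = "import math\nvalue = float(input())\nprint(math.sqrt(value) + abs(value))\n"

# Condition (2) of Definition 1 holds for this copy.
assert api_birthmark(p) == api_birthmark(q)
```

Because only called names are kept, identifier renaming cannot change this toy birthmark, whereas renaming the called functions themselves would.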
Figure 1: Software piracy (functional original software; an attempt at piracy, i.e., theft; distribution of pirated software).

Definition 2 (dynamic birthmark). Let $p$ and $q$ be programs that are given the same input $i$, and let $f$ be a function for extracting characteristics from a program. Then $f(p, i)$ is a dynamic birthmark of $p$ if and only if both of the following conditions hold:
(1) $f(p, i)$ is obtained only from $p$ itself, by executing $p$ with the given input $i$;
(2) $q$ is a copy of $p \Rightarrow f(p, i) = f(q, i)$.
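Definition 2 can be illustrated with a small Python sketch in which a hypothetical extractor $f(p, i)$ runs a program on input $i$ and records the built-in (C-implemented) functions it calls, using the standard `sys.setprofile` hook. Recording only built-in calls is an assumption chosen to keep the example short; real dynamic birthmarks (for example, the heap based approach of [13]) observe much richer runtime behavior.

```python
import sys
from typing import Any, Callable, List, Tuple


def dynamic_birthmark(program: Callable[[Any], Any], i: Any) -> Tuple[str, ...]:
    """Hypothetical dynamic birthmark f(p, i): built-ins called while running p on i."""
    trace: List[str] = []

    def profiler(frame, event, arg):
        # "c_call" events fire when Python code calls a C-implemented function.
        if event == "c_call":
            trace.append(arg.__name__)

    sys.setprofile(profiler)
    try:
        program(i)
    finally:
        sys.setprofile(None)
    # The call that removes the profiler is itself reported; drop it.
    return tuple(name for name in trace if name != "setprofile")


def p(xs):
    total = sum(sorted(xs))
    return len(xs), total


def q(values):  # a copy of p with renamed identifiers
    s = sum(sorted(values))
    return len(values), s


# Condition (2) of Definition 2 for the input i = [3, 1, 2].
assert dynamic_birthmark(p, [3, 1, 2]) == dynamic_birthmark(q, [3, 1, 2])
```

Only the paths actually executed for the chosen input are observed, which is why a dynamic birthmark depends on $i$.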
Dynamic birthmarks cannot cover all program paths: a dynamic birthmark only detects theft along the paths exercised by the given input. A static birthmark, on the other hand, is extracted by static program analysis, which is liable to overestimate program properties.
3.3 Classification of Software Birthmark
Software birthmarks are classified into the following three categories [10].
(i) Instruction Based Software Birthmarks. A software program consists of data and instructions. Instruction sequences reflect program behavior to varying degrees, so it is reasonable to define birthmarks as instruction sequences.
(ii) API Based Software Birthmarks. Several birthmarks are based on observations of the way a program uses the standard API libraries. This feature is not only unique to a program but also hard for an attacker to forge [18].
(iii) Graph Based Software Birthmarks. A software program can be viewed as a graph structure. For instance, functions are represented as control flow graphs, dependencies among statements within functions are represented as dependency graphs, possible calls between functions are represented as call graphs, and inheritance relations between classes are represented as acyclic graphs. As a result, it makes sense to represent a birthmark using graph representations of programs [18].
3.4 Birthmark and Watermark
Software birthmarking is a promising technique for detecting software theft. A birthmark does not embed additional code or information of any form in the original program; it only extracts the inherent characteristics of the original program to detect its originality [11]. A software birthmark only establishes an identity to detect whether a program is a copy of another program; it does not show who the original owner of the program is or who is guilty of software piracy [11]. Software watermarking, in contrast, asserts the ownership of programs by adding extra information to the original program before it is publicly available, and the software is then identified from the embedded information or code. The two techniques can be combined to provide a stronger verification mechanism for detecting theft. Birthmarks can be used where storage space is limited, since watermarking uses extra storage space. Also, watermarks fail in many situations, for example, if an attacker is able to apply obfuscation that destroys them. In such situations, software birthmarks provide evidence of piracy or software theft [11].
4 Proposed Methodology
The following sections define the proposed methodology to estimate software birthmarks.
4.1 Software Birthmark Properties
In order to estimate the success of software birthmarks, researchers typically consider two properties, credibility and resilience [19]. Credibility requires that the birthmarks of two independently written programs be different, whereas resilience requires that the birthmark be preserved, that is, not destroyed, when the program undergoes semantics-preserving transformations.
According to Tamada et al. [17], a software birthmark should satisfy the following two important properties: the birthmarks of two independently implemented programs should be different, and the birthmark should survive semantics-preserving transformations.
Property 1. Let $P$ and $Q$ be two independently written programs that achieve the same task. Then $f$ is credible if $f(P) \neq f(Q)$.
Property 2. Let $P'$ be the program obtained from $P$ by applying a semantics-preserving transformation $T$. Then $f$ is resilient to $T$ if $f(P) = f(P')$.
Property 1 requires that the birthmark not falsely indicate that $Q$ is a copy of $P$. Such false indications can occur with separately implemented programs that achieve the same task.
Property 2 relates to identifying a copy in the presence of transformations: a birthmark should still detect a copy after some transformation has been applied to the program.
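Properties 1 and 2 translate directly into two predicates over a birthmark function. The Python sketch below assumes a toy program representation (a list of instruction strings), a hypothetical birthmark `api_calls` that keeps only instructions beginning with `call`, and a hypothetical transformation `rename_x`; all three are illustrative assumptions, not part of the cited definitions.

```python
from typing import Callable, FrozenSet, List

Program = List[str]                              # toy program: instruction strings
Birthmark = Callable[[Program], FrozenSet[str]]


def api_calls(prog: Program) -> FrozenSet[str]:
    """Hypothetical birthmark f: the set of called APIs."""
    return frozenset(ins for ins in prog if ins.startswith("call "))


def is_credible(f: Birthmark, p: Program, q: Program) -> bool:
    """Property 1: independently written P and Q must give f(P) != f(Q)."""
    return f(p) != f(q)


def is_resilient(f: Birthmark, p: Program, t: Callable[[Program], Program]) -> bool:
    """Property 2: f(P) == f(P') where P' = T(P) and T preserves semantics."""
    return f(p) == f(t(p))


def rename_x(prog: Program) -> Program:
    """Semantics-preserving transformation T: rename the variable x to a."""
    return [" ".join("a" if tok == "x" else tok for tok in ins.split()) for ins in prog]


P = ["load x", "call sqrt", "store y", "call print"]
Q = ["call input", "call int", "call print"]     # an independently written program

print(is_credible(api_calls, P, Q), is_resilient(api_calls, P, rename_x))
```

In practice neither property is a yes/no check over a single pair: credibility and resilience are usually reported as rates or similarity scores over many program pairs and transformations, which is what the fuzzy model in the following sections takes as input.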
Figure 2 shows the properties of software birthmarks.

Figure 2: Properties based software birthmarks (credibility + resilience = software birthmark).

In the existing literature on software birthmarks, there is no model that exactly estimates the birthmark of software based on the properties of credibility and resilience. The proposed methodology helps to estimate the birthmarks of software based on these properties.
4.2 Fuzzy Logic
The concept of fuzzy logic was developed by Zadeh in 1965 [20]. It is a mathematical tool for managing uncertain and imprecise information. Fuzzy set theory is used to solve diverse problems in different fields of daily life, and it helps provide solutions for problems that are complicated to model. A fuzzy set is an extension of a traditional (crisp) set that is described by a membership function, and it is extremely beneficial for decision making in uncertain and vague situations. Decisions can be made with qualitative variables (low, strong, very strong, etc.) instead of quantitative variables (i.e., numbers), and these qualitative variables still allow precise modeling. Inputs and outputs have membership degrees in the interval [0, 1].
In the proposed method, five membership functions are defined: mf1 for the range 0–19, mf2 for 20–39, mf3 for 40–59, mf4 for 60–79, and mf5 for 80–100. To model fuzziness, triangular membership functions are defined and used to represent weights. A triangular membership function has three parameters $(l, m, u)$ with $l \le m \le u$.
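For reference, a triangular membership function with parameters $(l, m, u)$ is conventionally defined as follows. The paper does not list the exact parameters of mf1–mf5, so the worked value uses hypothetical parameters $(l, m, u) = (0.3, 0.5, 0.7)$ for mf3 on an input scale rescaled to $[0, 1]$.

```latex
\mu_{(l,m,u)}(x) =
\begin{cases}
0, & x \le l,\\[2pt]
\dfrac{x - l}{m - l}, & l < x \le m,\\[6pt]
\dfrac{u - x}{u - m}, & m < x < u,\\[2pt]
0, & x \ge u,
\end{cases}
\qquad
\mu_{(0.3,\,0.5,\,0.7)}(0.4) = \frac{0.4 - 0.3}{0.5 - 0.3} = 0.5 .
```

With overlapping triangles, an input near a range boundary belongs partially to two neighboring membership functions, which is what lets several rules fire at once.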
Details of the fuzzy logic concept are given in Zadeh [20]; the major parts of a fuzzy system are as follows. The first phase is fuzzification, which transforms crisp input values into continuous degrees of membership. The fuzzified inputs are then processed in the fuzzy domain based on the designed rules. Lastly, defuzzification transforms the resulting fuzzy output back into a real number.
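The three phases can be written compactly for a two-input Mamdani system such as the one used here. The operators below (minimum for AND and implication, maximum for aggregation, centroid defuzzification) are the defaults of MATLAB's Mamdani systems; the paper does not state which operators were used, so they are an assumption. Here $c$ and $s$ are the crisp credibility and resilience inputs, rule $r$ has antecedent sets $A_r$ and $B_r$ and consequent set $C_r$, $w_r$ is its firing strength, and $y^{*}$ is the crisp output.

```latex
w_r = \min\bigl(\mu_{A_r}(c),\ \mu_{B_r}(s)\bigr), \qquad
\mu_{\text{out}}(y) = \max_r \min\bigl(w_r,\ \mu_{C_r}(y)\bigr), \qquad
y^{*} = \frac{\int y\,\mu_{\text{out}}(y)\,dy}{\int \mu_{\text{out}}(y)\,dy}
```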
4.3 Rules Based Approach to Estimate Software Birthmark
Estimating software birthmarks is an essential part of software system development to guard against theft of the software system. Most software theft threats are faced during the implementation of the software, and developers are still unsure how to handle such situations. If the birthmarks of the system are estimated, one can more easily make decisions about alternative designs. The proposed methodology, based on fuzzy concepts, provides an estimation model for software birthmarks. Initially, the inputs (properties of the birthmark) on which the birthmark is to be estimated are selected. On the basis of these inputs, the membership functions are plotted. A membership function identifies the degree to which a value (data) belongs to a particular area (data range). Five membership functions were plotted: mf1, mf2, mf3, mf4, and mf5. The inputs and membership functions are combined in the rule editor, which forms the fuzzy rules. A fuzzy inference system model is then obtained from the membership functions and rules.
4.3.1 Algorithm for Designing a Rule Based Model
The following are the steps to design the proposed model:
(1) Perform domain analysis on software birthmarks.
(2) Identify the properties of software birthmarks on which the birthmark is to be estimated.
(3) Establish an input database for these properties.
(4) Design the fuzzy inference system based on these properties (inputs).
(5) Define the membership functions for these properties (for both inputs and output).
(6) Design the fuzzy rules based on the membership functions.
(7) Obtain a fuzzy inference system (model to estimate the birthmark).
(8) Estimate the inputs accordingly.
The graphical representation of the algorithm is given in Figure 3.
The proposed work for estimating software birthmarks has been carried out using the MATLAB Fuzzy Logic Toolbox [21].
The different membership combinations are given in Table 1.
The fuzzy rules and model of the proposed methodology are given in Figure 4.
The proposed model is explained in more detail in Figure 5.
Figure 3: Graphical representation of the proposed algorithm (a flowchart of steps (1)–(8), from domain analysis for software birthmark to estimating the inputs).

Figure 4: Proposed fuzzy rules model (inputs credibility and resilience, up to 25 rules, and summed results).

Figure 5: Detailed fuzzy rules model (inputs, membership functions, rules with the logical operations and, or, not, and output).

The rules are as follows:
If (credibility is mf1 (0–19)) and (resilience is mf5 (80–100)) then (output is (0–19)) (0)
If (credibility is mf1 (0–19)) and (resilience is mf4 (60–79)) then (output is (20–39)) (0.2)
If (credibility is mf1 (0–19)) and (resilience is mf3 (40–59)) then (output is (40–59)) (0.4)
If (credibility is mf1 (0–19)) and (resilience is mf2 (20–39)) then (output is (60–79)) (0.6)
If (credibility is mf1 (0–19)) and (resilience is mf1 (0–19)) then (output is (80–100)) (0.8)
If (credibility is mf5 (80–100)) and (resilience is mf1 (0–19)) then (output is (80–100)) (0.8)
If (credibility is mf4 (60–79)) and (resilience is mf1 (0–19)) then (output is (60–79)) (0.6)
If (credibility is mf3 (40–59)) and (resilience is mf1 (0–19)) then (output is (40–59)) (0.4)
If (credibility is mf2 (20–39)) and (resilience is mf1 (0–19)) then (output is (20–39)) (0.2)
If (credibility is mf2 (20–39)) and (resilience is mf2 (20–39)) then (output is (80–100)) (0.8)
If (credibility is mf3 (40–59)) and (resilience is mf3 (40–59)) then (output is (80–100)) (0.8)
If (credibility is mf4 (60–79)) and (resilience is mf4 (60–79)) then (output is (80–100)) (0.8)
If (credibility is mf5 (80–100)) and (resilience is mf5 (80–100)) then (output is (80–100)) (0.8)
If (credibility is mf2 (20–39)) and (resilience is mf5 (80–100)) then (output is (20–39)) (0.2)
If (credibility is mf3 (40–59)) and (resilience is mf5 (80–100)) then (output is (40–59)) (0.4)
If (credibility is mf4 (60–79)) and (resilience is mf5 (80–100)) then (output is (60–79)) (0.6)
If (credibility is mf3 (40–59)) and (resilience is mf4 (60–79)) then (output is (60–79)) (0.6)
If (credibility is mf2 (20–39)) and (resilience is mf4 (60–79)) then (output is (40–59)) (0.4)
If (credibility is mf2 (20–39)) and (resilience is mf3 (40–59)) then (output is (40–59)) (0.4)
If (credibility is mf4 (60–79)) and (resilience is mf3 (40–59)) then (output is (60–79)) (0.6)
If (credibility is mf5 (80–100)) and (resilience is mf3 (40–59)) then (output is (80–100)) (0.8)
If (credibility is mf4 (60–79)) and (resilience is mf2 (20–39)) then (output is (60–79)) (0.6)
If (credibility is mf3 (40–59)) and (resilience is mf2 (20–39)) then (output is (40–59)) (0.4)

Figure 6: Proposed fuzzy inference system (Mamdani type, named “Estimating”, with inputs credibility and resilience and one output, each having five membership functions).

Figure 7: Surface view of inputs and output, generated in MATLAB (axes: resilience, credibility, and output).
Based on the above rules, a fuzzy inference system is obtained for estimating software birthmarks, as given in Figure 6.
Figure 7 shows the surface view of the inputs and output.
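The rule base listed above can be sketched in plain Python as a stand-in for the MATLAB model. Only the 23 rule consequents are taken from the text; the triangular parameters (peaks at the centres of the five ranges rescaled to [0, 1], half-width 0.2) and the min/max/centroid operators are assumptions, since the paper does not publish them. Two combinations (credibility mf5 with resilience mf2 or mf4) have no listed rule; the sketch simply contributes nothing for them and raises an error only if no rule fires at all.

```python
from typing import Dict, Tuple

# Assumed triangular MFs (l, m, u) on [0, 1]: peaks at the centres of the
# ranges 0-19, 20-39, 40-59, 60-79, 80-100 (rescaled), half-width 0.2.
# These parameters are hypothetical; the paper does not publish them.
MFS: Dict[int, Tuple[float, float, float]] = {
    k: (c - 0.2, c, c + 0.2) for k, c in enumerate((0.1, 0.3, 0.5, 0.7, 0.9), start=1)
}

# (credibility mf, resilience mf) -> output mf, transcribed from the 23 rules.
RULES: Dict[Tuple[int, int], int] = {
    (1, 5): 1, (1, 4): 2, (1, 3): 3, (1, 2): 4, (1, 1): 5,
    (5, 1): 5, (4, 1): 4, (3, 1): 3, (2, 1): 2, (2, 2): 5,
    (3, 3): 5, (4, 4): 5, (5, 5): 5, (2, 5): 2, (3, 5): 3,
    (4, 5): 4, (3, 4): 4, (2, 4): 3, (2, 3): 3, (4, 3): 4,
    (5, 3): 5, (4, 2): 4, (3, 2): 3,
}


def trimf(x: float, l: float, m: float, u: float) -> float:
    """Triangular membership degree of x for parameters l < m < u."""
    return max(0.0, min((x - l) / (m - l), (u - x) / (u - m)))


def estimate(credibility: float, resilience: float, n: int = 1001) -> float:
    """Mamdani inference: min for AND and implication, max aggregation, centroid."""
    ys = [i / (n - 1) for i in range(n)]  # discretized output universe [0, 1]
    aggregated = [0.0] * n
    for (c_mf, r_mf), out_mf in RULES.items():
        strength = min(trimf(credibility, *MFS[c_mf]), trimf(resilience, *MFS[r_mf]))
        if strength > 0.0:
            for j, y in enumerate(ys):
                clipped = min(strength, trimf(y, *MFS[out_mf]))
                aggregated[j] = max(aggregated[j], clipped)
    total = sum(aggregated)
    if total == 0.0:
        raise ValueError("no rule fired for these inputs")
    return sum(y * mu for y, mu in zip(ys, aggregated)) / total


print(round(estimate(0.4, 0.8), 3))
```

With these assumed functions, the inputs (0.4, 0.8) each sit halfway between two neighboring triangles, so four rules fire with equal strength and the aggregated output set is symmetric about 0.5; the centroid is therefore very close to 0.5. This agrees with the 0.500 reported in the case study, but it should not be read as a reconstruction of the authors' exact model.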
4.4 Inputs Estimation
Once the fuzzy rules model is designed, inputs are given to the model according to the customer requirements, and the model generates the output based on the fuzzy rules. Details of the proposed system's inputs and output are given in Table 2.
4.5 Evaluation of the Model (Case Study)
The present research work has been validated by a case study of a small module of an Android application. The Android radiocalc module consists of 109 lines of code, and the methodology has been applied to a similar Android application. The birthmark of the module has been estimated based on the properties of resilience and credibility, using the k-gram based birthmark similarity technique [22]. By performing various experiments, we found that as the value of $k$ increases, the birthmark similarity decreases, while for very small values of $k$ the birthmark similarity was not satisfactory. For $k = 5$, the experiment gave good results in terms of similarity and runtime overhead. The resulting similarity for the above-mentioned application with $k = 5$ was 40%.
We applied the SandMark [23] and CodeShield [24] tools to the above application for code obfuscation. To find the value of resilience, the similarity was measured after obfuscation, giving 80% for $k = 5$. CodeShield provides name obfuscation, removal of debugging information, and some types of control-flow obfuscation, while SandMark does not include automatic obfuscation. The similarity was computed through k-grams; for CodeShield, the similarity decreased as $k$ increased across numerous transformations. Table 3 shows the inputs and their values for the proposed model.
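The k-gram technique of [22] can be sketched as follows, under stated assumptions: a program is abstracted as a list of opcode names, its birthmark is the set of length-$k$ opcode windows, and similarity is the containment ratio $|A \cap B| / |A|$, the fraction of the original's k-grams found in the suspect program. The opcode sequences are hypothetical, and the exact similarity measure used in [22] and in this case study may differ. The toy example inserts no-op instructions, a simple obfuscation, to show the trend described above: longer k-grams are more easily broken by a transformation.

```python
from typing import FrozenSet, Sequence, Tuple


def kgram_birthmark(ops: Sequence[str], k: int) -> FrozenSet[Tuple[str, ...]]:
    """Set of all length-k windows over an opcode sequence."""
    return frozenset(tuple(ops[i:i + k]) for i in range(len(ops) - k + 1))


def similarity(original: Sequence[str], suspect: Sequence[str], k: int) -> float:
    """Fraction of the original's k-grams that also occur in the suspect."""
    a = kgram_birthmark(original, k)
    b = kgram_birthmark(suspect, k)
    return len(a & b) / len(a) if a else 0.0


# Hypothetical opcode sequences; the obfuscated copy has two inserted "nop"s.
original = ["load", "push", "add", "store", "load", "call", "ret"]
obfuscated = ["load", "push", "nop", "add", "store", "load", "nop", "call", "ret"]

for k in range(1, 6):
    print(k, round(similarity(original, obfuscated, k), 2))
```

Here the similarity falls from 1.0 at $k = 1$ to 0.0 at $k = 4$ and $k = 5$. The decrease is a property of this example rather than a guarantee for every pair of sequences; choosing $k$ trades sensitivity to copying against tolerance of such transformations.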
The inputs to the fuzzy model are defined as follows: credibility is 0.4 (40%) and resilience is 0.8 (80%). These inputs are given to the fuzzification model (fuzzy inference system).

Table 3: Inputs and values for the proposed model ($k = 5$).
Inputs | Value in % | Value for proposed model
Credibility | 40 | 0.4
Resilience | 80 | 0.8

A credibility of 0.4 falls in the range of membership function mf3 (40–59), and a resilience of 0.8 falls in the range of mf5 (80–100); this combination corresponds to the rule “If (credibility is mf3 (40–59)) and (resilience is mf5 (80–100)) then (output is (40–59))”. Based on the designed model, the system gives an output of 0.500, which lies in that output range. From this result, one can make a decision about the birthmark of the software.
5 Results and Discussion
A fuzzy inference system is designed that models the problem and, in turn, estimates the birthmark of the software. Inputs are assigned to the model to check and estimate the software birthmark in terms of credibility and resilience; the designed model evaluates the given inputs and produces a result. On the basis of this result, one can check the estimate of the software birthmark for the properties of credibility and resilience. To check the validity of the proposed model, the inputs were given as out = evalfis([0.4 0.8], fismat), which produced an output of 0.500, the estimate of the software birthmark. This result indicates the estimated birthmark of the software with respect to its desired properties. Other fuzzy techniques, such as fuzzy C-means clustering, are also used [25].
6 Conclusion
Software theft is a global problem of copying, stealing, and misusing software without a proper license agreement. Software birthmarking is a capable technique for detecting the theft of software systems; a software birthmark is an intrinsic characteristic of software used to detect the similarity of software. The estimation of software birthmarks can play a key role in understanding the effectiveness of a birthmark. In this research, fuzzy logic, an efficient and powerful tool for tackling issues of uncertainty, has been used to estimate software birthmarks. The method is based on fuzzy rules that were designed from the fuzzy membership functions. Different techniques are used in practice, but all are based on known information, whereas situations of uncertainty also arise in practice. The proposed model works well in cases of uncertainty and with unknown information. The model is based on two properties of software birthmarks, credibility and resilience, and has been validated using some Android applications. Various experiments have been performed using different existing code obfuscation tools, and software birthmarks have been estimated. The results produced by the proposed process show that the method is efficient and provides satisfactory results. The approach has been tested only for credibility and resilience, as these two properties are considered the most important properties of software birthmarks; therefore, they were selected here for model testing. In the future, the model can be extended to a different set of properties.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] D. Curtis, “Software piracy and copyright protection,” in Proceedings of the Idea/Microelectronics Conference Record (WESCON ’94), Anaheim, Calif, USA, 1994.
[2] G. Myles and C. Collberg, “Software watermarking through register allocation: implementation, analysis, and attacks,” in Information Security and Cryptology – ICISC 2003, vol. 2971, pp. 274–293, Springer, Berlin, Germany, 2003.
[3] C. Collberg and T. R. Sahoo, “Software watermarking in the frequency domain: implementation, analysis, and attacks,” Journal of Computer Security, vol. 13, no. 5, pp. 721–755, 2005.
[4] F. Liu, B. Lu, and X. Luo, “A chaos-based robust software watermarking,” in Information Security Practice and Experience, vol. 3903, pp. 355–366, Springer, Berlin, Germany, 2006.
[5] Y. Zeng, F. Liu, X. Luo, and C. Yang, “Software watermarking through obfuscated interpretation: implementation and analysis,” Journal of Multimedia, vol. 6, no. 4, pp. 329–340, 2011.
[6] S. Choi, H. Park, H.-I. Lim, and T. Han, “A static API birthmark for Windows binary executables,” Journal of Systems and Software, vol. 82, no. 5, pp. 862–873, 2009.
[7] P. Heewan, C. Seokwoo, L. Hyun-Il, and H. Taisook, “Detecting code theft via a static instruction trace birthmark for Java methods,” in Proceedings of the 6th IEEE International Conference on Industrial Informatics (INDIN ’08), pp. 551–556, Daejeon, Republic of Korea, July 2008.
[8] H. Park, S. Choi, H.-I. Lim, and T. Han, “Detecting Java theft based on static API trace birthmark,” in Advances in Information and Computer Security, vol. 5312 of Lecture Notes in Computer Science, pp. 121–135, Springer, Berlin, Germany, 2008.
[9] H. Park, H.-I. Lim, S. Choi, and T. Han, “Detecting common modules in Java packages based on static object trace birthmark,” Computer Journal, vol. 54, no. 1, pp. 108–124, 2011.
[10] Y. Zeng, F. Liu, X. Luo, and S. Lian, “Abstract interpretation-based semantic framework for software birthmark,” Computers & Security, vol. 31, no. 4, pp. 377–390, 2012.
[11] G. Myles and C. Collberg, “Detecting software theft via whole program path birthmarks,” in Information Security, vol. 3225, pp. 404–415, Springer, Berlin, Germany, 2004.
[12] T. Kakimoto, A. Monden, Y. Kamei, H. Tamada, M. Tsunoda, and K.-I. Matsumoto, “Using software birthmarks to identify similar classes and major functionalities,” in Proceedings of the International Workshop on Mining Software Repositories (MSR ’06), pp. 171–172, ACM, Shanghai, China, May 2006.
[13] P. P. F. Chan, L. C. K. Hui, and S. M. Yiu, “Dynamic software birthmark for Java based on heap memory analysis,” in Communications and Multimedia Security, vol. 7025, pp. 94–107, Springer, Berlin, Germany, 2011.
[14] Y. Wang, F. Liu, D. Gong, B. Lu, and S. Ma, “CHI based instruction-words software birthmark selection,” in Proceedings of the 4th International Conference on Multimedia and Security (MINES ’12), pp. 892–895, November 2012.
[15] H.-I. Lim, “Customizing k-gram based birthmark through partial matching in detecting software thefts,” in Proceedings of the 37th IEEE Annual Computer Software and Applications Conference Workshops (COMPSACW ’13), pp. 1–4, July 2013.
[16] K. Tyagi and A. Sharma, “A rule-based approach for estimating the reliability of component-based systems,” Advances in Engineering Software, vol. 54, pp. 24–29, 2012.
[17] H. Tamada, M. Nakamura, A. Monden, and K.-I. Matsumoto, “Design and evaluation of birthmarks for detecting theft of Java programs,” in Proceedings of the IASTED International Conference on Software Engineering (IASTED SE ’04), pp. 569–575, 2004.
[18] C. Collberg and J. Nagra, Surreptitious Software: Obfuscation, Watermarking, and Tamperproofing for Software Protection, Addison-Wesley, Boston, Mass, USA, 1st edition, 2009.
[19] G. M. Myles, Software theft detection through program identification [Ph.D. thesis], Department of Computer Science, University of Arizona, Tucson, Ariz, USA, 2006.
[20] L. A. Zadeh, “Fuzzy logic,” Computer, vol. 21, no. 4, pp. 83–93, 1988.
[21] MATLAB 7.10.0, The MathWorks, Natick, Mass, USA, 2010.
[22] G. Myles and C. Collberg, “K-gram software birthmarks,” in Proceedings of the 20th Annual ACM Symposium on Applied Computing, pp. 314–318, ACM, Santa Fe, NM, USA, March 2005.
[23] C. Collberg, G. Myles, and A. Huntwork, “Sandmark – a tool for software protection research,” IEEE Security and Privacy, vol. 1, no. 4, pp. 40–49, 2003.
[24] C. P. Ltd., CodeShield Java Byte Obfuscator, 2014, httpwwwxmarkscomssitewwwcodingartcomcodeshieldhtml
[25] L. Saeidiasl, T. Ahmad, N. Alias, and M. Ghanbari, “Comparison of EEG source localization using meromorphic approximation to fuzzy C-mean,” Malaysian Journal of Fundamental and Applied Science, vol. 9, pp. 215–220, 2013.
purpose, a fuzzy model has been designed that is based on membership functions and fuzzy rules to provide an appropriate estimation of software birthmarks.
The rest of the paper is structured as follows. Section 2 discusses research on software birthmarks. Section 3 gives a detailed description of materials and method, including the details of software birthmarks, classifications of birthmarks, and a comparison with watermarks. The proposed methodology for estimating software birthmarks is presented in Section 4. Results and discussion of the method are presented in Section 5, and the paper is concluded in Section 6.
2 Related Work
Until now, researchers have considered two important properties of a software birthmark to evaluate its effectiveness, namely, credibility and resilience. Zeng et al. [10] report that few theoretical frameworks are available that properly analyze and verify the success of software birthmarks, and that birthmarks are mainly evaluated through experiments. They presented a semantics-based abstract interpretation framework that describes these two properties of software birthmarks, credibility and resilience, and verified the effectiveness of the framework using a static n-gram birthmark and a static API birthmark. Myles and Collberg [11] presented a technique called "Whole Program Path Birthmarking" for detecting software theft, which is based on the complete control flow of the program. Two properties, credibility and tolerance, were considered to evaluate the efficiency of the technique, and the whole program path birthmark was shown to be more resilient than existing techniques. The authors also showed that even if an embedded watermark is destroyed by program transformation, the birthmark can still identify the theft. Park et al. [8] used a static API trace birthmark to detect theft of Java programs and evaluated the birthmark in terms of credibility and resilience. Their experimental results show that the static API birthmark can detect common modules of two packages, whereas the other birthmark fails.
Kakimoto et al. [12] analyzed birthmark similarities in ArgoUML and visualized them using multidimensional scaling. Chan et al. [13] proposed a dynamic software birthmark system for Java programs based on the object reference graph. The method was evaluated on large programs, most of them megabytes in size, and the results showed that it was useful in detecting code theft. Wang et al. [14] adopted CHI ($\chi^2$ statistics), which is used for feature selection in text classification, to introduce an instruction-words software birthmark selection. The algorithm generates sample programs for the protected program and extracts instruction words from the sample programs according to an instruction-word library. To find their correlation, the $\chi^2$ statistic is calculated for each instruction word and program. The experimental results show that the selection algorithm considerably enhances the robustness and credibility of the birthmark. Choi et al. [6] proposed a static API software birthmark for Windows binary executables. They compared 49 Windows executables and showed that their birthmark can differentiate programs and detect copies. The birthmark was also compared with a Windows dynamic birthmark and shown to be more suitable for GUI applications. Lim [15] presented a customized k-gram birthmark that tolerates small changes in programs by applying partial matching of k-grams. The experimental results show that customizing the k-gram birthmark improves the birthmark properties of credibility and resilience. The idea of rule-based estimation has been used by Tyagi and Sharma [16], who measured the reliability of component-based systems. Fuzzy rules were designed to measure reliability based on four factors, namely, application complexity, reusability, component dependency, and operational profile.
3 Materials and Method
The following are the main concepts used to define the proposed birthmark estimation technique.
3.1 Software Piracy. The software industry has faced huge financial losses due to software piracy, which is carried out by end-users as well as dealers. Software piracy causes serious problems that hinder the success of the international software industry. It is a global problem of illegal copying, installation, use, distribution, or sale of software in any manner other than that expressed in the appropriate license agreement. Pirates make easy profits from the sale of pirated software, which ultimately affects the business of the software industry. Figure 1 shows how software is pirated from its original business market.
Original licensed software offers a number of high-value benefits to customers, including assurance of software quality, availability of upgrades, technical documentation and manuals, and lower bandwidth consumption. Pirated software, on the other hand, does not provide such facilities. An organization using pirated software risks system failure, which might in turn expose it to huge financial losses.
3.2 Software Birthmark. A software birthmark is a unique property of a piece of software that can help in detecting software theft. It consists of the intrinsic characteristics of a program that can be used to spot theft: comparing the birthmarks of two programs tells us whether one program is a copy of the other. The following definitions of birthmark are given by Tamada et al. [17].
Definition 1 (birthmark). Let $p$ and $q$ be programs, and let $f$ be a method for extracting a set of properties from a program. Then $f(p)$ is a birthmark of $p$ if and only if both of the following conditions are satisfied:
(1) $f(p)$ is obtained only from $p$ itself.
(2) $q$ is a copy of $p \Rightarrow f(p) = f(q)$.
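To make Definition 1 concrete, the sketch below treats a program as a list of opcode strings and uses an opcode-frequency histogram as the extraction function $f$. Both the program representation and the histogram birthmark are illustrative assumptions, not the birthmark used in this paper; the check exercises condition (2), that a verbatim copy yields an identical birthmark.

```python
from collections import Counter


def opcode_histogram(program):
    """Hypothetical birthmark f(p): how often each opcode occurs in p.

    It is computed from p alone, in line with condition (1).
    """
    return Counter(program)


def copy_condition_holds(f, p, q):
    """Condition (2) for a pair known to be copies: f(p) must equal f(q)."""
    return f(p) == f(q)


p = ["iload_1", "iload_2", "iadd", "ireturn"]  # toy bytecode-like opcodes
q = list(p)                                     # a verbatim copy of p
print(copy_condition_holds(opcode_histogram, p, q))  # True
```

Condition (2) is only an implication: equal birthmarks are evidence of copying, not proof, which is why the credibility property discussed in Section 4.1 is needed.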
Definition 2 (dynamic birthmark). Let $p$ and $q$ be programs, let $i$ be an input given to both programs, and let $f$ be a method for extracting characteristics from a program. Then $f(p, i)$ is a dynamic birthmark of $p$ if and only if both of the following conditions are satisfied:
(1) $f(p, i)$ is obtained only from $p$ itself by executing $p$ with the given input $i$.
(2) $q$ is a copy of $p \Rightarrow f(p, i) = f(q, i)$.

Figure 1: Software piracy (functional original software, attempt at piracy, theft, and distribution of pirated software).
Dynamic birthmarks cannot cover all program paths, since a dynamic birthmark reflects only the behavior exercised by the given input. Static birthmarks, on the other hand, are extracted by static program analysis, which is liable to overestimate program properties.
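The limitation above, that a dynamic birthmark reflects only the paths its input exercises, can be seen in a small sketch of Definition 2. Here a "program" is a Python function that appends the names of the operations it performs to a trace list, and $f(p, i)$ is that trace; this representation is a hypothetical stand-in for recording, for example, API calls at run time.

```python
def p(i, trace):
    """Toy program: records each operation it performs in `trace`."""
    trace.append("read")
    if i > 0:
        trace.append("compute")
    else:
        trace.append("skip")
    trace.append("write")


def dynamic_birthmark(program, i):
    """f(p, i): the operation trace observed when running `program` on input i."""
    trace = []
    program(i, trace)
    return tuple(trace)


q = p  # a verbatim copy behaves identically on the same input
assert dynamic_birthmark(p, 5) == dynamic_birthmark(q, 5)  # condition (2)
print(dynamic_birthmark(p, 5))   # ('read', 'compute', 'write')
print(dynamic_birthmark(p, -1))  # ('read', 'skip', 'write'): another path
```

The birthmark for input 5 says nothing about the "skip" branch, which is exactly the coverage gap of dynamic birthmarks.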
3.3 Classification of Software Birthmarks. Software birthmarks are classified into the following three categories [10].
(i) Instruction-Based Software Birthmarks. A software program consists of data and instructions. Since instruction sequences reflect program behavior to varying degrees, it is reasonable to define birthmarks as instruction sequences.
(ii) API-Based Software Birthmarks. Several birthmarks are based on observing the way a program uses the standard API libraries. This feature is not only unique to a program but also difficult for an attacker to forge [18].
(iii) Graph-Based Software Birthmarks. A software program has a graph-like structure. For instance, a function is represented as a control flow graph, dependencies among the statements in a function are represented as a dependency graph, possible calls between functions are represented as a call graph, and inheritance relationships between classes are represented as an acyclic graph. It therefore makes sense to represent a birthmark using graph representations of programs [18].
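As a rough illustration of the three categories, the sketch below extracts an instruction-based, an API-based, and a graph-based (call graph) birthmark from one toy program. The program format, a dictionary from function names to instruction strings with hypothetical "call:" and "api:" prefixes, is an assumption made for this example; real tools operate on bytecode or binaries.

```python
# Toy program: function name -> list of instructions. The "call:" and "api:"
# prefixes are a hypothetical encoding chosen for this sketch.
PROGRAM = {
    "main": ["push", "call:helper", "api:System.out.println", "ret"],
    "helper": ["load", "add", "api:Math.max", "ret"],
}


def instruction_birthmark(program):
    """Instruction-based: per-function opcode sequence, operands dropped."""
    return {name: tuple(ins.split(":")[0] for ins in body)
            for name, body in program.items()}


def api_birthmark(program):
    """API-based: standard-library calls in the order they appear."""
    return [ins[len("api:"):] for body in program.values()
            for ins in body if ins.startswith("api:")]


def call_graph_birthmark(program):
    """Graph-based: the set of (caller, callee) edges of the call graph."""
    return {(name, ins[len("call:"):]) for name, body in program.items()
            for ins in body if ins.startswith("call:")}


print(instruction_birthmark(PROGRAM))
print(api_birthmark(PROGRAM))         # ['System.out.println', 'Math.max']
print(call_graph_birthmark(PROGRAM))  # {('main', 'helper')}
```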
3.4 Birthmark and Watermark. Software birthmarking is a promising technique for detecting software theft. A birthmark does not embed additional code or information of any form in the original program; it only extracts the inherent characteristics of the original program to establish its originality [11]. A software birthmark only establishes an identity to detect whether a program is a copy of another program. It does not show who the original owner of the program is or who is guilty of software piracy [11]. Software watermarking, by contrast, asserts the ownership of a program by adding extra information to the original program before it is made publicly available; the software is then identified from the embedded information or code. Both techniques can be combined to provide a stronger verification mechanism for detecting theft. Birthmarks can be used where storage space is limited, since watermarking requires extra storage space. Also, watermarks fail in many situations, for example, if an attacker is able to apply obfuscation that destroys them. In such situations, software birthmarks can still provide evidence of piracy or software theft [11].
4 Proposed Methodology
The following sections define the proposed methodology to estimate software birthmarks.
4.1 Software Birthmark Properties. To estimate the success of software birthmarks, researchers typically consider two properties, credibility and resilience [19]. Credibility requires that the birthmarks of two independently written programs be different, whereas resilience requires that the birthmark be preserved, and not destroyed, when the program is transformed.
According to Tamada et al. [17], a software birthmark should satisfy the following two properties, the first of which requires that two independently implemented programs have different birthmarks.
Property 1. Let $P$ and $Q$ be two independently written programs that accomplish the same task. Then $f$ is credible if $f(P) \neq f(Q)$.
Property 2. Let $P'$ be the program obtained from $P$ by applying a semantics-preserving transformation $T$. Then $f$ is resilient to $T$ if $f(P) = f(P')$.
Property 1 guards against birthmarks falsely indicating that $Q$ is a copy of $P$. Such false positives can occur with separately implemented programs that accomplish the same task.
Property 2 relates to identifying a copy in the presence of transformations: ideally, a birthmark should still detect a copy after some transformation has been applied to the program.
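Properties 1 and 2 translate directly into two checks. The sketch below assumes a k-gram birthmark over opcodes (operands ignored), a toy pair of independently written stack programs that both compute x + y, and consistent variable renaming as the semantics-preserving transformation $T$; all three are illustrative choices, not the setup evaluated in this paper.

```python
RENAMING = {"x": "a", "y": "b"}  # hypothetical consistent renaming used as T


def kgram_birthmark(program, k=2):
    """Set of opcode k-grams; operands such as variable names are ignored."""
    ops = [op for op, _ in program]
    return {tuple(ops[i:i + k]) for i in range(len(ops) - k + 1)}


def is_credible(f, p, q):
    """Property 1: independently written P and Q should have different birthmarks."""
    return f(p) != f(q)


def is_resilient(f, p, transform):
    """Property 2: f(P) should equal f(P') where P' = T(P)."""
    return f(p) == f(transform(p))


def rename_variables(program):
    """Semantics-preserving T: rename variables consistently (x -> a, y -> b)."""
    return [(op, RENAMING.get(arg, arg)) for op, arg in program]


# P and Q both compute x + y but were written independently (toy stack code).
P = [("load", "x"), ("load", "y"), ("add", None), ("ret", None)]
Q = [("push", 0), ("load", "x"), ("add", None),
     ("load", "y"), ("add", None), ("ret", None)]

print(is_credible(kgram_birthmark, P, Q))                  # True
print(is_resilient(kgram_birthmark, P, rename_variables))  # True
```

Because this birthmark ignores operands, it passes the renaming check trivially; transformations that reorder or insert instructions are a harder test of resilience.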
Figure 2 shows the properties of a software birthmark.

Figure 2: Properties-based software birthmarks (credibility + resilience = software birthmark).

In the existing literature on software birthmarks, there is no model that estimates the birthmark of software based on the properties of credibility and resilience. The proposed methodology helps to estimate software birthmarks based on these properties.
4.2 Fuzzy Logic. The concept of fuzzy logic was developed by Zadeh in 1965 [20]. It is a mathematical tool for managing uncertain and imprecise information, and fuzzy set theory is used to solve diverse problems in many fields. Fuzzy tools help provide solutions to problems that are complicated to model. A fuzzy set is an extension of a traditional set that is described by a membership function, and it is extremely useful for decision making in uncertain and vague situations. Here, decisions can be made using qualitative variables (low, strong, very strong, etc.) instead of quantitative variables (i.e., numbers), and these qualitative variables allow such situations to be modeled precisely. The inputs and outputs have membership degrees in the interval $[0, 1]$.
In the proposed method, five membership functions are defined: mf1 over the range 0–19, mf2 over 20–39, mf3 over 40–59, mf4 over 60–79, and mf5 over 80–100. To model fuzziness, triangular membership functions are defined and used to represent weights. A triangular membership function has three parameters $(l, m, u)$ with $l \le m \le u$, where $l$ and $u$ are the feet of the triangle and $m$ is its peak.
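A triangular membership function with parameters $(l, m, u)$ rises linearly from 0 at $l$ to 1 at $m$ and falls back to 0 at $u$. The paper does not list the exact parameters used in MATLAB, so the sketch below assumes, hypothetically, five overlapping triangles on a 0–100 scale with peaks at 10, 30, 50, 70, and 90 and feet 20 units on either side of each peak.

```python
def triangular(x, l, m, u):
    """Triangular membership degree of x for parameters l <= m <= u."""
    if x == m:
        return 1.0
    if x <= l or x >= u:
        return 0.0
    if x < m:
        return (x - l) / (m - l)
    return (u - x) / (u - m)


# Hypothetical (l, m, u) for mf1..mf5 on a 0-100 scale (not the paper's values).
MEMBERSHIP_FUNCTIONS = {
    f"mf{k + 1}": (20 * k - 10, 20 * k + 10, 20 * k + 30) for k in range(5)
}


def fuzzify(x):
    """Degree of membership of a crisp input x in each of mf1..mf5."""
    return {name: triangular(x, *params)
            for name, params in MEMBERSHIP_FUNCTIONS.items()}


print(fuzzify(40))
# {'mf1': 0.0, 'mf2': 0.5, 'mf3': 0.5, 'mf4': 0.0, 'mf5': 0.0}
```

With this assumed overlap, an input of 40 belongs equally to mf2 and mf3 instead of falling wholly into mf3 as the crisp range 40–59 suggests; narrower triangles would shift that balance.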
Details of fuzzy logic are given in Zadeh [20]; the major parts of the fuzzy system are as follows. The first phase is fuzzification, which transforms crisp input values into degrees of membership in the fuzzy sets. These are then processed in the fuzzy domain according to the designed rules. Lastly, defuzzification transforms the resulting fuzzy number back into a real number.
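Of the three phases, defuzzification is the least obvious. The paper does not state which method was used; the sketch below assumes the centroid method (the Fuzzy Logic Toolbox default for Mamdani systems), which returns the balance point $\sum x\,\mu(x) / \sum \mu(x)$ of the output membership function sampled on a uniform grid. The clipped triangle used as input is a hypothetical example.

```python
def triangular(x, l, m, u):
    """Triangular membership degree of x for parameters l <= m <= u."""
    if x == m:
        return 1.0
    if x <= l or x >= u:
        return 0.0
    if x < m:
        return (x - l) / (m - l)
    return (u - x) / (u - m)


def centroid(mu, lo=0.0, hi=100.0, steps=1000):
    """Centroid defuzzification of membership function mu over [lo, hi]."""
    xs = [lo + (hi - lo) * j / steps for j in range(steps + 1)]
    weights = [mu(x) for x in xs]
    total = sum(weights)
    if total == 0:
        raise ValueError("membership function is zero everywhere on the grid")
    return sum(x * w for x, w in zip(xs, weights)) / total


def clipped(x):
    """An output fuzzy set: triangle (30, 50, 70) clipped at firing strength 0.5."""
    return min(0.5, triangular(x, 30, 50, 70))


print(round(centroid(clipped), 6))  # 50.0
```

A symmetric set defuzzifies to its center; asymmetric aggregated sets shift the crisp output toward the side with more mass.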
4.3 Rules-Based Approach to Estimate Software Birthmarks. Estimating software birthmarks is an essential part of software system development for guarding against theft of the software system. Most software theft threats are faced during the implementation of the software, and developers are often uncertain about how to handle such situations. If the birthmarks of the system are estimated, decisions about alternative designs can be made more easily. The proposed methodology, based on fuzzy concepts, provides an estimation model for software birthmarks. First, the inputs (properties of the birthmark) on the basis of which the birthmark is to be estimated are selected, and membership functions are plotted for these inputs. A membership function identifies the degree to which the data belong to a particular area (data range). Five membership functions were plotted, namely, mf1, mf2, mf3, mf4, and mf5. The inputs and membership functions are combined in the rule editor to form the fuzzy rules, and a fuzzy inference system model is obtained from the membership functions and rules.
4.3.1 Algorithm for Designing a Rule-Based Model. The following are the steps to design the proposed model:
(1) Perform domain analysis on software birthmarks.
(2) Identify the properties of software birthmarks on which the birthmark is to be estimated.
(3) Establish an input database for these properties.
(4) Design the fuzzy inference system based on these properties (inputs).
(5) Define the membership functions for these properties (for both inputs and output).
(6) Design the fuzzy rules based on the membership functions.
(7) Obtain a fuzzy inference system (model to estimate the birthmark).
(8) Estimate the inputs accordingly.
The graphical representation of the algorithm is given in Figure 3.
The proposed work for estimating software birthmarks has been carried out using the MATLAB Fuzzy Logic Toolbox [21].
The different membership combinations are given in Table 1.
The fuzzy rules and model in the proposed methodology are given in Figure 4.
The proposed model is explained in more detail in Figure 5.
The rules are as follows:
If (credibility is mf1 (0–19)) and (resilience is mf5 (80–100)) then (output is (0–19)) (0)
If (credibility is mf1 (0–19)) and (resilience is mf4 (60–79)) then (output is (20–39)) (0.2)
If (credibility is mf1 (0–19)) and (resilience is mf3 (40–59)) then (output is (40–59)) (0.4)
If (credibility is mf1 (0–19)) and (resilience is mf2 (20–39)) then (output is (60–79)) (0.6)
If (credibility is mf1 (0–19)) and (resilience is mf1 (0–19)) then (output is (80–100)) (0.8)
If (credibility is mf5 (80–100)) and (resilience is mf1 (0–19)) then (output is (80–100)) (0.8)
If (credibility is mf4 (60–79)) and (resilience is mf1 (0–19)) then (output is (60–79)) (0.6)
If (credibility is mf3 (40–59)) and (resilience is mf1 (0–19)) then (output is (40–59)) (0.4)
If (credibility is mf2 (20–39)) and (resilience is mf1 (0–19)) then (output is (20–39)) (0.2)
If (credibility is mf2 (20–39)) and (resilience is mf2 (20–39)) then (output is (80–100)) (0.8)
If (credibility is mf3 (40–59)) and (resilience is mf3 (40–59)) then (output is (80–100)) (0.8)
If (credibility is mf4 (60–79)) and (resilience is mf4 (60–79)) then (output is (80–100)) (0.8)
If (credibility is mf5 (80–100)) and (resilience is mf5 (80–100)) then (output is (80–100)) (0.8)
If (credibility is mf2 (20–39)) and (resilience is mf5 (80–100)) then (output is (20–39)) (0.2)
If (credibility is mf3 (40–59)) and (resilience is mf5 (80–100)) then (output is (40–59)) (0.4)
If (credibility is mf4 (60–79)) and (resilience is mf5 (80–100)) then (output is (60–79)) (0.6)
If (credibility is mf3 (40–59)) and (resilience is mf4 (60–79)) then (output is (60–79)) (0.6)
If (credibility is mf2 (20–39)) and (resilience is mf4 (60–79)) then (output is (40–59)) (0.4)
If (credibility is mf2 (20–39)) and (resilience is mf3 (40–59)) then (output is (40–59)) (0.4)
If (credibility is mf4 (60–79)) and (resilience is mf3 (40–59)) then (output is (60–79)) (0.6)
If (credibility is mf5 (80–100)) and (resilience is mf3 (40–59)) then (output is (80–100)) (0.8)
If (credibility is mf4 (60–79)) and (resilience is mf2 (20–39)) then (output is (60–79)) (0.6)
If (credibility is mf3 (40–59)) and (resilience is mf2 (20–39)) then (output is (40–59)) (0.4)

Figure 3: Graphical representation of the proposed algorithm (flowchart of steps (1) to (8) of Section 4.3.1).
Figure 4: Proposed fuzzy rules model (inputs credibility and resilience feed a rule base of up to 25 rules, whose outputs are summed into the result).
Figure 5: Detailed fuzzy rules model (inputs, membership functions, rules combined with the logical operations and/or/not, and output).
Figure 6: Proposed fuzzy inference system "Estimating" (Mamdani), with inputs Credibility and Resilience and one Output, each having five membership functions.
Figure 7: Surface view of inputs (credibility, resilience) and output (generated in MATLAB).
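For checking or re-implementing the rule base, the rules above can be transcribed into a lookup table keyed by the (credibility, resilience) membership-function indices. The transcription and the helper name `missing_combinations` are ours, for illustration only; the check shows that 23 of the 25 possible input combinations are covered, the missing ones being credibility mf5 with resilience mf2 or mf4.

```python
# (credibility mf index, resilience mf index) -> output mf index,
# transcribed from the rules listed above.
RULES = {
    (1, 5): 1, (1, 4): 2, (1, 3): 3, (1, 2): 4, (1, 1): 5,
    (5, 1): 5, (4, 1): 4, (3, 1): 3, (2, 1): 2,
    (2, 2): 5, (3, 3): 5, (4, 4): 5, (5, 5): 5,
    (2, 5): 2, (3, 5): 3, (4, 5): 4,
    (3, 4): 4, (2, 4): 3, (2, 3): 3, (4, 3): 4,
    (5, 3): 5, (4, 2): 4, (3, 2): 3,
}

# Output ranges and the value printed after each rule, e.g. "(0.2)".
OUTPUT_RANGE = {1: "0-19", 2: "20-39", 3: "40-59", 4: "60-79", 5: "80-100"}
OUTPUT_VALUE = {1: 0.0, 2: 0.2, 3: 0.4, 4: 0.6, 5: 0.8}


def missing_combinations(rules):
    """Input combinations (out of 5 x 5) for which no rule is defined."""
    return [(c, r) for c in range(1, 6) for r in range(1, 6)
            if (c, r) not in rules]


print(len(RULES))                   # 23
print(missing_combinations(RULES))  # [(5, 2), (5, 4)]
```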
Based on the above rules, a fuzzy inference system is obtained for estimating software birthmarks, which is given in Figure 6.
Figure 7 shows the surface view of the inputs and output.
4.4 Inputs Estimation. Once the fuzzy rules model is designed, inputs are given to the model according to the customer requirements, and the model generates the output based on the fuzzy rules. Details of the proposed system's inputs and output are given in Table 2.
4.5 Evaluation of the Model (Case Study). The present research work has been validated by a case study of a small module of an Android application. The Android radiocalc module consists of 109 lines of code, and the methodology has been applied to a similar Android application. The birthmark of the module has been estimated based on the properties of resilience and credibility using the k-gram based birthmark similarity technique [22]. Various experiments showed that as the value of $k$ increases, the birthmark similarity decreases, while for very small values of $k$ the birthmark similarity was not satisfactory. For $k = 5$, the experiment gave good results in terms of similarity and runtime overhead. The resulting similarity for the above-mentioned application with $k = 5$ was 40%.
We applied the SandMark [23] and CodeShield [24] tools to the above application for code obfuscation. To find the value of resilience, the similarity between the original and obfuscated versions was computed, giving 80% for $k = 5$. The CodeShield tool provides name obfuscation, removal of debugging information, and some control-flow obfuscation, while the SandMark tool does not include automatic obfuscation. The similarity was computed using k-grams; for CodeShield, it showed that as $k$ increases, the similarity decreases for numerous transformations. Table 3 shows the inputs and values for the proposed model.
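The k-gram similarity used in the case study can be sketched as follows. A k-gram birthmark is the set of length-k opcode subsequences [22]; the similarity measure here, the fraction of the original's k-grams that also occur in the suspect program, is an assumed choice for illustration, and the opcode lists are toy data unrelated to the radiocalc module.

```python
def kgrams(opcodes, k):
    """Set of all contiguous length-k opcode subsequences."""
    return {tuple(opcodes[i:i + k]) for i in range(len(opcodes) - k + 1)}


def kgram_similarity(original, suspect, k):
    """Assumed measure: share of the original's k-grams found in the suspect."""
    a, b = kgrams(original, k), kgrams(suspect, k)
    return len(a & b) / len(a) if a else 0.0


original = ["iload_1", "iload_2", "iadd", "istore_3", "iload_3", "ireturn"]
# Toy obfuscated copy: two semantically neutral nop instructions inserted.
suspect = ["iload_1", "nop", "iload_2", "iadd", "nop",
           "istore_3", "iload_3", "ireturn"]

for k in (2, 3):
    print(k, kgram_similarity(original, suspect, k))
# 2 0.6
# 3 0.25
```

In this toy case, each inserted instruction breaks every k-gram that spans it, so larger $k$ lowers the similarity, which matches the trend reported above.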
The inputs to the fuzzy model are defined as follows: credibility is 0.4 (40%) and resilience is 0.8 (80%). These inputs are given to the fuzzification model (fuzzy inference system).

Table 3: Inputs and values for the proposed model ($k = 5$).
Input | Value (%) | Value for proposed model
Credibility | 40 | 0.4
Resilience | 80 | 0.8

Credibility 0.4 falls in membership function mf3 (40–59), and resilience 0.8 falls in membership function mf5 (80–100). From these membership degrees, the designed model gives an output of 0.500, from which a decision about the birthmark of the software can be made.
5 Results and Discussion
A fuzzy inference system is designed that models the system and in turn estimates the birthmark of the software. Inputs are assigned to the model to estimate the software birthmark in terms of credibility and resilience; the model evaluates the given inputs and produces results, from which the estimated birthmark for these two properties can be checked. To check the validity of the proposed model, the inputs were given as out = evalfis([0.4 0.8], fismat), and the output was 0.500, which gives the estimate of the software birthmark with respect to its desired properties. Other fuzzy techniques, such as fuzzy C-means clustering, have also been used [25].
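The evalfis call above can be approximated in pure Python with a self-contained sketch. It assumes Mamdani inference with min for AND, max aggregation, and centroid defuzzification, plus hypothetical triangle parameters (peaks at 10, 30, 50, 70, 90 on a 0–100 scale, feet 20 units from each peak) for both inputs and the output, since the MATLAB parameters are not reported. With these assumptions it also returns 0.5 for the inputs (0.4, 0.8), but only because the assumed triangles around the fired rules are symmetric; it is not a reproduction of the authors' model.

```python
# Hypothetical triangular parameters (l, m, u) on a 0-100 scale, shared by the
# two inputs and the output; the authors' MATLAB parameters are not reported.
MF = {k + 1: (20 * k - 10, 20 * k + 10, 20 * k + 30) for k in range(5)}

# (credibility mf, resilience mf) -> output mf, transcribed from Section 4.3.
RULES = {
    (1, 5): 1, (1, 4): 2, (1, 3): 3, (1, 2): 4, (1, 1): 5,
    (5, 1): 5, (4, 1): 4, (3, 1): 3, (2, 1): 2,
    (2, 2): 5, (3, 3): 5, (4, 4): 5, (5, 5): 5,
    (2, 5): 2, (3, 5): 3, (4, 5): 4,
    (3, 4): 4, (2, 4): 3, (2, 3): 3, (4, 3): 4,
    (5, 3): 5, (4, 2): 4, (3, 2): 3,
}


def triangular(x, l, m, u):
    """Triangular membership degree of x for parameters l <= m <= u."""
    if x == m:
        return 1.0
    if x <= l or x >= u:
        return 0.0
    if x < m:
        return (x - l) / (m - l)
    return (u - x) / (u - m)


def evaluate(credibility, resilience, steps=1000):
    """Mamdani inference for inputs in [0, 1]: min for AND, max aggregation,
    centroid defuzzification. Returns a crisp value in [0, 1]."""
    c, r = credibility * 100, resilience * 100
    # Fuzzify both inputs and fire every rule at the min of its two degrees;
    # keep the strongest firing per output set.
    strength = {}
    for (cm, rm), out in RULES.items():
        w = min(triangular(c, *MF[cm]), triangular(r, *MF[rm]))
        strength[out] = max(strength.get(out, 0.0), w)

    def mu(x):
        # Aggregate: max over output triangles clipped at their strengths.
        return max(min(w, triangular(x, *MF[out])) for out, w in strength.items())

    # Defuzzify with the centroid over a uniform grid on 0-100.
    xs = [100 * j / steps for j in range(steps + 1)]
    weights = [mu(x) for x in xs]
    total = sum(weights)
    if total == 0:
        raise ValueError("no rule fired for these inputs")
    return sum(x * w for x, w in zip(xs, weights)) / total / 100


print(round(evaluate(0.4, 0.8), 3))  # 0.5 with the assumed parameters
```

Under these assumptions, the inputs (0.4, 0.8) fire four rules, (mf2, mf4), (mf2, mf5), (mf3, mf4), and (mf3, mf5), each at strength 0.5; changing the triangle widths would change the crisp result.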
6 Conclusion
Software theft is a global problem of copying, stealing, and misusing software without a proper license agreement. Software birthmarking is a promising technique for detecting the theft of software systems: a software birthmark is an intrinsic characteristic of software used to detect the similarity of software, and estimating it can play a key role in understanding the effectiveness of a birthmark. In this research, fuzzy logic, an efficient and powerful tool for tackling uncertainty, has been used to estimate software birthmarks. The method is based on fuzzy rules designed from fuzzy membership functions. Different techniques are used in practice, but all are based on known information, whereas situations of uncertainty also arise in practice. The proposed model works well under uncertainty and with unknown information. The model is based on two properties of software birthmarks, credibility and resilience, and has been validated using some Android applications. Various experiments were performed using existing code obfuscation tools, and the software birthmarks were estimated. The results produced by the proposed process show that the method is efficient and provides satisfactory results. The approach has been tested only for credibility and resilience, as these two properties are considered the most important properties of software birthmarks; therefore, they were selected here for model testing. In the future, the model can be extended to a different set of properties.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] D. Curtis, "Software piracy and copyright protection," in Proceedings of the Idea/Microelectronics Conference Record (WESCON '94), Anaheim, Calif, USA, 1994.
[2] G. Myles and C. Collberg, "Software watermarking through register allocation: implementation, analysis, and attacks," in Information Security and Cryptology, ICISC 2003, vol. 2971, pp. 274–293, Springer, Berlin, Germany, 2003.
[3] C. Collberg and T. R. Sahoo, "Software watermarking in the frequency domain: implementation, analysis, and attacks," Journal of Computer Security, vol. 13, no. 5, pp. 721–755, 2005.
[4] F. Liu, B. Lu, and X. Luo, "A chaos-based robust software watermarking," in Information Security Practice and Experience, vol. 3903, pp. 355–366, Springer, Berlin, Germany, 2006.
[5] Y. Zeng, F. Liu, X. Luo, and C. Yang, "Software watermarking through obfuscated interpretation: implementation and analysis," Journal of Multimedia, vol. 6, no. 4, pp. 329–340, 2011.
[6] S. Choi, H. Park, H.-I. Lim, and T. Han, "A static API birthmark for Windows binary executables," Journal of Systems and Software, vol. 82, no. 5, pp. 862–873, 2009.
[7] P. Heewan, C. Seokwoo, L. Hyun-Il, and H. Taisook, "Detecting code theft via a static instruction trace birthmark for Java methods," in Proceedings of the 6th IEEE International Conference on Industrial Informatics (INDIN '08), pp. 551–556, Daejeon, Republic of Korea, July 2008.
[8] H. Park, S. Choi, H.-I. Lim, and T. Han, "Detecting Java theft based on static API trace birthmark," in Advances in Information and Computer Security, vol. 5312 of Lecture Notes in Computer Science, pp. 121–135, Springer, Berlin, Germany, 2008.
[9] H. Park, H.-I. Lim, S. Choi, and T. Han, "Detecting common modules in Java packages based on static object trace birthmark," Computer Journal, vol. 54, no. 1, pp. 108–124, 2011.
[10] Y. Zeng, F. Liu, X. Luo, and S. Lian, "Abstract interpretation-based semantic framework for software birthmark," Computers & Security, vol. 31, no. 4, pp. 377–390, 2012.
[11] G. Myles and C. Collberg, "Detecting software theft via whole program path birthmarks," in Information Security, vol. 3225, pp. 404–415, Springer, Berlin, Germany, 2004.
[12] T. Kakimoto, A. Monden, Y. Kamei, H. Tamada, M. Tsunoda, and K.-I. Matsumoto, "Using software birthmarks to identify similar classes and major functionalities," in Proceedings of the International Workshop on Mining Software Repositories (MSR '06), pp. 171–172, ACM, Shanghai, China, May 2006.
[13] P. P. F. Chan, L. C. K. Hui, and S. M. Yiu, "Dynamic software birthmark for Java based on heap memory analysis," in Communications and Multimedia Security, vol. 7025, pp. 94–107, Springer, Berlin, Germany, 2011.
[14] Y. Wang, F. Liu, D. Gong, B. Lu, and S. Ma, "CHI based instruction-words software birthmark selection," in Proceedings of the 4th International Conference on Multimedia and Security (MINES '12), pp. 892–895, November 2012.
[15] H.-I. Lim, "Customizing k-gram based birthmark through partial matching in detecting software thefts," in Proceedings of the 37th IEEE Annual Computer Software and Applications Conference Workshops (COMPSACW '13), pp. 1–4, July 2013.
[16] K. Tyagi and A. Sharma, "A rule-based approach for estimating the reliability of component-based systems," Advances in Engineering Software, vol. 54, pp. 24–29, 2012.
[17] H. Tamada, M. Nakamura, A. Monden, and K.-I. Matsumoto, "Design and evaluation of birthmarks for detecting theft of Java programs," in Proceedings of the IASTED International Conference on Software Engineering (IASTED SE '04), pp. 569–575, 2004.
[18] C. Collberg and J. Nagra, Surreptitious Software: Obfuscation, Watermarking, and Tamperproofing for Software Protection, Addison-Wesley, Boston, Mass, USA, 1st edition, 2009.
[19] G. M. Myles, Software theft detection through program identification [Ph.D. thesis], Department of Computer Science, University of Arizona, Tucson, Ariz, USA, 2006.
[20] L. A. Zadeh, "Fuzzy logic," Computer, vol. 21, no. 4, pp. 83–93, 1988.
[21] MATLAB 7.10.0, The MathWorks, Natick, Mass, USA, 2010.
[22] G. Myles and C. Collberg, "K-gram software birthmarks," in Proceedings of the 20th Annual ACM Symposium on Applied Computing, pp. 314–318, ACM, Santa Fe, NM, USA, March 2005.
[23] C. Collberg, G. Myles, and A. Huntwork, "Sandmark: a tool for software protection research," IEEE Security and Privacy, vol. 1, no. 4, pp. 40–49, 2003.
[24] C. P. Ltd., CodeShield Java Byte Obfuscator, 2014, httpwwwxmarkscomssitewwwcodingartcomcodeshieldhtml
[25] L. Saeidiasl, T. Ahmad, N. Alias, and M. Ghanbari, "Comparison of EEG source localization using meromorphic approximation to fuzzy C-mean," Malaysian Journal of Fundamental and Applied Science, vol. 9, pp. 215–220, 2013.
Original software Attempt at piracy Distribution ofpirated software
Theft
Figure 1 Software piracy
extracting characteristics from a program So 119891(119901 119894) is adynamic birthmark of 119901 if and only if
(1) 119891(119901 119894) is obtained only from 119901 itself by extracting 119901with the given input 119894
(2) 119902 is a copy of 119901 rArr 119891(119901 119894) = 119891(119902 119894)
All program paths cannot be covered by the dynamicbirthmarks dynamic birthmark only detects the theft of theprogram On the other hand static birthmark is extracted bythe static program analysis that is liable to the properties ofoverestimated program
33 Classification of Software Birthmark Software birthmarkis classified into the following three categories [10]
(i) Instruction Based Software Birthmarks Software programconsists of data and instructions Instruction sequences canreflect program behavior to various points so it is realistic todefine birthmarks as instructions sequences
(ii) API Based Software Birthmark Several birthmarks areavailable that are based on observations of the way a programuses the standard API libraries Not only is it unique to aprogram but this feature is also complex for an attacker toforge [18]
(iii) Graph Based Software Birthmark The software programis like graph structure For instance functions are representedas control flow graph dependency among statement(s) inthe function(s) is represented as dependency graph possiblecalls between functions are represented as call graph andinheritance interaction between classes is represented asacyclic graph As a result it makes sense to represent abirthmark using graph representations of programs [18]
34 Birthmark and Watermark Software birthmark is apromising technique used for the detection of software theftBirthmark does not embed additional code or informationin any form in the original program Software birthmarksonly extract the inherent characteristics from the originalprogram to detect the originality of program [11] Softwarebirthmark only establishes an identity to detect if a programis a copy of any other program It does not show whothe original owner of the program is or who is guilty ofsoftware piracy [11] While software watermarking assertsthe ownership of the programs by adding extra information
to the original program before it is publically availablesoftware watermarks identify software from the embeddedinformationcode Both the techniques can be combined toprovide a stronger verification mechanism to detect theftBirthmark can be used where there is a limitation of storagespace as watermarking uses extra storage space Also inmany situations watermarks fail for example if an attackeris able to apply obfuscation that destroys watermarks In suchsituations software birthmarks provide evidence of piracy orsoftware theft [11]
4 Proposed Methodology
The following sections define the proposed methodology toestimate software birthmarks
41 Software Birthmarks Properties In order to estimate thesuccess of software birthmarks researchers typically considertwo properties which are credibility and resilience [19]Credibility requires that the birthmark of the two programsmust be different whereas the resilience states that thebirthmark should be preserved and not destroyed in anycircumstances
According to Tamada et al [17] software birthmark sat-isfies the following two important properties which indicatesthat the two independently implemented programs should bedifferent
Property 1 Let 119875 and 119876 be two independently writtenprograms which achieve the same task then 119891 is credible if119891(119875) = 119891(119876)
Property 2 Let 1198751015840 be the program obtained from 119875 byapplying semantic preserving transformation 119879 119891 is resilientto 119879 if 119891(119875) = 119891(1198751015840)
Property 1 guards against the birthmark falsely indicating that $Q$ is a copy of $P$, a situation that can occur with separately implemented programs that accomplish the same task.
Property 2 relates to identifying a copy in the presence of transformations: a birthmark should still detect a copy even if some transformation has been applied to the program.
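To make the two properties concrete, the following Python sketch checks them for one pair of programs. It is a toy illustration under stated assumptions, not a birthmark studied in this paper: programs are modeled as hypothetical lists of (opcode, variable name) pairs, the birthmark $f$ is simply the opcode sequence, and the semantics-preserving transformation $T$ consistently renames variables. All function and variable names are hypothetical.

```python
"""Toy check of Property 1 (credibility) and Property 2 (resilience)."""
from typing import Callable, Dict, List, Tuple

# Hypothetical program model: (opcode, variable name) pairs; "" = no operand.
Program = List[Tuple[str, str]]
Birthmark = Callable[[Program], Tuple[str, ...]]


def opcode_birthmark(program: Program) -> Tuple[str, ...]:
    """Hypothetical birthmark f: the opcode sequence, ignoring variable names."""
    return tuple(opcode for opcode, _ in program)


def rename_variables(program: Program) -> Program:
    """Hypothetical transformation T: consistently rename every variable."""
    mapping: Dict[str, str] = {}
    renamed: Program = []
    for opcode, operand in program:
        if operand and operand not in mapping:
            mapping[operand] = f"v{len(mapping)}"
        renamed.append((opcode, mapping.get(operand, operand)))
    return renamed


def is_credible(f: Birthmark, p: Program, q: Program) -> bool:
    """Property 1 for one pair: independently written P and Q give f(P) != f(Q)."""
    return f(p) != f(q)


def is_resilient(f: Birthmark, p: Program,
                 t: Callable[[Program], Program]) -> bool:
    """Property 2: with P' = T(P), the birthmark is unchanged, f(P) == f(P')."""
    return f(p) == f(t(p))


# P computes price * qty; Q is a hypothetical independent version using a temp.
P: Program = [("iload", "price"), ("iload", "qty"), ("imul", ""), ("istore", "total")]
Q: Program = [("iload", "qty"), ("istore", "n"), ("iload", "price"),
              ("iload", "n"), ("imul", ""), ("istore", "amount")]

print(is_credible(opcode_birthmark, P, Q))                  # True
print(is_resilient(opcode_birthmark, P, rename_variables))  # True
```

Property 1 is really a statement about all independently written pairs, so a single passing pair is only a spot check. If $Q$ happened to use the same instruction sequence as $P$, this toy $f$ would fail Property 1, which is why practical birthmarks rely on richer features.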
Figure 2 shows the properties of software birthmarks. In the existing literature on software birthmarks, there is no model that explicitly estimates the birthmark of software based on the properties of credibility and resilience.
4 The Scientific World Journal
Figure 2: Properties-based software birthmarks (credibility + resilience = software birthmark).
The proposed methodology helps to estimate software birthmarks based on these properties.
4.2. Fuzzy Logic. The concept of fuzzy logic was developed by Zadeh in 1965 [20]. It is a mathematical tool for managing uncertain and imprecise information. Fuzzy set theory is used to solve diverse problems in many fields of daily life, and fuzzy tools help provide solutions to problems that are difficult to model. A fuzzy set is an extension of a traditional (crisp) set that is described by a membership function, and it is extremely useful for decision making in uncertain and vague situations. Decisions can be made with qualitative variables (low, strong, very strong, etc.) instead of quantitative variables (i.e., numbers), and these qualitative variables allow vague knowledge to be modeled precisely. The inputs and outputs have degrees of membership in the interval [0, 1].
In the proposed method, five membership functions are defined: mf1 in the range 0–19, mf2 in the range 20–39, mf3 in the range 40–59, mf4 in the range 60–79, and mf5 in the range 80–100. To represent fuzziness, triangular membership functions are defined and used to represent the weights. A triangular membership function has three parameters $(l, m, u)$, with $l \le m \le u$.
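A triangular membership function with parameters $(l, m, u)$ rises linearly from 0 at $l$ to 1 at $m$ and falls back to 0 at $u$. The Python sketch below implements this shape. The paper gives the five ranges but not the exact triangle parameters, so the placement used here is an assumption for illustration only: inputs rescaled to [0, 1], one peak at the centre of each range, and a half-width of 0.2 so that neighbouring functions overlap. The names `trimf`, `MF_PARAMS`, and `fuzzify` are hypothetical.

```python
"""Triangular membership functions for five assumed fuzzy sets mf1..mf5."""
from typing import Dict, Tuple


def trimf(x: float, l: float, m: float, u: float) -> float:
    """Degree of membership of x in the triangle (l, m, u), with l <= m <= u."""
    if not l <= m <= u:
        raise ValueError("expected l <= m <= u")
    if x == m:
        return 1.0
    if x <= l or x >= u:
        return 0.0
    if x < m:
        return (x - l) / (m - l)  # rising edge
    return (u - x) / (u - m)      # falling edge


# Hypothetical parameters on a [0, 1] scale: peaks at the centres of the
# ranges 0-19, 20-39, 40-59, 60-79 and 80-100, half-width 0.2.
MF_PARAMS: Dict[str, Tuple[float, float, float]] = {
    f"mf{i + 1}": (peak - 0.2, peak, peak + 0.2)
    for i, peak in enumerate([0.1, 0.3, 0.5, 0.7, 0.9])
}


def fuzzify(x: float) -> Dict[str, float]:
    """Return the degree of membership of x in each of mf1..mf5."""
    return {name: trimf(x, *params) for name, params in MF_PARAMS.items()}


# mf2 and mf3 are each 0.5 here; the other three are 0.0.
print({name: round(mu, 2) for name, mu in fuzzify(0.4).items()})
```

With these assumed shapes, 0.4 sits exactly where mf2 and mf3 cross, so it belongs to both with degree 0.5; different triangle parameters would split the membership differently.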
Details of the fuzzy logic concept are given in Zadeh [20]; the major parts of a fuzzy system are as follows. The first phase is fuzzification, which transforms the classification table into continuous classifications. The result is then processed in the fuzzy domain based on the designed rules. Lastly, defuzzification transforms the fuzzy number back into a real (crisp) number.
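The last phase, defuzzification, can be illustrated with the centroid method, which is the default defuzzification method for Mamdani systems in the MATLAB Fuzzy Logic Toolbox. The Python sketch below approximates the centroid by sampling the output universe. The helper name `centroid` and the sampled ramp used as the aggregated output set are hypothetical examples, not results from the paper.

```python
"""Centroid defuzzification over a sampled output universe."""
from typing import Sequence


def centroid(ys: Sequence[float], mus: Sequence[float]) -> float:
    """Crisp value = sum(y * mu(y)) / sum(mu(y)) over the sampled points."""
    if len(ys) != len(mus):
        raise ValueError("ys and mus must have the same length")
    total = sum(mus)
    if total == 0.0:
        raise ValueError("empty fuzzy set: nothing to defuzzify")
    return sum(y * mu for y, mu in zip(ys, mus)) / total


# Hypothetical aggregated output set on [0, 1]: the ramp mu(y) = y.
ys = [i / 1000 for i in range(1001)]
mus = list(ys)
print(round(centroid(ys, mus), 3))  # prints 0.667 (continuous value is 2/3)
```

Finer sampling brings the discrete value closer to the continuous centroid; with 1001 samples the ramp gives 2001/3000 rather than exactly 2/3.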
4.3. Rules Based Approach to Estimate Software Birthmark. Estimating software birthmarks is an essential part of software system development for preventing theft of the software system. Most software theft threats are faced during the implementation of the software, and developers are still unsure how to handle such situations. If the birthmarks of a system are estimated, one can more easily make decisions about alternative designs. The proposed methodology, based on fuzzy concepts, provides an estimation model for software birthmarks. Initially, the inputs (properties of the birthmark) on which the birthmark is to be estimated are selected. Based on these inputs, the membership functions are plotted. A membership function gives the degree of relationship of the concept (data) to a particular area (data range). Five membership functions were plotted: mf1, mf2, mf3, mf4, and mf5. The inputs and membership functions are combined in the rule editor to form fuzzy rules, and a fuzzy inference system model is obtained from the membership functions and rules.
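How a single fuzzy rule turns input memberships into a clipped output set can be sketched as follows. The sketch assumes the standard Zadeh operators (AND as minimum, OR as maximum, NOT as complement) and min implication, which are the MATLAB defaults for a Mamdani system; the function names, membership degrees, and sampled consequent are hypothetical values chosen for illustration.

```python
"""One Mamdani rule: 'if credibility is A and resilience is B then output is C'."""
from typing import List


def fuzzy_and(a: float, b: float) -> float:
    """Zadeh AND: minimum of the two degrees."""
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    """Zadeh OR: maximum of the two degrees."""
    return max(a, b)


def fuzzy_not(a: float) -> float:
    """Zadeh NOT: complement of the degree."""
    return 1.0 - a


def implicate(strength: float, consequent: List[float]) -> List[float]:
    """Min implication: clip the consequent set at the rule's firing strength."""
    return [min(strength, mu) for mu in consequent]


# Hypothetical degrees: credibility is A with 0.7, resilience is B with 0.4.
strength = fuzzy_and(0.7, 0.4)
# Hypothetical consequent C sampled at five points of the output universe.
clipped = implicate(strength, [0.0, 0.5, 1.0, 0.5, 0.0])
print(strength, clipped)  # 0.4 [0.0, 0.4, 0.4, 0.4, 0.0]
```

Each rule yields one clipped set like this; the inference system then combines the clipped sets of all rules before defuzzification.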
4.3.1. Algorithm for Designing a Rule Based Model. The following are the steps to design the proposed model:
(1) Perform domain analysis on software birthmarks.
(2) Identify the properties of the software birthmark on which the birthmark is to be estimated.
(3) Establish an input database for these properties.
(4) Design the fuzzy inference system based on these properties (inputs).
(5) Define the membership functions for these properties (for both inputs and output).
(6) Design the fuzzy rules based on the membership functions.
(7) Obtain a fuzzy inference system (the model to estimate the birthmark).
(8) Estimate the inputs accordingly.
The graphical representation of the algorithm is given in Figure 3.
The proposed work on estimating software birthmarks was carried out using the MATLAB Fuzzy Logic Toolbox [21].
The different membership combinations are given in Table 1.
The fuzzy rules and model of the proposed methodology are given in Figure 4.
Figure 5 explains the proposed model in more detail.
The rules are as follows:
If (credibility is mf1 (0–19)) and (resilience is mf5 (80–100)) then (output is (0–19)) (0)
If (credibility is mf1 (0–19)) and (resilience is mf4 (60–79)) then (output is (20–39)) (0.2)
If (credibility is mf1 (0–19)) and (resilience is mf3 (40–59)) then (output is (40–59)) (0.4)
Figure 3: Graphical representation of the proposed algorithm.
Figure 4: Proposed fuzzy rules model (credibility and resilience inputs, up to 25 rules, summed results).
Figure 5: Detailed fuzzy rules model (inputs, membership functions, rules, and output; logical operations: and, or, not).
If (credibility is mf1 (0–19)) and (resilience is mf2 (20–39)) then (output is (60–79)) (0.6)
If (credibility is mf1 (0–19)) and (resilience is mf1 (0–19)) then (output is (80–100)) (0.8)
If (credibility is mf5 (80–100)) and (resilience is mf1 (0–19)) then (output is (80–100)) (0.8)
If (credibility is mf4 (60–79)) and (resilience is mf1 (0–19)) then (output is (60–79)) (0.6)
If (credibility is mf3 (40–59)) and (resilience is mf1 (0–19)) then (output is (40–59)) (0.4)
If (credibility is mf2 (20–39)) and (resilience is mf1 (0–19)) then (output is (20–39)) (0.2)
If (credibility is mf2 (20–39)) and (resilience is mf2 (20–39)) then (output is (80–100)) (0.8)
If (credibility is mf3 (40–59)) and (resilience is mf3 (40–59)) then (output is (80–100)) (0.8)
If (credibility is mf4 (60–79)) and (resilience is mf4 (60–79)) then (output is (80–100)) (0.8)
If (credibility is mf5 (80–100)) and (resilience is mf5 (80–100)) then (output is (80–100)) (0.8)
If (credibility is mf2 (20–39)) and (resilience is mf5 (80–100)) then (output is (20–39)) (0.2)
If (credibility is mf3 (40–59)) and (resilience is mf5 (80–100)) then (output is (40–59)) (0.4)
If (credibility is mf4 (60–79)) and (resilience is mf5 (80–100)) then (output is (60–79)) (0.6)
If (credibility is mf3 (40–59)) and (resilience is mf4 (60–79)) then (output is (60–79)) (0.6)
If (credibility is mf2 (20–39)) and (resilience is mf4 (60–79)) then (output is (40–59)) (0.4)
If (credibility is mf2 (20–39)) and (resilience is mf3 (40–59)) then (output is (40–59)) (0.4)
If (credibility is mf4 (60–79)) and (resilience is mf3 (40–59)) then (output is (60–79)) (0.6)
Figure 6: Proposed fuzzy inference system ("Estimating", Mamdani type; inputs credibility and resilience and one output, each with five membership functions).
Figure 7: Surface view of inputs and outputs (generated in MATLAB); axes: resilience and credibility (0 to 1) against output.
If (credibility is mf5 (80–100)) and (resilience is mf3 (40–59)) then (output is (80–100)) (0.8)
If (credibility is mf4 (60–79)) and (resilience is mf2 (20–39)) then (output is (60–79)) (0.6)
If (credibility is mf3 (40–59)) and (resilience is mf2 (20–39)) then (output is (40–59)) (0.4)
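For reference, the rules listed above can be transcribed into a 5 × 5 lookup table indexed by the credibility and resilience membership functions. In the Python sketch below, the table layout and helper names are hypothetical; the entries themselves are copied from the rule list. The trailing numbers in parentheses (0, 0.2, and so on) are not encoded, because the text does not state their role. The check shows which of the 25 possible antecedent pairs have a listed rule.

```python
"""Rule base transcribed from the listed rules as a lookup table."""
from typing import Dict, List, Optional, Tuple

OUTPUT_RANGES = {1: "0-19", 2: "20-39", 3: "40-59", 4: "60-79", 5: "80-100"}

# OUTPUT_MF[c - 1][r - 1] = output mf index for "credibility is mf c and
# resilience is mf r"; None marks a pair for which no rule is listed.
OUTPUT_MF: List[List[Optional[int]]] = [
    # resilience: mf1, mf2, mf3, mf4, mf5
    [5, 4, 3, 2, 1],        # credibility mf1
    [2, 5, 3, 3, 2],        # credibility mf2
    [3, 3, 5, 4, 3],        # credibility mf3
    [4, 4, 4, 5, 4],        # credibility mf4
    [5, None, 5, None, 5],  # credibility mf5
]


def rule_table() -> Dict[Tuple[int, int], int]:
    """Map (credibility mf, resilience mf) to the output mf of the listed rule."""
    return {
        (c + 1, r + 1): out
        for c, row in enumerate(OUTPUT_MF)
        for r, out in enumerate(row)
        if out is not None
    }


def unlisted_pairs() -> List[Tuple[int, int]]:
    """Antecedent pairs that no listed rule covers."""
    return [
        (c + 1, r + 1)
        for c, row in enumerate(OUTPUT_MF)
        for r, out in enumerate(row)
        if out is None
    ]


rules = rule_table()
print(len(rules), unlisted_pairs())  # 23 [(5, 2), (5, 4)]
print(OUTPUT_RANGES[rules[(3, 5)]])  # 40-59
```

Of the 25 possible antecedent pairs (Figure 4 mentions up to 25 rules), the list above covers 23; the pairs (mf5, mf2) and (mf5, mf4) have no listed rule.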
Based on the above rules, a fuzzy inference system for estimating software birthmarks is obtained, as shown in Figure 6.
Figure 7 shows the surface view of the inputs and output.
4.4. Inputs Estimation. Once the fuzzy rules model is designed, inputs are given to the model according to the customer requirements, and the model generates the output based on the fuzzy rules. Details of the proposed system's inputs and output are given in Table 2.
4.5. Evaluation of the Model (Case Study). The present research work has been validated by a case study of a small module of an Android application. The Android radiocalc module consists of 109 lines of code. The methodology has been applied to a similar Android application, and the birthmark of the module has been estimated based on the properties of resilience and credibility. The $k$-gram based birthmark similarity technique [22] has been used. Through various experiments, we found that as the value of $k$ increases, the birthmark similarity decreases. For very small values of $k$, the birthmark similarity was not satisfactory. For $k = 5$, the experiment revealed good results in terms of similarity and runtime overhead. The resulting similarity for the above-mentioned application with $k = 5$ was 40%.
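The $k$-gram birthmark of Myles and Collberg [22] treats the set of all length-$k$ opcode sequences of a program as its birthmark. The Python sketch below computes such sets and compares them with the Jaccard index. The similarity measure, the helper names, and the two short opcode sequences are assumptions for illustration, and the exact measure used in [22] may differ. On these toy sequences, the similarity falls as $k$ grows, which illustrates the trend described above.

```python
"""k-gram birthmarks compared with the Jaccard index (assumed measure)."""
from typing import FrozenSet, Sequence, Tuple


def kgram_birthmark(opcodes: Sequence[str], k: int) -> FrozenSet[Tuple[str, ...]]:
    """Set of all contiguous length-k opcode subsequences."""
    if k < 1:
        raise ValueError("k must be positive")
    return frozenset(tuple(opcodes[i:i + k]) for i in range(len(opcodes) - k + 1))


def similarity(a: Sequence[str], b: Sequence[str], k: int) -> float:
    """Jaccard index |A & B| / |A | B| of the two k-gram sets."""
    ga, gb = kgram_birthmark(a, k), kgram_birthmark(b, k)
    union = ga | gb
    if not union:
        return 1.0  # both programs are shorter than k: nothing to tell apart
    return len(ga & gb) / len(union)


# Hypothetical opcode sequences of two methods that differ in one instruction.
p = ["aload", "getfield", "iload", "iadd", "putfield", "return"]
q = ["aload", "getfield", "iload", "isub", "putfield", "return"]

for k in range(1, 5):
    print(k, round(similarity(p, q, k), 3))  # 0.714, 0.429, 0.143, 0.0
```

A single differing instruction touches more $k$-grams as $k$ grows, so longer grams separate the two sequences more sharply; very small $k$ makes unrelated code look alike, while very large $k$ makes even small edits look like different programs.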
We applied the SandMark [23] and CodeShield [24] tools to the above application for code obfuscation. To determine resilience, the similarity after obfuscation was computed, giving 80% for $k = 5$. CodeShield provides name obfuscation, removal of debugging information, and some types of control-flow obfuscation, whereas SandMark does not include automatic obfuscation. The similarity was computed using $k$-grams. For CodeShield, the $k$-gram similarity also decreased as $k$ increased across numerous transformations. Table 3 shows the inputs and values for the proposed model.
The inputs to the fuzzy model are defined as follows: credibility is 0.4 (40%) and resilience is 0.8 (80%). These inputs are given to the fuzzification model (fuzzy inference system).

Table 3: Inputs and values for the proposed model (for $k = 5$).
Input         Value (%)   Value for proposed model
Credibility   40          0.4
Resilience    80          0.8

Credibility 0.4 falls in the range of membership function mf3 (40–59), and resilience 0.8 falls in the range of membership function mf5 (80–100). Based on the designed model, the system gives an output of 0.500, from which one can make a decision about the birthmark of the software.
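Reading the case-study inputs against the crisp ranges of the five membership functions can be sketched as below. The helper name `crisp_label` is hypothetical, and the sketch assumes that inputs on the [0, 1] scale are converted to whole percentages before the range lookup. Under the crisp ranges, credibility 0.4 maps to mf3 and resilience 0.8 maps to mf5, which selects the listed rule "credibility mf3 and resilience mf5 then output (40–59)"; with overlapping triangular functions, neighbouring rules may also fire.

```python
"""Crisp lookup of the membership-function range containing an input."""


def crisp_label(value: float) -> str:
    """Return mf1..mf5 for the range (0-19, 20-39, 40-59, 60-79, 80-100)
    containing value, where value is on the [0, 1] scale of the model."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must lie in [0, 1]")
    percent = round(value * 100)              # assumed: whole percentages
    return f"mf{min(percent // 20, 4) + 1}"   # 100 belongs to mf5 (80-100)


print(crisp_label(0.4), crisp_label(0.8))  # prints mf3 mf5
```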
5. Results and Discussion
A fuzzy inference system is designed that models the system and, in turn, estimates the birthmark of the software. Inputs are assigned to the model to estimate the software birthmark in terms of credibility and resilience. The designed model evaluates the given inputs and produces results, from which one can assess the estimated software birthmark with respect to the properties of credibility and resilience. To check the validity of the proposed model, the inputs were given as out = evalfis([0.4 0.8], fismat), and the output was 0.500, which is the estimate of the software birthmark. This result therefore gives an estimate of the software birthmark with respect to its desired properties. Other fuzzy techniques, such as fuzzy C-means clustering, have also been used [25].
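The MATLAB call above can be approximated in Python to show what such an inference system computes. This is a sketch under stated assumptions, not the authors' fismat. First, the triangle parameters are hypothetical: inputs and output share five triangles on [0, 1], each peaked at the centre of its range (0–19, 20–39, and so on) with a half-width of 0.2. Second, the rule table is transcribed from the listed rules, with the two unlisted antecedent pairs left empty. Third, the operators are the MATLAB Mamdani defaults (min for AND and implication, max for aggregation, centroid defuzzification), which the paper does not state explicitly. Under these assumptions, inputs 0.4 and 0.8 fire four rules at equal strength and the centroid lands at about 0.5; this agreement does not confirm that the paper used the same parameters.

```python
"""Mamdani-style estimate of a birthmark from credibility and resilience."""
from typing import List, Optional

# Hypothetical triangles on [0, 1]: one peak per range centre, half-width 0.2,
# used for both inputs and the output (the paper does not give parameters).
PEAKS = [0.1, 0.3, 0.5, 0.7, 0.9]
HALF_WIDTH = 0.2

# OUTPUT_MF[c][r]: output mf (1-5) for credibility mf c+1 and resilience
# mf r+1, transcribed from the listed rules; None = no rule listed.
OUTPUT_MF: List[List[Optional[int]]] = [
    [5, 4, 3, 2, 1],
    [2, 5, 3, 3, 2],
    [3, 3, 5, 4, 3],
    [4, 4, 4, 5, 4],
    [5, None, 5, None, 5],
]


def trimf(x: float, l: float, m: float, u: float) -> float:
    """Triangular membership of x in (l, m, u)."""
    if x == m:
        return 1.0
    if x <= l or x >= u:
        return 0.0
    return (x - l) / (m - l) if x < m else (u - x) / (u - m)


def memberships(x: float) -> List[float]:
    """Degrees of x in mf1..mf5 (fuzzification)."""
    return [trimf(x, p - HALF_WIDTH, p, p + HALF_WIDTH) for p in PEAKS]


def estimate(credibility: float, resilience: float, samples: int = 1001) -> float:
    """Fuzzify, apply the rules, and defuzzify by the centroid method."""
    mu_c, mu_r = memberships(credibility), memberships(resilience)
    # Rule evaluation: AND = min gives the firing strength; max aggregation
    # keeps, for each output mf, the strongest rule that points to it.
    clip = [0.0] * len(PEAKS)
    for c, row in enumerate(OUTPUT_MF):
        for r, out in enumerate(row):
            if out is not None:
                strength = min(mu_c[c], mu_r[r])
                clip[out - 1] = max(clip[out - 1], strength)
    # Defuzzification: centroid of the union of the clipped output mfs.
    num = den = 0.0
    for i in range(samples):
        y = i / (samples - 1)
        mu_y = max(min(level, mu) for level, mu in zip(clip, memberships(y)))
        num += y * mu_y
        den += mu_y
    if den == 0.0:
        raise ValueError("no listed rule fires for these inputs")
    return num / den


print(round(estimate(0.4, 0.8), 3))  # prints 0.5
```

The four fired rules point to output ranges 20–39, 40–59, and 60–79 with equal strength, so the aggregated set is symmetric about 0.5. Under the same assumptions, inputs that activate only an unlisted pair, such as credibility 0.9 with resilience 0.7, raise an error because no rule fires.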
6. Conclusion
Software theft is a global problem of copying, stealing, and misusing software without a proper license agreement. Software birthmarking is a promising technique for detecting theft of software systems. A software birthmark is an intrinsic characteristic of software that is used to detect software similarity. The estimation of software birthmarks can play a key role in understanding the effectiveness of a birthmark. In this research, fuzzy logic, an efficient and powerful tool for tackling uncertainty, has been used to estimate software birthmarks. The method is based on fuzzy rules designed from the fuzzy membership functions. Different techniques are used in practice, but all are based on known information, whereas situations of uncertainty also arise in practice. The proposed model works well under uncertainty and with unknown information. The model is based on two properties of software birthmarks: credibility and resilience. The model has been validated using some Android applications. Various experiments have been performed using different existing code obfuscation tools, and software birthmarks have been estimated. The results produced by the proposed process show that the method is efficient and provides satisfactory results. The approach has been tested only for credibility and resilience, as these two properties are considered the most important properties of software birthmarks; therefore, they were selected here for model testing. In the future, the model can be extended to a different set of properties.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] D. Curtis, “Software piracy and copyright protection,” in Proceedings of the Idea/Microelectronics Conference Record (WESCON ’94), Anaheim, Calif, USA, 1994.
[2] G. Myles and C. Collberg, “Software watermarking through register allocation: implementation, analysis, and attacks,” in Information Security and Cryptology – ICISC 2003, vol. 2971, pp. 274–293, Springer, Berlin, Germany, 2003.
[3] C. Collberg and T. R. Sahoo, “Software watermarking in the frequency domain: implementation, analysis, and attacks,” Journal of Computer Security, vol. 13, no. 5, pp. 721–755, 2005.
[4] F. Liu, B. Lu, and X. Luo, “A chaos-based robust software watermarking,” in Information Security Practice and Experience, vol. 3903, pp. 355–366, Springer, Berlin, Germany, 2006.
[5] Y. Zeng, F. Liu, X. Luo, and C. Yang, “Software watermarking through obfuscated interpretation: implementation and analysis,” Journal of Multimedia, vol. 6, no. 4, pp. 329–340, 2011.
[6] S. Choi, H. Park, H.-I. Lim, and T. Han, “A static API birthmark for Windows binary executables,” Journal of Systems and Software, vol. 82, no. 5, pp. 862–873, 2009.
[7] P. Heewan, C. Seokwoo, L. Hyun-Il, and H. Taisook, “Detecting code theft via a static instruction trace birthmark for Java methods,” in Proceedings of the 6th IEEE International Conference on Industrial Informatics (INDIN ’08), pp. 551–556, Daejeon, Republic of Korea, July 2008.
[8] H. Park, S. Choi, H.-I. Lim, and T. Han, “Detecting Java theft based on static API trace birthmark,” in Advances in Information and Computer Security, vol. 5312 of Lecture Notes in Computer Science, pp. 121–135, Springer, Berlin, Germany, 2008.
[9] H. Park, H.-I. Lim, S. Choi, and T. Han, “Detecting common modules in Java packages based on static object trace birthmark,” Computer Journal, vol. 54, no. 1, pp. 108–124, 2011.
[10] Y. Zeng, F. Liu, X. Luo, and S. Lian, “Abstract interpretation-based semantic framework for software birthmark,” Computers & Security, vol. 31, no. 4, pp. 377–390, 2012.
[11] G. Myles and C. Collberg, “Detecting software theft via whole program path birthmarks,” in Information Security, vol. 3225, pp. 404–415, Springer, Berlin, Germany, 2004.
[12] T. Kakimoto, A. Monden, Y. Kamei, H. Tamada, M. Tsunoda, and K.-I. Matsumoto, “Using software birthmarks to identify similar classes and major functionalities,” in Proceedings of the International Workshop on Mining Software Repositories (MSR ’06), pp. 171–172, ACM, Shanghai, China, May 2006.
[13] P. P. F. Chan, L. C. K. Hui, and S. M. Yiu, “Dynamic software birthmark for Java based on heap memory analysis,” in Communications and Multimedia Security, vol. 7025, pp. 94–107, Springer, Berlin, Germany, 2011.
[14] Y. Wang, F. Liu, D. Gong, B. Lu, and S. Ma, “CHI based instruction-words software birthmark selection,” in Proceedings of the 4th International Conference on Multimedia and Security (MINES ’12), pp. 892–895, November 2012.
[15] H.-I. Lim, “Customizing k-gram based birthmark through partial matching in detecting software thefts,” in Proceedings of the 37th IEEE Annual Computer Software and Applications Conference Workshops (COMPSACW ’13), pp. 1–4, July 2013.
[16] K. Tyagi and A. Sharma, “A rule-based approach for estimating the reliability of component-based systems,” Advances in Engineering Software, vol. 54, pp. 24–29, 2012.
[17] H. Tamada, M. Nakamura, A. Monden, and K.-I. Matsumoto, “Design and evaluation of birthmarks for detecting theft of Java programs,” in Proceedings of the IASTED International Conference on Software Engineering (IASTED SE 04), pp. 569–575, 2004.
[18] C. Collberg and J. Nagra, Surreptitious Software: Obfuscation, Watermarking, and Tamperproofing for Software Protection, Addison-Wesley, Boston, Mass, USA, 1st edition, 2009.
[19] G. M. Myles, Software theft detection through program identification [Ph.D. thesis], Department of Computer Science, University of Arizona, Tucson, Ariz, USA, 2006.
[20] L. A. Zadeh, “Fuzzy logic,” Computer, vol. 21, no. 4, pp. 83–93, 1988.
[21] MATLAB 7.10.0, The MathWorks, Natick, Mass, USA, 2010.
[22] G. Myles and C. Collberg, “K-gram software birthmarks,” in Proceedings of the 20th Annual ACM Symposium on Applied Computing, pp. 314–318, ACM, Santa Fe, NM, USA, March 2005.
[23] C. Collberg, G. Myles, and A. Huntwork, “SandMark – a tool for software protection research,” IEEE Security and Privacy, vol. 1, no. 4, pp. 40–49, 2003.
[24] C. P. Ltd., CodeShield Java Byte Obfuscator, 2014, httpwwwxmarkscomssitewwwcodingartcomcodeshieldhtml
[25] L. Saeidiasl, T. Ahmad, N. Alias, and M. Ghanbari, “Comparison of EEG source localization using meromorphic approximation to fuzzy C-mean,” Malaysian Journal of Fundamental and Applied Science, vol. 9, pp. 215–220, 2013.
[20] L A Zadeh ldquoFuzzy logicrdquo Computer vol 21 no 4 pp 83ndash931988
[21] MATLAB 7100 The MathWorks Natick Mass USA 2010[22] G Myles and C Collberg ldquoK-gram software birthmarksrdquo in
Proceedings of the 20th Annual ACM Symposium on AppliedComputing pp 314ndash318 ACM Santa FeNMUSAMarch 2005
[23] C Collberg GMyles andAHuntwork ldquoSandmarkmdasha tool forsoftware protection researchrdquo IEEE Security and Privacy vol 1no 4 pp 40ndash49 2003
[24] C P Ltd CodeShield Java Byte Obfuscator 2014 httpwwwxmarkscomssitewwwcodingartcomcodeshieldhtml
[25] L Saeidiasl TAhmadNAlias andMGhanbari ldquoComparisonof EEG source localization using meromorphic approximationto fuzzy C-meanrdquo Malaysian Journal of Fundamental andApplied Science vol 9 pp 215ndash220 2013
Figure 3: Graphical representation of the proposed algorithm (flowchart steps: design the fuzzy inference system based on these properties as inputs; define the membership functions for these properties, for both inputs and output; design the fuzzy rules based on the membership functions; obtain a fuzzy inference system, i.e., the model to estimate the birthmark; estimate the inputs accordingly).
Figure 4: Proposed fuzzy rules model (inputs: credibility and resilience; up to 25 rules of the form "If (credibility is mf1 (0–19)) and (resilience is mf5 (80–100)) then ..."; results summed into output ranges from (0–19) to (80–100)).
Figure 5: Detailed fuzzy rules model (inputs, membership functions, rules combined with the logical operations and, or, not, and output).
If (credibility is mf1 (0–19)) and (resilience is mf2 (20–39)) then (output is (60–79)) (0.6)
If (credibility is mf1 (0–19)) and (resilience is mf1 (0–19)) then (output is (80–100)) (0.8)
If (credibility is mf5 (80–100)) and (resilience is mf1 (0–19)) then (output is (80–100)) (0.8)
If (credibility is mf4 (60–79)) and (resilience is mf1 (0–19)) then (output is (60–79)) (0.6)
If (credibility is mf3 (40–59)) and (resilience is mf1 (0–19)) then (output is (40–59)) (0.4)
If (credibility is mf2 (20–39)) and (resilience is mf1 (0–19)) then (output is (20–39)) (0.2)
If (credibility is mf2 (20–39)) and (resilience is mf2 (20–39)) then (output is (80–100)) (0.8)
If (credibility is mf3 (40–59)) and (resilience is mf3 (40–59)) then (output is (80–100)) (0.8)
If (credibility is mf4 (60–79)) and (resilience is mf4 (60–79)) then (output is (80–100)) (0.8)
If (credibility is mf5 (80–100)) and (resilience is mf5 (80–100)) then (output is (80–100)) (0.8)
If (credibility is mf2 (20–39)) and (resilience is mf5 (80–100)) then (output is (20–39)) (0.2)
If (credibility is mf3 (40–59)) and (resilience is mf5 (80–100)) then (output is (40–59)) (0.4)
If (credibility is mf4 (60–79)) and (resilience is mf5 (80–100)) then (output is (60–79)) (0.6)
If (credibility is mf3 (40–59)) and (resilience is mf4 (60–79)) then (output is (60–79)) (0.6)
If (credibility is mf2 (20–39)) and (resilience is mf4 (60–79)) then (output is (40–59)) (0.4)
If (credibility is mf2 (20–39)) and (resilience is mf3 (40–59)) then (output is (40–59)) (0.4)
If (credibility is mf4 (60–79)) and (resilience is mf3 (40–59)) then (output is (60–79)) (0.6)
6 The Scientific World Journal
Figure 6: Proposed fuzzy inference system "Estimating" (Mamdani type; inputs credibility and resilience with 5 membership functions each; output with 5 membership functions).
Figure 7: Surface view of inputs and outputs, generated in MATLAB (horizontal axes: resilience and credibility, 0 to 1; vertical axis: output, about 0.3 to 0.9).
If (credibility is mf5 (80–100)) and (resilience is mf3 (40–59)) then (output is (80–100)) (0.8)
If (credibility is mf4 (60–79)) and (resilience is mf2 (20–39)) then (output is (60–79)) (0.6)
If (credibility is mf3 (40–59)) and (resilience is mf2 (20–39)) then (output is (40–59)) (0.4)
Based on the above rules, a fuzzy inference system for estimating the software birthmark is obtained, as shown in Figure 6.
Figure 7 shows the surface view of the inputs and the output.
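The rule base can also be exercised outside MATLAB. The following Python sketch is a minimal Mamdani evaluator built on assumptions that this section does not specify: triangular membership functions centred on the labelled ranges (scaled to [0, 1]) for both inputs and the output, min for AND, max aggregation and centroid defuzzification. Only the 20 rules listed in this section are encoded. Combinations that are not listed here, such as credibility mf1 with resilience mf5, are left out, so the function returns None when no encoded rule fires. The trailing parenthesized values in the rules are ignored. The names CENTERS, RULES, membership and evaluate are hypothetical.

```python
# Hypothetical sketch of a Mamdani fuzzy inference system for birthmark estimation.
# ASSUMPTION: membership-function shapes are invented for illustration; they are
# not taken from the authors' MATLAB model.

# Triangular mf1..mf5 centred on the ranges 0-19, 20-39, 40-59, 60-79, 80-100
# (scaled to [0, 1]); the same sets are used for both inputs and the output.
CENTERS = [0.1, 0.3, 0.5, 0.7, 0.9]
HALF_WIDTH = 0.2


def membership(x, i):
    """Degree of x in mf(i + 1); mf1 and mf5 are shoulders covering the ends of [0, 1]."""
    c = CENTERS[i]
    if (i == 0 and x <= c) or (i == len(CENTERS) - 1 and x >= c):
        return 1.0
    return max(0.0, 1.0 - abs(x - c) / HALF_WIDTH)


# Rules listed in the text: (credibility mf, resilience mf) -> output mf.
# Output mf 2..5 correspond to the ranges (20-39), (40-59), (60-79), (80-100).
RULES = {
    (1, 2): 4, (1, 1): 5, (5, 1): 5, (4, 1): 4, (3, 1): 3,
    (2, 1): 2, (2, 2): 5, (3, 3): 5, (4, 4): 5, (5, 5): 5,
    (2, 5): 2, (3, 5): 3, (4, 5): 4, (3, 4): 4, (2, 4): 3,
    (2, 3): 3, (4, 3): 4, (5, 3): 5, (4, 2): 4, (3, 2): 3,
}


def evaluate(credibility, resilience, steps=1001):
    """Min for AND, clipping for implication, max aggregation, centroid defuzzification."""
    # Aggregated firing strength for each output membership function.
    strength = [0.0] * len(CENTERS)
    for (c_mf, r_mf), out_mf in RULES.items():
        w = min(membership(credibility, c_mf - 1), membership(resilience, r_mf - 1))
        strength[out_mf - 1] = max(strength[out_mf - 1], w)
    # Centroid of the clipped, aggregated output set on a discretised [0, 1] axis.
    num = den = 0.0
    for s in range(steps):
        y = s / (steps - 1)
        mu = max(min(strength[k], membership(y, k)) for k in range(len(CENTERS)))
        num += y * mu
        den += mu
    return num / den if den > 0 else None  # None: no encoded rule fired


if __name__ == "__main__":
    print(round(evaluate(0.4, 0.8), 3))
```

With these assumed shapes, evaluate(0.4, 0.8) returns approximately 0.5. Four rules fire at a strength of 0.5, and the clipped output set they produce is symmetric about 0.5. This agreement with the reported output of 0.500 depends on the assumed membership functions, so the sketch should not be read as a reconstruction of the authors' fismat.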
4.4. Inputs Estimation. Once the fuzzy rules model is designed, inputs are given to the model according to the customer requirements, and the model generates the output based on the fuzzy rules. Details of the proposed system's inputs and output are given in Table 2.
4.5. Evaluation of the Model (Case Study). The present research work has been validated by a case study of a small module of an Android application. The Android radiocalc module consists of 109 lines of code. The methodology has been applied to a similar Android application, and the birthmark of the module has been estimated based on the properties of resilience and credibility. The K-gram based birthmark similarity technique [22] has been used. Through various experiments, we found that as the value of K increases, the birthmark similarity decreases. For very small values of K, the birthmark similarity was not satisfactory. For k = 5, the experiment gave good results in terms of both similarity and runtime overhead. The resulting similarity for the above-mentioned application with k = 5 was 40%.
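The following Python sketch makes the K-gram trend concrete. It builds a K-gram birthmark as the set of distinct opcode sequences of length k and compares two toy sequences that differ in a single instruction. The Jaccard similarity, the helper names and the toy opcode lists are assumptions for illustration only; the similarity measure defined in [22] may differ.

```python
# Hypothetical sketch of a K-gram birthmark: the set of distinct opcode
# sequences of length k. ASSUMPTION: similarity is the Jaccard index of the
# two sets; the measure used in [22] may differ.


def kgram_set(opcodes, k):
    """Distinct length-k windows over an opcode sequence."""
    return {tuple(opcodes[i:i + k]) for i in range(len(opcodes) - k + 1)}


def similarity(a, b, k):
    """Jaccard similarity (in %) of the k-gram sets of two opcode sequences."""
    sa, sb = kgram_set(a, k), kgram_set(b, k)
    if not sa and not sb:
        return 100.0  # two empty birthmarks are treated as identical
    return 100.0 * len(sa & sb) / len(sa | sb)


# Toy bytecode-like sequences that differ in one instruction (iadd vs isub).
ORIGINAL = ["aload", "getfield", "iload", "iadd", "putfield", "iload", "ifle", "return"]
SUSPECT = ["aload", "getfield", "iload", "isub", "putfield", "iload", "ifle", "return"]

if __name__ == "__main__":
    for k in range(1, 6):
        print(k, round(similarity(ORIGINAL, SUSPECT, k), 1))
```

For k = 1 to 5 this prints 75.0, 55.6, 33.3, 11.1 and 0.0. One changed instruction spoils every k-gram that covers it, so larger k lowers the similarity, which is consistent with the trend described above. On a toy input this short, k = 5 leaves nothing in common; longer real programs keep many windows that are untouched by a local change.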
We applied the SandMark [23] and CodeShield [24] tools to the above application for code obfuscation. To determine the value of resilience, the obfuscated application was compared with the original, which gave a similarity of 80% for k = 5. CodeShield provides name obfuscation, removal of debugging information, and some control-flow obfuscation, whereas SandMark does not include automatic obfuscation. The similarity was computed using K-grams. For CodeShield, the K-gram similarity decreases as K increases across numerous transformations. Table 3 shows the inputs and values for the proposed model.
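Resilience can be illustrated as the K-gram similarity between a program and an obfuscated copy. The Python sketch below applies two simplified, hypothetical transformations to a toy instruction list. The first renames identifiers, which stands in for name obfuscation. The second inserts a never-taken branch, which stands in for control-flow obfuscation. These are not the transformations that SandMark or CodeShield actually perform. The Jaccard measure and all names in the sketch are assumptions.

```python
# Hypothetical sketch: resilience as the opcode K-gram similarity between a
# program and an obfuscated version of it. Both transformations are simplified
# stand-ins, NOT the actual SandMark or CodeShield transformations.

# Each instruction is (opcode, operand); operands may hold identifiers.
ORIGINAL = [("aload", "this"), ("getfield", "balance"), ("iload", "amount"),
            ("iadd", None), ("putfield", "balance"), ("return", None)]


def rename_identifiers(code):
    """Name obfuscation: map each distinct identifier to a short generated name."""
    names = {}
    return [(op, None if arg is None else names.setdefault(arg, f"a{len(names)}"))
            for op, arg in code]


def insert_dead_branch(code):
    """Control-flow obfuscation: add a never-taken branch after the first instruction."""
    return code[:1] + [("iconst_0", None), ("ifne", "L_dead")] + code[1:]


def kgrams(code, k):
    """Opcode-only k-grams, so identifier names do not affect the birthmark."""
    ops = [op for op, _ in code]
    return {tuple(ops[i:i + k]) for i in range(len(ops) - k + 1)}


def resilience(original, obfuscated, k=5):
    """Jaccard similarity (in %) of the two k-gram sets (assumed measure)."""
    a, b = kgrams(original, k), kgrams(obfuscated, k)
    if not (a | b):
        return 100.0  # both programs shorter than k
    return 100.0 * len(a & b) / len(a | b)


if __name__ == "__main__":
    print(resilience(ORIGINAL, rename_identifiers(ORIGINAL)))  # 100.0
    print(resilience(ORIGINAL, insert_dead_branch(ORIGINAL)))  # 20.0
```

Renaming leaves an opcode-based birthmark untouched. Inserting instructions breaks every k-gram that spans the insertion point. This is one reason why resilience can depend heavily on which transformations an obfuscator applies.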
Table 3: Inputs and values for the proposed model (k = 5).
Inputs | Value (%) | Value for proposed model
Credibility | 40 | 0.4
Resilience | 80 | 0.8
The inputs to the fuzzy model are defined as follows. Credibility is 0.4 (40%) and resilience is 0.8 (80%), and these inputs are given to the fuzzification model (the fuzzy inference system). The credibility value 0.4 belongs to membership function mf3 (40–59), and the resilience value 0.8 belongs to membership function mf5 (80–100). From these membership degrees, the designed model gives an output of 0.500. Based on this result, one can make a decision about the birthmark of the software.
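The following Python sketch shows how the measured percentages in Table 3 map onto the model inputs and onto the labelled ranges used in the rules. The helper names and the rounding to whole percentages are assumptions made for illustration.

```python
# Hypothetical helper: convert a measured similarity percentage into the
# model's [0, 1] input and report which labelled range (mf1..mf5) contains it.
RANGES = [(0, 19), (20, 39), (40, 59), (60, 79), (80, 100)]  # mf1..mf5


def to_model_input(percent):
    """Scale a percentage (as in Table 3) to the [0, 1] input of the model."""
    if not 0 <= percent <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    return percent / 100.0


def range_label(value):
    """1-based index of the labelled range containing value (0-1 scale)."""
    pct = round(value * 100)  # the range labels use whole percentages
    for index, (low, high) in enumerate(RANGES, start=1):
        if low <= pct <= high:
            return index
    raise ValueError("value must lie between 0 and 1")


if __name__ == "__main__":
    for name, percent in [("credibility", 40), ("resilience", 80)]:
        x = to_model_input(percent)
        print(name, x, f"mf{range_label(x)}")
```

For the case-study inputs, the sketch reports mf3 for credibility and mf5 for resilience. Among the listed rules, this pair maps to the output range (40–59) (0.4). The reported fuzzy output of 0.500 differs from this crisp reading, presumably because overlapping membership functions let neighbouring rules contribute to the defuzzified result.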
5 Results and Discussion
A fuzzy inference system is designed that models the system and, in turn, estimates the birthmark of the software. Inputs are assigned to the model to estimate the software birthmark in terms of credibility and resilience. The designed model evaluates the given inputs and produces a result, from which the estimate of the software birthmark for the properties of credibility and resilience can be read. To check the validity of the proposed model, the inputs were given as out = evalfis([0.4 0.8], fismat), and the output was 0.500, which is the estimate of the software birthmark. This result therefore expresses the birthmark in terms of its desired properties. Other fuzzy techniques, such as fuzzy C-means clustering, have also been used in related work [25].
6 Conclusion
Software theft is a global problem of copying, stealing, and misusing software without a proper license agreement. Software birthmarking is a technique capable of detecting the theft of software systems: a software birthmark is an intrinsic characteristic of software that is used to detect the similarity of programs. The estimation of software birthmarks can play a key role in understanding the effectiveness of a birthmark. In this research, fuzzy logic, an efficient and powerful tool for tackling uncertainty, has been used to estimate software birthmarks. The method is based on fuzzy rules that were designed from the fuzzy membership functions. Existing techniques rely on known information, yet situations of uncertainty also arise in practice; the proposed model works well under uncertainty and with unknown information. The model is based on two properties of software birthmarks: credibility and resilience. It has been validated using Android applications, and various experiments have been performed with existing code obfuscation tools to estimate software birthmarks. The results produced by the proposed process show that the method is efficient and provides satisfactory results. The approach has been tested only for credibility and resilience, as these two properties are considered the most important properties of software birthmarks; they were therefore selected for model testing. In the future, the model can be extended to a different set of properties.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] D. Curtis, "Software piracy and copyright protection," in Proceedings of the Idea/Microelectronics Conference Record (WESCON '94), Anaheim, Calif, USA, 1994.
[2] G. Myles and C. Collberg, "Software watermarking through register allocation: implementation, analysis, and attacks," in Information Security and Cryptology – ICISC 2003, vol. 2971, pp. 274–293, Springer, Berlin, Germany, 2003.
[3] C. Collberg and T. R. Sahoo, "Software watermarking in the frequency domain: implementation, analysis, and attacks," Journal of Computer Security, vol. 13, no. 5, pp. 721–755, 2005.
[4] F. Liu, B. Lu, and X. Luo, "A chaos-based robust software watermarking," in Information Security Practice and Experience, vol. 3903, pp. 355–366, Springer, Berlin, Germany, 2006.
[5] Y. Zeng, F. Liu, X. Luo, and C. Yang, "Software watermarking through obfuscated interpretation: implementation and analysis," Journal of Multimedia, vol. 6, no. 4, pp. 329–340, 2011.
[6] S. Choi, H. Park, H.-I. Lim, and T. Han, "A static API birthmark for Windows binary executables," Journal of Systems and Software, vol. 82, no. 5, pp. 862–873, 2009.
[7] P. Heewan, C. Seokwoo, L. Hyun-Il, and H. Taisook, "Detecting code theft via a static instruction trace birthmark for Java methods," in Proceedings of the 6th IEEE International Conference on Industrial Informatics (INDIN '08), pp. 551–556, Daejeon, Republic of Korea, July 2008.
[8] H. Park, S. Choi, H.-I. Lim, and T. Han, "Detecting Java theft based on static API trace birthmark," in Advances in Information and Computer Security, vol. 5312 of Lecture Notes in Computer Science, pp. 121–135, Springer, Berlin, Germany, 2008.
[9] H. Park, H.-I. Lim, S. Choi, and T. Han, "Detecting common modules in Java packages based on static object trace birthmark," Computer Journal, vol. 54, no. 1, pp. 108–124, 2011.
[10] Y. Zeng, F. Liu, X. Luo, and S. Lian, "Abstract interpretation-based semantic framework for software birthmark," Computers & Security, vol. 31, no. 4, pp. 377–390, 2012.
[11] G. Myles and C. Collberg, "Detecting software theft via whole program path birthmarks," in Information Security, vol. 3225, pp. 404–415, Springer, Berlin, Germany, 2004.
[12] T. Kakimoto, A. Monden, Y. Kamei, H. Tamada, M. Tsunoda, and K.-I. Matsumoto, "Using software birthmarks to identify similar classes and major functionalities," in Proceedings of the International Workshop on Mining Software Repositories (MSR '06), pp. 171–172, ACM, Shanghai, China, May 2006.
[13] P. P. F. Chan, L. C. K. Hui, and S. M. Yiu, "Dynamic software birthmark for Java based on heap memory analysis," in Communications and Multimedia Security, vol. 7025, pp. 94–107, Springer, Berlin, Germany, 2011.
[14] Y. Wang, F. Liu, D. Gong, B. Lu, and S. Ma, "CHI based instruction-words software birthmark selection," in Proceedings of the 4th International Conference on Multimedia and Security (MINES '12), pp. 892–895, November 2012.
[15] H.-I. Lim, "Customizing k-gram based birthmark through partial matching in detecting software thefts," in Proceedings of the 37th IEEE Annual Computer Software and Applications Conference Workshops (COMPSACW '13), pp. 1–4, July 2013.
[16] K. Tyagi and A. Sharma, "A rule-based approach for estimating the reliability of component-based systems," Advances in Engineering Software, vol. 54, pp. 24–29, 2012.
[17] H. Tamada, M. Nakamura, A. Monden, and K.-I. Matsumoto, "Design and evaluation of birthmarks for detecting theft of Java programs," in Proceedings of the IASTED International Conference on Software Engineering (IASTED SE '04), pp. 569–575, 2004.
[18] C. Collberg and J. Nagra, Surreptitious Software: Obfuscation, Watermarking, and Tamperproofing for Software Protection, Addison-Wesley, Boston, Mass, USA, 1st edition, 2009.
[19] G. M. Myles, Software theft detection through program identification [Ph.D. thesis], Department of Computer Science, University of Arizona, Tucson, Ariz, USA, 2006.
[20] L. A. Zadeh, "Fuzzy logic," Computer, vol. 21, no. 4, pp. 83–93, 1988.
[21] MATLAB 7.10.0, The MathWorks, Natick, Mass, USA, 2010.
[22] G. Myles and C. Collberg, "K-gram software birthmarks," in Proceedings of the 20th Annual ACM Symposium on Applied Computing, pp. 314–318, ACM, Santa Fe, NM, USA, March 2005.
[23] C. Collberg, G. Myles, and A. Huntwork, "Sandmark – a tool for software protection research," IEEE Security and Privacy, vol. 1, no. 4, pp. 40–49, 2003.
[24] C. P. Ltd, CodeShield Java Byte Obfuscator, 2014, http://www.xmarks.com/s/site/www.codingart.com/codeshield.html.
[25] L. Saeidiasl, T. Ahmad, N. Alias, and M. Ghanbari, "Comparison of EEG source localization using meromorphic approximation to fuzzy C-mean," Malaysian Journal of Fundamental and Applied Science, vol. 9, pp. 215–220, 2013.
Figure 7 Surface view of inputs and outputs (generated in MAT-LAB)
If (credibility is mf5(80ndash100)) and (resilience is
mf3(40ndash59)) then (output is (80ndash100)) (08)
If (credibility is mf4(60ndash79)) and (resilience is
mf2(20ndash39)) then (output is (60ndash79)) (06)
If (credibility is mf3(40ndash59)) and (resilience is
mf2(20ndash39)) then (output is (40ndash59)) (04)
Based upon the above rules a fuzzy inference system isobtained for estimating software birthmark which is givenin Figure 6
Figure 7 visually shows the surface view of inputs andoutput
44 Inputs Estimation Once the fuzzy rules model isdesigned inputs will be given according to the customerrequirements to the model The model will generate theoutput based on the fuzzy rules Details of the proposedsystem inputs and output are given as shown in Table 2
45 Evaluation of the Model (Case Study) The presentresearch work has been validated by a case study of smallmodule for Android application The Android radiocalcmodule consists of 109 lines of code The methodologyhas been applied on a similar application for Android Thebirthmark of the module has been estimated based on theproperties of resilience and credibility119870-gram based birthmark similarity technique [22] has
been used By performing various experiments we foundout that as the 119870-value increases the birthmark similaritydecreases For very small values of119870 the birthmark similarity
was not satisfactory For 119896 = 5 the experiment revealedgood results in terms of similarity and runtime overheadTheresulting similarity for the above mentioned application with119896 = 5 was 40
We applied SandMark [23] and Codeshield [24] toolsfor the above application for code obfuscation To find thevalue of resilience it gives a similarity of 80 for 119896 = 5Codeshield tool provides the name obfuscation the removalof debugging information and some type of control flowwhile the SandMark tool does not include an automaticobfuscation The similarity was computed through119870-gramsThe similarity of Codeshield was found for 119870-gram whichshows that if 119870 increases there is a decrease in the similarityfor numerous transformations Table 3 shows the inputs andvalues for the proposed model
The defined inputs to the fuzzy model are described asfollows If credibility is equal to 04 (40) and resilience is08 (80) these inputs are given to the fuzzification model
The Scientific World Journal 7
Table 3 Inputs and value for the proposed model
InputsFor 119896 = 5
Value in Value for proposedmodel
Credibility 40 04Resilience 80 08
(fuzzy inference system) Credibility 04 is the degree ofmembership function mf
1(40ndash59) and resilience 08 is the
degree of membership function mf2(20ndash39) It will give the
output 0500 from the degree of membership function basedon the designed model So from the results one can make adecision about the birthmark of the software
5 Results and Discussion
A fuzzy inference system is designed which models thesystemwhich in turn estimates the birthmark of the softwareInputs are assigned to the model to check and estimate thesoftware birthmark in terms of credibility and resilience Thedesigned model evaluates the inputs (which are given to themodel) and gives results On the basis of the given resultsone can check the estimation of software birthmark for theproperties of credibility and resilience To check the validityof the proposed model inputs were given as follows out =evalfis ([04 08] fismat) and the output = 0500 whichshow the estimation of the software birthmark Hence thisresult clearly shows the software birthmarks for their desiredproperties Different fuzzy techniques are used [25] whichuse fuzzy C-mean clustering
6 Conclusion
Software theft is a global problem of copying stealing andmisusing the software without proper license agreementSoftware birthmark is a capable technique to detect thetheft of software systems Software birthmark is an intrinsiccharacteristic of software used to detect the similarity ofsoftware The estimation of software birthmark can play akey role in accepting the effectiveness of a birthmark Inthis research fuzzy logic has been used to estimate softwarebirthmark(s) which is an efficient and powerful tool totackle issues of uncertainty This method is based on fuzzyrules which were designed from the fuzzy membershipfunctions Different techniques are used in practice but allare based on known information In practice situations ofuncertainty also arise The proposed model works well incase of uncertainty and with unknown information Themodel is based on the two properties of software birthmarkcredibility and resilienceThemodel has been validated usingsome Android applications Various experiments have beenperformed using different existing tools of code obfuscationand software birthmark(s) are estimated Results produced bythe proposed process show that the method is efficient andprovides satisfactory results The approach has been testedonly for credibility and resilience as these two properties
are considered as the most important properties of softwarebirthmark(s) Therefore these are selected here for modeltesting In the future the model can be expanded for adifferent set of properties
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
References
[1] D Curtis ldquoSoftware piracy and copyright protectionrdquo inProceedings of the IdeaMicroelectronics Conference Record(WESCON rsquo94) Anaheim Calif USA 1994
[2] GMyles andCCollberg ldquoSoftwarewatermarking through reg-ister allocation implementation analysis and attacksrdquo in Infor-mation Security andCryptologymdashICISC 2003 vol 2971 pp 274ndash293 Springer Berlin Germany 2003
[3] C Collberg and T R Sahoo ldquoSoftware watermarking inthe frequency domain implementation analysis and attacksrdquoJournal of Computer Security vol 13 no 5 pp 721ndash755 2005
[4] F Liu B Lu and X Luo ldquoA chaos-based robust softwarewatermarkingrdquo in Information Security Practice and Experiencevol 3903 pp 355ndash366 Springer Berlin Germany 2006
[5] Y Zeng F Liu X Luo and C Yang ldquoSoftware watermarkingthrough obfuscated interpretation implementation and analy-sisrdquo Journal of Multimedia vol 6 no 4 pp 329ndash340 2011
[6] S Choi H Park H-I Lim and T Han ldquoA static API birthmarkfor Windows binary executablesrdquo Journal of Systems and Soft-ware vol 82 no 5 pp 862ndash873 2009
[7] P Heewan C Seokwoo L Hyun-Il and H Taisook ldquoDetectingcode theft via a static instruction trace birthmark for Javameth-odsrdquo in Proceedings of the 6th IEEE International Conferenceon Industrial Informatics (INDIN rsquo08) pp 551ndash556 DaejeonRepublic of Korea July 2008
[8] H Park S Choi H-I Lim and T Han ldquoDetecting java theftbased on staticAPI trace birthmarkrdquo inAdvances in Informationand Computer Security vol 5312 of Lecture Notes in ComputerScience pp 121ndash135 Springer Berlin Germany 2008
[9] H Park H-I Lim S Choi and T Han ldquoDetecting commonmodules in java packages based on static object trace birth-markrdquo Computer Journal vol 54 no 1 pp 108ndash124 2011
[10] Y Zeng F Liu X Luo and S Lian ldquoAbstract interpretation-based semantic framework for software birthmarkrdquo Computersamp Security vol 31 no 4 pp 377ndash390 2012
[11] G Myles and C Collberg ldquoDetecting software theft via wholeprogram path birthmarksrdquo in Information Security vol 3225pp 404ndash415 Springer Berlin Germany 2004
[12] T Kakimoto A Monden Y Kamei H Tamada M Tsunodaand K-I Matsumoto ldquoUsing software birthmarks to identifysimilar classes and major functionalitiesrdquo in Proceedings of theInternational Workshop on Mining Software Repositories (MSRrsquo06) pp 171ndash172 ACM Shanghai China May 2006
[13] P P F Chan L C K Hui and S M Yiu ldquoDynamic soft-ware birthmark for java based on heap memory analysisrdquo inCommunications andMultimedia Security vol 7025 pp 94ndash107Springer Berlin Germany 2011
[14] Y Wang F Liu D Gong B Lu and S Ma ldquoCHI basedinstruction-words software birthmark selectionrdquo in Proceedings
8 The Scientific World Journal
of the 4th International Conference on Multimedia and Security(MINES rsquo12) pp 892ndash895 November 2012
[15] H-I Lim ldquoCustomizing k-gram based birthmark throughpartial matching in detecting software theftsrdquo in Proceedings ofthe 37th IEEE Annual Computer Software and Applications Con-ference Workshops (COMPSACW rsquo13) pp 1ndash4 July 2013
[16] K Tyagi and A Sharma ldquoA rule-based approach for estimatingthe reliability of component-based systemsrdquo Advances in Engi-neering Software vol 54 pp 24ndash29 2012
[17] H Tamada M Nakamura A Monden and K-I MatsumotoldquoDesign and evaluation of birthmarks for detecting theft ofjava programsrdquo in Proceedings of the IASTED InternationalConference on Software Engineering (IASTED SE 04) pp 569ndash575 2004
[18] C Collberg and J Nagra Surreptitious Software ObfuscationWatermarking and Tamperproofing for Software ProtectionAddison Wesley Boston Mass USA 1st edition 2009
[19] G M Myles Software theft detection through program iden-tification [PhD thesis] Department of Computer ScienceUniversity of Arizona Tucson Ariz USA 2006
[20] L A Zadeh ldquoFuzzy logicrdquo Computer vol 21 no 4 pp 83ndash931988
[21] MATLAB 7100 The MathWorks Natick Mass USA 2010[22] G Myles and C Collberg ldquoK-gram software birthmarksrdquo in
Proceedings of the 20th Annual ACM Symposium on AppliedComputing pp 314ndash318 ACM Santa FeNMUSAMarch 2005
[23] C Collberg GMyles andAHuntwork ldquoSandmarkmdasha tool forsoftware protection researchrdquo IEEE Security and Privacy vol 1no 4 pp 40ndash49 2003
[24] C P Ltd CodeShield Java Byte Obfuscator 2014 httpwwwxmarkscomssitewwwcodingartcomcodeshieldhtml
[25] L Saeidiasl TAhmadNAlias andMGhanbari ldquoComparisonof EEG source localization using meromorphic approximationto fuzzy C-meanrdquo Malaysian Journal of Fundamental andApplied Science vol 9 pp 215ndash220 2013
(fuzzy inference system) Credibility 04 is the degree ofmembership function mf
1(40ndash59) and resilience 08 is the
degree of membership function mf2(20ndash39) It will give the
output 0500 from the degree of membership function basedon the designed model So from the results one can make adecision about the birthmark of the software
5 Results and Discussion
A fuzzy inference system is designed which models thesystemwhich in turn estimates the birthmark of the softwareInputs are assigned to the model to check and estimate thesoftware birthmark in terms of credibility and resilience Thedesigned model evaluates the inputs (which are given to themodel) and gives results On the basis of the given resultsone can check the estimation of software birthmark for theproperties of credibility and resilience To check the validityof the proposed model inputs were given as follows out =evalfis ([04 08] fismat) and the output = 0500 whichshow the estimation of the software birthmark Hence thisresult clearly shows the software birthmarks for their desiredproperties Different fuzzy techniques are used [25] whichuse fuzzy C-mean clustering
6 Conclusion
Software theft is a global problem of copying stealing andmisusing the software without proper license agreementSoftware birthmark is a capable technique to detect thetheft of software systems Software birthmark is an intrinsiccharacteristic of software used to detect the similarity ofsoftware The estimation of software birthmark can play akey role in accepting the effectiveness of a birthmark Inthis research fuzzy logic has been used to estimate softwarebirthmark(s) which is an efficient and powerful tool totackle issues of uncertainty This method is based on fuzzyrules which were designed from the fuzzy membershipfunctions Different techniques are used in practice but allare based on known information In practice situations ofuncertainty also arise The proposed model works well incase of uncertainty and with unknown information Themodel is based on the two properties of software birthmarkcredibility and resilienceThemodel has been validated usingsome Android applications Various experiments have beenperformed using different existing tools of code obfuscationand software birthmark(s) are estimated Results produced bythe proposed process show that the method is efficient andprovides satisfactory results The approach has been testedonly for credibility and resilience as these two properties
are considered as the most important properties of softwarebirthmark(s) Therefore these are selected here for modeltesting In the future the model can be expanded for adifferent set of properties
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
References
[1] D Curtis ldquoSoftware piracy and copyright protectionrdquo inProceedings of the IdeaMicroelectronics Conference Record(WESCON rsquo94) Anaheim Calif USA 1994
[2] GMyles andCCollberg ldquoSoftwarewatermarking through reg-ister allocation implementation analysis and attacksrdquo in Infor-mation Security andCryptologymdashICISC 2003 vol 2971 pp 274ndash293 Springer Berlin Germany 2003
[3] C Collberg and T R Sahoo ldquoSoftware watermarking inthe frequency domain implementation analysis and attacksrdquoJournal of Computer Security vol 13 no 5 pp 721ndash755 2005
[4] F Liu B Lu and X Luo ldquoA chaos-based robust softwarewatermarkingrdquo in Information Security Practice and Experiencevol 3903 pp 355ndash366 Springer Berlin Germany 2006
[5] Y Zeng F Liu X Luo and C Yang ldquoSoftware watermarkingthrough obfuscated interpretation implementation and analy-sisrdquo Journal of Multimedia vol 6 no 4 pp 329ndash340 2011
[6] S Choi H Park H-I Lim and T Han ldquoA static API birthmarkfor Windows binary executablesrdquo Journal of Systems and Soft-ware vol 82 no 5 pp 862ndash873 2009
[7] P Heewan C Seokwoo L Hyun-Il and H Taisook ldquoDetectingcode theft via a static instruction trace birthmark for Javameth-odsrdquo in Proceedings of the 6th IEEE International Conferenceon Industrial Informatics (INDIN rsquo08) pp 551ndash556 DaejeonRepublic of Korea July 2008
[8] H Park S Choi H-I Lim and T Han ldquoDetecting java theftbased on staticAPI trace birthmarkrdquo inAdvances in Informationand Computer Security vol 5312 of Lecture Notes in ComputerScience pp 121ndash135 Springer Berlin Germany 2008
[9] H Park H-I Lim S Choi and T Han ldquoDetecting commonmodules in java packages based on static object trace birth-markrdquo Computer Journal vol 54 no 1 pp 108ndash124 2011
[10] Y Zeng F Liu X Luo and S Lian ldquoAbstract interpretation-based semantic framework for software birthmarkrdquo Computersamp Security vol 31 no 4 pp 377ndash390 2012
[11] G Myles and C Collberg ldquoDetecting software theft via wholeprogram path birthmarksrdquo in Information Security vol 3225pp 404ndash415 Springer Berlin Germany 2004
[12] T Kakimoto A Monden Y Kamei H Tamada M Tsunodaand K-I Matsumoto ldquoUsing software birthmarks to identifysimilar classes and major functionalitiesrdquo in Proceedings of theInternational Workshop on Mining Software Repositories (MSRrsquo06) pp 171ndash172 ACM Shanghai China May 2006
[13] P P F Chan L C K Hui and S M Yiu ldquoDynamic soft-ware birthmark for java based on heap memory analysisrdquo inCommunications andMultimedia Security vol 7025 pp 94ndash107Springer Berlin Germany 2011
[14] Y Wang F Liu D Gong B Lu and S Ma ldquoCHI basedinstruction-words software birthmark selectionrdquo in Proceedings
8 The Scientific World Journal
of the 4th International Conference on Multimedia and Security(MINES rsquo12) pp 892ndash895 November 2012
[15] H-I Lim ldquoCustomizing k-gram based birthmark throughpartial matching in detecting software theftsrdquo in Proceedings ofthe 37th IEEE Annual Computer Software and Applications Con-ference Workshops (COMPSACW rsquo13) pp 1ndash4 July 2013
[16] K Tyagi and A Sharma ldquoA rule-based approach for estimatingthe reliability of component-based systemsrdquo Advances in Engi-neering Software vol 54 pp 24ndash29 2012
[17] H Tamada M Nakamura A Monden and K-I MatsumotoldquoDesign and evaluation of birthmarks for detecting theft ofjava programsrdquo in Proceedings of the IASTED InternationalConference on Software Engineering (IASTED SE 04) pp 569ndash575 2004
[18] C. Collberg and J. Nagra, Surreptitious Software: Obfuscation, Watermarking, and Tamperproofing for Software Protection, Addison-Wesley, Boston, Mass, USA, 1st edition, 2009.
[19] G. M. Myles, Software theft detection through program identification [Ph.D. thesis], Department of Computer Science, University of Arizona, Tucson, Ariz, USA, 2006.
[20] L. A. Zadeh, "Fuzzy logic," Computer, vol. 21, no. 4, pp. 83–93, 1988.
[21] MATLAB 7.10.0, The MathWorks, Natick, Mass, USA, 2010.
[22] G. Myles and C. Collberg, "K-gram software birthmarks," in Proceedings of the 20th Annual ACM Symposium on Applied Computing, pp. 314–318, ACM, Santa Fe, NM, USA, March 2005.
[23] C. Collberg, G. Myles, and A. Huntwork, "Sandmark - a tool for software protection research," IEEE Security and Privacy, vol. 1, no. 4, pp. 40–49, 2003.
[24] C. P. Ltd., CodeShield Java Byte Obfuscator, 2014, httpwwwxmarkscomssitewwwcodingartcomcodeshieldhtml
[25] L. Saeidiasl, T. Ahmad, N. Alias, and M. Ghanbari, "Comparison of EEG source localization using meromorphic approximation to fuzzy C-mean," Malaysian Journal of Fundamental and Applied Science, vol. 9, pp. 215–220, 2013.
of the 4th International Conference on Multimedia and Security(MINES rsquo12) pp 892ndash895 November 2012
[15] H-I Lim ldquoCustomizing k-gram based birthmark throughpartial matching in detecting software theftsrdquo in Proceedings ofthe 37th IEEE Annual Computer Software and Applications Con-ference Workshops (COMPSACW rsquo13) pp 1ndash4 July 2013
[16] K Tyagi and A Sharma ldquoA rule-based approach for estimatingthe reliability of component-based systemsrdquo Advances in Engi-neering Software vol 54 pp 24ndash29 2012
[17] H Tamada M Nakamura A Monden and K-I MatsumotoldquoDesign and evaluation of birthmarks for detecting theft ofjava programsrdquo in Proceedings of the IASTED InternationalConference on Software Engineering (IASTED SE 04) pp 569ndash575 2004
[18] C Collberg and J Nagra Surreptitious Software ObfuscationWatermarking and Tamperproofing for Software ProtectionAddison Wesley Boston Mass USA 1st edition 2009
[19] G M Myles Software theft detection through program iden-tification [PhD thesis] Department of Computer ScienceUniversity of Arizona Tucson Ariz USA 2006
[20] L A Zadeh ldquoFuzzy logicrdquo Computer vol 21 no 4 pp 83ndash931988
[21] MATLAB 7100 The MathWorks Natick Mass USA 2010[22] G Myles and C Collberg ldquoK-gram software birthmarksrdquo in
Proceedings of the 20th Annual ACM Symposium on AppliedComputing pp 314ndash318 ACM Santa FeNMUSAMarch 2005
[23] C Collberg GMyles andAHuntwork ldquoSandmarkmdasha tool forsoftware protection researchrdquo IEEE Security and Privacy vol 1no 4 pp 40ndash49 2003
[24] C P Ltd CodeShield Java Byte Obfuscator 2014 httpwwwxmarkscomssitewwwcodingartcomcodeshieldhtml
[25] L Saeidiasl TAhmadNAlias andMGhanbari ldquoComparisonof EEG source localization using meromorphic approximationto fuzzy C-meanrdquo Malaysian Journal of Fundamental andApplied Science vol 9 pp 215ndash220 2013