
1018 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 13, NO. 4, APRIL 2018

Robust 2D Engineering CAD Graphics Hashing for Joint Topology and Geometry Authentication via Covariance-Based Descriptors

Zhiyong Su, Ying Ye, Qi Zhang, Weiqing Li, and Yuewei Dai

Abstract— This paper investigates the joint authentication of topology and geometry information of 2D engineering computer-aided design graphics, which focus more on topological modeling than geometric modeling of objects. A robust hashing scheme is proposed for joint topology and geometry authentication. The covariance matrices of descriptors are explored to fuse and encode both topology and geometry features of different types into a compact representation. First, a normalized binary shape texture is rendered for each geometric object through the render-to-texture technique. Then, for each geometric object, geometry features are computed based on statistical features that are extracted from image rings. Additionally, topology features are generated according to the topological relations among joint objects. To generate hash codes of the graphic, all geometric objects are first grouped according to their geometry features. Then, for each group, the covariance matrices of descriptors are applied to fuse both the topology and geometry features of all objects, and the intermediate hash codes of each group are computed based on the covariance matrices. The final hash sequence is formed by concatenating the intermediate hash codes that correspond to each group. Secret keys are introduced into both feature extraction and hash construction. The hashes are robust against topology-preserving graphic manipulations and sensitive to malicious attacks. By decomposing the hashes, the locations of tampered objects can be determined. Experimental results are presented to evaluate the performance and show the effectiveness of the method.

Index Terms— Covariance descriptor, authentication, topology authentication, geometry authentication, hash.

Manuscript received June 25, 2017; revised October 6, 2017 and November 15, 2017; accepted November 16, 2017. Date of publication November 23, 2017; date of current version January 3, 2018. This work was supported by the National Natural Science Foundation of China under Grant 61300160. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Stefano Tubaro. (Corresponding author: Zhiyong Su.)

Z. Su, Y. Ye, Q. Zhang, and Y. Dai are with the School of Automation, Nanjing University of Science and Technology, Nanjing 210094, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).

W. Li is with the School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIFS.2017.2777341

1556-6013 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

Engineering computer-aided design (CAD) graphics are a very important type of industrial graphical documentation and are extensively used in Architecture, Engineering and Construction (AEC), which is one branch of CAD. With intensive global competition and increasing product complexity in the AEC industry, companies are increasingly focusing on collaborative design technologies in which a company concentrates only on its core activity and collaborates with other companies for other activities. These technologies provide a consistent set of solutions to support the collaborative creation, management, dissemination, and use of design documentation throughout the entire product and project lifecycle [1]. Therefore, the integrity and security of engineering CAD graphics sharing among all collaborative participants are essential for successful Product Lifecycle Management (PLM) applications.

The digital contents of engineering CAD graphics typically consist of geometry, engineering, and topology information. Geometry information refers to the shape, dimensions and position of objects. Geometric shapes of objects can be designed by using basic geometric entities, such as LINE, POLYLINE, ARC, CIRCLE and 3DFACE. Engineering information refers to design constraints, engineering disciplines, etc. Topology information describes the complex topological relations among various joint objects. The design of engineering CAD graphics focuses on topological modeling more than geometric modeling of objects. The objective of topological modeling is to determine the most economical spatial arrangement of various objects that satisfies construction, operation, maintenance, and safety requirements [2], [3]. This is significantly different from traditional mechanical CAD, as another branch of CAD, which focuses on geometric modeling. Both topology and geometry information should be taken into consideration in content authentication.

Content authentication and identification techniques can be classified into two main categories from the technological perspective: watermarking and hashing. In watermarking-based techniques, watermarks that are associated with authentication information are embedded into specific areas of the content and then extracted to judge whether there have been malicious manipulations of the received content. The precision of the host content is inevitably changed slightly by watermarking [4], [5]. This is an important problem in highly detailed digital design graphics in CAD applications. Different from watermarking-based techniques, hashing-based schemes require no embedding process. Hash codes are generated based on well-designed features that are extracted from the host content and are in accordance with certain characteristics. Content authentication is performed by comparing the hash codes of the host content with the hash codes of the received content [6]–[8]. Therefore, hashing-based techniques do not introduce any distortion into the host content and are generally more suitable for CAD applications.

To the best of our knowledge, a detailed analysis of the authentication of both topology and geometry information for 2D engineering CAD graphics has not been reported in the literature. In the case of geometry authentication, many digital watermarking schemes have been recently proposed for mechanical CAD graphics [4], [9]–[12], and multiple hashing-based authentication schemes have been proposed for vector data models [13], [14]. By comparison, few related works on topology authentication have been reported. The topology authentication problem of piping isometric drawings, which are a type of 2D engineering CAD graphics, was introduced by Su et al. [15], and a watermarking-based scheme was proposed to verify only the topological integrity. The problem of joint topology and geometry authentication for 2D engineering CAD graphics has yet to be addressed.

A. Contributions

In this paper, we aim to tackle the problem of joint topology and geometry information authentication for 2D engineering CAD graphics. The contributions of this paper can be summarized as follows:

(1) A novel framework for jointly authenticating topology and geometry information of 2D engineering CAD graphics is proposed in this paper. The framework decomposes the authentication task into three stages: topology and geometry feature extraction, topology and geometry feature fusion, and joint topology and geometry hashing.

(2) The geometry features of each geometric object are extracted from a normalized texture through ring partitioning [8], [16]. The normalized texture onto which the geometric object is projected orthogonally is invariant to object translation and uniform scaling. The proposed descriptor is made robust to a wide range of non-malicious manipulations, such as global and local rotation, uniform scaling and translation (RST) transformations, by applying a shape texture rendering method for geometric objects.

(3) Covariance matrices are proposed as a new descriptor for fusion of topology and geometry features. While similar descriptors have been proposed for object tracking and texture analysis in 2D images, this is the first time that covariance-based analysis is explored for content authentication of CAD graphics in the literature. The advantage of using covariance matrices compared with geometric descriptors is that they enable the fusion of multiple and heterogeneous features without the need for normalization [17], [18].

(4) A hashing-based scheme is proposed for authenticating topology and geometry information of 2D engineering CAD graphics. The proposed method is robust to a wide range of non-malicious manipulations, such as global and local RST transformations, while it is also sensitive to topology and geometry changes that are caused by malicious attacks. Furthermore, it can detect and locate tampered objects.

The rest of this paper is organized as follows. Section II reviews the related work. Section III introduces the preliminaries used in this paper. Section IV overviews the framework of the proposed scheme. Details of the proposed hashing scheme are described in Section V, Section VI, and Section VII, respectively. Section VIII presents the performance analysis and experimental results. This work is concluded in Section IX.

II. RELATED WORK

This section reviews some related works on the geometry and topology authentication for CAD models.

A. Geometry Authentication

Existing works on geometry authentication for CAD models in the literature can be divided into two main categories: watermarking-based methods and hashing-based methods.

Watermarking-based methods: Many watermarking-based methods for geometry authentication of CAD models have been reported in the past years [19], [20]. Fornaro et al. [21] proposed a distributed watermarking scheme for verifying Constructive Solid Geometry (CSG) models. Watermarks were computed from selected attributes of the model and stored in control nodes or in the comments of the model. Peng et al. [12] presented two reversible watermarking schemes, which can be applied for content authentication, for 2D CAD engineering graphics based on histogram shifting. Both schemes exploited the correlation among adjacent coordinates or relative phases. Watermarks were embedded by shifting and modifying the difference histogram of coordinates or phase. Xiao et al. [4] introduced a combined reversible watermarking scheme for 2D CAD engineering graphics. Watermarks were embedded into the distance ratios of vertices through improved quantization index modulation and improved difference expansion.

Hashing-based methods: An information-theoretic hashing of a 3D mesh using spectral graph theory and entropic spanning trees was presented by Tarmissia and Hamza [22]. The scheme applied eigen-decomposition to the Laplace-Beltrami matrix of each sub-mesh and then generated the hash value based on the spectral coefficients and the Tsallis entropy estimate. Lee et al. [14] proposed a vector data hashing method for authentication and copyright protection of CAD design graphics. Feature values were extracted by projecting the polyline curvatures, which were obtained from groups of vector data using GMM (Gaussian mixture model) clustering, onto random values. The final hash values were generated based on the binarization of feature values.

B. Topology Authentication

The problem of topology authentication for engineering CAD graphics in the AEC industries is relatively new compared with existing image, video, 3D model and vector data hashing and has not been researched as widely as geometry authentication. Su et al. [15] first investigated the topology integrity authentication problem for piping isometric drawings, which are a type of 2D engineering CAD graphics. A semi-fragile watermarking scheme was proposed to address this interesting issue. The topological relation among joint components was encoded into watermarks. Authentication was performed by embedding topology-sensitive watermarks into geometrical invariants of selected objects via quantization index modulation.

Fig. 1. Part of a typical 2D engineering CAD graphic.

Fig. 2. Some geometric objects used in 2D engineering CAD graphics.

Although significant progress has been made in geometry authentication for CAD models, there are still very few methods that focus on topology authentication. Furthermore, the problem of joint topology and geometry authentication for 2D engineering CAD graphics has not been well investigated in the literature. Therefore, this paper aims at developing hashing-based methods for jointly authenticating topology and geometry information for 2D engineering CAD graphics.

III. PRELIMINARIES

A. 2D Engineering CAD Graphics

2D engineering CAD graphics consist of multiple geometric objects. Fig. 1 shows part of a common 2D engineering CAD graphic. Fig. 2 shows some typical geometric objects, which are composed of various basic geometric entities such as LINE, POLYLINE, CIRCLE, ARC and POLYGON. These objects often have complex external and internal shapes, as illustrated in Fig. 2. In terms of geometry and topology information, without loss of generality, a 2D engineering CAD graphic $G$ can be defined as an undirected graph $G = (O, E)$, where $O = \{o_1, o_2, \cdots, o_m\}$ is the set of nodes, and $E = \{e_{ij}\}$ is the set of edges. Each node $o_i$ corresponds to a geometric object. Each edge $e_{ij} = [o_i, o_j]$ indicates that $o_i$ connects with $o_j$.
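To make the graph definition concrete, the following minimal Python sketch stores $G = (O, E)$ as an adjacency map. The helper name `build_graph` and the object labels are hypothetical and only serve as an illustration; the paper does not prescribe a data structure.

```python
def build_graph(objects, edges):
    """Build an adjacency map for the undirected graph G = (O, E)."""
    adjacency = {obj: set() for obj in objects}
    for a, b in edges:
        # An edge e_ij = [o_i, o_j] is undirected: store it in both directions.
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


# Hypothetical graphic with four objects; o4 is not joined to anything.
objects = ["o1", "o2", "o3", "o4"]
edges = [("o1", "o2"), ("o2", "o3")]
graph = build_graph(objects, edges)
print(sorted(graph["o2"]))  # joint objects of o2
```

The set stored for each node is the list of joint objects of that node, which is the information that the topology features later rely on.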

2D engineering CAD graphics can be easily edited through the various geometry and topology operations that are provided by CAD tools. These operations can be classified into non-malicious and malicious operations. Hash codes are expected to be able to survive non-malicious operations and reject malicious tampering to an acceptable extent. Non-malicious operations cover global and local RST transformations. Global RST transformations are performed on the whole graphic to have a better view, while local RST transformations are often applied to individual objects to achieve a satisfactory appearance and fit. These geometry operations are applied to create cleaner and more legible graphics and facilitate the annotation for various objects. They affect the position, dimensions and orientation of objects, based on the precondition of keeping topological relations unchanged. Malicious operations include inserting objects, deleting objects, and changing topological relations logically. The insertion and deletion of objects, which can be defined as malicious geometry attacks, always involve topology modification. All of the above operations are performed on objects, rather than on their geometric entities.

B. Vector Quantization

The Vector Quantization (VQ) technique is utilized to cluster geometric objects in this paper. It was introduced as an image compression technique and proved to be efficient [23].

VQ can be simply regarded as a mapping function that maps the $m$-dimensional space $\mathbb{R}^m$ into a finite subset $Y = \{Y_0, Y_1, \cdots, Y_{k-1}\}$, where $Y$ is called a codebook with $k$ codewords and $Y_j = \{Y_j^0, Y_j^1, \cdots, Y_j^{m-1}\}$ is the $j$-th codeword in codebook $Y$. Codebook training is performed in advance through the Linde-Buzo-Gray (LBG) algorithm [24] in this paper. The details of the LBG algorithm are given as follows:

Step 1: Generate an initial codebook $Y^0$ of size $k$. Set the iteration counter $i = 0$ and the initial average distortion $D_{-1} = \infty$. Set the maximum iteration counter as $I$ and the distortion threshold as $\varepsilon$.

Step 2: For each training vector $x$, find its best-matching codeword with the least distortion in the current codebook $Y^i$ by calculating the Euclidean distance between each codeword and the input vector $x$.

Step 3: Assign the training vectors into $k$ cells and update the centroid of each cell to obtain a new codebook $Y^{i+1}$.

Step 4: Calculate the current average distortion $D_i$ for all training vectors at the $i$-th iteration.

Step 5: If $(D_{i-1} - D_i)/D_i \le \varepsilon$ or $i = I$, set the ultimate codebook $Y = Y^{i+1}$ and the LBG algorithm is complete. Otherwise, let $i = i + 1$ and return to Step 2.
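The five steps above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, the initial codebook is simply the first $k$ training vectors, and the average distortion is taken as the mean squared Euclidean distance, both of which are assumptions the paper does not fix.

```python
import math


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codeword(codebook, vector):
    """Index of the codeword with the least Euclidean distortion."""
    return min(range(len(codebook)),
               key=lambda j: squared_distance(codebook[j], vector))


def lbg(training, k, epsilon=1e-4, max_iterations=50):
    # Step 1: initial codebook. Taking the first k training vectors is an
    # assumption; the text does not say how Y^0 is initialised.
    codebook = [list(v) for v in training[:k]]
    previous = math.inf  # D_{-1}
    for _ in range(max_iterations):
        # Step 2: best-matching codeword for every training vector.
        labels = [nearest_codeword(codebook, v) for v in training]
        # Step 3: move each codeword to the centroid of its cell.
        for j in range(k):
            cell = [v for v, label in zip(training, labels) if label == j]
            if cell:  # an empty cell keeps its old codeword
                codebook[j] = [sum(c) / len(cell) for c in zip(*cell)]
        # Step 4: average distortion (mean squared distance, an assumption).
        distortion = sum(squared_distance(codebook[label], v)
                         for v, label in zip(training, labels)) / len(training)
        # Step 5: stop once the relative improvement is small enough.
        if distortion == 0 or (previous - distortion) / distortion <= epsilon:
            break
        previous = distortion
    return codebook


training = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(lbg(training, k=2))
```

On this toy training set the two codewords settle on the centroids of the two visibly separated clusters, even though both initial codewords come from the same cluster.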

C. Covariance Descriptor

The covariance descriptor, which was first introduced by Tuzel et al. [17] for object detection and texture classification, is employed to fuse and represent topology and geometry features of 2D engineering CAD graphics in this paper.

Fig. 3. Overview of the proposed framework.

From a statistics point of view, covariance can be understood as a measure of how several variables change together. Within the context of the descriptor definition, the set of random variables must correspond to a set of observable features that are correlated to one another [18], [25]. Given an image $I \in \mathbb{R}^{W \times H}$, let $F(x, y)$ be the $W \times H \times d$-dimensional feature image that is extracted from $I$,

$$F(x, y) = \phi(I, x, y) \tag{1}$$

where the function $\phi$ can be any pixel-wise mapping, such as intensity, color, gradient, filter response, or higher-order derivative. For a given rectangular region $R \subset F$, let $\{z_i\}_{i=1}^{n}$ be the $d$-dimensional feature points inside $R$. The region $R$ can be described using a $d \times d$-dimensional covariance matrix of the feature points [17],

$$C_R = \frac{1}{n - 1} \sum_{i=1}^{n} (z_i - \mu)(z_i - \mu)^T \tag{2}$$

where $\mu$ is the mean of the feature vectors of all points in the region.
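As a small illustration of Eqs. (1) and (2), the sketch below computes the region covariance of a tiny grayscale image. The mapping $\phi(I, x, y) = [x, y, I(x, y)]$ is a hypothetical choice made only for this example, and the function names are not from the paper.

```python
def feature_points(image, region):
    """Collect z_i = [x, y, I(x, y)] for every pixel inside the region."""
    x0, y0, x1, y1 = region  # half-open bounds: x0 <= x < x1, y0 <= y < y1
    return [(float(x), float(y), float(image[y][x]))
            for y in range(y0, y1) for x in range(x0, x1)]


def region_covariance(points):
    """Covariance matrix of Eq. (2) for a list of d-dimensional points."""
    n, d = len(points), len(points[0])
    mean = [sum(p[a] for p in points) / n for a in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for p in points:
        diff = [p[a] - mean[a] for a in range(d)]
        for a in range(d):
            for b in range(d):
                # Accumulate (z_i - mu)(z_i - mu)^T scaled by 1 / (n - 1).
                cov[a][b] += diff[a] * diff[b] / (n - 1)
    return cov


image = [[0, 1],
         [2, 3]]
C = region_covariance(feature_points(image, (0, 0, 2, 2)))
for row in C:
    print(row)
```

Whatever the size of the region, the result is a $3 \times 3$ matrix here, because its dimension depends only on the number of features per point.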

IV. OVERVIEW OF THE FRAMEWORK

The framework of the proposed hashing scheme consists of three major parts: topology and geometry feature extraction, topology and geometry feature fusion, and joint topology and geometry hashing. The flow chart of the authentication framework is shown in Fig. 3.

In the topology and geometry feature extraction part, for each geometric object, a binary shape texture is rendered and its geometry feature is computed based on the ring partition. Its topology feature is extracted according to its topological relation. In the topology and geometry feature fusion part, all objects are clustered into $k$ groups with different numbers of objects according to their geometry features. Then, for each group, a covariance matrix that encodes the topology and geometry features of objects in the group is computed. In the joint topology and geometry hashing part, a feature vector for each group is constructed according to its covariance matrix. To reduce the hash length and for convenience of storage, a Gaussian random matrix is used to compress the feature vector to obtain an intermediate hash, which is then pseudo-randomly scrambled based on secret keys. Encryption and randomization are utilized to reduce hash collisions to improve the security of the algorithm. The final hash sequence is generated by concatenating the intermediate hashes that correspond to the groups.

Fig. 4. Illustration of geometry feature extraction.

V. TOPOLOGY AND GEOMETRY FEATURE EXTRACTION

A. Geometry Feature Extraction

For each geometric object $o_i$, its geometry feature $v_i^g$ is computed in the image space, as illustrated in Fig. 4, because of its complex contours and internal structures. A normalized binary texture is first generated by projecting $o_i$ onto a fixed-size texture orthogonally. Then, the rendered texture is divided into different rings. Finally, its geometry feature $v_i^g$ is computed based on the statistical features that were extracted from each ring.

1) Shape Texture Rendering: A normalized $F \times F$ binary texture $T$ is rendered for each geometric object $o_i$ through the Render-To-Texture technique [26], as illustrated in Fig. 4. First, an empty texture $T$ is created. Then, the smallest enclosing circle with center $c(x, y)$ and radius $r$ of $o_i$ is computed. These parameters are further utilized to define the six parameters (left, right, top, bottom, near, far) of the projection matrix, as illustrated in Fig. 4. Finally, the object $o_i$ is rendered to the texture $T$ by orthographic projection. Because the projection volume is derived from the enclosing circle of the object itself, the rendered normalized texture is invariant to object translation and uniform scaling.
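The invariance claim can be checked with a short sketch. The text does not state how the projection parameters are derived from $c$ and $r$, so the bounds used here (the square that circumscribes the enclosing circle) are an assumption, as are the function names; near and far are omitted because the example is purely 2D, and the enclosing circle is taken as given rather than computed.

```python
def ortho_bounds(center, radius):
    """Assumed orthographic bounds: the square around the enclosing circle."""
    cx, cy = center
    return cx - radius, cx + radius, cy - radius, cy + radius  # left, right, bottom, top


def to_pixel(point, bounds, size):
    """Map a 2D point inside the bounds to a (column, row) of a size x size texture."""
    left, right, bottom, top = bounds
    u = (point[0] - left) / (right - left)
    v = (point[1] - bottom) / (top - bottom)
    # Clamp so that points on the right or top border land in the last pixel.
    return min(int(u * size), size - 1), min(int(v * size), size - 1)


# A vertex of a hypothetical object and its enclosing circle.
vertex, center, radius = (3.0, 4.0), (2.0, 2.0), 4.0
original = to_pixel(vertex, ortho_bounds(center, radius), 64)

# The same object after uniform scaling by 2.5 and translation by (10, -7).
scale, shift = 2.5, (10.0, -7.0)
moved_vertex = (vertex[0] * scale + shift[0], vertex[1] * scale + shift[1])
moved_center = (center[0] * scale + shift[0], center[1] * scale + shift[1])
moved = to_pixel(moved_vertex, ortho_bounds(moved_center, radius * scale), 64)

print(original, moved)
```

Both calls address the same texel, since the vertex keeps its position relative to the enclosing circle under translation and uniform scaling.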

2) Ring Partitioning: Ring partitioning [8], [16] is employed to extract geometry features that are robust to object rotation. The rendered normalized texture is divided into a set of rings with equal area, as illustrated in Fig. 4. It is theoretically proved that the region in the inscribed circle of an image is preserved under rotation [8], [16]. This provides us with an opportunity to extract image features that are robust to rotation.

Given a normalized $F \times F$ texture $T$, let $n$ be the number of rings; $r_m$ be the $m$-th radius ($m = 0, 1, \cdots, n - 1$), where the radii are arranged in ascending order; and $R_m$ be the set of pixel values of the $m$-th ring. Clearly, $r_{n-1} = \lfloor F/2 \rfloor$ for the texture $T$. In addition, $r_m$ can be determined by iteratively calculating the following equation:

$$r_m = \sqrt{\frac{\bar{S} + \pi r_{m-1}^2}{\pi}} \tag{3}$$

where

$$r_0 = \sqrt{\frac{\bar{S}}{\pi}} \tag{4}$$

and $\bar{S}$ is the average area of each ring

$$\bar{S} = \lfloor S/n \rfloor \tag{5}$$

in which $S$ is the area of the inscribed circle

$$S = \pi r_{n-1}^2 \tag{6}$$

Thus, image pixels $p(x, y)$ ($0 \le x \le F - 1$, $0 \le y \le F - 1$) can be classified into different sets by comparing their distances to the image center with these radii

$$R_0 = \{p(x, y) \mid d_{x,y} \le r_0\} \tag{7}$$

$$R_m = \{p(x, y) \mid r_{m-1} \le d_{x,y} \le r_m\} \quad (m = 1, 2, \cdots, n - 1) \tag{8}$$

where $d_{x,y}$ is the Euclidean distance from $p(x, y)$ to the image center $(x_c, y_c)$ which is defined as:

$$d_{x,y} = \sqrt{(x - x_c)^2 + (y - y_c)^2} \tag{9}$$

where $x_c = y_c = F/2 + 0.5$ if $F$ is an even number. Otherwise, $x_c = y_c = (F + 1)/2$.
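Eqs. (3) to (9) translate directly into code. The sketch below is illustrative only: the function names are hypothetical, the image center follows the formula in the text as printed, and a pixel that lies exactly on a ring boundary is assigned to the inner ring, which is a tie-break chosen here because Eq. (8) uses a non-strict inequality on both sides.

```python
import math


def ring_radii(size, n):
    """Radii r_0 .. r_{n-1} of n equal-area rings, following Eqs. (3)-(6)."""
    r_last = size // 2                    # r_{n-1} = floor(F / 2)
    area = math.pi * r_last ** 2          # Eq. (6): area of the inscribed circle
    mean_area = math.floor(area / n)      # Eq. (5): average area of each ring
    radii, squared = [], 0.0
    for _ in range(n - 1):
        # Eq. (3) squared: r_m^2 = mean_area / pi + r_{m-1}^2, with Eq. (4) for m = 0.
        squared += mean_area / math.pi
        radii.append(math.sqrt(squared))
    radii.append(float(r_last))           # the outermost radius is fixed
    return radii


def ring_index(x, y, size, radii):
    """Ring of pixel (x, y), or None if it lies outside the inscribed circle."""
    center = size / 2 + 0.5 if size % 2 == 0 else (size + 1) / 2
    distance = math.hypot(x - center, y - center)  # Eq. (9)
    for m, radius in enumerate(radii):
        if distance <= radius:  # boundary pixels go to the inner ring
            return m
    return None


radii = ring_radii(64, 4)
print([round(r, 3) for r in radii])
print(ring_index(32, 32, 64, radii), ring_index(0, 0, 64, radii))
```

Pixels in the corners of the texture fall outside the inscribed circle and are therefore not assigned to any ring, which is what makes the partition robust to rotation.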

3) Feature Extraction: Four statistics are chosen to efficiently capture the visual content of each ring $R_m$: mean ($\mu_m$), variance ($\delta_m$), skewness ($s_m$), and kurtosis ($w_m$), which are defined as follows:

$$\mu_m = \frac{1}{N_m} \sum_{i=0}^{N_m - 1} R_m(i) \tag{10}$$

$$\delta_m = \frac{1}{N_m - 1} \sum_{i=0}^{N_m - 1} (R_m(i) - \mu_m)^2 \tag{11}$$

$$s_m = \frac{\frac{1}{N_m} \sum_{i=0}^{N_m - 1} (R_m(i) - \mu_m)^3}{\left( \sqrt{\frac{1}{N_m} \sum_{i=0}^{N_m - 1} (R_m(i) - \mu_m)^2} \right)^3} \tag{12}$$

$$w_m = \frac{\frac{1}{N_m} \sum_{i=0}^{N_m - 1} (R_m(i) - \mu_m)^4}{\left( \sqrt{\frac{1}{N_m} \sum_{i=0}^{N_m - 1} (R_m(i) - \mu_m)^2} \right)^2} \tag{13}$$

where $N_m = \mathrm{card}(R_m)$ is the total number of elements in $R_m$, and $R_m(i)$ is the $i$-th element of $R_m$ ($0 \le i \le N_m - 1$). These statistics of each ring are exploited to form the geometry feature vector $v_i^g$, which contains $(4 \times n)$ elements.

$$v_i^g = [\mu_0, \delta_0, s_0, w_0, \cdots, \mu_{n-1}, \delta_{n-1}, s_{n-1}, w_{n-1}] \tag{14}$$
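A possible way to assemble the geometry feature vector of Eq. (14) is sketched below. Two points are assumptions rather than statements from the paper: kurtosis is computed in the conventional way, as the fourth central moment divided by the squared second central moment, which is an interpretation of the intent of Eq. (13); and skewness and kurtosis are set to zero for a ring whose pixels are all equal, a case the text does not discuss but which is common in binary textures. The function names are hypothetical.

```python
import math


def ring_statistics(values):
    """Mean, variance, skewness and kurtosis of the pixel values of one ring."""
    n = len(values)
    mean = sum(values) / n                                   # Eq. (10)
    centered = [v - mean for v in values]
    variance = sum(c ** 2 for c in centered) / (n - 1) if n > 1 else 0.0  # Eq. (11)
    m2 = sum(c ** 2 for c in centered) / n
    if m2 == 0:
        # Constant ring: skewness and kurtosis are undefined; use 0 (assumption).
        return [mean, variance, 0.0, 0.0]
    m3 = sum(c ** 3 for c in centered) / n
    m4 = sum(c ** 4 for c in centered) / n
    skewness = m3 / math.sqrt(m2) ** 3                       # Eq. (12)
    kurtosis = m4 / m2 ** 2          # conventional kurtosis (assumed intent of Eq. (13))
    return [mean, variance, skewness, kurtosis]


def geometry_feature(rings):
    """Concatenate the four statistics of every ring, as in Eq. (14)."""
    feature = []
    for values in rings:
        feature.extend(ring_statistics(values))
    return feature


rings = [[0, 0, 1, 1], [1, 1, 1]]  # pixel values of two hypothetical rings
print(geometry_feature(rings))
```

With $n$ rings the returned list has $4 \times n$ entries, matching the length stated for $v_i^g$.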

B. Topology Feature Extraction

For each object $o_i$, a fixed-dimensional topology feature vector $v_i^t$ is formed according to its topological relation

$$v_i^t = [n_{max}, v_0^g, v_1^g, \cdots, v_{n_{max}-1}^g] \tag{15}$$

where $v_j^g$ ($0 \le j \le n_{max} - 1$) is the geometry feature vector of the $j$-th joint object of $o_i$ and $n_{max}$ is the maximum number of joint objects. If the object $o_i$ has fewer than $n_{max}$ joint objects, the elements of the missing geometry feature vectors are set to zero. The number $n_{max}$ and geometry feature vectors of all joint objects together form a topology feature vector $v_i^t$, which contains $(1 + n_{max} \times 4 \times n)$ elements.
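The zero padding in Eq. (15) can be sketched as follows. The function name is hypothetical, and the joint objects are taken in the order supplied by the caller, since the text does not specify how they are ordered.

```python
def topology_feature(joint_features, n_max, feature_length):
    """Build v_i^t = [n_max, v_0^g, ..., v_{n_max-1}^g] with zero padding."""
    if len(joint_features) > n_max:
        raise ValueError("an object cannot have more than n_max joint objects")
    vector = [float(n_max)]
    for j in range(n_max):
        if j < len(joint_features):
            vector.extend(joint_features[j])       # geometry feature of the j-th joint object
        else:
            vector.extend([0.0] * feature_length)  # missing joint object: zeros
    return vector


# One ring (n = 1), so each geometry feature vector has 4 * n = 4 elements.
joints = [[0.5, 0.3, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0]]
v_t = topology_feature(joints, n_max=3, feature_length=4)
print(len(v_t), v_t)
```

The length is $1 + n_{max} \times 4 \times n = 13$ here, regardless of how many joint objects the object actually has.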

VI. TOPOLOGY AND GEOMETRY FEATURE FUSION

A. Objects Clustering

To facilitate tampering localization and ensure that the generated hash has a fixed length and the same computational complexity, for a given 2D engineering CAD graphic $G$, all objects are clustered into $k$ groups $\{G_j, 0 \le j \le k - 1\}$ according to their geometry features using the vector quantization technique [23]. Thus, objects with similar shape are clustered into the same group.

Many geometric objects of 2D engineering graphics are collected and selected to train a codebook $Y$ through LBG [24]. The codebook size is predefined as $k$, and the influence of the codebook size is analyzed further in Section VIII-C. With VQ, for the geometry feature vector $v_i^g$ of each object $o_i$, we find the best-matching codeword $Y_j$ and its index $j$. Then, we assign object $o_i$ to the $j$-th group $G_j$.
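The assignment of objects to groups amounts to a nearest-codeword lookup. The following sketch assumes a codebook that has already been trained; the function names and the toy two-dimensional features are hypothetical.

```python
def best_codeword(feature, codebook):
    """Index j of the codeword Y_j closest to the geometry feature vector."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)),
               key=lambda j: squared_distance(codebook[j], feature))


def cluster_objects(features, codebook):
    """Return k groups; group j lists the indices of the objects assigned to it."""
    groups = [[] for _ in codebook]
    for i, feature in enumerate(features):
        groups[best_codeword(feature, codebook)].append(i)
    return groups


codebook = [[0.0, 0.0], [10.0, 10.0]]            # k = 2 toy codewords
features = [[1.0, 0.0], [9.0, 11.0], [0.0, 2.0]]  # geometry features of o_0, o_1, o_2
print(cluster_objects(features, codebook))
```

The number of groups always equals the codebook size $k$, while the number of objects per group varies and may be zero.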

B. Covariance Matrix for Fusing of Geometryand Topology Features

For each group G j with n j objects, a covariance matrixis built for fusing of topology and geometry features of allobjects in G j .

First, a feature selection function �(G j ) is defined for agiven group G j :

�(G j ) = {vi , ∀oi s.t. oi ∈ G j , 0 ≤ i ≤ n j − 1]} (16)

where $v_i$ is the feature vector that encodes the topology and geometry properties of each object $o_i$, which is defined as:

$$v_i = [v^g_i, v^t_i] = [v^g_i, n_{max}, v^g_0, v^g_1, \cdots, v^g_{n_{max}-1}] \tag{17}$$

where $v^g_i$ is the geometry feature vector and $v^t_i$ is the topology feature vector. The fixed $d$-dimensional feature vector $v_i$ is computed for each object $o_i$ of $G_j$, where $d = 1 + (n_{max} + 1) \times 4 \times n$.
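As a worked check of the dimension formula, the short sketch below evaluates $d$ for the values $n_{max} = 4$ and $n = 4$ that are selected in Section VIII-C, together with the number of upper-triangular elements of a $d \times d$ symmetric matrix. The function name is an assumption made for illustration.

```python
def fused_dimension(n_max: int, n_rings: int) -> int:
    """d = 1 + (n_max + 1) * 4 * n, i.e. len(v_g) + len(v_t)."""
    geometry_len = 4 * n_rings              # elements of v_g
    topology_len = 1 + n_max * geometry_len  # elements of v_t
    return geometry_len + topology_len


d = fused_dimension(n_max=4, n_rings=4)
upper_triangle_len = d * (d + 1) // 2  # distinct entries of a symmetric d x d matrix
```

With these settings $d = 81$, so a symmetric $81 \times 81$ matrix has $81 \times 82 / 2 = 3321$ distinct elements.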

Then, a $d \times d$-dimensional Symmetric Positive Definite (SPD) covariance matrix $M_{G_j}$ is defined to represent group $G_j$:

$$M_{G_j} = \frac{1}{n_j - 1} \sum_{i=0}^{n_j - 1} (v_i - \mu)(v_i - \mu)^T = \begin{bmatrix} m(1,1) & m(1,2) & \cdots & m(1,d) \\ \vdots & \vdots & \ddots & \vdots \\ m(d,1) & m(d,2) & \cdots & m(d,d) \end{bmatrix} \tag{18}$$

where $\mu$ is the mean of the set of feature vectors $\{v_i\}$ of group $G_j$. The diagonal elements of the covariance matrix $M_{G_j}$ represent the variances of the features, while its off-diagonal elements represent their pairwise correlations. The matrix has a fixed dimension that is independent of the size of the group. Furthermore, $M_{G_j}$ can be computed for any types of features that are encoded in $v_i$ without normalization or joint probability estimation. Therefore, covariance matrices provide an elegant mechanism for fusing heterogeneous features of arbitrary dimension and scale [18].
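Equation (18) can be sketched in plain Python as follows. The helper name `covariance_matrix` and the toy vectors are assumptions for illustration. A group with a single object leaves the factor $1/(n_j - 1)$ undefined, and the sketch simply rejects that case because the text does not say how it is handled.

```python
from typing import List, Sequence


def covariance_matrix(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance of the feature vectors of one group (equation (18))."""
    n = len(vectors)
    if n < 2:
        raise ValueError("need at least two feature vectors")
    d = len(vectors[0])
    mean = [sum(v[c] for v in vectors) / n for c in range(d)]
    centered = [[v[c] - mean[c] for c in range(d)] for v in vectors]
    # m(a, b) = 1/(n-1) * sum_i (v_i[a] - mu[a]) * (v_i[b] - mu[b])
    return [[sum(row[a] * row[b] for row in centered) / (n - 1)
             for b in range(d)] for a in range(d)]


# Toy group (assumed values): three objects with 2-D feature vectors.
toy_vectors = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
M = covariance_matrix(toy_vectors)  # [[4.0, 8.0], [8.0, 16.0]]
```

The result is $2 \times 2$ regardless of how many objects the group contains. This toy matrix is singular because the second feature is exactly twice the first; in general a sample covariance matrix is strictly positive definite only when the group has more objects than feature dimensions and the features are not linearly dependent.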


VII. JOINT TOPOLOGY AND GEOMETRY HASHING

A. Hash Generation

For each group $G_j$, we zigzag the upper-triangular elements of $M_{G_j}$, which is a symmetric positive definite (SPD) matrix, to obtain the following vector:

$$v^m_j = [m(1,1), \cdots, m(1,d), m(2,2), \cdots, m(2,d), \cdots, m(d,d)] \tag{19}$$
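The element ordering of equation (19) corresponds to reading the upper triangle row by row. A minimal sketch, with an assumed helper name and a toy $3 \times 3$ symmetric matrix:

```python
from typing import List, Sequence


def upper_triangle(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Flatten m(r, c) for c >= r, row by row, as in equation (19)."""
    d = len(matrix)
    return [matrix[r][c] for r in range(d) for c in range(r, d)]


toy_matrix = [[4.0, 8.0, 1.0],
              [8.0, 16.0, 2.0],
              [1.0, 2.0, 3.0]]
vm = upper_triangle(toy_matrix)  # [4.0, 8.0, 1.0, 16.0, 2.0, 3.0]
```

The flattened vector has $d(d+1)/2$ elements, here $3 \times 4 / 2 = 6$.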

1) Compression and Projection: A Gaussian random matrix $M^g$ is generated and employed to reduce the dimensionality of the vector $v^m_j$. The compressed vector $v^{mc}_j$ is obtained through the projection in equation (20):

$$v^{mc}_j = M^g \cdot (v^m_j)^T \tag{20}$$

where $M^g \in \mathbb{R}^{s \times (d(d+1)/2)}$, $s$ is the integer obtained by rounding $d(d+1)/2 \times p$, and $p$ is the projection rate, which is selected experimentally. Each entry $M^g_{u,v}$ is an independent and identically distributed random variable drawn from a Gaussian probability density function with mean 0 and variance $1/s$ [27]:

$$M^g_{u,v} \sim N\left(0, \frac{1}{s}\right) \tag{21}$$

Finally, a compressed $s$-dimensional vector $v^{mc}_j$ is generated.
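A minimal sketch of the projection in equations (20) and (21) is given below. It assumes that the Gaussian matrix is regenerated from an integer seed acting as the secret key; the paper only states that the matrix itself can be kept as a key, so the seeding convention and the function names are illustrative assumptions.

```python
import random
from typing import List, Sequence


def gaussian_matrix(s: int, length: int, key: int) -> List[List[float]]:
    """s x length matrix with i.i.d. N(0, 1/s) entries, seeded by `key`."""
    rng = random.Random(key)   # assumption: an integer seed stands in for the key
    sigma = (1.0 / s) ** 0.5   # variance 1/s -> standard deviation sqrt(1/s)
    return [[rng.gauss(0.0, sigma) for _ in range(length)] for _ in range(s)]


def project(mg: Sequence[Sequence[float]], vm: Sequence[float]) -> List[float]:
    """v_mc = M_g . v_m^T (equation (20))."""
    return [sum(w * x for w, x in zip(row, vm)) for row in mg]


# Toy usage: compress a 6-element vector to s = 3 values.
vm_example = [4.0, 8.0, 1.0, 16.0, 2.0, 3.0]
mg_example = gaussian_matrix(3, len(vm_example), key=2018)
vmc_example = project(mg_example, vm_example)
```

Regenerating the matrix with the same seed reproduces the same compressed vector, whereas a different seed yields a different projection.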

2) Encryption and Randomization: To increase the security of the proposed hashing algorithm, a deterministic chaotic map is employed to generate a chaotic sequence, which is extremely sensitive to initial conditions [7]. The function that is used in this paper is the logistic difference equation:

$$y_{n+1} = a y_n (1 - y_n) \tag{22}$$

where $a$ is the function seed and $y_n$ is the current value of the mapping, which is between 0 and 1, with an initial value $y_0$. The sequence that is obtained by iterating from the initial value is chaotic when $3.5699456 < a \le 4$. A mapping value $y^I_0$ is generated if the logistic function is seeded with a function seed $a_0$ and an initial value $y_0$ and iterated $I$ times. Let $y = (y^I_0, y^{I+1}_0, \cdots, y^{I+s-1}_0)$ be the chaotic sequence generated from the same initial conditions with increasing numbers of iterations. The compressed vector $v^{mc}_j$ is then randomized element-wise, and the result is again denoted by $v^{mc}_j$:

$$v^{mc}_j = (v^{mc}_{j,0}, v^{mc}_{j,1}, \cdots, v^{mc}_{j,s-1}) = (v^{mc}_{j,0} \times y^I_0, \ v^{mc}_{j,1} \times y^{I+1}_0, \ \cdots, \ v^{mc}_{j,s-1} \times y^{I+s-1}_0) \tag{23}$$

Then, an intermediate binary hash $h_j$ for each group $G_j$ is generated through thresholding:

$$h_j = [h(0), \cdots, h(s-1)] \tag{24}$$

where

$$h(i) = \begin{cases} 1, & v^{mc}_{j,i} > T_j \\ 0, & v^{mc}_{j,i} \le T_j \end{cases}, \quad 0 \le i \le s - 1 \tag{25}$$

$$T_j = \frac{1}{s} \sum_{i=0}^{s-1} v^{mc}_{j,i} \tag{26}$$
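The randomization and thresholding of equations (22) to (26) can be sketched as follows. The function names are assumptions; the seed $a = 4$, the initial value $y_0 = 0.20160614$ and the 2000 warm-up iterations are the settings reported in Section VIII-C, while the compressed vector is a made-up toy input.

```python
from typing import List, Sequence


def logistic_sequence(a: float, y0: float, skip: int, length: int) -> List[float]:
    """Return (y^I, y^{I+1}, ..., y^{I+length-1}) with I = skip (equation (22))."""
    y = y0
    for _ in range(skip):          # discard the first I iterations
        y = a * y * (1.0 - y)
    seq: List[float] = []
    for _ in range(length):
        seq.append(y)
        y = a * y * (1.0 - y)
    return seq


def intermediate_hash(vmc: Sequence[float], a: float, y0: float,
                      skip: int) -> List[int]:
    """Randomize v_mc with the chaotic sequence, then threshold at the mean."""
    chaos = logistic_sequence(a, y0, skip, len(vmc))
    randomized = [v * y for v, y in zip(vmc, chaos)]        # equation (23)
    threshold = sum(randomized) / len(randomized)           # equation (26)
    return [1 if v > threshold else 0 for v in randomized]  # equation (25)


# Toy compressed vector (assumed values) with the paper's key settings.
example_hash = intermediate_hash([0.3, -1.2, 0.8, 2.1],
                                 a=4.0, y0=0.20160614, skip=2000)
```

Because the threshold $T_j$ is the mean of the randomized vector, each group hash adapts to the scale of its own covariance entries.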

3) Hash Construction: The intermediate hashes $h_j$ of all groups $G_j$ are concatenated to form the final hash sequence $h$:

$$h = [h_0, \cdots, h_{k-1}] \tag{27}$$

The length of the hash $h$ is therefore $k \times s$ bits. To guarantee the uniqueness of the final hash and facilitate the authentication stage, the $k$ groups must be arranged in a fixed order in advance. This can be achieved by sorting the codewords in $Y$ according to their vector component values in sequence, which yields a sorted codebook $Y$. The $k$ groups and their hash codes are then arranged accordingly.
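The ordering and concatenation step can be sketched as below. The sketch interprets "sorting the codewords according to their vector component values in sequence" as lexicographic sorting, which is an assumption, and the toy codebook and group hashes are made-up values.

```python
from typing import Dict, List, Sequence


def codeword_order(codebook: Sequence[Sequence[float]]) -> List[int]:
    """Group indices sorted lexicographically by codeword components (assumed rule)."""
    return sorted(range(len(codebook)), key=lambda j: tuple(codebook[j]))


def concatenate_hashes(group_hashes: Dict[int, List[int]],
                       order: Sequence[int]) -> List[int]:
    """Concatenate the intermediate hashes in the fixed group order."""
    final: List[int] = []
    for j in order:
        final.extend(group_hashes[j])
    return final


toy_codebook = [[2.0, 1.0], [0.0, 5.0], [2.0, 0.0]]
toy_hashes = {0: [1, 1], 1: [0, 0], 2: [1, 0]}
order = codeword_order(toy_codebook)                 # [1, 2, 0]
final_hash = concatenate_hashes(toy_hashes, order)   # [0, 0, 1, 0, 1, 1]
```

Any deterministic ordering shared by the hash generator and the verifier would serve the same purpose; the final length is $k \times s$ bits, here $3 \times 2 = 6$.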

B. Group-Level Tampering Detection and Localization

The proposed hashing scheme is designed to yield group-level tampering detection and localization ability by using a distance metric to compare the hash values of corresponding groups. Regarding malicious geometry and topology modifications, it is sometimes difficult to locate the tampered objects accurately because of the trade-off between the compactness of hash codes and the sensitivity to malicious tampering. If the hash $h$ of a trusted engineering CAD graphic $G$ is available, it is called the reference hash. The hash of a received engineering CAD graphic $G'$ to be tested, namely $h'$, is extracted using the above method. An object group can be considered tampered if it contains maliciously modified objects, and the changes in the objects can be measured via distances between the hash values of the trusted graphic and the tested graphic in the corresponding group. Here, two graphics with the same contents must have identical topology information but need not have identical geometry information, since objects may be modified by topology-preserving operations such as rotation, uniform scaling and translation, as discussed in Section III-A.

TABLE I
NONMALICIOUS OPERATIONS AND PARAMETER VALUES

TABLE II
MALICIOUS OPERATIONS AND PARAMETER VALUES

The graphic authentication process consists of the following steps:

Step 1: Decompose the received reference hash $h$ into $k$ groups $\{h_j\}$ ($j = 0, \ldots, k - 1$) according to the pre-trained codebook $Y$. Each group has $s$ bits.

Step 2: For the received graphic $G'$, extract the geometry feature $v^g_i$ and then the topology feature $v^t_i$ of each object.

Step 3: Cluster all of the objects into $k$ groups according to their geometry features with the given codebook $Y$.

Step 4: Compute the covariance matrix $M_{G'_j}$ for fusing the topology and geometry features of objects in each group $G'_j$.

Step 5: Generate the intermediate hash code $h'_j$ of each group $G'_j$, and then form the final hash sequence $h'$.

Step 6: To measure the similarity between group $G_j$ and group $G'_j$, the normalized Hamming distance $d_{group}$ is exploited as a metric:

$$d_{group}(j) = \frac{1}{s} \sum_{m=0}^{s-1} |h'_j(m) - h_j(m)|^2 \tag{28}$$

where $h'_j(m)$ and $h_j(m)$ are the $m$-th elements of $h'_j$ and $h_j$, respectively, and $0 \le j \le k - 1$. Thus, the normalized Hamming distance $D_{graphic}$ for graphic similarity measurement is defined as:

$$D_{graphic} = \max(d_{group}(0), d_{group}(1), \cdots, d_{group}(k-1)) \tag{29}$$

Step 7: $G_j$ and $G'_j$ are said to be functionally identical if $d_{group}(j) < T$, where $T$ is a threshold. Otherwise, the group $G'_j$ is a tampered version of $G_j$ or is different from $G_j$. Furthermore, $G$ and $G'$ should be considered functionally identical if $D_{graphic} < T$. Otherwise, they are different graphics or one is a tampered version of the other.
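Steps 6 and 7 can be sketched as follows, under the assumption that the reference and test hashes have already been split into per-group bit lists in the same group order. The function names and the toy hashes are illustrative.

```python
from typing import List, Sequence, Tuple


def group_distance(h_ref: Sequence[int], h_test: Sequence[int]) -> float:
    """Normalized Hamming distance between two group hashes (equation (28))."""
    if len(h_ref) != len(h_test):
        raise ValueError("group hashes must have the same length")
    return sum(abs(a - b) ** 2 for a, b in zip(h_test, h_ref)) / len(h_ref)


def authenticate(ref_groups: Sequence[Sequence[int]],
                 test_groups: Sequence[Sequence[int]],
                 threshold: float) -> Tuple[float, List[int]]:
    """Return D_graphic (equation (29)) and the indices of suspicious groups."""
    distances = [group_distance(r, t) for r, t in zip(ref_groups, test_groups)]
    # A group is functionally identical only if its distance is below T.
    tampered = [j for j, dist in enumerate(distances) if dist >= threshold]
    return max(distances), tampered


# Toy hashes (assumed): k = 2 groups of s = 4 bits each.
reference = [[0, 1, 1, 0], [1, 1, 0, 0]]
received = [[0, 1, 1, 0], [0, 0, 0, 0]]
d_graphic, suspicious = authenticate(reference, received, threshold=0.2)
```

Here the second group differs in 2 of 4 bits, so $D_{graphic} = 0.5$ and group 1 is flagged. Since $D_{graphic}$ is the maximum over groups, a single tampered group is enough to reject the whole graphic.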

VIII. PERFORMANCE ANALYSIS AND EXPERIMENTAL RESULTS

In this section, various experiments are carried out to evaluate the performance of the proposed hashing scheme for 2D engineering CAD graphics in terms of robustness, sensitivity, discriminative capability and security.

A. Graphic Data Sets

Taking the process plant in the AEC industry as an example, 40 different 2D engineering CAD graphics with various numbers of objects (including 10 graphics with approximately 50 objects, 10 graphics with approximately 100 objects, 10 graphics with approximately 300 objects, and 10 graphics with approximately 500 objects) are tested. To train the codebook $Y$, an object database of approximately 106 different kinds of objects, which were collected from many graphics of process plants, is also constructed.

Detailed parameter settings of non-malicious and malicious operations are presented in Tables I and II, respectively. Each test graphic has 12 non-maliciously attacked versions and 5 maliciously attacked versions with different tampering ratios. Therefore, 40 × 12 = 480 pairs of identical graphics are used for robustness validation, 40 × 5 = 200 pairs of similar graphics are used for sensitivity validation, and 40 × (40 − 1)/2 = 780 pairs of different graphics are used for discrimination testing.

B. Performance Criteria

To discuss the performance in detail, the true positive rate (TPR) $P_{TPR}$ and false positive rate (FPR) $P_{FPR}$ are first defined:

$$P_{TPR} = \frac{N_{similar}}{N_{identical}} \tag{30}$$

$$P_{FPR} = \frac{N_{distinct}}{N_{different}} \tag{31}$$

where $N_{similar}$ is the number of pairs of functionally identical graphics that are correctly identified as functionally identical graphics, $N_{identical}$ is the total number of pairs of functionally identical graphics, $N_{distinct}$ is the number of pairs of distinct graphics that are mistakenly classified as the same graphics, and $N_{different}$ is the total number of pairs of different graphics. Then, a new term, namely the "detection rate" $D_r$, is defined to describe the probability of correct detection:

$$D_r = \frac{N_{correct}}{N_{total}} \times 100\% \tag{32}$$

where $N_{correct}$ is the number of correct detections and $N_{total}$ is the total number of tested graphics.
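The three criteria in equations (30) to (32) can be computed from lists of graphic-level distances as in the sketch below. The function names and the toy distances are assumptions for illustration; a pair is treated as "functionally identical" when its distance is below the threshold, as in Step 7.

```python
from typing import Sequence


def true_positive_rate(identical_distances: Sequence[float],
                       threshold: float) -> float:
    """P_TPR: share of identical pairs whose distance is below the threshold."""
    n_similar = sum(1 for d in identical_distances if d < threshold)
    return n_similar / len(identical_distances)


def false_positive_rate(different_distances: Sequence[float],
                        threshold: float) -> float:
    """P_FPR: share of different pairs wrongly accepted as the same graphic."""
    n_distinct = sum(1 for d in different_distances if d < threshold)
    return n_distinct / len(different_distances)


def detection_rate(n_correct: int, n_total: int) -> float:
    """D_r in percent (equation (32))."""
    return 100.0 * n_correct / n_total


# Toy distances (made up for illustration only).
tpr = true_positive_rate([0.05, 0.10, 0.25, 0.00], threshold=0.2)       # 0.75
fpr = false_positive_rate([0.50, 0.48, 0.15, 0.60, 0.51], threshold=0.2)  # 0.2
dr = detection_rate(n_correct=9, n_total=12)                             # 75.0
```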

C. Parameter Setting

To achieve satisfactory performance, the parameters that are used in the proposed hashing scheme are estimated via experiments. The parameters that are used in the experiments are as follows. The rendered binary shape texture size is 100 × 100 ($F = 100$). A small $F$ leads to loss of fine details, while a large $F$ results in high computational complexity. We choose $F = 100$ as an appropriate trade-off. The number $n_{max}$ in equation (15) is determined in accordance with the specific application area. For example, in our case, $n_{max}$ is set to 4 since the maximum number of joint objects generally will not exceed 4 in the process industry. The logistic function in equation (22) is seeded with the values $a = 4$ and $y_0 = 0.20160614$ for 2000 iterations. The number of groups $k$ is equal to the size of the codebook $Y$. The normalized Hamming distance in equation (28) is used to measure the hash distances between corresponding groups. Since the proposed method achieves satisfactory performance for all tested operations when $T \ge 0.2$, we set $T = 0.2$.

1) Group Number & Codebook Size k: To facilitate tamper detection and localization, objects are clustered into $k$ groups according to the codebook $Y$ with $k$ codewords in the preprocessing step for each graphic. The proposed scheme is designed to yield group-level tamper detection and localization capabilities. A large $k$ results in fewer objects in each group, thereby giving rise to high tamper detection and localization capabilities. However, the total hash length, which depends on $k$ and $s$, also increases with $k$. Thus, there is a trade-off between total hash length and tamper localization capability. There is also a trade-off among total hash length, sensitivity, and discriminative capability. Generally, compact hash codes include less graphical information, which contributes to stronger robustness but weaker discriminative capability and sensitivity. In contrast, longer hash values include abundant graphical information and hence contribute to ideal tampering localization functionality; their discriminative capability and sensitivity are stronger, and their robustness is weaker. In this paper, from the practical application point of view, we set $k$ to 25 as an appropriate trade-off among tamper localization capability, discriminative capability, sensitivity, and total hash length.

Fig. 5. Euclidean distances between each pair of feature vectors of 106 different kinds of objects.

2) Ring Number n: Geometry features that are resilient to object translation, scaling, and especially rotation are extracted by dividing the rendered normalized binary texture into $n$ rings. Theoretically, a large $n$ yields better object discrimination performance. However, it also leads to a larger geometry feature vector and higher computational complexity. To select a proper value of $n$ for ring partition, experiments are conducted on the constructed object database with 106 different kinds of objects. The geometry feature vector $v^g_i$ in equation (14) is first formed for each object for three different values of $n$ ($n = 2, 4, 6$). To measure distances between feature vectors, typical metrics such as Euclidean distance and cosine distance may be employed. In this paper, the Euclidean distance is adopted because it reflects the absolute differences among individual numerical features, whereas cosine similarity is generally used when the magnitudes of the vectors do not matter. Thus, for each value of $n$, the Euclidean distance between each pair of feature vectors is calculated, and 106 × (106 − 1)/2 = 5565 results are finally obtained. The distribution of these Euclidean distances is illustrated in Fig. 5, where the x-axis is the index of the different-object pair and the y-axis is the value of the Euclidean distance. Statistics of the Euclidean distances under different ring numbers are also computed and given in Table III. It is observed that the proposed scheme achieves better object discrimination power when $n \ge 4$. Therefore, we choose $n = 4$ as an appropriate trade-off between discrimination performance and computational complexity.
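The pairwise-distance experiment can be sketched as follows. The pair count for 106 objects is taken from the text, whereas the three toy vectors and the helper name are assumptions; the real experiment uses the geometry feature vectors of the object database.

```python
import itertools
import math
import statistics
from typing import List, Sequence


def pairwise_distances(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Euclidean distance between every unordered pair of feature vectors."""
    return [math.dist(a, b) for a, b in itertools.combinations(vectors, 2)]


pair_count = math.comb(106, 2)  # 106 * 105 / 2 = 5565 pairs

# Toy feature vectors (assumed values).
toy_vectors = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
distances = pairwise_distances(toy_vectors)
summary = {
    "min": min(distances),
    "max": max(distances),
    "mean": statistics.mean(distances),
}
```

Summary statistics of this kind (minimum, maximum, mean) are the sort of values that a table of distance statistics would report for each ring number.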

TABLE III
STATISTICS OF EUCLIDEAN DISTANCES BASED ON 106 DIFFERENT OBJECTS

3) Projection Rate p: An $s \times (d(d+1)/2)$ Gaussian random matrix $M^g$ is employed in equation (20) to reduce the dimensionality of the vector $v^m_j$, which is derived from the covariance matrix $M_{G_j}$, where $s$ is the integer obtained by rounding $d(d+1)/2 \times p$ and $d = 1 + (n_{max} + 1) \times 4 \times n$. Therefore, the projection rate $p$ determines the dimension of the compressed vector $v^{mc}_j$ and the hash length. The receiver operating characteristic (ROC) graph, in which the x-axis is $P_{FPR}$ and the y-axis is $P_{TPR}$, is employed to make visual classification comparisons with respect to robustness and discrimination under different projection rates with $k = 25$. To comprehensively describe the effect of the projection rate on the hash performance under different thresholds, for each projection rate $p$, different thresholds $T$ are used to find $P_{TPR}$ and $P_{FPR}$. The ROC curve is finally formed by a set of points with coordinates ($P_{FPR}$, $P_{TPR}$). By their definitions in equations (30) and (31), $P_{TPR}$ and $P_{FPR}$ are indicators of robustness and discriminative capability, respectively. For two ROC curves, the curve that is closer to the top-left corner has better classification performance than the curve that is farther from it.

TABLE IV
DIFFERENT PROJECTION RATES AND AUTHENTICATION THRESHOLDS FOR ROC CURVES

In the experiment, the 40 test engineering CAD graphics that are described in Section VIII-A are used for testing: 40 × 12 = 480 pairs of identical graphics for robustness validation and 40 × (40 − 1)/2 = 780 pairs of different graphics for the discrimination test. Table IV presents the values of the projection rate $p$ and threshold $T$ that are used to calculate the ROC curves. For each pair of graphics, the hash code $h'_j$ of each group $G'_j$ of test graphic $G'$ is first extracted. Then, the group distance $d_{group}(j)$ between $G'_j$ and $G_j$ is calculated. Finally, the graphic distance $D_{graphic}$ between $G'$ and its trusted graphic $G$ is generated. Fig. 6 illustrates the ROC curve comparisons among different projection rates. It is observed that all ROC curves are very close to the top-left corner. This means that the proposed hashing scheme has satisfactory classification performance with respect to robustness and discrimination. Moreover, the ROC curve for $p = 0.08$ is slightly closer to the top-left corner than those for other $p$ values. Therefore, a moderate projection rate, e.g., $p = 0.08$, is a good choice for obtaining a desirable trade-off between robustness and discrimination.
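The construction of one ROC curve can be sketched as a threshold sweep over two lists of graphic-level distances. The distances and thresholds below are made-up toy values, not experimental data, and the function name is an assumption.

```python
from typing import List, Sequence, Tuple


def roc_points(identical: Sequence[float], different: Sequence[float],
               thresholds: Sequence[float]) -> List[Tuple[float, float]]:
    """One (P_FPR, P_TPR) point per threshold T."""
    points: List[Tuple[float, float]] = []
    for t in thresholds:
        tpr = sum(d < t for d in identical) / len(identical)
        fpr = sum(d < t for d in different) / len(different)
        points.append((fpr, tpr))
    return points


# Toy D_graphic values for identical and different graphic pairs (assumed).
toy_identical = [0.02, 0.05, 0.12, 0.22]
toy_different = [0.18, 0.47, 0.50, 0.52]
curve = roc_points(toy_identical, toy_different, thresholds=[0.1, 0.2, 0.3])
# curve == [(0.0, 0.5), (0.25, 0.75), (0.25, 1.0)]
```

Repeating the sweep for each projection rate $p$ gives one curve per rate; the curve whose points lie closest to the top-left corner, that is, low $P_{FPR}$ with high $P_{TPR}$, is preferred.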

4) Authentication Threshold T: The threshold $T$ is utilized to measure the similarity between group and graphic pairs. The smaller the value of $T$ is, the better the discriminative capability. However, robustness performance is degraded as $T$ decreases. Therefore, the threshold $T$ should be chosen according to the specific application to obtain a satisfactory balance between discrimination and robustness.

Fig. 6. ROC curve comparisons among different projection rates.

To determine the threshold $T$ for differentiating two groups, 40 × 12 = 480 pairs of identical graphics and 40 × 5 = 200 pairs of similar graphics are used. For each pair of graphics, the hash sequence $h'$ of each test graphic $G'$ is first extracted. Then, the graphic distance $D_{graphic}$ between $G'$ and its trusted graphic $G$ is calculated. Figs. 7(a) and 7(b) show the normalized Hamming distance distributions for hashes of identical graphics and similar graphics, respectively. Table V lists the statistics of the normalized Hamming distances. It can be observed that the mean distance of identical graphic pairs is only 0.045 and all maximum distances are less than 0.2, except for those of some rotated graphics. Likewise, the mean distance of similar graphic pairs is 0.318 and all minimum distances are larger than 0.2, except for those of some tampered complex graphics with more than 300 objects. Furthermore, the tampering rates of those graphics range from approximately 1% to 5%. Fig. 8 shows the detection rates of identical and similar graphics under different values of threshold $T$. When $T = 0.2$, the proposed scheme achieves a good balance between discrimination and robustness. In this case, 94.58% of identical graphics (including some rotated versions) and 88.50% of similar graphics with low tampering rates can be correctly detected. This is why we set the authentication threshold in subsequent experiments to $T = 0.2$.

TABLE V
STATISTICS OF NORMALIZED HAMMING DISTANCE BASED ON GRAPHIC PAIRS

D. Robustness Analysis

The proposed hashing scheme is designed to be robust to non-malicious operations, including global and local RST transformations. Under the premise of preserving the topological relationships among objects, these manipulations are performed on graphic objects to obtain a better view or achieve a satisfactory appearance or fit, as discussed in Section III-A. Therefore, these operations only affect the geometric shapes and positions of objects.

Fig. 7. Distribution of normalized Hamming distances of identical and similar graphic pairs. (a) Identical graphic pairs. (b) Similar graphic pairs.

Fig. 8. Detection rates of identical graphics and similar graphics under different thresholds.

The test graphics that are used in Section VIII-C.4 are selected for this experiment, and all of the non-malicious operations that are listed in Table I are exploited to attack these graphics. Therefore, each test graphic has 12 functionally consistent graphics and the total number of pairs of identical graphics is 40 × 12 = 480. Hash values of the original and the attacked graphics are calculated. Then, the normalized Hamming distance is employed to evaluate their distances. Fig. 7(a) shows the distributions of the calculated normalized Hamming distances. Table VI lists the detection rates under various non-malicious operations. When $T = 0.2$, 94.58% of identical graphics can be correctly detected. Moreover, the mean distance of identical graphic pairs is only 0.045, and all maximum distances are less than 0.2, except those of some rotated graphics. This means that our hashing scheme achieves satisfactory robustness performance when $T = 0.2$.

TABLE VI
DETECTION RATES UNDER VARIOUS NONMALICIOUS OPERATIONS LISTED IN TABLE I (%)

TABLE VII
DETECTION RATE UNDER VARIOUS MALICIOUS OPERATIONS LISTED IN TABLE II (%)

E. Sensitivity Analysis

The proposed scheme must be highly sensitive to malicious operations, including inserting objects, deleting objects, and changing topological relations logically. In terms of object addition, the added objects should be connected with existing objects. This kind of attack changes the topology of the modified objects. In the case of object removal, the target objects are first disconnected from their joint objects and then deleted from the graphic. Thus, the topological relation of the involved objects is modified. Modifying local topological relations of objects involves various operations, such as disconnecting two joint objects logically and connecting two disconnected objects logically. Thus, all of the above operations inevitably alter the geometry or topology information of the manipulated objects. Moreover, they lead to the modification of the covariance matrix of the corresponding group and, finally, the generated hash codes.

To further validate the sensitivity of the proposed scheme, the malicious operations that are listed in Table II are used to conduct attacks on each original graphic. Thus, each test graphic has 5 maliciously attacked versions and 40 × 5 = 200 pairs of similar graphics are used in total. Finally, 200 normalized Hamming distances are calculated, as shown in Fig. 7(b). Table VII lists the detection rates under various malicious operations. Almost all distances are greater than 0.2, except for those that correspond to some graphics where the tampering ratios are less than 5%. Moreover, only 11.50% of similar graphics, all with low tampering rates, are misclassified. This confirms that the proposed method is sensitive to malicious operations.

F. Visual Effect of Tampering Localization

For a content authentication scheme, the tampering localization functionality is of crucial importance. This functionality refers to the capability of identifying tampered graphic objects. The proposed hashing scheme is designed to achieve group-level tampering localization capability, which can be improved by increasing the group number $k$, as discussed in Section VIII-C.1. A graphic is selected for demonstrating the functionality via the visual effect. Due to space limitations, Fig. 9(a) shows only part of the test graphic. The three malicious operations that are discussed above are used to alter graphic objects. The proposed hashing scheme is applied to the test graphic, and it is observed that all normalized Hamming distances are greater than 0.2. All of the object groups to which the tampered objects belong are correctly identified. Then, objects in the identified groups are marked as suspicious objects. Figs. 9(b), (c) and (d) show the visual detection results of the proposed method for different attack types. The detected results are highlighted by red-colored squares. For example, in Fig. 9(b), a new object A of the same type as B1 is added and connected with B1 and C1. Thus, the geometry information of the graphic and the topological relations of B1 and C1 are modified. The groups to which the attacked objects belong are correctly detected by the proposed scheme. In addition, all of the objects (including A, B1, B2, B3, C1, and C2) in the detected groups are labeled and highlighted by red-colored squares.

G. Discriminative Capability Analysis

Discriminative capability means that two different graphics have a very low probability of generating similar hashes. If the normalized Hamming distance between two different graphics is less than the threshold $T$, a collision occurs.

To evaluate the discriminative capability of the proposed scheme, 40 × (40 − 1)/2 = 780 pairs of different graphics are employed. The proposed hashing scheme is used to extract hashes of 40 different graphics. Then, the normalized Hamming distance $D_{graphic}$ between each pair of different graphics is calculated, and 40 × (40 − 1)/2 = 780 results are finally obtained. The statistics of the normalized Hamming distances of different graphic pairs are listed in Table V. Fig. 10 gives the normalized Hamming distance distributions for hashes of different graphics. According to the results, the minimum and mean distances are 0.472 and 0.508, respectively. Clearly, all distances are much larger than the above-mentioned threshold $T = 0.2$, which indicates that the proposed hashing scheme achieves good discrimination.

H. Security Analysis

Security depends on the unpredictability of hash codes. This implies that it should be very difficult to decode a hash without knowledge of the key. The security of the proposed hashing scheme can be guaranteed by applying key-dependent encryption in the process of feature vector compression and randomization. The Gaussian random matrix $M^g$, which is employed to reduce the dimensionality of the feature vectors, can be kept as a security key. The function seed $a$ and the initial value $y_0$ in the logistic mapping equation (22), which is utilized to encrypt the compressed feature vectors, can also serve as security keys.


Fig. 9. Tampered graphics and localized objects. (a) Part of the test graphic. (b) An object A is added and connected with B1 and C1. (c) An object D is deleted from the graphic. (d) Topology relation among A1, B1, and A2 is modified logically.

Fig. 10. Distribution of normalized Hamming distances of different graphic pairs.

To validate the security performance of the proposed hashing scheme, 40 test graphics are adopted in our experiments. Different keys are exploited to extract hashes, and distances between these key-based hashes are calculated. Only the secret keys are varied; other parameters remain unchanged. For each graphic, first, a Gaussian random matrix $M^g$, a function seed $a$ and an initial value $y_0$ are used to extract the graphic hash. Then, 99 sets of different keys, including Gaussian random matrices, function seeds, and initial values, are employed to generate 99 different graphic hashes. Finally, the normalized Hamming distances between the first hash and the other 99 hashes are computed. Therefore, 40 × 99 = 3960 normalized Hamming distances are obtained for the 40 test graphics in total. Fig. 11 shows the obtained results for all test graphics, where the x-axis is the index of the normalized Hamming distance between two hash codes and the y-axis is the normalized Hamming distance. It is observed that the minimum distance is much larger than $T = 0.2$. These results empirically verify that our graphic hashing scheme is key-dependent and meets the security requirements.
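The structure of this key-dependence experiment can be imitated with a simplified toy hash. The sketch below keeps only the keyed Gaussian projection and the mean thresholding, omits the logistic-map randomization, and uses an integer seed as the key; the feature vector is made up, so the resulting distances say nothing about the reported results.

```python
import random
from typing import List, Sequence


def keyed_hash(features: Sequence[float], key: int, s: int) -> List[int]:
    """Toy keyed hash: Gaussian projection seeded by `key`, then mean threshold."""
    rng = random.Random(key)  # assumption: an integer seed stands in for the key set
    sigma = (1.0 / s) ** 0.5
    projected = [sum(rng.gauss(0.0, sigma) * x for x in features)
                 for _ in range(s)]
    threshold = sum(projected) / s
    return [1 if v > threshold else 0 for v in projected]


def normalized_hamming(h1: Sequence[int], h2: Sequence[int]) -> float:
    """Fraction of bit positions in which the two hashes differ."""
    return sum(a != b for a, b in zip(h1, h2)) / len(h1)


# One made-up feature vector hashed with a reference key and 99 other keys.
features = [0.1 * (i % 7) + 0.05 * i for i in range(40)]
reference = keyed_hash(features, key=0, s=64)
distances = [normalized_hamming(reference, keyed_hash(features, key=k, s=64))
             for k in range(1, 100)]
```

The list `distances` plays the role of the 99 distances computed per graphic; inspecting its minimum against the threshold mirrors the check described above.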

I. Fusion Method Analysis

Fig. 11. Distribution of normalized Hamming distances with different keys.

The covariance descriptor is employed to fuse topology and geometry features of each group $G_j$ in this paper. Compared with other fusion methods, such as feature concatenation, its advantages can be summarized as follows: First, it provides an elegant mechanism for fusing heterogeneous features of arbitrary dimension and scale. It captures not only the geometry but also the topology features of objects in each group, thereby characterizing the graphic. Second, it has a fixed dimension that is independent of the size of group $G_j$. Third, it is compact and easy to compute. Owing to the symmetry, a covariance matrix has a smaller number of distinct elements compared with many other region descriptors.

J. Comparison With Previous Works

To the best of our knowledge, no related work that focuses on authenticating both geometry and topology information of 2D engineering CAD graphics has been reported in the literature. As described in Section II, existing works typically concentrate on geometry authentication and protection for traditional mechanical CAD graphics, and there are still very few methods that focus on topology authentication for 2D engineering CAD graphics [15], [28]. The main advantages of the proposed scheme, compared with previous works on 2D engineering CAD graphics [15], [28], can be summarized as follows:

First, the proposed method can authenticate both geometry and topology information for 2D engineering CAD graphics. Second, the proposed hashing-based method does not introduce any distortion into the original graphics; thus, it is more suitable for CAD applications. The watermarking-based schemes that were presented in [15], [28] were designed to authenticate only the topology information for 2D engineering CAD graphics, and original graphics were inevitably changed by watermarking. Therefore, it is believed that the proposed method is more generic and practical for industrial applications.

IX. CONCLUSIONS

In this paper, a novel robust hashing scheme is proposed for jointly authenticating topology and geometry information of 2D engineering CAD graphics. A new covariance-based descriptor is introduced for fusing multiple heterogeneous topology and geometry features. Hashes that are produced with the proposed method are robust to non-malicious operations and are sensitive to changes that are caused by malicious attacks. The hashing scheme that is described in this paper yields group-level tampering detection and localization ability. The hash can be used to differentiate similar and different graphics. It can also identify and locate object groups that contain maliciously attacked objects. The proposed scheme achieves a trade-off among robustness, sensitivity, discriminative capability, and tampering localization. The experimental results show the effectiveness and availability of the proposed hashing algorithm.

Further study is needed to find geometry features that better represent various kinds of geometric objects, to enhance the hash's robustness against the rotation operation. Another important aim for future research is the achievement of more precise tampering localization while maintaining a short hash length and good sensitivity to malicious attacks.

ACKNOWLEDGMENT

The authors would like to acknowledge the helpful comments and kind suggestions provided by the anonymous referees.

REFERENCES

[1] W. Shen et al., “Systems integration and collaboration in architecture,engineering, construction, and facilities management: A review,” Adv.Eng. Inform., vol. 24, no. 2, pp. 196–207, 2010.

[2] A. Burdorf, B. Kampczyk, M. Lederhose, and H. Schmidt-Traub,“CAPD—Computer-aided plant design,” Comput. Chem. Eng., vol. 28,nos. 1–2, pp. 73–81, 2004.

[3] R. Guirardello and R. E. Swaney, “Optimization of process plant layoutwith pipe routing,” Comput. Chem. Eng., vol. 30, no. 1, pp. 99–114,2005.

[4] D. Xiao, S. Hu, and H. Zheng, “A high capacity combined reversiblewatermarking scheme for 2-D CAD engineering graphics,” MultimediaTools Appl., vol. 74, no. 6, pp. 2109–2126, 2015.

[5] C.-P. Yan, C.-M. Pun, and X.-C. Yuan, “Multi-scale image hashing usingadaptive local feature extraction for robust tampering detection,” SignalProcess., vol. 121, pp. 1–16, Apr. 2016.

[6] Y. Zhao, S. Wang, X. Zhang, and H. Yao, “Robust hashing for image authentication using Zernike moments and local features,” IEEE Trans. Inf. Forensics Security, vol. 8, no. 1, pp. 55–63, Jan. 2013.

[7] X. Wang, K. Pang, X. Zhou, Y. Zhou, L. Li, and J. Xue, “A visual model-based perceptual image hash for content authentication,” IEEE Trans. Inf. Forensics Security, vol. 10, no. 7, pp. 1336–1349, Jul. 2015.

[8] Z. Tang, X. Zhang, X. Li, and S. Zhang, “Robust image hashing with ring partition and invariant vector distance,” IEEE Trans. Inf. Forensics Security, vol. 11, no. 1, pp. 200–214, Jan. 2016.

[9] R. Ohbuchi and H. Masuda, “Managing CAD data as a multimedia data type using digital watermarking,” in Proc. 4th Workshop Knowl. Intensive CAD Knowl. Intensive Eng. (IFIP), Deventer, The Netherlands, 2001, pp. 103–116.

[10] S.-H. Lee and K.-R. Kwon, “CAD drawing watermarking scheme,” Digit. Signal Process., vol. 20, no. 5, pp. 1379–1399, 2010.

[11] L. Cao, C. Men, and R. Ji, “Nonlinear scrambling-based reversible watermarking for 2D-vector maps,” Vis. Comput., vol. 29, no. 3, pp. 231–237, 2013.

[12] F. Peng, Y. Liu, and M. Long, “Reversible watermarking for 2D CAD engineering graphics based on improved histogram shifting,” Comput.-Aided Des., vol. 49, no. 4, pp. 42–50, 2014.

[13] S.-H. Lee, S.-G. Kwon, and K.-R. Kwon, “Robust hashing of vector data using generalized curvatures of polyline,” IEICE Trans. Inf. Syst., vol. E96.D, no. 5, pp. 1105–1114, 2013.

[14] S.-H. Lee, W.-J. Hwang, and K.-R. Kwon, “Polyline curvatures based robust vector data hashing,” Multimedia Tools Appl., vol. 73, no. 3, pp. 1913–1942, 2014.

[15] Z. Su, X. Yang, G. Liu, W. Li, and W. Tang, “Topology authentication for piping isometric drawing,” Comput.-Aided Des., vol. 66, no. 9, pp. 33–44, 2015.

[16] Z. Tang, X. Zhang, L. Huang, and Y. Dai, “Robust image hashing using ring-based entropies,” Signal Process., vol. 93, no. 7, pp. 2061–2069, 2013.

[17] O. Tuzel, F. Porikli, and P. Meer, “Region covariance: A fast descriptor for detection and classification,” in Proc. 9th Eur. Conf. Comput. Vis., Graz, Austria, May 2006, pp. 589–600.

[18] H. Tabia and H. Laga, “Covariance-based descriptors for efficient 3D shape matching, retrieval, and classification,” IEEE Trans. Multimedia, vol. 17, no. 9, pp. 1591–1603, Sep. 2015.

[19] K. Wang, G. Lavoue, F. Denis, and A. Baskurt, “A comprehensive survey on three-dimensional mesh watermarking,” IEEE Trans. Multimedia, vol. 10, no. 8, pp. 1513–1527, Dec. 2008.

[20] A. Khan, A. Siddiqa, S. Munib, and S. A. Malik, “A recent survey of reversible watermarking techniques,” Inf. Sci., vol. 279, no. 20, pp. 251–272, 2014.

[21] C. Fornaro and A. Sanna, “Public key watermarking for authentication of CSG models,” Comput.-Aided Des., vol. 32, no. 12, pp. 727–735, 2000.

[22] K. Tarmissi and A. B. Hamza, “Information-theoretic hashing of 3D objects using spectral graph theory,” Exp. Syst. Appl., vol. 36, no. 5, pp. 9409–9414, 2009.

[23] R. M. Gray, “Vector quantization,” IEEE ASSP Mag., vol. 1, no. 2, pp. 4–29, Apr. 1984.

[24] Y. Linde, A. Buzo, and R. M. Gray, “An algorithm for vector quantizer design,” IEEE Trans. Commun., vol. COM-28, no. 1, pp. 84–95, Jan. 1980.

[25] P. Cirujeda, Y. D. Cid, X. Mateo, and X. Binefa, “A 3D scene registration method via covariance descriptors and an evolutionary stable strategy game theory solver,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 306–329, 2015.

[26] D. Shreiner, G. Sellers, J. Kessenich, and B. Licea-Kane, OpenGL Programming Guide: The Official Guide to Learning OpenGL, D. Shreiner, Ed., 8th ed. Reading, MA, USA: Addison-Wesley, 2013.

[27] R. G. Baraniuk, “Compressive sensing,” IEEE Signal Process. Mag., vol. 24, no. 4, pp. 118–121, Jul. 2007.

[28] Z. Su, L. Zhou, Y. Mao, Y. Dai, and W. Tang, “A unified framework for authenticating topology integrity of 2D heterogeneous engineering CAD drawings,” Multimedia Tools Appl., vol. 76, no. 20, pp. 20663–20689, 2017.

Zhiyong Su received the B.S. and M.S. degrees from the School of Computer Science and Technology, Nanjing University of Science and Technology, in 2004 and 2006, respectively, and the Ph.D. degree from the Institute of Computing Technology, Chinese Academy of Sciences, in 2009. He is currently an Associate Professor with the School of Automation, Nanjing University of Science and Technology, China. His current interests include computer graphics, computer vision, and augmented reality.

Ying Ye received the B.S. degree from Yangzhou University, China, in 2015. She is currently pursuing the M.S. degree with the Nanjing University of Science and Technology, China. Her research interests include image processing and computer vision.

Qi Zhang received the B.S. degree from Anhui University, China, in 2015. She is currently pursuing the M.S. degree with the Nanjing University of Science and Technology, China. Her research interests include image and video processing and computer vision.

Weiqing Li received the B.S. and Ph.D. degrees from the School of Computer Science and Engineering, Nanjing University of Science and Technology, in 1997 and 2007, respectively. He is currently an Associate Professor with the School of Computer Science and Engineering, Nanjing University of Science and Technology, China. His current interests include computer graphics and virtual reality.

Yuewei Dai received the M.S. and Ph.D. degrees in automation from the Nanjing University of Science and Technology, in 1987 and 2002, respectively. He is currently a Professor with the School of Automation, Nanjing University of Science and Technology, China. His research interests are in the areas of information security and signal and image processing.