
Signal Processing 145 (2018) 193–201

Contents lists available at ScienceDirect

Signal Processing

journal homepage: www.elsevier.com/locate/sigpro

Arrow’s Impossibility Theorem inspired subjective image quality assessment approach

Wenhan Zhu a, Guangtao Zhai a,∗, Menghan Hu a,∗, Jing Liu b, Xiaokang Yang a

a Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, China
b School of Electronic Information Engineering, Tianjin University, China

Article info

Article history:

Received 31 July 2017

Revised 13 November 2017

Accepted 1 December 2017

Available online 8 December 2017

Keywords:

Subjective image quality assessment

Arrow’s Impossibility Theorem

Pivotal subject

Mean opinion score

Paired comparison

Terahertz security image

Abstract

A large number of subjective image quality assessment databases have been constructed in the last decade. The Mean Opinion Score (MOS), with single or double stimulus, and Paired Comparison (PC) are the two dominant approaches for collecting ground-truth quality ratings, and usually 15 or more subjects are needed for each image. In this paper, we show that such averaging-over-multiple-ratings methods carry a potential “dictatorship” risk. Using Arrow’s Impossibility Theorem (AIT), we prove that satisfying unanimity and independence of irrelevant alternatives (IIA) generates a “pivotal subject”, who in fact determines the final rank of image quality. We also prove that there is no ideal democratic approach for synthesizing the opinions of all subjects. Therefore, we advocate recruiting a small number of experts (a.k.a. the “golden eyes”) for subjective viewing tests. To verify the reliability of our proposal, we perform experiments on two databases, one of generally distorted images and one of professional images (here, Terahertz security images). In each experiment, raw scores are assigned by at least 15 inexperienced viewers and 3 experts, and the MOS or difference mean opinion score (DMOS) is obtained. The correlation between the scores given by naive subjects and by experts is then analyzed. In the general image experiment, six evaluation metrics show that the DMOS of inexperienced viewers is highly correlated with the DMOS of experts. In the professional image experiment, the preferences of experts also agree well with the opinions of inexperienced viewers on the overall quality of THz images. Moreover, when assessing the quality of illegal-substance regions in THz images, the experts are more accurate than the inexperienced observers. In conclusion, the results of the two validation experiments verify that a small number of experts is more suitable for assessing the perceptual quality of images, which reduces cost and simplifies the procedure of creating databases.

© 2017 Elsevier B.V. All rights reserved.

1. Introduction

Image quality assessment (IQA) plays a significant role in image processing. In recent years, a large number of objective IQA algorithms have been proposed to automatically quantify the quality of distorted images [1,2]. Objective IQA models can be generally classified into three types according to the availability of the original image: full-reference (FR) [3–6], reduced-reference (RR) [7–9], and no-reference (NR) [10–14] models. However, images are ultimately perceived by human beings, so the primary and most accurate method of quantifying visual perception is subjective IQA [15,16]. The development of objective IQA relies on the results of subjective IQA as the benchmark [17,18]. Subjective IQA is therefore also an important part of IQA.

For subjective IQA, the Mean Opinion Score (MOS) and Difference Mean Opinion Score (DMOS) are two widely used methods to describe the quality of distorted images [19]. For instance, MOS is applied in TID2008 [20], TID2013 [21], and CCID2014 [22] to acquire ground-truth quality scores of distorted images. DMOS is used in LIVE [23], CSIQ [24], and MDID2013 [25] to describe the degree of difference between the ground truth and the distorted images. Beyond traditional databases designed for general IQA problems, numerous emerging special-purpose subjective image/video quality studies, such as the Waterloo 3D Image/Video database [26,27] and the MCL 3D image quality database [28] for stereoscopic 3D image/video [29], as well as SIQAD [30] for screen content images [31–33], also use MOS methods. Besides MOS and DMOS, the quality of an image can also be obtained by Paired Comparison (PC) [34–36]. Based on the PC method, K.-T. Chen et al. [34] established a crowdsourceable framework to quantify the Quality of Experience (QoE) of multimedia content. Xu et al. [35] exploited the randomized PC method to build an online ranking scheme and obtained the quality of testing images from the Internet. Yang et al. [36] employed the PC method to establish a new compound image database.

∗ Corresponding authors.
E-mail addresses: [email protected] (W. Zhu), [email protected] (G. Zhai), [email protected] (M. Hu), [email protected] (J. Liu), [email protected] (X. Yang).
https://doi.org/10.1016/j.sigpro.2017.12.001
0165-1684/© 2017 Elsevier B.V. All rights reserved.

Fig. 1. Illustrations of the existence of a “dictator” in the MOS method. (a) Example of one “dictator” in the list of subjective scores; (b) example of a set of “dictators” in the subjective scores. X and Y indicate distorted images to be scored. Amy, Bob, Carol, David, and Ella are subjects.

The number and type of observers are also significant factors in subjective IQA. A common assumption in subjective IQA tests is that more observers yield more accurate and more democratic results. According to the recommendation of the International Telecommunication Union (ITU), at least 15 viewers with no expertise in image processing should be employed to guarantee statistical consistency [19].

However, this recommendation may be inconsistent with a classical theory in social choice called Arrow’s Impossibility Theorem (AIT) [37,38]. AIT was proposed by Kenneth Arrow, who won the Nobel Prize in Economics in 1972. AIT reveals an interesting and counterintuitive conclusion: when three or more candidates exist, no ranked-order voting system can simultaneously satisfy four democratic principles, namely unrestricted domain, unanimity, independence of irrelevant alternatives (IIA), and non-dictatorship (see Section 3 for more details). In any democratic voting system, the preference of each subject is equally important, so the first three principles can be satisfied. The voting system is then necessarily dictatorial. In addition, as the number of voters and candidates increases, the “dictator” becomes more powerful.

In this paper, inspired by AIT, we point out ignored but important issues of the MOS and PC methods. The problem with the MOS method is that the majority can be subordinated to a minority, while in the PC method the comparison of two images is influenced by irrelevant images. First, we use three cases to introduce these issues. Subsequently, we prove that there will be one “pivotal subject” among the viewers if the grading or ranking system meets the principles of the minority being subordinate to the majority and of IIA. Based on these two facts, we advocate reducing the number of observers and recruiting a small group of experts with substantial expertise in image processing (a.k.a. the “golden eyes”) to assess the quality of testing images. To verify the reliability of our proposal, we perform two experiments, one on generally distorted images and one on professional images. In the general distorted image experiment, we select the subset of Gaussian blur images in LIVE [23] and recruit 26 inexperienced subjects and 3 experts to rate them on a hundred-point scale. After processing the raw subjective scores, we obtain the DMOS values. Based on them, we analyze the reliability of the experimental data and the correlation between the ratings of experts and inexperienced viewers. In the professional image experiment, 181 Terahertz (THz) security images [39] are used as testing images. We invite 17 inexperienced observers and 3 professional security inspectors to subjectively assess the quality of the THz images. The raw scores of the THz images cover two aspects: the overall perceived quality and the local sharpness of illegal substances. After processing the raw data, we obtain the MOS value of overall quality for each image. Moreover, the accuracy of judging whether illegal goods are present is computed for each subject. Based on the MOS values and the accuracy rates, the relationships between naive subjects and experts are analyzed.

The remainder of this paper is organized as follows. In Section 2, the issues of MOS and PC are discussed through three cases. Section 3 proves, based on AIT, the dictatorial property of any ideal grading or ranking method. A detailed analysis of our proposal is given in Section 4. Two subjective experiments are performed, and the correlations between experts and inexperienced subjects are evaluated, in Section 5. Finally, Section 6 concludes this paper.

2. Issues of MOS and PC methods

2.1. Issue of MOS

The aim of MOS is to average all graders’ opinions, just as its name implies. MOS is defined as:

$$\mathrm{MOS}_j = \frac{1}{N} \sum_{i=1}^{N} s_{i,j}, \tag{1}$$

where $s_{i,j}$ represents the score of image $j$ graded by subject $i$, and $N$ is the number of subjects.

As a fair method, MOS should reflect the preferences of the majority, which means that the opinions of all subjects are equally important. However, this assumption fails in some real situations. A simple case is shown in Fig. 1(a). Five observers, Amy, Bob, Carol, David, and Ella, score two distorted images named X and Y. All subjects give 2 points to Y. Ella gives 8 points to X, while the other subjects give 2 points to X. According to Eq. (1), the quality of X is better than that of Y (2.4 > 2). Obviously, this result only reflects the opinion of Ella. That is to say, Ella completely dominates the result regardless of the others’ opinions, so we call Ella a “dictator”. This case is simple and infrequent in practice. In addition, Ella’s score can be treated as an outlier by some statistical methods. For example, LIVE removes outliers with the two-standard-deviation method, and TID2013 rejects 2% of experiments as outliers according to ITU-R Rec. BT.500-13 [19]. Although these methods can remove outliers, they cannot avoid situations where a few subjects represent most subjects.

Another representative case is demonstrated in Fig. 1(b). All subjects give 3 points to Y. For X, the first three subjects give 2 points, while the last two give 5 points. David and Ella prefer X to Y, and the other subjects hold the opposite opinion. In this case, no subject can be regarded as an outlier. The final result is X > Y (3.2 > 3), and thus a small group of observers (i.e., David and Ella) decides the result against the preference of most subjects. David and Ella therefore form a set of “dictators”, which is obviously unfair.

A possible reason for this problem is that each subject’s influence on the final score is different. The larger the discrepancy between an individual’s score and the average score, the greater the impact that individual has on the final result. Thus, a small group of subjects whose scores differ from those of the majority may play a dominant role.
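As a minimal sketch of the case in Fig. 1(b), the following Python snippet evaluates Eq. (1) and compares the MOS ordering with a simple head count of individual preferences. The names `scores`, `mos` and `majority_prefers` are illustrative and do not come from the paper; the value of 2 points for the first three subjects is the one consistent with the stated mean of 3.2.

```python
# Scores from the second case (Fig. 1(b)): one entry per subject.
scores = {
    "Amy":   {"X": 2, "Y": 3},
    "Bob":   {"X": 2, "Y": 3},
    "Carol": {"X": 2, "Y": 3},
    "David": {"X": 5, "Y": 3},
    "Ella":  {"X": 5, "Y": 3},
}

def mos(scores, image):
    """Eq. (1): average the scores that all subjects gave to one image."""
    values = [s[image] for s in scores.values()]
    return sum(values) / len(values)

def majority_prefers(scores, a, b):
    """Count how many subjects rate image a strictly above image b."""
    return sum(1 for s in scores.values() if s[a] > s[b])

print("MOS:", mos(scores, "X"), mos(scores, "Y"))
print("subjects preferring X:", majority_prefers(scores, "X", "Y"))
print("subjects preferring Y:", majority_prefers(scores, "Y", "X"))
```

Under these assumptions, MOS ranks X above Y although three of the five subjects rate Y higher, which is the effect described above.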

2.2. Issue of PC

PC is also a common method used in subjective IQA. In contrast to MOS, each subject in the PC approach makes a choice between the two images (0 and 1 for the worse and the better image, respectively), and therefore the above issue does not exist for PC. However, the PC method produces a new problem, which is illustrated in Fig. 2. Three subjects, Amy, Bob, and Carol, compare three distorted images X, Y, and Z. In the first step, only the two images X and Y need to be evaluated, and the three subjects provide their preferences using the PC method. Amy and Bob consider X > Y, while Carol believes Y > X. Clearly, the final result is X > Y. Then, a new image Z is added to the assessment list and the three subjects give their rankings again: X > Y > Z, Z > X > Y, and Y > Z > X are provided by Amy, Bob, and Carol, respectively. Their original preferences between X and Y are unchanged. Amy and Bob consider X > Y, Amy and Carol consider Y > Z, and Bob and Carol consider Z > X. An interesting conclusion, X > Y > Z > X, is produced, which is the famous “Condorcet Paradox”. Using “Borda Voting”, we assign 1, 2 and 3 points to the first, second and third positions, respectively. Interestingly, all three images obtain 6 points (X = Y = Z). However, if we remove image Z, the score of X will be 4 points and Y only has 2 points, so the final result becomes X > Y once again. In this sense, the final result depends on the number of testing images. This is also a classical paradox in mathematics, namely the “Borda Voting Paradox”. Thus, for the PC method, the comparison of two images is affected by other, irrelevant images, which is unreasonable.

Fig. 2. Example describing the problems in the MOS and PC methods. X, Y, and Z represent distorted images to be scored. Amy, Bob, and Carol are subjects. The middle two columns are calculated by PC, and the right column by MOS.
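The two paradoxes can be reproduced with a short Python sketch based on the three rankings of Fig. 2. The helper names `pairwise_winner` and `borda` are illustrative. The Borda convention used here (the best of m images receives m points, the worst receives 1) is an assumption, so the point totals for the two-image case need not match those quoted in the text; only the resulting orderings are meant to be compared.

```python
from itertools import combinations

# Rankings from Fig. 2, best image first.
rankings = {
    "Amy":   ["X", "Y", "Z"],
    "Bob":   ["Z", "X", "Y"],
    "Carol": ["Y", "Z", "X"],
}

def pairwise_winner(rankings, a, b):
    """Return the image that a majority of subjects rank above the other."""
    votes_a = sum(1 for r in rankings.values() if r.index(a) < r.index(b))
    votes_b = len(rankings) - votes_a
    return a if votes_a > votes_b else b

def borda(rankings, images):
    """Borda count restricted to `images` (assumed convention: best gets m points)."""
    totals = {img: 0 for img in images}
    for r in rankings.values():
        kept = [img for img in r if img in images]   # drop removed images
        for position, img in enumerate(kept):
            totals[img] += len(kept) - position
    return totals

# Pairwise majorities form a cycle: X beats Y, Y beats Z, Z beats X.
for a, b in combinations("XYZ", 2):
    print(a, "vs", b, "->", pairwise_winner(rankings, a, b))

# Borda ties all three images, but separates X and Y once Z is removed.
print(borda(rankings, ["X", "Y", "Z"]))
print(borda(rankings, ["X", "Y"]))
```

In this sketch, the relative order of X and Y changes from a tie to X > Y solely because Z is removed, although no subject changed their preference between X and Y.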

Later, the same group of subjects is asked to score these three images using the MOS method according to their preferences. The results of the MOS method are shown in the right column of Fig. 2. MOS yields a different result from PC, namely Y > Z > X (5 > 4.7 > 4). The best image in the PC method, X, becomes the worst under the MOS method. It is also unreasonable that different subjective IQA methods produce different final results.

Based on the analysis of the above three cases, a question arises naturally: can we find an ideal subjective IQA method that avoids the above-mentioned issues? Unfortunately, the answer is no, which is guaranteed by a derivation from AIT. The detailed proof is given in the next section.

3. Nonexistence of an ideal method based on Arrow’s Impossibility Theorem

In this section, based on Arrow’s Impossibility Theorem [40,41], we prove that no ideal democratic grading or ranking method can satisfy unanimity and independence of irrelevant alternatives (IIA) simultaneously while avoiding the unfairness issues of MOS and PC.

3.1. Formal statement

Let us first assume that there exists an ideal democratic method (IDM) to score or rank the images. Its function can be represented by:

$$\mathrm{IDM}: (R_1(S), \ldots, R_N(S)) \rightarrow Q(S), \tag{2}$$

where $S$ is the set of testing images and $N$ is the number of subjects. $R_i$ denotes the $i$th subject’s preference, and $(R_1, \ldots, R_N)$ is the collection of all preferences (named the preference set). $Q(S)$ expresses the final order of $S$.

As a democratic evaluation function, IDM should meet three basic conditions to avoid the problems appearing in MOS and PC:

(1) Unanimity, or Pareto efficiency [42,43]: For two arbitrary images $X$ and $Y$ in the database, if $X_i$ has a higher rank or score than $Y_i$ for every subject $i$ ($i = 1, \ldots, N$) in the preference set $(R_1, \ldots, R_N)$, the final score or rank of $X$ will be higher than that of $Y$.

(2) IIA [44]: For two preference sets $(R^{(1)}_1, \ldots, R^{(1)}_N)$ and $(R^{(2)}_1, \ldots, R^{(2)}_N)$, if $\forall i \in \{1, \ldots, N\}$, $X_i$ and $Y_i$ have the same relative order in $R^{(1)}_i$ and $R^{(2)}_i$ (such as $X^{(1)}_1 > Y^{(1)}_1$ in $R^{(1)}_1$ and $X^{(2)}_1 > Y^{(2)}_1$ in $R^{(2)}_1$), then $X$ and $Y$ will have the same relative order in $\mathrm{IDM}(R^{(1)}_1, R^{(1)}_2, \ldots, R^{(1)}_N)$ as in $\mathrm{IDM}(R^{(2)}_1, R^{(2)}_2, \ldots, R^{(2)}_N)$. In other words, the influence of other, irrelevant images on the order of these two images in the final result is eliminated. This avoids the problem of PC.

(3) Non-dictatorship: There is no $d \in \{1, \ldots, N\}$ such that $\forall (R_1, \ldots, R_N)$, if the score or rank of $X_d$ is higher than that of $Y_d$ in $R_d$, the score or rank of $X$ will be higher than that of $Y$ in the final result for arbitrary $X$ and $Y$. This condition avoids the issue of MOS.

However, according to AIT, when the set $S$ contains three or more images, no algorithm simultaneously satisfies these three conditions. Equivalently, if IDM meets conditions (1) and (2), there must exist a “dictator”. This is hard to understand intuitively, so we prove it below.
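To make the three conditions concrete, the sketch below checks them exhaustively for three subjects and three images. It is only an illustration under stated assumptions: an aggregation rule is modelled as a Python function that maps a preference set (a tuple of rankings, best image first) to a score per image, and `borda_rule` and `dictator_rule` are two hypothetical stand-ins for IDM, not methods from the paper.

```python
from itertools import combinations, permutations, product

IMAGES = ("X", "Y", "Z")
ORDERS = list(permutations(IMAGES))      # the 6 strict rankings, best first

def sign(v):
    return (v > 0) - (v < 0)

def borda_rule(profile):
    """Sum of rank points per image (best gets 3, worst gets 1)."""
    return {img: sum(len(IMAGES) - r.index(img) for r in profile) for img in IMAGES}

def dictator_rule(profile, d=0):
    """Copy the ranking of subject d and ignore everyone else."""
    return {img: len(IMAGES) - profile[d].index(img) for img in IMAGES}

def satisfies_unanimity(rule, n):
    """Condition (1): if every subject ranks a above b, so does the result."""
    for profile in product(ORDERS, repeat=n):
        q = rule(profile)
        for a, b in permutations(IMAGES, 2):
            if all(r.index(a) < r.index(b) for r in profile) and not q[a] > q[b]:
                return False
    return True

def satisfies_iia(rule, n):
    """Condition (2): the a-vs-b outcome depends only on who prefers a to b."""
    for a, b in combinations(IMAGES, 2):
        seen = {}
        for profile in product(ORDERS, repeat=n):
            pattern = tuple(r.index(a) < r.index(b) for r in profile)
            q = rule(profile)
            outcome = sign(q[a] - q[b])
            if seen.setdefault(pattern, outcome) != outcome:
                return False
    return True

def dictators(rule, n):
    """Condition (3) fails for every subject returned here."""
    found = []
    for d in range(n):
        obeyed = True
        for profile in product(ORDERS, repeat=n):
            q = rule(profile)
            for a, b in permutations(IMAGES, 2):
                if profile[d].index(a) < profile[d].index(b) and not q[a] > q[b]:
                    obeyed = False
        if obeyed:
            found.append(d)
    return found

for rule in (borda_rule, dictator_rule):
    print(rule.__name__, satisfies_unanimity(rule, 3),
          satisfies_iia(rule, 3), dictators(rule, 3))
```

In this small setting, the Borda-style rule satisfies unanimity and has no dictator but violates IIA, while the rule that copies one subject satisfies unanimity and IIA at the price of a dictator, in line with the statement above.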

3.2. Proof of nonexistence of IDM

It should be noted that the proof can be extended to more images. For simplicity, we assume that there are three images $X$, $Y$ and $Z$ to be evaluated. In this section, we use $X_i > Y_i$ to represent the situation in which the $i$th subject assigns a higher score or rank to image $X$ than to $Y$.

Assume that all subjects consider image $Z$ to have the lowest quality in $R_i$; the final result will then be $X > Y > Z$ or $Y > X > Z$ according to unanimity (condition (1)). We call this situation state 0 (the original state), which is also shown in Fig. 3. Subsequently, as shown in Fig. 3, we set up a rule to alter the original state: keeping the rest of the original state fixed, the $i$th subject (randomly selected each time) moves $Z_i$ from the bottom of $R_i$ to the top to form a new state, and this procedure is repeated $N$ times. $C$ is the order in which the subjects are selected over the $N$ iterations. For simplicity, $j$ denotes the $j$th subject in $C$. When subject $j$ changes $R_j$, the current state becomes state $j$. At the end of the loop (state $N$), $Z$ obviously has the highest rank in the final result according to unanimity. Thus, during the loop, there must be a state $k$ at which subject $k$ moves $Z_k$ to the top of $R_k$ and thereby makes $Z$ higher than $X$, $Y$, or both in the final result. We define subject $k$ as the “pivotal subject”. In fact, once the rank of $Z_k$ changes, only one situation is possible: $Z$ moves directly to the top position in the final result, which is proved in step one below. Also, once $C$ is fixed, the “pivotal subject” is determined. The following four steps prove that the “pivotal subject” is the only person who dominates the final result.
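The construction of the “pivotal subject” can be traced with a small Python sketch. It starts from state 0, lets the subjects in order `order` (playing the role of $C$) move Z from the bottom to the top one at a time, and reports the first subject whose move lifts Z above X or Y in the final result, together with whether Z jumped directly to the top. The two rules and the function name `find_pivotal_subject` are hypothetical illustrations, not part of the paper.

```python
IMAGES = ("X", "Y", "Z")

def dictator_rule(profile, d=1):
    """Final scores copy the ranking of subject d (satisfies unanimity and IIA)."""
    return {img: len(IMAGES) - profile[d].index(img) for img in IMAGES}

def borda_rule(profile):
    """Sum of rank points (satisfies unanimity but not IIA)."""
    return {img: sum(len(IMAGES) - r.index(img) for r in profile) for img in IMAGES}

def find_pivotal_subject(rule, order):
    """Return (subject, jumped_to_top) for the first move that lifts Z."""
    profile = [("X", "Y", "Z")] * len(order)     # state 0: everyone puts Z last
    for subject in order:
        profile[subject] = ("Z", "X", "Y")       # this subject moves Z to the top
        q = rule(tuple(profile))
        if q["Z"] > q["X"] or q["Z"] > q["Y"]:
            jumped_to_top = q["Z"] > q["X"] and q["Z"] > q["Y"]
            return subject, jumped_to_top
    return None, False

print(find_pivotal_subject(dictator_rule, (0, 1, 2)))
print(find_pivotal_subject(dictator_rule, (2, 1, 0)))
print(find_pivotal_subject(borda_rule, (0, 1, 2)))
print(find_pivotal_subject(borda_rule, (2, 0, 1)))
```

For the rule that satisfies both unanimity and IIA, the pivotal subject is always the same subject and Z jumps straight to the top. For the Borda-style rule, which violates IIA, Z only climbs gradually and the subject found depends on the order, so the argument of the following steps does not apply to it.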

Step one: In this step, we prove that $Z$ moves directly to the top position in the final result. Assume instead that $Z$ moves to the middle of the final result after the “pivotal subject” alters the position of $Z_k$ in $R_k$, so that the final result is $X > Z > Y$ or $Y > Z > X$. Suppose that the result is $X > Z > Y$. Now, we move all $Y_j$ in front of $X_j$ ($Y_j > X_j$, $j = 1, \ldots, N$). According to unanimity (condition (1)), the final result must have $Y > X$. During the changes between $X_j$ and $Y_j$, $j = 1, \ldots, N$, every $Z_j$ ($j = 1, \ldots, N$) still stays at the top or the bottom of $R_j$. In other words, these changes do not affect the relative ranks of $X_j$, $Z_j$ and of $Y_j$, $Z_j$. By IIA (condition (2)), the relative positions of $X$, $Z$ and of $Y$, $Z$ in the final result are therefore unchanged, so the final result is still $X > Z > Y$, which implies $X > Y$. This contradicts the conclusion $Y > X$ obtained from unanimity (condition (1)). So the assumption fails. The case $Y > Z > X$ is handled symmetrically.

Fig. 3. The definition of the “pivotal subject”. State $k$, $k \in \{0, 1, \ldots, N\}$, represents the situation in which the first $k$ subjects in order $C$ have moved $Z_j$ to the top of $R_j$, while the remaining subjects keep $Z_j$ at the bottom of $R_j$.

Fig. 4. Diagram of the five states in step three. The first row is the initial state with an arbitrary preference set. The second row is state (a), in which all $Z_j$ are at the bottom of $R_j$. The third, fourth and fifth rows are states (b), (c) and (d), respectively. They all meet the condition that $Z_j$ is at the top of $R_j$ in group 1 and at the bottom of $R_j$ in group 2. The preferences of the “pivotal subject” $k$ are $X_k > Y_k > Z_k$, $Z_k > X_k > Y_k$ and $X_k > Z_k > Y_k$ in these three states, respectively.

Step two: In this step, we show that the “pivotal subject” does not depend on the content of the preference set $(R_1, \ldots, R_N)$ of the original state, provided that two premises hold. One is that all $Z_j$ ($j = 1, \ldots, N$) are ranked at the bottom of $R_j$; the other is that $Z_j$ is moved to the top of $R_j$ one subject at a time following the sequence $C$. Because every $Z_j$, $j = 1, \ldots, N$, starts at the bottom of $R_j$, $X_j > Z_j$ and $Y_j > Z_j$ hold regardless of the content of the preference set $(R_1, \ldots, R_N)$ of the original state. For every state $j$ ($j = 1, \ldots, N$), the relative ranks of $X_j$, $Z_j$ and of $Y_j$, $Z_j$ are the same as in the situations described in step one. Therefore, according to IIA, the relative ranks of $X$, $Z$ and of $Y$, $Z$ in the final result must equal those in step one. For this reason, the “pivotal subject” is the same one.

Step three: In this step, we show that the “pivotal subject” alone determines the relative rank of $X$ and $Y$ in the final result in every situation. Given an arbitrary preference set $(R_1, \ldots, R_N)$ as the initial state, we assume that the “pivotal subject” $k$ prefers $X_k$ to $Y_k$ in this set. We claim that the final result is then $X > Y$. To prove this, we make a series of modifications to the preference set $(R_1, \ldots, R_N)$ while keeping the relative rank between $X_j$ and $Y_j$ ($j = 1, \ldots, N$) fixed. The five states used in the proof are illustrated in Fig. 4. Details are given below.

We first move all $Z_j$ to the bottom of $R_j$ ($j = 1, \ldots, N$), going from the initial state to state (a), i.e., from the first row to the second row of Fig. 4. We then define the subjects before the “pivotal subject” $k$ in $C$ as group 1 and the subjects after $k$ as group 2 (the second row of Fig. 4). To form state (b) from state (a), we shift the $Z_j$ of group 1 to the top of $R_j$. Based on the conclusions of steps one and two, $Z$ is still at the bottom of the final result in this state. Afterwards, state (c) is generated by moving $Z_k$ to the top of $R_k$, so that the ranking of the “pivotal subject” becomes $Z_k > X_k > Y_k$ (it was $X_k > Y_k > Z_k$ in state (b)). Now $Z$ moves to the top of the final result, i.e., $Z > X$ and $Z > Y$. Then, we create state (d) by changing the preference of the “pivotal subject” to $X_k > Z_k > Y_k$. The relative rank of $X_k$ and $Z_k$ is the same in states (b) and (d), and the other subjects’ preferences are unchanged, so by IIA the final result still has $X > Z$ (as in state (b)). Similarly, the relative rank of $Y_k$ and $Z_k$ is the same in states (c) and (d), so $Z$ is above $Y$ in the final result ($Z > Y$ as in state (c)). Combining states (b), (c), and (d), the final result in state (d) is $X > Z > Y$. Moreover, because none of these operations alters the relative rank between $X_j$ and $Y_j$ ($j = 1, \ldots, N$), IIA implies that the relation $X > Y$ in the final result holds throughout step three, including in the initial state. In other words, the relative position of $X$ and $Y$ depends completely on the preference of the “pivotal subject” $k$, regardless of the other subjects’ opinions.

Step four: In this step, we prove that the “pivotal subject” $k$ determines the relative rank of all testing images. That is to say, we need to show that there is only one “pivotal subject” among all observers. According to the previous three steps, there must be a “pivotal subject” $k'$ who determines the relationship between $X$ and $Z$ in the final result. Assume that $k$ and $k'$ are not the same subject and that $k'$ belongs to group 1. Going back to state (b) in step three, the final result should be $Z > X$ because $Z_{k'} > X_{k'}$. However, we have proved in step three that the final result is $X > Z$. These two results are contradictory. If instead the “pivotal subject” $k'$ is in group 2, $Z_{k'}$ has the lowest rank in $R_{k'}$ in state (c) of step three, so the final result should be $X > Z$. Nevertheless, $Z_k$ is at the top of $R_k$ at this moment, and the final result is $Z > X$. This is also contradictory. Therefore, we have $k = k'$. The same argument applies to the other pairs. So there is only one “pivotal subject”.

In fact, this proof can be extended to more images. In that case, $X$ and $Y$ can be regarded as two image vectors, viz., $\vec{X}$ and $\vec{Y}$, and the added images are placed into these two vectors. Consequently, the above proof also works for more images.

In conclusion, any IDM that meets conditions (1) unanimity and (2) IIA is a “dictatorship” algorithm.

4. Reflections on existing methods

As described in Section 2, both the MOS and PC methods have their own defects. MOS may generate a dictator or a small group of dictators, while the PC method suffers from the issue that the comparison of two images can be influenced by other, irrelevant images. However, any ideal method satisfying unanimity and IIA will also produce a dictator, as proved in Section 3. Kirman and Sondermann [45] also stated that, unless the number of subjects is infinite, for any proportion of the total number of subjects, no matter how small, there is always a group forming a smaller proportion of the total whose preferences determine the final result. However, an infinite number of subjects is impossible in practice.

Thus, for subjective IQA research, such as studies of natural images, stereoscopic 3D images and screen content images, we advocate recruiting a small group of experts (a.k.a. the “golden eyes”) with abundant expertise in image quality assessment as a set of “dictators” to evaluate the testing images. For professional image studies, such as medical images and THz images [39], we consider that the experts should be subjects with at least one year of experience in the corresponding area. In practice, because subjective evaluation is usually inconvenient, time-consuming and expensive, a large number of objective IQA methods have been put forward. However, as IQA develops, many new application scenarios and distortion types emerge, and existing image databases may not meet the growing requirements. Researchers need to continually establish relevant databases and invite subjects to score the distorted images, which produces a heavy workload. Our suggestion can substantially reduce this cost and save time.

In addition, our suggestion can reduce uncertainty and increase accuracy. Inexperienced observers, even those instructed by experts, lack essential knowledge of image processing, and their attention may not focus on the key points. Thus, the scores given by these subjects may introduce large deviations into the final result. In some cases, subjects who grade images randomly will become the so-called “dictators” and dominate the final result, according to the previous analysis. In contrast, experts know the purpose of the task precisely and score accordingly. In this sense, we prefer known experts as the “dictators” to unpredictable “dictators” emerging from the group of non-expert observers during subjective IQA. Therefore, if we invite several experts rather than a large group of non-specialists, more accurate and reliable results can be obtained, which will benefit further academic research.

Our proposal is also common practice on other professional occasions. For example, in singing competitions, the sponsors invite several specialists as judges rather than a group of randomly selected audience members. Furthermore, the Oscar for Best Director is evaluated only by a small group of well-known directors. Technical questions thus need the corresponding experts to give more accurate judgements.

5. Experiments

To verify the reliability of our suggestions, two experiments are performed, one on general images and one on professional images.

5.1. Experiment on general images

5.1.1. Subjective test

(1) Image sources: We select the Gaussian blur subset of the LIVE database [23] as testing images for the general image experiment. LIVE is a widely recognized image database, and Gaussian blur is a common distortion type in daily life. The test set contains 174 images (29 reference images and their corresponding distorted images).

(2) Experimental environment and configurations: Following ITU-R Rec. BT.500-13 [19], the single-stimulus (SS) method is used in our experiment. All images, including the reference images, are evaluated. The viewing distance is kept at 2.5–3 screen heights, and the illuminance of the experimental environment is kept at a normal level. We design a user interface in MATLAB to display the images and collect scores. The observers report their estimated quality by moving a slider in the interface. The quality scale is divided into five equal partitions, labeled “Bad”, “Poor”, “Fair”, “Good”, and “Excellent”. The slider position is linearly mapped to the interval 1–100.

(3) Experimental process: A total of 26 inexperienced subjects with no related expertise are recruited from the undergraduate students at Shanghai Jiao Tong University. In addition, 3 experts with abundant experience in image quality assessment and saliency detection are invited. Before testing, the objective and procedure of the experiment are briefly introduced to each viewer. Subjects also preview 12 sample images (obtained from 2 reference images) of different quality to avoid unstable ratings at the beginning of the tests. These training images are excluded from the subsequent testing stage. In the testing stage, the order of the images is random, and the randomization is different for each viewer.

.1.2. Data processing

(1) Subject rejection: To remove extremely abnormal raw scores, we use the outlier detection method provided by VQEG [46] and described in Annex 2 of ITU-R Rec. BT.500-13 [19]. First, the distribution of the raw scores is evaluated by the β2 test. Then, following [19], we perform the subsequent steps of the outlier rejection algorithm to discard observers whose scores deviate significantly from the average raw scores. Overall, none of the twenty-six observers is removed as an outlier.

(2) DMOS scores: After the outlier detection, we convert the raw scores to raw difference scores

$$d_{ij} = r_{i\,\mathrm{ref}(j)} - r_{ij} \tag{3}$$

where $r_{ij}$ is the raw score of the $j$th image graded by the $i$th subject, and $r_{i\,\mathrm{ref}(j)}$ is the raw score given by the $i$th subject to the reference image corresponding to the $j$th distorted image.


Fig. 5. Scatter plot of the DMOS in LIVE versus the DMOS in our experiment (set of naive subjects). The red line is the curve fitted with the five-parameter monotonic logistic function. The blue dashed curves are the 95% confidence intervals. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


To eliminate individual differences, the raw difference scores are converted into Z-scores [47]:

$$z_{ij} = \frac{d_{ij} - \bar{d}_i}{\sigma_i} \tag{4}$$

where $\bar{d}_i$ is the mean of all raw difference scores assigned to the distorted images by observer $i$, and $\sigma_i$ is the corresponding standard deviation. After computing the Z-scores, we linearly rescale them to the range [0, 100] [48]. Assuming that the Z-scores of a subject follow a standard Gaussian distribution [49], more than 99% of the scores lie in the range $[-3, +3]$. The rescaled difference scores are computed by

$$s_{ij} = \frac{100\,(z_{ij} + 3)}{6} \tag{5}$$

Finally, the DMOS value of each distorted image is calculated as the mean of the rescaled difference scores:

$$\mathrm{DMOS}_j = \frac{1}{N_i} \sum_{i=1}^{N_i} s_{ij} \tag{6}$$

where $N_i$ is the number of subjects remaining after the subject rejection.
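The two data-processing steps of this subsection, subject rejection and the DMOS computation of Eqs. (3)–(6), can be sketched in plain Python. The sketch assumes that `scores[i][j]` holds the raw score of subject i for image j and that `ref_of` maps each distorted image to its reference; all names are illustrative. The screening thresholds (2 or √20 standard deviations, 5% and 0.3) are those commonly quoted for the β2 test of BT.500 and should be checked against the Recommendation itself. Using the sample standard deviation for σ_i in Eq. (4) is also an assumption, since the text does not say which estimator is used.

```python
import math
import statistics


def rejected_observers(scores):
    """Return the indices of observers rejected by the beta2 screening rule."""
    n_obs, n_img = len(scores), len(scores[0])
    above = [0] * n_obs  # P_i: scores far above the panel mean
    below = [0] * n_obs  # Q_i: scores far below the panel mean
    for j in range(n_img):
        column = [scores[i][j] for i in range(n_obs)]
        mean = sum(column) / n_obs
        m2 = sum((u - mean) ** 2 for u in column) / n_obs
        m4 = sum((u - mean) ** 4 for u in column) / n_obs
        if m2 == 0.0:
            continue  # every observer agrees on this image
        std = math.sqrt(m2 * n_obs / (n_obs - 1))
        beta2 = m4 / (m2 * m2)  # kurtosis coefficient
        # Roughly normal scores use 2 standard deviations, otherwise sqrt(20).
        k = 2.0 if 2.0 <= beta2 <= 4.0 else math.sqrt(20.0)
        for i in range(n_obs):
            if column[i] >= mean + k * std:
                above[i] += 1
            elif column[i] <= mean - k * std:
                below[i] += 1
    rejected = []
    for i in range(n_obs):
        total = above[i] + below[i]
        # Reject observers who are often extreme in both directions.
        if total and total / n_img > 0.05 and abs(above[i] - below[i]) / total < 0.3:
            rejected.append(i)
    return rejected


def dmos(scores, ref_of):
    """Compute DMOS per distorted image following Eqs. (3)-(6)."""
    distorted = sorted(ref_of)
    rescaled = []
    for row in scores:
        diff = [row[ref_of[j]] - row[j] for j in distorted]      # Eq. (3)
        mean, std = statistics.mean(diff), statistics.stdev(diff)
        # Eq. (4) followed by Eq. (5); a subject with std == 0 would need special care.
        rescaled.append([100.0 * ((d - mean) / std + 3.0) / 6.0 for d in diff])
    n_subjects = len(rescaled)
    # Eq. (6): average over the remaining subjects.
    return {j: sum(r[k] for r in rescaled) / n_subjects
            for k, j in enumerate(distorted)}
```

Note that the screening rule deliberately keeps an observer who is shifted consistently in one direction, because the Z-score normalization of Eq. (4) removes such an offset anyway.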

5.1.3. Subjective ratings analysis

Before investigating the relationship between the results of naive subjects and experts, we compare the DMOS values obtained in this experiment with the DMOS values provided in the LIVE database to verify the reliability of our experimental data. To eliminate the influence of scale, we use a five-parameter $\{\beta_1, \beta_2, \beta_3, \beta_4, \beta_5\}$ monotonic logistic function to map the DMOS values of this experiment onto LIVE's DMOS:

$$Y(x) = \beta_1 \left( 0.5 - \frac{1}{1 + e^{\beta_2 (x - \beta_3)}} \right) + \beta_4 x + \beta_5 \tag{7}$$

where $x$ is the DMOS in this experiment and $Y$ is the mapped score.
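A minimal sketch of the mapping in Eq. (7), together with the three agreement measures used in this subsection (PLCC, SROCC and RMSE), is given here in plain Python. The function names are illustrative, and the estimation of the five β parameters (typically done by nonlinear least squares) is not shown; the parameters are assumed to be given.

```python
import math


def logistic5(x, b1, b2, b3, b4, b5):
    """Five-parameter logistic mapping of Eq. (7)."""
    # math.exp can overflow for very large b2 * (x - b3); fine for a sketch.
    return b1 * (0.5 - 1.0 / (1.0 + math.exp(b2 * (x - b3)))) + b4 * x + b5


def plcc(a, b):
    """Pearson linear correlation coefficient of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            result[order[k]] = average
        pos = end + 1
    return result


def srocc(a, b):
    """Spearman rank-order correlation: the PLCC of the rank vectors."""
    return plcc(ranks(a), ranks(b))


def rmse(a, b):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))
```

In a typical use, the scores of one set are first passed through `logistic5` with fitted parameters, and PLCC and RMSE are then computed between the mapped scores and the other set. SROCC depends only on ranks, so a monotonic mapping does not change it.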

Subsequently, the Pearson Linear Correlation Coefficient (PLCC), the Spearman Rank-Order Correlation Coefficient (SROCC) and the Root Mean Squared Error (RMSE) are applied to measure the prediction accuracy, the prediction monotonicity and the error of the fitting process, respectively. A good correlation yields high values of SROCC and PLCC (at most 1) and low values of RMSE (which is non-negative). In addition, we use three statistical methods, namely the T-test, the F-test and the coefficient of determination (R²), to analyze the experimental data [50]. The F-test and T-test are used to assess whether two sets of scores differ significantly. At the 95% confidence level (the parameter α in the F-test and T-test equals 0.05), two sets of scores have no significant difference when the P-values of the F-test and T-test are larger than 0.05. Moreover, an R² value close to 1 indicates a good linear correlation between two sets of scores.

The scatter plot of the DMOS values in the LIVE database versus the DMOS values of the inexperienced subjects in our experiment is shown in Fig. 5, where the red curve is fitted with the above-mentioned logistic function and the 95% confidence intervals are represented by the blue curves. The statistical results on the correlation of these two sets of DMOS values are shown in the second row of Table 1 (LIVE vs. Naive Subjects). The PLCC and SROCC are around 0.95, which shows good correlation. The P-values of the F-test and T-test are both above 0.05, indicating that there are no significant differences. Also, the R² of these two datasets is 0.9097, which reveals a good linear correlation. These statistical results demonstrate that the data in our experiment are reliable.

Table 1
Relationships among the DMOS values in the LIVE database and the DMOS of the sets of naive subjects and experts in our general image experiment. The α in the F-test and T-test is 0.05. The data of the F-test and T-test are P-values.

| Comparison | PLCC | SROCC | RMSE | F-test | T-test | R² |
|---|---|---|---|---|---|---|
| DMOS (LIVE) vs. (Naive Subjects) | 0.9557 | 0.9462 | 4.6304 | 0.0785 | 0.2138 | 0.9097 |
| DMOS (LIVE) vs. (Experts) | 0.9510 | 0.9405 | 4.8637 | 0.4202 | 0.2356 | 0.8993 |
| DMOS (Naive Subjects) vs. (Experts) | 0.9608 | 0.9501 | 3.8724 | 0.1123 | 0.5651 | 0.9164 |

After verifying the reliability of the experimental data, we analyse the relationship between the DMOS scored by the experts and by the inexperienced observers. The above-mentioned statistical approaches are again employed in the comparison of “Experts” and “Naive Subjects”. We analyse not only the relationship between the results of experts and inexperienced subjects in our experiment, but also the correspondence between the data provided by “Experts” and LIVE. Fig. 6 illustrates the scatter plots of these two comparisons with the nonlinear fitting curves and the 95% confidence intervals. The last two rows of Table 1 summarize the statistical results of the comparisons. From Table 1, it is observed that the preferences of “Experts” correlate closely with those of “Naive Subjects” and LIVE. All of the PLCC and SROCC values are around 0.95. Among them, the PLCC between “Experts” and “Naive Subjects” is even higher than 0.96. Also, the P-values of the F-test and T-test are all above 0.05, and the R² values remain at approximately 0.9.

Fig. 6. Scatter plots of the DMOS in our experiment. (a) The set of LIVE versus the set of experts in our general experiment. (b) The set of naive subjects versus the set of experts in our general experiment. The red line is the curve fitted with the five-parameter monotonic logistic function. The blue dashed curves are the 95% confidence intervals. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Consequently, these three sets of DMOS, namely the DMOS of LIVE, the DMOS of “Naive Subjects” and the DMOS of “Experts”, are highly correlated with each other. The results demonstrate that it is feasible to recruit several experts in place of a large number of inexperienced observers for evaluating the quality of general natural images.

5.2. Experiment on professional images

With the development of digital image processing, many special images also need to be assessed. Unlike general distorted images, these images are more specialized and are rarely seen in daily life. The terahertz (THz) image is one of these professional images, and it has wide application prospects in chemistry, biology, physics and medicine [51]. Although THz imaging is an effective analysis tool in security inspection [52] and biological diagnosis [53], present THz images generally have low quality due to equipment interference and environmental factors [54]. Thus, quality assessment of THz images is meaningful. In fact, we have carried out some preliminary studies and established a dedicated database of THz images (THSID) [39]. Based on these professional images, we conduct an experiment to study the relationship between “Experts” and the group of “Naive Subjects”.

Fig. 7. Sample THz images with different overall quality and different illegal substances. (a)–(c) Three THz sample images with overall quality ranging from good to bad. In the training stage, these three images are assigned scores of 5, 3 and 1, respectively, for reference. (d) Volunteer carrying a hammer at the left abdomen. (e) Subject carrying a phone at the right chest. (Red circles do not appear in the test images.)

5.2.1. Subjective test

(1) Image sources: 181 THz security images in the THSID database [39] serve as the testing images for the professional image experiment. A total of 176 images contain illegal substances hidden at various locations on the human body, and the volunteers in the remaining images do not carry any goods. Five sample images are illustrated in Fig. 7. The first three images show THz images with overall quality ranging from good to bad. It is easy to see that the noise in Fig. 7(a) is much lower than the noise in Fig. 7(c). In the training stage, these three images are assigned scores of 5, 3 and 1, respectively, for reference. In addition, the visibility of the substances in THz images differs. Some substances are easy to recognize, as in Fig. 7(d), while others, as shown in Fig. 7(e), are hard to distinguish from the human body.

(2) Experimental environment and configurations: The experimental configurations are basically the same as in the experiment on general images. For THz images, we use a five-grade scale to obtain the scores.

(3) Experimental process: 17 inexperienced subjects and 3 professional security inspectors are invited to take part in the subjective experiments. The security inspectors have long-term experience in inspecting prohibited goods at specific places, such as airports and subway stations. The process of the experiment is similar to that of the general image test. However, we gather two different scores for each THz image: one is the score of the overall quality of the image, and the other is the score of the local prohibited goods. The marking standard for overall quality primarily complies with the suggestions of ITU-R Rec. BT.500-13 [19]. Given the particular noise of THz security images, we make minor adjustments to the overall quality criterion and define a specific standard for the local quality of illegal substances. The subjective evaluation criteria for the overall quality and for the local quality of illegal substances are given in Table 2.

Table 2
The subjective evaluation criteria employing a five-grade scale for the overall quality and the local quality of illegal substances for THz security images.

| Score | Quality | Standards: overall quality | Standards: sharpness of prohibited goods |
|---|---|---|---|
| 5 | Excellent | Acceptable | Distinct |
| 4 | Good | Unacceptable, but not annoying | Fairly clear, but cognizable |
| 3 | Fair | Slightly annoying | Perceptible, but do not know what it is |
| 2 | Poor | Annoying | Feel something, but uncertainty |
| 1 | Bad | Quite annoying | Imperceptible |

5.2.2. Data processing

After the experiments, we first apply the subject rejection method of Section 5.1.2 to remove abnormal raw scores. One of the seventeen inexperienced subjects is regarded as an outlier. Then, the MOSs of the overall quality of the THz security images are calculated from the raw scores of the remaining naive subjects and of the experts, respectively:

$$\mathrm{MOS}_j = \frac{1}{N_i} \sum_{i=1}^{N_i} r_{ij} \tag{8}$$

where $N_i$ is the number of subjects remaining after the subject rejection and $r_{ij}$ denotes the raw score of the $j$th image assigned by subject $i$.

Fig. 8. Scatter plot of the MOS of the set of naive subjects versus the experts in our THz image experiment. The red line is the curve fitted with the five-parameter monotonic logistic function. The blue dashed curves are the 95% confidence intervals. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Table 3
The statistical results of the correlation between “Naive Subjects” and “Experts” on perceiving the overall quality of THz images. (The α in ANOVA is 0.05.)

| PLCC | SROCC | RMSE | R² | ANOVA F | ANOVA F crit | ANOVA P-value |
|---|---|---|---|---|---|---|
| 0.9092 | 0.9026 | 0.2241 | 0.8172 | 0.6017 | 3.8674 | 0.4384 |

On the other hand, we calculate the accuracy rate of each subject in distinguishing illegal goods. Four categories define the judgements of the observers: (1) there is an illegal substance, and the subject finds it; (2) there is an illegal substance, but the subject does not find it; (3) there is no illegal substance, but the subject believes that one is present; (4) there is no illegal substance, and the subject thinks so. Two binary variables $P_{ij}$ and $Q_j$ represent the judgement of the $i$th subject on the $j$th image and the existence of an illegal good in the $j$th image, respectively:

$$P_{ij} = \begin{cases} 0 & \text{if } \omega_{ij} = 1 \\ 1 & \text{if } \omega_{ij} = 2, 3, 4, 5 \end{cases}, \qquad Q_j = \begin{cases} 0 & \text{if no illegal good in the } j\text{th image} \\ 1 & \text{otherwise} \end{cases} \tag{9}$$

where $\omega_{ij}$ denotes the score of the local illegal substances given by the $i$th observer to image $j$. The result of each judgement can be expressed by

$$T_{ij} = P_{ij} \odot Q_j \tag{10}$$

where $\odot$ is the XNOR operation. The accuracy rate $A_i$ of subject $i$ in distinguishing illegal substances is computed by

$$A_i = \frac{\sum_{j=1}^{N_j} T_{ij}}{N_j} \tag{11}$$

where $N_j$ represents the number of THz images.
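The two quantities of this subsection, the MOS of Eq. (8) and the accuracy rate of Eqs. (9)–(11), can be sketched as follows. The function names and the list-based data layout are assumptions made for illustration.

```python
def mos(scores):
    """Eq. (8): mean opinion score per image; scores[i][j] is subject i's score."""
    n_subjects = len(scores)
    n_images = len(scores[0])
    return [sum(row[j] for row in scores) / n_subjects for j in range(n_images)]


def detection_accuracy(local_scores, has_item):
    """Eqs. (9)-(11): accuracy rate of one subject.

    local_scores[j] is the subject's five-grade score (1-5) for the local
    quality of the prohibited goods in image j; has_item[j] is 1 if image j
    really contains an illegal good and 0 otherwise.
    """
    correct = 0
    for omega, q in zip(local_scores, has_item):
        p = 0 if omega == 1 else 1   # Eq. (9): a score of 1 means "imperceptible"
        t = 1 if p == q else 0       # Eq. (10): XNOR of judgement and ground truth
        correct += t
    return correct / len(local_scores)   # Eq. (11)
```

For example, a subject who scores four images as 1, 5, 3 and 1 when only the second and fourth contain an item is correct on the first two images and wrong on the last two, which gives an accuracy of 0.5.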

5.2.3. Subjective ratings analysis

To obtain the correlation between the preferences of inexperienced subjects and experts in rating the overall quality of THz security images, we apply five statistical measures: PLCC, SROCC, RMSE, ANOVA and the coefficient of determination (R²). ANOVA is a common statistical approach for testing whether two sets of data differ significantly. When the F value in ANOVA is smaller than the critical F value (3.8674 at the 95% confidence level) and the P-value is larger than 0.05, the two sets of data have no statistically significant difference. Fig. 8 illustrates the scatter plot of the set of naive subjects versus the experts in the THz image experiment with the nonlinear fitting curve and the 95% confidence intervals. The statistical results are listed in Table 3. From Fig. 8 and Table 3, it is clear that the scores given by experts and inexperienced subjects are highly correlated, with PLCC and SROCC values higher than 0.9. Also, the F value (0.6017) is lower than F crit (3.8674) and the P-value (0.4384) is larger than 0.05, which means that the MOS values of experts and naive subjects have no significant difference. In addition, the R² (0.8172) reveals a favorable linear relationship.
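The F statistic behind the ANOVA comparison can be sketched in plain Python as below. This is an illustrative one-way ANOVA under the usual textbook definition, not necessarily the exact routine used for Table 3. The P-value is omitted because it requires the F distribution, which the standard library does not provide; the critical value 3.8674 is taken from the text.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the samples around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


def no_significant_difference(f_value, f_critical=3.8674):
    """Decision rule from the text: F below the critical value means no significant difference."""
    return f_value < f_critical
```

With two groups, as in the comparison of the MOS values of naive subjects and experts, the test is equivalent to a two-sample t-test with pooled variance, and F equals the square of the t statistic.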

On the other hand, we calculate the accuracy of judging the existence of illegal substances. The accuracy rate of the expert group is 75.14%, while that of the naive subject group is 60.19%. Therefore, in terms of recognizing professional objects, there exists a large gap between the experts and the inexperienced subjects. This gap is hard to bridge with short-term training. In fact, for both machine recognition and human perception, the sharpness of the illegal substances is an important part of the integrated quality of THz security images. Based on our experiments, the scores assigned by experts will be more accurate and reliable. Consequently, it is appropriate to recruit a small group of experts to evaluate the quality of some professional images. The experimental results on the THz image dataset validate the theory proposed in Sections 2, 3 and 4.

6. Conclusion

In this paper, using three examples, we point out the problems of the common subjective IQA methods, namely MOS and PC. The problem of the MOS method is that the majority overrides the minority, while in the PC method the comparison of two images is influenced by irrelevant images. Also, based on AIT, we prove that for any ranking or scoring method that satisfies the conditions of unanimity and IIA, there must be a “pivotal subject”. Thus, we suggest that the ground truth of testing images can be collected from a small group of subjects containing only several experts, which is time-saving and economical as well as reliable. Two experiments, conducted on general distorted images and on professional images (THz images), are performed to verify the reliability of our proposal. Based on the results of rigorous statistical approaches, the subjective ratings of a small group of experts are shown to be highly correlated with the preferences of inexperienced subjects. Furthermore, the evaluation results of experts may be more accurate and credible than those of naive viewers in some professional image quality assessment tasks.

Acknowledgment

This work was supported in part by the National Natural Science Foundation of China under Grant nos. 61422112, 61371146, 61521062, and 61527804.

References

[1] W. Lin, C.-C.J. Kuo, Perceptual visual quality metrics: a survey, J. Vis. Commun. Image Represent. 22 (4) (2011) 297–312.

[2] Y. Fang, J. Yan, J. Liu, S. Wang, Q. Li, Z. Guo, Objective quality assessment of screen content images by uncertainty weighting, IEEE Trans. Image Process. Publ. IEEE Signal Process. Soc. 26 (4) (2017) 2016–2027.

[3] S. Wang, K. Ma, H. Yeganeh, Z. Wang, W. Lin, A patch-structure representation method for quality assessment of contrast changed images, IEEE Signal Process. Lett. 22 (12) (2015) 2387–2390.

[4] K. Ma, T. Zhao, K. Zeng, Z. Wang, Objective quality assessment for color-to-gray image conversion, IEEE Trans. Image Process. Publ. IEEE Signal Process. Soc. 24 (12) (2015) 4673–4685.

[5] Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process. 13 (4) (2004) 600–612.

[6] K. Gu, L. Li, H. Lu, X. Min, W. Lin, A fast reliable image quality predictor by fusing micro- and macro-structures, IEEE Trans. Ind. Electron. 64 (5) (2017) 3903–3912.

[7] R. Soundararajan, A.C. Bovik, Rred indices: reduced reference entropic differencing for image quality assessment, IEEE Trans. Image Process. Publ. IEEE Signal Process. Soc. 21 (2) (2012) 517–526.

[8] G. Zhai, X. Wu, X. Yang, W. Lin, W. Zhang, A psychovisual quality metric in free-energy principle, IEEE Trans. Image Process. Publ. IEEE Signal Process. Soc. 21 (1) (2011) 41–52.

[9] X. Min, K. Gu, G. Zhai, M. Hu, X. Yang, Saliency-induced reduced-reference quality index for natural scene and screen content images, Signal Process. (2017).

[10] Y. Fang, K. Ma, Z. Wang, W. Lin, Z. Fang, G. Zhai, No-reference quality assessment of contrast-distorted images based on natural scene statistics, IEEE Signal Process. Lett. 22 (7) (2014) 838–842.

[11] Q. Wu, H. Li, F. Meng, K.N. Ngan, B. Luo, C. Huang, B. Zeng, Blind image quality assessment based on multichannel feature fusion and label transfer, IEEE Trans. Circuits Syst. Video Technol. 26 (3) (2016) 425–440.

[12] Q. Wu, Z. Wang, H. Li, A highly efficient method for blind image quality assessment, in: IEEE International Conference on Image Processing, 2015, pp. 339–343.

[13] Q. Wu, H. Li, F. Meng, K.N. Ngan, S. Zhu, No reference image quality assessment metric via multi-domain structural information and piecewise regression, J. Vis. Commun. Image Represent. 32 (C) (2015) 205–216.

[14] X. Min, G. Zhai, K. Gu, Y. Fang, X. Yang, X. Wu, J. Zhou, X. Liu, Blind quality assessment of compressed images via pseudo structural similarity, in: IEEE International Conference on Multimedia and Expo, 2016, pp. 1–6.

[15] Y. Fang, Z. Chen, W. Lin, C.W. Lin, Saliency detection in the compressed domain for adaptive image retargeting, IEEE Trans. Image Process. Publ. IEEE Signal Process. Soc. 21 (9) (2012) 3888.

[16] Y. Fang, W. Lin, Z. Chen, C.M. Tsai, C.W. Lin, A video saliency detection model in compressed domain, IEEE Trans. Circuits Syst. Video Technol. 24 (1) (2014) 27–38.

[17] Y. Fang, C. Zhang, J. Li, J. Lei, S.M. Da, C.P. Le, Visual attention modeling for stereoscopic video: a benchmark and computational model, IEEE Trans. Image Process. PP (99) (2017) 1–1.

[18] X. Min, G. Zhai, Z. Gao, K. Gu, Visual attention data for image quality assessment databases, in: International Symposium on Circuits and Systems, 2014, pp. 894–897.

[19] Assembly, ITU Radiocommunication, Methodology for the subjective assessment of the quality of television pictures, Int. Telecommun. Union, 2012.

[20] N. Ponomarenko, V. Lukin, A. Zelensky, K. Egiazarian, M. Carli, F. Battisti, TID2008–A database for evaluation of full-reference visual quality assessment metrics, Adv. Modern Radioelectron. 10 (4) (2009) 30–45.

[21] N. Ponomarenko, L. Jin, O. Ieremeiev, V. Lukin, K. Egiazarian, J. Astola, B. Vozel, K. Chehdi, M. Carli, F. Battisti, C.-C.J. Kuo, Image database TID2013: peculiarities, results and perspectives, Signal Process. 30 (2015) 57–77.

[22] K. Gu, G. Zhai, W. Lin, M. Liu, The analysis of image contrast: from quality assessment to automatic enhancement, IEEE Trans. Cybernetics 46 (1) (2016) 284–297.

[23] H.R. Sheikh, Z. Wang, L. Cormack, A.C. Bovik, Live image quality assessment database release 2, 2005.

[24] E.C. Larson, D.M. Chandler, Most apparent distortion: full-reference image quality assessment and the role of strategy, J. Electron. Imag. 19 (1) (2010) 011006–011006.

[25] K. Gu, G. Zhai, X. Yang, W. Zhang, Hybrid no-reference quality metric for singly and multiply distorted images, IEEE Trans. Broadcast. 60 (3) (2014) 555–567.

[26] J. Wang, A. Rehman, K. Zeng, S. Wang, Z. Wang, Quality prediction of asymmetrically distorted stereoscopic 3D images, IEEE Trans. Image Process. Publ. IEEE Signal Process. Soc. 24 (11) (2015) 3400–14.

[27] J. Wang, S. Wang, Z. Wang, Asymmetrically compressed stereoscopic 3D videos: quality assessment and rate-distortion performance evaluation, IEEE Trans. Image Process. 26 (3) (2017) 1330–1343.

[28] R. Song, H. Ko, C.C.J. Kuo, Mcl-3D: a database for stereoscopic image quality assessment using 2d-image-plus-depth source, J. Inf. Sci. Eng. 31 (5) (2015).

[29] J. Wang, S. Wang, K. Ma, Z. Wang, Perceptual depth quality in distorted stereoscopic images, IEEE Trans. Image Process. Publ. IEEE Signal Process. Soc. 26 (3) (2017) 1202.

[30] H. Yang, Y. Fang, W. Lin, Perceptual quality assessment of screen content images, IEEE Trans. Image Process. 24 (11) (2015) 4408–4421.

[31] X. Min, K. Ma, K. Gu, G. Zhai, Z. Wang, W. Lin, Unified blind quality assessment of compressed natural, graphic, and screen content images, IEEE Trans. Image Process. 26 (11) (2017) 5462–5474.

[32] K. Gu, S. Wang, H. Yang, W. Lin, G. Zhai, X. Yang, W. Zhang, Saliency-guided quality assessment of screen content images, IEEE Trans. Multimed. 18 (6) (2016) 1098–1110.

[33] K. Gu, J. Zhou, J. Qiao, G. Zhai, W. Lin, A. Bovik, No-reference quality assessment of screen content pictures, IEEE Trans. Image Process. (2017).

[34] C.C. Wu, K.T. Chen, Y.C. Chang, C.L. Lei, Crowdsourcing multimedia qoe evaluation: a trusted framework, IEEE Trans. Multimed. 15 (5) (2013) 1121–1137.

[35] Q. Xu, J. Xiong, Q. Huang, Y. Yao, Online hodgerank on random graphs for crowdsourceable qoe evaluation, IEEE Trans. Multimed. 16 (2) (2014) 373–386.

[36] H. Yang, W. Lin, C. Deng, L. Xu, Study on subjective quality assessment of digital compound images, in: IEEE International Symposium on Circuits and Systems (ISCAS), 2014, pp. 2149–2152.

[37] K.J. Arrow, A difficulty in the concept of social welfare, J. Political Econ. (1950) 328–346.

[38] K.J. Arrow, Social Choice and Individual Values, 12, Yale University Press, 2012.

[39] M. Hu, X. Min, G. Zhai, W. Zhu, Z. Wang, X. Yang, G. Tian, Terahertz security image quality assessment by no-reference model observers, arXiv:1707.03574 (2017).

[40] A.D. Taylor, Social Choice and the Mathematics of Manipulation, Cambridge University Press, 2005.

[41] N.N. Yu, A one-shot proof of Arrow's impossibility theorem, Economic Theory 50 (2) (2012) 523–525.

[42] N.A. Barr, The Economics of the Welfare State, Stanford University Press, 1998.

[43] A. Sen, Markets and freedoms: achievements and limitations of the market mechanism in promoting individual freedoms, Oxford Econ. Papers (1993) 519–541.

[44] D. Saari, Decisions and Elections: Explaining the Unexpected, Cambridge University Press, 2001.

[45] A.P. Kirman, D. Sondermann, Arrow's theorem, many agents, and invisible dictators, J. Econ. Theory 5 (2) (1972) 267–277.

[46] VQEG, Final report from the VQEG on the validation of objective models of video quality assessment, Phase II, 2003.

[47] H.R. Sheikh, M.F. Sabir, A.C. Bovik, A statistical evaluation of recent full reference image quality assessment algorithms, IEEE Trans. Image Process. 15 (11) (2006) 3440–3451.

[48] L. Ma, W. Lin, C. Deng, K.N. Ngan, Image retargeting quality assessment: a study of subjective scores and objective metrics, IEEE J. Selected Topics Signal Process. 6 (6) (2012) 626–639.

[49] K. Seshadrinathan, R. Soundararajan, A.C. Bovik, L.K. Cormack, Study of subjective and objective quality assessment of video, IEEE Trans. Image Process. 19 (6) (2010) 1427–1441.

[50] D.J. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures, CRC Press, 2003.

[51] B.M. Fischer, H. Helm, P.U. Jepsen, Chemical recognition with broadband THz spectroscopy, Proc. IEEE 95 (8) (2007) 1592–1604.

[52] D. Suzuki, S. Oda, Y. Kawano, A flexible and wearable terahertz scanner, Nat. Photonics 10 (12) (2016) 809–813.

[53] K. Moldosanov, A. Postnikov, V. Lelevkin, N. Kairyev, Terahertz imaging technique for cancer diagnostics using frequency conversion by gold nano-objects, Ferroelectrics 509 (1) (2017) 158–166.

[54] L. Hou, X. Lou, Z. Yan, H. Liu, W. Shi, Enhancing terahertz image quality by finite impulse response digital filter, in: Infrared, Millimeter, and Terahertz waves (IRMMW-THz), 2014, pp. 1–2.
