a Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, China b School of Electronic Information Engineering, Tianjin University, China
Article info
Article history:
Received 31 July 2017
Revised 13 November 2017
Accepted 1 December 2017
Available online 8 December 2017
Keywords:
Subjective image quality assessment
Arrow’s Impossibility Theorem
Pivotal subject
Mean opinion score
Paired comparison
Terahertz security image
Abstract
A large number of subjective image quality assessment databases have been constructed in the last
decade. The Mean Opinion Score (MOS), with single or double stimulus, and Paired Comparison (PC)
are the two dominant approaches for collecting ground-truth quality ratings, and usually 15 or
more subjects are needed for each image. In this paper, we show that such
averaging-over-multiple-ratings methods carry a potential “dictatorship” risk. Using Arrow’s
Impossibility Theorem (AIT), we prove that satisfying unanimity and independence of irrelevant
alternatives (IIA) will generate a “pivotal subject”, who in fact determines the final rank of
image quality. We also prove that there is no ideal democratic approach for synthesizing the
opinions of all subjects. Therefore, we advocate recruiting a small number of experts (a.k.a. the
“golden eyes”) for subjective viewing tests. To verify the reliability of our proposal,
experiments are performed on two different databases, one of general distorted images and one of
professional images (here, Terahertz security images). In each experiment, the raw scores of
images are subjectively assigned by at least 15 inexperienced viewers and 3 experts, and the MOS
or difference mean opinion score (DMOS) values are obtained. Afterwards, the correlation between
the scores rated by naive subjects and by experts is analyzed. For the general image experiment,
the DMOS of inexperienced viewers is shown to be highly correlated with the DMOS of experts based
on six evaluation metrics. In the professional image experiment, the preferences of experts also
correlate well with the opinions of inexperienced viewers on the overall quality of THz images.
Moreover, when assessing the quality of illegal-substance regions in THz images, the experts
achieve higher accuracy than the inexperienced observers. In conclusion, the results of the two
validation experiments indicate that a small number of experts is more suitable for assessing the
perceptual quality of images, which can reduce the cost and simplify the procedure of creating
databases.
194 W. Zhu et al. / Signal Processing 145 (2018) 193–201
Fig. 1. Illustrations of the existence of “dictator” in MOS method. (a) Example for
describing one “dictator” in the list of subjective score; and (b) Example for describ-
ing a set of “dictators” in subjective score. X and Y indicate distorted images to be
scored. Amy, Bob, Carol, David, and Ella are subjects.
et al. [34] established a crowdsourceable framework to quantify
the Quality of Experience (QoE) of multimedia content. Xu et al.
[35] exploited the randomized PC method to build an online rank-
ing scheme and gained quality of testing images from the Internet.
Yang et al. [36] employed PC method to establish a new compound
image database.
The number and type of observers are also significant factors in subjective IQA. A common
assumption for subjective IQA tests is that more observers yield more accurate and more
democratic results. According to the recommendation of the International Telecommunication Union
(ITU), at least 15 viewers with no expertise in image processing should be employed to guarantee
statistical consistency [19].
However, this recommendation may be inconsistent with a classical result in social choice theory
called Arrow’s Impossibility Theorem (AIT) [37,38]. AIT was proposed by Kenneth Arrow, who won
the Nobel Prize in Economics in 1972. AIT reveals an interesting and counterintuitive conclusion:
when three or more candidates exist, no ranked-order voting system can simultaneously satisfy
four democratic principles, namely unrestricted domain, unanimity, independence of irrelevant
alternatives (IIA), and non-dictatorship (see Section 3 for more details). In any democratic
voting system, the preference of each subject is equally important, so the first three
principles can be satisfied. By AIT, the fourth must then fail, which leads to a dictatorial
voting system. In addition, as the numbers of voters and candidates increase, the “dictator”
becomes more powerful.
In this paper, inspired by AIT, we point out overlooked but important issues in the MOS and PC
methods. The problem with the MOS method is that the majority can be subordinated to a minority,
while in the PC method the comparison of two images is influenced by irrelevant images. First, we
use three cases to introduce these issues. Subsequently, we prove that there will be one “pivotal
subject” among the viewers if the grading or ranking system satisfies the principles of the
minority being subordinate to the majority and IIA. Based on these two facts, we advocate
reducing the number of observers and recruiting a small group of experts with substantial
expertise in image processing (a.k.a. the “golden eyes”) to assess the quality of testing images.
To verify the reliability of our proposal, two experiments are performed, one on general
distorted images and one on professional images. In the general distorted image experiment, we
select the subset of Gaussian blur images in LIVE [23] and recruit 26 inexperienced subjects and
3 experts to rate them on a 100-point scale. After processing the raw subjective scores, the DMOS
values are obtained. Based on them, we analyze the reliability of the experimental data and the
correlation between the ratings of experts and inexperienced viewers. In the professional image
experiment, 181 Terahertz (THz) security images [39] are used as testing images. We invite 17
inexperienced observers and 3 professional security inspectors to subjectively assess the quality
of the THz images. The raw scores of the THz images are gathered for two aspects: the overall
quality as perceived by humans and the local sharpness of illegal substances. After processing
the raw data, the MOS values of overall quality are obtained for each image. Moreover, the
accuracy of judging the existence of illegal goods is computed for each subject. Based on the MOS
values and the accuracy rates, the relationships between naive subjects and experts are analyzed.
The remainder of this paper is organized as follows. In Section 2, the issues of MOS and PC are
discussed through three cases. Section 3 demonstrates the dictatorial property of any ideal
grading or ranking method based on AIT. A detailed analysis of our proposal is given in
Section 4. Two subjective experiments are performed and the correlations between experts and
inexperienced subjects are evaluated in Section 5. Finally, Section 6 concludes this paper.
2. Issues of MOS and PC methods
2.1. Issue of MOS
The aim of MOS is to average all graders’ opinions, just as its name implies. The definition of
MOS is:
\[ \mathrm{MOS}_j = \frac{1}{N} \sum_{i=1}^{N} s_{i,j}, \tag{1} \]
where $s_{i,j}$ represents the score of image $j$ graded by subject $i$, and $N$ is the number
of subjects.
As a fair method, MOS should reflect the preferences of the majority. This means that the
opinions of subjects are equally important. However, this assumption fails in some real
situations. A simple case is shown in Fig. 1(a). Five observers called Amy, Bob, Carol, David,
and Ella score two distorted images named X and Y in this case. All subjects score 2 points for
Y. Ella grades 8 points for X, while the other subjects score 2 points for X. According to
Eq. (1), the quality of X is better than Y (2.4 > 2). Obviously, this result only reflects the
opinion of Ella. That is to say, Ella completely dominates the result regardless of the others’
opinions, so we call Ella a “dictator”. This case is simple and infrequent in practice. In
addition, the score of Ella can be treated as an outlier by some statistical methods. For
example, LIVE gets rid of outliers by the two-standard-deviation method and TID2013 carries out
2% experiments to eliminate outliers according to ITU-R Rec. BT.500-13 [19]. Although these
methods can remove outliers, they cannot avoid situations where a few subjects represent most
subjects.
Another representative case is demonstrated in Fig. 1(b). All subjects score 3 points for Y. For
X, the first three subjects give 2 points, while the last two give 5 points. David and Ella
prefer X to Y, and the other subjects hold the opposite opinion. In this case, no subject can be
regarded as an outlier. The final result is X > Y (3.2 > 3), and thus a small group of observers
(i.e., David and Ella) represents the preference of most subjects. Therefore, David and Ella
become a set of “dictators”, which is obviously unfair.
In fact, a possible reason for this problem is that the influence of each subject on the final
score is different. The larger the discrepancy between an individual’s score and the average
score, the greater the impact that individual has on the final result. Thus, the small group of
subjects whose scores differ from those of the majority may play a dominant role.
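As a minimal sketch (not part of the original study), the Fig. 1(b) case can be reproduced in Python from the scores quoted in the text. The helper names `mos` and `count_prefer` are hypothetical and introduced only for illustration.

```python
from statistics import mean

# Scores of the Fig. 1(b) case as described in the text (subject -> score).
scores_x = {"Amy": 2, "Bob": 2, "Carol": 2, "David": 5, "Ella": 5}
scores_y = {"Amy": 3, "Bob": 3, "Carol": 3, "David": 3, "Ella": 3}


def mos(scores):
    """Eq. (1): average the scores of all subjects for one image."""
    return mean(scores.values())


def count_prefer(a, b):
    """Number of subjects who score image a strictly higher than image b."""
    return sum(1 for subject in a if a[subject] > b[subject])


mos_x = mos(scores_x)
mos_y = mos(scores_y)
prefer_x = count_prefer(scores_x, scores_y)  # subjects who prefer X
prefer_y = count_prefer(scores_y, scores_x)  # subjects who prefer Y

print(f"MOS(X) = {mos_x}, MOS(Y) = {mos_y}")
print(f"{prefer_x} subjects prefer X, {prefer_y} subjects prefer Y")
```

The averaged result ranks X above Y even though three of the five subjects prefer Y, which is the situation the text describes.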
2.2. Issue of PC
PC is also a common method used in subjective IQA. In contrast to MOS, each subject involved in
the PC approach assigns a label to
Fig. 2. Example for describing the problems in MOS and PC methods. X, Y , and Z
represent distorted images to be scored. Amy, Bob, and Ella are subjects. The middle
two columns are calculated by PC, and the right column by MOS.
the two images (0 and 1 for the worse and better image, respectively), and therefore the above
issue does not exist for PC. However, the PC method produces a new problem, which is illustrated
in Fig. 2. Three subjects, Amy, Bob and Carol, compare three distorted images X, Y, and Z. In the
first step, only the two images X and Y need to be evaluated. The three subjects provide their
preferences using the PC method. Amy and Bob consider X > Y, while Carol believes Y > X. Clearly,
the final result is X > Y. Then, a new image Z is added to the assessment list and the three
subjects give their rankings again, where X > Y > Z, Z > X > Y, and Y > Z > X are provided by
Amy, Bob, and Carol, respectively. Their original preferences between X and Y are unchanged. Amy
and Bob consider X > Y, and Y > Z is given by Amy and Carol, while Bob and Carol think Z > X.
Now, an interesting conclusion, X > Y > Z > X, is produced; this is the famous “Condorcet
Paradox”. Using “Borda Voting”, we assign 1, 2 and 3 points for the first, the second and the
third position, respectively. Interestingly, all three images obtain 6 points (X = Y = Z).
However, if we remove image Z, the score of X will be 4 points and Y has only 2 points. The final
result becomes X > Y once again. In this sense, we can conclude that the final result depends on
the number of testing images. This is also a classical paradox in mathematics, namely the “Borda
Voting Paradox”. Thus, for the PC method, the comparison of two images is affected by other,
irrelevant images. This is unreasonable.
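The two paradoxes can be checked with a short Python sketch built on the three rankings quoted above. It is an illustration only: the Borda count below uses the standard convention (m − 1 points for the top image down to 0 for the bottom), which is an assumption and differs from the point values quoted in the text, although the qualitative outcome (a three-way tie that turns into X > Y once Z is removed) is the same. The function names are hypothetical.

```python
from itertools import combinations

# Rankings quoted in the text, best image first.
rankings = {
    "Amy": ["X", "Y", "Z"],
    "Bob": ["Z", "X", "Y"],
    "Carol": ["Y", "Z", "X"],
}


def pairwise_winner(rankings, a, b):
    """Image preferred by the majority in the pair (a, b).

    Assumes an odd number of subjects, so no tie can occur.
    """
    votes_a = sum(1 for r in rankings.values() if r.index(a) < r.index(b))
    return a if votes_a > len(rankings) - votes_a else b


def borda(rankings, images):
    """Standard Borda count restricted to `images` (assumed convention)."""
    scores = {img: 0 for img in images}
    for r in rankings.values():
        kept = [img for img in r if img in images]
        for pos, img in enumerate(kept):
            scores[img] += len(kept) - 1 - pos
    return scores


# Pairwise majorities: X beats Y, Y beats Z, Z beats X (Condorcet cycle).
winners = {pair: pairwise_winner(rankings, *pair) for pair in combinations("XYZ", 2)}

borda_xyz = borda(rankings, ["X", "Y", "Z"])  # three-way tie
borda_xy = borda(rankings, ["X", "Y"])        # X ahead of Y once Z is removed

print(winners)
print(borda_xyz, borda_xy)
```

Removing Z changes the Borda relation between X and Y without any subject changing their opinion about X and Y, which is the dependence on irrelevant images discussed above.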
Later, the same group of subjects is asked to score these three images using the MOS method
according to their preferences. The results of the MOS method are shown in the right column of
Fig. 2. MOS yields a different result from PC, namely Y > Z > X (5 > 4.7 > 4). The best image in
the PC method, X, becomes the worst under the MOS method. It is also unreasonable that different
subjective IQA methods produce different final results.
Based on the analysis of the above three cases, a question arises naturally: can we find an ideal
subjective IQA method that avoids the above-mentioned issues? Unfortunately, the answer is no,
which follows from AIT. The detailed proof is given in the next section.
3. Nonexistence of an ideal method based on Arrow’s Impossibility Theorem
In this section, based on Arrow’s Impossibility Theorem [40,41], we prove that no ideal
democratic grading or ranking method can simultaneously satisfy unanimity and independence of
irrelevant alternatives (IIA) while avoiding the unfair issues of MOS and PC.
3.1. Formal statement
Let us first assume that there exists an ideal democratic method (IDM) to score or rank the
images. Its function can be represented by:
\[ \mathrm{IDM} : (R_1(S), \ldots, R_N(S)) \rightarrow Q(S), \tag{2} \]
where $S$ is a set of testing images and $N$ is the number of subjects. $R_i$ denotes the $i$th
subject’s preference, and $(R_1, \ldots, R_N)$ is the collection of all preferences (named the
preference set). $Q(S)$ expresses the final order of $S$.
As a democratic evaluation function, IDM should meet three basic conditions to avoid the problems
appearing in MOS and PC:
(1) Unanimity, or Pareto efficiency [42,43]: For two arbitrary images X and Y in the database, if
$X_i$ has a higher rank or score than $Y_i$ for every subject $i$ ($i = 1, \ldots, N$) in the
preference set $(R_1, \ldots, R_N)$, the final score or rank of X will be higher than that of Y.
(2) IIA [44]: For two preference sets $(R^{(1)}_1, \ldots, R^{(1)}_N)$ and
$(R^{(2)}_1, \ldots, R^{(2)}_N)$, if $\forall i \in \{1, \ldots, N\}$, $X_i$ and $Y_i$ have the
same relative order in $R^{(1)}_i$ and $R^{(2)}_i$ (such as $X^{(1)}_1 > Y^{(1)}_1$ in
$R^{(1)}_1$, and $X^{(2)}_1 > Y^{(2)}_1$ in $R^{(2)}_1$), then X and Y will have the same
relative order in $\mathrm{IDM}(R^{(1)}_1, R^{(1)}_2, \ldots, R^{(1)}_N)$ as in
$\mathrm{IDM}(R^{(2)}_1, R^{(2)}_2, \ldots, R^{(2)}_N)$. In other words, the influence of the
other irrelevant images on the order of these two images in the final result is eliminated. This
avoids the problem of PC.
(3) Non-dictatorship: There is no $d \in \{1, \ldots, N\}$ such that
$\forall (R_1, \ldots, R_N)$, if the score or rank of $X_d$ is higher than $Y_d$ in $R_d$, the
score or rank of X will be higher than Y in the final result for arbitrary X and Y. This
condition avoids the issue of MOS.
However, according to AIT, when three or more images are to be ranked, no algorithm
simultaneously satisfies these three conditions. Equivalently, if IDM meets conditions (1) and
(2), there must exist a “dictator”. This is hard to grasp intuitively, so we prove it below.
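The three conditions can be tested by brute force on a small instance. The sketch below is an illustration under stated assumptions, not the paper’s proof: it fixes three subjects and three images, restricts preferences to strict rankings, and compares two hypothetical rules, a Borda count (2, 1, 0 points) and a rule that copies the first subject’s ranking. All function names are introduced here for illustration.

```python
from itertools import permutations, product

IMAGES = ("X", "Y", "Z")
N = 3                                        # subjects; small enough to enumerate
ORDERS = list(permutations(IMAGES))          # 6 strict rankings, best first
PROFILES = list(product(ORDERS, repeat=N))   # 6**3 = 216 preference sets
PAIRS = [(a, b) for a in IMAGES for b in IMAGES if a != b]


def borda_rule(profile):
    """Final scores by Borda count: 2 points for top, 1 for middle, 0 for bottom."""
    return {img: sum(len(r) - 1 - r.index(img) for r in profile) for img in IMAGES}


def dictator_rule(profile, d=0):
    """Final scores copied from subject d's ranking."""
    r = profile[d]
    return {img: len(r) - 1 - r.index(img) for img in IMAGES}


def prefers(ranking, a, b):
    return ranking.index(a) < ranking.index(b)


def social(rule, profile, a, b):
    """+1 if a is above b in the final result, -1 if below, 0 if tied."""
    s = rule(profile)
    return (s[a] > s[b]) - (s[a] < s[b])


def satisfies_unanimity(rule):
    return all(social(rule, p, a, b) == 1
               for p in PROFILES for a, b in PAIRS
               if all(prefers(r, a, b) for r in p))


def satisfies_iia(rule):
    for a, b in PAIRS:
        seen = {}  # individual a-vs-b pattern -> final a-vs-b relation
        for p in PROFILES:
            key = tuple(prefers(r, a, b) for r in p)
            out = social(rule, p, a, b)
            if seen.setdefault(key, out) != out:
                return False
    return True


def find_dictator(rule):
    """Index of a subject whose strict preferences always prevail, else None."""
    for d in range(N):
        if all(social(rule, p, a, b) == 1
               for p in PROFILES for a, b in PAIRS if prefers(p[d], a, b)):
            return d
    return None


for name, rule in (("Borda", borda_rule), ("dictator", dictator_rule)):
    print(name, satisfies_unanimity(rule), satisfies_iia(rule), find_dictator(rule))
```

On this instance the Borda rule satisfies unanimity and has no dictator but violates IIA, while the copying rule satisfies unanimity and IIA at the price of a dictator, which is consistent with the statement above.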
3.2. Proof of nonexistence of IDM
Note that the proof can be extended to more images. For simplicity, we assume that there are
three images X, Y and Z to be evaluated. In this section, we use $X_i > Y_i$ to denote that the
$i$th subject assigns a higher score or rank to image X than to image Y.
Assume that all subjects consider image Z to have the lowest quality in $R_i$; the final result
will then be X > Y > Z or Y > X > Z according to unanimity (condition (1)). We call this
situation state 0 (the original state), which is also shown in Fig. 3. Subsequently, as shown in
Fig. 3, we set up a rule to alter the original state: keeping the rest of the original state
fixed, the $i$th subject (randomly selected each time) moves $Z_i$ from the bottom of $R_i$ to
the top to form a new state, and this procedure is repeated $N$ times. $C$ is the order in which
the subjects are randomly selected over the $N$ iterations. For simplicity, $j$ denotes the
$j$th subject in $C$. When subject $j$ changes $R_j$, the current state becomes state $j$. At
the end of the loop (state $N$), Z obviously has the highest rank in the final result according
to unanimity. Thus, during the loop, there must be a state $k$ in which subject $k$ moves $Z_k$
to the top of $R_k$, making Z higher than X or Y or both of them in the final result. We define
subject $k$ as the “pivotal subject”. In fact, once the rank of $Z_k$ changes, only one situation
is possible: Z moves directly to the top position in the final result, which will be proved in
the next paragraph (step one). Also, once $C$ is fixed, the “pivotal subject” is determined. The
following four steps are used to prove that the “pivotal subject” is the only person who
dominates the final result.
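The state sequence can be walked through in a few lines of Python. This is only a sketch: the aggregation rule used here is a hypothetical one that copies the ranking of subject 3 (such a rule satisfies unanimity and IIA), and the selection order `order_c` and the initial rankings are arbitrary assumptions.

```python
def dictator_rule(profile, d):
    """Hypothetical rule satisfying unanimity and IIA: copy subject d's ranking."""
    return list(profile[d])


def move_to_top(ranking, image):
    return [image] + [img for img in ranking if img != image]


def find_pivotal(rule, initial_profile, order_c, image="Z"):
    """Walk through states 0..N and return the pivotal subject and all final rankings."""
    profile = [list(r) for r in initial_profile]
    history = [rule(profile)]              # final ranking in state 0
    pivotal = None
    for j in order_c:                      # subject j moves Z from bottom to top
        profile[j] = move_to_top(profile[j], image)
        result = rule(profile)
        history.append(result)
        # The pivotal subject is the first one whose change lifts Z off the bottom.
        if pivotal is None and result.index(image) < len(result) - 1:
            pivotal = j
    return pivotal, history


# State 0: every subject ranks Z last (best image first in each list).
state0 = [["X", "Y", "Z"], ["Y", "X", "Z"], ["X", "Y", "Z"], ["Y", "X", "Z"]]
order_c = [2, 0, 3, 1]                     # assumed random selection order C

pivotal, history = find_pivotal(lambda p: dictator_rule(p, d=3), state0, order_c)
print(pivotal)
for state, ranking in enumerate(history):
    print(state, ranking)
```

With this rule, Z stays at the bottom of the final result until subject 3 moves it, and then jumps directly to the top, so the pivotal subject coincides with the subject whose ranking the rule copies.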
Step one: In this step, we prove that Z moves directly to the top position in the final result.
Assume that Z moves to the middle of the final ranking after the “pivotal subject” alters the
position of $Z_k$ in $R_k$; the final result can then
Fig. 3. The definition of the “pivotal subject”. State $k$, $k \in \{0, 1, \ldots, N\}$,
represents the situation in which the first $k$ subjects in order $C$ have moved $Z_j$ to the
top of $R_j$, while the remaining subjects keep $Z_j$ at the bottom of $R_j$.
Fig. 4. Diagram of five states in step three. The first row is the initial state with
an arbitrary preference set. The second row is state (a), in which all Z j are at the
bottom of R j . The third, fourth and fifth rows are state (b), (c), (d) respectively. They
all meet the conditions that Z j are at the top of R j in group 1 and at the bottom of
R j in group 2. The preferences of “pivotal subject” k are X k > Y k > Z k , Z k > X k > Y k and
X k > Z k > Y k in these three states respectively.
be X > Z > Y or Y > Z > X. Suppose that the result is X > Z > Y. Now, we move every $Y_j$ in
front of $X_j$ ($Y_j > X_j$, $j = 1, \ldots, N$). According to unanimity (condition (1)), the
final result must have Y > X. During the changes between $X_j$ and $Y_j$, all $Z_j$
($j = 1, \ldots, N$) still stay at the top or bottom of $R_j$. In other words, these changes do
not affect the relative ranks of $X_j$ and $Z_j$ or of $Y_j$ and $Z_j$. By IIA (condition (2)),
the relative positions of X and Z and of Y and Z in the final result are unchanged, so the final
result is still X > Z > Y, which implies X > Y. This contradicts the conclusion Y > X obtained
from unanimity (condition (1)). So, the assumption fails. The case Y > Z > X is handled
symmetrically.
Step two: In this step, we show that the “pivotal subject” does not depend on the content of the
preference set $(R_1, \ldots, R_N)$ of the original state, provided that two premises hold. One
is that all $Z_j$ ($j = 1, \ldots, N$) are ranked at the bottom of $R_j$, and the other is that
$Z_j$ is moved to the top of $R_j$ one subject at a time following the sequence $C$. Because
every $Z_j$ is at the bottom of $R_j$, $X_j > Z_j$ and $Y_j > Z_j$ are fixed and independent of
the content of the preference set of the original state. For every state $j$
($j = 1, \ldots, N$), the relative ranks of $X_j$ and $Z_j$ and of $Y_j$ and $Z_j$ are the same
as in the situations described in step one. Therefore, according to IIA, the relative ranks of X
and Z and of Y and Z in the final result are the same as in step one. For this reason, the
“pivotal subject” is the same one.
Step three: In this step, we show that the “pivotal subject” alone determines the relative rank
of X and Y in the final result in every situation. Given an arbitrary preference set
$(R_1, \ldots, R_N)$ as the initial state, assume that the “pivotal subject” $k$ prefers $X_k$
to $Y_k$ in this set. We then show that the final result is X > Y. To prove this, we make a
series of modifications to the preference set, but keep the relative rank between $X_j$ and
$Y_j$ ($j = 1, \ldots, N$) fixed. The five states used in the proof are illustrated in Fig. 4.
Details are given below.
We first move all $Z_j$ to the bottom of $R_j$ ($j = 1, \ldots, N$), going from the initial
state to state (a), i.e., from the first row to the second row of Fig. 4. We then define the
subjects before the “pivotal subject” $k$ as group 1 and the subjects after $k$ as group 2 (the
second row of Fig. 4). To form state (b) from state (a), we move the $Z_j$ of group 1 to the top
of $R_j$. Based on the conclusions of steps one and two, Z is still at the bottom of the final
ranking in the current state. Afterwards, state (c) is generated by moving $Z_k$ to the top of
$R_k$, so the ranking of the “pivotal subject” becomes $Z_k > X_k > Y_k$ (the previous order is
$X_k > Y_k > Z_k$ in state (b)). Now, Z moves to the top of the final ranking, giving Z > X and
Z > Y. Then, we create state (d) by changing the preference of the “pivotal subject” to
$X_k > Z_k > Y_k$. The relative rank of $X_k$ and $Z_k$ is the same in states (b) and (d). Thus,
the final result still has X > Z (X > Z in state (b)). Similarly, the relative rank of $Y_k$ and
$Z_k$ is the same in states (c) and (d). So, Z is above Y in the final result (Z > Y in state
(c)). Finally, combining states (b), (c), and (d), the final result in state (d) is X > Z > Y.
In addition, X > Y in the final result is fixed throughout step three because, by IIA, the
relative rank between $X_j$ and $Y_j$ ($j = 1, \ldots, N$) is not altered by any operation. In
other words, the relative position of X and Y depends entirely on the preference of the “pivotal
subject” $k$, regardless of the other subjects’ opinions.
Step four: In this step, we prove that the “pivotal subject” $k$ determines the relative rank of
all testing images. That is to say, we need to show that there is only one “pivotal subject”
among all observers. By the previous three steps, there must be a “pivotal subject” $k'$
determining the relationship of X and Z in the final result. Assume that $k$ and $k'$ are not
the same subject and that $k'$ is a subject in group 1. Going back to state (b) in step three,
the final result should be Z > X because $Z_{k'} > X_{k'}$. However, we have proved in step
three that the final result is X > Z. These two results are contradictory. If the “pivotal
subject” $k'$ is in group 2, $Z_{k'}$ has the lowest rank in $R_{k'}$ in state (c) of step
three. So, the final result should be X > Z. Nevertheless, $Z_k$ is at the top of $R_k$ at this
moment, and the final result is Z > X. This is also contradictory. Therefore, we have $k = k'$.
This argument also applies to the other pairs. So, there is only one “pivotal subject”.
In fact, this proof can be extended to more images. In these situations, X and Y can be regarded
as two image vectors, viz., $\vec{X}$ and $\vec{Y}$, and the added images are placed into these
two vectors. Consequently, the above proof also works for more images.
In conclusion, an IDM that meets conditions (1) unanimity and (2) IIA is a “dictatorship”
algorithm.
4. Reflections on existing methods
As described in Section 2, both the MOS and PC methods have their own defects. MOS may generate
a dictator or a small group of dictators, while the PC method suffers from the issue that the
comparison of two images can be influenced by other, irrelevant images. However, any ideal
method satisfying unanimity and IIA will also produce a dictator, as proved in Section 3. Kirman
and Sondermann [45] also stated that, unless the number of subjects is infinite, for any
proportion of the total number of subjects, no matter how small, there is always a group forming
a smaller proportion of the total whose preferences determine the final result. However, an
infinite number of subjects is impossible in practice.
Thus, for subjective IQA research, such as natural image, stereoscopic 3D image and screen
content image studies, our proposal is to recruit a small group of experts (a.k.a. the “golden
eyes”) who have abundant expertise in image quality assessment, as a set of “dictators” to
evaluate the testing images. Also, for some professional image studies, such as medical images
and THz images [39], we consider that the experts should be subjects who have at least one year
of experience in the corresponding area. In practice, because subjective evaluation is usually
inconvenient, time-consuming and expensive, a large number of objective IQA methods have been
put forward. However, with the development of IQA, many different application scenarios and new
distortion types have emerged. Existing image databases may not meet the increasing
requirements. Researchers need to continually establish relevant databases and invite subjects
to score the distorted images, which produces a heavy workload. Our suggestion can substantially
reduce cost and save time.
In addition, our suggestion can reduce uncertainty and increase accuracy. Inexperienced
observers, including those instructed by experts, lack essential knowledge of image processing,
and their attention may not focus on the key points. Thus, the scores given by these subjects
may introduce large deviations into the final result. In some cases, subjects who grade images
randomly will become the so-called “dictators” and dominate the final result according to the
previous analysis. In contrast, experts know the purpose of the task precisely and score
accordingly. In this sense, we prefer known experts as the “dictators” to unpredictable
“dictators” coming from the group of non-expert observers during subjective IQA. Therefore, if
we invite several experts rather than a large group of non-specialists, more accurate and
reliable results can be obtained, which will benefit further academic research.
Our proposal can also be found in other professional settings. For example, in singing
competitions, the sponsors invite several specialists as judges, not a group of randomly
selected audience members. Furthermore, the Oscar Award for best director is only evaluated by a
small group of well-known directors. Consequently, technical questions need the corresponding
experts to give more accurate judgements.
5. Experiments
To verify the reliability of our suggestions, two experiments are performed, one on general
images and one on professional images.
5.1. Experiment on general images
5.1.1. Subjective test
(1) Image Sources: We select the Gaussian blur subset of the LIVE database [23] as testing
images for the general image experiment. The LIVE database is a widely recognized image
database, and Gaussian blur is a common distortion type in daily life. The testing set contains
174 images (29 reference images and their corresponding distorted images).
(2) Experimental environment and configurations: Following ITU-R Rec. BT.500-13 [19], the
single-stimulus (SS) method is used in our experiment. All images, including the reference
images, are evaluated. The viewing distance is kept within 2.5 to 3 screen heights, and the
illuminance level of the experimental environment is kept normal. We design a user interface in
MATLAB to display the images and collect scores. The observers report their estimated quality by
moving a slider in the interface. The quality scale is divided into five equal partitions,
labeled “Bad”, “Poor”, “Fair”, “Good”, and “Excellent”. The slider position is linearly mapped
to the interval 1–100.
(3) Experimental process: A total of 26 inexperienced subjects with no related expertise are
recruited from the undergraduate students at Shanghai Jiao Tong University. In addition, 3
experts who have abundant experience in image quality assessment and saliency detection are
invited. Before testing, the objective and procedure of the experiment are briefly introduced to
each viewer. Subjects also preview 12 sample images (obtained from 2 reference images) of
different quality to avoid unstable ratings at the beginning of the tests. These training images
are excluded from the subsequent testing stage. In the testing stage, the order of the images is
random and the randomization is different for each viewer.
5.1.2. Data processing
(1) Subject rejection: To remove extremely abnormal raw scores, we use the outlier detection
method provided by VQEG [46] in Annex 2 of ITU-R Rec. BT.500-13 [19]. First, the distribution of
raw scores is evaluated by the β2 test. Then, based on [19], we perform the subsequent steps of
the outlier rejection algorithm to discard observers whose scores deviate significantly from the
average raw scores. Overall, none of the twenty-six observers is removed as an outlier.
(2) DMOS scores: After the outlier detection, we convert the raw scores to raw difference scores
\[ d_{ij} = r_{i\,\mathrm{ref}(j)} - r_{ij}, \tag{3} \]
where $r_{ij}$ is the raw score of the $j$th image graded by the $i$th subject, and
$r_{i\,\mathrm{ref}(j)}$ is the raw score given by the $i$th subject to the reference image
corresponding to the $j$th distorted image.
Fig. 5. The scatter plot for DMOS in LIVE and in our experiment (set of naive sub-
jects). The red line is curve fitted with the five parameter monotonic logistic func-
tion. The blue dash curves are 95% confidence intervals. (For interpretation of the
references to colour in this figure legend, the reader is referred to the web version
of this article.)
To eliminate individual differences, the raw difference scores are converted into Z-scores [47]:
\[ z_{ij} = (d_{ij} - \bar{d}_i) / \sigma_i, \tag{4} \]
where $\bar{d}_i$ is the average of all raw difference scores of distorted images assigned by
observer $i$, and $\sigma_i$ is the corresponding standard deviation. After computing the
Z-scores, we linearly rescale them to the range [0, 100] [48]. Assuming that the Z-scores of a
subject follow a standard Gaussian distribution [49], 99% of the scores should lie in the range
[−3, +3]. The rescaled difference scores are computed by:
\[ s_{ij} = \frac{100 (z_{ij} + 3)}{6}. \tag{5} \]
Finally, the DMOS value of each distorted image is calculated as the mean of the rescaled
difference scores:
\[ \mathrm{DMOS}_j = \frac{1}{N_i} \sum_{i=1}^{N_i} s_{ij}, \tag{6} \]
where $N_i$ is the number of remaining subjects after subject rejection.
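The chain from Eq. (3) to Eq. (6) can be sketched in plain Python as follows. This is not the code used in the experiment: the function name `dmos`, the data layout and the toy scores are hypothetical, and the population standard deviation is an assumption because the text does not state which estimator is used for $\sigma_i$.

```python
from statistics import mean, pstdev


def dmos(raw_scores, ref_of):
    """Compute DMOS per distorted image following Eqs. (3)-(6).

    raw_scores: one dict per retained subject, mapping image name -> raw score.
    ref_of:     dict mapping each distorted image to its reference image.
    """
    distorted = list(ref_of)
    rescaled = []
    for r in raw_scores:
        # Eq. (3): raw difference score, reference minus distorted.
        d = {j: r[ref_of[j]] - r[j] for j in distorted}
        d_bar = mean(d.values())
        # Assumption: population standard deviation of this subject's scores.
        sigma = pstdev(d.values())
        if sigma == 0:
            raise ValueError("a subject gave identical difference scores")
        # Eq. (4) followed by Eq. (5): Z-score, then linear rescaling.
        rescaled.append({j: 100 * ((d[j] - d_bar) / sigma + 3) / 6 for j in distorted})
    # Eq. (6): average the rescaled scores over the retained subjects.
    return {j: mean(s[j] for s in rescaled) for j in distorted}


# Hypothetical toy data: two subjects, one reference and three distorted images.
raw_scores = [
    {"ref": 90, "blur_a": 80, "blur_b": 60, "blur_c": 30},
    {"ref": 85, "blur_a": 80, "blur_b": 50, "blur_c": 35},
]
ref_of = {"blur_a": "ref", "blur_b": "ref", "blur_c": "ref"}

result = dmos(raw_scores, ref_of)
print(result)
```

Because each subject’s Z-scores average to zero, the rescaled scores of every subject average to 50, so the DMOS values are comparable across subjects who use the slider differently.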
5.1.3. Subjective ratings analysis
Before investigating the relationship between the results of naive subjects and experts, we
compare the DMOS values in this experiment with the DMOS values provided in the LIVE database to
verify the reliability of our experimental data. To eliminate the influence of scale, we use a
five-parameter {β1, β2, β3, β4, β5} monotonic logistic function to map the DMOS values of this
experiment to LIVE’s DMOS:
\[ Y(x) = \beta_1 \left( 0.5 - \frac{1}{1 + e^{\beta_2 (x - \beta_3)}} \right) + \beta_4 x + \beta_5, \tag{7} \]
where $x$ and $Y$ are the DMOS in this experiment and the mapped scores, respectively.
Table 1
Relationships among the DMOS values in LIVE database and th
general image experiment. The α in F-test and T-test is 0.05. T
PLCC SROC
DMOS (LIVE) vs. (Naive Subjects) 0.9557 0.946
DMOS (LIVE) vs. (Experts) 0.9510 0.940
DMOS (Naive Subjects) vs. (Experts) 0.9608 0.950
Subsequently, the Pearson Linear Correlation Coefficient (PLCC), Spearman Rank-Order Correlation
Coefficient (SROCC) and Root Mean Squared Error (RMSE) are applied to measure the prediction
accuracy, the prediction monotonicity and the error of the fitting process, respectively. A good
correlation is indicated by high values (at most 1) of SROCC and PLCC and low values (at least
0) of RMSE. In addition, we use three statistical tools, namely the T-test, the F-test and the
coefficient of determination (R square), to analyze the experimental data [50]. The F-test and
T-test are used to assess the statistical significance of the difference between two sets of
scores. At the 95% confidence level (the parameter α in the F-test and T-test is 0.05), two sets
of scores have no significant difference when the P-values of the F-test and T-test are larger
than 0.05. Moreover, an R square value close to 1 indicates a good linear correlation between
two sets of scores.
The scatter plot of DMOS values in LIVE database versus DMOS
alues of inexperience subjects in our experiment is represented in
ig. 5 , where the red curves are fitted with the above-mentioned
ogistic function and 95% confidence intervals are represented by
he blue curves. The statistical results regarding the correlation of
hese two sets of DMOS values are shown in the second row in
able 1 . The PLCC and SROCC are around 0.95, which shows good
orrelation. The P-values of F-test and T-test are both beyond 0.05,
ndicating that there are no significant differences. Also, the R 2 of
hese two datasets is 0.9097, which reveals a good linear correla-
ion. It is demonstrated that the data in our experiment is reliable
ased on these statistical results.
After verifying the reliability of the experimental data, we anal-
se the relationship between the DMOS scored by the experts and
he inexperienced observers. Similarly, the above-mentioned statis-
ical approaches are employed in the comparison of “Experts” and
Naive Subjects”. We analyse not only the relationship between the
esults of experts and inexperienced subjects in our experiment,
ut also the correspondence of the data provided by “Experts” and
IVE. Fig. 6 illustrates the scatter plots of these two comparable
roups with the nonlinear fitting curve and the 95% confidence in-
ervals. The last two rows in Table 1 summarize the statistical re-
ults of the comparisons. From Table 1 , it is observed that the pref-
rences of “Experts” correlates closely with those of “Naive Sub-
ects” and LIVE. All of the PLCC and SROCC values are around 0.95.
mong them, the PLCC between “Experts” and “Naive Subjects” is
ven higher than 0.96. Also, the P-values of F-test and T-test are
ll far beyond 0.05 and the correlation coefficient R 2 maintain 0.9
pproximately.
Consequently, these three sets of DMOS namely DMOS of LIVE,
MOS of “Naive Subjects” and DMOS of “Experts” present high cor-
elation with each other. The obtained results demonstrate that it
s feasible to recruit several experts replacing a good deal of in-
xperienced observers for evaluating the quality of general natural
mages.
.2. Experiment on professional images
With the development of digital image processing, many spe-
ial images also need to be assessed. Unlike the general distor-
ion images, these images are more professional and hardly viewed
n daily life. Terahertz (THz) image is one of these professional
e DMOS of the set of naive subjects and experts in our
he data of the F-test and T-test is the P-value.
C RMSE F-test T-test R 2
2 4.6304 0.0785 0.2138 0.9097
5 4.8637 0.4202 0.2356 0.8993
1 3.8724 0.1123 0.5651 0.9164
W. Zhu et al. / Signal Processing 145 (2018) 193–201 199
Fig. 6. Scatter plots for DMOS in our experiment. (a) The scatter plot for the set of LIVE and the set of experts in our general experiment. (b) The scatter plot for the set of naive subjects and experts in our general experiment. The red line is the curve fitted with the five-parameter monotonic logistic function. The blue dashed curves are the 95% confidence intervals. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 7. Several sample THz images with different overall quality and different illegal substances. (a)–(c) Three THz sample images with different overall quality from good to bad. In the training stage, these three images are assigned scores of 5, 3 and 1 respectively for reference. (d) Volunteer carrying a hammer at the left abdomen. (e) Subject carrying a phone at the right chest. (Red circles will not appear in the test images.)
images and has wide application prospects in chemistry, biology, physics and medicine [51]. Although the THz imaging technique is an effective analysis tool in security inspection [52] and biological diagnosis [53], present THz images usually have low quality due to equipment interference and environmental factors [54]. Thus, quality assessment of THz images is meaningful. In fact, we have done some preliminary studies and established a special database of THz images (THSID) [39]. Based on these professional images, we conduct an experiment to study the relationship between “Experts” and the group of “Naive Subjects”.
5.2.1. Subjective test
(1) Image sources: 181 THz security images in the THSID database [39] serve as the testing images for the professional image experiment. A total of 176 images contain illegal substances hidden in various locations on the human body, and the volunteers in the remaining images do not carry any goods. Five sample images are illustrated in Fig. 7. The first three images show THz images of different overall quality from good to bad. It is easy to see that the noise in Fig. 7(a) is much less than the noise in Fig. 7(c). In the training stage, these three images are assigned scores of 5, 3 and 1 respectively for reference. In addition, the conditions of the substances in THz images are different. Some substances are easy to recognize, as in Fig. 7(d), while, as shown in Fig. 7(e), some goods are hard to distinguish from the human body.
(2) Experimental environment and configurations: The experimental configurations are basically the same as in the experiment on general images. For THz images, we use the five-grade scale to obtain the scores.
(3) Experimental process: 17 inexperienced subjects and 3 professional security inspectors are invited to attend the subjective experiments. The security inspectors have long-term experience in inspecting prohibited goods at specific places such as airports and subway stations. The process of the experiment is also similar to the general image test. However, we gather two different scores for each THz image. One is the score of the overall quality of each image and the other is the score of the local prohibited goods. The marking standard of overall quality primarily complies with the suggestions of ITU-R Rec. BT.500-13 [19]. Given the particular noise of THz security images, we make minor adjustments to the overall quality criterion and define a specific standard for the local quality of illegal substances. The subjective evaluation criteria of the overall quality and the local quality of illegal substances are given in Table 2.
5.2.2. Data processing
After the experiments, we first apply the same subject rejection method as in Section 5.1.2 to remove the abnormal raw scores. One of the seventeen inexperienced subjects is regarded as an outlier. Then, the MOSs of overall quality of THz security images are calculated from the raw scores of the remaining naive subjects and experts respectively:
$$ \mathrm{MOS}_j = \frac{1}{N_i} \sum_{i=1}^{N_i} r_{ij} \tag{8} $$
where $N_i$ is the number of subjects remaining after subject rejection and $r_{ij}$ denotes the raw score of the j-th image assigned by subject i.
On the other hand, we calculate the accuracy rate of each subject in distinguishing illegal goods. Four categories are used to define the judgements of the observers: (1) There is an
Table 2
The subjective evaluation criteria employing the five-grade scale of overall quality and the local quality of illegal substances for THz security images.

Score  Quality    Standards: overall quality        Standards: sharpness of prohibited goods
5      Excellent  Acceptable                        Distinct
4      Good       Unacceptable, but not annoying    Fairly clear, but cognizable
3      Fair       Slightly annoying                 Perceptible, but do not know what it is
2      Poor       Annoying                          Feel something, but uncertainty
1      Bad        Quite annoying                    Imperceptible
Fig. 8. The scatter plot for MOS of the set of naive subjects and experts in our THz image experiment. The red line is the curve fitted with the five-parameter monotonic logistic function. The blue dashed curves are the 95% confidence intervals. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Table 3
The statistical results of the correlation between “Naive Subjects” and “Experts” on perceiving the overall quality of THz images. (The α in ANOVA is 0.05.)

PLCC    SROCC   RMSE    R²      ANOVA F  ANOVA F crit  ANOVA P-value
0.9092  0.9026  0.2241  0.8172  0.6017   3.8674        0.4384
illegal substance, and the subject finds it; (2) there is an illegal substance, but the subject does not find it; (3) there is no illegal substance, but the subject believes that he finds one; (4) there is no illegal substance, and the subject thinks so. Two binary variables $P_{ij}$ and $Q_j$ represent the judgement of the i-th subject on the j-th image and the existence of an illegal good in the j-th image, respectively:
$$ P_{ij} = \begin{cases} 0 & \text{if } \omega_{ij} = 1 \\ 1 & \text{if } \omega_{ij} = 2, 3, 4, 5 \end{cases}, \qquad Q_j = \begin{cases} 0 & \text{if no illegal good in the } j\text{th image} \\ 1 & \text{otherwise} \end{cases} \tag{9} $$
where $\omega_{ij}$ denotes the score of the local illegal substances given by the i-th observer for image j. The result of each judgement can be expressed by:
$$ T_{ij} = P_{ij} \odot Q_j \tag{10} $$
where $\odot$ is the XNOR gate. The accuracy rate $A_i$ of subject i in distinguishing illegal substances is computed by the following equation:
$$ A_i = \frac{\sum_{j=1}^{N_j} T_{ij}}{N_j} \tag{11} $$
where $N_j$ represents the number of THz images.
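Eqs. (9)–(11) can be sketched in a few lines of Python. The names `detected` and `accuracy` and the toy scores are assumptions for illustration; the XNOR of two binary values is implemented as an equality test.

```python
def detected(score):
    """P_ij of Eq. (9): 0 if the local score is 1 (imperceptible), else 1."""
    return 0 if score == 1 else 1


def accuracy(scores, has_item):
    """A_i of Eq. (11) for a single subject.

    scores[j]   -- local-quality score (1..5) the subject gave to image j
    has_item[j] -- Q_j: 1 if image j contains an illegal good, else 0
    """
    # T_ij = P_ij XNOR Q_j equals 1 exactly when the two binary values agree.
    hits = sum(1 for s, q in zip(scores, has_item) if detected(s) == q)
    return hits / len(scores)


# Toy data (hypothetical): 4 images, items present in the first two.
scores = [5, 1, 3, 1]
has_item = [1, 1, 0, 0]
print(accuracy(scores, has_item))  # 0.5
```

In this toy case the subject has one correct detection, one miss, one false alarm and one correct rejection, which covers the four judgement categories listed above.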
5.2.3. Subjective ratings analysis
To obtain the correlation between the preferences of inexperienced subjects and experts in assigning the overall quality of THz security images, we apply five statistical approaches: PLCC, SROCC, RMSE, ANOVA and the correlation coefficient (R²). ANOVA is a common statistical approach to measure the statistical significance of the difference between two sets of data. When the F value in ANOVA is smaller than the critical F value (3.8674 at the 95% confidence level) and the P-value is larger than 0.05, the two sets of data have no statistically significant difference. Fig. 8 illustrates the scatter plot of the set of naive subjects and experts in the THz image experiment with the nonlinear fitting curve and the 95% confidence intervals. The statistical results are listed in Table 3. From Fig. 8 and Table 3, it is clear that the correlation between the scores graded by experts and inexperienced subjects remains at a high level, with PLCC and SROCC values both higher than 0.9. Also, the F value (0.6017) is lower than F-crit (3.8674) and the P-value (0.4384) is larger than 0.05, which means that the MOS values of experts and naive subjects have no significant difference. In addition, the R² (0.8172) reveals a favorable linear relationship.
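The correlation measures and the ANOVA F statistic used above can be sketched with the Python standard library alone. The function names and toy MOS vectors below are hypothetical. The sketch computes only PLCC, SROCC (with average ranks for ties) and the one-way ANOVA F value for two groups; the P-value, which requires the F distribution, is omitted, and the critical value is taken from Table 3 rather than computed.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation (SROCC): Pearson on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def anova_f(group_a, group_b):
    """One-way ANOVA F statistic for two groups."""
    groups = [group_a, group_b]
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (len(groups) - 1)) / (ss_within / (n_total - len(groups)))


# Toy MOS vectors (hypothetical), one value per image.
naive = [1.0, 2.0, 3.0, 4.0, 5.0]
expert = [1.5, 2.5, 2.0, 4.5, 5.0]
F_CRIT = 3.8674  # critical value reported in Table 3 for the paper's data

print(pearson(naive, expert), spearman(naive, expert))
print(anova_f(naive, expert) < F_CRIT)
```

The critical value depends on the degrees of freedom, so `F_CRIT` as written applies to the paper's sample size and is reused on the toy data only to show the comparison step.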
On the other side, we calculate the accuracy of judging the existence of the illegal substances. The accuracy rate of the experts group is up to 75.14%, while that of the naive subjects group is 60.19%. Therefore, in terms of professional object extraction, there exists a large gap between the experts and the inexperienced subjects. This gap is hard to bridge with short-term training. In fact, for both machine recognition and human perception, the sharpness of the illegal substances is an important part of the integrated quality of THz security images. Based on our experiments, the scores assigned by experts will be more accurate and reliable. Consequently, it is appropriate to recruit a small group of experts to evaluate the quality of some professional images. The experimental results on the THz image dataset validate the theory proposed in Sections 2, 3 and 4.
6. Conclusion
In this paper, using three examples, we point out the problems of the common subjective IQA methods, namely MOS and PC. The problem in the MOS method lies in the majority being subordinated to the minority, while in the PC method the comparison of two images is influenced by irrelevant images. Also, based on AIT, we prove that there must be a “pivotal subject” for any ideal ranking or scoring method satisfying the conditions of unanimity and IIA. Thus, we suggest that the ground truth of testing images can be collected from a small group of subjects containing only several experts, which will be time-saving and economical as well as reliable. Two experiments conducted on general distorted images and professional images (THz images) are performed to verify the reliability of our proposal. Based on the results of rigorous statistical approaches, the subjective ratings of a small group of experts are shown to
be highly related to the preferences of inexperienced subjects. Furthermore, the evaluation results of experts may be more accurate and credible than those of naive viewers in some professional image quality assessment tasks.
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China under Grant nos. 61422112, 61371146, 61521062, and 61527804.
References
[1] W. Lin, C.-C.J. Kuo, Perceptual visual quality metrics: a survey, J. Vis. Commun. Image Represent. 22 (4) (2011) 297–312.
[2] Y. Fang, J. Yan, J. Liu, S. Wang, Q. Li, Z. Guo, Objective quality assessment of screen content images by uncertainty weighting, IEEE Trans. Image Process. Publ. IEEE Signal Process. Soc. 26 (4) (2017) 2016–2027.
[3] S. Wang, K. Ma, H. Yeganeh, Z. Wang, W. Lin, A patch-structure representation method for quality assessment of contrast changed images, IEEE Signal Process. Lett. 22 (12) (2015) 2387–2390.
[4] K. Ma, T. Zhao, K. Zeng, Z. Wang, Objective quality assessment for color-to-gray image conversion, IEEE Trans. Image Process. Publ. IEEE Signal Process. Soc. 24 (12) (2015) 4673–4685.
[5] Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Trans. Image Process. 13 (4) (2004) 600–612.
[6] K. Gu, L. Li, H. Lu, X. Min, W. Lin, A fast reliable image quality predictor by
21 (1) (2011) 41–52.
[9] X. Min, K. Gu, G. Zhai, M. Hu, X. Yang, Saliency-induced reduced-reference quality index for natural scene and screen content images, Signal Process. (2017).
[10] Y. Fang, K. Ma, Z. Wang, W. Lin, Z. Fang, G. Zhai, No-reference quality assessment of contrast-distorted images based on natural scene statistics, IEEE Signal Process. Lett. 22 (7) (2014) 838–842.
[11] Q. Wu, H. Li, F. Meng, K.N. Ngan, B. Luo, C. Huang, B. Zeng, Blind image quality assessment based on multichannel feature fusion and label transfer, IEEE Trans. Circuits Syst. Video Technol. 26 (3) (2016) 425–440.
[12] Q. Wu, Z. Wang, H. Li, A highly efficient method for blind image quality assessment, in: IEEE International Conference on Image Processing, 2015, pp. 339–343.
[13] Q. Wu, H. Li, F. Meng, K.N. Ngan, S. Zhu, No reference image quality assessment metric via multi-domain structural information and piecewise regres-
[14] X. Min, G. Zhai, K. Gu, Y. Fang, X. Yang, X. Wu, J. Zhou, X. Liu, Blind quality assessment of compressed images via pseudo structural similarity, in: IEEE International Conference on Multimedia and Expo, 2016, pp. 1–6.
[15] Y. Fang, Z. Chen, W. Lin, C.W. Lin, Saliency detection in the compressed domain
[16] Y. Fang, W. Lin, Z. Chen, C.M. Tsai, C.W. Lin, A video saliency detection model in compressed domain, IEEE Trans. Circuits Syst. Video Technol. 24 (1) (2014) 27–38.
[17] Y. Fang, C. Zhang, J. Li, J. Lei, S.M. Da, C.P. Le, Visual attention modeling for stereoscopic video: a benchmark and computational model, IEEE Trans. Image Process. PP (99) (2017) 1–1.
[18] X. Min, G. Zhai, Z. Gao, K. Gu, Visual attention data for image quality assessment databases, in: International Symposium on Circuits and Systems, 2014, pp. 894–897.
[19] Assembly, ITU Radiocommunication, Methodology for the subjective assessment of the quality of television pictures, Int. Telecommun. Union, 2012.
[20] N. Ponomarenko, V. Lukin, A. Zelensky, K. Egiazarian, M. Carli, F. Battisti, TID2008–A database for evaluation of full-reference visual quality assessment metrics, Adv. Modern Radioelectronics 10 (4) (2009) 30–45.
[21] N. Ponomarenko, L. Jin, O. Ieremeiev, V. Lukin, K. Egiazarian, J. Astola, B. Vozel, K. Chehdi, M. Carli, F. Battisti, C.-C.J. Kuo, Image database TID2013: peculiarities, results and perspectives, Signal Process. 30 (2015) 57–77.
[22] K. Gu, G. Zhai, W. Lin, M. Liu, The analysis of image contrast: from quality assessment to automatic enhancement, IEEE Trans. Cybernetics 46 (1) (2016) 284–297.
[23] H.R. Sheikh, Z. Wang, L. Cormack, A.C. Bovik, Live image quality assessment database release 2, 2005.
[24] E.C. Larson, D.M. Chandler, Most apparent distortion: full-reference image quality assessment and the role of strategy, J. Electron. Imag. 19 (1) (2010) 011006–011006.
[25] K. Gu, G. Zhai, X. Yang, W. Zhang, Hybrid no-reference quality metric for
[26] J. Wang, A. Rehman, K. Zeng, S. Wang, Z. Wang, Quality prediction of asymmetrically distorted stereoscopic 3D images, IEEE Trans. Image Process. Publ. IEEE Signal Process. Soc. 24 (11) (2015) 3400–14.
[27] J. Wang, S. Wang, Z. Wang, Asymmetrically compressed stereoscopic 3D videos: quality assessment and rate-distortion performance evaluation, IEEE Trans. Image Process. 26 (3) (2017) 1330–1343.
[28] R. Song, H. Ko, C.C.J. Kuo, Mcl-3D: a database for stereoscopic image quality assessment using 2d-image-plus-depth source, J. Inf. Sci. Eng. 31 (5) (2015).
[29] J. Wang, S. Wang, K. Ma, Z. Wang, Perceptual depth quality in distorted stereoscopic images, IEEE Trans. Image Process. Publ. IEEE Signal Process. Soc. 26 (3) (2017) 1202.
[30] H. Yang, Y. Fang, W. Lin, Perceptual quality assessment of screen content images, IEEE Trans. Image Process. 24 (11) (2015) 4408–4421.
[31] X. Min, K. Ma, K. Gu, G. Zhai, Z. Wang, W. Lin, Unified blind quality assessment of compressed natural, graphic, and screen content images, IEEE Trans. Image Process. 26 (11) (2017) 5462–5474.
[32] K. Gu, S. Wang, H. Yang, W. Lin, G. Zhai, X. Yang, W. Zhang, Saliency-guided
[33] K. Gu, J. Zhou, J. Qiao, G. Zhai, W. Lin, A. Bovik, No-reference quality assessment of screen content pictures, IEEE Trans. Image Process. (2017).
tion: a trusted framework, IEEE Trans. Multimed. 15 (5) (2013) 1121–1137.
[35] Q. Xu, J. Xiong, Q. Huang, Y. Yao, Online hodgerank on random graphs for
[36] H. Yang, W. Lin, C. Deng, L. Xu, Study on subjective quality assessment of digital compound images, in: IEEE International Symposium on Circuits and Systems (ISCAS), 2014, pp. 2149–2152.
[37] K.J. Arrow, A difficulty in the concept of social welfare, J. Political Econ. (1950) 328–346.
[38] K.J. Arrow, Social Choice and Individual Values, 12, Yale University Press, 2012.
[39] M. Hu, X. Min, G. Zhai, W. Zhu, Z. Wang, X. Yang, G. Tian, Terahertz security image quality assessment by no-reference model observers, arXiv:1707.03574 (2017).
[40] A.D. Taylor, Social Choice and the Mathematics of Manipulation, Cambridge University Press, 2005.
[41] N.N. Yu, A one-shot proof of Arrow's impossibility theorem, Economic Theory 50 (2) (2012) 523–525.
[42] N.A. Barr, The Economics of the Welfare State, Stanford University Press, 1998.
[43] A. Sen, Markets and freedoms: achievements and limitations of the market mechanism in promoting individual freedoms, Oxford Econ. Papers (1993) 519–541.
[44] D. Saari, Decisions and Elections: Explaining the Unexpected, Cambridge University Press, 2001.
[45] A.P. Kirman, D. Sondermann, Arrow's theorem, many agents, and invisible dictators, J. Econ. Theory 5 (2) (1972) 267–277.
[46] VQEG, Final report from the VQEG on the validation of objective models of video quality assessment, Phase II, 2003.
[47] H.R. Sheikh, M.F. Sabir, A.C. Bovik, A statistical evaluation of recent full reference image quality assessment algorithms, IEEE Trans. Image Process. 15 (11) (2006) 3440–3451.
[48] L. Ma, W. Lin, C. Deng, K.N. Ngan, Image retargeting quality assessment: a study of subjective scores and objective metrics, IEEE J. Selected Topics Signal Process. 6 (6) (2012) 626–639.
[49] K. Seshadrinathan, R. Soundararajan, A.C. Bovik, L.K. Cormack, Study of subjective and objective quality assessment of video, IEEE Trans. Image Process. 19 (6) (2010) 1427–1441.
[50] D.J. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures, CRC Press, 2003.
[51] B.M. Fischer, H. Helm, P.U. Jepsen, Chemical recognition with broadband THz spectroscopy, Proc. IEEE 95 (8) (2007) 1592–1604.
[52] D. Suzuki, S. Oda, Y. Kawano, A flexible and wearable terahertz scanner, Nat. Photonics 10 (12) (2016) 809–813.
[53] K. Moldosanov, A. Postnikov, V. Lelevkin, N. Kairyev, Terahertz imaging technique for cancer diagnostics using frequency conversion by gold nano-objects, Ferroelectrics 509 (1) (2017) 158–166.
[54] L. Hou, X. Lou, Z. Yan, H. Liu, W. Shi, Enhancing terahertz image quality by finite impulse response digital filter, in: Infrared, Millimeter, and Terahertz waves (IRMMW-THz), 2014, pp. 1–2.