TRECVID 2006: Search Task
Alan Smeaton, Dublin City University
& Tzveta Ianeva, NIST
TRECVID 2006 2
Search Task Definition
Goal: promote progress in content-based retrieval from digital video via open, metrics-based evaluation;
Given a test collection, a topic and a common shot boundary reference, return a ranked list of at most 1,000 shots which best satisfy the need;
NIST created more topics asking for general (vs. specific) subjects;
NIST created 10 of the 24 topics to ask for video of events, encouraging exploration beyond one-keyframe-per-shot approaches;
Videos were viewed by NIST personnel, notes were taken on content, and topics were chosen from the candidates that emerged;
Search Task Definition
Per-search measures: average precision, elapsed time;
Per-run measure: mean average precision (MAP);
Interactive search participants were asked to have their subjects complete pre-, post-topic and post-search questionnaires;
Each result for a topic can come from only 1 user search; the same searcher does not need to be used for all topics.
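The per-search and per-run measures can be made concrete with a short sketch. It assumes the usual non-interpolated definition of average precision (precision at the rank of each relevant shot retrieved, averaged over all relevant shots for the topic), the 1,000-shot cap from the task definition, and that each shot appears at most once in a ranked list. The function names and the dict/set data layout are illustrative, not NIST's evaluation code.

```python
def average_precision(ranked_shots, relevant, max_results=1000):
    """Non-interpolated average precision (AP) for one topic.

    ranked_shots: shot IDs in rank order; only the first max_results count.
    relevant: set of shot IDs judged relevant for the topic.
    Precision is taken at the rank of each relevant shot retrieved, summed,
    and divided by the total number of relevant shots, so relevant shots
    that were never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, shot in enumerate(ranked_shots[:max_results], start=1):
        if shot in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(run, qrels):
    """MAP for one run: the mean of per-topic AP over all judged topics.

    run: topic -> ranked list of shot IDs submitted for that topic.
    qrels: topic -> set of shot IDs judged relevant.
    A judged topic with no submitted results scores zero.
    """
    scores = [average_precision(run.get(topic, []), rel)
              for topic, rel in qrels.items()]
    return sum(scores) / len(scores)
```

Dividing by the total number of relevant shots (not by the number retrieved) is what penalizes a run for missing relevant shots, including any pushed beyond the 1,000-shot cutoff.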
Search Task Definition
Bing Xiang, John Makhoul, and Ralph Weischedel at BBN for providing MT/ASR
Christian Petersohn (Fraunhofer Institute) for master shot reference
DCU team for formatting and selecting keyframes
MediaMill team for donating baseline results for 101 features
CMU and IBM for annotations of 449 LSCOM features
Data characteristics
TRECVID 2006 data is again (deliberately) text-noisy, with video from English-language, Arabic & Chinese broadcasts;
32.2% of the test video comes from programs not represented in the development data;
Text is derived from speech recognition and then machine translation, so it is of poorer quality than with English-only sources, but the ASR/MT comes from the “state-of-the-art” GALE system.
2006: Search task participants (26, up from 20)
AT&T Labs – Research, USA
Beijing Jiaotong U., China
Bilkent U., Turkey
Carnegie Mellon U., USA
Chinese U. of Hong Kong, China
City University of Hong Kong, China
CLIPS-IMAG, France
Columbia U., USA
Dublin City U., Ireland
Fudan U., China
FX Palo Alto Laboratory Inc., USA
Helsinki U. of Technology, Finland
IBM T. J. Watson Research Center, USA
Imperial College London / Johns Hopkins U., UK, USA
2006: Search task participants (continued)
NUS / I2R, Singapore
MediaMill / U. of Amsterdam, Netherlands
RMIT U. School of CS&IT, Australia
Tsinghua U., China
U. of Central Florida, USA
U. of Glasgow / U. of Sheffield, UK
U. of Iowa, USA
U. of Oxford, UK
U. Rey Juan Carlos, Spain
Zhejiang U., China
COST292 (www.cost292.org): France, Netherlands, UK, Ireland, Greece, Turkey, Serbia and Montenegro, Slovakia
KSpace (kspace.qmul.net): UK, Germany, Austria, Greece, Ireland, Netherlands, France, Switzerland, Czechia
Search Types: Automatic, Manual and Interactive
Number of runs: 76 automatic, 11 manually assisted, 36 interactive
Everybody likes to search automatically, dislikes manually
[Figure: share of runs by search type (interactive, manual, fully automatic) in 2004, 2005 and 2006; y-axis 0% to 100%]
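The shift towards automatic search can be checked from run counts. This sketch takes the 2006 counts from the run-type slide and, as an assumption, treats the "of N" totals in the 2005 top-10 slide titles (42 automatic, 26 manual, 44 interactive) as that year's run counts; 2004 counts are not given in these slides, so that year is left out.

```python
# Runs per search type. 2006 counts come from the run-type slide; the 2005
# counts are ASSUMED to be the "of N" totals in the 2005 top-10 slide titles.
RUNS = {
    2005: {"automatic": 42, "manual": 26, "interactive": 44},
    2006: {"automatic": 76, "manual": 11, "interactive": 36},
}


def shares(counts):
    """Return each search type's share of all runs, as a percentage."""
    total = sum(counts.values())
    return {kind: 100.0 * n / total for kind, n in counts.items()}


for year, counts in RUNS.items():
    rounded = {kind: round(pct, 1) for kind, pct in shares(counts).items()}
    print(year, rounded)
```

Under those assumptions, automatic runs go from about 37.5% of all runs in 2005 to about 61.8% in 2006, while manual runs fall from about 23.2% to about 8.9%.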
24 Topics [number of image examples, number of video examples, number of relevant shots found]
173. Find shots with a view of one or more tall buildings (more than 4 stories) and the top story visible [3, 4, 142]
174. Find shots with one or more people leaving or entering a vehicle [0, 10, 675]
175. Find shots with one or more soldiers, police, or guards escorting a prisoner [0, 4, 204]
176. Find shots of a daytime demonstration or protest with at least part of one building visible [4, 4, 111]
177. Find shots of US Vice President Dick Cheney [3, 3, 393]
178. Find shots of Saddam Hussein with at least one other person's face at least partially visible [8, 0, 99]
179. Find shots of multiple people in uniform and in formation [3, 5, 191]
180. Find shots of US President George W. Bush, Jr. walking [0, 5, 197]
24 Topics (continued)
181. Find shots of one or more soldiers or police with one or more weapons and military vehicles [2, 6, 128]
182. Find shots of water with one or more boats or ships [3, 5, 307]
183. Find shots with one or more emergency vehicles in motion (e.g., ambulance, police car, fire truck, etc.) [0, 4, 299]
184. Find shots of one or more people seated at a computer with display visible [3, 4, 440]
185. Find shots of one or more people reading a newspaper [3, 4, 201]
186. Find shots of a natural scene with, for example, fields, trees, sky, lake, mountain, rocks, rivers, beach, ocean, grass, sunset, waterfall, animals, or people; but no buildings, no roads, no vehicles [2, 4, 523]
187. Find shots of one or more helicopters in flight [0, 6, 119]
24 Topics (continued)
188. Find shots of something burning with flames visible [3, 5, 375]
189. Find shots of a group including at least four people dressed in suits, seated, and with at least one flag [3, 5, 446]
190. Find shots of at least one person and at least 10 books [3, 5, 295]
191. Find shots containing at least one adult person and at least one child [3, 6, 775]
192. Find shots of a greeting by at least one kiss on the cheek [0, 5, 98]
193. Find shots of one or more smokestacks, chimneys, or cooling towers with smoke or vapor coming out [3, 2, 60]
194. Find shots of Condoleezza Rice [3, 7, 122]
195. Find shots of one or more soccer goalposts [3, 4, 333]
196. Find shots of scenes with snow [3, 6, 692]
Some statistics
2006: 79,484 shots in test collection; 7,225 relevant shots found (~9.1%)
2005: 45,765 shots in test collection; 8,395 relevant shots found (~18.3%)
2004: 33,367 shots in test collection; 1,800 relevant shots found (~5.4%)
2003: 32,318 shots in test collection; 2,114 relevant shots found (~6.5%)
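The percentages above are simple ratios, and this sketch reproduces them. It assumes each percentage is relevant shots found divided by the number of shots in the collection; a shot relevant to two topics would then count twice, so the figure is a ratio rather than a strict fraction of distinct shots.

```python
# (year, shots in test collection, relevant shots found), from the statistics slide
STATS = [
    (2006, 79_484, 7_225),
    (2005, 45_765, 8_395),
    (2004, 33_367, 1_800),
    (2003, 32_318, 2_114),
]


def relevant_share(total_shots, relevant_found):
    """Relevant shots found as a percentage of collection size."""
    return 100.0 * relevant_found / total_shots


for year, total, found in STATS:
    print(f"{year}: ~{relevant_share(total, found):.1f}% relevant")

# Growth of the test collection from 2005 to 2006
print(f"2006 vs. 2005 collection size: {79_484 / 45_765:.2f}x")
```

This gives ~9.1% for 2006 against ~18.3% for 2005, with a 2006 collection about 1.74 times the size of the 2005 one.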
2006: 20 sites contributed one or more unique, relevant shots
[Figure: bar chart of the number of unique, relevant shots (0 to 70) per site: Beijing Jiaotong U., Bilkent U., CLIPS-IMAG, CMU, Chinese U. of Hong Kong, Fudan U., COST292, Imperial College London, KSpace, U. of Oxford, Helsinki U. of Technology, Tsinghua U., IBM, U. of Central Florida, U. of Glasgow, U. of Iowa, U. Rey Juan Carlos, MediaMill / U. of Amsterdam, Zhejiang U., I2R / National U. of Singapore]
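A minimal sketch of how unique relevant shots could be counted. It assumes a shot counts as unique for a team when it is judged relevant for a topic and no other team returned it for that topic in any of its runs. The input layout and the function name are hypothetical, and NIST's exact counting (for example, which runs were pooled and to what depth) may differ.

```python
from collections import defaultdict


def unique_relevant_by_team(submissions, qrels):
    """Count relevant shots that exactly one team returned, per team.

    submissions: team -> topic -> iterable of shot IDs the team returned
                 (all of that team's runs pooled together).
    qrels: topic -> set of shot IDs judged relevant.
    Returns team -> number of (topic, shot) pairs that only that team found.
    """
    finders = defaultdict(set)  # (topic, shot) -> teams that returned it
    for team, topics in submissions.items():
        for topic, shots in topics.items():
            for shot in set(shots) & qrels.get(topic, set()):
                finders[(topic, shot)].add(team)
    counts = {team: 0 for team in submissions}
    for teams in finders.values():
        if len(teams) == 1:
            counts[next(iter(teams))] += 1
    return counts
```

Counting per (topic, shot) pair means a shot relevant to two topics can be unique for one topic and shared for the other.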
2006: Relevant shots contributed uniquely, per topic and by team
Note: topics 186, 191, 196 have 500+
[Figure: number of unique true shots (y-axis, 0 to 10) per group for each topic (x-axis, labelled "topic (total relevant)", e.g. 173 (142), 191 (775)); groups as in the previous chart]
2006: Most relevant shots uniquely returned, by topic and team
Note: topics 186, 191, 196 have 500+
[Figure: number of unique true shots (0 to 10) for CLIPS-IMAG, CMU, Imperial College London, U. of Oxford, Tsinghua U. and MediaMill / U. of Amsterdam on topics (total relevant) 173 (142), 174 (675), 175 (204), 176 (111), 180 (197), 182 (307)]
2006: Most relevant shots uniquely returned, by topic and team
Note: topic 186 has 500+
[Figure: number of unique true shots (0 to 10) for CLIPS-IMAG, CMU, Imperial College London, U. of Oxford, Tsinghua U. and MediaMill team on topics (total relevant) 183 (299), 184 (440), 185 (201), 186 (523), 187 (119), 188 (375)]
2006: Most relevant shots uniquely returned, by topic and team
Note: topics 191, 196 have 500+
[Figure: number of unique true shots (0 to 10) for CLIPS-IMAG, CMU, Imperial College London, U. of Oxford, Tsinghua U. and MediaMill team on topics (total relevant) 189 (446), 190 (295), 191 (775), 192 (98), 193 (60)]
Unique relevant shots returned by U. of Oxford for Topic 191 (“adult and child”)
2006: Automatic runs, top 10 by MAP (of 76) (mean elapsed time in minutes per topic)
[Figure: precision (y-axis) vs. recall (x-axis) curves for the runs below]
F_A_2_TJW_Qclass_4 (15)
F_A_2_TJW_Qcomp_2 (15)
F_A_2_CMU_Taste_5 (15)
F_A_2_TJW_Qind_5 (15)
F_B_2_i2Rnus_1 (6)
F_B_2_i2Rnus_2 (6)
F_B_2_COLUMBIA_RR9_storyqeibteviscon (15)
F__B_2_COLUMBIA_RR8_textibviscon (15)
F_B_2_THU03_3 (0.49)
F_B_2_THU02_2 (0.5)
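Recall-precision curves like these are typically drawn from interpolated precision. The sketch below assumes the common 11-point interpolation (the maximum precision at any recall at or above each level 0.0, 0.1, ..., 1.0) reported by trec_eval-style tools; it illustrates the idea and is not a confirmed description of how these plots were produced. The function name is hypothetical.

```python
def interpolated_precision(ranked_shots, relevant, levels=11):
    """Interpolated precision at evenly spaced recall levels 0.0 ... 1.0.

    At recall level r the interpolated precision is the highest precision
    reached at any rank whose recall is >= r (0.0 if recall r is never
    reached). Only ranks holding a relevant shot need checking, because
    precision can only fall between consecutive relevant shots.
    """
    if levels < 2:
        raise ValueError("need at least two recall levels")
    if not relevant:
        return [0.0] * levels
    points = []  # (recall, precision) at each rank holding a relevant shot
    hits = 0
    for rank, shot in enumerate(ranked_shots, start=1):
        if shot in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    curve = []
    for i in range(levels):
        level = i / (levels - 1)
        curve.append(max((p for rec, p in points if rec >= level), default=0.0))
    return curve
```

Averaging these per-topic curves over all topics gives one curve per run, which is what a multi-run plot compares.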
2005: Automatic runs, top 10 by MAP (of 42) (mean elapsed time in minutes per topic)
[Figure: precision (y-axis) vs. recall (x-axis) curves for the runs below]
F_B_2_NUS_PRIS_1 (0.55)
F_A_2_TJW_VM_4 (15)
F_A_2_TJW_TVM_2 (15)
F_A_2_TJW_V_3 (15)
F_B_2_NUS_PRIS_2 (0.56)
F_A_2_TJW_TV_5 (15)
F_A_2_NUS_PRIS_3 (0.3)
F_C_2_ColumbiaA2_5 (15)
F_B_2_UvAMM_6 (0.7)
F_A_2_PicSOMF2_3 (0.14)
Significant differences among top 8 automatic runs (using randomization test, p < 0.05)
A_2_TJW_Qclass_4 > B_2_COLUMBIA_RR9_storyqeibteviscon_1, B_2_COLUMBIA_RR8_textibviscon, B_2_i2Rnus_2
A_2_TJW_Qcomp_2 > B_2_i2Rnus_2, B_2_COLUMBIA_RR9_storyqeibteviscon_1, B_2_COLUMBIA_RR8_textibviscon
A_2_CMU_Taste_5 > B_2_COLUMBIA_RR9_storyqeibteviscon_1, B_2_COLUMBIA_RR8_textibviscon
B_2_i2Rnus_1 > B_2_COLUMBIA_RR9_storyqeibteviscon_1, B_2_COLUMBIA_RR8_textibviscon
Run name (MAP)
A_2_TJW_Qclass_4 (0.087) *
A_2_TJW_Qcomp_2 (0.086) =
A_2_CMU_Taste_5 (0.079) =
A_2_TJW_Qind_5 (0.076) =
B_2_i2Rnus_1 (0.075) =
B_2_i2Rnus_2 (0.067) >
B_2_COLUMBIA_RR9… (0.060) >
B_2_COLUMBIA_RR8… (0.056) >
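A sketch of a two-sided paired randomization test on per-topic average precision, the kind of test named in the slide title. Under the null hypothesis the two runs are interchangeable, so each topic's AP difference is given a random sign and the observed mean difference is compared with the resulting distribution. The trial count, seed and function name are assumptions, and NIST's exact procedure may differ in its details.

```python
import random


def paired_randomization_test(ap_a, ap_b, trials=10_000, seed=0):
    """Two-sided paired randomization test on per-topic average precision.

    ap_a, ap_b: AP of runs A and B on the same topics, in the same order.
    Under the null hypothesis the runs are interchangeable, so swapping the
    labels on a topic just flips the sign of that topic's difference. The
    p-value is the fraction of random sign assignments whose mean difference
    is at least as large (in absolute value) as the observed one.
    """
    if len(ap_a) != len(ap_b) or not ap_a:
        raise ValueError("runs must be scored on the same, non-empty topic set")
    diffs = [a - b for a, b in zip(ap_a, ap_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    rng = random.Random(seed)  # fixed seed so results are reproducible
    at_least_as_extreme = 0
    for _ in range(trials):
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        # small tolerance so float rounding does not hide exact ties
        if abs(permuted) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / trials
```

With the 24 per-topic AP values of two runs as input, a returned value below 0.05 would be read as a significant difference at the level used on the slide.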
2006: Manual runs, top 10 by MAP (of 11) (mean human effort in minutes per topic)
[Figure: precision (y-axis) vs. recall (x-axis) curves for the runs below]
M_A_2_FD_M_TEXT_1 (12.8)
M_A_2_KSpaceM3_3 (5)
M_A_2_CLIPSLISLSR_5 (1.12)
M_A_2_KSpaceM5_5 (5)
M_A_2_KSpaceM1_1 (5)
M_A_2_CLIPSLISLSR_6 (1.05)
M_A_2_FD_MM_BC_3 (12.75)
M_A_2_FD_M_TRAIN_TEXT_2 (12.75)
M_A_2_BILKENT1_1 (6.2)
M_A_1_BILKENT2_2 (5.38)
2005: Manual runs, top 10 by MAP (of 26) (mean human effort in minutes per topic)
[Figure: precision (y-axis) vs. recall (x-axis) curves for the runs below]
M_A_2_CMU.Manu.ExpECA.QC04CR.PU_5 (15)
M_A_2_CMU.Manu.ExpE.QC05U_7 (15)
M_A_2_PicSOMM3_2 (0.93)
M_A_2_FD_MM_BC_1 (11.1)
M_A_2_OUMT_M7TE_7 (5.06)
M_A_2_OUMT_M6TS_6 (5.02)
M_A_2_PicSOMM2_4 (0.87)
M_A_2_FD_AOH_LR_ONLINE_3 (11.1)
M_A_1_OUMT_M5T_5 (5.01)
M_A_1_dcu_manual_text_img_6 (3)
2006: Interactive runs, top 10 by MAP (of 36) (mean elapsed time for all runs: ~15 minutes per topic)
[Figure: precision (y-axis) vs. recall (x-axis) curves for the runs below]
I_A_2_CMU_See_1
I_B_2_UvA_MM_1
I_A_2_CMU_Hear_2
I_A_2_UCFVISION_1
I_A_2_CMU_ESP_3
I_B_2_UvAMM_2
I_B_1_FXPAL5LNP_5
I_B_1_FXPAL2LNC_2
I_B_1_FXPAL1LN_1
I_B_1_FXPAL4UNC_4
2005: Interactive runs, top 10 by MAP (of 44) (mean elapsed time for all runs: ~15 minutes per topic)
[Figure: precision (y-axis) vs. recall (x-axis) curves for the runs below]
B_2_UvAMM_1
A_2_CMU.MotoX_6
B_2_CMU_Mon_1
A_2_CMU.Snowboarding_S
A_1_FXPAL1LCN_2
A_1_FXPAL0LN_1
A_1_FXPAL4LC_5
B_2_UvAMM_4
B_2_UvAMM_2
A_1_FXPAL2RAN_3
Significant differences among top 8 interactive runs (using randomization test, p < 0.05)
A_2_CMU_See_1 B_2_UvAMM_1
A_2_UCFVISION_1 A_2_CMU_ESP_3 B_2_UvAMM_2 B_1_FXPAL5LNP B_1_FXPAL4UNC
A_2_CMU_Hear_2
Run name (MAP)
A_2_CMU_See_1 (0.303) *
B_2_UvAMM_1 (0.267) >
A_2_CMU_Hear_2 (0.226) >
A_2_UCFVISION_1 (0.225) >
A_2_CMU_ESP_3 (0.216) >
B_2_UvAMM_2 (0.212) >
B_1_FXPAL5LNP_5 (0.210) >
B_1_FXPAL4UNC_4 (0.210) >
2006: Average precision by topic
[Figure: average precision (y-axis, 0 to 1) for topics 173 to 196 (x-axis), showing the maximum and median of interactive, manual and automatic runs; labelled topics: Condoleezza Rice; People in uniform and in formation; Soccer goalposts; Soldiers, police or guards escorting a prisoner; Events]
2005: Average precision by topic
[Figure: average precision (y-axis, 0 to 1) for topics 149 to 172 (x-axis), showing the maximum and median of interactive, manual and automatic runs; labelled topics: Tennis player; Tony Blair; Soccer match goal; People entering/leaving a building]
2006: Interactive runs’ median average precision by topic (highest first)
195: 0.559 (Soccer goalposts)
196: 0.356 (Scenes with snow)
179: 0.355 (People in uniform and in formation)
194: 0.324 (Condoleezza Rice)
178: 0.27 (Saddam Hussein with at least one other person's face)
188: 0.266
181: 0.148
177: 0.137
187: 0.134
183: 0.105
191: 0.092
190: 0.079
182: 0.073
184: 0.071
180: 0.068
185: 0.067
193: 0.066
176: 0.061
186: 0.05
174: 0.049
189: 0.038
173: 0.037 (Tall buildings, more than 4 stories)
175: 0.034 (Soldier/s, police, or guard/s escorting a prisoner)
192: 0.03 (Greeting by at least one kiss)
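Per-topic medians like these summarize all runs of one search type. A sketch of the computation, assuming every run has an AP value for every topic (a run that found nothing for a topic contributes 0.0); the input layout, the function name and the values in the test are hypothetical.

```python
from statistics import median


def median_ap_by_topic(runs):
    """Median AP across runs for each topic, sorted highest first.

    runs: run name -> {topic number: average precision}. Every run is
    assumed to have a value for every topic (0.0 if it found nothing).
    """
    per_topic = {}
    for scores in runs.values():
        for topic, ap in scores.items():
            per_topic.setdefault(topic, []).append(ap)
    medians = {topic: median(aps) for topic, aps in per_topic.items()}
    return sorted(medians.items(), key=lambda item: item[1], reverse=True)
```

Using the median rather than the mean keeps a single very strong or very weak run from dominating the per-topic summary.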
2005: Interactive runs’ median average precision by topic (highest first)
156: 0.56 (Tennis players on the court, both players visible at the same time)
153: 0.546 (Tony Blair)
171: 0.486 (Goal being made in a soccer match)
149: 0.405 (Condoleezza Rice)
151: 0.389 (Omar Karami)
165: 0.339
154: 0.336
155: 0.286
158: 0.275
150: 0.274
152: 0.27
164: 0.258
159: 0.195
161: 0.138
163: 0.098
168: 0.097
169: 0.096
167: 0.074
166: 0.067
157: 0.067
170: 0.065
160: 0.057
172: 0.044
162: 0.013
2006: Manual runs’ median average precision by topic
[Figure: bar chart of manual runs’ median AP per topic, sorted highest first; highest: 178 (0.119), 179 (0.073), 195 (0.061), 181 (0.034), 188 (0.032); lowest: 175, 192, 176]
Labelled topics: 178: Saddam Hussein with at least one other person's face; 179: People in uniform and in formation; 195: Soccer goalposts; 181: One or more soldiers or police with one or more weapons and military vehicles; 188: Something burning with flames visible; 175: Soldier/s, police, or guard/s escorting a prisoner; 192: Greeting by at least one kiss on the cheek; 176: Daytime demonstration or protest with at least part of one building visible
2005: Manual runs’ median average precision by topic (highest first)
151: 0.255 (Omar Karami, the former PM of Lebanon)
152: 0.2 (Hu Jintao, President of the People’s Republic of China)
153: 0.153 (Tony Blair)
171: 0.128 (tall building)
164: 0.076 (ship or boat)
154: 0.07
161: 0.056
165: 0.053
156: 0.048
158: 0.04
149: 0.037
169: 0.032
168: 0.029
155: 0.02
150: 0.016
163: 0.015
160: 0.013
172: 0.009
159: 0.007
170: 0.005
157: 0.004
166: 0.004
162: 0.002
167: 0.002
2006: Automatic runs’ median average precision by topic (highest first)
196: 0.12 (Scenes with snow)
178: 0.117 (Saddam Hussein with at least one other person's face)
195: 0.114 (Soccer goalposts)
188: 0.042 (Something burning with flames visible)
194: 0.039 (Condoleezza Rice)
179: 0.037
177: 0.035
182: 0.024
187: 0.013
183: 0.01
181: 0.007
186: 0.006
184: 0.006
173: 0.006
191: 0.004
185: 0.004
174: 0.003
193: 0.001
192: 0.001
190: 0.001
176: 0.001
175: 0.001 (Soldier/s, police, or guard/s escorting a prisoner)
189: 0 (A group of at least 4 people dressed in suits, seated, and with at least one flag)
180: 0 (US President George W. Bush Jr. walking)
2005: Automatic runs’ median average precision by topic (highest first)
171: 0.166 (Goal being made in a soccer match)
151: 0.165 (Omar Karami, the former PM of Lebanon)
153: 0.157 (Tony Blair)
152: 0.154 (Hu Jintao)
164: 0.084 (ship or boat)
154: 0.05
168: 0.042
156: 0.039
149: 0.037
158: 0.038
169: 0.034
161: 0.032
165: 0.028
163: 0.009
150: 0.008
172: 0.008
160: 0.007
166: 0.004
170: 0.004
157: 0.002
155: 0.001
162: 0.001
167: 0
159: 0
2006: Mean average precision (interactive max) vs. total number relevant
[Figure: scatter plot of maximum interactive average precision per topic (y-axis, 0 to 1) against total number of relevant shots (x-axis, 0 to 800)]
2005: Mean average precision (interactive max) vs. total number relevant
[Figure: scatter plot of maximum interactive average precision per topic (y-axis, 0 to 1) against total number of relevant shots (x-axis, 0 to 1250)]
Who did what?
Speaker slots to follow: Carnegie Mellon University, University of Amsterdam, Columbia University, IBM
Demos? Posters?
Observations 2005
We’re still getting “lots of variation, interesting shot browsing interfaces, mixture of interactive & manual”, and additionally automatic runs;
Top performances on all 3 search types are up, even with more difficult data, but the data and the systems are different … has anybody run a 2004 system on 2005 data?
Some leveraged the structured nature of broadcast news;
Many did automatic search & fewer did interactive search, because it's easier (no users)?
The most common issue explored was the best combination of text vs. image search vs. concepts/features;
Search participants are the “regulars” plus new groups, some bigger, some smaller;
Observations 2006
Top performances on all 3 search types are down:
Test collection is nearly twice as big (79,484 vs. 45,765 shots);
Half the proportion of relevant shots (~9.1% vs. ~18.3%);
Harder topics? Data? ‘Events’ in topics?
Again, an increase in automatic search & fewer did interactive search; almost nobody did manual. Is it easier (no users)? Is topic-to-query translation good enough?
Manual runs no longer outperform automatic: is this because there are so few manual runs, and does it make sense to keep this processing type?