Not All Mementos Are Created Equal: Measuring The Impact Of Missing Mementos
Post on 29-Nov-2014
Not All Mementos are Created Equal: Measuring the Impact of Missing Resources
Justin F. Brunelle, Mat Kelly, Hany SalahEldeen, Michele C. Weigle, Michael L. Nelson
Old Dominion University
{jbrunelle, mkelly, hany, mweigle, mln}@cs.odu.edu
Goal: Automatically measure the quality of the archives
• 20% missing
• 14% missing
• 28% missing
• 7% missing
“Live” XKCD
• Missing 17% of embedded resources
• Looks complete
• Take three resources:
  • Logo
  • Main Comic
  • Navigation Strip
• Relative importance?
• All present in “Live” XKCD
Damaging XKCD
• Created a local memento
• Removed the logo and navigation strip
• Now missing 29% of embedded resources
• Human assessment: looks OK
Damaging XKCD
• From our local memento
• Removed the Main Comic
• Now missing 24% of embedded resources
• Human assessment: Not a usable memento
• Percent of missing embedded resources is not a suitable metric for memento quality
Image Importance
• Size (as percentage of all pixels)
• Position (in viewport?)
• Centrality (in the vertical or horizontal center?)
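The three image factors (size, position, centrality) could be combined into a single importance score roughly as follows. This is a minimal sketch, not the authors' implementation: the `Box` type, the function name, and especially the weights `w_size`, `w_viewport`, and `w_center` are hypothetical, since the slides do not state how the factors are weighted.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Rendered position and size of an image, in pixels (hypothetical type)."""
    left: float
    top: float
    width: float
    height: float


def image_importance(img, page_w, page_h, viewport_h,
                     w_size=1.0, w_viewport=0.5, w_center=0.25):
    """Score an image by size, position, and centrality.

    The weights are illustrative assumptions, not values from the slides.
    """
    # Size: the image's share of all pixels on the page.
    score = w_size * (img.width * img.height) / (page_w * page_h)

    # Position: does the image start inside the initial viewport?
    if img.top < viewport_h:
        score += w_viewport

    # Centrality: does the image overlap the horizontal or vertical center line?
    if img.left <= page_w / 2 <= img.left + img.width:
        score += w_center
    if img.top <= page_h / 2 <= img.top + img.height:
        score += w_center
    return score


# A large, centered comic versus a small logo in the top-left corner.
comic = Box(left=100, top=200, width=600, height=400)
logo = Box(left=0, top=0, width=100, height=50)
comic_score = image_importance(comic, page_w=800, page_h=1000, viewport_h=600)
logo_score = image_importance(logo, page_w=800, page_h=1000, viewport_h=600)
```

Under these assumed weights the comic outranks the logo, which matches the human assessment of the damaged XKCD mementos.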
Missing CSS
• Damage not limited to images
• When CSS is missing, content shifts left
• Partitioned snapshot into thirds
• Background color determined
• Pixel-by-pixel comparison
• Calculated the amount of content in each vertical third
• If ≥80% of the content is in the left third and CSS is missing, the missing CSS is considered important
• Only performed if stylesheets are missing
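The stylesheet check can be sketched on a snapshot represented as a grid of pixel values. This is an illustration under stated assumptions: the slides do not say how the background color is determined, so the sketch takes the most common pixel value as the background, and the function names are hypothetical.

```python
from collections import Counter


def content_share_by_third(pixels):
    """Return the share of non-background pixels in the left, center, right thirds.

    `pixels` is a list of rows; each row is a list of hashable color values.
    Assumption: the most common color is the background.
    """
    flat = [p for row in pixels for p in row]
    background = Counter(flat).most_common(1)[0][0]
    width = len(pixels[0])
    counts = [0, 0, 0]
    # Pixel-by-pixel comparison against the background color.
    for row in pixels:
        for x, p in enumerate(row):
            if p != background:
                counts[3 * x // width] += 1
    total = sum(counts)
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [c / total for c in counts]


def css_is_important(pixels, stylesheets_missing, threshold=0.8):
    """Flag missing CSS as important when content has collapsed to the left third."""
    if not stylesheets_missing:
        # The check is only performed if stylesheets are missing.
        return False
    return content_share_by_third(pixels)[0] >= threshold


# 1 = content, 0 = background; all content sits in the left third.
shifted_left = [[1, 1, 1, 0, 0, 0, 0, 0, 0]] * 2
# Content spread evenly across the three thirds.
spread_out = [[1, 0, 0, 1, 0, 0, 1, 0, 0]] * 2
```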
Percent Missing vs. Weighted Damage
• $M_M$ = Percent of embedded resources missing
$$M_M = \frac{\text{Embedded Resources Missing}}{\text{Total Embedded Resources}}$$
• $D_M$ = Damage rating of missing embedded resources
$$D_M = \frac{D_{M_{Actual}}}{D_{M_{Potential}}}$$
$$D_{M_{Potential}} = \frac{\sum_{i=1}^{n_{[I|MM]}} D_{[I|MM]}(i)}{n_{[I|MM]}} + \frac{\sum_{i=1}^{n_{[C]}} D_{[C]}(i)}{n_{C}}$$
where $I$ = Image, $MM$ = MultiMedia, $C$ = CSS
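The two ratios on this slide can be sketched in a few lines. The resource records (`kind`, `damage`, `missing`) and the per-resource damage values are hypothetical. The slide only spells out the potential damage, so this sketch assumes the actual damage uses the same two averaged terms but sums only over the missing resources.

```python
IMAGE_KINDS = {"image", "multimedia"}
CSS_KINDS = {"css"}


def percent_missing(resources):
    """Percent missing: missing embedded resources over total embedded resources."""
    return sum(1 for r in resources if r["missing"]) / len(resources)


def _averaged_damage(resources, kinds, missing_only):
    """One term of the damage formula: summed damage over the group size."""
    group = [r for r in resources if r["kind"] in kinds]
    if not group:
        return 0.0
    counted = [r["damage"] for r in group if r["missing"] or not missing_only]
    return sum(counted) / len(group)


def damage_rating(resources):
    """Damage rating: actual damage over potential damage.

    Assumption: actual damage restricts the sums to missing resources.
    """
    potential = (_averaged_damage(resources, IMAGE_KINDS, False)
                 + _averaged_damage(resources, CSS_KINDS, False))
    actual = (_averaged_damage(resources, IMAGE_KINDS, True)
              + _averaged_damage(resources, CSS_KINDS, True))
    return actual / potential if potential else 0.0


# Hypothetical page: an important comic image is missing, the rest is present.
page = [
    {"kind": "image", "damage": 0.8, "missing": True},   # main comic
    {"kind": "image", "damage": 0.1, "missing": False},  # logo
    {"kind": "css", "damage": 0.5, "missing": False},    # stylesheet
]
m = percent_missing(page)
d = damage_rating(page)
```

Because the missing image carries most of the image weight, the damage rating here is higher than the plain percent missing, which is the effect the weighted measure is meant to capture.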
Calculated Damage
• Logo and navigation strip removed: $M_M = 0.29$, $D_M = 0.36$
• Main Comic removed: $M_M = 0.24$, $D_M = 0.41$
What do Web users think?
Setting up the Turk Test
• Amazon Mechanical Turk workers (turkers) represent real web users
• Two legs of the experiment:
  • Manually damaged memento vs. live resource
    • 10 manually damaged mementos and resources
  • Real memento vs. real memento
    • 100 URI-Rs, one memento per year
Quantifying Turker Response
• 5 turkers for each comparison
• Assume $D_A < D_B$ (i.e., A is less damaged)
• Measure turker agreement: defined only by 4-1 and 5-0 splits

Image A  Image B  Split  Outcome
5        0        5-0    Agreement
4        1        4-1    Agreement
0        5        0-5    No agreement!
3        2        3-2    Split decision. No agreement!
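The agreement rule can be sketched as a small tally function. The function name and the vote encoding (`"A"` or `"B"` per turker) are hypothetical; the rule itself follows the slides: with A assumed less damaged, only 5-0 and 4-1 splits in favor of A count as agreement.

```python
def tally_votes(votes):
    """Return (split, agrees) for one comparison.

    `votes` holds one entry per turker: "A" if the turker judged image A
    to be less damaged, "B" otherwise. Image A is assumed to have the
    lower damage rating.
    """
    votes_a = votes.count("A")
    votes_b = votes.count("B")
    split = f"{votes_a}-{votes_b}"
    # Agreement is defined only by 5-0 and 4-1 splits.
    agrees = split in ("5-0", "4-1")
    return split, agrees


unanimous = tally_votes(["A", "A", "A", "A", "A"])
near_unanimous = tally_votes(["A", "A", "B", "A", "A"])
split_decision = tally_votes(["A", "B", "A", "B", "A"])
opposite = tally_votes(["B", "B", "B", "B", "B"])
```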
Turk Results
• Compared damage ($D_M$) and percent missing ($M_M$)
  • M0: Manually damaged mementos
  • D: Internet Archive mementos
  • M: Percent missing in Internet Archive mementos
• $D_M$ vs. Live: 78.9% true positives
• $M_M$ vs. Live: 47.2% true positives
  • Worse than a 50/50 chance!
• $D_M$ vs. $D_M$: 58.4% true positives
Damage in the Internet Archive
• 1,000 URI-Rs from Bitly
• 1,000 URI-Rs from Archive-It
• Remove non-HTML representations
• 1,861 URI-Rs remaining
• Sample 1 memento per year from Internet Archive
• Measure damage
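The sampling step could look like the sketch below. The slides do not say which memento is chosen within a year, so picking the earliest capture is an assumption, and the function name and example URIs are hypothetical.

```python
from datetime import datetime


def one_memento_per_year(mementos):
    """Pick one memento per calendar year from (capture_datetime, uri) pairs.

    Assumption: the earliest capture in each year is the one sampled.
    """
    chosen = {}
    for captured, uri in sorted(mementos):
        # setdefault keeps the first (earliest) capture seen for each year.
        chosen.setdefault(captured.year, (captured, uri))
    return [chosen[year] for year in sorted(chosen)]


# Hypothetical capture list for a single URI-R.
captures = [
    (datetime(2011, 6, 1), "http://archive.example/2011-06/page"),
    (datetime(2010, 3, 5), "http://archive.example/2010-03/page"),
    (datetime(2010, 9, 9), "http://archive.example/2010-09/page"),
]
sample = one_memento_per_year(captures)
```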
Damage in the Internet Archive
• Measured Internet Archive mementos
• Damage generally improves over time
• Despite missing more resources over time
Conclusions
• $D_M$ is a better measure of memento quality than $M_M$
• On average, the Internet Archive is improving its quality over time
• The Internet Archive is also missing more embedded resources over time
• Damage weighting can be improved (58.4% correct leaves room for improvement)
• Measure cumulative temporal damage ratings
  • E.g., a logo that never changes for 10 years and is used by 100 mementos is more important than one used in a single memento.