
Ben - cs.technion.ac.ilrani/papers/list-accessing.pdf · t with the Ben tley et al sc heme on b yte and w ordlev el resp ectiv ely The tables giv e the compression ratios that is

Jul 14, 2019


Bentley et al. scheme on word-level

algorithm bib book1 book2 geo news obj1 obj2 paper1 paper2 pic progc progl progp trans average

[Table body not recoverable from this copy: one row of compression ratios per algorithm (parameterized variants of MF, MRI, PRI, MHD and MTF, plus FC, TRANS, MTL, RMHD, COMB, TS, RST, BIT, CTR, SPL and RMTF).]

Table ��: Results of the word-level compression experiment.


Bentley et al. scheme on byte-level

algorithm bib book1 book2 geo news obj1 obj2 paper1 paper2 pic progc progl progp trans average

[Table body not recoverable from this copy: one row of compression ratios per algorithm (parameterized variants of MF, MRI, PRI, MHD and MTF, plus FC, TRANS, MTL, RMHD, COMB, TS, RST, BIT, CTR, SPL and RMTF).]

Table ��: Results of the byte-level compression experiment.


[Codeword tables not recoverable from this copy. Each table has the columns "pos.", "codeword" and "length".]

character list (the last entry is EOF)

Table �: The (�, �, �) start-step-stop code (optimized by the phasing technique) used in the byte-level compression experiment.

space character list

non space character list

Table �: The codes used to transmit list positions in the character lists of the word-level compression experiment.


[Table body not recoverable from this copy. It consists of five blocks, each headed "Sequence geo: Average access costs of" a reference algorithm (two FC variants, two MF variants and TRANS). Each block lists one cost ratio per algorithm: FC, MF, MHD, MRI, PRI, MTF, MTL, TRANS, TS, RST, BIT, CTR, COMB, SPL, RMHD, RMTF and their parameterized variants.]

Table �: Cost ratios on request sequences extracted from file geo of the Calgary Corpus.


[Table body not recoverable from this copy. It consists of five blocks, each headed "Sequence book�: Average access costs of" a reference algorithm (two FC variants, one MF variant and two MRI variants). Each block lists one cost ratio per algorithm: FC, MF, MHD, MRI, PRI, MTF, MTL, TRANS, TS, RST, BIT, CTR, COMB, SPL, RMHD, RMTF and their parameterized variants.]

Table �: Cost ratios on request sequences extracted from file book� of the Calgary Corpus.

(�, �, �) start-step-stop code. The remaining positions are encoded by the prefix ��� followed by an �-bit binary representation, again optimized by the phasing technique.

The word-level compression algorithm of Section � works with two word lists of arbitrary length. For these two lists, we therefore need coding schemes providing an infinite number of codewords. In the experiment, we use an Elias encoding to encode locations in the space word list and �-encoding in the non-space word list. However, the two character lists have only finite length, and we developed special codeword sets for them. We use a simple ���-bit binary code improved by the phasing technique for the �-element space character list. The non-space character list holds ��� characters, thus we need ��� codewords. The first � codewords are taken from a (�, �, �) start-step-stop code. The remaining codewords are provided by a ���-bit binary code (improved by the phasing technique) where each codeword is prefixed by ���. The two character list codeword sets are given in Table �.

F Results of the compression experiment

Tables �� and �� show the results of the compression experiment with the Bentley et al. scheme on byte- and word-level, respectively. The tables give the compression ratios, that is, the number of produced output bits per input character, achieved on the files of the Calgary corpus. The algorithms appear ordered by their average compression ratio with respect to the entire corpus, which is given in the last column.

δ-encoding. For an integer $i \ge 1$, the length of the binary representation of $i$ is encoded by Elias encoding, followed by the binary representation of $i$ excluding the most significant bit. Thus $i$ is encoded with $1 + \lfloor \log_2 i \rfloor + 2\lfloor \log_2(1 + \lfloor \log_2 i \rfloor) \rfloor$ bits.

ω-encoding. To encode $i \ge 1$, the binary representation $\beta(i)$ of $i$ is written. Then the binary representation of $|\beta(i)| - 1$ is appended to the left. This process is repeated recursively and halts on the left with 2 bits. A single zero is appended on the right to mark the end of the codeword.

ω'-encoding. This coding scheme is similar to the ω-encoding, but halts with 3 bits.
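The schemes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes the "Elias encoding" used for the length field to be the standard Elias gamma code, and it implements the recursive scheme as the standard Elias omega code, which may differ in detail from the variant used in the experiments. The function names are hypothetical.

```python
def elias_gamma(i: int) -> str:
    """Standard Elias gamma code: (len - 1) zeros, then the binary form of i."""
    if i < 1:
        raise ValueError("i must be a positive integer")
    binary = bin(i)[2:]
    return "0" * (len(binary) - 1) + binary


def elias_delta(i: int) -> str:
    """Gamma code of the binary length of i, then i without its leading 1 bit."""
    if i < 1:
        raise ValueError("i must be a positive integer")
    binary = bin(i)[2:]
    return elias_gamma(len(binary)) + binary[1:]


def elias_omega(i: int) -> str:
    """Standard Elias omega code (assumed here to match the recursive scheme)."""
    if i < 1:
        raise ValueError("i must be a positive integer")
    code = "0"                    # terminating zero on the right
    while i > 1:
        binary = bin(i)[2:]
        code = binary + code      # prepend the current group
        i = len(binary) - 1       # next group encodes the length minus one
    return code
```

The length of `elias_delta(i)` agrees with the bit count stated for the length-prefixed scheme, which can be checked numerically for small values of `i`.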

If only a finite number of codewords is needed, coding schemes like the ones above are usually inefficient, and techniques like the following often yield better encodings.

Start-step-stop encodings. The start-step-stop family produces a great variety of codes. Each member of the family is specified by three parameters $a$, $b$ and $c$. An $(a, b, c)$ start-step-stop code uses $k$ different binary encodings of lengths $a, a + b, a + 2b, \ldots, c$. That is, it consists of $k = (c - a)/b + 1$ different binary codes. To indicate which binary code $s$ with $1 \le s \le k$ is used, the codeword is prefixed by the unary representation of $s$. Thus, an $(a, b, c)$-code provides

$$\sum_{i=0}^{(c-a)/b} 2^{a+ib} = \frac{2^{b+c} - 2^{a}}{2^{b} - 1}$$

codewords.
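The codeword count can be checked with a small Python sketch. The parameters (3, 2, 9) used in the note below are chosen purely for illustration and are not the parameters of the experiments. The encoder additionally assumes that the unary representation of s is s - 1 ones followed by a zero; the text does not fix this detail, and the function names are hypothetical.

```python
def sss_count(a: int, b: int, c: int) -> int:
    """Number of codewords, summed over the k binary subcodes."""
    if a < 1 or b < 1 or c < a or (c - a) % b != 0:
        raise ValueError("need a >= 1, b >= 1, and c = a + j*b for some j >= 0")
    return sum(2 ** (a + i * b) for i in range((c - a) // b + 1))


def sss_count_closed(a: int, b: int, c: int) -> int:
    """Closed form of the same geometric sum."""
    return (2 ** (b + c) - 2 ** a) // (2 ** b - 1)


def sss_encode(n: int, a: int, b: int, c: int) -> str:
    """Codeword of the n-th value (1-based) in an (a, b, c) code."""
    if not 1 <= n <= sss_count(a, b, c):
        raise ValueError("n is outside the range of the code")
    first = 1                                # first value of the current subcode
    for s in range(1, (c - a) // b + 2):
        width = a + (s - 1) * b              # bit length of the s-th subcode
        size = 2 ** width
        if n < first + size:
            prefix = "1" * (s - 1) + "0"     # assumed unary form of s
            return prefix + format(n - first, f"0{width}b")
        first += size
    raise AssertionError("unreachable")
```

For the illustrative (3, 2, 9) code, both counting functions give 8 + 32 + 128 + 512 = 680 codewords.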

Phasing

If only a finite number of codewords is needed, variable length prefix-free codes can often be improved by the so-called phasing technique.

If integers $i$ with $1 \le i \le m$ are to be encoded, i.e. $m$ codewords are needed, each integer can be encoded with $k = \lceil \log_2 m \rceil$ bits. However, if $m < 2^k$, then $2^k - m$ codewords remain unused. In this case, the coding can be done in the following way. If $i \le 2^k - m$, then $i$ is encoded in $k - 1$ bits by the binary representation of $i - 1$. If $i > 2^k - m$, then $i$ is encoded in $k$ bits by the binary representation of $i - 1 + 2^k - m$.
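The rule above translates directly into code. The following Python sketch is a minimal illustration, assuming m >= 2 and integers numbered from 1; the function name is hypothetical.

```python
def phased_code(i: int, m: int) -> str:
    """Phased binary codeword for integer i, 1 <= i <= m."""
    if m < 2 or not 1 <= i <= m:
        raise ValueError("need m >= 2 and 1 <= i <= m")
    k = (m - 1).bit_length()          # k = ceil(log2(m))
    unused = 2 ** k - m               # codewords a plain k-bit code would waste
    if i <= unused:
        # Short codeword: k - 1 bits holding i - 1.
        return format(i - 1, f"0{k - 1}b")
    # Long codeword: k bits holding i - 1 + (2^k - m).
    return format(i - 1 + unused, f"0{k}b")
```

With m = 5 the five codewords are 00, 01, 10, 110 and 111, so three of the five integers need only 2 bits instead of 3, and no codeword is a prefix of another.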

Start-step-stop codes can often be improved by the phasing technique if not all codewords in one of the binary subcodes are needed. The codes used in the compression experiment have been improved in this way.

Codeword sets used in the compression experiment

In the byte-level compression experiment of Section �, we obtained the best results with a (�, �, �) start-step-stop code. By the above formula, this code provides ��� codewords, however only ��� are needed. Therefore, we applied the phasing technique on the last binary code of length �, thus obtaining the codeword set described in Table �. The colons in the codewords mark the ends of codeword prefixes and are not transmitted.

Positions in the �-element space character list are encoded using a ���-bit binary encoding optimized by the phasing technique. To encode the ��� positions in the non-space list, we developed a special code described in Table � of Appendix E. The first � list positions are encoded by a

[Table body not recoverable from this copy. It consists of three blocks, one per parameter value, headed "Average access costs of" FC, a second FC variant, and MTF, respectively. Each block lists one cost ratio per algorithm: FC, MF, MHD, MRI, PRI, MTF, MTL, TRANS, TS, RST, BIT, CTR, COMB, SPL, RMHD, RMTF and their parameterized variants.]

Table �: Cost ratios on request sequences of length ��� produced by the Markov source of Section ����� for the values � ���� ��� and ���.

E Variable length prefix-free binary encodings

In the compression experiment, we examined various variable length prefix-free binary coding schemes. We now give short descriptions of some standard coding schemes. Then we explain the phasing technique, which can often be applied to improve codeword sets. Finally, we give precise descriptions of the codeword sets used in our experiments.

Standard coding schemes

We start with four coding schemes providing an infinite number of codewords. Such coding schemes are used if no upper bound on the number of codewords (or, alternatively, on the integers to be encoded) is known.

Elias encoding. To encode the integer $i \ge 1$, we transmit the unary encoding of the length of the binary representation of i, followed by the binary representation of i itself, excluding the most significant bit. Thus i is encoded with $2\lfloor \log_2 i \rfloor + 1$ bits.
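The Elias encoding just described can be sketched as follows. This is a minimal illustration and not the codeword sets used in the experiments: the function names `elias_encode` and `elias_decode` are hypothetical, and the unary convention (the value n is written as n − 1 zeros followed by a one) is an assumption, since the text does not fix one.

```python
def elias_encode(i: int) -> str:
    """Return the Elias codeword of i >= 1 as a string of '0'/'1' characters."""
    if i < 1:
        raise ValueError("Elias encoding is defined for integers i >= 1")
    binary = bin(i)[2:]                 # binary representation, most significant bit first
    length = len(binary)                # equals floor(log2 i) + 1
    unary = "0" * (length - 1) + "1"    # assumed unary convention for `length`
    return unary + binary[1:]           # drop the most significant bit (always 1)


def elias_decode(bits: str) -> int:
    """Decode a single codeword produced by elias_encode."""
    length = bits.index("1") + 1        # read the unary part
    tail = bits[length:2 * length - 1]  # the remaining length - 1 bits
    return int("1" + tail, 2)           # restore the implicit leading 1
```

Under these assumptions the codeword of i has floor(log2 i) + 1 unary bits plus floor(log2 i) payload bits, which matches the bit count stated above.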


D Results of the access cost experiment

This section gives the detailed results of the access cost experiments we performed. The algorithms are ranked in terms of the ratio of the algorithm's cost and the cost of the algorithm that performed best among the examined algorithms. Additionally, each table gives the average access cost per request of the best algorithm. Thus, each algorithm's average access cost can be computed as the product of its cost ratio and the average access cost of the best algorithm.

Table � gives the results of the access cost experiment on Zipf sequences with m = ��, � and �� distinct items. The results of the Markov experiment are listed in Table � for the values � = ��, �� and ��. Finally, Tables � and � present the results obtained on the request sequences extracted from the Calgary Corpus.
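As a small worked illustration of this normalization, the sketch below computes cost ratios from average access costs and recovers a cost from its ratio. The algorithm names and numbers are made up for the example and are not taken from the tables; `cost_ratios` and `recover_cost` are hypothetical helper names.

```python
def cost_ratios(avg_costs: dict[str, float]) -> tuple[str, float, dict[str, float]]:
    """Return the best algorithm, its average cost, and every algorithm's cost ratio."""
    best = min(avg_costs, key=avg_costs.get)
    best_cost = avg_costs[best]
    ratios = {name: cost / best_cost for name, cost in avg_costs.items()}
    return best, best_cost, ratios


def recover_cost(ratio: float, best_cost: float) -> float:
    """Average access cost = cost ratio * average access cost of the best algorithm."""
    return ratio * best_cost


# Hypothetical average access costs per request (illustrative values only).
example = {"ALG_A": 2.0, "ALG_B": 3.0, "ALG_C": 2.5}
best, best_cost, ratios = cost_ratios(example)
```

The best algorithm always has ratio 1, so a table of ratios plus one absolute cost carries the same information as a table of absolute costs.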

Table �: Cost ratios on request sequences of length ��� produced by distributions according to Zipf's law. The table has three panels, one per value of m, each headed by the value of m and the average access cost of FC; each panel lists the cost ratios of all examined algorithms (the FC, MF, MHD, MRI and PRI families, TRANS, TS, SPL, COMB, BIT, CTR, RST, RMTF, RMHD, MTF and MTL). [Numeric entries and algorithm parameters lost in extraction.]


last request for x. Let $r_z^1$ and $r_z^2$ be the first and second requests for z after the i'-th request (for x), respectively ($r_z^1$ and $r_z^2$ exist by the definition of z). By the definition of y, y was requested at most once between the i'-th and i-th requests (for x) and therefore was requested at most once between $r_z^1$ and $r_z^2$. By the induction hypothesis for the request $r_z^2$ and by the definitions of MRI(�) and TIMESTAMP, it must be that z passed y at that time (if z was not already in front of y).

It may be the case, though, that y was requested once after the second request for z (i.e. $r_z^2$) and before the current request for x (i.e. the i-th request). By the above derivation, it must be that just before such a request for y, z appears in front of y. We now prove that y will remain after z. By the induction hypothesis for this request for y and by the definitions of MRI(�) and TIMESTAMP, it cannot be that y will pass z, since z must have been requested at least twice between this request for y and the previous one. Therefore, z must be in front of y and the invariant holds.

C Proof of Lemma �

To see the first part of the lemma, we observe that the expected number of subsequent requests in a local set is

�Xt��

t � t�� � ��� �

It is straightforward to verify that the value of this sum is �����.

To show the second part of the lemma, we make use of a result from the theory of Markov chains. This result tells us that the stationary distribution of a Markov source with transition matrix P is the unique probability vector $(p_1, \ldots, p_m)$ satisfying

$(p_1, \ldots, p_m) \cdot P = (p_1, \ldots, p_m).$ (�)

We show that, if P is the transition matrix defined in equation (�) and the probability vector $(p_1, \ldots, p_m)$ is defined according to equation (�), condition (�) is satisfied, which implies the second part of the lemma.

Let $(a_1, \ldots, a_m)$ be the vector resulting from the matrix multiplication on the left side of equation (�), i.e.

$(a_1, \ldots, a_m) := (p_1, \ldots, p_m) \cdot P$

� �q�k� � � � �

qnk� � � � �

q�k� � � � �

qnk �

�BBBB�

Q ���k��Q � � � ���

k��Q���k��Q Q � � � ���

k��Q���

���� � �

������k��Q

���k��Q � � � Q

�CCCCA �

Remember that $m = k \cdot n$ and that Q is the $(n \times n)$-matrix with all rows identical to the vector $(q_1, \ldots, q_n)$. Because of the form of the matrix P and because the entries of $(p_1, \ldots, p_m)$ repeat with period n, we can restrict our attention to the first n entries $(a_1, \ldots, a_n)$ and the first n columns of P. The entry $a_j$ is the scalar product of $(p_1, \ldots, p_m)$ and the jth column of P, i.e.

aj �nX

���

q�k� qj � �k � �

nX���

q�k� ��

k � �qj �

Simplifying the right-hand side yields

aj � qjk� ���

qjk�

qjk� pj �

Thus $a_j = p_j$ and the lemma follows immediately.
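The stationarity condition used in this proof can also be checked numerically. The sketch below is only an illustration under stated assumptions: it assumes a block-structured transition matrix whose k diagonal blocks are `alpha * Q` and whose off-diagonal blocks are `(1 - alpha) / (k - 1) * Q`, with Q having all rows equal to `q`. The name `alpha` for the locality parameter, the concrete numbers, and all function names are hypothetical and not taken from the paper.

```python
def block_transition_matrix(q: list[float], k: int, alpha: float) -> list[list[float]]:
    """Build an (k*n) x (k*n) matrix from n x n blocks that are multiples of Q."""
    n = len(q)
    m = k * n
    P = [[0.0] * m for _ in range(m)]
    for row in range(m):
        for col in range(m):
            same_block = (row // n) == (col // n)
            weight = alpha if same_block else (1.0 - alpha) / (k - 1)
            P[row][col] = weight * q[col % n]   # every row of Q equals q
    return P


def vec_times_matrix(p: list[float], P: list[list[float]]) -> list[float]:
    """Row vector times matrix: a_j is the scalar product of p and column j of P."""
    m = len(p)
    return [sum(p[i] * P[i][j] for i in range(m)) for j in range(m)]


def candidate_stationary(q: list[float], k: int) -> list[float]:
    """The vector (q_1/k, ..., q_n/k) repeated k times, i.e. with period n."""
    return [q_j / k for _ in range(k) for q_j in q]


q = [0.5, 0.3, 0.2]          # illustrative distribution, sums to 1
P = block_transition_matrix(q, k=3, alpha=0.7)
p = candidate_stationary(q, k=3)
a = vec_times_matrix(p, P)   # should reproduce p up to rounding
```

Because the candidate vector repeats with period n, comparing the first n entries of `a` and `p` would already suffice, which mirrors the restriction made in the proof.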


A Proof of Inequality (�) and Lemma �

Denote by $\mathrm{OPT}_{Pxy}(\sigma)$ the total cost incurred by OPT for paid exchanges between x and y while serving $\sigma$. Then a very similar development to the one leading to identity (�), with the inclusion of the costs incurred by OPT for paid exchanges, easily yields

$\mathrm{OPT}(\sigma) = \sum_{\{x,y\} \subseteq L,\ x \neq y} \left[ \mathrm{OPT}_{xy}(\sigma) + \mathrm{OPT}_{Pxy}(\sigma) \right].$ (�)

Also, it is not hard to see that OPT does not in general satisfy the Pairwise Property. Nevertheless, it can be easily shown that OPT does satisfy the following inequality for every pair of items x and y:

$\mathrm{OPT}(\sigma_{xy}) \le \mathrm{OPT}_{xy}(\sigma) + \mathrm{OPT}_{Pxy}(\sigma).$ (�)

To prove this inequality, notice that the right-hand side of (�) gives the total cost of some offline (not necessarily optimal) algorithm that is a "projection" of OPT over x and y. Namely, this projected algorithm operates on $L_{xy}$ and serves the request sequence $\sigma_{xy}$ according to the relative order of x and y in L as maintained by OPT while serving $\sigma$. An optimal offline algorithm for the two-item list, whose total cost in serving $\sigma_{xy}$ is the left-hand side of (�), surely pays no more than any other offline algorithm, so the inequality must hold.

Equation (�) combined with inequality (�) yields inequality (�). That is,

$\sum_{\{x,y\} \subseteq L,\ x \neq y} \mathrm{OPT}(\sigma_{xy}) \le \mathrm{OPT}(\sigma).$ (�)

The extension of inequality (�) to the full-cost model can be made using an extended cost function instead of $\mathrm{ALG}(x, r_j)$. Define the extended cost function as follows:

ALG�x� rj �

�����

� � ���� if x is in front of rj�

���� if rj is in front of x�

� otherwise �x � rj�

���

This cost function indirectly counts also the last (ith) comparison that must be counted in the full-cost model (that is, when we access the ith item). It is not hard to verify that with this extended cost function the equality (�) holds within the full-cost model. Analogous to $\mathrm{ALG}_{xy}(\sigma)$, we now define the corresponding quantity using this extended cost function and accordingly modify the equality (�), Equation (�) and the inequality (�). This proves Lemma �.

B Proof of Lemma �

Fix any request sequence $\sigma$. We prove by induction on i, the index of the ith request, that the invariant holds. The base case, i = �, trivially holds. Assume the induction hypothesis for all j < i. We prove that the invariant holds for the ith request for an item x. (We refer to this request as "the current request for x"; the previous request for x is referred to as "the last request for x".) Consider the configuration of the list L (of either MRI(�) or TIMESTAMP). Suppose, by contradiction, that the invariant does not hold. Let y be the first element in L that was requested less than twice since the last request for x. Let z be the last element in L that was requested twice or more since the last request for x. By the contradiction assumption, y appears in front of z. Let i' be the index of the


[�] T. Bell, J.G. Cleary, and I.H. Witten. Text Compression. Prentice Hall, ���

[�] J.L. Bentley and C. McGeoch. Amortized analysis of self-organizing sequential search heuristics. Communications of the ACM, ������������� ����

[�] J.L. Bentley, D.D. Sleator, R.E. Tarjan, and V.K. Wei. A locally adaptive data compression scheme. Communications of the ACM, ������������ �� �

[�] A. Borodin and R. El-Yaniv. Online Computation and Competitive Analysis. Cambridge University Press, ���

[�] M. Burrows and D.J. Wheeler. A block-sorting lossless data compression algorithm. Technical Report ���, Digital Systems Research Center, ���

[��] D. Grinberg, S. Rajagopalan, R. Venkatesan, and V.K. Wei. Splay trees for data compression. ���

[��] J.H. Hester and D.S. Hirschberg. Self-organizing linear search. ACM Computing Surveys, ������������ ����

[��] S. Irani. Two results on the list update problem. Information Processing Letters, ��� ��������� June ���

[��] S. Irani, N. Reingold, J. Westbrook, and D.D. Sleator. Randomized competitive algorithms for the list update problem. pages ����� �� ���

[��] A.R. Karlin, L. Rudolph, and D.D. Sleator. Competitive snoopy caching. ���������� ����

[��] J. McCabe. On serial files with relocatable records. ��� �� ��� July � ��

[��] K. Mehlhorn, S. Näher, M. Seel, and C. Uhrig. The LEDA user manual � version ���. Max-Planck-Institut für Informatik, Saarbrücken, ���

[��] N. Reingold and J. Westbrook. Randomized algorithms for the list update problem. Technical Report YALEU/DCS/TR-���, Yale University, June ���

[��] N. Reingold, J. Westbrook, and D. Sleator. Randomized competitive algorithms for the list update problem. ��������� ���

[��] R. Rivest. On self-organizing sequential search heuristics. Communications of the ACM, ���� �� �� February �� �

[��] F. Schulz. Two new families of list update algorithms. In ISAAC ���, LNCS ���, pages �����. Springer, ���

[��] D.D. Sleator and R.E. Tarjan. Amortized efficiency of list update and paging rules. Communications of the ACM, ������������� ����

[��] B. Teia. A lower bound for randomized list update algorithms. Information Processing Letters, ������ ���

[��] A. Tenenbaum. Simulations of dynamic sequential search algorithms. Communications of the ACM, ����������� ����

[��] J. Westbrook. � �, personal communication.


be aware that, because of the nature of the examined request sequences, our experimental results are hardly appropriate to support theoretical results obtained by competitive analysis.

In the access cost experiment, we tried to tie in with previous experimental studies and to extend their results to our large algorithm set. In contrast to previous studies, however, we also tried to cover a wider range of request sequences. The fact that we obtained quite different rankings for different request sequences indicates that, at least for the online list accessing problem, it is dangerous to restrict experimental studies to one particular class of input sequences. This may lead to false conclusions. We used an experimental approach to examine the influence of locality in request sequences on the performance of online list accessing algorithms. Our experiments show that the degree of locality has a considerable influence both on the algorithms' costs and on their ranking. It would be of major importance to devise a meaningful, quantitative measure of locality of reference that could be used to classify request sequences and further investigate the correlation between various algorithms and their performance with respect to sequences. To the best of our knowledge, no such measure has been studied. Also, it would be of great interest to put together an appropriate corpus that could be used to test the performance of data structures and algorithms for dictionary maintenance.

The results concerning compression performance clearly indicate that the list accessing compression scheme by itself will not give compression ratios that are competitive with popular compression algorithms such as those based on Lempel-Ziv schemes. Nevertheless, it is remarkable that a compression scheme as simple as the Bentley et al. scheme on byte-level is already capable of performing compression. Burrows and Wheeler [�] used the Bentley et al. scheme with MTF-encoding as a backend for their very powerful block-sorting compression algorithm. For example, this method is used in the BZIP compression software. Our results in the access cost experiment with sequences generated by the BW scheme suggest that using other list accessing algorithms such as MF instead of MTF might yield even better results. Finally, the results of the access cost experiment suggest that it would be very interesting to experiment with dynamic transitions between different basic list accessing algorithms in order to adapt to changing levels of locality of reference.

Acknowledgments

We thank Susanne Albers, Allan Borodin, Brenda Brown, David Johnson, Steve Ponzio, Jeffrey Westbrook, and the anonymous referees for very useful comments that greatly improved the presentation and content.

References

[�] S. Albers. Improved randomized on-line algorithms for the list update problem. pages ����������

[�] S. Albers and M. Mitzenmacher. Average case analyses of list update algorithms, with applications to data compression. Technical Report TR-�����, International Computer Science Institute, ���

[�] S. Albers, B. von Stengel, and R. Werchner. A combined BIT and TIMESTAMP algorithm for the list update problem. ���

[�] R. Arnold and T. Bell. A corpus for the evaluation of lossless compression algorithms. In Data Compression Conference, pages ��������, ���


fact that for all corpus files, more than half of the non-space words (and these dominate the costs) occur only once. Thus inserting a new word at the front of the list usually has the only effect of making the encoding of future words more expensive. The members of the MHD family appear in the second part of the field only, with the extreme members MTF and TRANS (but also MHD(�) and MHD(���)) ranking among the worst algorithms.

For the randomized algorithms, we make similar observations as in the byte-level experiment. Again, RMHD is by far the best randomized algorithm, and the ranking of the randomized algorithms almost matches their competitive ratio ranking (with the algorithms BIT and CTR being transposed in that ranking).

Finally, we note that the results of this experiment support the qualitative results of the Albers-Mitzenmacher compression experiment [�]. Namely, the compression ratios obtained by algorithm TIMESTAMP were consistently better than those obtained by MTF-compression.

��� Some notes on the compression experiment

The word-level compression experiment shows that the Bentley et al. compression scheme may yield good results if it makes use of some prior knowledge about the structure of the input, as our word-level setting did by assuming text inputs and parsing the input file into words. On the other hand, if the input is not of the assumed structure, schemes depending on prior knowledge usually perform poorly. Our compression experiment illustrates this problem by the poor results obtained for the highly compressible corpus file pic. Note that if one expects a "universal" compression algorithm that compresses well on average (i.e. averaged over all the inputs it will ever see), then the compression algorithms in our word-level setting are not acceptable. Examples of universal or more powerful compression algorithms are the Context-tree weighting method, Ziv-Lempel algorithms, the PPM (prediction by partial match) schemes, and the block-sorting algorithm. These algorithms usually achieve better compression ratios than our word-level compression setting, though the latter is already a rather complicated variant of the basic Bentley et al. scheme.

Concluding remarks

The MRI family presented here exhibits some very attractive features. Nevertheless, notice that the implementation of the algorithm MRI(m) is quite expensive in terms of time and memory. For the implementation of MRI(m) we need to maintain, for each item x on the list, an m-ary vector $T_x$ containing the times (indices of requests) of the last m requests for x.

This paper leaves some questions open. For instance, it remains to determine the competitive ratio of the algorithms in the PRI family. We conjecture that for each m � �, PRI(m) is �-competitive.

Perhaps a more interesting goal would be to identify "the most conservative" algorithms that are still optimal, thus expanding the set of optimal list accessing algorithms (with different characteristics) even more. Note that this question as stated is not well defined and, in fact, one crucial step towards answering this question would be to define a meaningful measure of "conservatism".

To our knowledge, our experimental study is the first one comparing a large number of different online list accessing algorithms. Even so, a few interesting algorithms were left aside, mainly because they were introduced to us after we conducted our experiments. Two such interesting families of algorithms were developed by Schulz [��]: sort-by-rank and sort-by-delay.

Some of the experimental results reported in this paper stand in contrast to various theoretical studies of the list accessing problem. In the case of the access cost experiment, the reader should
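The bookkeeping cost mentioned here can be made concrete with a short sketch. It shows only the per-item history of the last m request times, not the reorganization rule of MRI(m), which is defined elsewhere in the paper; the class name `RequestHistory` and its methods are hypothetical.

```python
from collections import defaultdict, deque


class RequestHistory:
    """Keep, for each item x, the indices of the last m requests for x (the vector T_x)."""

    def __init__(self, m: int) -> None:
        self.m = m
        self.clock = 0  # index of the most recent request
        # A bounded deque drops the oldest time automatically once m entries are stored.
        self.times = defaultdict(lambda: deque(maxlen=m))

    def record(self, item) -> int:
        """Register a request for `item` and return its request index."""
        self.clock += 1
        self.times[item].append(self.clock)
        return self.clock

    def last_requests(self, item) -> list[int]:
        """Return the stored request times for `item`, oldest first."""
        return list(self.times.get(item, ()))


history = RequestHistory(m=2)
for requested in ["a", "b", "a", "a"]:
    history.record(requested)
```

With this layout the memory overhead is m stored indices per list item, which is the cost the paragraph above refers to.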


��� Results of the compression experiment

The detailed results of the compression experiment can be found in Appendix F. Compression performance is usually measured in terms of the compression ratio, which is the number of calculated output bits per input character. Tables �� and �� in Appendix F give the compression ratios achieved by all algorithms on each corpus file for the byte- and the word-level compression experiment, respectively.

Let us first turn to the byte-level compression experiment. Almost all list accessing algorithms indeed compress all input files, although the compression scheme is very simple and has no pre-knowledge about the input. Only some members of the MF(k) family expand some of the input files; these algorithms do not reorganize the front part of their lists. However, the compression performance of all algorithms is poor compared with standard compression programs such as compress and gzip. The best results are obtained by members of the families MF�(k) and MHD(k) for small values of k, FC, MRI(�) and PRI(�). Note that none of these algorithms is optimal or even competitive in terms of the competitive ratio. We observe that the members of the families MF�(k) and MHD(k) appear ordered by increasing k (modulo some exceptions for small values of k). The same holds for the families MRI(m) and PRI(m) with increasing m. This is remarkable, because these algorithms have more knowledge about past requests (or symbols) for greater values of m, but apparently they cannot take advantage of this knowledge (w.r.t. this data set). The MRI family seems to be superior to the PRI family: MRI(m) outperforms PRI(m) for all values of m. Another remarkable observation is a considerable decline of the compression performance between MRI(�) and MRI(�), as well as between PRI(�) and PRI(�). The algorithm MTF is among the worst performing algorithms.

We observe that the randomized algorithms perform considerably worse than the best deterministic ones. Algorithm RMHD is by far the best randomized algorithm. A striking fact is that the ranking of the randomized algorithms exactly matches their competitive ratio ranking. For the algorithm COMB, it is important to mention that the best performance was consistently due to the deterministic algorithm TIMESTAMP and not to BIT.

In the word-level compression experiment, we observe again that all algorithms compressed all input files (except for some MF(k) algorithms that could not compress the file pic). The compression performance is considerably better than in the byte-level compression experiment. This is because longer substrings are encoded by relatively short codewords on the text files of the Calgary Corpus, where our word-parsing is natural. However, we observe that on those inputs where our word-parsing is not natural, namely the binary files of the corpus, the word-level performance may be worse than in the byte-level setting. The relative inadequacy of our setting for these files is supported by the following observation made for all algorithms. For the binary files, most of the output bits were contributed by the character lists, whereas the word lists added only a marginally small fraction (e.g. for algorithm MTF and file geo, the character lists produced about ��% of all output bits). During the compression of the text files, however, the word and character lists contributed comparable fractions of bits to the output (e.g. for algorithm MTF and file bib, the character lists contributed about ��% of all output bits).

The best results are obtained by members of the MF�(k) family for small values of k, the MRI family (in particular MRI(�)), PRI(�), PRI(�) (i.e. TIMESTAMP) and FC. Again the members of the families MRI(m) and PRI(m) appear ordered by increasing m, and we observe a considerable increase in cost between MRI(�) and MRI(�). For all considered values of m, MRI(m) outperforms PRI(m). In the word-level experiment, however, the variants MRI�(m) and PRI�(m), which insert new items at the front, have significantly higher costs than MRI(m) and PRI(m). The reason for this is the


file; characters (total, distinct, ratio); non-space words (total, distinct, ratio); space words (total, distinct, ratio)

bib ����� �� ��� ���� ���� ���� ���� �������

book� �� �� ��� ����� ��� �� ����� � �������

book� ���� � �� ������ ���� � ������ �� �������

geo ������ �� ��� �� ��� ���� ��� �� �����

news ���� �� ���� ����� ���� ��� ����� ��� ������

obj� ����� �� �� ��� ���� �� ����

obj� ����� �� �� ��� ���� �� ��� � ������

paper� ���� �� ��� ���� ��� ���� ���� �� �����

paper� ����� �� ��� ����� ���� ���� ����� �� �������

pic ����� ��� ��� �� �� ���� �� ���

progc ���� �� ��� ��� ��� ��� ��� ����

progl �� � ��� ���� ���� ���� ���� � ������

progp ���� �� ��� ��� ���� ��� ��� ��� ����

trans ���� �� �� ���� �� ��� ���� � �����

Table �: The Calgary Corpus. Total number of symbols, number of distinct symbols and the ratio between them with respect to byte- and word-parsing of the Calgary Corpus files. The corpus files can be downloaded from the ftp site ftp.cpsc.ucalgary.ca.

[�] compared the compression performance of TIMESTAMP and MTF with respect to the Calgary Corpus. They considered both word and byte (i.e. character) parsings. Using Elias encoding (see Appendix E), they obtained the following results: TIMESTAMP-compression is significantly better than MTF-compression with respect to byte parsing, but both are not "competitive" with standard Unix compression utilities. With respect to word parsing, TIMESTAMP is often (only) marginally better than MTF compression. The word-based compression performance of both these algorithms is found to be close to that of standard Unix compression utilities. However, note that the results of this experiment do not count the encoding of new words that are not already on the list. A few other studies compare the performance of particular, more sophisticated list accessing compressors that alter the basic scheme. Burrows and Wheeler [�] tested the performance of an MTF-compressor that operates on data that is first transformed via a "block-sorting" transformation. Grinberg et al. [��] tested the performance of an MTF-compressor that uses "secondary lists". We note (based on their results and ours) that these more sophisticated schemes achieve in general better compression results than the basic scheme.

From the above, the only comparable studies are those of Bentley et al. and Albers and Mitzenmacher. First we note that our study supports the qualitative results of both these studies. However, they are not comparable quantitatively (the Bentley et al. study is incomparable to ours as it used a different corpus; the quantities reported in the Albers-Mitzenmacher study are incomparable to ours as they have not measured the transmission costs of new words).

Compared to the known empirical studies, the results reported here are significantly more comprehensive and provide insights into many algorithms, among which are some that have never been tested.


encoded by its current position in the corresponding list using some prefix-free code. If a new word is encountered and ℓ is the current length of the list, we encode the list position ℓ + 1, thus switching into a character mode to transmit the characters of the new word. Here again, we use two lists, one for space characters and one for non-space characters. Furthermore, both lists contain an escape character to encode the end of the new word. After encoding the end of the new word, we switch back into word mode, and the new word is inserted in the word list according to the applied list accessing algorithm. To encode the end of the input sequence, we switch into character mode and then immediately back into word mode (this can be viewed as transmitting an empty word). All four lists are managed by the same list accessing algorithm. We use an Elias code to encode the positions in the space word list, and a �-code to encode the positions in the non-space word list. We developed two special codes to encode the positions in the �-element space character list and the ���-element non-space character list. These two codes are described in Appendix E.

In both variants of the Bentley et al. compression scheme, we are only interested in measuring the compression performance and ignore space and time complexity issues. Also, none of the algorithms was actually implemented to compress and decompress data. However, both settings are completely realistic from a practical point of view, because they take into account everything that is necessary to compress and decompress the data. In particular, we charge the costs contributed by escape symbols in our settings. This is often neglected, but we consider escape symbols an essential part of the compressed representation of an input sequence.
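The word-mode part of this scheme can be sketched in a few lines. The sketch is deliberately simplified and rests on several assumptions: it uses a single word list instead of separate space and non-space lists, it uses MTF as the list accessing algorithm and inserts new words at the front, it emits raw positions instead of prefix-free codewords, and it passes a new word along as a literal in place of the character mode with its escape character. The function names are hypothetical.

```python
def mtf_encode_words(words):
    """Encode words as 1-based list positions; a new word gets position (list length + 1)."""
    word_list = []   # index 0 is the front of the list
    output = []
    for w in words:
        if w in word_list:
            pos = word_list.index(w) + 1
            output.append((pos, None))
            word_list.pop(pos - 1)
        else:
            # Escape: position l + 1 signals a new word; the literal stands in
            # for the character-mode transmission described in the text.
            output.append((len(word_list) + 1, w))
        word_list.insert(0, w)   # MTF: requested or new word goes to the front
    return output


def mtf_decode_words(tokens):
    """Invert mtf_encode_words by replaying the same list updates."""
    word_list = []
    words = []
    for pos, literal in tokens:
        w = word_list.pop(pos - 1) if literal is None else literal
        word_list.insert(0, w)
        words.append(w)
    return words


sample = "a b a a c b".split()
tokens = mtf_encode_words(sample)
```

Replacing the two `insert(0, w)` calls with another update rule would model a different list accessing algorithm, as long as encoder and decoder apply the same rule.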

��� The data set

Exact analyses of compression algorithms are di�cult� because they require a precise mathematicaldescription of the input source� Real data is usually too complex for such a description�

In the practical �eld of data compression� an algorithm�s performance is usually measuredexperimentally with respect to a benchmark corpus which is a relatively large data set supposedto be representative for future inputs to the algorithm� Such benchmark data sets exist for various�elds of data compression such as text compression� or compression of images or audio data�

Bell� Cleary� and Witten ��� collected the Calgary Corpus� This corpus has become the standardcorpus for lossless compression algorithms� It consists of �mainly text �les from various application�elds including books� papers� numeric data� a picture� programs and object �les� Descriptions ofthe �les can be found in ���� Table � lists some properties of the corpus �les�

Recently, Arnold and Bell [�] proposed the Canterbury Corpus as an alternative corpus for lossless compression algorithms. The motivation for the new corpus was the fear of compression algorithms being tuned for the standard corpus and the possibility of the Calgary Corpus being too outdated for today's applications. However, after having tested various compression algorithms on the new corpus, the authors admit that there are no considerable qualitative differences in the results obtained on the two data sets, and conclude that the Calgary Corpus is still appropriate to measure the performance of compression algorithms.

��� Previous experimental studies

A few empirical tests of the performance of some list accessing algorithms applied to text compression have been conducted. Bentley et al. [�] tested the performance of MTF-compression algorithms with various list ("cache") sizes with respect to several text files (containing several C and Pascal programs, book sections and transcripts of terminal sessions). They also compared the algorithms' performance with that of Huffman coding compression and found that for a sufficiently large cache size (e.g. ��) MTF-compression is "competitive" with Huffman coding. Albers and Mitzenmacher


sequence    length    distinct    sequence    length    distinct

book� �     ��        ��          geo �       ������    ��
book� �     ��        �           geo �       ������    �
book� �     ��        ��          geo �       ������    ��
book� �     ����      ��          geo �       ��        ��
book� �     ����      �           geo �       ��        �

Table �: Lengths and numbers of distinct items in the request sequences extracted from the Calgary Corpus.

�) sequences in the experiment, because the "raw", unprojected sequences produce rather long lists, and this setting is rather untypical for linear-list applications. However, the results indicate that there is a relation between the projected and unprojected sequences.
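The three building blocks used to derive the request sequences (the BW transformation, the collapsing of runs, and the projection by a modulus) can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the modulus is left as a parameter because its value is not legible here, and the BW transformation is the naive variant that sorts all rotations and returns only the permuted string (no index for inversion, which a request sequence does not need).

```python
def bwt(data: bytes) -> bytes:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    n = len(data)
    if n == 0:
        return b""
    doubled = data + data
    # Rotation i is doubled[i:i + n]; sort the rotation start indices.
    order = sorted(range(n), key=lambda i: doubled[i:i + n])
    return bytes(doubled[i + n - 1] for i in order)


def collapse_runs(seq: bytes) -> bytes:
    """Replace each run of identical consecutive requests by a single request."""
    out = bytearray()
    for b in seq:
        if not out or out[-1] != b:
            out.append(b)
    return bytes(out)


def project(seq: bytes, modulus: int) -> bytes:
    """Take every byte modulo `modulus`, which shortens the resulting list."""
    return bytes(b % modulus for b in seq)
```

Under these assumptions, the five sequences for a file `raw` would be `raw`, `project(raw, r)`, `bwt(raw)`, `collapse_runs(bwt(raw))` and `project(collapse_runs(bwt(raw)), r)` for the chosen modulus `r`.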

The raw results of the access cost experiment on the request sequences extracted from theCalgary Corpus are summarized in Tables and � of Appendix D�

� Experiment � Compression performance

��� Description of the compression algorithm

Bentley et al. [�] proposed a compression scheme based on online list accessing algorithms. In this scheme, all possible symbols (e.g. characters or words) occurring in the input string are stored in a linear list. Whenever a symbol is to be encoded, a binary encoding of its current list position is transmitted (using some variable-length prefix-free code). Before the next symbol is encoded, the list is reorganized by a list accessing algorithm. If the symbol is not in the list, it has to be transmitted specifically. To restore the data, the receiver performs the inverse operations. That is, the receiver recovers each symbol whose location was transmitted and updates its symbol list using the same list accessing algorithm. As the receiver has no knowledge about locations transmitted in the future, the list accessing algorithm has to work online. In the experiment described in this section, we tested the performance of all algorithms in a byte-level and a word-level variant of the Bentley et al. compression scheme.
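The sender/receiver symmetry described above can be sketched in Python for the simplest case. The sketch makes several assumptions that are not taken from the paper: the list initially holds all 256 byte values, MTF is used as the list accessing algorithm, positions are counted from 1, and the positions are returned as plain integers rather than being passed through a prefix-free code. The function names are hypothetical.

```python
def mtf_encode(data: bytes) -> list[int]:
    """Sender: emit the current list position of each byte, then update the list."""
    symbols = list(range(256))                 # full initial list (assumption)
    positions = []
    for byte in data:
        pos = symbols.index(byte)
        positions.append(pos + 1)              # transmit the 1-based position
        symbols.insert(0, symbols.pop(pos))    # MTF update
    return positions


def mtf_decode(positions: list[int]) -> bytes:
    """Receiver: look up each position and apply the same list update."""
    symbols = list(range(256))
    out = bytearray()
    for p in positions:
        byte = symbols[p - 1]
        out.append(byte)
        symbols.insert(0, symbols.pop(p - 1))  # identical MTF update
    return bytes(out)
```

Because both sides apply the same online update after every symbol, their lists stay identical and the positions alone suffice to restore the input. Any other online list accessing algorithm could replace the MTF update in both functions.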

In the byte-level experiment, we start with a full list storing all �� characters of the ASCII character set. Thus, we need a prefix-free code providing ��� codewords for the possible list positions and an additional end-of-file symbol. We performed the byte-level experiment with several variable-length prefix-free encodings including Elias, �-encoding, �-encoding and several start-step-stop encodings. A brief description of these schemes can be found in Appendix E. The best results were obtained with a (�, �, �)-start-step-stop code improved by the phasing technique (see Table � in Appendix E).

In the word-level experiment, two disjoint word sequences are extracted from the input sequence. We denote all characters of the ASCII character set for which the C library function isspace returns a non-zero value as space characters. All other characters are called non-space characters. A space word then is a maximum-length sequence of consecutive space characters between two non-space characters. Analogously, a non-space word is a maximum-length sequence of consecutive non-space characters between two space characters. We use two initially empty word lists which can have arbitrary length. The first list stores all space words, the second list stores all non-space words. In the input sequence, space words and non-space words alternate, and the next word is always


this is confirmed by the tables in Appendix D, that FC performs best while α � ��. Only when the locality becomes very strong is FC outperformed by other algorithms that react faster to local changes.

We now turn to the complete results summarized in the tables of Appendix D. We start with the case α = �.� and make the following observations. The algorithm FC performs best. The costs of the MHD(k) algorithms increase with k. The same holds for the algorithms of the MRI(m) and PRI(m) families when m increases. Furthermore, MRI(m) outperforms PRI(m) for each m. The MF�(k) family performs well as long as k is relatively small. All randomized algorithms perform considerably worse than the best deterministic algorithms.

In the case α = �.�, FC is still the best algorithm; however, its lead over all other algorithms has decreased: the costs of the worst algorithm are only ��% higher. The algorithms of the MHD(k) family still appear in order of k. MRI(m) still outperforms PRI(m) for each m, but for both families the members for extreme values of m have higher costs than the members for modest values of m like � ≤ m ≤ �. Again the MF�(k) family performs well for small k. All randomized algorithms appear at the end of the list, without exception having higher costs than MTF, which is the second worst performing deterministic algorithm.

Let us now consider the interesting case of α = �.�, where we have strong locality in our request sequence. We observe that the results in this case are extremely different from the results for the other values of α and in some respects are even reversed. The algorithm FC is now among the worst algorithms, and its costs are more than ��% above the costs of the now best performing algorithm, MTF. The algorithms of the families MHD(k), MRI(m) and PRI(m) now appear ordered by decreasing k or m, respectively. Furthermore, PRI(m) outperforms MRI(m) for each considered value of m. This time the members of the MF�(k) family show a relatively poor performance, whereas the randomized algorithms yield relatively good results.

The Markov model we used in this section to examine the influence of locality is certainly debatable. However, it is simple, and we were able to show some properties indicating that it is a reasonable abstraction of locality. In any case, the model was sufficient for our purposes, namely to illustrate that the ranking of list accessing algorithms depends strongly on properties of the served request sequence.

��� Request sequences extracted from the Calgary Corpus

In the third and last access cost experiment, we extracted request sequences from some files of the Calgary Corpus, basically following the approach of Bentley and McGeoch [�], who extracted request sequences from several Pascal and text files. More precisely, we extracted five sequences from each of the two corpus files book� and geo, which are qualitatively rather different. Whereas book� contains pure English text and thus numerous repeating substrings, geo is a concatenation of ��-bit numbers. The first extracted sequence is the file itself, i.e. each individual byte is considered a request. For the second sequence, we considered each byte modulo �, thus obtaining a related sequence producing a shorter list. The remaining sequences were obtained by application of the Burrows-Wheeler transformation [�] (BW transformation for short). For a finite input string, the BW transformation calculates a permutation in which there usually is a strong locality. We obtained the third request sequence by application of the BW transformation to the raw corpus file. The fourth sequence was obtained from the third sequence by replacing each run of consecutive requests to the same item by one single request to that item. Finally, the fifth sequence was obtained by taking each byte in the fourth sequence modulo �. Table � shows the lengths and numbers of distinct items in the resulting request sequences. We decided to include the "projected" (modulo


[Figure: average cost per request plotted against the degree of locality for the algorithms FC, TRANS, MTF, MRI and TIMESTAMP.]

Figure �: Average access cost per request of some selected algorithms when serving sequences of different degrees of locality.

�.�.� Markov sequences: discussion of results

For the experiment, we set k = n = � and chose the probabilities q_j for the distribution (q_1, . . . , q_n) according to Zipf's law. The probability α was varied in order to examine the influence of locality. We produced request sequences of length ��� for all values α = �.�, �.�, . . . , �.�. The initial state was chosen at random. Table � in Appendix D presents the average access costs of all algorithms for the values α = �.�, �.� and �.�. Figure � graphically presents the average access costs of some selected algorithms for all examined values of α.

Let us first turn to Figure �. We observe that the degree of locality has a considerable influence on the performance of many, but not all, algorithms. The algorithm FC has about the same costs for all values of α. After a certain time, its list will be ordered according to the limiting distribution which, as we have seen, is independent of α, and FC serves the entire sequence without notable reorganisations of its list. The results for the other algorithms show that the locality in the sequences produced by our Markov source may have a positive influence on the algorithms' performance: their access costs decrease with increasing α. This effect is particularly strong for MTF, which is the worst performing algorithm for α = �.�, but performs best for α = �.�. Figure � shows, and


We will call these sets the local sets. In order to obtain request sequences with locality, the Markov source should have the following property: once it enters a local set S_i, it is to stay there for a certain time before entering a new local set S_j (i ≠ j). The degree of locality is varied by a parameter α between 0 and 1, denoting the probability that the next request is in the current local set.

It remains to specify the exact transition probabilities. For reasons of simplicity, we use a fixed probability distribution (q_1, . . . , q_n) where q_j > 0 for all 1 ≤ j ≤ n. The probability that the jth smallest state in a local set is entered on the next request will be proportional to q_j. Let Q be the (n × n)-transition matrix in which all rows are identical to the vector (q_1, . . . , q_n). We are now ready to define the (k·n × k·n)-transition matrix P of our Markov source:

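\[
P := P(k, n, \alpha, q_1, \ldots, q_n) :=
\begin{pmatrix}
\alpha Q & \frac{1-\alpha}{k-1} Q & \cdots & \frac{1-\alpha}{k-1} Q \\
\frac{1-\alpha}{k-1} Q & \alpha Q & \cdots & \frac{1-\alpha}{k-1} Q \\
\vdots & & \ddots & \vdots \\
\frac{1-\alpha}{k-1} Q & \frac{1-\alpha}{k-1} Q & \cdots & \alpha Q
\end{pmatrix}
\]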

It is easy to verify that, by our choice of α and the q_j, all entries in P are non-negative and that the entries along each row of P sum to 1, i.e. P is a valid transition matrix. Furthermore, we see that the probability for the source to stay in its current local set is exactly α.
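A request sequence from such a source can be generated without building P explicitly, by first deciding whether to stay in the current local set and then drawing the position inside the set from (q_1, . . . , q_n). The Python sketch below follows that two-stage view; it is an illustration under assumptions, not the authors' generator. The function name is hypothetical, k is assumed to be at least 2, and the first request is produced by one transition from a uniformly chosen initial local set.

```python
import random


def markov_requests(k, n, alpha, q, length, rng):
    """Generate `length` requests (items 1..k*n) from the block-structured source."""
    current_set = rng.randrange(k)                # local set of the random initial state
    requests = []
    for _ in range(length):
        if rng.random() >= alpha:                 # leave the set with probability 1 - alpha
            other = rng.randrange(k - 1)          # uniform over the k - 1 other sets
            current_set = other if other < current_set else other + 1
        j = rng.choices(range(n), weights=q)[0]   # position within the set, drawn from q
        requests.append(current_set * n + j + 1)  # item (i - 1) * n + j, 1-based
    return requests
```

Each step moves to a given state of the same local set with probability α·q_j and to a given state of another local set with probability (1 − α)/(k − 1)·q_j, which matches the block structure of P.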

The simple probabilistic model just described can easily be generalized by varying the sizes of the local sets and the transition probabilities within and between the local sets. Before specifying the parameters used in our experimental study, we want to show some properties confirming that the above model is a reasonable abstraction of locality.

The theory of Markov chains tells us that if all entries of P are positive, then the sequence {P^l}, l ≥ 1, of transition matrices converges towards a limiting matrix in which all rows are identical to a probability vector (p_1, . . . , p_m). This vector is called the stationary distribution of the Markov source. It is independent of the initial distribution and gives the probabilities for the occurrences of the states in an infinite sequence produced by the Markov source. The following Lemma � describes two important properties of our Markov source.

Lemma � Let k, n ∈ ℕ, 0 < α < 1, and let (q_1, . . . , q_n) be a probability distribution with q_j > 0 for 1 ≤ j ≤ n. Consider the Markov source with m = k·n states and the transition matrix P as defined in equation (�).

1. The expected number of consecutive requests in a local set is 1/(1 − α).

2. The stationary distribution (p_1, . . . , p_m) is given by

\[
p_{(i-1) n + j} = \frac{q_j}{k} \qquad (1 \le i \le k,\ 1 \le j \le n).
\]

The proof of Lemma � can be found in Appendix C. The first part of the lemma confirms that the parameter α is an indicator for the degree of locality: the closer α is to 1, the longer the Markov source stays in a local set, thus producing a request sequence with stronger locality. The second part of the lemma reveals an interesting property of our Markov source, namely that the stationary distribution is independent of the parameter α. This means that the frequencies of the requested items do not depend on the degree of locality in our request sequence. Only the number of requests before a local set is left again varies with α. Furthermore, we see from the periodicity of the vector (p_1, . . . , p_m) in equation (�) that our Markov source requests each local set with the same probability.
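The second part of the lemma can be checked numerically for small parameters. The Python sketch below builds the block matrix explicitly (α·Q on the diagonal blocks and (1 − α)/(k − 1)·Q elsewhere, with 0-based states) and verifies that the vector with entries q_j/k is unchanged by one transition for several values of α. The function names and the chosen values of k, n and q are illustrative assumptions; the check does not replace the proof.

```python
def transition_matrix(k, n, alpha, q):
    """Explicit (k*n x k*n) matrix with blocks alpha*Q and (1-alpha)/(k-1)*Q."""
    m = k * n
    return [[(alpha if s // n == t // n else (1 - alpha) / (k - 1)) * q[t % n]
             for t in range(m)] for s in range(m)]


def one_step(p, P):
    """Distribution after one transition when the current distribution is p."""
    m = len(p)
    return [sum(p[s] * P[s][t] for s in range(m)) for t in range(m)]


k, n, q = 3, 4, [0.4, 0.3, 0.2, 0.1]
claimed = [qj / k for _ in range(k) for qj in q]   # p_{(i-1)n+j} = q_j / k
for alpha in (0.1, 0.5, 0.9):
    P = transition_matrix(k, n, alpha, q)
    assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)   # rows are distributions
    after = one_step(claimed, P)
    assert all(abs(a - b) < 1e-12 for a, b in zip(after, claimed))
```

The same vector passes the check for every α, in line with the statement that the request frequencies do not depend on the degree of locality.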


monotonicity in terms of their performance as a function of the parameter m. The algorithms MRI(�) and PRI(�) achieved the best performance in their families. We observe a considerable increase in costs between MRI(�) and MRI(�).

The costs of the algorithms of the MHD(k) class increase with k. Again, more careful (or conservative) behavior is rewarded. We observe that the extreme member MTF of this class is among the worst performing algorithms on all examined sequences. On the other hand, TRANS (which is equivalent to MHD(1)) is always one of the best algorithms.

There is no consistent behaviour of the algorithms of the MF(k) family. Indeed, different Zipf sequences (among those presented here, but also others) yielded very different results. The reason is that these algorithms do not update the front of their list, and therefore their performance depends strongly on the first requested items. As for the members of the variant family MF�(k), these algorithms perform well as long as the MTF-cache remains small.

Finally, the randomized algorithms have considerably higher costs than the best deterministic algorithms in all cases, and they mostly appear at the end of the field. These results stand in strong contrast to known analyses of randomized algorithms, which say that randomization can yield algorithms with a competitive ratio below the deterministic lower bound of 2. However, we should recall that in competitive analysis, requests are produced by an adversary whose goal is to increase the algorithm's costs with respect to an optimal offline algorithm. This includes that a request may depend on the algorithm's reactions to former requests, and in this case randomization on the algorithm's side may make the adversary's life more difficult. In this experiment, however, request sequences are generated independently from each other without memory. Therefore, there is no reason to expect that the competitive results will be applicable here, and in particular that randomization will give any advantage.

��� Request sequences produced by Markov sources

�.�.� Markov sequences: description

We say that a request sequence exhibits "locality of reference" if the number of distinct requested items in a small part of the sequence is considerably smaller than the number of distinct items in the entire sequence or, alternatively, if requests depend strongly on immediately preceding requests. It is clear that the degree of locality in the request sequence must have a considerable influence on the performance of online algorithms, depending on how quickly they adapt to local changes in the request sequence. In the second access cost experiment, we want to examine this influence.

We use a simple probabilistic model to produce request sequences with different degrees of locality. The model is based on Markov sources. A Markov source consists of a set of m states and is described by a transition matrix

\[
(p_{\sigma,\tau})_{1 \le \sigma,\tau \le m}
\quad\text{where}\quad
p_{\sigma,\tau} \ge 0 \ \ (1 \le \sigma, \tau \le m)
\quad\text{and}\quad
\sum_{\tau=1}^{m} p_{\sigma,\tau} = 1 \ \ (1 \le \sigma \le m).
\]

Thus each row of the transition matrix gives a probability distribution, and if the Markov source is in state σ, the probability that it goes to state τ is exactly p_{σ,τ}. The initial state may again be chosen according to an initial distribution.

We now describe the Markov source used to generate request sequences with different degrees of locality. The source will request the item σ if it is in state σ. Let k, n ∈ ℕ. The Markov source has m = k·n states which are partitioned into k disjoint n-element sets S_i:

\[
S_i := \{(i-1) \cdot n + 1, \ldots, i \cdot n\}.
\]


�� Experiment � Compression performance

In the compression experiment, we measured the algorithms' performance as data compressors in the compression scheme of Bentley et al. [�]. The algorithms were ranked according to the compression ratio they obtained on the Calgary Corpus [�], a data set designed to test the performance of text compression algorithms. We examined both a byte-level and a word-level variant of this compression scheme. Here again the results for the randomized algorithms were averaged over � executions. The compression experiment is discussed in detail in Section �.

� Experiment �� Access cost performance

��� Previous experimental studies

In this section, we describe our experiment for testing the access cost performance of online list accessing algorithms. A few such empirical studies have been conducted in the past.

Bentley and McGeoch [�] tested the performance of MTF, FC and TRANS with respect to request sequences generated from several text files (� files containing Pascal programs and other English text files). The sequences were generated from the text files by parsing the files into "words", with a word defined as an "alphanumeric string delimited by spaces or punctuation marks". It was found that FC is always superior to TRANS and that MTF is often superior to FC. Tenenbaum [��] tested the performance of various algorithms from the move-ahead-k family (MHD(k)) with respect to request sequences distributed by Zipf's law. A few other simulation results testing various properties of particular algorithms with respect to sequences generated via Zipf's distribution are summarized in a survey by Hester and Hirschberg [��].

��� Request sequences distributed by Zipf�s law

�.�.� Zipf sequences: description

In the first access cost experiment, we continue the studies of Tenenbaum [��] and Hester and Hirschberg [��] and test our more extensive set of online list accessing algorithms on request sequences distributed by Zipf's law. Given m distinct items, Zipf's law assigns to the ith item the probability

\[
p_i := \frac{1}{i \cdot H_m} \qquad \text{for } 1 \le i \le m,
\]

where H_m = \sum_{j=1}^{m} 1/j is the mth harmonic number and serves to normalize the sum of probabilities to 1. The requested items are selected independently according to this distribution.
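A memoryless Zipf source of this kind can be sketched in a few lines of Python. The function names and the use of Python's `random.Random` are assumptions for illustration (the experiments themselves used the LEDA library); items are numbered 1 to m.

```python
import random


def zipf_probabilities(m):
    """Return [p_1, ..., p_m] with p_i = 1 / (i * H_m)."""
    harmonic = sum(1.0 / j for j in range(1, m + 1))   # H_m
    return [1.0 / (i * harmonic) for i in range(1, m + 1)]


def zipf_requests(m, length, rng):
    """Draw `length` independent requests for items 1..m under Zipf's law."""
    return rng.choices(range(1, m + 1), weights=zipf_probabilities(m), k=length)
```

For example, `zipf_requests(50, 1000, random.Random(0))` yields a sequence in which item 1 is requested about twice as often as item 2 and about three times as often as item 3.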

�.�.� Zipf sequences: discussion of results

We performed the access cost experiment for the values m = �, � and �� on request sequences of length ���. The detailed results in terms of cost ratios can be found in Table � in Appendix D.

The results for the different values of m are similar. We observe that more conservative algorithms seem to be favored. The most conservative algorithm in both cases is FC, and this algorithm indeed performs best. After a certain time, FC will almost certainly have ordered its list according to the given probability distribution and will not perform any update operations any more. Recall that FC is not optimal and not even competitive.

Interesting observations can be made for the families MRI(m) and PRI(m). First, we observe that MRI(m) always outperforms PRI(m). Secondly, both families exhibited consistent and perfect


The optimal choice is easily shown to be p = (3 − √5)/2, in which case the competitive ratio attained is the golden ratio φ = (1 + √5)/2 ≈ 1.62. From this family we have tested this algorithm and denote it by TSP.

In [�], Albers, von Stengel and Werchner present the following algorithm, called COMB, which is a probability mixture of BIT and TS. Algorithm COMB is shown to be 1.6-competitive. To date this bound is the best known for a list accessing algorithm.

Algorithm COMB: Before serving any request, choose algorithm BIT with probability 4/5 and algorithm TS with probability 1/5. Serve the entire request sequence with the chosen algorithm.

Although for any data set the empirical performance of COMB can be determined from the performances of BIT and TS, we have tested COMB separately, as a control for the statistical significance of our tests of randomized algorithms.

���� A Benchmark algorithm

In order to have a reference result, we tested the performance of the following "bad" algorithm, MTL, which acts in a way that seems to be the worst possible.

Algorithm MTL: Upon a request for an item x, move x to the back of the list.

Although the model requires that we charge MTL for the paid exchanges it performs, we ignored these costs and only counted access costs.

�� Experiment � Access cost performance

In the access cost experiment, we made the algorithms serve several types of request sequences starting on an empty list. The goal was to rank the access cost performance of the various algorithms within the traditional dynamic model. The algorithms' performance was measured with respect to the full-cost model, where an access to the ith item in the list is charged i. To obtain statistically significant results for the randomized algorithms, we averaged their results over � executions of each experiment.��

We examined:

• Request sequences produced by memoryless sources according to Zipf's law.

• Request sequences produced by Markov sources in order to examine the influence of locality.

• Request sequences extracted from the Calgary Corpus [�] (see Section �.�).

We used the data type random_source of the LEDA library [�] to produce uniformly distributed random numbers. The access cost experiment is discussed in detail in Section �.

��We computed ��% confidence intervals for the compression experiment we conducted. In just one case did two confidence intervals intersect. In all cases the intervals were very narrow, so that our qualitative results hold for any error within the confidence intervals.


Algorithm SPL: For each item x on the list maintain a pointer p(x) pointing to some item on the list. Initially set p(x) = x. Upon a request for an item x, with probability 1/2 move x to the front and with probability 1/2 insert x just in front of p(x). Then set p(x) to point to the first item on the list.

Algorithm BIT due to Reingold and Westbrook ���� attains a competitive ratio of ����� �������

Algorithm BIT: For each item x on the list maintain a mod-2 counter b(x), initially set to either 0 or 1 randomly, independently and uniformly. Upon a request for an item x, first complement b(x). Then, if b(x) = 1, move x to the front.

The following algorithm, called RMTF, is very similar to BIT but, somewhat surprisingly, its worst-case performance is inferior. It can be shown that its competitive ratio is not smaller than 2 [��].

Algorithm RMTF: Upon a request for x, with probability 1/2 move x to the front, and with probability 1/2 leave x in place.

The algorithm RMHD is a simple randomized relative of the deterministic MHD family�

Algorithm RMHD: Upon a request for an item x currently at the ith position, randomly choose a position from the set {1, . . . , i} and move x to that position.

The family of algorithms counter(s, S) (CTR(s, S)), due to Reingold, Westbrook and Sleator [��], is a sophisticated generalization of algorithm BIT. Let s be a positive integer and S a nonempty subset of {0, 1, . . . , s − 1}.

Algorithm CTR(s, S): For each item x on the list maintain a mod-s counter c(x), initially set randomly, independently and uniformly to a number in {0, 1, . . . , s − 1}. Upon a request for an item x, decrement c(x) by 1 (mod s) and then, if c(x) ∈ S, move x to the front.
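The counter rule can be sketched in Python as follows. This is a minimal illustration under assumptions: the function name `serve_ctr` is hypothetical, new items are inserted at the back of the list before being accessed, and an access to the item at position i is charged i. The random generator is passed in so that runs can be repeated.

```python
import random


def serve_ctr(requests, s, S, rng):
    """Serve `requests` with CTR(s, S); return (total access cost, final list)."""
    lst, counter, total = [], {}, 0
    for x in requests:
        if x not in counter:                  # first request: insert at the back
            counter[x] = rng.randrange(s)     # uniform initial counter in {0..s-1}
            lst.append(x)
        i = lst.index(x)
        total += i + 1                        # access to position i + 1 costs i + 1
        counter[x] = (counter[x] - 1) % s     # decrement modulo s
        if counter[x] in S:
            lst.insert(0, lst.pop(i))         # move x to the front
    return total, lst
```

With s = 1 and S = {0} the counter is always in S, so every request moves the item to the front. With s = 2 and a one-element S the item is moved on every second request, with a random phase per item.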

Thus, CTR(2, {1}) is BIT. Reingold et al. prove that CTR(7, {0, 2, 4}) is 85/49-competitive (≈ 1.735). From this family we have tested algorithm CTR(7, {0, 2, 4}).

The family of algorithms random-reset(s, D) (RST(s, D)), due to Reingold et al. [��], is a variation on the counter algorithms. Let s be a positive integer and D a probability distribution on the set S = {�, �, . . . , s − �} such that for i ∈ S, D(i) is the probability of i.

Algorithm RST(s, D): For each item x on the list maintain a counter c(x), initially set randomly to a number i ∈ {�, �, . . . , s − �} with probability D(i). Upon a request for an item x, decrement c(x) by 1. If c(x) = 0 then move x to the front and then randomly reset c(x) using D.

The best RST(s, D) algorithm, in terms of the competitive ratio, is obtained with s = � and D such that D(�) = (√3 − 1)/2 and D(�) = (3 − √3)/2. The competitive ratio attained in this case is √3 ≈ 1.732. In this family we have only tested this algorithm.

Let p ∈ [0, 1]. The following is a family of algorithms called timestamp(p) (TS(p)), due to Albers [�], that is a kind of randomized combination of algorithms TS and MTF. For each p, TS(p) is proven to be max{2 − p, 1 + p(2 − p)}-competitive.

Algorithm TS(p): Upon a request for an item x, with probability p execute (i): move x to the front; and with probability 1 − p execute (ii): let y be the first item on the list such that either (a) y was not requested since the last request for x, or (b) y was requested exactly once since the last request for x and that request for y was served by the algorithm using step (ii). Insert x just in front of y. If there is no such y or if this is the first request for x, leave x in place.


Algorithm MF(k): Upon a request for an item x currently at the ith position, move x ⌈i/k⌉ − 1 positions towards the front.
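The update rule of MF(k) can be sketched in Python as a single in-place list operation. The function name `mf_update` is hypothetical, and positions are counted from 1 as in the text.

```python
def mf_update(lst, x, k):
    """Apply the MF(k) update for a request to x; return the access cost i."""
    i = lst.index(x) + 1            # 1-based position of x
    shift = -(-i // k) - 1          # ceil(i / k) - 1 positions towards the front
    lst.insert(i - 1 - shift, lst.pop(i - 1))
    return i
```

For k = 1 the shift is i − 1, so x lands at the front. For any item among the first k positions the shift is 0 and the list is left unchanged.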

Thus, algorithm MF(1) is MTF. The algorithms of the MF(k) family as described above have the weakness that they do not perform any update operation upon an access to one of the first k items in the list. Therefore, we included in the experiments a variant MF�(k) treating the first k items in its list as an "MTF-cache". The algorithm MF�(k) behaves just like MF(k), but upon a request for one of the first k items in the list, the requested item is moved to the front. We have tested the algorithms MF(k) and MF�(k) for k = �� �� �� � � � � � � ����.

Algorithm timestamp (TS), due to Albers [�], is an optimal algorithm attaining a competitive ratio of 2.

Algorithm TS: Upon a request for an item x, insert x in front of the first (from the front of the list) item y that precedes x on the list and was requested at most once since the last request for x. Do nothing if there is no such item y, or if x has been requested for the first time.
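A direct, unoptimized Python sketch of this rule keeps the times of the last two requests of every item. It assumes, for illustration, that new items are inserted at the back of the list and that an access to position i is charged i; the function name `serve_ts` is hypothetical.

```python
def serve_ts(requests):
    """Serve `requests` with TS; return (total access cost, final list)."""
    lst, times, total = [], {}, 0
    for now, x in enumerate(requests):
        first_request = x not in times
        if first_request:
            lst.append(x)                     # insert new items at the back
            times[x] = []
        i = lst.index(x)
        total += i + 1
        if not first_request:
            last_x = times[x][-1]             # time of the previous request for x
            for pos in range(i):              # scan the items in front of x
                seen = times[lst[pos]]
                # at most one request for this item since the last request for x
                if len(seen) < 2 or seen[-2] < last_x:
                    lst.insert(pos, lst.pop(i))
                    break
        times[x] = (times[x] + [now])[-2:]    # remember the two latest request times
    return total, lst
```

An item in front of x was requested at most once since the last request for x exactly when its second most recent request (if any) is older than that request, which is the test used in the loop.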

The following family of algorithms is the PRI family (introduced in Section �.�). As discussed earlier, for each m ≥ � PRI(m) is �-competitive and MRI(�) is �-competitive.

Algorithm PRI(m): Upon a request for item x, move x forward just in front of the first item z on the list that was requested at most m times since the last request for x. Do nothing if there is no such item z. If this is the first request for x, leave x in place (move x to the front).

Notice that, modulo the handling of first requests, algorithm PRI(1) is identical to algorithm TS. Also, as noted above, the limit element of this family, as m grows, is MTF. We have tested the algorithm PRI(m) with m = �, �, . . . , � and with the insertion position being the last and the front. In the sequel each member of the PRI family, PRI(m), that inserts a new item at the first (resp. last) position is denoted PRI�(m) (resp. PRI(m)).

The following family of algorithms is the MRI family introduced in Section �.�. For each m ≥ �, algorithm MRI(m) is 2-competitive and is thus optimal. The competitive ratio of algorithm MRI(�) is bounded below by a function of order √ℓ.

Algorithm MRI(m): Upon a request for item x, move x forward just after the last item z on the list that is in front of x and that was requested at least m + 1 times since the last request for x. If there is no such item z, move x to the front. If this is the first request for x, leave x in place (move x to the front).

Recall that MRI(1) is equivalent to algorithm TS (modulo first requests for items). We have tested the algorithms MRI(m) with m = �, �, . . . , � and with the insertion position being the last and the front. In the sequel each member of the MRI family, MRI(m), that inserts a new item at the first (resp. last) position is denoted MRI�(m) (resp. MRI(m)).

���� Randomized algorithms

Algorithm split (SPL), due to Irani [��], was the first randomized algorithm that was shown to attain a competitive ratio lower than the deterministic lower bound of 2 − 2/(ℓ + 1). It is known that SPL is 15/8-competitive (≈ 1.875).


� An experimental study of list accessing algorithms

In this section, we give an introduction to the experimental part of the paper. We specify the set of algorithms tested and briefly outline the experiments we conducted. Detailed descriptions of these experiments are then given in the corresponding Sections � and � of the paper.

�� Algorithms

We now describe the algorithms we tested. In the description of most algorithms we specify the algorithm's action only with respect to an access request. Unless otherwise specified, it is implicitly assumed that an insert request for an item x places x at the back of the list, and then x is treated as if it was accessed. With respect to each algorithm we also specify bounds on its competitive ratio whenever they are known. In the sequel ℓ denotes the maximum number of items present in the list at any point in time.

���� Deterministic algorithms

Algorithm move-to-front (MTF) is one of the most well known and widely used algorithms. This algorithm attains an optimal competitive ratio of �� ����� � ���� ����

Algorithm MTF: Upon an access to an item x, move x to the front.
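As a small illustration of how an algorithm is charged when serving a request sequence, the following Python sketch serves a sequence with MTF starting from an empty list. It assumes the conventions stated in the text (an insert places the item at the back and then treats it as accessed) and charges an access to the item at position i with cost i. The function name `serve_mtf` is hypothetical.

```python
def serve_mtf(requests):
    """Serve `requests` with MTF; return (total access cost, final list)."""
    lst, total = [], 0
    for x in requests:
        if x not in lst:
            lst.append(x)              # insert request: place x at the back
        i = lst.index(x) + 1           # 1-based position of x
        total += i                     # an access to position i is charged i
        lst.insert(0, lst.pop(i - 1))  # move x to the front
    return total, lst
```

For the sequence a, b, a, b the costs are 1, 2, 2 and 2, giving a total of 7.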

Algorithm transpose �TRANS is perhaps the most �conservative� algorithm presented hereand is an extreme opposite to MTF� The competitive ratio of TRANS is bounded below by ���� �����

Algorithm TRANS: Upon an access to an item x, transpose x with the immediately preceding item.
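As an illustration, the MTF and TRANS rules can be sketched in Python as follows. The function names `mtf_access` and `trans_access`, and the representation of the list as a plain Python list whose index 0 is the front, are assumptions made for this sketch. Each function returns the access cost when accessing the ith item is charged i.

```python
def mtf_access(lst, x):
    """Serve an access to x with move-to-front; return the access cost."""
    i = lst.index(x)            # 0-based position of x
    lst.insert(0, lst.pop(i))   # move x to the front
    return i + 1                # the item at position i+1 costs i+1


def trans_access(lst, x):
    """Serve an access to x with transpose; return the access cost."""
    i = lst.index(x)
    if i > 0:                   # x is not already at the front
        lst[i - 1], lst[i] = lst[i], lst[i - 1]
    return i + 1
```

On the list `['a', 'b', 'c']`, an access to `'c'` costs 3 under both rules; MTF leaves `['c', 'a', 'b']` whereas TRANS leaves `['a', 'c', 'b']`.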

Algorithm frequency�count �FC attempts to adapt its list to the empirical distribution ofrequests observed so far� A lower bound of ��� ��� on its competitive ratio is known �����

Algorithm FC: Maintain a frequency counter for each item. Upon inserting an item, initialize its counter to 0. After accessing an item, increment its counter by one and then reorganize the list so that items on the list are ordered in non-increasing order of their frequencies.

In our implementation we further require that if two items have the same frequency count then the item requested less recently is positioned in front of the item requested more recently. The variant adopting the reverse of this ordering is denoted by FC
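A minimal sketch of the frequency-count rule with the tie-breaking just described might look as follows. The function name `fc_access`, the dictionaries `counts` and `last_req`, and the caller-supplied request index `t` are hypothetical; the sketch also assumes that counters start at zero and that items never requested so far count as the least recently requested.

```python
def fc_access(lst, counts, last_req, x, t):
    """Serve the t-th request, an access to x, with frequency-count.

    counts[y]   -- number of accesses to y so far (missing means 0)
    last_req[y] -- index of the most recent request for y (missing means never)
    Returns the access cost (position of x, counted from 1).
    """
    cost = lst.index(x) + 1
    counts[x] = counts.get(x, 0) + 1
    last_req[x] = t
    # Non-increasing frequency; among equal counts, the item requested
    # less recently (smaller last_req) is placed in front.
    lst.sort(key=lambda y: (-counts.get(y, 0), last_req.get(y, -1)))
    return cost
```

Re-sorting the whole list on every access keeps the sketch short; an implementation concerned with speed would instead move only the accessed item forward.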

��The following algorithm called MTF� is a more �relaxed� version of MTF� MTF� can be shown to

be ��competitive�

Algorithm MTF2: Upon the ith request for an item x, move x to the front if and only if i is even.
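The "every other request" rule can be sketched in Python as below. The factory name `make_mtf2` and the per-item counter `seen` are assumptions of this sketch, which counts only the requests passed to the returned function (whether an insertion counts as the first request is not modeled here).

```python
from collections import defaultdict


def make_mtf2():
    """Return an access function that moves x to the front on even requests."""
    seen = defaultdict(int)     # number of requests seen so far, per item

    def access(lst, x):
        i = lst.index(x)
        seen[x] += 1
        if seen[x] % 2 == 0:    # the 2nd, 4th, ... request for x
            lst.insert(0, lst.pop(i))
        return i + 1            # access cost

    return access
```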

Algorithm move�ahead�k �MHD�k proposed by Rivest ��� is a simple compromise betweenthe relative extremes of TRANS and MTF�

Algorithm MHD(k): Upon a request for an item x, move x forward k positions.
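A short Python sketch of the move-ahead rule follows. The name `mhd_access` is hypothetical, and the sketch assumes that an item with fewer than k items in front of it simply moves to the front.

```python
def mhd_access(lst, x, k):
    """Serve an access to x by moving it forward k positions; return the cost."""
    i = lst.index(x)
    j = max(0, i - k)           # assumed: stop at the front of the list
    lst.insert(j, lst.pop(i))
    return i + 1
```

With `k = 1` the function swaps x with its immediate predecessor, and with k at least the list length it always moves x to the front.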

Thus algorithm MHD�� is TRANS� We have tested MHD�k for k � �� �� �� � � ��� � � � � �����Algorithm move�fraction�k �MF�k proposed by Sleator and Tarjan is a slightly more so�

phisticated compromise between TRANS and MTF� For each k� algorithm MF�k is �k�competitive�����



By di�erentiating the costs ratio �online to o�ine with respect to k it is possible to see that themaximum ratio is obtained at

k� �������� � � ��� �

������ �� ����

Substituting k� for k in the expression for costs ratio results in a function of � that can be boundedbelow by #�

p��

The true competitive ratio of MRI�� remains an open question but we conjecture that it isindeed "�

p�� We note that the competitive ratio of one of the most conservative �yet �reasonable�

algorithms called transpose is ��� �����

Extensions to the dynamic model

Theorem � can be extended to the dynamic �standard list accessing model ���� ��� In this model arequest is either an insertion of an item� an access of an item� or a deletion of an item� An accessor deletion of an item positioned ith from the front cost i whereas an insertion of an item to alist currently holding � items costs �� �� This insertion cost implicitly requires that a new item isinserted at the back of the list �but of course� the new item can then move� using free exchanges�to any position closer to the front�

Each of the algorithms MRI�m is naturally extended to handle deletions and insertions� De�pending on the exact de�nition of MRI�m� in particular� how it deals with �rst requests for items�this extension is done in a straightforward manner� In accordance with the original de�nition ofMRI�m we require that each new item is inserted at the front of the list�

Theorem � For each m � � MRI�m is ��� ����competitive in the dynamic model� where � is thenumber of items inserted�

Proof� The extension of Theorem � to this dynamic model is easily established by provingthat for each insertion and deletion the amortized cost for MRI�m is within a factor �� ��� of theactual cost incurred by FOPT where � is the total number of insertions �starting from an emptylist� Consider a request for an insertion of an item x� Suppose that at this stage there are exactly�� � � items on the list��� Since the cost of an insertion in the dynamic model is �� � � we caninterpret it as if both OPT and MRI�m �rst insert x at the back of their lists where for FOPT� which

contains���

���item lists before the insertion� we add �� new ��item lists� Lxz � one for each item z

that was on the list before this request� In each of these ��item lists x appears on the back �i�e�second� At this stage there are no inversions corresponding to x but new inversions can be createdby FOPT�s move that may move x forward� using free transpositions� in some number k of its ��itemlists� Each of the new k inversions is clearly of weight � since a subsequent request for x will causeMRI�m to move x to the front� Further� the maximum number of such new inversions is ��� thenumber of new ��item lists containing x� Since ���� is the actual cost �of both MRI�m and FOPT�the amortized cost for MRI�m is in the worst case

�� � � � $% � �� � � � k � ��� � � � ��� ������� � � � ��� ������� ��

The analysis of the case of a request for deletion of an item x is even simpler because all oldinversions with respect to x are eliminated and no new inversions are created� �Note that in thiscase we remove from the set FOPT all the ��item lists containing x�

��The trivial cases where �� � �� should be treated separately but do not bear any special diculty�



��� The algorithm mri� �

Algorithm MRI�� is signi�cantly more �conservative� than MRI�� �and the rest of the algorithmsin the mri family� Unfortunately not only is MRI�� not optimal� it cannot attain a constantcompetitive ratio�

Lemma � The competitive ratio of MRI�� is not smaller than #�p��

Proof� For simplicity� we start with a lower bound of � and then note how to extend itto the higher lower bound as stated in the lemma� Assume that the initial list� of � itemsis hx�� x�� � � � � x���� ai� The initial segment� �� of the nemesis request sequence� � is � �a� x���� x���� � � � � x�� Notice that after processing � MRI�� returns to the above initial con�g�uration �and now each item was requested once� The rest of consists of an arbitrary number ofrepetitions of the following segment

� � �x���� a� x���� �x���� a� x���� � � � � �x�� a� x��

For each i � � � �� �� �� � � � � �� the item xi will be brought by MRI�� to the front only after thesecond request for xi in the subsequence �xi� a� xi is processed� The element a will always be leftat the back of the list� Clearly� after servicing � the list maintained by MRI�� returns to the initialcon�guration� The cost incurred by MRI�� for the initial segment � is some constant C � �� andthe cost of servicing the segment � is ��� ����� � � ���� ��� �� Consider an o�ine algorithmOPT

� that serves as follows� After the �rst request for a� OPT� brings a to the front and then itkeeps the list static forever� That is� each subsequent request for an item costs OPT� the positionof the item in the following list

ha� x�� x�� � � � � x���i�Thus� the cost incurred by OPT

� for the initial segment � is some constant D � ��P

��i�� i andthe cost to serve the segment � is � � � � � �P��i�� i � �� � �� � �� The costs ratio� online too�ine� for the segment � is thus ���� ����� � which approaches � as � grows�

This idea can be carried further� We now describe the following straightforward generalizationof the segment � that will yield a nemesis request sequence that will force a competitive ratio ofat least #�

p�� Assume that the initial order of MRI���s list is

hx�� x�� � � � � x��k� a�� a�� � � � � aki�and that of OPT� is

ha�� a�� � � � � ak� x�� x�� � � � � x��ki�where � � k � � � �� These initial con�gurations can be obtained after a �xed initial segmentrequesting all items� The repetition is now on the segment

�x��k � a�� a�� � � � � ak� x��k� � � � � �x�� a�� a�� � � � � ak� x��

Here again� after completion of this segment MRI���s list returns to the initial con�guration andOPT

� keeps its list static at the initial con�guration� The cost incurred by MRI�� for this segmentis

�� k

���� ��k � k� � �k

The cost incurred by OPT� for this segment is

k

��k � �� k� � �k

� �� � �� k�



Therefore if $r_{i+1} = z$, the element z will pass x and the weight of the new inversion $\langle x, z \rangle$ in L (if it is created) is exactly 1.

Other possibilities for new inversions are when, for some element $z \in \mathcal{C}$, in $L_{xz}$, x passes z but x does not pass z in L; the weight of a new such inversion must be 1 as well, since the last request was for x, so an additional request for x will move x to the front of L. It follows that the increase in $\Phi$ due to new inversions created is at most C.

It is easy to see that the following identities hold:

\[ k - 1 = C + I, \qquad j = S + C. \]

For example, to justify the first identity notice that the elements that are in front of x in L (just before the ith request) are exactly those that are either in $\mathcal{C}$ or in $\mathcal{I}$, and these sets are disjoint. The second identity follows from the definitions of j, $\mathcal{S}$ and $\mathcal{C}$. From these identities it readily follows that $k = I + j - S + 1$ and that $C \le j$. Here again $j \le \ell - 1$.

Putting all this together, the amortized cost $a_i$ for MRI(m) to serve the ith request is bounded above as follows:

\[
\begin{aligned}
a_i &\le k - I + S + C \\
    &= (I + j - S + 1) - I + S + C \\
    &= j + C + 1 \\
    &\le 2j + 1 \\
    &\le \left(2 - \tfrac{1}{\ell}\right)(j + 1) = \left(2 - \tfrac{1}{\ell}\right) \cdot \mathrm{FOPT}_i .
\end{aligned}
\]

Using lemmas � and � we conclude that MRI�m is ��� ����competitive�

��� The equivalence of mri��� and TIMESTAMP

Consider algorithm TIMESTAMP �see Section ���� Here we prove that except for the di�erence inhandling �rst requests for items algorithm TIMESTAMP is equivalent to algorithm MRI�� � Specif�ically� we shall prove that if we alter the de�nition of either MRI�� TIMESTAMP �or both so thatthey handle �rst requests for items in the same manner� then the two algorithms are equivalent inthe sense that they process any request sequence in exactly the same manner�

Lemma Each of the algorithms TIMESTAMP and MRI�� maintains the following invariant� upona request for x� all the items that are requested twice or more since the last request for x are infront of all items that were requested less than twice since the last request for x�

Using Lemma we learn that upon a request for an item x both MRI�� and TIMESTAMP move xforward just after all the elements that were requested twice or more since the last request for x�equivalently� just in front of all the elements that were requested less than twice since the lastrequest for x� This means that both algorithms maintain identical lists at all times� Hence wehave proved Proposition ��

Interestingly, other algorithms in the mri Family (i.e. $m \ne 1$) do not maintain analogous invariants. For example, the algorithm MRI(0) does not maintain the following invariant: "upon a request for x, all the items that were requested once or more since the last request for x are in front of all items that were not requested since the last request for x".



increased additive constant, assume that $r_i$ is not the first request for x. (We can pad the request sequence with a prefix consisting of $\ell$ requests, one for each item. The additional cost due to this prefix can be attributed to the additive constant and will not alter the competitive ratio.) Assume that just before this request is revealed, x is located at the kth position in L and that x appears at the second position (i.e. last position) in j of the 2-item lists in FOPT. The former assumption means in particular that the actual cost for MRI(m) is $\mathrm{MRI}(m)_i = k$. Similarly, the latter assumption implies that $\mathrm{FOPT}_i = j + 1$.

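Define $\mathcal{I}$ to be the set of elements z such that $\langle z, x \rangle$ in L is an inversion (i.e. $\langle z, x \rangle$ in L but $\langle x, z \rangle$ in $L_{xz} \in \mathrm{FOPT}$). Define $\mathcal{C}$ to be the set of elements z such that $\langle z, x \rangle$ in L is not an inversion, and define $\mathcal{S}$ to be the set of elements z such that $\langle x, z \rangle$ in L is an inversion. Set $I = |\mathcal{I}|$, $C = |\mathcal{C}|$ and $S = |\mathcal{S}|$.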

We are now in a position to bound the amortized cost $a_i$. First consider the change in potential after MRI(m)'s move due to inversions in L corresponding to elements in $\mathcal{I}$. By the definitions of MRI(m) and of the weight function w, if in L an element $z \in \mathcal{I}$ is passed by x, the weight of the inversion $\langle z, x \rangle$ in L was 1 before the move and this inversion is eliminated by this move. An element z corresponding to an inversion $\langle z, x \rangle$ that is not passed by x contributed (together with x) a weight of 2 before the move and, by the definition of MRI(m) and w, reduces this weight to 1 after MRI(m) moves x. Hence there is a decrease of exactly I in $\Phi$ due to elements in $\mathcal{I}$. Due to MRI(m)'s move and before FOPT moves, the weight of each inversion $\langle x, z \rangle$ in L with $z \in \mathcal{S}$ may increase by one, giving a total increase of up to S.

Since FOPT does not use any paid exchanges, its move cannot affect the above change in potential due to inversions corresponding to elements in $\mathcal{I}$. (Of course, FOPT's move can eliminate inversions due to elements in $\mathcal{S}$; this possibility will be considered next.)

We shall make use of the following claim�

Claim � If� in the list L� x passes an element z � C then z was requested at least once since thelast request for x�

To prove this claim suppose that z � C was not requested at all since the last request for x� Itfollows �property �ii of Lemma � that z cannot be in C since FOPT would have moved x in front ofz in Lxz before this request for x� Hence� if an element z � C is passed by x� then z was requestedat least once since the last request for x� which proves the above claim�

Suppose that a new inversion $\langle x, z \rangle$ in L is created (i.e. x does not pass z in $L_{xz}$ but x does pass z in L). By definition of the weight function, the weight of this new inversion is 1 if and only if z will pass x if z is requested next. We now show that this is indeed the case (i.e. that if the subsequent request is for z, then z will pass x).

Consider the con�guration of L just after MRI�m serves the ith request� By our assumption� xhas just moved in front of z� Consider any element y that is now positioned between x and z� thatis� in the last move x also passed y and now in L� x appears in front of y� which appears in frontof z� Suppose that the previous request for x is ri�� By the de�nition of MRI�m since x passedy� it must be that y was requested at most m times between the last �ith and second last �i�threquests for x� Assume that the next request ri�� is for z� Then z will pass x if and only if thefollowing two conditions hold� �i all items y now appearing between x and z were requested at atmost m times between the current �i�e� �i� �st and last request for z� and �ii x was requestedat most m times since the last request for z� Since� by the above claim� z was requested at leastonce between the two most recent request for x� and since any such item y was requested at mostm times between the two most recent requests for x it must be that these two conditions hold�



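\[ w(x, y) = w(x, y, L, \mathrm{FOPT}), \]

\[
w(x, y) =
\begin{cases}
0, & \text{if } \langle x, y \rangle \text{ is not an inversion;} \\
1, & \text{if } \langle x, y \rangle \text{ is an inversion and } y \text{ will pass } x \text{ in } L \text{ if } y \text{ is requested next;} \\
2, & \text{if } \langle x, y \rangle \text{ is an inversion and } y \text{ will pass } x \text{ in } L \text{ iff } y \text{ is now requested twice in a row.}
\end{cases}
\]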

It is easy to verify that w�x� y is well de�ned� Moreover w�x� y satis�es the following independenceproperty that is essential for the proof of Theorem �� the weight of two elements y and z cannotbe increased �from � to � after a request for x is made�

Lemma Let m � � be given� Let y and z be two elements such that hz� yi in L is an inversionwith w�z� y � �� Let x be any element other than y and z� Then w�z� y remains after a requestfor x is serviced by MRI�m�

Proof� The proof is by contradiction� Consider a con�guration of L with hz� yi in L an inversionwith w�z� y � �� Consider the ith request for an element x� x �� y� x �� z� and assume that afterMRI�m processes this request for x� the weight w�z� y increases from � to �� By the de�nition ofthe function w it must be that due to this request for x� MRI�m positions x in between y and z sothat x becomes the pivot of y� This means that the pivot of x just before the ith request� p�x� isan element located in front of y and after z �or p�x is z itself� Set p � p�x� Since after the serviceof the ith request x becomes the pivot of y it must be that after servicing the ith request therewere m�� requests for x since the last request for y� In between the times of the two last requestsfor x there must be m�� requests for p� As m � � the second last request for x occurred after thelast request for y which means that p was a pivot of y even before MRI�m serviced the ith requestfor x� But this means that w�z� y � � before the ith request in contradiction to our assumption�It follows that w�z� y cannot increase from � to �� Note also that w�z� y cannot decrease to � if zor y are not requested�

We now define the following potential function:

\[ \Phi = \Phi(L, \mathrm{FOPT}) = \sum_{\{x, y\} \subseteq L,\; x \ne y} w(x, y). \]

It is clear that $\Phi$ is bounded below as it is always non-negative. Let $\sigma = r_1, r_2, \ldots, r_n$ be any request sequence. The amortized cost $a_i$ for MRI(m) to serve the ith request is defined as $a_i = \mathrm{MRI}(m)_i + \Delta\Phi_i$

where MRI�mi is the actual cost cost incurred by MRI�m while serving ri� and $%i � %i�%i��� isthe di�erence of the potential after both MRI�m and FOPT served ri minus the potential before theirmoves� Similarly� de�ne FOPTi to be the actual cost incurred by FOPT while serving ri� We shalluse the standard potential function technique that is summarized in the following simple lemma���� ��

Lemma � Suppose that i� % is bounded below� and ii� there is a positive constant c such that foreach i � �� �� � � � � n� MRI�mi � c � FOPTi� Then MRI�m is c�competitive�

Proof of Theorem �� Imagine that algorithms MRI�m and FOPT are working concurrently andprocessing the request sequence � Consider the ith request �ri for some item x� At the penalty of


Lemma � Assume L is a list of two elements x and y� Then there exists an optimal o�inealgorithm OPT for L that satis�es the following properties� i� OPT does not use paid exchanges�and ii� Whenever there is a run of two or more consecutive requests for x y�� OPT moves x y�to the front if it is not already there� after the �rst request of this run� using free exchanges���
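To make the two-element case concrete, the following Python sketch implements one offline rule of the kind the lemma describes: no paid exchanges, and the back item is brought to the front (by a free exchange) exactly when it is requested at least twice in a row. It is compared against an exhaustive dynamic program over the two possible list states. The function names, the string encoding of request sequences, and the charging of 1 for the front item and 2 for the back item are assumptions of this sketch.

```python
from itertools import product


def opt2_greedy(front, back, sigma):
    """Cost of the run-based rule on a two-item list (no paid exchanges)."""
    cost = 0
    for j, r in enumerate(sigma):
        if r == front:
            cost += 1
        else:
            cost += 2
            # A run of two or more requests for the back item starts here:
            # move it to the front with a free exchange.
            if j + 1 < len(sigma) and sigma[j + 1] == r:
                front, back = back, front
    return cost


def opt2_dp(front, back, sigma):
    """Exact offline optimum, allowing both free and paid exchanges."""
    items = (front, back)
    best = {s: 0 for s in items}           # cost-to-go with s in front
    for r in reversed(sigma):
        new = {}
        for s in items:
            other = back if s == front else front
            options = []
            for pre in (0, 1):             # optional paid exchange before access
                f = other if pre else s    # item in front when r is accessed
                access = 1 if f == r else 2
                for nxt in items:          # item in front after serving r
                    # Staying put, or moving the accessed item forward, is free;
                    # any other exchange is paid.
                    move = 0 if (nxt == f or nxt == r) else 1
                    options.append(pre + access + move + best[nxt])
            new[s] = min(options)
        best = new
    return best[front]


def check_all(max_len):
    """Compare the two functions on every sequence over {x, y} up to max_len."""
    return all(
        opt2_greedy('x', 'y', ''.join(p)) == opt2_dp('x', 'y', ''.join(p))
        for n in range(max_len + 1)
        for p in product('xy', repeat=n)
    )
```

Calling `check_all(8)` compares the rule with the dynamic program on all sequences of length at most 8; this is only a finite sanity check and not a substitute for the proof.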

The proof of Lemma � is simple and is left to the reader�This lemma will be essential to our analyses� Using it in conjunction with Lemma �� one

can easily bound the optimal o�ine cost to serve a request sequence by summing up the costsOPT�xy for every pair fx� yg L�

Let OPT be any optimal o�ine list accessing algorithm and let L be a list of � items� Denoteby FOPT ��factored� OPT the collection of all

��

���element lists Lxy with fx� yg L that are

each maintained by an optimal o�ine algorithm satisfying properties �i and �ii of Lemma �� Let be any request sequence� Abusing notation� we shall refer to FOPT both as the set of all these��element lists and also as the optimal o�ine algorithm that maintains these lists and serves therequest sequences xy �fx� yg L� De�ne FOPTi to be the cost incurred by FOPT to serve the ithrequest in in the full cost model� Formally� if the ith request is for an element x then FOPTi

is de�ned to be one plus the number of ��element lists Lxy � FOPT that contain x in the secondposition� By the de�nition of the extended cost function ��� it is clear that

j�jXi��

FOPTi �X

fx�yg�Lx��y

OPT�xy ��

Therefore� using Lemma � we obtain

Lemma �j�jXi��

FOPTi � OPT�

� Analysis of the mri Family

We use the following notation. Suppose L is the list maintained by some algorithm. For each pair of items x and y in L, we say that "$\langle x, y \rangle$ in L" when x is currently in front of y in L. If $\langle x, y \rangle$ in the list maintained by the online algorithm but $\langle y, x \rangle$ in the list maintained by the optimal offline algorithm then we say that $\langle x, y \rangle$ is an inversion. With respect to MRI(m) consider a request for an item x. Define p(x), the current pivot of x, to be the last element on the list that is in front of x and was requested at least $m + 1$ times since the last request for x. If there is no such element or if this is the first request for x then p(x) is undefined. Thus, upon a request for an element x, MRI(m) moves x one position after p(x), and if p(x) is undefined, x is moved to the front.
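The pivot rule can be sketched in Python as follows. The class name `MRI`, the per-item lists of request times, and the internal clock are assumptions made for this illustration; the sketch covers access requests only.

```python
from bisect import bisect_right


class MRI:
    """Sketch of MRI(m) for access requests on a fixed set of items."""

    def __init__(self, items, m):
        self.lst = list(items)                  # index 0 is the front
        self.m = m
        self.times = {y: [] for y in self.lst}  # request times per item
        self.clock = 0

    def access(self, x):
        i = self.lst.index(x)
        pivot = None                            # index of p(x), if defined
        if self.times[x]:                       # not the first request for x
            last = self.times[x][-1]
            # Scan from x toward the front: the first match is the last
            # element in front of x with at least m + 1 requests since `last`.
            for j in range(i - 1, -1, -1):
                tz = self.times[self.lst[j]]
                if len(tz) - bisect_right(tz, last) >= self.m + 1:
                    pivot = j
                    break
        self.lst.pop(i)
        self.lst.insert(0 if pivot is None else pivot + 1, x)
        self.clock += 1
        self.times[x].append(self.clock)
        return i + 1                            # access cost
```

For example, with `m = 1` on the list a, b, c and the requests c, a, a, c, the last request leaves c directly behind a, because a was requested twice since the previous request for c and therefore acts as the pivot.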

Fix any request sequence $\sigma$ and set L to be the $\ell$-item list maintained by MRI(m). We compare configurations of L to configurations of FOPT. Consider any configuration of L and a configuration of FOPT. For each (ordered) pair of items x, y in L, $x \ne y$, we define the following weight function

��In fact� it is not hard to see that an optimal o�ine algorithm for lists of size two� that satis�es property i� must

satisfy property ii��


�XxL

XyL

Xj rj�y

ALG�x� rj

�X

x�yLx��y

Xj rj�y

ALG�x� rj

�X

fx�yg�Lx��y

Xj rjfx�yg

�ALG�x� rj � ALG�y� rj� � ��

For every x and y in L� and request sequence we denote the internal summation of the expressionin �� by ALG

�xy�� That is�

ALG�xy� �

Xj rjfx�yg

�ALG�x� rj � ALG�y� rj� �

We can now write equation �� as

ALG�� �

Xfx�yg�L

x ��y

ALG�xy�� ��

De�ne ALG��xy to be the cost that ALG pays for serving the projected request sequence xywhile operating on the projected list Lxy� We say that the algorithm ALG satis�es the PairwiseProperty if for every pair fx� yg L� ALG��xy � ALG

�xy��

Which online algorithms satisfy the Pairwise Property& The following is a useful characteriza�tion due to Bentley and McGeoch � �� An algorithm satis�es the Pairwise Property i� for everyrequest sequence � when ALG serves � the relative order of every two elements x and y in L is thesame as their relative order in Lxy when ALG serves xy �

Equations �� and �� do not in general apply to an optimal o�ine algorithm� OPT �OPT� asin general� OPT cannot avoid paid exchanges�� Nevertheless� it is not hard to obtain the followinginequality X

fx�yg�Lx��y

OPT��xy � OPT

�� ��

Moreover� it is possible to extend this inequality to the full cost model� This is established in thefollowing lemma�

Lemma � For any request sequence �Xfx�yg�L

x��y

OPT�xy � OPT�

The proof of inequality �� and Lemma � can be found in ���� �For the reader�s convenience weincluded them in Appendix A� What makes Lemma � so useful is the fact that the optimal o�inealgorithm� restricted to lists of size two� has a very simple structure� �in fact� it is not unique andwe refer to one particular optimal o�ine algorithm� which is summarized in the following lemma�

The following example due to Reingold and Westbrook ��� proves this assertion� Consider a list L � fx�� x�� x�gof size � initially ordered x�� x�� x� x� at the front�� The optimal o�ine cost to serve the request sequence x�� x�� x�� x�is �� An optimal o�ine algorithm without paid exchanges pays � to serve the same sequence�

The best known list accessing optimal o�ine algorithm for lists of sizes � larger than � ��� does not have aconcise description� which is probably essential to make it useful for analyses� We note that this algorithm computesthe optimal solution in time exponential in ��


MRI�� MRI�� MRI�� MRI�� MRI�� MRI� MRI�� MRI��

� ��� � ��� ��� ��� �� ��� �� ��

� �� ��� ��� ��� � ��� ��� ��� ���

��� � �� �� ��� ��� ��� � � �

Table �� The costs of MRI�m� m � �� � � � � � � with respect to i� i � �� �� �� The minimum cost ineach row is marked by �

MRI�� and MRI��� With respect to �� the minimum cost is obtained by MRI�� and the sequenceof costs incurred by MRI�m� m � �� �� �� � is monotone decreasing� on the other hand� the sequenceof costs incurred by MRI�m� m � �� � � � � � is monotone increasing� Lastly� with respect to theminimum cost is obtained by MRI�� and the sequence of costs for MRI�m is a monotone decreasingwith m�

� Preliminaries and list factoring

We begin by introducing some notation that will be used throughout. Let $\{x_1, x_2, \ldots, x_\ell\}$ be a set of items. These items are the (fixed) set of items on the list, L. Let $\sigma = r_1, r_2, \ldots, r_n$ be any request sequence with $r_i \in L$. For each pair of items x and y ($x \ne y$) denote by $\sigma_{xy}$ the "projection" of $\sigma$ over x and y, defined to be $\sigma$ after deletion of all the requests for items other than x and y. Similarly we define $L_{xy}$ to be the projection of the list L over x and y (i.e. $L_{xy}$ is the two element list holding x and y).
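The two projections are simple filters, as the following Python sketch illustrates. The names `project` and `project_list` are hypothetical, and sequences and lists are represented as ordinary Python sequences.

```python
def project(sigma, x, y):
    """Return sigma_xy: sigma with all requests other than x and y deleted."""
    return [r for r in sigma if r in (x, y)]


def project_list(lst, x, y):
    """Return L_xy: the two-element list holding x and y in their order in lst."""
    return [z for z in lst if z in (x, y)]
```

For instance, projecting the request sequence a, b, c, a, b, c, a over a and c gives a, c, a, c, a.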

The list factoring technique was �rst discovered by Bentley and McGeoch � � and was extendedand used in several papers ���� ��� ��� This technique enables a reduction of the cost analysis of listaccessing algorithms to lists of size two� In this paper we use this technique only to bound frombelow the optimal o�ine cost� We now present parts of this technique that are essential to ouranalysis or to the discussion that follows�

Call the variant where we charge $i - 1$ for accessing the ith item on the list the partial-cost model. The usual model where we charge i for accessing the ith item will be called the full-cost model. For any list accessing algorithm ALG and request sequence $\sigma$ denote by $\mathrm{ALG}^*(\sigma)$ the cost incurred by ALG for serving $\sigma$ within the partial-cost model.

Suppose ALG is a deterministic list accessing algorithm that does not use paid exchanges. Let $\sigma = r_1, r_2, \ldots, r_n$ be any request sequence. For each item $x \in L$ and integer $1 \le j \le n$, we denote by $\mathrm{ALG}(x, r_j)$ a cost function measuring the penalty attributed to item x for being in the way while ALG accesses $r_j$ (the jth request). That is,

\[
\mathrm{ALG}(x, r_j) =
\begin{cases}
1, & \text{if } x \text{ is in front of } r_j; \\
0, & \text{otherwise (including the case } x = r_j\text{).}
\end{cases}
\]

Using the above notation and this cost function we have:

\[
\mathrm{ALG}^*(\sigma) = \sum_{1 \le j \le n} \sum_{x \in L} \mathrm{ALG}(x, r_j)
= \sum_{x \in L} \sum_{1 \le j \le n} \mathrm{ALG}(x, r_j)
\]


integer. Consider the following deterministic list accessing algorithm called pass-recent-item(m), or PRI(m) for short.

Algorithm PRI(m): Upon a request for item x, move x forward just in front of the first item z on the list that was requested at most m times since the last request for x. Do nothing if there is no such item z. If this is the first request for x, move x to the front.
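A minimal Python sketch of this rule is given below. The function name `pri_access`, the dictionary `times` of per-item request times, and the caller-supplied increasing `clock` value are hypothetical. The sketch also assumes that the candidate items z are those currently in front of x, since x can only move forward.

```python
from bisect import bisect_right


def pri_access(lst, times, clock, x, m):
    """Serve an access to x with PRI(m); return the access cost.

    times[y] -- increasing list of the times at which y was requested
    clock    -- time stamp of the current request (must keep increasing)
    """
    i = lst.index(x)
    target = i                          # default: do nothing
    if not times.get(x):                # first request for x
        target = 0
    else:
        last = times[x][-1]
        for j in range(i):              # items in front of x, front first
            tz = times.get(lst[j], [])
            if len(tz) - bisect_right(tz, last) <= m:
                target = j              # first item requested at most m times
                break
    lst.insert(target, lst.pop(i))
    times.setdefault(x, []).append(clock)
    return i + 1
```

With `m = 1`, the list a, b, c and the requests c, a, a, c, the final request leaves c in place behind a: the only item in front of c was requested twice since the previous request for c, so no qualifying z exists.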

We call this infinite set of algorithms $\{\mathrm{PRI}(m)\}_{m \ge 0}$ "the pri Family". It is readily seen that algorithm PRI(1) is identical to algorithm TIMESTAMP (modulo handling of first requests). Here again, the limit algorithm of this family (as m approaches infinity) is MTF. We state without a proof the following result.

Proposition � For all m � � PRI�m is ��competitive�

The exact competitive ratio of the algorithm in the pri Family �other than PRI�� is an openquestion�

It is not hard to see that for all m �� � �m � � PRI�m acts di�erently than MRI�m� Considerthe following example�

Example �� Consider a list of � elements initially ordered h�� �� �� �� �i where the element � isat the front� Consider the following request sequence

� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� �� ��

Consider algorithm MRI��� After processing the list maintained by MRI�� is h�� �� �� �� �i andthe cost it incurs is ��� For comparison consider algorithms PRI�� � TIMESTAMP� PRI��� and MTF�The costs incurred by these algorithms for processing are� respectively� �� � and � and theyend up with the following lists� respectively� h�� �� �� �� �i� h�� �� �� �� �i and h�� �� �� �� �i� Thus� insome sense algorithms MRI��� PRI�� and PRI�� all have di�erent �predictions� for what is likelyto be the next request�

Example �� We now consider three request sequences� each of which distinguishes betweenthe algorithms MRI�m� m � �� �� � � � � � in a di�erent way� The following are the three sequences�

� � �����������������������������������������������������������������

����������������������������������������������

� � �����������������������������������������������������������������

�����������������������������������������������������������������

���������

� �����������������������������������������������������������������

�����������������������������������������������������������������

The respective costs of the algorithms for each of these sequences are summarized in Table ��As can be seen in Table �� with respect to � the minimum cost of ��� is obtained by MRI�� andthe cost monotonically increases with m� so that the maximum cost of �� is obtained by both

The proof of Proposition � has a similar structure to the proof of Theorem �


The algorithms in the mri family, and in particular MRI(1), look similar to algorithm TIMESTAMP. Indeed, all these algorithms maintain and use time "stamps" in order to determine their actions. Note the following two differences between MRI(1) and TIMESTAMP. The first difference is in the manner they handle first requests for items: upon a first request for an item x, TIMESTAMP keeps x in place whereas MRI(1) moves x to the front. The second difference is that MRI(1) moves the requested item x forward to a position just after the last element in front of x that was requested at least twice since the last request for x, whereas TIMESTAMP moves x forward just before the first element in front of x that was requested at most once since the last request for x. The first difference is of minor significance. The second difference, although subtle, appears to be (at least at the outset) of major significance. Nevertheless, later we shall prove that MRI(1) and TIMESTAMP are equivalent (modulo the above first difference).

Proposition � Algorithms MRI�� and TIMESTAMP are equivalent�

�The precise de�nition of this equivalence will be given later� in Section ����For each i � �� algorithm MRI�i appears more �reluctant� than algorithm MRI�i� � to move

an accessed item forward� This property makes the mri family attractive as it provides a gradualtransition from the �conservative� TIMESTAMP to the more �hasty� MTF� An example that demon�strates this feature will be presented� In particular this property of the mri family may be usedin applications to dynamically adapt to varying degrees of locality of reference� by �shifting gears�between algorithms in the mri family�

Theorem � For any list of size � and for each integer m � �� MRI�m is ��� ����competitive inthe static model��

This result signi�cantly expands our knowledge of competitive�optimal deterministic list access�ing algorithms� The proof of Theorem � �Section � uses a standard potential function argumentin conjunction with standard list factoring �to bound below the optimal o�ine cost� These twotechniques have been usually used separately� Note that standard list factoring by itself cannot beemployed for proving upper bounds for the mri family �except when m � �� The reason is that form � � algorithm MRI�m does not satisfy the �Pairwise Property� � �� On the other hand� we couldnot obtain a pure potential function proof� It is interesting to note that both MTF and TIMESTAMP

do satisfy the Pairwise Property and thus allow for a standard and simple list factoring analysis�as well as a potential function analysis�

The "odd" element of the mri family is the algorithm MRI(0), which is no better than $\Omega(\sqrt{\ell})$-competitive. This is established in Section �.�. It is interesting to note that in the experimental study, presented later in this paper, members of the mri family, and in particular MRI(�), were found to be among the best list accessing algorithms in terms of their access cost performance (as well as data compressors).

Theorem � also holds within the dynamic list accessing model (in which case $\ell$ is the total number of elements that were inserted into the list). This is established in Section �.

�.� The pri Family

Analogous to the mri Family, and generalizing the algorithm TIMESTAMP of Albers in a straightforward manner, we now define the following family of algorithms. Let m be a non-negative

�As pointed out by an anonymous referee, it is possible to prove that the algorithms MRI(m), $m \ge 1$, attain the optimal upper bound of $2 - 2/(\ell + 1)$ using a different proof technique than ours.


�.� Contents and paper organization

The paper is organized as follows. In Section � we introduce the new mri and pri families and state our results. In Section � we develop some techniques that will be used later in the analyses of the mri and pri families. In particular, we present some features of the well-known "list factoring" technique, which in this paper will be used only for lower bounding optimal offline costs. In Section � we prove the optimality of the algorithms in the mri family and prove the equivalence between MRI(1) and TIMESTAMP. We also prove a lower bound of $\Omega(\sqrt{\ell})$ on the competitive ratio of algorithm MRI(0). In Section � we extend our results to the dynamic list model.

The second part of the paper starts in Section � and is concerned with the results of an extensive experimental study of many list accessing algorithms. In Section �� we describe all the algorithms tested. These include representatives of �� different families and more than �� algorithms in all. Whenever known, we specify bounds on the competitive ratios of the algorithms described. We have performed two experiments. The first attempts to rank the various algorithms in terms of their access cost performance, and the second in terms of their performance as data compressors. General descriptions of these two experiments are given in Sections �� and ��. Sections � and � treat each of these experiments in detail. In particular, they describe the data sets, the experiments, the conclusions, and other relevant and related experiments. Finally, in Section � we draw our conclusions.

New list accessing algorithms

In this section we present new deterministic list accessing algorithms. In particular, we introduce two classes of algorithms called the mri and the pri families.

�.� The mri Family

Let m be a non-negative integer. Consider the following deterministic list accessing algorithm called move-to-recent-item(m) (or MRI(m) for short).

Algorithm MRI(m): Upon a request for item x, move x forward just after the last item z on the list that is in front of x and that was requested at least m + 1 times since the last request for x. If there is no such item z or if this is the first request for x, move x to the front.
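As an illustration only, the rule above can be sketched in Python. The function name `mri_serve`, the representation of the list as a Python list, and the returned (final list, total cost) pair are assumptions made for this sketch and do not come from the paper. The cost accounting follows the standard model, in which accessing the i-th item from the front costs i.

```python
from collections import Counter


def mri_serve(initial, requests, m):
    """Serve `requests` with a sketch of MRI(m); return (final list, total access cost)."""
    lst = list(initial)
    last_seen = {}  # item -> index (in `requests`) of its most recent request
    cost = 0
    for t, x in enumerate(requests):
        pos = lst.index(x)
        cost += pos + 1  # accessing the i-th item costs i
        target = 0       # default: move x to the front
        if x in last_seen:
            # Requests made strictly between the previous request for x and now.
            since = Counter(requests[last_seen[x] + 1:t])
            # Scan from x toward the front: the first hit is the LAST item z
            # in front of x that was requested at least m + 1 times.
            for i in range(pos - 1, -1, -1):
                if since[lst[i]] >= m + 1:
                    target = i + 1  # just after z
                    break
        lst.insert(target, lst.pop(pos))
        last_seen[x] = t
    return lst, cost


if __name__ == "__main__":
    print(mri_serve("abc", "cbbc", 1))
    print(mri_serve("abc", "cbbc", 2))
```

On the request sequence c, b, b, c served from the initial list a, b, c, this sketch leaves c behind b when m = 1 (b was requested twice since the last request for c) and moves c to the front when m = 2, which illustrates how larger values of m behave more like MTF.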

We call the set of algorithms $\{\mathrm{MRI}(m)\}_{m \ge 0}$ "the mri Family". In this paper we prove that the mri Family includes all deterministic list accessing algorithms that are so far known to be optimal, i.e. MTF and TIMESTAMP. First, notice that MTF is the limit element of the mri Family as m approaches infinity. In fact, for each particular (finite) request sequence $\sigma$, each of the algorithms MRI(m) with $m \ge |\sigma|$ is equivalent to MTF with respect to $\sigma$ (i.e. it acts identically to MTF). As noted above, MTF is optimal. Let us now describe algorithm TIMESTAMP due to Albers [�].�

Algorithm TIMESTAMP: Upon a request for an item x, insert x in front of the first (from the front of the list) item y that precedes x on the list and was requested at most once since the last request for x. Do nothing if there is no such item y, or if x has been requested for the first time.
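The TIMESTAMP rule can be sketched in the same hedged way. The name `timestamp_serve` and the (final list, total cost) return value are assumptions of this sketch, not notation from the paper; the access cost again charges i for the i-th item.

```python
from collections import Counter


def timestamp_serve(initial, requests):
    """Serve `requests` with a sketch of TIMESTAMP; return (final list, total access cost)."""
    lst = list(initial)
    last_seen = {}  # item -> index (in `requests`) of its most recent request
    cost = 0
    for t, x in enumerate(requests):
        pos = lst.index(x)
        cost += pos + 1  # accessing the i-th item costs i
        if x in last_seen:  # on a first request, x stays in place
            # Requests made strictly between the previous request for x and now.
            since = Counter(requests[last_seen[x] + 1:t])
            # Scan from the front: the first item y preceding x that was
            # requested at most once since the last request for x.
            for i in range(pos):
                if since[lst[i]] <= 1:
                    lst.insert(i, lst.pop(pos))  # insert x in front of y
                    break
        last_seen[x] = t
    return lst, cost


if __name__ == "__main__":
    print(timestamp_serve("abc", "cbbc"))
```

On the sequence c, b, b, c served from the list a, b, c, the first requests for c and b leave the list unchanged, the second request for b moves b in front of a, and the final request for c places c in front of a but behind b, since b was requested twice in the meantime.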

�The precise name of TIMESTAMP in [�] is TIMESTAMP(0).


analysis. Recently, Albers [�] discovered another deterministic algorithm called TIMESTAMP and proved that it is 2-competitive. So far MTF and TIMESTAMP have been the only deterministic algorithms known to attain an optimal competitive ratio of 2.�

The importance and practical usefulness of algorithm MTF have been well known for quite some time (see e.g. Bentley and McGeoch [�]). This algorithm provides a very simple and efficient method of maintaining a linked list with guaranteed performance. Further, MTF can be used to devise a very simple and successful (text) compression algorithm (see e.g. ��� � ���).

Soon after its discovery, algorithm TIMESTAMP was also shown to play an important role. Continuing Albers' work, Albers, von Stengel and Werchner [�] combined TIMESTAMP and algorithm BIT [��, ��] in a ������� probability mixture and showed that the resulting randomized algorithm, called COMB, is 1.6-competitive against an oblivious adversary. So far this is the best known randomized upper bound, and it leaves a small gap to the best known lower bound of 1.5 [��]. Another work, due to Albers and Mitzenmacher [�], demonstrated the usefulness of TIMESTAMP by considering it as an engine for the data compression scheme of Bentley et al. [�, �]. With respect to the Calgary Corpus, a collection of standard benchmark files for (text) compression [�], they found that the TIMESTAMP-based compression algorithm is in most cases superior to the MTF-based one. They argued that this phenomenon is related to the fact that TIMESTAMP is more "conservative" than MTF and is therefore better suited to the degree of locality of reference exhibited in (the corpus) text files.

These results concerning MTF and TIMESTAMP emphasize compelling arguments supporting the quest for other deterministic (list accessing) algorithms, and in particular, optimal ones.� First, different (optimal) deterministic algorithms behave differently on different input sequences, and by identifying the characteristic behaviors with respect to various inputs one can better match an algorithm to the task in question. For instance, it could be rewarding to be able to match (or adapt online) the choice of a list accessing algorithm to varying degrees of locality of reference exhibited in the input. Also, a set of (optimal) deterministic algorithms can be used to obtain randomized algorithms by considering probability distributions over the set. The discovery of algorithm COMB is an important example of such possibilities. (Note that algorithm BIT, the randomized component of COMB, is in fact a uniform probability mixture over a set of $2^{\ell}$ deterministic algorithms, each of which is 2-competitive.)
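To make the remark about BIT concrete, here is a minimal Python sketch. The rule used for BIT below (one bit per item, complemented on each access, with a move to the front whenever the bit becomes 1) is the commonly cited description of that algorithm and is an assumption here, since this text does not spell it out. The names `bit_serve` and `expected_bit_cost` are hypothetical. Fixing the initial bits yields one deterministic algorithm, and averaging over all assignments of initial bits gives the expected cost of the uniform mixture.

```python
from itertools import product


def bit_serve(initial, requests, bits):
    """Serve `requests` with BIT started from the given initial bits (assumed rule)."""
    lst = list(initial)
    b = dict(bits)  # item -> current bit (0 or 1)
    cost = 0
    for x in requests:
        i = lst.index(x)
        cost += i + 1          # accessing the i-th item costs i
        b[x] ^= 1              # complement the bit of the accessed item
        if b[x] == 1:          # move to front only when the bit becomes 1
            lst.insert(0, lst.pop(i))
    return lst, cost


def expected_bit_cost(initial, requests):
    """Average cost over all 2**len(initial) assignments of initial bits."""
    items = list(initial)
    costs = [
        bit_serve(items, requests, dict(zip(items, assignment)))[1]
        for assignment in product((0, 1), repeat=len(items))
    ]
    return sum(costs) / len(costs)


if __name__ == "__main__":
    print(expected_bit_cost("abc", "cc"))
```

For the list a, b, c and the requests c, c, only the initial bit of c matters in this sketch: the cost is 4 when it starts at 0 and 6 when it starts at 1, so the average over the eight deterministic members is 5.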

In the first part of this paper we introduce two new families of deterministic online list accessing algorithms. We analyze one of the families and provide some initial results regarding algorithms of the second family.

Despite (over) thirty years of list accessing research, with plenty of interesting theoretical results, this topic suffers from a lack of experimental feedback that could lead to better and more realistic theory. In the second part of this paper we report the results of an extensive empirical study testing the access cost performance and compression performance of more than �� list accessing algorithms. These include many well-known algorithms as well as our new ones.

�Any deterministic list accessing algorithm is termed here optimal if it attains a competitive ratio of 2. An algorithm which attains the upper bound of $2 - 2/(\ell + 1)$ is termed strictly optimal.

�Another variant of MTF that moves an item to the front every other access can also be shown to be 2-competitive.

�Note that this goal of discovering more (or all) optimal algorithms is very often overlooked in competitive analysis once a single, more or less "satisfying", optimal algorithm is found.


� Introduction

In the list accessing problem an online algorithm maintains a set of items as an unsorted linear list. A sequence of requests for items on the list is given and the algorithm must serve these requests in the order of their arrival. Upon a request for an item x, the algorithm must access x by searching for x starting from the front of the list. In a standard model of this problem the cost associated with accessing the i-th item from the front is i. The algorithm may reorganize the list at any time. The incentive for such reorganization is to decrease the cost of future accesses. Reorganization is done via a sequence of transpositions of consecutive items. After an item x is accessed, x may be moved free of charge to any position closer to the front. Thus the transpositions that bring x forward after an access are called free exchanges. Any other transposition costs 1 and is called paid.

The above variant of the list accessing problem is called the static model. The dynamic model deals with a dynamic list where new items may be inserted and old items may be deleted (or accessed). Unless otherwise specified, we usually assume the static model.
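The cost accounting of the static model can be sketched in a few lines of Python. The function names `serve_fixed` and `serve_mtf` are hypothetical. The first never reorganizes the list; the second moves each accessed item to the front, which uses free exchanges only, so in both cases the total cost is just the sum of the access costs and no paid exchanges occur.

```python
def serve_fixed(initial, requests):
    """Serve requests without ever reorganizing the list."""
    lst = list(initial)
    # Accessing the i-th item from the front (1-based) costs i.
    cost = sum(lst.index(x) + 1 for x in requests)
    return lst, cost


def serve_mtf(initial, requests):
    """Serve requests, moving each accessed item to the front (free exchanges only)."""
    lst = list(initial)
    cost = 0
    for x in requests:
        i = lst.index(x)
        cost += i + 1               # access cost in the standard model
        lst.insert(0, lst.pop(i))   # bring x forward at no extra charge
    return lst, cost


if __name__ == "__main__":
    print(serve_fixed("abc", "ccc"))
    print(serve_mtf("abc", "ccc"))
```

With the list a, b, c and three consecutive requests for c, the fixed list pays 3 per access (9 in total), while moving c forward after the first access reduces the total to 3 + 1 + 1 = 5. This is the incentive for reorganization mentioned above.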

The importance of the list accessing problem arises from the fact that online list accessing algorithms are often used by practitioners. Although organizing dictionaries as linear lists is relatively inefficient, there are various situations in which a linear list is the implementation of choice. For instance, when the dictionary is small (say, for organizing the list of identifiers maintained by a compiler, or for organizing collisions in a hash table), or when there is no space to implement efficient but space-consuming data structures, etc. � � ���. In addition, any list accessing algorithm can be used as the "heart" of a data compression algorithm [�, �]. Due to its great relevance, the list accessing problem has been studied since � � (see e.g. ���� �� � ��� ��� ��). Nevertheless, despite these extensive studies, this simple-to-state problem is not yet well understood. In particular, the basic question of which are the best list accessing algorithms, and for which inputs, remains unsolved except for two extreme cases: where inputs are independent observations of a probability distribution, or where inputs are generated by an adversary aiming to maximize the multiplicative "regret" of the algorithm (i.e. its competitive ratio; see below). Each of these models implies a different notion of optimality, and the work done so far in this area has identified many interesting optimality and approximate-optimality results. However, with respect to the hardest "model", that of real-life inputs, the problem remains elusive for the most part.

Competitive list accessing algorithms. Let ALG be any list accessing algorithm. For any sequence of requests $\sigma$ we denote by $\mathrm{ALG}(\sigma)$ the total cost incurred by ALG to service $\sigma$. Following Sleator and Tarjan [��], we measure the performance of an online list accessing algorithm ALG by its competitive ratio, defined as follows: we say that ALG attains a competitive ratio c (or that ALG is c-competitive) if there exists a constant $\alpha$ such that for all request sequences $\sigma$, $\mathrm{ALG}(\sigma) \le c \cdot \mathrm{OPT}(\sigma) + \alpha$, where OPT is an optimal offline list accessing algorithm. (The constant $\alpha$ is usually called "the additive constant".) If ALG is randomized, it is c-competitive against an oblivious adversary if for every request sequence $\sigma$, $\mathbf{E}[\mathrm{ALG}(\sigma)] \le c \cdot \mathrm{OPT}(\sigma) + \alpha$, where $\alpha$ is a constant independent of $\sigma$, and $\mathbf{E}[\cdot]$ is the mathematical expectation taken with respect to the random choices made by ALG. The use of the competitive ratio for the analysis of online algorithms is termed competitive analysis.�

On known competitive optimal list accessing algorithms and their applications. In their seminal paper, Sleator and Tarjan [��] showed that the well-known algorithm move-to-front (MTF), a deterministic algorithm that moves each requested item to the front, is 2-competitive. Raghavan and Karp (reported in [��]) proved a lower bound of $2 - 2/(\ell + 1)$ on the competitive ratio of any deterministic algorithm maintaining a list of size $\ell$. Irani [��] gave a matching upper bound for MTF, showing that MTF is a strictly optimal online algorithm judged from the perspective of competitive

�The term "competitive analysis" was coined by Karlin et al. [�].


On the competitive theory and practice of online list accessing algorithms

Ran Gilad-Bachrach* Ran El-Yaniv† Martin Reinstädtler‡

November� ����

Abstract

This paper concerns the online list accessing problem. In the first part of the paper we present two new families of list accessing algorithms. The first family is of optimal, 2-competitive, deterministic online algorithms. This family, called the mri (move-to-recent-item) family, includes as members the well-known move-to-front (MTF) algorithm and the recent, more "conservative" algorithm TIMESTAMP due to Albers. So far move-to-front and TIMESTAMP were the only algorithms known to be optimal in terms of their competitive ratio. This new family contains a sequence of algorithms $\{A(i)\}_{i \ge 1}$ where $A(1)$ is equivalent to TIMESTAMP and the limit element $A(\infty)$ is MTF. Further, in this class, for each i, the algorithm $A(i)$ is more conservative than algorithm $A(i + 1)$ in the sense that it is more reluctant to move an accessed item to the front, thus giving a gradual transition from the conservative TIMESTAMP to the "reckless" MTF. The second new family, called the pri (pass-recent-item) family, is also infinite and contains TIMESTAMP. We show that most algorithms in this family attain a competitive ratio of 3.

In the second, experimental part of the paper, we report the results of an extensive empirical study of the performance of a large set of online list accessing algorithms (including members of our mri and pri families). The algorithms' access cost performance was tested with respect to a number of different request sequences. These include sequences of independent requests generated by probability distributions and sequences generated by Markov sources, used to examine the influence of locality. It turns out that the degree of locality has a considerable influence on the algorithms' absolute and relative costs, as well as on their rankings. In another experiment, we tested the algorithms' performance as data compressors in two variants of the compression scheme of Bentley et al. In both experiments, members of the mri and pri families were found to be among the best performing algorithms.

*Institute of Computer Science, The Hebrew University, email: ranb@cs.huji.ac.il
†Department of Computer Science, Technion - Israel Institute of Technology, email: rani@cs.technion.ac.il
‡Max-Planck-Institut für Informatik, Saarbrücken, email: marei@mpi-sb.mpg.de