
CoEd Meets VTML.

Dec 10, 2022


Lars Bendix
Computer Science Department
Aalborg University
Fredrik Bajers Vej 7E
DK-9220 Aalborg Ø.
+45 9635 8080
[email protected]

Fabio Vitali
Computer Science Department
University of Bologna
Mura Anteo Zamboni, 7
I-40121 Bologna BO
+39 51 354502
[email protected]

Abstract

Collaboratively creating documents is a complex task that requires sophisticated tools. CoEd is a system providing overview and versioning support for the creation of complex and hierarchically structured text documents. Its limitations in the efficient management of disk usage and in providing a few sophisticated collaboration functionalities led us to consider the VTML change tracking language for improving the feature set of CoEd. This report explores the advantages of using a sophisticated change-tracking language in a versioning system for collaborative writing.

Keywords: Co-ordination, communication, change tracking, version control.

1 Introduction

The CoEd system [2] was born to provide support for collaborative writing to teams of students at the University of Aalborg needing to prepare LaTeX reports for software projects connected with their courses. The available tools were felt to be lacking in facilities for global overview, co-ordination, version control and communication among writers.


The first two prototypes [1] built at the University of Aalborg provided an overview of the structure of the texts and version management of the basic text units of the students' reports. Thus, CoEd was able to solve many problems connected to the mentioned tasks. Students could satisfactorily carry out the processes connected to writing their reports, and the availability of a sophisticated tool such as CoEd allowed the creation of better reports and considerably reduced the effort needed to create and correct them.

On the other hand, CoEd manages persistent data unsatisfactorily: it stores whole versions and ignores the inherent structure of the documents it handles. Furthermore, many useful features cannot be implemented with the current underlying data model, such as support for managing and visualising differences in the structure, querying of attributes, or comparison of parallel versions. Evolving the CoEd system to handle this kind of information and to provide this functionality requires deciding on a mechanism for improved internal data management.

VTML (Versioned Text Markup Language, [10]) is a markup language for describing the changes that have occurred to arbitrary sequential data types. It allows arbitrary attributes, such as authors and dates, to be specified for each change, and arbitrarily complex version graphs to be built that detail the development of a document. A VTML-based system relies on the VTML format to provide support for efficient data management, version branching, lock-free concurrent access to shared documents [7], version identification, easy comparison of versions, and a reliable addressing space for the document's content.

VTML seems adequate for providing intelligent disk management to CoEd and for supporting much of the sophisticated functionality mentioned above. Other engines such as RCS were judged inadequate because they cannot deal with structured texts as easily as VTML can. Furthermore, VTML handles additional information, in the form of attributes, in a straightforward way.

Our current aim is to create a new prototype for handling change tracking in collaborative writing efforts. We will rely on the old CoEd to provide the user interface and the versioning model, behaving as the front-end of the system, and use VTML to provide the versioning engine and work as the back-end. The new prototype is expected to give better change tracking than previous systems (including the old CoEd), and better change tracking helps improve the co-ordination and communication in collaborative writing efforts.

In this technical report, we describe the goals of our current research project, which is aimed at:

• specifying the requirements for a collaborative writing environment
• specifying the interface between CoEd and VTML
• designing the new integrated prototype
• implementing the services that were not implemented with the old back-end
• developing the versioning model of CoEd to provide additional functionality made possible by the disappearance of old restrictions

The rest of the paper is structured as follows. In section 2, we describe the existing CoEd system and the results that were obtained by using it. In section 3, the VTML language is described and a simple session is analysed to provide insight into the actual workings of the language. Section 4 analyses the problems in integrating CoEd and VTML and sketches the design of the combined system. In section 5 we draw some conclusions, outline our plans for future work in the project and state some preliminary results.

2 Collaborative Writing

CoEd is a collaborative tool aimed at supporting teams in writing shared structured documents. For years the students at the department of Computer Science of the University of Aalborg have encountered numerous problems when they have had to work together to write reports. Each semester these students spend the major part of their time developing a system, enabling them to put the theory they are taught during the courses into practice. They work in groups of 3-8 people over a period of four months. The theory and the process, as well as the final product, have to be documented in a report, which is usually between 80 and 120 pages long. The major part of this report is written during the last three weeks of the project period.


The students experienced problems not so much during the programming process, where existing tools seem to be of sufficient help, as during the writing process, which is usually short and hectic and characterised by a very dynamic organisation of tasks and responsibilities. They especially had problems in keeping an overview of the document and of how its structure develops through new versions. This made it difficult for them to establish baselines of the document and to make proper use of version histories. Finally, communication of information about the development is important, as these students often work in a distributed way. Some students in a group may work from home, while others work from the room that each group has at the university, and yet others work from the computer labs at the department. This led to the creation of an environment called CoEd [1], where distributed clients can work on shared data via the same server, as shown in figure 1.

[Figure: four boxes labelled "CoEd client" and one box labelled "CoEd server"]

Figure 1. Clients and servers in CoEd.

The problems of these students are typical examples of the problems present in collaborative writing efforts. The CoEd system is a tool better suited to their specific problems than the ones previously used. Instead of buying a new tool or trying to solve a general problem, the strategy was to try to solve the specific problems of the students in their given context, and to gain experience from the students actually using a prototype of the tool.

All groups of students use LaTeX for producing their project reports. Some groups have so far used self-imposed group discipline to manage the development, dividing the document up into disjoint parts with respect to responsibility. They have, however, usually encountered serious problems, both because parts of the report are inherently interdependent and because of the complete absence of versioning of the compound document. Most groups have used either RCS [8] or CVS [3] as their tool of choice to manage the development, usually depending on whether they liked a strict locking mechanism or not. This enabled them to version the development of the single parts of the document, but they still had problems in keeping an overview of the entire document and in manipulating its structure.

2.1 The Problems in Collaboration

The work on CoEd [2] originated in the problems that students had reported from their co-operative work on developing textual documents. The problems of these students are many and varied, but can roughly be grouped into three categories. The first has to do with the lack of overview and co-ordination, both of the document and of what everyone else is doing. The second has to do with problems in doing version control the way that they want and need. Finally, there are problems that have to do with communication.

Problems that have to do with the lack of overview and co-ordination manifest themselves in several ways. Students complain that it is very difficult to organise the structure of the report and to have the structure visualised while working in front of the screen. Much paper is wasted printing out indices or entire reports to gain an overview, and much work is lost in manually changing LaTeX commands (and/or file names) to reflect a reorganisation of the report. This also implies that groups rarely change their way of working. If they work in a top-down fashion, the structure of the paper remains fixed right from the start. Groups working in a bottom-up way remain in limbo until the very last moment, when all the pieces can finally be put together. A mixed and more flexible approach would obviously be desirable.

These problems are, in part, due to the fact that if we divide the document up into several files, reflecting its hierarchical structure, then the version control tool (or the file system) treats those files just as single pieces and not also as a whole. In part, the problems are due to the lack of a proper GUI that can visualise the structure of the document. Version control tools permit us to divide the document into logical entities, like chapters and sections, and put them in separate files organised in a hierarchical directory structure. However, without a proper GUI it is difficult to get a quick overview of the entire document. Furthermore, the fact that the structure is not manifest, but only implied by a directory structure, means that we must manually change this structure every time the organisation of the document is to be changed.

To remedy this problem the CoEd system has knowledge of LaTeX, so that it can automatically create (and maintain) the storage organisation from that implied by the LaTeX code. Furthermore, a GUI that is capable of visualising LaTeX structures is indispensable. Finally, it is possible to visualise, and work with, both the document as a whole and its individual parts.

In the second category of problems we find misfits between the version control needs of the students and the functionality provided by the tools they use. The students do not have very sophisticated needs for version control. They do not develop variants and do not have to maintain old versions, as is usual in software development. Still, they have trouble getting help from traditional version control tools. They find it difficult to retrieve old versions. Not infrequently, confusion arises when the supervisor comments on the document and the students find out that it is not the version they printed out just before the meeting. When a section or a chapter is split into two, the version history for one of the parts is lost. They also have problems in following which changes have been carried out and which are still pending. These problems are very similar to the problems in version selection, baselining and change tracking pointed out by Tichy [9].

Again the problems are, in part, due to the lack of a GUI and, in part, to problems with the data model that the version tools build on. An adequate GUI makes version selection far easier because one immediately sees what one selects, at least with versions of the individual parts. Baselining the entire document is a cumbersome and sometimes error-prone process. This is because it is a manual task in which the document is viewed as a collection of versioned parts. As such, there is no explicit versioning of the entire collection as a whole. Furthermore, as the tools are unaware of operations like splitting a unit, this becomes something that is unsupported and has to be carried out outside of the tool's control.

To avoid these problems we made the construction and versioning of baselines an integral part of the tool, treated on an equal footing with the versioning of the individual parts. Furthermore, we supported splitting of units as a basic functionality. Finally, the GUI supports version selection and visualises the result immediately.


2.2 The Architecture of CoEd

The architecture of CoEd is built around the principle that a LaTeX document has a hierarchical structure and as such consists of a set of leaves and internal nodes (sections and subsections), each of which can contain text. Leaves and nodes are the smallest granularity of the system and are called units. For the versioning of a unit we use the traditional approach of creating a version group for the unit and letting the development of versions be reflected by a version graph. The root node has a special status as it represents the whole document. The root node is versioned just like all other nodes, and this provides us with versioning of the document as a whole. A given version of the whole document is called a baseline, in accordance with the terminology used in software configuration management.
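The data model described above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the class and field names (UnitVersion, VersionGroup, predecessor) are hypothetical and do not come from the CoEd implementation, and the version graph is reduced to a single predecessor link.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UnitVersion:
    """One node in a unit's version graph (field names are illustrative)."""
    unit: str
    number: int
    text: str
    children: List["UnitVersion"] = field(default_factory=list)
    predecessor: Optional["UnitVersion"] = None


@dataclass
class VersionGroup:
    """All versions of one unit; the root's group holds the baselines."""
    unit: str
    versions: List[UnitVersion] = field(default_factory=list)

    def add(self, text, children=(), predecessor=None):
        version = UnitVersion(self.unit, len(self.versions) + 1, text,
                              list(children), predecessor)
        self.versions.append(version)
        return version


def build_example():
    """Two baselines of a report whose only unit changes once."""
    root = VersionGroup("report")
    section = VersionGroup("\\subsection{RCS}")
    s1 = section.add("RCS uses strict locking.")
    b1 = root.add("", [s1])
    s2 = section.add("RCS uses strict locking on check-out.", predecessor=s1)
    b2 = root.add("", [s2], predecessor=b1)
    return root, section
```

Because the root is versioned like any other unit, each baseline here is simply a version of the root that points at one specific version of every unit below it.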

Figure 2. How CoEd presents itself to the user.


CoEd has four browsers, each of which shows a different aspect of the internal data structure. This makes it possible to look at the data structure (i.e. the document) at varying levels of detail, allowing for a flexible granularity. Figure 2 shows the GUI of the CoEd system.

At the bottom left, we find the hierarchy browser. This browser shows the hierarchical structure of the document as it is implied by the LaTeX commands in the text. The design is inspired by the Pathfinder of Windows 95, and icons can be expanded and compressed by double-clicking on them. This makes it very easy to get a quick overview of the whole or parts of the document at the desired level of detail.

At the bottom right, we find the text browser. It shows the text that corresponds to the icon selected in the hierarchy browser (in this case \subsection{RCS}). As documents, in contrast to programs, consist of contiguous pieces of text, it is possible to scroll this text and thus arrive at the text that precedes or follows the selection. If a selection is made in the text browser, by double-clicking anywhere in a logical unit, then this text is highlighted and the corresponding icon in the hierarchy browser is highlighted too.

At the top left, we find the version browser, which has two windows. The top window shows all the versions of the baseline, which is equivalent to showing the version group of the root, as we version whole structures. The bottom window shows the version graph for the unit that is selected in the hierarchy browser (in this case \subsection{RCS}). The two windows are "synchronised" in the sense that if we select another baseline version in the upper window, it might highlight another version in the unit graph, if the version of that unit is not the same in the two baselines. If another version is selected in the unit graph, it will definitely highlight another baseline version, as a change in a unit will always cause a new baseline. The hierarchy browser is a "slave" of the version browser, as it always shows the structure that corresponds to the current baseline selection. Through the hierarchy browser, the text browser is also a "slave" of the version browser and thus always shows the text that corresponds to the current baseline selection.

At the top right, we find the configuration browser. This browser was introduced in order to solve the lack of overview caused by the high number of baselines. As can be seen in the version browser window, baselines can also be named to distinguish important ones. These named baselines are the ones that appear in the configuration browser. The selection of one of these baselines will cause the icon of the same baseline to be selected and highlighted in the version browser.

From figure 2, we can see that initially the user selected the "5. December" baseline in the configuration browser. This caused CoEd to find and select that version of the document (version 19) and highlight it in the upper window of the version browser (leaving the lower window blank). CoEd also finds and displays the structure of this baseline version in the hierarchy browser. Then the user selected "\subsection{RCS}" as the unit he is interested in, and CoEd found and displayed the text of this unit (and of immediately surrounding units to fill out the text window) in the text browser, highlighting it. CoEd also displayed the version group for the selected unit in the lower window of the version browser. We can also see that it is not the latest version of the unit (version 6) that is part of the "5. December" baseline, but an earlier version (version 5). This caused CoEd to find the text for that version of the node and display it in the text browser.

2.3 How CoEd Is Used

A typical scenario for the use of CoEd will find us starting with (a piece of) a document which we now want to continue to develop using CoEd. Using the file menu, we ask CoEd to check in the file containing the document. CoEd parses the LaTeX code and, if successful, constructs the implied hierarchical structure; otherwise it refuses the text.
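To illustrate how a hierarchy can be derived from sectioning commands at check-in, here is a minimal sketch. It is not CoEd's parser: the function name parse_units, the dictionary layout and the set of recognised commands are assumptions made for the example, and starred or otherwise unusual commands are not handled.

```python
import re

# Nesting depth of the sectioning commands recognised by this sketch.
LEVELS = {"chapter": 1, "section": 2, "subsection": 3, "subsubsection": 4}
HEADING = re.compile(r"\\(chapter|section|subsection|subsubsection)\{([^}]*)\}")


def parse_units(source):
    """Build a tree of units (title, level, text, children) from LaTeX source."""
    root = {"title": "document", "level": 0, "text": "", "children": []}
    stack = [root]          # path from the root to the unit currently open
    pos = 0
    for match in HEADING.finditer(source):
        # Text before this heading belongs to the unit that is still open.
        stack[-1]["text"] += source[pos:match.start()]
        level = LEVELS[match.group(1)]
        node = {"title": match.group(2), "level": level,
                "text": "", "children": []}
        # Close every open unit at the same or a deeper level.
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
        pos = match.end()
    stack[-1]["text"] += source[pos:]
    return root
```

Each node of the returned tree corresponds to one unit, so the tree is the kind of structure a hierarchy browser could display.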

Using the version browser, we now select the baseline we want to change (usually the latest). We can now use either the hierarchy browser or the text browser to select the contiguous piece of text we want to change (it can span several logical units) and ask CoEd to check it out to a single file.

We can edit this file using our favourite editor, and when we have finished editing the text, we ask CoEd to check it in again. CoEd automatically discovers which units have been changed, creating new versions of them, and which have not, leaving them untouched. It will even discover if units have been added or deleted and react correspondingly.
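The classification performed at check-in can be sketched as a comparison between the units that were checked out and the units that come back. The sketch assumes that units are identified by their heading and compared by exact text equality; the report does not say how CoEd matches units, so both choices are illustrative.

```python
def classify_checkin(checked_out, checked_in):
    """Compare two {unit name: text} mappings and sort units into categories."""
    return {
        "changed": [u for u in checked_in
                    if u in checked_out and checked_in[u] != checked_out[u]],
        "unchanged": [u for u in checked_in
                      if u in checked_out and checked_in[u] == checked_out[u]],
        "added": [u for u in checked_in if u not in checked_out],
        "deleted": [u for u in checked_out if u not in checked_in],
    }
```

Only the units listed under "changed" and "added" would need new versions; the units under "unchanged" keep their current version.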


2.4 Advanced Functionality in CoEd

CoEd also has some more advanced functions that work at the structural level of the document. These are the splitting of a unit, the creation of meta-versions and direct manipulation of the structure.

The basic idea behind the versioning mechanism used in CoEd is that of versioning whole structures and not just single parts. Traditionally the latter approach has been used. This means that each single part of a system is versioned individually in a version group of its own. To give the structure of the compound system, we then have a separate description, which may be versioned too. This creates two major problems: a conceptual one of managing versioning on two distinct levels and a technical one of keeping the description of the compound system in synchronisation with its actual development.

The way that CoEd versions both structure and contents means that these problems have been solved. Conceptually the compound system is explicitly versioned as a whole, as demonstrated by the baseline window of the version browser, where a version of the whole document is selected. The structure of this baseline of the document is then visualised in the window of the hierarchy browser. Implicitly the parts of the compound system are versioned too. This is demonstrated by the version group window of the version browser. When we select a unit of the structure in the hierarchy browser, the version graph of that unit is shown in this window.

By having the versioning of the whole structure as the primary, explicit focus and the versioning of the parts as a subordinate, implicit consequence, we avoid the traditional conceptual mismatch and overcome the problems of keeping the description of the system up to date with its actual development. It has, however, some consequences at the technical level. If, for example, we make some modifications to a sub-subsection, it means not only that the sub-subsection has changed and a new version of it must be created, but also that the subsection of which it is a part has changed, because some part of its structure has changed. The section that contains the subsection has conceptually changed too, as have the chapter and the entire report. So adding a comma to a sub-subsection creates not one but five new versions, i.e. a new version of every node in the tree of the structure from the node where the text was modified to the root. Apart from the node where the modification was actually made, the other nodes appear to be unchanged, as the change has happened to a node further down in the tree. This is a conceptually sound principle, but at the technical level care has to be exercised in order not to consume too much disk space. This care has not been implemented in the present prototype of CoEd.
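The count of five new versions follows from walking the path from the modified unit up to the root. The sketch below shows that walk; the parent map and the function name units_reversioned are hypothetical and only serve the example.

```python
def units_reversioned(parent_of, edited):
    """Return the edited unit and all of its ancestors, up to the root."""
    path = [edited]
    while parent_of.get(path[-1]) is not None:
        path.append(parent_of[path[-1]])
    return path


# A four-level hierarchy below the report, as in the example in the text.
PARENT_OF = {
    "sub-subsection": "subsection",
    "subsection": "section",
    "section": "chapter",
    "chapter": "report",
    "report": None,
}
```

Calling units_reversioned(PARENT_OF, "sub-subsection") returns the five units that receive a new version; an edit made directly in the chapter would touch only two.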

A more serious problem surfaces because of the versioning philosophy. We allow the user to check out more than one unit at a time, as long as the text is contiguous. So the user could, for example, check out sections one, two and three of the first chapter. Modifications can be made to all three sections and the combined result can be checked in again. CoEd will now discover the modification to section one and create a new version of this section, a new version of chapter one and a new version of the report. After that it will discover that section two has been modified too, creating a new version of that section and yet another new version of chapter one and of the report. The same will happen when CoEd discovers the modifications to section three. This is problematic at both the conceptual and the technical level. Conceptually the user will think of his operations as one single change and may be confused by seeing it represented by three different versions that may not even reflect intermediate steps in his editing. Technically, it is a potential waste of space to create three new versions of chapter one and of the report if only one is needed. A solution at the conceptual level is offered in the present prototype of CoEd.

Meta-versions are a way to reduce the number of baselines created, so that it becomes manageable (the configuration browser has the same objective). In the mechanism we have adopted for versioning in CoEd (the engine part of CoEd's architecture from figure 5), each change in a unit means that not only is a new version of that unit created, but a new version is also created for all units on the path to the root of the structure. This is because we also consider structures and not just the contents of single units. So if we have chapter 3 with sections A, B and C, and make changes to sections A and C, two new versions of chapter 3 will be created: one containing sections A', B and C, and one containing sections A', B and C'. It is easy to imagine how many baseline versions a document with more than 100 units in a four-level hierarchy would create when changed frequently. It was difficult for us to change the basic versioning mechanism so that it would create only the latter version of chapter 3. Instead, we made a change to the model at the higher level, so that these versions are automatically grouped together into a single meta-version. Meta-versions can be opened so that the single versions in the meta-version can be accessed.
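The chapter 3 example can be replayed in a few lines. The sketch assumes that a check-in is processed one unit at a time, as described above, and represents a version of the chapter as a plain dictionary; the function names and the shape of the meta-version record are illustrative, not CoEd's.

```python
def check_in(current, changes):
    """Apply changes one unit at a time; return every version produced."""
    produced = []
    for unit, new_text in changes.items():
        if current.get(unit) != new_text:
            current = {**current, unit: new_text}   # one new version per unit
            produced.append(current)
    return produced


def meta_version(versions):
    """Group the versions of one check-in; the last one is the visible result."""
    if not versions:
        raise ValueError("a meta-version needs at least one version")
    return {"latest": versions[-1], "members": list(versions)}


chapter3 = {"A": "A", "B": "B", "C": "C"}
steps = check_in(chapter3, {"A": "A'", "C": "C'"})
group = meta_version(steps)
```

Here steps holds the two versions from the text (A', B, C and then A', B, C'), while group presents them to the user as one entry that can still be opened.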


Figure 3. Added functionality in CoEd.

Let us assume that we have sections A, B and C, and want to split section A into two sections A1 and A2. When we check in the result (A1, A2, B and C), CoEd will discover that there is one more section than was checked out. It will, however, also recognise that A1 and A2 were parts of the original section A, create two new version groups and connect them with the version group of the original A in a seamless way, in order not to lose continuity in the compound version history.
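The report does not say how CoEd recognises that A1 and A2 came from A. One simple heuristic, shown here purely as an assumption, is to look for two adjacent checked-in units whose texts concatenate to the text of the original unit; a real implementation would need something more tolerant of edits made during the split.

```python
def find_split(original_text, new_units):
    """Return the names of two adjacent units that together equal the original.

    new_units is a list of (name, text) pairs in document order.
    Returns None when no such pair exists.
    """
    for (name1, text1), (name2, text2) in zip(new_units, new_units[1:]):
        if text1 + text2 == original_text:
            return (name1, name2)
    return None


def link_split(history, original, parts):
    """Record, for each new version group, the group it was split from."""
    for part in parts:
        history[part] = original
    return history
```

Keeping the origin of each new version group, as link_split does, is one way to preserve continuity: the history of A1 or A2 can be followed back into the versions of A.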

It is also possible to manipulate the structure directly and in this way permute units. It is possible to move both single units and parts of the structure. We simply select what has to be moved and then drag it to the place where it has to be inserted. In this way, we can change a chapter into a section (including its sub-structure) or vice versa, and CoEd will make the necessary changes to the LaTeX code for us. In the implementation of direct manipulation, we profit very much from automatically created meta-versions. This is because a move operation is implemented as a number of inserts followed by a number of deletes, which would quickly generate a very large number of baseline versions. The automatic meta-versions were created to compact a series of external editing operations into one representation. Correspondingly, we have the possibility to create meta-versions manually, which can be used to compact a series of internal editing operations like the direct manipulations.

2.5 Experiences with CoEd

We have implemented a working prototype of CoEd, and students of the University of Aalborg have used it for developing their project reports. The results from these experiments are rather promising. The average number of pages handled by CoEd was about 80 pages per project. Some groups just used it to play around because they did not trust the stability of CoEd and were afraid of losing their data. These groups had relatively little text (about 50 pages), while a few groups used it seriously and had about 120 pages under the control of CoEd. In the end, CoEd turned out to be remarkably stable, and very little data was lost the few times it crashed.

The average number of units per project was 147, again with serious users going higher. The number of baseline versions was 2 to 6 times the number of units (see footnote 1). That the students were nevertheless able to maintain an overview demonstrates the value of the configuration browser and the meta-version concept. There were about 9 versions in a meta-version, but up to 20-30 versions were seen, especially for groups that brought larger pre-written pieces of text into CoEd rather than developing them from scratch using CoEd.

Development was mostly linear, with very few branches and merges. This indicates that conflicts because of parallel development were not very common. This may be because the tools that the students had been using previously did not encourage such behaviour. Splitting of units was used, but not extensively. The direct manipulation of the structure, on the other hand, was used very extensively and was rated by the students as one of the strongest points of CoEd, above all because direct manipulation could be carried out under full version control. The fact that a move is carried out as a delete followed by an insert, combined with the fact that each single change creates new versions, means that we have actually obtained an undo function for this kind of editing operation for free. All the single steps can be followed one by one, and at the end they can be grouped in a meta-version in order not to create too many baseline versions.

Footnote 1: Note that just bringing a document into CoEd for the first time creates as many baseline versions as there are units. This is because the present versioning mechanism handles the units of the document one at a time. It thus creates one new version of the whole document for each changed, added or removed unit.

Students also felt that this kind of version control and change tracking lowered the need for face-to-face meetings for exchanging information. This indicates that in the past many such meetings were held mainly for communicating information and for co-ordination purposes, and that by using CoEd they were able to reduce the co-ordination needs.

3 Change Tracking

VTML is a descriptive data format for fine-grained change tracking. It is not a versioning system, but a flexible data format that can be used by systems that implement a wide range of versioning styles. It grew out of an attempt to determine an adequate versioning style for hypermedia documents in a collaborative environment ([4] and [5]). The versioning styles allowed by VTML can vary from extremely informal and unstructured asynchronous collaboration patterns among creative writers, to the formalised and controlled sequential actions of a team of programmers, to the synchronous access to a shared blueprint by a team of architects and designers. It can be flexibly used with a large number of system architectures, varying from flexible editing clients and dumb storage servers to extremely dumb clients interacting with a sophisticated versioning server.

The format is designed to be consumed by programs, and so it is relatively terse and simple to parse. Although we are currently applying VTML to the management of text, any text or binary format can be directly represented in a VTML document.

VTML-based systems may use VTML to obtain a few interesting features, such as:

• the version history may branch, creating a tree of variants. The version history may also converge, creating a master version that inherits from several different variants by some form of user-guided or automatic merge.


• locks to control access by authors are not necessary. This is a consequence of allowing branching versions: conflicting check-in operations can always be allowed, automatically creating new branches of the version tree if necessary. The versions can then be "harmonised" with a merge operation.

• a check-out operation is not necessary: users may use copies without synchronisation control by a server or a distributed consistency algorithm, for instance by using a local copy in the client's file space.

• automatic version identification is supported, according to a series of numbering schemata. Four numbering schemata can be used, each having equivalent expressive power.

• VTML versions provide a consistent and reliable addressing mechanism for document spans that requires no modification to the document and can survive unmonitored changes to the document itself. One important service that a VTML-based versioning system can provide is to precisely locate the position of data designated by an offset into a previous version of the document. This is an important operation for the support of external link bases that refer to changing data. The same mechanism can also be used to provide flexible document fragment re-use, with little additional machinery.
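The lock-free behaviour in the list above can be illustrated with a toy version graph in which a check-in never fails: two check-ins against the same base simply become sibling branches, and a later merge records both as parents. The class below is a hypothetical sketch of that idea, not part of VTML or of any VTML-based system; version numbers are plain integers chosen for the example.

```python
class VersionTree:
    """Toy version graph where conflicting check-ins become branches."""

    def __init__(self, text):
        self.text = {1: text}      # version number -> content
        self.parents = {1: []}     # version number -> parent versions

    def check_in(self, bases, text):
        """Store text as a new version derived from the given base versions."""
        for base in bases:
            if base not in self.text:
                raise KeyError(f"unknown base version {base}")
        version = len(self.text) + 1
        self.text[version] = text
        self.parents[version] = list(bases)
        return version

    def children(self, version):
        """Versions derived directly from the given version."""
        return [v for v, ps in self.parents.items() if version in ps]


tree = VersionTree("draft")
intro = tree.check_in([1], "draft with introduction")
summary = tree.check_in([1], "draft with summary")   # no lock needed: a branch
merged = tree.check_in([intro, summary], "draft with introduction and summary")
```

After these calls version 1 has two children, which is the branch, and the merged version lists both of them as parents, which is the "harmonised" result.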

VTML stores information about every single modification to the shared document. It is able to report that something as simple as an insertion has taken place, or something as complex as a sort. Since the list of possible operations is open, VTML describes every complex change as a list of simple operations: insertions, deletions and modifications. Thus the basic purpose of the language (i.e., to be able to build a given version of a document from the changes it has undergone since its creation) is preserved even if the meaning of the actual operations is unknown.

Attributes are associated with single changes. This allows extreme flexibility in describing them. To avoid the burden of repeated data, shorthand facilities are provided. The list of data items that can be associated with every change is also open, and possibly very large. Thus, instead of listing every kind of attribute exhaustively, only a few necessary ones are defined, and a way to add new ones is provided. The necessary attributes are basically used to determine unambiguously the whereabouts and the correct grouping of the changes. Everything else, from the author or the date of the change to the comments about a given change, or to the author’s shoe size, for that matter, is an additional attribute that is not part of the language.

VTML comes in two equivalent formats: the internal format stores the modifications side by side, at the positions where they happened; the external format stores them in the chronological order in which they happened. A VTML document is composed of one or more VTML blocks, each contained within a pair of {VTML} {/VTML} tags. VTML blocks are composed either of internal markup (using the elements ATT, USROP, INS, and DEL) or external markup (using the elements ATT, USROP, EXTINS, EXTDEL). The same document may contain VTML blocks of both types.

All change commands that are described with internal tags are stored within a single VTML block, while external tags may be stored in as many blocks as needed. Applications that require support for both internal and external changes in a single file may concatenate multiple blocks together.

A Complete Example Using VTML

Basically, VTML tags describe the editing operations performed on the document, as well as operations that do not result from changes to the document data but rather from the selection of some existing changes. Let us suppose we have the following situation: David and Lars are collaborating on writing a document.

First version: Lars inserts the string: “The quick brown fox jumps over the lazy dog.”

Second version: David substitutes “quick” with “speedy”, and removes “lazy”: “The speedy brown fox jumps over the dog.”

Third version: Lars substitutes “brown” with “red” and inserts “sleepy” before “dog”: “The speedy red fox jumps over the sleepy dog.”

For reasons known to the VTML engine, versions 1 and 2 are stored together with the internal markup, while version 3 is stored externally (maybe the engine has not yet had the time to import the new version). Versions 1 and 2 correspond to the following VTML block:

{VTML NAME=“Hunting” CVERS=2 _AUTHORS=“Lars, David”}
{ATTR ID=1 vers=1 _author=“Lars”}
{ATTR ID=2 vers=2 _author=“David”}

{INS ATT=1} The {INS ATT=2} speedy {/INS} {DEL ATT=2} quick {/DEL} brown fox jumps over the {DEL ATT=2} lazy {/DEL} dog. {/INS}
{/VTML}

Each VTML tag describes the shared context given, at least, by the document name and the current version and, in this case, also by the group of legal authors. The ATTR tag stores a few attributes that would otherwise have to be repeated several times in the document tags, and that are associated with the ATT attribute of the actual tags. Therefore, writing

{ATTR ID=1 vers=1 _author=“Lars”}
{ATTR ID=2 vers=2 _author=“David”}

{INS ATT=1} The {INS ATT=2} speedy {/INS} {DEL ATT=2} quick {/DEL} brown fox jumps over the {DEL ATT=2} lazy {/DEL} dog. {/INS}

is equivalent to writing:

{INS vers=1 _author=“Lars”} The {INS vers=2 _author=“David”} speedy {/INS} {DEL vers=2 _author=“David”} quick {/DEL} brown fox jumps over the {DEL vers=2 _author=“David”} lazy {/DEL} dog. {/INS}

INS and DEL represent the actual changes that were performed on the text. Since text is a linear, sequential format, there is no need for modification operations: we can safely restrict ourselves to insertions (INS tags) and deletions (DEL tags).
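The internal markup can be read as a sequence of text spans, each labelled with the version that inserted it and, possibly, the version that deleted it. The following is a minimal sketch of how a given version could be rebuilt from such spans. It assumes a linear history in which version numbers are totally ordered; `Span`, `HUNTING` and `build_version` are hypothetical names, and the placement of blanks inside the spans is our own choice rather than something prescribed by VTML.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    text: str
    inserted: int                  # version whose INS introduced the text
    deleted: Optional[int] = None  # version whose DEL removed it, if any


# Hand-made transcription of the internal block for versions 1 and 2.
HUNTING: List[Span] = [
    Span("The ", 1),
    Span("speedy ", 2),
    Span("quick ", 1, deleted=2),
    Span("brown fox jumps over the ", 1),
    Span("lazy ", 1, deleted=2),
    Span("dog.", 1),
]


def build_version(spans: List[Span], version: int) -> str:
    """Keep the spans already inserted and not yet deleted at `version`."""
    return "".join(
        span.text
        for span in spans
        if span.inserted <= version
        and (span.deleted is None or span.deleted > version)
    )
```

Calling `build_version(HUNTING, 1)` yields Lars's original sentence and `build_version(HUNTING, 2)` yields David's. A branching history would need an ancestry test instead of the simple numeric comparison used here.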

On the other hand, this is an external representation of version 3:

{VTML NAME=“Hunting” CVERS=3 _AUTHORS=“Lars, David”}
{ATTR ID=1 SOURCE=“Hunting” VERS=3 _author=“Lars”}
{EXTDEL ATT=1 POS=15 LENGTH=5}
{EXTINS ATT=1 POS=15}red{/EXTINS}
{EXTINS ATT=1 POS=42}sleepy {/EXTINS}
{/VTML}

This VTML block contains an external description of the changes leading to version 3. In this case, insertions (stored as EXTINS tags) specify their position, while deletions (EXTDEL tags) specify both their position and the number of removed characters.
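To make the position-based nature of the external format concrete, here is a hedged sketch of applying a list of external changes to a plain string. The tuple encoding and the function `apply_external` are assumptions made for illustration. The offsets below are 0-based, computed by hand against the plain text of version 2, and applied in order; they therefore differ from the POS values printed in the block above, whose exact counting convention is not stated in this section.

```python
from typing import List, Tuple, Union

# ("EXTDEL", position, length) or ("EXTINS", position, inserted_text)
ExternalOp = Union[Tuple[str, int, int], Tuple[str, int, str]]


def apply_external(text: str, ops: List[ExternalOp]) -> str:
    """Apply external changes one after the other to `text`."""
    for name, pos, arg in ops:
        if name == "EXTDEL":
            # arg is the number of characters to remove
            text = text[:pos] + text[pos + arg:]
        elif name == "EXTINS":
            # arg is the text to insert
            text = text[:pos] + arg + text[pos:]
        else:
            raise ValueError(f"unknown operation: {name}")
    return text


VERSION_2 = "The speedy brown fox jumps over the dog."

# Lars's changes: replace "brown" with "red", insert "sleepy " before "dog".
TO_VERSION_3: List[ExternalOp] = [
    ("EXTDEL", 11, 5),
    ("EXTINS", 11, "red"),
    ("EXTINS", 34, "sleepy "),
]

VERSION_3 = apply_external(VERSION_2, TO_VERSION_3)
```

Because each operation carries its own position, such a block can be stored apart from the text it applies to, which is what allows the external format to live in a separate file.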

Separately, Fabio opened version 2 of the “Hunting” document and made some other modifications: he substituted “jumps over” with “is not caught by” and inserted “Today” at the beginning of the sentence: “Today the speedy brown fox is not caught by the dog.” The following is the result of his modifications:

{VTML NAME=“Hunting” CVERS=3 _author=“Fabio, Lars, David”}
{ATTR ID=1 SOURCE=“Hunting” VERS=3 _author=“Fabio”}

{USROP ATT=1 REF=2 NAME=“SUBSTITUTION”}
{EXTDEL POS=29 LENGTH=10}jumps over{/EXTDEL}
{EXTINS POS=29}is not caught by{/EXTINS}
{/USROP}
{EXTINS ATT=1 POS=1}oday t{/EXTINS}
{/VTML}

This version, besides making use of the external representation of changes, uses the USROP command, which collects a sequence of basic editing commands (insertions, deletions, modifications) into a single operation. In the external format, the USROP tag groups together the basic operations it is composed of, and labels them with a human-readable name.

The first problem with accepting this version is that both versions claim to be version 3, since both were created from version 2 in the absence of other derived versions. Lars decides that his own version will remain in the main branch of the version tree. This affects the numbering of the versions, as Fabio’s version is renumbered and becomes 3.1. Then Lars merges Fabio’s contributions into a new version: he accepts the substitution of the verb, but NOT the insertion of “Today”.

This is the structure of the version tree:

[Diagram: Version 1 (Lars), Version 2 (David), Version 3 (Lars) and Version 4 (Lars), with Version 3.1 (Fabio) as a separate branch.]

Figure 4. The version tree of the VTML example.

The engine easily generates the following internal representation:

{VTML NAME=“Hunting” CVERS=CURRENT AUTHORS=“Lars, David, Fabio”}
{ATTR ID=1 ref=1 vers=1 _author=“Lars”}
{ATTR ID=2 ref=2 vers=2 _author=“David”}
{ATTR ID=3 ref=3 vers=3 _author=“Lars”}
{ATTR ID=4 ref=4 vers=3.1 _author=“Fabio”}
{ATTR ID=5 vers=CURRENT _author=“Lars”}

{USROP ATT=4 NAME=“Substitution” REF=6 INCLUDES="5"}
{USROP ATT=5 NAME=“Merge” EXCLUDES="7"}

{INS ATT=1} T{INS ATT=4 REF=7}oday t{/INS}he {INS ATT=2} speedy {/INS} {DEL ATT=2} quick {/DEL} {DEL ATT=3} brown {/DEL} {INS ATT=3} red {/INS} fox {DEL REF=5} jumps over {/DEL} {INS REF=5}is not caught by {/INS} the {INS ATT=3} sleepy {/INS} {DEL ATT=2} lazy {/DEL} dog. {/INS}
{/VTML}

The main features of this version are that the internal form of USROP has been used and that a merge has been performed. The internal format of the USROP tag specifies the basic operations it is composed of by listing their REF numbers in an INCLUDES attribute, or by listing the other ones in an EXCLUDES attribute. A merge is just another USROP operation where the relevant operations are either accepted or ignored in the merged version. Thus, in this case, version 4 is composed of a single operation that merges (accepts) all previous operations except for the one with REF = 7.

This new block can either be stored as such by the VTML engine or divided again into elements and stored separately. When the engine saves the document, it will substitute the CURRENT value with the appropriate version number (in this case, 4).
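The idea that a merge is just a selection of accepted operations can be sketched as follows. This is an illustration under our own assumptions, not the behaviour of an actual VTML engine: `Span`, `select` and `merge` are hypothetical names, the operation identifiers loosely follow the REF numbers of the example (1 for Lars's first version, 2 for David's changes, 3 for Lars's second set of changes, 5 for Fabio's substitution, 7 for his insertion of “Today”), and the final “the” is kept outside the deleted span, since Fabio only replaced “jumps over”.

```python
from typing import Iterable, List, NamedTuple, Optional, Set


class Span(NamedTuple):
    text: str
    ins_op: int                   # operation that inserted the text
    del_op: Optional[int] = None  # operation that deleted it, if any


# Hand-made transcription of the merged internal representation.
MERGED: List[Span] = [
    Span("T", 1), Span("oday t", 7), Span("he ", 1), Span("speedy ", 2),
    Span("quick ", 1, 2), Span("brown ", 1, 3), Span("red ", 3),
    Span("fox ", 1), Span("jumps over ", 1, 5),
    Span("is not caught by ", 5), Span("the ", 1), Span("sleepy ", 3),
    Span("lazy ", 1, 2), Span("dog.", 1),
]


def select(spans: List[Span], accepted: Set[int]) -> str:
    """Build the text produced by exactly the accepted operations."""
    return "".join(
        span.text
        for span in spans
        if span.ins_op in accepted and span.del_op not in accepted
    )


def merge(all_ops: Iterable[int], excludes: Iterable[int]) -> Set[int]:
    """A merge in the EXCLUDES style: accept everything but the listed ones."""
    return set(all_ops) - set(excludes)


ALL_OPS = {1, 2, 3, 5, 7}
VERSION_4 = select(MERGED, merge(ALL_OPS, {7}))
```

The same `select` function also rebuilds the other versions of the tree: the set {1, 2, 3} gives Lars's version 3 and the set {1, 2, 5, 7} gives Fabio's version 3.1, which shows why no information about the individual branches is lost by merging.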

Integrating CoEd and VTML

In this section we briefly describe the reasons for integrating CoEd and VTML, the interface between the two systems, and a few example scenarios where using VTML can provide additional functionality to the CoEd collaborative system.

CoEd has proven itself a strong and flexible tool for supporting collaborative writing through change tracking, versioning of whole documents and management of document structure. We have, however, found some things to improve through our experiments with the prototype. While CoEd’s interface and model layers (see figure 5) work rather well, the engine layer is far too simple, since it does basically nothing but make system calls to the file system. CoEd stores each version of a unit in its entirety and does not even try to use space-saving delta mechanisms. This makes CoEd prohibitive to use on a larger scale, as it consumes a great deal of disc space. We made this initial choice because we wanted to put the emphasis on the concepts and on the development of an experimental prototype rather than on an efficient implementation.

In order to further develop the functionality of CoEd and to make it a more efficient tool that can be used for real projects, a more powerful and flexible engine is needed. We have looked into traditional tools for version control, like RCS [8], and we found that they are simply not powerful enough. Such tools are very efficient in representing version groups in as little space as possible, but they are limited in that they do not go beyond version groups. This would still leave to the versioning model of CoEd the task of managing the structure of the document and of versioning this structure.

VTML efficiently represents changes in a versioned text, so that VTML-aware applications may make use of the change-tracking facilities of the language to provide sophisticated versioning support to their users: version selection, branching, comparison, and merge. VTML therefore seems like an optimal choice for the engine component of the next CoEd prototype since it can handle and version both contents and structure. VTML is a language, not a tool, so we had to decide what kind of VTML-enabled application we were looking for. A VTML engine can provide basic parsing and storage functionality. By adding a simple interface layer for the CoEd applications, we can easily provide sophisticated versioning functionality.

In figure 5, we show the overall architecture of the foreseen application.

The interface layer provides the operational interface between CoEd and the VTML engine, and consists of the following operations:

Put_version(data:Data_structure, depends_on:Version_name) -> Version_name;

The Put_version operation appends a new version to an existing document. This corresponds to a check-in operation for the VTML engine, which generates a diff between the specified version and the newly submitted one. Based on that, the engine determines the VTML coding and the version number corresponding to the new version. The VTML engine will then decide autonomously whether to store the new version using the external format in a separate file, or to insert it as internal coding in the existing one. Finally, it will return the new version name for CoEd to update its internal database.

[Diagram with the labels: CoEd, User Interface, Versioning Model, Versioning Engine, CoEd-VTML interface, VTML-engine.]

Figure 5. The conceptual architecture of CoEd with VTML.

Get_version(version:Version_name) -> Data_structure;

The Get_version operation creates the required version. This corresponds to a check-out operation for the VTML engine. The engine will retrieve all the versions leading up to the requested one, and will perform the change operations stored in them that are necessary to build the requested version. It will then return the data corresponding to the requested version.
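The check-in and check-out behaviour of Put_version and Get_version can be approximated with a toy in-memory engine. This is only a sketch under stated assumptions: `ToyEngine` is a hypothetical class, it stores character-level deltas computed with Python's standard `difflib` module rather than real VTML markup, and it hands out opaque names (`v1`, `v2`, ...) instead of following any of VTML's numbering schemata.

```python
import difflib
from typing import Dict, List, Optional, Tuple

# (start, end, replacement): replace parent_text[start:end] by replacement
Delta = List[Tuple[int, int, str]]


class ToyEngine:
    """Stores each version as a delta against the version it depends on."""

    def __init__(self) -> None:
        self._parent: Dict[str, Optional[str]] = {}
        self._delta: Dict[str, Delta] = {}

    def put_version(self, data: str, depends_on: Optional[str]) -> str:
        """Check-in: diff against the parent and store only the changes."""
        base = self.get_version(depends_on) if depends_on else ""
        matcher = difflib.SequenceMatcher(None, base, data, autojunk=False)
        delta = [
            (i1, i2, data[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"
        ]
        name = f"v{len(self._parent) + 1}"
        self._parent[name] = depends_on
        self._delta[name] = delta
        return name

    def get_version(self, version: str) -> str:
        """Check-out: replay every delta on the path from the root."""
        chain: List[str] = []
        current: Optional[str] = version
        while current is not None:
            chain.append(current)
            current = self._parent[current]
        text = ""
        for name in reversed(chain):
            # Apply right to left so that earlier offsets stay valid.
            for start, end, replacement in reversed(self._delta[name]):
                text = text[:start] + replacement + text[end:]
        return text
```

Two calls to `put_version` that name the same `depends_on` version are both accepted and simply create two children of that version, which mirrors the lock-free branching described earlier.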

Compare_versions(versions:Version_group_list, deleted_data:Boolean) -> Comparison_data_structure;

A comparison structure is simply a text document that contains some colour coding information. The CoEd model will request a list of versions to be displayed together to ease the comparison. For each version, it will suggest a colour. In the deleted_data parameter, it will then specify whether deleted data should be displayed or not. For the VTML engine, this corresponds to a multiple check-out operation where, instead of simply building the requested versions, each version is assigned a colour coding that will be used to specify the display of each document bit. If deleted data are requested, the deletion operations are not performed, and the corresponding data are left in the document with an additional special colour coding.

To clarify the working of the CoEd+VTML system, we examine four possible scenarios where the system is used and provides sophisticated collaborative functionality:

I - Creating a new document
Student A places an existing, sharable document under the control of CoEd.

In this case CoEd will parse the text of the document and create the hierarchical structure implied by the LaTeX commands. For each of the leaves and internal nodes in this tree, it will create a new version group and insert the text of the unit as the first version in this version group.
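As an illustration of this first scenario, the following sketch derives a tree of units from LaTeX sectioning commands. It is not CoEd's actual parser: `Unit`, `parse_units` and `walk` are hypothetical names, and only four sectioning commands with simple, brace-free titles are recognised.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, List

LEVELS = {"chapter": 0, "section": 1, "subsection": 2, "subsubsection": 3}
HEADING = re.compile(
    r"\\(chapter|section|subsection|subsubsection)\*?\{([^}]*)\}"
)


@dataclass
class Unit:
    title: str
    level: int
    text: str = ""
    children: List["Unit"] = field(default_factory=list)


def parse_units(source: str) -> Unit:
    """Build the unit hierarchy implied by the sectioning commands."""
    root = Unit("document", -1)
    stack = [root]
    pos = 0
    for match in HEADING.finditer(source):
        # Text before this heading belongs to the unit currently open.
        stack[-1].text += source[pos:match.start()]
        level = LEVELS[match.group(1)]
        # Close every open unit at the same or a deeper level.
        while stack[-1].level >= level:
            stack.pop()
        unit = Unit(match.group(2), level)
        stack[-1].children.append(unit)
        stack.append(unit)
        pos = match.end()
    stack[-1].text += source[pos:]
    return root


def walk(unit: Unit) -> Iterator[Unit]:
    """Yield every node; each one would receive its own version group."""
    yield unit
    for child in unit.children:
        yield from walk(child)
```

Each unit yielded by `walk` corresponds to one leaf or internal node of the tree, so a first version of its `text` could be stored in a fresh version group.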

II - Getting and modifying a document
Student B makes a modification to the document’s latest baseline and saves it.

When the text is checked in, CoEd discovers which parts have been modified. For all the modified units it calls the VTML engine to have new versions created and stored.

VTML handles and stores each single change that has happened to a document between saves. This means that, after each editing session, the VTML engine must determine what has changed since the last saved version. Since there are presently no plans to integrate a VTML-aware editor into CoEd, the difference is determined by making a diff of the two versions. The output of the diff program is then converted into VTML commands and passed back to the VTML engine. The VTML engine can now choose between using the internal format, creating a single VTML file containing all the existing versions of the document, and using the external format, which can be stored independently of the rest of the document in a separate file. The choice is made on grounds of efficiency and availability of the new version.
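The conversion of a diff into external-style change commands can be sketched with Python's standard `difflib` module. The sketch makes several assumptions: it works on characters rather than lines, it encodes commands as plain tuples named after the EXTDEL and EXTINS tags, positions are 0-based, and each command refers to the text as it stands after the preceding commands. `diff_to_external` and `replay` are hypothetical names, and `replay` is included only to check the round trip.

```python
import difflib
from typing import List, Tuple, Union

# ("EXTDEL", position, length) or ("EXTINS", position, inserted_text)
Op = Union[Tuple[str, int, int], Tuple[str, int, str]]


def diff_to_external(old: str, new: str) -> List[Op]:
    """Turn the differences between two texts into a command list."""
    ops: List[Op] = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Once the earlier commands are applied, the text up to this
        # point already equals new[:j1], so j1 is the position to use.
        if tag in ("delete", "replace"):
            ops.append(("EXTDEL", j1, i2 - i1))
        if tag in ("insert", "replace"):
            ops.append(("EXTINS", j1, new[j1:j2]))
    return ops


def replay(text: str, ops: List[Op]) -> str:
    """Apply the commands in order; used here to verify the conversion."""
    for name, pos, arg in ops:
        if name == "EXTDEL":
            text = text[:pos] + text[pos + arg:]
        else:
            text = text[:pos] + arg + text[pos:]
    return text


OLD = "The speedy brown fox jumps over the dog."
NEW = "The speedy red fox jumps over the sleepy dog."
COMMANDS = diff_to_external(OLD, NEW)
```

The exact commands depend on how the matcher aligns the two texts, but replaying them on the old text always reproduces the new one. A real diff program would typically report line-level differences, which would then need the same kind of translation.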

III - Comparing different versions
Student C accesses student B’s baseline and wants to compare it with a previous baseline.

The CoEd interface transforms this command into a request to the VTML engine for two different versions of the document. The VTML engine verifies whether those versions of the document are stored externally. If so, it internalises them and generates the compact internal representation of the selected versions of the document. Then it transforms the relevant change instructions into colour choices for the display of the document text, thereby allowing the comparison of the two versions. This is simply done by eliminating version information for those bits that belong to both versions, and converting the version information into colour instructions for those bits that have been modified in either version.

This information is then visualised in a separate window by the CoEd GUI.
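The conversion of version information into colour instructions can be sketched as follows for the simple case of two versions on a linear history. The span encoding, the colour names and the function `compare_versions` are assumptions made for this illustration; they are not the Comparison_data_structure of the actual interface.

```python
from typing import List, Optional, Tuple

# (text, version that inserted it, version that deleted it or None)
Span = Tuple[str, int, Optional[int]]

HUNTING: List[Span] = [
    ("The ", 1, None),
    ("speedy ", 2, None),
    ("quick ", 1, 2),
    ("brown fox jumps over the ", 1, None),
    ("lazy ", 1, 2),
    ("dog.", 1, None),
]


def present(span: Span, version: int) -> bool:
    """True if the span is part of the text at the given version."""
    _, inserted, deleted = span
    return inserted <= version and (deleted is None or deleted > version)


def compare_versions(
    spans: List[Span],
    old: int,
    new: int,
    new_colour: str,
    deleted_data: bool,
    deleted_colour: str = "grey",
) -> List[Tuple[str, Optional[str]]]:
    """Return (text, colour) pairs; colour None marks unchanged text."""
    result: List[Tuple[str, Optional[str]]] = []
    for span in spans:
        in_old, in_new = present(span, old), present(span, new)
        if in_old and in_new:
            result.append((span[0], None))            # common to both
        elif in_new:
            result.append((span[0], new_colour))      # added in `new`
        elif in_old and deleted_data:
            result.append((span[0], deleted_colour))  # removed, still shown
    return result
```

Comparing versions 1 and 2 with `deleted_data` set to true marks “speedy ” with the colour of version 2 and keeps “quick ” and “lazy ” in the special deletion colour; with `deleted_data` set to false, the deleted bits are simply left out.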

IV - Parallel access to a document
Students A and C want to make modifications to the same baseline at the same time.

Any check-out of text in CoEd is done within the context of a baseline. It is possible to make more than one check-out from the same baseline, either in parallel or sequentially. CoEd notices that a branch has to be created and handles it at both the structural and the textual level. The structural level is handled internally, while the management of parallel variants of text is handled by the VTML engine.

Since the VTML engine easily allows branching, neither student is blocked from accessing the document in write mode. We are not planning to use VTML-aware editors or notification mechanisms, so at save time the two versions are autonomously accepted by the VTML engine and put in two parallel variants. Since VTML allows parallel variants to coexist without requiring the incompatibilities to be merged, and since VTML is able to provide any selection of versions even if they belong to different version branches, there is no pressure to resolve the inconsistencies that may have been created during the parallel edits. Once the need to harmonise the differences becomes paramount, the merge operation can be activated from the CoEd GUI. A merge can be done either automatically or manually. In both cases a person or an algorithm will select, for each edit that appears in either relevant branch, whether it should belong to the final version or not. The merge is therefore an optional operation that reconciles different version branches of the same document without losing information on each of the branches involved.

Conclusions

The way that CoEd handles versioning of whole hierarchical structures resembles that of the COOP/Orm system [6], which has inspired the design of CoEd. This system, however, has not solved the problem of the proliferation of root versions (as they are called in COOP/Orm) or baselines (as they are called in CoEd). This means that although it is able to handle versioning of entire structures, it is very difficult to get an overview of the baselines. COOP/Orm also uses a different versioning philosophy, which causes this proliferation to happen in all version groups; this does not happen in CoEd. It is therefore more difficult to use COOP/Orm to get a global overview of what is happening and thus to co-ordinate a collaborative effort.

The CoEd environment has proven to be robust and useful in many collaborative situations. On the other hand, the simplicity of the underlying storage engine has prevented much interesting functionality from being added. VTML may provide the required sophistication in the management of the versions, and thus make it possible to improve the feature set of CoEd. Furthermore, the significant space savings available with the VTML format may easily make CoEd usable in heavy-duty real-life situations.

This report presents an analysis of, and a sketch of the design for, the ongoing evolution of CoEd.

Acknowledgements

This work has been supported, in part, by the Danish Research Council, grant no. 9701406.

References

1. Bendix, L., Larsen, P. N., Nielsen, A. I., and Petersen, J. L. S. CoEd - A Tool for Cooperative Development of Hierarchical Documents, Technical Report R-97-5012, Computer Science Department, Aalborg University, Denmark, September 1997.

2. Bendix, L., Larsen, P. N., Nielsen, A. I., and Petersen, J. L. S. CoEd - A Tool for Versioning of Hierarchical Documents, in Proceedings of SCM-8 (Bruxelles, Belgium, July 1998), Lecture Notes in Computer Science, Springer Verlag.

3. Berliner, B. CVS II: Parallelizing Software Development, in Proceedings of USENIX Winter 1990 (Washington, DC, 1990).

4. Durand, D., Haake, A., Hicks, D., and Vitali, F. (eds.), Proceedings of the Workshop on Versioning in Hypertext Systems, held in connection with The European Conference on Hypertext, ECHT94. Available as GMD Arbeitspapiere 894, GMD - IPSI, Dolivostrasse 15, 64293 Darmstadt, Germany.

5. Hicks, D., Haake, A., Durand, D., and Vitali, F. (eds.), Proceedings of the ECSCW'95 Workshop on the Role of Version Control in CSCW Applications, available as Boston University’s Technical Report 96-009, http://www.cs.bu.edu/techreports/96-009-ecscw95-proceedings/Book/proceedings_txt.html

6. Magnusson, B., and Asklund, U. Fine Grained Version Control of Configurations in COOP/Orm, in Proceedings of SCM-6 (Berlin, Germany, March 1996), Lecture Notes in Computer Science, Springer Verlag.

7. Slein, J., Vitali, F., Whitehead, E. Jr., and Durand, D. Requirements for Distributed Authoring and Versioning on the World Wide Web, in ACM StandardView, 5(1), March 1997, p. 17-24.

8. Tichy, W. F. RCS - A System for Version Control, Software - Practice and Experience, Vol. 15 (7), July 1985.

9. Tichy, W. F. Tools for Software Configuration Management, in Proceedings of SCM-1 (Grassau, Germany, January 1988).

10. Vitali, F., and Durand, D. Using versioning to support collaboration on the WWW, in The World Wide Web Journal, 1(1), O'Reilly, 1995.