EXACT: A collaboration toolset for algorithm-aided annotation of almost everything

Christian Marzahl 1,2,*, Marc Aubreville 1, Christof A. Bertram 3, Jennifer Maier 1, Christian Bergler 1, Christine Kröger 2, Jörn Voigt 2, Robert Klopfleisch 3, and Andreas Maier 1

1 Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
2 Research and Development, EUROIMMUN Medizinische Labordiagnostika AG, Lübeck, Germany
3 Institute of Veterinary Pathology, Freie Universität Berlin, Germany
* [email protected]

ABSTRACT

In many research areas scientific progress is accelerated by multidisciplinary access to image data and their interdisciplinary annotation. However, keeping track of these annotations to ensure a high-quality multi-purpose data set is a challenging and labour-intensive task. We developed the open-source online platform EXACT (EXpert Algorithm Cooperation Tool) that enables the collaborative interdisciplinary analysis of images from different domains online and offline. EXACT supports multi-gigapixel whole slide medical images, as well as image series with thousands of images. The software utilises a flexible plugin system that can be adapted to diverse applications such as counting mitotic figures with the screening mode, finding false annotations on a novel validation view, or using the latest deep learning image analysis technologies. This is combined with a version control system which makes it possible to keep track of changes in data sets and, for example, to link the results of deep learning experiments to specific data set versions. EXACT is freely available and has been applied successfully to a broad range of annotation tasks already, including highly diverse applications like deep learning supported cytology grading, interdisciplinary multi-centre whole slide image tumour annotation, and highly specialised whale sound spectroscopy clustering.
Introduction

The joint interdisciplinary evaluation of images is critical to scientific progress in many research areas. Specialised interpretation of images benefits greatly from cross-border cooperation among experts from different disciplines. One example is the annotation of pathology microscopy slides with the aim of facilitating routine pathology tasks. The strenuous annotation work can be greatly simplified by tailored algorithmic support for medical experts provided by engineers and computer scientists. However, this interdisciplinary cooperation places specific demands on all parties involved. One important aspect to be observed is data privacy and protection. Regulations must be put in place to control who is allowed to access which image set and which data are shared. Furthermore, the tools for viewing and annotating images must be efficient and user-friendly in order to achieve a high level of acceptance among medical professionals. Computer scientists, however, require traceable high-quality and high-quantity data sets, which are essential for reproducibility when creating accurate machine learning algorithms. In order to meet these diverse requirements for annotating image data, a wide variety of open-source software solutions have been designed and published in recent years. These software solutions can be divided into three groups: firstly, offline annotation tools like SlideRunner [1], AnnotatorJ [2], Icy [3], or QuPath [4]; secondly, web-based solutions focusing on cooperation like Cytomine [5] or OpenHI [6]; and finally, platforms that combine established solutions, like Icytomine [7], which combines Icy and Cytomine. All these solutions natively support whole slide images (WSI) and provide open-source access for scientific research purposes.
Our general requirements for a state-of-the-art collaborative annotation tool can be summarised as follows: The software should be usable online and offline, while providing multi-centre support for interdisciplinary cooperation and a REST-API to facilitate integration with existing software. Furthermore, an extendable plugin system for easy adaptation and extension to specific use cases must be included in conjunction with an image-set administration aspect to manage and group images with restricted access through a user management system. Bounding boxes and polygon annotations as well as single click support are critical features for an efficient and flexible annotation workflow. Annotation templates enforce a unique naming schema essential for standardisation and allow the incorporation of background knowledge. Additionally, the elaborate polygon annotation process should be supported by advanced methods like region growing or superpixels, and by guided screening to annotate WSIs systematically. Finally, to achieve reproducible results in the machine learning algorithm development process, a version control system for annotations and the possibility to perform inference of deep learning models should be included. After extensive study, we found that none of the published open-source solutions meets all these requirements (see Table 1).

arXiv:2004.14595v1 [cs.HC] 30 Apr 2020

Table 1. Comparison of open-source pathology applications and their features for data set annotation and machine learning (ML) support. Icy and Cytomine are combined in Icytomine [7]; SlideRunner and EXACT are combined in EXACT.

Group        Feature                   QuPath [4]  AnnotatorJ [2]  OpenHI [6]  Icy [3]  Cytomine [5]  SlideRunner [1]  EXACT
Application  Online                    -           -               X           -        X             -                X
             Cross-platform            X           X               X           X        X             X                X
             Multi-Center              -           -               -           -        -             -                X
             REST-API                  -           -               -           -        X             -                X
             Language                  Java        Java            Python      Java     Python        Python           Python
             Plugins                   X           -               -           X        X             X                X
             Image-set administration  X           -               X           -        X             -                X
             User management           -           -               X           -        X             -                X
Annotation   Box / Polygon             X           X               X           X        X             X                X
             Templates                 -           -               -           -        -             -                X
             Single click              X           X               X           X        X             X                X
             Advanced annotation       X           X               X           X        X             X                -
             Guided screening          -           -               -           -        -             X                X
ML           Version control           -           -               -           -        -             -                X
             Inference                 X           X               X           X        X             X                X

In this work, we introduce EXACT, a novel online open-source software solution for massive collaboration in the age of deep learning and big data. EXACT was developed with seamless interaction of offline clients in mind, and works symbiotically together with the established SlideRunner software [1].

In the following section, we evaluate the use of EXACT within four different scenarios to create high-quantity and high-quality data sets. In chapter three, we present a discussion and outlook. Finally, this article is concluded by a description of EXACT's unique features and the design principles behind them.

Evaluation and Results

In the following, we present several of our previous usage scenarios and introduce how EXACT and its features were utilised to increase efficiency. EXACT's source code is freely available under an open-source license at https://github.com/ChristianMarzahl/Exact together with a Python REST-API implementation at https://github.com/ChristianMarzahl/EXACT-Sync for streamlined integration into existing projects. We additionally provide scripts to set up a local server with the virtualisation software Docker as well as example scripts for the most common EXACT use cases like up- and downloading of data or creating cluster or annotation maps.

Pathology Annotation Study

With the support of ten pathologists, we hosted a study on EXACT to investigate how the efficiency of the pathology image annotation process can be increased with algorithmic support. All pathologists had to perform three pathologically relevant diagnostic tasks on 20 images, once without algorithmic support and once with support. Firstly, they had to detect mitotic figures on microscopy images. Each of those images spanned ten high power fields (HPF, total area = 2.37 mm²). The second task focused on diagnosing equine asthma. For this, five types of visually clearly distinguishable cells (eosinophils, mast cells, neutrophils, macrophages, lymphocytes) had to be labelled. The final task was to determine the severity of pulmonary haemorrhage by grading five types of cells according to the predominantly used schema by Golde et al. [9]. Several EXACT features were used for this study. First of all, we used the blind annotation mode for assigning identical grading tasks to all pathology experts, which we then combined with the feature of importing pre-computed annotations for the algorithmic support. The annotation templates enabled rapid single click annotations by providing reasonable default annotation sizes, particularly for the equine asthma task where the cells are distinguishable by size. The systematic grading of the large whole slide microscopy images is supported by the persistent screening mode plugin, which enables the expert to resume the grading process at the previously selected position on the slide at any time. In order to estimate the impact of the EXACT server on the results, we conducted a voluntary survey at the end of the study and compared the results with a previous study on exercise-induced pulmonary haemorrhage (EIPH) [10] on the same EIPH images with the commercial software LabelBox. The grading scale ranged from 1 (very good) to 6 (insufficient). We evaluated the overall impression of the software, whether the insertion, modification or deletion of annotations was efficient, and the overall effectiveness. During the course of the study the pathologists annotated 26,015 cells on 1200 images. The algorithmic support with EXACT led to an increase in accuracy and a decrease of annotation time [11] for all tasks. The overall impression of the software, the persistent screening mode and the annotation effectiveness were each rated with a median of two by the participating pathology experts. For the annotation mode without algorithmic support, the interaction time was significantly reduced [F(1,29)=11.23, p<0.01] from a mean of 105.07 to 50.43 seconds per image, compared to the study conducted with LabelBox.

Figure 1. Left: Five examples of plugins (from top to bottom): The image filter plugin allows common intensity adjustments to be made to the image; the annotations plugin shows the available annotations and their frequency of use. The search field allows the database to be queried for arbitrary annotation properties. The media plugin can be used to play media files attached to an annotation. The EIPH-Score plugin is an example of a domain-specific plugin, which calculates the Doucet score [8]. Right: A screenshot of the annotation view depicting a WSI with polygon annotations, the list of images in the image set and the screening mode plugin, which enables the user to screen the image persistently.

Multi-species pulmonary hemosiderophages cytology data set

EXACT played an essential part in creating the largest currently known fully annotated and publicly available multi-species pulmonary haemorrhage data set, building on our previous work [12] in which 17 WSIs with 78,047 pulmonary hemosiderophages were fully annotated by a veterinary pathologist using SlideRunner [1]. Using EXACT, 40 additional expert-algorithm annotated equine WSIs were added to this data set. Another 14 Felis catus WSIs were annotated using the same algorithmic approach. Finally, the expert-algorithm collaboration results were verified by incorporating and modifying the EXACT feature of annotation maps as follows: To take into account that the hemosiderin absorption, which is assessed for grading EIPH, is a continuous process which is mapped to a discrete grading system, we utilised the provided cell-based regression approach [12]. This assigns a continuous grade between zero and four to each cell to generate a new image for efficient manual validation (Fig. 2 bottom). On this new image, cells were displayed in ascending order from grade zero to four on the x-axis, while hemosiderophages with the same score were stacked on the y-axis. This enabled the trained pathologist to efficiently verify the computer-generated annotations by focusing on the cells which were located on the borders between two grades. Furthermore, EXACT's flexible plugin system was utilised to develop a specialised EIPH plugin that calculates the EIPH score for the field of view in real time, according to Doucet et al. [8].
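As an illustration of what such a score plugin computes, the following minimal sketch derives a score from per-grade cell counts. It assumes that the score is the mean grade multiplied by 100; the text does not spell out the formula, but this assumption reproduces the value displayed in the EIPH-Score plugin screenshot of Figure 1 (88, 122, 105, 33 and 19 cells for grades 0 to 4, with a displayed score of 138.147139). The function name `eiph_score` is hypothetical and not part of EXACT.

```python
def eiph_score(counts):
    """Return a score from cell counts per grade.

    counts[g] is the number of cells with grade g (0 to 4).
    Assumption: score = 100 * mean grade, chosen because it matches
    the value shown in the Figure 1 screenshot.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one graded cell is required")
    weighted = sum(grade * n for grade, n in enumerate(counts))
    return 100.0 * weighted / total


# Counts for grades 0..4 as visible in the Figure 1 screenshot.
score = eiph_score([88, 122, 105, 33, 19])
print(round(score, 3))  # about 138.147
```

Because the computation is a single pass over five counts, it can be re-evaluated whenever the field of view changes, which is consistent with the real-time behaviour described in the text.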


Figure 2. Top left: Polygon annotations of a canine skin tumour tissue whole slide image. Top right: Clustered whale sound spectroscopy images with the option to listen to the attached waveform online. Bottom: Pulmonary hemosiderophages, labelled according to their predicted class and arranged according to their predicted regression score for efficient validation by human experts.

Skin tumour tissue quantification

This project aims to segment and classify nine of the most common dog skin tumours with deep learning algorithms. For this purpose, 280 slides were scanned and partly annotated using SlideRunner's advanced tissue annotation tools. Furthermore, this project needs to synchronise the generated slides and annotations to EXACT for coordination and distribution between the participating pathology experts and computer scientists for analysis at multiple institutes and locations. SlideRunner and EXACT communicate via EXACT's REST-API to synchronise annotations, images and annotation templates. EXACT's novel feature of annotation templates plays an essential role in increasing standardisation and the overall image set quality by ensuring standard annotation naming schemes and the use of polygon annotations independent of the user or user application (Fig. 2 top left). While the project is actively being developed, 120 slides have already been fully annotated, resulting in 4,487 polygon annotations representing tissue layers. This indicates that a combination of online and offline tools enables fast multi-expert annotations.

Clustering and visualisation of killer whale sounds

While the EXACT platform is primarily developed for cooperative interdisciplinary research on microscopy images, its flexibility extends to other research areas without adaptation. One of these projects [13] aims at deepening the understanding of killer whales (Orcinus orca) and their large variety of different sound types by clustering and visualising the spectral shape of machine-pre-segmented killer whale audio samples (Fig. 2 top right). Multiple EXACT features support this challenging undertaking: Firstly, the support of viewing and annotating gigapixel size images, which, in this use case, contain up to thousands of clustered spectrograms, where each spectrogram represents an individual killer whale sound. Secondly, grouped annotation assignments, which enable the user to select numerous visually grouped spectrograms simultaneously by drawing a rectangle around them in order to assign them to the same label. Finally, EXACT supports attaching media records like videos, images or sound files to the respective annotations and plays them in the browser (Fig. 1 left). These features enable the user to see the grouped spectrograms and additionally listen to the attached killer whale sound.
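The grouped annotation assignment can be pictured with a short sketch. It is not EXACT's implementation: the function name, the dictionary layout and the rule that an annotation counts as selected when its centre lies inside the drawn rectangle are all assumptions made for illustration.

```python
def assign_label_in_rectangle(annotations, rect, label):
    """Assign `label` to every annotation whose centre lies inside `rect`.

    annotations: dict mapping an annotation id to
                 {"box": (x_min, y_min, x_max, y_max), "label": str or None}
    rect:        (x_min, y_min, x_max, y_max) drawn by the user
    Returns the ids of the annotations that were relabelled.
    Assumption: selection is decided by the annotation centre.
    """
    rx0, ry0, rx1, ry1 = rect
    selected = []
    for ann_id, ann in annotations.items():
        x0, y0, x1, y1 = ann["box"]
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        if rx0 <= cx <= rx1 and ry0 <= cy <= ry1:
            ann["label"] = label
            selected.append(ann_id)
    return selected


# Three hypothetical spectrogram patches, two of them close together.
spectrograms = {
    1: {"box": (0, 0, 100, 100), "label": None},
    2: {"box": (110, 0, 210, 100), "label": None},
    3: {"box": (900, 900, 1000, 1000), "label": None},
}
changed = assign_label_in_rectangle(spectrograms, (0, 0, 250, 150), "call type A")
```

Here the rectangle covers the first two patches, so only those receive the label while the distant third patch is left untouched.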

Discussion

With the booming digitisation of image data and the widespread use of machine learning algorithms, the need for platforms that are able to organise and display huge amounts of large image data while also managing annotations and keeping track of experiments is more crucial than ever. EXACT has proven to satisfy these requirements in several different projects ranging from collaborative tissue segmentation in the field of digital pathology to whale sound clustering. This generalisability shows the versatility of the software as a primary advantage. It not only allows existing offline projects to be extended with cooperation and synchronisation functions, but is also able to support researchers in as yet unforeseen fields. This is highlighted by the positive feedback from medical professionals during our first studies as well as by the significant decrease in annotation time in comparison to LabelBox for EIPH grading. Furthermore, EXACT's features provide computer scientists with version-controlled annotations, advanced visualisation techniques like annotation maps or clustering, and a way of organising artefacts from experiments. With EXACT, it is also possible to define reproducible training, validation and testing sets.

The flexible software architecture allows for easy adaptation to future developments in pathology or other research areas. In future releases, we are planning to support image stacks. In addition, we want to create specialised plugins exploring molecular pathology issues, an increasingly significant subdiscipline of classic anatomical pathology. Also, the integration of servers (like Omero [14]) which are specialised in providing microscopic images is an option we strongly consider. Furthermore, we are investigating gamification integration as a promising new method to annotate data at scale.

In summary, EXACT provides a novel feature set to boost the creation of high-quality big data sets in combination with functions to develop reproducible state-of-the-art machine learning algorithms.

System and methods

EXACT massively extends the established online open-source software ImageTagger [15], which was developed for the RoboCup competition to create training data for machine learning projects. We chose ImageTagger because it already fulfilled many of our basic requirements. ImageTagger uses Django as its web framework, a Postgres database system and HTML with JavaScript as front end user interface. We added support to view whole slide images (WSIs) by incorporating the open-source software OpenSeadragon, loading WSIs from hard drive with OpenSlide [16] to support a wide range of image formats. Additionally, we implemented methods for efficient loading and drawing of thousands of annotations. Furthermore, we added features for attaching media files like sounds or videos to annotations as well as the ability to play them in a web browser. A flexible plugin system was added enabling developers to add custom HTML and JavaScript front end code and Python server-side code for specialised analysis tasks like calculating data sets dependent on a score or adding advanced search options (Fig. 1). One of these plugins is a persistent user-based screening mode which enables the user to systematically screen a WSI or parts thereof on a self-defined zoom level (Fig. 1). Depending on the use case, the inference of deep learning models is supported in three ways. For fast response time, the execution of JavaScript-based TensorFlow models is implemented. For high throughput applications, the inference load can be distributed via the REST-API on multiple machines or run directly on the server. Furthermore, we added real-time collaboration to view changes made by other users in compliance with the data set user right management. To facilitate deployment and to be able to cope with a wide range of installation scenarios ranging from single-user, single-computer setups to massive cloud deployment with modern load balancing mechanisms, we added Docker support (using nginx). The features described in the following section are novel EXACT features which are not only tailored for pathologists and researchers working with WSIs but can also be applied to image data in general.
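The persistent screening mode mentioned above can be sketched as a fixed sequence of fields of view plus a stored position. The class below is a hypothetical illustration and not EXACT's plugin code; the tile sizes, the row-by-row order and the JSON state format are assumptions.

```python
import json
from math import ceil


class ScreeningSession:
    """Walk over an image tile by tile and remember the current position."""

    def __init__(self, image_width, image_height, tile_width, tile_height, position=0):
        columns = ceil(image_width / tile_width)
        rows = ceil(image_height / tile_height)
        # Top-left corner of every field of view, ordered row by row.
        self.tiles = [
            (c * tile_width, r * tile_height)
            for r in range(rows)
            for c in range(columns)
        ]
        self.position = position

    def current(self):
        return self.tiles[self.position]

    def advance(self):
        """Move to the next tile, staying on the last one at the end."""
        if self.position < len(self.tiles) - 1:
            self.position += 1
        return self.current()

    def save(self):
        """Serialise the position so that screening can be resumed later."""
        return json.dumps({"position": self.position})


session = ScreeningSession(10000, 10000, 4000, 4000)
session.advance()
session.advance()
stored = session.save()

# Later, possibly in a new browser session: resume where the user stopped.
resumed = ScreeningSession(
    10000, 10000, 4000, 4000, position=json.loads(stored)["position"]
)
```

Storing only the index keeps the saved state small; the tile grid itself can always be recomputed from the image and tile dimensions.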

Multi-centre support

Medical data should naturally be subject to the highest safety standards possible. Despite that, in order to enable interdisciplinary medical research and cooperation between different groups and locations, it can be necessary to share medical image data anonymously and with strict regard for data privacy. Therefore, EXACT ensures that the original image data, which may contain patient information (file name, metadata), remain within the original institute, while the actual data exchange between experts and institutes is executed on small sub-images via decentralised image storage. Technically, this is implemented in several steps. Firstly, all server communication is protected with Hypertext Transfer Protocol Secure (HTTPS) and access is restricted via a user authentication system. Secondly, when transferring the images to one EXACT server instance, an original private name derived from the file name and a pseudonymised public name are generated. The pseudonymised public name consists of the current date-time followed by a four-digit hash of the original name (yymmdd-hhmm-****). Thirdly, for cooperation between different institutes, virtual image sets are supported. Here the information (for example annotations) is imported from several EXACT instances to a central server. However, access to the images themselves is always provided by the institute owning the data in compliance with their respective data privacy policy for images. This means that only the requested raw pixel data for the field of view is transferred to the collaborator, but not the image container or any metadata.
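A minimal sketch of the naming scheme follows. The text only specifies the pattern (date-time followed by a four-digit hash of the original name); the use of SHA-256 reduced modulo 10,000 and the function name `pseudonymised_name` are assumptions for illustration, not the hash EXACT actually uses.

```python
import hashlib
from datetime import datetime


def pseudonymised_name(original_name: str, now: datetime) -> str:
    """Build a public name of the form yymmdd-hhmm-dddd.

    Assumption: the four digits come from SHA-256 of the original
    name, reduced modulo 10,000. The source does not name the hash.
    """
    digest = hashlib.sha256(original_name.encode("utf-8")).hexdigest()
    four_digits = int(digest, 16) % 10_000
    return f"{now:%y%m%d-%H%M}-{four_digits:04d}"


# Hypothetical upload time; the file name is taken from the Figure 1 image list.
public_name = pseudonymised_name("1363-19.svs", datetime(2020, 4, 30, 14, 5))
```

Note that four digits allow only 10,000 distinct suffixes, so under this sketch two files uploaded within the same minute could in principle collide; a real deployment would need to check for that.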

Annotation Map Screening Mode

A specialised validation mode where each annotation can be verified individually is standard for this type of application [1, 7, 15]. For data sets with hundreds or thousands of annotations, this is an important but error-prone, laborious and time-consuming task. This becomes even more complicated for usage scenarios where each cell can receive multiple labels by one or multiple users. To make this validation process more efficient, we combined the screening mode with annotation maps for each label. To this end, we created multiple new images, where each newly created image only consisted of annotations belonging to one class arranged in a matrix-like fashion (Fig. 3 top). The annotation maps can be efficiently screened for errors, while the users can define how many annotations they want to see simultaneously. Corrections made on these screening images are synchronised with the original data.
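The matrix-like arrangement can be sketched as follows. This is an illustrative layout computation only, under the assumption that all patches are resized to one square size; the function name and the tuple format are hypothetical.

```python
from collections import defaultdict


def annotation_map_layout(annotations, patch_size, columns):
    """Compute one grid layout per label.

    annotations: iterable of (annotation_id, label)
    Returns {label: [(annotation_id, x, y), ...]} where (x, y) is the
    top-left pixel of the patch inside that label's annotation map.
    Assumption: all patches share one square size.
    """
    by_label = defaultdict(list)
    for ann_id, label in annotations:
        by_label[label].append(ann_id)

    layout = {}
    for label, ids in by_label.items():
        layout[label] = [
            (ann_id, (i % columns) * patch_size, (i // columns) * patch_size)
            for i, ann_id in enumerate(ids)
        ]
    return layout


cells = [(1, "Dermis"), (2, "Dermis"), (3, "Epidermis"), (4, "Dermis")]
maps = annotation_map_layout(cells, patch_size=64, columns=2)
```

Because each entry keeps the original annotation id, a correction made on the map can be written back to the annotation it came from.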

An advanced extension of this method is the clustering of labelled and unlabelled images. This manner of presentation allows the user to create initial labelling in an efficient way or to quickly validate prior annotations, since similar images which are likely to have similar labels are displayed closely together. The clustering pipeline consists of three steps. Firstly, characteristic features are extracted from each image, for example, by deep learning or classic image processing. Secondly, the extracted high-dimensional features are transformed into two-dimensional features, for example, using t-SNE [17], PCA [18] or UMAP [19]. Finally, the extracted image patches are drawn in a new image container according to their nearest two-dimensional feature representation, which does not overlay any other image patches. The resulting image is transferred to EXACT and visualised for labelling or validation (Fig. 3 bottom).
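The third step, drawing patches near their two-dimensional position without overlap, can be approximated by snapping each point to the nearest free cell of a grid. The greedy strategy below is an assumption chosen for brevity; the text does not state how conflicts between patches are resolved, and the function name is hypothetical.

```python
def place_on_grid(points, grid_size):
    """Assign each 2D point to the nearest free cell of a square grid.

    points:    list of (x, y) with both coordinates scaled to [0, 1]
    grid_size: number of cells per side
    Returns {point_index: (column, row)} with no cell used twice.
    Assumption: greedy placement in input order.
    """
    if len(points) > grid_size * grid_size:
        raise ValueError("grid is too small for the number of points")

    free = {(c, r) for c in range(grid_size) for r in range(grid_size)}
    placement = {}
    for index, (x, y) in enumerate(points):
        gx, gy = x * (grid_size - 1), y * (grid_size - 1)
        # Nearest free cell; ties are broken by the cell coordinates.
        cell = min(free, key=lambda c: ((c[0] - gx) ** 2 + (c[1] - gy) ** 2, c))
        free.remove(cell)
        placement[index] = cell
    return placement


# Two identical embeddings compete for the same cell; the second one
# is moved to a neighbouring cell instead of being drawn on top.
embedding = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0)]
cells_on_grid = place_on_grid(embedding, grid_size=2)
```

Multiplying each cell coordinate by the patch size then gives the pixel position of the patch in the new image container.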

Image set versioning and machine learning support

In general, two main criteria in research and medical applications are reproducibility and traceability of results and experiments. Reproducibility in particular is non-trivial in settings where researchers from different fields like medicine and computer science work together and make adjustments to data sets over time. In software development, it is an established process to use version control systems (such as git or subversion) for programming source code to coordinate the collaboration between software developers and to keep code changes traceable. Remarkably, this process is to our knowledge not provided by any open-source software for annotations on medical data sets. To implement this feature, we have extended the "tags" introduced in ImageTagger [15] with functions that support traceability of experiments. If a tag is added to a data set, on request the current annotation state, an optional description and the current list of images in the data set are saved. For training machine learning algorithms, the annotations can be filtered by tags and exported in user-defined formats or per script using the REST-API. This helps users to perform reproducible experiments on defined data sets while providing the flexibility to export input data to a wide range of machine learning frameworks. Additionally, training artefacts like performance metrics, created annotations or generated models can be attached to a tag. In combination with the virtual image set function introduced previously in this article, it is possible to create virtual training, testing and validation sets. This combination of tags and virtual image sets helps to keep track of different experiment versions and supports the comparability of results.
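The tagging idea can be illustrated with an in-memory sketch: a tag freezes the image list and annotation state, and later exports read from that frozen copy. The class and field names are hypothetical and do not mirror EXACT's database schema or REST-API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TagSnapshot:
    name: str
    description: str
    image_names: Tuple[str, ...]
    # (annotation id, image name, label) at the time the tag was created
    annotations: Tuple[Tuple[int, str, str], ...]


class ImageSet:
    """Hypothetical image set with tag-based snapshots."""

    def __init__(self):
        self.images = []
        self.annotations = {}  # annotation id -> (image name, label)
        self.tags = {}

    def tag(self, name, description=""):
        snapshot = TagSnapshot(
            name=name,
            description=description,
            image_names=tuple(self.images),
            annotations=tuple(
                (ann_id, image, label)
                for ann_id, (image, label) in sorted(self.annotations.items())
            ),
        )
        self.tags[name] = snapshot
        return snapshot

    def export(self, tag_name):
        """Export the annotations exactly as they were when tagged."""
        return [
            {"id": ann_id, "image": image, "label": label}
            for ann_id, image, label in self.tags[tag_name].annotations
        ]


image_set = ImageSet()
image_set.images = ["a.svs", "b.svs"]
image_set.annotations = {1: ("a.svs", "Dermis"), 2: ("b.svs", "Epidermis")}
image_set.tag("v1", "state used for the first training run")

# A later correction does not change what was tagged as v1.
image_set.annotations[2] = ("b.svs", "Subcutis")
image_set.tag("v2")
```

An experiment that records the tag name it was trained on can therefore always be re-run on the same annotations, even after the data set has been edited.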

Figure 3. Top row (supervised data): Supervised single-cell validation, with annotation maps generated from three labelled equine asthma WSIs where each colour represents one class of cells. Bottom row (unsupervised data): UMAP [19] dimensionality reduction approach in an unsupervised setting. The segmented equine asthma cells are first classified, and features for each cell are extracted. Afterwards, the high-dimensional features are transformed into a two-dimensional representation and visualised in a new image. Both approaches allow the user to verify and enhance the created classification results.

Crowd-Sourcing and study support

One of the biggest challenges in developing, training, testing and validating state-of-the-art machine learning algorithms is the availability of high-quality, high-quantity labelled image databases. Crowd-sourcing has numerous successful applications in the medical field [20] and crowd-algorithm collaboration has the potential to decrease the human effort [10]. EXACT supports this development by providing multiple features for managing crowd-sourcing and studies. Firstly, the user privilege system allows specific rights like annotation or validation to be granted to users or user groups. Secondly, the crowd- or expert-algorithm collaboration is assisted by importing pre-computed annotations or generating them on-premises with machine learning models. Finally, EXACT supports multiple annotation modes like:

1. Cooperative: One user can verify the image, and each user sees all other annotations.

2. Competitive or Blind: Every user must verify every image and cannot see other users' annotations.

3. Second opinion: A predefined number of the users must verify every annotation.
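The three modes can be read as different completion and visibility rules. The following sketch encodes that reading; the names `Mode`, `is_complete` and `hides_other_users` and the parameter `required_opinions` are hypothetical and not taken from EXACT. The text only states that the competitive mode is blind, so treating the second-opinion mode as non-blind is an assumption.

```python
from enum import Enum
from typing import Set


class Mode(Enum):
    COOPERATIVE = "cooperative"
    COMPETITIVE = "competitive"  # also called "blind"
    SECOND_OPINION = "second_opinion"


def is_complete(
    mode: Mode,
    verified_by: Set[str],
    assigned_users: Set[str],
    required_opinions: int = 2,
) -> bool:
    """Decide whether an item counts as verified under the given mode."""
    if mode is Mode.COOPERATIVE:
        # A single verifying user is sufficient.
        return len(verified_by) >= 1
    if mode is Mode.COMPETITIVE:
        # Every assigned user has to verify the item.
        return bool(assigned_users) and assigned_users <= verified_by
    if mode is Mode.SECOND_OPINION:
        # A predefined number of users has to verify the item.
        return len(verified_by) >= required_opinions
    raise ValueError(f"unknown mode: {mode}")


def hides_other_users(mode: Mode) -> bool:
    """Only the competitive mode is described as blind in the text.

    Assumption: second-opinion mode shows other users' annotations.
    """
    return mode is Mode.COMPETITIVE
```

With users `{"anna", "ben", "cara"}` assigned and only `anna` having verified an image, the image is complete in cooperative mode but not in competitive mode or in second-opinion mode with two required opinions.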

Annotation Templates
Standardisation is critical to encourage cooperation, interoperability and efficiency. To support this, EXACT introduces annotation templates, which define the properties of annotations allocated to a defined label. Annotation types contain general information about the target structure like name, an example image, the sort order in which the annotation should be displayed on the user interface, display colour, keyboard shortcuts to efficiently assign the label to an annotation, and default size. Default sizes enable the user to introduce background knowledge into the annotation process; this allows for efficient single-click annotations and reduces the need to further adjust annotations. One or more annotation types are grouped into products, which carry information like name or description. Products in turn can be assigned to multiple image sets and support the reproducibility of the annotation process by enforcing a standard naming and annotation schema.
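As an illustration of how a default size enables single-click annotation, the sketch below models annotation types and products as plain data classes. The field and method names (`AnnotationType`, `Product`, `box_at`, `by_shortcut`, `ordered`) are hypothetical, and the choice of centring the box on the click position is an assumption rather than EXACT's documented behaviour.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class AnnotationType:
    name: str
    sort_order: int
    colour: str          # display colour, e.g. a hex string
    shortcut: str        # keyboard key that assigns this label
    default_width: int   # default size in pixels
    default_height: int
    example_image: str = ""

    def box_at(self, x: int, y: int) -> Tuple[int, int, int, int]:
        """Create a box of the default size centred on a single click."""
        half_w = self.default_width // 2
        half_h = self.default_height // 2
        return (x - half_w, y - half_h, x + half_w, y + half_h)


@dataclass
class Product:
    name: str
    description: str = ""
    annotation_types: List[AnnotationType] = field(default_factory=list)

    def ordered(self) -> List[AnnotationType]:
        """Annotation types in the order they should appear in the UI."""
        return sorted(self.annotation_types, key=lambda t: t.sort_order)

    def by_shortcut(self, key: str) -> AnnotationType:
        """Look up the annotation type bound to a keyboard shortcut."""
        for annotation_type in self.annotation_types:
            if annotation_type.shortcut == key:
                return annotation_type
        raise KeyError(key)
```

A product holding a 50 by 50 pixel "mitotic figure" type would turn a click at (100, 100) into the box (75, 75, 125, 125), so the annotator rarely has to resize it by hand.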

Acknowledgements
CAB gratefully acknowledges financial support received from the Dres. Jutta & Georg Bruns-Stiftung für innovative Veterinärmedizin.

Author contributions statement
C. M. developed the server, created the visualisation code and wrote the main part of the manuscript. M. A. co-wrote the manuscript, provided code for the synchronisation with SlideRunner and provided expertise through intense discussions. C. A. B. co-wrote the manuscript and provided expertise through intense discussions. J. M., J. V., C. B., C. K., R. K. and A. M. provided expertise through intense discussions.

All authors contributed to the preparation of the manuscript and approved of the final manuscript for publication.

Supplementary information
Competing interests. The authors declare no competing interests.
Code availability. Server: https://github.com/ChristianMarzahl/Exact
REST-API Client: https://github.com/ChristianMarzahl/EXACT-Sync

References
1. Aubreville, M., Bertram, C., Klopfleisch, R. & Maier, A. Sliderunner - a tool for massive cell annotations in whole slide images. In Bildverarbeitung für die Medizin 2018, 309–314 (Springer, 2018).

2. Hollandi, R. & Horvath, P. Annotatorj: an imagej plugin to ease hand-annotation of cellular compartments. bioRxiv (2020).

3. De Chaumont, F. et al. Icy: an open bioimage informatics platform for extended reproducible research. Nat. Methods 9, 690 (2012).

4. Bankhead, P. et al. Qupath: Open source software for digital pathology image analysis. Sci. Rep. 7, 1–7 (2017).

5. Marée, R. et al. Collaborative analysis of multi-gigapixel imaging data using cytomine. Bioinformatics 32, 1395–1401 (2016).

6. Puttapirat, P. et al. Openhi-an open source framework for annotating histopathological image. In IEEE Int Conf Bioinformatics Biomed, 1076–1082 (IEEE, 2018).

7. Obando, D. F. G., Mandache, D., Olivo-Marin, J.-C. & Meas-Yedid, V. Icytomine: A user-friendly tool for integrating workflows on whole slide images. In ECDP, 181–189 (Springer, 2019).

8. Doucet, M. Y. & Viel, L. Alveolar macrophage graded hemosiderin score from bronchoalveolar lavage in horses with exercise-induced pulmonary hemorrhage and controls. J Vet Intern Med 16, 281–286 (2002).

9. Golde, D. W., Drew, W. L., Klein, H. Z. et al. Occult pulmonary haemorrhage in leukaemia. Br Med J 2, 166–168 (1975).

10. Marzahl, C. et al. Is crowd-algorithm collaboration an advanced alternative to crowd-sourcing on cytology slides? In Bildverarbeitung für die Medizin 2020, 26–31 (Springer, 2020).

11. Marzahl, C. et al. Are fast labeling methods reliable? a case study of computer-aided expert annotations on microscopy slides (2020). 2004.05838.

12. Marzahl, C. et al. Deep learning-based quantification of pulmonary hemosiderophages in cytology slides (2019). 1908.04767.

13. Bergler, C. et al. ORCA-SPOT: An Automatic Killer Whale Sound Detection Toolkit Using Deep Learning. Sci. Rep. 10997/2019, DOI: 10.1038/s41598-019-47335-w (2019).

14. Allan, C. et al. Omero: flexible, model-driven data management for experimental biology. Nat. Methods 9, 245–253 (2012).

15. Fiedler, N., Bestmann, M. & Hendrich, N. Imagetagger: An open source online platform for collaborative image labeling. In Robot World Cup, 162–169 (Springer, 2018).

16. Goode, A., Gilbert, B., Harkes, J., Jukic, D. & Satyanarayanan, M. Openslide: A vendor-neutral software foundation for digital pathology. J. Pathol. Inform. 4 (2013).

17. Maaten, L. v. d. & Hinton, G. Visualizing data using t-sne. J MACH LEARN RES 9, 2579–2605 (2008).

18. Wold, S., Esbensen, K. & Geladi, P. Principal component analysis. CHEMOMETR INTELL LAB 2, 37–52 (1987).

19. McInnes, L., Healy, J. & Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction (2018). 1802.03426.

20. Ørting, S. et al. A survey of crowdsourcing in medical image analysis (2019). 1902.09159.
