Top Banner
Seeing without Looking: Analysis Pipeline for Child Sexual Abuse Datasets CAMILA LARANJEIRA, Department of Computer Science, Universidade Federal de Minas Gerais, Brazil JOÃO MACEDO, Department of Computer Science, Universidade Federal de Minas Gerais, Brazil SANDRA AVILA, Artificial Intelligence Lab. (Recod.ai), Institute of Computing, University of Campinas, Brazil JEFERSSON A. DOS SANTOS, Department of Computer Science, Universidade Federal de Minas Gerais, Brazil The online sharing and viewing of Child Sexual Abuse Material (CSAM) are growing fast, such that human experts can no longer handle the manual inspection. However, the automatic classification of CSAM is a challenging field of research, largely due to the inaccessibility of target data that is — and should forever be — private and in sole possession of law enforcement agencies. To aid researchers in drawing insights from unseen data and safely providing further understanding of CSAM images, we propose an analysis template that goes beyond the statistics of the dataset and respective labels. It focuses on the extraction of automatic signals, provided both by pre-trained machine learning models, e.g., object categories and pornography detection, as well as image metrics such as luminance and sharpness. Only aggregated statistics of sparse signals are provided to guarantee the anonymity of children and adolescents victimized. The pipeline allows filtering the data by applying thresholds to each specified signal and provides the distribution of such signals within the subset, correlations between signals, as well as a bias evaluation. We demonstrated our proposal on the Region-based annotated Child Pornography Dataset (RCPD), one of the few CSAM benchmarks in the literature, composed of over 2000 samples among regular and CSAM images, produced in partnership with Brazil’s Federal Police. Although noisy and limited in several senses, we argue that automatic signals can highlight important aspects of the overall distribution of data, which is valuable for databases that can not be disclosed. Our goal is to safely publicize the characteristics of CSAM datasets, encouraging researchers to join the field and perhaps other institutions to provide similar reports on their benchmarks. Additional Key Words and Phrases: dataset, sensitive media, bias, transparency, child sexual abuse ACM Reference Format: Camila Laranjeira, João Macedo, Sandra Avila, and Jefersson A. dos Santos. 2022. Seeing without Looking: Analysis Pipeline for Child Sexual Abuse Datasets. In ACM FAccT 2022: ACM Conference on Fairness, Accountability, and Transparency, June 21–24, 2022, Seoul, South Korea. ACM, New York, NY, USA, 22 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn 1 INTRODUCTION Child Sexual Abuse (CSA) is one of the major issues we are currently tackling as a society. Its mitigation is listed as one of the 17 Global Goals for Sustainable Development outlined in 2015 at the United Nations General Assembly 1 . The abuse can take many forms, it may involve physical contact or violent acts [10], but it may also consist in the online sharing of child images for sexual purposes, to which we refer as Child Sexual Abuse Material (CSAM). The latter is growing exponentially, according to reports from the National Center for Missing and Exploited Children (NCMEC) [6]. The same report shows that the generation of novel content is also rapidly increasing, and its nature varies widely, from violent acts such as organized groups abducting children to abuse, document, and later share with 1 https://www.globalgoals.org Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. © 2022 Association for Computing Machinery. Manuscript submitted to ACM 1 arXiv:2204.14110v1 [cs.CV] 29 Apr 2022
22

Analysis Pipeline for Child Sexual Abuse Datasets - arXiv

Mar 08, 2023

Download

Documents

Khang Minh
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Analysis Pipeline for Child Sexual Abuse Datasets - arXiv

Seeing without Looking: Analysis Pipeline for Child Sexual Abuse Datasets

CAMILA LARANJEIRA, Department of Computer Science, Universidade Federal de Minas Gerais, Brazil

JOÃO MACEDO, Department of Computer Science, Universidade Federal de Minas Gerais, Brazil

SANDRA AVILA, Artificial Intelligence Lab. (Recod.ai), Institute of Computing, University of Campinas, Brazil

JEFERSSON A. DOS SANTOS, Department of Computer Science, Universidade Federal de Minas Gerais, Brazil

The online sharing and viewing of Child Sexual Abuse Material (CSAM) are growing fast, such that human experts can no longerhandle the manual inspection. However, the automatic classification of CSAM is a challenging field of research, largely due to theinaccessibility of target data that is — and should forever be — private and in sole possession of law enforcement agencies. To aidresearchers in drawing insights from unseen data and safely providing further understanding of CSAM images, we propose ananalysis template that goes beyond the statistics of the dataset and respective labels. It focuses on the extraction of automatic signals,provided both by pre-trained machine learning models, e.g., object categories and pornography detection, as well as image metricssuch as luminance and sharpness. Only aggregated statistics of sparse signals are provided to guarantee the anonymity of childrenand adolescents victimized. The pipeline allows filtering the data by applying thresholds to each specified signal and provides thedistribution of such signals within the subset, correlations between signals, as well as a bias evaluation. We demonstrated our proposalon the Region-based annotated Child Pornography Dataset (RCPD), one of the few CSAM benchmarks in the literature, composed ofover 2000 samples among regular and CSAM images, produced in partnership with Brazil’s Federal Police. Although noisy and limitedin several senses, we argue that automatic signals can highlight important aspects of the overall distribution of data, which is valuablefor databases that can not be disclosed. Our goal is to safely publicize the characteristics of CSAM datasets, encouraging researchers tojoin the field and perhaps other institutions to provide similar reports on their benchmarks.

Additional Key Words and Phrases: dataset, sensitive media, bias, transparency, child sexual abuse

ACM Reference Format:Camila Laranjeira, João Macedo, Sandra Avila, and Jefersson A. dos Santos. 2022. Seeing without Looking: Analysis Pipeline for ChildSexual Abuse Datasets. In ACM FAccT 2022: ACM Conference on Fairness, Accountability, and Transparency, June 21–24, 2022, Seoul,

South Korea. ACM, New York, NY, USA, 22 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION

Child Sexual Abuse (CSA) is one of the major issues we are currently tackling as a society. Its mitigation is listed asone of the 17 Global Goals for Sustainable Development outlined in 2015 at the United Nations General Assembly1.The abuse can take many forms, it may involve physical contact or violent acts [10], but it may also consist in theonline sharing of child images for sexual purposes, to which we refer as Child Sexual Abuse Material (CSAM). Thelatter is growing exponentially, according to reports from the National Center for Missing and Exploited Children(NCMEC) [6]. The same report shows that the generation of novel content is also rapidly increasing, and its naturevaries widely, from violent acts such as organized groups abducting children to abuse, document, and later share with

1https://www.globalgoals.org

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are notmade or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for componentsof this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or toredistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].© 2022 Association for Computing Machinery.Manuscript submitted to ACM

1

arX

iv:2

204.

1411

0v1

[cs

.CV

] 2

9 A

pr 2

022

Page 2: Analysis Pipeline for Child Sexual Abuse Datasets - arXiv

ACM FAccT 2022, June 21–24, 2022, Seoul, South Korea Camila Laranjeira, João Macedo, Sandra Avila, and Jefersson A. dos Santos

online communities [45] to the massive number of families innocently sharing pictures of their children on social media,which sex offenders later download to compose their gallery [32].

CSAM can be defined as a type of media portraying children that may or may not be involved in sexually incitingsituations, but used by adults for sexual purposes. The nomenclature adopted throughout the literature may varyfrom Child Pornography (CP) [51] to Child Exploitation Material (CEM) [12] or even Indecent Images of Children(IIOC) [26]. The Luxembourg Guidelines2 currently used by many law enforcement agencies, including Interpol3,provides clear indications of appropriate terminologies, using the terms Sexual Abuse and Sexual Exploitation to addressthe severity of the content. Those terms also aid in defining a clear distinction between criminal acts against childrenand pornography/indecency, with the latter being associated with adult content that may be consensual and overalllegal. Since sexual exploitation has specific connotations of profit and exchanges [16], we adopt the broader term ChildSexual Abuse Material. In our work, we focus on the domain of images.

Methodologies for CSAM detection commonly resort to two types of data: real evidence from law enforcementinvestigations or legal images acquired through photo libraries or search engines. Some approaches only use legalimages, attempting to solve subtasks from the target problem, for instance, performing age estimation and pornographydetection [18, 30]. Nonetheless, CSA datasets are still required to validate the solution in real scenarios. A similarversion of the following disclaimer can be found in any paper handling CSAM, and it applies to our work as well:

Due to the sensitive nature of our research, we are working in partnership with the Brazilian federal lawenforcement agency, whose experts are the only ones with direct access to any sensitive data mentionedthroughout this paper.

That disclaimer leads to the main motivation for our research. Since CSA images are illegal for civilians to possessand share, every single dataset in the literature is — and should forever be — private and in sole possession of lawenforcement agencies. Whenever such data is used for training machine learning solutions, models are usually privateas well, which is essential to protect children and adolescents involved in the research and avoid future exploitation.

The need for privacy, although essential, leads to disadvantages for researchers, law enforcement, and ultimatelysociety as a whole. CSAM detection is a scarcely researched field partly due to data inaccessibility; hence, studying it ischallenging throughout the entire research process: from proposing a methodology without ever seeing the data toevaluating and drawing conclusions from results. Therefore, scientific contributions in this field are somewhat limitedsince algorithms and models can not be easily scrutinized or subject to further inspection. Additionally, it is difficultto compare results with other works from literature since researchers usually refer to local partnerships with lawenforcement agencies, and there is no established benchmark worldwide.

Furthermore, classification labels for CSAM are inherently ambiguous, both when defining what is sexually explicitand whether the depicted person is, in fact, a child. According to Kloess et al. [26], the question is challenging even forlaw enforcement experts. Authors found high disagreement among experts when labeling pictures regarding indecencyand age groups, the latter especially difficult for older adolescents. The work in [21] cites the six-factor “Dost test”, a setof rules to qualify an image or video as child abuse. Not only it was considered a vague and broad definition, but it mayalso cause further harm to the victims as it encourages looking for subjective sexual cues from the children. In practice,law enforcement experts often refer to the context in which the picture was found, such as data from the investigationor other pictures from the same device. Although the domain does not allow unambiguous classification, researchers

2https://ecpat.org/luxembourg-guidelines3https://www.interpol.int/Crimes/Crimes-against-children/Appropriate-terminology

2

Page 3: Analysis Pipeline for Child Sexual Abuse Datasets - arXiv

Seeing without Looking: Analysis Pipeline for Child Sexual Abuse Datasets ACM FAccT 2022, June 21–24, 2022, Seoul, South Korea

can contribute by investigating a wide range of visual cues that may be relevant in the production of triage tools andpriority queues to ease the burden of law enforcement agents.

Another aspect worth highlighting is the lack of information regarding types of bias that might exist in CSAMdatabases. There are known tendencies in reported sexual abuse cases, especially in demographic dimensions such asgender, age, and race, with the most common victims in Brazil being black and brown girls from 8 to 14 years old [11].But, as far as our knowledge goes, there are no reports on bias and tendencies of apprehended CSAM on a content level,with most reports limited to traffic data and volume of apprehensions [6].

Works in the literature attempt to circumvent both the issue of label ambiguity and lack of content-level information.The works in [12, 30] provide more thorough labels for images, such as bounding boxes for nude body parts orobjective classification labels of whether people are actively engaging in sexual activities. However, they rely on thelaborious process of assigning manual labels. Our goal is to investigate how to safely publicize the characteristicsof CSA datasets without adding to the burden of law enforcement experts. In this work, we assess the validity ofextracting automatic signals from a real CSA database to provide a comprehensive documentation. We do not advocatefor publicizing information on individual samples. Instead, we evaluate how aggregated statistics on the entire datasetor even disaggregated by specific subsets can be the source of valuable insights.

Our focus is on defining a set of attributes relevant to the target domain. For that purpose, we inspect the literatureon CSAM detection and reports from law enforcement institutions to select the features of interest for which automaticlabeling is viable. Then, inspired by Dataset Nutrition Labels [22] and tools like Google Know Your Data [37], we definea set of visualizations and metrics from which we can draw insights on each attribute as well as relations between them.Since the target domain does not allow for widely sharing individualized attributes extracted from the database, wefocus on defining a set of visualizations that allows a comprehensive inspection of the source data. To assess the validityof our proposal, we apply it to the Region-based annotated Child Pornography Dataset (RCPD) [30], a benchmarkproduced by the Federal Police of Brazil, due to its extensive set of labels. This work is a step towards a future product,awaiting ethics reviews and law enforcement assent, of a freely available interactive tool for researchers to exploreRCPD’s characteristics beyond the aspects presented in this paper, and hopefully new CSA benchmarks in the future.

Results can be used to support empirical claims from the literature, such as the tendency of CSAM happening inindoor environments [51]. We also found that CSAM apprehended in Brazil has different tendencies than reports ofsexual abuse with physical contact, since RCPD is overwhelmingly composed of light-skinned individuals. Moreover,the research itself surfaced gaps in the literature that might contribute to the field of CSAM detection, for instance,the benefits of an object detection approach focusing on child-related objects and reinforcing the need for better ageestimation models. We hope that results will instigate other researchers to join the field and encourage dataset ownersto provide similar documentation to their benchmarks.

2 RELATEDWORK

This section is divided into two main topics of interest. First, we explore the literature on CSAM detection to drawinsights from data and visual features commonly used as resources for training and validation. Then, we focus on theimportance of inspecting datasets presenting theoretical research and practical templates previously proposed.

2.1 CSAM Detection

In the early years of research and applications on child sexual abuse image detection, researchers, law enforcement, andother institutions relied mostly on hash-based approaches. With comprehensive databases such as the one provided by

3

Page 4: Analysis Pipeline for Child Sexual Abuse Datasets - arXiv

ACM FAccT 2022, June 21–24, 2022, Seoul, South Korea Camila Laranjeira, João Macedo, Sandra Avila, and Jefersson A. dos Santos

the National Center for Missing & Exploited Children (NCMEC)4 and hash-based methodologies such as PhotoDNA [35],one can perform an effective search for previously reported instances of CSAM. However, with the exponential growth ofnovel content generation [6], content-based approaches to classify previously unseen images are increasingly essential.Although there are important contributions in the literature leveraging filenames [38], network attributes [44], andeven folder structure [20], our work focuses on visual cues that indicate the presence of CSA.

For the last decade, the literature on CSAM detection evolved little regarding the type of semantic informationmodeled, with most efforts dedicated to improving the applied techniques. Early approaches such as NuDetective [13]and iCOP [39] rely on image descriptors crafted to capture nudity, while Sae-Bae et al. [42]’s work also added a childclassification stream based on texture and distances between facial landmarks to distinguish adult from child nudity.

Current approaches usually leverage deep learning techniques, which are more robust and achieve better accuracyscores than earlier works, but the goal remains roughly the same: detecting children and nudity/pornography. Someapproaches tackle both tasks. The works of [30, 40], for instance, rely on Yahoo OpenNSFW model [31] to detectpornography cues, and propose their age estimation approaches. While Rondeau [40] leverages label distributions toassess apparent age, Macedo et al. [30] propose a single-model estimation of child presence, age, and gender. A morerecent work [18] explores a wide range of technical improvements over neural networks such as residual connectionsalong with inception and attention modules to propose separate models for age estimation and pornography detection.Gangwar et al. [18] also propose Juvenile-80k, gathering around 24 thousand underage images from a wide range of ageestimation datasets and supplementing it with images crawled from public search engines. It is important to note thatthis type of collection does not abide by ethical standards such as UNICEF’s Responsible Data for Children report [52],but it indicates an important gap in the literature: age estimation models specialized in children and adolescents.

There is a growing body of research focusing solely on age estimation in the context of CSAM, and the collection ofchild images seems to be a common choice for many. Anda et al. [3] propose a model specialized in underage individuals,building a novel dataset for age and gender estimation, VisAGe. They gathered over 19 thousand faces of individualsunder 18 years old, from creative-commons licensed Flickr images. Similar to Gangwar et al. [18], Castrillón-Santanaet al. [7] and Chaves et al. [9] gather images from several age estimation databases, amounting to a large number ofunderage samples, with no extra supplementation of data. Castrillón-Santana et al. [7] propose AgeMega, a dataset withover 15 thousand underage faces and around 30 thousand adult ones. They explore a multitude of local descriptors totrain support vector machines along with convolutional neural networks (CNN) predictions, to compose a score-levelfusion approach as the final classification method. Chaves et al. [9] apply synthetic eye occlusions to a portion ofsamples, based on the assumption that criminals can do the same to omit the identities of victims and fine-tune a neuralnetwork for age estimation in that scenario.

Much of the work in CSAM detection is focused on engineering features or proposing models based on similarprior definitions of relevant attributes. One work that differs in that sense is [49], in which authors directly model thetarget task, proposing an end-to-end binary classification approach fine-tuned on real CSA data. However, a broaderinvestigation focusing on understanding different aspects of the data is essential. The works of [26, 51] are amongthe few providing valuable insights on a wider range of attributes. Yiallourou et al. [51] propose a synthetic datasetassociating levels of appropriateness to images in terms of a variety of features, such as gestures, scene type, illumination,and facial expressions. The work of [26], on the other hand, does not attempt automatic classification, but draws insightsfrom human experts, highlighting visual cues that may cause or solve ambiguities in classification.

4https://www.missingkids.org

4

Page 5: Analysis Pipeline for Child Sexual Abuse Datasets - arXiv

Seeing without Looking: Analysis Pipeline for Child Sexual Abuse Datasets ACM FAccT 2022, June 21–24, 2022, Seoul, South Korea

In the literature, there is little concern with providing statistical measures on CSA data and extracted attributes,regarding their occurrence and correlation. In this work, we not only bring a wide perspective on relevant features forCSAM detection as [26, 51], but we also provide an analysis framework to compose a comprehensive documentation ofdatasets and demonstrate it in a real benchmark available for testing.

2.2 Dataset Inspection and Documentation

An extensive inspection of source data should be the first step of a machine learning solution. Sambasivan et al. [43]highlight the negligence of both researchers and companies in analyzing and documenting datasets. Their main focusare databases used for training, concluding that poor quality data has significant downstream effects on trained modelsand ultimately may compromise the validity of results. However, inspection and documentation is just as crucial fortest benchmarks. Narayanan [36] explores how test benchmarks are a central guide to model selection for practitioners.In this sense, poor data may lead to poor solutions being widely adopted as state-of-the-art.

Machine learning researchers are only recently adhering to the practice of documentation as a response to thegrowing number of works proposing specific guidelines. Some propositions suggest verbal descriptions regarding aspectslike the data’s origin, structure, collection process, recommended applications, among many others. Datasheets forDatasets [19] is among the most complete propositions in that sense, defining an extensive set of questions researchersand practitioners should reply to when proposing or releasing a dataset. In specific domains such as Natural LanguageProcessing, we also find templates such as the data cards currently used for datasets on Hugging Face [33].

Other works tend to approach documentation as a mix of verbal descriptions and attributes statistics, the latterthrough visualizations or summaries. For instance, Birhane and Prabhu [4] inspect large-scale image datasets, presentinga dataset audit card for ImageNet to display how sensitive attributes automatically extracted or manually labeled aredistributed. Nutrition Labels [22], on the other hand, is a more robust proposition inspired by food labels to describe“ingredients” of a dataset. Adding summary statistics and pair plots for all variables in the dataset allows a comprehensiveinspection of the data, not only as a means to understand it but also to draw insights from it.

There is a myriad of visualization tools allowing dataset inspection. We highlight a specific feature from GoogleKnow Your Data [37]. Besides general statistics presented as histograms, the “relations” tab implements a fairness metricproposed in [1] to assess fairness without labels as a measure of normalized correlation. In our work, the presentationof attributes follows propositions from both Know Your Data and Nutrition Labels. We suggest investing more heavilyin relations between attributes. Since we are working with automatically extracted signals representing a wide varietyof semantic information, the goal is to assess how they can be relevant for tasks related to CSAM.

3 METHODOLOGY

Our goal is to safely publicize characteristics of child sexual abuse image datasets aimed at researchers willing tocontribute to the field. There are a couple of priorities to keep in mind. We should avoid revictimizing children andadolescents depicted in the images. Preserving their anonymity is the main priority, as well as preventing any attemptsto content reproduction. Thus, we do not expose dense features extracted from the samples and advise researchersin the field to do the same. Since deep learning approaches are becoming so efficient at generating synthetic content,dense features from CSA images might be misused for such purposes. Our pipeline is solely based on sparse annotationspreviously provided by dataset owners and sparse automatic signals extracted from images, mainly composed ofclassification and detection labels along with metrics to estimate characteristics like image quality or skin tone.

5

Page 6: Analysis Pipeline for Child Sexual Abuse Datasets - arXiv

ACM FAccT 2022, June 21–24, 2022, Seoul, South Korea Camila Laranjeira, João Macedo, Sandra Avila, and Jefersson A. dos Santos

The first challenge of our work is choosing which attributes might be relevant for a CSA database. We resort to theliterature from both computer science researchers attempting to tackle the problem of CSAM detection and reportsfrom law enforcement agents and entities regarding relevant aspects when searching for CSAM in an apprehension.

Since we do not intend to release data on individual samples, an additional challenge arises: how to present theaggregated attributes in a useful manner, allowing researchers to explore the characteristics of the database. Section 3.2describes the proposed approach. This work is a step towards a future product, awaiting ethics reviews and lawenforcement assent, of an interactive tool that will allow researchers to explore RCPD’s characteristics way beyond theaspects presented in this paper. Thus, some visualization resources we mention may rely on user interactivity.

3.1 Attributes

The vast majority of recent approaches to CSAM detection divide the problem into two subtasks: age estimation andpornography classification, since those are the closest domains to the target task with large availability of data. However,as concluded by Kloess et al. [26], forensic experts usually rely on much more than that to classify images, especiallyfor ambiguous samples. For instance, subjects interviewed by the authors reported that the environment providesvaluable cues, both in terms of scene-related features (e.g., outdoor vs. indoor) and object information (e.g., indicationsof child-like environments due to the presence of toys). Yiallourou et al. [51] go a step further, modeling not only theaforementioned features but also aspects like illumination, classifying darker scenes as more suspicious.

An additional aspect considered for choosing which attributes were relevant to our proposal is to unveil potentialsources of bias in the data. According to a 2019 report from the Equipe da Ouvidoria Nacional dos Direitos Humanos, aBrazilian public agency for human rights, sexual abuse and exploitation of children are strongly biased towards gender,age, and race, with the most reported victims being black and brown girls in their late childhood/early adolescence (8to 14 years old) [11]. The importance of those demographic attributes is also highlighted in the work of [30] by theauthors choice of including those labels in their proposed benchmark, RCPD. Later, in Section 3.1.1, we discuss theadvantages and disadvantages of automatically extracting demographic attributes.

Considering both aspects, discriminative features and potential sources of bias, all features leveraged by our proposalare summarized in Table 1, along with the respective attributes derived from them. Table 1 also divides attributes intoper individual versus per sample, meaning a single sample can have multiple instances of the same attribute associatedwith it. Although each attribute will be specified in the remaining of this section, with a discussion on its relevance, wecan anticipate a few types of recurring attributes. For instance, we collect the output probabilities of any given modeland derive the class inferred according to a specified threshold. That allows users to input their desired threshold afterlooking at probability distributions and update the class counts’ aggregated information. Visualizing probabilities withthe ranking of classes also provides insights into potential noises and uncertainties.

We can also highlight the number of instances associated with features that may occur more than once in an imageand represent either an overall count of occurrences or a per class count. Finally, the standard deviation is calculated fordemographic attributes, providing insights on aspects such as age difference or skin tone diversity per sample.

Although this work largely focuses on the advantages of automatic signals extracted from samples, our proposal alsoaims at providing a deeper understanding of labels produced by dataset owners. Thus, the experiments section willprovide a complete description of attributes related to labels. Table 1 contains a placeholder variable entitled “Labels” toindicate that the derived attributes and proposed visualizations will also apply to original information from the dataset.

6

Page 7: Analysis Pipeline for Child Sexual Abuse Datasets - arXiv

Seeing without Looking: Analysis Pipeline for Child Sexual Abuse Datasets ACM FAccT 2022, June 21–24, 2022, Seoul, South Korea

Per individual Per sampleLabels ∗ ∗ ∗

Demographics

Face [54] probability, absolute area, relativearea

class, #instances

ITA [25] average value standard deviationChild [30] probability, class #instances (𝑛𝑐 ), standard deviationAge [30] probabilities (𝑛𝑐 ), class #instances (𝑛𝑐 ), standard deviationGender presentation [30] probability, class #instances (𝑛𝑐 ), standard deviation

Pornography NSFW [31] probability, classPornography [28] probabilities (𝑛𝑐 ), class

Context Objects [5] class (2), absolute area, relative area #instances (𝑛𝑐 + 1)Scenes [55] class (3)

Quality Luminance average valueBRISQUE [24] value

Metadata

File extension valueColormode valueAspect ratio valueImage resolution value

Table 1. Attributes derived from each extracted feature. Numbers and variables in parenthesis are added to instances that can bederived into multiple attributes, with 𝑛𝑐 representing the number of classes of the respective feature. Since labels depend on theevaluated dataset, we added a placeholder variable with asterisks where attributes would be listed.

3.1.1 Demographics. Extracting automatic signals on demographics is a highly sensitive decision, since automaticmodels are extremely limited in their ability to model such complex features as gender, race, and even age. Availablemodels refer to gender as a binary classification task, while we can hardly call race a classifiable concept, especially forcountries like Brazil with a wide range of phenotypes in its population. Regarding age, although it can be classifiedfrom visual features to a certain extent, in the context of CSAM, there is a crucial confusion boundary in the range ofadolescence to early adulthood. Findings in [41] show that even medical experts in pediatric and adolescent developmentshow a high error rate despite the availability of maturity cues such as face, breasts, body contour, and pubic area. Thesubjects classified two out of three images of young-looking adult women as adolescents.

On the other hand, reports of child sexual abuse involving physical contact are highly biased towards specificdemographics [11], but little is known of how such attributes are distributed in materials shared online. DisaggregatingCSA data, as well as CSAM classification, by demographic dimensions is thus essential. Unfortunately, that is not adomain in which we can acquire self-reported demographics, since in most cases, the identities of victimized childrenand perpetrators depicted in the images are unknown to law enforcement. Thus, even if demographic information islabeled, it can only be a subjective view from labelers. Even so, such annotations are costly to produce, and most CSAdatabases do not provide them. For large-scale databases that may arise in the future, which unfortunately is viable dueto the number of images and videos apprehended by law enforcement, annotating demographics may be impracticable.

We argue it is important to leverage automatic features on demographics to view the general tendency of a CSAimage database. In the context of our work, the bottom line is not to classify individual samples but rather to get ageneral view of a group of samples. Fig. 1 summarizes the collection process of demographic features. Apart from skintone, all features we leverage are commonly estimated from facial images, evoking the need for a face detection module,which by itself generates relevant attributes (refer to Table 1). Many references in the literature rely on MTCNN [54] as

7

Page 8: Analysis Pipeline for Child Sexual Abuse Datasets - arXiv

ACM FAccT 2022, June 21–24, 2022, Seoul, South Korea Camila Laranjeira, João Macedo, Sandra Avila, and Jefersson A. dos Santos

MTCNN

Child

Age

Gender

Child

Age

Gender

Child

Age

Gender

Child

Age

Gender

CNN

ChildChildChildITASegment.

ChildChildChildConf.

Fig. 1. Pipeline to extract demographic features. Trapezoids represent neural networks, rounded rectangles are functions, and darkerrectangles are outputs. MTCCN refers to multi-task cascaded CNN [54] and ITA refers to individual typology angle. Input photoretrieved from Open Images database [27].

the chosen tool for face detection [9, 18, 30] since it is one of the most accurate and robust in the literature. Therefore,it was the model of our choice as well. The extracted face then serves as input for subsequent procedures.

Macedo et al. [30] propose a single model to perform three tasks. First, a binary classification of whether the inputface belongs to a child. Secondly, it estimates the age-group of a subject among the categories of Adience [15]: 0-2, 4-6,8-13, 15-20, 25-32, 38-43, 48-53, 60+. Note that there is little concern with discerning underage individuals (assumingBrazil’s legal age of 18), but as we have argued before, CSAM detection should be limited to its role as a triage tool, andambiguities can be solved by human experts. Finally, the model provides an estimate of binary gender presentation.Although there are methods with greater accuracy for age estimation, the joint prediction of age and child performedin [30], achieving an accuracy of 94% on the second task, produces an age classifier less skewed towards adulthood.

Lastly, there is skin tone estimation. Kinyanjui et al. [25], a work in the field of skin lesion analysis, relied on themetric entitled Individual Typology Angle (ITA), which was found to be strongly correlated with the Melanin Index [50]and can be quantitatively measured from each pixel in an image, according to the following equation:

𝐼𝑇𝐴 =𝑎𝑟𝑐𝑡𝑎𝑛(𝐿 − 50)

𝑏× 180

𝜋,

where 𝐿 and 𝑏 are channels of an image in CIE-Lab space, respectively indicating luminance and amount of yellow. As in[25], the final score is an average of ITA measures within one standard deviation of the distribution. However, differentfrom skin lesion datasets, our data is not comprised solely of naked skin. The work of [34] proposes a selection of skinregions based on facial landmarks. However, we found that approach works best with front-facing samples. So, wedecided to apply a simple skin segmentation algorithm, still limiting it to facial images to minimize background clutteror even the presence of clothes, which would require a more robust approach. The segmentation algorithm was adaptedfrom a public project on Github [14], based on the watershed region-growing approach [48]. Input markers are definedby adding the output from explicit boundaries of skin regions in two color spaces, HSV (H < 25, S > 40) [47] and YCbCr(77 < Cb < 127, 133 < Cr < 173) [8], followed by morphological operations of erosion and dilation to discard noise.

3.1.2 Pornography. A comparative evaluation on pornography detection methods [17] found that Yahoo’s OpenNSFWmodel [31] achieved the best accuracy, over 87%, in a CSA database provided by the Spanish Police, despite it being amodel trained solely on adult pornography. Gangwar et al. [18] found that combining OpenNSFWwith an age estimationmodel achieves over 83% accuracy for CSAM detection in more challenging settings, including adult pornography inthe test database. Although their proposed model outperforms OpenNSFW, neither the model nor the dataset used for

8

Page 9: Analysis Pipeline for Child Sexual Abuse Datasets - arXiv

Seeing without Looking: Analysis Pipeline for Child Sexual Abuse Datasets ACM FAccT 2022, June 21–24, 2022, Seoul, South Korea

training are yet released to the research community. Macedo et al. [30] also relied on OpenNSFW to incorporate theirdetection pipeline, achieving superior performance over existing CSAM detection approaches.

In this work, we use OpenNSFW as one of the two approaches for pornography detection. But since it only providesbinary labels, we also experiment with an open-source project that tackles a 5-category classification task, estimatingprobabilities for drawing, hentai, neutral, porn, sexy [28]. Although there is no cartoon-related data on the datasetsused in this work, we were interested in a more fine-grained distinction provided by the categories porn and sexy.

3.1.3 Context. We could not find any approach for CSAM detection tested against real-world data, which uses thecontext information in the form of scene and object features to perform the target task. The closest reference is [51], inwhich authors hand-labeled a synthetic dataset with binary labels for indoor/outdoor environment and presence/absenceof what they considered suspicious objects. As previously mentioned, insights from forensic experts revealed thatcontext can be valuable for disambiguation of samples [26]. Thus, we experiment with well-established approaches forboth scene classification and object detection tasks.

Regarding objects, YOLO, currently in its 4th version [5], is by far one of the best in the literature both in terms ofaccuracy and time performance. All attributes regarding objects are derived from YOLOv4 pre-trained on COCO [29],meaning it is able to estimate probabilities for 80 object classes, which are hierarchically organized into 12 macro classes:person, furniture, indoor, kitchen, electronic, animal, vehicle, food, appliance, sports, accessory, and outdoor. Thus, foreach sample, we derive two class attributes from YOLO: base-level class and macro-level class. Regarding the number of

instances, we derive a count for the overall presence of objects and the count of instances for each base-level objectcategory. We are especially interested in the person category, since detecting people beyond just faces is valuable in thecontext of CSAM, as faces can be absent or occluded.

A recent survey on scene classification [53] found that VGG-Places [55], a VGG architecture trained on Places365, isstill competitive with single model approaches relying on global features from the input. Patch-based approaches oreven ensemble methods, which tend to be more computationally demanding, can achieve higher accuracy on knownbenchmarks. We opted to favor VGG-Places, a lighter model, as a proof of concept. In the Discussion and FutureDirections section, we bring the topic of scene classification models tailored for the domain of CSAM.

3.1.4 Quality. Following insights provided in [51], we were interested in the relevance of quality assessment metricsto define what they called appropriateness of an image. Thus, we extracted both the average luminance of images inCIE-Lab space and BRISQUE, a no-reference approach to estimate image quality, provided by a Pytorch framework [24].

3.1.5 Metadata. The choice of including basic file information has two main reasons. First, those are fundamentalfor low-level implementation choices, such as reshaping input images. Secondly, they are cheap to generate, leadingto a favorable cost-benefit relationship. We followed the procedure presented in Google’s Know Your Data tool [37],extracting the following information: file extension, image color mode, aspect ratio, and resolution.

3.2 Presentation

Planning how the attributes are presented is essential to maximize the level of inspection allowed, especially sincethe source data can not be seen. As Dataset Nutrition Labels [22] and tools such as Know Your Data [37], we providesummary statistics and relations among attributes, divided into the following types:

• General Distributions: Histograms represent rankings, discrete attributes, and multimodal distributions ofcontinuous attributes (e.g., probabilities). For continuous unimodal distributions, boxplots are used instead.

9

Page 10: Analysis Pipeline for Child Sexual Abuse Datasets - arXiv

ACM FAccT 2022, June 21–24, 2022, Seoul, South Korea Camila Laranjeira, João Macedo, Sandra Avila, and Jefersson A. dos Santos

• Disaggregated Distributions: The same visualizations provided by general distributions can be disaggregatedby up to 3 attributes, leveraging facet plots and color-coding distributions.

• Co-occurrence: Heatmaps with simultaneous occurrences of pairs of attributes as raw or normalized counts.• Correlation: Heatmap visualization of correlation. As described in [1], Point-wise Mutual Information normal-ized by 𝑃 (𝑦) estimates the correlation between pairs of variables w.r.t chance, defined as

𝑛𝑃𝑀𝐼𝑦 =

(𝑙𝑛

𝑃 (𝑥,𝑦)𝑃 (𝑥) · 𝑃 (𝑦)

)/− 𝑙𝑛 𝑃 (𝑦),

with 𝑋 and 𝑌 being the two variables (attributes) compared. An additional information is included in thisvisualization, as implemented by [37], providing the ratio between real 𝐶𝑥,𝑦 and expected 𝐶𝑥,𝑦 co-occurrencematrices. Expectation is defined as the mutual information from independent marginal probabilities. Consider 𝑐the sum over all values of 𝐶𝑥,𝑦 , expected values are as follows:

𝐶𝑥,𝑦 = 𝑐 ∗ (𝑃 (𝑥) · 𝑃 (𝑦)).

The color-coding of heatmap cells represents the ratio between real and expected co-occurrence, and it is onlyvisually depicted if the discrepancy exceeds a 95% confidence interval.

To produce co-occurrence and correlation matrices and to disaggregate distributions for all attributes, numericvariables need to be quantized. For values referring to probabilities, quantization is performed by uniformly dividingdata into intervals of 0.1, producing 10 bins in total. The remaining variables are also divided into 10 uniform bins, lowerbound to𝑚𝑎𝑥 (𝑚𝑖𝑛𝑥 , 𝑥 − 1.96𝜎) and upper bound to𝑚𝑖𝑛(𝑚𝑎𝑥𝑥 , 𝑥 + 1.96𝜎), with 𝑥 and 𝜎 representing the distribution’saverage and standard deviation. The bins at both extremes comprise all values below/above them.

3.3 Research Ethics

The study in its entirety had the participation of an expert from the Federal Police of Brazil, the sole responsiblefor handling any sensitive data mentioned throughout this paper. To ensure data integrity, random spot checkswere conducted by this officer during the extraction of automatic signals, to confirm the validity of attributes andcorrespondence to the referred sample. We assured original media and individual data points would never leave thepolice’s servers, thus sharing with the authors of this paper only visualizations of aggregated statistics.

Data collected for each instance is anonymized, providing no sensitive information on victimized children, perpetra-tors, or law enforcement entities. Additionally, we do not intend to publicize individualized attributes, only visualizationsof aggregated data, making it even more difficult to expose individual samples. We are still waiting for an officialassessment regarding the release of a public visualization tool to allow user interactivity, which will only be madeeffective after approvals from both law enforcement and the ethics board from Universidade Federal de Minas Gerais.

4 CASE STUDY: RCPD

The goal of our experiments is to validate the relevance of automatic signals, specifically the aforementioned attributes, toextract valuable knowledge from real CSAM data. Additionally, we assess whether the previously outlined visualizationresources can provide comprehensive understanding without releasing individual data points. To do so, we derive allattributes from a benchmark entitled region-based annotated child pornography dataset (RCPD) [30], with 2138 samplesamong CSAM, adult pornography, and non-sensitive images, although there are no hard labels for such categories. Itwas proposed as a robust benchmark for testing CSAM classification approaches and associated tasks, with annotations

10

Page 11: Analysis Pipeline for Child Sexual Abuse Datasets - arXiv

Seeing without Looking: Analysis Pipeline for Child Sexual Abuse Datasets ACM FAccT 2022, June 21–24, 2022, Seoul, South Korea

of body parts bounding boxes (person, head, breast/chest, genital, and buttocks), along with subjective labels fordemographic attributes (age, gender, and ethnicity). From those base-level labels, other attributes are derived, such asnudity level corresponding to the exposition of body parts (none, seminude, nude, sex), and binary CSAM labels areprovided as a combination of nudity level and perceived age, providing researchers with the freedom to classify CSAMby different thresholds if relevant. The original work considers children to be under 13 years old, and sensitive imagesto contain at least one person fully nude or engaging in sexual activity. By that criteria, RCPD provides 836 CSAMsamples and 285 depictions of adult pornography. Appendix A provides a complete description of attributes and theirsummary statistics heavily inspired by the work in [22], which proposes a Nutrition Label for datasets.

The choice of RCPDto validate our proposal was driven by the extensive annotations provided by the authors, thusassuring that patterns and relationships found by automatic signals are not artificially produced by prediction errors. Anexhaustive report of all possible settings would be impracticable since most visualizations are relations between pairsof attributes. Therefore, the remaining of this section leverages different combinations of attributes and visualizationsto highlight relevant aspects of the extracted attributes. Appendix B depicts a more thorough set of visualizations forlabels and attributes, highlighting the potential of releasing a tool to the research community for independent studies.

First, let us look at attributes that directly relate to labels in the dataset. Fig. 2 compares demographic attributesautomatically extracted with the respective labels. Notably, the database is highly skewed regarding age (Fig. 2b) andethnicity (Fig. 2c), depicting mostly white children under 15 years old. Although black and brown children are the maintarget of sexual abuse in Brazil, according to forensic experts most CSAM apprehended in Brazil appears foreign innature, which might explain the overwhelming number of samples depicting white children. Even though ITA can notbe used to classify ethnicity effectively [23], its distribution makes it clear the predominance of light-skinned individuals.As we did not wish to classify skin tones as it is usually done in research for skin lesion classification [25], we opted toadd interactivity so that users can click on any bin to see 6× 6 skin patches from the database associated with its values.Fig. 2c demonstrates it by showing a few instances for the average ITA.

Regarding age, we stress the importance of better age estimation methods focusing on underage individuals. Althoughthe method we chose had this concern in mind, and it was able to detect the predominance of children around 8-13 yearsold, the results still overestimate the presence of young adults. We highlight that age estimation is still dependent onface detection, which by itself is not perfect. But RCPD has over 170 samples of children not showing face [30], whichmeans they would go unnoticed by current automatic methods. Finally, gender classification captures the skewnesstowards the female gender, consistent with labels provided by RCPD and overall reports on child abuse.

Pornography classification can be directly related to labels for nudity level, as shown in Fig. 3. We set a fairly lowprobability threshold 𝑡 = 0.3 for Yahoo’s OpenNSFW [31], the same protocol adopted in [30]. Although it is over 90%accurate in detecting nude and sex samples (Fig. 3b), they are both associated with the porn category. Porn-JS [28]achieves roughly the same accuracy on nude and sex samples, with a more fine-grained approach, assigning two levelsof sensitivity (porn and sex), while being more sensitive to seminude instances (Fig. 3c). Further studies are required toevaluate if the same can be achieved by defining threshold intervals to the binary classification of OpenNSFW.

One aspect worth highlighting regarding age and pornography is the prevalence of categories relative to CSAM labels.Fig. 4 shows correlation metrics, indicating an over-representation of pornographic samples in the CSAM category, anda prevalence of children under 14. Logically, those are the two attributes contemplated by the category’s definition.However, to make RCPD more challenging, it is worth balancing samples for those attributes as best as possible,especially since the distinction among adult pornography, CSAM, and safe children photos is so critical to the field.

11

Page 12: Analysis Pipeline for Child Sexual Abuse Datasets - arXiv

ACM FAccT 2022, June 21–24, 2022, Seoul, South Korea Camila Laranjeira, João Macedo, Sandra Avila, and Jefersson A. dos Santos

False True0

200

400

600

800

1000

1200

1400

2 4 60

200

400

600

800

1000

----------------------- Face Detection -----------------------

Has Faces Number of faces per image

681

1225

202 25 3 1 10 2 4 6

0

200

400

600

800

1000

1200

Label Number of Heads

Total (n>0) = 1728

744

1344

1062

212 5710 1 1

1 3 51 3 5 7

False True0

200

00 a 0204 a 06

08 a 125 a 3

15 a 202

338 a 43

48 a 5360+

0

100

60+ 0

0.2

0.4

0.6

count

00-02 04-06 08-13 15-20 25-32 38-43 48-53 60+

48-53

labels

134

255 234 21528 45

04 to 0600 to 02

08 to 1315 to 20

25 to 3238 to 43

48 to 5360+

0

20092

64

150 5516 29 0.78

0.04

Total = 1708

845

m f0

200

400

600

800

1000

12001462

420

79 27 23

white

latin

black

asian

indian

0

500

1000

1500

Label GenderLabel person_ethinicity

1000

20

40

60

80

100

ITA Distribution

-50 0 50

Samples within -10 < ITA < 10

MaleUnknown

Female

0

200

400

600

800

Gender Estimation

888

670

82

f m

m

f

0.2

0.4

0.6

0.8

count

labels

gend

er e

stim

atio

n

Co-occurence (x-norm)

1167

whiteindianasianblacklatin

person_ethnicity

0.84

0.70

labels - num_head

0 1 2 3 4 5 7

1

2

3

4

5

6

0

0.2

0.4

0.6

0.8

1count

0.39

Co-occurrence (x-norm)

0.65

0.94

0.3

0.0

1.0

estim

ated

- faces_nu

m

(a) Face detectionFalse True0

200

400

600

800

1000

1200

1400

2 4 60

200

400

600

800

1000

0.985

0.99

0.995

1

1.005

-------------------------------- Face Detection --------------------------------

Has Faces Number of faces per image Face probability

681

1225

202 25 3 1 10 2 4 6

0

200

400

600

800

1000

1200

Label num_head_box

Total (n>0) = 1728

744

1344

1062

212 5710 1 1

1 3 51 3 5 7

0

1

2

3

4

5

7

falsetrue 0

0.2

0.4

0.6

0.8

1count

num_head

faces_hasface

Co-occurrence (x-norm)

0.940.06

0.94

0.97

1.0

1.0

1.0

1.0

0.03

0.06

0.0

0.0

0.0

0.0

False True0

200

400

600

800

00 a 0204 a 06

08 a 125 a 3

15 a 202

338 a 43

48 a 5360+

0

100

200

300

400

500

600

------------------ Age/Child Estimation -------------------

Has Child Age Groups 00-02

04-06

08-13

15-20

25-32

38-43

60+ 0

0.2

0.4

0.6

count

00-02 04-06 08-13 15-20 25-32 38-43 48-53 60+

age

estim

atio

n

48-53

labels

134

255

1009

234 21528 45

04 to 0600 to 02

08 to 1315 to 20

25 to 3238 to 43

48 to 5360+

0

200

400

600

800

1000

(n<14) = 1398 (n>=14) = 614

92

Co-occurrence (x-norm)

714

926

304

572

64

450

150 5516 29

0.37

0.36

0.76

0.78

0.35

0.33

0.1

0.04

Total = 1708

845

m f0

200

400

600

800

1000

12001462

420

79 27 23

white

latin

black

asian

indian

0

500

1000

1500

Label GenderLabel person_ethinicity

1000

20

40

60

80

100

ITA Distribution

-50 0 50

Samples within -10 < ITA < 10

MaleUnknown

Female

0

200

400

600

800

Gender Estimation

888

670

82

Label Age (grouped)

f m

m

f

0.2

0.4

0.6

0.8

count

labels

gend

er e

stim

atio

n

Co-occurence (x-norm)

1167

whiteindianasianblacklatin

person_ethnicity

0.84

0.70

(b) Age estimation

False True0

200

400

600

800

1000

1200

1400

2 4 60

200

400

600

800

1000

0.985

0.99

0.995

1

1.005

-------------------------------- Face Detection --------------------------------

Has Faces Number of faces per image Face probability

681

1225

202 25 3 1 10 2 4 6

0

200

400

600

800

1000

1200

Label num_head_box

Total (n>0) = 1728

744

1344

1062

212 5710 1 1

1 3 51 3 5 7

0

1

2

3

4

5

7

falsetrue 0

0.2

0.4

0.6

0.8

1count

num_head

faces_hasface

Co-occurrence (x-norm)

0.940.06

0.94

0.97

1.0

1.0

1.0

1.0

0.03

0.06

0.0

0.0

0.0

0.0

False True0

200

400

600

800

00 a 0204 a 06

08 a 125 a 3

15 a 202

338 a 43

48 a 5360+

0

100

200

300

400

500

600

------------------ Age/Child Estimation -------------------

Has Child Age Groups 00-02

04-06

08-13

15-20

25-32

38-43

60+ 0

0.2

0.4

0.6

count

00-02 04-06 08-13 15-20 25-32 38-43 48-53 60+

age

estim

atio

n

48-53

labels

134

255

1009

234 21528 45

04 to 0600 to 02

08 to 1315 to 20

25 to 3238 to 43

48 to 5360+

0

200

400

600

800

1000

(n<14) = 1398 (n>=14) = 614

92

Co-occurrence (x-norm)

714

926

304

572

64

450

150 5516 29

0.37

0.36

0.76

0.78

0.35

0.33

0.1

0.04

Total = 1708

845

m f0

200

400

600

800

1000

1200

1462

420

7927 23

white

latin

black

asian

indian0

500

1000

1500

Label Gender

Label Ethinicity

1000

20

40

60

80

100

ITA Distribution

-50 0 50 MaleUnknown

Female

0

200

400

600

800

Gender Estimation

888

670

82

Label person_age (grouped)

f m

m

f

0.2

0.4

0.6

0.8

count

labels

gend

er e

stim

atio

n

Co-occurence (x-norm)

1167

whiteindianasianblacklatin

person_ethnicity

0.84

0.70

Samples within 30 < ITA < 35

(c) Perceived ethnicity and ITA

False True0

200

400

600

800

1000

1200

1400

2 4 60

200

400

600

800

1000

0.985

0.99

0.995

1

1.005

-------------------------------- Face Detection --------------------------------

Has Faces Number of faces per image Face probability

681

1225

202 25 3 1 10 2 4 6

0

200

400

600

800

1000

1200

Label num_head_box

Total (n>0) = 1728

744

1344

1062

212 5710 1 1

1 3 51 3 5 7

0

1

2

3

4

5

7

falsetrue 0

0.2

0.4

0.6

0.8

1count

num_head

faces_hasface

Co-occurrence (x-norm)

0.940.06

0.94

0.97

1.0

1.0

1.0

1.0

0.03

0.06

0.0

0.0

0.0

0.0

False True0

200

400

600

800

00 a 0204 a 06

08 a 125 a 3

15 a 202

338 a 43

48 a 5360+

0

100

200

300

400

500

600

------------------ Age/Child Estimation -------------------

Has Child Age Groups 00-02

04-06

08-13

15-20

25-32

38-43

60+ 0

0.2

0.4

0.6

count

00-02 04-06 08-13 15-20 25-32 38-43 48-53 60+

age

estim

atio

n

48-53

labels

134

255

1009

234 21528 45

04 to 0600 to 02

08 to 1315 to 20

25 to 3238 to 43

48 to 5360+

0

200

400

600

800

1000

(n<14) = 1398 (n>=14) = 614

92

Co-occurrence (x-norm)

714

926

304

572

64

450

150 5516 29

0.37

0.36

0.76

0.78

0.35

0.33

0.1

0.04

Total = 1708

845

m f0

200

400

600

800

1000

1200

1462

420

79 27 23

white

latin

black

asian

indian

0

500

1000

1500

Label GenderLabel person_ethinicity

1000

20

40

60

80

100

ITA Distribution

-50 0 50

Samples within -10 < ITA < 10

MaleUnknown

Female

0

200

400

600

800

Gender Estimation

888

670

82

Label person_age (grouped)

f m

m

f

0.2

0.4

0.6

0.8

count

labels

gend

er e

stim

atio

n

Co-occurence (x-norm)

1167

whiteindianasianblacklatin

person_ethnicity

0.84

0.70

(d) Perceived gender

Fig. 2. Labels and features extracted for demographic attributes. Label distribution in orange and on the left of each subfigure, featuresextracted in purple and in the middle, and co-occurrence of signals to the right, except for ethnicity and ITA, in which marginaldistributions are disaggregated by labels for ethnicity.

734

283

917

204

noneseminude

nudesex

0

200

400

600

800

Label Nudity Level

Firefox file:///C:/Users/milal/Dropbox/Doutorado/Pesquisa/CSAT/src/RCPD2/nudity.html

1 of 1 31/12/2021 16:33

Porn Nonporn0

500

1000

OpenNSFW

false true

none

seminude

nude

sex

0.2

0.4

0.6

Co-occurrence (y-norm)

OpenNSFW

labe

l nud

ity_l

evel

0.76 0.02

0.1 0.13

0.13 0.72

0.01 0.13

drawings

hentai

neutral

porn

sexy

0

200

400

600

800

Porn-JS

drawings hentai neutral porn sexy

none

seminude

nude

sex

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Co-occurrence (y-norm)

Porn-JS

labe

l nud

ity_l

evel

0.8 0.3 0.81 0.03 0.13

0.07 0.1 0.07 0.1 0.25

0.12 0.5 0.11 0.64 0.58

0.0 0.1 0.01 0.22 0.05

(a) Label: Nudity

734

283

917

204

noneseminude

nudesex

0

200

400

600

800

Label Nudity Level

Firefox file:///C:/Users/milal/Dropbox/Doutorado/Pesquisa/CSAT/src/RCPD2/nudity.html

1 of 1 31/12/2021 16:33

Porn Nonporn0

500

1000

OpenNSFW

false true

none

seminude

nude

sex

0.2

0.4

0.6

Co-occurrence (x-norm)

OpenNSFW

labe

l nud

ity_l

evel

drawings

hentai

neutral

porn

sexy

0

200

400

600

800

Porn-JS

drawings hentai neutral porn sexy

none

seminude

nude

sex

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Co-occurrence (y-norm)

Porn-JS

labe

l nud

ity_l

evel

0.8 0.3 0.81 0.03 0.13

0.07 0.1 0.07 0.1 0.25

0.12 0.5 0.11 0.64 0.58

0.0 0.1 0.01 0.22 0.05

0.95 0.05

0.33 0.67

0.1 0.9

0.02 0.98

1319

819

(b) Yahoo OpenNSFW predictions

734

283

917

204

noneseminude

nudesex

0

200

400

600

800

Label Nudity Level

Firefox file:///C:/Users/milal/Dropbox/Doutorado/Pesquisa/CSAT/src/RCPD2/nudity.html

1 of 1 31/12/2021 16:33

Porn Nonporn0

500

1000

OpenNSFW

false true

none

seminude

nude

sex

0.2

0.4

0.6

Co-occurrence (y-norm)

OpenNSFW

labe

l nud

ity_l

evel

0.76 0.02

0.1 0.13

0.13 0.72

0.01 0.13

drawings

hentai

neutral

porn

sexy

0

200

400

600

800

Porn-JS

drawings hentai neutral porn sexy

none

seminude

nude

sex

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Co-occurrence (x-norm)

Porn-JS

labe

l nud

ity_l

evel

0.8 0.3 0.81 0.03 0.13

0.07 0.1 0.07 0.1 0.25

0.12 0.5 0.11 0.64 0.58

0.0 0.1 0.01 0.22 0.05

0.04 0.0 0.81 0.03 0.1

0.01 0.0 0.18 0.27 0.54

0.01 0.01 0.09 0.52 0.38

0.0 0.0 0.04 0.81 0.14

3910

709

820

560

(c) Porn-JS predictions

Fig. 3. Labels and features extracted for pornography attributes. We provide the predictions for both classification methods: YahooOpenNSFW and Porn-JS, as well as co-occurrences with original labels.

12

Page 13: Analysis Pipeline for Child Sexual Abuse Datasets - arXiv

Seeing without Looking: Analysis Pipeline for Child Sexual Abuse Datasets ACM FAccT 2022, June 21–24, 2022, Seoul, South Korea

00-02 04-06 08-13 15-20 25-32 38-43 48-53 60+

false

true

0

0.5

1

1.5

2

faces_age

CSEM

Firefox file:///C:/Users/milal/Dropbox/Doutorado/Pesquisa/CSAT/src/RCPD2/N4/visualizations/parity_faces...

1 of 1 03/01/2022 14:55

drawings hentai neutral porn sexy

Porn-JSOpenNSFW

false

true

CSEM

false

true

CSEM

false true

-2

-1.5

-1

-0.5

(a) Age vs. CSAM

Firefox file:///C:/Users/milal/Dropbox/Doutorado/Pesquisa/CSAT/src/RCPD2/N4/visualizations/parity_faces...

1 of 1 03/01/2022 14:55

drawings hentai neutral porn sexy

Porn-JSOpenNSFW

false

true

CSEM

false

true

CSEM

false true

0

0.5

1

1.5

2

-2

-1.5

-1

-0.5

(b) Pornography vs. CSAM

Fig. 4. Correlation between pornography classification methods, age estimation and CSAM categories.

personbedchairsofabottlepottedplantbookcell phonevasediningtabletvm

onitorteddy bearbow

lcupcarcattoiletbenchcakerem

ote

0

500

1000

1500

person

furniture

indoor

kitchen

electronic

animal

vehicle

food

appliance

sports

accessory

outdoor

0

500

1000

1500

------------------------------------------------------------------ Object Detection ------------------------------------------------------------------

Num

ber

of im

ages

Ranking (Top 20) Ranking Macro Categories

521

1324

236 33 12 5 5 2

0 2 4 60

500

1000

count

"person" category

508

1294

300

31 2 2 1

0 2 4 60

500

1000

count

Label: number of people

Fig. 5. Rankings of object detection base-level and macro-level categories, along with distributions for the “person” category andcorrespondent labels from region-based annotated child pornography dataset (RCPD).

As for context-related features, Fig. 5 shows that RCPD is a highly person-centric dataset, with furniture as thesecond most common macro-category of objects, indicating a prevalence of indoor environments as scene classificationwill later confirm. The “person” category can be directly related to labels from RCPD regarding number of people persample. The two graphs on the right of Fig. 5 show that object detection is able to capture an accurate distribution ofpeople per sample, showing that the majority of images on RCPD is comprised of a single person.

Scene classification is hierarchically organized in three levels, as shown in Fig. 6. For RCPD, most samples are indoor,as object categories previously indicated. A qualitative assessment showed that some base-level classification instancesthat appear to be predominant, such as “clean_room” and “ice_floe”, are actually misclassifications. On the other hand,we see the predominance of child-related categories, such as “nursery”, “childs_room”, and “playground”. It is worthspecializing scene classification approaches for the domain of CSAM, with fewer and domain-driven categories, sincelaw enforcement experts usually search for context cues such as a child-like environment or public places [26].

Looking at how context features correlate with CSAM labels in Fig. 7, most residential categories are positivelycorrelated with CSAM. Since data collection was not concerned with balancing context, it indicates that looking forresidential cues in images might benefit law enforcers as an additional triage dimension. From a macro perspective,results add more evidence to support the statement from [51] that CSAM occurs more often in indoor environments.Regarding object information, COCO classes do not explore child-like categories more extensively, but we could see thatclass “teddy bear” is somewhat predominant and strongly correlated with CSAM. This tendency could be weakened ifRCPD is ever balanced for the presence of children, but it brings valuable insight into the importance of contextual cues.

13

Page 14: Analysis Pipeline for Child Sexual Abuse Datasets - arXiv

ACM FAccT 2022, June 21–24, 2022, Seoul, South Korea Camila Laranjeira, João Macedo, Sandra Avila, and Jefersson A. dos Santos

INDOOR OUTDOOR0

100

200

300

400

500

600

700

800

nurseryclean_room

ice_floechilds_room

showerdressing_room

beauty_salon

museum/indoor

bedroom

hospital_room

icebergcrevasse

ice_shelf

playground

clothing_store

bathroom

ice_cream_parlor

airplane_cabin

igloocar_interior

0

20

40

60

80

100

120

140

160COMMERCIAL

RESIDENTIAL

NATURE

URBAN

Scene Classification Ranking (Top 20)

1 of 1 03/01/2022 19:13

Ranking Macro Categories

Num

ber

of I

mag

es

Fig. 6. Scene classification Ranking for base-level and macro categories.

personbed chair

bottlebook

sofateddy bear

pottedplant

car bowldiningtable

tvmonitor

cup cakecell phone

vasecat bird toilet

wine glass

false

true

−0.5

−1

−1.5

−2

0

0.5

1

1.5

2

object_label

csam

1 of 1 03/01/2022 19:15

nurseryclean_room

ice_floechilds_room

showerbedroom

museum/indoor

beauty_salon

dressing_room

hospital

hospital_room

icebergcrevasse

ice_shelf

playground

clothing_store

bathroom

ice_cream_parlor

airplane_cabin

igloo

false

true

scene_label

csam

INDOOR OUTDOOR

false

true

csam

scene_macro1

Fig. 7. Correlation between scenes, objects, and CSAM categories.

5 DISCUSSION AND FUTURE DIRECTIONS

Our work has two main focuses. First and foremost, we explore the literature searching for a set of characteristics thatcan be automatically extracted and relevant to CSAM-related tasks. Afterward, we investigate how to give safe publicityto such attributes. The automatic extraction of features has the downside of relying on measures that are uncertainand prone to errors; however, it produces cheap annotations without adding to the burden of law enforcement agentsproviding benchmarks for the research community. By evaluating our proposal on RCPD, we count on a vast set oflabels to show that automatic features are reliable for analyzing general tendencies in a group of samples while alsosurfacing novel and valuable insights. At the same time, presenting attributes as aggregated statistics complies with thegoal of safe publicity since it does not expose individual samples.

Expanding the boundaries of CSAM detection. The literature on CSAM detection usually focuses on attributesrelated to age and pornography, but forensic experts provide valuable insights on a wide variety of visual cues thatcan be further explored in future research. For instance, we showed that context information from scene and objectclassification are correlated with CSAM labels, highlighting that residential scenes and child-like visual cues oftenoccur in scenes of child abuse; hence they are valuable dimensions for detection. Notably, no reference in the literaturefocusing on contextual cues in the domain of CSAM. Methods such as object detection specialized in child-like instances,or scene classification with fewer and domain-centric categories could be valuable to the field.

We can think of further information relevant to CSAM detection. For instance, including sexual organ detection, asrecently proposed by Tabone et al. [46], which was already applied in the context of CSAM detection in [2]. As thelabels of RCPD show, such information is relevant for automatic triage of forensic samples. A couple of works also

14

Page 15: Analysis Pipeline for Child Sexual Abuse Datasets - arXiv

Seeing without Looking: Analysis Pipeline for Child Sexual Abuse Datasets ACM FAccT 2022, June 21–24, 2022, Seoul, South Korea

mention facial expressions as a relevant cue for disambiguation in CSAM detection [26, 51], since children presentingapparent discomfort or unhappiness could be under a stressful situation, perhaps being forced to pose for a picture.

On biases in CSAM datasets.We chose some of our attributes driven by a crucial lack in the literature since little isknown about biases in CSAM. First, regarding demographic dimensions, which are essential to produce fair machinelearning solutions, RCPD indicates that CSA data shared online has a different distribution from reported cases ofphysical abuse in Brazil in terms of race, with RCPD overwhelmingly composed of light-skinned individuals. On theother hand, both domains indicate that most victimized children are girls within a range of 8 to 13 years old. Thus,aggregated accuracy measures for CSAM detection may hide performance discrepancies regarding sensitive attributes.We argue that in terms of cost-benefit, automatic labels have the advantage of easily allowing disaggregated inspections,as they can surface large performance discrepancies among subgroups.

Concerning biases, we can assess how challenging and adequate a dataset can be for training and evaluating models.One of the main challenges for CSAM is distinguishing it from legal images of children and adult pornography; thus, itis essential to balance available benchmarks in both dimensions. For RCPD, there is statistically significant correlationbetween age-groups and CSAM categories, as well as pornography levels and CSAM, suggesting room for improvement.

Safe publicity to CSAM documentations. It is easy to understand why researchers provide little to no descriptionson the content of child sexual abuse material, being it a highly sensitive domain. However, a proper evaluation ofmachine learning methods in terms of accuracy and fairness requires knowing the data to a certain extent. CSAMdetection is very challenging in terms of reproducibility and comparison of results; thus, researchers should invest inproviding the characteristics of the data used for training or validation if it is unknown to the community. We explorea range of documentation practices in the literature, showing that extracting sparse attributes and presenting themas aggregated statistics is both valuable and safe. Since we do not intend to release individual data points, all reportsremain anonymous. Additionally, deriving each feature into multiple attributes and investing more heavily in relationsbetween attributes allows to potentially produce thousands of visualizations surfacing different aspects of the data.

Future directions. This work is a step towards a future product to allow independent inspections from researcherswilling to join the field. We wish to produce a freely available interactive tool with attributes from RCPD and allvisualization capabilities explored throughout this paper. Such a tool can be expanded to accept predictions fromresearchers who submit their evaluation methods on the respective benchmark. It would allow authors to scrutinizetheir proposition beyond aggregated measures of accuracy provided in leaderboards, to assess opportunities forimprovement of their approaches. Since this is a high-stakes domain, in which the downstream application is findingevidence of child sexual offenses, law enforcement oversight on the development and critical use of such tools areessential. This work is the first step in a long endeavor towards a more transparent and still safe field of CSAM detection.

ACKNOWLEDGEMENTS

This work is partially supported by Serrapilheira Institute under grant Serra–R-2011-37776. The authors also ac-knowledge the support from FAPEMIG under Grant APQ-00449-17, along with CNPq under grants 311395/2018-0 and424700/2018-2, and CAPES under Finance Code 001. Sandra Avila is partially funded by CNPq PQ-2 (315231/2020-3),FAPESP (2013/08293-7, 2020/09838-0), H.IAAC (Artificial Intelligence and Cognitive Architectures Hub), and GoogleLARA 2021. None of the funding sources had any role in the design and conduct of this study.

15

Page 16: Analysis Pipeline for Child Sexual Abuse Datasets - arXiv

ACM FAccT 2022, June 21–24, 2022, Seoul, South Korea Camila Laranjeira, João Macedo, Sandra Avila, and Jefersson A. dos Santos

REFERENCES[1] Osman Aka, Ken Burke, Alex Bäuerle, Christina Greer, and Margaret Mitchell. 2021. Measuring Model Biases in the Absence of Ground Truth.

arXiv preprint arXiv:2103.03417 (2021).[2] Mhd Wesam Al-Nabki, Eduardo Fidalgo, Roberto A Vasco-Carofilis, Francisco Janez-Martino, and Javier Velasco-Mata. 2020. Evaluating performance

of an adult pornography classifier for child sexual abuse detection. arXiv preprint arXiv:2005.08766 (2020).[3] Felix Anda, Nhien-An Le-Khac, and Mark Scanlon. 2020. DeepUAge: improving underage age estimation accuracy to aid CSEM investigation.

Forensic Science International: Digital Investigation 32 (2020), 300921.[4] Abeba Birhane and Vinay Uday Prabhu. 2021. Large image datasets: A pyrrhic win for computer vision?. In IEEE Winter Conference on Applications

of Computer Vision. 1536–1546.[5] Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. 2020. Yolov4: Optimal speed and accuracy of object detection. arXiv preprint

arXiv:2004.10934 (2020).[6] Elie Bursztein, Einat Clarke, Michelle DeLaune, David M Elifff, Nick Hsu, Lindsey Olson, John Shehan, Madhukar Thakur, Kurt Thomas, and Travis

Bright. 2019. Rethinking the detection of child sexual abuse imagery on the Internet. In The world wide web conference. 2601–2607.[7] Modesto Castrillón-Santana, Javier Lorenzo-Navarro, Carlos M Travieso-González, David Freire-Obregón, and Jesus B Alonso-Hernandez. 2018.

Evaluation of local descriptors and CNNs for non-adult detection in visual content. Pattern Recognition Letters 113 (2018), 10–18.[8] Douglas Chai and King N Ngan. 1999. Face segmentation using skin-color map in videophone applications. IEEE Transactions on Circuits and

Systems for Video Technology 9, 4 (1999), 551–564.[9] Deisy Chaves, Eduardo Fidalgo, Enrique Alegre, Francisco Jánez-Martino, and Rubel Biswas. 2020. Improving Age Estimation in Minors and Young

Adults with Occluded Faces to Fight Against Child Sexual Exploitation. In International Conference on Computer Vision Theory and Applications.721–729.

[10] Maria Leonina Couto Cunha. 2021. Abuso sexual contra crianças e adolescentes: Abordagem de casos concretos em uma perspectiva multidisciplinar einterinstitucional. Ministério da Mulher, da Família e dos Direitos Humanos. https://www.gov.br/mdh/pt-br/assuntos/noticias/2021/maio/CartilhaMaioLaranja2021.pdf

[11] Equipe da Ouvidoria Nacional dos Direitos Humanos. 2019. Disque Direitos Humanos: Relatório 2019. Ministério da Mulher, da Família e dos DireitosHumanos. https://crianca.mppr.mp.br/arquivos/File/publi/mmfdh/disque_100_relatorio_mmfdh2019.pdf

[12] Janis Dalins, Yuriy Tyshetskiy, Campbell Wilson, Mark J Carman, and Douglas Boudry. 2018. Laying foundations for effective machine learning inlaw enforcement. Majura–A labelling schema for child exploitation materials. Digital Investigation 26 (2018), 40–54.

[13] Mateus de Castro Polastro and Pedro Monteiro da Silva Eleuterio. 2010. Nudetective: A forensic tool to help combat child pornography throughautomatic nudity detection. In Workshops on Database and Expert Systems Applications. 349–353.

[14] Jean Vitor de Paulo. 2018. PySkinDetection. https://github.com/Jeanvit/PySkinDetection.[15] Eran Eidinger, Roee Enbar, and Tal Hassner. 2014. Age and gender estimation of unfiltered faces. IEEE Transactions on Information Forensics and

Security 9, 12 (2014), 2170–2179.[16] United Nations Children’s Fund. 2020. Research on the Sexual Exploitation of Boys: Findings, ethical considerations and methodological challenges.

UNICEF. https://data.unicef.org/resources/sexual-exploitation-boys-findings-ethical-considerations-methodological-challenges[17] Abhishek Gangwar, Eduardo Fidalgo, Enrique Alegre, and Víctor González-Castro. 2017. Pornography and child sexual abuse detection in image

and video: A comparative evaluation. In International Conference on Imaging for Crime Detection and Prevention.[18] Abhishek Gangwar, Víctor González-Castro, Enrique Alegre, and Eduardo Fidalgo. 2021. AttM-CNN: Attention and metric learning based CNN for

pornography, age and Child Sexual Abuse (CSA) Detection in images. Neurocomputing 445 (2021), 81–104.[19] Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé Iii, and Kate Crawford. 2021.

Datasheets for datasets. Commun. ACM 64, 12 (2021), 86–92.[20] Enrique Guerra and Bryce G Westlake. 2021. Detecting child sexual abuse images: traits of child sexual exploitation hosting and displaying websites.

Child Abuse & Neglect 122 (2021), 105336.[21] Carissa Byrne Hessick. 2016. Refining Child Pornography Law. University of Michigan Press.[22] Sarah Holland, Ahmed Hosny, and Sarah Newman. 2020. The dataset nutrition label. Data Protection and Privacy: Data Protection and Democracy 1

(2020).[23] Kimmo Karkkainen and Jungseock Joo. 2021. FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age for Bias Measurement and

Mitigation. In IEEE/CVF Winter Conference on Applications of Computer Vision. 1548–1558.[24] Sergey Kastryulin, Dzhamil Zakirov, and Denis Prokopenko. 2019. PyTorch Image Quality: Metrics and Measure for Image Quality Assessment.

https://github.com/photosynthesis-team/piq Open-source software available at https://github.com/photosynthesis-team/piq.[25] Newton M Kinyanjui, Timothy Odonga, Celia Cintas, Noel CF Codella, Rameswar Panda, Prasanna Sattigeri, and Kush R Varshney. 2020. Fairness of

classifiers across skin tones in dermatology. In International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer,320–329.

[26] Juliane A Kloess, Jessica Woodhams, Helen Whittle, Tim Grant, and Catherine E Hamilton-Giachritsis. 2019. The challenges of identifying andclassifying child sexual abuse material. Sexual Abuse 31, 2 (2019), 173–196.

16

Page 17: Analysis Pipeline for Child Sexual Abuse Datasets - arXiv

Seeing without Looking: Analysis Pipeline for Child Sexual Abuse Datasets ACM FAccT 2022, June 21–24, 2022, Seoul, South Korea

[27] Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, AlexanderKolesnikov, et al. 2020. The open images dataset v4. International Journal of Computer Vision 128, 7 (2020), 1956–1981.

[28] Gant Laborde. [n. d.]. Deep NN for NSFW Detection. https://github.com/GantMan/nsfw_model[29] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft coco:

Common objects in context. In European conference on computer vision. Springer, 740–755.[30] Joao Macedo, Filipe Costa, and Jefersson A dos Santos. 2018. A benchmark methodology for child pornography detection. In 2018 31st SIBGRAPI

Conference on Graphics, Patterns and Images (SIBGRAPI). IEEE, 455–462.[31] Jay Mahadeokar and Gerry Pesavento. 2016. Open sourcing a deep learning solution for detecting NSFW images. Retrieved August 24 (2016), 2018.[32] Donald Maxim, Stephanie Orlando, Katie Skinner, and Roderic Broadhurst. 2016. Online Child Exploitation Material–Trends and Emerging Issues:

Research Report of the Australian National University Cybercrime Observatory with the input of the Office of the Children’s eSafety Commissioner.Online Child Exploitation Material–Trends and Emerging Issues, Australian National University, Cybercrime Observatory with the input of the AustralianOffice of the Children’s e-Safety Commissioner, Canberra (2016).

[33] Angelina McMillan-Major, Salomey Osei, Juan Diego Rodriguez, Pawan Sasanka Ammanamanchi, Sebastian Gehrmann, and Yacine Jernite. 2021.Reusable Templates and Guides For Documenting Datasets and Models for Natural Language Processing and Generation: A Case Study of theHuggingFace and GEM Data and Model Cards. In Workshop on Natural Language Generation, Evaluation, and Metrics. 121–135.

[34] Michele Merler, Nalini Ratha, Rogerio S Feris, and John R Smith. 2019. Diversity in faces. arXiv preprint arXiv:1901.10436 (2019).[35] Microsoft. [n. d.]. Microsoft PhotoDNA. https://www.microsoft.com/en-us/photodna. Accessed: 2022-01-04.[36] Arvind Narayanan. 2021. The Ethics of Datasets: Moving Forward Requires Stepping Back. In AAAI/ACM Conference on AI, Ethics, and Society. 1–1.[37] People + AI Research (PAIR). [n. d.]. Google Know Your Data. https://knowyourdata.withgoogle.com/. Accessed: 2021-12-27.[38] Alexander Panchenko, Richard Beaufort, and Cedrick Fairon. 2012. Detection of child sexual abuse media on p2p networks: Normalization and

classification of associated filenames. In Proceedings of the LREC Workshop on Language Resources for Public Security Applications. 27–31.[39] Claudia Peersman, Christian Schulze, Awais Rashid, Margaret Brennan, and Carl Fischer. 2014. icop: Automatically identifying new child abuse

media in p2p networks. In IEEE Security and Privacy Workshops. 124–131.[40] Jared Rondeau. 2019. Deep Learning of Human Apparent Age for the Detection of Sexually Exploitative Imagery of Children. University of Rhode

Island.[41] Arlan L Rosenbloom. 2013. Inaccuracy of age assessment from images of postpubescent subjects in cases of alleged child pornography. International

journal of legal medicine 127, 2 (2013), 467–471.[42] Napa Sae-Bae, Xiaoxi Sun, Husrev T Sencar, and Nasir D Memon. 2014. Towards automatic detection of child pornography. In IEEE International

Conference on Image Processing. 5332–5336.[43] Nithya Sambasivan, Shivani Kapania, Hannah Highfill, Diana Akrong, Praveen Paritosh, and Lora M Aroyo. 2021. “Everyone wants to do the model

work, not the data work”: Data Cascades in High-Stakes AI. In Conference on Human Factors in Computing Systems. 1–15.[44] Asaf Shupo, Miguel Vargas Martin, Luis Rueda, Anasuya Bulkan, Yongming Chen, and Patrick CK Hung. 2006. Toward efficient detection of child

pornography in the network infrastructure. IADIS International Journal on Computer Science and Information Systems 1, 2 (2006), 15–31.[45] Joyanna Silberg. [n. d.]. A case series of 70 victims of exploitation from child sexual abuse imagery. In Treating Children with Dissociative Disorders.

Routledge, 49–72.[46] André Tabone, Kenneth Camilleri, Alexandra Bonnici, Stefania Cristina, Reuben Farrugia, and Mark Borg. 2021. Pornographic content classification

using deep-learning. In Proceedings of the 21st ACM Symposium on Document Engineering. 1–10.[47] Sofia Tsekeridou and Ioannis Pitas. 1998. Facial feature extraction in frontal views using biometric analogies. In 9th European Signal Processing

Conference (EUSIPCO 1998). IEEE, 1–4.[48] Luc Vincent and Serge Beucher. 1989. The morphological approach to segmentation: an introduction. Centre de Morphologie Mathématique, Ecole

Nationale Supérieure des Mines de Paris.[49] Paulo Vitorino, Sandra Avila, Mauricio Perez, and Anderson Rocha. 2018. Leveraging deep neural networks to fight child pornography in the age of

social media. Journal of Visual Communication and Image Representation 50 (2018), 303–313.[50] Marcus Wilkes, Caradee Y Wright, Johan L du Plessis, and Anthony Reeder. 2015. Fitzpatrick skin type, individual typology angle, and melanin

index in an African population: steps toward universally applicable skin photosensitivity assessments. JAMA Dermatology 151, 8 (2015), 902–903.[51] Emilios Yiallourou, Rafaella Demetriou, and Andreas Lanitis. 2017. On the detection of images containing child-pornographic material. In 2017 24th

International Conference on Telecommunications (ICT). IEEE, 1–5.[52] Andrew Young, Stuart Campo, and Stefaan Verhulst. 2019. Responsible data for children: synthesis report. (2019).[53] Delu Zeng, Minyu Liao, Mohammad Tavakolian, Yulan Guo, Bolei Zhou, Dewen Hu, Matti Pietikäinen, and Li Liu. 2021. Deep Learning for Scene

Classification: A Survey. arXiv preprint arXiv:2101.10531 (2021).[54] Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, and Yu Qiao. 2016. Joint face detection and alignment using multitask cascaded convolutional

networks. IEEE Signal Processing Letters 23, 10 (2016), 1499–1503.[55] Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. 2017. Places: A 10 million image database for scene recognition.

IEEE transactions on pattern analysis and machine intelligence 40, 6 (2017), 1452–1464.

17

Page 18: Analysis Pipeline for Child Sexual Abuse Datasets - arXiv

ACM FAccT 2022, June 21–24, 2022, Seoul, South Korea Camila Laranjeira, João Macedo, Sandra Avila, and Jefersson A. dos Santos

A RCPD NUTRITION LABELS

This section provides brief descriptions of attributes contained in RCPD along with summary statistics followingNutrition Labels guidelines [22].

Fig. 8. Dataset Facts and Attributes.

18

Page 19: Analysis Pipeline for Child Sexual Abuse Datasets - arXiv

Seeing without Looking: Analysis Pipeline for Child Sexual Abuse Datasets ACM FAccT 2022, June 21–24, 2022, Seoul, South Korea

Fig. 9. Summary statistics. Note that missing values for attribute person_id indicate the absence of people in the respective samples.

B RCPD – VISUAL SUMMARY

This section contains a more extensive but not exhaustive set of visualizations for manual labels and automatic attributesassigned to RCPD. Our goal is to provide a comprehensive representation of CSAM datasets both in terms of generalstatistics and relations between attributes. It requires an interactive interface such that users could submit queries ofthe desired attributes to relate; thus, visualizations below may contain interactive resources.

B.1 Labels

In Fig. 10 we provide distributions for all labels from RCPD, previously listed in Appendix A. We also demonstrate howdisaggregated visualizations can provide important insights into the data. Fig. 11 shows two examples, the first with

19

Page 20: Analysis Pipeline for Child Sexual Abuse Datasets - arXiv

ACM FAccT 2022, June 21–24, 2022, Seoul, South Korea Camila Laranjeira, João Macedo, Sandra Avila, and Jefersson A. dos Santos

age distributions disaggregated by two dimensions, labeled parts, and CSAM categories, while the distribution of agestandard deviation is split only by CSAM. From the latter figure, we notice that from the few images containing morethan one person, negative CSAM samples usually depict people of similar ages.

1301

837

false true0

200

400

600

800

1000

1200

CSAM Label

CSAM

1167

845

f m0

500

1000

1462

420

79 27 23

white

latin

black

asian

indian0

500

1000

1500

Overall Demographics

num_individuals Gender Ethnicity

Age

0

100

200

300

400

num

_im

ages

0 1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0

1031

369

613

57 623 2 1

0 1 2 3 4 5 6 70

500

1000 1820

295 20 2 1

0 1 2 3 40

500

1000

1500

1934

113 74 12 5

0 1 2 3 40

500

1000

1500

2000

734

283

917

204

noneseminude

nudesex

0

200

400

600

800

Distribution of tags per image

num_nude num_seminude num_sex Max. Nudity Level

num

_im

ages

508

1294

300

31 2 2 10 1 2 3 4 5 60

200

400

600

800

1000

1200(n>0)=2012

num

_ins

tanc

es

num

_images

700

1181232 23 1 1

0 1 2 3 4 50

500

1000

681

1225

25 3 1 1

0 1 2 3 4 5 6 70

500

1000

1336

720

784

0 1 2 30

500

1000

1615

319185

11 7 1

0 1 2 3 4 50

500

1000

1500 2068

67 3

0 1 20

500

1000

1500

2000

num_person num_head num_genital num_breast num_bt

Distribution of labeled parts per image

202num

_images

(n>0)=2048 (n>0)=345 (n>0)=317

(n>0)=1723 (n>0)=1728 (n>0)=888(n>0)=755 (n>0)=73

Fig. 10. Overview of labels from RCPD. Visualizations representing number of instances also provide a total count of instances fornumber of occurrences 𝑛 > 0.

0 20 40 600

100

200

0 20 40 60 0 20 40 60 0 20 40 60 0 20 40 60

0

100

200

Age/part_name/CSAM

person_age

coun

tcoun

t

part_name=person part_name=head part_name=genital part_name=breast part_name=buttocks

csam=False

csam=True

count=123

0 10 200

50

1000

50

100

Age standard deviation/CSAM

person_age_std

coun

tcoun

t

csam=False

csam=True

person_age person_age person_age person_age

Fig. 11. Demonstrating disaggregated plots by showing distributions of age and age standard deviation disaggregated by otherdimensions. Interactivity allows inspecting fine-grained information on demand, as demonstrated by the text balloon.

20

Page 21: Analysis Pipeline for Child Sexual Abuse Datasets - arXiv

Seeing without Looking: Analysis Pipeline for Child Sexual Abuse Datasets ACM FAccT 2022, June 21–24, 2022, Seoul, South Korea

B.2 Extracted attributes

We present distributions for all features listed in Table 1. Since we carefully curate the set of attributes, we can craftspecialized visualizations for each attribute.

2 4 60

200

400

600

800

1000

0.985

0.99

0.995

1

1.005

0

0.05

0.1

0

20k

40k

60k

80k

100k

120k

---------------------------------------------- Face Detection ----------------------------------------------

Number of faces per image Face probability

1 of 1 30/12/2021 15:59

0 0.5 10

200

400

600

800

False True0

200

400

600

800

00 a 0204 a 06

08 a 125 a 3

15 a 202

338 a 43

48 a 5360+

0

100

200

300

400

500

600

---------------------------------------------- Age Estimation ----------------------------------------------

Child Probability Distribution Has Child Age GroupsTrue

0 0.5 10

200

400

600

MaleUnknown

Female

0

200

400

600

800

---------------------- Binary Gender Recognition ----------------------

Female Probability Distribution TotalMale Female

Relative Area Absolute Area

−100 −50 0 50 1000

20

40

60

80

100

ITA Distribution

Samples within -10 < ITA < 10

0

Fig. 12. Overview of demographic attributes extracted from RCPD. Whenever probabilities are visualized, the threshold applied forclassification is represented as a red rectangle bounded by the threshold interval. ITA distribution is accompanied by patch samplesfrom the database representing any chosen ITA interval.

0 0.2 0.4 0.6 0.80

50010001500

drawings

hentai

neutral

porn

sexy

0

100

200

300

400

500

600

700

800

0 0.2 0.4 0.60

5001000

0 0.2 0.4 0.6 0.8 10

200400

0 0.2 0.4 0.6 0.8 10

200400600

0 0.2 0.4 0.6 0.8 10

200400600

Probability Distribution Total

0 0.5 10

100

200

300

400

500

600

Porn Nonporn0

200

400

600

800

1000

1200

Probability Distribution TotalPorn

Porn-JS

drawings

hentaineutral

pornsexy

Yahoo OpenNSFW

Fig. 13. Overview of pornography attributes extracted from RCPD. There is no default threshold to visualize for multiclass labels,where classification is performed through maximum activation.

21

Page 22: Analysis Pipeline for Child Sexual Abuse Datasets - arXiv

ACM FAccT 2022, June 21–24, 2022, Seoul, South Korea Camila Laranjeira, João Macedo, Sandra Avila, and Jefersson A. dos Santos

person

chair

bottle

book

vase

tvmonitor

bowl

car

toilet

cake

bed

sofa

pottedplant

cell phone

diningtable

teddy bear

cup

cat

bench

0

500

1000

1500

2 4 60

500

1000

person

furniture

indoor

kitchen

electronic

animal

vehicle

food

appliance

sports

accessory

outdoor

0

500

1000

1500

0 5 100

200

400

600

800

1000

Object Detection

Num

ber

of im

ages

Ranking Objects Number of "person" per scene Ranking Macro Categories Number of (any) objects per scene

Firefox file:///C:/Users/milal/Dropbox/Doutorado/Pesquisa/CSAT/src/RCPD2/N4/visualizations/yolo.html

1 of 1 11/01/2022 16:46

atrium/public

fountain

biology_laboratory

ocean

botanical_garden

bakery/shop

archive

jacuzzi/indoor

sauna

pharmacy

swim

ming_pool/indoor

stage/indoor

chemistry_lab

florist_shop/indoor

locker_room

balcony/interior

kindergarden_classroom

discotheque

kennel/outdoor

jewelry_shop

hotel_room

ball_pit

shoe_shop

booth/indoor

bow_w

indow/indoor

underwater/ocean_deep

martial_arts_gym

gift_shop

sandbox

beach

aquarium

desert/sand

drugstore

playroom

berth

living_room

car_interior

igloo

airplane_cabin

ice_cream_parlor

bathroom

clothing_store

playground

ice_shelf

crevasse

iceberg

hospital_room

bedroom

museum

/indoor

beauty_salon

dressing_room

shower

childs_room

ice_floe

clean_room

nursery

0

50

100

150

Num

ber

of im

ages

INDOOR OUTDOOR0

200

400

600

800 COMMERCIAL

RESIDENTIAL

NATURE

URBAN

Scene Recognition

Fig. 14. Overview of context attributes extracted from RCPD. For histograms with over 20 bins, as is the case for object categories (80classes) and scenes (365 classes), we add a range slider below the visualization for interactivity.

jpg png0

500

1000

1500

2000

0 5 10 150

500

1000

1500

1 2 30

200

400

600

800

1000

RGB RGBA L0

500

1000

1500

2000

Metadata

Extension Resolution (megapixel) Aspect ratio Colormode

Firefox file:///C:/Users/milal/Dropbox/Doutorado/Pesquisa/CSAT/src/RCPD2/N4/visualizations/metadata.html

1 of 1 11/01/2022 16:59

Fig. 15. Overview of metadata extracted from RCPD.

0 100 2000

200

400

600

0 50 100 1500

200

400

600

Quality Metrics

Luminance BRISQUE

Firefox file:///C:/Users/milal/Dropbox/Doutorado/Pesquisa/CSAT/src/RCPD2/N4/visualizations/quality.html

1 of 1 11/01/2022 17:00

Fig. 16. Overview of quality attributes extracted from RCPD.

22