Better Policy-Making Through AI. Case Study: Stop Hiding Sexual Abusers.

Original article was published on Artificial Intelligence on Medium


Analyzing institutional (cover-up) patterns in sexual abuse incidents across thousands of public policy-making documents.


Thinking about sexual abuse is always painful, especially when the incidents involve the most vulnerable individuals: children. Unfortunately, the data speak, and loudly: “Every 73 seconds, an American is sexually assaulted. And every 9 minutes, that victim is a child. Meanwhile, only 5 out of every 1,000 perpetrators will end up in prison.” [1]. The risk of real exposure is also undoubtedly there: “Every year 35 million adults come into contact with more than 70 million children and teens through youth-serving organizations.” [2]

Moreover, as awareness of sexual abuse and harassment within different types of organizations grows (cases involving coaches, teachers, camp counselors, and clergy), stakeholders are demanding new actions, especially when some of these organizations try to hide or cover up sexual abuse scandals to protect their “impeccable” reputation, among other factors such as social identity or group loyalties and allegiances (e.g., the Church or educational institutions) [3].


“What kinds of things should I expect programs to do to prevent child sexual abuse at their institutions? Should I ask programs about this or would I be viewed as paranoid?” ~ Parent of a teenager [4]

“To be honest, I don’t allow my children to participate in typically male supervised sports activities because I don’t want to validate a stranger and open up the possibility of my sons being sexually abused by a person I have endorsed as ‘trustworthy.’” ~ Parent of two young children [4]

Given this situation, the response is clear: we need effective actions to stop and prevent sexual abuse and harassment.

Using AI to measure policy effectiveness

Traditional approaches rely on putting policies in place to safeguard against abuse. These policies usually span several kinds of documents: legislative policies, codes of conduct, procedural documents for handling and reporting undesirable events, ordinances, public rules, and so on. Furthermore, the context in which a policy applies follows a complex hierarchy of granularity: from international social policies (e.g., UN policies on gender inequality) through the federal and state levels down to the local level (within organizations and communities).

Thus, measuring policy effectiveness is far from simple: it involves gathering knowledge and understanding different outcomes, which typically means inspecting massive amounts of publicly available documents and incident reports!

Photo by Sharon McCutcheon on Unsplash

This is exactly where AI looks promising: automating such processes, supporting decision-making, or even finding hidden patterns that human inspection alone could not uncover.

So, what have we done?

Working with Omdena on the Zeroabuse project, our main challenge was not to identify potential predatory individuals but to uncover new patterns of cover-up within institutions. The main reason is impact: this enables prevention at a macro (institutional) level instead of a micro (individual) level.

Extracting knowledge from raw policy documents

Inference and reasoning on top of highly structured data or knowledge bases (KBs) is a fairly straightforward process. However, most of the publicly available data on policy and sexual abuse takes the form of natural language: unstructured text.

As in many real-world problems, the lack of a single, perfect dataset pushed us to research and evaluate several diverse candidate datasets (‘diving’ from Google’s Dataset Search engine to the National Data Archive on Child Abuse and Neglect (NDACAN)). This step was by far one of the most challenging, due to the sensitivity of the data.

As you may be guessing right now… we did not find a single, high-quality dataset! We explored several datasets before selecting four: an accused-priest dataset, incident reports from different institutions, sexual harassment reports from educational institutions, and an aggregated dataset of different policies.

Since our main goals were explaining incidents (a question-answering task) and finding hidden patterns, the first step was mapping human-understandable data to a machine-interpretable knowledge representation.

Testing different approaches

We decided to combine and experiment with several approaches (namely, quantitative analysis and graph-based analysis). Regardless of the approach, some NLP preprocessing tasks were needed (see Fig 1).

Fig1. Classical NLP pipeline stages.
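Fig 1 is not reproduced in the text, so the exact stages are unknown. Purely as an illustration, the sketch below assumes three classical early stages (sentence splitting, lowercasing, and tokenization) implemented with plain regular expressions. The function names are hypothetical, and a production pipeline would more likely rely on a dedicated NLP library.

```python
import re

# Assumed stages (Fig 1 is not reproduced): sentence splitting, lowercasing, tokenization.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")   # whitespace that follows ., ! or ?
TOKEN = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")  # alphanumeric runs, simple contractions


def split_sentences(text: str) -> list[str]:
    """Naive sentence segmentation on terminal punctuation."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def tokenize(sentence: str) -> list[str]:
    """Lowercase a sentence and extract word tokens, dropping punctuation."""
    return TOKEN.findall(sentence.lower())


def preprocess(text: str) -> list[list[str]]:
    """Run the sketch pipeline: raw text -> sentences -> lowercased tokens."""
    return [tokenize(s) for s in split_sentences(text)]


print(preprocess("The priest was accused. The diocese reinstated him!"))
# [['the', 'priest', 'was', 'accused'], ['the', 'diocese', 'reinstated', 'him']]
```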

Quantitative Analysis of Textual Data

This approach comes from the social sciences. In short, it is text analysis based on unsupervised learning over bag-of-words representations, with a twist. First, the unit of analysis changes from texts to words. Second, features are defined according to the object of interest. The approach seeks to identify writing style based on word-frequency usage.

During preprocessing, the corpus is transformed into a term-frequency matrix. Stopword removal and stemming were not performed: uninformative words do not affect later steps of the process, and certain terms are informative about style. Low-frequency words were removed, keeping only the most frequent words that together account for 75% of the total word count of the corpus.
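The text does not spell out how the 75% limit was applied. The sketch below assumes it means adding words in descending frequency order until they cover 75% of all word occurrences, then dropping the rest. Whitespace tokenization and the function name are assumptions for illustration only.

```python
from collections import Counter


def term_frequency_matrix(docs: list[str], coverage: float = 0.75):
    """Build a documents x terms count matrix.

    No stopword removal or stemming is applied. Assumption: terms are kept in
    descending frequency order until they cover `coverage` of all word
    occurrences in the corpus; the remaining low-frequency terms are dropped.
    """
    tokenized = [doc.lower().split() for doc in docs]  # assumed whitespace tokenizer
    totals = Counter(tok for toks in tokenized for tok in toks)
    budget = coverage * sum(totals.values())

    vocab, covered = [], 0
    for term, count in totals.most_common():
        if covered >= budget:
            break
        vocab.append(term)
        covered += count

    index = {term: j for j, term in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in docs]
    for i, toks in enumerate(tokenized):
        for tok in toks:
            if tok in index:
                matrix[i][index[tok]] += 1
    return vocab, matrix


vocab, tf = term_frequency_matrix(["the priest the school", "the school report"])
print(vocab)  # ['the', 'school', 'priest']
print(tf)     # [[2, 1, 1], [1, 1, 0]]
```

In the toy corpus, "report" is dropped because the three more frequent words already cover 6 of 7 occurrences.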

The features for analysis vary depending on the data and follow the process for multiple correspondence analysis (MCA). Three text datasets were used: sexual abuse accusations against Catholic priests (incidents dataset), sexual misconduct reports (misbehavior dataset), and policies from universities in the United States (policies dataset). From the incidents dataset, the accusation outcome, the diocese, and the religious community (order) of the accused priest were transformed into dummy variables (one-hot encoding). The misbehavior dataset was encoded by outcome and institution. The policies dataset used one-hot encoding for the institution name. To reduce high dimensionality and sparsity, only the most frequent features were kept.
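The encoding step can be sketched with pandas, assuming the metadata sits in a DataFrame with one categorical column per variable. The column names, the toy records, and the `top_k` cut-off are hypothetical; the article does not state how many features were kept.

```python
import pandas as pd


def encode_features(meta: pd.DataFrame, columns: list[str], top_k: int) -> pd.DataFrame:
    """One-hot encode categorical metadata and keep the top_k most frequent dummies.

    Keeping only frequent dummy columns reduces dimensionality and sparsity.
    """
    dummies = pd.get_dummies(meta[columns].astype(str), prefix=columns, dtype=int)
    keep = dummies.sum(axis=0).nlargest(top_k).index
    return dummies[keep]


# Hypothetical metadata for four accusations (column names are illustrative only).
meta = pd.DataFrame({
    "outcome": ["cleared", "convicted", "cleared", "cleared"],
    "diocese": ["Boston", "Boston", "Dallas", "Boston"],
})
features = encode_features(meta, ["outcome", "diocese"], top_k=2)
print(list(features.columns))
```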

Once we have a term-frequency matrix (m documents × n terms) and an encoded feature matrix (m documents × p features), the two are combined to count how often each word occurs together with each feature across all documents, yielding an n terms × p features table. This cross-tabulation, built from the complete disjunctive (one-hot) feature table, is the input for MCA.
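The combination step amounts to the product C = TᵀF, where T is the m × n term matrix and F the m × p one-hot feature matrix. The sketch below computes that table and then projects words and features with classical correspondence analysis (an SVD of the standardized residuals). This is a named stand-in: the article reports using MCA, and neither the exact variant nor the software used is stated. Function names and toy matrices are hypothetical.

```python
import numpy as np


def word_feature_table(T: np.ndarray, F: np.ndarray) -> np.ndarray:
    """Combine a documents x terms count matrix T with a documents x features
    one-hot matrix F into a terms x features co-occurrence table C = T^T F."""
    return T.T @ F


def correspondence_analysis(N: np.ndarray, n_dims: int = 2):
    """Classical correspondence analysis of a non-negative table via SVD.

    Stand-in for the article's MCA step. Returns principal coordinates for
    rows (words) and columns (features), which is what a biplot displays.
    """
    P = N / N.sum()
    r = P.sum(axis=1)                       # row masses
    c = P.sum(axis=0)                       # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)  # standardized residuals
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    rows = U[:, :n_dims] * s[:n_dims] / np.sqrt(r)[:, None]
    cols = Vt.T[:, :n_dims] * s[:n_dims] / np.sqrt(c)[:, None]
    return rows, cols


T = np.array([[2, 1, 0], [0, 1, 3], [1, 0, 2]])  # 3 documents x 3 terms (toy)
F = np.array([[1, 0], [0, 1], [1, 1]])           # 3 documents x 2 features (toy)
C = word_feature_table(T, F)
print(C.tolist())  # [[3, 1], [1, 1], [2, 5]]
word_xy, feature_xy = correspondence_analysis(C.astype(float))
```

A useful sanity check is that the mass-weighted average of the word coordinates is zero, which places the origin at the "average profile" that the plots are read against.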

For the incidents dataset, Fig 2 shows the projection onto the first two MCA dimensions. Some elements were added to ease interpretation.

Fig2. Feature and individuals projection on Church dataset.

But what does this mean? Red triangles represent features and blue circles represent words. Two strategies are commonly used to interpret these plots: first, the directions of the projections from the origin, compared through cosine similarity; second, quadrant location. Two main projections are visible. The first points to the upper-right corner (green arrow), and its most representative feature is the religious community of the accused priests: the Congregation of Christian Brothers. Reports about this community mention several accusations of sexual abuse and a declaration of bankruptcy in the US. Terms such as “establish” and “bankruptcy” have projections similar to this feature. Near this projection, but closer to the origin, are terms like “school”, “student”, and “teacher”. In the opposite direction we found mostly dioceses and outcomes such as “cleared”, “police report”, and “retired from duty”. The second main projection (yellow arrow) is related to outcomes of advanced jury processes. Outcomes such as “convicted” and “sentenced” are representative features of this projection, and words such as “prison”, “arrested”, and “pornography” lie close to it.
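The first reading strategy can be sketched as follows: treat each point as a vector from the origin and compare directions with cosine similarity, ignoring distance from the origin. The coordinates, the `min_cos` cut-off, and the function names are hypothetical; they only mimic the layout described for Fig 2.

```python
import numpy as np


def cosine(u, v) -> float:
    """Cosine of the angle between two points seen as vectors from the origin."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def words_along(feature_xy, word_xy: dict, min_cos: float = 0.9) -> list[str]:
    """Words whose direction from the origin is close to the feature's direction,
    sorted from most to least aligned. Distance from the origin is ignored."""
    scores = {w: cosine(xy, feature_xy) for w, xy in word_xy.items()}
    aligned = [w for w, score in scores.items() if score >= min_cos]
    return sorted(aligned, key=scores.get, reverse=True)


# Hypothetical 2-D coordinates, loosely mimicking the layout described for Fig 2.
feature = (1.2, 0.9)  # e.g. the Congregation of Christian Brothers
words = {"bankruptcy": (1.0, 0.8), "school": (0.4, 0.25),
         "cleared": (-0.9, -0.7), "prison": (-0.8, 0.9)}
print(words_along(feature, words))  # ['bankruptcy', 'school']
```

Note that "school" qualifies even though it sits much closer to the origin, which matches the reading "near the projection, but closer to the origin".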

If we split the plane at the zero value of each axis, we get four quadrants. The first quadrant (+,+) mostly contains the religious communities to which the accused priests belonged. These communities have in common that they are related to schools. This could point to a risk factor in abuse cases: adults (priests) involved in activities with constant interaction with children (schooling). We called this quadrant “Schools”. Moving counterclockwise, the second quadrant (-,+) corresponds to the second projection, in which court processes advance, with words such as “prison”, “arrested”, and “pornography”. We named this quadrant “Legal Processes”. The third quadrant (-,-) tends to contain dioceses instead of communities, and the outcomes seen are “cleared”, “reinstated”, and “retired from duty”. One word in this quadrant, lying closer to the legal projection, is “laicization”, the process by which a priest is returned to the lay state. This suggests that internal actions occur in relation to legal processes. The fourth quadrant (+,-) has “suit withdrawn” near the origin and “monsignor” at its lower edge, close to the vertical axis.

However, in this area the remarkable thing is what is not there: no legal outcomes, just one community (Xaverian Brothers), and few dioceses. There is an outcome “accused” and nothing else. It seems that there is some silence here, or a discreet approach toward accusations, especially those involving high-ranking priests. It looks as if religious communities tend to intervene in legal processes to help their brothers. When this happens, legal outcomes lean slightly toward “suit withdrawn”, or the situation ends up being just an accusation.
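The quadrant reading can be sketched as a simple sign test on each point's two coordinates. Only the names “Schools” and “Legal Processes” come from the text; the other two quadrants are left unnamed, and the coordinates below are hypothetical.

```python
QUADRANTS = {
    (True, True): "Q1 (+,+) Schools",
    (False, True): "Q2 (-,+) Legal Processes",
    (False, False): "Q3 (-,-)",
    (True, False): "Q4 (+,-)",
}


def quadrant(x: float, y: float) -> str:
    """Name the quadrant of a point; points lying exactly on an axis are flagged."""
    if x == 0 or y == 0:
        return "on an axis"
    return QUADRANTS[(x > 0, y > 0)]


def group_by_quadrant(points: dict) -> dict:
    """Group labelled MCA points (words or features) by quadrant."""
    groups: dict[str, list[str]] = {}
    for label, (x, y) in points.items():
        groups.setdefault(quadrant(x, y), []).append(label)
    return groups


# Hypothetical coordinates for a few labels mentioned in the text.
points = {"Congregation of Christian Brothers": (1.2, 0.9), "prison": (-0.8, 0.9),
          "laicization": (-0.3, -0.4), "suit withdrawn": (0.1, -0.2)}
for name, labels in group_by_quadrant(points).items():
    print(name, labels)
```

Reading a quadrant for what is absent, as done above for (+,-), then amounts to checking which label types (legal outcomes, communities, dioceses) are missing from a group.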

This dataset suggests that court processes interact with the institution’s actions, especially with religious communities supporting their members. The higher a priest is in the hierarchy, the more probable an intervention. Another common factor is that when an accusation appears, the first actions are taken by dioceses, which suspend or retire priests from duty while the situation is investigated.

Moving to universities, the next plot shows the MCA output for the misbehavior dataset, which contains reports of sexual misconduct at universities.

Fig3. Projection on Universities dataset.

Three projections are identifiable. The first (green) aligns the following features, from farthest to closest to the origin:

  • Institution: University of California
  • Discipline: Athletics
  • Role: Coach
  • Institution: University of Florida
  • Outcome: Fine, Salary Reduction
  • Institution: Ohio State University
  • Discipline: Unknown
  • Role: Faculty (False)
  • Outcome: Demoted
  • Role: Administrator
  • Outcome: Resigned
  • Outcome: Committed Suicide
  • Discipline: Life Sciences

This projection seems strongly related to sports: athletics as a discipline and the coach role appear as distant points, together with universities with a sports tradition such as Florida, California, and Ohio State. The projection also aligns strongly with the administrator role and excludes faculty roles. Outcomes for this kind of situation show that institutions tend to impose sanctions that do not end the employment relationship, such as fines, salary reductions, or demotions; however, the staff involved are also associated with outcomes such as resigning or even committing suicide. Cases in disciplines related to the life sciences appear aligned with this pattern.

The second projection (yellow) shows strong tendencies for disciplines related to engineering, with legal consequences such as a criminal plea or conviction, and with institutional actions that end the contractual relationship, such as firing and contract nonrenewal. Engineering stands out from other fields, which suggests strong actions against sexual abuse incidents in this discipline.

The third projection (blue) aligns the following features, from farthest to closest to the origin:

  • Discipline: Psychology
  • Discipline: Medicine and Health Sciences
  • Outcome: Suspended, Leave, Restrictions
  • Outcome: Barred From Leadership or Honorary Position
  • Discipline: English Writing Humanities
  • Institution: University of Illinois
  • Role: Faculty
  • Outcome: Resigned (False)

This projection points to psychology as the most informative feature and relates it to disciplines such as the health sciences and the humanities. These three disciplines have in common that their students are predominantly female, while top positions are mostly occupied by men. The faculty role also appears here, together with stronger outcomes than those seen for administrative roles: suspension, leave, restrictions, or being barred from leadership or honorary positions, but not resignation.

Next, we explore the policies dataset, using only the institution names as features:


The first noticeable thing is the dense cluster near the origin, which suggests that policies across universities have similar contents and even share a writing style, judging by the words used in those documents. However, two projections are visible. In the first (green), Drake University is the only institution projected outside the main cluster, with words such as “recipient”, “propose”, “regulation”, “prison”, and “grievance”. The second projection places institutions such as Louisiana Tech, the University of North Dakota, and Arkansas State University outside the cluster, together with words such as “organization”, “facility”, “hall”, “residence”, “building”, and “area”. Apparently, a few universities follow two diverging styles in their policies. One is related to physical elements, possibly pointing to a code of conduct inside the institution. The other, taken by Drake University, seems more abstract, perhaps related to the consequences of misbehavior.

According to their incident reports, the two groups of institutions (religious institutions and universities) have one thing in common: a tendency to distance themselves from legal processes and to treat their members differently according to hierarchy and type of affiliation. This approach shows that it is possible to find patterns hidden in documents that can help us see the big picture.

Knowledge graphs for a better understanding

At this point, we had an unsupervised exploratory text analysis that gave us some insights into our datasets. But we wanted to go further. Bag-of-words is a standard, well-known approach to modeling textual documents, and it helps us explore the frequency and lexical distribution of a text. But what about semantics? We knew that a knowledge graph (KG) would be able to capture both the structure and the semantics of the text. But wait, what exactly is a KG?

A knowledge graph (KG) is simply a knowledge representation of raw text as a set of triplets. A triplet is a relation between two entities, written formally as (e1, r, e2) or (h, r, t): the head entity h (a.k.a. the subject), the relation r, and the tail entity t (a.k.a. the object). For those with a background in semantics and ontology engineering, this sounds familiar, right?

Taken from Towards Data Science. Vertices are entities and edges represent relations.
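A minimal way to hold such triplets in code, assuming plain strings for entities and relations, is a named tuple plus an adjacency list in which vertices are entities and each edge carries its relation label. The class name, helper, and example triplets are hypothetical; the article does not describe its storage format.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One (h, r, t) fact: head entity (subject), relation, tail entity (object)."""
    head: str
    relation: str
    tail: str


def build_graph(triples):
    """Directed multigraph as an adjacency list: head -> [(relation, tail), ...].
    Vertices are entities; each edge is labelled with its relation."""
    graph = defaultdict(list)
    for h, r, t in triples:
        graph[h].append((r, t))
    return dict(graph)


# Hypothetical triplets of the kind an extraction step might produce.
kg = [
    Triple("University", "prohibits", "protected basis race"),
    Triple("University", "publishes", "Title IX policy"),
    Triple("Coach", "reports to", "Athletics Department"),
]
graph = build_graph(kg)
print(graph["University"])
```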

In our case, after running the NLP pipeline tasks for knowledge extraction, we ended up with thousands of triplets revealing the different entities involved in our documents (policy documents and, most importantly, incidents):

Fig.6 Policy and incidents subset KG

The figure above does not look very interpretable, but what if we could filter it based on the most ‘relevant’ relations under study?

Fig6. Looking into a ‘relevant’ relationship (e.g., the “prohibits” relationship, for the subject “University” and the object “protected basis race”).
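Filtering the graph down to one relation can be sketched over plain (h, r, t) tuples. The article does not say how ‘relevance’ was decided, so, as an assumption, relations are shortlisted here simply by how often they occur. Only the “prohibits” / University / race triplet echoes the text; the rest are hypothetical.

```python
from collections import Counter


def filter_relation(triples, relation, subject=None):
    """Keep (h, r, t) triplets with the given relation, optionally for one subject h."""
    return [(h, r, t) for h, r, t in triples
            if r == relation and (subject is None or h == subject)]


def relation_counts(triples) -> Counter:
    """How often each relation occurs (assumed heuristic for shortlisting relations)."""
    return Counter(r for _, r, _ in triples)


# Hypothetical triplets; only the "prohibits"/University/race example comes from the text.
kg = [
    ("University", "prohibits", "protected basis race"),
    ("University", "prohibits", "retaliation"),
    ("University", "publishes", "Title IX policy"),
    ("Diocese", "reinstated", "priest"),
]
print(filter_relation(kg, "prohibits", subject="University"))
print(relation_counts(kg).most_common(1))  # [('prohibits', 2)]
```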

But…what if we could not only explain but also optimize?

Explaining and understanding the nature of sexual abuse or harassment incidents is a clear step forward in incident management as well as in the design of effective responses. However, an automated tool that could help sharpen policymakers’ inspection would be even better.

For that, we created an evidence-based policy ranking algorithm.

The hypothesis relies on the following assumption: there is a correlation between incidents and ‘weaknesses’ in the policies involved in them. In other words, the more incidents occur under the same regulatory or procedural policy, the less effective that policy should be considered (due to gaps, non-compliance, or other factors or ‘red flags’ that should be put under study).

For the policy ranking, we carried out a soft matching across diverse datasets: on one side, incident datasets and related scraped news; on the other, regulatory, policy, and procedural institutional documents covering sexual abuse and sexual harassment within workplaces (e.g., Title IX for educational institutions) or even more general policies covering gender-related human rights (e.g., the UN Prevention and Response to Sexual Misconduct). Taking advantage of state-of-the-art pre-trained word embeddings (GloVe or fastText, the latter a character n-gram extension of Word2Vec) and different document representations (BOW + TF-IDF), we were ready to compute the similarity between whole documents:

Soft cosine similarity matrix (each document against all the others), where d(1,2) represents the similarity between document 1 and document 2.
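Soft cosine similarity extends plain cosine with a term-similarity matrix S, so that documents sharing no words can still match when their words are semantically close: d(i, j) = xᵢᵀ S xⱼ / (√(xᵢᵀ S xᵢ) · √(xⱼᵀ S xⱼ)). The sketch below is a from-scratch illustration under stated assumptions: toy 2-D embeddings stand in for GloVe or fastText, raw bag-of-words vectors stand in for TF-IDF, and negative term similarities are clipped to zero. Libraries such as gensim offer ready-made implementations; the article does not say which one was used.

```python
import numpy as np


def term_similarity(embeddings: np.ndarray) -> np.ndarray:
    """S[i, j] = cosine similarity of term embeddings i and j.
    Assumption: negative similarities are clipped to 0."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return np.maximum(unit @ unit.T, 0.0)


def soft_cosine_matrix(X: np.ndarray, S: np.ndarray) -> np.ndarray:
    """All-against-all soft cosine for document vectors X (documents x terms):
    d(i, j) = x_i^T S x_j / (sqrt(x_i^T S x_i) * sqrt(x_j^T S x_j))."""
    G = X @ S @ X.T
    norms = np.sqrt(np.diag(G))
    return G / np.outer(norms, norms)


# Hypothetical vocabulary ["abuse", "harassment", "budget"] with toy 2-D embeddings.
E = np.array([[1.0, 0.2], [0.9, 0.3], [0.0, 1.0]])
S = term_similarity(E)
X = np.array([[1.0, 0.0, 0.0],   # incident mentioning only "abuse"
              [0.0, 1.0, 0.0],   # policy mentioning only "harassment"
              [0.0, 0.0, 1.0]])  # document mentioning only "budget"
D = soft_cosine_matrix(X, S)
print(np.round(D, 3))
```

The incident and the policy share no words, so their plain cosine is 0, yet their soft cosine is close to 1 because "abuse" and "harassment" have nearby embeddings.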

Having computed the matrix, we were only interested in the similarity between incidents and the policies involved in them, so the submatrix mapping incidents to policies was taken for further analysis:

Soft cosine matrix relating incidents and policies. A frequency-based approach was applied using the similarity scores.

Finally, based on the incident-occurrence frequency scores, we constructed a ranking of the ‘weakest’ policies for future inspection.

Frequency-based policy ranking. The empirical threshold should be defined by domain experts (i.e., policymakers). The top N policies are recommended for inspection.
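One possible reading of the ranking step is sketched below: take the incidents × policies block of the all-against-all matrix, count a policy as ‘involved’ in an incident when their similarity reaches a threshold, and rank policies by how many incidents they are involved in. The threshold of 0.7, the scores, and the policy names are hypothetical placeholders; as noted above, the real threshold should come from domain experts.

```python
import numpy as np


def incident_policy_submatrix(D, incident_idx, policy_idx):
    """Rows for incidents, columns for policies, taken from the all-against-all matrix."""
    return D[np.ix_(incident_idx, policy_idx)]


def rank_policies(sim, policy_names, threshold, top_n):
    """Frequency-based ranking: a policy counts as 'involved' in an incident when their
    similarity reaches the threshold; policies involved in more incidents rank higher."""
    involved = sim >= threshold
    counts = involved.sum(axis=0)
    order = np.argsort(-counts, kind="stable")  # ties keep their original order
    return [(policy_names[j], int(counts[j])) for j in order[:top_n]]


# Hypothetical 3 incidents x 3 policies similarity scores and policy names.
sim = np.array([[0.82, 0.40, 0.71],
                [0.77, 0.35, 0.20],
                [0.91, 0.66, 0.74]])
names = ["Reporting procedure", "Code of conduct", "Title IX policy"]
print(rank_policies(sim, names, threshold=0.7, top_n=2))
# [('Reporting procedure', 3), ('Title IX policy', 2)]
```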

We faced a couple of challenges in this project. On the technical side, starting from no data at all, gathering meaningful and relevant data was the first hurdle for the team. That was really challenging, given the sensitivity of the data we were working with. Selecting the data sources was only the first step in our journey, since the main objective was to provide mechanized methods able to help prevent sexual abuse within organizations.

Detecting hidden patterns in institutions (i.e., cover-ups) is a really complex task, mainly because of the lack of data (the idea is a paradox in itself: will institutions publish the bad ‘practices’ they tried to hide?). Nevertheless, we were able to find and extrapolate different kinds of organizational patterns under similar sexual abuse circumstances. We see truly data-driven policymaking as a must, so we attempted to support compliance and automate some policy-making processes through different AI and ML techniques, such as the exploration and identification of ‘bad practices’ and the investigation of incidents.

Is this the end of the road? No… the power of AI lies in enhancing domain experts’ daily tasks through collaboration between computer scientists and domain experts (i.e., policymakers). So the next round would involve not only improving our models empirically but also involving experts to guide the usability of future work, such as creating new KPIs, evaluating results, and designing accurate scoring metrics/criteria or normalized ranking methods.



[2] Saul, J., & Audage, N. (2007). Preventing Child Sexual Abuse within Youth-serving Organizations: Getting Started on Policies and Procedures.

[3] Centers for Disease Control and Prevention, National Center for Injury Prevention and Control. Atlanta, GA.

[4] Integrated Research Services feedback questionnaire.