Cyberbullying Datasets

As part of my recent MSc thesis, which investigated using cloud services to aid in the detection of cyberbullying, I wanted to train some machine learning models to classify text as cyberbullying. As I was using a supervised machine learning approach, I required existing labelled datasets in order to train the models. I was surprised to find that not many labelled datasets exist for the cyberbullying domain, at least not publicly available ones. In fact, Salawu et al., in their 2017 paper [1], found the lack of labelled datasets to be one of the main challenges facing research into the automated detection of cyberbullying today. Their research revealed only five distinct publicly available cyberbullying datasets, and these relate only to traditional text-based social media platforms and don’t represent newer media platforms such as Snapchat.

The datasets I came across while attempting to look for training input to my ML models were:

  • MySpace Bullying Data [2]
  • University of Wisconsin-Madison Data [3]
  • Formspring.me Data [4]
  • Data from “Anti Bully” project [5]
  • Max Planck Institute Data [6]

Each of these varies in terms of size, origin, and quality of the data labelling, but all were a good starting point for my research. Some of the datasets are also quite old (some date back to 2010), but are still useful nonetheless. All except the Max Planck Institute data are specific to cyberbullying. The Max Planck Institute data is instead labelled for positive / negative sentiment, but I still found it useful for my use case.

I was surprised that larger cyberbullying datasets don’t exist in the public domain, considering the amount of research that seems to have happened in this area over the past 10 years, and the prevalence of the issue itself. If anyone can point me to any publicly available datasets that I’ve missed, then I would love to hear from you.

[1] Approaches to Automated Detection of Cyberbullying: A Survey, Salawu, S.; He, Y.; Lumsden, J., IEEE Transactions on Affective Computing 2017, vol. PP, no. 99, pp. 1-1.

[2] Detecting the Presence of Cyberbullying Using Computer Software, Poster presentation at WebSci11, June 14th 2011.

[3] Understanding and Fighting Bullying with Machine Learning, Sui, Junming, PhD thesis, Department of Computer Sciences, University of Wisconsin-Madison, 2015.

[4] Using Machine Learning to Detect Cyberbullying, In Proceedings of the 2011 10th International Conference on Machine Learning and Applications Workshops (ICMLA 2011), Reynolds, K; Kontostathis, A.; Edwards, L., December 2011.

[5] Anti Bully, Li, Michelle, DevPost Submission, 2017.

[6] Sentiment Analysis in Twitter with Lightweight Discourse Analysis, Mukherjee, Subhabrata; Bhattacharyya, Pushpak, 2012.

Back in the Game

Posts here have been pretty infrequent in 2017 and so far in 2018. I have been working towards completing a Master's degree in Software Engineering for the past two years, which has been a significant commitment whilst also working full-time. In the middle of that, I also moved to a new job (staying at McAfee, but moving to a new team and area), which required a significant ramp-up on new technologies.

I’ve just completed my Master's by submitting my final dissertation, entitled Detection of Cyberbullying Using Cloud Based Services. This was a very interesting project where I got exposure to supervised machine learning techniques and tools, and also to architecting and building a cloud native application within Amazon Web Services. Once I receive the results in October, I will certainly post here on the architecture of the system I developed and my conclusions.

Taking a Master's program while working full-time is a challenge and a lot of work (working evenings and weekends becomes the norm for a while, especially when completing the final dissertation), but I would recommend it. The main reason is that, as we know, in the field of technology, especially software engineering, skills can get out of date very quickly. I had always wanted to look at machine learning, but used the same old excuse of ‘not having the time’. The MSc program I undertook, and the dissertation I completed, forced me to prioritize and make the time, and it has opened up a fascinating world. I’m sure I’ll be posting here about some of my planned further experiments in this area in the coming months.

For right now, I’ve just returned from a week in the sun (Portugal), and am enjoying a few days off work (to do things like update my blog 😉 ).

My Favorite Reads of 2016

A round up of the best books I have read in 2016.

A Universe from Nothing: Why There Is Something Rather Than Nothing by Lawrence M. Krauss

Theoretical physicist Krauss writes on the beginnings of the Universe and the current state of cosmology. Space science is a huge interest of mine, and I bought this book as I thought it would help me understand some of the science better. However, I found that it sometimes introduces concepts that are very difficult to understand, and I found myself referencing Wikipedia to learn more. A good read though, and I would recommend it if you are interested in the subject.

American Sniper: The Autobiography of the Most Lethal Sniper in U.S. Military History by Chris Kyle

An amazing journey tracking the beginnings and career of the deadliest sniper in American military history, and his tragic murder on home soil after multiple tours in Iraq. I read this before I saw the Clint Eastwood directed film adaptation.

The Fault in Our Stars by John Green

This tragic tale of terminally ill teenagers was something I had wanted to read for a while, based on a recommendation from a friend. Although you suspect what’s coming, it still almost knocks you for six when it does. I haven’t seen the big screen adaptation, but loved the soundtrack. I had this book for about a year before I read it, and was sorry I didn’t pick it up the minute it arrived. I'm considering reading another of John Green’s books, “Paper Towns”, next.

The Gestapo: The Myth and Reality of Hitler’s Secret Police by Frank McDonough

Excellent history of the Gestapo’s key figures, internal power struggles and conduct from their beginnings right up to the end of WW2, interwoven with stories of real people and incidents from Gestapo case files. At times I thought McDonough was somewhat sympathetic towards the Gestapo, but overall he presents the material in a very matter-of-fact way, also dispelling many myths about Hitler’s secret police along the way. I recommend it if you’re a fan of WW2 history.

Waiting to Be Heard: A Memoir by Amanda Knox

I’ve followed the Amanda Knox case since the very first day I heard of the murder of British student Meredith Kercher on the news. This is an absolutely riveting read, presenting the story from Knox's perspective, from her move to Italy and meeting Kercher to her ultimate incarceration for Kercher's murder. I particularly enjoyed her descriptions of the court cases and her time in prison. Overall though, for me, some of the content raises even more questions.

All She Wanted by Aphrodite Jones

All She Wanted is the definitive history of the case of Teena Brandon, a trans man who was murdered on New Year's Eve 1993 in Humboldt, Nebraska. I became interested in this case after seeing the big screen adaptation, ‘Boys Don’t Cry’ (1999), for which Hilary Swank won an Oscar for her portrayal of Brandon. This is a great read that describes the main characters in fantastic detail, along with the ensuing murder cases. Following the convictions in the mid 1990s, one of the two murderers still awaits execution on Nebraska's death row, while the other is serving a life sentence.

Trouble in Paradise: Uncovering the Dark Secrets of Britain’s Most Remote Island by Kathy Marks

I’ve been fascinated with Pitcairn Island in the South Pacific since I saw the Marlon Brando version of ‘Mutiny on the Bounty’. Pitcairn is the island where the mutineers and their Tahitian partners settled to evade detection by the British Navy. Controversy erupted on the island in the mid 1990s with claims of decades of sexual abuse, implicating almost every male on the island. Kathy Marks was one of only six journalists permitted to be on the island during the trials. This book captures the trials and the atmosphere around them brilliantly, as well as exploring how the absence of authorities on Pitcairn led to this situation, and the somewhat romantic view outsiders have of the island vs. the actual reality. I read this while on a short break to Düsseldorf, Germany in May, and found it hard to put down.

Heavier Than Heaven by Charles R. Cross

This is the definitive biography of Kurt Cobain, charting his path from early life to becoming the most popular rock star in the world as front man of Nirvana. At times it is a harrowing read, especially when it talks of Cobain’s early homelessness, mental anguish, and prolific drug abuse. The book gives great insights into the meaning of lots of Nirvana songs (like the reasoning behind ‘Smells Like Teen Spirit’), and which events in Cobain’s life they relate to. I could not put this down. Although it took me a few weeks to read due to lots of engagements, it’s a book that will remain in my mind for a long time. If you’re a Nirvana fanatic like myself, you will enjoy this.

Amongst Women by John McGahern

Something I’ve read a few times before, and only the second piece of fiction I’ve read this year, after ‘The Fault in Our Stars’. A favorite of mine, thoroughly recommended if you’ve never read it. I took this on holidays to Portugal in September this year and read it in a few hours in the sun.

The Interstellar Age: Inside the Forty-Year Voyager Mission by Jim Bell

The second book I brought on holidays to Portugal. I had bought this while in Germany in May in the famous ‘Mayersche Buchhandlung’ book store in Düsseldorf. This is a fascinating read even if you have never heard of the Voyager program. Unfortunately for me, I left this on the airplane on the way home from Portugal, still with the final chapter to read.

The Third Reich at War by Richard J. Evans

This is the final part of Richard J. Evans’s excellent Nazi Germany trilogy (preceded by ‘The Coming of the Third Reich’ and ‘The Third Reich in Power’). It’s a lengthy read (700+ pages), but well worth it for the level of detail that Evans goes into. Some sections of this, especially the chapters relating to the ‘Final Solution’, are distressing to read. This is the best history of Nazi Germany I have ever read.

Hitler’s Last Day: Minute by Minute by Jonathan Mayo and Emma Craigie

As I had read in many reviews of this book, it is not actually confined to Hitler’s last day. It covers April 29th, April 30th and the aftermath. Along the way it introduces a host of characters, from Allied soldiers as they race through Italy to British secret service agents and political heavyweights such as Churchill and Truman. Hitler’s death is not covered in any great detail, so that doesn’t make this book stand out from countless others covering the topic. Rather, it’s the ongoing introduction of new characters, and how this period affected them, that makes this a good read.

The Dark Charisma of Adolf Hitler: Leading Millions Into the Abyss by Laurence Rees

This covers Hitler’s rise from disgruntled WW1 veteran to Führer of Nazi Germany. Along the way the author seeks to answer the question of why so many people followed this man unquestioningly, and how he led a nation to ruin.

Auschwitz: The Nazis & The ‘Final Solution’ by Laurence Rees

Impressed with the previous book in this list, also authored by Rees, I decided to read this. A terrifying account of Auschwitz, tracing its journey from work camp to site of the deaths of over 1 million people. Impeccably researched and detailed, this is a book that can be read in a few hours but that you will remember for quite some time afterwards.

Connecting to the SharePoint 2013 REST API from C#

Today I was updating an internal application we use for grabbing lots of Terminology data from SharePoint lists, and exporting it as TBX files for import into CAT tools etc.

This was required as the SharePoint instance on which the data was previously hosted was upgraded from 2010 to 2013.

A small job I thought.

Then I discovered that the ASMX web services in SharePoint that I previously used to grab the data are deprecated in SharePoint 2013. That is probably not a surprise to anyone in the know, but SharePoint happens to be one of my pet hates, so developing for it is not something that I tend to keep up to date with.

Anyway, I had to re-jig our application to use the SharePoint REST API, and I thought I’d provide the code here for connecting, as it took a little bit of figuring out.

The below (after you fill in your SharePoint URL, username, password, domain, and name of the list you want to extract data from), will connect and pull back the list contents to an XmlDocument object that you can parse.

// Requires: using System; using System.IO; using System.Net; using System.Xml;

// Namespace manager for querying the Atom/OData response with XPath later on
XmlNamespaceManager xmlnspm = new XmlNamespaceManager(new NameTable());
// Site URL; include the trailing slash, as "_api/..." is appended directly to it
Uri sharepointUrl = new Uri("SHAREPOINT URL");

xmlnspm.AddNamespace("atom", "http://www.w3.org/2005/Atom");
xmlnspm.AddNamespace("d", "http://schemas.microsoft.com/ado/2007/08/dataservices");
xmlnspm.AddNamespace("m", "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata");

NetworkCredential cred = new NetworkCredential("USERNAME", "PASSWORD", "DOMAIN");

// Request all items in the named list, asking for an Atom (XML) response
HttpWebRequest listRequest = (HttpWebRequest)WebRequest.Create(sharepointUrl.ToString() + "_api/lists/getByTitle('" + "LIST NAME" + "')/items");
listRequest.Method = "GET";
listRequest.Accept = "application/atom+xml";
listRequest.ContentType = "application/atom+xml;type=entry";
listRequest.Credentials = cred;

// Read the whole response into an XmlDocument, disposing the response and reader afterwards
XmlDocument listXml = new XmlDocument();
using (HttpWebResponse listResponse = (HttpWebResponse)listRequest.GetResponse())
using (StreamReader listReader = new StreamReader(listResponse.GetResponseStream()))
{
    listXml.LoadXml(listReader.ReadToEnd());
}
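Once the list contents are in the XmlDocument, the three namespaces registered above are what let you address the items with XPath. The following is only a rough sketch of that parsing step, not the code from our application: the class and method names are made up for illustration, it assumes your list has the usual Title column, and the sample feed in Main is hand-written with invented items rather than a real SharePoint response.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class ListFeedParser
{
    // Returns the text of d:Title for every atom:entry in the feed.
    // "Title" is an assumed column name; swap in the field you need.
    public static List<string> ExtractTitles(XmlDocument listXml)
    {
        var ns = new XmlNamespaceManager(listXml.NameTable);
        ns.AddNamespace("atom", "http://www.w3.org/2005/Atom");
        ns.AddNamespace("d", "http://schemas.microsoft.com/ado/2007/08/dataservices");
        ns.AddNamespace("m", "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata");

        var titles = new List<string>();
        XmlNodeList nodes = listXml.SelectNodes(
            "//atom:entry/atom:content/m:properties/d:Title", ns);
        foreach (XmlNode node in nodes)
        {
            titles.Add(node.InnerText);
        }
        return titles;
    }

    public static void Main()
    {
        // Hand-written stand-in for the response body (illustrative data only).
        const string sample =
            "<feed xmlns='http://www.w3.org/2005/Atom'" +
            " xmlns:d='http://schemas.microsoft.com/ado/2007/08/dataservices'" +
            " xmlns:m='http://schemas.microsoft.com/ado/2007/08/dataservices/metadata'>" +
            "<entry><content type='application/xml'><m:properties>" +
            "<d:Title>firewall</d:Title></m:properties></content></entry>" +
            "<entry><content type='application/xml'><m:properties>" +
            "<d:Title>quarantine</d:Title></m:properties></content></entry>" +
            "</feed>";

        var doc = new XmlDocument();
        doc.LoadXml(sample);
        foreach (string title in ExtractTitles(doc))
        {
            Console.WriteLine(title);
        }
    }
}
```

In the real application you would pass in listXml instead of the sample document, and read whichever d: properties hold your terminology data.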

Localization of Email Campaign Content From Eloqua

Eloqua is a marketing automation platform, allowing marketers to easily create campaigns consisting of emails, landing pages, social media etc. via its ‘campaign canvas’.

Campaigns can be created via a WYSIWYG interface, allowing you to visualize marketing campaigns easily as you build them. It also integrates with CRM tools, automatically passing any lead data generated onto your sales staff.

My interest in Eloqua, and specifically its API, relates to the localization of email campaign content. This can be achieved manually by exporting the created email (as a standalone HTML file), then re-importing the localized versions post translation, creating a new version of the original, one for each language you have localized for.

Manual exporting of email content for localization is of course a perfectly valid approach, but the more languages you have, the more manual steps in this process, and the longer it takes, potentially tying up a resource.

The Eloqua REST API can be used to easily reduce the transactional costs related to localization of email campaign content. Using the API, you can quite easily automate the extraction of email content, and potentially send this content directly to your Translation Management System (TMS) such as GlobalSight or WorldServer, or straight to a translation vendor in the absence of a TMS.

The API documentation is pretty good. I also came across this samples project on GitHub released under the Apache license which I was able to use to knock up a proof of concept pretty quickly. It’s written in C# and includes functions to manipulate most asset types in Eloqua.

The email sample code in this library illustrates how to create, edit, and delete emails in Eloqua via the REST API. For example, it’s this easy to grab an Object that represents an email in Eloqua:

EmailClient client = new EmailClient(EloquaInstance,
                         EloquaUsername, EloquaPassword,
                         EloquaBaseUrl);

try
{
    Email email = client.GetEmail(emailID);
    return email;
}
catch (Exception ex)
{
    // Handle Error...
}

Some notes:

  • When retrieving the Email Object which represents an email in Eloqua, you need to specify the ID of the email in Eloqua. For automating the localization process, it could be difficult to determine this without user input. What I plan on doing is providing a nice UI so that users see only the emails that they have created in Eloqua (i.e. not all emails created ever), and can click on one and submit it for translation in one click.
  • The Email Object also contains other useful metadata like when the content was created, when it was last updated and by whom, the encoding, and the folder in which this email resides in Eloqua, useful for when you want to upload the localized versions.

So, that’s how easy it is to automate the retrieval of email content from Eloqua. The library I referenced also has support for other asset types like landing pages etc.

Next I plan on using ASP.NET Web API to turn this library into an HTTP service I can use to talk to Eloqua from other applications, such as the application that manages content submission/retrieval from our TMS.

GlobalSight Web Services API: Manipulating Workflows Programmatically

This is part four of my series of posts on the GlobalSight Web Services API.

Here, I’d like to cover how you can manipulate workflows in GlobalSight via the API. For example, dispatching a workflow, accepting tasks in a workflow, or completing or moving workflows on to the next step.

This can be useful if you want to automate specific steps, or if you want to use an interface other than GlobalSight when dealing with workflows.

There are a couple of relevant functions that are pre-requisites when dealing with workflows programmatically:

  • getJobAndWorkflowInfo – This returns details about the workflows in a particular job. The ID of the job in question is passed as a parameter to this function. It returns an XML response containing the workflow information as well as some basic information about the job. We need this to get the ID of the workflow we want to work with within a particular job.
  • getAcceptedTasksInWorkflow – This will return the task information for a workflow, given the workflow ID (which we would have got from getJobAndWorkflowInfo). With the XML response here, we’ll be able to search for specific tasks in a workflow, and get the task ID – this is what is required in order to manipulate specific tasks in a workflow.

From the above, (I leave the parsing of the XML response as an exercise to the reader), we can begin to perform workflow tasks by using the task ID.
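For the parsing step, something along these lines may be a useful starting point. Treat it purely as a sketch: the element names (task, id, name) and the sample XML are invented for illustration and are not the actual GlobalSight response format, so check a real response and adjust the names accordingly. TaskLookup and FindTaskId are hypothetical helpers, not part of the API.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class TaskLookup
{
    // Returns the <id> of the first <task> whose <name> equals taskName,
    // or null when no task matches. Element names are assumptions.
    public static string FindTaskId(string responseXml, string taskName)
    {
        XDocument doc = XDocument.Parse(responseXml);
        return doc.Descendants("task")
                  .Where(t => (string)t.Element("name") == taskName)
                  .Select(t => (string)t.Element("id"))
                  .FirstOrDefault();
    }

    public static void Main()
    {
        // Made-up XML standing in for a task listing (not the real schema).
        const string sample =
            "<tasks>" +
            "<task><id>63781</id><name>Translation</name></task>" +
            "<task><id>63782</id><name>Review</name></task>" +
            "</tasks>";

        Console.WriteLine(FindTaskId(sample, "Translation"));
    }
}
```

The returned ID is the string you would then hand to the task functions.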

// Accept task
string acceptTask = client.acceptTask(auth, "63781");
Console.WriteLine("Accept Task Response: " + acceptTask);

// Send this workflow to the 'Write to TM' step
string completeTask = client.completeTask(auth, "63781", "Write to TM");
Console.WriteLine("\n\nComplete Task Response: " + completeTask);

The above is a very simple example of accepting a particular task in a workflow (using the task ID), and sending it to the next step in the workflow via ‘Write to TM’.

‘Write to TM’ here is the actual text on the arrow in the workflow diagram. I found this syntax strange, but it seems to work.

It should be noted that the user you are logged into the web service as must be on the list of those allowed to accept the task you are trying to accept. You will receive an error otherwise.

The GlobalSight Web Services API documentation has much more information on the types of actions you can perform on workflows, but the above should get you started.