Thoughts on AWS re:Invent 2018



I’ve just returned from AWS re:Invent 2018, Amazon Web Services’ yearly conference showcasing new services, features, and improvements to the AWS cloud. This was the 7th year of re:Invent, and my first time attending.

The scale of the conference is staggering – held across six different Las Vegas hotels over five days, with almost 60,000 attendees this year. I expected queues, and got them. Overall, though, the conference was well organized logistically. Provided I queued at least 30 minutes beforehand, I was able to make it to 95% of the sessions I planned on attending across the week.

In terms of the sessions themselves, most were very good. Over the week, I attended sixteen different sessions, made up of talks, demos, chalk talks, and hands-on sessions.

Two of my favorite sessions were ‘Optimizing Costs as you Scale on AWS’ and ‘AIOps: Steps Towards Autonomous Operations’. The former described the five pillars of cost optimization – right sizing, increasing elasticity, picking the right pricing model, matching usage to storage class, and measuring and monitoring. These may seem obvious, but they are often forgotten: for example, when a POC becomes production, or when a team unfamiliar with AWS doesn’t anticipate how costs grow as an application’s usage scales up in production. This session also included insights from an AWS customer who talked through how they had applied and governed this model in their organization, which was interesting to compare and contrast with how I’ve seen it done in the past.

I also attended numerous sessions on SageMaker, AWS’s managed machine learning service (think AML on steroids). Now that I have attended a hands-on lab, I’m more confident about starting to play around with SageMaker and exploring some of the ideas I have where it could be applied. I looked at it earlier this year while completing my Masters Thesis, but ended up using Amazon Machine Learning instead in the interest of time (AML is a lot simpler to get up and running). AWS also announced Amazon SageMaker Ground Truth, which can be used to streamline the labeling process for machine learning models via a combination of human and automated labeling. One other cool announcement around ML was the launch of AWS Marketplace for Machine Learning, where you can browse 150+ pre-built algorithms and models that can be deployed directly to SageMaker. Someone may have already solved your problem!

If I was to retrospectively give myself some advice for attending re:Invent, it would be:

  1. Try to organize sessions by hotel. Moving between hotels can take a long time (especially at certain points of the day, due to Las Vegas traffic), so arranging your schedule so that you are in the same hotel for most of the day is beneficial. On the plus side, there is a regular shuttle between conference venues.
  2. Don’t assume you will make every session. Colleagues who had previously been to re:Invent gave me this advice, but I still assumed I would make everything. Traffic, queues or something else will inevitably disrupt your schedule at some point during the week.
  3. Leave time for lunch! This is easy to forget when you’ve got a menu of exciting talks to attend. AWS provided a grab-and-go lunch option, which was very handy between sessions.

If I had one criticism of re:Invent, it would be that some of the talks labelled as advanced did not go as deep as I expected into the technical detail. I thought the hands-on labs did a good job of this though, especially the two I attended on AWS SageMaker.

Overall, re:Invent is a significant investment in the attendees you send (tickets are not cheap, not to mention accommodation, food, etc. – remember, it’s held in Vegas), but a good idea if you are taking first steps with AWS, looking to go deeper or optimize your usage, or thinking about migrating existing on-premises services to the public cloud.

See here for a good summary of all the re:Invent announcements, as well as the keynote videos.

Cyberbullying Datasets

As part of my recent MSc thesis, the subject of which was investigating the use of cloud services to aid in the detection of cyberbullying, I wanted to train some machine learning models to classify text as cyberbullying. As I was using a supervised machine learning approach, I required existing labelled datasets in order to train the models. I was surprised to find that not many labelled datasets exist for the cyberbullying domain, at least ones which are publicly available. In fact, Salawu et al., in their 2017 paper [1], found the lack of labelled datasets to be one of the main challenges facing research into the automated detection of cyberbullying today. Their research revealed only five distinct publicly available cyberbullying datasets, and these relate only to traditional text-based social media platforms, not newer platforms such as Snapchat.

The datasets I came across while attempting to look for training input to my ML models were:

  • MySpace Bullying Data [2]
  • University of Wisconsin-Madison Data [3]
  • Formspring.me Data [4]
  • Data from “Anti Bully” project [5]
  • Max Planck Institute Data [6]

Each of these varies in terms of size, origin, and quality of data labeling, but all were a good starting point for my research. Some of the datasets are also quite old (some date back to 2010), but useful nonetheless. All except the Max Planck Institute data are specific to cyberbullying – that dataset is labelled for positive/negative sentiment, but I still found it useful for my use case.

I was surprised that larger cyberbullying datasets don’t exist in the public domain, considering the amount of research that has been happening in this area over the past 10 years, and the prevalence of the issue itself. If anyone can point me to any publicly available datasets that I’ve missed, I would love to hear from you.

[1] Approaches to Automated Detection of Cyberbullying: A Survey, Salawu, S.; He, Y.; Lumsden, J., IEEE Transactions on Affective Computing 2017, vol. PP, no. 99, pp. 1-1.

[2] Detecting the Presence of Cyberbullying Using Computer Software, Poster presentation at WebSci11, June 14th 2011.

[3] Understanding and Fighting Bullying with Machine Learning, Sui, Junming, PhD thesis, Department of Computer Sciences, University of Wisconsin-Madison, 2015.

[4] Using Machine Learning to Detect Cyberbullying, In Proceedings of the 2011 10th International Conference on Machine Learning and Applications Workshops (ICMLA 2011), Reynolds, K; Kontostathis, A.; Edwards, L., December 2011.

[5] Anti Bully, Li, Michelle, DevPost Submission, 2017.

[6] Sentiment Analysis in Twitter with Lightweight Discourse Analysis, Mukherjee, Subhabrata; Bhattacharyya, Pushpak, 2012.

Back in the Game

Posts here have been pretty infrequent in 2017 and so far in 2018. For the past two years I have been working towards completing a Masters Degree in Software Engineering, which has been a significant commitment alongside working full-time. In the middle of that, I also moved to a new job (staying at McAfee, but moving to a new team and area), which required a significant ramp-up in new technologies.

I’ve just completed my Masters by submitting my final dissertation entitled Detection of Cyberbullying Using Cloud Based Services. This was a very interesting project where I got exposure to supervised machine learning techniques and tools, and also architecting and building a cloud native application within Amazon Web Services. Once I receive the results of this in October, I will certainly post here on the architecture of the system developed and the conclusions.

Taking a Masters program while working full-time is a challenge and a lot of work, but I would recommend it (working evenings and weekends becomes the norm for a while, especially when completing the final dissertation). The main reason is that, as we know, in the field of technology, especially software engineering, skills can get out of date very quickly. I had always wanted to look at machine learning, and used the same old story of ‘not having the time’. The MSc program I undertook, and the dissertation I completed, forced me to prioritize and make the time, and it has opened up a fascinating world. I’m sure I’ll be posting here about some of my planned further experiments in this area in the coming months.

For right now, I’ve just returned from a week in the sun (Portugal), and am enjoying a few days off work (to do things like update my blog 😉 ).

My Favorite Reads of 2016

A round up of the best books I have read in 2016.

A Universe from Nothing: Why There Is Something Rather Than Nothing by Lawrence M. Krauss

Theoretical physicist Krauss writes on the beginnings of the Universe and the current state of cosmology. Space science is a huge interest of mine, and I bought this book thinking it would help me understand some of the science better. However, I found that it sometimes introduced concepts that were very difficult to understand, and I found myself referencing Wikipedia to learn more. Still, a good read if you are interested in the subject; I would recommend it.

American Sniper: The Autobiography of the Most Lethal Sniper in U.S. Military History by Chris Kyle

An amazing journey tracking the beginnings and career of the deadliest sniper in American military history, and his tragic murder on home soil after multiple tours in Iraq. I read this before I saw the Clint Eastwood-directed film adaptation.

The Fault in Our Stars by John Green

This tragic tale of terminally ill teenagers was something I had wanted to read for a while, based on a recommendation from a friend. Although you suspect what’s coming, it still almost knocks you for six when it does. I haven’t seen the big-screen adaptation, but loved the soundtrack. I had this book for about a year before I read it, and was sorry I didn’t pick it up the minute it arrived. I’m considering reading more John Green, starting with “Paper Towns”.

The Gestapo: The Myth and Reality of Hitler’s Secret Police by Frank McDonough

Excellent history of the Gestapo’s key figures, internal power struggles, and conduct, from its beginnings right up to the end of WW2, interwoven with stories of real people and incidents from Gestapo case files. At times I thought McDonough was somewhat sympathetic towards the Gestapo, but overall he presents the material in a very matter-of-fact way, dispelling many myths about Hitler’s secret police along the way. Recommended if you’re a fan of WW2 history.

Waiting to Be Heard: A Memoir by Amanda Knox

I’ve followed the Amanda Knox case since the very first day I heard of the murder of British student Meredith Kercher on the news. This is an absolutely riveting read, presenting the story from Knox’s perspective: her move to Italy, her meeting with Kercher, and her ultimate incarceration for her murder. I particularly enjoyed her descriptions of the court cases and her time in prison. Overall for me, though, some of the content raises even more questions.

All She Wanted by Aphrodite Jones

All She Wanted is the definitive history of the case of Teena Brandon, a trans man who was murdered on New Year’s Eve 1993 in Humboldt, Nebraska. I became interested in this case after seeing the big-screen adaptation, ‘Boys Don’t Cry’ (1999), for which Hilary Swank won an Oscar for her portrayal of Brandon. This is a great read that describes the main characters in fantastic detail, along with the ensuing murder cases. Following the convictions in the mid-1990s, John Lotter remains on Nebraska’s death row, while his accomplice received a life sentence after testifying against him.

Trouble in Paradise: Uncovering the Dark Secrets of Britain’s Most Remote Island by Kathy Marks

I’ve been fascinated with Pitcairn Island in the South Pacific since I saw the Marlon Brando version of ‘Mutiny on the Bounty’. Pitcairn is the island where the mutineers and their Tahitian partners settled to evade detection by the British Navy. Controversy erupted on the island in the late 1990s with claims of decades of sexual abuse, implicating almost every male on the island. Kathy Marks was one of only six journalists permitted to be on the island during the trials. This book captures the trials and the atmosphere around them brilliantly, as well as exploring how the absence of authorities on Pitcairn led to this situation, and the somewhat romantic view outsiders have of the island versus the actual reality. I read this while on a short break to Düsseldorf, Germany in May, and found it hard to put down.

Heavier Than Heaven by Charles R. Cross

This is the definitive biography of Kurt Cobain, charting his early life and his rise to become the most popular rock star in the world as frontman of Nirvana. At times it is a harrowing read, especially when it covers Cobain’s early homelessness, mental anguish, and prolific drug abuse. The book gives great insights into the meaning of many Nirvana songs (like the reasoning behind ‘Smells Like Teen Spirit’) and the events in Cobain’s life they relate to. I could not put this down; although it took me a few weeks to read due to lots of engagements, it’s a book that will remain in my mind for a long time. If you’re a Nirvana fanatic like myself, you will enjoy this.

Amongst Women by John McGahern

Something I’ve read a few times before, and only the second piece of fiction I’ve read this year, after ‘The Fault in Our Stars’. A favorite of mine, thoroughly recommended if you’ve never read it. I took it on holidays to Portugal in September this year and read it in a few hours in the sun.

The Interstellar Age: Inside the Forty-Year Voyager Mission by Jim Bell

The second book I brought on holidays to Portugal. I had bought this while in Germany in May in the famous ‘Mayersche Buchhandlung’ book store in Düsseldorf. This is a fascinating read even if you have never heard of the Voyager program. Unfortunately for me, I left this on the airplane on the way home from Portugal, still with the final chapter to read.

The Third Reich at War by Richard J. Evans

This is the final part of Richard J. Evans’s excellent Nazi Germany trilogy (preceded by ‘The Coming of the Third Reich’ and ‘The Third Reich in Power’). It’s a lengthy read (700+ pages), but well worth it for the level of detail Evans goes into. Some sections, especially the chapters relating to the ‘Final Solution’, are distressing to read. This is the best history of Nazi Germany I have ever read.

Hitler’s Last Day: Minute by Minute by Jonathan Mayo and Emma Craigie

As many reviews of this book note, it is not actually confined to Hitler’s last day; it covers April 29th, April 30th, and the aftermath. Along the way it introduces a host of characters, from Allied soldiers racing through Italy to British secret service agents and political heavyweights such as Churchill and Truman. Hitler’s death is not covered in any great detail, so that is not what makes this book stand out from the countless others covering the topic. Rather, it’s the ongoing introduction of new characters, and how this period affected them, that makes it a good read.

The Dark Charisma of Adolf Hitler: Leading Millions Into the Abyss by Laurence Rees

This covers Hitler’s rise from disgruntled WW1 veteran to Führer of Nazi Germany. Along the way, the author seeks to answer why so many people followed this man unquestioningly, and how he led a nation to ruin.

Auschwitz: The Nazis & The ‘Final Solution’ by Laurence Rees

Impressed with the previous book in this list, also authored by Rees, I decided to read this. A terrifying account of Auschwitz’s journey from work camp to the site of the deaths of over a million people. Impeccably researched and detailed, this is a book that can be read in a few hours but that you will remember for quite some time afterwards.

Connecting to the SharePoint 2013 REST API from C#

Today I was updating an internal application we use for grabbing lots of terminology data from SharePoint lists and exporting it as TBX files for import into CAT tools etc.

This was required as the SharePoint on which it was hosted previously was upgraded from 2010 to 2013.

A small job, I thought.

Then I discovered that the ASMX web services in SharePoint that I had used to grab the data previously are deprecated in SharePoint 2013. This is probably not a surprise to anyone in the know, but SharePoint happens to be one of my pet hates, so its development is not something I tend to keep up to date with.

Anyway, I had to re-jig our application to use the SharePoint REST API, and I thought I’d provide the code here for connecting, as it took a little bit of figuring out.

The below (after you fill in your SharePoint URL, username, password, domain, and name of the list you want to extract data from), will connect and pull back the list contents to an XmlDocument object that you can parse.

XmlNamespaceManager xmlnspm = new XmlNamespaceManager(new NameTable());
// The site URL should end with a trailing slash so "_api/..." appends correctly.
Uri sharepointUrl = new Uri("SHAREPOINT URL");

xmlnspm.AddNamespace("atom", "http://www.w3.org/2005/Atom");
xmlnspm.AddNamespace("d", "http://schemas.microsoft.com/ado/2007/08/dataservices");
xmlnspm.AddNamespace("m", "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata");

NetworkCredential cred = new NetworkCredential("USERNAME", "PASSWORD", "DOMAIN");

// Request the list items as an Atom feed.
HttpWebRequest listRequest = (HttpWebRequest)WebRequest.Create(
    sharepointUrl + "_api/lists/getByTitle('" + "LIST NAME" + "')/items");
listRequest.Method = "GET";
listRequest.Accept = "application/atom+xml";
listRequest.ContentType = "application/atom+xml;type=entry";
listRequest.Credentials = cred;

// Read the response and load it into an XmlDocument for parsing.
XmlDocument listXml = new XmlDocument();
using (HttpWebResponse listResponse = (HttpWebResponse)listRequest.GetResponse())
using (StreamReader listReader = new StreamReader(listResponse.GetResponseStream()))
{
    listXml.LoadXml(listReader.ReadToEnd());
}
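Once the XML is loaded, the namespace manager can be used to pull field values out of the Atom feed. Below is a minimal, self-contained sketch using a hard-coded sample feed in place of the live response; the ‘Title’ column is just an example – substitute the columns of your own list.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

class ParseListXml
{
    // In the real application this XML would come from the REST response above.
    public const string SampleXml =
        "<feed xmlns=\"http://www.w3.org/2005/Atom\" " +
        "xmlns:d=\"http://schemas.microsoft.com/ado/2007/08/dataservices\" " +
        "xmlns:m=\"http://schemas.microsoft.com/ado/2007/08/dataservices/metadata\">" +
        "<entry><content type=\"application/xml\"><m:properties>" +
        "<d:Title>Sample term</d:Title>" +
        "</m:properties></content></entry></feed>";

    // Pull the d:Title of every list item (atom:entry) out of the feed.
    public static List<string> ExtractTitles(string feedXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(feedXml);

        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("atom", "http://www.w3.org/2005/Atom");
        ns.AddNamespace("d", "http://schemas.microsoft.com/ado/2007/08/dataservices");
        ns.AddNamespace("m", "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata");

        List<string> titles = new List<string>();
        foreach (XmlNode entry in doc.SelectNodes("//atom:entry", ns))
        {
            // Each list item's columns live under atom:content/m:properties.
            XmlNode title = entry.SelectSingleNode("atom:content/m:properties/d:Title", ns);
            if (title != null)
            {
                titles.Add(title.InnerText);
            }
        }
        return titles;
    }

    static void Main()
    {
        foreach (string t in ExtractTitles(SampleXml))
        {
            Console.WriteLine(t); // prints "Sample term" for the sample feed
        }
    }
}
```

In the real application, you would pass `listXml.OuterXml` (or query `listXml` directly with the same XPath) rather than the hard-coded sample.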

Localization of Email Campaign Content From Eloqua

Eloqua is a marketing automation platform, allowing marketers to easily create campaigns consisting of emails, landing pages, social media etc. via its ‘campaign canvas’.

Campaigns can be created via a WYSIWYG interface, allowing you to visualize marketing campaigns easily as you build them. It also integrates with CRM tools, automatically passing any lead data generated onto your sales staff.

My interest in Eloqua, and specifically its API, relates to the localization of email campaign content. This can be achieved manually by exporting the created email (as a standalone HTML file), then re-importing the localized versions post translation, creating a new version of the original, one for each language you have localized for.

Manual exporting of email content for localization is of course a perfectly valid approach, but the more languages you have, the more manual steps in this process, and the longer it takes, potentially tying up a resource.

The Eloqua REST API can be used to easily reduce the transactional costs related to localization of email campaign content. Using the API, you can quite easily automate the extraction of email content, and potentially send this content directly to your Translation Management System (TMS) such as GlobalSight or WorldServer, or straight to a translation vendor in the absence of a TMS.
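As a sketch of what that handoff might look like, the extracted email HTML could be written out as one file per target language into a drop folder that a TMS or vendor process picks up. Everything here (folder layout, language list, file naming) is purely illustrative – it is not a real Eloqua or TMS API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class TranslationHandoff
{
    // Write one copy of the source HTML per target language into a drop
    // folder that a TMS (or a vendor) could monitor. Returns the files created.
    public static List<string> WriteHandoffFiles(
        string emailName, string html, IEnumerable<string> targetLanguages, string dropFolder)
    {
        var created = new List<string>();
        Directory.CreateDirectory(dropFolder);
        foreach (string lang in targetLanguages)
        {
            // e.g. handoff/welcome-email.de-DE.html
            string path = Path.Combine(dropFolder, emailName + "." + lang + ".html");
            File.WriteAllText(path, html);
            created.Add(path);
        }
        return created;
    }

    static void Main()
    {
        var files = WriteHandoffFiles(
            "welcome-email",                       // hypothetical email name
            "<html><body>Hello!</body></html>",    // HTML extracted from Eloqua
            new[] { "de-DE", "fr-FR", "ja-JP" },
            "handoff");
        foreach (string f in files)
        {
            Console.WriteLine(f);
        }
    }
}
```

A real integration would replace the file drop with whatever submission mechanism your TMS exposes, but the shape of the automation is the same: extract once, fan out per language.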

The API documentation is pretty good. I also came across this samples project on GitHub released under the Apache license which I was able to use to knock up a proof of concept pretty quickly. It’s written in C# and includes functions to manipulate most asset types in Eloqua.

The email sample code in this library illustrates how to create, edit, and delete emails in Eloqua via the REST API. For example, it’s this easy to grab an Object that represents an email in Eloqua:

EmailClient client = new EmailClient(EloquaInstance,
                                     EloquaUsername, EloquaPassword,
                                     EloquaBaseUrl);

try
{
    Email email = client.GetEmail(emailID);
    return email;
}
catch (Exception ex)
{
    // Handle the error (log it, rethrow, etc.)...
    return null;
}

Some notes:

  • When retrieving the Email Object which represents an email in Eloqua, you need to specify the ID of the email in Eloqua. For automating the localization process, it could be difficult to determine this without user input. What I plan on doing is providing a nice UI so that users see only the emails that they have created in Eloqua (i.e. not all emails created ever), and can click on one and submit it for translation in one click.
  • The Email Object also contains other useful metadata like when the content was created, when it was last updated and by whom, the encoding, and the folder in which this email resides in Eloqua, useful for when you want to upload the localized versions.

So, that’s how easy it is to automate the retrieval of email content from Eloqua. The library I referenced also has support for other asset types like landing pages etc.

Next, I plan on using ASP.NET Web API to turn this library into an HTTP service I can use to talk to Eloqua from other applications, such as the application that manages content submission/retrieval from our TMS.
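That wrapper could start out as little more than a controller delegating to the sample library. A rough sketch, assuming classic ASP.NET Web API 2 – the route convention is the default, and the EmailClient configuration values are placeholders that would come from config in a real deployment:

```csharp
using System.Web.Http;

// Hypothetical controller exposing the Eloqua sample library over HTTP.
public class EmailsController : ApiController
{
    // Placeholder credentials - these would come from configuration.
    private readonly EmailClient _client = new EmailClient(
        "ELOQUA INSTANCE", "USERNAME", "PASSWORD", "ELOQUA BASE URL");

    // GET api/emails/{id} - returns the Email object serialized as JSON.
    public IHttpActionResult Get(int id)
    {
        try
        {
            return Ok(_client.GetEmail(id));
        }
        catch (System.Exception)
        {
            return NotFound();
        }
    }
}
```

The TMS-facing application would then only need an HTTP client and the email ID, with no direct dependency on the Eloqua library.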