Jimmy Collins – Self aware since 2:14 AM Eastern time, August 29th 1997

My Favorite Reads of 2021

Targeted: My Inside Story of Cambridge Analytica and How Trump and Facebook Broke Democracy by Brittany Kaiser

The eyewitness account of the rise and fall of Cambridge Analytica, including the dealings with Trump’s election campaign. An eye-opener in terms of data privacy and the power of Facebook.
My Goodreads Rating: 4/5

The Murder of Mr Moonlight: How sexual obsession, greed and arrogance led to the killing of an innocent man – the definitive story behind the trial that gripped the nation by Catherine Fegan

This is the story of the murder of Bobby Ryan, which led to Ireland’s longest-running murder trial, and of how it came to light that the man who had ‘discovered’ the body was actually behind the crime. Catherine Fegan’s account is very detailed, and hard to put down once you start reading.
My Goodreads Rating: 5/5

Hitler: Downfall: 1939-1945 by Volker Ullrich

Part two of Volker Ullrich’s expansive biography of Hitler, focusing on the war years. A real masterpiece that deserves its place with the best Hitler biographies.
My Goodreads Rating: 5/5

Inside Hitler’s Bunker: The Last Days of the Third Reich by Joachim Fest

Joachim Fest brings us inside the last desperate days of the Third Reich, with a focus on Hitler’s degenerating mental state in the bunker and the events leading up to his suicide.
My Goodreads Rating: 5/5

In The Bunker With Hitler by Bernd Freytag von Loringhoven

Loringhoven was a staff officer assigned to the Führerbunker in 1945 with the responsibility of preparing status updates for Hitler. This is a fascinating account from someone who was actually there.
My Goodreads Rating: 5/5

The Hitler Conspiracies: The Third Reich and the Paranoid Imagination by Richard J. Evans

I usually read anything new that is published by Evans, and this is a very interesting collection of thoughts. The book focuses on five claims that involve Hitler and the Nazi party, and analyzes each one in great detail – 1) that the Jewish people were conspiring to undermine civilization, as outlined in ‘The Protocols of the Elders of Zion’, 2) that the German army was ‘stabbed in the back’ in 1918, 3) that the Nazis were responsible for the Reichstag fire of February 1933, 4) that Rudolf Hess’ flight to the UK in 1941 was sanctioned by Hitler to agree peace terms with Churchill, and 5) that Hitler escaped the bunker to Argentina in 1945.
My Goodreads Rating: 4/5

Beyond Evil: Inside the Twisted Mind of Ian Huntley by Nathan Yates

Perhaps one of the most memorable crimes of the early 21st century. This book by investigative journalist Nathan Yates is at times a tough read, but nonetheless an insight into the sick mind of Ian Huntley and his partner.
My Goodreads Rating: 4/5

Dresden: The Fire and the Darkness by Sinclair McKay

In February 1945 the Allies obliterated the city of Dresden, with an estimated 25,000 people killed. McKay gives a minute-by-minute account from the perspective of both the people on the ground in Dresden and the bomber crews above.
My Goodreads Rating: 5/5

Zodiac: The Shocking True Story of America’s Most Elusive Serial Killer by Robert Graysmith

The classic book on the Zodiac case, by a man who has been fascinated by the Zodiac killings since the beginning.
My Goodreads Rating: 4/5

Fall: The Mystery of Robert Maxwell by John Preston

A brilliant account of the life and times of Robert Maxwell, the media tycoon who died in suspicious circumstances aboard his yacht in 1991, just days before it was realized he had plundered the pension funds of the group of companies he controlled with an estimated 726m of missing cash.
My Goodreads Rating: 5/5

My Favorite Reads of 2020

2020 being the year that it was, reading a good book was an escape from the ongoing pandemic. These are some of my favorite reads from this year.

Almost the Perfect Murder: The Killing of Elaine O’Hara, the Extraordinary Garda Investigation and the Trial That Stunned the Nation by Paul Williams
Billed as the definitive account of this horrific case, and written by Ireland’s premier crime journalist, this is a very interesting read. Obviously the subject matter makes this tough in places, but Paul Williams does an excellent job of presenting the narrative of one of the most memorable crimes in recent Irish history.
My Goodreads Rating: 5/5

Permanent Record by Edward Snowden
A Christmas 2019 gift from my wife, this is the first-hand account of Edward Snowden’s journey from a contractor working with US Intelligence Services to his current exile in Russia. Snowden gives good background on his philosophies and ultimately why he did what he did, but in parts this reads as if he was some sort of hacker god who could do anything, when in reality he was a lowly contractor with access he perhaps didn’t need to fulfil his day-to-day role. Nonetheless a good read.
My Goodreads Rating: 5/5

Hitler: Ascent 1889-1939 by Volker Ullrich
There are perhaps 4 or 5 definitive biographies of Hitler by respected academics (discounting the 10s of really bad ones). My favorite has been Ian Kershaw’s ‘Hitler’, published in two volumes, which is standard reading for anyone interested in understanding Hitler. Volume 1 of German academic Volker Ullrich’s Hitler biography is as good as Kershaw’s in my opinion. It is a superb biography that gives a lot of insight into Hitler the man, and attempts to answer long-standing questions around Hitler’s early and personal life, as well as dispelling long-held myths about Hitler’s WW1 service. Volume 2, ‘Hitler: Volume II: Downfall 1939-45’, currently sits on my bookshelf and I am looking forward to reading it soon.
My Goodreads Rating: 5/5

Bringing Columbia Home: The Untold Story of a Lost Space Shuttle and Her Crew by Michael D. Leinbach
I am a space nut, and remember clearly the 2003 tragedy of the loss of the Columbia Shuttle on her return to Earth. There have been many books and documentaries on the subject since, but this one is unique. Leinbach was the Shuttle Launch Director at NASA’s John F. Kennedy Space Center, and in 2003 was tasked with leading the recovery of debris and crew remains scattered over thousands of kilometers across multiple US states. This is his first-hand account of that operation, from the moment of Columbia’s disintegration to the winding down of the recovery operation.
My Goodreads Rating: 5/5

Columbine by Dave Cullen
I had been wanting to read something about the Columbine shootings for some time. I had been thoroughly underwhelmed previously by Michael Moore’s documentary which I felt avoided portraying the facts in favor of focusing on theatrics like confronting Walmart and the NRA. Dave Cullen is an American journalist who covered the Columbine shootings from day 1, and this is his account of the shootings as well as the background on the shooters. The book is full of fascinating insights, and factual interviews with key people involved.
My Goodreads Rating: 5/5

A Dream of Death by Ralph Riegel
Published in July 2020, this is an account of the murder of French national Sophie Toscan du Plantier in West Cork in 1996, and the arrest of Ian Bailey for her murder. One of the most memorable crimes of the 1990s in Ireland, and a case which remains unsolved to this day, Riegel uses his journalistic skills to present an excellent narrative of the case, right up to Ian Bailey’s first extradition hearing.
My Goodreads Rating: 5/5

Meredith by John Kercher
John Kercher’s brilliantly written account of his daughter’s murder in Perugia, Italy in 2007, the case for which American student Amanda Knox and Italian boyfriend Raffaele Sollecito were tried, convicted, imprisoned and eventually acquitted for their involvement (Rudy Guede remains in prison for the murder). John Kercher wanted to focus on his daughter, the victim, and not Amanda Knox like so many other books on this case tend to do, and this book does exactly that.
My Goodreads Rating: 5/5

The Crimes of Josef Fritzl: Uncovering the Truth
Everyone has probably heard of the Fritzl case, but surprisingly there aren’t many books on it, likely due to the fact that the Austrian authorities aren’t the most forthcoming with information. This book, from the journalists who helped to break the story, is an excellent account of the Fritzl case, giving insights into Josef Fritzl’s life, both with his ‘upstairs’ family, with his secret family in the basement, and his business dealings outside. It is evident from reading this that there were many failings of the Austrian authorities, and also that Fritzl from a young age displayed characteristics that would later contribute to his crimes. This book is one I couldn’t put down.
My Goodreads Rating: 5/5

Amsterdam: A History of the World’s Most Liberal City by Russell Shorto
My favorite city in the world, which I usually visit at least twice a year, but not in 2020 due to the pandemic. Anyone who thinks Amsterdam is only good for sex tourism and coffeeshops is frankly, ignorant, and I’m willing to debate that with anyone. So I thought I should read a history of the city and maybe learn some things that would be useful the next time I get to visit. I am so glad to have picked Russell Shorto’s book. The history of Amsterdam from early times right up to the present day is presented in an excellent narrative, while discussing Amsterdam’s history of liberalism and the historical reasons for that. I learnt so much from this book, and have added a few areas to my list which I will visit when I’m next in Amsterdam (here’s hoping 2021).
My Goodreads Rating: 5/5

Building a Dataset from Twitter Using Tweepy

I am always on the lookout for interesting datasets to mess about with for machine learning and data visualization. Mostly I use datasets from sources like data.gov.ie, which has lots of interesting datasets that are specific to Ireland. Sometimes, for the topic I am interested in, there isn’t a dataset readily available, and I want to create one. Mostly I use Twitter for this. Obviously one of the drawbacks here is that the data will be unlabeled, and if you are looking to use it in supervised machine learning then you will need to label the data, which can be both laborious and time consuming. Tweepy is a great Python library for accessing the Twitter API, and it is very easy to use. In this post I will demonstrate how to use it to grab tweets from Twitter, and also add some other features to the dataset that might be useful for machine learning models later.

I will demonstrate how to do this using a Jupyter notebook here. In reality you would probably want to write the dataset to a CSV file or some other format for later consumption in model training.

The first thing you will need to do is create a new application on the Twitter developer portal. This will give you the access keys and tokens which you will need to access the Twitter API. Standard access is free, but there are a number of limits which can be seen in the documentation that you should be aware of. Once you have done this, create a new Jupyter notebook, and import Tweepy and create some variables to hold your access keys and tokens.

Now we can initialize Tweepy, and grab some tweets. In this example, we will get 100 tweets relating to the term ‘trump’. Print out the raw tweets as well, so you can verify that your access keys work and you are actually receiving tweets.
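The original notebook cells for these two steps are not reproduced here, so the following is only a rough sketch of what they might look like. The environment variable names and both helper functions are my own placeholders, and I am assuming Tweepy’s OAuth 1 flow with the standard search endpoint (named `search_tweets` in Tweepy 4.x and `search` in 3.x).

```python
import os

# Hypothetical environment variable names; use whatever suits your setup.
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=os.environ):
    """Read the four Twitter credentials, failing early if any is missing."""
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return tuple(env[name] for name in CREDENTIAL_VARS)


def fetch_tweets(query="trump", limit=100):
    """Return up to `limit` raw tweet dicts matching `query`."""
    # Imported here so load_credentials() can be used without Tweepy installed.
    import tweepy

    consumer_key, consumer_secret, access_token, access_secret = load_credentials()
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)

    # Tweepy 4.x calls this method `search_tweets`; 3.x called it `search`.
    search = getattr(api, "search_tweets", None) or api.search
    cursor = tweepy.Cursor(search, q=query, lang="en", tweet_mode="extended")
    # `_json` holds the raw JSON payload Twitter returned for each tweet.
    return [status._json for status in cursor.items(limit)]
```

In a notebook, something like `raw_tweets = fetch_tweets()` followed by `print(raw_tweets[:3])` should be enough to confirm the keys work. Keeping the keys in environment variables rather than in the notebook itself means they do not end up in version control.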

Now that you have gotten this far, we can parse the tweet data and create a pandas dataframe to store the relevant attributes that we want. The data will come back from Twitter in JSON format, and depending on what you are looking for, you won’t necessarily want all the data. Below I am doing a bunch of things:

Creating a new pandas dataframe and creating columns for the items I am interested in from the Tweet data.
Removing duplicate tweets.
Removing any URLs in the tweet text – in my case I was planning on using this data in some text classification experiments, so I don’t want these included.
Creating a sentiment measure for the tweet text using the TextBlob library.
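As a rough illustration of those steps, here is a plain-Python sketch that works on the raw tweet dicts. The choice of fields and the function names are my own assumptions, not the notebook’s code, and note that it strips URLs before de-duplicating, so tweets that differ only in their shortened link count as duplicates. The sentiment helper assumes TextBlob is installed and uses its polarity score.

```python
import re

# Matches http/https links up to the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")


def clean_tweets(raw_tweets):
    """Keep a few fields per tweet, strip URLs, and drop duplicate texts."""
    rows, seen = [], set()
    for tweet in raw_tweets:
        # `full_text` is present when tweets are fetched in extended mode.
        text = tweet.get("full_text") or tweet.get("text", "")
        text = URL_PATTERN.sub("", text).strip()
        if not text or text in seen:
            continue
        seen.add(text)
        rows.append({
            "id": tweet.get("id"),
            "created_at": tweet.get("created_at"),
            "user": tweet.get("user", {}).get("screen_name"),
            "text": text,
        })
    return rows


def add_sentiment(rows):
    """Add a polarity score between -1.0 and 1.0 to each row."""
    # Third-party dependency, imported lazily so clean_tweets() has none.
    from textblob import TextBlob

    for row in rows:
        row["sentiment"] = TextBlob(row["text"]).sentiment.polarity
    return rows
```

The resulting list of dicts can be handed straight to `pandas.DataFrame(rows)` to get one column per key.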


At this point, you have the beginnings of a dataset. You can also add more features to the dataset easily. In my case I wanted to add the tweet text length and the count of punctuation in the tweet text. This is easy to do. The below calculates these and adds two new columns to the dataframe.
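A minimal sketch of that calculation, assuming each row is a dict with a `text` key (the function and column names are placeholders of mine):

```python
import string


def add_text_features(rows):
    """Add the text length and punctuation count to each row."""
    for row in rows:
        text = row["text"]
        row["text_length"] = len(text)
        # string.punctuation covers ASCII punctuation only, so curly quotes
        # and other Unicode marks are not counted here.
        row["punctuation_count"] = sum(ch in string.punctuation for ch in text)
    return rows
```

With a pandas dataframe the same two columns can be built with `df["text"].str.len()` and `df["text"].apply(...)` using the same counting expression.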

This post hopefully illustrated how easy it is to create datasets from Twitter. The full Jupyter notebook is available on my GitHub here, which also has an example of generating a wordcloud from the data.

Distributed Systems Observability

This post was also featured in Issue #103 of the Distributed Systems Newsletter.

A recent project my team and I worked on involved the re-architecture of a globally distributed system to facilitate a deployment in public cloud. We learnt a lot completing this project, the most important thing being that it never ends up being a ‘lift and shift’ exercise. Many times we faced a decision to leave something as-is that was not quite as optimal as it should be, or change it during the project, potentially impacting agreed timelines. Ultimately, the decision always ended up being to go ahead and make the improvement. I am a big fan of not falling into the trap of ‘never time to do it right, always time to fix it later’.

Something else I learnt a lot about during this project is the importance of being able to observe complex system behaviors, ideally in as close to real time as possible. This is ever more important these days as the paradigm shifts to containers and serverless. Combine this with a globally distributed system and bring elements like auto-scaling into the mix and you have got a challenge on your hands in terms of system observability.

So what is observability and is it the same as monitoring the service? The definition of the term as it applies to distributed systems seems to mean different things to different people. I really like the definition that Cindy Sridharan uses in the book Distributed Systems Observability (O’Reilly, 2018):

In its most complete sense, observability is a property of a system that has been designed, built, tested, deployed, operated, monitored, maintained, and evolved in acknowledgment of the following facts:

No complex system is ever fully healthy.

Distributed systems are pathologically unpredictable.

It’s impossible to predict the myriad states of partial failure various parts of the system might end up in.

Failure needs to be embraced at every phase, from system design to implementation, testing, deployment, and, finally, operation.

Ease of debugging is a cornerstone for the maintenance and evolution of robust systems.

No complex system is ever fully healthy.
At first glance, this might look like a bold claim, but it is absolutely true. There will always be a component that is performing in a sub-optimal fashion, or a component that is currently on fail-over to a secondary instance. The key thing here is that when issues occur, action can be taken automatically (ideally), or manually to address the issue and ensure the overall system remains stable and within any agreed performance indicators.

Distributed systems are pathologically unpredictable.
Consider a large scale cloud service with differing traffic profiles each day. Such a system may perform very well with one traffic profile, and perform sub-optimally with another. In this example, again knowing an issue exists is critical. Some of these types of issues can be difficult to spot if the relevant observability functionality has not been built-in. Performance issues in production especially can be hidden if the right observability tools are not in place and constantly reviewed.

It’s impossible to predict the myriad states of partial failure various parts of the system might end up in.
This is especially true of complex distributed systems, and in my opinion it is definitely impossible to test all failure scenarios in a very complex system. However, the key failure scenarios that can be identified must be tested, and mitigations put in place as necessary. For anything else, monitoring points should be in place to detect as many issues as possible.

Failure needs to be embraced at every phase, from system design to implementation, testing, deployment, and, finally, operation.
There will always be issues that occur which are not caught in monitoring. Sometimes these are minor with no customer impact, sometimes not. It is important when these issues occur to learn from them, and make the necessary updates to detect them should they occur again. System monitoring points should be defined early in the project lifecycle, and tested multiple times throughout the project development lifecycle.
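To make the idea of a testable monitoring point a little more concrete, here is a small sketch of one that tracks request latency. The class, the 95th-percentile rule and the threshold are illustrative assumptions of mine, not something taken from the project or from the book.

```python
import time
from contextlib import contextmanager


class LatencyMonitor:
    """Collects latency samples and flags when the 95th percentile is too high."""

    def __init__(self, threshold_ms):
        self.threshold_ms = threshold_ms
        self.samples_ms = []

    def record(self, latency_ms):
        self.samples_ms.append(latency_ms)

    @contextmanager
    def measure(self):
        """Time the wrapped block and record its duration in milliseconds."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.record((time.perf_counter() - start) * 1000.0)

    def p95(self):
        """Nearest-rank 95th percentile of the recorded samples."""
        if not self.samples_ms:
            return 0.0
        ordered = sorted(self.samples_ms)
        rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
        return ordered[rank - 1]

    def breached(self):
        return self.p95() > self.threshold_ms
```

Because the monitor is just an object with a `record` method, a unit test can feed it synthetic latencies and assert that `breached()` flips when it should, which is one way of testing a monitoring point long before production.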

Ease of debugging is a cornerstone for the maintenance and evolution of robust systems.
Perhaps one of the most critical points here. When problems occur, engineers will need the necessary information to be able to debug effectively. Consider a service crash in production where you don’t get a core dump, and service logs have been rotated to save disk space. When issues occur, you must ensure that the necessary forensics are available to diagnose the issue.

So, observability is not something that we add in the final stages of a project, but something that must be thought of as a feature of a distributed system from the beginning of the project. It should also be a team concern, not just an operational concern.

Observability must be designed. The design must be facilitated in the service architecture. Observability must also be tested, something that can be neglected when the team is heads-down trying to deliver user visible features with a customer benefit. But, not to suggest that observability doesn’t have a customer benefit – in fact it is critically important not to be blind in production to issues like higher than normal latency that might be impacting customer experience negatively. In a future post, I’ll go more in-depth into the types of observability which I believe should be built-in from the start.

AWS SageMaker

I have played around with AWS SageMaker a bit more recently. This is Amazon’s managed machine learning service that allows you to build and run machine learning models in the AWS public cloud. The nice thing about this is that you can productionize a machine learning solution very quickly because the operational aspects – namely hosting the model and scaling an endpoint to allow inferences against the model – are removed. So called ‘MLOps’ has almost become a field of its own, so abstracting all this complexity away and just focusing on the core of the problem you are trying to solve is very beneficial. Of course, like everything else in public cloud, this comes at a monetary cost, but it is well worth the cost if you don’t have specialists in this area, or just want to do a fast proof-of-concept.

I will discuss here the basic flow of creating a model in SageMaker – of course some of these are general things that would be done as part of any machine learning project. The first step is to head on over to AWS and create a new Jupyter Notebook instance in AWS SageMaker. This is where the logic for the training of the model, and deployment of the ML endpoint, will reside.

Assuming you have identified the problem you are trying to solve, you will need to identify the dataset which you will use for training and evaluation of the model. You will want to read the AWS documentation for the algorithm you choose, as this will likely require the data to be in a specific format for the training process. I have found that many of the built-in algorithms in SageMaker require data in different formats, which has been a bit frustrating. I recommend looking at the AWS SageMaker examples repository, as it has detailed examples of all the available algorithms, and examples you can walk through that solve real world problems.

Once you have the dataset gathered and in the correct format, and you have identified the algorithm you want to use, the next step is to kick off a training job. It is likely your data will be stored on AWS S3, and as usual you would split into training data and data you will use later for model evaluation. Make sure that the S3 bucket where you store your data is located in the same AWS region as your Jupyter Notebook instance or you may see issues. SageMaker makes it very easy to kick off a training job. Let’s take a look at an example.

Here, I’m setting up a new training job for some experiments I was doing around anomaly detection using the Random Cut Forest (RCF) algorithm provided by AWS SageMaker. This is an unsupervised algorithm for detecting anomalous data points within a dataset.
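The original cell is not reproduced here, so this is a sketch of what such a training job might look like, assuming version 2 of the SageMaker Python SDK and its built-in `RandomCutForest` estimator. The bucket prefix, instance type and hyperparameter values are placeholders of mine, and `train_data` is assumed to be a 2D NumPy array of feature rows.

```python
def rcf_training_config(bucket, prefix="rcf-demo"):
    """Settings for the training job; every value here is illustrative."""
    return {
        "instance_count": 1,                       # number of EC2 instances
        "instance_type": "ml.m4.xlarge",           # EC2 instance type
        "data_location": f"s3://{bucket}/{prefix}/",       # training input
        "output_path": f"s3://{bucket}/{prefix}/output",   # model artifacts
        "num_samples_per_tree": 512,               # RCF hyperparameter
        "num_trees": 50,                           # RCF hyperparameter
    }


def train_rcf(train_data, role, bucket):
    """Start an RCF training job and return the fitted estimator."""
    # Imported lazily so the config helper works without the SDK installed.
    from sagemaker import RandomCutForest

    rcf = RandomCutForest(role=role, **rcf_training_config(bucket))
    # record_set() uploads the array to data_location in the format RCF expects.
    rcf.fit(rcf.record_set(train_data))
    return rcf
```

Inside a SageMaker notebook the `role` argument would typically come from `sagemaker.get_execution_role()`.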

Above we are specifying things like the EC2 instance type we want the training to execute on, the number of EC2 instances, and the input and output locations of our data. The final parameters above, where we are specifying the number of samples per tree and the number of trees, are specific to the RCF algorithm. These are known as hyperparameters. Each algorithm will have its own hyperparameters that can be tuned, for example see here for the list available when using RCF. When the above is executed, the training process starts and you will see some output in the console. Note that you will be charged for the model training time; once the job completes you will see the number of seconds you have been billed for.

At this point, you have a model, but now you want to productionize it and allow endpoints to run inferences against it. Of course, it is not as easy as train and deploy – I am completely ignoring the testing/validation of the model and tuning based on that, as here I just want to show how SageMaker is effective at abstracting away the operational aspects of deploying a model. With SageMaker, you can deploy an endpoint, which is essentially your model hosted on a server with an API that allows queries to be run against it, with a prediction returned to the requester. The endpoint can be spun up in a few lines of code:
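A sketch of that step, assuming `estimator` is a trained SageMaker estimator such as the RCF one (the helper name and the default instance type are my own choices):

```python
def deploy_endpoint(estimator, instance_type="ml.m4.xlarge", instance_count=1):
    """Host a trained estimator behind a real-time endpoint.

    Returns the predictor object used to send requests to the endpoint.
    """
    return estimator.deploy(
        initial_instance_count=instance_count,
        instance_type=instance_type,
    )
```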

Once you get confirmation that the endpoint is deployed – this will generally take a few minutes – you can use the predict function to run some inference, for example:
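Something along these lines, assuming the predictor has already been set up to send CSV and parse JSON responses, and that the RCF endpoint replies with a `scores` list holding one `score` per input row (my understanding of the built-in algorithm’s response format, so treat it as an assumption):

```python
def anomaly_scores(predictor, rows):
    """Send rows to the endpoint and return one anomaly score per row."""
    response = predictor.predict(rows)
    # Assumed response shape: {"scores": [{"score": 1.2}, {"score": 0.9}, ...]}
    return [record["score"] for record in response["scores"]]
```

Higher scores indicate points the model considers more anomalous, so a common next step is to pick a cut-off and flag everything above it.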

Once you are done playing around with your model and endpoint, don’t forget to turn off your Jupyter instance (you don’t need to delete it), and to destroy any endpoints that you have created or you will continue to be charged.
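One way to make the endpoint clean-up hard to forget is to wrap the predictor in a context manager. This is a small convenience sketch of my own, not something the SageMaker SDK provides; it relies only on the predictor’s `delete_endpoint()` method.

```python
from contextlib import contextmanager


@contextmanager
def endpoint_cleanup(predictor):
    """Yield the predictor and delete its endpoint afterwards, even on error."""
    try:
        yield predictor
    finally:
        predictor.delete_endpoint()
```

Any experimentation then goes inside a `with endpoint_cleanup(predictor):` block, and the endpoint is removed when the block exits. The notebook instance itself still has to be stopped separately.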

Conclusions

AWS SageMaker is powerful in terms of putting the ability to create machine learning models and set up endpoints to serve requests to them in anybody’s hands. It is still a complex beast that requires knowledge of the machine learning process in order for you to be successful. However, in terms of being able to train a model quickly and put it into production, it is a very cool offering from AWS. You also get benefits like autoscaling of your endpoints should you need to scale up to meet demand. There is a lot to learn about SageMaker, and I’m barely scratching the surface here, but if you are interested in ML I highly recommend you take a look.

Using a Doomsday Clock to Track Technical Debt Risk

Every software team has technical debt, and those who say they don’t are lying. Even for new software, there are always items in the backlog that need attention, be they architecture trade-offs or areas of the code which are not as easy to maintain as they should be. Unless you have unlimited time and resources to deliver a project, which in reality is never, you will always have items such as these in the backlog that need to be addressed, outside of new features that need to be implemented. Mostly, but not always, technical debt items are deprioritized over such new features that generate visible outcomes, value for the customer, and revenue for the business. In my opinion, this is OK – technical debt in software projects is a fact of life, and as long as it is not recklessly introduced, and there is a plan to address it later, it is fine. It is good to look at how technical debt gets introduced. In my experience, it is mostly down to time constraints i.e. having a delivery deadline that means trade-offs must be made. Martin Fowler introduced us to the Technical Debt Quadrant which is a nice way of looking at how technical debt gets introduced. You would hope that you never end up anywhere in the top left.

There are a few different ways of tracking technical debt, such as keeping items as labeled stories in your JIRA backlog, or using a separate technical debt register. The most important thing is that you actually track these items – and wherever you track them it is critical to continuously review and prioritize them. It is also key that you address items as you iterate on new releases of your software. When you do not address technical debt and use all your team’s work cycles to add new features (and also likely new technical debt), you will come to a tipping point. You will find it takes ever longer to add new features, or worse some technical debt items may begin to impact your production software – think of that performance trade-off you made a few years ago when you were sure the software workload would never reach this scale – now it has, and customers are being impacted. So, neglecting technical debt items that have a potential to be very impactful to your customer base is not a good idea, and these are the type of items I will discuss here.

Recently I was reading about the Doomsday Clock. If you are not familiar with this:

Founded in 1945 by University of Chicago scientists who had helped develop the first atomic weapons in the Manhattan Project, the Bulletin of the Atomic Scientists created the Doomsday Clock two years later, using the imagery of apocalypse (midnight) and the contemporary idiom of nuclear explosion (countdown to zero) to convey threats to humanity and the planet. The decision to move (or to leave in place) the minute hand of the Doomsday Clock is made every year by the Bulletin’s Science and Security Board in consultation with its Board of Sponsors, which includes 13 Nobel laureates. The Clock has become a universally recognized indicator of the world’s vulnerability to catastrophe from nuclear weapons, climate change, and disruptive technologies in other domains.

So I thought, why not take this model and use it to track not existential risks to humanity, but technical debt items that pose a known catastrophic risk to a software product or service, be it a complex desktop or mobile application or a large cloud service. The items I am considering here are not small issues such as ‘I made a change and ignored the two failing unit tests’. While these issues are still important, the items I am thinking about here are things that would cause a catastrophic failure of your software in a production environment should a certain condition or set of conditions arise. Let’s take two examples.

For the first example, let us consider a popular desktop application that relies on a third party library to operate successfully. From inception, the application has used the free version of the library, and there has always been an item in the backlog to migrate to the enterprise version to ensure long term support. Now there is a hard date for end of support six months from now, and you need to migrate to the enterprise version before that date to continue to receive security patches – which are regular, or you risk exposing your customer base.

For the second example, consider a popular cloud service. The service uses a particular relational database that is key to the operation of the service and the customer value it provides. For some time, the scaling limits of this database have been known, and due to growth and expansion into international markets, these limits are closer than ever.

The main thing here is that I am talking about known technical debt items that will cause catastrophe at some point in the future. It is important here to draw the distinction between those and unknown items to which teams will always need to be reactive.

The method I had in mind for tracking such items, taking the Doomsday Clock analogy, was as follows:

You take your top X (in order of priority) technical debt items – big hitting items like those described above, you might have 5, you might even have 10.
The doomsday clock starts at the same number of minutes from midnight as you have items – e.g. if you have 5 items you start at 11:55pm.
Each time one of these items causes a real issue, or an issue is deemed imminent, move the time 1 minute closer to midnight. Moving the clock closer to midnight should be decided by your most senior engineers and architects.
The closer you get to midnight, the more danger you are in of having these items affect your customer base or revenue.
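The steps above are simple enough to sketch in a few lines of Python. The class and method names are hypothetical, and I have capped the list at 59 items so the clock always starts somewhere in the 11pm hour.

```python
class DebtDoomsdayClock:
    """Tracks how close a set of major technical debt items is to 'midnight'."""

    def __init__(self, items):
        if not 1 <= len(items) <= 59:
            raise ValueError("Track between 1 and 59 technical debt items")
        self.items = list(items)
        # Start as many minutes from midnight as there are items.
        self.minutes_to_midnight = len(self.items)
        self.history = []

    def advance(self, reason):
        """Move one minute closer to midnight and note why."""
        if self.minutes_to_midnight > 0:
            self.minutes_to_midnight -= 1
        self.history.append(reason)

    def time(self):
        """Current clock reading, e.g. '11:55 PM'."""
        if self.minutes_to_midnight == 0:
            return "12:00 AM"
        return f"11:{60 - self.minutes_to_midnight:02d} PM"
```

For example, a clock created with five items reads 11:55 PM, and calling `advance("database hit a scaling limit in staging")` moves it to 11:56 PM. The `history` list keeps the reason for each move, which is handy when explaining to stakeholders why the clock is where it is.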

Reaching midnight manifests in a catastrophic production issue, unhappy customers, and potential loss of revenue. In my opinion, executives, especially those without an engineering background, can easily grasp the severity of a situation when it is explained using the above method. This method also keeps Product Management, or whoever decides your road-map, focused on the key items that need to be addressed – it is easy to address the small items like fixing that unit test, and think you are addressing technical debt, but in reality you are just fooling yourself and your team.

How does your team track these items?

Aside

If you are an Iron Maiden fan, their song ‘2 Minutes to Midnight’ is a reference to the Doomsday clock being set to 2 minutes to midnight in 1953, the closest it had been at that time, after the US and Soviet Union tested H-bombs within nine months of one another.