Posts

Showing posts from January, 2018

Publish Spark Streaming System and Application Metrics From AWS EMR to Datadog - Part 2

This post is the second part of a series on getting an AWS EMR cluster that runs a Spark Streaming application ready for deployment in a production environment by enabling monitoring. In the first part of this series, we looked at how to enable EMR-specific metrics to be published to the Datadog service. In this post, I will show you how to set up your EMR cluster to enable the Datadog Spark check, which publishes Spark driver, executor, and RDD metrics to be graphed on a Datadog dashboard. To accomplish this task, we will leverage EMR bootstrap actions. From the AWS documentation: You can use a bootstrap action to install additional software on your cluster. Bootstrap actions are scripts that are run on the cluster nodes when Amazon EMR launches the cluster. They run before Amazon EMR installs specified applications and the node begins processing data. If you add nodes to a running cluster, bootstrap actions run on those nodes also. You can create custom bootstrap actions and specify them when you
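As a rough illustration of the pieces involved, here is a minimal Python sketch. It is not the setup from the full post. It assumes the Spark application runs on YARN, so the Datadog Spark check polls the YARN ResourceManager on the master node (default port 8088). It also assumes the bootstrap script decides whether it is on the master node by reading `/mnt/var/lib/info/instance.json`, which contains an `isMaster` flag, because bootstrap actions run on every node. The function names, the cluster name, and the S3 path are hypothetical placeholders. The `BootstrapActions` list follows the shape accepted by boto3's EMR `run_job_flow` call.

```python
import json
from typing import Dict, List


def is_master_node(instance_json: str) -> bool:
    """Hypothetical helper: True if the EMR instance.json content marks this node as master.

    Assumes the bootstrap script reads /mnt/var/lib/info/instance.json, so that
    the Spark check is only enabled on the master node.
    """
    return bool(json.loads(instance_json).get("isMaster", False))


def spark_check_config(cluster_name: str,
                       resource_manager_url: str = "http://localhost:8088") -> str:
    """Hypothetical helper: render a minimal Datadog Spark check config (spark.d/conf.yaml).

    Assumes YARN mode, so spark_url points at the YARN ResourceManager.
    """
    return (
        "init_config:\n"
        "\n"
        "instances:\n"
        f"  - spark_url: {resource_manager_url}\n"
        "    spark_cluster_mode: spark_yarn_mode\n"
        f"    cluster_name: {cluster_name}\n"
    )


def bootstrap_actions(script_s3_path: str, args: List[str]) -> List[Dict]:
    """Hypothetical helper: build the BootstrapActions argument for boto3's run_job_flow."""
    return [
        {
            "Name": "Install Datadog agent and enable Spark check",
            "ScriptBootstrapAction": {"Path": script_s3_path, "Args": list(args)},
        }
    ]


if __name__ == "__main__":
    # Placeholder values; replace with your own cluster name and S3 script location.
    print(spark_check_config("my-emr-cluster"))
    print(bootstrap_actions("s3://my-bucket/bootstrap/datadog-spark.sh", []))
```

Generating the config in the bootstrap script only on the master node avoids running duplicate Spark checks on every core and task node, since each check instance would otherwise poll the same ResourceManager.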

Publish Spark Streaming System and Application Metrics From AWS EMR to Datadog - Part 1

I recently implemented a Spark Streaming application that consumes different types of telemetry events, generated by mobile devices, from multiple Kafka topics. I set up the Spark Streaming cluster using the AWS EMR service and eventually succeeded in running the application on this cluster. So far, so good. 👏 The next important milestone in my project was to get my cluster and application ready for deployment in the production environment. I wanted to ensure that I could monitor the different components, understand performance parameters, and get alerted when things go wrong. The Spark UI provides a pretty good dashboard for understanding the health of the running application. However, this tool provides only one view of the information necessary for understanding your application's statistics in a production environment. While metrics generated by the AWS EMR service are automatically collected and pushed to the AWS CloudWatch service, these are more focused to