Apache Kafka

I had been playing with Apache Kafka (https://en.wikipedia.org/wiki/Apache_Kafka), an open source pub/sub messaging system, when the new VP of development directed his team to start using it for some feature, so I was asked to bring up an official Kafka cluster quickly for the dev team to start using right away. It wasn’t hard, mostly because this is my third time setting up Zookeeper and Kafka. I’m getting better at it, but I still wish we had a more modern Linux system to run it on.

Now I need to figure out how to get Zookeeper and Kafka’s log4j output directly into Splunk, and maybe Introscope as well.

The Python Kafka client libraries are pretty efficient and make building producers or fairly complex consumers very easy. A simple POC can be written in under 10 lines, including comments.
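
To give a sense of how little code that takes, here is a minimal sketch using the kafka-python client; the library choice, broker address, and topic name are my assumptions, not something the post specifies.

```python
# Minimal producer/consumer POC, assuming kafka-python and a broker on localhost:9092.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("alerts", b"disk usage at 95% on host web01")  # publish one alert
producer.flush()

consumer = KafkaConsumer("alerts", bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest")
for msg in consumer:          # print each alert as it arrives
    print(msg.value.decode())
```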

I was using Kafka topics on my POC cluster as live data pipes for all of our automated alerts, hoping to write consumers that would start out simple but grow in intelligence over time as we learn how to associate events, and then let these new consumers do the actual human alerting.

That way we isolate the separate alerting mechanisms and emails, take advantage of the pub/sub topics’ queuing and caching of entries, and can totally customize our team’s alerting experience. We can build on-call lists, escalation paths, and whole-team lists, set rotation schedules, block unnecessary repeated alerts, define alerting levels, and suppress related alerts, like when every monitoring system triggers an alarm about the same server.
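
As a rough sketch of the suppression idea, a consumer could drop repeated alerts for the same host within a time window before notifying anyone. The message format, topic name, and window length here are all assumptions for illustration.

```python
# Hypothetical consumer that suppresses repeated alerts for the same host.
import json
import time
from kafka import KafkaConsumer

SUPPRESS_SECONDS = 600   # ignore repeats for 10 minutes (arbitrary choice)
last_seen = {}           # host -> timestamp of the last alert we passed along

consumer = KafkaConsumer("alerts", bootstrap_servers="localhost:9092",
                         value_deserializer=lambda v: json.loads(v.decode()))
for msg in consumer:
    alert = msg.value    # assumed shape: {"host": "web01", "level": "critical", "text": "..."}
    now = time.time()
    if now - last_seen.get(alert["host"], 0) < SUPPRESS_SECONDS:
        continue         # a recent alert already fired for this host, suppress this one
    last_seen[alert["host"]] = now
    print("ALERT:", alert)  # stand-in for actually paging whoever is on call
```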

Using Kafka makes it possible to run multiple independent consumers against the same topic, so you can experiment on the same data as the production alert consumer and see whether a new routine adds value. In addition to the automated alerting consumers, you can run interactive consumers that report live, up-to-the-minute alerts, analyzed, arranged, and displayed in any format you choose. Make it a mobile-friendly website we can reach on our work phones, while my Python consumer reports on what is in trouble in our datacenter.
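
In Kafka terms, that works because each consumer group tracks its own offsets on the topic. A sketch, with group names that are purely my invention:

```python
# Two consumers in different consumer groups each receive every message on the
# topic, so an experimental consumer can run alongside the production one
# without taking messages away from it.
from kafka import KafkaConsumer

prod_consumer = KafkaConsumer("alerts", bootstrap_servers="localhost:9092",
                              group_id="alerting-prod")
test_consumer = KafkaConsumer("alerts", bootstrap_servers="localhost:9092",
                              group_id="alerting-experiment")
# "alerting-experiment" can lag behind or be reset and replayed at any time
# without affecting what "alerting-prod" delivers to the team.
```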

I plan to write a REST API in front of our alerts topics, giving the alerting hosts a more generic interface and isolating the producers of the various alerts from the Kafka API, in case we ever want to replace it with something else. Then I’ll replace the various alert scripts with versions that push to the queues via REST instead of their current methods.
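
Something like the following could serve as that front end; this is only a sketch under my own assumptions (Flask plus kafka-python, a single /alerts route, and the same hypothetical topic and broker as above), not the API I have actually written.

```python
# Hypothetical REST shim: alerting hosts POST JSON here instead of talking to
# Kafka directly, so the queueing backend can be swapped without touching them.
import json
from flask import Flask, request
from kafka import KafkaProducer

app = Flask(__name__)
producer = KafkaProducer(bootstrap_servers="localhost:9092",
                         value_serializer=lambda v: json.dumps(v).encode())

@app.route("/alerts", methods=["POST"])
def post_alert():
    producer.send("alerts", request.get_json(force=True))  # forward the alert to the topic
    return {"status": "queued"}, 202

if __name__ == "__main__":
    app.run(port=8080)
```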
I know someone who will be happy with that.