Comparing Splunk and Splunk Storm with Sumo Logic

March 5, 2013 By Tad Reeves

[Image: One of the Splunk dashboards I made for my current site]

The company I’m working for is looking to move some of its server equipment to Amazon Web Services (AWS)-style infrastructure and, in doing so, is also re-evaluating the products it uses to ingest and search enterprise log data. Log file analysis has long been my favorite category of the enterprise software I run (chalk that up to my days long ago as a support engineer for Webtrends), so I’m of course interested in the differences between Splunk (the preeminent do-it-yourself solution), newer products like Splunk’s own hosted Splunk Storm, and up-and-coming competition like Sumo Logic.

Following are a few points, drawn from my own usage of the products, that compare the three. Note that the Splunk install I have is a relatively small one (40 GB/day data ingestion rate), so the problems I have and the features I like are going to be a lot different from those of a big site.

Products compared: Splunk (Self Hosted), Splunk Storm, Sumo Logic

Auto Source Typing
  Splunk (Self Hosted): Knows the source typing of your data, automatically parses it and extracts fields. Nearly every log file type except the really obscure ones (like CQ5 dual-line request logs) is automatically parsed and has its fields extracted by Splunk.
  Splunk Storm: Same as self-hosted Splunk.
  Sumo Logic: Can’t parse data by itself; it makes you tell it how to parse the data before it can extract any fields. The Sumo Logic demo guy we had said this is coming later as a feature at some point.

Interactive Field Extraction
  Splunk (Self Hosted): Easy as heck to extract fields from unknown log types using the interactive field extractor tool. Makes it dead easy to do more complicated lookups and averages on new log types.
  Splunk Storm: Same as self-hosted Splunk.
  Sumo Logic: Couldn’t figure out how to do this with Sumo.

Scripted Input from Unix Boxes
  Splunk (Self Hosted): This is (in my opinion) one of the biggest selling features of Splunk. Splunk’s *NIX app includes, out of the box, nifty scripted input that grabs the output of top, ps, netstat, df, etc. and dumps it into a parsable, graphable index that you can use to make nifty CPU and network graphs for dashboards, search to see when a particular process was actually running on a machine, etc.
  Splunk Storm: Splunk Storm presently does NOT allow you to run apps, which is far and away the biggest reason it’s still sort of a toy compared to the self-hosted product.
  Sumo Logic: You’d have to do this yourself in Sumo Logic, which is a LOT of work.

App Ecosystem
  Splunk (Self Hosted): Self-hosted Splunk gives you access to all of the nifty apps folks have made for parsing F5 data, Nagios data, S3 buckets, etc.
  Splunk Storm: Splunk Storm doesn’t let you do apps.
  Sumo Logic: They’re working on an app infrastructure, but it is nowhere compared to the 5-year head start Splunk has.

Graphing
  Splunk (Self Hosted): Splunk has sexy graphing libraries that let you make radial gauges, marker gauges, area graphs, scatter graphs, all sorts of sexy ways to visualize data.
  Splunk Storm: Same as self-hosted Splunk.
  Sumo Logic: Bar graphs, line graphs, that’s about it. Pretty bare-bones, though the dashboarding is pretty easy to accomplish.

Integration with On-Premise Data
  Splunk (Self Hosted): A single web search head can query multiple indexers, including things on F5s, CCTV prod, etc. A search head at Amazon could transparently include on-site data.
  Splunk Storm: You can’t really do this with Storm.
  Sumo Logic: Can’t do this with Sumo.

Data Retention
  Splunk (Self Hosted): You can retain as much as you have storage for.
  Splunk Storm: You pay for data retention.
  Sumo Logic: You pay for data retention.
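
To give a feel for what "do this yourself" means for the scripted input from Unix boxes, here is a minimal sketch of one such script. It is not the code from Splunk's *NIX app, and it doesn't use any Sumo Logic API. It just runs df, turns each filesystem into one key=value line, and prints it so that whatever collector you use can pick it up. The function name parse_df, the field names (total_kb, used_pct and so on) and the timestamp format are all my own assumptions for illustration, and it assumes filesystem names contain no spaces.

```python
import subprocess
import time


def parse_df(output, timestamp):
    """Turn `df -P -k` output into one key=value event line per filesystem.

    Field names here are illustrative assumptions, not a vendor format.
    """
    events = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        # Split into at most 6 fields so a mount point with spaces stays whole.
        parts = line.split(None, 5)
        if len(parts) != 6:
            continue  # ignore lines that do not match the POSIX layout
        filesystem, total_kb, used_kb, avail_kb, capacity, mount = parts
        used_pct = capacity.rstrip("%")
        events.append(
            f"{timestamp} filesystem={filesystem} total_kb={total_kb} "
            f"used_kb={used_kb} avail_kb={avail_kb} "
            f'used_pct={used_pct} mount="{mount}"'
        )
    return events


if __name__ == "__main__":
    # -P asks for the portable POSIX layout; -k reports 1024-byte blocks.
    raw = subprocess.run(
        ["df", "-P", "-k"], capture_output=True, text=True
    ).stdout
    now = time.strftime("%Y-%m-%dT%H:%M:%S%z")
    for event in parse_df(raw, now):
        print(event)
```

You would run something like this from cron every minute or so and append the output to a file that your log collector tails. Multiply that by top, ps, netstat and friends, plus the dashboards on top, and you can see why having it out of the box is such a selling point.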


There’s more, but this is just what I could think of off the top of my head.

I’m really curious to know what folks think of Sumo Logic, especially those who’ve used Splunk in production as well.